RCParsing

RCParsing - the fluent, lightweight and powerful .NET lexerless parsing library for language development (DSL) and data scraping.

This library puts developer experience (DX) first, providing a toolkit for creating programming languages, file formats, or data extraction tools with a declarative API, debugging tools, and more. You design your parser directly in code and fix it using stack and walk traces with detailed error messages.


Why RCParsing?

🐍 Hybrid Power: Unique support for barrier tokens to parse indent-sensitive languages like Python and YAML.
☄️ Incremental Parsing: Edit large documents with instant feedback. Our persistent AST enables efficient re-parsing of only changed sections, perfect for LSP servers and real-time editing scenarios.
💪 Regex on Steroids: Find all matches of a target structure in the input text, with detailed AST information and transformed values.
🌀 Lexerless Freedom: No token priority headaches. Parse directly from raw text, even with keywords embedded in identifiers. Tokens are used just as lightweight matching primitives.
🎨 Fluent API: Write parsers in C# that read like clean BNF grammars, boosting readability and maintainability compared to imperative, functional or code-generation approaches.
🧩 Combinator Style: Unlock maximum performance by defining complex tokens with immediate value transformation, bypassing AST construction entirely for a direct, allocation-free result. Perfect for high-speed parsing of well-defined formats. It can also be combined with AST mode.
🐛 Superior Debugging: Get detailed, actionable error messages with stack traces, walk traces and precise source locations. A rich API for attaching manual error information is included.
🚑 Error Recovery: Define custom recovery strategies per rule to handle syntax errors and keep parsing.
⚡ Blazing Fast: Performance is now on par with the fastest .NET parsing libraries, even with the most complex grammars (see benchmarks below).
🌳 Rich AST: The parser builds an AST (Abstract Syntax Tree) from raw text, with the ability to optimize it, fully analyze it, and calculate the result value lazily, reducing unnecessary allocations.
🔧 Configurable Skipping: Advanced strategies for whitespace and comments, allowing you to use conflicting tokens in your main rules.
📦 Batteries Included: Useful built-in tokens and rules (regex, identifiers, numbers, escaped strings, separated lists, custom tokens, and more...).
🖥️ Broad Compatibility: Targets .NET Standard 2.0 (runs on .NET Framework 4.6.1+), .NET 6.0, and .NET 8.0.

Installation
Tutorials, docs and examples
Simple examples - Examples that you can copy, paste, and run.
- A + B - Basic arithmetic expression parser with result calculation.
- JSON (with incremental parsing) - A complete JSON parser with comments and skipping (with incremental parsing example included).
- Python-like - Demonstrating barrier tokens for indentation.
- JSON token combination - A maximum speed approach for getting values without AST or just to validate inputs with zero-overhead.
- Finding patterns - How to find all occurrences of a rule in a string.
- Errors example - Just a simple example of how errors look in default and debug modes.
Comparison with other parsing libraries
Benchmarks
- JSON AST - Comparing JSON parsing with ANTLR, uses JSON parser with default rule-based style.
- JSON Combinators - Comparing JSON parsing across combinators, uses parser with token combination style for maximum speed.
- Expressions - Calculating expressions with '+-*/' operators with precedence rules.
- Regex - Finding identifiers and emails in plain text using regex-like FindAllMatches feature.
- Python - Parsing the entire Python 3.13 grammar.
Projects using RCParsing
Roadmap
Contributing

Installation

You can install the package via NuGet Package Manager or console window, using one of these commands:

dotnet add package RCParsing
Install-Package RCParsing

Or do it manually by cloning this repository.

Tutorials, docs and examples

Tutorials - The tutorial website.
Rules and Tokens Library - The library of tutorials for primitives that you can build your parser from.

Tests Library - The tests directory that contains tests for various things, including C, GraphQL and Python.

Syntax colorizer - The syntax colorizer sample that automatically colorizes text based on provided parser.
Math calculator - Math expression evaluator with support of power, math functions and constants.
ANTLR to RCParsing converter - Simple tool for generating RCParsing API code from ANTLR rules.

Simple examples

A + B

Here is a simple example of a parser that parses an "a + b" string with numbers and transforms the result:

using RCParsing;

// First, you need to create a builder
var builder = new ParserBuilder();

// Enable and configure the auto-skip for 'Whitespaces' (you can replace it with any other rule)
builder.Settings.SkipWhitespaces();

// Create a main sequential expression rule
builder.CreateMainRule("expression")
    .Number<double>()
    .LiteralChoice("+", "-")
    .Number<double>()
    .Transform(v => {
        var value1 = v.GetValue<double>(0);
        var op = v.GetValue<string>(1);
        var value2 = v.GetValue<double>(2);
        return op == "+" ? value1 + value2 : value1 - value2;
    });

// Build the parser
var parser = builder.Build();

// Parse a string using 'expression' rule and get the raw AST (value will be calculated lazily)
var parsedRule = parser.Parse("10 + 15");

// We can now get the value from our 'Transform' functions (the value is calculated at this point)
var transformedValue = parsedRule.GetValue<double>();
Console.WriteLine(transformedValue); // 25

JSON (with incremental parsing)

And here is a JSON example that also shows partial re-parsing of the parse tree:

var builder = new ParserBuilder();

// Configure the AST type and the skip rule for whitespace and comments
builder.Settings
	.Skip(r => r.Rule("skip"), ParserSkippingStrategy.SkipBeforeParsingGreedy)
	.UseLazyAST(); // Use the lazy AST type to store cached results

// The rule that will be skipped before every parsing attempt
builder.CreateRule("skip")
	.Choice(
		b => b.Whitespaces(),
		b => b.Literal("//").TextUntil('\n', '\r'))
	.ConfigureForSkip();

builder.CreateToken("string")
	.Literal('"')
	.EscapedTextPrefix(prefix: '\\', '\\', '\"') // This sub-token automatically escapes the source string and puts it into intermediate value
	.Literal('"')
	.Pass(index: 1); // Pass the EscapedTextPrefix's intermediate value up (it will be used as token's result value)

builder.CreateToken("number")
	.Number<double>();

builder.CreateToken("boolean")
	.LiteralChoice("true", "false").Transform(v => v.Text == "true");

builder.CreateToken("null")
	.Literal("null").Transform(v => null);

builder.CreateRule("value")
	.Choice(
		c => c.Token("string"),
		c => c.Token("number"),
		c => c.Token("boolean"),
		c => c.Token("null"),
		c => c.Rule("array"),
		c => c.Rule("object")
	); // Choice rule propagates child's value by default

builder.CreateRule("array")
	.Literal("[")
	.ZeroOrMoreSeparated(v => v.Rule("value"), s => s.Literal(","),
		allowTrailingSeparator: true, includeSeparatorsInResult: false)
		.TransformLast(v => v.SelectArray())
	.Literal("]")
	.TransformSelect(index: 1); // Selects the Children[1]'s value

builder.CreateRule("object")
	.Literal("{")
	.ZeroOrMoreSeparated(v => v.Rule("pair"), s => s.Literal(","),
		allowTrailingSeparator: true, includeSeparatorsInResult: false)
		.TransformLast(v => v.SelectValues<KeyValuePair<string, object>>().ToDictionary(p => p.Key, p => p.Value))
	.Literal("}")
	.TransformSelect(index: 1);

builder.CreateRule("pair")
	.Token("string")
	.Literal(":")
	.Rule("value")
	.Transform(v => KeyValuePair.Create(v.GetValue<string>(0), v.GetValue(2)));

builder.CreateMainRule("content")
	.Rule("value")
	.EOF() // Ensure that we captured all the input
	.TransformSelect(0);

var jsonParser = builder.Build();

var json =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00", // This is a comment
	"tags": ["tag1", "tag2", "tag3"],
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";

// The same JSON, but with 'tags' value changed
var changedJson =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00", // This is a comment
	"tags": { "nested": ["tag1", "tag2", "tag3"] },
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";

// Parse the input text and calculate values (they will be cached because we're using the lazy AST)
var ast = jsonParser.Parse(json);
var value = ast.Value as Dictionary<string, object>;
var tags = value!["tags"] as object[];
var nested = value!["nested"] as Dictionary<string, object>;

// Prints: Sample Data
Console.WriteLine(value["name"]);
// Prints: tag1
Console.WriteLine(tags![0]);

// Re-parse the slightly changed input string and get the values
var changedAst = ast.Reparsed(changedJson);
var changedValue = changedAst.Value as Dictionary<string, object>;
var changedTags = changedValue!["tags"] as Dictionary<string, object>;
var nestedTags = changedTags!["nested"] as object[];
var changedNested = changedValue!["nested"] as Dictionary<string, object>;

// Prints type: System.Object[]
Console.WriteLine(changedTags["nested"]);
// Prints: tag1
Console.WriteLine(nestedTags![0]);

// And untouched values remain the same!
// Prints: True
Console.WriteLine(ReferenceEquals(nested, changedNested));

Python-like

This example uses our killer feature, barrier tokens, which make it possible to parse indentation without losing it:

using RCParsing;
using RCParsing.Building;

var builder = new ParserBuilder();

builder.Settings.SkipWhitespaces();

// Add the 'INDENT' and 'DEDENT' barrier tokenizer
// 'INDENT' is emitted when indentation grows
// and 'DEDENT' is emitted when indentation shrinks
// (they are indentation delta tokens)
builder.BarrierTokenizers
	.AddIndent(indentSize: 4, "INDENT", "DEDENT");

// Create the statement rule
builder.CreateRule("statement")
	.Choice(
	b => b
		.Literal("def")
		.Identifier()
		.Literal("():")
		.Rule("block"),
	b => b
		.Literal("if")
		.Identifier()
		.Literal(":")
		.Rule("block"),
	b => b
		.Identifier()
		.Literal("=")
		.Identifier()
		.Literal(";"));

// Create the 'block' rule that matches our 'INDENT' and 'DEDENT' barrier tokens
builder.CreateRule("block")
	.Token("INDENT")
	.OneOrMore(b => b.Rule("statement"))
	.Token("DEDENT");

builder.CreateMainRule("program")
	.ZeroOrMore(b => b.Rule("statement"))
	.EOF();

var parser = builder.Build();

string inputStr =
"""
def a():
    b = c;
    c = a;
a = p;
if c:
    h = i;
    if b:
        a = aa;
""";

// Get the optimized AST...
var ast = parser.Parse(inputStr).Optimized();

// And print it!
foreach (var statement in ast.Children)
{
	Console.WriteLine(statement.Text);
	Console.Write("\n\n");
}

// Outputs:

/*
def a():
    b = c;
    c = a;

a = p;

if c:
    h = i;
    if b:
        a = aa;
*/
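
To make the idea of indentation delta tokens concrete, here is a minimal sketch in plain C#. It is not RCParsing's implementation: `IndentDeltas` and the `LINE` placeholder token are hypothetical names used only for illustration, and the sketch assumes space-only indentation with a fixed indent size.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of RCParsing): converts leading-space counts
// into INDENT/DEDENT delta tokens, one token per indentation level changed.
static List<string> IndentDeltas(IEnumerable<string> lines, int indentSize)
{
    var tokens = new List<string>();
    int previousLevel = 0;

    foreach (var line in lines)
    {
        // Blank lines do not change the indentation level.
        if (string.IsNullOrWhiteSpace(line))
            continue;

        int spaces = line.Length - line.TrimStart(' ').Length;
        int level = spaces / indentSize;

        // Emit one INDENT per level gained, or one DEDENT per level lost.
        for (int i = previousLevel; i < level; i++) tokens.Add("INDENT");
        for (int i = previousLevel; i > level; i--) tokens.Add("DEDENT");

        tokens.Add("LINE"); // placeholder for the statement on this line
        previousLevel = level;
    }

    // Close every block that is still open at the end of the input.
    for (int i = previousLevel; i > 0; i--) tokens.Add("DEDENT");

    return tokens;
}

var source = new[]
{
    "def a():",
    "    b = c;",
    "    c = a;",
    "a = p;",
    "if c:",
    "    h = i;",
    "    if b:",
    "        a = aa;",
};

// Prints: LINE INDENT LINE LINE DEDENT LINE LINE INDENT LINE LINE INDENT LINE DEDENT DEDENT
Console.WriteLine(string.Join(" ", IndentDeltas(source, indentSize: 4)));
```

Each INDENT is eventually balanced by a DEDENT, which is why the 'block' rule above can simply be written as INDENT, one or more statements, DEDENT.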

JSON token combination

Tokens in this parser can be complex enough to act like combinators, transforming values immediately without building an AST:

var builder = new ParserBuilder();

// Use lookahead for 'Choice' tokens
builder.Settings.UseFirstCharacterMatch();

builder.CreateToken("string")
	// 'Between' token pattern matches a sequence of three elements,
	// but calculates and propagates intermediate value of second element
	.Between(
		b => b.Literal('"'),
		b => b.TextUntil('"'),
		b => b.Literal('"'));

builder.CreateToken("number")
	.Number<double>();

builder.CreateToken("boolean")
	// 'Map' token pattern applies intermediate value transformer to child's value
	.Map<string>(b => b.LiteralChoice("true", "false"), m => m == "true");

builder.CreateToken("null")
	// 'Return' does not calculate a value for the child element, it just returns 'null' here
	.Return(b => b.Literal("null"), null);

builder.CreateToken("value")
	// Skip whitespaces before value token
	.SkipWhitespaces(b =>
		// 'Choice' token selects the matched token's value
		b.Choice(
			c => c.Token("string"),
			c => c.Token("number"),
			c => c.Token("boolean"),
			c => c.Token("null"),
			c => c.Token("array"),
			c => c.Token("object")
	));

builder.CreateToken("value_list")
	.ZeroOrMoreSeparated(
		b => b.Token("value"),
		b => b.SkipWhitespaces(b => b.Literal(',')),
		includeSeparatorsInResult: false)
	// You can apply a passage function to tokens that
	// match a variable number of child elements
	.Pass(v =>
	{
		return v.ToArray();
	});

builder.CreateToken("array")
	.Between(
		b => b.Literal('['),
		b => b.Token("value_list"),
		b => b.SkipWhitespaces(b => b.Literal(']')));

builder.CreateToken("pair")
	.SkipWhitespaces(b => b.Token("string"))
	.SkipWhitespaces(b => b.Literal(':'))
	.Token("value")
	.Pass(v =>
	{
		return KeyValuePair.Create((string)v[0]!, v[2]);
	});

builder.CreateToken("pair_list")
	.ZeroOrMoreSeparated(
		b => b.Token("pair"),
		b => b.SkipWhitespaces(b => b.Literal(',')))
	.Pass(v =>
	{
		return v.Cast<KeyValuePair<string, object>>().ToDictionary();
	});

builder.CreateToken("object")
	.Between(
		b => b.Literal('{'),
		b => b.Token("pair_list"),
		b => b.SkipWhitespaces(b => b.Literal('}')));

var parser = builder.Build();

var json =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00",
	"tags": ["tag1", "tag2", "tag3"],
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";

// Match the token directly and produce intermediate value
var result = parser.MatchToken<Dictionary<string, object>>("value", json);
Console.WriteLine(result["name"]); // Outputs: Sample Data

var invalidJson =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00",
	"tags": ["tag1", "tag2", "tag3"],,
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";

// Retrieve the furthest error
var error = parser.TryMatchToken("value", invalidJson).Context.CreateErrorGroups().Last!;
Console.WriteLine(error.Column); // 35
Console.WriteLine(error.Line);   // 5

// You can also check whether the input matches a token in the fastest way, without value calculation:
Console.WriteLine(parser.MatchesToken("value", "[90, 60, true, null]", out int matchedLength)); // true

Finding patterns

The FindAllMatches method allows you to extract all occurrences of a pattern from a string, even in complex inputs, while handling optional transformations. Here's an example that finds the Price: *PRICE* (USD|EUR) pattern:

var builder = new ParserBuilder();

// Skip unnecessary whitespace (you can configure comments here and they will be ignored when matching)
builder.Settings.SkipWhitespaces();

// Create the rule that we will find in text
builder.CreateMainRule()
	.Literal("Price:")
	.Number<double>() // 1
	.LiteralChoice("USD", "EUR") // 2
	.Transform(v =>
	{
		var number = v[1].Value; // Get the number value
		var currency = v[2].Text; // Get the 'USD' or 'EUR' text
		return new { Amount = number, Currency = currency };
	});

var input =
"""
Some log entries.
Price: 42.99 USD
Error: something happened.
Price: 99.50 EUR
Another line.
Price: 2.50 USD
""";

// Find all transformed matches
var prices = builder.Build().FindAllMatches<dynamic>(input).ToList();

foreach (var price in prices)
{
	Console.WriteLine($"Price: {price.Amount}; Currency: {price.Currency}");
}

Errors example

This is how errors are displayed in the default mode:

RCParsing.ParsingException : An error occurred during parsing:

The line where the error occurred (position 130):
	"tags": ["tag1", "tag2", "tag3"],,
                   line 5, column 35 ^

',' is unexpected character, expected one of:
  'string'
  literal '}'

... and more errors omitted

And this is how errors look with the builder.Settings.UseDebug() setting:

RCParsing.ParsingException : An error occurred during parsing:

['string']: Failed to parse token.
['pair']: Failed to parse sequence rule.
[literal '}']: Failed to parse token.
['object']: Failed to parse sequence rule.

The line where the error occurred (position 130):
	"tags": ["tag1", "tag2", "tag3"],,
                   line 5, column 35 ^

',' is unexpected character, expected one of:
  'string'
  'pair'
  literal '}'
  'object'

['string'] Stack trace (top call recently):
- Sequence 'pair':
    'string' <-- here
    literal ':'
    'value'
- SeparatedRepeat[0..] (allow trailing): 'pair' <-- here
  sep literal ','
- Sequence 'object':
    literal '{'
    SeparatedRepeat[0..] (allow trailing)... <-- here
    literal '}'
- Choice 'value':
    'string'
    'number'
    'boolean'
    'null'
    'array'
    'object' <-- here
- Sequence 'content':
    'value' <-- here
    end of file

[literal '}'] Stack trace (top call recently):
- Sequence 'object':
    literal '{'
    SeparatedRepeat[0..] (allow trailing)...
    literal '}' <-- here
- Choice 'value':
    'string'
    'number'
    'boolean'
    'null'
    'array'
    'object' <-- here
- Sequence 'content':
    'value' <-- here
    end of file

... and more errors omitted

Walk Trace:

... 316 hidden parsing steps. Total: 356 ...
[ENTER]   pos:128   literal '//'
[FAIL]    pos:128   literal '//' failed to match: '],,\r\n\t"isActive...'
[FAIL]    pos:128   Sequence... failed to match: '],,\r\n\t"isActive...'
[FAIL]    pos:128   'skip' failed to match: '],,\r\n\t"isActive...'
[ENTER]   pos:128   literal ','
[FAIL]    pos:128   literal ',' failed to match: '],,\r\n\t"isActive...'
[SUCCESS] pos:106   SeparatedRepeat[0..] (allow trailing)... matched: '"tag1", "tag2", "tag3"' [22 chars]
[ENTER]   pos:128   literal ']'
[SUCCESS] pos:128   literal ']' matched: ']' [1 chars]
[SUCCESS] pos:105   'array' matched: '["tag1", "tag2", "tag3"]' [24 chars]
[SUCCESS] pos:105   'value' matched: '["tag1", "tag2", "tag3"]' [24 chars]
[SUCCESS] pos:97    'pair' matched: '"tags": ["tag1" ..... ", "tag3"]' [32 chars]
[ENTER]   pos:129   'skip'
[ENTER]   pos:129   whitespaces
[FAIL]    pos:129   whitespaces failed to match: ',,\r\n\t"isActive"...'
[ENTER]   pos:129   Sequence...
[ENTER]   pos:129   literal '//'
[FAIL]    pos:129   literal '//' failed to match: ',,\r\n\t"isActive"...'
[FAIL]    pos:129   Sequence... failed to match: ',,\r\n\t"isActive"...'
[FAIL]    pos:129   'skip' failed to match: ',,\r\n\t"isActive"...'
[ENTER]   pos:129   literal ','
[SUCCESS] pos:129   literal ',' matched: ',' [1 chars]
[ENTER]   pos:130   'skip'
[ENTER]   pos:130   whitespaces
[FAIL]    pos:130   whitespaces failed to match: ',\r\n\t"isActive":...'
[ENTER]   pos:130   Sequence...
[ENTER]   pos:130   literal '//'
[FAIL]    pos:130   literal '//' failed to match: ',\r\n\t"isActive":...'
[FAIL]    pos:130   Sequence... failed to match: ',\r\n\t"isActive":...'
[FAIL]    pos:130   'skip' failed to match: ',\r\n\t"isActive":...'
[ENTER]   pos:130   'pair'
[ENTER]   pos:130   'string'
[FAIL]    pos:130   'string' failed to match: ',\r\n\t"isActive":...'
[FAIL]    pos:130   'pair' failed to match: ',\r\n\t"isActive":...'
[SUCCESS] pos:4     SeparatedRepeat[0..] (allow trailing)... matched: '"id": 1,\r\n\t"nam ..... , "tag3"],' [126 chars]
[ENTER]   pos:130   literal '}'
[FAIL]    pos:130   literal '}' failed to match: ',\r\n\t"isActive":...'
[FAIL]    pos:0     'object' failed to match: '{\r\n\t"id": 1,\r\n\t...'
[FAIL]    pos:0     'value' failed to match: '{\r\n\t"id": 1,\r\n\t...'
[FAIL]    pos:0     'content' failed to match: '{\r\n\t"id": 1,\r\n\t...'

... End of walk trace ...
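
The position, line, and column in these messages are related by simple counting. The sketch below is plain C#, not RCParsing's implementation: `Locate` is a hypothetical helper, and the input assumes CRLF line endings and the `// This is a comment` suffix from the incremental JSON example, which is what the offsets in the walk trace appear to correspond to.

```csharp
using System;

// Hypothetical helper (not RCParsing API): maps a zero-based offset to a
// one-based (line, column) pair, treating '\n' as the line terminator.
static (int Line, int Column) Locate(string text, int position)
{
    int line = 1, lineStart = 0;
    for (int i = 0; i < position && i < text.Length; i++)
    {
        if (text[i] == '\n')
        {
            line++;
            lineStart = i + 1;
        }
    }
    return (line, position - lineStart + 1);
}

// The first five lines of the invalid input, with CRLF line endings as in the trace.
var input =
    "{\r\n" +
    "\t\"id\": 1,\r\n" +
    "\t\"name\": \"Sample Data\",\r\n" +
    "\t\"created\": \"2023-01-01T00:00:00\", // This is a comment\r\n" +
    "\t\"tags\": [\"tag1\", \"tag2\", \"tag3\"],,\r\n";

var (errorLine, errorColumn) = Locate(input, 130);
Console.WriteLine($"line {errorLine}, column {errorColumn}"); // line 5, column 35
```

Note that the tab at the start of the line counts as a single column here, which matches the reported column 35.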

Comparison with Other Parsing Libraries

RCParsing is designed to stand out through its unique features and a smooth developer experience, while its performance is good enough to compete with the fastest parsing tools.

Feature Comparison

This table highlights the unique architectural and usability features of each library.

Feature	RCParsing	Pidgin	Parlot	Superpower	ANTLR4
Architecture	Scannerless hybrid	Scannerless	Scannerless	Lexer-based	Lexer-based with modes
API	Fluent, lambda-based	Functional	Fluent/functional	Fluent/functional	Grammar Files
Barrier/complex Tokens	Yes, built-in or manual	None	None	Yes, manual	Yes, manual
Skipping	6 strategies, global or manual	Manual	Global or manual	Lexer-based	Lexer-based
Error Messages	Extremely Detailed, extendable with API	Simple	Manual messages	Simple	Simple by default, extendable
Minimum .NET Target	.NET Standard 2.0	.NET 7.0	.NET Standard 2.0	.NET Standard 2.0	.NET Framework 4.5

Benchmarks

All benchmarks are done via BenchmarkDotNet.

Here is machine and runtime information:

BenchmarkDotNet v0.15.2, Windows 10 (10.0.19045.3448/22H2/2022Update)
AMD Ryzen 5 5600 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK 9.0.302
  [Host]     : .NET 8.0.18 (8.0.1825.31117), X64 RyuJIT AVX2
  Job-KTXINV : .NET 8.0.18 (8.0.1825.31117), X64 RyuJIT AVX2

JSON AST

JSON value calculation with the type set Dictionary<string, object>, object[], string, int and null. It uses visitors to transform values from the AST (Abstract Syntax Tree).

Method	Mean	Error	StdDev	Ratio	Gen0	Gen1	Allocated	Alloc Ratio
JsonBig_RCParsing	172.574 us	1.0484 us	0.4655 us	1.00	13.1836	4.1504	218.23 KB	1.00
JsonBig_RCParsing_NoValue	143.908 us	1.2655 us	0.5619 us	0.83	8.3008	2.6855	136.34 KB	0.62
JsonBig_RCParsing_Optimized	94.066 us	1.0540 us	0.4680 us	0.55	9.3994	2.0752	154.15 KB	0.71
JsonBig_RCParsing_Optimized_NoValue	60.354 us	0.2330 us	0.0831 us	0.35	4.3945	0.9155	72.25 KB	0.33
JsonBig_ANTLR	182.729 us	0.9600 us	0.4263 us	1.06	19.5313	7.5684	322.84 KB	1.48
JsonBig_ANTLR_NoValue	123.729 us	0.9649 us	0.4284 us	0.72	10.7422	3.9063	176.01 KB	0.81

JsonShort_RCParsing	9.260 us	0.0524 us	0.0233 us	1.00	0.6561	-	10.73 KB	1.00
JsonShort_RCParsing_NoValue	7.351 us	0.0396 us	0.0141 us	0.79	0.3891	-	6.44 KB	0.60
JsonShort_RCParsing_Optimized	5.222 us	0.0447 us	0.0159 us	0.56	0.5341	0.0076	8.77 KB	0.82
JsonShort_RCParsing_Optimized_NoValue	3.349 us	0.0514 us	0.0183 us	0.36	0.2708	0.0038	4.47 KB	0.42
JsonShort_ANTLR	10.521 us	0.0886 us	0.0393 us	1.14	1.1444	0.0305	18.91 KB	1.76
JsonShort_ANTLR_NoValue	7.029 us	0.0323 us	0.0143 us	0.76	0.6332	0.0229	10.35 KB	0.96

Notes:

RCParsing uses its default configuration, without any optimizations and settings applied.
RCParsing_Optimized uses UseInlining(), UseFirstCharacterMatch(), IgnoreErrors() and SkipWhitespacesOptimized() settings.
*_NoValue methods do not calculate a value from the AST.
JsonShort methods use ~20 lines of hardcoded (not generated) JSON with simple content.
JsonBig methods use ~180 lines of hardcoded (not generated) JSON with varied content (deep, long objects/arrays).
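
As a rough sketch, the RCParsing_Optimized configuration could be applied as shown below. The method names come from the note above. Calling them one by one on builder.Settings is an assumption based on the other examples in this README, and the exact benchmark setup may differ.

```csharp
using RCParsing;

var builder = new ParserBuilder();

// Settings named in the RCParsing_Optimized note.
// Assumption: each one is exposed on builder.Settings, like
// SkipWhitespaces() and UseFirstCharacterMatch() in the examples above.
builder.Settings.UseInlining();
builder.Settings.UseFirstCharacterMatch();
builder.Settings.IgnoreErrors();
builder.Settings.SkipWhitespacesOptimized();

// ... define the rules as in the JSON example, then call builder.Build().
```

Since IgnoreErrors() is part of this set, the optimized configuration presumably trades the detailed error output shown earlier for speed.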

JSON Combinators

JSON value calculation with the type set Dictionary<string, object>, object[], string, int and null. It uses the token combination style for immediate transformations without an AST.

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Allocated	Alloc Ratio
JsonBig_RCParsing	30,539.1 ns	966.16 ns	344.54 ns	1.00	0.01	2.5635	0.1831	43096 B	1.00
JsonBig_RCParsing_NoValue	18,568.7 ns	90.52 ns	32.28 ns	0.61	0.01	0.4883	-	8312 B	0.19
JsonBig_Parlot	43,390.4 ns	2,386.59 ns	1,059.66 ns	1.42	0.04	1.9531	0.1221	32848 B	0.76
JsonBig_Pidgin	222,498.3 ns	43,307.75 ns	19,228.91 ns	7.29	0.59	3.9063	0.2441	66816 B	1.55
JsonBig_Superpower	1,296,556.1 ns	149,894.47 ns	66,554.07 ns	42.46	2.09	39.0625	5.8594	653627 B	15.17
JsonBig_Sprache	1,188,669.8 ns	23,721.34 ns	10,532.42 ns	38.93	0.52	232.4219	27.3438	3899736 B	90.49

JsonShort_RCParsing	1,591.3 ns	12.23 ns	4.36 ns	1.00	0.00	0.1354	-	2280 B	1.00
JsonShort_RCParsing_NoValue	990.8 ns	5.21 ns	1.86 ns	0.62	0.00	0.0324	-	568 B	0.25
JsonShort_Parlot	2,339.5 ns	8.47 ns	3.76 ns	1.47	0.00	0.1144	-	1960 B	0.86
JsonShort_Pidgin	10,735.2 ns	38.67 ns	13.79 ns	6.75	0.02	0.2136	-	3664 B	1.61
JsonShort_Superpower	65,377.8 ns	610.65 ns	217.76 ns	41.08	0.16	1.9531	-	34117 B	14.96
JsonShort_Sprache	63,140.1 ns	597.33 ns	213.01 ns	39.68	0.16	12.6953	0.2441	213168 B	93.49

Notes:

RCParsing uses complex manual tokens with immediate transformations instead of rules, and the UseFirstCharacterMatch() setting.
The RCParsing_NoValue method does not calculate a value, it only validates the input.
Parlot uses the Compiled() version of its parser.
JsonShort methods use ~20 lines of hardcoded (not generated) JSON with simple content.
JsonBig methods use ~180 lines of hardcoded (not generated) JSON with varied content (deep, long objects/arrays).

Expressions

Calculation of an int value from an expression with parentheses (), spaces and the operators +-/* with precedence.

Method	Mean	Error	StdDev	Ratio	Gen0	Gen1	Allocated	Alloc Ratio
ExpressionBig_RCParsing	291,542.2 ns	5,064.93 ns	1,315.35 ns	1.00	23.9258	11.7188	403312 B	1.00
ExpressionBig_RCParsing_Optimized	169,101.5 ns	3,900.30 ns	603.58 ns	0.58	20.0195	9.0332	337688 B	0.84
ExpressionBig_RCParsing_TokenCombination	57,988.6 ns	911.87 ns	236.81 ns	0.20	4.1504	0.0610	70288 B	0.17
ExpressionBig_Parlot	64,083.7 ns	278.18 ns	72.24 ns	0.22	3.2959	-	56608 B	0.14
ExpressionBig_Pidgin	678,366.6 ns	7,911.06 ns	2,054.48 ns	2.33	0.9766	-	23536 B	0.06

ExpressionShort_RCParsing	2,317.5 ns	49.43 ns	7.65 ns	1.00	0.2213	-	3736 B	1.00
ExpressionShort_RCParsing_Optimized	1,546.2 ns	32.98 ns	8.57 ns	0.67	0.2136	-	3584 B	0.96
ExpressionShort_RCParsing_TokenCombination	512.7 ns	5.18 ns	1.35 ns	0.22	0.0391	-	656 B	0.18
ExpressionShort_Parlot	580.2 ns	10.32 ns	2.68 ns	0.25	0.0534	-	896 B	0.24
ExpressionShort_Pidgin	6,522.9 ns	107.10 ns	27.81 ns	2.81	0.0153	-	344 B	0.09

Notes:

RCParsing uses its default configuration, without any optimizations and settings applied.
RCParsing_Optimized uses UseInlining(), IgnoreErrors() and SkipWhitespacesOptimized() settings.
RCParsing_TokenCombination uses complex manual tokens with immediate transformations instead of rules, and the UseFirstCharacterMatch() setting.
Parlot uses the Compiled() version of its parser.
ExpressionShort methods use a single-line hardcoded (not generated) expression with 4 operators.
ExpressionBig methods use a single-line hardcoded (not generated) expression with ~400 operators.

Regex

Matching identifiers and emails in the plain text.

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Allocated	Alloc Ratio
EmailsBig_RCParsing	236,175.3 ns	26,801.07 ns	6,960.15 ns	1.00	0.04	0.9766	-	16568 B	1.00
EmailsBig_RCParsing_Optimized	157,271.9 ns	5,076.92 ns	1,318.46 ns	0.67	0.02	0.9766	-	16568 B	1.00
EmailsBig_Regex	27,638.6 ns	711.08 ns	184.66 ns	0.12	0.00	1.5564	0.1221	26200 B	1.58

EmailsShort_RCParsing	6,658.5 ns	78.57 ns	20.40 ns	1.00	0.00	0.0916	-	1600 B	1.00
EmailsShort_RCParsing_Optimized	3,799.0 ns	35.69 ns	5.52 ns	0.57	0.00	0.0954	-	1600 B	1.00
EmailsShort_Regex	931.5 ns	13.52 ns	3.51 ns	0.14	0.00	0.0601	-	1008 B	0.63

IdentifiersBig_RCParsing	158,034.1 ns	4,041.56 ns	625.44 ns	1.00	0.01	5.8594	-	101664 B	1.00
IdentifiersBig_RCParsing_Optimized	99,086.9 ns	1,619.80 ns	420.66 ns	0.63	0.00	5.9814	-	101664 B	1.00
IdentifiersBig_Regex	71,439.8 ns	4,727.93 ns	731.65 ns	0.45	0.00	11.1084	3.6621	187248 B	1.84

IdentifiersShort_RCParsing	4,041.5 ns	172.86 ns	44.89 ns	1.00	0.01	0.2518	-	4240 B	1.00
IdentifiersShort_RCParsing_Optimized	2,930.9 ns	56.37 ns	14.64 ns	0.73	0.01	0.2518	-	4240 B	1.00
IdentifiersShort_Regex	2,386.2 ns	160.57 ns	41.70 ns	0.59	0.01	0.3624	0.0076	6104 B	1.44

Notes:

RCParsing uses a naive pattern for matching, without any optimization settings applied.
RCParsing_Optimized uses the same pattern, but with a configured skip rule to make it faster.
Regex uses the RegexOptions.Compiled flag.
Identifiers pattern is [a-zA-Z_][a-zA-Z0-9_]*.
Emails pattern is [a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+.
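
For reference, the Regex baseline can be reproduced with the two patterns above and the standard System.Text.RegularExpressions API. This is only a sketch: the sample text is illustrative and is not the benchmark input.

```csharp
using System;
using System.Text.RegularExpressions;

// The two patterns quoted in the notes, compiled as in the Regex baseline.
var identifier = new Regex(@"[a-zA-Z_][a-zA-Z0-9_]*", RegexOptions.Compiled);
var email = new Regex(@"[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+", RegexOptions.Compiled);

// Illustrative input (not the benchmark data).
var text = "Contact alice@example.com or bob42@mail.org, id _tmp1";

// Prints:
// alice@example.com
// bob42@mail.org
foreach (Match m in email.Matches(text))
    Console.WriteLine(m.Value);

// The first identifier-shaped run in the text.
Console.WriteLine(identifier.Match(text).Value); // Contact
```

Note that the identifier pattern also matches the individual parts of each email address, because it has no notion of surrounding context.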

GraphQL

GraphQL parsing only, without transformations from the AST. GraphQL is a moderately complex language that can be described in about 600 lines of ANTLR's version of BNF notation.

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Gen2	Allocated	Alloc Ratio
QueryBig_RCParsing_Default	1,610.33 us	9.340 us	3.331 us	1.00	0.00	31.2500	13.6719	3.9063	603.94 KB	1.00
QueryBig_RCParsing_Optimized	369.89 us	1.312 us	0.468 us	0.23	0.00	20.9961	5.3711	-	345.59 KB	0.57
QueryBig_ANTLR	1,206.50 us	19.478 us	8.648 us	0.75	0.01	35.1563	11.7188	-	590.55 KB	0.98

QueryShort_RCParsing_Default	166.01 us	0.599 us	0.266 us	1.00	0.00	4.3945	0.4883	-	72.58 KB	1.00
QueryShort_RCParsing_Optimized	37.10 us	0.132 us	0.058 us	0.22	0.00	2.3193	0.1221	-	38.31 KB	0.53
QueryShort_ANTLR	68.37 us	0.321 us	0.142 us	0.41	0.00	5.9814	0.7324	-	99.2 KB	1.37

Notes:

RCParsing uses its default configuration, without any optimizations and settings applied.
RCParsing_Optimized uses UseInlining(), IgnoreErrors() and UseFirstCharacterMatch() settings.
RCParsing grammar was ported from this ANTLR Grammar.
QueryShort methods use ~40 lines of GraphQL query.
QueryBig methods use ~400 lines of GraphQL query with varied content (all syntax structures, long and deep queries).

Python

Yes, seriously, the entire Python 3.13 parsing, without transformations from AST. Involves barrier tokens for RCParsing and custom lexer for ANTLR.

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Gen2	Allocated	Alloc Ratio
PythonBig_RCParsing_Default	35,562.6 us	1,261.52 us	560.12 us	1.00	0.02	357.1429	285.7143	142.8571	37508.89 KB	1.00
PythonBig_RCParsing_Optimized	4,969.6 us	17.83 us	6.36 us	0.14	0.00	226.5625	140.6250	-	3974.76 KB	0.11
PythonBig_RCParsing_Memoized	22,798.5 us	238.79 us	85.15 us	0.64	0.01	281.2500	250.0000	125.0000	26926.87 KB	0.72
PythonBig_RCParsing_MemoizedOptimized	11,273.2 us	356.30 us	158.20 us	0.32	0.01	234.3750	218.7500	93.7500	11964.9 KB	0.32
PythonBig_ANTLR	5,583.7 us	36.85 us	13.14 us	0.16	0.00	406.2500	281.2500	-	6699.11 KB	0.18

PythonShort_RCParsing_Default	3,535.2 us	18.01 us	6.42 us	1.00	0.00	46.8750	19.5313	7.8125	2569.46 KB	1.00
PythonShort_RCParsing_Optimized	580.7 us	3.29 us	1.46 us	0.16	0.00	28.3203	6.8359	-	475.3 KB	0.18
PythonShort_RCParsing_Memoized	1,355.7 us	34.23 us	12.21 us	0.38	0.00	35.1563	25.3906	9.7656	1467.47 KB	0.57
PythonShort_RCParsing_MemoizedOptimized	634.0 us	27.53 us	9.82 us	0.18	0.00	19.5313	15.6250	3.9063	634.17 KB	0.25
PythonShort_ANTLR	556.4 us	1.50 us	0.67 us	0.16	0.00	46.8750	12.6953	-	780.65 KB	0.30

Notes:

RCParsing uses its default configuration, without any optimizations and settings applied.
RCParsing_Optimized uses UseInlining(), IgnoreErrors() and UseFirstCharacterMatch() settings.
RCParsing_Memoized uses UseCaching() setting.
RCParsing_MemoizedOptimized uses UseInlining(), IgnoreErrors(), UseFirstCharacterMatch() and UseCaching() settings.
RCParsing grammar was ported using this ANTLR Grammar and Python Reference Grammar.
PythonShort methods use ~20 lines of Python code, see source.
PythonBig methods use ~430 lines of Python code, see source.

More benchmarks will be added here later...

Projects using RCParsing

LLTSharp: Used for LLT, a Razor-like template language.

Using RCParsing in your project? We'd love to feature it here! Submit a pull request to add your project to the list.

Roadmap

The future development of RCParsing is focused on:

Performance: Continued profiling and optimization, especially for large files with deep structures.
API Ergonomics: Introducing even more expressive and fluent methods (such as expression builder).
New Built-in Rules: Adding common patterns (e.g., number with wide range of notations).
Visualization Tooling: Exploring tools for debugging and visualizing resulting AST.
Grammar Transformers: Builder extensions that can be used to optimize parsers, eliminate left recursion and more.
Semantic analysis: Multi-stage tools that simplify AST semantic analysis.
NFA Algorithm: Adaptive parsing algorithm, which is more powerful for parsing complex rules.

Contributing

Contributions are welcome!

This framework was created recently (4 months ago), so some minor features may be untested and buggy.

If you have an idea for this project, you can report it in Issues.
For contributing code, please fork the repository and make your changes in a new branch. Once you're ready, create a pull request to merge your changes into the main branch. Pull requests should include a clear description of what was changed and why.


RomeCore/RCParsing v5.1.0