A flexible, readable, fast and powerful .NET lexerless parser building framework.
RCParsing - the fluent, lightweight and powerful .NET lexerless parsing library for language development (DSL) and data scraping.
This library puts developer experience (DX) first, providing the best toolkit for creating your programming languages, file formats or even data extraction tools with a declarative API, debugging tools, and more. This allows you to design your parser directly in code and easily fix it using stack and walk traces with detailed error messages.
Here are some useful links:
.NET Standard 2.0 (runs on .NET Framework 4.6.1+), .NET 6.0, and .NET 8.0.
FindAllMatches feature.
You can install the package via the NuGet Package Manager or a console window, using one of these commands:
dotnet add package RCParsing
Install-Package RCParsing
Or do it manually by cloning this repository.
Here is a simple example of a parser that parses an "a + b" string with numbers and transforms the result:
using RCParsing;

// First, you need to create a builder
var builder = new ParserBuilder();

// Enable and configure the auto-skip for 'Whitespaces' (you can replace it with any other rule)
builder.Settings.SkipWhitespaces();

// Create a main sequential expression rule
builder.CreateMainRule("expression")
    .Number<double>()
    .LiteralChoice("+", "-")
    .Number<double>()
    .Transform(v => {
        var value1 = v.GetValue<double>(0);
        var op = v.GetValue<string>(1);
        var value2 = v.GetValue<double>(2);
        return op == "+" ? value1 + value2 : value1 - value2;
    });

// Build the parser
var parser = builder.Build();

// Parse a string using the 'expression' rule and get the raw AST (the value will be calculated lazily)
var parsedRule = parser.Parse("10 + 15");

// We can now get the value from our 'Transform' functions (the value is calculated now)
var transformedValue = parsedRule.GetValue<double>();
Console.WriteLine(transformedValue); // 25
And here is a JSON example that also shows partial re-parsing of the parse tree:
var builder = new ParserBuilder();

// Configure AST type and skip-rule for whitespace and comments
builder.Settings
    .Skip(r => r.Rule("skip"), ParserSkippingStrategy.SkipBeforeParsingGreedy)
    .UseLazyAST(); // Use lazy AST type to store cached results

// The rule that will be skipped before every parsing attempt
builder.CreateRule("skip")
    .Choice(
        b => b.Whitespaces(),
        b => b.Literal("//").TextUntil('\n', '\r'))
    .ConfigureForSkip();

builder.CreateToken("string")
    .Literal('"')
    .EscapedTextPrefix(prefix: '\\', '\\', '\"') // This sub-token automatically escapes the source string and puts it into an intermediate value
    .Literal('"')
    .Pass(index: 1); // Pass the EscapedTextPrefix's intermediate value up (it will be used as the token's result value)

builder.CreateToken("number")
    .Number<double>();

builder.CreateToken("boolean")
    .LiteralChoice("true", "false").Transform(v => v.Text == "true");

builder.CreateToken("null")
    .Literal("null").Transform(v => null);

builder.CreateRule("value")
    .Choice(
        c => c.Token("string"),
        c => c.Token("number"),
        c => c.Token("boolean"),
        c => c.Token("null"),
        c => c.Rule("array"),
        c => c.Rule("object")
    ); // Choice rule propagates the child's value by default

builder.CreateRule("array")
    .Literal("[")
    .ZeroOrMoreSeparated(v => v.Rule("value"), s => s.Literal(","),
        allowTrailingSeparator: true, includeSeparatorsInResult: false)
        .TransformLast(v => v.SelectArray())
    .Literal("]")
    .TransformSelect(index: 1); // Selects the Children[1]'s value

builder.CreateRule("object")
    .Literal("{")
    .ZeroOrMoreSeparated(v => v.Rule("pair"), s => s.Literal(","),
        allowTrailingSeparator: true, includeSeparatorsInResult: false)
        .TransformLast(v => v.SelectValues<KeyValuePair<string, object>>().ToDictionary(k => k.Key, v => v.Value))
    .Literal("}")
    .TransformSelect(index: 1);

builder.CreateRule("pair")
    .Token("string")
    .Literal(":")
    .Rule("value")
    .Transform(v => KeyValuePair.Create(v.GetValue<string>(0), v.GetValue(2)));

builder.CreateMainRule("content")
    .Rule("value")
    .EOF() // Ensure that we captured all the input
    .TransformSelect(0);

var jsonParser = builder.Build();
var json =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00", // This is a comment
	"tags": ["tag1", "tag2", "tag3"],
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";

// The same JSON, but with 'tags' value changed
var changedJson =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00", // This is a comment
	"tags": { "nested": ["tag1", "tag2", "tag3"] },
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";
// Parse the input text and calculate values (they will be recorded into the cache because we're using the lazy AST)
var ast = jsonParser.Parse(json);
var value = ast.Value as Dictionary<string, object>;
var tags = value!["tags"] as object[];
var nested = value!["nested"] as Dictionary<string, object>;

// Prints: Sample Data
Console.WriteLine(value["name"]);
// Prints: tag1
Console.WriteLine(tags![0]);

// Re-parse the slightly changed input string and get the values
var changedAst = ast.Reparsed(changedJson);
var changedValue = changedAst.Value as Dictionary<string, object>;
var changedTags = changedValue!["tags"] as Dictionary<string, object>;
var nestedTags = changedTags!["nested"] as object[];
var changedNested = changedValue!["nested"] as Dictionary<string, object>;

// Prints type: System.Object[]
Console.WriteLine(changedTags["nested"]);
// Prints: tag1
Console.WriteLine(nestedTags![0]);

// And untouched values remain the same!
// Prints: True
Console.WriteLine(ReferenceEquals(nested, changedNested));
This example uses our killer feature, barrier tokens, which allow indentation to be parsed without missing it:
using RCParsing;
using RCParsing.Building;

var builder = new ParserBuilder();

builder.Settings.SkipWhitespaces();

// Add the 'INDENT' and 'DEDENT' barrier tokenizer
// 'INDENT' is emitted when indentation grows
// and 'DEDENT' is emitted when indentation shrinks
// They are indentation delta tokens
builder.BarrierTokenizers
    .AddIndent(indentSize: 4, "INDENT", "DEDENT");

// Create the statement rule
builder.CreateRule("statement")
    .Choice(
        b => b
            .Literal("def")
            .Identifier()
            .Literal("():")
            .Rule("block"),
        b => b
            .Literal("if")
            .Identifier()
            .Literal(":")
            .Rule("block"),
        b => b
            .Identifier()
            .Literal("=")
            .Identifier()
            .Literal(";"));

// Create the 'block' rule that matches our 'INDENT' and 'DEDENT' barrier tokens
builder.CreateRule("block")
    .Token("INDENT")
    .OneOrMore(b => b.Rule("statement"))
    .Token("DEDENT");

builder.CreateMainRule("program")
    .ZeroOrMore(b => b.Rule("statement"))
    .EOF();

var parser = builder.Build();
string inputStr =
"""
def a():
    b = c;
    c = a;
a = p;
if c:
    h = i;
    if b:
        a = aa;
""";

// Get the optimized AST...
var ast = parser.Parse(inputStr).Optimized();

// And print it!
foreach (var statement in ast.Children)
{
    Console.WriteLine(statement.Text);
    Console.Write("\n\n");
}

// Outputs:
/*
def a():
    b = c;
    c = a;
a = p;
if c:
    h = i;
    if b:
        a = aa;
*/
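To make the idea of indentation delta tokens more concrete, here is a minimal standalone sketch that uses only the .NET base class library. It is not RCParsing's implementation: `IndentDeltas` is a hypothetical helper, `LINE` is a placeholder for any non-blank line, and the sketch assumes spaces-only indentation with a fixed indent size.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: computes INDENT/DEDENT delta tokens from leading spaces.
// 'IndentDeltas' is a hypothetical helper, not part of RCParsing.
static List<string> IndentDeltas(string text, int indentSize)
{
    var tokens = new List<string>();
    int level = 0; // current indentation depth

    foreach (var rawLine in text.Split('\n'))
    {
        var line = rawLine.TrimEnd('\r');
        if (line.Trim().Length == 0)
            continue; // blank lines do not change the indentation level

        int spaces = line.Length - line.TrimStart(' ').Length;
        int newLevel = spaces / indentSize;

        // Emit one token per level of difference
        for (; level < newLevel; level++) tokens.Add("INDENT");
        for (; level > newLevel; level--) tokens.Add("DEDENT");

        tokens.Add("LINE"); // placeholder for the line's own content
    }

    // Close any blocks that are still open at the end of the input
    for (; level > 0; level--) tokens.Add("DEDENT");
    return tokens;
}

var source = "if c:\n    h = i;\n    if b:\n        a = aa;\nx = y;\n";
Console.WriteLine(string.Join(" ", IndentDeltas(source, indentSize: 4)));
// Prints: LINE INDENT LINE LINE INDENT LINE DEDENT DEDENT LINE
```

A grammar rule such as 'block' can then treat INDENT and DEDENT like opening and closing brackets, which is why the whitespace itself can be skipped safely.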
Tokens in this parser can be complex enough to act like parser combinators, with immediate value transformation and no AST:
var builder = new ParserBuilder();

// Use lookahead for 'Choice' tokens
builder.Settings.UseFirstCharacterMatch();

builder.CreateToken("string")
    // 'Between' token pattern matches a sequence of three elements,
    // but calculates and propagates the intermediate value of the second element
    .Between(
        b => b.Literal('"'),
        b => b.TextUntil('"'),
        b => b.Literal('"'));

builder.CreateToken("number")
    .Number<double>();

builder.CreateToken("boolean")
    // 'Map' token pattern applies an intermediate value transformer to the child's value
    .Map<string>(b => b.LiteralChoice("true", "false"), m => m == "true");

builder.CreateToken("null")
    // 'Return' does not calculate a value for the child element, it just returns 'null' here
    .Return(b => b.Literal("null"), null);

builder.CreateToken("value")
    // Skip whitespaces before value token
    .SkipWhitespaces(b =>
        // 'Choice' token selects the matched token's value
        b.Choice(
            c => c.Token("string"),
            c => c.Token("number"),
            c => c.Token("boolean"),
            c => c.Token("null"),
            c => c.Token("array"),
            c => c.Token("object")
        ));

builder.CreateToken("value_list")
    .ZeroOrMoreSeparated(
        b => b.Token("value"),
        b => b.SkipWhitespaces(b => b.Literal(',')),
        includeSeparatorsInResult: false)
    // You can apply a passage function to tokens that
    // match a variable number of child elements
    .Pass(v =>
    {
        return v.ToArray();
    });

builder.CreateToken("array")
    .Between(
        b => b.Literal('['),
        b => b.Token("value_list"),
        b => b.SkipWhitespaces(b => b.Literal(']')));

builder.CreateToken("pair")
    .SkipWhitespaces(b => b.Token("string"))
    .SkipWhitespaces(b => b.Literal(':'))
    .Token("value")
    .Pass(v =>
    {
        return KeyValuePair.Create((string)v[0]!, v[2]);
    });

builder.CreateToken("pair_list")
    .ZeroOrMoreSeparated(
        b => b.Token("pair"),
        b => b.SkipWhitespaces(b => b.Literal(',')))
    .Pass(v =>
    {
        return v.Cast<KeyValuePair<string, object>>().ToDictionary();
    });

builder.CreateToken("object")
    .Between(
        b => b.Literal('{'),
        b => b.Token("pair_list"),
        b => b.SkipWhitespaces(b => b.Literal('}')));

var parser = builder.Build();
var json =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00",
	"tags": ["tag1", "tag2", "tag3"],
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";

// Match the token directly and produce intermediate value
var result = parser.MatchToken<Dictionary<string, object>>("value", json);
Console.WriteLine(result["name"]); // Outputs: Sample Data

var invalidJson =
"""
{
	"id": 1,
	"name": "Sample Data",
	"created": "2023-01-01T00:00:00",
	"tags": ["tag1", "tag2", "tag3"],,
	"isActive": true,
	"nested": {
		"value": 123.456,
		"description": "Nested description"
	}
}
""";

// Retrieve the furthest error
var error = parser.TryMatchToken("value", invalidJson).Context.CreateErrorGroups().Last!;
Console.WriteLine(error.Column); // 35
Console.WriteLine(error.Line); // 5

// Also you can check if the input matches token the fastest way, without value calculation:
Console.WriteLine(parser.MatchesToken("value", "[90, 60, true, null]", out int matchedLength)); // true
The FindAllMatches method allows you to extract all occurrences of a pattern from a string, even in complex inputs, while handling optional transformations. Here's an example that finds the Price: *PRICE* (USD|EUR) pattern:
var builder = new ParserBuilder();

// Skip unnecessary whitespace (you can configure comments here and they will be ignored when matching)
builder.Settings.SkipWhitespaces();

// Create the rule that we will find in text
builder.CreateMainRule()
    .Literal("Price:")
    .Number<double>() // 1
    .LiteralChoice("USD", "EUR") // 2
    .Transform(v =>
    {
        var number = v[1].Value; // Get the number value
        var currency = v[2].Text; // Get the 'USD' or 'EUR' text
        return new { Amount = number, Currency = currency };
    });

var input =
"""
Some log entries.
Price: 42.99 USD
Error: something happened.
Price: 99.50 EUR
Another line.
Price: 2.50 USD
""";

// Find all transformed matches
var prices = builder.Build().FindAllMatches<dynamic>(input).ToList();

foreach (var price in prices)
{
    Console.WriteLine($"Price: {price.Amount}; Currency: {price.Currency}");
}
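If you prefer static typing over dynamic, the same extraction can probably be written with a small record type. This is a sketch under assumptions: `Price` is a hypothetical type introduced here, and it assumes that `FindAllMatches<T>` casts each transformed value to `T` the same way `GetValue<T>` does. All builder calls are the ones used in the examples above.

```csharp
using System;
using System.Linq;
using RCParsing;

var builder = new ParserBuilder();
builder.Settings.SkipWhitespaces();

builder.CreateMainRule()
    .Literal("Price:")
    .Number<double>()            // child 1
    .LiteralChoice("USD", "EUR") // child 2
    // Build a typed value instead of an anonymous object
    .Transform(v => new Price(v.GetValue<double>(1), v[2].Text));

var input = "Price: 42.99 USD, then Price: 99.50 EUR";

// Assumption: the generic argument selects the type the transformed value is cast to
var prices = builder.Build().FindAllMatches<Price>(input).ToList();

foreach (var price in prices)
{
    Console.WriteLine($"{price.Amount} {price.Currency}");
}

// Hypothetical record used only by this sketch
public record Price(double Amount, string Currency);
```

With a typed result, a misspelled property name becomes a compile-time error instead of a runtime binder exception.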
This is how errors are displayed in the default mode:
RCParsing.ParsingException : An error occurred during parsing:
The line where the error occurred (position 130):
"tags": ["tag1", "tag2", "tag3"],,
line 5, column 35 ^
',' is unexpected character, expected one of:
'string'
literal '}'
... and more errors omitted
And these are the errors shown when using the builder.Settings.UseDebug() setting:
RCParsing.ParsingException : An error occurred during parsing:
['string']: Failed to parse token.
['pair']: Failed to parse sequence rule.
[literal '}']: Failed to parse token.
['object']: Failed to parse sequence rule.
The line where the error occurred (position 130):
"tags": ["tag1", "tag2", "tag3"],,
line 5, column 35 ^
',' is unexpected character, expected one of:
'string'
'pair'
literal '}'
'object'
['string'] Stack trace (top call recently):
- Sequence 'pair':
'string' <-- here
literal ':'
'value'
- SeparatedRepeat[0..] (allow trailing): 'pair' <-- here
sep literal ','
- Sequence 'object':
literal '{'
SeparatedRepeat[0..] (allow trailing)... <-- here
literal '}'
- Choice 'value':
'string'
'number'
'boolean'
'null'
'array'
'object' <-- here
- Sequence 'content':
'value' <-- here
end of file
[literal '}'] Stack trace (top call recently):
- Sequence 'object':
literal '{'
SeparatedRepeat[0..] (allow trailing)...
literal '}' <-- here
- Choice 'value':
'string'
'number'
'boolean'
'null'
'array'
'object' <-- here
- Sequence 'content':
'value' <-- here
end of file
... and more errors omitted
Walk Trace:
... 316 hidden parsing steps. Total: 356 ...
[ENTER] pos:128 literal '//'
[FAIL] pos:128 literal '//' failed to match: '],,\r\n\t"isActive...'
[FAIL] pos:128 Sequence... failed to match: '],,\r\n\t"isActive...'
[FAIL] pos:128 'skip' failed to match: '],,\r\n\t"isActive...'
[ENTER] pos:128 literal ','
[FAIL] pos:128 literal ',' failed to match: '],,\r\n\t"isActive...'
[SUCCESS] pos:106 SeparatedRepeat[0..] (allow trailing)... matched: '"tag1", "tag2", "tag3"' [22 chars]
[ENTER] pos:128 literal ']'
[SUCCESS] pos:128 literal ']' matched: ']' [1 chars]
[SUCCESS] pos:105 'array' matched: '["tag1", "tag2", "tag3"]' [24 chars]
[SUCCESS] pos:105 'value' matched: '["tag1", "tag2", "tag3"]' [24 chars]
[SUCCESS] pos:97 'pair' matched: '"tags": ["tag1" ..... ", "tag3"]' [32 chars]
[ENTER] pos:129 'skip'
[ENTER] pos:129 whitespaces
[FAIL] pos:129 whitespaces failed to match: ',,\r\n\t"isActive"...'
[ENTER] pos:129 Sequence...
[ENTER] pos:129 literal '//'
[FAIL] pos:129 literal '//' failed to match: ',,\r\n\t"isActive"...'
[FAIL] pos:129 Sequence... failed to match: ',,\r\n\t"isActive"...'
[FAIL] pos:129 'skip' failed to match: ',,\r\n\t"isActive"...'
[ENTER] pos:129 literal ','
[SUCCESS] pos:129 literal ',' matched: ',' [1 chars]
[ENTER] pos:130 'skip'
[ENTER] pos:130 whitespaces
[FAIL] pos:130 whitespaces failed to match: ',\r\n\t"isActive":...'
[ENTER] pos:130 Sequence...
[ENTER] pos:130 literal '//'
[FAIL] pos:130 literal '//' failed to match: ',\r\n\t"isActive":...'
[FAIL] pos:130 Sequence... failed to match: ',\r\n\t"isActive":...'
[FAIL] pos:130 'skip' failed to match: ',\r\n\t"isActive":...'
[ENTER] pos:130 'pair'
[ENTER] pos:130 'string'
[FAIL] pos:130 'string' failed to match: ',\r\n\t"isActive":...'
[FAIL] pos:130 'pair' failed to match: ',\r\n\t"isActive":...'
[SUCCESS] pos:4 SeparatedRepeat[0..] (allow trailing)... matched: '"id": 1,\r\n\t"nam ..... , "tag3"],' [126 chars]
[ENTER] pos:130 literal '}'
[FAIL] pos:130 literal '}' failed to match: ',\r\n\t"isActive":...'
[FAIL] pos:0 'object' failed to match: '{\r\n\t"id": 1,\r\n\t...'
[FAIL] pos:0 'value' failed to match: '{\r\n\t"id": 1,\r\n\t...'
[FAIL] pos:0 'content' failed to match: '{\r\n\t"id": 1,\r\n\t...'
... End of walk trace ...
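A report like the one above could be produced with a sketch along these lines. It reuses the builder calls from the first example and makes two assumptions that the text above only implies: that `Parse` throws `RCParsing.ParsingException` on failure, and that the exception's `Message` carries the formatted report.

```csharp
using System;
using RCParsing;

var builder = new ParserBuilder();
builder.Settings.SkipWhitespaces();
builder.Settings.UseDebug(); // assumed to enable the stack and walk traces

builder.CreateMainRule("expression")
    .Number<double>()
    .LiteralChoice("+", "-")
    .Number<double>();

var parser = builder.Build();

try
{
    // 'abc' is not a number, so the second operand fails to parse
    parser.Parse("10 + abc");
}
catch (ParsingException ex)
{
    // Assumption: Message contains the formatted error report
    Console.WriteLine(ex.Message);
}
```

Removing the UseDebug() line should give the shorter default report instead.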
RCParsing is designed to stand out with unique features and an easy developer experience, but its performance is good enough to compete with the fastest parser tools.
This table highlights the unique architectural and usability features of each library.
| Feature | RCParsing | Pidgin | Parlot | Superpower | ANTLR4 |
|---|---|---|---|---|---|
| Architecture | Scannerless hybrid | Scannerless | Scannerless | Lexer-based | Lexer-based with modes |
| API | Fluent, lambda-based | Functional | Fluent/functional | Fluent/functional | Grammar Files |
| Barrier/complex Tokens | Yes, built-in or manual | None | None | Yes, manual | Yes, manual |
| Skipping | 6 strategies, global or manual | Manual | Global or manual | Lexer-based | Lexer-based |
| Error Messages | Extremely Detailed, extendable with API | Simple | Manual messages | Simple | Simple by default, extendable |
| Minimum .NET Target | .NET Standard 2.0 | .NET 7.0 | .NET Standard 2.0 | .NET Standard 2.0 | .NET Framework 4.5 |
All benchmarks are done via BenchmarkDotNet.
Here is machine and runtime information:
BenchmarkDotNet v0.15.2, Windows 10 (10.0.19045.3448/22H2/2022Update)
AMD Ryzen 5 5600 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK 9.0.302
[Host] : .NET 8.0.18 (8.0.1825.31117), X64 RyuJIT AVX2
Job-KTXINV : .NET 8.0.18 (8.0.1825.31117), X64 RyuJIT AVX2
JSON value calculation with the type set Dictionary<string, object>, object[], string, int and null. It uses visitors to transform the value from the AST (Abstract Syntax Tree).
| Method | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| JsonBig_RCParsing | 172.574 us | 1.0484 us | 0.4655 us | 1.00 | 13.1836 | 4.1504 | 218.23 KB | 1.00 |
| JsonBig_RCParsing_NoValue | 143.908 us | 1.2655 us | 0.5619 us | 0.83 | 8.3008 | 2.6855 | 136.34 KB | 0.62 |
| JsonBig_RCParsing_Optimized | 94.066 us | 1.0540 us | 0.4680 us | 0.55 | 9.3994 | 2.0752 | 154.15 KB | 0.71 |
| JsonBig_RCParsing_Optimized_NoValue | 60.354 us | 0.2330 us | 0.0831 us | 0.35 | 4.3945 | 0.9155 | 72.25 KB | 0.33 |
| JsonBig_ANTLR | 182.729 us | 0.9600 us | 0.4263 us | 1.06 | 19.5313 | 7.5684 | 322.84 KB | 1.48 |
| JsonBig_ANTLR_NoValue | 123.729 us | 0.9649 us | 0.4284 us | 0.72 | 10.7422 | 3.9063 | 176.01 KB | 0.81 |
| JsonShort_RCParsing | 9.260 us | 0.0524 us | 0.0233 us | 1.00 | 0.6561 | - | 10.73 KB | 1.00 |
| JsonShort_RCParsing_NoValue | 7.351 us | 0.0396 us | 0.0141 us | 0.79 | 0.3891 | - | 6.44 KB | 0.60 |
| JsonShort_RCParsing_Optimized | 5.222 us | 0.0447 us | 0.0159 us | 0.56 | 0.5341 | 0.0076 | 8.77 KB | 0.82 |
| JsonShort_RCParsing_Optimized_NoValue | 3.349 us | 0.0514 us | 0.0183 us | 0.36 | 0.2708 | 0.0038 | 4.47 KB | 0.42 |
| JsonShort_ANTLR | 10.521 us | 0.0886 us | 0.0393 us | 1.14 | 1.1444 | 0.0305 | 18.91 KB | 1.76 |
| JsonShort_ANTLR_NoValue | 7.029 us | 0.0323 us | 0.0143 us | 0.76 | 0.6332 | 0.0229 | 10.35 KB | 0.96 |
Notes:
- RCParsing uses its default configuration, without any optimizations and settings applied.
- RCParsing_Optimized uses UseInlining(), UseFirstCharacterMatch(), IgnoreErrors() and SkipWhitespacesOptimized() settings.
- *_NoValue methods do not calculate a value from the AST.
- JsonShort methods use ~20 lines of hardcoded (not generated) JSON with simple content.
- JsonBig methods use ~180 lines of hardcoded (not generated) JSON with various content (deep, long objects/arrays).

JSON value calculation with the type set Dictionary<string, object>, object[], string, int and null. It uses the token combination style for immediate transformations without an AST.
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| JsonBig_RCParsing | 30,539.1 ns | 966.16 ns | 344.54 ns | 1.00 | 0.01 | 2.5635 | 0.1831 | 43096 B | 1.00 |
| JsonBig_RCParsing_NoValue | 18,568.7 ns | 90.52 ns | 32.28 ns | 0.61 | 0.01 | 0.4883 | - | 8312 B | 0.19 |
| JsonBig_Parlot | 43,390.4 ns | 2,386.59 ns | 1,059.66 ns | 1.42 | 0.04 | 1.9531 | 0.1221 | 32848 B | 0.76 |
| JsonBig_Pidgin | 222,498.3 ns | 43,307.75 ns | 19,228.91 ns | 7.29 | 0.59 | 3.9063 | 0.2441 | 66816 B | 1.55 |
| JsonBig_Superpower | 1,296,556.1 ns | 149,894.47 ns | 66,554.07 ns | 42.46 | 2.09 | 39.0625 | 5.8594 | 653627 B | 15.17 |
| JsonBig_Sprache | 1,188,669.8 ns | 23,721.34 ns | 10,532.42 ns | 38.93 | 0.52 | 232.4219 | 27.3438 | 3899736 B | 90.49 |
| JsonShort_RCParsing | 1,591.3 ns | 12.23 ns | 4.36 ns | 1.00 | 0.00 | 0.1354 | - | 2280 B | 1.00 |
| JsonShort_RCParsing_NoValue | 990.8 ns | 5.21 ns | 1.86 ns | 0.62 | 0.00 | 0.0324 | - | 568 B | 0.25 |
| JsonShort_Parlot | 2,339.5 ns | 8.47 ns | 3.76 ns | 1.47 | 0.00 | 0.1144 | - | 1960 B | 0.86 |
| JsonShort_Pidgin | 10,735.2 ns | 38.67 ns | 13.79 ns | 6.75 | 0.02 | 0.2136 | - | 3664 B | 1.61 |
| JsonShort_Superpower | 65,377.8 ns | 610.65 ns | 217.76 ns | 41.08 | 0.16 | 1.9531 | - | 34117 B | 14.96 |
| JsonShort_Sprache | 63,140.1 ns | 597.33 ns | 213.01 ns | 39.68 | 0.16 | 12.6953 | 0.2441 | 213168 B | 93.49 |
Notes:
- RCParsing uses complex manual tokens with immediate transformations instead of rules, and the UseFirstCharacterMatch() setting.
- RCParsing_NoValue method does not calculate a value, it only validates the input.
- Parlot uses the Compiled() version of the parser.
- JsonShort methods use ~20 lines of hardcoded (not generated) JSON with simple content.
- JsonBig methods use ~180 lines of hardcoded (not generated) JSON with various content (deep, long objects/arrays).

Calculation of an int value from an expression with parentheses (), spaces and the operators +-/* with precedence.
| Method | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| ExpressionBig_RCParsing | 291,542.2 ns | 5,064.93 ns | 1,315.35 ns | 1.00 | 23.9258 | 11.7188 | 403312 B | 1.00 |
| ExpressionBig_RCParsing_Optimized | 169,101.5 ns | 3,900.30 ns | 603.58 ns | 0.58 | 20.0195 | 9.0332 | 337688 B | 0.84 |
| ExpressionBig_RCParsing_TokenCombination | 57,988.6 ns | 911.87 ns | 236.81 ns | 0.20 | 4.1504 | 0.0610 | 70288 B | 0.17 |
| ExpressionBig_Parlot | 64,083.7 ns | 278.18 ns | 72.24 ns | 0.22 | 3.2959 | - | 56608 B | 0.14 |
| ExpressionBig_Pidgin | 678,366.6 ns | 7,911.06 ns | 2,054.48 ns | 2.33 | 0.9766 | - | 23536 B | 0.06 |
| ExpressionShort_RCParsing | 2,317.5 ns | 49.43 ns | 7.65 ns | 1.00 | 0.2213 | - | 3736 B | 1.00 |
| ExpressionShort_RCParsing_Optimized | 1,546.2 ns | 32.98 ns | 8.57 ns | 0.67 | 0.2136 | - | 3584 B | 0.96 |
| ExpressionShort_RCParsing_TokenCombination | 512.7 ns | 5.18 ns | 1.35 ns | 0.22 | 0.0391 | - | 656 B | 0.18 |
| ExpressionShort_Parlot | 580.2 ns | 10.32 ns | 2.68 ns | 0.25 | 0.0534 | - | 896 B | 0.24 |
| ExpressionShort_Pidgin | 6,522.9 ns | 107.10 ns | 27.81 ns | 2.81 | 0.0153 | - | 344 B | 0.09 |
Notes:
- RCParsing uses its default configuration, without any optimizations and settings applied.
- RCParsing_Optimized uses UseInlining(), IgnoreErrors() and SkipWhitespacesOptimized() settings.
- RCParsing_TokenCombination uses complex manual tokens with immediate transformations instead of rules, and the UseFirstCharacterMatch() setting.
- Parlot uses the Compiled() version of the parser.
- ExpressionShort methods use a single line of hardcoded (not generated) expression with 4 operators.
- ExpressionBig methods use a single line of hardcoded (not generated) expression with ~400 operators.

Matching identifiers and emails in plain text.
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| EmailsBig_RCParsing | 236,175.3 ns | 26,801.07 ns | 6,960.15 ns | 1.00 | 0.04 | 0.9766 | - | 16568 B | 1.00 |
| EmailsBig_RCParsing_Optimized | 157,271.9 ns | 5,076.92 ns | 1,318.46 ns | 0.67 | 0.02 | 0.9766 | - | 16568 B | 1.00 |
| EmailsBig_Regex | 27,638.6 ns | 711.08 ns | 184.66 ns | 0.12 | 0.00 | 1.5564 | 0.1221 | 26200 B | 1.58 |
| EmailsShort_RCParsing | 6,658.5 ns | 78.57 ns | 20.40 ns | 1.00 | 0.00 | 0.0916 | - | 1600 B | 1.00 |
| EmailsShort_RCParsing_Optimized | 3,799.0 ns | 35.69 ns | 5.52 ns | 0.57 | 0.00 | 0.0954 | - | 1600 B | 1.00 |
| EmailsShort_Regex | 931.5 ns | 13.52 ns | 3.51 ns | 0.14 | 0.00 | 0.0601 | - | 1008 B | 0.63 |
| IdentifiersBig_RCParsing | 158,034.1 ns | 4,041.56 ns | 625.44 ns | 1.00 | 0.01 | 5.8594 | - | 101664 B | 1.00 |
| IdentifiersBig_RCParsing_Optimized | 99,086.9 ns | 1,619.80 ns | 420.66 ns | 0.63 | 0.00 | 5.9814 | - | 101664 B | 1.00 |
| IdentifiersBig_Regex | 71,439.8 ns | 4,727.93 ns | 731.65 ns | 0.45 | 0.00 | 11.1084 | 3.6621 | 187248 B | 1.84 |
| IdentifiersShort_RCParsing | 4,041.5 ns | 172.86 ns | 44.89 ns | 1.00 | 0.01 | 0.2518 | - | 4240 B | 1.00 |
| IdentifiersShort_RCParsing_Optimized | 2,930.9 ns | 56.37 ns | 14.64 ns | 0.73 | 0.01 | 0.2518 | - | 4240 B | 1.00 |
| IdentifiersShort_Regex | 2,386.2 ns | 160.57 ns | 41.70 ns | 0.59 | 0.01 | 0.3624 | 0.0076 | 6104 B | 1.44 |
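For reference, the Regex baseline in this table can be sketched as follows, using the two patterns and the RegexOptions.Compiled flag named in the notes for this benchmark. The sample text is made up for illustration and is not the benchmark input.

```csharp
using System;
using System.Text.RegularExpressions;

// Patterns as given in the benchmark notes
var identifier = new Regex(@"[a-zA-Z_][a-zA-Z0-9_]*", RegexOptions.Compiled);
var email = new Regex(@"[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+", RegexOptions.Compiled);

// Made-up sample text
var text = "Contact alice@example.com or bob42@mail.org about item_7.";

foreach (Match m in email.Matches(text))
    Console.WriteLine(m.Value);
// Prints:
// alice@example.com
// bob42@mail.org

// Every word-like run counts as an identifier, including the parts of each email
Console.WriteLine(identifier.Matches(text).Count);
// Prints: 10
```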
Notes:
- RCParsing uses a naive pattern for matching, without any optimization settings applied.
- RCParsing_Optimized uses the same pattern, but with a configured skip-rule to make it faster.
- Regex uses the RegexOptions.Compiled flag.
- Identifiers pattern is [a-zA-Z_][a-zA-Z0-9_]*.
- Emails pattern is [a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+.

GraphQL parsing only, without transformations from the AST. GraphQL is a moderately complex language that can be described in 600 lines of ANTLR's version of BNF notation.
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| QueryBig_RCParsing_Default | 1,610.33 us | 9.340 us | 3.331 us | 1.00 | 0.00 | 31.2500 | 13.6719 | 3.9063 | 603.94 KB | 1.00 |
| QueryBig_RCParsing_Optimized | 369.89 us | 1.312 us | 0.468 us | 0.23 | 0.00 | 20.9961 | 5.3711 | - | 345.59 KB | 0.57 |
| QueryBig_ANTLR | 1,206.50 us | 19.478 us | 8.648 us | 0.75 | 0.01 | 35.1563 | 11.7188 | - | 590.55 KB | 0.98 |
| QueryShort_RCParsing_Default | 166.01 us | 0.599 us | 0.266 us | 1.00 | 0.00 | 4.3945 | 0.4883 | - | 72.58 KB | 1.00 |
| QueryShort_RCParsing_Optimized | 37.10 us | 0.132 us | 0.058 us | 0.22 | 0.00 | 2.3193 | 0.1221 | - | 38.31 KB | 0.53 |
| QueryShort_ANTLR | 68.37 us | 0.321 us | 0.142 us | 0.41 | 0.00 | 5.9814 | 0.7324 | - | 99.2 KB | 1.37 |
Notes:
- RCParsing uses its default configuration, without any optimizations and settings applied.
- RCParsing_Optimized uses UseInlining(), IgnoreErrors() and UseFirstCharacterMatch() settings.
- RCParsing grammar was ported from this ANTLR Grammar.
- QueryShort methods use ~40 lines of GraphQL query.
- QueryBig methods use ~400 lines of GraphQL query with various content (all syntax structures, long and deep queries).

Yes, seriously: parsing of the entire Python 3.13 grammar, without transformations from the AST. It involves barrier tokens for RCParsing and a custom lexer for ANTLR.
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| PythonBig_RCParsing_Default | 35,562.6 us | 1,261.52 us | 560.12 us | 1.00 | 0.02 | 357.1429 | 285.7143 | 142.8571 | 37508.89 KB | 1.00 |
| PythonBig_RCParsing_Optimized | 4,969.6 us | 17.83 us | 6.36 us | 0.14 | 0.00 | 226.5625 | 140.6250 | - | 3974.76 KB | 0.11 |
| PythonBig_RCParsing_Memoized | 22,798.5 us | 238.79 us | 85.15 us | 0.64 | 0.01 | 281.2500 | 250.0000 | 125.0000 | 26926.87 KB | 0.72 |
| PythonBig_RCParsing_MemoizedOptimized | 11,273.2 us | 356.30 us | 158.20 us | 0.32 | 0.01 | 234.3750 | 218.7500 | 93.7500 | 11964.9 KB | 0.32 |
| PythonBig_ANTLR | 5,583.7 us | 36.85 us | 13.14 us | 0.16 | 0.00 | 406.2500 | 281.2500 | - | 6699.11 KB | 0.18 |
| PythonShort_RCParsing_Default | 3,535.2 us | 18.01 us | 6.42 us | 1.00 | 0.00 | 46.8750 | 19.5313 | 7.8125 | 2569.46 KB | 1.00 |
| PythonShort_RCParsing_Optimized | 580.7 us | 3.29 us | 1.46 us | 0.16 | 0.00 | 28.3203 | 6.8359 | - | 475.3 KB | 0.18 |
| PythonShort_RCParsing_Memoized | 1,355.7 us | 34.23 us | 12.21 us | 0.38 | 0.00 | 35.1563 | 25.3906 | 9.7656 | 1467.47 KB | 0.57 |
| PythonShort_RCParsing_MemoizedOptimized | 634.0 us | 27.53 us | 9.82 us | 0.18 | 0.00 | 19.5313 | 15.6250 | 3.9063 | 634.17 KB | 0.25 |
| PythonShort_ANTLR | 556.4 us | 1.50 us | 0.67 us | 0.16 | 0.00 | 46.8750 | 12.6953 | - | 780.65 KB | 0.30 |
Notes:
- RCParsing uses its default configuration, without any optimizations and settings applied.
- RCParsing_Optimized uses UseInlining(), IgnoreErrors() and UseFirstCharacterMatch() settings.
- RCParsing_Memoized uses the UseCaching() setting.
- RCParsing_MemoizedOptimized uses UseInlining(), IgnoreErrors(), UseFirstCharacterMatch() and UseCaching() settings.
- RCParsing grammar was ported using this ANTLR Grammar and Python Reference Grammar.
- PythonShort methods use ~20 lines of Python code, see source.
- PythonBig methods use ~430 lines of Python code, see source.

More benchmarks will be added here later...

LLT, a Razor-like template language.

Using RCParsing in your project? We'd love to feature it here! Submit a pull request to add your project to the list.
The future development of RCParsing is focused on:
This framework was created recently (4 months ago), and some minor features may be untested or buggy.
If you have an idea for this project, you can submit it via Issues.
For contributing code, please fork the repository and make your changes in a new branch. Once you're ready, create a pull request to merge your changes into the main branch. Pull requests should include a clear description of what was changed and why.