OpenLanguage is a C# library providing lexers, parsers, and other processing tools for Office Open XML DSLs.
Install via the .NET CLI:
dotnet add package OpenLanguage
Or via Package Manager Console:
Install-Package OpenLanguage

using OpenLanguage.SpreadsheetML.Formula;
// Parse an Excel formula
Ast.Node formula = FormulaParser.Parse("=SUM(A1:A10) * 2");
// Access the AST
Console.WriteLine($"Reconstructed: {formula.ToString()}");
// Try parsing with error handling
Ast.Node? maybeFormula = FormulaParser.TryParse("=INVALID_SYNTAX(");
if (maybeFormula == null)
{
    Console.WriteLine("Parse failed - invalid syntax");
}

using OpenLanguage.WordprocessingML.FieldInstruction;
using OpenLanguage.WordprocessingML.FieldInstruction.Ast;
using OpenLanguage.WordprocessingML.Ast;
// Parse a field instruction into a strongly-typed AST node
var ast = FieldInstructionParser.Parse("MERGEFIELD FirstName \\* Upper");
// Check the type and use specific properties
if (ast is MergeFieldFieldInstruction mergeField)
{
    Console.WriteLine($"Field Name: {mergeField.FieldName}");
    if (mergeField.GeneralFormat?.Argument is StringLiteralNode format)
    {
        Console.WriteLine($"General Format: {format.Value}");
    }
}
// Reconstruct field instruction
Console.WriteLine($"Field instruction: {ast.ToString()}");

The project uses a CMake-based build system with multiple targets:
# Configure build
cmake -B build
# Process .y/.lex files and generate code
cmake --build build --target process
# Build the solution
cmake --build build --target build
# Run tests
cmake --build build --target test
# Format code
cmake --build build --target format
# Generate documentation
cmake --build build --target doc
# Package for NuGet
cmake --build build --target pack
# Install git hooks
cmake --build build --target install-hooks
# Clean all build artifacts
cmake --build build --target clean-all

The dotnet CLI can also be used directly:
# Restore dependencies
dotnet restore
# Build solution
dotnet build --configuration Release
# Run tests
dotnet test --configuration Release
# Format code
dotnet csharpier format .
# Pack for NuGet
dotnet pack --configuration Release

OpenLanguage/
├── OpenLanguage/                 # Main library
│   ├── SpreadsheetML/
│   │   └── Formula/              # SpreadsheetML formula processing
│   │       ├── Lang/
│   │       │   ├── Lex/          # Lexical analysis (.lex files)
│   │       │   └── Parse/        # Grammar parsing (.y files)
│   │       └── FormulaParser.cs  # Main formula API and parser implementation
│   └── WordprocessingML/
│       ├── FieldInstruction/     # WordprocessingML field instructions
│       ├── MergeField/           # Mail merge functionality
│       └── Expression/           # Expression evaluation
├── OpenLanguage.Test/            # Unit tests
├── docs/                         # Documentation for docfx
├── docfx/                        # Docfx configuration
└── CMakeLists.txt                # Build system configuration
The project uses POSIX yacc/lex style grammar files for robust parsing:
SpreadsheetML/Formula/Lang/Parse/formula.y
SpreadsheetML/Formula/Lang/Lex/formula.lex
SpreadsheetML/Formula/Lang/Lex/function/*.lex
These files are processed during build to generate C# parser code.
For detailed documentation, please visit the project documentation site.
The source for the documentation is in the docs/ and docfx/ directories.
This project uses CSharpier for code formatting:
# Format entire solution
dotnet csharpier format .
# Check formatting
dotnet csharpier check .

Install git hooks to ensure code quality:
cmake --build build --target install-hooks
This installs a pre-commit hook.
The project uses xUnit for testing:
# Run all tests
dotnet test
# Run tests with coverage
dotnet test --collect:"XPlat Code Coverage"
# Run specific test project
dotnet test OpenLanguage.Test/

OpenLanguage is built with performance as a primary concern.
Evaluation
decimalSymbol
Misc
decimalSymbol used for parsing floating point numbers
CountryRegion enumerations.

Test coverage is quite comprehensive, as are the grammar and parser rule specifications; nothing should remain to be implemented for parsing, parsing dependencies, or the AST. However, optimization leaves a bit to be desired, and evaluation is unimplemented.
Optimization
FormulaParser and AST node classes, as
well as size of generated parser code, jumped by ~10x on adding the
builtin_function_call_head_raw rule. Investigate the cause and resolve.
Evaluation
Evaluate, on the abstract ExpressionNode class and overridden by derived
classes.
_xlpm.-prefixed function references, or
LAMBDA functions.
SpreadsheetContext to abstract common data
reading and writing operations
SpreadsheetContext, use generic underlying data representation which
is derived from a common Spreadsheet class, allowing any underlying
matrix-like data representation to be manipulated by formulas.
Example Usage
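Because evaluation is not implemented yet, the following is only a rough, hypothetical sketch of how the design outlined in the notes above might fit together: an abstract Evaluate method on an expression node base class, overridden by derived classes, with cell access routed through a context abstraction. Only the names ExpressionNode, Evaluate, and SpreadsheetContext come from the notes; their signatures here are guesses, and every other type (DictionarySpreadsheetContext, NumberNode, CellReferenceNode, AddNode) is invented for illustration and is not part of OpenLanguage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch only: these types do not mirror OpenLanguage's real classes.

// Abstracts reading and writing cell data, independent of storage.
public abstract class SpreadsheetContext
{
    public abstract double ReadNumber(string cellReference);
    public abstract void WriteNumber(string cellReference, double value);
}

// One possible backing store: a dictionary keyed by A1-style references.
public sealed class DictionarySpreadsheetContext : SpreadsheetContext
{
    private readonly Dictionary<string, double> _cells = new Dictionary<string, double>();

    // Assumption: a cell that was never written reads as 0.
    public override double ReadNumber(string cellReference) =>
        _cells.TryGetValue(cellReference, out double value) ? value : 0.0;

    public override void WriteNumber(string cellReference, double value) =>
        _cells[cellReference] = value;
}

// Base class declaring the abstract evaluation entry point.
public abstract class ExpressionNode
{
    public abstract double Evaluate(SpreadsheetContext context);
}

public sealed class NumberNode : ExpressionNode
{
    private readonly double _value;
    public NumberNode(double value) { _value = value; }
    public override double Evaluate(SpreadsheetContext context) => _value;
}

public sealed class CellReferenceNode : ExpressionNode
{
    private readonly string _reference;
    public CellReferenceNode(string reference) { _reference = reference; }
    public override double Evaluate(SpreadsheetContext context) =>
        context.ReadNumber(_reference);
}

public sealed class AddNode : ExpressionNode
{
    private readonly ExpressionNode _left;
    private readonly ExpressionNode _right;
    public AddNode(ExpressionNode left, ExpressionNode right)
    {
        _left = left;
        _right = right;
    }
    public override double Evaluate(SpreadsheetContext context) =>
        _left.Evaluate(context) + _right.Evaluate(context);
}

public static class Program
{
    public static void Main()
    {
        var context = new DictionarySpreadsheetContext();
        context.WriteNumber("A1", 40.0);

        // Hand-built tree standing in for a parsed "=A1 + 2".
        ExpressionNode expression =
            new AddNode(new CellReferenceNode("A1"), new NumberNode(2.0));
        Console.WriteLine(expression.Evaluate(context)); // prints 42
    }
}
```

Keeping the context abstract means the node classes never touch the storage format, which is the property the notes ask for: any matrix-like representation could sit behind the same read and write calls. A real design would also need non-numeric values, ranges, and errors, which this sketch leaves out.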
Numbering Format
See also
Create a feature branch (git checkout -b feature/amazing-feature)
Run the tests (cmake --build build --target test)
Format the code (cmake --build build --target format)
Commit your changes (git commit -m 'Add amazing feature')
Push the branch (git push origin feature/amazing-feature)

This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.