26 packages tagged with “text-processing”
Find the English indefinite article ('a' or 'an') for a word. Based on real usage patterns extracted from the Wikipedia text dump, so it can handle tricky edge cases such as acronyms (FIAT vs. FAA, NASA vs. NSA) and odd symbols. (Requires .NET Core 1.0 or .NET 4.5)
Provides the infrastructure for text processing. Typically used areas and classes/interfaces/services: ITokenizer; OpticalCharacterRecognition: IOcrProcessor. Kephas Framework ("stone" in Aramaic) aims to deliver a solid infrastructure for applications and application ecosystems.
* Define String.Format-like templates using *named* tokens. * A fluent API that makes template configuration clear and intuitive. * Access properties and parameterless methods of input values using a simple syntax. * Use delegates to process inputs. Tell your template to join a list, make a string uppercase, or calculate a difference -- declaratively. * No awkward string-based syntax that needs to be learned, and no tables you need to search through to find the type of formatting you need. All processing syntax is 100% language integrated. * All objects are totally immutable. No need to worry about thread-safety. * High-performance parser written using the FParsec parser-combinator library that performs a single pass on the input. Performs exponentially better than regex-based solutions.
Microsoft Extensions AI Abstractions adapter for AiGeekSquad.AIContext semantic chunking library. Enables seamless integration between Microsoft's AI abstractions and AIContext's semantic text chunking capabilities by providing an adapter that converts between Microsoft's IEmbeddingGenerator interface and AIContext's embedding requirements.
A work-in-progress .NET Core implementation of the original markitdown Python library.
Core library for FluxCurator - Zero dependencies. PII masking, content filtering, and rule-based chunking with Korean language support.
Text preprocessing library for RAG pipelines: PII masking, content filtering, and intelligent chunking including semantic chunking with Korean language support.
Textrude is a utility that allows you to quickly develop templates for code generation and text processing. It runs on Windows and Linux and uses Scriban as the template language. It comes as a CLI executable for use in build systems as well as a GUI prototyping tool that regenerates output in real time as templates are edited. Models are automatically imported from JSON, YAML, CSV or plain text.
A comprehensive transliteration library for Japanese text normalization
Provides APIs that assist in lexical analysis of source code in various programming languages
Offline spell-checking of words within texts, including Markdown tags. This library enables accurate detection of spelling errors while considering the presence of Markdown formatting. The library utilizes OpenOffice's *.dic and *.aff files to support spell-checking in over 80 languages from around the world. See also: the SpellCheck.Dictionaries NuGet package, which comes with built-in support for languages such as de-DE, en-GB, en-US, es-ES, fr-FR, it-IT, pl-PL, and pt-PT. Its preloaded dictionaries streamline the process of verifying text accuracy across multiple languages.
C# + TorchSharp implementation of GPT
TXT file reading (Txt Okuma) - Ebubekir Bastama
Offline spell-checking of words within texts, including Markdown tags. This library enables accurate detection of spelling errors while considering the presence of Markdown formatting. The library utilizes OpenOffice's *.dic and *.aff files to support spell-checking in over 80 languages from around the world. The library comes with built-in support for languages such as de-DE, en-GB, en-US, es-ES, fr-FR, it-IT, pl-PL, and pt-PT for your convenience. It enhances the spell-checking capability by providing preloaded dictionaries, thus streamlining the process of checking text for accuracy across different languages.
A powerful .NET library for intelligent document structure analysis and chunking. Automatically identifies and parses various document patterns including Markdown headings, numeric outlines, legal sections, and appendices. Features hierarchical content organization, advanced keyword extraction with ML.NET, and ONNX vectorization support for semantic embeddings.
A production-ready C# library for processing unified diff files and applying changes to text documents. Supports synchronous, asynchronous, streaming, and memory-optimized processing with comprehensive error handling and progress reporting.
A powerful .NET library for parsing and writing structured text data using fixed-width and separator-based formats
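As a rough illustration of the two formats named above, here is a minimal, language-neutral sketch in Python. It is not this package's API: the function names `parse_fixed_width`, `write_fixed_width`, and `parse_separated` are hypothetical, and the column widths are assumed to be known in advance.

```python
import csv
import io


def parse_fixed_width(line, widths):
    """Split one line into fields using a list of column widths (hypothetical helper)."""
    fields, pos = [], 0
    for width in widths:
        # Slice out the column and drop the padding around the value.
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields


def write_fixed_width(values, widths):
    """Render values as one fixed-width line, truncating or right-padding each one."""
    return "".join(str(value)[:width].ljust(width) for value, width in zip(values, widths))


def parse_separated(text, sep=","):
    """Parse separator-based text into rows; quoted fields may contain the separator."""
    return list(csv.reader(io.StringIO(text), delimiter=sep))
```

For example, `write_fixed_width(["Alice", 30, "NYC"], [10, 2, 3])` produces a 15-character line that `parse_fixed_width` splits back into `["Alice", "30", "NYC"]`.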
Sample tool implementations demonstrating file system operations, HTTP requests, and text processing for AI agent workflows.
A comprehensive .NET utility library providing common infrastructure components and extensions for modern C# applications. Features include HTTP clients, data storage, security utilities, dependency injection, text processing, and more.
A professional-grade .NET library for fluent string manipulation, including URL slugging, phonetic algorithms (Soundex), and smart casing.
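To make two of the techniques mentioned above concrete, the following Python sketch shows one common way to compute an American Soundex code and a URL slug. It does not reflect this package's API or its exact rules: the names `soundex` and `slugify` are hypothetical, and the slug rules (ASCII-only, hyphen-separated) are an assumption.

```python
import re
import unicodedata

# Map each consonant to its Soundex digit; vowels, h, w and y have no code.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex code for an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


def slugify(text):
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

With these definitions, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, which is the point of a phonetic code: similar-sounding names compare equal.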
Intelligent text chunking strategies for RAGify. Break down large documents into optimal-sized chunks for embedding and retrieval. Includes fixed-size chunking, sentence-aware chunking, and sliding window approaches to preserve context and improve retrieval accuracy.
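The three strategies named above can be sketched in a few lines. This is an illustrative Python sketch under simple assumptions (sizes measured in characters, sentences ending in `.`, `!` or `?`), not the package's implementation; all function names are hypothetical.

```python
import re


def fixed_size_chunks(text, size):
    """Cut text into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def sliding_window_chunks(text, size, overlap):
    """Cut text into windows of `size` characters that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def sentence_chunks(text, max_chars):
    """Pack whole sentences into chunks of at most `max_chars` where possible.

    A single sentence longer than `max_chars` is kept intact as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The overlap in the sliding window is what preserves context across chunk boundaries: text near the end of one chunk reappears at the start of the next, at the cost of embedding some characters twice.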
A comprehensive .NET library for correcting OCR errors in English text with ~837 battle-tested patterns. Specifically designed for Tesseract PGS subtitle extraction, achieving a 100% success rate on the tested corpus. Handles capital I/lowercase l confusion, spacing errors, apostrophe issues, and number/letter confusion. Zero false positives, modular architecture, multi-pass processing.
CDTk is a .NET library for building compilers, transpilers, and code generators. It provides a clean, unified API that makes language creation simple and intuitive.
NetExt.Strings is a powerful utility library that extends string manipulation capabilities in .NET. It provides a variety of robust, easy-to-use methods for trimming, validating, transforming, encoding, and replacing string values. This package simplifies common string operations, improves code readability, and enhances productivity for .NET developers.
AI-powered text summarization for .NET using GPT-4, Claude, Llama and 400+ models. Summarize articles, documents, legal texts, research papers, meeting notes, and more. Supports multiple summary styles (bullet points, executive, abstract), custom lengths, key point extraction, and batch processing. Perfect for content curation, document processing, and automated reporting.
SkyWebFramework.Utilities is an all-in-one helper library that eliminates boilerplate code and provides battle-tested extension methods for common .NET operations. Includes string manipulation (truncate, slugify, sanitize), date utilities (business days, age calculation, time ago), collection helpers (batch processing, safe operations), security tools (hashing, encryption, GUID generation), math utilities (percentage, rounding, range checks), and object manipulation (deep clone, mapping, null checks). Perfect for web applications, APIs, microservices, and enterprise systems.