#text-processing

26 packages tagged with “text-processing”

AvsAn

Find the English indefinite article ('a' or 'an') for a word. Based on real usage patterns extracted from the Wikipedia text dump, so it can handle tricky edge cases such as acronyms (FIAT vs. FAA, NASA vs. NSA) and odd symbols. (Requires .NET Core 1.0 or .NET 4.5)

v3.2.0 ↙ 822.1K / total

english, NLP, text-processing, library, indeterminate-article

Kephas.TextProcessing✓

Provides the infrastructure for text processing. Typically used areas and classes/interfaces/services:
- ITokenizer.
- OpticalCharacterRecognition: IOcrProcessor.

Kephas Framework ("stone" in Aramaic) aims to deliver a solid infrastructure for applications and application ecosystems.

v11.1.0 ↙ 66.5K / total

kephas, text, text-processing

FluTe

* Define string.format-like templates using *named* tokens.
* A fluent API that makes template configuration clear and intuitive.
* Access properties and parameterless methods of input values using a simple syntax.
* Use delegates to process inputs. Tell your template to join a list, make a string uppercase, or calculate a difference, declaratively.
* No awkward string-based syntax that needs to be learned, and no tables you need to search through to find the type of formatting you need. All processing syntax is 100% language integrated.
* All objects are totally immutable. No need to worry about thread-safety.
* High-performance parser written using the FParsec parser-combinator library that performs a single pass on the input. Performs exponentially better than regex-based solutions.

v0.5.0.1 ↙ 9.9K / total

text-templating, string.format, formatting, text-processing, fluent

AiGeekSquad.AIContext.MEAI

Microsoft Extensions AI Abstractions adapter for AiGeekSquad.AIContext semantic chunking library. Enables seamless integration between Microsoft's AI abstractions and AIContext's semantic text chunking capabilities by providing an adapter that converts between Microsoft's IEmbeddingGenerator interface and AIContext's embedding requirements.

v1.1.103 ↙ 12.9K / total

ai, microsoft-extensions-ai, embedding, adapter, semantic-chunking

MarkItDown

A work-in-progress .NET Core implementation of the original markitdown Python library.

v0.0.1 ↙ 1.2K / total

markdown, text-processing

FluxCurator.Core

Core library for FluxCurator - Zero dependencies. PII masking, content filtering, and rule-based chunking with Korean language support.

v0.7.2 ↙ 6.7K / total

rag, chunking, pii, masking, nlp

FluxCurator

Text preprocessing library for RAG pipelines: PII masking, content filtering, and intelligent chunking including semantic chunking with Korean language support.

v0.7.2 ↙ 6.8K / total

rag, chunking, pii, masking, nlp

textrude

Textrude is a utility that allows you to quickly develop templates for code generation and text processing. It runs on Windows and Linux and uses Scriban as the template language. It comes as a CLI exe for use in build systems as well as a GUI prototyping tool that regenerates output in real time as templates are edited. Models are automatically imported from JSON, YAML, CSV or plain text.

v1.6.0 ↙ 3.9K / total

scriban, code-generation, template, text-processing, json

Yosina

A comprehensive transliteration library for Japanese text normalization

v1.0.0 ↙ 693 / total

japanese, transliteration, text-processing, normalization

CilTools.SourceCode

Provides APIs that assist in lexical analysis of source code in various programming languages

v2.9.0 ↙ 1.3K / total

dotnet, dotnet-standard, dotnet-framework, code-analysis, text-processing

SpellCheck.Simple

Offline spell-checking of words within texts, including Markdown tags. This library enables accurate detection of spelling errors while considering the presence of Markdown formatting. The library utilizes OpenOffice's *.dic and *.aff files to support spell-checking in over 80 languages from around the world. See also: the SpellCheck.Dictionaries NuGet package, which comes with built-in support for languages such as de-DE, en-GB, en-US, es-ES, fr-FR, it-IT, pl-PL, and pt-PT. Its preloaded dictionaries streamline the process of verifying text accuracy across multiple languages.

v1.2.0 ↙ 2.4K / total

spell-checking, text-processing, language-support, csharp, markdown

LostTech.Torch.MinGPT

C# + TorchSharp implementation of GPT

v0.3.0 ↙ 1.4K / total

PyTorch, TorchSharp, text-processing, deep-learning, ML

EBSTxtOkuma.TxtOkuma

Txt Okuma (Turkish for "Txt Reading") - Ebubekir Bastama

v2.0.0 ↙ 293 / total

txt-okuma, text-processing, csharp, library, open-source

SpellCheck.Dictionaries

Offline spell-checking of words within texts, including Markdown tags. This library enables accurate detection of spelling errors while considering the presence of Markdown formatting. The library utilizes OpenOffice's *.dic and *.aff files to support spell-checking in over 80 languages from around the world. The library comes with built-in support for languages such as de-DE, en-GB, en-US, es-ES, fr-FR, it-IT, pl-PL, and pt-PT for your convenience. It enhances the spell-checking capability by providing preloaded dictionaries, thus streamlining the process of checking text for accuracy across different languages.

v1.2.0 ↙ 2.4K / total

spell-checking, text-processing, language-support, csharp, markdown

MarkdownStructureChunker

A powerful .NET library for intelligent document structure analysis and chunking. Automatically identifies and parses various document patterns including Markdown headings, numeric outlines, legal sections, and appendices. Features hierarchical content organization, advanced keyword extraction with ML.NET, and ONNX vectorization support for semantic embeddings.

v1.0.7 ↙ 2.5K / total

markdown, document, parsing, chunking, structure

TextDiff.Sharp

A production-ready C# library for processing unified diff files and applying changes to text documents. Supports synchronous, asynchronous, streaming, and memory-optimized processing with comprehensive error handling and progress reporting.

v1.2.2 ↙ 2.4K / total

diff, patch, unified-diff, text-processing, version-control

MDLSoft.StringParsers

A powerful .NET library for parsing and writing structured text data using fixed-width and separator-based formats

v1.1.2 ↙ 707 / total

parser, csv, fixed-width, text-processing, data-parsing

GenericAgents.Tools.Samples

Sample tool implementations demonstrating file system operations, HTTP requests, and text processing for AI agent workflows.

v1.2.0 ↙ 592 / total

ai, agents, samples, tools, examples

RoeiBajayo.Infrastructure

A comprehensive .NET utility library providing common infrastructure components and extensions for modern C# applications. Features include HTTP clients, data storage, security utilities, dependency injection, text processing, and more.

v1.0.1 ↙ 499 / total

utilities, infrastructure, extensions, http, security

FluentTextUtils

A professional-grade .NET library for fluent string manipulation, including URL slugging, phonetic algorithms (Soundex), and smart casing.

v1.0.0 ↙ 90 / total

string, fluent, extensions, slug, soundex

RAGify.Chunking

Intelligent text chunking strategies for RAGify. Break down large documents into optimal-sized chunks for embedding and retrieval. Includes fixed-size chunking, sentence-aware chunking, and sliding window approaches to preserve context and improve retrieval accuracy.

v1.0.0 ↙ 111 / total

rag, retrieval-augmented-generation, embeddings, vector-search, nlp
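
The sliding-window approach mentioned in the RAGify.Chunking description can be illustrated with a minimal Python sketch. This is not RAGify's API: the function name `sliding_window_chunks` and its `size` and `overlap` parameters are hypothetical, and chunk boundaries are measured in characters purely for simplicity.

```python
def sliding_window_chunks(text, size=200, overlap=50):
    """Split text into chunks of at most `size` characters.

    Each chunk repeats the last `overlap` characters of the previous one,
    so context that straddles a boundary appears in both chunks.
    (Hypothetical helper for illustration, not part of any listed package.)
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in sliding_window_chunks("abcdefghij", size=4, overlap=2):
        print(chunk)
```

With `overlap=0` the same function degenerates to plain fixed-size chunking; a sentence-aware variant would move each boundary to the nearest sentence end instead of cutting at a fixed character offset.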


ZentrixLabs.OcrCorrection

A comprehensive .NET library for correcting OCR errors in English text with ~837 battle-tested patterns. Specifically designed for Tesseract PGS subtitle extraction, achieving 100% success rate on tested corpus. Handles capital I/lowercase l confusion, spacing errors, apostrophe issues, and number/letter confusion. Zero false positives, modular architecture, multi-pass processing.

v1.0.1 ↙ 359 / total

ocr, correction, tesseract, pgs, subtitle

CDTk

CDTk is a .NET library for building compilers, transpilers, and code generators. It provides a clean, unified API that makes language creation simple and intuitive.

v9.1.0 ↙ 1.2K / total

compiler, compiler-toolkit, compiler-framework, compiler-front-end, language-design

NetExt.Strings

NetExt.Strings is a powerful utility library that extends string manipulation capabilities in .NET. It provides a variety of robust, easy-to-use methods for trimming, validating, transforming, encoding, and replacing string values. This package simplifies common string operations, improves code readability, and enhances productivity for .NET developers.

v1.0.2 ↙ 588 / total

extensions, .net, C#, csharp, string-extensions

ForeverTools.Summarize

AI-powered text summarization for .NET using GPT-4, Claude, Llama and 400+ models. Summarize articles, documents, legal texts, research papers, meeting notes, and more. Supports multiple summary styles (bullet points, executive, abstract), custom lengths, key point extraction, and batch processing. Perfect for content curation, document processing, and automated reporting.

v1.0.0 ↙ 263 / total

summarize, summarization, text-summary, tldr, abstract

SkyWebFramework.Utilities

SkyWebFramework.Utilities is an all-in-one helper library that eliminates boilerplate code and provides battle-tested extension methods for common .NET operations. Includes string manipulation (truncate, slugify, sanitize), date utilities (business days, age calculation, time ago), collection helpers (batch processing, safe operations), security tools (hashing, encryption, GUID generation), math utilities (percentage, rounding, range checks), and object manipulation (deep clone, mapping, null checks). Perfect for web applications, APIs, microservices, and enterprise systems.

v1.0.1 ↙ 351 / total

utilities, helpers, extensions, extension-methods, utility-library