Found 538 packages
Domain Specific Language for extracting tabular data from an input string using the M language. Part of the Microsoft PROgram Synthesis using Examples SDK (PROSE).
Domain Specific Language for extracting tabular data from JSON documents. Part of the Microsoft PROgram Synthesis using Examples SDK (PROSE).
Domain Specific Language for extracting data from web pages. Part of the Microsoft PROgram Synthesis using Examples SDK (PROSE).
The Open XML SDK provides tools for working with Office Word, Excel, and PowerPoint documents. It supports scenarios such as: high-performance generation of word-processing documents, spreadsheets, and presentations; populating content in Word files from an XML data source; splitting up (shredding) a Word or PowerPoint file into multiple files, and combining multiple Word/PowerPoint files into a single file; extraction of data from Excel documents; searching and replacing content in Word/PowerPoint using regular expressions; updating cached data and embedded spreadsheets for charts in Word/PowerPoint; and document modification, such as removing tracked revisions or removing unacceptable content from documents.
Domain Specific Language for extracting substrings from text data. Part of the Microsoft PROgram Synthesis using Examples SDK (PROSE).
Domain Specific Language for extracting data from text files. Part of the Microsoft PROgram Synthesis using Examples SDK (PROSE).
Extract tables from PDF files (port of tabula-java using PdfPig).
A .NET library that extracts album art from metadata such as FLAC, ID3, etc.
Aspose.Words for .NET is a powerful, high-performance document processing library for creating, editing, converting, and rendering Word and PDF files in C#. It supports DOCX, DOC, RTF, ODT, HTML, PDF, Markdown, and over 30 formats. Designed for .NET developers, it enables advanced document automation, mail merge, text extraction, and report generation. Aspose.Words ensures high fidelity in document conversion, seamless API integration, and cross-platform compatibility. Ideal for cloud, web, and desktop applications.
Processes, transforms, filters, and handles audio signals for machine learning and statistical applications. This package is part of the Accord.NET Framework.
Source Graph extraction from C# programs
Wraps 7z.dll or any compatible library and makes use of the LZMA SDK; includes self-extraction functionality.
Content extraction via text density
Turn unstructured HTML pages into structured data. The OpenScraping library can extract information from HTML pages using a JSON config file with XPath rules. It can scrape even complex multi-level objects such as tables and forum posts.
Boilerpipe text extraction library ported to .NET Core, based on rasmusjp's .NET 4.5 implementation, which is available at https://github.com/rasmusjp/boilerpipe.net
PdfExtractionStrategies Class Library
Extracts and converts IFC file content into JSON using the XBIM COBie format.