110 packages tagged with “extraction”
Extract tables from PDF files (port of tabula-java using PdfPig).
Process, transform, filter, and handle audio signals for machine learning and statistical applications. This package is part of the Accord.NET Framework.
Turn unstructured HTML pages into structured data. The OpenScraping library can extract information from HTML pages using a JSON config file with XPath rules. It can even scrape complex multi-level objects such as tables and forum posts.
Boilerpipe text extraction library ported to .NET Core, based on rasmusjp's .NET 4.5 implementation, which you can find at https://github.com/rasmusjp/boilerpipe.net
Face recognition and analytics library based on deep neural networks and ONNX runtime.
A utility library for 7-Zip compression-related operations
Extract tables from PDF files (port of tabula-java using PdfPig). JSON writer.
PDF Tagging and Accessibility; Logical Content Extraction; PDF to HTML Conversion; PDF Editing. PDF Accessibility and Remediation: Make PDF Accessible and PDF/UA or WCAG compliant; Add and Edit Tags in PDF; Fix Accessibility Issues. PDF to JSON Data Extraction: Text, Images, Tables, Forms; Document Metadata, Tag Structure, Content. PDF to HTML Conversion: Fixed Original Layout; Responsive Layout with Content Reflow; Layout from PDF Tags (Deriving HTML from PDF); PDF Form to HTML Form. Standard PDF Features: Redaction; Page Rendering; Comments and Form Filling (AcroForm); Digital Signatures and E-Sign; Low-level PDF Object Access. Change log: https://github.com/pdfix/pdfix_sdk_builds/blob/main/changelog.md Code examples: https://github.com/pdfix/pdfix_sdk_example_dotnet
Face recognition and analytics library based on deep neural networks and ONNX runtime. GPU implementation.
A C# library that extracts text from various document file formats, such as pdf, docx, and ppt.
Microsoft SQL Server model extraction utility.
Finds localizable messages in *.fs and *.cs files by looking for calls such as I18n.Translate("message") in those sources. Writes the unique messages to a specified JSON file (updating it if necessary). The class name, method name, and other settings are configurable.
.NET (C#) binding for the Babel Street Analytics API
Extract tables from PDF files (port of tabula-java using PdfPig). CSV and TSV writers.
A .NET library that extracts album art from metadata formats such as FLAC and ID3.
A utility library dealing with Tar and XZ (tar.xz) extraction/archiving and (de)compression
A dotnet tool that extracts a GraphQL schema in SDL format without starting a web application
Extract strings from files
MSBuild.Xrm.SourceControl provides a simple but powerful method for extracting Dynamics 365 customisations. The extension uses PowerShell scripts that can seamlessly extract customisations from a Dynamics 365 instance and then subsequently rebuild them into a zipped Solution file ready for import when necessary. The scripts use the SolutionPackager.exe tool provided by the Dynamics 365 SDK. It supports file mappings, managed and unmanaged solutions, and the export of AutoNumber and Calendar settings. Please find the documentation through the Project Site link.
Extract data from Palworld .pak files
Library for text extraction. Supports doc, docx, xlsx, odt, pdf, rtf, html, rar, zip,
Content extraction via text density
A collection of methods for injecting icons into, or extracting icons from, files on Windows operating systems.
An HTML parser for extracting text from web pages using CSS selectors.
Simple but functional string functions that I think might be useful.