Found 49 packages
Classes for running Apache Tika through **TikaOnDotNet**. Just use TextExtractor.Extract() and you'll be on your way.
Library for text extraction. Supports doc, docx, xlsx, odt, pdf, rtf, html, rar, zip,
GroupDocs.Parser for .NET is a parsing class library that extracts data from documents in various formats. The data extraction API supports PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX and many more formats.
Simple PDF text extractor based on PDFSharp. Supports both single-byte and two-byte fonts, ToUnicode maps, and encodings. Does not support precise symbol positioning on the page, so text order can differ from the original.
Bare-bones IKVM Java-to-.NET port of Apache Tika. You'll want to install TikaOnDotNet.TextExtractor.
This is a renderer for Melville.PDF that extracts all of the text from a PDF page.
Extracts strings from .NET solutions and projects for GetText catalog template files (.pot).
A C# library that extracts text from various document file formats, such as pdf, docx, and ppt.
Winnovative PDF Images Extractor Library for .NET (Classic) can be used in .NET Framework, .NET Core and .NET Standard applications to extract images from PDF documents. This package is compatible with .NET Framework, .NET Core and .NET Standard 2.0 on Windows platforms. For applications that need to run on both Windows and Linux platforms, you can use the Winnovative.Pdf.Next.PdfProcessor package, which allows you to extract text and images from PDF documents, search text in PDF documents and convert PDF pages to images.
The compatibility list includes the following .NET versions, platforms and application types:
* .NET Framework 4.0 and above
* .NET 10, 9, 8, 7, 6
* .NET Standard 2.0
* Windows platforms
* Azure App Service
* Azure Cloud Services and Azure Virtual Machines
* Web, Console and Desktop applications
Main Features:
* Extract images from PDF documents
* Preserve transparency information from PDF documents
* Extract images in memory or to image files in a folder
* Save the extracted images in various image formats
* Support for password-protected PDF documents
* Extract images only from a range of PDF pages
* Get the number of pages in a PDF document
* Get the PDF document title, keywords, author and description
* Does not require Adobe Reader or other third-party tools
Documentation and code samples: https://www.winnovative-software.com/winnovative-pdf-images-extractor-dotnet
Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX - extract data from PDF documents
Simple C# library for extracting text and metadata from .docx, .pptx, and .xlsx files
EVO PDF Images Extractor Library for .NET (Classic) can be used in .NET Framework, .NET Core and .NET Standard applications to extract images from PDF documents. This package is compatible with .NET Framework, .NET Core and .NET Standard 2.0 on Windows platforms. For applications that need to run on both Windows and Linux platforms, you can use the EvoPdf.Next.PdfProcessor package, which allows you to extract text and images from PDF documents, search text in PDF documents and convert PDF pages to images.
The compatibility list includes the following .NET versions, platforms and application types:
* .NET Framework 4.0 and above
* .NET 10, 9, 8, 7, 6
* .NET Standard 2.0
* Windows platforms
* Azure App Service
* Azure Cloud Services and Azure Virtual Machines
* Web, Console and Desktop applications
Main Features:
* Extract images from PDF documents
* Preserve transparency information from PDF documents
* Extract images in memory or to image files in a folder
* Save the extracted images in various image formats
* Support for password-protected PDF documents
* Extract images only from a range of PDF pages
* Get the number of pages in a PDF document
* Get the PDF document title, keywords, author and description
* Does not require Adobe Reader or other third-party tools
Documentation and code samples: https://www.evopdf.com/evopdf-pdf-images-extractor-dotnet
A simple C# shell wrapper for the Python library pdfplumber, used to extract text from PDF files.
Legacy iTextSharp PDF text extractor for the sensenet platform.