Found 837 packages
Classes for running Apache Tika through **TikaOnDotNet**. Just use TextExtractor.Extract() and you'll be on your way.
Extract text from PDFs using streams.
C# PDF library to create, edit, draw, and print PDF files. Docotic.Pdf is a .NET PDF library for .NET Core, ASP.NET, Windows Forms, WPF, Xamarin, Blazor, Unity, and HoloLens applications. Use Docotic.Pdf to: * Create PDF documents using the Canvas API. * Generate PDF reports, invoices, etc. using the fluent API provided by the Layout add-on. * Extract text from PDF documents. * Convert HTML to PDF in C# with the help of the HTML to PDF converter add-on. * Convert PDF to images and print PDF documents. * Add digital signatures to PDF documents. * Encrypt and decrypt PDF files. * OCR PDF files. * Merge PDF documents in C# or VB.NET code. Split PDF documents. * Compress PDF files. * Linearize PDF files to optimize them for Fast Web View. * Edit PDF files. The SDK is a 100% managed assembly without unsafe blocks. The assembly has no external dependencies. Docotic.Pdf supports .NET 8, .NET 7, .NET 6, .NET 5, .NET Standard / .NET Core, and .NET 4.x frameworks. You can use the library in .NET on Windows, Linux, macOS, Android, and iOS. It works in Azure, AWS, and other cloud environments, and can be used from a Docker container. To test the library, visit https://bitmiracle.com/pdf-library/ and get a free time-limited license key. For documentation, sample code, and the API reference, visit https://bitmiracle.com/pdf-library/help. The library has the following add-ons: * HTML to PDF add-on https://www.nuget.org/packages/BitMiracle.Docotic.Pdf.HtmlToPdf/ * Layout add-on https://www.nuget.org/packages/BitMiracle.Docotic.Pdf.Layout/ * Gdi add-on https://www.nuget.org/packages/BitMiracle.Docotic.Pdf.Gdi/ * Logging add-on https://www.nuget.org/packages/BitMiracle.Docotic.Pdf.Logging. We offer royalty-free licenses for Docotic.Pdf. Eligible projects and/or people can receive a free license.
SDK for UI automation and text capture featured in UiPath Studio
A simple PDF text extractor based on PDFSharp. Supports both single-byte and two-byte fonts, ToUnicode maps, and encodings. It doesn't support precise symbol positioning on the page, so the extracted text order can differ from the original.
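Text order matters because a PDF content stream may draw text fragments in any sequence, so an extractor that ignores positions returns them in drawing order. The sketch below illustrates the general position-based fix in Python; it is not this package's API. It assumes each fragment already carries an (x, y) origin in PDF user space (where y grows upward), and the `Fragment` type and `line_tolerance` value are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float  # left edge of the fragment, in PDF user-space units
    y: float  # baseline; in PDF user space y grows upward


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline (top line first), then sort each line left to right."""
    lines = []  # list of (baseline_of_first_fragment, [fragments on that line])
    # Visit fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)  # close enough to the current baseline
        else:
            lines.append((frag.y, [frag]))  # start a new line
    return "\n".join(
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for _, line in lines
    )


# Fragments in the order a content stream might draw them.
drawn = [
    Fragment("world", 60, 700),
    Fragment("Second", 10, 680),
    Fragment("Hello", 10, 700.5),
    Fragment("line", 70, 680),
]
print(reading_order(drawn))
# prints:
# Hello world
# Second line
```

Grouping baselines within a tolerance, rather than requiring identical y values, absorbs small offsets such as the 0.5-unit difference above; multi-column layouts would need an additional column-detection step.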
Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX - extract data from PDF documents
HiQPdf Library for .NET (Classic) is a fast and flexible tool for creating high-quality PDF documents and converting HTML to PDF in .NET Framework, .NET Core and .NET Standard applications. The library uses the Classic rendering engine to convert HTML to PDF, images and SVG. You can also create, stamp, secure, merge and split PDF documents, extract text and images from PDF documents, search text in PDF, convert PDF pages to images or HTML. This package is compatible with .NET Framework, .NET Core and .NET Standard 2.0 on Windows platforms. For applications that need to run on both Windows and Linux platforms, the HiQPdf.Next.HtmlToPdf package provides a newer and highly accurate rendering engine designed for modern HTML, CSS and JavaScript content. The full HiQPdf.Next package allows you to create, edit and merge PDF documents, convert HTML to PDF or images, convert Word, Excel, RTF and Markdown to PDF, convert PDF to text or images. The compatibility list includes the following .NET versions, platforms and application types: * .NET Framework 2.0, 3.5, 4.0 and above * .NET 10, 9, 8, 7, 6 * .NET Standard 2.0 * Windows platforms * Azure Cloud Services and Azure Virtual Machines * Web, Console and Desktop applications Library Features: * HTML to PDF to quickly create PDF documents from HTML * HTML to Image and HTML to SVG converters * PDF to Image to rasterize PDF document pages to images * PDF to HTML to create HTML documents from PDF pages * PDF to Text to extract text from PDF documents * Search text in PDF documents * Extract images from PDF documents * Create PDF documents with text, HTML, SVG, images and graphics * Create encrypted, password-protected, digitally signed PDF documents * Create PDF documents with forms, text notes, links and JavaScript actions * Merge multiple PDF documents into a single one * Stamp PDF with HTML, text and images Documentation and code samples: https://www.hiqpdf.com/hiqpdf-dotnet
XImage.OCR is a C# optical character recognition (OCR) library for reading and extracting text from images, scanned PDFs, and multi-page TIFF files in .NET projects. XImage.OCR from RasterEdge is an advanced OCR library: * Recognizes and extracts characters from images captured by digital cameras, scanned PDF documents, and image-only PDFs * Supports multiple languages, including English, French, German, Portuguese, Spanish, Russian, Italian, Dutch, Arabic, Korean, etc. * Supports user-defined image and document OCR, such as full-page, auto, and manual zonal OCR recognition * Reads QR code and barcode data Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5, .NET Core 3.x & 2.x * .NET Framework 4.x * Windows, macOS, Linux, Docker, Azure Online documents: * C# How-to Guide: http://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Email: support@rasteredge.com
EVO PDF to Text Library for .NET (Classic) can be used in .NET Framework, .NET Core and .NET Standard applications to extract text from PDF documents and search text in PDF documents. This package is compatible with .NET Framework, .NET Core and .NET Standard 2.0 on Windows platforms. For applications that need to run on both Windows and Linux platforms, you can use the EvoPdf.Next.PdfProcessor package, which allows you to extract text and images from PDF documents, search text in PDF documents and convert PDF pages to images. The compatibility list includes the following .NET versions, platforms and application types: * .NET Framework 4.0 and above * .NET 10, 9, 8, 7, 6 * .NET Standard 2.0 * Windows platforms * Azure App Service * Azure Cloud Services and Azure Virtual Machines * Web, Console and Desktop applications Main Features: * Extract text from PDF documents * Search text in PDF documents * Save the extracted text using various text encodings * Case sensitive and whole word options for text search * Support for password-protected PDF documents * Extract the text or search only a range of PDF pages * Extract text preserving the original PDF layout * Extract text in PDF reading order or PDF internal order * Get the number of pages in a PDF document * Get the PDF document title, keywords, author and description * Does not require Adobe Reader or other third-party tools Documentation and code samples: https://www.evopdf.com/evopdf-pdf-to-text-dotnet
[PDF Reader. PDF Control. PDF Component] Apryse SDK is the ultimate PDF toolkit. This package is for Linux, macOS, and Windows .NET 64-bit applications. For other NuGet packages, see here: https://www.apryse.com/kb_nuget_packages With Apryse components you can build reliable and fast applications that can view, create, print, edit, and annotate PDFs across operating systems. Developers use Apryse SDK to read, write, and edit PDF documents compatible with all published versions of the PDF specification (including the latest ISO 32000). The extensive PDF library API supports most common use-case scenarios, such as: * PDF Viewing & Collaboration * PDF Rasterization * PDF Printing * PDF Form filling and flattening * PDF Split & Merge * PDF Stamping * Dynamic PDF generation (e.g. FlowDocument & XAML to PDF) * PDF Text extraction and indexing * PDF Packages * PDF Layers (OCGs) * PDF Editing * PDF Encryption * Manipulate PDF bookmarks, links, and annotations. * PDF Optimization * PDF conversion to XML, HTML, XPS, SVG, TIFF, etc. * PDF/A Validation and Conversion * PDF Redaction * PDF Conversion from XPS, MS Office, HTML, XAML, TXT, TIFF, etc. * HTML to PDF Conversion
ByteScout Text Recognition SDK for .NET, and ActiveX/COM - recognizes text from scanned documents using OCR (Optical Character Recognition).
Syncfusion® Essential PDF is a .NET Standard PDF library that enables your .NET Core applications to read, write, and manipulate existing PDF documents. It has various features such as table creation, PDF form creation and filling, digital signatures, and advanced PDF encryption. It also offers PDF manipulation features such as PDF compression, redaction, splitting and merging PDFs, watermarks, importing and exporting PDF form data, replacing or removing images, and extracting and finding text.
It is 100% managed code and does not require any special setup to run on any .NET Framework version starting from 2.0. All PDF standard versions are supported. Files can be normal, linearized, password-protected, signed, or incrementally updated. - We support many PDF content manipulation scenarios; below are a few worth mentioning: - Extract, modify, and add graphics (text, images, drawings) - Split or merge PDF documents - Extract PDF text to HTML, Tagged, or Raw format - Fill, sign, or create PDF forms - Add or remove document fields - Examine resources within a document: fonts, embedded files, XML (ZUGFeRD) - Digitally sign PDF documents and check existing signatures - Search for specific text - Protect documents with a password - Work with navigation objects, e.g. create bookmarks or links - Full support for annotations - Full support for PDF actions - All fonts defined by the specification are supported - Various color spaces and color profiles are supported, e.g. you may draw in RGB, CMYK, gray, or any other color space you like. - Files can be saved as other [subtypes] of PDF, for example Linearized or PDF/A. - If you require specific functionality and are unsure whether it is supported, please review our online help or contact support so we can help. - Fixed layout API, implemented to be 100% compatible with the PDF specification, unlocks the full power of PDF for you. Any complex PDF creation or manipulation task can be completed instantly. - Flow layout API, a styles-driven content generation API similar to HTML+CSS, lets you create documents, reports, bills, catalogues, and more in minutes. It is compact and easy to use, and supports creation of XML templates and much more.
XDoc.PDF enables C# developers to create, read, edit, and convert PDF documents in .NET projects, providing PDF conversion and viewing for .NET 8, .NET 7, .NET 6, .NET Core, .NET Standard, and .NET Framework. XDoc.PDF from RasterEdge is an advanced Adobe PDF library: * Generate PDFs from MS Word, Excel, PowerPoint, TIFF, JPG, PNG, and many other image formats * Convert PDF to Word, multi-page TIFF, SVG, JPG, PNG, and other image formats * Read and extract text, images, font data, and AcroForm data from PDFs * Edit and modify existing PDF text, images, bookmarks, and metadata * Annotate and mark up PDF content with highlights, comments, and drawings * Redact and remove sensitive information from PDF documents * Protect PDFs with passwords * Add and remove digital signatures in PDFs * Generate, add, and read QR codes and barcodes in PDFs * Convert scanned PDFs to editable PDFs using OCR Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5, .NET Core 3.x & 2.x * .NET Framework 4.x * Windows, macOS, Linux, Docker, Azure Online documents: * C# How-to Guide: https://www.rasteredge.com/how-to/csharp-imaging/pdf-overview/ * Email: support@rasteredge.com
Aspose.PDF for C++ is a native C++ library that enables developers to add PDF handling capabilities to their C++ applications. The Aspose.PDF for C++ API can be used to build C++ applications capable of reading, writing, rendering, printing, and converting PDF documents (PDF, PDF/A). You can also work with attachments, images, security, signatures, text, and tables. Aspose.PDF for C++ lets you extract text from all pages of a PDF document, set privileges on a PDF file, and work with bookmarks and annotations. It also gives you extensive control over PDF display properties, fonts, zoom factor, and content formatting. A text search feature is also available. You can convert PDF documents to the DOC, DOCX, and SVG formats by calling the designated C++ methods. Aspose.PDF for C++ performs equally well on the client side and the server side. It can be used in any development environment that supports C++, and explicitly supports MS Visual Studio 2015 or later. Aspose.PDF for C++ can be installed manually by downloading its ZIP package, or added (as Aspose.PDF.CPP) via the NuGet Package Manager. Support for the Qt framework is also available.
Linear-progressive text discovery engine exposing functionality through simple service APIs. Break plain text into a sequence of slices which can be reconstituted as annotated text. Generate meta-rich tokens from a search expression to then be used to annotate source text matches; noise-word detection, tokenization, and matching options are configurable. Use a common adapter interface with interchangeable DOM libraries (HtmlAgility, AngleSharp, etc.) to do the following: mark search hits in the DOM, create HTML excerpts at a given word count with configurable element-breaking rules, and extract text content with selectively preserved formatting indicators. High degree of extensibility leveraging dependency injection. While regex can be used in advanced configurations, it is not required. See project site for demos.
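The slice-and-annotate idea described above can be illustrated independently of the package. The following is a minimal Python sketch, not that library's API: it tokenizes a search expression, drops noise words from a hypothetical `NOISE_WORDS` set, splits the source text into slices around whole-word, case-insensitive matches, and reconstitutes the slices with each hit wrapped in a `<mark>` tag (the tag name is an assumption):

```python
import html
import re

# Hypothetical noise-word list; a real engine would make this configurable.
NOISE_WORDS = {"a", "an", "and", "of", "the", "to"}


def search_terms(expression):
    """Tokenize a search expression into lowercase words, skipping noise words."""
    tokens = re.findall(r"\w+", expression.lower())
    return [t for t in tokens if t not in NOISE_WORDS]


def annotate(text, expression, tag="mark"):
    """Slice text around whole-word, case-insensitive hits and rejoin it with hits wrapped in <tag>."""
    terms = search_terms(expression)
    if not terms:
        return html.escape(text)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    slices = []
    last = 0
    for m in pattern.finditer(text):
        slices.append(html.escape(text[last:m.start()]))  # plain slice before the hit
        slices.append(f"<{tag}>{html.escape(m.group(0))}</{tag}>")  # annotated hit
        last = m.end()
    slices.append(html.escape(text[last:]))  # trailing plain slice
    return "".join(slices)


print(annotate("The cat and the hat.", "the CAT"))
# prints: The <mark>cat</mark> and the hat.
```

Because "the" is treated as a noise word, only "cat" is annotated; escaping each plain slice keeps the reconstituted output safe to embed in HTML.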
Winnovative PDF to Text Library for .NET (Classic) can be used in .NET Framework, .NET Core and .NET Standard applications to extract text from PDF documents and search text in PDF documents. This package is compatible with .NET Framework, .NET Core and .NET Standard 2.0 on Windows platforms. For applications that need to run on both Windows and Linux platforms, you can use the Winnovative.Pdf.Next.PdfProcessor package, which allows you to extract text and images from PDF documents, search text in PDF documents and convert PDF pages to images. The compatibility list includes the following .NET versions, platforms and application types: * .NET Framework 4.0 and above * .NET 10, 9, 8, 7, 6 * .NET Standard 2.0 * Windows platforms * Azure App Service * Azure Cloud Services and Azure Virtual Machines * Web, Console and Desktop applications Main Features: * Extract text from PDF documents * Search text in PDF documents * Save the extracted text using various text encodings * Case sensitive and whole word options for text search * Support for password-protected PDF documents * Extract the text or search only a range of PDF pages * Extract text preserving the original PDF layout * Extract text in PDF reading order or PDF internal order * Get the number of pages in a PDF document * Get the PDF document title, keywords, author and description * Does not require Adobe Reader or other third-party tools Documentation and code samples: https://www.winnovative-software.com/winnovative-pdf-to-text-dotnet
Classes for visual processing of PDF documents in WinForms image viewer. Search and select text on PDF page. Draw graphics on PDF page. Remove content from PDF page. Add redaction marks on PDF page. Extract images from PDF page. View and edit annotations on PDF page. View, fill and edit interactive fields on PDF page.
ByteScout Document Parser SDK for .NET, ASP.NET, ActiveX - parse data from PDF documents and images.