479 packages tagged with “TESSERACT”
Tesseract 5 includes the neural net (LSTM) based OCR engine introduced in Tesseract 4, which focuses on line recognition, and still supports the legacy OCR engine of Tesseract 3, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). The legacy mode also needs traineddata files that support the legacy engine, for example those from the tessdata repository.
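As a rough illustration of the legacy mode described above, the sketch below builds a command line for the `tesseract` CLI that selects `--oem 0`. The function name, file names, and tessdata path are hypothetical, chosen only for this example. It assumes the `tesseract` executable is installed and that the tessdata directory contains traineddata files with legacy models.

```python
from typing import List, Optional


def build_legacy_ocr_command(
    image: str,
    output_base: str,
    lang: str = "eng",
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the tesseract CLI using the legacy engine.

    All names here are illustrative; only the CLI flags come from Tesseract.
    """
    # --oem 0 selects the Legacy OCR Engine mode; -l selects the language.
    cmd = ["tesseract", image, output_base, "--oem", "0", "-l", lang]
    if tessdata_dir is not None:
        # Point Tesseract at a directory whose traineddata files
        # include legacy models (hypothetical path supplied by the caller).
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(" ".join(build_legacy_ocr_command("scan.png", "result")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the OCR and write `result.txt` next to the given output base. If the traineddata file has no legacy model, Tesseract is expected to fail at initialization rather than fall back to the LSTM engine.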
Adds support for interop with System.Drawing to Tesseract such as passing Bitmap to Tesseract.
IronOCR is an advanced OCR (Optical Character Recognition) library for C# and .NET. It provides Tesseract OCR on Mac, Windows, Linux, Azure and Docker for: * .NET Framework 4.6.2 + * .NET Standard 2.0 + * .NET Core 2.0 + * .NET 5 * .NET 6 * .NET 7 * .NET 8 * .NET 9 * .NET 10 * Mono for MacOS and Linux * Xamarin for MacOS IronOCR reads Text, Barcodes & QR from all major image and PDF formats using the latest Tesseract 5 engine. This library adds OCR functionality to Desktop, Console and Web applications in minutes. IronOCR's Unique Features: * Pure .NET OCR API * All OCR tasks run locally (no SaaS) * 125 languages * Barcode & QR Code reading * Corrects low quality, noisy and distorted scans * Performance tuned above and beyond any other known build of Tesseract OCR. * Reads PDFs * Reads multi-page TIFFs * Can save any OCR Scan to a searchable PDF document or XHTML Data output options include: Plain Text, Barcode Data and an OCR Result class containing paragraphs, lines, words, and characters. Language Support: 125 Languages including Arabic, Chinese, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Portuguese, Russian, Spanish... Custom language packs can also be created. Licensing & Support available for commercial deployments. Email: support@ironsoftware.com For code examples, documentation & more visit http://ironsoftware.com/csharp/ocr/
XImage.OCR is a C# Optical Character Recognition library to read and extract text from images, scanned PDFs, and multi-page TIFF files in .NET projects. XImage.OCR from RasterEdge is an advanced OCR library: * Allows character recognition and extraction from images captured by digital camera, scanned PDF documents, and image-only PDFs * Supports multiple languages, including English, French, German, Portuguese, Spanish, Russian, Italian, Dutch, Arabic, Korean, etc. * Supports user-defined image and document OCR, such as full-page, automatic, and manual zonal OCR * Able to read QR Code and barcode data Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5, .NET Core 3.x & 2.x * .NET Framework 4.x * Windows, MacOS, Linux, Docker, Azure Online documents: * C# How to Guide: http://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Email: support@rasteredge.com
Tesseract 5.5.0 includes the neural net (LSTM) based OCR engine introduced in Tesseract 4, which focuses on line recognition, and still supports the legacy OCR engine of Tesseract 3, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). The legacy mode also needs traineddata files that support the legacy engine, for example those from the tessdata repository.
English language data files for Tesseract OCR v4.0
Trained data files for the Google Tesseract OCR engine, Polish language.
Recognize text from images and save the recognition results to a text file or a searchable PDF document. Export text recognition results in hOCR format.
Tesseract OCR for Xamarin and Xamarin.Forms.
The XImage.OCR for C# library adds OCR support for reading MiddleEnglish text from images and scanned PDFs in .NET web and Windows applications. This OCR language package includes: * MiddleEnglish language This NuGet package includes: * OCR (Optical Character Recognition) engine to convert images, multi-page TIFFs, and PDFs to editable text content in .NET apps * MiddleEnglish language support * Image document cleanup library to process images with digital noise, distortion, and skew Additional features: * QR Code and 20+ barcode reading * Convert scanned PDFs to text-editable PDF documents Input: * Raster images: bitmap, png, jpeg, gif * Documents: multi-page TIFF, PDF Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5 * .NET Core 3.x & 2.x * .NET Framework 4.6 Online guides: * C# Developer Guide: https://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Support: support@rasteredge.com
The XImage.OCR for C# library adds OCR support for reading Chinese text from images and scanned PDFs in .NET web and Windows applications. This OCR language package includes: * Chinese language This NuGet package includes: * OCR (Optical Character Recognition) engine to convert images, multi-page TIFFs, and PDFs to editable text content in .NET apps * Chinese language support * Image document cleanup library to process images with digital noise, distortion, and skew Additional features: * QR Code and 20+ barcode reading * Convert scanned PDFs to text-editable PDF documents Input: * Raster images: bitmap, png, jpeg, gif * Documents: multi-page TIFF, PDF Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5 * .NET Core 3.x & 2.x * .NET Framework 4.6 Online guides: * C# Developer Guide: https://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Support: support@rasteredge.com
The XImage.OCR for C# library adds OCR support for reading LatinAlphabet text from images and scanned PDFs in .NET web and Windows applications. This OCR language package includes: * LatinAlphabet language This NuGet package includes: * OCR (Optical Character Recognition) engine to convert images, multi-page TIFFs, and PDFs to editable text content in .NET apps * LatinAlphabet language support * Image document cleanup library to process images with digital noise, distortion, and skew Additional features: * QR Code and 20+ barcode reading * Convert scanned PDFs to text-editable PDF documents Input: * Raster images: bitmap, png, jpeg, gif * Documents: multi-page TIFF, PDF Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5 * .NET Core 3.x & 2.x * .NET Framework 4.6 Online guides: * C# Developer Guide: https://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Support: support@rasteredge.com
Tesseract.NET SDK is a class library based on the tesseract-ocr project for embedding OCR capability in your .NET project.
The Syncfusion® Essential PDF OCR is a .NET character recognition library that recognizes characters from both images and PDF in any ASP.NET MVC application. Syncfusion® OCRProcessor uses Tesseract, one of the most accurate OCR engines. Key features: • Converts scanned PDF to searchable PDF. • Converts various image formats such as TIFF, JPEG, PNG, BMP to searchable PDF. • Converts image or PDF to text with location. • Processes OCR for the specified region in both PDF and image. • Supports 60+ languages. • Recognizes text from rotated images and PDF documents. • Works both in 32-bit and 64-bit environments. Learn more: https://www.syncfusion.com/pdf-framework/net?utm_source=nuget&utm_medium=listing Documentation: https://help.syncfusion.com/file-formats/pdf/working-with-ocr?utm_source=nuget&utm_medium=listing Support: Incident: https://www.syncfusion.com/support/directtrac/incidents/newincident?utm_source=nuget&utm_medium=listing Forum: https://www.syncfusion.com/forums/aspnetmvc?utm_source=nuget&utm_medium=listing
The Syncfusion® Essential PDF OCR is a .NET character recognition library that recognizes characters from both images and PDF in any Windows Forms application. Syncfusion® OCRProcessor uses Tesseract, one of the most accurate OCR engines. Key features: • Converts scanned PDF to searchable PDF. • Converts various image formats such as TIFF, JPEG, PNG, BMP to searchable PDF. • Converts image or PDF to text with location. • Processes OCR for the specified region in both PDF and image. • Supports 60+ languages. • Recognizes text from rotated images and PDF documents. • Works both in 32-bit and 64-bit environments. Learn more: https://www.syncfusion.com/pdf-framework/net?utm_source=nuget&utm_medium=listing Documentation: https://help.syncfusion.com/file-formats/pdf/working-with-ocr?utm_source=nuget&utm_medium=listing Support: Incident: https://www.syncfusion.com/support/directtrac/incidents/newincident?utm_source=nuget&utm_medium=listing Forum: https://www.syncfusion.com/forums/aspnetmvc?utm_source=nuget&utm_medium=listing
The Syncfusion® Essential PDF OCR is a .NET character recognition library that recognizes characters from both images and PDF in any WPF application. Syncfusion® OCRProcessor uses Tesseract, one of the most accurate OCR engines. Key features: • Converts scanned PDF to searchable PDF. • Converts various image formats such as TIFF, JPEG, PNG, BMP to searchable PDF. • Converts image or PDF to text with location. • Processes OCR for the specified region in both PDF and image. • Supports 60+ languages. • Recognizes text from rotated images and PDF documents. • Works both in 32-bit and 64-bit environments. Learn more: https://www.syncfusion.com/pdf-framework/net?utm_source=nuget&utm_medium=listing Documentation: https://help.syncfusion.com/file-formats/pdf/working-with-ocr?utm_source=nuget&utm_medium=listing Support: Incident: https://www.syncfusion.com/support/directtrac/incidents/newincident?utm_source=nuget&utm_medium=listing Forum: https://www.syncfusion.com/forums/aspnetmvc?utm_source=nuget&utm_medium=listing
The Syncfusion Essential PDF OCR is a .NET character recognition library that recognizes characters from both images and PDF in any ASP.NET MVC application. Syncfusion OCRProcessor uses Tesseract, one of the most accurate OCR engines. Key features: • Converts scanned PDF to searchable PDF. • Converts various image formats such as TIFF, JPEG, PNG, BMP to searchable PDF. • Converts image or PDF to text with location. • Processes OCR for the specified region in both PDF and image. • Supports 60+ languages. • Recognizes text from rotated images and PDF documents. • Works both in 32-bit and 64-bit environments. Learn more: https://www.syncfusion.com/pdf-framework/net?utm_source=nuget&utm_medium=listing Documentation: https://help.syncfusion.com/file-formats/pdf/working-with-ocr?utm_source=nuget&utm_medium=listing Support: Incident: https://www.syncfusion.com/support/directtrac/incidents/newincident?utm_source=nuget&utm_medium=listing Forum: https://www.syncfusion.com/forums/aspnetmvc?utm_source=nuget&utm_medium=listing
Tesseract is probably the most accurate open source OCR engine available.
Use this library with the Atalasoft OCR library to add Google's Tesseract engine to the set of usable engines.
XDoc.PDF with OCR add-in is a C# Optical Character Recognition library to read and extract text from scanned PDFs and multi-page TIFF files in .NET projects. XDoc.PDF with OCR add-in from RasterEdge is an advanced OCR library: * Allows character recognition and extraction from images captured by digital camera, scanned PDF documents, and image-only PDFs * Supports multiple languages, including English, French, German, Portuguese, Spanish, Russian, Italian, Dutch, Arabic, Korean, etc. * Supports user-defined image and document OCR, such as full-page, automatic, and manual zonal OCR * Able to read QR Code and barcode data Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5, .NET Core 3.x & 2.x * .NET Framework 4.x * Windows, MacOS, Linux, Docker, Azure Online documents: * C# How to Guide: http://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Email: support@rasteredge.com
The Syncfusion Essential PDF OCR is a .NET character recognition library that recognizes characters from both images and PDF in any ASP.NET Web Forms application. Syncfusion OCRProcessor uses Tesseract, one of the most accurate OCR engines. Key features: • Converts scanned PDF to searchable PDF. • Converts various image formats such as TIFF, JPEG, PNG, BMP to searchable PDF. • Converts image or PDF to text with location. • Processes OCR for the specified region in both PDF and image. • Supports 60+ languages. • Recognizes text from rotated images and PDF documents. • Works both in 32-bit and 64-bit environments. Learn more: https://www.syncfusion.com/pdf-framework/net?utm_source=nuget&utm_medium=listing Documentation: https://help.syncfusion.com/file-formats/pdf/working-with-ocr?utm_source=nuget&utm_medium=listing Support: Incident: https://www.syncfusion.com/support/directtrac/incidents/newincident?utm_source=nuget&utm_medium=listing Forum: https://www.syncfusion.com/forums/aspnetmvc?utm_source=nuget&utm_medium=listing
XDoc.TIFF with OCR add-in is a C# Optical Character Recognition library to read and extract text from multi-page TIFF image files in .NET projects. XDoc.TIFF with OCR add-in from RasterEdge is an advanced OCR library: * Allows character recognition and extraction from images captured by digital camera and scanned TIFF documents * Supports multiple languages, including English, French, German, Portuguese, Spanish, Russian, Italian, Dutch, Arabic, Korean, etc. * Supports user-defined image and document OCR, such as full-page, automatic, and manual zonal OCR * Able to read QR Code and barcode data Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5, .NET Core 3.x & 2.x * .NET Framework 4.x * Windows, MacOS, Linux, Docker, Azure Online documents: * C# How to Guide: http://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Email: support@rasteredge.com
Tesseract-OCR binaries for NAPS2.Sdk
Helps read simple text (a string or number) from images using Tesseract without additional configuration. IMPORTANT: For every file in the "tessdata" folder, set the "Copy To Output Directory" property to "Copy always". Sample project: https://github.com/rohitvipin/TesseractHelper.Demo
~ OCR Tesseract plugin for PDFix.SDK ~ - OCR a PDF document, a page, or a portion of a page - embed fonts into the document Free trial: https://pdfix.net/download Code samples: http://pdfix.net/docs/_c_s__samples.html Related packages: PDFix.SDK ~ https://www.nuget.org/packages/PDFix.SDK/ PDFix.PdfToHtml ~ https://www.nuget.org/packages/PDFix.PdfToHtml/
Installs tessnet2, including the Tesseract OCR library and the English language files, into a project. It depends on the MSBuild.NugetContentRestore package, so you can ignore the installed files in your VCS; they will be restored like assembly references. This package includes the following works of other authors (all under the Apache 2.0 license): - Tessnet2 - Tesseract - Tesseract tessdata English language files
EdgePDF Viewer OCR add-in for ASP.NET web apps is a C# Optical Character Recognition library to read and extract text from scanned PDFs and multi-page TIFF files in .NET projects. EdgePDF.Viewer with OCR from RasterEdge is an advanced OCR library: * Allows character recognition and extraction from images captured by digital camera, scanned PDF documents, and image-only PDFs * Supports multiple languages, including English, French, German, Portuguese, Spanish, Russian, Italian, Dutch, Arabic, Korean, etc. * Supports user-defined image and document OCR, such as full-page, automatic, and manual zonal OCR * Able to read QR Code and barcode data Compatible with: * .NET Standard 2.0 * .NET 8, .NET 7, .NET 6, .NET 5, .NET Core 3.x & 2.x * .NET Framework 4.x * Windows, MacOS, Linux, Docker, Azure Online documents: * C# How to Guide: http://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/ * Email: support@rasteredge.com