A simple C# shell wrapper for the wonderful pdfplumber library in Python to extract text from .PDF files
$ dotnet add package PdfTextExtractor
PdfTextExtractor provides simple methods for extracting text and metadata from .pdf files.
This library is a shell wrapper around an excellent Python library called pdfplumber. As such, there are some constraints:
py (Python) must be installed and callable from the shell
pip install pdfplumber and associated dependencies
This library has been tested on a limited set of documents. It is highly likely that documents exist from which the library, in its current state, cannot extract text.
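Because the library depends on a working Python installation with pdfplumber, it can help to verify both before attempting an extraction. The following is a minimal sketch of such a check, not part of PdfTextExtractor itself: the class name PdfPlumberCheck, the method IsAvailable, and the default executable name "python" are all assumptions made for illustration, and the executable name may need to be "py" or "python3" on your system.

```csharp
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper (not part of PdfTextExtractor): checks whether a
// Python executable is reachable and can import pdfplumber.
public static class PdfPlumberCheck
{
    // pythonExe is an assumption; adjust to "py" or "python3" as needed.
    public static bool IsAvailable(string pythonExe = "python")
    {
        ProcessStartInfo startInfo = new ProcessStartInfo
        {
            FileName = pythonExe,
            Arguments = "-c \"import pdfplumber\"",
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        try
        {
            using (Process process = Process.Start(startInfo))
            {
                if (process == null) return false;

                // Drain stderr before waiting so a long traceback cannot block the child.
                process.StandardError.ReadToEnd();
                process.WaitForExit();

                // Exit code 0 means the import succeeded.
                return process.ExitCode == 0;
            }
        }
        catch (Win32Exception)
        {
            // The executable could not be started (for example, not on PATH).
            return false;
        }
    }
}
```

Calling PdfPlumberCheck.IsAvailable() once at startup lets an application report a missing prerequisite clearly instead of failing later during extraction.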
Refer to the Test project for a full example.
using System.Collections.Generic;
using DocumentTextExtractor;

void Main(string[] args)
{
    // The extractor is disposable, so wrap it in a using block
    using (PdfTextExtractor pdf = new PdfTextExtractor("mydocument.pdf"))
    {
        string text = pdf.ExtractText();
        Dictionary<string, string> metadata = pdf.ExtractMetadata();
    }
}
Please refer to CHANGELOG.md.