PDF Text Processing Demo
This sample job demonstrates the PDF text processing capabilities of SimpleIndex. It extracts the Document Number, Date, Document Type, Customer, and Total from several documents without OCR, by reading the text layer of the PDF files directly.
Computer-generated PDF files, such as those created with PDF printer drivers, already contain digital text. SimpleIndex reads this text and applies Template and Dictionary Matching to locate and extract the correct data values.
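The general idea behind Template and Dictionary Matching on a text layer can be sketched in a few lines of Python. This is a hypothetical illustration, not how SimpleIndex implements these features. The regular-expression "templates", the dictionary entries, the sample text, and the `extract_fields` function are all assumptions made up for the example.

```python
import re

# Hypothetical "templates": regular expressions for fields that follow a
# recognizable pattern in the text layer. These patterns are illustrative
# assumptions, not SimpleIndex settings.
TEMPLATES = {
    "Document Number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)", re.IGNORECASE),
    "Date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"),
    "Total": re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}

# Hypothetical dictionaries: lists of known values to look for anywhere in the text.
DICTIONARIES = {
    "Document Type": ["Invoice", "Purchase Order", "Credit Memo"],
    "Customer": ["Acme Corp", "Globex Inc", "Initech"],
}


def extract_fields(text: str) -> dict:
    """Return a mapping of field name to the first matched value (or None)."""
    fields = {}
    # Template matching: keep the first capture group of the first match.
    for name, pattern in TEMPLATES.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    # Dictionary matching: keep the first dictionary entry found in the text,
    # comparing case-insensitively and returning the canonical spelling.
    lowered = text.lower()
    for name, entries in DICTIONARIES.items():
        fields[name] = next((e for e in entries if e.lower() in lowered), None)
    return fields


# Made-up text, standing in for the text layer of a printed invoice PDF.
sample = """ACME CORP
INVOICE
Invoice No: 10452
Date: 03/14/2024
Total: $1,250.00"""

print(extract_fields(sample))
```

Templates suit values with a predictable shape (numbers, dates, amounts), while dictionaries suit values drawn from a known list (document types, customer names).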
Full-Page OCR can also be used to extract text from scanned PDF files that have no text layer. To improve performance, SimpleIndex detects when a PDF file already contains text and performs OCR only on the documents that need it.
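The "OCR only when needed" decision can be pictured as a simple check on each document's existing text. The sketch below is hypothetical and does not reflect SimpleIndex internals. The `needs_ocr` and `get_text` functions, the 20-character threshold, and the `fake_ocr` stand-in for a real OCR engine are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def needs_ocr(text_layer: str, min_chars: int = 20) -> bool:
    """Treat a document as scanned if its text layer has too few characters.

    The 20-character default is an arbitrary assumption for illustration.
    """
    return len("".join(text_layer.split())) < min_chars


def get_text(documents: Dict[str, str], run_ocr: Callable[[str], str]) -> Dict[str, str]:
    """Return text for each document, calling OCR only where the text layer is missing."""
    results = {}
    for name, text_layer in documents.items():
        if needs_ocr(text_layer):
            results[name] = run_ocr(name)  # slow path: scanned image, run OCR
        else:
            results[name] = text_layer  # fast path: reuse the existing text
    return results


ocr_calls: List[str] = []


def fake_ocr(name: str) -> str:
    # Hypothetical stand-in for a real OCR engine; records which documents were OCR'd.
    ocr_calls.append(name)
    return f"<ocr text for {name}>"


# Made-up inputs: one computer-generated PDF with text, one scanned PDF without.
docs = {
    "printed.pdf": "Invoice No: 10452  Date: 03/14/2024  Total: $1,250.00",
    "scanned.pdf": "",
}
texts = get_text(docs, fake_ocr)
print(ocr_calls)  # prints ['scanned.pdf']
```

Skipping OCR for documents that already have text avoids the most expensive step for computer-generated files, which is where the performance gain comes from.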
Find Out More
- Download or get an Online Demo
- PDF Text Processing Features in SimpleIndex
- PDF Features and Settings Wiki Pages
- Full-Page OCR Wiki Pages
- OCR Features and Settings Wiki Pages
- OCR Software Guide on SimpleOCR
FAQ Related to PDF Text Processing
- SimpleIndex 10.1 with Textract!
- Take control of Sales Tax exemption forms
- Exclude Index Field from Index Log
- Change the Font Size of Index Fields
- Large documents (>500 pg) Slow to Process - Workaround
- How to activate SimpleView?
- How to activate any Add-on or Upgrade to SimpleIndex?
- TaxStacker: Sort & Classify Federal Tax Documents