Document Classification with OCR.
Monday, 01 July 2019
An essential first step to processing mixed batches with many types of documents is classification. Document Classification methods quickly sort documents by type using key content and layout attributes to identify them. The most popular document classification systems are advanced AI-based machine learning algorithms that automatically learn how to classify documents based on samples and user feedback. These systems are very powerful but also very expensive. Only large organizations processing millions of pages each year can afford these enterprise solutions. SimpleIndex naturally has a simpler way to do classification based on keyword patterns in the document text. Simply create a list of document types and assign one or more unique keywords or phrases that will only appear in that document type to each. Logical operators for AND, OR and NOT prevent false matches by requiring multiple keywords for matching or excluding documents that contain certain phrases. Keyword-based classificat
Tuesday, 03 October 2017
Office Videos | PDF Video The template and dictionary matching capabilities of SimpleIndex‘s OCR function can be used to extract index information from the text of existing MS Office and PDF files, or any file with an accompanying TXT file. SimpleIndex® will search the document for matches on unique patterns and value lists, then index the document with the matching data. Zone coordinates can be set to limit the search area to pre-defined regions on standard forms. The result is a fully automated indexing and renaming process for all your electronic documents! Using existing text, SimpleIndex can index and rename hundreds of files each minute and achieve perfect accuracy. These files can then be quickly searched with SimpleIndex Retrieval, SharePoint and Google search engines, or uploaded into your company’s document/content management system or custom business applications. Enhanced Text Parsing & PDF Support MS Office and PDF text parsing features are now included in