An essential first step in processing mixed batches that contain many types of documents is classification. Document classification methods quickly sort documents by type, using key content and layout attributes to identify each one.
The most popular document classification systems use AI-based machine learning algorithms that automatically learn how to classify documents from samples and user feedback. These systems are very powerful but also very expensive, and generally only large organizations processing millions of pages each year can afford these enterprise solutions.
SimpleIndex takes a simpler approach, classifying documents by keyword patterns in the document text. Simply create a list of document types and assign each type one or more unique keywords or phrases that appear only in that type of document. The logical operators AND, OR and NOT prevent false matches by requiring multiple keywords for a match or by excluding documents that contain certain phrases.
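To illustrate how AND, OR and NOT keyword rules fit together, here is a minimal Python sketch. It is not SimpleIndex's configuration format: the class `DocTypeRule`, its fields `all_of`, `any_of` and `none_of`, and the sample document types and phrases are all hypothetical. Rules are checked in order with case-insensitive substring matching, and the first match wins.

```python
from dataclasses import dataclass, field


@dataclass
class DocTypeRule:
    """Hypothetical rule: a document type plus its keyword conditions."""
    name: str
    all_of: list[str] = field(default_factory=list)   # AND: every phrase must appear
    any_of: list[str] = field(default_factory=list)   # OR: at least one phrase must appear
    none_of: list[str] = field(default_factory=list)  # NOT: none of these may appear


def matches(rule: DocTypeRule, text: str) -> bool:
    """Return True if all of the rule's AND/OR/NOT conditions hold for the text."""
    lowered = text.lower()
    if not all(p.lower() in lowered for p in rule.all_of):
        return False
    if rule.any_of and not any(p.lower() in lowered for p in rule.any_of):
        return False
    if any(p.lower() in lowered for p in rule.none_of):
        return False
    return True


def classify(text: str, rules: list[DocTypeRule]) -> str:
    """Return the first matching document type, or 'Unclassified'."""
    for rule in rules:
        if matches(rule, text):
            return rule.name
    return "Unclassified"


# Hypothetical rules: the NOT condition keeps credit memos that mention
# an invoice and an amount due from being misfiled as invoices.
RULES = [
    DocTypeRule("Invoice", all_of=["invoice", "amount due"], none_of=["credit memo"]),
    DocTypeRule("Credit Memo", all_of=["credit memo"]),
    DocTypeRule("Purchase Order", any_of=["purchase order", "P.O. number"]),
]
```

For example, `classify("Credit Memo applied to invoice 1001, amount due $0", RULES)` skips the Invoice rule because of its NOT condition and returns `"Credit Memo"`.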
Keyword-based classification works for the vast majority of applications at a fraction of the cost of AI classification.
After classification, SimpleIndex can automatically launch separate document indexing workflows for each document type found in the classified batch. This is especially useful when documents have different metadata requirements or business workflows associated with them.
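Conceptually, launching a separate workflow per document type amounts to grouping the classified documents by type and dispatching each group to its own handler. The Python sketch below shows that idea only; the workflow functions, their metadata field lists and the fallback `index_default` are hypothetical stand-ins, not SimpleIndex features or APIs.

```python
from collections import defaultdict
from typing import Callable


# Hypothetical per-type indexing workflows; each returns the fields it would collect.
def index_invoice(doc: str) -> dict:
    return {"doc": doc, "fields": ["Vendor", "Invoice Number", "Amount"]}


def index_contract(doc: str) -> dict:
    return {"doc": doc, "fields": ["Customer", "Effective Date"]}


def index_default(doc: str) -> dict:
    return {"doc": doc, "fields": ["Document Type"]}


WORKFLOWS: dict[str, Callable[[str], dict]] = {
    "Invoice": index_invoice,
    "Contract": index_contract,
}


def run_workflows(classified: list[tuple[str, str]]) -> dict[str, list[dict]]:
    """Group (doc_type, doc) pairs by type, then run each group's workflow."""
    batches: dict[str, list[str]] = defaultdict(list)
    for doc_type, doc in classified:
        batches[doc_type].append(doc)
    results = {}
    for doc_type, docs in batches.items():
        # Types without a dedicated workflow fall back to a generic one.
        workflow = WORKFLOWS.get(doc_type, index_default)
        results[doc_type] = [workflow(d) for d in docs]
    return results
```

Keeping the type-to-workflow mapping in a single dictionary means a new document type only needs a new entry, not changes to the dispatch loop.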
Our LoanStacker application uses SimpleIndex's classification capabilities to identify over 500 different types of residential mortgage documents and automatically verify that all required documents are present.
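Once each document in a batch has a type, checking that all required documents are present reduces to a set difference between the required types and the types found. The Python sketch below assumes that approach. The four-item `REQUIRED` checklist is purely illustrative and is not LoanStacker's actual list.

```python
# Hypothetical checklist for illustration only; not LoanStacker's real requirements.
REQUIRED = {"Loan Application", "Credit Report", "Appraisal", "Title Commitment"}


def missing_documents(found_types, required=REQUIRED):
    """Return the required document types absent from the batch, sorted by name."""
    return sorted(set(required) - set(found_types))
```

Duplicates in the batch are harmless because both sides are converted to sets before comparing.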
Classification is a great way for accountants and tax preparers to organize complex tax returns so that specific documents are easy to find. It can also be used to ensure that all required schedules and supporting documents are present in the finished return.
Use our out-of-the-box TaxStacker configuration to automatically identify all the forms and schedules that make up a U.S. federal income tax return. These can then be sorted into separate PDF files or combined into a single file with a bookmark marking each section.
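Combining classified pages into a single bookmarked file means putting the pages in section order and recording where each section starts. The Python sketch below computes only that ordering and the bookmark positions; it does not read or write PDFs, which a real implementation would do with a PDF library. The `SECTION_ORDER` list and the `build_bookmarks` function are hypothetical and do not reflect TaxStacker's actual ordering.

```python
# Hypothetical section order; not TaxStacker's actual ordering.
SECTION_ORDER = ["Form 1040", "Schedule A", "Schedule C", "W-2"]


def build_bookmarks(pages):
    """pages: list of (doc_type, page_id) in scan order.

    Returns (ordered page ids, list of (bookmark title, 1-based start page)).
    """
    rank = {name: i for i, name in enumerate(SECTION_ORDER)}
    # Known sections sort by their rank; unknown types go last, grouped by name.
    # sorted() is stable, so pages keep their scan order within each type.
    ordered = sorted(pages, key=lambda p: (rank.get(p[0], len(SECTION_ORDER)), p[0]))
    bookmarks = []
    for page_number, (doc_type, _) in enumerate(ordered, start=1):
        # Start a new bookmark whenever the document type changes.
        if not bookmarks or bookmarks[-1][0] != doc_type:
            bookmarks.append((doc_type, page_number))
    return [page_id for _, page_id in ordered], bookmarks
```

For the separate-files option, the same ordered list can be split at each bookmark's start page instead.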
This video shows the Sort My Documents sample configuration. Word documents, Excel spreadsheets and PowerPoint presentations are automatically sorted using the SimpleIndex template and dictionary matching algorithms.
The files are reorganized using the Sales Rep, Customer, Document Type and Date extracted from the text. SimpleSearch is then used to search and view the sorted files.
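One way to reorganize files from extracted fields is to build each destination path from the field values. The Python sketch below assumes a hypothetical layout of `Sales Rep/Customer/Document Type/Date_original-name`. Neither the layout nor the `safe` and `sorted_path` helpers come from the Sort My Documents configuration. Characters that are invalid in Windows file names are replaced, and missing fields become `Unknown`.

```python
import re
from pathlib import PurePosixPath


def safe(part: str) -> str:
    """Replace characters that are invalid in Windows file and folder names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", part).strip() or "Unknown"


def sorted_path(fields: dict, original_name: str) -> PurePosixPath:
    """Hypothetical layout: SalesRep/Customer/DocType/Date_originalname."""
    return PurePosixPath(
        safe(fields.get("Sales Rep", "")),
        safe(fields.get("Customer", "")),
        safe(fields.get("Document Type", "")),
        f'{safe(fields.get("Date", ""))}_{original_name}',
    )
```

Prefixing the file name with the extracted date keeps the files in each folder in chronological order when listed by name, provided the date is in year-month-day format.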