Document Classification

Classification is an essential first step in processing mixed batches that contain many types of documents. Document classification methods quickly sort documents by type, using key content and layout attributes to identify each one.

The most popular document classification systems use AI-based machine learning algorithms that automatically learn to classify documents from samples and user feedback. These systems are very powerful but also very expensive, and only large organizations processing millions of pages can justify the cost of these enterprise solutions.

SimpleIndex takes a simpler approach, classifying documents by keyword patterns in the document text. Create a list of document types and assign to each type one or more unique keywords or phrases that appear only in documents of that type. The logical operators AND, OR and NOT prevent false matches by requiring multiple keywords for a match or by excluding documents that contain certain phrases.
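To make the keyword idea concrete, here is a minimal Python sketch of rule-based classification with AND, OR and NOT conditions. It is an illustration only and does not show how SimpleIndex is configured or implemented; the `Rule` class, the `classify` function, and the sample keywords are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical keyword rule for one document type."""
    doc_type: str
    all_of: list[str] = field(default_factory=list)   # AND: every phrase must appear
    any_of: list[str] = field(default_factory=list)   # OR: at least one phrase must appear
    none_of: list[str] = field(default_factory=list)  # NOT: no phrase may appear


def matches(rule: Rule, text: str) -> bool:
    """Return True if the text satisfies the rule (case-insensitive substring match)."""
    lowered = text.lower()
    if any(phrase.lower() in lowered for phrase in rule.none_of):
        return False
    if not all(phrase.lower() in lowered for phrase in rule.all_of):
        return False
    if rule.any_of and not any(phrase.lower() in lowered for phrase in rule.any_of):
        return False
    # A rule with no positive keywords never matches.
    return bool(rule.all_of or rule.any_of)


def classify(text: str, rules: list[Rule], default: str = "Unknown") -> str:
    """Return the document type of the first matching rule, or the default."""
    for rule in rules:
        if matches(rule, text):
            return rule.doc_type
    return default


# Illustrative rules; real keyword lists depend on the documents being processed.
RULES = [
    Rule("Invoice", all_of=["invoice", "amount due"], none_of=["purchase order"]),
    Rule("Purchase Order", any_of=["purchase order", "po number"]),
]

if __name__ == "__main__":
    print(classify("INVOICE #123  Amount Due: $50.00", RULES))
    print(classify("Purchase Order 77, invoice to follow, amount due on receipt", RULES))
    print(classify("Meeting notes", RULES))
```

In this sketch the NOT condition on the invoice rule keeps a purchase order that happens to mention an invoice from being misfiled, which is the kind of false match the logical operators are meant to prevent.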

Keyword-based classification works for the vast majority of applications at a fraction of the cost of AI classification.

After classification, SimpleIndex can automatically launch separate document indexing workflows for each document type found in the classified batch. This is especially useful when documents have different metadata requirements or business workflows associated with them.
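The routing step described above can be pictured as grouping a classified batch by document type and handing each group to its own workflow. The Python sketch below is a conceptual illustration under that assumption, not SimpleIndex's actual mechanism; the function names, workflow handlers, and file names are hypothetical.

```python
from collections import defaultdict
from typing import Callable

# A classified batch is modeled here as (file name, document type) pairs.
Classified = list[tuple[str, str]]
Workflow = Callable[[list[str]], list[str]]


def group_by_type(classified: Classified) -> dict[str, list[str]]:
    """Group file names by their assigned document type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, doc_type in classified:
        groups[doc_type].append(name)
    return dict(groups)


def run_workflows(classified: Classified,
                  workflows: dict[str, Workflow],
                  fallback: Workflow) -> dict[str, list[str]]:
    """Run the workflow registered for each type; unregistered types use the fallback."""
    results = {}
    for doc_type, names in group_by_type(classified).items():
        handler = workflows.get(doc_type, fallback)
        results[doc_type] = handler(names)
    return results


# Hypothetical workflows, each with its own metadata requirements.
def index_invoices(names: list[str]) -> list[str]:
    return [f"{n}: capture invoice number and total" for n in names]


def index_purchase_orders(names: list[str]) -> list[str]:
    return [f"{n}: capture PO number and vendor" for n in names]


def manual_review(names: list[str]) -> list[str]:
    return [f"{n}: send to manual review" for n in names]


WORKFLOWS = {"Invoice": index_invoices, "Purchase Order": index_purchase_orders}

if __name__ == "__main__":
    batch = [("a.pdf", "Invoice"), ("b.pdf", "Purchase Order"),
             ("c.pdf", "Invoice"), ("d.pdf", "Unknown")]
    for doc_type, actions in run_workflows(batch, WORKFLOWS, manual_review).items():
        print(doc_type, actions)
```

Keeping the type-to-workflow mapping in a single dictionary means a new document type can be supported by registering one more handler, without touching the routing logic.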