Document Classification Pages

An essential first step to processing mixed batches with many types of documents is classification. Document Classification methods quickly sort documents by type using key content and layout attributes to identify them.

The most popular document classification systems are advanced AI-based machine learning algorithms that automatically learn how to classify documents based on samples and user feedback. These systems are very powerful but also very expensive. Only large organizations processing millions of pages each year can afford these enterprise solutions.

SimpleIndex takes a simpler approach: classification based on keyword patterns in the document text. Create a list of document types and assign to each one or more unique keywords or phrases that appear only in that document type. Logical operators for AND, OR and NOT prevent false matches by requiring multiple keywords for a match or by excluding documents that contain certain phrases.
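To make the idea concrete, here is a minimal sketch of keyword classification with AND, OR and NOT conditions. It is an illustration only and not SimpleIndex's actual rule format: the Rule class, its field names and the sample RULES list are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical rule: a document type plus keyword conditions."""
    doc_type: str
    all_of: List[str] = field(default_factory=list)   # AND: every phrase must appear
    any_of: List[str] = field(default_factory=list)   # OR: at least one phrase must appear
    none_of: List[str] = field(default_factory=list)  # NOT: no phrase may appear


def matches(rule: Rule, text: str) -> bool:
    """Return True when the text satisfies the rule's AND, OR and NOT conditions."""
    lowered = text.lower()

    def has(phrase: str) -> bool:
        return phrase.lower() in lowered

    if not all(has(p) for p in rule.all_of):
        return False
    if rule.any_of and not any(has(p) for p in rule.any_of):
        return False
    return not any(has(p) for p in rule.none_of)


def classify(text: str, rules: List[Rule]) -> Optional[str]:
    """Return the first matching document type, or None if nothing matches."""
    for rule in rules:
        if matches(rule, text):
            return rule.doc_type
    return None


# Sample rules, invented for this example.
RULES = [
    Rule("Invoice", all_of=["invoice"], any_of=["amount due", "total due"],
         none_of=["packing slip"]),
    Rule("Packing Slip", all_of=["packing slip"]),
]
```

In this sketch a packing slip that mentions an invoice number is kept out of the Invoice type by the NOT condition, which is the kind of false match the logical operators are meant to prevent.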

Keyword-based classification works for the vast majority of applications at a fraction of the cost of AI classification.

After classification, SimpleIndex can automatically launch separate document indexing workflows for each document type found in the classified batch. This is especially useful when documents have different metadata requirements or business workflows associated with them.

Our LoanStacker application uses SimpleIndex classification capabilities to identify over 500 different types of residential mortgage documents and automatically verify that all required documents are present.

Reduce Click Charges for Data Capture

Monday, 14 November 2022 by Simple Software

If you operate a high-volume scanning department or service bureau, chances are you use software like Kofax to scan and index documents for your clients. If you do, you are well aware of the high cost of click charges and the inevitable mad rush to purchase additional clicks at the end of a peak-volume month.

There are some scanning jobs that need the multi-user batching and indexing features of these systems, but many do not. SimpleIndex® can help you save big on click charges by supplementing your primary scanning infrastructure, letting you perform smaller, less complex jobs in a separate workflow.

Many data capture and forms processing applications charge for every page you process, even if all the data being read is on the first page. Starting with SimpleIndex 9, you can automatically send a copy of the first page from each exported file to a separate folder for data processing, helping you avoid unnecessary processing time and license costs.

Jobs like these can be easily processed with SimpleIndex:

Simple scan-to-file with no indexing
All indexing is done via bar codes or database lookup
No custom export or API integration is required

The following scenarios usually require a more robust solution:

Multi-user workflows
Complex data extraction and forms processing
Direct application integration with APIs

Basically, SimpleIndex is great for workflows with one or two users, where a single user performs the whole scanning and indexing process, or where one person scans and another indexes on a separate workstation. When more than two users are required to keep up with indexing volume, it makes more sense to use a system designed for multiple users.

Learn More:

Scan, file, and process document data quickly and efficiently with Simple Software's tailored OCR automation and one-click processing that fits your unique business needs

Use SimpleIndex OCR to convert scanned and digital images to searchable PDF files for automated sorting, filing, and export to applications such as Word, Excel, PowerPoint, etc.

KB Articles for Reduce Click Charges

1-Click Processing, Automatic Data Capture, Database, Document Classification, Document Imaging, offline OCR, on-prem OCR, on-site OCR, One-time payment OCR, Self-hosted OCR, Subscription free OCR, Sunshine OCR, TWAIN & ISIS Scanning, Workflow


Zone OCR and Dynamic OCR

Monday, 07 November 2022 by Simple Software

Other document scanning applications in this price range use Zone OCR to obtain index data from the page.

SimpleIndex improves upon this time-tested but limited model with its Dynamic OCR feature.

Let’s look at the difference between the two methods:

Zone OCR

Zone OCR is used to read document indexes or tags from text on the page. It is a great way to automate the data entry associated with scanning documents.

However, there are several limitations to zone OCR that must be overcome:

Index information must be in the exact same place on every page
Documents shift and skew during scanning, causing the zones to not line up
If surrounding lines or text on the document are too close, they can encroach on the zone

Dynamic OCR

SimpleIndex overcomes these limitations by using Dynamic OCR technology to extract values from anywhere on the page. Our simplified version of Dynamic OCR works great for many types of documents at a fraction of the cost of other solutions.

Index information can appear anywhere
Unwanted characters are ignored
Find unique patterns of letters and numbers using Template Matching
Use Dictionary Matching to find a value from a list of possible values
Use Cloud OCR or ChatGPT to perform AI analysis and intelligent data extraction

Download document scanning and OCR software.

Dynamic OCR and AI Assisted OCR

AI assisted OCR is the popular solution to the problem of unstructured and semi-structured documents. But there are many scenarios where simple Template and Dictionary matching provide much better results. And all of these solutions are much more expensive than SimpleIndex!

Often there are only a few key values that need to be extracted, and a wide variety of possible layouts. AI-based document training requires manual processing of several samples of each possible format before it learns how to read them reliably, whereas a Template could read them all with a single setting. Dictionary matching can perform advanced classification without analyzing thousands of samples.

When data extraction requires natural language processing, field label extraction, handwriting, AI document analysis, or other advanced features, SimpleIndex offers Cloud OCR and ChatGPT integrations.

Dynamic OCR Examples

In the video we see how SimpleIndex approaches a typical Zone OCR example. With SimpleIndex you can use large zones that give a wide margin for error. Template and Dictionary matching are then used to extract the 7-digit Account Number, 6-digit Order Number and Company Name. SimpleIndex discards the surrounding text and keeps the correct value.

Another common example is finding a unique identifier, for example a social security number, that could appear anywhere on the page. Simply enter the template ###-##-#### and SimpleIndex will search the full OCR text until it finds a match. Since only one social security number is likely to appear on the page, a match on this pattern is almost certainly the required value.
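The template search can be sketched in a few lines of Python. This is only an approximation under one assumption: that # stands for a single digit and every other character is matched literally. The full SimpleIndex template syntax may support more than this, and the function names are invented for the example.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a template such as ###-##-#### into a compiled regular expression.

    Assumption: '#' means one digit; all other characters are literals.
    """
    parts = [r"\d" if ch == "#" else re.escape(ch) for ch in template]
    # The digit lookarounds stop a match from starting or ending inside a longer number.
    return re.compile(r"(?<!\d)" + "".join(parts) + r"(?!\d)")


def find_first(template: str, ocr_text: str) -> Optional[str]:
    """Search the full OCR text and return the first value that fits the template."""
    match = template_to_regex(template).search(ocr_text)
    return match.group(0) if match else None
```

Because the whole page text is searched, the value is found wherever it lands on the page, and surrounding text such as labels or phone numbers is simply ignored.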

With dictionary matching, you can give SimpleIndex a list of possible values and it will automatically search the zone or page for each possible value until it finds a match.
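A rough sketch of dictionary matching is shown below. It is not the SimpleIndex implementation. The case and whitespace normalization and the longest-entry-first ordering are design choices made for this example, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace, including line breaks."""
    return " ".join(text.lower().split())


def dictionary_match(ocr_text: str, dictionary: Iterable[str]) -> Optional[str]:
    """Return the first dictionary entry found in the OCR text, or None."""
    haystack = normalize(ocr_text)
    # Try longer entries first so "Acme Industrial Supply" wins over "Acme".
    for value in sorted(dictionary, key=len, reverse=True):
        if normalize(value) in haystack:
            return value
    return None
```

Returning the dictionary entry rather than the raw OCR text means the index value is always spelled exactly as it appears in your list.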

Many dynamic forms processing applications can be implemented using these simple algorithms. This makes SimpleIndex far more versatile than other zone OCR solutions that require the index value to be in the exact same location on every page. Yet SimpleIndex costs only a fraction of the price!

SimpleIndex's dynamic forms processing can greatly speed up data entry by eliminating a good percentage of indexing work. For many, this can bring the labor cost of scanning within reach.

Dynamic OCR can also be applied to MS Office and PDF files, creating a fully automated process for intelligently indexing and reorganizing electronic documents.

MS Office Document OCR Text Parsing Video

Amazon AWS Textract Cloud OCR

With Textract you can capture data from almost any type of form, including handwritten ones! Textract identifies labeled text anywhere on the document and returns the label text along with the corresponding value. Map the labels to index fields in SimpleIndex and you are ready to capture that data no matter where it appears on the page.
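The label-to-field mapping step might look something like the sketch below. It assumes the label and value pairs have already been pulled out of the OCR response into a plain dictionary, so no Textract API call is shown. The LABEL_TO_FIELD table and the index field names are hypothetical.

```python
from typing import Dict

# Hypothetical mapping from form labels (lowercase, no trailing colon) to index fields.
LABEL_TO_FIELD = {
    "invoice no": "InvoiceNumber",
    "invoice number": "InvoiceNumber",
    "date": "InvoiceDate",
    "total": "Total",
}


def clean_label(label: str) -> str:
    """Normalize a label so 'Invoice No:' and 'invoice no' compare equal."""
    return label.strip().rstrip(":").strip().lower()


def map_labels(pairs: Dict[str, str]) -> Dict[str, str]:
    """Keep only the labeled values that map to an index field."""
    fields: Dict[str, str] = {}
    for label, value in pairs.items():
        name = LABEL_TO_FIELD.get(clean_label(label))
        if name and name not in fields:  # the first label seen for a field wins
            fields[name] = value.strip()
    return fields
```

Labels that are not in the mapping are dropped, so extra fields on the form do not end up in the index.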

SimpleIndex Cloud OCR with Amazon Textract

Textract uses machine learning with a huge model based on the billions of pages processed using Textract to provide the most accurate OCR and form field extraction solution available.

By default, Textract is only available as an API and requires custom coding to integrate it into your document workflows. SimpleIndex turns it into a fully-featured batch document and data processing app that is ready to use out-of-the-box.

Since there are no templates to configure or train, setup can be done in hours instead of the days, weeks, or months required by other enterprise data capture solutions.

Pay-as-you-go pricing makes SimpleIndex with Textract the most affordable way to batch process forms for projects with fewer than 50,000 pages per year, especially if you need to read handwriting or have forms with many layout variations.

Got a preference for ABBYY Cloud OCR, Microsoft Azure AI Vision, or Google Cloud Vision OCR? These can be quickly added for a small customization fee. Contact Us for a quote!

Wiki: How to configure AWS Textract OCR in SimpleIndex

Handprint and Handwriting Recognition

SimpleIndex 11 adds handprint recognition capabilities to the FineReader OCR engine to allow recognition of simple form fields and printed text. It works best with constrained form fields, with letter boxes for each character like you see on tax forms and credit applications. And no additional licensing or per-page costs are required!

For unconstrained handprint and cursive handwriting, use the Cloud OCR option to achieve the best recognition accuracy available. This option requires additional AWS processing fees for each page.

Support for Regular Expressions

SimpleIndex OCR has a simple built-in template format, as well as support for Regular Expressions. Regular Expressions (RegEx for short) let you define complex search patterns to extract matching values from the text. This greatly enhances the functionality of the dynamic OCR in SimpleIndex, making it capable of finding variable-length fields with no distinct pattern.

Regular Expressions are commonly used in text parsing applications. The Perl programming language makes extensive use of RegEx, as do UNIX utilities like “grep”. Many programmers and IT personnel are already familiar with RegEx and can create complex expressions without specific training.
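As an illustration of a variable-length field, the sketch below uses Python's re module to pull out whatever value follows a "PO Number:" label. The label, the pattern and the function name are assumptions made for this example. The syntax SimpleIndex accepts in its job settings may differ in detail.

```python
import re
from typing import Optional

# The value has no fixed length or shape, so the pattern anchors on the label instead.
PO_NUMBER = re.compile(r"\bPO\s+Number\s*:\s*([A-Za-z0-9-]+)", re.IGNORECASE)


def extract_po_number(text: str) -> Optional[str]:
    """Return the value that follows the 'PO Number:' label, or None if absent."""
    match = PO_NUMBER.search(text)
    return match.group(1) if match else None
```

The capture group in parentheses is what becomes the index value, while the label itself is only used to locate it.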

Click here for a reference guide to Regular Expressions

How to Configure SimpleIndex OCR

Our Wiki help has extensive information on how to configure OCR for various document and data capture scenarios.

Zone OCR to read data in a specific location
Template matching to match unique patterns
Dictionary matching to match a list of possible values
OCR Options: OCR job settings that apply to all fields
File Formats that can be output by OCR
Languages supported by OCR
FineReader versus Tesseract OCR engines
Searchable PDF with MRC compression
OCR to Field for point and click OCR during verification
Cloud OCR using Textract

Watch this Simple Software University training video to see how to configure and run an OCR job with SimpleIndex.

Learn More:

KB Articles for Optical Character Recognition (OCR)

Automatic Data Capture, Batch Scanning, Document Classification, Document Imaging, File Indexing, Invoice OCR, OCR, Office PDF Text Processing, on-prem OCR, on-site OCR, Optical Character Recognition, RegEx, Screenshot OCR, Search, Sunshine Software OCR, Text Processing, Watermark PDF Files, Workflow Software, Zone OCR


How do I download and utilize TaxStacker after purchasing?

Friday, 11 October 2019 by Alex Stewart

The TaxStacker system is based on a custom set-up in the SimpleIndex software and consists of the “Tax Stacker.sic” Job Configuration file and the “TaxStacker.mdb” database.

To use the TaxStacker system after purchasing, follow the instructions below:

Follow the SimpleIndex download, installation and activation instructions.
Download the “TaxStacker.zip” file from the link provided with your confirmation email, which includes the Job Configuration File and Database.
Create a Windows folder in the location of your choosing called “TaxStacker”
Copy the “TaxStacker.zip” file to the “TaxStacker” folder that was just created and unzip the contents directly to this folder.
Create a folder called “Input” within the “TaxStacker” folder.
Put any tax documents and image files that you would like to process with the TaxStacker system in the “Input” folder.
Run the “Tax Stacker.sic” Job Configuration file within SimpleIndex to process any files contained within the “Input” folder.
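If you set up several workstations, the folder steps above can be scripted. The sketch below covers only creating the folders and unzipping the download. It extracts the archive straight into the “TaxStacker” folder rather than copying it there first, and it does not install or launch SimpleIndex. The function name and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path


def prepare_taxstacker(zip_path: Path, parent_dir: Path) -> Path:
    """Create the TaxStacker folder, unzip the download into it, and add Input."""
    root = Path(parent_dir) / "TaxStacker"
    root.mkdir(parents=True, exist_ok=True)

    # Unzip the Job Configuration file and database directly into the folder.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(root)

    # Documents to be processed go in the Input subfolder.
    (root / "Input").mkdir(exist_ok=True)
    return root
```

After this runs, the tax documents can be dropped into the “Input” folder and the job file opened in SimpleIndex as described in the last step.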

Document Classification


TaxStacker: Sort & Classify Federal Tax Documents

Friday, 16 August 2019 by Cary Wiedman

This is a great way for accountants and tax preparers to organize complex tax returns in a way that makes it easy to find specific documents. It can also be used to ensure all required schedules and supporting documents are present in the finished return.

Use our out-of-the-box TaxStacker configuration to automatically identify all the forms and schedules that make up a U.S. federal income tax return. These can then be sorted into separate PDF files or combined into a single file that has bookmarks to indicate each section.

Learn More:

Document Classification, File Indexing, OCR, offline OCR, on-prem OCR, on-site OCR, One-time payment OCR, PDF, PDF Forms, Self-hosted OCR, Subscription free OCR, Sunshine OCR


What is Document Imaging?

Wednesday, 31 July 2019 by aaron

Document Imaging was the more commonly used term in the early days of document scanning and OCR and refers to any system used to replicate documents used in business. It evolved from the microfilm days where it was referred to as Document Image Management.

Document Imaging allows for the scanning of paper documents, as well as the processing of files saved electronically. These files are then named and saved for later searching.

Other document imaging terms include automatic imaging software, best digital imaging software, best imaging software, desktop imaging software, digital document imaging, digital imaging software, document imaging download, document imaging PDF, document imaging processing, document imaging products, document imaging software, document imaging solution, document imaging solutions, document imaging systems, document imaging technologies, document imaging technology, document imaging tools, image to database, imaging resource, imaging scanning software, imaging software companies, imaging software download, imaging software for windows, imaging solution, scanner imaging software, scanning and imaging, scanning imaging, and software for imaging.

Automatic Data Capture, Automatic Indexing Software, Document Automation, Document Classification, Document Imaging, Document Management Software, Document Scanning, Image Scanning, Keyword Indexing, Office PDF Document Indexing, Personal Document Management, QuickBooks Document Management, Required Documents Auditing, Scanned Document Indexing, Workflow


Organize Office Documents with Text Parsing

Tuesday, 23 January 2018 by Simple Software

This video shows the Sort My Documents sample job included with the SimpleIndex trial download. It shows how you can organize office documents automatically by parsing the file’s text for relevant metadata and keywords. You can then use those keywords to tag documents with metadata and create standardized folders and filenames.

First we sort Word documents, Excel spreadsheets and PowerPoint presentations automatically using the SimpleIndex template and dictionary matching algorithms that match patterns and keywords in the parsed text.

Then the files are organized into folders and filenames using the Sales Rep, Customer, Document Type and Date values extracted from the text.
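A standardized naming scheme of this kind can be sketched as follows. The folder layout (Sales Rep, then Customer, then a filename built from Document Type and Date) is an assumption chosen for the example, not the layout used in the sample job, and the helper names are hypothetical.

```python
import re
from datetime import date
from pathlib import PureWindowsPath


def safe(name: str) -> str:
    """Replace characters that Windows does not allow in file and folder names."""
    cleaned = re.sub(r'[<>:"/\\|?*]+', " ", name)
    return " ".join(cleaned.split())


def target_path(root: str, sales_rep: str, customer: str,
                doc_type: str, doc_date: date, extension: str) -> PureWindowsPath:
    """Build <root>\\<Sales Rep>\\<Customer>\\<Document Type> <YYYY-MM-DD><ext>."""
    filename = f"{safe(doc_type)} {doc_date.isoformat()}{extension}"
    return PureWindowsPath(root, safe(sales_rep), safe(customer), filename)
```

Because every path is generated from the same four extracted values, files land in predictable places no matter who processed them.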

Organize Office Documents for Cloud Storage

You can also upload organized files to SharePoint or Cloud Storage platforms without the chaos and disorganization you inevitably get when users create their own folders and filenames.

Organize Office Documents for Document Management

In the video, we use SimpleSearch to search and view the sorted files. But you can just as easily use any third party document management system or custom database to perform keyword or full-text searching.

You can use the SimpleView embedded viewer to view Office documents, PDF files and images in a common interface. In the video we use the full version of Word, Excel, and PowerPoint to edit Office documents right from the search screen.

Find Out More

Learn More:

FAQ Related to Organizing Office Documents

Document Classification, Full Text Indexing, MS Office, Office PDF Document Indexing, Office PDF Text Processing, Office to PDF, Paperless Office, Search, SharePoint Migration, SharePoint Scanning, Text Processing


Streamlined Interface

Tuesday, 23 January 2018 by Simple Software

Maximum Data, Minimum Clicks

As with any repetitive task, a few seconds saved scanning and filing a single document quickly adds up to dozens or hundreds of hours over the course of a long project or daily routine. The most important part of planning your document capture project is to find the most efficient way to file your documents correctly. Creating an efficient workflow will save you countless hours of labor over the life of your project.

SimpleIndex is faster and easier because it is designed to perform all of the steps necessary to scan or import documents, process, verify and export them in one continuous workflow rather than requiring the user to click extra buttons each time to initiate the next step. When taken to the extreme, SimpleIndex is capable of performing all of these tasks automatically with just a single mouse click.

SimpleIndex does this by saving all of the settings for a document capture workflow to a file that can be opened just like an Office document. This file is configured by the administrator so the user doesn’t have to see any of the technical details. Very rarely does the operator need to be able to change, for instance, the export file format and file naming scheme. So why do some applications show you a complicated export settings screen every time you try to save a batch? It is this attention to detail that allows SimpleIndex to process the same batch 35-75% faster than its competitors.

SimpleIndex also has the ability to pre-set index values and run jobs using the Command Line Interface. More on this design feature can be found on our Getting Started page.

Index Automation Features

The two main methods for automating indexing are Barcode Recognition and Optical Character Recognition (OCR).

Barcode recognition is faster and more accurate, but your documents must contain a barcode on the document or a cover page for this to work.

OCR is able to read printed data directly from the page, which means most documents can be processed as-is. However, it is not 100% accurate and usually requires some human review. Handwriting can be recognized as well, using the Cloud OCR option.

If your index data already exists in another database, SimpleIndex has features that can make use of this data to automate processing. The Index Autofill feature matches data read from barcodes or OCR to data in your database, verifying the correct value is read and populating additional search fields automatically.
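The lookup behind an autofill step can be pictured with the sketch below, which uses an in-memory SQLite table as a stand-in for your own database. The table, column and field names are invented for the example, and this is not how SimpleIndex itself connects to a database.

```python
import sqlite3
from typing import Dict, Optional


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory customer table to look values up in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (account_no TEXT PRIMARY KEY, name TEXT, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [("1002003", "Acme Supply", "East"), ("1002004", "Globex", "West")],
    )
    return conn


def autofill(conn: sqlite3.Connection, account_no: str) -> Optional[Dict[str, str]]:
    """Verify a barcode or OCR value against the database and fill the other fields."""
    row = conn.execute(
        "SELECT account_no, name, region FROM customers WHERE account_no = ?",
        (account_no.strip(),),
    ).fetchone()
    if row is None:
        return None  # value not found: likely a misread, so flag it for human review
    return {"AccountNumber": row[0], "CustomerName": row[1], "Region": row[2]}
```

A successful lookup both confirms that the value was read correctly and supplies the extra search fields without any typing.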

Paper and Electronic Documents

Traditional document capture is focused on digitizing paper documents with a document scanner. However, more and more documents are living their best lives as native PDF and Word files, never once having to enter our physical realm.

SimpleIndex is designed to handle both scanned physical documents and electronic files in their native format seamlessly. The OCR function will use existing text from any PDF file or Office document when it is available, or automatically OCR scanned images when it isn’t.

Use the built-in SimpleView viewer to view most common file types, or use the PDF editor and word processor of your choice to provide full editing capabilities embedded right within the SimpleIndex application.

It can also simultaneously scan and import documents from a hotfolder into a single batch. So if, for example, you receive both paper and email invoices, you can process your day’s work all at once with just one click!

Using Pre-Indexed Batches

The Pre-Index Batch feature of SimpleIndex is what enables 1-click scanning and indexing, as well as command line and unattended processing.

Pre-indexing lets you set fixed values for index fields and apply them to a whole batch. These can be combined with automatic values from barcode recognition, OCR and Autofill to create fully automated batch processes that can be launched from your custom application, a desktop shortcut, scheduled server task or even linked to the scan button on your scanner.
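Conceptually, a pre-indexed batch combines two sources of index values. The sketch below shows one way to merge them. The rule that a pre-set value takes precedence over an automatic one is an assumption made for this example, as are the function and field names.

```python
from typing import Dict


def build_index(pre_indexed: Dict[str, str], automatic: Dict[str, str]) -> Dict[str, str]:
    """Combine fixed batch values with values read from barcodes, OCR or Autofill."""
    record = dict(pre_indexed)  # fixed values apply to every document in the batch
    for field_name, value in automatic.items():
        # Only fill fields that pre-indexing left open, and skip empty reads.
        if value and field_name not in record:
            record[field_name] = value
    return record
```

With every field covered by one source or the other, no operator input is needed, which is what makes unattended processing possible.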

Learn More:

KB Articles for Streamlined Interface

Automatic Data Capture, Barcode Recognition Software, Batch Scanning, Command Line Interface, Database, Document Automation, Document Classification, Document Imaging, Fast Scanning, OCR, Office PDF Text Processing, on-prem OCR, on-site OCR, RPA, Scanning Software, Solution, Sunshine Software OCR, TWAIN & ISIS Scanning, Unattended, Workflow, Workflow Software


SimpleInvoice Invoice Processing Solution

Wednesday, 17 January 2018 by Simple Software

SimpleInvoice is a preconfigured solution that uses the OCR and dictionary matching functionality of the SimpleIndex scanning and indexing software to automatically capture key information from invoices needed for Accounts Payable processing.

SimpleInvoice requires minimal configuration to get started, and comes with everything you need to capture most common invoice styles.

Use SimpleInvoice to:

Capture data from paper and electronic invoices in a single workflow
Automatically receive and enter Accounts Payable data in your accounting software
Create full-text searchable invoice files
Create an organized filing system for archiving invoices
Quickly find and view invoices based on vendor, date, invoice number, or full-text search
Integrate directly with QuickBooks on-premise using SimpleQB
Work with RPA bots to integrate with QuickBooks Online and other accounting systems

Uses Templates, Not Training

Most data on an invoice matches common patterns like dates and total amounts. The one exception is the invoice number, which has a different format for every vendor.

Using the Template Autofill feature in SimpleIndex, you spell out the specific OCR pattern of a vendor’s invoice number as a column in your Vendor database. When processing invoices, SimpleIndex first identifies the vendor, then searches for the matching pattern in the text to find the invoice number.
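The two-step lookup can be sketched like this. It is a simplified illustration, not the Template Autofill implementation. The VENDORS table stands in for the Vendor database, the template syntax is assumed to use # for a single digit with all other characters literal, and every name in the code is hypothetical.

```python
import re
from typing import Optional, Tuple

# Stand-in for the Vendor database: vendor name -> invoice number template.
VENDORS = {
    "Acme Supply": "INV-######",
    "Globex": "GX#####-##",
}


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Assumption: '#' is one digit; every other character is matched literally."""
    body = "".join(r"\d" if ch == "#" else re.escape(ch) for ch in template)
    # Lookarounds keep the match from sitting inside a longer code.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])")


def read_invoice(ocr_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Identify the vendor first, then search for that vendor's invoice number pattern."""
    lowered = ocr_text.lower()
    for vendor, template in VENDORS.items():
        if vendor.lower() in lowered:
            match = template_to_regex(template).search(ocr_text)
            return vendor, (match.group(0) if match else None)
    return None  # unknown vendor
```

Adding a new vendor in this sketch means adding one row with its pattern, rather than training on sample invoices.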

This solution is far simpler than the machine learning algorithms employed by enterprise invoice OCR systems, which is why SimpleIndex is a fraction of the cost. It’s also simpler than other template-based systems that require you to locate every field for every vendor.

Enterprise Accounts Payable Automation

If your AP workflow requires advanced features like line item capture, GL coding, PO matching, VAT calculation, complex approval workflows, or if you have thousands of vendors to process, then an enterprise invoice processing solution is more appropriate.

Don’t worry, we can help you out with that too!

Find Out More

SimpleInvoice is included for free with any SimpleIndex license. Download SimpleIndex Now!

Some initial setup is required, and we can help you out with that too. Our Professional Services department can have you up and running in just a couple of hours.

Check out SimpleQB or our AP Automation RPA Bot to see how we integrate with your accounting software to automate the entry of transaction data.

Please Contact Us to find out more about SimpleInvoice!

Learn More:

FAQ Related to Invoice Processing

Database, Database Autofill, Document Classification, Invoice OCR, OCR, offline OCR, on-prem OCR, on-site OCR, One-time payment OCR, PDF, QuickBooks Document Management, Self-hosted OCR, Subscription free OCR, Sunshine OCR, Text Processing

