OMR and OCR Document Separation

From Simple Wiki
== Standard Document Separation ==
For most jobs, document separation happens automatically as you index documents. When unique values are read via [[OCR]] or [[barcodes]] and assigned to index fields, a unique export [[filename]] is generated for each document, which separates the documents. This works as long as a unique value can be read on the first page of each document and no false-positive values appear on the other pages.
In some cases, however, it is more efficient to separate the documents into files before processing them, particularly when documents can contain a variable number of attachments with unknown content and the data being extracted does not follow a unique pattern.
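The value-driven separation described in this section can be sketched as follows. This is a minimal illustration, not SimpleIndex's implementation: it assumes each page has already been run through OCR or barcode recognition and yields either an index value (for example an invoice number) or <code>None</code>. The function name and the <code>.pdf</code> file naming are hypothetical.

```python
from typing import List, Optional, Tuple


def separate_by_index_value(
    page_values: List[Optional[str]],
) -> List[Tuple[str, List[int]]]:
    """Group 1-based page numbers into documents keyed by the index value read on them.

    page_values[i] is the value read from page i+1 (OCR or barcode), or None
    when nothing was read. A new document starts whenever a page yields a
    value different from the current document's value.
    """
    documents: List[Tuple[str, List[int]]] = []
    for page_number, value in enumerate(page_values, start=1):
        if value is not None and (not documents or documents[-1][0] != value):
            # A new unique value means a new export filename, i.e. a new document.
            documents.append((value, [page_number]))
        elif documents:
            # No value (or the same value again): the page joins the current document.
            documents[-1][1].append(page_number)
        else:
            # Pages before the first readable value cannot be assigned to a document.
            raise ValueError(f"no index value found before page {page_number}")
    return documents


pages = ["INV-1001", None, None, "INV-1002", None]
for filename, page_numbers in separate_by_index_value(pages):
    print(f"{filename}.pdf: pages {page_numbers}")
```

If an attachment on page 3 happened to contain a different invoice number, this sketch would start a spurious document there, which is the false-positive problem described above.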
== OMR Based Separation ==
SimpleIndex offers a new approach to determining where each new document starts. Traditionally, [[barcode]] [[separator sheets]] are inserted during document prep to mark the start of each document. Inserting a sheet between every file is wasteful and time-consuming, especially when the files are only a few pages long.


SimpleIndex uses [[OMR]] (optical mark recognition) technology to offer a simpler solution. Use a felt pen to make a black mark in the upper-left corner of the first page of each document. SimpleIndex then scans automatically into numbered multipage files, creating a new file each time a mark is detected ([[separation]]). These files can then be indexed and exported with a second SimpleIndex job.
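The idea behind mark-based separation can be sketched as a simple pixel count, assuming each scanned page is available as a grayscale bitmap stored as a list of pixel rows (0 = black, 255 = white). This is not SimpleIndex's detection code; the corner size, darkness level, and fill ratio are hypothetical values chosen for illustration, and in SimpleIndex the OMR settings are configured in the job.

```python
from typing import List

Page = List[List[int]]  # grayscale pixel rows, 0 = black, 255 = white


def has_corner_mark(
    page: Page,
    corner_fraction: float = 0.1,  # hypothetical: inspect the top-left 10% x 10% region
    dark_level: int = 128,         # hypothetical: pixels below this count as ink
    min_fill: float = 0.3,         # hypothetical: share of dark pixels that counts as a mark
) -> bool:
    """Return True if the upper-left corner region is dark enough to be a pen mark."""
    height = max(1, int(len(page) * corner_fraction))
    width = max(1, int(len(page[0]) * corner_fraction))
    region = [row[:width] for row in page[:height]]
    dark = sum(1 for row in region for pixel in row if pixel < dark_level)
    return dark / (height * width) >= min_fill


def split_on_marks(pages: List[Page]) -> List[List[int]]:
    """Start a new file (list of 1-based page numbers) at every marked page."""
    files: List[List[int]] = []
    for number, page in enumerate(pages, start=1):
        # The first page always opens a file, even if its mark was missed.
        if has_corner_mark(page) or not files:
            files.append([number])
        else:
            files[-1].append(number)
    return files


def blank_page(size: int = 20) -> Page:
    return [[255] * size for _ in range(size)]


def marked_page(size: int = 20) -> Page:
    page = blank_page(size)
    for y in range(2):
        for x in range(2):
            page[y][x] = 0  # simulated felt-pen mark in the top-left corner
    return page


scan = [marked_page(), blank_page(), marked_page(), blank_page(), blank_page()]
print(split_on_marks(scan))  # [[1, 2], [3, 4, 5]]
```

Measuring the share of dark pixels, rather than looking for any single black pixel, keeps specks and scanner noise from triggering a separation.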
== OCR Based Separation ==


SimpleIndex can also use [[OCR]] to locate the first page of a new document by finding unique keywords or text patterns on the page. If every document begins with the same kind of page (for example, the same form), this method can identify it without any additional document preparation.
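Keyword-based first-page detection can be sketched with regular expressions applied to each page's OCR text. This is an illustration rather than SimpleIndex's configuration: the two patterns below (a form title and a "Page 1 of N" footer) are hypothetical and would be replaced by text that reliably appears only on the first page of your documents.

```python
import re
from typing import List

# Hypothetical first-page markers; replace with text unique to your first pages.
FIRST_PAGE_PATTERNS: List[re.Pattern] = [
    re.compile(r"\bAPPLICATION FOR ENROLLMENT\b", re.IGNORECASE),
    re.compile(r"\bPage\s+1\s+of\s+\d+\b", re.IGNORECASE),
]


def is_first_page(ocr_text: str) -> bool:
    """A page starts a new document if any first-page pattern matches its OCR text."""
    return any(pattern.search(ocr_text) for pattern in FIRST_PAGE_PATTERNS)


def split_on_keywords(page_texts: List[str]) -> List[List[int]]:
    """Group 1-based page numbers into documents, starting one at each first page."""
    documents: List[List[int]] = []
    for number, text in enumerate(page_texts, start=1):
        if is_first_page(text) or not documents:
            documents.append([number])
        else:
            documents[-1].append(number)
    return documents


texts = [
    "Application for Enrollment ... Page 1 of 2",
    "Signature ... Page 2 of 2",
    "Attached medical record",
    "APPLICATION FOR ENROLLMENT ... Page 1 of 1",
]
print(split_on_keywords(texts))  # [[1, 2, 3], [4]]
```

Note that a generic pattern such as "Page 1 of N" would also fire on the first page of a multi-page attachment, so the more specific the pattern, the fewer false separations.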
== Creating Automatic Separation Jobs ==


The [[Autonumber]] page describes how to configure [[OCR]]- and [[OMR]]-based automatic document separation.
Each separation event increments the [[Autonumber]] field, so every document is exported as a uniquely numbered multipage file.
Use the '''Combine pages into documents after processing''' option to merge the pages into multipage files before the indexing step starts, so that separation and indexing run in a single job workflow. Otherwise, use the [[Post-Process]] setting to run a second [[job file]] that processes the separated documents.
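The relationship between separation events, the Autonumber value, and the combined output files can be sketched as below. This is an illustration under assumptions, not SimpleIndex's code: the per-page <code>separation</code> flags stand in for whatever OMR or OCR rule the job uses, and the starting value of 1 and the zero-padded <code>000001.pdf</code> naming are hypothetical.

```python
from typing import Dict, List


def assign_autonumbers(separation: List[bool], start: int = 1) -> List[int]:
    """Give each page the current Autonumber value, incrementing on each separation event."""
    numbers: List[int] = []
    current = start - 1
    for index, is_separator in enumerate(separation):
        if is_separator or index == 0:
            current += 1  # a separation event (or the first page) begins a new document
        numbers.append(current)
    return numbers


def combine_pages(numbers: List[int], width: int = 6) -> Dict[str, List[int]]:
    """Merge 1-based pages that share an Autonumber value into one multipage file."""
    files: Dict[str, List[int]] = {}  # dicts keep insertion order (Python 3.7+)
    for page_number, value in enumerate(numbers, start=1):
        name = f"{value:0{width}d}.pdf"  # hypothetical naming scheme
        files.setdefault(name, []).append(page_number)
    return files


flags = [True, False, False, True, False, True]
numbers = assign_autonumbers(flags)
print(numbers)                 # [1, 1, 1, 2, 2, 3]
print(combine_pages(numbers))  # {'000001.pdf': [1, 2, 3], '000002.pdf': [4, 5], '000003.pdf': [6]}
```

In this sketch, <code>combine_pages</code> plays roughly the role of the '''Combine pages into documents after processing''' option: a later indexing step would see three documents instead of six loose pages.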

Revision as of 08:56, 22 October 2022
