High Volume Scanning

Revision as of 23:05, 11 October 2023 by Aaron

Scanning departments and service bureaus that process thousands of pages daily can realize significant time and cost savings by using the most efficient Job Settings in a multi-user workflow.

Scanning

The most efficient scanning process is one that keeps the scanner running at its rated speed as much as possible. Minimize the time spent waiting for the software to process or save images, reloading the hopper, or doing anything else that stops the scanner.
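To see how much stopped time costs, the following minimal sketch computes the average throughput of a batch once idle time is counted. The function name and the figures (a 60 pages-per-minute scanner, a 500-page batch, 5 minutes stopped) are hypothetical and chosen only for illustration.

```python
def effective_pages_per_minute(rated_ppm, pages, stopped_seconds):
    """Average throughput for a batch, counting time the scanner sat idle."""
    # Time the scanner needs at its rated speed, in seconds.
    scanning_seconds = pages * 60 / rated_ppm
    # Add the time lost to saving, processing, reloading, jams, and so on.
    total_seconds = scanning_seconds + stopped_seconds
    return pages * 60 / total_seconds


# Hypothetical figures: 60 ppm scanner, 500-page batch, 300 seconds stopped.
print(effective_pages_per_minute(60, 500, 300))  # prints 37.5
```

In this example, five minutes of stoppage on a batch that takes a little over eight minutes to scan reduces the effective rate from 60 to 37.5 pages per minute.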

SimpleIndex is able to perform barcode recognition and image clean-up in real time during scanning using the process images while scanning option.

SimpleIndex will also scan directly to the designated Output folder, as long as none of these options are enabled: OCR, Autofill or other calculated fields, and Database or SharePoint exports.

Enabling any of these options prevents direct scanning and requires the Processing and Export steps, which stop the scanner. If they are needed, use a second Job File to perform the Processing step.

For optimal efficiency, use Barcode or Fixed fields to generate the scan filenames automatically, so the scanner operator does not need to key anything in. If needed, manual fields can be entered quickly before scanning and used to generate the folders and filenames.

If the Output settings are configured to Append to End, a multi-page file will be created for each batch or barcode value.

Set the Output settings to Numbered 1-Page Files to create a separate file for each page. This option also makes it easier to stop and start the scanner, or to remove incorrect images after a paper jam.

Processing

Use a separate Job File to perform OCR, Autofill, Database exports, and other functions that require the processing and export stages.

Use Server Processing to fully automate this stage and run it in the background on a server.

If you don't have a server license, or need to perform all of the stages on a single workstation, you can run the processing step in a second SimpleIndex window while you continue scanning. You can also configure the Post-Process function to launch the processing job in a new window automatically after each batch is scanned.

The processing job can bring over any data from the scanning step by parsing the Folders and Filenames that were created using Fixed fields.
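The idea of carrying data in folders and filenames can be illustrated outside SimpleIndex with a short sketch. It assumes a hypothetical naming layout of `<client>/<batch>/<doctype>_<account>_<page>.<ext>`; the layout, field names, and function name are illustrative assumptions and are not SimpleIndex settings.

```python
from pathlib import PurePath


def parse_scan_path(path):
    """Recover index values from a scanned file's folders and filename.

    Assumes the hypothetical layout
    <client>/<batch>/<doctype>_<account>_<page>.<ext>
    """
    p = PurePath(path)
    if len(p.parts) < 3:
        raise ValueError(f"expected client/batch/file, got: {path}")
    # The two folders directly above the file carry the first two values.
    client, batch = p.parts[-3], p.parts[-2]
    # The filename (without extension) carries the remaining values.
    fields = p.stem.split("_")
    if len(fields) != 3:
        raise ValueError(f"unexpected filename: {p.name}")
    doctype, account, page = fields
    return {
        "client": client,
        "batch": batch,
        "doctype": doctype,
        "account": account,
        "page": int(page),
    }
```

For example, `parse_scan_path("scans/Acme/2023-10-11/Invoice_10045_0001.tif")` yields the client `Acme`, the batch `2023-10-11`, the document type `Invoice`, the account `10045`, and page number 1. Note that a separator such as `_` only works if it cannot appear inside a field value.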

In a multi-user workflow, the Database table used to store index values and manage batches should be populated during this step.

Quality Control & Rescanning

Use the SimpleView application to load the scanned pages in a thumbnail view, with easy access to Image Clean-Up and Rescan functions. This is the most efficient configuration for high-volume processing, as it allows the QC operator to quickly load and view the images in place instead of moving all of the files during import and export steps.

Alternatively, use the embedded SimpleView viewer during a manual indexing or OCR data verification job to perform the review. This configuration is more efficient when many documents require data entry, since someone will need to look at all of the pages anyway.

Task switching between QC/Rescan and data entry is less efficient than performing each job separately, so for the highest volume and greatest efficiency it is best to separate these tasks.

If QC is performed separately, it should be done before the Processing stage to improve the quality of the OCR results. If it is instead performed as part of the manual indexing or data review stage, processing must come first.

You can also have SimpleView open while you scan so the operator can review and fix the images as they are being scanned.

Inserting missing pages into a batch or changing the page order can be done in SimpleIndex using the Scan->Insert and the Index->Move Up/Down menu commands.

If you are using SimpleView with numbered 1-page files, insert documents by scanning and appending to the page before the insert point. This will create a multipage file that can be split back into pages in the correct order during the Processing step.
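The ordering logic behind this technique can be sketched as follows. This is an illustration of why appending to the page before the insert point preserves page order when files are later split in filename order. It is not SimpleIndex's implementation, and the function name and filenames are hypothetical.

```python
def split_order(files):
    """Return (new_page_number, source_file, page_index) tuples.

    `files` is a list of (filename, page_count) pairs. Files are taken in
    filename order, and each file's pages are taken in their stored order.
    """
    result = []
    number = 1
    for name, count in sorted(files):
        for index in range(count):
            result.append((number, name, index))
            number += 1
    return result


# Two missing pages were scanned and appended to 0002.tif, so it now holds
# three pages: the original page 2 followed by the two inserted pages.
for row in split_order([("0001.tif", 1), ("0002.tif", 3), ("0003.tif", 1)]):
    print(row)
```

Because the inserted pages live in the file that precedes the insert point, they land between the original pages 2 and 3 after the split, and the old page 3 becomes page 5.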

Manual Indexing and Data Review

Any process that uses OCR to capture data will need a manual review step to verify the correct data was read on each document before the final output is created.

The Multiple Users page describes how you can use Folders or a Database to queue up batches for manual review and delegate them to operators.

Final Output

The export settings can be configured as part of the manual review step, or a dedicated job can run unattended and create the final output files when the other steps have been completed.

Using Server Processing to run a dedicated Export job keeps the data entry operators from having to wait for files to export before they can process the next batch. This can make a big difference for time-consuming operations such as exporting very large files, uploading them to SharePoint or other content management systems, and Imprinting.

The Fast Export option should be used to significantly increase performance, especially when creating large multi-page files (over 100 pages).

Putting It All Together

For the most efficient high-volume scanning workflow, the steps would be:

  1. Scan documents to a shared folder, creating a subfolder for each batch.
  2. The QC/Rescan operator reviews the images in each subfolder and moves the subfolder to the processing Input folder.
  3. A server process reads the documents, performs data extraction, and saves the data to a database.
  4. Data entry operators use Update mode to queue batches for review and save the results to the database.
  5. A server job exports completed batches, applying the file naming schema, generating log files, and uploading to third-party apps.

The same workflow can be configured without a database by linking the Input and Output folders for each step, and using folders and filenames to pass data between steps. This configuration allows remote workers to process batches using Cloud Storage services like Google Drive, without the need for VPNs, database connections, and other network considerations.
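The folder-linked handoff described above can be sketched as a pair of helper functions: one lists the batch subfolders waiting in a step's Input folder, and the other moves a finished batch into the next step's Input folder. This is a minimal sketch with hypothetical function and folder names. It does not use any SimpleIndex API, and it assumes that only one operator works on a given batch at a time.

```python
import shutil
from pathlib import Path


def pending_batches(input_dir):
    """List batch subfolders waiting in a step's Input folder, in name order."""
    return sorted(p for p in Path(input_dir).iterdir() if p.is_dir())


def hand_off(batch_dir, next_input_dir):
    """Move a finished batch folder into the next step's Input folder."""
    next_input = Path(next_input_dir)
    target = next_input / Path(batch_dir).name
    # Refuse to overwrite a batch that is already queued under the same name.
    if target.exists():
        raise FileExistsError(f"batch already queued: {target}")
    next_input.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(batch_dir), str(target)))
```

A QC operator's script might call `pending_batches("qc_input")`, review the first batch, and then call `hand_off(batch, "processing_input")`. On synced Cloud Storage folders a move is not instantaneous for other users, so a batch should be fully synced before the next step picks it up.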