Office PDF Document Indexing

SimpleIndex uses the existing text of Microsoft Office documents (Word, Excel, PowerPoint, etc.) and PDF files to extract data using RegEx patterns and database keyword matching. Scanned PDF files are converted to text with OCR. Automatically assign metadata and upload to any document management system.
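The pattern-matching idea described above can be illustrated with a minimal Python sketch. It is not SimpleIndex's own implementation: the two patterns, the field names and the sample text are assumptions made up for illustration, and real documents would need patterns tuned to their layout.

```python
import re

# Hypothetical patterns for illustration only.
INVOICE_RE = re.compile(r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")

def extract_fields(text):
    """Return a dict of index fields found in the text (None when a field is absent)."""
    invoice = INVOICE_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "invoice_number": invoice.group(1) if invoice else None,
        "date": date.group(1) if date else None,
    }

# Text as it might come out of a PDF, an Office document or an OCR pass.
sample = "ACME Corp\nInvoice No: INV-10423\nDate: 11/14/2022\nTotal: $250.00"
print(extract_fields(sample))
```

Because the patterns search the whole text rather than a fixed zone, the same function works whether the invoice number sits at the top or the bottom of the page.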

Features

Monday, 14 November 2022 by Simple Software

SimpleIndex is the perfect solution for small businesses and departments looking to manage their files from a single interface, developers who don’t want to reinvent the wheel, and large companies with many locations looking to decentralize their scanning.

SimpleIndex organizes scanned images and electronic documents into a single document management database your employees can access on their desktops.

SimpleIndex takes the labor out of document imaging by providing powerful barcode recognition and OCR search algorithms that can find index values no matter where they are on the page. By providing these essential automations for a reasonable price, we make document management affordable for anyone.

Our most distinctive automation is the OCR template and dictionary matching search algorithms. These let you find values such as date, customer name and invoice number on documents with different layouts. They can also be applied to the text in Office documents and PDF files to organize those files automatically and attach them to a database.
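As a rough illustration of dictionary matching, the Python sketch below checks page text against a list of known values, such as customer names pulled from a database. The function names and sample data are hypothetical, and the simple substring test is an assumption; it is not a description of the algorithm SimpleIndex actually uses.

```python
import re

def normalize(text):
    """Lowercase and collapse whitespace so layout differences don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def dictionary_match(text, known_values):
    """Return the known values that occur in the text, longest match first."""
    haystack = normalize(text)
    hits = [value for value in known_values if normalize(value) in haystack]
    return sorted(hits, key=len, reverse=True)

# Hypothetical dictionary and page text.
customers = ["Acme Corp", "Acme Corp West", "Globex"]
page_text = "Bill To:\n  ACME   CORP WEST\n  42 Main St"
print(dictionary_match(page_text, customers))
```

Sorting the hits longest first is one simple way to prefer the most specific entry when one known value is contained in another.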

The SimpleCoversheet barcode printing application lets anyone in your company print bar-coded coversheets with all the information needed to identify a document. This is perfect for scanning with a centralized scanner or networked digital copier. SimpleIndex can then be configured to process these documents automatically without any user intervention whatsoever!

This level of automation is provided by SimpleIndex’s command line interface. All the settings related to scanning or searching documents can be saved to “Job Files”, which can be launched from an icon with a click of the mouse or from a single line of code. These jobs can be configured to scan documents, read barcodes and OCR text, generate folders and filenames, and upload the index information to a database in a single step.

It is simply not possible to find an easier, faster way to process your documents!

Key Features and Recent Enhancements

All Editions offer OCR and Barcode

- SimpleIndex Standard has Tesseract OCR and DTK Barcode engines. These provide good recognition on clean originals.
- SimpleIndex Professional adds ABBYY FineReader OCR Engine, ICR handprint recognition, ISIS Scanning, and Cloud OCR. These are able to recognize hard-to-read text and bar codes, and improve overall speed and accuracy.
- Process existing text in PDF files and MS Office documents in all versions.

OCR
- Pro version includes ABBYY® FineReader Engine for faster, more accurate OCR
- Searchable PDF output
- Clipboard / Screen Shot OCR
- Point-Click OCR – Click on text to send it to an index field
- Enhanced OCR Options – match against other indexes, skip OCR on files with existing text
- Support for international character sets using Unicode

Bar Code Recognition
- Multi-engine Barcode voting to boost accuracy (Professional version)
- Support for most 2D barcodes in all versions
- User-defined Barcode delimiter – no longer restricted to “|” for parsing multi-index barcodes
- Find/Replace characters in barcodes for matching or autofill
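The barcode features listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SimpleIndex's implementation: voting is modeled as a plain majority vote across engine results, and the function names, the tie-handling rule and the sample values are all hypothetical.

```python
from collections import Counter

def vote(readings):
    """Return the value most engines agree on, or None if nothing was read or the top is tied."""
    counts = Counter(reading for reading in readings if reading)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the document for manual review
    return ranked[0][0]

def split_indexes(value, delimiter="|", replacements=None):
    """Apply find/replace pairs, then split a multi-index barcode on the delimiter."""
    for old, new in (replacements or {}).items():
        value = value.replace(old, new)
    return [part.strip() for part in value.split(delimiter)]

# Three hypothetical engines read the same barcode; one misreads a character.
winner = vote(["INV-1;2022-11-14;ACME", "INV-1;2022-11-14;ACME", "INV-1;2022-11-14;ACNE"])
print(split_indexes(winner, delimiter=";"))
```

Returning None on a tie keeps a doubtful value out of the index instead of guessing between engines.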

TWAIN and ISIS Scanning
- TWAIN support in all versions. ISIS support available as an add-on or in Pro version.
- Improved real-time image processing
- Multiple scan windows when using ISIS
- Scan directly to a network folder (processing occurs during scan)

Desktop Processing
- Selectively reprocess files
- Run multiple copies of SimpleIndex simultaneously
- Save any image region to a separate file for signature capture, etc.

Server Processing
- Run multiple jobs on different schedules
- Run multiple copies of the same job for parallel processing and increased throughput
- Server licenses can be purchased in 1 Million Page per Year increments as an add-on to any workstation license
- Unlimited page barcode processing license available with Advanced Barcode Server
- Server processing compatible with Windows 7 or above and all Windows Server versions

PDF Handling
- Support for reading and writing password protected PDF files
- Convert MS Office, HTML, XML and images to PDF before processing
- Searchable PDF output
- Native PDF Viewer – no dependence on 3rd party software
- PDF Auto-repair – attempts correction of bad PDF files

Indexing
- Return MD5 hash values
- Configure default values for empty fields
- Export to XML
- Edit Fixed Fields – make changes to auto-generated indexes
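For readers unfamiliar with MD5 hash values as an index field, the sketch below shows how such a value can be computed for a file with Python's standard `hashlib` module. The function name and the throwaway demo file are assumptions for illustration; how SimpleIndex computes and stores the value is not covered here.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
checksum = md5_of_file(tmp.name)
os.remove(tmp.name)
print(checksum)
```

Two files with the same hash value are almost certainly identical, which makes the field useful for spotting duplicates and for checking that a file was not altered after indexing.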

SharePoint Integration
- Compatible with all versions of SharePoint and SharePoint Online
- Append to or replace existing files in SharePoint
- Automatically match index fields to SharePoint columns to populate data

Website Resources

SimpleIndex.com is full of information and interactive content to help you learn what you need to know to implement SimpleIndex in your organization.

- Design Philosophy & Getting Started Guide – An introduction to the way SimpleIndex approaches document processing
- Bringing Document Imaging to Everyone – How Simple Software solutions reduce the cost of entry into the Paperless Office
- Sample Applications – Examples of different ways you can put SimpleIndex to use
- Compare SimpleIndex to the Competition – Videos showing the same batch of documents being scanned and indexed with SimpleIndex as well as 4 top desktop document capture applications
- Top Features of SimpleIndex – Detailed information on the automated scanning and indexing features of SimpleIndex
- Demonstration Videos – See how SimpleIndex automates indexing with OCR and barcode recognition
- Simple Software University – Online training videos teaching all aspects of Simple Software configuration
- Simple Software FAQ – Answers to common questions about Simple Software products
- How Many Clicks does it Take to Scan My Documents? – Printable brochure in PDF format

SimpleIndex Applications in Your Industry

PDF brochures outlining some of the applications for SimpleIndex in various industries.

SimpleIndex Feature Highlights

Links to more information on the major features of SimpleIndex.

- Streamlined Document Capture – Ways that SimpleIndex helps reduce labor by streamlining the workflow and automating common indexing tasks
- TWAIN and ISIS Scanner Driver Support – Use any scanner with SimpleIndex
- Zone, Full Page and Dynamic OCR – Extract index data no matter where it appears on the page
- Barcode Recognition – Read barcodes from scanned images to automate indexing
- Optical Mark Recognition (OMR) – Read check boxes to find True/False or Yes/No values
- Index Autofill – Populate multiple search fields with existing data from your database
- Electronic Imprinting – Apply bates stamps and other image stamps/endorsements electronically
- Database Integration – A full range of interactive database features allow for creative integration with custom database applications
- Document Presence Auditing – Make sure that all required documents are present in the batch before it is released
- Document Retrieval Options – How to find and view files once you have indexed them with SimpleIndex
- Command Line Processing and Custom Application Integration – The SimpleIndex command-line interface makes it the easiest document capture application to integrate with your custom business software
- Enable Distributed Document Capture – Companies with many remote locations can now afford to implement Distributed Capture with SimpleIndex
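The Index Autofill idea listed above, where one captured value is used to look up the remaining fields in an existing database, can be sketched as follows. This is only an illustration: Python's built-in `sqlite3` stands in for whatever ODBC or OLEDB data source is actually in use, and the table, column names and sample row are hypothetical.

```python
import sqlite3

def autofill(conn, customer_id):
    """Look up the remaining index fields for a key value; None when no row matches."""
    row = conn.execute(
        "SELECT name, account_no, region FROM customers WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    return {"name": row[0], "account_no": row[1], "region": row[2]}

# In-memory stand-in for an existing customer database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT, account_no TEXT, region TEXT)"
)
conn.execute("INSERT INTO customers VALUES ('C-100', 'Acme Corp', '778812', 'West')")

# A key read from a barcode or OCR zone fills in the other fields.
print(autofill(conn, "C-100"))
```

The parameterized query keeps a misread key from being interpreted as SQL, which matters when the key comes straight from OCR or barcode output.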

The Simple Software Imaging Suite

These applications enhance and expand the functionality of SimpleIndex by providing barcode printing, automatic uploading, quality control and direct integration with popular applications like QuickBooks.

Software Catalog

Top Features

The following is a list of the top document capture features of SimpleIndex, in no particular order.

Indexing support for all file types
Viewing support for any installed, OLE-enabled application
No monthly page processing limitations
Text processing support for OCR’d images, PDF files and MS Office documents
Barcode Recognition
Dynamic & Zone OCR
Manual Zone OCR by indexing operator
Full-Page OCR to text, MS Word or HTML
Optical Mark Recognition (OMR)
Unstructured data capture using Template and Dictionary Matching
ODBC & OLEDB database connectivity
Use any database to store index data for document management and retrieval
Automatic population of index fields using database lookup (Autofill)
Document Presence Auditing – ensures all required pages are present in each batch
Clipboard / Screen Shot OCR
Command line execution
Input from any TWAIN or ISIS scanner or network folder
Media Wizard to create royalty-free, searchable document CDs or DVDs
Output images to TIFF, JPEG, PDF or PDF/A
Page Order Validation: reads the page number from each page with OCR and compares it to the scanned page order.
Double-Index Validation: compare the value of two fields during unattended processing and automatically route documents to exceptions when the values don’t match.
Automatic forwarding of a copy of the first page: from each exported file a first page is forwarded to a separate folder for data processing.
Integrated document separation: combines pages into multi-page documents without the need for a 2-step configuration.
Output index information to database or comma-delimited text file
Blank page detection and deletion
SharePoint 2010 Integration
Auto-Rotate
Easy-to-use cropping and redaction tools to remove confidential parts of images
Electronic imprinting and bates stamping
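The two validation features in the list above, Page Order Validation and Double-Index Validation, come down to simple comparisons. The Python sketch below shows the logic under stated assumptions: the function names, the field names and the "export"/"exceptions" labels are hypothetical, and page numbers are assumed to have already been read by OCR.

```python
def page_order_errors(ocr_page_numbers):
    """Return (scan_position, number_read) pairs where the OCR'd number differs from scan order."""
    return [
        (position, read)
        for position, read in enumerate(ocr_page_numbers, start=1)
        if read != position
    ]

def route(fields, first, second):
    """Route a document to 'exceptions' unless both fields are present and equal."""
    a, b = fields.get(first), fields.get(second)
    return "export" if a is not None and a == b else "exceptions"

# Pages 3 and 4 were scanned in the wrong order.
print(page_order_errors([1, 2, 4, 3]))

# The same account number read from two places on the form should agree.
print(route({"acct_barcode": "778812", "acct_ocr": "778812"}, "acct_barcode", "acct_ocr"))
print(route({"acct_barcode": "778812", "acct_ocr": "778B12"}, "acct_barcode", "acct_ocr"))
```

Treating a missing field as a mismatch is a deliberate choice in this sketch, so that unattended processing never exports a document on the strength of two empty values.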

Version History

Simple Software is always working on updates and upgrades. Here you can find the change log for SimpleIndex.

Learn More:

KB Articles for OCR Features

1-Click Processing, Batch Scanning, Document Capture Solution, Document Numbering System, Document Scanning, Full Text Indexing, OCR, Office PDF Document Indexing, on-prem OCR, on-site OCR, Paperless Office, Personal Document Management, Scanned Document Indexing, SharePoint Migration, Sunshine Software OCR, TWAIN & ISIS Scanning, XSLT Data Conversion Software


Charset Name	Charset Value
ANSI_CHARSET (Latin)	0
DEFAULT_CHARSET	1
SYMBOL_CHARSET	2
SHIFTJIS_CHARSET (Japanese)	128
HANGUL_CHARSET (Korean)	129
GB2312_CHARSET (Simplified Chinese)	134
CHINESEBIG5_CHARSET (Chinese)	136
GREEK_CHARSET (Greek)	161
TURKISH_CHARSET (Turkish)	162
HEBREW_CHARSET (Hebrew)	177
ARABIC_CHARSET (Arabic)	178
BALTIC_CHARSET (Baltic)	186
RUSSIAN_CHARSET (Russian)	204
THAI_CHARSET (Thai)	222
EE_CHARSET	238
OEM_CHARSET	255
