Office PDF Document Indexing Pages - Page 2 of 2

How do you configure a field to select from a list of possible values?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Index & Batch Logging reference.

List fields are populated using a text file or database field containing the values for that list. The text file scenario will be described first.

To configure a list field, go to the Index tab in the Job Options. Create a list field by selecting “List” for the field type and giving it a name. In the “List File/Field” setting for this field, enter the full path or UNC path to the text file containing the list, or click “Set” to browse to the file. If you have not created the file yet, enter the path where you want it stored and click the “Edit” button. This opens the file in Notepad; if the file does not exist, you will be prompted to create it.

Put one possible value on each line of the text file. These are the values you will be able to select from while indexing with SimpleIndex. You can copy this information from another source and paste it into the text file.
Save the file in Notepad and close it. The List field is now configured!
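
If the values already exist in another system, the list file can also be generated by a script instead of typed in Notepad. The following is a minimal sketch, not part of SimpleIndex: the function name write_list_file is hypothetical, and the UTF-8 encoding and the removal of blank and duplicate entries are assumptions.

```python
from pathlib import Path


def write_list_file(path, values):
    """Write one list value per line, skipping blanks and duplicates."""
    seen = set()
    lines = []
    for value in values:
        cleaned = value.strip()
        # Skip empty entries and case-insensitive repeats so each choice appears once.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            lines.append(cleaned)
    # Assumption: UTF-8 text, one value per line, trailing newline.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, write_list_file("doc_types.txt", ["Invoice", "Contract", "Receipt"]) produces a three-line file that can be referenced in the “List File/Field” setting.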

To use a database field, your configuration must be connected to a database using the settings on the Database tab. Any fields defined as “List” on the Index tab that have a corresponding field mapped on the Database tab will use the unique values from that field to populate the list.
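
Conceptually, populating a list from a database field amounts to selecting the distinct values of one column. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is not SimpleIndex code: the function name is hypothetical, and skipping blank values and sorting the result are assumptions.

```python
import sqlite3


def list_values_from_column(conn, table, column):
    """Return the unique non-blank values of one column, sorted."""
    # Identifiers cannot be bound as query parameters, so this sketch
    # assumes the table and column names come from a trusted configuration.
    quoted = f'"{column}"'
    query = (
        f'SELECT DISTINCT {quoted} FROM "{table}" '
        f"WHERE {quoted} IS NOT NULL AND {quoted} <> '' "
        f"ORDER BY {quoted}"
    )
    return [row[0] for row in conn.execute(query)]
```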

You can check “Only allow values in list” to disallow users from entering a value not in the list.

Once the List index field is configured properly, its values appear in a drop-down in the main SimpleIndex window when you index after scanning. The field also autofills the closest match from the list based on the characters you type.
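
To make the autofill and “Only allow values in list” behavior concrete, here is a small sketch of how such a field could work. How SimpleIndex actually picks the closest match is not documented here, so the case-insensitive prefix match below is an assumption, and both function names are hypothetical.

```python
def closest_match(typed, values):
    """Return the first list value (alphabetically) that starts with the typed text."""
    prefix = typed.strip().lower()
    if not prefix:
        return None
    # Assumption: "closest match" means a case-insensitive prefix match.
    for value in sorted(values, key=str.lower):
        if value.lower().startswith(prefix):
            return value
    return None


def is_allowed(entered, values, only_allow_list_values):
    """Mimic the "Only allow values in list" check box."""
    if not only_allow_list_values:
        return True
    return entered in values
```

With the list Receipt, Invoice and Inventory, typing "inv" would suggest Inventory under these assumptions, and "Memo" would be rejected when the check box is on.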

Published in Indexing & UI


Can the values of 2 or more fields be combined in a single field?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Field types reference.

Set the Type for the field that you want to store the combined value to “Fixed”.

In the template setting for that field, you can enter the keyword %FIELD#% (where # is the field number) and the keyword will be replaced with the value of the designated field when it is saved.

For example, to combine your first 2 fields into the third, inserting a comma between them, set the template for field 3 to:
%FIELD1%,%FIELD2%
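
The keyword substitution can be pictured with a short sketch. This is an illustration of the %FIELD#% idea only, not SimpleIndex's implementation; the function name is hypothetical, and expanding an unknown field number to an empty string is an assumption.

```python
import re

FIELD_KEYWORD = re.compile(r"%FIELD(\d+)%")


def expand_template(template, field_values):
    """Replace each %FIELD#% keyword with the value of field number #."""
    def substitute(match):
        number = int(match.group(1))
        # Field numbers are 1-based. Assumption: unknown fields expand to "".
        if 1 <= number <= len(field_values):
            return field_values[number - 1]
        return ""
    return FIELD_KEYWORD.sub(substitute, template)
```

With field 1 set to Smith and field 2 set to John, the template %FIELD1%,%FIELD2% expands to Smith,John.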

Automatic Indexing Software, File Indexing, Full Text Indexing, Keyword Indexing, Metadata, Microsoft Word Data Extraction, Office PDF Document Indexing, PDF Data Extraction Software, Scanned Document Indexing

Published in Indexing & UI


How do you configure OCR to read index information from MS Office or PDF documents?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Zones & OCR Settings reference.

MS Office and PDF files generated by software or PDF printer drivers already contain the text you need to recognize. Scanned documents need OCR to read text from an image of the page. With Office and PDF files, SimpleIndex can read the text directly, which is much faster and more accurate than image OCR.

To recognize index fields from the document text, first create OCR fields on the Index tab as you would normally. Next, on the Zones & OCR options tab, check the “Use Full Page OCR for this Field” option for each OCR field. This tells SimpleIndex to process the existing file text.

If the index value is a unique pattern of digits or list of possible values, use Template or Dictionary matching to locate the value within the text. Please see the manual for details on Template and Dictionary matching.
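
The two matching styles can be sketched in a few lines. SimpleIndex's own Template syntax is described in the manual and is not reproduced here; this sketch uses a Python regular expression as a stand-in for a template, and a simple case-insensitive search for dictionary matching. Both function names are hypothetical.

```python
import re


def match_template(text, pattern):
    """Return the first substring matching the pattern, or None."""
    # Stand-in for Template matching: a regular expression, not SimpleIndex syntax.
    found = re.search(pattern, text)
    return found.group(0) if found else None


def match_dictionary(text, values):
    """Return the first listed value that occurs in the text, ignoring case."""
    lowered = text.lower()
    for value in values:
        if value.lower() in lowered:
            return value
    return None
```

For a document containing "Invoice No. 2018-00417" and "Customer: Globex Corp", the pattern \d{4}-\d{5} would locate the invoice number, and a dictionary of customer names containing Globex would locate the customer.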

If the value appears in a specific location in each file, coordinates can be used to locate it. When processing text, the X, Y, Width and Height settings correspond to line and column numbers within the file text. This is explained in greater depth in the manual.
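
As a rough picture of coordinate-based extraction from text, the sketch below cuts a rectangular zone out of a block of text. The exact mapping is covered in the manual; here it is assumed that X is the starting column, Y is the starting line, Width is a number of characters, Height is a number of lines, and both X and Y count from 1. The function name is hypothetical.

```python
def read_text_zone(text, x, y, width, height):
    """Return the characters inside a line/column zone, one string per line."""
    lines = text.splitlines()
    # Assumption: x and y are 1-based column and line numbers.
    selected = lines[y - 1 : y - 1 + height]
    return [line[x - 1 : x - 1 + width] for line in selected]
```

For a file whose second line reads "No: 00417   Date: 2018-02-28", a zone with X=5, Y=2, Width=5, Height=1 would return 00417 under these assumptions.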

SimpleIndex will assume that any TXT file with the same name as a file being processed is the OCR text for that file, so this method can work with any type of file.
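
If you generate the text files yourself, a helper like the following can confirm that each document has a companion file before you run a job. It is a sketch only: the function name is hypothetical, and it assumes that "the same name" means the same base name with the extension replaced by .txt, read as UTF-8.

```python
from pathlib import Path


def companion_text(document_path):
    """Return the text of the same-named TXT file, or None if there is none."""
    # Assumption: drawing.dwg pairs with drawing.txt in the same folder.
    text_path = Path(document_path).with_suffix(".txt")
    if text_path.is_file():
        return text_path.read_text(encoding="utf-8", errors="replace")
    return None
```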

Find out more about Optical Character Recognition on the SimpleOCR Guide.

Microsoft Word Data Extraction, MS Office, Office PDF Document Indexing, Office PDF Text Processing, Office to PDF, Paperless Office, PDF, PDF Archive Scanning Software, PDF Barcode Recognition, PDF Data Extraction Software, PDF Forms, Text Processing, Unattended Processing

Published in OCR, Office PDF Text Processing


Why does the word “MISSING” show up in filenames and index fields when the field is blank?

Wednesday, 28 February 2018 by dwilder

By default, SimpleIndex substitutes “MISSING” for any field value that is used as a filename or folder name and is left blank.

You can change this placeholder to any text you like. To do this, open the Job Settings wizard from the File menu, go to the Advanced Settings step and expand Advanced Indexing Options. The value is set to DEFAULT, which inserts the word “MISSING” when the index field is blank. Enter any other text in this field to use it instead of “MISSING”.

How do I configure the output folder and file naming scheme?

Wednesday, 28 February 2018 by dwilder

Use the Folder and Filename check boxes on the Indexing & File Naming step in the Job Settings Wizard to indicate whether field values will be used to generate subfolders or filenames. Fields with the Folder option checked create nested subfolders, one level per field, in the order the fields are listed. Fields with the Filename option checked have their values concatenated to form the filename.

For example, if Field 1 and Field 3 have the Folder option checked, and Field 2 and Field 3 have the Filename option checked, image filenames will be created in the format:

%OUTPUTFOLDER%\Field 1\Field 3\Field 2 – Field 3.tif

The Filename Separator option on the Advanced tab lets you change the “ – ” between the fields in the filename to anything you want.
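
The naming scheme can be modeled in a few lines to check what a given configuration will produce. This is a sketch of the rules described above, not SimpleIndex code: the function name and the tuple layout are hypothetical, and the “MISSING” placeholder for blank values follows the default described in the previous article.

```python
from pathlib import PureWindowsPath

# Space, en dash, space: the separator shown in the example above.
DEFAULT_SEPARATOR = " \u2013 "


def build_output_path(output_folder, fields, separator=DEFAULT_SEPARATOR,
                      missing="MISSING", extension=".tif"):
    """Build the output path from (value, use_as_folder, use_in_filename) tuples."""
    # Blank values fall back to the placeholder in both folders and filenames.
    folders = [value or missing for value, as_folder, _ in fields if as_folder]
    name_parts = [value or missing for value, _, in_name in fields if in_name]
    filename = separator.join(name_parts) + extension
    return str(PureWindowsPath(output_folder, *folders, filename))
```

With Field 1 and Field 3 marked as folders and Field 2 and Field 3 marked for the filename, the values Acme, INV-00417 and 2018 yield C:\Output\Acme\2018\INV-00417 – 2018.tif.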

Published in Export


Automatic Indexing Using Existing Data

Wednesday, 24 January 2018 by Simple Software


The Autofill feature of SimpleIndex is an easy way to associate many index fields with one document without retyping data that already exists in another database. Autofill uses a database lookup to retrieve records that match a key value entered by the user. Blank index fields are then filled in automatically with the data from this lookup. The result is a document database with many different possible search fields, of which only one needed to be entered during scanning.

The key field may be typed by the user, or it may be read from the document automatically using barcode recognition or OCR. The lookup is performed either when the user changes this field or when the index values are saved. If the lookup finds multiple matching records, the user will be notified and the first set of values will be used by default.
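
The lookup logic described above can be sketched with Python's built-in sqlite3 module. This illustrates the idea only and is not how SimpleIndex is implemented: the function name is hypothetical, and it assumes that each index field maps to a database column of the same name and that the "first" record is the first one inserted.

```python
import sqlite3


def autofill(conn, table, key_column, key_value, index_fields):
    """Fill blank index fields from the first record matching the key value.

    index_fields maps field names to current values. Returns the filled
    fields and the number of matching records.
    """
    names = list(index_fields)
    # Assumption: table and column names come from a trusted configuration.
    column_list = ", ".join(f'"{name}"' for name in names)
    rows = conn.execute(
        f'SELECT {column_list} FROM "{table}" WHERE "{key_column}" = ? ORDER BY rowid',
        (key_value,),
    ).fetchall()
    filled = dict(index_fields)
    if rows:
        # When several records match, the first set of values is used.
        for name, looked_up in zip(names, rows[0]):
            if not filled[name]:
                filled[name] = looked_up
    return filled, len(rows)
```

Only blank fields are overwritten, so a value the user has already typed is kept. A match count greater than one is the case in which the user would be notified.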

When used with pre-index batches, key information can be read automatically from barcodes or OCR and matched to database records with a single click. Search on up to 99 index fields without a single keystroke!

Learn More:

Scan, file, and process document data quickly and efficiently with Simple Software's tailored OCR automation and one-click processing that fits your unique business needs

Use SimpleIndex OCR to convert scanned and digital images to searchable PDF files for automated sorting, filing, and export to applications such as Word, Excel, PowerPoint, etc.


1-Click Processing, Automatic Data Capture, Automatic Indexing Software, Barcode Recognition Software, Database, Database Autofill, Document Automation, File Indexing, Full Text Indexing, Keyword Indexing, Metadata, Microsoft Word Data Extraction, OCR, Office PDF Document Indexing, offline OCR, on-prem OCR, on-site OCR, One-time payment OCR, PDF Data Extraction Software, Scanned Document Indexing, Scanning Software, Self-hosted OCR, Subscription free OCR, Sunshine OCR


Organize Office Documents with Text Parsing

Tuesday, 23 January 2018 by Simple Software

This video shows the Sort My Documents sample job included with the SimpleIndex trial download. It shows how you can organize office documents automatically by parsing the file’s text for relevant metadata and keywords. You can then use those keywords to tag documents with metadata and create standardized folders and filenames.

First we sort Word documents, Excel spreadsheets and PowerPoint presentations automatically using the SimpleIndex template and dictionary matching algorithms that match patterns and keywords in the parsed text.

Then the files are organized into folders and filenames using the Sales Rep, Customer, Document Type and Date values extracted from the text.

Organize Office Documents for Cloud Storage

You can also upload organized files to SharePoint or Cloud Storage platforms without the chaos and disorganization you inevitably get when users create their own folders and filenames.

Organize Office Documents for Document Management

In the video, we use SimpleSearch to search and view the sorted files. But you can just as easily use any third party document management system or custom database to perform keyword or full-text searching.

You can use the SimpleView embedded viewer to view Office documents, PDF files and images in a common interface. In the video we use the full version of Word, Excel, and PowerPoint to edit Office documents right from the search screen.


Document Classification, Full Text Indexing, MS Office, Office PDF Document Indexing, Office PDF Text Processing, Office to PDF, Paperless Office, Search, SharePoint Migration, SharePoint Scanning, Text Processing



PDF Text Processing Demo

Friday, 12 January 2018 by Simple Software

This sample job demonstrates the PDF text processing capabilities of SimpleIndex by extracting the Document Number, Date, Document Type, Customer and Total from a number of documents without OCR, by processing the text layer of PDF files.

Computer-generated PDF files, such as those created using PDF printer drivers, already contain digitized text. SimpleIndex reads the text and performs Template and Dictionary Matching to locate and extract the correct data values from the text.

Since the existing text is being used, OCR is not performed. This makes processing much faster and 100% accurate, especially compared to solutions using zone OCR.

While this demo runs interactively, text processing jobs can run in unattended mode since the data does not need to be verified.

Full-Page OCR can also be used to get text from scanned PDF files with no existing text. SimpleIndex will also detect when a PDF file has existing text and only perform OCR on the documents that need it to improve performance.


OCR, Office PDF Document Indexing, Office PDF Text Processing, offline OCR, on-prem OCR, on-site OCR, One-time payment OCR, PDF, PDF Archive Scanning Software, PDF Data Extraction Software, Self-hosted OCR, Subscription free OCR, Sunshine OCR, Text Processing, Unattended Processing



MS Office & PDF Text Parsing

Tuesday, 03 October 2017 by dwilder


The template and dictionary matching capabilities of SimpleIndex’s OCR function can be used to extract index information from the text of existing MS Office and PDF files, or any file with an accompanying TXT file. SimpleIndex® will search the document for matches on unique patterns and value lists, then index the document with the matching data. Zone coordinates can be set to limit the search area to pre-defined regions on standard forms. The result is a fully automated indexing and renaming process for all your electronic documents!

Using existing text, SimpleIndex can index and rename hundreds of files each minute and achieve perfect accuracy. These files can then be quickly searched with SimpleIndex Retrieval, SharePoint and Google search engines, or uploaded into your company’s document/content management system or custom business applications.

Enhanced Text Parsing & PDF Support

MS Office and PDF text parsing features are now included in the Basic version of SimpleIndex, making it much more affordable to enable automatic document sorting on the desktop. Additional Office and PDF features include:

Convert any MS Office, HTML, XML and image files to PDF before processing
Read and write password-protected PDF files
Searchable PDF output (Image + Hidden Text)
Interactive template builder and tester
Easily select PDF or PDF/A output format
Native PDF viewer and auto-repair of problematic PDFs
Read data from PDF forms
Populate blank PDF forms with index data

Batch Convert Office Documents to PDF

If you have Microsoft Office or OpenOffice installed, you can use SimpleIndex to automatically convert MS Office documents to PDF files for archival. PDF files are better for archival than editable formats like Word and Excel. They can be annotated, encrypted, searched and viewed with free PDF readers.

There are many free applications that let you convert documents to PDF one at a time. SimpleIndex lets you convert thousands of files at once while it also extracts data from the text for indexing or data entry automation. This feature is ideal for migrating or archiving Office documents to SharePoint, document management systems and custom web applications.

Quickly Organize Any File on Your Computer

SimpleIndex lets you process any type of file on your computer. If an OLE-enabled viewer is installed, SimpleIndex will display the document on the screen. Other documents can be opened automatically in their default application when they are indexed. Quickly type index field data, which can then be used to reorganize the files into subfolders and structured filenames for browsing and searching on your network, or to upload them to your document/content management system or custom business application.

If the file has an accompanying text file (*.TXT) with the same name, the text in that file can be used for index field extraction, fully automating the process.

Viewing & Indexing MS Office Documents

SimpleIndex features full support for viewing and editing MS Office documents (Word, PowerPoint and Excel) on computers with or without MS Office installed. The full application interface is displayed within the SimpleIndex viewer, letting users view the full content of the documents, edit them with all the features of MS Office and save the changes. Modify privileges can be denied using Windows file security or by the SimpleIndex administration wizard to keep out unauthorized changes.

If MS Office is not installed, SimpleIndex can open and display these documents in the built-in viewer in read-only mode.


Automatic Data Capture, File Indexing, Microsoft Word Data Extraction, MS Office, Office PDF Document Indexing, Office PDF Text Processing, Office to PDF, offline OCR, on-prem OCR, on-site OCR, One-time payment OCR, Paperless Office, PDF, PDF Archive Scanning Software, PDF Barcode Recognition, PDF Data Extraction Software, PDF Forms, Self-hosted OCR, Subscription free OCR, Sunshine Software OCR, Text Processing, Unattended Processing
