MS Office & PDF Text Parsing
The template and dictionary matching capabilities of SimpleIndex's OCR function can be used to extract index information from the text of existing MS Office and PDF files, or any file with an accompanying TXT file. SimpleIndex® searches the document for matches on unique patterns and value lists, then indexes the document with the matching data. Zone coordinates can be set to limit the search area to predefined regions on standard forms. The result is a fully automated indexing and renaming process for all your electronic documents!
Because it reads the existing text instead of recognizing characters from an image, SimpleIndex can index and rename hundreds of files each minute without introducing OCR recognition errors. These files can then be quickly searched with SimpleIndex Retrieval, SharePoint and Google search engines, or uploaded into your company's document/content management system or custom business applications.
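The pattern and value-list matching described above can be illustrated with a short Python sketch. This is not SimpleIndex's own implementation or configuration format; the invoice pattern, the customer list and the function name are hypothetical, chosen only to show the general idea of pulling index fields out of text that already exists. Zone restriction is not modeled here.

```python
import re

# Hypothetical template: one regex pattern and one value list (dictionary).
INVOICE_PATTERN = re.compile(r"\bINV-(\d{6})\b")
CUSTOMER_NAMES = ["Acme Corp", "Globex", "Initech"]


def extract_index_fields(text):
    """Return index fields found in text; a field with no match stays None."""
    fields = {"invoice": None, "customer": None}

    # Pattern matching: capture the six digits that follow "INV-".
    match = INVOICE_PATTERN.search(text)
    if match:
        fields["invoice"] = match.group(1)

    # Dictionary matching: the first listed value found in the text wins.
    lowered = text.lower()
    for name in CUSTOMER_NAMES:
        if name.lower() in lowered:
            fields["customer"] = name
            break

    return fields


sample = "Invoice INV-004217 issued to GLOBEX on 2024-03-01"
print(extract_index_fields(sample))
# {'invoice': '004217', 'customer': 'Globex'}
```

The dictionary lookup returns the spelling from the value list rather than the spelling found in the document, which keeps index values consistent across files.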
Enhanced Text Parsing & PDF Support
MS Office and PDF text parsing features are now included in the Basic version of SimpleIndex, making it much more affordable to enable automatic document sorting on the desktop. Additional Office and PDF features include:
- Convert MS Office, HTML, XML and image files to PDF before processing
- Read and write password-protected PDF files
- Searchable PDF output (Image + Hidden Text)
- Interactive template builder and tester
- Easily select PDF or PDF/A output format
- Native PDF viewer and auto-repair of problematic PDFs
- Read data from PDF forms
- Populate blank PDF forms with index data
Batch Convert Office Documents to PDF
If you have Microsoft Office or OpenOffice installed, you can use SimpleIndex to automatically convert MS Office documents to PDF files for archival. PDF files are better for archival than editable formats like Word and Excel. They can be annotated, encrypted, searched and viewed with free PDF readers.
There are many free applications that let you convert documents to PDF one at a time. SimpleIndex lets you convert thousands of files at once while it also extracts data from the text for indexing or data entry automation. This feature is ideal for migrating or archiving Office documents to SharePoint, document management systems and custom web applications.
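For readers who want a feel for what batch conversion involves, here is a minimal Python sketch that drives LibreOffice's headless command line. It is a substitute technique shown for illustration, not how SimpleIndex performs the conversion. It assumes a LibreOffice `soffice` executable is on the PATH; the function names and the list of file extensions are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed set of Office extensions to convert; adjust as needed.
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def build_convert_command(source, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def find_office_files(folder):
    """Return Office documents under folder, sorted for a stable order."""
    return sorted(p for p in Path(folder).rglob("*")
                  if p.is_file() and p.suffix.lower() in OFFICE_SUFFIXES)


def convert_folder(folder, out_dir):
    """Convert every Office document under folder; return the failed paths."""
    failed = []
    for source in find_office_files(folder):
        result = subprocess.run(build_convert_command(source, out_dir))
        # A non-zero exit code is only a coarse failure signal; checking
        # that the expected PDF exists would be a stricter test.
        if result.returncode != 0:
            failed.append(source)
    return failed
```

Calling `convert_folder("C:/archive/in", "C:/archive/pdf")` would walk the input folder and write one PDF per document into the output folder; both paths are placeholders.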
Quickly Organize Any File on Your Computer
SimpleIndex lets you process any type of file on your computer. If an OLE-enabled viewer is installed, SimpleIndex will display the document on the screen. Other documents can be opened automatically in their default application when they are indexed. Quickly type index field data, then use it to reorganize the files into subfolders and structured filenames for browsing and searching on your network, or to upload them to your document/content management system or custom business application.
If the file has an accompanying text file (*.TXT) with the same name, the text in that file can be used for index field extraction, fully automating the process.
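As a rough illustration of the same-name text file idea, the Python sketch below looks for a `.txt` file next to a document, extracts one value from it, and files the document under a subfolder with a structured filename. The account-number pattern, the folder layout and the function name are assumptions made for the example, not SimpleIndex settings.

```python
import re
import shutil
from pathlib import Path

# Hypothetical pattern for illustration: a six-digit account number.
ACCOUNT_PATTERN = re.compile(r"Account\s*#?\s*(\d{6})")


def file_by_sidecar_text(doc_path, dest_root):
    """Move doc_path into dest_root/<account>/ using its same-name .txt file.

    Returns the new path, or None if the text file is missing or has no match.
    """
    doc_path = Path(doc_path)
    # Same base name with a .txt extension. On case-sensitive file systems
    # this will not find a file named with an upper-case .TXT extension.
    sidecar = doc_path.with_suffix(".txt")
    if not sidecar.is_file():
        return None

    text = sidecar.read_text(encoding="utf-8", errors="replace")
    match = ACCOUNT_PATTERN.search(text)
    if match is None:
        return None

    account = match.group(1)
    target_dir = Path(dest_root) / account
    target_dir.mkdir(parents=True, exist_ok=True)
    # Structured filename: <account>_<original name>
    target = target_dir / f"{account}_{doc_path.name}"
    shutil.move(str(doc_path), str(target))
    return target
```

Documents that return `None` are left where they are, so they can be queued for manual indexing instead of being misfiled.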
Viewing & Indexing MS Office Documents
SimpleIndex supports viewing MS Office documents (Word, PowerPoint and Excel) on computers with or without MS Office installed. When MS Office is installed, the full application interface is displayed within the SimpleIndex viewer, letting users view the full content of the documents, edit them with all the features of MS Office and save the changes. Modify privileges can be denied using Windows file security or the SimpleIndex administration wizard to prevent unauthorized changes.
If MS Office is not installed, SimpleIndex opens and displays these documents in the built-in viewer in read-only mode.
KB Articles for MS Office & PDF Text Parsing
- Change the Dictionary Separator Value
- Regular Expression (RegEx) - Syntax or Type
- Check and Repair All PDF Files
- Keep Pages in Original Order when Bookmarking
- Do Not Combine Pages to 1 Bookmark
- Can I split a PDF based on bookmark values?
- Is it possible to search for and retrieve documents with Windows desktop search?
- Can SimpleIndex read bar codes from existing PDF files?
- Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
- How do you configure OCR to read index information from MS Office or PDF documents?