Automatic archival of Microsoft Office documents to PDF via batch conversion, indexing and document management workflow.
If you have Microsoft Office or OpenOffice installed, you can use SimpleIndex to automatically convert MS Office documents to PDF files for archival. PDF files are better for archival than editable formats like Word and Excel. They can be annotated, encrypted, searched and viewed with free PDF readers.
There are many free applications that convert documents to PDF one at a time. SimpleIndex can convert thousands of files in a single batch while also extracting data from their text for indexing or to automate data entry. This makes it well suited to migrating or archiving Office documents to SharePoint, document management systems and custom web applications.
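To illustrate what batch conversion involves, here is a minimal Python sketch that is independent of SimpleIndex. It assumes LibreOffice (which grew out of OpenOffice.org) is installed with its `soffice` command on the PATH. It only builds one `soffice --headless --convert-to pdf` command per Office file. It does not reproduce SimpleIndex's own conversion or text extraction, and the extension list and function name are hypothetical.

```python
import shlex
from pathlib import Path

# Office extensions this sketch treats as convertible (an assumption, not SimpleIndex's own list)
OFFICE_EXTS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def build_convert_commands(paths, outdir):
    """Return one LibreOffice command line per Office file; nothing is executed here."""
    commands = []
    for p in map(Path, paths):
        if p.suffix.lower() in OFFICE_EXTS:
            commands.append(["soffice", "--headless", "--convert-to", "pdf",
                             "--outdir", str(outdir), str(p)])
    return commands


if __name__ == "__main__":
    files = ["reports/q1.docx", "notes.txt", "budget.XLSX"]
    for cmd in build_convert_commands(files, "archive"):
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to actually convert
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to review a large batch before running it.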
How do you select what types of files to process?
Please refer to the Wiki Documentation for the complete Batch Processing Stages reference.
You can tell SimpleIndex what types of files it should process and which file types to ignore.
To do this, click “Job Options”. On the “Batch” tab you will find a field labeled “Input file types or mask”, which lists the file types SimpleIndex will accept as input. The default types are:
TIF,PDF,JPG,GIF,BMP,DOC,XLS,PPT,DOCX,XLSX,PPTX,VSD,DWG,AVI,MP3
To process all files, enter * in this field.
SimpleIndex will ignore any file whose extension does not appear on the list.
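The extension filter behaves roughly like the following Python sketch. It assumes the comparison is case-insensitive and that `*` on its own means "accept everything"; the function name `accepts` is hypothetical and not part of SimpleIndex.

```python
from pathlib import Path

DEFAULT_TYPES = "TIF,PDF,JPG,GIF,BMP,DOC,XLS,PPT,DOCX,XLSX,PPTX,VSD,DWG,AVI,MP3"


def accepts(filename, type_list=DEFAULT_TYPES):
    """Return True if the file's extension is on the comma-separated list, or the list is '*'."""
    spec = type_list.strip()
    if spec == "*":
        return True
    # Assumed case-insensitive, so "Invoice.docx" matches "DOCX"
    allowed = {t.strip().upper() for t in spec.split(",") if t.strip()}
    ext = Path(filename).suffix.lstrip(".").upper()
    return ext in allowed


if __name__ == "__main__":
    for name in ["Invoice.docx", "scan.tif", "readme.txt"]:
        print(name, accepts(name))
```

Files with no extension at all are rejected here unless the list is `*`.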
In SimpleIndex 6 or above, you can also enter file masks to filter input files. Some examples are:
abc*.pdf (PDF files whose names start with “abc”)
ab??ef.* (files of any type whose names consist of “ab”, any two characters, and “ef”)
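Python's standard `fnmatch` module uses the same `*` and `?` wildcards, so it is a convenient way to try out a mask before entering it. This is only an approximation: SimpleIndex's own matching may differ in edge cases, and `fnmatch` also understands `[...]` character sets. Upper-casing both sides is an assumption meant to mimic case-insensitive Windows file names; `matches_mask` is a hypothetical helper.

```python
import fnmatch


def matches_mask(filename, mask):
    """Case-insensitive wildcard match: * = any run of characters, ? = exactly one character."""
    return fnmatch.fnmatchcase(filename.upper(), mask.upper())


if __name__ == "__main__":
    for name in ["abc_invoice.pdf", "xabc.pdf", "ab12ef.docx", "ab1ef.docx"]:
        print(name, matches_mask(name, "abc*.pdf"), matches_mask(name, "ab??ef.*"))
```

With these inputs, only `abc_invoice.pdf` matches the first mask and only `ab12ef.docx` matches the second.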
It is possible to have some file types open automatically in their default application. This can be done by inserting a pipe “|” into the list. Any file types after the pipe will be opened in their default application. For example:
TIF,PDF,JPG|WAV,MP3,WMV,AVI
will cause SimpleIndex to display TIF, PDF and JPG files itself and open WAV, MP3, WMV and AVI sound and video files in the default media player.
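One way to picture how the pipe splits the list is the following Python sketch. The function names and the three outcomes are hypothetical; they model the behavior described above, not SimpleIndex's internals.

```python
from pathlib import Path


def parse_type_list(spec):
    """Split 'TIF,PDF|WAV,MP3' into (types to display, types to open in their default app)."""
    display_part, _, external_part = spec.partition("|")

    def to_set(part):
        return {t.strip().upper() for t in part.split(",") if t.strip()}

    return to_set(display_part), to_set(external_part)


def route(filename, spec):
    """Decide what happens to a file under the given type list."""
    display, external = parse_type_list(spec)
    ext = Path(filename).suffix.lstrip(".").upper()
    if ext in display:
        return "display"
    if ext in external:
        return "open in default application"
    return "ignore"


if __name__ == "__main__":
    spec = "TIF,PDF,JPG|WAV,MP3,WMV,AVI"
    for name in ["scan.jpg", "clip.wmv", "notes.txt"]:
        print(name, "->", route(name, spec))
```

To actually hand a file to its associated program on Windows, Python's `os.startfile(path)` launches it using the file association.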
Organize Office Documents with Text Parsing
This video shows the Sort My Documents sample job included with the SimpleIndex trial download. It shows how you can organize office documents automatically by parsing the file’s text for relevant metadata and keywords. You can then use those keywords to tag documents with metadata and create standardized folders and filenames.
First, we automatically sort Word documents, Excel spreadsheets and PowerPoint presentations using SimpleIndex's template and dictionary matching algorithms, which match patterns and keywords in the parsed text.
Then the files are organized into folders and filenames using the Sales Rep, Customer, Document Type and Date values extracted from the text.
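The sample job's matching rules are configured in SimpleIndex itself. As a rough illustration of the same idea, the Python sketch below matches hypothetical dictionaries of sales reps, customers and document-type keywords against the parsed text, pulls out a date with a regular expression, and builds a folder path and filename from the results. The lists, the M/D/YYYY date format and the Rep/Customer/"Type Date" layout are all assumptions.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical lookup dictionaries standing in for the sample job's lists
SALES_REPS = ["Jane Smith", "Raj Patel"]
CUSTOMERS = ["Acme Corp", "Globex"]
DOC_TYPES = {"invoice": "Invoice", "purchase order": "Purchase Order", "quote": "Quote"}

# Assumed date pattern: US-style M/D/YYYY
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def dictionary_match(text, entries):
    """Return the first dictionary entry found in the text (case-insensitive), else 'Unknown'."""
    lowered = text.lower()
    for entry in entries:
        if entry.lower() in lowered:
            return entry
    return "Unknown"


def parse_date(text):
    """Find the first M/D/YYYY date and normalize it to YYYY-MM-DD."""
    m = DATE_RE.search(text)
    if not m:
        return "undated"
    month, day, year = (int(g) for g in m.groups())
    try:
        return datetime(year, month, day).strftime("%Y-%m-%d")
    except ValueError:  # e.g. 13/45/2024 is not a real date
        return "undated"


def extract_metadata(text):
    """Collect rep, customer, document type and date from the parsed text."""
    lowered = text.lower()
    doc_type = next((label for key, label in DOC_TYPES.items() if key in lowered), "Unknown")
    return {
        "rep": dictionary_match(text, SALES_REPS),
        "customer": dictionary_match(text, CUSTOMERS),
        "type": doc_type,
        "date": parse_date(text),
    }


def target_path(meta, extension):
    """Build Rep/Customer/'Type Date.ext' from the extracted values."""
    filename = f"{meta['type']} {meta['date']}{extension}"
    return PurePosixPath(meta["rep"], meta["customer"], filename)


if __name__ == "__main__":
    sample = "INVOICE\nCustomer: Acme Corp\nSales Rep: Jane Smith\nDate: 3/15/2024"
    meta = extract_metadata(sample)
    print(meta)
    print(target_path(meta, ".docx"))
```

Falling back to "Unknown" and "undated" keeps unmatched documents in predictable folders where they are easy to review.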
Organize Office Documents for Cloud Storage
You can also upload the organized files to SharePoint or cloud storage platforms without the inconsistent folders and filenames that inevitably result when users create their own.
Organize Office Documents for Document Management
In the video, we use SimpleSearch to search and view the sorted files, but you can just as easily use any third-party document management system or custom database for keyword or full-text searching.
You can use the SimpleView embedded viewer to view Office documents, PDF files and images in a common interface. In the video we use the full version of Word, Excel, and PowerPoint to edit Office documents right from the search screen.
Find Out More
- Download or get an Online Demo
- MS Office Text Processing Features in SimpleIndex
- MS Office Features and Settings Wiki Pages
- OCR Features and Settings Wiki Pages
- OCR Software Guide on SimpleOCR