Image and PDF files and metadata export to CSV, XML, database or document management system.
When writing a log file (CSV, XML, TXT, etc.), some index fields may be required in the Job Configuration but not wanted in the Index Log. Those fields can be excluded from the Index Log by adding a “~” character to the end of the Index Field Name.
To do this go into the Job Options/Job Settings Wizard, go to the Index tab/step, find the Index field that you want to exclude from the Index Log and add this to the end of the field name: ~
EX. The original Index Name is “OCR Text” and that field should be excluded from the Index Log, so it doesn’t appear. This field should be changed to “OCR Text~”.
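The effect of the “~” suffix can be pictured with a short Python sketch. This only illustrates the rule described above and is not SimpleIndex's own code; the function name index_log_row and the sample fields “Invoice Number” and “Vendor” are hypothetical.

```python
import csv
import io


def index_log_row(fields):
    """Return only the fields that would be written to the Index Log.

    `fields` maps index field names to values. Names ending in "~" are
    treated as excluded, mirroring the rule described in the text.
    """
    return {name: value for name, value in fields.items() if not name.endswith("~")}


# Hypothetical job with three index fields; "OCR Text~" is marked for exclusion.
job_fields = {"Invoice Number": "10042", "Vendor": "Acme", "OCR Text~": "full page text"}

kept = index_log_row(job_fields)

# Write the remaining fields as a one-row CSV log to show the visible result.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(kept))
writer.writeheader()
writer.writerow(kept)
print(buffer.getvalue())
```

The field is still part of the job, so it can be used during indexing; it is only left out of the log output.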
SimpleIndex can import or export files that need to be or have been processed from an FTP site, but it requires that the FTP site be configured as a Windows Drive Letter.
Windows has no built-in way to map an FTP site as a Windows Drive Letter, so third-party software is required. We recommend SSHFS-Win Manager, but any tool that accomplishes this will work.
If you are having issues with the files not exporting properly or you have missing images in the export folder that should have been saved, then a registry key needs to be added to correct this. This registry key changes the export process from the faster process that SimpleIndex uses by default, to a slower export process that will avoid these issues.
1. Search for “regedit” on your computer.
2. Navigate to this folder in the Registry Editor window: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc
3. In the right pane of the Registry Editor window Right Click and select New>String Value
4. Set the name of the new value to this: EnableAtalaExport
5. Double click on the “EnableAtalaExport” value, set the Value data to “0” (Zero) and click OK.
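If the same change has to be made on several computers, the steps above could be captured in a .reg file instead of being repeated by hand. The Python sketch below only builds the text of such a file; the helper name build_reg_file is hypothetical, and the key path is written without the leading “Computer\” because .reg files do not use it. Importing the resulting file normally requires administrator rights.

```python
def build_reg_file(key_path, value_name, value_data):
    """Build the text of a .reg file that sets one string (REG_SZ) value."""
    # Backslashes and double quotes inside string data must be escaped in .reg syntax.
    escaped = value_data.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key_path}]\r\n"
        f'"{value_name}"="{escaped}"\r\n'
    )


reg_text = build_reg_file(
    r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc",
    "EnableAtalaExport",
    "0",
)
print(reg_text)
```

Save the printed text to a file with a .reg extension and double click it to merge it into the registry.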
If you have the InstaDocs add-on for SAGE that allows you to search for and retrieve documents in the SAGE system, then it is possible to have SimpleIndex directly output to the InstaDocs folder system. If this is done when SimpleIndex outputs the images they will show up as searchable files immediately from InstaDocs.
To do this, first set up the Output folder in SimpleIndex to save to the same folder that InstaDocs uses for image storage. Then, for each sub-folder level in the main InstaDocs image storage folder, make an index field in SimpleIndex and make sure that the SimpleIndex fields are in the same order as the folder levels. Finally, check the Folder check box for each of those index fields in SimpleIndex.
If you have many different users running SimpleIndex on different computers saving to the same Output folder, the STOPFILE from one can cause another user to not be able to export to the folder while the original user is outputting to the folder. In this case the STOPFILE can be disabled, so this no longer happens.
This will not work if anyone needs to append to an existing file in the output folder, so make sure that won’t happen.
- Close SimpleIndex entirely
- Open the Windows Registry by going to the Windows Search and searching for “RegEdit”
- Go to this location in the Registry Folder Tree: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc
- In the right section of the Registry window Right Click in the white space and select New>String Value
- Name the new key “StopFile”
If you want to keep all the pages in the same order that they were imported, even though they belong to different bookmarks, do the following.
1. Open the configuration in Notepad.
2. Search for <BOOKMARK_PAGE_ORDER>
3. Change this line from “false” to “true”: <BOOKMARK_PAGE_ORDER>true</BOOKMARK_PAGE_ORDER>
4. Save and close.
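The same edit can be scripted when many configuration files need it. The following Python sketch works on the configuration as plain text and assumes only that the setting appears as <BOOKMARK_PAGE_ORDER>…</BOOKMARK_PAGE_ORDER>; the helper name set_tag_value and the <JOB> wrapper in the sample are hypothetical.

```python
import re


def set_tag_value(config_text, tag, value):
    """Replace the text between <tag> and </tag>; raise if the tag is absent."""
    pattern = re.compile(rf"(<{re.escape(tag)}>).*?(</{re.escape(tag)}>)", re.DOTALL)
    # A function replacement avoids any special handling of backslashes in `value`.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value + m.group(2), config_text, count=1
    )
    if count == 0:
        raise ValueError(f"<{tag}> not found in configuration")
    return new_text


# Stand-in for the contents of a job configuration file.
sample = "<JOB>\n<BOOKMARK_PAGE_ORDER>false</BOOKMARK_PAGE_ORDER>\n</JOB>\n"
updated = set_tag_value(sample, "BOOKMARK_PAGE_ORDER", "true")
print(updated)
```

Keep a backup copy of the configuration file before overwriting it with the updated text.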
If you want to keep pages in bookmarks separate instead of combining them into a single bookmark when the same bookmark value is found in several interspersed images in the batch do the following:
1. Open the Job Configuration file in Notepad.
2. Search for this value: <BOOKMARK_PDF1>
3. Enter this directly above the line that has <BOOKMARK_PDF1> if it’s not already there: <BOOKMARK_UNIQUE_LEVELS>-1</BOOKMARK_UNIQUE_LEVELS>
4. -1 is the default value and means that no pages are combined into one bookmark unless they fall in order. 0 means that the first bookmark level is combined into one bookmark value and the rest are not. 1 means that the first and second bookmark levels are combined and the rest are not, and so on.
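The difference between keeping interspersed pages separate and combining them can be shown for a single bookmark level with a small Python sketch. This is an illustration of the behavior described above, not SimpleIndex's implementation; the function names and the sample batch are hypothetical, and multiple bookmark levels are not modeled.

```python
from itertools import groupby


def separate_runs(pages):
    """Group only consecutive pages that share a bookmark value."""
    return [
        (value, [page for page, _ in run])
        for value, run in groupby(pages, key=lambda item: item[1])
    ]


def combined(pages):
    """Merge every page with the same bookmark value into one bookmark."""
    merged = {}
    for page, value in pages:
        merged.setdefault(value, []).append(page)
    return list(merged.items())


# (page number, bookmark value); bookmark "A" appears again after "B".
batch = [(1, "A"), (2, "A"), (3, "B"), (4, "A")]

print(separate_runs(batch))  # [('A', [1, 2]), ('B', [3]), ('A', [4])]
print(combined(batch))       # [('A', [1, 2, 4]), ('B', [3])]
```

The first result corresponds to pages being combined only when they fall in order; the second corresponds to a level whose matching values are merged into one bookmark.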
If you want to move the MISSING files from the Output folder to another folder and create multiple MISSING files then use this script for the .bat file:
ren "C:\Users\dgraves.META\Desktop\Folder1\Missing.pdf" "Missing-%date:~10,4%%date:~7,2%%date:~4,2%_%time:~0,2%%time:~3,2%.PDF"
Move "C:\Users\dgraves.META\Desktop\Folder1\Missing*.pdf" "C:\Users\dgraves.META\Desktop\Folder2"
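This will rename the file to MISSING-DATE_TIME and then move it to another folder. The substring offsets assume the US English %date% format (for example, Tue 01/02/2024); adjust them if your regional settings differ.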
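If Python is available, the same rename and move can be done without depending on the regional date format. This is a minimal sketch, not a replacement supplied by SimpleIndex; the function name archive_missing is hypothetical, and unlike the .bat file it always writes the stamp as year, month, day with a zero-padded hour.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_missing(source_dir, target_dir, now=None):
    """Rename Missing.pdf with a timestamp and move it to target_dir.

    Returns the new path, or None when there is no Missing.pdf to move.
    """
    source = Path(source_dir) / "Missing.pdf"
    if not source.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M")
    target = Path(target_dir) / f"Missing-{stamp}.pdf"
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target


# Demonstration in throwaway folders standing in for Folder1 and Folder2.
with tempfile.TemporaryDirectory() as root:
    folder1 = Path(root) / "Folder1"
    folder2 = Path(root) / "Folder2"
    folder1.mkdir()
    (folder1 / "Missing.pdf").write_bytes(b"%PDF-1.4\n")
    moved = archive_missing(folder1, folder2, now=datetime(2024, 1, 2, 9, 5))
    print(moved.name)  # Missing-20240102_0905.pdf
```

In real use, call archive_missing with the Output folder and the destination folder and leave out the now argument so the current time is used.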
Windows Search works great with SimpleIndex because all index data can be saved to the folder and file names as well as the file properties, and OCR text can be saved to hidden layers in PDF files. Windows Search will read all of these elements when building its index and will return any matching files when you search.
Using Windows Search on a file server allows for instantaneous searching across terabytes of documents and text for all of the users on your network.
IFilters allow Windows Search to search within file contents.
Here are three popular PDF IFilters that will enable text searching for PDF files:
- Foxit PDF IFilter (commercial)
- TET PDF IFilter (free/commercial)
- Adobe PDF IFilter (32-bit / 64-bit) (free)
If you have issues with PDF text searching in Windows 10, this article has detailed instructions for resolving PDF IFilter issues:
To enable the Media Wizard, you must first create a job configuration that exports index information to an Access database. Once you have scanned all the documents for the CD or DVD and attached them to the database, create a second job that uses “Retrieve and View Records” to search and view these files.
The Media Wizard will be enabled in the Send menu whenever you have this configuration file open. The sample configurations included with SimpleIndex demonstrate scanning and searching with an Access database. Microsoft Access is not required to create the database.
The Media Wizard will copy the Access database and all of the files in your Output folder to a temporary folder, along with the SimpleSearch configuration and Autorun files needed to search the files from a CD or DVD. Simply burn all the files in this folder to create the searchable disc.
There are a variety of ways to connect to your database. Detailed instructions are provided in the Manual (check the Help menu). Here is a brief overview of the steps involved:
- Create a job configuration to scan and index files
- On the database tab, set the “Database Mode” to “Insert New Records”
- To use ODBC, enter the data source name or file in Data Source
- To connect directly, select your database type under “Select a Data Source” and click Start. A series of dialogs will prompt you for database connection information.
- Select the destination Table or View and click Reload
- For each index field, select the corresponding database field that will receive that field value
- The “Output File Field” will receive the path to the exported file
Once you have created records in your database in “Insert” mode, you can change to “Retrieve and View Records” and use SimpleIndex or SimpleSearch to search and view the files.
SimpleIndex creates output files with upper case file extensions, but we use a UNIX-based fileserver which requires lower case file extensions. How can I change the output file extension from upper case to lower case (e.g. from .PDF to .pdf)?
In SimpleIndex 6, file extensions were changed to default to lower case so this should no longer be an issue. If you want to default back to upper case file extensions, you must edit the registry.
Go to HKEY_LOCAL_MACHINE\Software\SimpleIndex\Misc
Create a String value called “UpperCaseFileExtension” and set it to 1 for upper case or 0 for lower case.
This registry setting will also work to change the default behavior of version 5.
Yes. Image files can be inserted into binary fields in Access, SQL Server, Oracle, MySQL and other databases.
Check the “Store files as binary objects” option on the Database tab and the “Output File Field” setting can be mapped to a binary field.
If using PDF, MS Office or other non-image files, use the File Type Field to store the file extension of the stored file.
SimpleSearch mode will let you view files stored using this method as well.
When I use an Autonumber with single-page files, only the page number is shown in the filename and not the Autonumber.
This happens when the autonumber is in the same format as file page numbers. By default, page numbers have 4 digits–0001, 0002, etc. If you need to use an autonumber with 4 digits, you should set the FILE_NUMBER_LENGTH setting in the INI file to 5 (accessed from the Advanced tab).
The input filename can be specified automatically by configuring a field of type “Filename”. The input file path may also be parsed by the SimpleIndex dictionary and template matching algorithms to extract data fields from the folder and file names.
Match & Attach mode lets you batch update multiple records in a database using the index data from your SimpleIndex job. For example, if you have a large backfile of documents that you want to scan and link to records in an existing database, you can use Match & Attach to find the corresponding record and set the Image Path field to the newly scanned file.
This allows documents to be indexed with a variety of information and then have it find a particular record based on up to three different key indexes in a data source. It can then fill in additional data columns with indexed information along with the full text information, page count, batch ID and image path.
Match & Attach uses the key field set under “Autofill Settings…” in the Indexing & File Naming step of the Job Settings Wizard in the File menu. It then fills the data into any blank columns for the matching record in the database and also updates any fields that are different.
This is all done through the electronic imprinting features, which put the desired information electronically on the output images that are saved in your output folder. To set this up in SimpleIndex, go to the File menu, select Job Settings Wizard and then go to the Imprinting step.
To implement Bates stamping or page numbering, click the ‘Enable Imprinting’ check box and also the ‘Imprint page numbers’ check box. This is the most basic method, but there are also features which allow you to control what this information is, what it looks like and where on the page it will go.
The ‘Font Size’ field allows you to choose what point font you would like the imprinted value to be.
The ‘Page # Length’ field lets you determine how many digits the page number or Bates number will have. It adds leading zeros to the page number based on the number you enter into this field. This is used to keep images with page numbers in the proper order when saving them. EX. If you put 4 into the ‘Page # Length’ field, the number will start at 0001 and count up from there, always keeping the page number 4 digits long.
If you would like leading characters on the front of the page number you can add these to the ‘Imprint Text or Image’ field and they will appear in front of the page number. These pages will appear as (leading characters) – (page number). If you would like them to appear directly next to each other you would remove everything from the ‘Separator’ field, because this field is what is inserted between the imprint values. EX. You want the page numbers to read PMB#####. You would put PMB in the ‘Imprint Text’ field, remove everything from the ‘Imprint Separator’ field and put 5 in the ‘Page # Length’ field.
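How the imprint text, separator and page number length combine can be sketched in a few lines of Python. This only mirrors the description above and is not SimpleIndex's code; the function name imprint_label and its parameter names are hypothetical.

```python
def imprint_label(page_number, page_length, text="", separator=""):
    """Compose the imprinted string: leading text, separator, zero-padded page number."""
    number = str(page_number).zfill(page_length)  # pad with leading zeros
    if not text:
        return number
    return f"{text}{separator}{number}"


print(imprint_label(1, 4))                              # 0001
print(imprint_label(17, 5, text="PMB", separator=""))   # PMB00017
print(imprint_label(17, 5, text="PMB", separator="-"))  # PMB-00017
```

The second call corresponds to the PMB##### example: PMB as the imprint text, an empty separator and a page number length of 5.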
You will then decide where you want this information to appear on the image. If you want the page number at the top of the image, do not check the box marked ‘Measure X,Y from bottom-right’; if you want the page number at the bottom of the image, check this box.
Next, set the X and Y coordinates to place the imprinted information in the part of the image where you want it. The X coordinate measures from left to right (right to left when ‘Measure X,Y from bottom-right’ is checked) and the Y coordinate measures from top to bottom (bottom to top when ‘Measure X,Y from bottom-right’ is checked).
The X and Y coordinates are measured in pixels. The number of pixels per page depends on the resolution, or dpi (dots per inch), that the image was scanned at: at 200 dpi 1 inch = 200 pixels, at 300 dpi 1 inch = 300 pixels, and so on.
EX. You have a 300 dpi, 8.5×11″ image and want to imprint page numbers on the bottom left of the image, an inch and a half from the left and bottom of the page. You would check ‘Measure X,Y from bottom-right’, enter 1950 in the ‘X coordinate’ field (6.5″ from right x 300 dpi) and 450 in the ‘Y coordinate’ field (1.5″ from bottom x 300 dpi).
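The conversion from inches to the pixel values entered in the coordinate fields is a single multiplication by the scan resolution. A minimal Python sketch, using the numbers from the example above (the function name inches_to_pixels is hypothetical):

```python
def inches_to_pixels(inches, dpi):
    """Convert a distance in inches to pixels at the given scan resolution."""
    return round(inches * dpi)


x = inches_to_pixels(6.5, 300)  # distance from the right edge
y = inches_to_pixels(1.5, 300)  # distance from the bottom edge
print(x, y)  # 1950 450
```

If the same job is later run at a different resolution, the coordinates need to be recalculated; for instance, 1.5 inches at 200 dpi is 300 pixels.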
Yes, it can. You can configure this setting in the Job Settings Wizard by going to the OCR step and checking “Enable full-page OCR”. There are many settings in the OCR step that you can use to customize the output and recognition of images.
SimpleIndex has two different OCR engines (Standard and Professional) that can be used to produce PDF Image + Text files or Searchable PDFs.