Image and PDF files and metadata export to CSV, XML, database or document management system.
When writing a log file (CSV, XML, TXT, etc.), some index fields are often required in the Job Configuration but not wanted in the Index Log. Those fields can be excluded from the Index Log by adding a “~” character to the end of the Index Field Name.
To do this go into the Job Options/Job Settings Wizard, go to the Index tab/step, find the Index field that you want to exclude from the Index Log and add this to the end of the field name: ~
EX. The original Index Name is “OCR Text” and that field should be excluded from the Index Log, so it doesn’t appear. This field should be changed to “OCR Text~”.
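As a rough illustration of the naming rule (not SimpleIndex's actual code), the sketch below drops every field whose name ends in “~” before a CSV log row is written. The function names and the sample fields are hypothetical.

```python
import csv
import io


def logged_fields(field_names):
    """Return only the field names that should appear in the index log."""
    return [name for name in field_names if not name.endswith("~")]


def write_log_row(record):
    """Build one CSV row, skipping fields marked with a trailing '~'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([record[name] for name in logged_fields(record)])
    return buffer.getvalue()


# Hypothetical job with three index fields; "OCR Text~" stays out of the log.
record = {"Invoice Number": "1001", "Vendor": "Acme", "OCR Text~": "full page text"}
row = write_log_row(record)
```

The field is still part of the job and can still be filled in; only the log output leaves it out.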
SimpleIndex can import files from, or export files to, an FTP site. To configure this you need to connect the FTP site to Windows through Windows File Explorer.
The instructions below are based on this FTP site information:
Site – testftp.simpleindex.com
User Name – SIftp
Password – 1234
- Open Windows File Explorer in Windows.
- Enter this string based on the test FTP information above: ftp://SIftp@testftp.simpleindex.com
- Once you enter this address into Windows File Explorer, a window will open asking for the Password, which based on our test FTP information would be this: 1234
- Save the password
- Set the Input or Output or any other path in SimpleIndex to the following: ftp://SIftp@testftp.simpleindex.com
- If the folder required is a sub folder on the ftp site, for example ‘Output’ this would be the path: ftp://SIftp@testftp.simpleindex.com/Output/
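The path format used in the steps above can be sketched as a small helper. This is only an illustration of how the string is put together; the function name is hypothetical and nothing here connects to the FTP site.

```python
def ftp_path(user, host, subfolder=""):
    """Build an ftp:// path in the form shown in the steps above."""
    base = f"ftp://{user}@{host}"
    if subfolder:
        # Sub folders are appended with a trailing slash, e.g. .../Output/
        return f"{base}/{subfolder.strip('/')}/"
    return base


root_path = ftp_path("SIftp", "testftp.simpleindex.com")
output_path_on_ftp = ftp_path("SIftp", "testftp.simpleindex.com", "Output")
```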
If you are having issues with the files not exporting properly or you have missing images in the export folder that should have been saved, then a registry key needs to be added to correct this. This registry key changes the export process from the faster process that SimpleIndex uses by default, to a slower export process that will avoid these issues.
1. Search for “regedit” on your computer.
2. Navigate to this folder in the Registry Editor window: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc
3. In the right pane of the Registry Editor window Right Click and select New>String Value
4. Set the name of the new value to this: EnableAtalaExport
5. Double click on the “EnableAtalaExport” value, set the Value data to “0” (Zero) and click OK.
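If the same setting has to be applied on several computers, the steps above could also be captured as a .reg file. The sketch below only builds the text of such a file using the standard .reg syntax; the function name is hypothetical, and saving the text and importing it (which normally requires administrator rights) is left to you.

```python
MISC_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc"


def reg_file_text(key, name, value):
    """Return .reg file text that sets one string value under a key."""
    # Backslashes and quotes inside string data must be escaped in .reg files.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"{name}"="{escaped}"',
        "",
    ]
    return "\r\n".join(lines)


reg_text = reg_file_text(MISC_KEY, "EnableAtalaExport", "0")
```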
If you have many different users running SimpleIndex on different computers saving to the same Output folder, the STOPFILE from one can cause another user to not be able to export to the folder while the original user is outputting to the folder. In this case the STOPFILE can be disabled, so this no longer happens.
This will not work if you have anyone that needs to add files to an existing file in the output folder, so make sure that won’t happen.
- Close SimpleIndex entirely
- Open the Windows Registry by going to the Windows Search and searching for “RegEdit”
- Go to this location in the Registry Folder Tree: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc
- In the right section of the Registry window Right Click in the white space and select New>String Value
- Name the new key “StopFile”
If you want to keep all the pages in the same order that they were imported, even though they belong to different bookmarks, do the following:
1. Open the configuration in Notepad.
2. Search for <BOOKMARK_PAGE_ORDER>
3. Change this line from “false” to “true”: <BOOKMARK_PAGE_ORDER>true</BOOKMARK_PAGE_ORDER>
4. Save and close.
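The Notepad edit above can also be scripted. This is a minimal sketch that assumes the configuration is plain text with the tag on a single line, as shown in step 3; the helper name and the sample text are hypothetical, and you should keep a backup of the real configuration before changing it.

```python
import re


def set_tag(config_text, tag, value):
    """Replace the content of the first <tag>...</tag> pair in the text."""
    pattern = re.compile(rf"(<{re.escape(tag)}>)(.*?)(</{re.escape(tag)}>)", re.DOTALL)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value + m.group(3), config_text, count=1
    )
    if count == 0:
        raise ValueError(f"<{tag}> was not found in the configuration")
    return new_text


# Hypothetical fragment standing in for a real job configuration file.
sample = "<JOB>\n<BOOKMARK_PAGE_ORDER>false</BOOKMARK_PAGE_ORDER>\n</JOB>\n"
updated = set_tag(sample, "BOOKMARK_PAGE_ORDER", "true")
```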
If you want to keep pages in bookmarks separate instead of combining them into a single bookmark when the same bookmark value is found in several interspersed images in the batch do the following:
1. Open the Job Configuration file in Notepad.
2. Search for this value: <BOOKMARK_PDF1>
3. Enter this directly above the line that has <BOOKMARK_PDF1> if it's not already there: <BOOKMARK_UNIQUE_LEVELS>-1</BOOKMARK_UNIQUE_LEVELS>
4. Choose the value. -1 is the default value and means that no pages should be combined into one bookmark unless they fall in order. 0 means that the first bookmark level should be combined into one bookmark value and the rest should not. 1 means that the first and second bookmark levels should be combined and the rest should not be, and so on.
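For reference, the "add the line directly above <BOOKMARK_PDF1> if it is not already there" step can be sketched like this. It assumes the configuration is line-based text; the function name and the sample fragment are hypothetical.

```python
def ensure_unique_levels(config_text, levels=-1):
    """Insert <BOOKMARK_UNIQUE_LEVELS> above <BOOKMARK_PDF1> if it is missing."""
    if "<BOOKMARK_UNIQUE_LEVELS>" in config_text:
        return config_text  # already present, leave the file unchanged
    lines = config_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if "<BOOKMARK_PDF1>" in line:
            indent = line[: len(line) - len(line.lstrip())]
            newline = "\r\n" if line.endswith("\r\n") else "\n"
            new_line = f"{indent}<BOOKMARK_UNIQUE_LEVELS>{levels}</BOOKMARK_UNIQUE_LEVELS>{newline}"
            lines.insert(i, new_line)
            return "".join(lines)
    raise ValueError("<BOOKMARK_PDF1> was not found in the configuration")


# Hypothetical fragment standing in for a real job configuration file.
sample = "<A>1</A>\n<BOOKMARK_PDF1>x</BOOKMARK_PDF1>\n"
result = ensure_unique_levels(sample)
```

Running it a second time on the result changes nothing, which mirrors the "if it's not already there" condition.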
If you want to move the MISSING files from the Output folder to another folder and create multiple MISSING files then use this script for the .bat file:
ren "C:\Users\dgraves.META\Desktop\Folder1\Missing.pdf" Missing-%date:~10,4%%date:~7,2%%date:~4,2%_%time:~0,2%%time:~3,2%.PDF
move "C:\Users\dgraves.META\Desktop\Folder1\Missing*.pdf" "C:\Users\dgraves.META\Desktop\Folder2"
This will rename the file to MISSING-DATE_TIME and then move it to another folder.
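The same rename-and-move idea can be sketched in Python if a .bat file is not convenient. This is an alternative under stated assumptions, not a replacement for the script above: the function names are hypothetical, and the sketch always writes the date as year, month, day, whereas the substring offsets in the batch script depend on the regional date format of the computer.

```python
import shutil
from datetime import datetime
from pathlib import Path


def stamped_name(now):
    """Return a name such as Missing-20240305_0907.PDF for the given time."""
    return f"Missing-{now:%Y%m%d_%H%M}.PDF"


def archive_missing(source_dir, target_dir, now=None):
    """Rename Missing.pdf with a date/time stamp and move it to target_dir."""
    now = now or datetime.now()
    source = Path(source_dir) / "Missing.pdf"
    if not source.exists():
        return None  # nothing to archive on this run
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    target = Path(target_dir) / stamped_name(now)
    shutil.move(str(source), str(target))
    return target
```

Calling archive_missing with the Folder1 and Folder2 paths from the batch script would give each MISSING file its own timestamped name in Folder2.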
Windows Search works great with SimpleIndex because all index data can be saved to the folder and file names as well as the file properties, and OCR text can be saved to hidden layers in PDF files. Windows Search will read all of these elements when building its index and will return any matching files when you search.
Using Windows Search on a file server allows for instantaneous searching across terabytes of documents and text for all of the users on your network.
IFilters allow Windows Search to search within file contents.
Here are three popular PDF IFilters that will enable text searching for PDF files:
- Foxit PDF IFilter (commercial)
- TET PDF IFilter (free/commercial)
- Adobe PDF IFilter (32-bit / 64-bit) (free)
If you have issues with PDF text searching in Windows 10, this article has detailed instructions for resolving PDF IFilter issues:
To enable the Media Wizard, you must first create a job configuration that exports index information to an Access database. Once you have scanned all the documents for the CD or DVD and attached them to the database, create a second job that uses “Retrieve and View Records” to search and view these files.
The media wizard will be enabled in the Send menu whenever you have this configuration file open. The sample configurations included with SimpleIndex demonstrate scanning and searching with an Access database. Microsoft Access is not required to create the database.
The media wizard will copy the Access database and all of the files in your Output folder to a temporary folder, along with the SimpleSearch configuration and Autorun files needed to search the files from a CD or DVD. Simply burn all the files in this folder to create the searchable disc.
There are a variety of ways to connect to your database. Detailed instructions are provided in the Manual (check the Help menu). Here is a brief overview of the steps involved:
- Create a job configuration to scan and index files
- On the database tab, set the “Database Mode” to “Insert New Records”
- To use ODBC, enter the data source name or file in Data Source
- To connect directly, select your database type under “Select a Data Source” and click Start. A series of dialogs will prompt you for database connection information.
- Select destination Table or View and click Reload
- For each index field, select the corresponding database field that will receive that field value
- The “Output File Field” will receive the path to the exported file
Once you have created records in your database in “Insert” mode, you can change to “Retrieve and View Records” and use SimpleIndex or SimpleSearch to search and view the files.
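To picture what the two modes do with the database, here is a small sketch that uses an in-memory SQLite database as a stand-in for your real one. The table, the column names and the sample values are all hypothetical; SimpleIndex performs the equivalent work for you once the fields are mapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical destination table: two index fields plus the output file field.
conn.execute(
    "CREATE TABLE documents (invoice_number TEXT, vendor TEXT, file_path TEXT)"
)


def insert_record(conn, index_values, output_file):
    """'Insert' mode: one new row per exported file."""
    with conn:
        conn.execute(
            "INSERT INTO documents (invoice_number, vendor, file_path) VALUES (?, ?, ?)",
            (index_values["Invoice Number"], index_values["Vendor"], output_file),
        )


def find_files(conn, vendor):
    """'Retrieve' mode: search on an index field and return matching file paths."""
    rows = conn.execute(
        "SELECT file_path FROM documents WHERE vendor = ? ORDER BY file_path",
        (vendor,),
    )
    return [row[0] for row in rows]


insert_record(conn, {"Invoice Number": "1001", "Vendor": "Acme"}, r"C:\Output\1001.pdf")
insert_record(conn, {"Invoice Number": "1002", "Vendor": "Globex"}, r"C:\Output\1002.pdf")
acme_files = find_files(conn, "Acme")
```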
SimpleIndex creates output files with upper case file extensions but we use a UNIX-based fileserver which requires lower case file extensions. How can I change the output file extension from upper case to lower case (e.g. from .PDF to .pdf)?
In SimpleIndex 6, file extensions were changed to default to lower case so this should no longer be an issue. If you want to default back to upper case file extensions, you must edit the registry.
Go to HKEY_LOCAL_MACHINE\Software\SimpleIndex\Misc
Create a String value called “UpperCaseFileExtension” and set to 1 for upper case or 0 for lower case.
This registry setting will also work to change the default behavior of version 5.
Yes. Image files can be inserted into binary fields in Access, SQL Server, Oracle, MySQL and other databases.
Check the “Store files as binary objects” option on the Database tab and the “Output File Field” setting can be mapped to a binary field.
If using PDF, MS Office or other non-image files, use the File Type Field to store the file extension of the stored file.
SimpleSearch mode will let you view files stored using this method as well.
When I use an Autonumber with single-page files, only the page number is shown in the filename and not the Autonumber.
This happens when the autonumber is in the same format as file page numbers. By default, page numbers have 4 digits–0001, 0002, etc. If you need to use an autonumber with 4 digits, you should set the FILE_NUMBER_LENGTH setting in the INI file to 5 (accessed from the Advanced tab).
The input filename can be specified automatically by configuring a field of type “Filename”. The input file path may also be parsed by the SimpleIndex dictionary and template matching algorithms to extract data fields from the folder and file names.
Match & Attach mode lets you batch update multiple records in a database using the index data from your SimpleIndex job. For example, if you have a large backfile of documents that you want to scan and link to records in an existing database, you can use Match & Attach to find the corresponding record and set the Image Path field to the newly scanned file.
For details on how to configure Match & Attach mode, please refer to the manual.
“MISSING” is what SimpleIndex puts, by default, in place of any field value that is used as a filename or folder name and is left blank.
You can change this to whatever you want it to say when a field value is left blank. To do this, go to “Job Options”, then to the “Index” tab, and click “Advanced Options”. In the middle of the window you will see a box labeled “Use this value when a field is empty”. Change “DEFAULT” to whatever you want (including leaving it blank) and click OK. The next time you have a blank field value for a filename or folder name, it will use your new value.
This is all done through the electronic imprinting features, which puts the desired information electronically on the output images that are saved in your output folder. This is all done in SimpleIndex by clicking ‘Options’ then going to the ‘Imprint’ tab.
To implement bates stamping or page numbering click the ‘Enable imprinting’ check box and also the ‘Imprint page numbers’ check box. This is the most basic method, but there are also features which allow you to manipulate what this information is, what it looks like and where on the page it will go.
The ‘Font Size’ field allows you to choose what point font you would like the imprinted value to be.
The ‘Page # Length’ lets you determine how many digits you would like the page number or bates number to be. It will add leading zeros to the page number based on the number you enter into this field. This is used to keep images with page numbers in the proper order when saving them. EX. If you put 4 into the ‘Page # Length’ field the number will read starting at 0001 and will count up from there always keeping the page number 4 digits long.
If you would like leading characters on the front of the page number you can add these to the ‘Imprint Text’ field and they will appear in front of the page number. These pages will appear as (leading characters) – (page number). If you would like them to appear directly next to each other you would remove everything from the ‘Imprint Separator’ field, because this field is what is inserted between the imprint values. EX. You want the page numbers to read PMB#####. You would put PMB in the ‘Imprint Text’ field, remove everything from the ‘Imprint Separator’ field and put 5 in the ‘Page # Length’ field.
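The way ‘Imprint Text’, ‘Imprint Separator’ and ‘Page # Length’ combine can be sketched as follows. This only illustrates the rules described above; the function name is hypothetical and the default separator of " - " is an assumption.

```python
def imprint_label(page_number, length, text="", separator=" - "):
    """Combine imprint text, separator and a zero-padded page number."""
    number = str(page_number).zfill(length)  # adds leading zeros up to 'length'
    if not text:
        return number
    return f"{text}{separator}{number}"


plain = imprint_label(1, 4)                        # page number only
bates = imprint_label(12, 5, "PMB", separator="")  # the PMB##### example
```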
You will then decide where you want this information to appear on the image. If you want the page number at the top of the image, do not check the box marked ‘Measure X,Y from bottom-right’; if you want the page number at the bottom of the image, check this box. Next, set the X and Y coordinates to place the imprinted information in the section of the image where you would like it. The X coordinate measures from left to right (right to left when ‘Measure X,Y from bottom-right’ is checked) and the Y coordinate measures from top to bottom (bottom to top when ‘Measure X,Y from bottom-right’ is checked). The unit of measurement for the X and Y coordinates is pixels. The number of pixels per page changes based on the resolution, or dpi (dots per inch), that the image was scanned at. So if you are scanning at 200 dpi, 1 inch = 200 pixels, but at 300 dpi, 1 inch = 300 pixels, and so on. EX. You have a 300 dpi, 8.5×11″ image that you want to imprint page numbers on the bottom left of the image an inch and a half from the left and bottom of the page. You would want to have ‘Measure X,Y from bottom-right’ checked, 1950 in the ‘X coordinate’ field (6.5″ from right x 300 dpi) and 450 in the ‘Y coordinate’ field (1.5″ from bottom x 300 dpi).
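The pixel values in the example come from multiplying the distance in inches by the scan resolution. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def inches_to_pixels(inches, dpi):
    """Convert a distance in inches to pixels at the given scan resolution."""
    return round(inches * dpi)


x_coordinate = inches_to_pixels(6.5, 300)  # 6.5 inches at 300 dpi
y_coordinate = inches_to_pixels(1.5, 300)  # 1.5 inches at 300 dpi
```

If the same job is later scanned at a different resolution, the coordinates need to be recalculated with the new dpi.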
If you enable full-page OCR and output to PDF, the full-page OCR text will be inserted as invisible text on each page.
With the addition of the FineReader Engine in version 7, SimpleIndex now creates PDF files with fully searchable text formatted to flow with the image of the document.
Find out more about Optical Character Recognition on the SimpleOCR Guide.
Use the Folder and Filename check boxes on the Index tab in the Job Options to indicate whether field values will be used to generate subfolders or filenames. Any field with the Folder option checked will create nested subfolders for each value in the order the fields are listed. Any field with the Filename checked will have the values concatenated to form the filename.
For example, if Field 1 and Field 3 have the Folder option checked, and Field 2 and Field 3 have the Filename option checked, image filenames will be created in the format:
%OUTPUTFOLDER%\Field 1\Field 3\Field 2 – Field 3.tif
The Filename Separator option on the Advanced tab lets you change the “ – ” between the fields in the filename to anything you want.
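The Folder and Filename rules can be sketched as a small function. This is an illustration only: the class and function names and the sample values are hypothetical, and the default separator of " - " is an assumption.

```python
from typing import NamedTuple


class IndexField(NamedTuple):
    value: str
    folder: bool = False    # Folder option checked
    filename: bool = False  # Filename option checked


def output_path(output_folder, fields, extension="tif", separator=" - "):
    """Build the output path from the fields, in the order they are listed."""
    folders = [f.value for f in fields if f.folder]
    name = separator.join(f.value for f in fields if f.filename)
    return "\\".join([output_folder, *folders, f"{name}.{extension}"])


# Field 1 and Field 3 are folders; Field 2 and Field 3 are part of the filename.
fields = [
    IndexField("Smith", folder=True),
    IndexField("Invoice", filename=True),
    IndexField("2024", folder=True, filename=True),
]
path = output_path(r"C:\Output", fields)
```

Passing a different separator, for example "_", corresponds to changing the Filename Separator option.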