Export Pages - SimpleIndex

Simple Software Knowledge Base - Article

Image and PDF files and metadata export to CSV, XML, database or document management system.

Exclude Index Field from Index Log

Tuesday, 29 December 2020 by Alex Stewart

Please refer to the Wiki Documentation for the complete Index & Batch Logging reference.

When outputting a log file as CSV, XML, TXT, etc., there are often index fields that are required in the Job Configuration but not wanted in the Index Log. Those fields can be excluded from the Index Log by adding a “~” character to the end of the Index Field Name.

To do this go into the Job Options/Job Settings Wizard, go to the Index tab/step, find the Index field that you want to exclude from the Index Log and add this to the end of the field name: ~

EX. The original Index Name is “OCR Text” and that field should not appear in the Index Log. Rename the field to “OCR Text~”.
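The effect of the trailing “~” can be pictured with a short Python sketch. This only illustrates the naming convention and is not SimpleIndex's own code; the function name `write_index_log` and the sample field values are hypothetical.

```python
import csv
import io


def write_index_log(records, field_names):
    """Write records to CSV text, skipping fields whose name ends in "~"."""
    # Fields marked with a trailing "~" stay in the job but not in the log.
    logged = [name for name in field_names if not name.endswith("~")]
    buffer = io.StringIO()
    # extrasaction="ignore" drops the excluded keys from each record.
    writer = csv.DictWriter(buffer, fieldnames=logged, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical job with one logged field and one excluded field.
fields = ["Invoice Number", "OCR Text~"]
rows = [{"Invoice Number": "1001", "OCR Text~": "full page text"}]
log_text = write_index_log(rows, fields)
print(log_text)
```

The excluded field is still available to the job itself; it is only left out of the written log.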


Connect SimpleIndex to FTP Site

Monday, 03 August 2020 by Alex Stewart

Please refer to the Wiki Documentation for the complete Distributed Capture reference.

SimpleIndex can import files from, or export files to, an FTP site, but the FTP site must first be configured as a Windows drive letter.

Windows does not include a way to configure an FTP site as a drive letter, so third-party software is required. We recommend SSHFS-Win Manager, but any tool that maps the site to a drive letter will work.

Export Issues and Missing Images after Export

Tuesday, 16 June 2020 by Alex Stewart

Please refer to the Wiki Documentation for the complete Export Settings reference.

If files are not exporting properly, or images that should have been saved are missing from the export folder, adding a registry value can correct this. The value switches SimpleIndex from its default, faster export process to a slower export process that avoids these issues.

Instructions:
1. Search for “regedit” on your computer and open the Registry Editor.
2. Navigate to this folder in the Registry Editor window: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc
3. In the right pane of the Registry Editor window, right-click and select New>String Value.
4. Set the name of the new value to: EnableAtalaExport
5. Double-click the “EnableAtalaExport” value, set the Value data to “0” (zero) and click OK.
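If the same change has to be made on several computers, one option is to prepare a .reg file and merge it on each machine (merging into HKEY_LOCAL_MACHINE needs administrator rights). The Python sketch below only builds the text of such a file from the key and value named in the steps above; the helper name `build_reg_file` is hypothetical.

```python
def build_reg_file(key_path, value_name, value_data):
    """Return the text of a .reg file that sets one string value."""
    # Backslashes and quotes inside the data must be escaped in .reg files.
    escaped = value_data.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        f'"{value_name}"="{escaped}"\n'
    )


KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc"
reg_text = build_reg_file(KEY, "EnableAtalaExport", "0")
print(reg_text)
```

Saving the printed text as a file with a .reg extension and double-clicking it should have the same effect as steps 2 to 5, but check the result in the Registry Editor before relying on it.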


InstaDocs for SAGE Integration

Wednesday, 11 September 2019 by Alex Stewart

If you have the InstaDocs add-on for SAGE, which lets you search for and retrieve documents in the SAGE system, SimpleIndex can output directly to the InstaDocs folder system. Images exported this way show up as searchable files in InstaDocs immediately.

To do this, first set up the Output folder in SimpleIndex to save to the same folder that InstaDocs uses for image storage. Then, for each sub-folder level in the main InstaDocs image storage folder, make an index field in SimpleIndex, and make sure that the SimpleIndex fields are in the same order as the folder levels. Finally, check the Folder check box for each of those index fields in SimpleIndex.

Disable StopFile

Monday, 29 July 2019 by Simple Software

Please refer to the Wiki Documentation for the complete File Input Settings reference.

If many users run SimpleIndex on different computers and save to the same Output folder, the STOPFILE created by one user can prevent another user from exporting to that folder while the first user is still outputting to it. In this case the STOPFILE can be disabled so this no longer happens.

Disabling the STOPFILE will not work if anyone needs to add files to an existing file in the output folder, so make sure that won’t happen.

Instructions:

1. Close SimpleIndex entirely.
2. Open the Windows Registry Editor by searching for “RegEdit” in Windows Search.
3. Go to this location in the registry folder tree: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc
4. In the right pane of the Registry Editor window, right-click in the white space and select New>String Value.
5. Name the new value “StopFile”.

Keep Pages in Original Order when Bookmarking

Monday, 29 July 2019 by Simple Software

If you want to keep all the pages in the same order in which they were imported, even though they belong to different bookmarks, do the following:

1. Open the configuration in Notepad.
2. Search for <BOOKMARK_PAGE_ORDER>
3. Change this line from “false” to “true”: <BOOKMARK_PAGE_ORDER>true</BOOKMARK_PAGE_ORDER>
4. Save and close.
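For anyone who prefers to script the edit instead of using Notepad, here is a minimal Python sketch of the same find-and-replace. It treats the configuration as plain text; the helper name `set_config_flag` and the `<JOB>` wrapper in the sample are assumptions made for illustration, not the real file layout.

```python
import re


def set_config_flag(config_text, tag, value):
    """Replace the text inside <tag>...</tag>; raise if the tag is absent."""
    pattern = re.compile(
        rf"(<{re.escape(tag)}>)(.*?)(</{re.escape(tag)}>)", re.DOTALL
    )
    # Only the first occurrence is changed; the tags themselves are kept.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value + m.group(3), config_text, count=1
    )
    if count == 0:
        raise ValueError(f"<{tag}> not found in configuration")
    return new_text


# Hypothetical fragment standing in for a real job configuration file.
sample = "<JOB>\n<BOOKMARK_PAGE_ORDER>false</BOOKMARK_PAGE_ORDER>\n</JOB>"
updated = set_config_flag(sample, "BOOKMARK_PAGE_ORDER", "true")
print(updated)
```

Keep a backup copy of the configuration file before writing the changed text back to disk.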


Do Not Combine Pages to 1 Bookmark

Monday, 29 July 2019 by Simple Software

Please refer to the Wiki Documentation for the complete PDF Bookmarking reference.

If the same bookmark value is found in several interspersed images in the batch and you want to keep those pages in separate bookmarks instead of combining them into a single bookmark, do the following:

1. Open the Job Configuration file in Notepad.
2. Search for this value: <BOOKMARK_PDF1>
3. Enter this directly above the line that has <BOOKMARK_PDF1> if it’s not already there: <BOOKMARK_UNIQUE_LEVELS>-1</BOOKMARK_UNIQUE_LEVELS>
4. -1 is the default value and means that no pages are combined into one bookmark unless they fall in order. 0 means that the first bookmark level is combined into one bookmark value and the rest are not. 1 means that the first and second bookmark levels are combined and the rest are not, and so on.
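The difference between keeping interspersed pages separate and combining them can be illustrated for a single bookmark level. The Python sketch below is a simplified model of the behavior described above, not SimpleIndex's implementation; the function names and the sample batch are hypothetical, and multi-level settings such as 0 or 1 are not modeled.

```python
from itertools import groupby


def bookmarks_separate(pages):
    """Group only consecutive pages that share a bookmark value."""
    # groupby starts a new group every time the bookmark value changes.
    return [
        (value, [page for page, _ in run])
        for value, run in groupby(pages, key=lambda item: item[1])
    ]


def bookmarks_combined(pages):
    """Merge every page with the same bookmark value into one bookmark."""
    merged = {}
    for page, value in pages:
        merged.setdefault(value, []).append(page)
    return list(merged.items())


# (page number, bookmark value); value "A" appears again after "B".
batch = [(1, "A"), (2, "A"), (3, "B"), (4, "A")]
print(bookmarks_separate(batch))
print(bookmarks_combined(batch))
```

In the first result page 4 gets its own "A" bookmark because it does not directly follow pages 1 and 2; in the second result it joins them.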


MISSING file move with multiple files

Monday, 29 July 2019 by Simple Software

Please refer to the Wiki Documentation for the complete File Output Settings reference.

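If you want to move the MISSING files from the Output folder to another folder and keep multiple MISSING files, use this script for the .bat file:

rem Rename Missing.pdf so its name includes the current date and time.
ren "C:\Users\dgraves.META\Desktop\Folder1\Missing.pdf" Missing-%date:~10,4%%date:~7,2%%date:~4,2%_%time:~0,2%%time:~3,2%.PDF

rem Move every renamed MISSING file to the second folder.
move "C:\Users\dgraves.META\Desktop\Folder1\Missing*.pdf" "C:\Users\dgraves.META\Desktop\Folder2"

This renames the file to Missing-DATE_TIME.PDF and then moves it to another folder. The %date% and %time% substrings depend on the Windows regional date and time format, so adjust the offsets if the resulting name looks wrong.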
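Where Python is available, the same rename-and-move step can be sketched without depending on the regional date format. This is an alternative to the batch script, not something SimpleIndex ships; the function name `archive_missing` is hypothetical, and the timestamp here is written in year-month-day order so that the files sort chronologically.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_missing(source_dir, target_dir, now=None):
    """Rename Missing.pdf with a timestamp and move it to target_dir."""
    source = Path(source_dir) / "Missing.pdf"
    if not source.exists():
        return None  # nothing to move on this run
    # Zero-padded fields avoid the spaces that %time% can produce.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M")
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    target = Path(target_dir) / f"Missing-{stamp}.pdf"
    shutil.move(str(source), str(target))
    return target
```

A call such as `archive_missing(r"C:\Users\dgraves.META\Desktop\Folder1", r"C:\Users\dgraves.META\Desktop\Folder2")` would mirror the folders used in the batch script.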

Is it possible to search for and retrieve documents with Windows desktop search?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Searchable PDF reference.

Windows Search works great with SimpleIndex because all index data can be saved to the folder and file names as well as the file properties, and OCR text can be saved to hidden layers in PDF files. Windows Search will read all of these elements when building its index and will return any matching files when you search.

Using Windows Search on a file server allows for instantaneous searching across terabytes of documents and text for all of the users on your network.

IFilters allow Windows Search to search within file contents.

Here are three popular PDF IFilters that will enable text searching for PDF files:

Foxit PDF IFilter (commercial)
TET PDF IFilter (free/commercial)
Adobe PDF IFilter (32-bit / 64-bit) (free)

If you have issues with PDF text searching in Windows 10, this article has detailed instructions for resolving PDF IFilter issues:

https://fixedit.itxpress.biz/2018/07/05/searching-pdfs-in-windows-10/


Published in Database & Retrieval, Export, Office PDF Text Processing

How do I use the Media Wizard to create searchable DVDs or thumb drives?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Send Menu reference.

To enable the Media Wizard, you must first create a job configuration that exports index information to an Access database. Once you have scanned all the documents for the CD or DVD and attached them to the database, create a second job that uses “Retrieve and View Records” to search and view these files.

The media wizard will be enabled in the Send menu whenever you have this configuration file open. The sample configurations included with SimpleIndex demonstrate scanning and searching with an Access database. Microsoft Access is not required to create the database.

The media wizard will copy the Access database and all of the files in your Output folder to a temporary folder, along with the SimpleSearch configuration and Autorun files needed to search the files from a CD or DVD. Simply burn all the files in this folder to create the searchable disc.


Published in Database & Retrieval, Export

How do I export index data to a database?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Export reference.

There are a variety of ways to connect to your database. Detailed instructions are provided in the Manual (check the Help menu). Here is a brief overview of the steps involved:

-Create a job configuration to scan and index files
-On the database tab, set the “Database Mode” to “Insert New Records”
-To use ODBC, enter the data source name or file in Data Source
-To connect directly, select your database type under “Select a Data Source” and click Start. A series of dialogs will prompt you for database connection information.
-Select destination Table or View and click Reload
-For each index field, select the corresponding database field that will receive that field value
-The “Output File Field” will receive the path to the exported file

Once you have created records in your database in “Insert” mode, you can change to “Retrieve and View Records” and use SimpleIndex or SimpleSearch to search and view the files.
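What “Insert New Records” mode produces can be pictured as one row per exported file, with each index field mapped to a column and the file path stored in the Output File Field. The Python sketch below uses an in-memory SQLite database purely for illustration; the table, column and field names are hypothetical, and SimpleIndex's actual SQL is not shown in this article.

```python
import sqlite3


def insert_record(conn, table, field_map, index_values, file_column, output_path):
    """Insert one row: mapped index fields plus the exported file path."""
    columns = [field_map[name] for name in index_values]
    values = list(index_values.values())
    columns.append(file_column)
    values.append(output_path)
    column_sql = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f'INSERT INTO "{table}" ({column_sql}) VALUES ({marks})', values)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (invoice_no TEXT, customer TEXT, image_path TEXT)"
)
# Index field name -> database column, as chosen on the database tab.
field_map = {"Invoice Number": "invoice_no", "Customer": "customer"}
insert_record(
    conn, "documents", field_map,
    {"Invoice Number": "1001", "Customer": "Acme"},
    "image_path", r"C:\Output\Acme\1001.pdf",
)
# Retrieval then amounts to searching on an index column.
found = conn.execute(
    "SELECT image_path FROM documents WHERE invoice_no = ?", ("1001",)
).fetchone()
print(found)
```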


Published in Database & Retrieval, Export

SimpleIndex creates output files with upper case file extensions, but we use a UNIX-based file server which requires lower case file extensions. How can I change the output file extension from upper case to lower case (e.g. from .PDF to .pdf)?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Advanced Settings reference.

In SimpleIndex 6, file extensions were changed to default to lower case so this should no longer be an issue. If you want to default back to upper case file extensions, you must edit the registry.

Go to HKEY_LOCAL_MACHINE\Software\SimpleIndex\Misc

Create a String value called “UpperCaseFileExtension” and set it to 1 for upper case or 0 for lower case.

This registry setting will also work to change the default behavior of version 5.

Published in Export

Is it possible to have the scanned image itself added to a database and not just the image path?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Database reference.

Yes. Image files can be inserted into binary fields in Access, SQL Server, Oracle, MySQL and other databases.

Check the “Store files as binary objects” option on the Database tab and the “Output File Field” setting can be mapped to a binary field.

If using PDF, MS Office or other non-image files, use the File Type Field to store the file extension of the stored file.

SimpleSearch mode will let you view files stored using this method as well.
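Storing the file itself means the binary field holds the file's bytes and a separate field records the extension, so the file can be rebuilt later. Here is a minimal Python sketch using SQLite, with hypothetical table and column names; it illustrates the idea and is not SimpleIndex's own code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (doc_id TEXT, file_data BLOB, file_type TEXT)"
)


def store_file(conn, doc_id, data, extension):
    """Save the raw file bytes and the extension needed to reopen them."""
    conn.execute(
        "INSERT INTO documents (doc_id, file_data, file_type) VALUES (?, ?, ?)",
        (doc_id, data, extension),
    )
    conn.commit()


def load_file(conn, doc_id):
    """Return (bytes, extension) for a stored document, or None."""
    row = conn.execute(
        "SELECT file_data, file_type FROM documents WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    return None if row is None else (bytes(row[0]), row[1])


# The first bytes of a PDF stand in for a real exported file.
store_file(conn, "1001", b"%PDF-1.4 sample", ".pdf")
data, extension = load_file(conn, "1001")
print(len(data), extension)
```

Without the file type column there would be no reliable way to tell a stored PDF from a stored TIFF when viewing it later.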


Published in Database & Retrieval, Export

When I use an Autonumber with single-page files, only the page number is shown in the filename and not the Autonumber.

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Autonumber reference.

This happens when the autonumber is in the same format as file page numbers. By default, page numbers have 4 digits (0001, 0002, etc.). If you need to use an autonumber with 4 digits, set the FILE_NUMBER_LENGTH setting in the INI file (accessed from the Advanced tab) to 5.

Published in Export

Can the original image filename be used as part of the output filename?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete File Output Settings reference.

The input filename can be specified automatically by configuring a field of type “Filename”. The input file path may also be parsed by the SimpleIndex dictionary and template matching algorithms to extract data fields from the folder and file names.

Published in Export, Import, Indexing & UI

What is “Match & Attach” mode?

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Database Mode reference.

Match & Attach mode lets you batch update multiple records in a database using the index data from your SimpleIndex job. For example, if you have a large backfile of documents that you want to scan and link to records in an existing database, you can use Match & Attach to find the corresponding record and set the Image Path field to the newly scanned file.

This allows documents to be indexed with a variety of information and then have it find a particular record based on up to three different key indexes in a data source. It can then fill in additional data columns with indexed information along with the full text information, page count, batch ID and image path.

Match & Attach uses the key field set under “Autofill Settings…” in the Indexing & File Naming step of the Job Settings Wizard (File menu). It then fills the data into any blank columns of the matching database record and also updates any fields that are different.
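In database terms, Match & Attach behaves like an update of an existing row located by its key, rather than an insert. The Python sketch below shows that idea with SQLite and a single key column; the table, column names and the `match_and_attach` helper are hypothetical, and the real feature supports up to three key indexes.

```python
import sqlite3


def match_and_attach(conn, table, key_column, key_value, updates):
    """Update the record whose key matches; return how many rows changed."""
    assignments = ", ".join(f'"{column}" = ?' for column in updates)
    cursor = conn.execute(
        f'UPDATE "{table}" SET {assignments} WHERE "{key_column}" = ?',
        [*updates.values(), key_value],
    )
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (account_no TEXT, customer TEXT, image_path TEXT)"
)
# An existing record with blank columns waiting to be filled.
conn.execute("INSERT INTO records VALUES ('A-17', NULL, NULL)")
changed = match_and_attach(
    conn, "records", "account_no", "A-17",
    {"customer": "Acme", "image_path": r"C:\Output\A-17.pdf"},
)
print(changed, conn.execute("SELECT * FROM records").fetchone())
```

A return value of 0 would mean no record matched the key, which is the case to check for when linking a backfile.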

Related Links

SimpleIndex Wiki – Match and Attach Records


Published in Database & Retrieval, Export

Why does the word “MISSING” show up in filenames and index fields when the field is blank?

Wednesday, 28 February 2018 by dwilder

“MISSING” is the default value SimpleIndex substitutes for any field that is used as a filename or folder name and is left blank.

You can change this to any text you want. Go to the Job Settings Wizard under the File menu, go to the Advanced Settings step and expand Advanced Indexing Options. The value is set to DEFAULT, which inserts the word “MISSING” when the index field is blank. Enter any other text in this field to use it instead of “MISSING”.
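The substitution rule is simple enough to state in a few lines of Python. This is a sketch of the behavior as described here, with hypothetical function names; it is not taken from SimpleIndex.

```python
def blank_placeholder(setting="DEFAULT"):
    """Return the text used for blank fields under the given setting."""
    # DEFAULT stands for the built-in word; anything else is used as typed.
    return "MISSING" if setting == "DEFAULT" else setting


def value_for_path(value, setting="DEFAULT"):
    """Return the field value, or the placeholder when it is blank."""
    if value is None or not value.strip():
        return blank_placeholder(setting)
    return value


print(value_for_path(""))              # blank field, default setting
print(value_for_path("", "UNKNOWN"))   # blank field, custom text
print(value_for_path("Invoice"))       # filled field is left alone
```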

Related Links

SimpleIndex Wiki – Advanced Settings


Published in Export, Indexing & UI

How can I configure SimpleIndex to perform bates stamping or page numbering for my images?

Wednesday, 28 February 2018 by dwilder

This is done with the electronic imprinting features, which place the desired information on the output images saved in your output folder. In SimpleIndex, go to the File menu, select Job Settings Wizard and then go to the Imprinting step.

To implement bates stamping or page numbering click the ‘Enable Imprinting’ check box and also the ‘Imprint page numbers’ check box. This is the most basic method, but there are also features which allow you to manipulate what this information is, what it looks like and where on the page it will go.

The ‘Font Size’ field allows you to choose what point font you would like the imprinted value to be.

The ‘Page # Length’ lets you determine how many digits you would like the page number or bates number to be. It will add leading zeros to the page number based on the number you enter into this field. This is used to keep images with page numbers in the proper order when saving them. EX. If you put 4 into the ‘Page # Length’ field the number will read starting at 0001 and will count up from there always keeping the page number 4 digits long.

If you would like leading characters on the front of the page number you can add these to the ‘Imprint Text or Image’ field and they will appear in front of the page number. These pages will appear as (leading characters) – (page number). If you would like them to appear directly next to each other you would remove everything from the ‘Separator’ field, because this field is what is inserted between the imprint values. EX. You want the page numbers to read PMB#####. You would put PMB in the ‘Imprint Text’ field, remove everything from the ‘Imprint Separator’ field and put 5 in the ‘Page # Length’ field.
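How the imprint text, separator and page number length combine can be sketched in Python. The function name `imprint_label` is hypothetical and the default separator " - " is an assumption based on the description above; only the PMB example comes from this article.

```python
def imprint_label(page_number, length, text="", separator=" - "):
    """Build the imprinted value: optional text, separator, padded number."""
    # zfill adds leading zeros until the number is `length` digits long.
    number = str(page_number).zfill(length)
    if not text:
        return number
    return f"{text}{separator}{number}"


print(imprint_label(1, 4))                  # page number only
print(imprint_label(1, 5, "PMB"))           # with the assumed separator
print(imprint_label(1, 5, "PMB", ""))       # separator removed
```

With the separator emptied and a length of 5, page 1 comes out as PMB00001, matching the PMB##### format.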

Next, decide where you want this information to appear on the image. If you want the page number at the top of the image, leave the box marked ‘Measure X,Y from bottom-right’ unchecked; if you want the page number at the bottom of the image, check this box. Then set the X and Y coordinates to place the imprinted information in the part of the image where you want it. The X coordinate measures from left to right (right to left when ‘Measure X,Y from bottom-right’ is checked) and the Y coordinate measures from top to bottom (bottom to top when ‘Measure X,Y from bottom-right’ is checked).

The X and Y coordinates are measured in pixels. The number of pixels per page depends on the resolution, or dpi (dots per inch), at which the image was scanned. So if you are scanning at 200 dpi, 1 inch = 200 pixels, at 300 dpi, 1 inch = 300 pixels, and so on.

EX. You have a 300 dpi, 8.5×11″ image and want to imprint page numbers on the bottom left of the image, an inch and a half from the left and bottom of the page. You would check ‘Measure X,Y from bottom-right’, enter 1950 in the ‘X coordinate’ field (6.5″ from right x 300 dpi) and 450 in the ‘Y coordinate’ field (1.5″ from bottom x 300 dpi).
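The coordinate arithmetic is a single multiplication of inches by dpi. A small Python sketch, with a hypothetical helper name, reproduces the numbers from the example above:

```python
def inches_to_pixels(inches, dpi):
    """Convert a distance in inches to whole pixels at the given resolution."""
    return round(inches * dpi)


x_coordinate = inches_to_pixels(6.5, 300)  # distance from the right edge
y_coordinate = inches_to_pixels(1.5, 300)  # distance from the bottom edge
print(x_coordinate, y_coordinate)
```

The same offsets on a 200 dpi scan would be 1300 and 300 pixels, so the coordinates need to be recalculated whenever the scan resolution changes.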


Published in Export, Imprinting & Watermarking

Can SimpleIndex create searchable PDF Image+Text files with hidden text?

Wednesday, 28 February 2018 by dwilder

Yes, it can. You can configure this setting in the Job Settings Wizard by going to the OCR step and checking “Enable full-page OCR”. There are many settings in the OCR step that you can use to customize the output and recognition of images.

SimpleIndex has two different OCR engines (Standard and Professional) that can be used to produce PDF Image + Text files or Searchable PDFs.


Published in Export, OCR, Office PDF Text Processing

How do I configure the output folder and file naming scheme?

Wednesday, 28 February 2018 by dwilder

Use the Folder and Filename check boxes on the Indexing & File Naming step in the Job Settings Wizard to indicate whether field values will be used to generate subfolders or filenames. Any field with the Folder option checked will create nested subfolders for each value, in the order the fields are listed. Any field with the Filename option checked will have its value concatenated with the others to form the filename.

For example, if Field 1 and Field 3 have the Folder option checked, and Field 2 and Field 3 have the Filename option checked, image filenames will be created in the format:

%OUTPUTFOLDER%\Field 1\Field 3\Field 2 – Field 3.tif

The Filename Separator option on the Advanced tab lets you change the “ – ” between the fields in the filename to anything you want.
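The naming rule can be sketched in Python as follows. This models the scheme described above with hypothetical field values and a hypothetical `build_output_path` helper; the " - " default separator and the C:\Output folder are assumptions for the example.

```python
from pathlib import PureWindowsPath


def build_output_path(output_folder, fields, extension=".tif", separator=" - "):
    """Build the export path from fields flagged as Folder and/or Filename."""
    # Folder fields become nested subfolders in the order they are listed.
    folders = [f["value"] for f in fields if f.get("folder")]
    # Filename fields are joined with the separator to form the file name.
    name_parts = [f["value"] for f in fields if f.get("filename")]
    filename = separator.join(name_parts) + extension
    return PureWindowsPath(output_folder, *folders, filename)


fields = [
    {"value": "Smith", "folder": True},                    # Field 1
    {"value": "Invoice", "filename": True},                # Field 2
    {"value": "2018", "folder": True, "filename": True},   # Field 3
]
path = build_output_path(r"C:\Output", fields)
print(path)
```

Field 3 appears twice in the result, once as a subfolder and once in the filename, because both of its check boxes are set.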


Published in Export
