Use SimpleIndex for handprint recognition to convert print and cursive handwriting to machine-readable and searchable text.
Amazon Textract OCR and ICR
What is Amazon Textract?
Amazon Textract is a service that automatically detects and extracts text and data from scanned documents. It goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.
Benefits
- Extract data quickly and accurately
Amazon Textract makes it easy to quickly and accurately extract data from documents and forms. Amazon Textract automatically detects a document’s layout and the key elements on the page, understands the data relationships in any embedded forms or tables, and extracts everything with its context intact. This means you can instantly use the extracted data in an application or store it in a database without a lot of complicated code in between.
- No code or templates to maintain
With Amazon Textract’s pre-trained machine learning models, you don’t need to write code for data extraction. This is because the models have already been trained on tens of millions of documents from many industries, including invoices, receipts, contracts, tax documents, sales orders, enrollment forms, benefit applications, insurance claims, and policy documents. You no longer need to maintain code for every document or form you might receive, or worry about how page layouts change over time.
- Easily implement human reviews
With the addition of Amazon Augmented AI, you can build in human reviews to manage nuanced or sensitive workflows that require human judgment to get high-confidence predictions, or to audit predictions on an ongoing basis.
- Lower document processing costs
Amazon Textract’s text extraction API enables you to process documents for $1.50 per 1,000 pages. Whether you process a few hundred documents a year or millions, Amazon Textract provides OCR and structured data extraction (forms and tables) at a very low cost, and you only pay for what you use. There are no upfront commitments or long-term contracts.
How Does Amazon Textract work?
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies extract data from scanned documents such as PDFs, images, tables, and forms manually, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, Textract uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort. You can quickly automate document processing and act on the information extracted, whether you’re automating loan processing or extracting information from invoices and receipts. Textract can extract the data in minutes instead of hours or days. Additionally, you can add human reviews with Amazon Augmented AI to provide oversight of your models and check sensitive data.
Limitations of Textract
While Textract enables a number of great new features, it does have some limitations.
- Only single page TIFF images can be processed with Textract
- Other file types must be converted to single page TIFF prior to processing
- Searchable PDF output is not supported
- Only asynchronous processing is available
- No offline processing – must be connected to the Internet
- AWS usage fees will be incurred for each page processed
Textract Integration with SimpleIndex
The Textract integration feature enables the Amazon Textract OCR engine, which can read unconstrained print and cursive handwriting with surprisingly good accuracy.
It can be purchased separately or included with SimpleIndex Professional.
Textract is only available as an API, requiring custom programming to make it work. SimpleIndex turns it into a complete document and data capture application designed for easy batch processing on a workstation or server.
Extract text from typed or handwritten documents automatically, even on unconstrained handprint and cursive writing. Automatic extraction of form fields lets you identify key values without templates or training. Accounts payable invoice and receipt processing is also included.
Captured data can be used to organize files into folders for cloud storage apps, save to a CSV, XML or JSON file, export to a database, upload to a document management system, perform full-text searching, or even create bookmarks in PDF files.
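To give a sense of the custom programming the raw API involves, here is a minimal sketch of reading the text lines out of a Textract-style response. The sample data is hand-written and made up for illustration; it only mimics the `Blocks` / `BlockType` / `Text` / `Confidence` layout of a text detection response. The `review_below` threshold is an assumption for the example, not a SimpleIndex or AWS setting.

```python
import json

# Hand-written sample shaped like a text detection response (values are made up).
SAMPLE_RESPONSE = json.loads("""
{
  "Blocks": [
    {"BlockType": "PAGE", "Id": "p1"},
    {"BlockType": "LINE", "Id": "l1", "Text": "Invoice 1042", "Confidence": 99.2},
    {"BlockType": "WORD", "Id": "w1", "Text": "Invoice", "Confidence": 99.5},
    {"BlockType": "WORD", "Id": "w2", "Text": "1042", "Confidence": 98.9},
    {"BlockType": "LINE", "Id": "l2", "Text": "Pd in full - J. Smith", "Confidence": 71.4}
  ]
}
""")


def extract_lines(response, review_below=80.0):
    """Return (text, needs_review) for every LINE block in the response.

    review_below is a hypothetical confidence cut-off: lines scoring under it
    are flagged so a person can check them.
    """
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks; LINE already carries the joined text
        needs_review = block.get("Confidence", 0.0) < review_below
        lines.append((block["Text"], needs_review))
    return lines


for text, needs_review in extract_lines(SAMPLE_RESPONSE):
    flag = "REVIEW" if needs_review else "ok"
    print(f"[{flag}] {text}")
```

Flagging low-confidence lines is one simple way to decide which results deserve a human look; a real workflow would tune the threshold per field.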
Connect to Your AWS Account
Using Textract requires an AWS account, which will incur charges for any documents processed using the Textract OCR option.
Follow the directions in the Textract Getting Started Guide to connect SimpleIndex to your Textract account.
In summary, the setup process is:
- Create an IAM user for Textract
- Obtain the Access Key and Secret Access Key for the Textract user account
- On the OCR Options tab of the Job Settings Wizard, select AWSForms, AWSText, or AWSInvoice as the OCR Engine
- Click the AWS Creds button to enter your User Access Key and Secret Access Key
To manually create the AWS credentials file under your user profile, follow these steps:
- Create the folder c:\Users\xxx\.aws (replacing xxx with your Windows user name)
- Create a file called config (no file extension) with Notepad and enter your region info
- Be sure to use the abbreviated version of the region name (e.g. us-east-1) and not the full name
- Create a file called credentials (no file extension) with Notepad and enter your Access and Secret keys
- Copy the .aws folder and config files to the profile directory for any other accounts that will use it, including service accounts
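The manual steps above can also be scripted. The sketch below writes a `config` and a `credentials` file in the standard AWS layout (a `[default]` section with `region`, `aws_access_key_id`, and `aws_secret_access_key`). The function name `write_aws_profile` is hypothetical, the keys are the placeholder values used in AWS documentation, and the demo writes to a temporary folder rather than your real profile so nothing is overwritten.

```python
import configparser
import tempfile
from pathlib import Path


def write_aws_profile(aws_dir, region, access_key, secret_key):
    """Create aws_dir with 'config' and 'credentials' files (no extensions)."""
    aws_dir = Path(aws_dir)
    aws_dir.mkdir(parents=True, exist_ok=True)

    # interpolation=None so characters in the secret key are stored literally
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {"region": region}  # abbreviated name, e.g. us-east-1
    with open(aws_dir / "config", "w") as f:
        config.write(f)

    creds = configparser.ConfigParser(interpolation=None)
    creds["default"] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    with open(aws_dir / "credentials", "w") as f:
        creds.write(f)
    return aws_dir


# Demo: write to a throwaway folder. For real use, the target would be the
# .aws folder under the Windows user profile described above.
demo_dir = Path(tempfile.mkdtemp()) / ".aws"
write_aws_profile(
    demo_dir,
    region="us-east-1",
    access_key="AKIAIOSFODNN7EXAMPLE",  # placeholder, not a real key
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # placeholder
)
print((demo_dir / "config").read_text())
print((demo_dir / "credentials").read_text())
```

Keep the real credentials file out of shared folders and source control, since anyone holding the keys can run up charges on the account.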
Pricing
SimpleIndex with Amazon Textract has a two-tier pricing structure. First, you purchase the appropriate version of SimpleIndex, which can be found on SimpleIndex.com. Second, you pay a per-image cost directly to Amazon. To do this, link your AWS account to SimpleIndex through the SimpleIndex Job Configuration interface. Once the accounts are linked, every image processed with the Amazon Textract Cloud OCR engine in SimpleIndex is counted automatically on the AWS account, and Amazon charges that account directly for the total number of images processed.
These prices are set by Amazon for the US region and are subject to change.
AWSText (Detect Document Text API) = $0.0015 per page / $1.50 per 1,000 pages
AWSForms (Analyze Document API – Forms) = $0.05 per page / $50.00 per 1,000 pages
AWSInvoice (Analyze Expense API) = $0.01 per page / $10.00 per 1,000 pages
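As a rough way to budget the Amazon side of the cost, the sketch below multiplies a page count by the per-1,000-page rates listed above. The function name `estimate_cost` is hypothetical, and the rates are simply copied from this page, so check them against current AWS pricing before relying on the result.

```python
from decimal import Decimal

# Per-1,000-page rates in USD, copied from the list above (subject to change).
RATE_PER_1000 = {
    "AWSText": Decimal("1.50"),
    "AWSForms": Decimal("50.00"),
    "AWSInvoice": Decimal("10.00"),
}


def estimate_cost(engine, pages):
    """Estimated AWS charge in USD for processing `pages` pages with `engine`."""
    cost = RATE_PER_1000[engine] * pages / 1000
    return cost.quantize(Decimal("0.01"))  # round to whole cents


for engine in RATE_PER_1000:
    print(f"{engine}: ${estimate_cost(engine, 2500)} for 2,500 pages")
```

This covers only the AWS usage fees; the SimpleIndex license is a separate purchase.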
Textract Demo Video
See what SimpleIndex with Textract can do and learn the configuration basics with this video.
Automatic Web Image Optimization
There are a number of tools that optimize images for websites. Most of them run on your web server and optimize images after you upload them.
But what if you could take images that are linked to product data in a database and automatically rename them using things like item description and SKU while you optimize them? This would give you additional Search Engine Optimization benefits that you can’t get from standard compression and web optimization tools.
Why is it important to optimize web images?
These days, when most of the things in life are happening online, it is extremely important to make sure that your web presence works well. And images are an invaluable asset to it.
On the one hand, images need to look good to attract audiences and customers; on the other, they need to be light enough for your page to load in less than 2 seconds, even on slower cellular networks.
More importantly, Google needs to be happy with the images you use. There is no need to upset the algorithm so that your page ends up at the bottom of the search results.
There are many ways you can optimize your images. Some of them include plugins on your website, but mostly it is manual work with each image individually. SimpleIndex offers you a way to optimize your web images AUTOMATICALLY!
Size Matters
Probably the most obvious and straightforward task is to make images lighter without losing quality. Many online services and graphic design programs can do that. However, many of them process images slowly, one at a time, while SimpleIndex recompresses images in bulk, handling a folder of any size in a single pass.
SimpleIndex also allows you to automatically resize your images to a standard width or height in pixels.
It also recompresses and changes the format of images to meet more web-friendly standards. By doing this before images are uploaded to the server, you can avoid sending file formats the server can’t read, or press-ready image files that are 100 times bigger than they need to be.
Point it to a folder with all of your product images and SimpleIndex will process all of them at once, changing size, compressing images to make them lighter, and changing the image format to a more standard form, while retaining backup copies of the originals.
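The arithmetic behind resizing to a standard width can be sketched in a few lines. This is only an illustration of keeping the aspect ratio intact, not how SimpleIndex implements it; the function name `fit_to_width` and the choice never to enlarge small images are assumptions made for the example.

```python
def fit_to_width(width, height, target_width):
    """Return (new_width, new_height) scaled to target_width, keeping proportions.

    Assumption for this sketch: images already narrower than the target are
    left alone rather than enlarged.
    """
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    if width <= target_width:
        return width, height
    # Scale the height by the same factor as the width, at least 1 pixel tall
    new_height = max(1, round(height * target_width / width))
    return target_width, new_height


print(fit_to_width(4000, 3000, 800))  # a press-ready photo scaled for the web
print(fit_to_width(640, 480, 800))    # already small enough, left unchanged
```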
Names Matter, Too
When you research web image optimization, most sources give advice on how to make your images lighter, how to resize them, or how to change their format. However, none of these tools can be used to rename your files automatically using Search Engine Optimized filenames.
For SEO purposes, it is very important that your image file name has a good description, includes keywords you want to use, and looks unique enough. Optimizing file names will give a significant boost compared to defaults like “image1.jpg” that convey no useful information. This will not only help put your images at the top of related image searches, but the keywords in the filenames will also boost the rank of your page in the main search results.
Screen readers, the programs that help people with visual impairments access web pages, automatically read the content of your pages aloud and will just say “image1.jpg”. It would be so much better if an image had a descriptive name instead. Accessibility has been an important and growing part of SEO since 2021. Changing the filenames of your images will give you a boost on the accessibility front and points for better page ranking.
Changing file names, unfortunately, is mostly done manually and is a very tedious process. But with SimpleIndex, you can change image file names automatically!
Let’s look at a common example workflow:
- Vendor provides a spreadsheet with a list of product SKUs, descriptions, and associated image files. However, the files are named with arbitrary names like 0001.jpg or Image1.png, etc.
- Product data is loaded into a master database with data from multiple vendors and references to the provided image filenames.
- SimpleIndex imports the images from a folder, then looks up the matching product record in the master database to get the SKU, description, and other details.
- SimpleIndex renames the image using a designated file naming scheme, such as BrandName_SKU_Description.jpg
- Filenames are automatically normalized to remove any special characters not allowed by the webserver. Long descriptions are
- SimpleIndex updates the database record to reflect the new filenames so the link is maintained.
- Product data is exported from the master database to the storefront, now with SEO image filenames.
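The lookup, rename, and database update steps above can be sketched as follows. This is an illustration under stated assumptions, not SimpleIndex's own logic: the `products` table, its columns, and the helper names `slug`, `seo_filename`, and `rename_plan` are all hypothetical, the sample product is made up, and the sketch only computes the new names and updates the records (renaming the files on disk is left out).

```python
import os
import re
import sqlite3
import unicodedata


def slug(text):
    """Strip accents and replace runs of non-alphanumeric characters with '-'."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-")


def seo_filename(brand, sku, description, ext=".jpg"):
    """Build a name following the BrandName_SKU_Description.jpg scheme."""
    parts = [slug(brand), slug(sku), slug(description)]
    return "_".join(p for p in parts if p) + ext.lower()


# Hypothetical master database: one row per product, linked to the vendor's image name.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (sku TEXT, brand TEXT, description TEXT, image TEXT)")
db.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    ("AB-1234", "Acme", "Stainless Steel Water Bottle, 24 oz. (Blue)", "0001.jpg"),
)


def rename_plan(db):
    """Look up each product, compute its new image name, and update the record.

    Returns {old_name: new_name} so the files themselves can be renamed to match.
    """
    plan = {}
    rows = db.execute("SELECT sku, brand, description, image FROM products").fetchall()
    for sku, brand, description, image in rows:
        ext = os.path.splitext(image)[1]
        new_name = seo_filename(brand, sku, description, ext)
        db.execute("UPDATE products SET image = ? WHERE image = ?", (new_name, image))
        plan[image] = new_name
    return plan


print(rename_plan(db))
```

Updating the record in the same step as computing the name keeps the link between product and image from going stale.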
Move Over Cloud: Here Comes Sunshine!
Sunshine Software Saves on Subscriptions
Are cloud-based solutions raining on your IT budget? Are you tired of paying monthly fees for utilities that used to have a simple, one-time price? Do you long for software you can call your own?
Simple Software sweeps the cloud away with Sunshine Software solutions for document and data capture. While other capture platforms have moved to the cloud and require hefty subscription fees that increase the more documents you process, SimpleIndex gives you a license that lasts forever and lets you process as many files as you want on the licensed workstation.
We use the term Sunshine Software because it’s catchy and makes us smile, but also because there is no good term for this type of software. It is defined as the opposite of Cloud Software, but before the mid-2000s it was just known as “Software”. Later, with the development of Cloud Computing, terms such as On-Prem, On-Premise, On-Site, Offline, Local, Native, Self-Hosted, In-House, and so on, have been used to describe this type of software. But Sunshine Software is more than just software that you run on your own computers: it is software that you can own instead of rent, and that keeps working even if you pay no subscription fees.
It has several different aspects:
- Local Processing: Sunshine Software is installed and operates on the organization’s own servers or computers. This means that sensitive or confidential data doesn’t leave the organization’s premises, which can be a critical requirement for companies with strict data security and privacy regulations.
- Network Independence: Sunshine Software doesn’t rely on an internet connection for processing, making it suitable for environments with limited or unreliable internet connectivity.
- Payment Structure: Sunshine Software is “pay once and use it forever”. Cloud Computing requires monthly or annual payments, which make its Total Cost of Ownership (TCO) much higher than it seems at first glance.
The cloud is just someone else’s server that you have no control over.
Local Processing
On-premise OCR has the benefit that you have total control. This can also be its disadvantage: if it does not work, you will be the one to deal with it. But that disadvantage matters far more with enterprise software that requires multiple servers and detailed configuration. Standalone utilities like SimpleIndex, which can be configured in just a few hours and require little support, don’t offer the soft IT cost savings from moving to the cloud that would justify its additional expense.
However, one of the biggest reasons to choose Sunshine Software for OCR is regulatory compliance. We often find that the process of certifying that a comparable cloud-based solution will comply with all required regulations and SLAs requires more effort than it takes to fully configure a SimpleIndex workflow on-premise. By avoiding the cloud, you avoid all the legal liabilities of having all your documents sent over the internet and stored on a third party server.
The federal laws and statutes commonly implicated in cloud-based service contracts cover data privacy and security for financial transaction information, healthcare information, and the like. In the U.S., these include:
- The Gramm-Leach-Bliley Act, which applies to financial services;
- The Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health Act (HITECH Act), which apply to protected health information;
- The Family Educational Rights and Privacy Act (FERPA), which applies to educational institutions and their vendors; and
- Federal and state laws and regulations that apply generally to third-party service providers in given industries, such as:
  - Third-party risk guidance for the financial services industry from the Federal Reserve, the Office of the Comptroller of the Currency (OCC), the Financial Industry Regulatory Authority (FINRA), the New York State Department of Financial Services (NYDFS), and other regulatory agencies; and
  - FERPA, which, in addition to governing data privacy, also governs the scope of permitted outsourcing in higher education.
Network Independence
Yes, today it is hard to find a place with no internet access, but in some countries and rural areas it is often not reliable or affordable. Document processing requires significant bandwidth, so a cloud-based solution isn’t viable when the Internet isn’t reliable.
Having a network independent OCR also allows you to meet the most stringent security requirements, where documents can only be stored locally or on a secure LAN, and workstations cannot be connected to the Internet.
Payment Structure
While Sunshine OCR provides greater control and security, it typically requires more substantial upfront investments in hardware, software, and IT infrastructure, compared to cloud-based OCR services. Additionally, organizations are responsible for maintenance, updates, and support.
However, many cloud solutions make you sign contracts that charge you annually. Because of that, the accumulated costs of cloud solutions are often higher than they seem. On-premise Sunshine solutions, by contrast, have a one-time fixed payment.
Read this study on Total Cost of Ownership for OCR solutions to explore the long-term costs.
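The basic comparison between a one-time payment and a recurring fee is simple arithmetic, sketched below. All prices here are hypothetical placeholders, not SimpleIndex or cloud vendor pricing, and the sketch deliberately ignores hardware, maintenance, and per-page charges, which a real TCO analysis would include.

```python
import math


def break_even_year(one_time_price, subscription_per_year):
    """First year in which cumulative subscription fees reach the one-time price."""
    return math.ceil(one_time_price / subscription_per_year)


def cumulative_costs(one_time_price, subscription_per_year, years):
    """Return [(year, one_time_total, subscription_total), ...] for each year."""
    return [
        (year, one_time_price, subscription_per_year * year)
        for year in range(1, years + 1)
    ]


# Hypothetical figures chosen only to show the shape of the comparison.
ONE_TIME = 1500
PER_YEAR = 600

print("Break-even in year", break_even_year(ONE_TIME, PER_YEAR))
for year, owned, rented in cumulative_costs(ONE_TIME, PER_YEAR, 5):
    print(f"Year {year}: one-time ${owned} vs subscription ${rented}")
```

The longer the software stays in use past the break-even year, the wider the gap becomes.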
SimpleIndex: Sunshine Software for OCR
SimpleIndex is the Sunshiniest OCR software of them all! The main license model is a simple, unlimited use, workstation license, the way it has been for nearly 20 years.
SimpleIndex has some optional Cloud features, such as integration with AWS Textract and ChatGPT.
Server-based OCR also requires an annual volume-based processing license. This license does not expire, and the page counter resets to zero each year automatically.
There is also a subscription option for SimpleIndex that lowers the up-front costs, but you can always choose to pay once and use it forever.
Technical Support and software upgrades require an annual maintenance subscription, but this only gives you access to our support team and doesn’t impact the software license.
Handwriting Recognition Software
HANDWRITING RECOGNITION OPTIONS IN SIMPLEINDEX
SimpleIndex offers one of the easiest and lowest cost solutions for handwriting recognition and forms processing available anywhere. It is capable of recognizing printed and cursive handwriting, as well as traditional forms with letter boxes or combs.
Modern AI technology has dramatically improved the quality of handwriting recognition. In the past, software was only able to read clearly printed text where each character has space separating it from the others. And most data capture solutions were designed for very large enterprises who process thousands of forms per day.
Cloud-based OCR solutions employ AI and massive training datasets to allow recognition of any kind of text with remarkable accuracy. And their consumption-based pricing means that you can read 1,000 pages for the same price per page as 1,000,000, dramatically lowering the base cost for solutions.
SimpleIndex also offers a fixed-cost ICR solution for handprint and forms with no additional costs for processing volume.
SIMPLEINDEX WITH FINEREADER ICR
The FineReader OCR Engine offers handprint recognition designed for forms processing. It is optimized for hand-filled forms that use letter boxes or combs to ensure each letter is separated. FineReader will also work with underlined text as long as it is printed.
ICR features are enabled with the SimpleIndex Professional license, available as a workstation, server, subscription, or concurrent license.
SIMPLEINDEX WITH CLOUD OCR
The Cloud OCR feature enables the Amazon Textract OCR engine, which can read unconstrained print and cursive handwriting with surprisingly good accuracy. It can be purchased separately or included with SimpleIndex Professional.
SimpleIndex Cloud OCR makes it easy to leverage Amazon Textract in your document processing workflow.
Textract is only available as an API, requiring custom programming to make it work. SimpleIndex turns it into a complete document and data capture application designed for easy batch processing on a workstation or server.
Extract text from typed or handwritten documents automatically, even on unconstrained handprint and cursive writing. Automatic extraction of form fields lets you identify key values without templates or training. Accounts payable invoice and receipt processing is also included.
Captured data can be used to organize files into folders for cloud storage apps, save to a CSV, XML or JSON file, export to a database, upload to a document management system, perform full-text searching, or even create bookmarks in PDF files.
LEARN MORE ABOUT HANDWRITING RECOGNITION
- Handprint Recognition wiki page for documentation and configuration settings
- Download the Demo
- Handprint Recognition Guide on SimpleOCR.com
- Watch the Cloud OCR Demo Video
WHAT IS ICR HANDPRINT RECOGNITION?
ICR stands for Intelligent Character Recognition and is the technology that allows software to interpret hand printed text on scanned images.
Forms Processing Software uses ICR technology to automate data entry tasks involving hand-filled surveys, applications and forms. It provides interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.
Forms Processing also includes OCR (Optical Character Recognition) technology to recognize machine printed text, and OMR (Optical Mark Recognition) for check boxes and multiple choice bubbles.
Traditional forms processing relies on constrained handwriting, where boxes on the form force the filler to write with separated, printed block characters. Modern AI technology has dramatically improved the ability to recognize unconstrained handwriting and cursive script. Hand printed notes, free-form comment blocks, non-segmented fields, historic documents, and more can now be converted to text with acceptable accuracy, where this was impossible just a few years ago.
Amazon Textract integration into SimpleIndex
You can learn more about Amazon Textract integration into SimpleIndex here.