Tesseract

Tesseract is the open-source OCR engine included with all versions of SimpleIndex.

FineReader OCR, by contrast, is enabled only with an OCR or Professional License.

The current release includes version 3.04 of the Tesseract engine.

Tesseract provides adequate recognition speed and accuracy on good-quality documents.

Text Output Formats with Tesseract:

  • Text (txt)
  • PDF (pdf)
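
For readers who want to see how these two formats correspond to the engine itself, here is a minimal sketch. It assumes the standalone `tesseract` command-line program is installed separately and is on the PATH; SimpleIndex normally drives the engine for you, so this is not how SimpleIndex invokes it. The function name `build_tesseract_command` and the file names are hypothetical.

```python
from typing import List

# Output formats named in the list above.
SUPPORTED_FORMATS = ("txt", "pdf")


def build_tesseract_command(image_path: str, output_base: str,
                            fmt: str = "txt", lang: str = "eng") -> List[str]:
    """Build an argument list for the standalone `tesseract` CLI.

    Assumption: `tesseract` is installed and on the PATH.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {fmt!r}")

    # Basic form: tesseract <image> <output base> -l <language code>
    # With no config file, Tesseract writes plain text to <output_base>.txt.
    cmd = ["tesseract", image_path, output_base, "-l", lang]

    if fmt == "pdf":
        # The `pdf` config file switches the output to <output_base>.pdf.
        cmd.append("pdf")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Tesseract is available, for example `build_tesseract_command("scan.tif", "scan", fmt="pdf")`.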

Tesseract also supports some languages that FineReader and other commercial engines do not, for example Indic languages such as Hindi and Tamil.

To enable all languages, install the Language Pack through the Global Settings Wizard.

The full list of languages supported by Tesseract follows.

  • Afrikaans
  • Arabic
  • AzeriCyrillic
  • Belarusian
  • Bengali
  • Bulgarian
  • Catalan
  • Czech
  • ChineseSimplified
  • ChineseTraditional
  • Cherokee
  • Danish
  • German
  • Greek
  • English
  • EnglishMiddle
  • Esperanto
  • Estonian
  • Basque
  • Finnish
  • French
  • Frankish
  • FrenchMiddle
  • Galician
  • GreekAncient
  • Hebrew
  • Hindi
  • Croatian
  • Hungarian
  • Indonesian
  • Icelandic
  • Italian
  • ItalianOld
  • Japanese
  • Kannada
  • Korean
  • Latvian
  • Lithuanian
  • Malayalam
  • Macedonian
  • Maltese
  • MalayMalaysian
  • DutchStandard
  • NorwegianBokmal
  • Polish
  • PortugueseStandard
  • Romanian
  • Russian
  • Slovak
  • Slovenian
  • Spanish
  • SpanishOld
  • Albanian
  • SerbianLatin
  • Swahili
  • Swedish
  • Tamil
  • Telugu
  • Tagalog
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese
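
The names in the list above are display names; the Tesseract engine itself identifies languages by short codes such as `eng` or `hin`, and accepts several codes joined with `+`. The sketch below shows that idea for a handful of entries. The pairing of these display names with these codes is an assumption made for illustration, not something taken from SimpleIndex, and `LANGUAGE_CODES` and `language_argument` are hypothetical names.

```python
from typing import Dict, Iterable

# Assumed pairing of display names (from the list above) with Tesseract
# language codes. Only a few entries are shown; this table is illustrative.
LANGUAGE_CODES: Dict[str, str] = {
    "English": "eng",
    "German": "deu",
    "Hindi": "hin",
    "Tamil": "tam",
    "ChineseSimplified": "chi_sim",
}


def language_argument(names: Iterable[str]) -> str:
    """Return a value for Tesseract's -l option, e.g. 'eng+hin'."""
    codes = []
    for name in names:
        if name not in LANGUAGE_CODES:
            raise KeyError(f"no code known for language: {name!r}")
        codes.append(LANGUAGE_CODES[name])
    # Tesseract accepts multiple languages joined with '+'.
    return "+".join(codes)
```

For a document that mixes English and Hindi, `language_argument(["English", "Hindi"])` yields a single combined value for the `-l` option.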