Here is 2013’s first version of OCRFeeder, version 0.7.11.
This version fixes a number of bugs, especially some that affected saving and loading projects.
It also brings some small improvements, such as loading multiple images at once and choosing the OCR engine in the command-line version of OCRFeeder (using the -e option).
Now for the main feature: I developed something that a good number of users had requested, an easy way to choose the language for the OCR engine.
When I developed OCRFeeder, I wanted to make it easy for users to run system-wide OCR engines on the results of OCRFeeder’s layout analysis, but I also wanted it to remain powerful. That’s why the engines are configured in a general, abstract way, as if they were being called from the command line.
Some OCR engines support setting the language to get better recognition. Users could already set an engine’s language manually in the OCR editor dialog, but they wanted a nice drop-down list of languages instead.
This was a real challenge: keeping the old, flexible configuration while also offering a high-level way of choosing the language.
So here is how it works. There is a new special argument keyword, $LANG, which is replaced by the value of the new “language argument” field followed by the currently set language. Since engines support different languages (or none at all) and call them by different names (e.g. Tesseract expects “por” for Portuguese, while others may expect “pt”), there is another new field called “languages”, which should map each ISO 639-1 language code to the name the engine expects for that language, as shown in the screenshot.
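To make the substitution concrete, here is a minimal Python sketch (Python because OCRFeeder itself is written in it). It is not OCRFeeder’s actual code: it assumes the “languages” field is written as comma-separated code:name pairs such as en:eng,pt:por, the argument string is made up, and the function names parse_languages and expand_lang are hypothetical.

```python
# Minimal sketch of the $LANG substitution, NOT OCRFeeder's real code.
# ASSUMPTION: the "languages" field is stored as "code:name" pairs
# separated by commas, e.g. "en:eng, pt:por" (ISO 639-1 -> engine name).

def parse_languages(field):
    """Turn 'en:eng, pt:por' into {'en': 'eng', 'pt': 'por'}."""
    mapping = {}
    for pair in field.split(","):
        code, _, engine_name = pair.strip().partition(":")
        code, engine_name = code.strip(), engine_name.strip()
        # Skip empty or malformed entries (no code or no engine name).
        if code and engine_name:
            mapping[code] = engine_name
    return mapping


def expand_lang(arguments, language_argument, languages, current_language):
    """Replace $LANG with '<language argument> <engine language name>'.

    If the engine has no name for the current language, or no language
    argument is configured, $LANG is replaced by an empty string so the
    engine falls back to its own default language.
    """
    engine_name = languages.get(current_language)
    if engine_name is None or not language_argument:
        replacement = ""
    else:
        replacement = f"{language_argument} {engine_name}"
    return arguments.replace("$LANG", replacement)


if __name__ == "__main__":
    langs = parse_languages("en:eng, pt:por, de:deu")
    # Hypothetical argument string; "-l" is used as the language argument.
    print(expand_lang("input.tif output $LANG", "-l", langs, "pt"))
    # → input.tif output -l por
```

Under the same assumption, the keys of the parsed map are exactly the languages that a combo box would need to mark as recognized by the selected engine.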
To show the languages, the areas editor has a new tab called Misc (for lack of a better name for a tab that will hold more things in the future) containing the languages combo box. As seen in the screenshot, this combo box shows a check mark next to each language that the currently selected engine recognizes.
There is also a new default language setting in the preferences dialog; the first time the application runs, it is set according to the user’s locale.
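How the locale is turned into a language code isn’t described here; one plausible approach, sketched below in Python under that assumption, is to read the usual POSIX locale variables and keep only the two-letter ISO 639-1 part. The function names iso639_1_from_locale and default_language, as well as the fallback to English, are hypothetical.

```python
import os

# Hypothetical sketch: derive a default ISO 639-1 code from the locale.
# Not OCRFeeder's real implementation.


def iso639_1_from_locale(locale_name, fallback="en"):
    """Extract the ISO 639-1 code from a POSIX locale name.

    'pt_PT.UTF-8' -> 'pt', 'de_DE@euro' -> 'de'. Values such as 'C',
    'POSIX', an empty string, or three-letter codes (e.g. 'ast_ES')
    return `fallback`, since ISO 639-1 codes have exactly two letters.
    """
    if not locale_name:
        return fallback
    # Drop the encoding ('.UTF-8'), the modifier ('@euro'), then the
    # territory ('_PT'), keeping only the language part.
    base = locale_name.split(".")[0].split("@")[0].split("_")[0]
    if len(base) != 2 or not base.isalpha():
        return fallback
    return base.lower()


def default_language(environ=os.environ, fallback="en"):
    """Use the first non-empty variable in the usual POSIX precedence
    for messages: LC_ALL, then LC_MESSAGES, then LANG."""
    for var in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = environ.get(var)
        if value:
            return iso639_1_from_locale(value, fallback)
    return fallback


if __name__ == "__main__":
    print(default_language({"LANG": "pt_PT.UTF-8"}))  # → pt
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.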
One thing to keep in mind: even though Tesseract supports an extensive list of languages, users must have the corresponding language packages installed on their distros; otherwise, recognition will of course fail.
To finish, on a note related to my recent job search: I have spent this week in San Francisco getting to know some people from an exciting start-up, and despite the jet lag I managed to finish this release, so I can now say that at least part of OCRFeeder was designed and developed in California 😛