Archive for the ‘ocrfeeder’ Category

RPM packages of OCRFeeder for Fedora

Wednesday, December 21st, 2011

If you’re one of the people waiting for RPM packages of OCRFeeder for Fedora, rejoice!
Juan, my friend and coworker at Igalia, has cooked an RPM spec and created an OCRFeeder repository for Fedora 15 and 16.

To add this repository to Fedora 16, simply download this file and move it to /etc/yum.repos.d/.
Alternatively you can download the RPMs directly from:
OCRFeeder RPM for Fedora 15
OCRFeeder RPM for Fedora 16

Important: These Fedora packages haven’t been thoroughly tested, so there might be some small issues (like the icons not being installed in the right place). I’m no longer using Fedora myself (I’ve switched to Debian), so please report any issues you find.

OCRFeeder 0.7.7 released

Saturday, December 10th, 2011

After more than 4 months, I am finally releasing OCRFeeder’s new version (its last release was in August, just before the DesktopSummit).
The reason for the delay, apart from some vacation in Berlin and Portugal and being busy at Igalia, was that this release brings deep internal changes.

The big issue

The problem with developing such an application from scratch in just a few months while also worrying about writing a thesis is that you don’t care much about design and performance. So from 2008 until now, OCRFeeder has suffered from a big memory consumption problem: it would create a reviewer (this is what I call the place where you do stuff on the images) for every image loaded and keep all of them in memory, so, depending on the number of images and their size, the application would eventually crash.
I assumed that nobody had complained about this for so long because people used the application in simpler ways and not for full books, but now it seems that some institutions are interested in OCRFeeder and there have been complaints and bugs filed (gb#637599 and db#646605).

This was fixed by keeping at most 5 reviewer instances in a cache. When a new image is selected, the oldest reviewer is dropped and the new one is added to the cache. Selecting a new image gets a bit slower, but the trade-off is worth it IMHO. In future changes I’ll probably make the number of reviewers configurable in some way.
All content areas now also share a single editor instance instead of each one having its own.
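
The caching idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not OCRFeeder’s actual code: the ReviewerCache class and the create_reviewer factory are hypothetical names, and the sketch evicts by insertion order (the oldest reviewer created goes first).

```python
from collections import OrderedDict


class ReviewerCache:
    """Keep at most `max_size` reviewers, evicting the oldest one first.

    Hypothetical sketch; the names are not taken from OCRFeeder's code.
    """

    def __init__(self, create_reviewer, max_size=5):
        # create_reviewer: assumed factory that builds a reviewer for an image path
        self._create_reviewer = create_reviewer
        self._max_size = max_size
        self._reviewers = OrderedDict()

    def get(self, image_path):
        reviewer = self._reviewers.get(image_path)
        if reviewer is None:
            # Cache miss: building the reviewer again is the slower path.
            reviewer = self._create_reviewer(image_path)
            self._reviewers[image_path] = reviewer
            if len(self._reviewers) > self._max_size:
                # Drop the entry that was inserted first (the oldest reviewer).
                self._reviewers.popitem(last=False)
        return reviewer

    def __len__(self):
        return len(self._reviewers)
```

Making max_size a constructor argument is one simple way the limit could later be exposed as a setting.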

I was able to load more than 500 images of ~4.5 MB each and it was still usable, so hopefully this will improve the experience for users who had these problems.

Other changes

Another change is that OCRFeeder now stores all its temporary files in a dedicated folder under the system’s temporary folder (usually /tmp). Deleting this folder when the application quits guarantees that no temporary files are left behind (as sometimes happened before). Related to these changes, I’ve also decided to remove the option of choosing the temporary folder. Python should already know what the system’s temporary folder is, and having such an option would make it look like Windows software from 1998.
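
A minimal sketch of this approach using only Python’s standard library could look like the code below. The function name and the "ocrfeeder_" prefix are assumptions made for illustration, not necessarily what OCRFeeder itself uses.

```python
import atexit
import os
import shutil
import tempfile


def create_session_temp_dir(prefix="ocrfeeder_"):
    """Create a dedicated folder under the system's temporary folder.

    The folder is removed, with everything in it, when the interpreter exits.
    """
    # mkdtemp() creates the folder inside tempfile.gettempdir() by default.
    temp_dir = tempfile.mkdtemp(prefix=prefix)
    # ignore_errors avoids a failure at exit if the folder is already gone.
    atexit.register(shutil.rmtree, temp_dir, ignore_errors=True)
    return temp_dir


session_dir = create_session_temp_dir()
# Every temporary file of the session is then created inside this folder.
page_image_path = os.path.join(session_dir, "page_1.png")
```

Since everything lives in one folder, a single recursive delete is enough and there is no need to track individual files.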

As usual, some code cleaning and bug fixing was done and I would like to thank the awesome GNOME i18n team and everyone who sent their contributions.
Thanks to my friend Berto, you can also expect an OCRFeeder Debian package in a repository near you soon.

For a more detailed list of changes, check out the NEWS file.

Source Tarball
Git
Bugzilla

OCRFeeder 0.7.6 and DesktopSummit 2011

Friday, August 5th, 2011

Just in time for the Desktop Summit 2011, I’ve released the 0.7.6 version of OCRFeeder.

The interesting new feature in this version is that OCRFeeder can now export to PDF. When exporting the pages to PDF, users have two choices: “a PDF from scratch” or “a searchable PDF”. A PDF from scratch means that the recognized text is written into a new PDF using ReportLab, whereas a searchable PDF shows the whole original picture with invisible text overlaid on it in order to make it searchable.
The PDF export still needs some polishing but I wanted to get it out there as soon as possible for the people who need it.
Check out these examples:

(page loaded in OCRFeeder and recognized automatically)

(exported PDF from scratch)

(exported searchable PDF with selected text)

This version also fixes issues when recognizing grayscale pictures, as well as a bug that changed the mouse cursor when it was over a page’s right margin.

I’ve also added separators to divide the Document’s submenus so they are grouped correctly and I’ve made ODT the first choice in the list of export formats again, after it had been changed by mistake.

As usual, the incredible team of translators is doing a great job and apart from the updated translations, OCRFeeder now comes in Catalan (with the Valencian option as well) and in Greek.

DesktopSummit

No, once again, OCRFeeder’s talk wasn’t approved by the Desktop Summit’s organization. Considering that I’ve presented it at some well-known conferences (LinuxTag, GUADEC ES and twice at FOSDEM), it makes me a bit sad that I haven’t yet been able to present this unique project at the conference of the desktop it targets, but let’s hope it makes it next year.

Still, Igalia is sponsoring me again to attend the DesktopSummit, so, if you’re interested in OCRFeeder or other projects I’m involved in, let me know!

See you in Berlin!

LinuxTag 2011 and OCRFeeder 0.7.5

Wednesday, May 18th, 2011

Last week, after a delayed flight that shortened my trip by one day, I finally arrived in the fascinating city of Berlin to attend LinuxTag.
This was my first time at this event and I really liked it. The event’s program was very interesting; too bad my German still isn’t good enough to fully understand the presentations in German (which made up about half of the program or more). There were also booths with interesting stuff going on, from companies to the most well-known Open Source projects, and also some alternative things like a lockpicking hands-on.

It was a great place to talk to people and become more aware of what’s going on in Germany, and a lot seems to be going on.

On Wednesday afternoon, I presented OCRFeeder and couldn’t be happier after all the feedback I got in the questions session and afterwards. Probably a couple of bugs that were filed after the event have to do with that :)

You can find the slides here.


(me, presenting OCRFeeder at LinuxTag 2011)



OCRFeeder’s new release

Yesterday I finally finished the latest OCRFeeder version, 0.7.5.

Here are the highlights:

* It is possible to edit the content boxes’ bounds by dragging their edges or corners;
* When selecting a content box using the menu or keyboard shortcuts, its text area is now focused automatically. This was suggested by Joanmarie to improve usability for visually impaired users;
* Added the missing dependency on the “sane” module;
* Changed some mnemonics in the menu to avoid clashes (thanks to Łukasz Jernaś);
* Prevent problems when adding image paths that do not exist (from the command line);
* Reset the OCR engine when it doesn’t exist. This bug happened when the settings pointed to an engine that no longer existed (if you copied the conf folder to another machine without the engines, for example) and it would prevent the automatic recognition from doing the OCR step.
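
To illustrate the last item, the check could be sketched like this. It is a hypothetical example and not OCRFeeder’s real code: resolve_engine and its arguments are invented names, and falling back to the first available engine is an assumption about what “reset” means here.

```python
def resolve_engine(configured_name, available_engines):
    """Return the engine to use for automatic recognition.

    configured_name: engine name stored in the settings (may be stale).
    available_engines: names of the engines found on this machine.
    """
    if configured_name in available_engines:
        return configured_name
    # The configured engine is gone: reset to the first available engine,
    # or to None when no engine is installed at all.
    return available_engines[0] if available_engines else None
```

With settings copied from another machine, resolve_engine("missing-engine", ["ocrad"]) picks "ocrad" instead of leaving the OCR step without an engine.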

For other news, like the always amazing translation work, check out the NEWS file.

Source tarball
Git
Bugzilla

Going to LinuxTag

Monday, May 9th, 2011

That’s right, Igalia is sponsoring me to attend LinuxTag so tomorrow I’m flying to the wonderful city of Berlin.

I am also giving a presentation about OCRFeeder there and I’m looking forward to seeing how it turns out, because much of the feedback I’ve received about OCRFeeder comes from German users.
Another Igalian, Diego, is also presenting NavalPlan there, so if you need project planning and resource management software be sure to attend his talk.

We’re usually also friendly people so if you wanna grab a beer and currywurst let us know!