Supporting Wikipedia again
Friday, December 16th, 2011
It is one of the greatest accomplishments of humanity and we should be proud of it:
If Wikipedia is useful to you, click the link above and make a donation to help support it.
After more than 4 months, I am finally releasing a new version of OCRFeeder (its last release was in August, just before the DesktopSummit).
The reason for the delay, apart from some vacation in Berlin and Portugal and being busy at Igalia, was that this release brings deep internal changes.
The big issue
The problem with developing such an application from scratch in just a few months, while also worrying about writing a thesis, is that you don’t care much about design and performance. So from 2008 until now, OCRFeeder has suffered from a big memory consumption problem: it would create one reviewer (this is what I call the place where you do stuff on the images) per loaded image and keep all of them in memory, so depending on the number of images and their size, the application would eventually crash.
I assumed that since nobody had complained about that for so long, it was probably because people used the application in simpler ways and not for full books. Now, however, it seems that some institutions are interested in OCRFeeder, and there have been complaints and bugs filed (gb#637599 and db#646605).
This was fixed by keeping at most 5 reviewer instances. When a new image is selected, the oldest reviewer is dropped and the new one is added to the cache. Selecting a new image gets a bit slower, but the trade-off is worth it IMHO. In future changes I’ll probably make the number of reviewers configurable in some way.
Each of the content areas now also shares an editor instance instead of each one having a dedicated one.
I was able to load more than 500 images of ~4.5 MB each and it was still usable, so hopefully this will improve the experience for users who had these problems.
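The reviewer cache described above can be sketched roughly as follows. This is only an illustration of the idea, not OCRFeeder’s actual code: the `ReviewerCache` class and the `make_reviewer` factory are hypothetical names, and “oldest” is assumed here to mean the reviewer that was created first.

```python
from collections import OrderedDict


class ReviewerCache:
    """Keeps at most `max_size` reviewers alive, evicting the oldest first.

    Hypothetical sketch: `make_reviewer` stands in for whatever builds a
    reviewer widget from an image path in the real application.
    """

    def __init__(self, make_reviewer, max_size=5):
        self._make_reviewer = make_reviewer
        self._max_size = max_size
        self._reviewers = OrderedDict()  # insertion order doubles as age

    def get(self, image_path):
        if image_path in self._reviewers:
            # Cache hit: reuse the reviewer that is still in memory.
            return self._reviewers[image_path]
        # Cache miss: the slower path, since the reviewer has to be rebuilt.
        reviewer = self._make_reviewer(image_path)
        self._reviewers[image_path] = reviewer
        if len(self._reviewers) > self._max_size:
            # Drop the oldest reviewer so memory use stays bounded.
            self._reviewers.popitem(last=False)
        return reviewer

    def __contains__(self, image_path):
        return image_path in self._reviewers

    def __len__(self):
        return len(self._reviewers)
```

With something like this, selecting any of the last 5 images is instant, while going back to an older one pays the cost of recreating its reviewer. Making `max_size` a preference would be one way to expose the limit to users.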
Other changes
Another change is that OCRFeeder now stores all its temporary files in a dedicated folder under the system’s temporary folder (usually /tmp). Deleting this folder when the application quits guarantees that no temporary files are left behind (as sometimes happened). Related to these changes, I’ve also decided to remove the option of choosing the temporary folder. Python should already know where the system’s temporary folder is, and having such an option would make it look like Windows software from 1998.
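As a rough illustration of this approach (not the actual OCRFeeder code), Python’s standard library can do all of it. The function names and the `ocrfeeder_` prefix below are assumptions made for the example.

```python
import atexit
import os
import shutil
import tempfile


def create_session_temp_dir(prefix="ocrfeeder_"):
    """Create a private folder under the system's temporary folder.

    The folder is registered for removal when the interpreter exits, so
    everything written inside it disappears with the application.
    """
    # mkdtemp() creates the folder under tempfile.gettempdir() by default.
    path = tempfile.mkdtemp(prefix=prefix)
    # ignore_errors=True keeps the exit quiet if the folder is already gone.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path


def temp_file_path(session_dir, name):
    """Build a path for a temporary file inside the session folder."""
    return os.path.join(session_dir, name)
```

Since every temporary file lives under the one folder returned by `create_session_temp_dir()`, a single `shutil.rmtree` call at exit is enough to clean up, and there is no need to ask the user where that folder should be.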
As usual, some code cleaning and bug fixing was done and I would like to thank the awesome GNOME i18n team and everyone who sent their contributions.
Thanks to my friend Berto, you can also expect an OCRFeeder Debian package in a repository near you soon.
For a more detailed list of changes, check out the NEWS file.
As promised before, here is the first release of SeriesFinale for MeeGo Harmattan.
This summer Micke Prag, a fellow programmer from Sweden, contacted me because he was starting a port of SF for Harmattan. By then I still didn’t have an N950, having missed the deadline for the first developers program. Later, when the second developers program was launched, I finally managed to get one. At that point, even though I already had my Samsung Galaxy S (yes, with Android), I still wanted to have a port of SeriesFinale, as I had received many emails asking for it. So I started from Micke’s code and finally here it is!
The Harmattan port
It may be obvious, but this version is not written in PyGTK/PyMaemo. It uses part of the “old” Python backend, which was changed to play well with the new UI code written in QML.
The Ovi Store
It was also the first time I published something on Nokia’s Ovi Store and the process took around 2 weeks before it finally got approved (it was rejected twice before due to weird stuff like “they” thinking bugs.maemo.org was not a good place to report issues or the fact that an application that says it works only with English US is eligible only for the USA, not for all the countries…).
The future
I really like the N9/N950. The user experience is awesome and I believe this was the phone that could really compete with the iPhone and Android. Unfortunately someone at Nokia disagrees and the future of this incredible phone is doomed, even though Nokia’s alternative is not better. Mainly because of this, I’m not using the N950 as my main phone. This, together with the fact that my personal time, in which I develop SF, is very limited, means that unless things change, I don’t know how many more releases I will do, although I would still like to add some cool features. It will probably depend again on the feedback and support.
Anyway, here it is in the Ovi Store, a few taps/swipes away and for free, as always (although I appreciate it when someone buys me a beer):
You might have heard/seen that there are quite a few Igalians at this year’s Desktop Summit.
What you might not have noticed is that we also have a booth here. The booth is only set up during breaks, since we are all attending or giving talks, and the reason you might want to stop by is to try some of the cool things we work on at Igalia or to get some free FOSS stickers (WebKit, Epiphany, MeeGo, Grilo, Orca, …).
You can see some of the stickers in this crappy photo:
Just in time for the Desktop Summit 2011, I’ve released the 0.7.6 version of OCRFeeder.
The interesting new feature in this version is that OCRFeeder can now export to PDF. When exporting the pages to PDF, users have two choices: “a PDF from scratch” or “a searchable PDF”. With a PDF from scratch, the recognized text is written into the PDF using ReportLab, whereas a searchable PDF presents the whole original picture with invisible text overlaid on it to make it searchable.
The PDF export still needs some polishing but I wanted to get it out there as soon as possible for the people who need it.
Check out these examples:

(page loaded in OCRFeeder and recognized automatically)

(exported searchable PDF with selected text)
This version also fixes issues with recognizing grayscale pictures, as well as the mouse cursor wrongly changing when it was over a page’s right margin.
I’ve also added separators to divide the Document’s submenus so they are grouped correctly, and I’ve made ODT the first choice in the list of export formats again, as this had been changed by mistake.
As usual, the incredible team of translators is doing a great job and apart from the updated translations, OCRFeeder now comes in Catalan (with the Valencian option as well) and in Greek.
DesktopSummit
No, once again, OCRFeeder’s talk wasn’t approved by the Desktop Summit’s organization. Considering that I’ve presented it at some well-known conferences (LinuxTag, GUADEC ES and twice at FOSDEM), it makes me a bit sad that I haven’t yet been able to present this unique project at the conference of the desktop it targets, but let’s hope it makes it next year.
Still, Igalia is sponsoring me again to attend the DesktopSummit, so if you’re interested in OCRFeeder or other projects I’m involved in, let me know!

See you in Berlin!