During my hackfest time at Igalia, I thought it’d be interesting to see how much it would take to make OCRFeeder run on Fremantle, just as Stefan Kost did for Jokosher and Pitivi, which he talked about during GCDS.
At first I thought it’d be a bit difficult, since I expected to need a lot of dependencies.
So I installed libgoocanvas-dev, checked out pygoocanvas and compiled it (I also had to manually copy the generated egg from Python 2.3’s site-packages to Python 2.5’s). After that, there were no OCR engines available, so I installed OCRAD, which was pretty easy. I also decided to try installing Tesseract, which failed, supposedly because of a broken Makefile, but this week that problem was fixed and Tesseract now works like a charm!
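Once an engine is installed, a quick way to confirm it is actually reachable is to look for its executable on PATH. A minimal sketch, assuming the engines install binaries named `ocrad` and `tesseract`; this is an illustration, not how OCRFeeder itself detects engines:

```python
import shutil

# Assumed executable names for the engines mentioned in the post.
ENGINE_BINARIES = {
    "ocrad": "ocrad",
    "tesseract": "tesseract",
}

def available_engines(binaries=ENGINE_BINARIES):
    """Return a dict mapping each engine name to True if its binary is on PATH."""
    return {name: shutil.which(binary) is not None
            for name, binary in binaries.items()}

if __name__ == "__main__":
    for name, found in available_engines().items():
        print(f"{name}: {'found' if found else 'missing'}")
```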
I’m not thinking of porting OCRFeeder to Maemo (it is an office application that wouldn’t be easy to use on a small device, nor does it make much sense to do that kind of office task on a mobile device), but it was indeed nice to see how easy it is to make a GNOME application written in Python work on it.
OCR can have many interesting applications on a mobile device, and I have a few ideas stashed in a corner of my memory, so, if time allows, I’ll try to put some of them into practice in the future.
Here are some screenshots of OCRFeeder and the resulting ODT document (yes, the ODFPy modules worked fine as well):
10 thoughts on “OCRFeeder running in Fremantle”
Wow. Awesome work.
As for not wanting to do OCR on a mobile device: there are currently a web service and a product that do exactly that. Scanr is a web service that takes pictures from a mobile phone and creates a PDF from them, while the Planon DocuPen is a handheld scanner for single-page scanning with OCR capabilities.
Quick question: was there any code modification required at all? From the sound of the post, it seems like all you needed to do was install a library, OCRFeeder, and OCRAD.
No, it didn’t require any significant code changes. I only touched the code to:
1) Change the resources directory in util/constants.py (to point to my local directory, since I didn’t install OCRFeeder in Scratchbox);
2) Comment out the url_show call in util/lib.py, because the module providing it wasn’t available and I didn’t mind that users couldn’t click through to OCRFeeder’s web page.
That was all!
Indeed, document OCR on a mobile device may not be a very common use case, but there are many other things you can do with OCR, such as handwriting recognition, which may be more interesting on a touch-screen device.
I have been thinking about OCR on my IT (Internet Tablet) for some time. One of the ideas I had was to take a picture of a business card (not a trivial feat on the N810, but possible) and build a contact entry.
I imagined a screen where the OCR’d text showed up on one side and the possible places to put that data on the other. Then you draw the connections (or drag the data), make a final edit, and store it in the contacts DB.
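The “possible places to put that data” side could start from simple guesses that the user then confirms or corrects. A minimal sketch using regular-expression heuristics; the field names and patterns are hypothetical, and lines such as names or company names are deliberately left unassigned (None) for the user to connect by hand:

```python
import re

# Hypothetical heuristics: the first matching rule wins.
FIELD_RULES = [
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("url", re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)),
    ("phone", re.compile(r"^\+?[\d\s().-]{7,}$")),
]

def suggest_field(line):
    """Guess which contact field an OCR'd line belongs to, or return None."""
    text = line.strip()
    for field, pattern in FIELD_RULES:
        if pattern.match(text):
            return field
    return None

def suggest_fields(lines):
    """Pair each non-empty OCR'd line with a suggested field (or None)."""
    return [(line.strip(), suggest_field(line)) for line in lines if line.strip()]
```

Real business cards vary a lot, so these guesses would only pre-fill the connections; the user still has the final say.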
Was this also one of the ideas you had stashed in the corner of your memory?
Karl… it may sound made up, but “contacts from biz card” was exactly my idea! I’m not sure I pictured the UI the same way you did, but I’ve had this idea for some time now (ever since the N810 made it possible with its little camera).
I’ll try to implement it in my spare time. We can discuss ideas to make it a cool application.
What do you think?
Joaquim (or should I call you Jay),
Sounds good. You can take credit for the idea, I have no problem with that at all.
I am an embedded SW engineer by day, but I was not successful in getting the SDK to work. I have made some small changes to MaemoFTP, though.
Send me a direct email (I assume you can get it from my login). It’s my work email; I’ll respond with my home email.
I had the same thought as you guys too 🙂
I’m not up on the details of how it currently works, but it would be nice to have a generalized interface to the OCR engine so that more than business cards could be scanned and then stored. Have it populate D-Bus with addresses! For example, envelopes are more standardized (per country) than business cards. Ah, templates!
Karl’s idea of dragging connections sounds good, although tapping an item on one side and then its counterpart on the other side (tapping a second item on the same side would switch the selection, of course) might be easier than drag and drop. Or, of course, allow both input methods.
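The tap-tap interaction amounts to a small state machine. A minimal, UI-agnostic sketch; the class name and the “source”/“target” side labels are hypothetical and not part of OCRFeeder:

```python
class TapPairing:
    """Pair items from two sides ("source" and "target") with two taps.

    Tapping an item selects it; tapping an item on the other side completes
    the pair. Tapping a different item on the same side switches the selection.
    """

    SIDES = ("source", "target")

    def __init__(self):
        self.pending = None  # (side, item) awaiting its counterpart
        self.pairs = []      # completed (source_item, target_item) pairs

    def tap(self, side, item):
        """Handle a tap; return the completed pair, or None if still waiting."""
        if side not in self.SIDES:
            raise ValueError(f"unknown side: {side!r}")
        if self.pending is None or self.pending[0] == side:
            # First tap, or another tap on the same side: (re)select.
            self.pending = (side, item)
            return None
        # Tap on the opposite side: store the pair in source/target order.
        other_item = self.pending[1]
        pair = (other_item, item) if side == "target" else (item, other_item)
        self.pairs.append(pair)
        self.pending = None
        return pair
```

Because the pair is normalized to source/target order, users can start from either side, which also covers the “allow input both ways” suggestion.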
Being able to grab any text would be cool by itself. I’m always writing down details from item tags in stores, for things I might want to price-shop, compare, or buy later. A bar code scanner would be good for that too.
I like the tap-tap idea.
Also, I was thinking of a lasso you could paint with the stylus or your finger. That would highlight the text, and then you would tap the destination.
This would also be awesome combined with a dictionary in a foreign country.
Use the camera to take a picture of a foreign word, OCR it and pump it into a dictionary for instant translation.
I know some Japanese phones have this ability with English words and Kanji.
“This would also be awesome combined with a dictionary in a foreign country.”
It could even be used in conjunction with Google Language Tools.
As for the dictionary, I think QStarDict would be great, since it supports a lot of dictionary databases.
The current version of OCRFeeder (0.7.1) does not let you choose the recognition language. I had to add some options to the command line in the OCR engines settings window to recognize non-English text.
It would be great to have a GUI to select the main recognition language. It would be even better to set a language for each area (most OCR engines, like Tesseract and Cuneiform, currently cannot recognize multilingual documents).
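A per-area language setting could boil down to building a separate engine command for each area. A sketch assuming Tesseract’s `tesseract IMAGE OUTPUTBASE -l LANG` command-line form; the helper names, the `area_N` output naming and the default language are hypothetical, and this does not reproduce OCRFeeder’s engine settings format:

```python
def tesseract_command(image_path, output_base, language=None):
    """Build a Tesseract argument list, adding -l only when a language is given."""
    cmd = ["tesseract", image_path, output_base]
    if language:
        # Language codes (e.g. "eng", "por") must match installed language data.
        cmd += ["-l", language]
    return cmd

def commands_for_areas(areas, default_language="eng"):
    """Build one command per (image_path, language_or_None) area.

    Recognizing each area separately lets every area use its own language,
    sidestepping engines that handle one language per run.
    """
    commands = []
    for index, (image_path, language) in enumerate(areas):
        output_base = f"area_{index}"
        commands.append(tesseract_command(image_path, output_base,
                                          language or default_language))
    return commands
```

Each resulting list could then be passed to `subprocess.run(cmd, check=True)`, with the area images cropped from the page beforehand.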
And thank you for all your efforts 🙂