As part of my participation in the SAPO Summerbits program, I created OCR4DSpace, a media filter for DSpace.
With this media filter, you can submit document images (scanned documents) and search their contents without having to type the text in manually at submission time. The text is extracted from the images by whichever OCR engines are installed on your system.
The media filter is simple to configure and use, and I hope it will make some people’s lives easier!
The project web page is currently available only in Portuguese and will be translated into English soon, but rest assured: the README is already in English.
To check out the code, just run:
svn co svn://svn.softwarelivre.sapo.pt/ocrd/trunk/OCR4DSpace
Read the README file carefully and enjoy the automation that Optical Character Recognition (OCR) can bring you!
Portugal needs more events like this. I hope that next year more companies will join SAPO and Associação Ensino Livre in putting together a second edition of Summerbits with even more projects!