<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Joaquim Rocha's Web Page &#187; ocrfeeder</title>
	<atom:link href="http://www.joaquimrocha.com/category/ocrfeeder/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.joaquimrocha.com</link>
	<description>Linux, technology and art</description>
	<lastBuildDate>Sun, 05 Sep 2010 21:52:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>One more step in OCR with OCRFeeder 0.7</title>
		<link>http://www.joaquimrocha.com/2010/07/30/one-more-step-in-ocr-with-ocrfeeder-0-7/</link>
		<comments>http://www.joaquimrocha.com/2010/07/30/one-more-step-in-ocr-with-ocrfeeder-0-7/#comments</comments>
		<pubDate>Fri, 30 Jul 2010 13:44:14 +0000</pubDate>
		<dc:creator>Joaquim Rocha</dc:creator>
				<category><![CDATA[gnome]]></category>
		<category><![CDATA[gtk]]></category>
		<category><![CDATA[guadec]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[ocrfeeder]]></category>
		<category><![CDATA[planet]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.joaquimrocha.com/?p=486</guid>
		<description><![CDATA[I have been hacking on some new and cool features on OCRFeeder for a while and now it is time to show them to the world in a new release.
These features I&#8217;m talking about fall mainly in 2 areas: improving the a11y of the UI and improving the recognition of documents.
A11y Improvement
The improvement of the [...]]]></description>
			<content:encoded><![CDATA[<p>I have been hacking on some new and cool features on <a href="http://live.gnome.org/OCRFeeder" target="_blank">OCRFeeder</a> for a while and now it is time to show them to the world in a new release.</p>
<p>These features I&#8217;m talking about fall mainly in 2 areas: improving the a11y of the UI and improving the recognition of documents.</p>
<p><strong>A11y Improvement</strong></p>
<p>The a11y improvements include the typical UI changes (mnemonics, missing labels and relations) as well as changes that have more to do with UX, like a progress dialog that informs users when time-consuming operations are being carried out. This means that PDF import and OCR no longer block the UI.<br />
Other changes in this category are navigation through the content boxes (before, these could only be selected by clicking on them), selection of all boxes and deletion of the selected boxes.</p>
<p>The following screenshot shows the box editor area of OCRFeeder with its mnemonics highlighted:</p>
<div id="attachment_508" class="wp-caption aligncenter" style="width: 160px"><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_a11y1.png"><img class="size-medium wp-image-508" title="ocrfeeder_a11y" src="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_a11y1-150x300.png" alt="Box edition area" width="150" height="300" /></a><p class="wp-caption-text">Box edition area</p></div>
<p><strong>Recognition Improvements</strong></p>
<p>Sometimes, text columns are so close to each other that they end up being recognized as a single paragraph, so I added a post-detection method to solve this issue. This feature is optional and can be toggled from the Preferences dialog.</p>
<p>Here&#8217;s an example of the difference it makes:</p>
<div id="attachment_503" class="wp-caption aligncenter" style="width: 229px"><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_no_columns1.png"><img class="size-medium wp-image-503" title="ocrfeeder_no_columns" src="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_no_columns1-219x300.png" alt="Before columns' detection improvements" width="219" height="300" /></a><p class="wp-caption-text">Before columns&#39; detection improvements</p></div>
<div id="attachment_504" class="wp-caption aligncenter" style="width: 230px"><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_columns1.png"><img class="size-medium wp-image-504" title="ocrfeeder_columns" src="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_columns1-220x300.png" alt="After columns' detection improvements" width="220" height="300" /></a><p class="wp-caption-text">After columns&#39; detection improvements</p></div>
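<p>To give an idea of what such a post-detection step can look like, here is a minimal sketch in Python. It is not OCRFeeder&#8217;s actual code: the function name, the <code>min_gap</code> parameter and the representation of the image as rows of characters (with <code>#</code> for ink) are assumptions made for illustration. It looks for the widest run of blank pixel columns inside a detected box, which is where the box could be split in two.</p>

```python
def find_column_gap(rows, min_gap=3):
    """Return (start, end) of the widest interior run of blank columns, or None.

    rows: list of strings, one per pixel row, where '#' marks ink.
    end is exclusive, so the gap covers columns start .. end - 1.
    """
    width = max(len(row) for row in rows)
    # A column is blank when no row has ink at that position.
    blank = [all(x >= len(row) or row[x] != '#' for row in rows)
             for x in range(width)]
    best = None
    start = None
    # The trailing False is a sentinel that closes a run ending at the edge.
    for x, is_blank in enumerate(blank + [False]):
        if is_blank and start is None:
            start = x
        elif not is_blank and start is not None:
            # Runs touching the left or right edge are margins, not gaps.
            interior = start > 0 and x < width
            if interior and (best is None or x - start > best[1] - best[0]):
                best = (start, x)
            start = None
    if best is not None and best[1] - best[0] >= min_gap:
        return best
    return None
```

<p>With a gap of <code>(start, end)</code>, the left column is <code>row[:start]</code> and the right column is <code>row[end:]</code> for every row. The threshold matters: if it is too small, the spaces between words start to look like column gaps.</p>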
<p>Scanned document images are usually skewed and this makes it more difficult for the contents to be successfully detected and &#8220;OCRed&#8221;. I decided to implement an algorithm to deskew these images. The algorithm uses the <a href="http://en.wikipedia.org/wiki/Hough_transform" target="_blank">Hough transform</a> to try to find lines in the image and their angles and, while it is a bit slow, it works well:</p>
<div id="attachment_509" class="wp-caption aligncenter" style="width: 201px"><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_skewed1.png"><img class="size-medium wp-image-509" title="ocrfeeder_skewed" src="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_skewed1-191x300.png" alt="Skewed image" width="191" height="300" /></a><p class="wp-caption-text">Skewed image</p></div>
<div id="attachment_510" class="wp-caption aligncenter" style="width: 201px"><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_deskewed1.png"><img class="size-medium wp-image-510" title="ocrfeeder_deskewed" src="http://www.joaquimrocha.com/wp-content/uploads/2010/07/ocrfeeder_deskewed1-191x300.png" alt="Deskewed image" width="191" height="300" /></a><p class="wp-caption-text">Deskewed image</p></div>
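<p>For the curious, the idea behind the Hough transform can be sketched in a few lines of plain Python. This is a simplified illustration and not the code that ships in OCRFeeder: the function name, the angle range and the scoring rule are assumptions. Every ink pixel votes, for each candidate angle, for the line it would lie on, and the angle whose votes pile up in the fewest bins is taken as the skew.</p>

```python
import math

def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate the skew angle, in degrees, of roughly horizontal text lines.

    points: iterable of (x, y) coordinates of ink pixels.
    Only angles between -max_angle and +max_angle are tried.
    """
    points = list(points)
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        a = math.radians(angle)
        cos_a, sin_a = math.cos(a), math.sin(a)
        votes = {}
        for x, y in points:
            # Distance from the origin of the line through (x, y)
            # tilted by the candidate angle, rounded to a 1-pixel bin.
            rho = int(round(y * cos_a - x * sin_a))
            votes[rho] = votes.get(rho, 0) + 1
        # Squaring rewards accumulators concentrated in a few bins,
        # which is what happens when the angle matches the text lines.
        score = sum(v * v for v in votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

<p>The image is then rotated by the opposite of the returned angle. The nested loop over angles and pixels also shows why this approach is on the slow side: the cost grows with the number of ink pixels times the number of angles tried.</p>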
<p>This action can be applied to an image that is already loaded, and it can also be configured to run automatically before images are added. The Unpaper tool can now also be set to clean images before they are added.<br />
This makes it much easier to successfully recognize images obtained from a scanner.</p>
<p>Some fine-tuning of the content boxes&#8217; bounds was done by trying to shorten their margins, that is, reducing the distance between the boxes and their actual contents.</p>
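<p>Shortening the margins boils down to something like the following sketch. Again, this is an illustration under assumptions and not OCRFeeder&#8217;s real implementation: the function name, the box layout as <code>(left, top, right, bottom)</code> and the image as rows of characters with <code>#</code> for ink are all made up for the example.</p>

```python
def shrink_box(rows, box):
    """Tighten a box around the ink it contains.

    rows: list of strings, one per pixel row, where '#' marks ink.
    box: (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    ink = [(x, y)
           for y in range(top, bottom)
           for x in range(left, right)
           if rows[y][x] == '#']
    if not ink:
        # Nothing to wrap around, so leave the box untouched.
        return box
    xs = [x for x, _ in ink]
    ys = [y for _, y in ink]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

<p>On real scans a single speck of noise near the edge would keep the box from shrinking, so a practical version needs to tolerate a few stray pixels.</p>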
<p>The font size recognition was also tweaked to solve a problem with paragraphs that start with initials (you know, the huge starting characters), which were distorting the font size detected for the whole paragraph.</p>
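<p>One simple way to keep a single oversized character from dominating is to use a robust statistic over the lines of the paragraph. The sketch below uses the median. This is only an example of the general idea, and I am not claiming it is the exact rule OCRFeeder applies; the function name and its input are hypothetical.</p>

```python
from statistics import median

def paragraph_font_size(line_heights):
    """Pick a representative font size from per-line glyph heights.

    line_heights: one height per text line, all in the same unit.
    """
    # The median ignores a single oversized line caused by an initial,
    # whereas the mean or the maximum would be pulled up by it.
    return median(line_heights)
```

<p>For line heights of 36, 12, 12, 11 and 12 this returns 12, where the mean would give 16.6.</p>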
<p>To round off the recognition improvements, I have added an optional action to find and fix line breaks in the text. Usually, OCR engines don&#8217;t consider &#8220;semantic line-breaks&#8221;, that is, they always insert a newline at the end of each line, even in the middle of a sentence.<br />
Using some regular expressions, I try to find these &#8220;fake&#8221; line-breaks and recover the original flow of the text. Like some of the features mentioned above, this one can also be turned on/off from the Preferences dialog.</p>
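<p>As an illustration of the kind of regular expressions involved, here is a minimal sketch. The patterns and names are assumptions made for this example and are not the ones used in OCRFeeder. A newline is treated as &#8220;fake&#8221; unless it follows sentence-ending punctuation or is part of a blank line.</p>

```python
import re

# A word split with a hyphen at the end of a line: "new-\nline".
HYPHEN_BREAK = re.compile(r'(\w)-\n(\w)')
# A single newline that does not follow ., !, ? or : and is not
# next to another newline (which would mark a paragraph boundary).
SOFT_BREAK = re.compile(r'(?<![.!?:\n])\n(?!\n)')

def fix_line_breaks(text):
    """Rejoin lines that were broken only because the page ran out of width."""
    # Glue hyphenated halves back together first, then turn the
    # remaining fake line breaks into spaces.
    text = HYPHEN_BREAK.sub(r'\1\2', text)
    return SOFT_BREAK.sub(' ', text)
```

<p>Heuristics like this one have obvious limits: a genuinely hyphenated compound that happens to be split across lines loses its hyphen, and a line that ends with an abbreviation such as &#8220;etc.&#8221; keeps its break. That is one good reason for the feature to be optional.</p>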
<p>Here&#8217;s what the Preferences dialog looks like now:</p>
<p><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/07/Preferences_dialog1.png"><img class="aligncenter size-medium wp-image-511" title="Preferences_dialog" src="http://www.joaquimrocha.com/wp-content/uploads/2010/07/Preferences_dialog1-263x300.png" alt="Preferences_dialog" width="263" height="300" /></a></p>
<p><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/07/Preferences_dialog_recognition1.png"><img class="aligncenter size-medium wp-image-512" title="Preferences_dialog_recognition" src="http://www.joaquimrocha.com/wp-content/uploads/2010/07/Preferences_dialog_recognition1-263x300.png" alt="Preferences_dialog_recognition" width="263" height="300" /></a></p>
<p>Finally, images can now be dragged and dropped onto the pages&#8217; area, and the mouse wheel can be used to scroll horizontally when combined with the Shift key, thanks to Stefan Löffler. And of course, several bugs were fixed and the code was improved.</p>
<p>As you can see, this is a &#8220;rich&#8221; new version of OCRFeeder, which remains the easiest way to use OCR on a desktop. You are welcome to file bugs in <a href="https://bugzilla.gnome.org/" target="_blank">bugzilla</a>, to send patches and feature requests to its <a href="http://mail.gnome.org/mailman/listinfo/ocrfeeder-list" target="_blank">mailing list</a>, or to approach me if you&#8217;re at <a href="http://www.guadec.org" target="_blank">GUADEC</a>.</p>
<p>Download: <a href="http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.7" target="_blank">OCRFeeder 0.7 tarball on GNOME FTP</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.joaquimrocha.com/2010/07/30/one-more-step-in-ocr-with-ocrfeeder-0-7/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>GUADEC ES, a good beginning for GUADEC</title>
		<link>http://www.joaquimrocha.com/2010/07/24/guadec-es-a-good-beginning-for-guadec/</link>
		<comments>http://www.joaquimrocha.com/2010/07/24/guadec-es-a-good-beginning-for-guadec/#comments</comments>
		<pubDate>Sat, 24 Jul 2010 22:38:00 +0000</pubDate>
		<dc:creator>Joaquim Rocha</dc:creator>
				<category><![CDATA[a coruña]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[gnome]]></category>
		<category><![CDATA[guadec]]></category>
		<category><![CDATA[ocrfeeder]]></category>
		<category><![CDATA[planet]]></category>

		<guid isPermaLink="false">http://www.joaquimrocha.com/?p=455</guid>
		<description><![CDATA[Yesterday was the last day of the 7th edition of GUADEC Hispana, originally to be organized in Chile but due to the disastrous earthquake, it was moved to the city of Corunna, Spain.
Between hacking on OCRFeeder (expect a new version soon), giving a talk about it, attending nice presentations and chatting with people, I had [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday was the last day of the 7th edition of <a href="http://2010.guadec.es" target="_blank">GUADEC Hispana</a>, originally to be organized in Chile but due to the <a href="http://en.wikipedia.org/wiki/2010_Chile_earthquake" target="_blank">disastrous earthquake</a>, it was moved to the city of <a href="http://en.wikipedia.org/wiki/A_Coru%C3%B1a" target="_blank">Corunna</a>, Spain.</p>
<p>Between hacking on <a href="http://live.gnome.org/OCRFeeder" target="_blank">OCRFeeder</a> (expect a new version soon), giving a talk about it, attending nice presentations and chatting with people, I had a great time.<br />
Diego&#8217;s <a href="http://people.gnome.org/~diegoe/" target="_blank">presentation about Epiphany</a> was simply epic and <a href="http://blogs.igalia.com/mario" target="_blank">Mario</a> gave a very complete crash course on git.</p>
<p>I guess there&#8217;s a first time for these things but Thursday, while I was giving a demo of the new OCRFeeder&#8217;s features, it crashed on me&#8230; Never again will I laugh at Mr. Gates and friends when their products freeze out of the blue (nah, it is too funny).<br />
Now that I think of it&#8230; was this the first time a Portuguese man gave a talk at GUADEC Hispana?</p>
<p>The presentation was a cut-down version of the one <a href="http://www.joaquimrocha.com/2010/02/09/fosdem-follow-up/" target="_blank">I gave at FOSDEM</a> this year and you can check out its slides below (they&#8217;re in Spanish):</p>
<div style="width:425px" id="__ss_4822465"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/j_rocha/ocrfeeder" title="Ocrfeeder">Ocrfeeder</a></strong><object id="__sse4822465" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=ocrfeeder-100723053038-phpapp01&#038;stripped_title=ocrfeeder" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse4822465" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=ocrfeeder-100723053038-phpapp01&#038;stripped_title=ocrfeeder" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/j_rocha">Joaquim Rocha</a>.</div>
</div>
<p>(thanks to <a href="http://blogs.igalia.com/mrego/" target="_blank">Manuel Rego</a> for reviewing my Spanish in the slides)</p>
<p>Here&#8217;s the group photo of the GUADEC ES attendees:</p>
<p><img alt="" src="http://farm5.static.flickr.com/4081/4822344878_d030dd0057.jpg" title="GUADEC ES 2010 group photo" class="alignnone" width="500" height="333" /></p>
<p>And from next Monday on, I&#8217;ll be in Den Haag for <a href="http://www.guadec.org" target="_blank">GUADEC 2010</a>. My lightning talk about the <a href="http://www.joaquimrocha.com/2010/03/03/text-prediction-on-gnome/" target="_blank">Predictor Input Method</a> got accepted, so if you&#8217;re into this kind of stuff, I hope to see you there.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joaquimrocha.com/2010/07/24/guadec-es-a-good-beginning-for-guadec/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Going to GUADEC</title>
		<link>http://www.joaquimrocha.com/2010/07/04/going-to-guadec-2/</link>
		<comments>http://www.joaquimrocha.com/2010/07/04/going-to-guadec-2/#comments</comments>
		<pubDate>Sun, 04 Jul 2010 11:43:09 +0000</pubDate>
		<dc:creator>Joaquim Rocha</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[gnome]]></category>
		<category><![CDATA[grilo]]></category>
		<category><![CDATA[guadec]]></category>
		<category><![CDATA[igalia]]></category>
		<category><![CDATA[ocrfeeder]]></category>
		<category><![CDATA[planet]]></category>
		<category><![CDATA[travel]]></category>

		<guid isPermaLink="false">http://www.joaquimrocha.com/?p=422</guid>
		<description><![CDATA[Once again, Igalia is giving me the chance and the pleasure to attend GUADEC, this time in Den Haag.

My fellow Igalians Iago, Alejandro Piñeiro and José Dapena will give talks about Grilo, Cally and Modest 4, respectively.
As for me, I&#8217;m hoping my lightning talk about Text Prediction on GNOME gets accepted.
So, [...]]]></description>
			<content:encoded><![CDATA[<p>Once again, Igalia is giving me the chance and the pleasure to attend <a href="http://www.guadec.org" target="_blank">GUADEC</a>, this time in <a href="http://en.wikipedia.org/wiki/The_Hague" target="_blank">Den Haag</a>.</p>
<p><img src="http://www.guadec.org/img/guadec-oranje.png" alt="I'm going to GUADEC" /></p>
<p>My fellow Igalians <a href="http://blogs.igalia.com/itoral" target="_blank">Iago</a>, <a href="http://blogs.igalia.com/apinheiro" target="_blank">Alejandro Piñeiro</a> and <a href="http://blogs.igalia.com/dape" target="_blank">José Dapena</a> will give talks about <a href="http://guadec.org/index.php/guadec/2010/paper/view/17" target="_blank">Grilo</a>, <a href="http://guadec.org/index.php/guadec/2010/paper/view/94" target="_blank">Cally</a> and <a href="http://guadec.org/index.php/guadec/2010/paper/view/60" target="_blank">Modest 4</a>, respectively.</p>
<p>As for me, I&#8217;m hoping my lightning talk about <a href="http://www.joaquimrocha.com/2010/03/03/text-prediction-on-gnome/" target="_blank">Text Prediction on GNOME</a> gets accepted.</p>
<p>So, as usual, if you wanna talk about GNOME, <a href="http://live.gnome.org/OCRFeeder" target="_blank">OCR</a>, <a href="http://www.joaquimrocha.com/2010/03/03/text-prediction-on-gnome/" target="_blank">Input Methods</a>, <a href="http://live.gnome.org/Grilo" target="_blank">Grilo</a>, <a href="http://www.djangoproject.com" target="_blank">Django</a> or Free Software in general and have a beer while we&#8217;re at it, come along!</p>
<p>Hope to see you in Den Haag.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joaquimrocha.com/2010/07/04/going-to-guadec-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>OCRFeeder 0.6.6</title>
		<link>http://www.joaquimrocha.com/2010/04/05/ocrfeeder-0-6-6/</link>
		<comments>http://www.joaquimrocha.com/2010/04/05/ocrfeeder-0-6-6/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 15:11:22 +0000</pubDate>
		<dc:creator>Joaquim Rocha</dc:creator>
				<category><![CDATA[gnome]]></category>
		<category><![CDATA[gtk]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[ocrfeeder]]></category>
		<category><![CDATA[odf]]></category>
		<category><![CDATA[planet]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.joaquimrocha.com/?p=343</guid>
		<description><![CDATA[OCRFeeder version 0.6.6 has been released.
This version has no big improvements and exists mainly to introduce the fix of a bug that prevented using the algorithm for recognizing documents automatically.
The copyright was updated to include the proper copyright and license notices of ODFPy, which ships with OCRFeeder.
It also features some improvements to Debian related files [...]]]></description>
			<content:encoded><![CDATA[<p><a title="OCRFeeder" href="http://live.gnome.org/OCRFeeder" target="_blank">OCRFeeder</a> version 0.6.6 has been released.</p>
<p>This version has no big improvements and exists mainly to introduce the fix of a bug that prevented using the algorithm for recognizing documents automatically.</p>
<p>The copyright was updated to include the proper copyright and license notices of <a title="ODFPy" href="http://odfpy.forge.osor.eu/" target="_blank">ODFPy</a>, which ships with OCRFeeder.<br />
It also features some improvements to Debian related files (thanks to <a title="Berto's blog" href="http://blogs.igalia.com/berto/" target="_blank">Alberto Garcia</a>, who is creating the official deb package for Debian) and a few translation updates.</p>
<p>See the whole list of changes <a href="http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/ocrfeeder-0.6.6.news" target="_blank">here</a>.</p>
<p>Your usual links:<a title="OCRFeeder's git" href="http://git.gnome.org/browse/ocrfeeder/" target="_blank"><br />
OCRFeeder&#8217;s git<br />
</a><a title="OCRFeeder's Bugzilla" href="https://bugzilla.gnome.org/buglist.cgi?cmdtype=runnamed&amp;namedcmd=OCRFeeder" target="_blank"> OCRFeeder&#8217;s bugzilla</a><br />
<a title="OCRFeeder on GNOME's FTP" href="http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/" target="_blank">OCRFeeder&#8217;s Tarball from GNOME&#8217;s FTP</a><br />
<a title="OCRFeeder 0.6.6 deb package" href="http://ocrfeeder.googlecode.com/files/ocrfeeder_0.6.6-1_all.deb" target="_blank">OCRFeeder 0.6.6 Debian package</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.joaquimrocha.com/2010/04/05/ocrfeeder-0-6-6/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>OCRFeeder version 0.6.5</title>
		<link>http://www.joaquimrocha.com/2010/03/24/ocrfeeder-version-0-6-5/</link>
		<comments>http://www.joaquimrocha.com/2010/03/24/ocrfeeder-version-0-6-5/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 21:59:46 +0000</pubDate>
		<dc:creator>Joaquim Rocha</dc:creator>
				<category><![CDATA[gnome]]></category>
		<category><![CDATA[gtk]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[ocrfeeder]]></category>
		<category><![CDATA[odf]]></category>
		<category><![CDATA[planet]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.joaquimrocha.com/?p=311</guid>
		<description><![CDATA[I have just released OCRFeeder version 0.6.5!
Here are the main changes in this version:
* Importing PDF files is now faster
* The OCR engines manager dialog can now detect system-wide OCR engines and lets you choose which ones to use (this detection also runs when the application is started with no engines configured)
* Multiple content areas in OCRFeeder&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>I have just released OCRFeeder version 0.6.5!</p>
<p>Here are the main changes in this version:</p>
<p>* Importing PDF files is now faster<br />
* The OCR engines manager dialog can now detect system-wide OCR engines and lets you choose which ones to use (this detection also runs when the application is started with no engines configured)<br />
* Multiple content areas in OCRFeeder&#8217;s canvas can now be selected using Shift+Click<br />
* Introduces Ctrl+a shortcut to select all content areas in OCRFeeder&#8217;s canvas<br />
* The Tools menu now has the new action &#8220;Recognize Selected Areas&#8221; which will perform the automatic recognition on selected content areas of OCRFeeder&#8217;s canvas</p>
<p>Also, a few bugs were fixed:</p>
<p>* Removed PDF files&#8217; extension from the images generated from them<br />
* Sorts images when adding them from a folder<br />
* Selection areas are now selected right after they are created<br />
* Fixed problem when quitting the application</p>
<p>(You can also read the <a href="http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/ocrfeeder-0.6.5.news" target="_blank">full list of changes</a>)</p>
<div id="attachment_312" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.joaquimrocha.com/wp-content/uploads/2010/03/OCRFeeder_recognize_all_areas.png"><img class="size-medium wp-image-312" title="OCRFeeder_recognize_all_areas" src="http://www.joaquimrocha.com/wp-content/uploads/2010/03/OCRFeeder_recognize_all_areas-300x88.png" alt="Recognize All Areas action" width="300" height="88" /></a><p class="wp-caption-text">Recognize All Areas action</p></div>
<p>You can download the new tarball from <a title="GNOME's FTP" href="http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/" target="_blank">GNOME&#8217;s FTP</a> or a Debian package from <a title="OCRFeeder deb package" href="http://ocrfeeder.googlecode.com/files/ocrfeeder_0.6.5-1_all.deb" target="_blank">here</a>.</p>
<p>I&#8217;d also like to thank the <a title="GNOME i18n Team" href="http://www.gnome.org/i18n/" target="_blank">GNOME i18n Team</a> for their work translating OCRFeeder.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joaquimrocha.com/2010/03/24/ocrfeeder-version-0-6-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
