SeriesFinale for Harmattan (N9/N950)

As promised before, here is the first release of SeriesFinale for MeeGo Harmattan.

This summer Micke Prag, a fellow programmer from Sweden, contacted me because he was starting a port of SF for Harmattan. At that time I still didn’t have an N950, having missed the deadline for the first developer program. Later, when the second developer program was launched, I finally managed to get one. At that point, even though I already had my Samsung Galaxy S (yes, with Android), I still wanted a port of SeriesFinale because I had received many emails asking for it. So I started from Micke’s code and here it finally is!

The Harmattan port

SF first version for MeeGo

It may be obvious, but this version is not written in PyGTK/PyMaemo. It reuses part of the “old” Python backend, which was changed to play well with the new UI code written in QML.

This port’s code is a bit dirty for now and I’m sure there are bugs in this first version, but at least it can be used and I didn’t want to make people wait much longer. The support and feedback that SeriesFinale’s users have given me is amazing (some people even say they still use the N900 only for SF!). Thank you all for it.
My heart is still filled with GNOME/GTK+ love but QML is really impressive; there are some things I still need to spend time figuring out, but I like how quickly and flexibly one can do stuff in QML.

The OVI Store

It was also the first time I published something on Nokia’s Ovi Store and the process took around 2 weeks before it finally got approved (it was rejected twice before due to weird stuff like “they” thinking was not a good place to report issues or the fact that an application that says it works only with English US is eligible only for the USA, not for all the countries…).

The future

I really like the N9/N950. The user experience is awesome and I believe this was the phone that could really compete with the iPhone and Android. Unfortunately someone at Nokia disagrees and the future of this incredible phone is doomed, even though Nokia’s alternative is not better. Mainly because of this, I’m not using the N950 as my main phone. This, together with the fact that my personal time, in which I develop SF, is very limited, means that unless things change I don’t know how many more releases I will do, although I still want to add some cool features. It will probably depend, again, on the feedback and support.

Anyway, here it is in the Ovi Store, a few taps/swipes away and for free, as always (although I appreciate it when someone buys me a beer 🙂 ):

Get SeriesFinale from Ovi Store

OCRFeeder 0.7.6 and DesktopSummit 2011

Just in time for the Desktop Summit 2011, I’ve released the 0.7.6 version of OCRFeeder.

The interesting new feature in this version is that OCRFeeder can now export to PDF. When exporting pages to PDF, users have two choices: “a PDF from scratch” or “a searchable PDF”. A PDF from scratch means that the recognized text is written into the PDF using ReportLab, whereas a searchable PDF shows the whole original picture with invisible text overlaid on it to make it searchable.
The PDF export still needs some polishing but I wanted to get it out there as soon as possible for the people who need it.
Check out these examples:

(page loaded in OCRFeeder and recognized automatically)

OCRFeeder's exported PDF from scratch
(exported PDF from scratch)

OCRFeeder's exported searchable PDF
(exported searchable PDF with selected text)

This version also fixes issues when recognizing grayscale pictures, as well as a bug that changed the mouse cursor when it was over a page’s right margin.

I’ve also added separators to divide the Document’s submenus so they are grouped correctly, and I’ve made ODT the first choice again in the list of export formats, which had been mistakenly changed.

As usual, the incredible team of translators is doing a great job and apart from the updated translations, OCRFeeder now comes in Catalan (with the Valencian option as well) and in Greek.


No, once again, OCRFeeder’s talk wasn’t approved by the Desktop Summit’s organization. Considering that I’ve presented it at some well-known conferences (LinuxTag, GUADEC ES and twice at FOSDEM), it makes me a bit sad that I haven’t yet been able to present this unique project at the conference of the desktop it targets, but let’s hope it makes it next year.

Still, Igalia is sponsoring me again to attend the DesktopSummit, so, if you’re interested in OCRFeeder or other projects I’m involved in, let me know!

See you in Berlin!

Demystifying Grilo

It’s been a while since Grilo was released and although Iago’s post announcing it, together with Grilo’s webpage, do a good job describing what Grilo is about, it seems many people out there still do not understand what Grilo is and what it isn’t. Hence, I wrote this non-technical post as an attempt to demystify Grilo.

Grilo means cricket in Galician
(CreativeCommons photo by Danforth1)

What Grilo is

Nowadays, a number of online services provide a public API for application developers to retrieve those services’ information. YouTube lets you retrieve videos’ info by browsing or searching; Jamendo lets you retrieve its music and artists’ info in a similar way; and many more offer similar options.

Although many of these services offer a RESTful API, which already makes it easy, it is up to the applications’ developers to write code to access that API, process the results (usually XML) and build their applications’ own structures with the info. An alternative is, of course, to use an existing library that suits the developers’ needs, but its API might differ from those of other services’ libraries.

Grilo exists to solve these issues.

Grilo has a number of plugins that retrieve media information from several services. It exposes that information in a consistent API so you don’t have to learn more than one way of getting that media’s info.
Although most plugins are for online services, there are also plugins for UPnP and for the local filesystem.

For the examples given before, searching for media in YouTube or Jamendo would be as easy as calling a method on Grilo, either choosing to search in one, both or all available media sources.

The search returns media objects, and the information they carry (metadata keys) can be configured beforehand.

So, this is a very basic definition of what Grilo is: a framework that retrieves content from various services.

What Grilo is not

One thing people often expect from Grilo is for it to play content. Well, Grilo does NOT play media and that’s a planned “misfeature”.

Grilo’s main purpose is to retrieve media, or better said, media information, and to do it well.
GStreamer is already here to play media and it does a wonderful job at it. Making Grilo a media player as well would divert it from its specialization, which would surely make it unsuitable for some use cases.

Why should you care

More and more online services are being used in many platforms with applications being developed around them. Grilo eases the development of such applications.
For a media player dedicated to playing videos from YouTube and Vimeo: Grilo gets you the videos’ URLs, GStreamer plays them and voilà, you can focus on other implementation details.

Examples of applications whose job could be made easier are Totem, Rhythmbox and Miro. For Totem and Rhythmbox, Rygel-Grilo (Grilo’s D-Bus interface) has already shown (as a proof of concept) how easy it is to provide services such as YouTube, SHOUTCast, Jamendo, the filesystem’s media, and more, with just a fraction of the code needed to write a dedicated plugin for each of these services.
I also mention Miro as an example because it is a video and audio player closely tied to the web, so Grilo could only make it easier to find these videos. Plus, Grilo’s podcast plugin could also be used to manage Miro’s video channel subscriptions.

As a different use case, a desktop like MeeGo’s, which integrates, for example, social services, could also integrate a way to search media without the need to use the web browser.

So, summarizing, Grilo fills a gap in the media application development infrastructure; developers who are interested in integrating multimedia content in their applications could get an important benefit from using Grilo to access that content, and that’s why we encourage you to check it out.

Caribou and Text Predictor Input Mode

I have been wanting to show how Caribou can be used with the Text Predictor Input Mode I wrote a while ago and finally today I took the time to do it.

Caribou with Text Predictor Input Mode from Joaquim Rocha on Vimeo.

Okay, the shortcuts to accept prediction candidates or scroll through them can be changed to ones that are quicker to reach.
With the changes I made to Caribou, one can even easily provide a special button, such as “ACCEPT”, as the screenshot below shows:

Caribou with Accept key

The changes I’m talking about, which you can see in the video, and the QWERTY keyboard layout I used can be found in Caribou’s bug #613229.

I wrote these changes because, in my opinion, the current way of writing layouts for Caribou doesn’t seem very flexible or appropriate for non-programmers.
These changes drop the current usage of Python files with tuples as a way to configure Caribou’s layouts. Instead, JSON files are used, and the patch also makes possible some functionality that wasn’t implemented before.

Basically, instead of having either character keys or (symbol, label) pairs that Caribou understands, each key is a set of attributes that define it, which Caribou then interprets accordingly.

For a basic key, all one needs to have is the value attribute, which can receive a string (for example a character) or the name of a key in GDK (you can easily figure them out from the GDK key syms file).

{"value": "a"} will create a key labeled a that inputs the character a
{"value": "BackSpace"} will create a backspace key labeled “BackSpace”

You can override the label of a key using the “label” attribute, as in:

{"value": "BackSpace",
"label": "⌫"}
which will create a backspace key labeled “⌫”

Labels can use Pango Markup to change their text style, for example: {"label": "<small><b>Small Bold Text Key</b></small>", …}

A width attribute is also introduced and means the width relative to a usual key’s width. A width of 3 will generate a key that fills the space of 3 keys whereas 0.25 fills a quarter of a regular key’s space.

A key can be of a given type, which indicates how it behaves. There are 5 types of keys: normal, layout_switcher, preferences, mask and dummy.
A normal key is a regular “you-press-you-input” key; it is the default type, which is why it wasn’t specified in the examples above.
A layout_switcher key, when pressed, changes the keyboard sublayout to the one given by the value attribute (which must exist in the layout file). So, if we are in the “lowercase” layout and we want a key labeled “UP” to change to the “uppercase” layout:
{"label": "UP", "key_type": "layout_switcher", "value": "uppercase"}

The preferences key type brings up the preferences menu.
A mask key sets the mask indicated by the value attribute when you press it. For the Alt key:
{"label": "Alt", "key_type": "mask", "value": "mod1"}
Again, “mod1” is the mask name from GDK.

Finally, there’s the dummy key type, which is basically used for spacer keys that separate some keys from others to improve visual grouping. Rows that have fewer keys (dummy keys included) than the longest row will be centered horizontally.

These let you play with keyboards’ layouts and design any kind of layout in a flexible and easy way.
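
To make the attributes above more concrete, here is a minimal Python sketch of how such key definitions could be read and normalized. It is not Caribou’s code: the file structure (a JSON object mapping sublayout names to rows of keys) and the names parse_key, load_layout and row_width are assumptions made for illustration. Only the attribute names (value, label, width, key_type) and the five key types come from the description above.

```python
import json

# The five key types described in the post.
KEY_TYPES = {"normal", "layout_switcher", "preferences", "mask", "dummy"}


def parse_key(attrs):
    """Fill in defaults for one key definition (hypothetical helper)."""
    key_type = attrs.get("key_type", "normal")  # "normal" is the default type
    if key_type not in KEY_TYPES:
        raise ValueError("unknown key type: %r" % key_type)
    value = attrs.get("value")
    # Keys that input or switch something need a value; spacers do not.
    if key_type in ("normal", "layout_switcher", "mask") and value is None:
        raise ValueError("a %s key needs a value" % key_type)
    return {
        "key_type": key_type,
        "value": value,
        # Without an explicit label, the value doubles as the label.
        "label": attrs.get("label", value or ""),
        # Width is relative to a regular key, so 1 is the default.
        "width": float(attrs.get("width", 1)),
    }


def load_layout(text):
    """Parse a layout: sublayout name -> rows -> keys (assumed structure)."""
    layout = {
        name: [[parse_key(key) for key in row] for row in rows]
        for name, rows in json.loads(text).items()
    }
    # A layout_switcher must point at a sublayout that exists in the file.
    for rows in layout.values():
        for row in rows:
            for key in row:
                if key["key_type"] == "layout_switcher" and key["value"] not in layout:
                    raise ValueError("unknown sublayout: %r" % key["value"])
    return layout


def row_width(row):
    """Total width of a row, in units of a regular key."""
    return sum(key["width"] for key in row)


# "\u232b" is the erase-to-the-left symbol used as the backspace label.
EXAMPLE = """
{
  "lowercase": [
    [{"value": "a"}, {"value": "b"},
     {"value": "BackSpace", "label": "\\u232b", "width": 2}],
    [{"label": "UP", "key_type": "layout_switcher", "value": "uppercase"},
     {"key_type": "dummy", "width": 0.5},
     {"label": "Alt", "key_type": "mask", "value": "mod1"}]
  ],
  "uppercase": [
    [{"value": "A"}, {"value": "B"}]
  ]
}
"""

layout = load_layout(EXAMPLE)
for name, rows in layout.items():
    print(name, [row_width(row) for row in rows])
```

In this sketch the second “lowercase” row is narrower than the first, so under the centering rule described above it would be the one centered horizontally.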

At the moment, the patch is still pending review. Let’s hope it gets a green light and is applied.

Text Prediction on GNOME

I was disappointed with the text completion provided by the N900 (eZiText), which on top of that is closed source, and I wondered whether it was possible to have an Open Source solution for text prediction and completion.

I searched a bit and, although my original intention was to develop a library to look up words from Free and Open Source dictionaries by prefix, I found Presage.
Presage is better than most text prediction systems I have seen out there because it really does text prediction, not just text completion. This C++ library retrieves words taking into account the surrounding text, not only the prefix or the frequency of words. It uses a database representing N-grams that can be trained with more text; the more you train it, the more accurate it can be.

This means that if you type something like:
“I m”
instead of suggesting nonsense like:
“I mouse”, “I mother”, “I market” or “I more”
it suggests something more like:
“I must”, “I met”, “I mean” or “I might”
The difference is obvious!
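
The idea can be illustrated with a toy bigram model in Python. This is only a sketch of context-aware prediction, not Presage’s implementation or API: the functions train and predict and the tiny corpus are made up for the example, and a real predictor would use longer N-grams and far more training text.

```python
from collections import Counter, defaultdict


def train(text):
    """Count which word follows which (a bigram model)."""
    model = defaultdict(Counter)
    words = text.lower().split()
    for previous, following in zip(words, words[1:]):
        model[previous][following] += 1
    return model


def predict(model, context, prefix="", limit=4):
    """Suggest words seen after `context` that start with `prefix`."""
    followers = model.get(context.lower(), Counter())
    matches = [
        (word, count)
        for word, count in followers.items()
        if word.startswith(prefix.lower())
    ]
    # Most frequent first; ties are broken alphabetically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:limit]]


# Tiny made-up training text; "mouse" and "mother" never follow "i" in it.
corpus = (
    "i must go . i mean it . i might stay . i must say . "
    "the mouse ran . my mother called"
)
model = train(corpus)

print(predict(model, "I", "m"))  # ['must', 'mean', 'might']
```

A plain prefix lookup would also offer “mouse” and “mother” for “m”; taking the previous word into account is what filters them out here.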

So I developed a little wrapper around Presage in C that provides a still very basic API to get text completion. Then I created a GTK+ Input Method context to control the user’s input in regular GTK+ text widgets and used the wrapper to process the typed text. I called it Predictor Input Method (not very original, I know…).
The result is that Predictor suggests words whether or not you have typed a prefix, and lets you accept the candidate word or scroll through a list of suggestions, as you can see in the video below:

Text prediction in GNOME from Joaquim Rocha on Vimeo.

How to use it

The current key bindings are:

Ctrl+Enter -> Selects the current candidate
Ctrl+Up/Down -> Scrolls through the list of candidates
Backspace -> Deletes the character previous to the cursor and suggests again
Directional arrows -> Move cursor and discard suggestions

Who should use it

This kind of assistive technology can have many applications but the main ones are its use in small/mobile devices and the assistance of users with disabilities. Both share the same reasons: speeding up input and reducing mistyped characters, because the amount of input required is minimized.
Of course, you can also use it regularly on your GNOME desktop to type your emails faster, etc.

In the case of users with disabilities, a popup menu could be added to show a complete list of candidates and the bound fast-access keys.

Why is Free Software important in this

This is the kind of technology for which everybody should be interested in a FOSS solution, because of the obvious advantage that developers from all over the world can modify it.
Suppose you’re creating a mobile phone and you choose a closed solution to provide text prediction for your phone. And then you find out you’re disappointing all your users from country X because that library you’re paying for does not support their language and the library owner is not interested that much in adding it. Now if you’re using an open solution, local communities from many places in the world can add support for their languages and your phone can have a better acceptance in places you hadn’t even imagined.

Software that reaches an international audience with different languages is software you want to have open.

How to get Predictor Input Method

You can find the Predictor Input Method’s source on its Gitorious page:
Of course, you should also install Presage for it to work.

If you are not using GTK+ Input Methods then you can use the wrapper text-predictor.cpp, which is not tied to the Input Method code itself. And of course, you can copy the little tricks used in the Input Method code and apply them to your own source (like delaying the retrieval of the candidates by a fraction of a second so as not to block the input, etc.).
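
The delaying trick mentioned above is usually called debouncing. Below is a minimal, generic Python sketch of it using threading.Timer. It is not the Input Method’s actual code (a GTK+ application would more naturally use a GLib timeout), and the Debouncer class and its method names are hypothetical.

```python
import threading


class Debouncer:
    """Run callback only after `delay` seconds without new triggers."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        """Schedule the callback, cancelling any call still pending."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback, args)
            self._timer.daemon = True
            self._timer.start()

    def wait(self):
        """Block until the most recently scheduled call has finished."""
        with self._lock:
            timer = self._timer
        if timer is not None:
            timer.join()


requested = []  # stands in for the (slow) retrieval of candidates
debouncer = Debouncer(0.2, requested.append)

# Simulate quick typing: only the last text should trigger a lookup.
for typed in ("I", "I ", "I m"):
    debouncer.trigger(typed)
debouncer.wait()
print(requested)
```

Each keystroke restarts the countdown, so candidates are only fetched once the user pauses, and typing itself is never held up by the lookup.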

Hope you like it.