Archive for the ‘Technology’ Category

Demystifying Grilo

Wednesday, June 9th, 2010

It’s been a while since Grilo was released, and although Iago’s post announcing it, together with Grilo’s webpage, does a good job of describing what Grilo is about, it seems many people out there still do not understand what Grilo is and what it isn’t. Hence, I wrote this non-technical post as an attempt to demystify Grilo.

Grilo means cricket in Galician
(Creative Commons photo by Danforth1)

What Grilo is

Nowadays, a number of online services provide a public API for application developers to retrieve those services’ information. YouTube lets you retrieve videos’ info by browsing or searching; Jamendo lets you retrieve its music and artists’ info in a similar way; and many more offer similar options.

Although many of these services offer a RESTful API, which already makes things easier, it is still up to application developers to write code to access that API, process the results (usually XML) and build their applications’ own structures with the info. An alternative is, of course, to use an existing library suited to the developer’s needs, but its API might differ from other services’ libraries.

Grilo exists to solve these issues.

Grilo has a number of plugins that retrieve media information from several services. It exposes that information through a consistent API, so you don’t have to learn more than one way of getting that media’s info.
Although most plugins are for online services, there are also plugins for UPnP and for the local filesystem.

In the examples given before, searching for media on YouTube or Jamendo would be as easy as calling a method on Grilo, choosing to search in one, both or all available media sources.

The search returns media objects whose information (metadata keys) can be configured beforehand.
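To make the “one consistent API” idea concrete, here is a minimal Python sketch of the pattern. It is not Grilo’s real API (Grilo itself is a C library): every plugin implements the same hypothetical `MediaSource.search()` method, and a hypothetical `Registry` searches one, several or all sources and keeps only the metadata keys requested up front. `FakeVideoSource` stands in for a real plugin, which would call the service’s web API and parse its reply instead.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional

Media = Dict[str, str]


class MediaSource(ABC):
    """Hypothetical interface shared by all plugins (illustrative, not Grilo's API)."""

    name: str

    @abstractmethod
    def search(self, text: str) -> List[Media]:
        """Return raw metadata records matching `text`."""


class FakeVideoSource(MediaSource):
    """Stand-in plugin; a real one would query the service's web API and parse the reply."""

    def __init__(self, name: str, catalog: List[Media]) -> None:
        self.name = name
        self._catalog = catalog

    def search(self, text: str) -> List[Media]:
        return [m for m in self._catalog if text.lower() in m.get("title", "").lower()]


class Registry:
    """Keeps the loaded sources and runs one search over a chosen subset of them."""

    def __init__(self) -> None:
        self._sources: Dict[str, MediaSource] = {}

    def register(self, source: MediaSource) -> None:
        self._sources[source.name] = source

    def search(self, text: str, keys: Iterable[str],
               sources: Optional[Iterable[str]] = None) -> List[Media]:
        # Query every registered source unless the caller names a subset.
        names = list(sources) if sources is not None else list(self._sources)
        wanted = list(keys)
        results: List[Media] = []
        for name in names:
            for record in self._sources[name].search(text):
                # Keep only the metadata keys requested up front, and tag the origin.
                media = {k: record[k] for k in wanted if k in record}
                media["source"] = name
                results.append(media)
        return results
```

The application code only ever deals with `Registry.search()` and plain records, whichever service a result came from; adding a new service means adding a new `MediaSource`, not changing the caller.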

So, this is a very basic definition of what Grilo is: a framework that retrieves content from various services.

What Grilo is not

One thing people often expect from Grilo is for it to play content. Well, Grilo does NOT play media and that’s a planned “misfeature”.

Grilo’s main purpose is to retrieve media, or better said, media information, and to do it well.
GStreamer is already here to play media and it does a wonderful job at it. Making Grilo a media player as well would pull it away from its specialization, which would surely make it unsuitable for some use cases.

Why should you care

More and more online services are being used in many platforms with applications being developed around them. Grilo eases the development of such applications.
For a media player dedicated to playing videos from YouTube and Vimeo: Grilo gets you the videos’ URLs, GStreamer plays them and voilà, you can focus on other implementation details.

Examples of applications that could get their job done more easily are Totem, Rhythmbox and Miro. For Totem and Rhythmbox, Rygel-Grilo (Grilo’s D-Bus interface) has already shown, as a proof of concept, how easy it is to provide services such as YouTube, SHOUTcast, Jamendo, filesystem media and more, with just a fraction of the code needed to write a dedicated plugin for each of these services.
I also included Miro as an example because it is a video and audio player closely tied to the web, and Grilo could make it easier to find those videos. Plus, Grilo’s podcast plugin could also be used to manage Miro’s video channel subscriptions.

As a different use case, a desktop like MeeGo’s, which integrates social services, for example, could also integrate a way to search media without needing a web browser.

So, to summarize, Grilo fills a gap in the media application development infrastructure: developers who are interested in integrating multimedia content in their applications could benefit significantly from using Grilo to access that content, and that’s why we encourage you to check it out.

Caribou and Text Predictor Input Mode

Monday, April 5th, 2010

I have been wanting to show how Caribou can be used with the Text Predictor Input Mode I wrote a while ago and finally today I took the time to do it.

Caribou with Text Predictor Input Mode from Joaquim Rocha on Vimeo.

Okay, the shortcuts to accept prediction candidates or scroll through them could be changed to ones that are more quickly accessible.
With the changes I made to Caribou, one can even easily provide a special button, such as “ACCEPT”, as the screenshot below shows:

Caribou with Accept key

The changes I’m talking about, which you can see in the video, as well as the QWERTY keyboard layout I used, can be found in Caribou’s bug #613229.

I wrote these changes because, in my opinion, the current way of writing layouts for Caribou isn’t very flexible or appropriate for non-programmers.
These changes drop the current use of Python files with tuples to configure Caribou’s layouts. Instead, JSON files are used, and the patch also makes possible some functionality that wasn’t implemented before.

Basically, instead of each key being either a character or a (symbol, label) pair that Caribou understands, each key is a set of attributes that define it, which Caribou then interprets accordingly.

For a basic key, all one needs to have is the value attribute, which can receive a string (for example a character) or the name of a key in GDK (you can easily figure them out from the GDK key syms file).
So:

{"value": "a"} will create a key labeled a that inputs the character a
{"value": "BackSpace"} will create a backspace key labeled “BackSpace”

You can override the label of a key using the attribute “label”, as:

{"value": "BackSpace",
"label": "⌫"}
will create a backspace key labeled “⌫”

Labels can use Pango Markup to change their text style, for example: {"label": "<small><b>Small Bold Text Key</b></small>", …}

A width attribute is also introduced; it gives the key’s width relative to a regular key’s width. A width of 3 will generate a key that fills the space of 3 keys, whereas 0.25 fills a quarter of a regular key’s space.

A key can have a type (the key_type attribute), which indicates how it behaves. There are 5 types of keys: normal, layout_switcher, preferences, mask and dummy.
A normal key type indicates it is a regular “you-press-you-input” key and is the default type, which is why it wasn’t specified in the examples above.
A layout_switcher key, when pressed, changes the keyboard sublayout to the one named by the value attribute (which must exist in the layout file). So, if we are in the “lowercase” layout and want a key labeled “UP” to switch to the “uppercase” layout:
{"label": "UP", "key_type": "layout_switcher", "value": "uppercase"}

The preferences key type brings up the preferences menu.
A mask key sets the mask given by the value attribute when you press it. For the Alt key:
{"label": "Alt", "key_type": "mask", "value": "mod1"} Again, “mod1” is the mask name from GDK.

Finally, there’s the dummy key type, which is basically used for spacer keys that separate some keys from others to improve visual grouping. Rows that have fewer keys than the longest row (counting dummy keys) will be centered horizontally.

These attributes let you play with keyboard layouts and design any kind of layout in a flexible and easy way.
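As an illustration of how such key attributes could be interpreted, here is a Python sketch. It is not the code from the Caribou patch: the file structure (a JSON object mapping sublayout names to lists of rows, each row a list of key objects) and the names `Key`, `parse_key`, `load_layout` and `row_offsets` are assumptions. It applies the defaults described in this post (type normal, label taken from value, width 1), checks that layout_switcher targets exist, and computes the offset that centers shorter rows, assuming row length is measured by summing the width attributes.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

KEY_TYPES = {"normal", "layout_switcher", "preferences", "mask", "dummy"}


@dataclass
class Key:
    value: str
    label: str
    key_type: str
    width: float


def parse_key(attrs: Dict[str, Any]) -> Key:
    """Turn one JSON key object into a Key, filling in the documented defaults."""
    key_type = attrs.get("key_type", "normal")  # "normal" is the default type
    if key_type not in KEY_TYPES:
        raise ValueError(f"unknown key_type: {key_type!r}")
    value = attrs.get("value", "")  # dummy keys may have no value
    return Key(
        value=value,
        label=attrs.get("label", value),  # without a label, the value is shown
        key_type=key_type,
        width=float(attrs.get("width", 1)),  # relative to a regular key's width
    )


def load_layout(text: str) -> Dict[str, List[List[Key]]]:
    """Parse a layout file: sublayout name -> rows -> keys (structure assumed)."""
    raw = json.loads(text)
    layout = {name: [[parse_key(k) for k in row] for row in rows]
              for name, rows in raw.items()}
    # A layout_switcher must name a sublayout defined in the same file.
    for rows in layout.values():
        for row in rows:
            for key in row:
                if key.key_type == "layout_switcher" and key.value not in layout:
                    raise ValueError(f"unknown sublayout: {key.value!r}")
    return layout


def row_offsets(rows: List[List[Key]]) -> List[float]:
    """Left offset, in regular-key widths, that centers each row against the widest."""
    widths = [sum(k.width for k in row) for row in rows]
    widest = max(widths, default=0.0)
    return [(widest - w) / 2 for w in widths]
```

Keeping every attribute optional with a sensible default is what lets the simple {"value": "a"} form coexist with fully specified keys in the same file.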

At the moment, the patch is still pending review. Let’s hope it gets a green light and is applied.

Text Prediction on GNOME

Wednesday, March 3rd, 2010

I was disappointed with the text completion provided by the N900 (eZiText), which, on top of that, is closed source, and I wondered whether it was possible to have an Open Source solution to provide text prediction and completion.

I searched a bit and, although my original intention was to develop a library that searches Free and Open Source dictionaries for words matching a prefix, I found Presage.
Presage is better than most text prediction systems I have seen out there because it really does text prediction, not just text completion. This C++ library retrieves words taking into account the surrounding text, not only the prefix or the frequency of words. It uses a database of N-grams that can be trained with more text; the more you train it, the more accurate it can be.

This means that if you type something like:
“I m”
instead of suggesting nonsense like:
“I mouse”, “I mother”, “I market” or “I more”
it suggests something more like:
“I must”, “I met”, “I mean” or “I might”
The difference is obvious!
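The gain from context can be illustrated with a toy bigram model in Python. This is not Presage’s algorithm or API; the `BigramPredictor` class is hypothetical and only shows the idea: candidates for the partial word are ranked by how often they followed the previous word in the training text, falling back to plain word frequency (ordinary completion) when that context is unknown.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


class BigramPredictor:
    """Toy context-aware predictor (illustrative only, not Presage)."""

    def __init__(self) -> None:
        self.following: DefaultDict[str, Counter] = defaultdict(Counter)
        self.frequency: Counter = Counter()

    def train(self, text: str) -> None:
        """Count single words and the word that follows each word."""
        words = text.lower().split()
        self.frequency.update(words)
        for prev, nxt in zip(words, words[1:]):
            self.following[prev][nxt] += 1

    def predict(self, typed: str, n: int = 4) -> List[str]:
        """Suggest up to `n` words for the (possibly empty) partial word at the end of `typed`."""
        words = typed.lower().split()
        if typed.endswith(" ") or not words:
            context, prefix = words, ""  # a new word is starting
        else:
            context, prefix = words[:-1], words[-1]
        candidates: List[str] = []
        if context and context[-1] in self.following:
            # Rank by how often each word followed the previous one.
            candidates = [w for w, _ in self.following[context[-1]].most_common()
                          if w.startswith(prefix)]
        if not candidates:
            # Unknown context: fall back to overall word frequency (plain completion).
            candidates = [w for w, _ in self.frequency.most_common()
                          if w.startswith(prefix)]
        return candidates[:n]
```

Trained on a text where “I” is followed by “must”, “met” and “mean”, `predict("I m")` offers exactly those words, and never “mouse” or “mother”, because those never followed “I”. More training text gives better counts, which mirrors the point that training improves accuracy.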

So I developed a little wrapper around Presage in C that provides a still very basic API to get text completion. Then I created a GTK+ Input Method context to control the user’s input in regular GTK+ text widgets and used the wrapper to process the inputted text. I called it: Predictor Input Method (not very original, I know…).
The result is that Predictor suggests words whether or not you have typed a prefix, and lets you accept the candidate word or scroll through a list of suggestions, as you can see in the video below:

Text prediction in GNOME from Joaquim Rocha on Vimeo.

How to use it

The current key bindings are:

Ctrl+Enter -> Selects the current candidate
Ctrl+Up/Down -> Scrolls through the list of candidates
Backspace -> Deletes the character before the cursor and suggests again
Directional arrows -> Move cursor and discard suggestions
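These bindings imply a small amount of state: the text being edited, the candidate list and the selected index. A hypothetical Python model of that state is sketched below; it is not the Input Method’s actual code, and wrapping around at both ends of the list is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CandidateState:
    """Hypothetical model of the suggestion state the key bindings act on."""

    text: str = ""
    candidates: List[str] = field(default_factory=list)
    index: int = 0

    def scroll(self, step: int) -> None:
        """Ctrl+Up/Down: move through the candidates, wrapping at both ends (assumed)."""
        if self.candidates:
            self.index = (self.index + step) % len(self.candidates)

    def accept(self) -> None:
        """Ctrl+Enter: replace the word being typed with the selected candidate."""
        if not self.candidates:
            return
        head, sep, _partial = self.text.rpartition(" ")
        self.text = head + sep + self.candidates[self.index] + " "
        self.discard()

    def discard(self) -> None:
        """Arrow keys: moving the cursor throws the current suggestions away."""
        self.candidates = []
        self.index = 0
```

Accepting replaces only the partial word after the last space, so it works both mid-word (“I m” becomes “I mean ”) and right after a space, when no prefix has been typed yet.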

Who should use it

This kind of assistive technology can have many applications, but the main ones are use on small/mobile devices and assisting users with disabilities. Both share the same motivation: speeding up input and reducing mistyped characters, because the amount of input required is minimized.
Of course, you can also use it on your regular GNOME desktop to type your emails faster, etc.

In the case of users with disabilities, a popup menu could be added to show a complete list of candidates and the bound fast-access keys.

Why is Free Software important in this

This is the kind of technology for which everybody should be interested in a FOSS solution, because of the obvious advantage of developers from all over the world being able to modify it.
Suppose you’re creating a mobile phone and you choose a closed solution to provide text prediction for it. Then you find out you’re disappointing all your users in country X because the library you’re paying for does not support their language, and the library owner is not that interested in adding it. If you’re using an open solution instead, local communities from many places in the world can add support for their languages, and your phone can be better accepted in places you hadn’t even imagined.

Software that reaches an international audience with different languages is software you want to have open.

How to get Predictor Input Method

You can find the Predictor Input Method’s source on its Gitorious page: http://gitorious.org/text-predictor-input-method
Of course, you should also install Presage for it to work.

If you are not using GTK+ Input Methods, you can use the wrapper text-predictor.cpp, which is not tied to the Input Method code itself. And of course, you can copy the little tricks used in the Input Method code and apply them to your own code (like delaying the retrieval of the candidates by a fraction of a second so as not to block the input, etc.).
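The “delay the retrieval” trick is a debounce: each keystroke cancels the pending lookup and schedules a new one, so the predictor only runs once typing pauses. The sketch below uses Python’s `threading.Timer` to stay self-contained; inside a GTK+ application the callback would instead need to run on the main loop (for example through a GLib timeout source), so treat the `Debouncer` class and the 0.2 s delay used with it as illustrative assumptions rather than the Input Method’s code.

```python
import threading
from typing import Any, Callable, Optional


class Debouncer:
    """Run `callback` only after `delay` seconds without a new keystroke (hypothetical helper)."""

    def __init__(self, delay: float, callback: Callable[..., Any]) -> None:
        self.delay = delay
        self.callback = callback
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def keystroke(self, *args: Any) -> None:
        """Called on every key press; restarts the countdown with the latest arguments."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer keystroke supersedes the pending lookup
            self._timer = threading.Timer(self.delay, self.callback, args=args)
            self._timer.daemon = True  # don't keep the process alive for a lookup
            self._timer.start()

    def cancel(self) -> None:
        """Drop any pending lookup, e.g. when the cursor moves away."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Typing “I”, “I m” and “I mu” in quick succession triggers a single lookup, for “I mu”, once the delay elapses; the expensive prediction call never sits between two keystrokes.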

Hope you like it.

OCRFeeder version 0.3 released

Friday, October 16th, 2009

Moments ago I released OCRFeeder v0.3!

This version contains several improvements like:

* A setup.py script that makes installation easier
* Zoom fit option added to the zoom options and used when an image is loaded
* German translation
* Code improvements
* Better integration of the Tesseract OCR engine
* Better desktop integration by using an application icon and desktop file
* Updated instructions in the README file
* Corrected a few issues in the OCR engines manager dialog
* Corrected engine name access
* Fixed the current project being cleared even when loading a new project fails
* Correct actions availability depending on the existence of images

A big thank you to Renard Voß who was kind enough to provide you all with a German translation.

Give it a try: either download the release tarball or clone the git repository.

You can help too: submit bugs or feature requests, or translate it into your language.

For the next release, I’m thinking of providing a deb package to make it even easier to install OCRFeeder.

Stay tuned!

OCRFeeder Repository Relocation and Maemo Preview

Wednesday, October 7th, 2009

It’s been a while since I wrote my last post but I guess this one will compensate.

When I posted about how I made OCRFeeder run on Fremantle, I said I wasn’t thinking of porting the application, but later conversations made it clear that OCRFeeder might come in handy for some people.
One of the use cases that we have talked about was to be able to create a contact in the address book by recognizing the contact fields from a business card.

So, for some days in these last weeks, I’ve been porting OCRFeeder to Fremantle!
(The card-to-contact feature is still to come, as I wanted to have OCRFeeder “fremantelized” first.)

New Repository

I had been using git-svn to develop OCRFeeder, and while this was okay when there was just one branch (trunk), with the Maemo version it became clear that Google Code’s SVN repository wasn’t enough. (Yes, I know they offer Mercurial, but I’m a git user.)
So, yesterday I relocated OCRFeeder’s development to Gitorious where you’ll find the branch “maemo” besides the “master” one: http://gitorious.org/ocrfeeder

Development Notes

I must say that although I have used PyGTK for my UI code for a long time, on Hildon I am more experienced with C. While in theory this is the same, in practice the PyMaemo bindings had some issues that delayed development a bit (mainly undocumented functions that differ from the direct, expected usage, as well as some bugs I found).
I must thank Lizardo and other PyMaemo folks who were kind enough to help me every time I bugged them with questions and suggestions.

I think OCRFeeder for Maemo is another example of how a desktop-targeted application can be ported to Fremantle, especially from the design point of view. The chats I had with my friend and colleague Felipe (who, by the way, has just become a Master’s degree student in User-Centered Interactive Technologies) surely helped in this matter.

Trying OCRFeeder for Maemo

Now you can try OCRFeeder, but you’ll first have to compile and install pygoocanvas and Tesseract (or another OCR engine), as I wrote here. I hope I’ll have time to create deb packages for both pygoocanvas and Tesseract, as they’re also very useful apps to have.

As a final note, although everything was working fine on the Maemo 5.0 SDK beta 2, the final SDK was released today and I tested OCRFeeder on it… and not everything works as well as before. The problems are mainly related to (Hildon-style) GtkTreeViews, which seem to work fine from C but, through PyMaemo, don’t seem to obey the selection mode I assign to them.

Some Eye Candy…

OCRFeeder for Maemo preview from Joaquim Rocha on Vimeo.

Preferences dialog

Recognized page
