Open Source Translation Database: Interview with Andrew Smith

The Powerbase: Your latest project is the Open Source Translation Database. Can you describe the goal of the project and how it works?

Andrew: Getting started with translating your software is a lot of work. You can find a guide and some examples and get your code ready for translations, but the hard part is actually finding translators.

I’m hoping this service will encourage both open source software maintainers and translators to make more software available in more languages. Maintainers will get some free (as in, little work required) translations and translators will have something to start with, instead of just a POT file (the English template).

The Powerbase: How is this different from the way programs are normally translated into different languages? Is OSTD anything like Launchpad Translations run by Canonical?

Andrew: I have great respect for what Canonical is doing, including their Launchpad Software Translations project. Speaking as an outsider, I’ll guess that they try to create, maintain, and grow communities of translators who will help translate different software.

That’s great but quite different from what the OSTD is offering – a quick, simple, automated way to get many partial translations of your software.

I think the web is big enough for both of us, and I’ll be happy to set up a relationship with Canonical if it’s beneficial to our end goals (theirs and mine are the same; only the processes differ).

The Powerbase: How accurate would you say your method is? Do that many programs really share similar strings?

Andrew: Thanks to Christian Perrier from debian-i18n I have managed to import about 75% of all the translations from Debian (over 10 million strings) into the OSTD.

As the import progresses I look at my logs and see that typically 10 to 15% of the translated strings in any one piece of software already exist in the database. These aren’t proper statistics, but I’d guess that’s how much a random software maintainer would get for a new PO file.

In fact I’m pretty sure that a lot of the software packages in Debian (where most of the OSTD translations are from) can get a lot of new translations from the OSTD. It’s a fascinating discovery, if you can wrap your head around it.

That wasn’t a complete surprise to me. Everyone can think of the simple example of “File” or “Properties”, but even more unusual strings such as “File size: ” have surely been used by someone else. And likely someone already translated it, probably several people for different software. Why not pool the efforts, and make those translations available easily for everyone?
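To illustrate the pooling idea described above, here is a minimal Python sketch. It is not the OSTD's actual implementation: the program names, the sample strings, and the `build_pool` and `coverage` helpers are all hypothetical, chosen only to show how translations gathered from several programs could be shared and how a "strings already in the database" percentage might be computed.

```python
from collections import defaultdict

# Hypothetical (program, English string, French translation) records.
RECORDS = [
    ("editor-a", "File", "Fichier"),
    ("player-b", "File", "Fichier"),
    ("editor-a", "Properties", "Propriétés"),
    ("burner-c", "File size: ", "Taille du fichier : "),
]


def build_pool(records):
    """Map each English string to {translation: set of source programs}."""
    pool = defaultdict(lambda: defaultdict(set))
    for program, english, translated in records:
        pool[english][translated].add(program)
    return pool


def coverage(msgids, pool):
    """Fraction of a program's strings that already have a pooled translation."""
    if not msgids:
        return 0.0
    return sum(1 for s in msgids if s in pool) / len(msgids)


pool = build_pool(RECORDS)

# Strings from a hypothetical new program that has no translations yet.
new_strings = ["File", "Properties", "Rip CD", "Eject"]
print(coverage(new_strings, pool))  # 0.5
```

Keeping the set of source programs next to each translation is what would let a translator see where a string was used and judge its context.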

Translations

[Figure: Translating the common phrase "Open File"]

The Powerbase: It seems like translations into as many languages as possible were big goals for you in Asunder and ISOMaster as well. How did your experience getting those applications translated manually influence the design and goals of OSTD?

Andrew: Getting ISO Master and Asunder translated took a lot of work. I had to troll various forums (sometimes in languages I don’t understand), email random strangers, and do everything I could to recognize the translators’ contributions. I was very happy with the results (40 and 35 full translations), and my software must be usable and used by ten times more people all around the world now.

What I’m trying to do with the OSTD is give software maintainers a boost with translations, that feeling I got when I received my first PO file, the bragging rights I got when I had more than 30 translations. And hopefully when a translator sees a partially translated piece of software it will take less effort to convince them to translate more of it.

Using The Service

The Powerbase: Could you briefly explain the .PO and .POT files used by the OSTD, and how they work? Is this the only way to translate strings with the OSTD?

Andrew: .POT files are the templates containing only English strings – users will upload one to get translations back. Translated files (English/other string pairs) are .PO files. Both belong to Gettext, the most popular internationalization system used on Linux.

You can also translate a single English string; I expect most people will use that to decide what English string to use in their software.

I am very interested in adding support for other systems, especially for Android strings.xml files, but that will have to wait a little.
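For readers who have not seen these files, the following Python sketch shows the basic shape of a template and how it could be filled in from a set of known translations. It is an illustration under stated assumptions, not how the OSTD works internally: the sample template, the `KNOWN_FR` dictionary, and the `extract_msgids` and `fill_template` helpers are hypothetical, and the parsing handles only simple single-line entries (no escapes, plurals, or comments).

```python
import re

# Hypothetical template (.POT) content: English msgid strings, empty msgstr.
POT_TEXT = '''msgid "File"
msgstr ""

msgid "Open File"
msgstr ""

msgid "File size: "
msgstr ""
'''

# Hypothetical French translations, standing in for a shared database.
KNOWN_FR = {"File": "Fichier", "Open File": "Ouvrir un fichier"}

# Matches one single-line msgid entry per line.
MSGID_RE = re.compile(r'^msgid "(.*)"$', re.MULTILINE)


def extract_msgids(pot_text):
    """Return the non-empty single-line msgid strings, in file order."""
    return [m for m in MSGID_RE.findall(pot_text) if m]


def fill_template(pot_text, known):
    """Build PO text, filling msgstr wherever a translation is known."""
    entries = []
    for msgid in extract_msgids(pot_text):
        msgstr = known.get(msgid, "")
        entries.append(f'msgid "{msgid}"\nmsgstr "{msgstr}"\n')
    return "\n".join(entries)


po_text = fill_template(POT_TEXT, KNOWN_FR)
print(po_text)
```

Entries without a known translation keep an empty `msgstr`, which is the partial .PO file a translator would then complete.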

The Powerbase: The feature where you can see which program the translated string came from is especially nice. Are you considering adding the ability to sort and filter results via the program name or type of program (text editor, media player, etc)?

Andrew: First I want to make sure that all the software where that translation is used shows up as the source, that way people will more likely know the context of the string. I don’t think I’ll be adding filtering by source any time soon, though at some point perhaps I will.

The Powerbase: What is the advantage of making an account on the OSTD? Will an account be required to use the service?

Andrew: I hate services that make you sign up for the sake of signing up, and will do my best to avoid that. Right now the only feature that requires logging in is uploading already translated .PO files. There are other features coming, such as notifications when new translations become available, which will need an email address.

The Powerbase: Will you be attempting to charge for this service in any way?

Andrew: I’m not planning to, mostly because I don’t see how I could. I think this project will be purely community service and I hope I will get enough thank-you emails and postcards to keep me interested in maintaining it long term.

Or perhaps after a couple of years I can merge it with one of the bigger established communities such as Canonical or Fedora or Debian.

The Powerbase: What can the community do to help the project? What are you looking for right now in terms of assistance or improvements with the service?

Andrew: Try it! Use it! Tell your friends about it! I would love to hear suggestions for improvement, and I’ll be happy to hear about every open source project that it was useful for.

Big thanks to Andrew for taking the time to answer a few questions for us. Be sure to check back for our continuing coverage of the Open Source Translation Database as it evolves.


Tom Nardi

Tom is a Network Engineer with a focus on GNU/Linux and open source software. He is a frequent submitter to "2600", and maintains a personal site of his projects and areas of research at www.digifail.com.
