Introduction

The current DDTP/DDTSS website is showing its age. On the backend it's a sprawl of Perl and shell scripts which started out nice but over the years have been adapted to meet changes in Debian and the requirements of translators.

So in June 2011 a new project was started by Martijn van Oosterhout under the name DDTSS-Django, which aimed to clean up this mess by reimplementing the DDTSS using a proper ORM (SQLAlchemy) for the database interface and a proper framework (Django) for the web interface. This would allow us to much more easily implement the things translators have been requesting for years. Originally the plan was to replace just the DDTSS, but it quickly became clear that with the basic infrastructure in place the remainder of the DDTP could be easily replaced as well. This new project has mostly been written by the major maintainers of the current DDTP, Martijn and Michael Bramer (grisu).

What follows is an edited version of emails that have been sent to the debian-i18n mailing list describing various aspects of the new system. Over time it should be edited to be more coherent.

Note that the title of this page, DDTP2, is really just a placeholder since we haven't thought of a better name.

There is no timeframe yet for completion. However, most of the system is functional and can be easily tested on any Debian machine. In the repository mentioned above there is a README which should be enough to get you up and running fairly quickly.

Goals

The initial goal is to make a system that works just like the current one, but with a few new features that translators have been asking for for ages. However, there are several other goals.

Ideally we'd like to present a whole new interface that looks nicer. Anybody who would like to design a better interface is encouraged to show their skills on the debian-i18n list. The new system uses a templating engine so you don't need any programming skills to contribute.

Translation acceptance

The current DDTSS provides extremely limited control over when a translation is considered to be accepted. To give translators the flexibility to control how translations are accepted, a schema is provided that allows users to be grouped and controls how each group can interact with the system.

There are several levels of authorisation:

people who create accounts.

policy decision, it does not have to mean the same for every language.

Assigned by admins. These people have complete control over language, including determining trusted users and translation policy, wordlists, etc.

languages but cannot affect policy. Note this permission is orthogonal to the others since these are non-overlapping rights.

The goal here is delegation. The people who administer the system as a whole should not be responsible for setting policy for individual languages. Hence each language-group needs to appoint some people who will be responsible for policy, and each language-group can set policy without dealing with admins.

Acceptance of translations works on a points basis (this is language-group policy) and you can also do access control. Users are split into three groups:

For each group you can decide:

I believe it is flexible enough to cover all the various requests I've heard on this list. We shall see if it's enough.

Note that the system is flexible enough to have more translation models (such as reviews only allowed by people whose name begins with 'j') but I'd rather not have more than necessary.
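As a rough illustration of what a points-based policy could look like, the sketch below assumes that each review contributes a number of points depending on the reviewer's group, and that a translation is accepted once the total reaches a threshold chosen by the language-group. The group names ("trusted", "logged_in", "anonymous"), the point values and the threshold are all hypothetical; they are not taken from the DDTSS-Django code.

```python
from dataclasses import dataclass, field


@dataclass
class LanguagePolicy:
    """Hypothetical per-language acceptance policy (illustration only)."""

    threshold: int
    # Maps an assumed group name to the points one review from it is worth.
    review_points: dict = field(default_factory=dict)

    def is_accepted(self, reviewer_groups):
        """Return True once the collected reviews reach the threshold."""
        total = sum(self.review_points.get(group, 0) for group in reviewer_groups)
        return total >= self.threshold


# Example policy: one trusted review, or three reviews from logged-in users.
policy = LanguagePolicy(
    threshold=3,
    review_points={"trusted": 3, "logged_in": 1, "anonymous": 0},
)
```

With this example policy, `policy.is_accepted(["trusted"])` is true, while two reviews from logged-in users are not enough. A language-group wanting stricter review would only need to change the numbers.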

Status: Implemented

The mail interface

Looking at my list of things to do, it says "mail interface", but I've just looked at the logs on ddtp.debian.net and it looks like the mail interface is almost unused and has been for a long time. This surprised me as I thought there were language teams using other systems for translations and using the mail interface to feed the translations back. Apparently not.

At the last sprint we discussed the mail interface briefly and it was suggested that it would be useful to have one but that the current mail interface was not as useful as it could be. In particular it was suggested that if the interface worked with PO files it would make people's lives much easier.

There has been some feedback from the few people who occasionally use the mail interface. The main reason they use it is that updates made that way bypass the normal review mechanism, which is useful for simple updates. However, it shouldn't really be possible to bypass the mechanism this way.

So the idea is to have the mail interface interact with the web frontend so that new translations are placed in the normal review process and that email can be used for review as well.

Being able to bypass the review mechanism is a requested feature, so the proposal is to extend the above translation acceptance so that certain users can be allowed to do a "force submit".

On the question of PO files there seemed to be some agreement that it would be better; however, it was not a priority.

Status: idea

Importing of Packages files

After being told that the old Perl import process would not be acceptable on the new ddtp.debian.net (due to the error logs it produces) I've been testing the new script I wrote. It's currently faster: it takes about 5.5 hours to do what the old script did in 10, while importing 15 architectures instead of just 11. But it also does less, especially with respect to milestones.

This is still a long time, so I had some ideas about how to improve this:

1. Not every architecture every day. How often does it happen that a package only exists on one architecture? Does it matter if some architectures are only imported once a week?

Turns out no-one has any objections to this.

The Packages files are all different sizes so there must be a difference, but is it significant from the point of view of translations?

2. Only import changes. Each day there are Packages.diff files produced with just the changes from the previous day. In theory you could use this to just import the packages that have changed.

This is a technically feasible option; however, I have heard that people are trying to get rid of the Packages.diff files and that we shouldn't rely on them.

A side-effect would be that the "description was in distribution X" timestamps would no longer be wholly accurate. We would need to deal with this some other way.

3. Non-free files are not imported. This is due to the fact that the description may not be allowed to be translatable.
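To make the trade-offs above more concrete, here is a minimal sketch of why importing every architecture in full is largely redundant: it parses the stanzas of a Packages file and returns only the descriptions that have not been seen before. This is not the actual import script. The function names and the use of an in-memory set of MD5 digests as the "already imported" record are assumptions made for illustration.

```python
import hashlib


def parse_packages(text):
    """Yield one dict of fields per stanza of a Packages file."""
    for block in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key is not None:
                # Continuation line: it belongs to the previous field.
                fields[key] += "\n" + line
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            yield fields


def descriptions_to_import(text, seen):
    """Return (package, description) pairs whose digest is not yet in `seen`.

    `seen` is a set of digests standing in for the database (an assumption);
    it is updated in place so later files skip what was already imported.
    """
    result = []
    for stanza in parse_packages(text):
        description = stanza.get("Description")
        if description is None:
            continue
        digest = hashlib.md5(description.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            result.append((stanza["Package"], description))
    return result


# Made-up sample data: the same description on two architectures.
SAMPLE = """Package: foo
Architecture: amd64
Description: short foo
 Long text for foo.

Package: foo
Architecture: i386
Description: short foo
 Long text for foo.

Package: bar
Architecture: amd64
Description: short bar
 Long text for bar.
"""
```

Calling `descriptions_to_import(SAMPLE, set())` yields one entry for foo and one for bar. The second foo stanza is skipped because its description is identical, which is the situation that makes less frequent imports of some architectures plausible.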

Status: works, but a work in progress.

Relation to ddtp.debian.org

Much of the infrastructure for supporting translators has historically been on non-DSA machines. A lot was hosted on a machine named churro. However, over the years this machine was not always reliable. Hence it was decided to move the infrastructure over to DSA machines. The majority of this work was done during the I18n/Sprint2012.

This new DDTP project does not have much to do with ddtp.debian.org as such. After the move it became clear that parts of the old system (for example, the import process) could not be used on the new server. However, the server is not intended to be used for beta testing services. Hence ddtp.debian.org is currently not used, pending either the completion of the new system or (unlikely) someone stepping up to make the old system work.

This is all, however, coincidental. The new DDTP/DDTSS was envisioned without considering the possibility of a new server. The most likely step from here is to set up the new DDTP on another server somewhere so it can be tested by the users (i.e. translators), and only once it is considered an acceptable replacement for the current DDTP will it be transferred to ddtp.debian.org.

Authentication

It was suggested that using a federated login, such as OpenID, would be useful to avoid the Yet Another Password problem. This is fairly easy to implement.

There were some comments about being able to use your Alioth login. However, setting up federated login with Alioth seems to be more difficult. Someone on the list asked the Alioth admins but apparently got no response.

Status: Almost done

Fuzzy matching

To help translators out there is a fuzzy matching algorithm which, for parts of a translation which have not yet been translated, attempts to find other translated parts that resemble the one in the current translation.

The algorithm is fairly simple.

1. Find all descriptions which share a paragraph with the current description.

This is to capture the case where a library changed a version number but the rest of the description stayed the same. This part isn't foolproof, as the part_description_tb table is not complete.

2. Find all descriptions which share a package or source package name with any of the descriptions found in the first step.

Again, packages from the same source are more likely to have similar descriptions.

3. For each paragraph of a description found in the second step, find the best match which is no more than 20% different from the paragraph being translated.

The results so far seem ok, but it has yet to be used in practice.
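The third step can be sketched as follows. This is only an illustration: it assumes that "difference" is measured as one minus the similarity ratio from Python's standard difflib module, which is not necessarily the measure the real code uses, and the function name and `max_difference` parameter are made up for the example.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(paragraph, candidates, max_difference=0.20):
    """Return the candidate closest to `paragraph`, or None.

    A candidate only qualifies if it differs from `paragraph` by no more
    than `max_difference`, using 1 - SequenceMatcher.ratio() as the
    (assumed) difference measure.
    """
    best = None
    best_ratio = 1.0 - max_difference  # minimum similarity required
    for candidate in candidates:
        ratio = SequenceMatcher(None, paragraph, candidate).ratio()
        if ratio >= best_ratio:
            best, best_ratio = candidate, ratio
    return best


# Made-up paragraphs: a version bump versus an unrelated description.
paragraph = "This package contains the shared library, version 1."
candidates = [
    "This package contains the shared library, version 2.",
    "A completely unrelated text editor.",
]
match = best_fuzzy_match(paragraph, candidates)
```

Here `match` is the first candidate, since it differs by a single character, while the unrelated paragraph falls well outside the 20% limit. The translation of the matched paragraph would then be offered to the translator as a starting point.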

Status: Implemented