Differences between revisions 2 and 14 (spanning 12 versions)
Revision 2 as of 2012-08-03 22:21:59
Size: 4291
Editor: ?MartijnVanOosterhout
Comment:
Revision 14 as of 2016-12-21 08:28:50
Size: 11115
Editor: LauraArjona
Comment: ddtp.debian.net -> ddpt2.debian.net (plus http -> https)
See also DDTP/Future

Introduction

The current DDTP/DDTSS website (https://ddtp2.debian.net) is showing its age. On the backend it's a sprawl of Perl and shell scripts which started out nicely but over the years have been adapted to meet changes in Debian and the requirements of translators.

So in June 2011 a new project was started by Martijn van Oosterhout under the name DDTSS-Django (https://github.com/kleptog/DDTSS-Django), which aimed to clean up this mess by reimplementing the DDTSS using a proper ORM (SQLAlchemy) for the database interface and a proper framework (Django) for the web interface. This would allow us to much more easily implement the things translators have been asking for for years. Originally the plan was to replace just the DDTSS, but it quickly became clear that with the basic infrastructure in place the remainder of the DDTP could be easily replaced as well. This new project has mostly been written by the major maintainers of the current DDTP, Martijn and Michael Bramer (grisu).

What follows is an edited version of emails that have been sent to the debian-i18n mailing list describing various aspects of the new system. Over time it should be edited to be more coherent.

Note that the title of this page, DDTP2, is really just a placeholder since we haven't thought of a better name.

Note that the current ddtp2.debian.net system is the same system that used to be at ddtp.debian.net (only the DNS name changed), not the proposed "DDTP2".

There is no timeframe yet for completion. However, most of the system is functional and can be easily tested on any Debian machine. In the repository mentioned above there is a README (https://github.com/kleptog/DDTSS-Django/blob/master/README) which should be enough to get you up and running fairly quickly.

Goals

The initial goal is to make a system that works just like the current one, but with a few new features that translators have been asking for for ages. However, there are several other goals.

  • Easier to maintain
  • Easier to test/setup. Anyone should be able to run it on their own machine. This makes development easier.
  • Delegation of responsibilities. Currently a handful of people can make changes. The goal is that language teams should be able to manage themselves.

Ideally we'd like to present a whole new interface that looks nicer. Anybody who would like to design a better interface is encouraged to show their skills on the debian-i18n list. The new system uses a templating engine so you don't need any programming skills to contribute.

Translation acceptance

The current DDTSS provides extremely limited control over when a translation is considered to be accepted. To give translators the flexibility to control how translations are accepted, a scheme is provided that allows users to be grouped, with controls over how each group can interact with the system.

There are several levels of authorisation:

  • Unauthenticated (IP-user), dummy user with no authorisation.
  • Logged-in, but with no special rights. This is the default for people who create accounts.
  • Trusted. As determined by language coordinator. What this means is a policy decision, it does not have to mean the same for every language.
  • Coordinator. Each language will have one or more coordinators, assigned by admins. These people have complete control over their language, including determining trusted users and translation policy, wordlists, etc.
  • Admin, system administrator. They can assign coordinators and create languages but cannot affect policy. Note this permission is orthogonal to the others since these are non-overlapping rights.

The goal here is delegation. The people who administer the system as a whole should not be responsible for setting policy for individual languages. Hence each language-group needs to appoint some people who will be responsible for policy and each language-group can effect policy without dealing with admins.

Acceptance of translations works on a points basis (this is language-group policy) and you can also do access control. Users are split into three groups:

  • Anonymous
  • Logged-in
  • Trusted/Coordinators

For each group you can decide:

  • not allowed to translate/review
  • allowed to translate/review, but with no effect on acceptance
  • allowed to translate/review and may trigger acceptance

I believe it is flexible enough to cover all the various requests I've heard on this list. We shall see if it's enough.

Note that the system is flexible enough to have more translation models (such as reviews only allowed by people whose name begins with 'j'), but I'd rather not have more than necessary.
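
As a rough illustration of the scheme above, here is a minimal sketch in Python. It is not taken from the DDTSS-Django code: the class names, the per-group points and the acceptance threshold are all assumptions made up for this example, since the emails only say that acceptance is points-based and that each group gets one of three access settings.

```python
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    ANONYMOUS = "anonymous"
    LOGGED_IN = "logged-in"
    TRUSTED = "trusted"  # trusted users and coordinators


class Access(Enum):
    NOT_ALLOWED = 0   # not allowed to translate/review
    NO_EFFECT = 1     # allowed, but no effect on acceptance
    MAY_TRIGGER = 2   # allowed and may trigger acceptance


@dataclass
class LanguagePolicy:
    # Hypothetical fields: one access setting and one point value per
    # group, plus the number of points needed for acceptance.
    access: dict
    points: dict
    threshold: int

    def can_review(self, group):
        return self.access[group] is not Access.NOT_ALLOWED

    def is_accepted(self, reviewer_groups):
        """Return True once the counted reviews reach the threshold."""
        total = 0
        for group in reviewer_groups:
            # Only groups that may trigger acceptance contribute points.
            if self.access[group] is Access.MAY_TRIGGER:
                total += self.points[group]
        return total >= self.threshold


# Example policy for one language (all values are illustrative).
policy = LanguagePolicy(
    access={
        Group.ANONYMOUS: Access.NO_EFFECT,
        Group.LOGGED_IN: Access.MAY_TRIGGER,
        Group.TRUSTED: Access.MAY_TRIGGER,
    },
    points={Group.ANONYMOUS: 1, Group.LOGGED_IN: 1, Group.TRUSTED: 2},
    threshold=3,
)
```

With this example policy, a review by one logged-in user plus one trusted user reaches the threshold, while any number of anonymous reviews never does, because that group is set to have no effect on acceptance.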

Status: Implemented

The mail interface

Looking at my list of things to do, it says "mail interface", but I've just looked at the logs on ddtp.debian.net and it looks like the mail interface is almost unused and has been for a long time. This surprised me, as I thought there were language teams using other systems for translations and using the mail interface to feed the translations back. Apparently not.

At the last sprint we discussed the mail interface briefly and it was suggested that it would be useful to have one, but that the current mail interface was not as useful as it could be. In particular it was suggested that if the interface worked with PO files it would make people's lives much easier.

There has been some feedback from the few people who occasionally use the mail interface. Their main reason for using it is that updates sent that way bypass the normal review mechanism, which is useful for simple updates. However, it shouldn't really be possible to bypass the mechanism this way.

So the idea is to have the mail interface interact with the web frontend so that new translations are placed in the normal review process and that email can be used for review as well.

Being able to bypass the review mechanism is a requested feature, so the proposal is to extend the above translation acceptance so that certain users can be allowed to do a "force submit".

On the question of PO files there seemed to be some agreement that it would be better; however, it was not a priority.

Status: idea

Importing of Packages files

After being told that the old import Perl process would not be acceptable on the new ddtp.debian.net (due to the error logs it produces) I've been testing the new script I wrote. It's currently faster: it takes about 5.5 hours to do what the old script did in 10, while importing 15 architectures instead of just 11. But it also does less, especially with respect to milestones.

This is still a long time, so I had some ideas about how to improve this:

1. Not every architecture every day. How often does it happen that a package only exists on one architecture? Does it matter if some architectures are only imported once a week?

Turns out no-one has any objections to this.

The Packages files are all different sizes so there must be a difference, but is it significant from the point of view of translations?

2. Only import changes. Each day there are Packages.diff files produced with just the changes from the previous day. In theory you could use this to just import the packages that have changed.

This is a technically feasible option; however, I have heard that people are trying to get rid of the Packages.diff files and that we shouldn't rely on them.

A side-effect would be that the "description was in distribution X" timestamps would no longer be wholly accurate. We would need to deal with this some other way.

3. Non-free files are not imported. This is due to the fact that the description may not be allowed to be translatable.
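
To make the first question more concrete, the sketch below parses Packages files for several architectures and groups identical descriptions together, so that descriptions seen on only one architecture are easy to count. It is only an illustration and not the import script discussed above: the function names and the sample data are invented for this example, and it relies only on the general Packages file layout (stanzas separated by blank lines, "Field: value" lines, continuation lines starting with a space).

```python
def parse_packages(text):
    """Yield one dict per stanza of a Packages file."""
    stanza = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Continuation line: append to the previous field.
            stanza[last_key] += "\n" + line[1:]
        else:
            key, _, value = line.partition(":")
            last_key = key
            stanza[key] = value.strip()
    if stanza:
        yield stanza


def unique_descriptions(packages_by_arch):
    """Map each distinct description to the (package, arch) pairs using it."""
    seen = {}
    for arch, text in packages_by_arch.items():
        for stanza in parse_packages(text):
            desc = stanza.get("Description")
            if desc is not None:
                seen.setdefault(desc, set()).add((stanza["Package"], arch))
    return seen


# Invented sample data for two architectures.
SAMPLE_AMD64 = """\
Package: hello
Architecture: amd64
Description: example package based on GNU hello
 The GNU hello program produces a familiar, friendly greeting.

Package: onlyamd
Architecture: amd64
Description: package built for one architecture only
"""

SAMPLE_ARMEL = """\
Package: hello
Architecture: armel
Description: example package based on GNU hello
 The GNU hello program produces a familiar, friendly greeting.
"""

descriptions = unique_descriptions({"amd64": SAMPLE_AMD64, "armel": SAMPLE_ARMEL})
single_arch = [d for d, users in descriptions.items() if len(users) == 1]
```

In this sample, the description of hello is shared by both architectures and only needs importing once, while the description of onlyamd appears on a single architecture; running something similar over real Packages files would show how much is actually lost by importing some architectures less often.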

Status: works, but a work in progress.

Relation to ddtp.debian.org

Much of the infrastructure for supporting translators has historically been on non-DSA machines. A lot was hosted on a machine named churro. However, over the years this machine was not always reliable. Hence it was decided to move the infrastructure over to DSA machines. The majority of this work was done during the I18n/Sprint2012.

This new DDTP project does not have much to do with ddtp.debian.org as such. After the move it became clear that parts of the old system (for example, the import process) could not be used on the new server. However, the server is not intended to be used for beta testing services. Hence ddtp.debian.org is currently not used, pending either the completion of the new system or (unlikely) someone stepping up to make the old system work.

This is all however coincidental. The new DDTP/DDTSS was envisioned without considering the possibility of a new server. The most likely step from here is to set up the new DDTP on another server somewhere so it can be tested by the users (i.e. translators), and only once it is considered an acceptable replacement for the current DDTP will it be transferred to ddtp.debian.org.

Authentication

It was suggested that using a federated login, such as OpenID, would be useful to avoid the Yet Another Password problem. This is fairly easy to implement.

There were some comments about being able to use your Alioth login. However, setting up federated login with Alioth seems to be more difficult. Someone on the list asked the Alioth admins but apparently got no response.

Status: Almost done

Fuzzy matching

To help translators out there is a fuzzy matching algorithm which, for parts of a translation which have not yet been translated, attempts to find other translated parts that resemble the one in the current translation.

The algorithm is fairly simple.

1. Find all descriptions which share a paragraph with the current description.

This part isn't foolproof, as the part_description_tb table is not complete. This is to capture the case where a library changed a version number but the rest of the description stayed the same.

2. Find all descriptions which share a package or source package name with any of the descriptions found in the first step.

Again, packages from the same source are more likely to have similar descriptions.

3. For each paragraph of a description found in the second step, find the best match which is no more than 20% different from the paragraph being translated.

The results so far seem ok, but it has yet to be used in practice.
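
The third step can be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: the text above does not say how "20% different" is measured, so this example uses the similarity ratio from Python's standard difflib module as a stand-in, and the function name and sample paragraphs are made up.

```python
import difflib


def best_fuzzy_match(paragraph, candidates, max_difference=0.20):
    """Return (candidate, similarity) for the closest candidate, or None.

    Candidates that differ from the paragraph by more than max_difference
    (measured here as 1 - SequenceMatcher ratio) are ignored.
    """
    best = None
    for candidate in candidates:
        similarity = difflib.SequenceMatcher(None, paragraph, candidate).ratio()
        if 1.0 - similarity <= max_difference:
            if best is None or similarity > best[1]:
                best = (candidate, similarity)
    return best


# Invented example: a library description where only the version changed.
current = "This package contains the shared library, version 1.2."
already_translated = [
    "This package contains the shared library, version 1.3.",
    "A totally unrelated text about fonts.",
]
match = best_fuzzy_match(current, already_translated)
```

Here the first candidate is returned because it differs from the current paragraph in a single character, which is the version-bump case mentioned in step 1. A translator would then be shown the existing translation of that paragraph as a starting point.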

Status: Implemented

Milestones

This part isn't from an email, it is my description of what Grisu built

The old DDTSS, when the number of pending translations was low, selected a new description based on a priority ranking. This priority ranking did not really match what many groups wanted, leading to people writing scripts to fetch particular translations.

The idea behind milestones is that language groups and individuals can indicate which types of packages they wish to assign priority to. These milestones could be either debtags, priorities, sections or characteristics like "this package has been translated before".

Status: built, but needs sanding of the rough edges.
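
One way to picture this is to treat each milestone as a named test applied to a package, with a language group keeping an ordered list of the milestones it cares about. The sketch below is purely hypothetical and does not describe what Grisu built: the Package fields, the milestone names and the selection function are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    section: str
    priority: str
    tags: set = field(default_factory=set)
    translated_before: bool = False


# Each milestone is a (name, predicate) pair, listed in order of importance.
MILESTONES = [
    ("priority:required", lambda p: p.priority == "required"),
    ("tag:role::program", lambda p: "role::program" in p.tags),
    ("translated-before", lambda p: p.translated_before),
]


def next_package(untranslated, milestones):
    """Return (milestone name, package) for the most important match, or None."""
    for name, matches in milestones:
        for package in untranslated:
            if matches(package):
                return name, package
    return None


# Invented sample of packages still waiting for a translation.
pending = [
    Package("foo-data", "games", "optional"),
    Package("bar", "utils", "optional", {"role::program"}),
    Package("base-thing", "admin", "required"),
]
choice = next_package(pending, MILESTONES)
```

With this list, base-thing is offered first because it matches the most important milestone, even though it is last in the pending list. A group with different interests would simply use a different ordered list of milestones.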