Setting up an unofficial repository

for DebianScience users.

The discussion started from this thread.

Motivation

Currently there are a fair number of repositories of science-related unofficial Debian packages out there. It might make sense to consolidate them into a single site. If you are the owner of one of those repositories, you might consider adding it to http://wiki.debian.org/UnofficialRepositories

Goals

Resources

Hosting considered

Potential contributors

Archive tool considered so far

Organisation of the archive

Similar archives

Best way to organize the archive by section?

By fields?

The usual divisions "main contrib non-free" are fine for Debian, but one of the main reasons an unofficial repository is needed is the often-poor attention paid to licensing in scientific software, which makes it unsuitable for Debian's archive. Probably the only software in "main" in the repo would be either things undergoing testing on their way to the official Debian archive, or Free software that's too obscure to package for Debian. (Kevin B. McCarty is thinking of CERN's "patchy" as an example of the latter.)

So Kevin B. McCarty was thinking that perhaps a division by field makes more sense: "analysis astronomy biology chemistry physics" etc. A typical sources.list line might then look something like

deb http://www.debian-science.org/ physics analysis

Maybe some packages could be made available under more than one field (e.g. ROOT under both physics and analysis)? After all, ROOT and (e.g.) PAW aren't intrinsically physics software (unlike say GEANT), they're just traditionally used by physicists.

Some disagree: such an organisation is pretty unmanageable and reproduces a compartmentalisation that does more harm than good. Why shouldn't a visualisation tool developed for medical imagery be suitable for geology? If we start subdividing the repository, it will be harder to search for a given product (e.g. an imagery tool), which defeats the main purpose of having a single archive. We may think again when we reach a few thousand different packages... Likely not now!

it would be a classification system

It would be another use of the classification system available on installed systems for finding applications on Debian and Ubuntu. It could perhaps also be derived from the tags collected via the deb-tag browser. See the discussion on this list some weeks ago (May 14 to 17). What Thomas Walter learned there is that one should not overstress this. As you mention below, this carries the risk of putting the same thing into several classes. Another idea that came up was to divide into something more task-oriented, like plotting, visualisation, data processing.

From Thomas Walter's point of view, each has its pros and cons. So for a complete repository or site, choosing an easily maintainable structure may be the first goal. It could be as simple as sorting by A, B, C, ..., with each package tagged with the attributes mentioned above, so that one can search or list the packages by tag.

Use a simple maintainable repo structure with a front-end showing the classification. Lots of tools are used / can be used by a very broad user base, some are very specific.
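As a rough illustration of the "flat archive plus tags" idea above, here is a minimal Python sketch of a tag index that a front-end could query. The package names come from this discussion; the tags attached to them, and the function names `build_tag_index` and `find_by_tags`, are assumptions made up for the example and not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical metadata: package names are from the discussion,
# the tags assigned to them are invented for illustration only.
PACKAGES = {
    "root": {"physics", "analysis", "plotting"},
    "paw": {"physics", "analysis"},
    "geant4": {"physics", "simulation"},
}


def build_tag_index(packages):
    """Map each tag to the set of package names carrying it."""
    index = defaultdict(set)
    for name, tags in packages.items():
        for tag in tags:
            index[tag].add(name)
    return index


def find_by_tags(index, *tags):
    """Return the sorted names of packages that carry all of the given tags."""
    if not tags:
        return []
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches)


if __name__ == "__main__":
    index = build_tag_index(PACKAGES)
    print(find_by_tags(index, "analysis"))
    print(find_by_tags(index, "physics", "simulation"))
```

With this layout a package such as ROOT lives in one place in the archive and still shows up under several classifications, which sidesteps the question of which section it "belongs" to.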

Keep the standard splitting

Brett Viren thinks it is best to keep the standard main/non-free/contrib splitting. This splitting was invented to make clear the free-ness of the code and not the types of packages. We may be asking for a lot of extra effort if we break that model. We might consider other types such as "no licence but the author says it's okay" (not sure how legal that is). Also, this is something he needs to take very seriously as it would be a huge problem if he serves copyright infringing files.

What Debian versions will be supported (or what Debian derivatives)?

Michael Hanke maintains some unofficial packages related to experimental psychology and MRI data analysis. From user feedback and the download stats he knows that people seem to use his packages with sarge, etch and Ubuntu breezy and dapper more or less equally often. Moreover, a single lab often uses a mixture of the above. Therefore Michael Hanke tries to provide binary packages for all those distributions.

Christian T. Steigies does not think packages would have to be built for all arches supported by Debian. As much as he defends the m68k arch, he does not think he would want to run geant4 there. In his opinion it would be best to concentrate on those arches where the software would actually be used: amd64, i386, maybe powerpc and sparc. If somebody wants to run the software on a different arch, they can just grab the sources, and if the Build-Depends are set correctly, it will be a piece of cake to build packages for that arch. It might only take some time, and if that takes too much time, maybe that arch is not really suitable for using that software.

Frédéric Lehobey says users expect packages available for several distributions (stable, testing, etch, Ubuntu current and previous one, even Fedora or whatever...)

Ubuntu

Michael Hanke knows that some people simply do not care about Ubuntu, but there is obviously a demand and most of the time porting a package to an Ubuntu release is just recompiling it.

Michael Hanke's point is that he would prefer a repository that provides packages for every distribution and platform that people (maintainers) are willing to support.

It makes sense to Kevin B. McCarty to provide both Debian and Ubuntu packages as long as someone can be found to build them. Maybe this could be a job for an automatic buildd network.

Jordan Mantha thinks Ubuntu Science team could perhaps help out. Right now, in Debian proper, he'd say about 1 in 10 packages from Debian need to get tweaked (mostly dependencies) to work on Ubuntu. Otherwise the Debian source packages are just rebuilt. If this idea takes off he'd imagine they could get some Ubuntu build machines. He's not sure if it would work (or even be wanted) to put them all in the same repo though.

As for the Ubuntu/Debian stable/unstable problem, Christian T. Steigies does not think that it really is a problem. The Debian buildds build their packages in chroots, so you can build packages for stable, unstable and security on one machine, even concurrently if the CPU and RAM are sufficient. He is building the R backports for sarge, and last week he started building for Ubuntu dapper as well. You just need to set up the chroot and you're ready to build packages. Once you find out that you need to install the Ubuntu debootstrap package to set up an Ubuntu chroot, it works just like building packages for Debian. If the packages are prepared well, we might even cross-build them, if you really want to build for 11 arches... but that is something he has only tried for the kernel, and he thinks it would be a lot more difficult for many other packages.

What are the requirements a package has to meet to be included in the repository (e.g. license)?

If a package is perfect in every sense, it could obviously go directly into the Debian archive. Therefore the repository will contain imperfect packages, and the question is what kind of imperfection is tolerated (lintian errors, minor/major licensing issues, ...)?

Relaxation on licensing restrictions

Kevin B. McCarty expects that the main relaxation would be on licensing restrictions. (Not so relaxed that we might get sued of course!) As long as someone could get permission from upstream for the repository (and its mirrors?) to distribute a package, it could be put in the archive. This would help in the case of things like Pythia where the author doesn't bother to give his work a specific license, but might be willing to permit redistribution in .deb format if the source code is unchanged.

Stuff that has too little demand per unit filesize

Kevin B. McCarty thinks that the unofficial repository could include stuff that has too little demand per unit filesize to go into Debian proper even if freely licensed. This might include obscure specialized programs like CERN's Patchy, or large but specialized data files like those of GEANT4 or the HIPPARCOS astronomical catalogs.

Package quality

Ought to remain high

In Kevin B. McCarty's opinion the quality of the packaging ought to remain high, for the sake of the reputation of the repository as a whole. This could therefore be a good opportunity to teach packaging skills.

Unstable?

Frédéric Lehobey says users accept some instability on the condition that it is possible to go back to a previous working version. (The needs are similar to those of backports.org.)

Who will be able to upload packages?

If only DDs are able to upload packages, the number of contributors is (unnecessarily?) limited. But if the Debian-science repository aims to provide the same quality and security as the main archive, there is no way around it.

More open than the Debian archive

everybody gets upload rights

This is simple, but might be the source of serious trouble.

every DD has upload rights, selected rights for others

Frédéric Lehobey says sponsoring (for Debian main) is a bottleneck for _scientific_ software (hence the debian-science repository should allow uploads from non-DDs). He suggests a policy like: every DD has upload rights, and non-DDs may upload based on trust from the team running the debian-science repository.

a procedure similar to Alioth

Potential contributors explain what they want to provide and get upload rights if they provide a solid explanation. From that point on they have the right to upload new packages, but not to upload new versions of packages already in the archive of which they are not (co-)maintainers. DDs might be an exception to the rule. This should not limit the number of contributors, and it introduces a minimal protection against bad guys.

The main disadvantage is that somebody has to implement this.

Kevin B. McCarty thinks this sounds very reasonable. He'd be scared to have anyone able to upload anything. It might even be worthwhile to implement a NEW queue of some kind to make sure that new uploads meet some minimum standard of quality.
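To make the Alioth-like upload rule described in this subsection concrete, here is a minimal Python sketch of the permission check. It is only an illustration under stated assumptions: the `Archive` class, its fields and the `may_upload` method are hypothetical names, and the exemption of DDs from the co-maintainer restriction is modelled as a fixed rule even though the text only says DDs "might" be an exception. No NEW queue is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    # Hypothetical data model, not an existing archive tool.
    developers: set = field(default_factory=set)     # Debian Developers (DDs)
    approved: set = field(default_factory=set)       # non-DDs approved by the team
    maintainers: dict = field(default_factory=dict)  # package -> set of (co-)maintainers

    def may_upload(self, uploader, package):
        """Return True if `uploader` may upload `package` under the sketched rule."""
        is_dd = uploader in self.developers
        if not is_dd and uploader not in self.approved:
            return False  # no upload rights at all
        owners = self.maintainers.get(package)
        if owners is None:
            return True  # new package: any contributor with upload rights may add it
        # Existing package: only its (co-)maintainers may upload new versions.
        # Assumption: DDs are exempt from this restriction.
        return is_dd or uploader in owners


if __name__ == "__main__":
    archive = Archive(
        developers={"dd_alice"},
        approved={"bob", "carol"},
        maintainers={"geant4": {"bob"}},
    )
    print(archive.may_upload("bob", "geant4"))    # maintainer of an existing package
    print(archive.may_upload("carol", "geant4"))  # approved, but not a co-maintainer
```

The check itself is small; the work mentioned above lies in maintaining the list of approved contributors and tying each upload to an authenticated identity.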

people who build packages for the less common platforms

Every maintainer who contributes to the repository can't be expected to have a full set of {Debian, Ubuntu} x {stable, testing, unstable} x {i386, amd64, powerpc...} machines to build on. So there will have to be people who build packages for the less common platforms (either by administering a buildd, or else just manually on request, as some have already volunteered to do). These people will need to be highly trusted.
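To give a feel for the size of that build matrix, the following Python sketch enumerates the literal cross product written above and lists which targets still need a builder. The three architectures are just the ones named in the text (the "..." there is left open), and the function names are hypothetical. Real Ubuntu releases do not use the stable/testing/unstable naming, so this is a counting exercise rather than a usable target list.

```python
from itertools import product

DISTRIBUTIONS = ("debian", "ubuntu")
SUITES = ("stable", "testing", "unstable")
ARCHITECTURES = ("i386", "amd64", "powerpc")  # only the arches named in the text


def build_targets(distributions=DISTRIBUTIONS, suites=SUITES, arches=ARCHITECTURES):
    """Return every (distribution, suite, architecture) combination."""
    return list(product(distributions, suites, arches))


def missing_targets(all_targets, built):
    """Return the targets that nobody has built yet, in matrix order."""
    built = set(built)
    return [target for target in all_targets if target not in built]


if __name__ == "__main__":
    targets = build_targets()
    print(len(targets), "targets in total")
    # Hypothetical maintainer who can only build on their own machine.
    done = [("debian", "unstable", "i386")]
    for target in missing_targets(targets, done):
        print("needs a builder:", "/".join(target))
```

Even with only three architectures the matrix has 18 entries, which is why trusted builders or an automatic buildd network come up in the discussion.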

other best practices

Jordan Mantha suggests that looking at other projects such as mentors.debian.net or even Ubuntu's REVU system might help.

same precautions as Debian does with PGP keys

Brett Viren (from BNL) can't expose BNL to potential embarrassment from serving packages with malicious code, so unauthenticated uploads can't be supported. At the very least he thinks we must take the same precautions as Debian does with their PGP keys. This means face-to-face signing parties. This is probably easier for us within a particular scientific community than for Debian as a whole, since we tend to naturally mingle more. Maybe someone can look into other distributed authentication mechanisms that would be equivalent (e.g. grid certs).

Alternative propositions

Web page with sources.list lines for all the known repositories

Russell Shaw suggests an easier way might be just to make a web page that has all the necessary sources.list lines for all the known repositories, so the user can just paste them into their sources.list and apt-get update. If the page was a wiki, newly found repositories could be added easily.

a package doing this

Another way would be to make a debian package that updates your sources.list with new sources.list lines for unofficial repositories. That way, you could do: apt-get install debian-science, and the package installation script will run apt-get update for you.
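Either variant, the wiki page or the package, comes down to maintaining a list of repositories and rendering it as sources.list lines. A minimal Python sketch of that rendering step follows. The repository URLs, suites and components are placeholders under example.org, not real archives, and the function names are hypothetical. It uses the one-line `deb uri suite component...` format.

```python
# Placeholder entries: (uri, suite, components). None of these are real repositories.
REPOSITORIES = [
    ("http://science.example.org/debian", "etch", ["main", "non-free"]),
    ("http://lab.example.org/ubuntu", "dapper", ["main"]),
]


def sources_line(uri, suite, components, kind="deb"):
    """Format one sources.list entry: '<kind> <uri> <suite> <component>...'."""
    return " ".join([kind, uri, suite, *components])


def render_sources_list(repositories):
    """Render all repositories as the text of a sources.list fragment."""
    return "\n".join(sources_line(*repo) for repo in repositories) + "\n"


if __name__ == "__main__":
    # A user could paste this output into sources.list and run apt-get update.
    print(render_sources_list(REPOSITORIES), end="")
```

Keeping the list as data means the same entries could feed both a web page and a package, so newly found repositories only need to be added once.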

debian-unofficial

http://www.debian-unofficial.org ?

Mario Fux thinks that's exactly what we are searching for.

not really

Not really. Because:

Debian Unofficial should be *only* a repository for three types of packages, those which:

1. allegedly violate software patents
2. require a contract with the vendor prior to distribution
3. have too restrictive licenses (e.g. keeping sources for 6 months after distribution, etc.)

Nothing at all (using existing infrastructures)

Daniel Baumann sees no reason not to upload any DFSG-compliant package, whether it is science-related or not, to Debian directly. If the package is buggy upstream, then it should be uploaded to experimental rather than to an unofficial repository.

If we are lacking sponsors, tell Daniel Baumann and he can help with sponsoring packages.

Apart from the autobuilding issue, Carlo U. Segre agrees with Daniel Baumann that putting the packages in the main distribution is the best policy.

Daniel Baumann notes that non-free is not a problem in Debian, but undistributable is. That means that even if a package only qualifies for non-free, it's better to keep it in the non-free suite on debian.org than in any unofficial or secondary repository.

Christian T. Steigies does not see the point in recreating what Debian already has, "just" for scientific software. Software packaged for Debian should be part of Debian; if it is in some other repo, it can be hard to find. Especially if you have to grab packages from a dozen different sources...

but non-free autobuilding?

See http://lists.debian.org/debian-science/2006/07/msg00027.html