Debian GNU/Linux clusters at Max Planck draft title

The [http://www.aei.mpg.de/english/research/teams/observationalRelativity/index.html Observational Relativity and Cosmology Research Group] is a team of scientists working at the [http://www.aei.mpg.de/hannover-en/66-contemporaryIssues/home/index.html Hannover Branch] of the [http://www.aei.mpg.de/english/contemporaryIssues/home/index.html Max Planck Institute for Gravitational Physics] (Albert Einstein Institute) in [http://en.wikipedia.org/wiki/Hannover Hannover], [http://en.wikipedia.org/wiki/Germany Germany]. The team is trying to directly detect [http://en.wikipedia.org/wiki/Gravitational_wave gravitational waves], first [http://www.einstein-online.info/en/ predicted] by Albert Einstein, and works with friends and colleagues within the [http://www.ligo.org/ LIGO] Scientific Collaboration.

The massive computing effort is done on ATLAS, a 1342-node Debian GNU/Linux cluster. It achieves a measured Linpack performance (the benchmark used by [http://www.top500.org/ top500.org]) of 32.8 TFlops against a theoretical peak of about 50 TFlops, using 10+ TB of RAM, approximately 1.3 PB of storage and a specialty network able to transfer almost 4 days' worth of DVD movies per second (2880 Gb/s). This performance would place the ATLAS Debian GNU/Linux cluster 4th in Germany, 11th in Europe and 34th worldwide on the [http://www.top500.org/ top500.org] November 2007 list, at a cost of EUR 1.8m (~ US$ 2.8m).

The ATLAS Debian GNU/Linux cluster consists of 1342 [http://supermicro.com/ Supermicro] compute nodes ([http://www.intel.com/cd/channel/reseller/asmo-na/eng/products/server/processors/q3200/feature/index.htm Intel Xeon 3220] quad-core at 2.4 GHz, 8 GB RAM, 500 GB Hitachi HDD, IPMI remote management) along with 31 data servers (2x [http://www.intel.com/cd/products/services/emea/deu/processors/xeon5000/344530.htm Intel Xeon E5345] at 2.33 GHz, 16 GB RAM, [http://www.areca.com.tw/ Areca] 1261ML, 16 x 750 GB Hitachi HDD) plus 4 similar head nodes with 4 x 750 GB HDD each. All of these run Debian GNU/Linux 4.0 Etch with a few modifications, such as a custom kernel and the Condor queuing system. Additional storage space is supplied by 13 [http://www.sun.com/servers/x64/x4500/ Sun Fire X4500] servers running Solaris 10. The system was built from off-the-shelf computers supplied by a German company, [http://www.pyramid.de/ Pyramid Computer GmbH].
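As a rough cross-check, the headline figures can be recomputed from the per-node specifications above. The following is only a back-of-the-envelope sketch: the value of 4 double-precision floating-point operations per core per cycle and the use of decimal units (1 TB = 1000 GB) are assumptions, not statements from the ATLAS team, and the Sun Fire X4500 storage is left out because the text does not give its disk sizes.

```python
# Back-of-the-envelope totals for ATLAS from the per-node figures in the text.
# ASSUMPTIONS (not from the source): 4 double-precision flops per core per
# cycle, and decimal units (1 TB = 1000 GB).

COMPUTE_NODES = 1342
CORES_PER_NODE = 4
CLOCK_GHZ = 2.4
FLOPS_PER_CYCLE = 4              # assumption, see above

RAM_PER_NODE_GB = 8
DISK_PER_NODE_GB = 500

DATA_SERVERS = 31
DISKS_PER_DATA_SERVER = 16
HEAD_NODES = 4
DISKS_PER_HEAD_NODE = 4
SERVER_DISK_GB = 750

MEASURED_LINPACK_TFLOPS = 32.8   # figure quoted in the text


def peak_tflops():
    """Theoretical peak of the compute nodes in TFlops."""
    gflops = COMPUTE_NODES * CORES_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE
    return gflops / 1000


def linpack_efficiency():
    """Measured Linpack result as a fraction of the theoretical peak."""
    return MEASURED_LINPACK_TFLOPS / peak_tflops()


def compute_ram_tb():
    """Total RAM of the compute nodes in TB."""
    return COMPUTE_NODES * RAM_PER_NODE_GB / 1000


def raw_disk_tb():
    """Raw disk capacity in TB, before any RAID overhead.

    Covers compute nodes, data servers and head nodes only; the Sun Fire
    X4500 servers are excluded because their disk sizes are not given.
    """
    gb = (COMPUTE_NODES * DISK_PER_NODE_GB
          + DATA_SERVERS * DISKS_PER_DATA_SERVER * SERVER_DISK_GB
          + HEAD_NODES * DISKS_PER_HEAD_NODE * SERVER_DISK_GB)
    return gb / 1000


if __name__ == "__main__":
    print(f"theoretical peak:   {peak_tflops():.1f} TFlops")
    print(f"Linpack efficiency: {linpack_efficiency():.0%}")
    print(f"compute-node RAM:   {compute_ram_tb():.1f} TB")
    print(f"raw disk (partial): {raw_disk_tb():.0f} TB")
```

Under these assumptions the peak and RAM totals land close to the quoted "about 50 TFlops" and "10+ TB"; the partial disk total stays below 1.3 PB, which is plausible given that the X4500 servers are not counted.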

One of the cluster's many hardware specialties is its network from [http://www.wovensystems.com/ Woven Systems], which is hierarchical and fully non-blocking. The EFX 1000 core switch features 144 10 Gb/s CX4 ports and currently connects to 32 TRX100 edge switches, each of which features 48 1 Gb/s ports and 4 x 10 Gb/s uplinks, for a total of 2880 Gb/s. The Sun Fire X4500 servers are also connected directly to the core switch.
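The port counts above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the 2880 Gb/s figure counts all 144 core ports in both directions, that each Sun Fire X4500 uses exactly one core port, and that every compute node, data server and head node uses one 1 Gb/s edge port. None of these assumptions is confirmed by the text.

```python
# Port budget for the ATLAS network, using the switch figures in the text.
# ASSUMPTIONS (not from the source): the 2880 Gb/s total counts both
# directions of every core port, each X4500 takes one core port, and every
# other host takes one 1 Gb/s edge port.

CORE_PORTS = 144
CORE_PORT_GBPS = 10
EDGE_SWITCHES = 32
EDGE_PORTS_PER_SWITCH = 48
UPLINKS_PER_EDGE_SWITCH = 4
X4500_SERVERS = 13

COMPUTE_NODES = 1342
DATA_SERVERS = 31
HEAD_NODES = 4


def core_capacity_gbps(full_duplex=True):
    """Aggregate core switch capacity in Gb/s."""
    one_way = CORE_PORTS * CORE_PORT_GBPS
    return 2 * one_way if full_duplex else one_way


def core_ports_used():
    """Core ports taken by edge uplinks plus directly attached X4500s."""
    return EDGE_SWITCHES * UPLINKS_PER_EDGE_SWITCH + X4500_SERVERS


def edge_ports_available():
    """Total 1 Gb/s ports across all edge switches."""
    return EDGE_SWITCHES * EDGE_PORTS_PER_SWITCH


def hosts_on_edge():
    """Hosts assumed to hang off the edge switches."""
    return COMPUTE_NODES + DATA_SERVERS + HEAD_NODES


if __name__ == "__main__":
    print("core capacity (Gb/s):", core_capacity_gbps())
    print("core ports used:", core_ports_used(), "of", CORE_PORTS)
    print("edge ports:", edge_ports_available(), "for", hosts_on_edge(), "hosts")
```

With these assumptions the core capacity matches the quoted 2880 Gb/s, and both the core and edge port budgets leave a small amount of headroom.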

The ATLAS Debian GNU/Linux cluster was designed and built, and has since been managed, by [http://www.aei.mpg.de/hannover-en/09-staff/00-details/fehrmann/index.html Dr. Henning Fehrmann] and [http://www.aei.mpg.de/hannover-en/09-staff/00-details/aulbert/index.html Dr. Carsten Aulbert], who have been using Debian GNU/Linux for years. Its brother and sister systems in [http://en.wikipedia.org/wiki/Potsdam Potsdam], Germany, [http://gw.aei.mpg.de/resources/computational-resources/merlin-morgane-dual-compute-cluster "Merlin" and "Morgane"], have been running Debian for years (one was converted from RH 7.x at some point), and the experience with them has been very, very good, according to Dr. Aulbert.

"Debian features an extremely large set of packages, making it THE distro of choice for us keeping us out of the hassle to package needed software ourselves."

"Also Thomas Lange's [http://packages.debian.org/source/etch/fai FAI package] is extremely useful for automatic deployment of Debian. For example, without much tweaking and using only two hosts, we were able to reinstall the cluster in about 2.5 hours and were only limited by those two servers' network connection."

"Two weeks ago I would have written something about the very good security support, given that the reaction to the OpenSSL stuff was very good. I could still do, but in reality we don't need security updates except for the exposed nodes such as head nodes. Everything else is just visible internally."

4- What are the benefits of using Debian?

Partly covered above; one might add:

* the simplicity of creating our own packages

* how easily repositories can be set up (we use reprepro)

* using clean build environments (pbuilder et al.)

* and, of course, the superb packaging infrastructure in general (aka dpkg/apt/aptitude)

5- How did Debian enable such success / feature / business model, etc.?

It is hard to say. Our colleagues mainly use CentOS 5 or older Fedora versions, so the cluster would also run with those, though possibly with more work for us.

Personally, I like community distros more, since they offer more long-term stability than a distro that is governed by the need to release often to generate revenue. On the downside, it would be better for us to have a more settled release plan and/or some kind of "stable and supported" backports.

