Debian Hadoop packaging efforts

Debian currently does not include Hadoop packages. There are a number of reasons for this; in particular, the Hadoop build process loads various dependencies via Maven instead of using distribution-supplied packages. Java projects like this are not easy to package because of their interdependencies, and the Hadoop stack is full of odd dependencies (including Apache Forrest, which for a long time only worked with Java 1.5).

If you want to build Debian packages, the most complete efforts can be found at the Apache Bigtop project. Unfortunately, the build process for these packages is currently of disastrous quality and should only be attempted within disposable virtual machines, as it requires root permissions and will install non-packaged software.

The Bigtop packaging will allow building Hadoop packages on a recent Debian system. However, the resulting packages do not live up to Debian quality standards. In particular, they include copies of .jar files from other packages, and will thus not benefit from security updates made to those packages.
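One way to see how many .jar copies a built package carries is to scan its file listing. The following is a minimal sketch, assuming the listing comes from `dpkg-deb -c` (mode, owner, size, date, time, path); the function names and the sample paths in the usage note are hypothetical and only serve as illustration.

```python
import subprocess


def bundled_jars(listing: str) -> list[str]:
    """Return paths of regular .jar files found in `dpkg-deb -c` style output.

    Symlinks (mode starting with 'l') are skipped, since a symlink to a jar
    shipped by another package is not an embedded copy.
    """
    jars = []
    for line in listing.splitlines():
        # Expected fields: mode, owner/group, size, date, time, path
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        mode, path = fields[0], fields[5].strip()
        if mode.startswith("-") and path.endswith(".jar"):
            jars.append(path)
    return jars


def list_package(deb_path: str) -> str:
    """Return the content listing of a .deb file (requires dpkg-deb)."""
    result = subprocess.run(
        ["dpkg-deb", "-c", deb_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `bundled_jars(list_package("some-package.deb"))` on a freshly built package would list the jar files it ships as regular files; each of these is a candidate for replacement by a dependency on the corresponding Debian package.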

If you are interested in getting Hadoop packages into Debian, please coordinate with the existing efforts to avoid duplicate work. Thank you.


The information below dates from 2010

This page is used to track information regarding the packaging of Hadoop in Debian.



from the packaging repository, debian/TODO

Missing Dependencies

Solved by excluding parts of Hadoop:

KosmoFS and jets3t (Amazon S3) are optional filesystem implementations. It should be possible to build a Debian Hadoop package without them until they are packaged.

For Katta (not a part of Hadoop):