Debian eScience with myGrid and Taverna

Introduction

The term eScience (or e-Science) describes data- and CPU-intensive research that is typically performed by integrating resources distributed across the Internet, for example between departments of larger corporations or between collaborating universities. The term is related to computational grids, but today it is more commonly associated with web services. The United Kingdom has invested substantial resources in the development of an IT infrastructure for eScience applications, and other countries around the globe have followed suit. The most prominent outcome is the [http://www.mygrid.org.uk myGrid] (www.mygrid.org.uk) effort with its workflow tool [http://taverna.sf.net Taverna] ([http://taverna.sf.net taverna.sf.net]).

This page describes the effort to package the software of the myGrid eScience project for the Debian Linux distribution. An ["Alioth"] project ([http://alioth.debian.org/projects/pkg-escience/ pkg-escience]) has just been created.

Motivation for Debian Packaging

The DebianScience special interest group describes and provides resources for scientific computing with Debian. DebianMed, a CustomDebian distribution, strives to make Debian a one-stop shop for biomedical applications, which includes bioinformatics. Pkg-eScience sees itself as a dedicated effort contributing to both. If all works out, scientific services become easier to provide by linking Debian-based developments to the world via web services and myGrid. Conversely, all myGrid services will be available to researchers using Debian; the focus is currently on [http://en.wikipedia.org/wiki/Bioinformatics bioinformatics], but the technology is not constrained to that field.

The taverna package is already useful, since one does not need to set up any local services to join Grid use and development.

Installation

To retrieve the packages created in this project on your local Debian machine (which should preferably run testing or unstable), please add the following to /etc/apt/sources.list:

deb http://pkg-escience.alioth.debian.org/debian ./
deb-src http://pkg-escience.alioth.debian.org/debian ./

For the Sun JDK also add

deb http://ftp.de.debian-unofficial.org/debian/ unstable main contrib non-free
deb http://ftp.de.debian-unofficial.org/debian/ testing main contrib non-free

Then try apt-get install taverna. Problems may occur if you are running Debian stable. If so, you may want to check whether Debian ["Backports"] ([http://www.backports.org www.backports.org]) has more recent libraries. Please report what you find. To contribute to the packaging or to change the upstream sources, please see the section "Installation from Source" on the Debian Wiki page of ["BOINC"].
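As a sketch of the installation steps above, the two repository lines can also be kept in a file of their own instead of being appended to /etc/apt/sources.list. The helper name write_escience_sources and the file name pkg-escience.list are assumptions made for illustration; the repository URLs are the ones given above.

```shell
#!/bin/sh
# Hypothetical helper: write the pkg-escience repository lines to the
# file given as first argument (for example a file under
# /etc/apt/sources.list.d/, which requires root).
write_escience_sources() {
    target=$1
    cat > "$target" <<'EOF'
deb http://pkg-escience.alioth.debian.org/debian ./
deb-src http://pkg-escience.alioth.debian.org/debian ./
EOF
}

# Example (as root):
#   write_escience_sources /etc/apt/sources.list.d/pkg-escience.list
#   apt-get update && apt-get install taverna
```

Keeping the entries in a separate file makes it easy to remove the repository again without editing the main sources.list.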

Work to be done

Direct adoption of upstream packages

The sources provided by the upstream developers can be installed on Debian machines without difficulty, since Linux is a common operating system among them. They are, however, far from acceptable for inclusion in the Debian main distribution. The most pragmatic way to adopt them for Debian is to take the direct results of compiling the upstream source.

Issues for compliance with DFSG and Debian Policy

Package-specific TODO list

Taverna

EnsJ

  W: ensj: binary-without-manpage id_mapping_cleanup.sql
  W: ensj: binary-without-manpage run_idmapping.sh
  W: ensj: binary-without-manpage run_idmapping_gui.sh
  W: ensj: binary-without-manpage run_probeset_2_transcript_mapping.sh
  W: ensj: executable-not-elf-or-script ./usr/bin/id_mapping_cleanup.sql

MartJ

BioJava

bytecode

freefluo

uddi4j

icu4j

wsdl4j

Overview on status of packages

Core packages

||Package||apt||svn||Comments||DFSG||
||taverna||x||x||current Taverna 1.0 CVS, apparently works||no||
||mygrid||-||-|| || ||

Otherwise missing libraries

||Package||apt||svn||Comments||DFSG||
||ensj||x||x||compiles with Taverna||no||
||martj||x||x||compiles with Taverna||no||
||biojava||x||x||compiles with Taverna||almost||
||bytecode||x||x||compiles with Taverna||almost||
||freefluo||x||x||compiles with Taverna||no||
||uddi4j||x||x||compiles with Taverna||almost||
||icu4j||x||x||works with Taverna||almost||
||wsdl4j||x||x||untested||almost||

How to contribute

Technical issues

Communication with upstream sources

In contrast to the usual practice in Debian packaging, pkg-escience for now strives to use a reasonably recent upstream source.

Preparation of .orig.tar.gz

If a stable release of the upstream work is used, the upstream release tarball serves unchanged as the .orig.tar.gz. Otherwise, such a file should be created from upstream's CVS or SVN repository. The following script performs this task for taverna:

#!/bin/sh
# Update the upstream CVS source, which is expected to exist
# locally in cvs_source/taverna1.0, and prepare the
# .orig.tar.gz from it.

TARFILENAME=taverna_1.3.orig.tar.gz
CVSSOURCEDIR=cvs_source
TAVERNADIR=taverna1.0

# Build the tarball inside the CVS directory, leaving out the CVS
# metadata directories, then move it to the current directory.
(
        cd "$CVSSOURCEDIR" \
        && ( cd "$TAVERNADIR" && cvs update . ) \
        && tar czvf "$TARFILENAME" --exclude=CVS "$TAVERNADIR"
) && mv "$CVSSOURCEDIR/$TARFILENAME" .
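
Since the tarball is meant to exclude the CVS metadata, it may be worth checking the result before uploading it. The following is a minimal sketch of such a check; the function name has_no_cvs_dirs is hypothetical and not part of any Debian tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the tarball given as first
# argument contains no directory named CVS.
has_no_cvs_dirs() {
    # Refuse unreadable files so a failing tar is not mistaken for success.
    [ -r "$1" ] || return 2
    # tar tzf lists the member paths; look for a path component "CVS".
    if tar tzf "$1" | grep -Eq '(^|/)CVS(/|$)'; then
        return 1
    fi
    return 0
}

# Example:
#   has_no_cvs_dirs taverna_1.3.orig.tar.gz || echo "CVS directories found"
```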

Checkout of latest alioth svn changes

 svn co svn+ssh://youraliothID@svn.debian.org/svn/pkg-escience/taverna

Use of svn-buildpackage

Change the current working directory to the checked-out package directory and run {{{svn-buildpackage --svn-dont-purge --svn-dont-clean --svn-reuse -rfakeroot}}}. Consider adding -kkeyid if the build has problems finding the right gpg key.
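
To avoid retyping the options, the invocation can be assembled by a small wrapper. This is only a sketch: the function name svn_bp_command is made up for illustration, and the options are exactly those listed above, with -k appended when a key ID is supplied.

```shell
#!/bin/sh
# Hypothetical helper: print the svn-buildpackage command line used on
# this page. An optional first argument is taken as the gpg key ID and
# appended as -k<keyid>.
svn_bp_command() {
    cmd="svn-buildpackage --svn-dont-purge --svn-dont-clean --svn-reuse -rfakeroot"
    if [ -n "${1:-}" ]; then
        cmd="$cmd -k$1"
    fi
    printf '%s\n' "$cmd"
}

# Example, run inside the package directory (yourkeyid is a placeholder):
#   eval "$(svn_bp_command yourkeyid)"
```

Printing the command rather than running it directly makes it possible to inspect the line first.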

How to upload the packaging of a new package to svn

New packages are first submitted to the alioth svn

with svnrepos=svn+ssh://youraliothID@svn.debian.org/svn/pkg-escience.

And on alioth.debian.org

Maintenance of home page

The web pages describing the project should center on these wiki pages. If you feel inclined to update the project home page, please do so by logging in via

$ ssh youraliothID@alioth.debian.org
$ [ -x pkg-escience ] || \
ln -s  \
  /org/alioth.debian.org/chroot/home/groups/pkg-escience .
$ cd pkg-escience/htdocs

That directory contains the [http://pkg-escience.alioth.debian.org index.html], which can be edited freely, and the subfolder debian, which holds the:

apt repository

As a start, only a repository for architecture-independent ("all") packages is planned. Help to set this up properly across architectures is welcome. To upload new packages, first create a new subfolder for the package

$ ssh youraliothID@alioth.debian.org \
  "mkdir pkg-escience/htdocs/debian/packagename"

then scp the files to the destination

$ scp taverna_1.3.orig.tar.gz taverna_1.3-1.cvs20060423* \
  youraliothID@alioth.debian.org:pkg-escience/htdocs/debian/packagename/

and finally update the index files.

$ cat update.sh
#!/bin/bash
apt-ftparchive sources . | tee Sources | gzip -c > Sources.gz
apt-ftparchive packages . | tee Packages | gzip -c > Packages.gz
$ ./update.sh
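
After regenerating the index files, one can check that a freshly uploaded package actually appears in them. The following sketch assumes the Packages.gz file written by update.sh above; the function name index_lists_package is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the gzip-compressed Packages index
# ($1) contains a stanza for the binary package named $2.
index_lists_package() {
    # Each stanza of a Packages file starts with a "Package: <name>" line.
    gzip -dc "$1" | grep -qxF "Package: $2"
}

# Example, run in the repository directory after ./update.sh:
#   index_lists_package Packages.gz taverna && echo "taverna is indexed"
```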

in the Debian community

and outside of Debian


CategoryJava