Debian eScience with myGrid and Taverna
Introduction
The term eScience (or e-Science) describes data- and CPU-intensive research that is typically performed by integrating resources across the Internet. The participants may be, for example, departments of larger corporations or collaborating universities. The term is related to computational grids, but today it is more commonly associated with web services. The United Kingdom has invested substantial resources in the development of an IT infrastructure for eScience applications, and other countries around the globe have followed suit. The most prominent outcome is the [http://www.mygrid.org.uk myGrid] (www.mygrid.org.uk) effort with its workflow tool [http://taverna.sf.net Taverna] ([http://taverna.sf.net taverna.sf.net]).
This page describes the effort to package the software of the myGrid eScience project for the Debian Linux distribution. An ["Alioth"] project ([http://alioth.debian.org/projects/pkg-escience/ pkg-escience]) has just been created.
Motivation for Debian Packaging
The DebianScience special interest group describes and provides resources for scientific computing with Debian. DebianMed, a CustomDebian distribution, strives to make Debian a one-stop shop for biomedical applications, which also include bioinformatics. Pkg-eScience sees itself as a dedicated effort contributing to both. If all works out well, scientific services become easier to provide by linking Debian-based developments to the world via web services and myGrid. Conversely, all myGrid services will be available to researchers using Debian; the current focus is on [http://en.wikipedia.org/wiki/Bioinformatics bioinformatics], but there is no technical restriction to that field.
The taverna package is already useful today, since one does not need to set up any local services to join Grid use and development.
Installation
To retrieve the packages created in this project for your local Debian machine (which should preferably run testing or unstable), please add the following to /etc/apt/sources.list:
deb http://pkg-escience.alioth.debian.org/debian ./
deb-src http://pkg-escience.alioth.debian.org/debian ./
For the Sun JDK also add
deb http://ftp.de.debian-unofficial.org/debian/ unstable main contrib non-free
deb http://ftp.de.debian-unofficial.org/debian/ testing main contrib non-free
Try apt-get install taverna. Problems may occur if you are running Debian stable. If so, you may want to check whether Debian ["Backports"] ([http://www.backports.org www.backports.org]) has more recent libraries. Please report your findings. To contribute to the packaging or to change the upstream sources, please see the section "Installation from Source" at the Debian Wiki pages of ["BOINC"].
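The sources.list step above can be made repeatable with a small helper. The following is only a minimal sketch and not part of the pkg-escience tooling: the function name add_apt_source and its arguments are assumptions made for illustration. It appends a line to the given file only if that exact line is not yet present.

```shell
#!/bin/sh
# Hypothetical helper (not part of pkg-escience): append an apt source
# line to a sources file unless that exact line is already present.
add_apt_source() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root), using the repository lines from this page:
#   add_apt_source /etc/apt/sources.list \
#       "deb http://pkg-escience.alioth.debian.org/debian ./"
#   add_apt_source /etc/apt/sources.list \
#       "deb-src http://pkg-escience.alioth.debian.org/debian ./"
#   apt-get update && apt-get install taverna
```

Calling the helper twice with the same line leaves the file unchanged, so it can be rerun safely.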
Work to be done
Direct adoption of upstream packages
The sources provided by the upstream developers can be installed on Debian machines without difficulty, since Linux is a common operating system among those developers. In that form, however, the software is far from acceptable for inclusion in the Debian main distribution. The most pragmatic way to adopt it for Debian is to take the direct results of compiling the upstream source.
Issues for compliance with DFSG and Debian Policy
- Addition of new Debian packages. A considerable number of jar files is distributed without reference to the source
- through upstream CVS
- fetched at compile time as specified in build.xml
- Preparation of Documentation
- man pages
- preparation of packages for upstream documentation
Compatibility with free Java runtime environments. Currently the Sun SDK 1.5 from DebianUnofficial (www.debian-unofficial.org) is used.
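To get an overview of the bundled jar files that still lack a reference to their source, one can list them from an unpacked upstream tree. This is a minimal sketch: the function name list_bundled_jars is hypothetical, and the directory argument is wherever the upstream source happens to be unpacked.

```shell
#!/bin/sh
# Hypothetical helper: print every .jar file below a source tree, sorted,
# so that each one can be checked against an existing or new Debian package.
list_bundled_jars() {
    find "$1" -type f -name '*.jar' | sort
}

# Example: list_bundled_jars taverna1.0
```

The listing only finds jars that are already in the tree. Jars fetched at compile time as specified in build.xml appear only after a build has run.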
Package-specific TODO list
Taverna
- reinvestigate elimination of jdom.jar - it apparently worked with MartJ
- Upload fix for missing dependency to /usr/share/java/libicu4j-java and the respective entry in the classpath (svn is down at the moment)
- Prepare script for automated testing
[http://taverna.sf.net Taverna] upstream link
EnsJ
- Fix ensj-doc
Get java2html to work from [http://www.java2html.de www.java2html.de]
- Fix debian/ensj-doc.docs
- Test compatibility with Taverna
- Fix debian/watch
- Fix ["Lintian"] warnings
W: ensj: binary-without-manpage id_mapping_cleanup.sql
W: ensj: binary-without-manpage run_idmapping.sh
W: ensj: binary-without-manpage run_idmapping_gui.sh
W: ensj: binary-without-manpage run_probeset_2_transcript_mapping.sh
W: ensj: executable-not-elf-or-script ./usr/bin/id_mapping_cleanup.sql
[http://www.ensembl.org/software/java EnsJ] upstream link
MartJ
Fix classpath settings for applications other than MartShell
- Integrate with Taverna package and test compatibility
- Investigate how to share .jar files with Taverna
[http://www.biomart.org MartJ] upstream link
BioJava
- Documentation
- Has two lintian warnings
[http://www.biojava.org BioJava] upstream link
bytecode
Should use the latest CVS version (located next to BioJava) rather than the distributed .tar.gz, because the method "forMethod" had to be patched in from CVS in order to compile BioJava
- API Documentation should be separated from library package.
[http://www.biojava.org BioJava] upstream link
freefluo
- Separation of documentation in separate Jar
- Proper description of documentation in debian/doc-base
- Lacking tests and proof of working integration with Taverna
- The freefluo-ext-taverna separate jar is still missing.
[http://freefluo.sf.net freefluo] upstream link
uddi4j
- Separation of documentation in separate Jar
- Proper description of documentation in debian/doc-base
- Compilation with free compiler
[http://uddi4j.sf.net uddi4j] upstream link
icu4j
- Seems fine, test with free Java for full DFSG compliance still pending
- Incorporate documentation
[http://icu4j.sf.net icu4j] upstream link
wsdl4j
- Seems fine, test with free Java for full DFSG compliance still pending
- Incorporate documentation
[http://wsdl4j.sf.net wsdl4j] upstream link
Overview on status of packages
Core packages
|| Package || apt || svn || Comments || DFSG ||
|| taverna || x || x || current Taverna 1.0 CVS, apparently works || no ||
|| mygrid || - || - || || ||
Otherwise missing libraries
|| Package || apt || svn || Comments || DFSG ||
|| ensj || x || x || compiles with Taverna || no ||
|| martj || x || x || compiles with Taverna || no ||
|| biojava || x || x || compiles with Taverna || almost ||
|| bytecode || x || x || compiles with Taverna || almost ||
|| freefluo || x || x || compiles with Taverna || no ||
|| uddi4j || x || x || compiles with Taverna || almost ||
|| icu4j || x || x || works with Taverna || almost ||
|| wsdl4j || x || x || untested || almost ||
How to contribute
- Join
  - as developer on Alioth (optional)
  - on the [http://lists.alioth.debian.org/mailman/listinfo/pkg-escience-devel mailing list]
- Send patches or indicate URL with packages of interest
Technical issues
Communication with upstream sources
In contrast to the usual practice of Debian packaging, pkg-escience for now strives to track a reasonably recent upstream source.
Preparation of .orig.tar.gz
If a stable release of the upstream work is used, the orig.tar.gz is exactly that. Otherwise, such a file should be created dynamically from upstream's CVS or SVN repository. The following script performs this task for taverna:
# Script to update the upstream CVS source,
# which is expected to exist locally
# in cvs_source/taverna1.0,
# and to prepare the .orig.tar.gz from it.
TARFILENAME=taverna_1.3.orig.tar.gz
CVSSOURCEDIR=cvs_source
TAVERNADIR=taverna1.0
( cd $CVSSOURCEDIR \
  && ( cd $TAVERNADIR && cvs update . ) \
  && tar czvf $TARFILENAME --exclude=CVS $TAVERNADIR ) \
&& mv $CVSSOURCEDIR/$TARFILENAME .
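Before using such a tarball it may be worth confirming that the --exclude=CVS option did its job. The following is a minimal sketch; the function name tarball_is_cvs_free and its return codes are assumptions made for this example, not part of any Debian tool.

```shell
#!/bin/sh
# Hypothetical check: return 0 if the gzipped tarball contains no CVS
# administrative directories, 1 if it does, 2 if it cannot be read.
tarball_is_cvs_free() {
    # Capture the member list first so that a tar failure is noticed.
    list=$(tar tzf "$1") || return 2
    # Look for a path component that is exactly "CVS".
    if printf '%s\n' "$list" | grep -Eq '(^|/)CVS(/|$)'; then
        return 1
    fi
    return 0
}

# Example:
#   tarball_is_cvs_free taverna_1.3.orig.tar.gz && echo "no CVS directories"
```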
Checkout of latest alioth svn changes
svn co svn+ssh://youraliothID@svn.debian.org/svn/pkg-escience/taverna
Use of svn-buildpackage
Change the current working directory to the directory of the checked-out package and run
{{{
svn-buildpackage --svn-dont-purge --svn-dont-clean \
    --svn-reuse -rfakeroot
}}}
Consider adding -kkeyid should the package build have problems finding the right gpg key.
How to upload the packaging of a new package to svn
New packages are first submitted to the alioth svn.
This is done once the package has been built successfully with dpkg-buildpackage,
by running svn-inject -v -o package.dsc $svnrepos
with svnrepos=svn+ssh://youraliothID@svn.debian.org/svn/pkg-escience.
And on alioth.debian.org
Maintenance of home page
The emphasis of the project's web presence should be on these wiki pages. If you feel inclined to update the project home page, then please do so by logging in via
$ ssh youraliothID@alioth.debian.org
$ [ -x pkg-escience ] || \
    ln -s \
    /org/alioth.debian.org/chroot/home/groups/pkg-escience .
$ cd pkg-escience/htdocs
That directory contains the [http://pkg-escience.alioth.debian.org index.html], which can be edited ad libitum, and the subfolder debian that harbors the:
apt repository
As a start, only a repository for the Debian architecture "all" is planned. Help to set this up properly across architectures is welcome. To upload new packages, first create a new subfolder for the package
$ ssh youraliothID@alioth.debian.org \
    "mkdir pkg-escience/htdocs/debian/packagename"
then scp the files to the destination
$ scp taverna_1.3.orig.tar.gz taverna_1.3-1.cvs20060423* \
    youraliothID@alioth.debian.org:pkg-escience/htdocs/debian/packagename/
and finally update the index files.
$ cat update.sh
#!/bin/bash
apt-ftparchive sources . | tee Sources | gzip -c > Sources.gz
apt-ftparchive packages . | tee Packages | gzip -c > Packages.gz
$ ./update.sh
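After update.sh has run, a quick sanity check can confirm that each compressed index matches its plain counterpart. This is a minimal sketch; the function name indexes_consistent is hypothetical and the check does not validate the index contents themselves.

```shell
#!/bin/sh
# Hypothetical check: succeed only if Sources.gz and Packages.gz in the
# given directory decompress to exactly Sources and Packages.
indexes_consistent() {
    dir=$1
    for name in Sources Packages; do
        # Both the plain and the compressed file must exist.
        [ -f "$dir/$name" ] && [ -f "$dir/$name.gz" ] || return 1
        # Compare the decompressed stream byte by byte with the plain file.
        gzip -dc "$dir/$name.gz" | cmp -s - "$dir/$name" || return 1
    done
    return 0
}

# Example, run from the repository directory:
#   indexes_consistent . && echo "index files are consistent"
```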
Related projects
in the Debian community
["DebianScience"] Wiki page
["pkg-bioc"] Wiki page accompanying the BioConductor and R Debian packaging project
[http://alioth.debian.org/projects/pkg-emboss/ pkg-emboss] Alioth project (dormant, sadly)
- ["BOINC"]
[https://alioth.debian.org/projects/pkg-grid/ pkg-grid] Alioth project (appears dormant)
[https://alioth.debian.org/projects/pkg-scicomp/ pkg-scicomp] Alioth project on scientific computing
and outside of Debian
[http://www.mygrid.org.uk MyGrid.org.uk] - the upstream page
[http://www.vl-e.com/ Virtual Laboratory for e-Science] - VL-Eers - are you reading this?
[http://www.trianacode.org/ Triana] - Another workflow management environment with ties to several grids