Report of outputs from Debian Med Sprint Weekend 2014
Group Reports
(in no particular order)
Jalview for Debian/Bio-Linux
Tim Booth and Jim Procter worked on this
- Packaging of main workbench
- Set up new build environment for Jim
- Build of all Ubuntu packages on Debian
- Updated to latest Jalview bugfix release (2.8.0b1) and wrapper package for BL finished
- Added SVG icon and default configuration
- Committed all work to pkg-java SVN
- Further customisation of package and launch without network enabled
- Jaba/JabaWS for Debian
- Herve requested Jalview applet, so looking at server-side components related to deploying JV on Qlustar
- Jim is tackling hard problems with Debian/Jaba; Jaba is working, but clustering is not.
- Planned to progress towards a package in the near future.
Setting up the Qlustar HPC cluster demo
Luca Clivio, Tony Travis
- Objective: Credible demo of Bio-Linux GUI running on front-end of Qlustar cluster
- #0 Resolve DHCP problems preventing access to Cluster FE (Frontend) VM
- #1 Install Ubuntu Desktop on common NFS shared cluster chroot
- #2 Install x2go server and Gnome fall-back session
- #3 Tested from x2go client running under Bio-Linux on laptop
- #4 Install “ARB” X11 GUI phylogenetics suite (part of Bio-Linux)
- Looked at MrBayes MPI on Sunday; there were some issues getting it to work.
- Diagnosed problems on Sunday - fix requires package rebuild
Work on Edam/Debtags/Tools registry
Investigate possible mechanisms of data interchange
Steffen Möller, Matúš Kalaš, Kristoffer Rapacki, Olivier Sallou, Piotr Chmura, Emil Rydza (probably splitting into subgroups)
Main activities
- Mapping of DebTags to EDAM
- Draft 4.0 of tool description model
- Synchronisation of Debian Med's packages with the Tool Registry
- Integration of EDAM annotations into Debian Med
Achievements
- made sure that the Registry description model accommodates all the attributes needed by Debian/DebianMed
- made sure that all the attributes essential to the Registry exist in DebianMed
- established from where in DebianMed the tool descriptions can be obtained regularly via programmatic access
Outcome: Integration of Debtags with EDAM and with the registry model without loss of data determined to be do-able.
- This meeting has sparked off significant tasks and collaborations in this area
- Production of status report - see below.
Packaging demonstration
Brad Chapman, Daniel Barker, Andreas Tille, Detlef Wolf, Iain Learmonth
- Packaged seqTK as a demo packaging task
- Demo went well - seqTK and DNAclust both packaged
- Attendees started their own packaging tasks:
- python3-fitbitscraper now built - asked Deb Python about pushing it
- started on ngila - work in progress
Packaging the PubMed search (C + Java) from BioinfoC
Detlef Wolf, with help from Olivier Sallou, Andreas Tille, Jorge Soares, Iain Learmonth, Steffen Moeller, Tim Booth
Roadmap to PubmedSearch packaging
- libbioinfoc-0.1.0: setup of GNU build system (configure.ac, Makefile.am)
- to produce library (for shared and static linking)
- towards Debian initiation: GPG key, showing passport, Alioth account
- Java part of pubmed search: for next sprint
- Many contributors helped improve the build system, so that the Automake (AM) build now works and an initial package has been produced.
- Gave an impromptu demo at 2pm (http://bioinfoc.ch)
- continued work to neaten up the package on Sunday:
- Steffen added the example prog to the bioinfoc package.
- Also the library gets a debugging package,
- close to ready for a Debian upload.
Personal Reports
Andreas Tille
- Package uploads
- Sponsored gnuhealth
- New version of gnumed-server + gnumed-client
- New version of flexbar, checking tests, contacted upstream about failing tests
- Some changes at staden packaging
- Worked with Detlef Wolf on upstream build system of bioinfo-c
- seqtk (created in a live packaging session)
- dnaclust (created in a live packaging session)
- snp-sites sponsored for Jorge Soares
- python3-fitbitscraper sponsored for Iain R. Learmonth
- Presentation: Introduction to Mentoring of the Month
- Live packaging session on Saturday afternoon, packaging randomly picked (simple) packages suggested by the audience
- Discussing issues about creating an ontology based on data in UDD (debtags+descriptions) with EDAM people
- Working on the FAQ section of the Blends documentation (relation to DebTags)
- Working together with Luiz Ibanez on VistA packaging
- Running around to help people gain their first packaging experience after the live packaging session yesterday
- Enabling kgb for Git repositories
- Other stuff than Debian Med
- uscan enhancements
- Sponsoring libcitygml for Debian GIS
- Activated 4096-bit GPG key
- Fixing Debian Games tasks
Iain Learmonth
- Learnt Debian packaging through participating in Andreas' live packaging of dnaclust and seqtk.
- Sorted out a man page for dnaclust
- Packaged python3-fitbitscraper
- Spoke with Luca and agreed to package HEAVyBASE once there is documentation available for the package
- Gave a demo of a Python application visualising personal health data from FitBit.
- Took some pictures, link at bottom of main page.
Steffen Möller
- Did a package of bio-parser-isatab (Perl lib) and it is nearly ready for commit
Jorge Soares
- Commit fixes to snp-sites and tidy up
- Committed. snp-sites v1.5.0 is now installed on all sid architectures
- Successful debugging of upstream issues
- Package Fastaq
- Initial debian git commit of Fastaq python package.
- Initial editing of several debian files.
Tim Booth
- Gave a short talk on Bio-Linux and recent updates
- Discussed EOS cloud plans with Tony (private clouds running BL for data analysis)
- Worked on Jalview with Jim (see above) - uploaded new pkgs to Bio-Linux
- Demonstrated packaging to Jim
- Showed Galaxy packaging to Peter and got feedback
- Discussed roadmap to making Bio-Linux love the Galaxy toolshed with Brad and Peter C
- Planned with Kristoffer how to use the Tools Registry in BL and how to contribute
- Connected to the Qlustar cluster and tried some basic ops
Niall Beard
- Interested in new packages + involvement in the tools registry group (Biocatalogue)
- Maybe looking at external tools in taverna with Steffen
- Joined Andreas and started packaging Coot - work ongoing
- Proceeded to productive discussion on tool description related issues
Olivier Sallou
- Fix biojava and libgo-perl
- Package new upstream version biojava3
- New packages: discosnp and mapsembler2
Peter Cock
- Worked with Brad and Tim - incl looking at Galaxy DEB package.
- Planned BOF at the next Galaxy conference after in-depth discussion on Galaxy toolshed issues and sane packaging.
- Looked into packaging an astronomy package.
Brad Chapman
- Gave talk and demo on CloudBioLinux
- Participated in packaging demo
- Worked on the manifest idea – list installed progs in CBL
A critical missing component of CloudBioLinux full and flavor-based custom installs is defining the full environment of packages and versions available on the system. We worked at the 2011 BOSC Codefest hackathon to add minimal support for creating this full manifest of packages and versions, but the script required integration into production workflows and numerous cleanups. During the first day of the DebianMed Sprint I focused on converting this manifest creation into a production-ready importable module. It now handles creation of YAML files with packages and versions for all install methods supported by CloudBioLinux (Debian packages; Python, R and Ruby library installs; Homebrew packages; and custom CloudBioLinux scripts). The Debian version is 10x faster than previously thanks to tips on querying apt repos from Tim Booth.
These updates to manifest creation make it possible to integrate it into existing tools that use CloudBioLinux for installation. The community-developed open source bcbio-nextgen next-generation sequencing pipeline uses this, and we adjusted the build scripts to generate manifests on installation and then use these manifests to provide a list of the biological packages that run as part of the pipeline. This replaces brittle code in bcbio-nextgen and ties automated installation to the new manifest feature, ensuring that manifest creation will be regularly updated going forward for all CloudBioLinux installs.
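As a rough illustration of the Debian part of such a manifest, here is a minimal sketch. It is not the actual CloudBioLinux module: the function names and the YAML layout are assumptions made for this example, and it queries the local dpkg database with dpkg-query rather than the apt repositories mentioned above.

```python
import subprocess


def parse_dpkg_listing(text):
    """Turn 'name<TAB>version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, version = line.partition("\t")
        packages[name.strip()] = version.strip()
    return packages


def to_yaml(packages):
    """Render the dict as a simple YAML mapping, sorted by package name.

    The layout (name -> version) is hypothetical, chosen for illustration.
    Versions are quoted because Debian epochs contain a colon.
    """
    lines = []
    for name in sorted(packages):
        lines.append('{0}:\n  version: "{1}"'.format(name, packages[name]))
    return "\n".join(lines) + "\n"


def installed_debian_packages():
    """Ask dpkg for all known packages (only works on a Debian-based system)."""
    output = subprocess.check_output(
        ["dpkg-query", "-W", "-f", "${Package}\t${Version}\n"],
        universal_newlines=True)
    return parse_dpkg_listing(output)
```

On a Debian or Ubuntu machine, `print(to_yaml(installed_debian_packages()))` would write the manifest to standard output. Parsing is kept separate from the dpkg call so that it can be exercised without a Debian system.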
Additionally, I worked to learn Debian package building thanks to help from Andreas Tille. This resulted in creation of my first Debian package for FreeBayes, a highly accurate variant caller from Erik Garrison in the Marth Lab. I pushed a nearly completed version to DebianMed, which Andreas helped to finalize and make available. The hope for future versions of CloudBioLinux is to move back to Debian/Ubuntu based support inside Docker containers, which will help this package replace a custom build function in CloudBioLinux with a proper package.
Report of tool registry working group (Reporting by Matúš)
Summary
The following was achieved:
- we made sure that the Registry description model accommodates all the attributes needed by Debian/DebianMed;
- we made sure that all the attributes essential to the Registry exist in DebianMed;
- we established from where in DebianMed the tool descriptions can be obtained regularly via programmatic access.
Motivation:
- The communities of the Tool Registry and Debian Med are expected to overlap significantly, so effort should not be duplicated
- Expected Synergies
- increased visibility of Debian Med's efforts to scientific community
- head start for the Tool Registry with the data provided
- facilitation of Debian packaging with tool descriptions, prioritization of efforts
- any mechanism for ensuring that tool description of desired 'tasks' is in the Tool Registry? (Not via harassment of Andreas)
- possibly later on: test I/O data pairs for automated testing & benchmarking may be recorded in the registry and be useful for automated testing in Debian
- Best possible annotation for tools (and databases) in Computational Biology, resulting in improved accessibility, visibility & attribution, and provenance within the field
Constraints and challenges:
- Non-intrusive to ease acceptance in working communities
- maintainers are not forced to link to the Tool Registry
- maintainers may perhaps have a choice of ignoring the Tool Registry, importing information from the registry upon request, or some form of automatic updates with or without confirmation
- Licensing of Debian-provided annotation - Example here
- Difficulty to distinguish "source" packages and their general annotation with Debian's more fine-grained separation of binaries, APIs/libs, data, scripts, debug information ... and many bits and pieces that should be considered intrinsic parts of one tool
- Other way round too, package being a collection or an ad hoc cluster of tools
Ideas for implementation:
- The Tool Registry harvesting information regularly from Debian Med and including references to the created Tool Registry entries back into Debian
- into debian/control (probably not) and/or debian/upstream and/or 'tasks' (these 2 probably reasonable and possibly optional)
- Debian Maintainers are encouraged to add tags to reference an eventual Tool Registry entry and an option for eventual automated imports from the registry
- into debian/control (probably not) and/or debian/upstream and/or 'tasks' (these 2 probably reasonable and possibly optional)
- Tool information in Debian Med:
- Ultimate Debian Database should integrate all information from
- which packages are in Debian
- debian/control - Example here
- debian/upstream - Example here
- See Also: DebTags on the Wiki, DebTags paper, Faceted Classification, DebTags home, DebTags FAQ
- 'tasks' page; is shown here; populated from here; and code is here
- description of unpackaged tools ignored until Tool Registry finds it useful to import them
- additional information about packages ignored until found useful
- matching of packages with registry entries may be implemented via information in the 'tasks' file (this may be likely in case the references are not desired in a package itself)
- Important: Andreas has recently made it so that almost all relevant stuff from the ‘tasks’ file is in UDD
- Contact information in the debian/copyright - Example maintained here
- Andreas may be willing to include these in the UDD, as they are not there at present
- online manpages can be included in the registry among documentation URLs: http://manpages.debian.net/cgi-bin/man.cgi?query=<pkgname>
- Access to UDD via its public PostgreSQL interface, from Python
- First attempt:
- 1. Get all Deb Med descriptions from UDD
- 2a. Descriptions of packages that had already been recorded in the registry will be fully overwritten
- 3a. Return registry accessions for newly created (or all) entries
- Let Andreas et al decide in which form they want to get them and record them
- In a later iteration:
- 2b. Solve synchronisation to allow update of descriptions without overwriting (likely via timestamps of imported information)
- 3b. Let Debian people decide whether, how, and eventually with what options, updated information about packages is recorded back to Debian
- Integration of information (the “federated” model):
- 2c. Handle synchronisation of information updates from multiple sources (for simplicity start with Deb Med and SEQwiki?)
In future, would it be of interest to automatically (optionally manually) populate debian/upstream with enriched scientific & semantic information? See also the sketch here
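The timestamp-based synchronisation of step 2b could look roughly like the sketch below. The record layout (a 'description' and a 'timestamp' per package name) and the function name are assumptions for illustration only; neither the Registry nor UDD is claimed to store data in this form.

```python
def merge_descriptions(registry, imported):
    """Update registry entries from imported records without blind overwriting.

    Both arguments map a package name to a dict with the keys
    'description' and 'timestamp' (any comparable value, for example
    seconds since the epoch). An imported record replaces the stored
    entry only if it is strictly newer; otherwise the stored entry is kept.

    Returns (merged, changed), where 'changed' is the sorted list of
    package names that were added or updated.
    """
    merged = dict(registry)  # shallow copy; the input dict is not modified
    changed = []
    for name, record in imported.items():
        current = merged.get(name)
        if current is None or record["timestamp"] > current["timestamp"]:
            merged[name] = dict(record)
            changed.append(name)
    return merged, sorted(changed)
```

The names in `changed` are the entries for which registry accessions would be reported back (step 3a/3b). Step 2a, the full overwrite of the first attempt, corresponds to ignoring the timestamp comparison.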
Other sources:
- The Free Software Directory of the FSF harvests and integrates information from multiple sources including but not limited to UDD
- Should certainly be heavily included in the Tool Registry effort
- Would be enormously useful to get design suggestion from FSF directory architects in particular about the information integration (“federated” model)
- Useful for import to registry, or are those tools anyway better described elsewhere?
- Registry accessions could be included into the Nordugrid XML description
- Bio-Linux has only a few packages that aren’t in Deb Med or aren’t planned to be included in Debian, and are still well-defined software (these may be e.g. unfree, hard to package, or too ad hoc)
- Should be added to the Tool Registry manually, done person-to-person with Tim (who knows everything relevant about those tools that are relevant for registration)
- CloudBioLinux is full of various stuff
- thorough tool information is starting to come into focus now: the “manifest”, which is going to be a YAML file about installed stuff. Expected with thrill!
- Information from the Tool Registry may be of great benefit to CloudBioLinux
- Debian non-free: is included in UDD and among tasks. Bits that are missing from those should be added to them
Integration of EDAM annotations into Debian Med
- DebTags, Enrico Zini [http://debtags.debian.net, https://wiki.debian.org/Debtags/FAQ]
- ‘tasks’ categorisation
Challenges:
- EDAM concepts need to be identified by alphanumeric IDs/URIs, because terms may and do change over time
- At the same time, of course, the terms need to be presented to both the users and annotators
- Lower priority but possibly high coolness: Search/filtering/grouping by EDAM DAG
Solutions:
- Separate mapping file (to be packaged) between DebTags and external vocabularies
- Start with EDAM and Media types (in order to have more than EDAM only)
- Before the larger mapping effort, DebTags need to be refactored (by us in accord with Enrico)
- After the mapping, information about the external concepts should be shown to Debian taggers in the tagging Web app
- Record information about available external vocabularies in the mapping files, at the Facet level
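One possible shape for such a mapping is sketched below, with vocabularies declared at the facet level and concepts referenced by stable identifiers rather than by their labels. The data structure, the function names and the individual tag-to-concept pairs are illustrative assumptions; the pairs have not been checked against DebTags or EDAM and are not a proposed mapping.

```python
# Facet level: which external vocabularies a DebTags facet may map to.
# (Illustrative entries only.)
FACET_VOCABULARIES = {
    "field": ["EDAM"],
    "works-with-format": ["EDAM", "Media types"],
}

# Tag level: DebTags tag -> list of (vocabulary, stable identifier, label).
# The identifier is what gets stored; the label is only for display and
# may change over time without breaking the mapping.
TAG_MAPPING = {
    "field::biology:bioinformatics": [
        ("EDAM", "http://edamontology.org/topic_0091", "Bioinformatics"),
    ],
    "works-with-format::xml": [
        ("Media types", "application/xml", "XML"),
    ],
}


def facet_of(tag):
    """Return the facet part of a 'facet::tag' string."""
    return tag.split("::", 1)[0]


def external_concepts(tags):
    """Look up external concepts for a package's DebTags.

    A mapped concept is returned only if its vocabulary has been
    declared for the facet the tag belongs to.
    """
    found = []
    for tag in tags:
        allowed = FACET_VOCABULARIES.get(facet_of(tag), [])
        for vocabulary, identifier, label in TAG_MAPPING.get(tag, []):
            if vocabulary in allowed:
                found.append((vocabulary, identifier, label))
    return found
```

A tagging Web app could call `external_concepts` with a package's tags to show the external terms next to the DebTags ones. Tags without a mapping entry simply yield nothing.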
Tool description model draft 4.0
- Alignment with Deb Med pkg description
Compatibility of the tool description XSD with Emil’s & Piotr’s tooling: …
Finishing the tool description XSD and making it compatible with Emil’s & Piotr’s tooling & v.v.
- Main bits to finish: Interfaces, Versions
- Todo soon: Release new minor BioXSD version catering for the needs of the tool description XSD
Miscellaneous
Keysigning all round.