Well, the term "LiveCD" is somewhat historic, but we all know that what we basically mean is a

Preinstalled portable Debian Med software stack

The motivation for this effort is to bring the packages offered in Debian even closer to direct usability. This may be via your USB stick, a local or remote virtual image, or just better collaboration via a completely homogeneous infrastructure.

This page was put up to discuss the pros and cons of various ways to prepare such a LiveCD. Points for discussion are

  • selection of packages
    • the adaptation of those selection for the various media
    • the influence that this selection may have on the description of tasks in Debian Med
  • extra efforts that may be beyond main Debian
    • integration of public data
    • integration of educational material
  • most suitable tool to prepare the LiveCD
  • who does what, and which people external to the Debian core we can get involved

Selections for Scenarios

This section mixes two things. One is the kind of environment that one may expect, with the various demands that one may have (average developer, average user, special interest). The other is the way that the environment shall be transported to the user (regular install, cloud, netboot, USB stick). But the two are also related.

All selections shall be additive.

||'''Scenario'''||'''Status'''||'''Selection'''||'''Comment'''||
||developer's machine||ready||build-essential|med-bio-dev|| ||
||machine is worked with locally||ready||med-bio||features many programs with GUI||
||machine is for computations only||discussed||'app list'||GUI-free sub-task is being discussed for Debian Med||
||executed on cloud||works, but no routine||euca2ools||a major motivation behind this effort||
||integration of data||works, but no routine||getData||the integration work will look very different for cloud and non-cloud installs; getData is still in development and not yet prepared at all for the demands of the cloud||
||strong interest in special packages||ready||install those packages||and consider mailing the list about it||
||netboot||works, but no routine||extension of regular LiveCDs|| ||
||some advanced packages of general interest|| ||debfoster, deborphan||a ready live CD does not profit from these, but its production does||

Preparation

Three tools are proposed

  • plain chroot plus boot loader (this description moved to DebianMed/LiveCD/StickPlain)

  • live-helper
    • USB stick
    • netboot with netroot and copy-on-write via NFS (to be completed)
  • vmbuilder (to be done)

Protocol for plain chroot plus boot loader

A very neat (my favorite) setup for a machine to be booted with a USB stick is one where the stick acts like a regular drive. Detailed instructions, almost an executable script, are given on what has now become its own sub-page: DebianMed/LiveCD/StickPlain.

Protocol for Live Helper

Live Helper (see DebianLive for details) prepares the LiveCD in a multi-stage manner. First comes a configuration step: the script 'lh_config' produces a template that is good enough for us, at least until we want images and other bits added in various places. A second step performs the actual build of the infrastructure specified that way.

The following snippet will build an image to dd onto the USB stick. Please follow the instructions on DebianLive/Howto/USB for further insights.

sudo apt-get install live-helper
lh_config -b usb-hdd \
  --distribution squeeze --categories "main contrib non-free" \
  -p standard --packages '^(libcv-dev|autodock|gromacs|ballview|autodocktools|dropbear|boinc-client|r-cran-qtl|r-recommended)$' \
  -m http://ftp.debian.org/debian/
sudo lh_build
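As an aside, the --packages option takes an anchored regular expression, so only exact package names are selected. A quick sketch of that matching behaviour (plain grep stands in for live-helper's selection; the package names are illustrative):

```shell
# The anchored ^(a|b|c)$ pattern matches whole package names only:
# 'gromacs' is selected, but 'gromacs-data' is not.
pattern='^(libcv-dev|autodock|gromacs|ballview)$'
printf '%s\n' gromacs gromacs-data autodock autodocktools | grep -E "$pattern"
# prints:
#   gromacs
#   autodock
```

Without the ^...$ anchors, the pattern would pull in every package whose name merely contains one of the alternatives.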

The complete build time, over DSL and with a now-elderly laptop, is about 45 minutes. The resulting hard-disk image can then be dumped to the USB stick's device (not to a partition of it). With a disk-partitioning tool the remainder of the stick can be rendered usable again: the dd also writes a partition table, which needs to be adjusted.

dd if=binary.img of=/dev/sdb   # /dev/sdb is an example; verify your stick's device first
fdisk /dev/sdb                 # re-create a partition in the remaining free space
...

As a hard-disk image, the live CD can also be run in a variety of virtual setups. The expert tool for that, though, is "vmbuilder", described below.

When preparing this live medium, one should be aware that the device is not directly mountable; instead, a very large file is created on the FAT or FAT32 filesystem, which represents the real image. I have failed to make changes to the local setup (e.g. an edited /etc/hosts) persistent. This is certainly possible, somehow, but this overview needs a good soul to describe how. I want the medium to be the only medium that the machine sees, e.g. to boot otherwise diskless clients. It would help if someone knew, for instance, how to create a second partition on the USB stick (OK, I have done that) and use it together with aufs/squashfs to achieve persistency within the USB stick, not on the (non-existent) hard disk.

Once persistency is achievable, and the image possibly compressed, this setup would seem ideal for many causes.
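A hedged sketch only, not verified here: the squeeze-era live-initramfs is supposed to look for a partition labelled "live-rw" when booted with the "persistent" kernel parameter. The device name /dev/sdb2 is an assumption; please correct or confirm from your own experience.

```shell
# ASSUMPTIONS: the stick is /dev/sdb, a second partition /dev/sdb2 was
# created in the free space (e.g. with fdisk), and live-initramfs
# honours the "persistent" boot parameter by mounting a "live-rw" volume.
label="live-rw"                        # the label live-initramfs scans for
# mkfs.ext3 -L "$label" /dev/sdb2     # format the persistence partition
# ...then append "persistent" to the kernel line in syslinux.cfg
echo "$label"                          # prints: live-rw
```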

Protocol for netboot

this section is currently under preparation - don't try it yet, but feel free to correct if you know better

This scenario prepares one single root image that is served via NFS. Changes are written to a copy-on-write (COW) image, which is different for every machine. Everything is prepared in complete analogy to the regular Live CD that is prepared with live-helper:

sudo apt-get install live-helper

For easier copy and paste, line breaks are avoided by assembling the package list in a shell variable:

server="server.example.com" # place your server here
packages="libcv-dev|autodock|gromacs|ballview|autodocktools"
packages="$packages|lvm2|openssh-server|openssh-client"
packages="$packages|boinc-client|r-cran-qtl|r-recommended"
packages="$packages|build-essential|nfs-common"

lh_config -b net \
  --net-root-server $server --net-cow-server $server \
  --distribution squeeze --categories "main contrib non-free" \
  -p standard \
  --packages "^($packages)\$" \
  -m http://ftp.debian.org/debian/ \
  --chroot-filesystem ext3
sudo lh_build

lvm2 was added to increase flexibility with locally added disks, should the need arise. We had issues with a squashfs-modules package that was suddenly demanded and that we neither had nor could easily build. It should be possible to omit "--cache-stages disabled".

The Wiki page Network_Image_Server has more information on this matter. It also references a tutorial on Creating_a_Test_environment.
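For reference, the /etc/exports on the NFS server might look roughly like the following. The paths are assumptions for illustration, not what live-helper actually generates; check the Network_Image_Server page for the authoritative layout.

```
# /etc/exports on the NFS server (paths are assumptions)
/srv/debian-live        *(ro,async,no_root_squash,no_subtree_check)
/srv/debian-live/cow    *(rw,async,no_root_squash,no_subtree_check)
```

The root image is exported read-only; the copy-on-write area must be writable, since each client writes its changes there.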

Protocol for Clouds

Conceptually, there is no difference between preparing local and remote virtual images. To learn more,

  • see the walk-through on euca2ools

  • the instructions on how to create a cloud image

  • maybe reread the concept of Blends and how the community decides, most pragmatically, on what might go together in one cloud image

From the user's perspective, the image on the cloud will be different. One does not expect much interaction with it, if only because one is likely to pay for traffic. Hence, all interactive tools should go, or mostly go. This also helps save disk space and thus increases responsiveness when an instance starts. All the following code is to be executed within an empty cloud image, or just any image that is to be filled:

  • manually verify that contrib and non-free are available

    virtual:

    apt-get update

The whole of med-bio would just fit on a 2 GB image, but not both the downloaded .debs in /var/cache/apt/archives 'and' the final installation. With the current version of apt-get, one needs to install in chunks and delete the archives early. Yes, this is some criticism of apt-get, but I cannot fix it myself.
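The chunking idea can be sketched generically: xargs -n splits any package list into fixed-size apt-get calls (the list and chunk size below are illustrative; echo stands in for the real install, and each chunk would be followed by an apt-get clean):

```shell
# Split a long package list into chunks of three packages each.
pkgs="emboss hmmer muscle t-coffee kalign boxshade autodock autogrid"
printf '%s\n' $pkgs | xargs -n 3 echo apt-get --yes install
# prints:
#   apt-get --yes install emboss hmmer muscle
#   apt-get --yes install t-coffee kalign boxshade
#   apt-get --yes install autodock autogrid
```

In practice one still groups related packages by hand, as below, so that a failing chunk is easy to attribute.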

  • Start with some packages in non-free, if it is available

    virtual:

    if grep -q "non-free" /etc/apt/sources.list; then
       debfoster --upgrade clustalw phylip && apt-get clean
    fi

The invocation of "apt-get" carries a small constant overhead. We cannot install everything at once, but we don't want to install packages one by one either. Sigh.

  • Install packages within those recommended by Debian Med in smallish chunks

    virtual:

    for i in "r-recommended r-cran-qtl" "mustang muscle t-coffee kalign boxshade" "infernal hmmer" emboss "autodock autogrid" "wise exonerate" "blast2 ncbi-tools-bin" "libbiojava-java bioperl bioperl-run" "python-biopython openbabel" "libball1.3 python-ball"
    do 
      apt-get --yes install $i && apt-get clean
    done

A few packages don't really need to be installed, since they are too graphical to be executed remotely. Their initial installation seems inevitable, though.

  • remove GUI-centric programs and avoid reinstallation as a recommended dependency

    virtual:

    notToInstall="ballview clustalx seaview pvm rasmol massxpert perlprimer xbitmaps ncbi-tools-x11 python-ballview treeviewx massxpert-data"
    for i in $notToInstall
    do
      dpkg --purge $i
      echo "$i hold" | dpkg --set-selections
    done
  • add remaining core packages and again save disk space

    virtual:

    apt-get install med-bio med-statistics && apt-get clean
  • Octave is too big to be installed with all its dependencies in one go. The effort is split by first installing separately a series of packages that are of interest on their own

    virtual:

    apt-get --yes install libqhull5 texinfo
    apt-get clean
    
    apt-get --yes install --no-install-recommends octave
    apt-get clean
    
    apt-get --no-install-recommends install med-bio-dev med-bio
  • a few packages we don't really need

    virtual:

    for i in mlocate $(dpkg -l "*-doc" | grep ^ii | awk '{print $2}')
    do
       dpkg --purge $i && echo "$i hold"|dpkg --set-selections
    done
  • To "round it all up" and since some space is still available

    virtual:

    apt-get --yes install mysql-client r-cran-rmysql libcv-dev
    apt-get clean
    apt-get autoremove

EC2: Amending the Alestic Debian Image on the Amazon Elastic Compute Cloud

Much-respected Debian images for Amazon's EC2 are those of Alestic. Filter for "squeeze" among the images offered.

  • apt-get update # (consider selecting a mirror closer to your Amazon zone)
  • Some bits should be as current as possible, or are yet missing (like vim):

    apt-get install vim debfoster deborphan apt aptitude

  • Run the above instructions; in deborphan, answer "yes" except to alien and to a purge of rpm, or just exit.
  • Change the MOTD (message of the day):

    vim /etc/motd

  • Some tools just "need to be the very latest"

    apt-get -u dist-upgrade # say no, but select those you really want to be upgraded with
    debfoster -u g++ coreutils debian-archive-keyring openssh-client openssh-server patch procps psmisc ruby rsync make manpages ifupdown initscripts build-essential tar # you name them, but be careful with kernel-sensitive issues like udev
  • When pasting the commands, ensure that they have truly been executed; the paste buffer seems to be cleared by invocations of apt-get

The latest AMI prepared along this protocol was

IMAGE ami-8040aae9

Skit

If following the Cloud/CreateImage description referenced above, please remove the network-persistence script in /etc/udev/rules.d; there is no such thing as network persistence for a cloud image.

This script supposedly works on various Debian-compatible platforms. It was run on amd64; please extend this description with whatever your experiences may be. The Debian Med community is discussing/implementing an explicit task 'med-cloud' and how best to automate its specification from the information available in apt and debtags. Thus this description is a moving target: it will change, and only for the better.