Differences between revisions 23 and 24
Revision 23 as of 2007-08-09 10:36:35
Revision 24 as of 2007-08-09 11:00:45
  * [http://lich.phys.spbu.ru/kab00m/projects/cdboot.eng.html How to make multiboot CD-ROM]

This project aims to create a lintian-like tool to test ISO files created with debian-cd. It is being developed as a Google Summer of Code project; the project description follows.

It's hosted on Alioth, with the name "[http://pancutan.alioth.debian.org/ pancután]". The subversion repository is at svn://svn.debian.org/pancutan/

Project description

Abstract

The Debian Project routinely builds CD and DVD images of specific distributions, like Debian stable, Debian testing and Debian unstable. Each of those comes in different 'flavours' that fit different needs (business card, net-inst, full archive) and for different target architectures [ [http://lists.debian.org/debian-cd/2006/12/msg00068.html 1] ]. There are also images for various Custom Debian Distributions like Skolelinux, and even live-CD systems.

Many of those images are built weekly or even daily, which results in a *lot* of ISO files.

Verifying that the built images work is currently a manual process, so too much work is spent on this task. Images meant for a release also need stronger assurance of correctness, as it would be very bad publicity to ship a broken image.

To alleviate this situation, a modular lintian-like CD-image testing tool is proposed.

Proposal

The objective of this project is to build said tool from scratch, along with a set of base tests. Three tasks must be completed:

  • Careful reviewing of different flavours of images, needs and common errors, by working with the different teams involved. Acquaintance with the current tools and a lot of testing.
  • Designing and building the core of the application, dealing with handling of very large files and sets of them (multi CD releases), configuration, and user interface.
  • Writing data collection scripts and base tests that automate most of the work done manually today, and testing them on all available images.

The most vital part is to understand what needs to be done, so the first task will be given much effort and time.

The design will strive to be simple yet powerful, with a clear interface for writing two types of modules: data collection modules and testing modules. The former will be executed first to collect and cache data from the image, and then the latter will execute using the data collected. Every test module will be able to report the warnings and errors it finds.

The core of the application will direct the show and finally report to the user in a form similar to that of lintian and linda.

During the entire process, reports will be published in blog or email form, so the mentor and any Debian contributor can watch how things are going and give feedback.

Benefits for Debian

This tool will greatly help the teams involved, and allow them to focus on important things that can't be automated. By adding test cases, which is easy, one-time coding work, they would improve confidence in the tool.

The results of each run could be published along with the images, as is currently done for lintian reports, so users and developers can easily review an image's health.

This tool could also be used by other Debian derivatives.


Involved parties

I've already surveyed some people who currently use debian-cd and related tools to build CDs. I plan to support vanilla Debian CDs as well as [wiki:CustomDebian CDDs] as much as possible. I still have to research live distributions to see if it's possible/reasonable to support them.

  • SteveMcIntyre - debian-cd

  • Frans Pop - DebianInstaller

  • Gustavo Franco - Simple-CDD
  • Felipe Augusto van der Wiel - Debian-BR-CDD
  • Otavio Salvador - Custom distributions for enterprises
  • Holger Levsen - DebianEdu

  • Kai Hendry - WebConverger

Drafts

Incomplete list of things written as I gather them

Sources of information

  • .disk/
    • info: (flag) we found a debian-cd ISO. Could also be used to detect a Debian Live CD.
    • base_installable: (flag) can do an install without the network
    • base_components: define the repositories: local, main, contrib, even non-free
    • cd_type: {full_cd,dvd,not_complete}
    • mkisofs: doesn't seem useful for this
    • udeb_include: should check presence on cd
  • md5sum.txt: to check integrity and completeness
  • autorun.*: to be sanity-checked
  • install/ & tools/: to be checked for presence

  • isolinux/
    • boot.cat: to be parsed and checked for bootability
    • isolinux.cfg: parsed to check correct location and presence of kernel and initrd
    • support files checked for presence
  • dists & pool: checked with apt tools: consistency, dependency checks, checksums of debs

  • supplied local mirror + d-i tasks definitions: to compare checksums and determine lists of essential packages for d-i
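As an illustration of the md5sum.txt integrity and completeness check listed above, here is a minimal sketch in Python (the tool itself is built around Perl plug-ins; Python is used here only to keep the example short). It assumes that the image is already mounted, and that md5sum.txt uses the usual `md5sum` output format: a hex digest, whitespace, then a path relative to the disc root. The names `file_md5` and `check_md5sums` and the keys of the returned dictionary are hypothetical, not part of the tool.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5sums(root, listing="md5sum.txt"):
    """Compare the files under root against the checksum listing.

    Returns a dict of sorted relative paths:
      mismatch: listed and present, but the digest differs
      missing:  listed but not present on the disc
      unlisted: present on the disc but not listed
    """
    expected = {}
    with open(os.path.join(root, listing), encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            digest, name = line.split(None, 1)
            # Drop the binary-mode marker and normalise "./path" to "path".
            name = os.path.normpath(name.strip().lstrip("*"))
            expected[name] = digest.lower()

    present = set()
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            if rel != listing:
                present.add(rel)

    return {
        "mismatch": sorted(
            name for name in expected
            if name in present
            and file_md5(os.path.join(root, name)) != expected[name]
        ),
        "missing": sorted(set(expected) - present),
        "unlisted": sorted(present - set(expected)),
    }
```

Some files may be left out of md5sum.txt on purpose (the boot catalog, for example), so a real test would probably report "unlisted" entries as warnings rather than errors.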

Some tests

My idea is that tests should have preconditions to be met for the test to be executed. Those preconditions will be satisfied by flags set by other tests and/or datasources.

  • Detection of type of cd. Only debian-cd at this moment -- Done
  • Check ISO size fits in medium -- Done
  • Presence of optional tools (W)
  • Presence of documentation
  • md5sum.txt validation and completeness check -- Done
  • {md5,sha1,sha256}sum check of packages
  • matching of checksums from packages and mirror (if available)
  • consistency check of cd repository
  • consistency check of base_installable, cd_type, and available packages
  • presence check of essential d-i packages (black magic)
  • kernel ABI matching in vmlinuz image, udebs and initrd (research about multiarch cds)
  • check kernel from cd and mirror versions match (error in official cd)
  • check preseeding in simple-cdd if possible (needs research)
  • check boot catalog correctness, find real location of isolinux in ISO image
  • try to boot with qemu and check that isolinux gets loaded
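The idea of preconditions satisfied by flags can be sketched as follows. This is a Python illustration only (the tool itself uses Perl plug-ins), and every name in it is hypothetical: `run_tests`, the flag names, the lintian-style message and the capacity figure are assumptions made for the example. A test runs once all the flags it requires are set, and may set new flags for later tests; tests whose preconditions are never met are reported as skipped.

```python
def run_tests(tests, flags):
    """Run each test once all of its required flags are set.

    tests: list of (name, required_flags, function) tuples. Each function
    takes the current flag dict and returns (messages, new_flags).
    Returns (messages, names_of_tests_never_run).
    """
    flags = dict(flags)
    pending = list(tests)
    messages = []
    progress = True
    while pending and progress:
        progress = False
        for test in list(pending):
            _name, required, function = test
            if all(flags.get(flag) for flag in required):
                found, new_flags = function(flags)
                messages.extend(found)
                flags.update(new_flags)
                pending.remove(test)
                progress = True
    return messages, [name for name, _required, _function in pending]


def detect_debian_cd(flags):
    # Assumption: a datasource has already listed the top-level entries.
    return [], {"debian_cd": ".disk" in flags["top_level_entries"]}


def check_size(flags):
    if flags["iso_size"] > flags["medium_capacity"]:
        return ["E: iso-too-large-for-medium"], {}
    return [], {}


tests = [
    ("size", ["debian_cd"], check_size),
    ("detect", [], detect_debian_cd),
    ("preseed", ["simple_cdd"], lambda flags: ([], {})),
]
messages, skipped = run_tests(tests, {
    "top_level_entries": [".disk", "pool"],
    "iso_size": 800_000_000,
    "medium_capacity": 737_280_000,  # illustrative capacity in bytes
})
```

The order in which tests are declared does not matter here: "size" is listed first but only runs after "detect" has set the `debian_cd` flag.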

Draft plug-in spec

Each plug-in is a Perl module, that should adhere to the following:

  • Uses its own namespace (i.e. starts with "package foo") and doesn't export anything by default. As the tool will scan a directory for plug-ins, and will not know the namespace, it should return the package name. (i.e. the code section should end in  __PACKAGE__; )

  • Defines one or more "tasks" to be executed at different points in time. Each task name should be unique.
  • Each task requires at least one subroutine that does the work. Optionally, it can provide a pair of client/server subroutines to be executed in different processes for parallelization (to be defined).
  • Each task subroutine gets the following parameters:
    • A hashref to a structure containing all the data to be used across different discs of the set. Initially, these are its only contents:  { set => [ { file => "isofile", mount => "mountpoint" }, ... ] } . When writing to this hash, each task should use a new key named after the task.

    • If the task is of type *_cd, the index in the set array, and a second hashref as scratch space erased after each disc is processed.
  • Ends with a __DATA__ section, which contains a YAML document with the metadata of the plug-in. The metadata should be a hash of:

    • "tags": unique error/warning tags defined within this module, itself another hash. Each tag has a type (fatal, error, warn) and a desc(ription).
    • "tasks": unique tasks defined within this module. This hash contains:
      • type: pre, first_cd, any_cd, last_cd, post
      • depends: array of tasks this task depends on
      • non_decoupled_sub: mandatory monolithic subroutine
      • decoupled_master and
      • decoupled_slave: optional distributed subroutines
      • desc: description
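To make the metadata layout concrete, here is a sketch of a check the core could run on each plug-in's metadata after parsing the YAML document. It is written in Python for illustration only (the plug-ins themselves are Perl modules), and it works on an already parsed dictionary, so no YAML library is needed. The function name `validate_metadata`, the example plug-in contents and the rule that `decoupled_master` and `decoupled_slave` must appear together are assumptions, not part of the draft spec.

```python
TAG_TYPES = {"fatal", "error", "warn"}
TASK_TYPES = {"pre", "first_cd", "any_cd", "last_cd", "post"}


def validate_metadata(meta):
    """Return a list of problems found in a plug-in's metadata hash."""
    problems = []
    for tag, info in meta.get("tags", {}).items():
        if info.get("type") not in TAG_TYPES:
            problems.append(f"tag {tag}: bad type")
        if not info.get("desc"):
            problems.append(f"tag {tag}: missing desc")
    for task, info in meta.get("tasks", {}).items():
        if info.get("type") not in TASK_TYPES:
            problems.append(f"task {task}: bad type")
        if not info.get("non_decoupled_sub"):
            problems.append(f"task {task}: missing non_decoupled_sub")
        if not isinstance(info.get("depends", []), list):
            problems.append(f"task {task}: depends is not an array")
        # Assumption: the two decoupled subroutines only make sense as a pair.
        if ("decoupled_master" in info) != ("decoupled_slave" in info):
            problems.append(f"task {task}: incomplete decoupled pair")
    return problems


# Hypothetical metadata, as it might look once the YAML has been parsed.
EXAMPLE = {
    "tags": {
        "md5sum-mismatch": {"type": "error", "desc": "File differs from md5sum.txt"},
    },
    "tasks": {
        "md5sum_check": {
            "type": "any_cd",
            "depends": ["detect_cd_type"],
            "non_decoupled_sub": "check_md5sum",
            "desc": "Validate md5sum.txt on each disc",
        },
    },
}
problems = validate_metadata(EXAMPLE)
```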

Stages of execution

There are three stages:

  • pre: general checks or non-cd specific checks
  • any_cd: each cd is checked in turn. The first_cd and last_cd pseudo-stages are set when checking the first and last cds.
  • post: after every cd has been checked

A task can depend on a task from a different stage if: the dependency belongs to a previous stage (post -> *_cd -> pre) or if it's a *_cd type task and its dependency is any_cd type.

After all the plug-ins have been loaded, a running plan is built in the form of a tree of dependencies. The tree is checked for consistency and then the execution starts. When more than one task is ready to be executed (i.e. all its dependencies have been met) it's possible to run them in different processes, if decoupled subroutines have been specified. The master subroutine is responsible for writing back the data that the slave subroutine sends, in a serialization format, through socketpairs.
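The consistency check and the running plan could look roughly like the sketch below. It is in Python for illustration only (the tool itself uses Perl), and it reflects one reading of the stage rule above: a dependency of the same type is always allowed, a dependency from an earlier stage is allowed, and any *_cd task may depend on an any_cd task. The names `dependency_allowed` and `build_plan` are hypothetical. The plan is returned as batches of task names; the tasks in one batch have all their dependencies met at the same time, so they are the candidates for running in separate processes. Per-disc iteration is left out.

```python
STAGE_RANK = {"pre": 0, "first_cd": 1, "any_cd": 1, "last_cd": 1, "post": 2}


def dependency_allowed(task_type, dep_type):
    """Apply the stage rule to a single task -> dependency edge."""
    if task_type == dep_type:
        return True
    if STAGE_RANK[dep_type] < STAGE_RANK[task_type]:
        return True
    return task_type.endswith("_cd") and dep_type == "any_cd"


def build_plan(tasks):
    """tasks: {name: {"type": ..., "depends": [...]}}.

    Returns a list of batches (sorted lists of task names).
    Raises ValueError on unknown dependencies, stage violations or cycles.
    """
    for name, info in tasks.items():
        for dep in info["depends"]:
            if dep not in tasks:
                raise ValueError(f"{name}: unknown dependency {dep}")
            if not dependency_allowed(info["type"], tasks[dep]["type"]):
                raise ValueError(f"{name}: may not depend on {dep}")

    done = set()
    plan = []
    while len(done) < len(tasks):
        ready = sorted(
            name for name, info in tasks.items()
            if name not in done and all(dep in done for dep in info["depends"])
        )
        if not ready:
            raise ValueError("dependency cycle detected")
        plan.append(ready)
        done.update(ready)
    return plan


# Hypothetical tasks, as they might be declared by a few plug-ins.
example_tasks = {
    "detect": {"type": "pre", "depends": []},
    "md5": {"type": "any_cd", "depends": ["detect"]},
    "size": {"type": "any_cd", "depends": ["detect"]},
    "summary": {"type": "post", "depends": ["md5", "size"]},
}
plan = build_plan(example_tasks)
```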

Bootable CD info

Some data I've gathered about the different boot processes for each arch. [http://lists.debian.org/debian-cd/2006/12/msg00109.html Mail] from Sledge.

http://www.netbsd.org/docs/bootcd.html

http://www.gnu.org/software/libcdio/libcdio.html