Integrate grid and cloud computing systems with Debian

Abstract (now with fewer buzzwords)

In many academic fields, as well as in commercial industries, people use clusters to distribute tasks among multiple machines. This is often done by packaging a whole operating system disk image, uploading it to the cluster, and having the cluster run it in a VM. I want to make it easier for Debian to distribute pre-prepared disk images the way it distributes CD images now. I would also like to make it easier for others to do the same by creating a GUI frontend to the various grid management tools (vmbuilder, the ec2 tools, etc.).

About Me

Hello. I am David Wendt, a freshman at Stony Brook University studying Computer Science. I am a long-time Debian and Ubuntu user. You can contact me on IRC (dwendt or _ _dwendt), AIM (AnonBound), or e-mail (dcrkid@yahoo.com). I will have an almost completely free summer to develop for this project, barring emergencies.

Project Overview

The project I am interested in pursuing is integrating Debian into grid/cloud computing systems. Universities have been using grid computing internally for years. Grid computing allows many disparate entities to pool their computing resources and execute jobs on them. Corporations have taken up a similar idea, buying CPU time from on-demand 'cloud' computing services such as Amazon EC2.

(The difference between grid and cloud computing systems is like the difference between transit and peering. We can support both in the same project, of course.)

A grid/cloud computing system, in the scope of this proposal, is a cluster of machines that execute operating systems under a virtual machine monitor. By extension, operating system disk images intended to be used on such a system will be called "VM images". Examples of this kind of computing system include Amazon EC2 (commercial cloud) and Eucalyptus (non-commercial grid).

There is an existing package in the Ubuntu project, vmbuilder, which makes it easy to create a base Ubuntu image and add packages to it. This can be used as the base of the Debian integration work.

Use Cases

There are two main use cases for this project: the Debian project making signed VM images, and others making their own custom images.

In order for the Debian maintainers to produce a signed Debian VM image and distribute it, they must be confident that everything they put into the image is a signed Debian package. So we create a file or process that describes exactly what to put in the VM image, which we will call a 'configuration file' for brevity. Such files can be signed by the Debian project and used by the Debian maintainers to create an official VM image. The VM image could then be distributed on the Debian website like Debian CD images, or however the Debian project sees fit. Additionally, the configuration file could be distributed as a package.
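To make the 'configuration file' idea more concrete, here is a minimal sketch of what such a description might hold and how it could be checked. Everything in it is an assumption for illustration: the ImageConfig class, its fields (suite, arch, packages), and the unverified_packages helper are hypothetical names, not an existing vmbuilder or Debian format. The digest is only a stable value that a real signing step (for example with GnuPG) could be applied to; the sketch does not sign anything itself.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageConfig:
    """Hypothetical description of what goes into a VM image."""
    suite: str
    arch: str
    packages: tuple = ()

    def canonical_bytes(self) -> bytes:
        # Sort keys and package names so that two configs with the same
        # contents always serialize to the same bytes.
        doc = {
            "suite": self.suite,
            "arch": self.arch,
            "packages": sorted(self.packages),
        }
        return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def digest(self) -> str:
        # SHA-256 of the canonical form; this is the value one would sign.
        return hashlib.sha256(self.canonical_bytes()).hexdigest()


def unverified_packages(config, archive_packages):
    """Return the packages in the config that are NOT in the given set of
    package names known to come from the signed archive."""
    return sorted(set(config.packages) - set(archive_packages))


if __name__ == "__main__":
    config = ImageConfig("stable", "amd64", ("openssh-server", "my-local-tool"))
    known = {"openssh-server", "bash", "coreutils"}
    print("digest to sign:", config.digest())
    print("not from the signed archive:", unverified_packages(config, known))
```

An official image would only be built when unverified_packages returns an empty list; a custom end-user image could simply skip that check.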

However, an end user may want to deliberately modify their image with other packages. They can already do this with vmbuilder. But not everyone likes using command lines (sadly). After all the important integration and extension work has been done on vmbuilder, I can write a new GUI frontend to manage VM images. The frontend will probably be written in Python, use GTK (via the Python bindings), and use vmbuilder and aptitude as appropriate. Additionally, the GUI frontend can allow users to manage already-running VM images. (For EC2 and Eucalyptus, we can use the ec2 command tools as the backend.) This GUI frontend will be called 'vmmanager'.
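As a rough sketch of how vmmanager might drive vmbuilder as a backend, the frontend could translate the user's choices into a command line and hand it to a subprocess. The function name build_vmbuilder_command is hypothetical. The --suite, --arch and --addpkg options are the ones I understand vmbuilder to offer for Ubuntu images and should be checked against vmbuilder --help; passing "debian" as the distribution is an assumption, since that support is exactly what this project would add.

```python
import shlex


def build_vmbuilder_command(hypervisor, distro, suite, arch, packages):
    """Build (but do not run) a vmbuilder argument list.

    The "debian" distro value used below is an assumption: Debian
    support in vmbuilder is the work proposed here.
    """
    argv = ["vmbuilder", hypervisor, distro, "--suite", suite, "--arch", arch]
    for pkg in packages:
        # One --addpkg option per extra package the user selected in the GUI.
        argv += ["--addpkg", pkg]
    return argv


if __name__ == "__main__":
    argv = build_vmbuilder_command(
        "kvm", "debian", "stable", "amd64", ["openssh-server", "python"]
    )
    # Show the command the frontend would run; a real frontend would pass
    # argv to subprocess.run() instead of printing it.
    print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell quoting problems when package names or paths come from user input.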

Project Milestones

And of course, these will give Debian deliverables:

Deliverables for Debian

After GSoC

I am not currently a Debian developer, but I will be able to continue work related to vmbuilder after the summer (i.e., I will still be in #debian, on the mailing list, etc.). During the summer, as I said, I am relatively free of prior commitments. The first week might have a final exam or three, but I do not foresee that interfering with the project.

I'm mostly confident in the programming work itself. I am less familiar with the internals of the package management system, which I suspect vmbuilder uses heavily.