Integrate grid and cloud computing systems with Debian
Abstract (now with fewer buzzwords)
In many academic fields, as well as in commercial industries, people use clusters to distribute tasks among multiple machines. Often this is done by packaging a whole operating system disk image, uploading it onto the cluster, and having the cluster run it in a VM. I want to make it easier for Debian to distribute pre-prepared disk images the way it distributes CD images now. I would also like to make it easier for others to do the same by creating a GUI frontend to the various grid management tools (vmbuilder, the ec2 tools, etc.).
About Me
Hello. I am David Wendt, a freshman at Stony Brook University studying Computer Science. I am a long-time Debian and Ubuntu user. You can contact me on IRC (dwendt or _ _dwendt), AIM (AnonBound), or e-mail (dcrkid@yahoo.com). I will have an almost completely free summer to develop for this project, barring emergencies.
Project Overview
The project I am interested in pursuing is integrating Debian into grid/cloud computing systems. Universities have been using grid computing internally for years. Grid computing allows many disparate entities to pool their computing resources and execute jobs on them. Corporations have taken up a similar idea, buying CPU time from on-demand 'cloud' computing services such as Amazon EC2.
(The difference between grid and cloud computing systems is like the difference between transit and peering. We can support both in the same project, of course.)
A grid/cloud computing system, in the scope of this proposal, is a cluster of machines that execute operating systems under a virtual machine monitor. By extension, operating system disk images intended to be used on such a system will be called "VM images". Examples of this kind of computing system include Amazon EC2 (commercial cloud) and Eucalyptus (non-commercial grid).
There is an existing package in the Ubuntu project, vmbuilder, which makes it easy to create a base Ubuntu image and add packages to it. This can be used as the base of the Debian integration work.
Use Cases
There are two main use cases for this project: the Debian project making signed VM images, and others making their own custom images.
In order for the Debian maintainers to produce a signed Debian VM image and distribute it, they must be confident that everything they put into the image is a signed Debian package. So we create a file or process that describes exactly what to put in the VM image, which we will call a 'configuration file' for brevity. Such files can be signed by the Debian project and used by the Debian maintainers to create an official VM image. The VM image could then be distributed on the Debian website like Debian CD images, or however the Debian project sees fit. Additionally, the configuration file could be distributed as a package.
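To make the idea of a 'configuration file' more concrete, here is a minimal sketch of what one could look like and how a tool might read it. The format itself is only an assumption for illustration: the section names, the keys (`suite`, `arch`, `hypervisor`, `install`), and the helper names are all hypothetical, since designing the real format is the first milestone. The digest at the end is simply one value the Debian project could sign; it is not a signing mechanism in itself.

```python
import configparser
import hashlib
from dataclasses import dataclass

# Hypothetical configuration file. The real format is still to be designed.
EXAMPLE_CONFIG = """\
[image]
suite = stable
arch = amd64
hypervisor = kvm

[packages]
install = openssh-server, sudo
"""


@dataclass(frozen=True)
class ImageSpec:
    """Everything needed to describe one VM image (illustrative fields only)."""
    suite: str
    arch: str
    hypervisor: str
    packages: tuple


def parse_image_spec(text: str) -> ImageSpec:
    """Parse the hypothetical INI-style configuration into an ImageSpec."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    image = parser["image"]
    raw = parser.get("packages", "install", fallback="")
    # Sort the package list so the same set always yields the same spec.
    packages = tuple(sorted(p.strip() for p in raw.split(",") if p.strip()))
    return ImageSpec(image["suite"], image["arch"], image["hypervisor"], packages)


def config_digest(text: str) -> str:
    """Return a SHA-256 hex digest of the configuration text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


spec = parse_image_spec(EXAMPLE_CONFIG)
print(spec)
print(config_digest(EXAMPLE_CONFIG))
```

Keeping the description declarative like this would let the same file drive both an official build and a user's modified build, with only the official one carrying a Debian signature.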
However, an end user may want to deliberately modify their image with other packages. They can already do this with vmbuilder, but not everyone likes using command lines (sadly). After all the important integration and extension work has been done on vmbuilder, I can write a new GUI frontend to manage VM images. The frontend will probably be written in Python, use GTK (via the Python bindings), and use vmbuilder and aptitude as appropriate. Additionally, the GUI frontend can allow users to manage already-running VM images. (For EC2 and Eucalyptus, we can use the ec2 command tools as the backend.) This GUI frontend will be called 'vmmanager'.
Project Milestones
- Design a configuration file format, or some other standard method of controlling vmbuilder. I expect to consult the developers responsible for building Debian CD images to find out what extra configuration options we need. - Complete by week 2. (May 23 to June 6)
- Extend vmbuilder to handle the previous milestone's format and configuration changes. - Complete by week 4. (June 6 to June 20)
- Create a command line script to automate the process of fetching an official Debian package of configuration files and generating a VM image from it. - Complete by week 5. (June 20 to June 27)
- Ensure vmbuilder can handle all the various forms of VM images. Currently, vmbuilder can already handle images for KVM, vmw6, vmserver, vbox, and qemu. We may want to support some grid computing system that uses a different image format. - Complete by week 7. (June 27 to July 11)
- Make a nice graphical front-end for vmbuilder, to make it easier for others to build images and install them on their cluster. - Complete by week 10 (July 11 to August 1)
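As a rough illustration of the command line script milestone, the sketch below turns an image description into a vmbuilder invocation. It only builds and prints the command; it does not run it. Two things here are assumptions: the `debian` distribution argument does not exist in vmbuilder yet (adding it is part of this project), and the option names (`--suite`, `--arch`, `--dest`, `--addpkg`) are taken from vmbuilder's Ubuntu usage and may differ in the final design. The function name and the example values are hypothetical.

```python
import shlex


def build_vmbuilder_command(hypervisor, suite, arch, packages, dest):
    """Build (but do not run) a vmbuilder command line as a list of arguments.

    Assumption: a 'debian' distro plugin exists and accepts the same
    options vmbuilder uses for Ubuntu images.
    """
    cmd = [
        "vmbuilder", hypervisor, "debian",
        "--suite", suite,
        "--arch", arch,
        "--dest", dest,
    ]
    # One --addpkg option per extra package requested by the configuration.
    for pkg in packages:
        cmd += ["--addpkg", pkg]
    return cmd


command = build_vmbuilder_command(
    hypervisor="kvm",
    suite="stable",
    arch="amd64",
    packages=["openssh-server", "sudo"],
    dest="./debian-image",  # hypothetical output directory
)
print(shlex.join(command))
```

Building the argument list separately from executing it would also let the planned GUI frontend show users the exact command before anything is run.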
These milestones will give Debian the following deliverables:
Deliverables for Debian
- A process for generating trusted grid computing images that can be distributed like the CD images
- An easy way for others to make their own images, or modify existing trusted ones
After GSoC
I am not currently a Debian developer, but I will be able to continue work related to vmbuilder after the summer (i.e., I will still be in #debian, on the mailing list, etc.). During the summer, as I said, I am relatively free of prior commitments. The first week might have a final exam or three, but I do not foresee that interfering with the project.
I'm mostly confident in the programming work itself. I am less familiar with the internals of the package management system, which I suspect vmbuilder uses heavily.