Student Application Template
Name: Eduard Sanou
- IRC: Dhole (on freenode, oftc)
I'm a Spanish student doing a master's in Computer Science at the Polytechnic University of Catalonia. Although I graduated in Electrical Engineering, I have always been interested in computing, which is why I changed fields for the master's. I enjoy learning programming languages and their new concepts and idioms. I am comfortable programming in C, Python and Go, and I can read/write a few others to some degree. Lately I've started learning Rust. I believe in the freedom of software and as such I have tried many GNU/Linux distributions and *BSD operating systems over the years. I also try to use free software as much as I can and recommend it to my friends and family.
- I worked on a project at my university for one year as an intern. The project consisted of designing and implementing image processing algorithms, where I mainly coded in MATLAB and C++ using the OpenCV libraries.
One of my latest projects (not finished yet) is a Game Boy emulator written in C, which I code in my free time. Many games are already playable right now! I haven't started working on the sound yet. You can check it here: https://github.com/Dhole/miniBoy
I have great interest in security and cryptography, and as such I've been working on a series of challenges from the company Matasano: https://github.com/Dhole/matasano
After using free software for so long I really feel the need to contribute back, now that I feel confident about my skills. I think the GSoC would be a great opportunity to get started.
Project title: Move forward reproducible builds
The reproducible builds project currently targets the unstable Debian release. Right now, 83% of the packages are reproducible. On the other hand, some work has been done on getting reproducible installation media / installs, but much work is still required.
Two main tools have been developed: strip-nondeterminism, to remove sources of non-determinism from files; and debbindiff, to compare two built packages and recursively find the places where they differ.
Currently debbindiff crashes on some packages, takes too long on others and may consume too much RAM.
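To illustrate what "recursively find the places where they differ" means, here is a minimal sketch in Python. It is not debbindiff's actual code or interface: the helpers make_tar, is_tar and diff_archives are hypothetical names made up for this example, and it only handles tar archives nested inside tar archives, whereas real packages involve many more container formats.

```python
import io
import tarfile


def make_tar(files, mtime):
    """Build an in-memory tar from {name: bytes}, stamping every member with mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def is_tar(data):
    """Return True if the bytes can be opened as a tar archive."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)):
            return True
    except tarfile.TarError:
        return False


def diff_archives(a, b, prefix=""):
    """List the differences between two tar archives (given as bytes),
    descending into members that are themselves tar archives."""
    diffs = []
    with tarfile.open(fileobj=io.BytesIO(a)) as ta, \
            tarfile.open(fileobj=io.BytesIO(b)) as tb:
        ma = {m.name: m for m in ta.getmembers()}
        mb = {m.name: m for m in tb.getmembers()}
        for name in sorted(set(ma) | set(mb)):
            path = prefix + name
            if name not in ma or name not in mb:
                side = "first" if name in ma else "second"
                diffs.append(f"{path}: only in {side}")
                continue
            if ma[name].mtime != mb[name].mtime:
                diffs.append(f"{path}: mtime {ma[name].mtime} != {mb[name].mtime}")
            if ma[name].isfile() and mb[name].isfile():
                da = ta.extractfile(ma[name]).read()
                db = tb.extractfile(mb[name]).read()
                if da != db:
                    if is_tar(da) and is_tar(db):
                        # Report the innermost difference, not just "data.tar differs".
                        diffs.extend(diff_archives(da, db, path + "/"))
                    else:
                        diffs.append(f"{path}: content differs")
    return diffs


# Two "builds" whose only difference is a timestamp inside a nested archive.
build1 = make_tar({"data.tar": make_tar({"a.txt": b"x"}, 1), "readme": b"r"}, 100)
build2 = make_tar({"data.tar": make_tar({"a.txt": b"x"}, 2), "readme": b"r"}, 100)
print(diff_archives(build1, build2))
```

The point of descending into nested containers is that the report names the innermost member that differs, which is usually where the source of non-determinism has to be fixed.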
Work to be done
The non-reproducible packages need to be studied so that patches can be made to build them reproducibly. In some cases the patch only involves the build process of that specific package. In other cases the patch applies to a package from a toolchain used to build other packages.
Regarding reproducible installs there's a lot of work to be done. There are many sources of nondeterminism in debootstrap and the postinstall process (SSH private keys, user creation order, etc.). I'm not familiar with the internals of the install / postinstall process, so I don't know how much work that would be, what I could achieve and how I could schedule it. If the mentors consider this interesting and doable for a GSoC I'd like to work on it and redefine my schedule and deliverables as needed. Otherwise I'll focus on reproducible packages.
How I plan to implement the deliverables
I intend to work on the packages that can't be built reproducibly now. For that I would first target packages with timestamp issues, since I have already started looking into them (I've submitted a patch for cryptsetup: #780864, and another one for openssl: #780955. The latter was already fixed upstream so my patch was discarded). I will pay special attention to patches for toolchains that affect more than one package.
I will study how other distributions and developers have dealt with reproducibility and document it in the wiki. I will document the issues I find along with the solutions applied. I will also improve the documentation on the sources of non-determinism and how they are fixed, distinguishing those that depend on the Debian build process from those that come from the upstream package itself, so that it can serve as a reference for other operating systems.
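As an illustration of what a timestamp issue looks like, here is a small Python sketch using gzip, whose header stores a modification time (the 4-byte MTIME field at offset 4, per RFC 1952). This is only an example of the general class of problem; I am not claiming it is what the patches mentioned above address, and the helper names gzip_bytes and zero_gzip_mtime are made up for this sketch.

```python
import gzip
import io


def gzip_bytes(data, mtime):
    """Compress data, recording mtime in the gzip header,
    as a build running at that time would."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def zero_gzip_mtime(blob):
    """Return a copy of a gzip stream with the MTIME header field set to 0."""
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return blob[:4] + b"\x00\x00\x00\x00" + blob[8:]


# Same source, built one hour apart (the timestamps are arbitrary examples).
first = gzip_bytes(b"same source\n", mtime=1426000000)
second = gzip_bytes(b"same source\n", mtime=1426003600)

print(first == second)                                    # False
print(zero_gzip_mtime(first) == zero_gzip_mtime(second))  # True
```

There are two ways to fix this kind of issue: normalize the output after the build, as zero_gzip_mtime does here, or patch the build so that it never records the current time in the first place. The second is generally preferable when the change can go upstream.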
During the analysis of unreproducible packages I will be using debbindiff, and for that purpose I will contribute to the development of the tool as necessary.
Study the packages that can't be built reproducibly and write patches to achieve reproducible builds. Document the issues found and their solutions. Improve the debbindiff tool to aid the analysis of build differences.
Benefits to Debian
Enabling reproducible builds on Debian would greatly benefit both users and developers.
- On the users' side, they can verify that the packages they are getting come from the appropriate source code, without any tampering. There's no need to trust a single party or build machine, and compromised build machines can be easily detected.
- On the developers' side, they can be protected from adversaries that could try to compromise the package build machine. If a binary package doesn't correspond to the public source code, many people can notice it and alert everyone. This way, the developer doesn't carry the responsibility alone, nor can they be coerced into adding malicious functionality that was not in the source code.
I also think that Debian can set a precedent for reproducible packages so that other operating systems can apply the same concept and benefit from the documentation and tools.
Regarding debbindiff, improvements and fixes will also be very useful to help move cross-building forward: it will aid in finding issues (by comparing cross-builds with native builds).
Many patches to make individual packages build reproducibly
Patches to toolchains to allow building packages reproducibly
Improvements and patches for debbindiff
Improvements and additions to the wiki documenting how reproducibility is achieved in Debian, issues found in packages and their solutions.
I can start working on the 27th of April, combining the work with my studies. In June I will probably slow down due to final exams, and after the 25th of June I will be completely free and will dedicate more hours.
Phase 0 (Pre coding period: April 27 - May 25)
Get to know the mentor and the co-mentor. I will also study the packages that can't be built reproducibly in order to classify which ones can be patched individually and which ones depend on modifications to the toolchain. I will identify which changes can be made to toolchains that affect a large number of packages. Start looking into how to improve debbindiff.
Phase 1 (May 25 - June 25)
Begin patching packages that fail to build reproducibly, with emphasis on timestamp issues. Write about timestamp issues and how to address them for specific builds. I plan to patch 1-2 packages per week, with some variance depending on the difficulty of the patch.
Phase 2 (June 25 - August 17)
Continue patching packages with timestamp issues, and also start working on packages that fail for other reasons. Prioritize toolchain modifications that can affect several packages. Test debbindiff with cross-built packages to find bugs and ways to improve it. Add improvements and fixes to debbindiff along the way. Write more documentation about timestamp issues and other issues and how they are solved. Write documentation about the toolchain fixes. Write general documentation about reproducibility, relating it to the Debian build process and the challenges faced so that it can serve as a reference for other distributions. I plan to patch 3-5 packages per week and write documentation every week.
Exams and other commitments:
I have an exam period that lasts about two weeks (11th to 25th of June), after which I'll be totally free. I have no other commitments.
Other summer plans
I don't have any specific summer plans.
I have always considered Debian the reference GNU/Linux distribution. Debian has always been my distribution of choice for setting up servers, as I consider it reliable and backed by a great community. I also greatly value the involvement of the Debian developers in making great free software without any company deciding its path.
Are you applying for other projects in SoC?: No