Recursively building Java dependencies from source
Description of the project: Many Java projects depend on a combination of JAR files from other Java projects. In some cases, third-party JARs are also used to provide custom tools for the build process (e.g. custom Ant tasks or Maven plugins). Can the entire hierarchy of dependencies and build tools, together with their transitive dependencies, be built from published source code? If not, we cannot be certain that a particular dependency JAR is clean, free of malicious code, and easy to fix or adapt for future Java versions. This project aims to develop automated mechanisms for cataloguing the portfolio of Java libraries on sites like GitHub and the Maven Central Repository, creating a database of dependencies, mirroring their source repositories, removing binary JARs from their source trees and trying to build them using symlinks to JARs found in Debian or built by the same recursive process. This project may be partially automated using a tool like Jenkins. Data for some of the dependencies can be harvested from Maven pom.xml files.
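As an illustration of harvesting dependency data from a pom.xml file, the following is a minimal sketch that uses only the JDK's built-in XML parser. The class name PomDependencies and the method directDependencies are hypothetical names chosen for this example, not part of any existing tool. It assumes a simple POM: it does not resolve ${...} properties, parent POM inheritance, dependencyManagement sections or plugin dependencies, all of which a real implementation would need to handle.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: lists the direct dependencies declared in a POM.
class PomDependencies {

    /** Returns "groupId:artifactId:version" for each project-level dependency. */
    static List<String> directDependencies(InputStream pom) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // POMs come from untrusted projects, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder().parse(pom);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }

        Element project = doc.getDocumentElement();
        List<String> result = new ArrayList<>();
        NodeList candidates = doc.getElementsByTagName("dependency");
        for (int i = 0; i < candidates.getLength(); i++) {
            Element dep = (Element) candidates.item(i);
            Node list = dep.getParentNode();
            // Keep only <project><dependencies><dependency>; this skips
            // entries nested under dependencyManagement or build plugins.
            if (list == null
                    || !"dependencies".equals(list.getNodeName())
                    || !project.isSameNode(list.getParentNode())) {
                continue;
            }
            result.add(childText(dep, "groupId") + ":"
                    + childText(dep, "artifactId") + ":"
                    + childText(dep, "version"));
        }
        return result;
    }

    /** Text of the first direct child element with the given name, or "" if absent. */
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Made-up POM fragment used only to demonstrate the call.
        String pom = "<project><dependencies><dependency>"
                + "<groupId>org.example</groupId><artifactId>demo-lib</artifactId>"
                + "<version>1.0</version></dependency></dependencies></project>";
        InputStream in = new ByteArrayInputStream(pom.getBytes(StandardCharsets.UTF_8));
        System.out.println(directDependencies(in));
    }
}
```

The coordinates returned by such a helper could be stored in the dependency database and fed back into the same process for each dependency in turn.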
Confirmed Mentor: Daniel Pocock
Confirmed co-mentors: volunteers sought
Deliverables of the project: The ideal outcome will be a full suite of tools for automating this process: a user could enter the name of a JAR in a form, and the tool would study the JAR, find the source, recursively build everything and tell the user whether the JAR they want to use can be built without depending on any JAR that is missing source code. Building individual components that help achieve the aims of this project would also be satisfactory. The student is not required to automatically correct individual Java projects that are completely unsuitable for automated builds (e.g. projects that can only be built in Eclipse).
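The final check described above, whether a JAR depends on anything that is missing source code, can be sketched as a walk over the dependency database. The class SourceAudit, the method missingSource and the in-memory maps standing in for the database are all assumptions made for this example. The sketch only inspects catalogue data; it does not attempt an actual build.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical audit over an in-memory stand-in for the dependency database.
class SourceAudit {

    /**
     * Returns every artifact reachable from root (including root itself)
     * that has no known source. An empty result means the whole tree,
     * build tools included, is at least a candidate for a from-source build.
     */
    static Set<String> missingSource(String root,
                                     Map<String, List<String>> dependencies,
                                     Set<String> hasSource) {
        Set<String> visited = new HashSet<>();
        Set<String> missing = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (!visited.add(id)) {
                continue; // already seen; this also guards against dependency cycles
            }
            if (!hasSource.contains(id)) {
                missing.add(id);
            }
            for (String dep : dependencies.getOrDefault(id, Collections.<String>emptyList())) {
                pending.push(dep);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up artifact names, including a cycle between lib-a and lib-b.
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("app", Arrays.asList("lib-a", "build-plugin"));
        deps.put("lib-a", Arrays.asList("lib-b"));
        deps.put("lib-b", Arrays.asList("lib-a"));
        deps.put("build-plugin", Arrays.asList("lib-b", "closed-tool"));
        Set<String> withSource =
                new HashSet<>(Arrays.asList("app", "lib-a", "lib-b", "build-plugin"));

        System.out.println(missingSource("app", deps, withSource)); // prints [closed-tool]
    }
}
```

Build-time tools such as Ant tasks or Maven plugins are treated here as ordinary graph edges, so a binary-only build tool is reported in the same way as a binary-only library.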
Desirable skills: Java build tools (Ant, Maven, Jenkins) and source repository tools (Git, Subversion)
What the student will learn: Working effectively with Java projects that involve a large number of dependencies. Using automated systems to orchestrate commands that are normally invoked manually.