Recursive Building of Java Dependencies
- Name: Andrew Schurman
- School: Simon Fraser University
During the summer, I propose to create a suite of tools for recursively building Java applications, from their dependencies up to the project itself. This process involves two steps: building a project's dependencies and building the project itself. Due to the popularity of Maven (and its broad selection of artifacts), most build tools, such as Ivy and Ant, support resolving artifacts from its central repository. This leads me to the conclusion that the two steps outlined above can be treated as separate problems.
Building a project's dependencies will be the bulk of the project. I believe the best way to do this is to create a Maven plugin, which decouples us from the application's build tool (good practice). Although it is possible to radically alter Maven's dependency resolution (see Tycho and its p2 dependency resolution), doing so would require Maven to be used as the application build tool. Even then, building in this way would require modifying the project source (specifically the POM), which some would not want. A Maven plugin can also easily be integrated into a Jenkins build as a pre-build step.
As input, the plugin would take either a Maven project proper or a Maven artifact ID. In the latter case, dependency information would be extracted from the artifact as published in some remote Maven repository. I say some repository because Maven allows the repository to be configured via settings, defaulting to its central repository. By allowing a Maven artifact ID, we also decouple the process of building the application from building its dependencies, i.e. we can use different tools as long as the application build produces a Maven artifact.
The output of the plugin would be a Maven repository, which could easily be used as a source repository when building the application itself. Maven can be configured to pull only from that selected repository, overriding the availability of any artifacts in central. Other build tools should have similar options.
Clearly, the bulk of the project involves building a project's dependencies, which, as shown above, can be thought of as a separate problem from building the actual application. This section details the difficulties that will arise when building dependencies.
Using a Maven plugin allows all dependencies to be resolved, including transitive dependencies. This does not make the process easy, though, as Maven's configuration allows builds to occur in very selective ways. For example, a build-time dependency could be introduced via a Maven profile activated only during some sort of release, or a property set at release time could change a dependency's version. In general, this shouldn't happen (it's not best practice, though it may have use cases).
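Once the transitive dependencies are known, they have to be built in an order where each artifact comes after everything it depends on. The following is a minimal sketch of that ordering, not the plugin's actual design: the class name `BuildOrder` and the plain string keys are assumptions for illustration, and a real plugin would take the graph from Maven's own dependency resolution rather than from a hand-built map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders artifacts so each is built after its dependencies. */
final class BuildOrder {
    private BuildOrder() {}

    /**
     * @param root key of the artifact we ultimately want to build
     * @param deps direct dependencies per artifact (a missing key means "no dependencies")
     * @return artifacts ordered so that dependencies precede their dependents
     */
    static List<String> of(String root, Map<String, List<String>> deps) {
        Set<String> done = new LinkedHashSet<>();
        visit(root, deps, done, new LinkedHashSet<>());
        return new ArrayList<>(done);
    }

    // Depth-first, post-order walk: a node is emitted only after all of its dependencies.
    private static void visit(String node, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(node)) {
            return; // already scheduled via another path
        }
        if (!inProgress.add(node)) {
            throw new IllegalStateException("Dependency cycle at " + node);
        }
        for (String dep : deps.getOrDefault(node, Collections.emptyList())) {
            visit(dep, deps, done, inProgress);
        }
        inProgress.remove(node);
        done.add(node);
    }
}
```

With `app` depending on `lib-a` and `lib-b`, and `lib-a` depending on `lib-b`, the sketch schedules `lib-b`, then `lib-a`, then `app`.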
So far, I've assumed that Maven was used to build these dependencies. Again, this may not be the case. Even if it is, the project may have been built with a specific version of Maven, which could affect the ability to build it. Even though Maven 2 and Maven 3 are for the most part compatible, some plugins cease to work in Maven 3 (for example, war overlays in maven-warpath-plugin). Although this issue is stated as fixed, my testing shows that it is not. Another build tool related issue is that the source can contain standard build files for more than one tool, e.g. both a build.xml and a pom.xml. The plugin can handle all of these issues by requiring an installation of the needed tools and forking off a shell script or BeanShell script to build each project (similar in function to the maven-invoker-plugin). Optionally, a tool could be selected automatically if the project uses standard names for its build files. The assumption here is that a command line tool exists for building the project. This may not always be the case, but it is still a reasonable assumption; if it does not hold, things will get more complicated.
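To illustrate the automatic tool selection, here is a rough sketch that picks a command from the standard build file names and forks it in the project directory. The class name `BuildToolDetector`, the choice to prefer pom.xml over build.xml when both exist, and the exact commands (`mvn install`, `ant`) are assumptions on my part; a per-project build script would override whatever this selects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: selects and runs a build command based on standard file names. */
final class BuildToolDetector {
    private BuildToolDetector() {}

    /** Returns the command to run, or empty if no known build file is present. */
    static Optional<List<String>> commandFor(Path projectDir) {
        // Assumption: when both files exist, the Maven build is preferred.
        if (Files.isRegularFile(projectDir.resolve("pom.xml"))) {
            return Optional.of(Arrays.asList("mvn", "install"));
        }
        if (Files.isRegularFile(projectDir.resolve("build.xml"))) {
            return Optional.of(Arrays.asList("ant"));
        }
        return Optional.empty();
    }

    /** Forks the command in the project directory and returns its exit code. */
    static int build(Path projectDir, List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .directory(projectDir.toFile())
                .inheritIO() // stream the tool's output to our own console
                .start();
        return process.waitFor();
    }
}
```

An empty result from `commandFor` is the case where a hand-written build script would be required.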
A Maven plugin will also aid in discovering source locations. Most Maven artifacts attempt to use Maven to the fullest by including SCM information. By using a plugin, we will be able to pull that information from parent artifacts (common in large multi-module projects), potentially resolving any properties that appear in it. In some cases, SCM information may not be present. All is not lost in this case: a source jar is sometimes available and may be used as a replacement. This may not bear fruit, as some source jars do not include all of the source for their project, either to hide some implementation or because generated source was forgotten. Another potential problem is SCM information that is out of date, i.e. the source repository was moved or even deleted. In any case, a database of artifact SCM URLs can be maintained to override the information in the artifact.
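As a sketch of how the override database could work, the lookup below prefers an entry from a properties file and falls back to the SCM URL declared in the POM. The class name `ScmLocator` and the `groupId:artifactId:version` key format are assumptions for illustration. Note that in a properties file the colons in such a key have to be escaped (`org.example\:lib\:1.0=...`), since an unescaped colon ends the key.

```java
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper: picks the SCM URL for an artifact, honouring overrides. */
final class ScmLocator {
    private ScmLocator() {}

    /**
     * @param gav       assumed key format "groupId:artifactId:version"
     * @param pomScmUrl SCM URL declared in the artifact's POM, or null if absent
     * @param overrides override database loaded from a properties file
     */
    static Optional<String> resolve(String gav, String pomScmUrl, Properties overrides) {
        String override = overrides.getProperty(gav);
        if (override != null && !override.trim().isEmpty()) {
            return Optional.of(override.trim());
        }
        if (pomScmUrl != null && !pomScmUrl.trim().isEmpty()) {
            return Optional.of(pomScmUrl.trim());
        }
        return Optional.empty(); // caller may fall back to a source jar
    }

    /** Extracts the provider ("git", "svn", ...) from a Maven SCM URL such as "scm:git:...". */
    static String provider(String scmUrl) {
        if (!scmUrl.startsWith("scm:")) {
            throw new IllegalArgumentException("Not a Maven SCM URL: " + scmUrl);
        }
        String rest = scmUrl.substring("scm:".length());
        // Maven SCM URLs allow either ':' or '|' as the delimiter after the provider.
        int colon = rest.indexOf(':');
        int pipe = rest.indexOf('|');
        int end = colon < 0 ? pipe : (pipe < 0 ? colon : Math.min(colon, pipe));
        if (end <= 0) {
            throw new IllegalArgumentException("Missing provider in: " + scmUrl);
        }
        return rest.substring(0, end);
    }
}
```

The provider returned by `provider` is what would decide which checkout tool to fork for a given dependency.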
Licensing is an interesting subject, and another issue that this plugin can potentially tackle. Thankfully, it is something that other people have already worked on. The plugin can be used to construct a license report for all artifacts (based on their Maven artifact information), optionally overriding the license for artifacts that do not have one embedded.
Thus far, I've outlined and justified how a Maven plugin can be used to build a project's dependencies in both a developer environment and a build environment such as Jenkins. Efforts should be made to reuse existing plugins and Maven APIs to reduce the amount of code to be written and maintained. The plugin would have parameters along the lines of:
artifact -- either the current maven project or one that was passed in (would need a specific version)
scmdb -- properties file listing overridden SCM information on a per GAV (groupid-artifactid-version) basis
outputdir -- where to build the repository
additionalDependencies -- optional dependencies to always try to build (for oddly constructed dependencies)
buildscripts -- beanshell/shell/groovy? scripts for special build instructions on a per GAV basis
prebuiltDependencies -- optional dependencies to be resolved from central (don't know how to or can't build)
This plugin should be written in a way that works even when no dependencies are built, i.e. when all dependencies come from a repository. Dependencies can then be added iteratively to the list of built dependencies while we figure out which ones we can build with such an infrastructure.
As with most Maven plugins, reporting would generally be written to a separate file. Generally, a Maven plugin is just a wrapper around an existing tool with a fixed file format. Translators are sometimes included for generating the Maven site. Depending on what is embedded in the plugin, reports can be generated.
A wrapper script could be constructed to set the plugin's properties automatically, but one would have to be created for each build tool. I don't think this would be too useful, at least not for projects that weren't built with Maven, because it may not be as simple as just a wrapper. In any case, the wrapper would essentially create a dummy POM, invoke Maven on it with the desired properties, and then invoke the application build tool of choice. A dummy POM is not required, but it would make setting up some plugin parameters easier. It would also make it easier to use additional plugins, e.g. the licensing-maven-plugin, which may need a Maven project to run on.
I'm a master's student with five years of Java experience, four of which were in industry. Some of that time was gained while interning during my undergrad, but a fair chunk was gained between my undergraduate and graduate studies. Most of that time was spent as a developer, but at least a year was spent as a release engineer. The company where I worked initially used Ant to build a large-scale e-commerce application with a thick-client management application. The build later moved to Maven, and I was involved in the conversion. Most projects were easy to convert, although some took a little more effort as we worked through build dependencies and how to build certain things (JavaCC, Tycho, etc.). The Tycho project was an enormous feat, as it involved using a then-experimental build tool with little to no documentation, and I got to understand exactly how Tycho worked. Through this conversion I gained intimate knowledge of how Maven works, including dependency resolution, an assortment of plugins and standards.
In my last year at that company, I was involved in maintaining the builds of many Maven projects. All projects were built on Jenkins using both Subversion and Git. In fact, I got to find the limits of those systems. The Jenkins builds ranged from standard Maven/Ant builds and shell scripts to code compliance report generation and integration testing on production-like servers set up via Jenkins builds. My time was also spent maintaining a few internal Maven plugins, which ranged in functionality from setting up Jenkins jobs, to visualizing the Jenkins build tree hierarchy, to controlling a Nexus repository.
I believe that, with my knowledge of Jenkins, Maven and Git, I would be able to create a useful plugin that could make a real difference for the open source community. I also have previous experience working with open source projects (Basie, now defunct), which involved communicating with team members across time zones.
Benefits to Debian
A plugin such as this will give further insight into which Java dependencies are truly open source, which is of great value to the open source community and to Debian itself. Considering the level of license detail that Debian adheres to when releasing packages, this will enable better control over what Debian can and should release.
I've been using Debian for more than 4 years and have always been interested in its packaging. It's probably one of the reasons I was successful as a release engineer. I've also been following the testing branch of Debian since GNOME 3 was introduced. I think it's high time that I try to contribute.
The final deliverable will consist of a Maven plugin. It can be set up to build one of the Java libraries already available in Debian as an example. I suspect that creating the plugin will not take much time, but finding all the corner cases will. The plugin should not take more than a week or two to complete. If this involves creating the database of SCM/other info for Debian packages, it will take longer.
As for my availability, I should be able to start working immediately.
I will have one class in the summer and will be working part time. I don't foresee this causing any big problems for my commitment to Debian.
I also play hockey once a week, which should continue into the summer. This shouldn't affect my commitment to Debian.
Other GSoC Applications