Summary

The proposal is to automatically produce debugging symbols for everything in the archive, without developers needing to add -dbg packages everywhere. Those packages are rarely used, yet right now they are mirrored everywhere.

Status

Created: EmilioPozueloMonfort

Target: Squeeze

Why

This thread in debian-devel expresses the need for this change. The reasons mentioned there include:

We don't want to remove -dbg packages completely without having a replacement, though. They are very useful when the need arises. The proposed replacement is to automatically build .ddeb packages that contain those debug symbols and that are moved to a separate component/archive that isn't mirrored. That way we get debugging symbols for every binary in the archive with no effort, and we solve the mirror problems.

Design

.ddeb packages will be automatically created by helper tools.

dak will accept .ddeb packages and move them to a repository that isn't mirrored by default.

A share would be provided shipping all the debugging symbols that use build ids (so it's easy to mount it in /usr/lib/debug/.build-id/ and have all the debugging symbols available to debuggers and anything that needs them).

Implementation

Helper tools (e.g. debhelper and CDBS) will be modified to build a .ddeb package (only if the source builds architecture-dependent packages for the architecture being built). This would be done incrementally, by only building .ddebs if debhelper's compat level is strictly greater than (say) 7 (for debhelper itself and for CDBS using debhelper.mk). This way we don't suddenly risk breaking lots of packages. The transition would be as easy as bumping debhelper's compat level (and checking that everything is fine, as usual when you bump it). Modifying helper tools means we build .ddebs everywhere, and not only on buildds, which is good for reproducibility and means no hacks on the buildds, no buildd changes, and no need for source-only uploads.

For packages already shipping -dbg packages, a .ddeb won't be automatically created. If the -dbg packages only ship debugging symbols, it's as easy as removing those packages, and the .ddeb will automatically start to be created. However, it may be the case that those -dbg packages don't (only) ship debugging symbols (e.g. python2.5-dbg ships an interpreter). In those cases, it would be possible to set a variable or pass an option to debhelper so that it creates a .ddeb anyway. If the package was also shipping debugging symbols, it could stop doing so, as they will now be in the .ddeb.
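The rules in the two paragraphs above can be summarised as a small decision function. This is only a sketch of the logic, not how debhelper would implement it: the threshold constant, the `SourceBuild` fields and the `force_ddeb` opt-in are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: the text says "strictly greater than (say) 7".
DDEB_MIN_COMPAT = 8


@dataclass
class SourceBuild:
    """Facts about one source package build (illustrative fields only)."""
    compat: int                   # debhelper compat level in use
    builds_arch_dependent: bool   # any arch-dependent binaries for this arch?
    has_dbg_packages: bool        # source already ships -dbg packages
    force_ddeb: bool = False      # hypothetical opt-in variable/option


def should_build_ddeb(build: SourceBuild) -> bool:
    """Return True if a .ddeb should be generated for this build."""
    if not build.builds_arch_dependent:
        return False  # nothing to extract symbols from
    if build.compat < DDEB_MIN_COMPAT:
        return False  # package has not opted in by bumping compat
    if build.has_dbg_packages and not build.force_ddeb:
        return False  # existing -dbg packages win unless overridden
    return True
```

For example, a package at compat 8 with no -dbg packages gets a .ddeb, while a package like python2.5 that keeps its -dbg package only gets one if the maintainer sets the opt-in.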

The .ddebs would be added to the .changes file and uploaded to the archive together with all the other files. Then dak would put them in a separate section in the archive, or move them to a completely different archive.
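As a rough illustration of the routing step, the sketch below splits the file names listed in an upload by extension. It does not use or reflect dak's real code; the destination labels `main-archive` and `debug-archive` are placeholders invented for this example.

```python
def route_upload(filenames):
    """Split uploaded file names into archive destinations.

    The destination keys are hypothetical labels, not real dak suites.
    """
    routed = {"main-archive": [], "debug-archive": []}
    for name in filenames:
        if name.endswith(".ddeb"):
            routed["debug-archive"].append(name)  # not mirrored by default
        else:
            routed["main-archive"].append(name)
    return routed
```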

Migration

The migration to .ddebs will in most cases only need a debhelper compat bump. For sources already building -dbg packages, those packages will also need to be removed. This will need a little manual work at the beginning so as not to make disruptive changes, but it will be a one-time change.

Getting rid of -dbg packages is an ftpmaster goal.

DDeb Format

The format of .ddeb packages is exactly the same as that of .deb packages.

.ddeb packages will be named ${sourcepackage}-ddeb_${version}_${arch}.ddeb. This is the same way .udeb packages are named. This way dpkg won't need any changes, and they will be installed as normal packages.

The content of .ddeb packages is restricted to debug info. There is no formal restriction on where files can be shipped. Common sense should apply and the previous rule (debug info only) should be followed.
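A minimal sketch of the naming scheme, assuming the three fields are already known. The function name is made up for this example, and the underscore check is only there because the underscore is the field separator in the file name.

```python
def ddeb_filename(source_package: str, version: str, arch: str) -> str:
    """Build a .ddeb file name: <source>-ddeb_<version>_<arch>.ddeb."""
    fields = {"source_package": source_package, "version": version, "arch": arch}
    for label, value in fields.items():
        if not value:
            raise ValueError(f"{label} must not be empty")
        if "_" in value:
            # '_' separates the fields, so it cannot appear inside one.
            raise ValueError(f"{label} must not contain '_': {value!r}")
    return f"{source_package}-ddeb_{version}_{arch}.ddeb"


print(ddeb_filename("hello", "2.4-1", "amd64"))  # hello-ddeb_2.4-1_amd64.ddeb
```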

/usr/share/doc/$package/ can be shipped as usual.

Build IDs

With gcc already using build ids by default (--build-id option to ld), we can serve the symbols directly unpacked through a share, so that a user can automatically (virtually) have all the debugging symbols, which would then be downloaded on the fly as needed.

Debugging symbols using build ids load much faster in debuggers (e.g. gdb) since these don't need to checksum every .debug file to see if it's OK (gdb currently calculates the CRC32 for every .debug file it loads).
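To see why the checksum approach is slower, consider what verifying a .debug file by CRC involves: the whole file has to be read. The sketch below assumes the checksum is the standard CRC-32 as computed by zlib; it illustrates the cost and is not gdb's actual code.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file, reading every byte of it."""
    crc = 0
    with open(path, "rb") as handle:
        # The entire file must be streamed through the checksum,
        # so the cost grows with the size of the .debug file.
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

With build ids, the debugger only needs to read the note from the binary and look up a path, so no pass over the debug file is required before loading it.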

Build ids work the following way: when an object is linked, ld writes a checksum of the binary in a note in the binary header. That can be used by tools (e.g. gdb already understands it) to look for debugging symbols in a path that is unique to that binary. For example, right now we put symbols in /usr/lib/debug/$path. Using build ids, the symbols would be put in /usr/lib/debug/.build-id/ab/cdef1234.debug, where abcdef1234 is the hash of the binary, which gdb (or other tools) would look for after reading the build id note in the binary.

This way it's possible to ship debugging symbols for several versions of the same binary/library through the share, and the correct one would be picked up. A very useful consequence is that you can mount our share and automatically get debugging symbols for everything, even if you have outdated packages (depending on how outdated they are). We could also integrate tools like bug-buddy or drkonqi, which catch crashes and produce backtraces, so that they mount the share to get symbols for everything and produce useful backtraces. Fedora already uses --build-id by default (since 2007 or so). SuSE has this as well, though it is not clear since when.
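The path layout described above can be sketched as a small helper that maps a build id to the location of its debug file. The function name is made up for illustration; the layout follows the example given in the text.

```python
from pathlib import PurePosixPath

DEBUG_ROOT = PurePosixPath("/usr/lib/debug/.build-id")


def build_id_debug_path(build_id: str) -> PurePosixPath:
    """Map a hex build id to its .debug file under the build-id tree."""
    hex_id = build_id.strip().lower()
    if len(hex_id) < 3 or any(c not in "0123456789abcdef" for c in hex_id):
        raise ValueError(f"not a usable hex build id: {build_id!r}")
    # First two hex digits name the directory, the rest name the file.
    return DEBUG_ROOT / hex_id[:2] / f"{hex_id[2:]}.debug"


print(build_id_debug_path("abcdef1234"))
# /usr/lib/debug/.build-id/ab/cdef1234.debug
```

Because the path depends only on the build id, two versions of the same library map to different files and can coexist on the share.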

File Conflicts

The file conflicts issue is interesting. Since we ship debugging symbol files in a path that is unique only due to the binary build id, there's a chance that two source packages ship files with the same build id, thus having a file conflict in the respective .ddebs. This can happen for two reasons:

If a source package is built multiple times, with the result of each build shipped in a different binary package, it's possible that some files are the same and thus have the same build id. This wouldn't be a problem, since their debugging symbols would be the same, so we only need to ship one copy of them. In this case, the conflict is not a problem but an advantage.
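One way to picture the deduplication is a merge keyed by path: identical files collapse into one copy, and only files with the same path but different content count as real conflicts. This is a sketch under the assumption that each .ddeb's contents are available as a path-to-bytes mapping; all names are illustrative.

```python
import hashlib


def merge_debug_files(packages):
    """Merge debug files from several .ddebs.

    packages maps a package name to a {path: content_bytes} dict.
    Returns (merged, conflicts): merged maps path -> (sha256, first owner),
    conflicts lists (path, first owner, other owner) for differing content.
    """
    merged = {}
    conflicts = []
    for pkg in sorted(packages):
        for path, content in packages[pkg].items():
            digest = hashlib.sha256(content).hexdigest()
            if path not in merged:
                merged[path] = (digest, pkg)
            elif merged[path][0] != digest:
                # Same build-id path, different bytes: a genuine conflict.
                conflicts.append((path, merged[path][1], pkg))
            # Same path and same bytes: keep the single existing copy.
    return merged, conflicts
```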

Helper tools

There is no reliable way we can build .ddebs for every package out there. No matter what we choose, some special package will do things in a way that doesn't work with it.

A good candidate is to use debhelper, which is used by about 97.4% of the packages in the archive (this includes CDBS packages through debhelper.mk).

For packages not using debhelper or any helper tool, automating it depends pretty much on how each of those packages is built.

Other options, like patching/diverting strip, objcopy, or other lower-level tools, are crazy and not reliable anyway. By working at a higher level, such as debhelper, we get simplicity at the cost of not covering 100% of the archive. It's a reasonable price to pay, especially since other approaches don't guarantee 100% coverage either.

Future improvements

Size

Debugging symbols currently take up a large amount of space, and -dbg packages are sometimes pretty big. Reducing their size is good for the archive where they are stored, and for the bandwidth and disk space of both the archive and its users.

Two (non-exclusive) sides can be improved:

This is not a requirement for .ddebs though, so it may be investigated after .ddebs are in place.

Release Notes

Discussion