Home Directory Cleaner
Name: Erich Schubert
Email: ErichSchubert erich@d.o IRC: erich @ OFTC
Background: Debian Developer for 10+ years, PhD student in Computer Science (Data Mining, Clustering, Outlier Detection), (rare) GNOME contributor: erichs.
Project title: Home directory cleaning assistance
Synopsis: Package managers such as dpkg never touch /home, for good reasons (for example, the home directory may be shared across multiple machines, and the admin and user do not have to be the same person ...).
However, when uninstalling applications, data remains in /home, and the user may or may not want to keep it. The scope of this utility is to give users a tool to clean such leftover data out of their home directory (or just identify which files might belong to which application, for that matter). Note that we are NOT talking about automatic removal, but about a UI utility to do this "manually" and on demand.
A solid solution should be well integrated with package management and supported by Linux standards organizations such as Freedesktop.
The proposal can probably be best explained with this 'mockup' dialog message:
"Since you do not seem to have Gimp installed anymore, do you want to delete the folder ~/.gimp-2.4, which contains 600k of data?"
Delete / Backup / Keep / Examine Folder
- allow the user to remove caches and cookies (e.g. google earth defaults to a huge cache of image tiles)
- allow the user to reset the configuration of an application
- allow the user to 'archive' the configuration and/or data of a single application (and e.g. transfer it to a different computer or account - note that this doesn't yet fix any file names or include e.g. gconf settings)
- identification of which application a particular file belongs to
- identifying users of a particular application (so the administrator can check whether or not to uninstall the application, or can inform users about pending changes in the system, e.g. notify them of a major software upgrade. Note that there are privacy issues associated with this use case)
- showing files that are not managed by dpkg (this requires an extension of the codebase to read /var/lib/dpkg/info/*.*files)
- helping the administrator to clean /etc and/or /var
Benefits to Debian: A useful administration utility for end users, using some features from Debian's powerful package management.
- It was suggested that this should be put into Debian policy (like e.g. menu files). That is true; however, the Debian policy part should ideally only read "packages should ship accurate Freedesktop uninstall information". An insular Debian-only solution is not sustainable; the goal is to have wide upstream support.
Deliverables:
- Specification file format discussed with Freedesktop, GNOME, KDE;
- Python library to process these files
- User GUI frontend with PyGTK
- Dpkg integration to cache specification files of previously installed packages
- A set of 'example' specification files for common software (most software that I use myself), to kick-start the application's usefulness.
- Admin mode to show changed and unknown configuration files in /etc and unclaimed files in /var
- Archive/Backup/Reset modes for managing configuration files of installed software.
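To make the first two deliverables a bit more tangible, here is a minimal sketch of what a specification file and the library code reading it might look like. The file format does not exist yet: the INI-like layout, the section names ("Application", "Files ...") and the keys (Name, Packages, Binaries, Use, Patterns) are all assumptions made up for illustration, and the real format is exactly what needs to be discussed upstream.

```python
import configparser
from dataclasses import dataclass, field

# Hypothetical example file; every section and key name here is an assumption.
EXAMPLE_SPEC = """
[Application]
Name = GIMP
Packages = gimp
Binaries = gimp

[Files config]
Use = configuration
Patterns = .gimp-*

[Files cache]
Use = cache
Patterns = .cache/gimp
"""


@dataclass
class FileGroup:
    use: str          # e.g. 'cache' or 'configuration'
    patterns: list    # file name patterns relative to $HOME


@dataclass
class UninstallSpec:
    name: str
    packages: list
    binaries: list
    groups: list = field(default_factory=list)


def _split(value):
    """Split a ';'-separated value into a list of non-empty items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_spec(text):
    """Parse one (hypothetical) specification file into an UninstallSpec."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of keys as written
    parser.read_string(text)
    app = parser["Application"]
    spec = UninstallSpec(
        name=app["Name"],
        packages=_split(app.get("Packages", "")),
        binaries=_split(app.get("Binaries", "")),
    )
    for section in parser.sections():
        if section.startswith("Files "):
            sec = parser[section]
            spec.groups.append(
                FileGroup(use=sec.get("Use", "unknown"),
                          patterns=_split(sec.get("Patterns", "")))
            )
    return spec
```

Calling parse_spec(EXAMPLE_SPEC) yields one UninstallSpec with two file groups, so a frontend could offer "remove cache" and "reset configuration" separately.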
Project details: While this tool could obviously be useful with any kind of package management, or even without any package management, I think Debian has some features that allow better integration.
The key feature here is DpkgTriggers. The cleaning application can set up a dpkg file trigger on a directory (e.g. /usr/share/uninstall-information) and cache any file that is installed there (so it persists when a package is uninstalled; the cache can be cleared e.g. by purging the home directory cleaner package).
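As a rough sketch of the caching step: with dpkg, a package can declare an `interest` in a path in its triggers control file, and its postinst is then run with the argument `triggered` whenever another package changes files below that path. What the cleaner would do at that point is not decided yet; the function below is only one possible approach, and the function name and the cache location are assumptions. It copies everything from the watched directory into the cache and never removes anything from the cache, so entries of uninstalled packages survive.

```python
import shutil
from pathlib import Path


def cache_specs(trigger_dir, cache_dir):
    """Copy all specification files from trigger_dir into cache_dir.

    Files that have disappeared from trigger_dir (package removed) are
    deliberately left in the cache. Returns the names of the copied files.
    """
    trigger_dir = Path(trigger_dir)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    if not trigger_dir.is_dir():
        return copied
    for src in sorted(trigger_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, cache_dir / src.name)  # keeps timestamps
            copied.append(src.name)
    return copied
```

A postinst script handling the `triggered` case could call this with /usr/share/uninstall-information as the source and a directory under /var/cache (a hypothetical choice) as the destination.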
So once enough packages ship this kind of information (e.g. because it has become a freedesktop.org standard, and is endorsed by GNOME and KDE - this is a key objective), the utility will basically have a database of which files in $HOME might belong to which application.
When the user then runs the application, it will scan for such patterns (e.g. .gimp-*) and compare them with installed packages ("*gimp*") and applications in $PATH (to not miss e.g. /usr/local/bin/gimp). Files that match a known application for which no installation is found are then presented to the user along with:
- an explanation of which application might have generated the files
- when the files were last used (Bonus: check if filesystem is atime-enabled)
- the space savings achieved by removing them
- a backup option that, as a first step, just tars them into a backup archive, so that they are only deleted later, once they have proven obsolete
- A file format standard to specify 'application file name patterns' along with use ('cache data', 'configuration data'), detection information (package name, binary names, folder names) that is to be submitted and discussed with Freedesktop.org, GNOME and KDE (XFCE, ... are also welcome of course - the more the merrier)
- An admin application that builds a 'cache' of formerly installed and currently installed applications, integrated with Debian package management
- A library to process the patterns, detect installation, etc.
- A user frontend application that assists the user in deciding which files to delete (or backup first)
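The scan described above (match patterns in $HOME, check packages and $PATH, then report size and last use) could be sketched roughly as follows. This is an illustration under assumptions, not the planned library API: the dict layout of a spec (name, patterns, binaries, package_globs) and all function names are made up, and the set of installed packages is passed in rather than queried from dpkg.

```python
import fnmatch
import os


def tree_size(path):
    """Size in bytes of a file, or of all files below a directory."""
    if not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def is_installed(spec, installed_packages, path_dirs):
    """True if a matching package or an executable of that name is found."""
    for glob in spec["package_globs"]:
        if fnmatch.filter(installed_packages, glob):
            return True
    for directory in path_dirs:
        for binary in spec["binaries"]:
            candidate = os.path.join(directory, binary)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return True
    return False


def find_leftovers(home, specs, installed_packages, path_dirs):
    """List entries in home that belong to applications not found installed."""
    results = []
    entries = sorted(os.listdir(home))
    for spec in specs:
        if is_installed(spec, installed_packages, path_dirs):
            continue  # application still present: suggest nothing
        for pattern in spec["patterns"]:
            for entry in fnmatch.filter(entries, pattern):
                full = os.path.join(home, entry)
                results.append({
                    "application": spec["name"],
                    "path": full,
                    "size": tree_size(full),
                    # only meaningful if the filesystem records atime
                    "last_access": os.lstat(full).st_atime,
                })
    return results
```

In a real run, path_dirs would come from splitting $PATH, and installed_packages from the package manager; the frontend would then turn each result into a dialog like the mockup above.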
Also note that this application differs from existing 'cleaner' applications such as fdupes, bleachbit, fslint, detox, sweeper, scleaner and gconf-cleaner. These do not use such a database of patterns, but instead search for files with identical contents (e.g. fdupes), empty files, generic leftover file patterns (e.g. .bak, *~) etc. The most similar application is probably bleachbit, which is designed to remove caches and cookie files of known applications such as browsers (it obviously uses some patterns for that, but these are part of the application itself). It does not, however, track the installed/uninstalled status of applications. Still, the functionality of bleachbit might be an interesting addition to the home directory cleaner (with the difference that deleting the browser's cache may be interesting even when the browser is still installed).
Another note: in general, the (frontend) application will favor NOT deleting files. So when e.g. the game 'epiphany' (installed in /usr/games/epiphany) is incorrectly identified as a binary for the browser 'epiphany' (packaged as epiphany-browser, but installing /usr/bin/epiphany), it is preferable to not suggest deleting the browser's files. The other way round, deleting the browser's files because a custom browser installation e.g. in $HOME/.bin/epiphany was not detected, would be much worse.
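The same caution can be applied when the user does decide to get rid of something, via the two-step backup option mentioned earlier: archive first, delete only later. The sketch below is one possible way to do that with Python's tarfile module; the function names are hypothetical, and the check before deletion (the archive exists and contains the entry) is an assumption about what "safe enough" should mean.

```python
import os
import shutil
import tarfile


def backup(path, archive_path):
    """Step 1: pack path into a gzip-compressed tar archive, delete nothing."""
    path = os.path.abspath(path)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(path, arcname=os.path.basename(path))
    return archive_path


def remove_if_backed_up(path, archive_path):
    """Step 2: delete path only if the archive exists and contains it.

    Returns True if the path was removed, False if it was kept.
    """
    path = os.path.abspath(path)
    if not os.path.isfile(archive_path) or not tarfile.is_tarfile(archive_path):
        return False
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    if os.path.basename(path) not in names:
        return False  # when in doubt, keep the files
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.remove(path)
    return True
```

Keeping the two steps separate means a failed or missing backup can never lead to data loss; the worst case is that the files simply stay where they are.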
Project schedule:
- Week 1-2: Prototyping of UI, patterns (from my own cluttered home directories on various systems) and file format will start in parallel.
- Week 3-4: Once something 'presentable' is there, the results will be discussed with Freedesktop.org, GNOME and KDE to get a shared file format.
- Week 5-6: Integration with package management (admin application part), during which also the patterns will then be prepared for submission to upstream and package maintainers.
- Week 7: documentation and cleanup.
Other summer plans: I don't have any summer plans so far (except Pentecost weekend, 3 days, and my university research), so I'm available during the full timeframe. There might be a conference interruption, e.g. SigKDD09 in Paris, June 28th-July 1st, as part of my university research, but nothing bigger.
Exams and other commitments: I will not have exams until 2011, but the occasional conference deadline will push my work to the night hours every now and then.
If you are a Debian Developer: I've been rather inactive in Debian over the last years, so I hope this project will make me more active again in the long run. One thing I find particularly interesting for Debian is that this project is tied to pushing these ideas 'upstream' to Freedesktop and GNOME, and may help make Debian more visible again in the 'Desktop' development field, where much attention has been drawn by Ubuntu. I also like that this project is not just a userspace application, but one that actually benefits from having a powerful package manager behind it, connecting and linking projects.