Many Debian packages produce logs. We'd like to make the logs we produce respect the privacy and reflect the needs of our users.

Logging Guidelines

Proposed guidelines for how Debian packages should behave:

TODO

What things do we actually need

To make an abuse report to an ISP, you need to know the offending IP address, and the ISP will want a timestamp and some evidence.

Some things to avoid logging

Developer Resources

If your package produces logs:

TODO

Upstream Resources

If you're upstream, how can you make your package meet these same goals:

TODO

Reported bugs

Bugs have been reported against the following packages to fix this:

TODO: create a usertag to track those instead of listing them in the wiki here.

Bugs to report

The following packages should be addressed:

TODO

References

EU member countries

(disclaimer: I am not a lawyer)

The EU Data Retention Directive was adopted in 2006; in 2014 the EU Court of Justice ruled it invalid.

When the Directive was first introduced, some IT admins scrambled to increase the amount of logging they were doing, especially in email services and the like (is there an example in the BTS for Exim?), in case they were (or soon would be) obliged to retain data and be ready to share it with authorities.

Some countries may have put data retention obligations into national law; I'm not sure how the EU ruling affects that.

In the UK, the draft Data Retention and Investigatory Powers Act 2014 clearly intends to work around the court ruling. It would allow a 'public telecommunications operator' to be served with a notice to retain, for up to 1 year, pretty much anything (TL;DR). (The DRIP Act actually says 'subscriber data' and 'traffic data', but it references RIPA 21(4) for a definition of the latter. RIPA 21(4) says 'any traffic data comprised in or attached to a communication', so it doesn't seem limited to metadata.)

The good news is that, even if this legislation is passed, it seems nobody in the UK is obliged to retain anything until served with a notice from the Secretary of State.

Furthermore, the UK's Data Protection Act requires that personally identifying data only be collected when it's necessary for a particular stated purpose, and that a data subject can request a copy of this data (which sounds like hard work to deal with), or request its deletion. For businesses in particular, collecting too much data is a liability in case it is compromised. Use this as an argument with management if you're being asked to retain a questionable amount of data.

I recall someone's (Google's?) legal argument that IP addresses in webserver logs are not personally identifying. But if the data being collected under the EU Directive was thought to be useful to law enforcement, then surely it is personally identifying. We've seen that, given enough data, it eventually becomes personally identifying, especially when cross-referenced with other sources.

In the IPv6 world, SLAAC addresses can embed the EUI-48 (MAC) address of a device, uniquely identifying someone's phone, for example. It will be difficult to argue that the Data Protection Act still doesn't apply to IPv6 addresses in logs.
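
To make the SLAAC point concrete, here is a minimal Python sketch of how a hardware address can be read back out of a logged IPv6 address. It assumes the address was formed with the modified EUI-64 scheme, where ff:fe is inserted in the middle of the MAC and one bit of the first byte is flipped. The function name mac_from_slaac is made up for this example, and the ff:fe check is only a heuristic: a randomly generated address could match it by coincidence.

```python
import ipaddress


def mac_from_slaac(address):
    """Return the MAC address embedded in a modified EUI-64 interface
    identifier, or None if the address does not look like one.

    Hypothetical helper for illustration only.
    """
    # The interface identifier is the low 64 bits of the address.
    iid = ipaddress.IPv6Address(address).packed[8:]
    # Modified EUI-64 puts ff:fe between the two halves of the MAC.
    if iid[3:5] != b"\xff\xfe":
        return None
    # Undo the flipped universal/local bit in the first byte.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join("{:02x}".format(b) for b in mac)


if __name__ == "__main__":
    # 2001:db8::/32 is the documentation prefix; the MAC is invented.
    print(mac_from_slaac("2001:db8::211:22ff:fe33:4455"))
```

Anyone holding the log can do the same, so an address of this kind ties a log entry to one physical device regardless of which network it was on.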


stevenc: personally I suggest the more careful approach of not logging any more than you really want to, in any legal jurisdiction, until you become intimidated into doing so. Take a stand - don't make data retention the norm, so that if someone seeks to put it into law, it will be difficult to enact and rarely complied with in practice. (IMHO the Data Protection Act is almost never complied with).

On your own personal systems I don't see much reason to comply with data retention laws. You, or people you care about, might be the only data subjects. Data coming from your own systems can't be trusted to defend you in court, but it can and will be used to incriminate you (e.g. recent cases of Google search queries used in UK and US courts to infer state of mind). Maybe log to tmpfs, if at all, and/or have logrotate (compress and) encrypt with gpg if you must keep it for a long time.
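
As a rough illustration of the compress-then-encrypt idea, here is a Python sketch. It is a stand-in for what logrotate would normally do through its own configuration, not a logrotate setup. The function names, the log file name and the recipient logs@example.org are all assumptions made for the example; the gpg options used (--batch, --yes, --recipient, --output, --encrypt) are standard ones.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def compress_log(path: Path) -> Path:
    """Gzip `path` to `<path>.gz` and remove the plaintext original."""
    target = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target


def gpg_encrypt_command(path: Path, recipient: str) -> List[str]:
    """Build (but do not run) a gpg command line that encrypts `path`
    to `<path>.gpg` for `recipient`. Hypothetical helper."""
    return [
        "gpg", "--batch", "--yes",
        "--recipient", recipient,
        "--output", str(path) + ".gpg",
        "--encrypt", str(path),
    ]


if __name__ == "__main__":
    # Assumed file name and key; adjust for a real system.
    rotated = Path("access.log.1")
    if rotated.exists():
        archive = compress_log(rotated)
        print(" ".join(gpg_encrypt_command(archive, "logs@example.org")))
```

To actually encrypt, the command list could be passed to subprocess.run(cmd, check=True) and the .gz file removed afterwards. Since encryption only needs the public key, the private key can be kept off the machine that produces the logs.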