On Locking Schemes on Linux Device Drivers

Hello fellow application developer or maintainer,

recently we (cdrkit and libburnia developers) came across increasing problems with reliable and safe device locking. This paper sheds light on the issues behind the scenes and presents possible future solutions.

Introduction

Our original concern is the effect of even read-only operations on optical media drives (recorders) while they are recording. Depending on the device model, such read-only access may badly interrupt the process, practically destroying the medium.

Since many programs already act on such devices in an unsafe manner, either willingly (e.g. liblkid) or accidentally (e.g. hald, opening with O_EXCL but still clashing with cdr applications working on the competing sg driver), we see the need for reliable communication to ensure proper device locking where it is needed, in a way that suits the particular application. This document first itemizes the currently possible mechanisms with their advantages and their problems. It then drafts two locking schemes: an incomplete one with limited system impact, and a complete one which nevertheless needs coordination with the Linux community.

State of the practice

There are various locking techniques used in other areas which are more or less applicable in our case.

General inter-process locking mechanisms

In general, none of the mechanisms listed below is optimally suited for our purpose. They fall short in two respects, which makes them unreliable when used alone:

Still, they may be sufficient to lower the risk of inappropriate operation. Which ones are actually available in the wild?

Advanced Linux-specific locking mechanisms

Applicability on CD/(HD)DVD/BD drives

As explained in the introduction, locking is important for optical media recording because the drive is in a delicate operation mode while it records. Ideally, no other application should touch the drive during that time; even reading from the medium is harmful. But what does the state of the practice look like?

read magic data from it. This also provides no solution for operation through the sg driver.

device file path for serious operations on the drive. This is /dev/sg* on kernel 2.4, and recently has become /dev/sr* on kernel 2.6. Operations on other path representations of the same device are restricted to open(2) O_RDONLY and to obtaining SCSI parameters host,channel,id,lun.

Proposed general locking algorithms

What can be done with device files alone?

The following method is proposed as a middle ground between the limitations of the kernel and the requirements of others. It also unifies the way of dealing with device locks.

The /var/lock locking method with proxy lock files was identified as an obstacle because of restrictive security settings on popular Linux distributions (see above). Instead, this proposal relies on a two-step locking method directly on the device file. It solves the ambiguity problem of sr-scd-sg paths at the risk that open(O_RDONLY); ioctl(SCSI_IOCTL_GET_IDLUN) could be harmful to running burner activities.

  1. Open the device. For applications that operate in a delicate way (burning tools), O_EXCL shall be set. For others, it may be omitted.
  2. Set or check the additional fcntl lock on the device file. It must be exclusive! Sample code (by Thomas Schmitt):

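        #include <fcntl.h>
        #include <string.h>
        #include <unistd.h>

        struct flock lockthing;
        ...
        /* Step 1: open the device, with O_EXCL as demanded for burning tools */
        f = open(device, mode|O_EXCL);
        if (f != -1) {
             /* Step 2: exclusive lock on the whole file (l_len = 0) */
             memset(&lockthing, 0, sizeof(lockthing));
             lockthing.l_type = F_WRLCK;
             lockthing.l_whence = SEEK_SET;
             lockthing.l_start = 0;
             lockthing.l_len = 0;
             if (fcntl(f, F_SETLK, &lockthing) == -1) {
                /* lock is held by another process, or locking failed */
                close(f);
                /* user feedback, report error, etc... */
                f = -1;
             }
        }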

Normally, fcntl(2) imposes advisory locking only. The sysadmin can turn this into mandatory locking by a mount option. (Is /dev mount(8)ed at all?)

Unique path resolution is necessary so all interested processes come together at the same inode where this locking is guaranteed to collide. (Does F_SETLK work with all the implementations of directory /dev/ ?)

Links or additional device paths created by mknod(1) can be translated into /dev/sg* or /dev/sr* with the help of stat(2) and its result element .st_rdev. But the jump from /dev/sg* to /dev/sr* is not possible via stat(2). For that we need open(2) O_RDONLY, ioctl(SCSI_IOCTL_GET_IDLUN), close(2).

The translation is done by obtaining info from the given path, by iterating over the desired device paths /dev/hd%c , /dev/sg%d , /dev/sr%d, and by comparing their info with the one we look for. Kernel 2.6: If the result is a /dev/sg%d then it has to be translated into a /dev/sr%d in another step.

NOTE: there are sysfs symlinks which allow an exact mapping. However, this depends on sysfs being mounted, and the required symlinks have been declared deprecated in recent Linux kernel versions.

Kernel 2.4 poses the problem that ioctl(SG_IO) is not possible with sr, so most of the burn programs have to use sg. But growisofs uses sr via ioctl(CDROM_SEND_PACKET), which does not work with sg. We will possibly not come to a completely sufficient agreement under these circumstances. Well, we 2.4ers are used to suffering neglect. (sob, hehe, see below)

Obstacles for using FHS compliant /var/lock/ files

First: races

Second: unclear or unreliable cleanup technique, dangling bad lockfiles possible

Third: The most obvious problem is the usual permission setting of /var/lock :

SuSE 9.0 (kernel 2.4):
  drwxrwxr-x    3 root     uucp         4096 Apr  4     05:07 /var/lock
SuSE 9.3 (kernel 2.6):
  drwxrwxr-t    4 root     uucp         4096 2007-04-04 17:50 /var/lock
Fedora Core 3.x:
  drwxrwxr-x    5 root     lock         4096 Apr  4     04:03 /var/lock
Debian gives rw-permission to anybody and thus would be no problem.

This system may work with the plain old UUCP program and a few other programs with low device opening activity AND administered by root, but it is a real PITA nowadays.

And the better solution would be ...

* to adopt the FHS idea of locking a proxy before any open(2) is performed, but to avoid the known drawbacks of FHS /var/lock/ protocol.

* to allow the use of any of the sg, sr, scd device drivers at the discretion of the programs.


Proposal for unambiguous advisory device locking on Linux kernel 2.4 and 2.6

It is inspired in part by traditional UUCP locking in /var/lock/ as described by http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLOCKLOCKFILES


Compliant processes apply open(2) to suspected CD/DVD burner device files only if they are able to do this via one of the following paths:

and only after they have obtained a lock on them. (N= 31 or 255 ?) Further precautions like open(O_EXCL) or fcntl(F_SETLK) on the device file are allowed. Programs should offer expert options to disable them, though.

Locking is performed similarly to UUCP tradition but without the potential race conditions or potential stale locks: Unlike with FHS /var/lock, it is not the mere existence of the lock file that establishes the lock state. The lock is instead implemented by open(2) with O_RDWR and then fcntl(2) with F_SETLK. The lock file descriptor is held open until the lock is obsolete.
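A minimal sketch of this lock file handling might look as follows. The function names lock_file_acquire() and lock_file_release() are hypothetical, and creating a missing lock file via O_CREAT with mode 0666 is an assumption; the proposal itself only names O_RDWR and F_SETLK.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to take an exclusive advisory lock on the lock file 'path'.
   Returns a file descriptor which holds the lock, or -1 on failure.
   Assumption: a missing lock file is created (O_CREAT, mode 0666). */
static int lock_file_acquire(const char *path)
{
    struct flock lk;
    int fd;

    fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd == -1)
        return -1;                /* cannot open or create the lock file */

    memset(&lk, 0, sizeof(lk));
    lk.l_type = F_WRLCK;          /* exclusive lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                 /* 0 means: up to the end of the file */
    if (fcntl(fd, F_SETLK, &lk) == -1) {
        close(fd);                /* another process holds the lock */
        return -1;
    }
    return fd;                    /* keep open as long as the lock is needed */
}

/* Give up the lock. Closing the descriptor releases the fcntl lock, and
   the kernel does the same if the process dies, so no stale lock remains. */
static int lock_file_release(int fd)
{
    return close(fd);
}
```

Two properties of fcntl(2) locks matter for users of such code: the lock belongs to the process, so a second lock_file_acquire() in the same process does not collide, and closing any descriptor of the lock file within that process releases the lock.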

To circumvent the sg-sr-scd ambiguity, those devices must get locked in triples. (One could even argue in favor of sg-sr-scd-st-sd locking.) The triples are formed from those device files which have the same SCSI parameters Host,Channel,Id,Lun from ioctl(SCSI_IOCTL_GET_IDLUN). Since this needs open(2), the search has to be accompanied by locking of the tested files (as singles, not as triples). Triple locking carries the risk that, of two simultaneous contestants, neither will get the lock. It does not introduce other race condition risks.
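How the four SCSI parameters can be obtained and compared is sketched below. All names (struct scsi_address, decode_idlun(), get_scsi_address(), same_scsi_address()) are hypothetical. The request number is defined locally with the value used by the Linux header scsi/scsi.h. Opening with O_NONBLOCK is an assumption made so that open(2) on an sr device does not depend on a loaded medium. The locking of the tested files, which the text above demands, is omitted here for brevity.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef SCSI_IOCTL_GET_IDLUN
#define SCSI_IOCTL_GET_IDLUN 0x5382   /* same value as in <scsi/scsi.h> */
#endif

struct scsi_address {
    int host, channel, id, lun;
};

/* Reply layout of SCSI_IOCTL_GET_IDLUN: two ints filled in by the kernel. */
struct idlun_reply {
    int four_in_one;      /* host<<24 | channel<<16 | lun<<8 | id */
    int host_unique_id;
};

/* Split the packed reply word into its four one-byte components. */
static struct scsi_address decode_idlun(unsigned int four_in_one)
{
    struct scsi_address a;

    a.id      = four_in_one & 0xff;
    a.lun     = (four_in_one >> 8) & 0xff;
    a.channel = (four_in_one >> 16) & 0xff;
    a.host    = (four_in_one >> 24) & 0xff;
    return a;
}

/* open(2) O_RDONLY, ioctl(SCSI_IOCTL_GET_IDLUN), close(2).
   Returns 0 and fills 'out' on success, -1 on any failure. */
static int get_scsi_address(const char *path, struct scsi_address *out)
{
    struct idlun_reply reply;
    int fd, ret;

    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;
    ret = ioctl(fd, SCSI_IOCTL_GET_IDLUN, &reply);
    close(fd);
    if (ret == -1)
        return -1;
    *out = decode_idlun((unsigned int) reply.four_in_one);
    return 0;
}

/* Two device files belong to the same triple if all four values match. */
static int same_scsi_address(const struct scsi_address *a,
                             const struct scsi_address *b)
{
    return a->host == b->host && a->channel == b->channel &&
           a->id == b->id && a->lun == b->lun;
}
```

A triple search would call get_scsi_address() on the device in question and then on each candidate among /dev/sg*, /dev/sr* and /dev/scd*, collecting those for which same_scsi_address() returns 1.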

Paths other than the permissible ones have to be translated. The call stat(2) with its result element .st_rdev allows searching for a matching device file among the permissible ones. So /dev/nec_burner can be translated to one of /dev/sr1, /dev/sg2, /dev/hdd. (If not, then it is hardly a burner device.)

A special case of sg-sr-scd triple searching is the translation of traditional Bus,Target,Lun addresses as used by the program cdrecord. "Bus" we get from ioctl(SCSI_IOCTL_GET_BUS_NUMBER); "Target" and "Lun" are id and lun from ioctl(SCSI_IOCTL_GET_IDLUN).

(ATA: Bus,Target,Lun may be translated literally: /dev/hd 'a' + 2*bus + target)
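The literal ATA translation is plain arithmetic, as this small sketch shows. The function name ata_address_to_path() is hypothetical, and restricting Target to 0 (master) or 1 (slave) is an assumption. Lun plays no role in the formula.

```c
#include <stdio.h>

/* Translate an ATA Bus,Target pair into /dev/hdX, where the drive letter
   is 'a' + 2*bus + target. Returns 0 on success, -1 if the address is
   out of range or the buffer is too small.
   Assumption: target is 0 (master) or 1 (slave). */
static int ata_address_to_path(int bus, int target, char *out, size_t outsize)
{
    int index, n;

    if (bus < 0 || target < 0 || target > 1)
        return -1;
    index = 2 * bus + target;
    if (index > 25)
        return -1;                /* beyond /dev/hdz */
    n = snprintf(out, outsize, "/dev/hd%c", 'a' + index);
    if (n < 0 || (size_t) n >= outsize)
        return -1;
    return 0;
}
```

For example, Bus 1 with Target 1 gives index 3 and thus /dev/hdd, the slave drive on the second ATA channel.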


All we need for this is a directory which is present on any Linux system and is supposed to offer rwx-permissions to anybody who is allowed to access the devices.

As an application programmer I would propose /tmp/ and some file name prefix. It would work, after all. Possibly one would have to remove the lock file after releasing the lock. That would play nicely with the t-permission.

Performing the sketched algorithm in /var/lock would violate FHS. The often restrictive permission settings of /var/lock would also make an additional rule necessary: if a lock file is missing and cannot be created, the device may be used as if a lock had been granted. (Provident sysadmins would then create the lock files in /var/lock/ once and allow rw-permission for all intended users.)

This is where we should ask the broad Linux public for opinions and advice. We are not in much of a hurry and therefore should duly ponder every aspect.

E.g. shall we include st and sd in the locking range? It would not touch the devices themselves but would make users of sg aware of them. It is orthogonal to our core topic of CD/DVD drives since they are not supposed to appear as st or sd.