Differences between revisions 18 and 55 (spanning 37 versions)
Revision 18 as of 2007-04-06 09:32:42
Size: 10804
Comment:
Revision 55 as of 2009-03-16 03:30:58
Size: 6249
Editor: anonymous
Comment: converted to 1.6 markup
Line 6: Line 6:
problems with reliable and safe device locking. This paper enlightens the issues behind the scenes and presents possible future solutions. problems with reliable and safe device locking. This paper
collects our ponderings after having received this advice from Alan Cox
on LKML: http://lkml.org/lkml/2007/3/31/175 and having sincerely attempted to solve the problem in user space.
Line 10: Line 12:
Our original concern is
the influence of even read-only operations on optical media drives
(recorders) during their duty as recorders -- depending on the device
model such read-only work may interrupt the process badly practically
destroying the medium.
Our concern is the influence of even read-only operations on
optical media drives (recorders) during their duty as recorders --
depending on the device model such interference can spoil the
process of recording, eventually wasting the medium.
Line 16: Line 17:
Since many programs already do act on such devices in an unsafe manner,
either willingly (e.g. liblkid) or accidentally (e.g. hald, opening with
O_EXCL but still clashing with cdr applications working on the competing
sg driver), we see the need for reliable communication in order to
ensure proper device locking where appropriate, in a way which is
appropriate for the particular application. In the following document,
first the currently possible mechanisms are itemized with their
advantages and their problems, followed by a draft of a locking scheme
which shall cope with the particular requirements.
Since many programs already act on such devices we see the need for
reliable communication in order to allow proper device locking if
good will for cooperation is present.

But in short: Good will seems not to be enough. We failed to find a viable method for the necessary coordination of the participants.
Line 30: Line 28:
=== General inter-process locking mechanisms === === Path/Inode based locking mechanisms ===
Line 32: Line 30:
In general, all the mechanisms listed below are not optimally appropriate for our purpose. They lack on two places which make then not reliable when used alone: In general, these mechanisms are not optimally appropriate for our purpose.
They use the filename or inode as identity. In our case this imposes problems:
but they fall short in two places, which makes them unreliable when used alone:
 * they do not cope with multiple device files which imply the access to the same driver through different files
 * they do not automatically cope with multiple device '''drivers''' accessible through different co-existing user space interfaces, like with sg vs. sr drivers.
Line 34: Line 36:
 * they do not cope with multiple device file which imply the access to the same driver through different files
 * they do not automatically cope with multiple device '''drivers''' accessible through '''different''' user space interfaces, like with sg vs. sr drivers on Linux. No matter how many excuses some kernel developers do present to paper over this obvious shortcomings. Automatic use of /dev/sr instead of /dev/sg is not always possible or may not be wanted by the user.

Finally, they may be sufficient to lower the risk on inappropriate operation. Which exactly are available in the wild?

 * System V Semaphores

 Principle: a magic integer, "key" or "semid", identifies a set of state objects on which the atomic operations can be performed which are necessary for implementing a proper locking algorithm. See man semget(2), semop(2) SEM_UNDO.

 Pros: semaphores are originally designed for our purpose and they are very traditional Unix requisites.

 Cons:
  * the semaphore key must be systemwide unique for the set of lockable drives and all participating programs have to use the same key. This situation is prone to collisions with locking mechanisms for other system resources. Function ftok(3) is not a secure solution.
  * each device needs a fixed index number in the set of semaphores, which are allocated system resources. So we can hardly span a giant index space in which different device file classes map to disjoint index intervals.
We evaluated:
Line 51: Line 40:
 Principle: an additional file is created during the action on the real target file.  Principle: an additional file is created during the action on the real target file. See http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLOCKLOCKFILES
Line 53: Line 42:
 Pros: regular filesystem operation, no additional infrastructure required  Pros:
  *
regular filesystem operation, no additional infrastructure required
Line 56: Line 46:
  * Possible races unless OS mechanisms are used for exclusive operation on the lock file, see below
Line 63: Line 52:
 Principle: lock applied on open file handles. Internally associated with a path, see fcntl(2) for details.  Principle: lock applied on open file handles. Thus probably referring to an inode. See fcntl(2) for details.
Line 66: Line 55:

  * known (POSIX.1-2001), usually reliable mechanism
  * POSIX
Line 70: Line 58:
  * needs open(2) as precondition which has to be avoided on unlocked device files
  * locks can be released inadvertently by submodules which just open and close the same file (inode ?).
Line 71: Line 61:
  * diverges from flock() implementation on Linux, see below. Results in independent locking.
  * possible problems on network file systems

 * flock(2) exclusive file locking

 Principle: similar to fcntl locks, applied with a different system function.

 Pros: see fcntl(2) locking above

 Cons: like fcntl(2), but less portable, not working over network file systems
==== Other locking mechanisms ====
Line 82: Line 63:
==== Advanced Linux-specific locking mechanisms ====
Line 86: Line 65:
   Principle: passing of the O_EXCL flag to the open call. The device is
  
locked exclusively for the calling PID, the lock is maintained in the
  
device driver to the particular major/minor combination.
 Principle: passing of the O_EXCL flag to the open call of a device file. The device is locked exclusively for the calling PID, the lock is maintained in the device driver to the particular major/minor combination.
Line 91: Line 68:
  * reliable for a device accessible through one driver   * reliable advisory exclusive locking for a device within one device driver
Line 94: Line 71:
  * requires kernel 2.6.x (x>=7 or so)
  * does not automagically make the device inaccessible, only applications using O_EXCL will learn about the locked state when open(2) fails with errno value EBUSY.
  * for sr it requires kernel 2.6.x (x>=7 or so), with sg it might work on 2.4.
  * O_EXCL already has a meaning for software like libblkid and this is not the same as we would need.


 * System V Semaphores

 See man semget(2), semop(2) SEM_UNDO. They have been considered and rejected mainly because of too many potential device names which would need pre-allocated semaphore objects.

----------------------------------------------------------------

None of the mechanisms above solves the problem with the co-existing drivers for sr and sg, anyway.
 
Line 99: Line 86:
As explained in the introduction, the locking is important on optical media recording due to the delicate operation mode during the recording. Ideally, no application should touch them, even reading from the media is an evil task. But what does the state of the practice look like? As explained in the introduction, the locking is important on optical media
recording due to the delicate operation mode during the recording.
Ideally, no other application should touch them. Even reading info from the
drive can spoil the recording run.
Currently we are aware of at least the following participants in drive
collisions. They take differing precautions for this case, of which none
is really able to prevent inadvertent open(2) of a busy drive under all
circumstances.
Line 101: Line 95:
 * mount: the block device is mounted with the O_EXCL flag '''BUT''' the mount executable also uses '''libblkid''' which opens the devices without locking and read magic data from it. This also provides no solution for operation through the sg driver.
 
 * hald (HAL daemon): periodically opens the cdrom block devices with O_EXCL flag. Clashes with operation on sg is possible.
 * mount: the block device is mounted with the O_EXCL flag but the mount executable also uses libblkid which opens the devices without locking and reads magic data from it. (The problem is not with mutual exclusion of mount(8) and burn programs but with libblkid justifiably misunderstanding the meaning of our O_EXCL lock.)

 * hald (HAL daemon): frequently opens the block devices with O_EXCL flag.
Line 109: Line 103:
 * cdrskin (via libburn): opens the devices with O_EXCL flag. It uses a unique device file path for serious operations on the drive. This is /dev/sg* on kernel 2.4, and recently has become /dev/sr* on kernel 2.6. Operations on other path representations of the same device are restricted to open(2) O_RDONLY and to obtaining SCSI parameters host,channel,id,lun.  * cdrskin (via libburn): opens the devices with O_EXCL flag. It uses either /dev/sr* or /dev/hd* (never both) for serious operations on the drive. Operations on other path representations of the same device are restricted to open(2) O_RDONLY and to obtaining SCSI parameters host,channel,id,lun.
Line 111: Line 105:
 * cdrecord: no locking. Author recommends to get rid of applications which may touch the device somehow.  * cdrecord: no locking. Author recommends to do it like Solaris does (which seems to do explicit locking, maintained internally on device driver or on major/minor pairs).
Line 113: Line 107:
=== Proposed general locking algorithms === Any of the listed programs is currently able to spoil a recording run
just by its proper operation if only the circumstances are unfortunate enough.
This compilation is mostly heuristic and may be erroneous in details.
In any case, the problems and the users' disappointment are real.
Line 115: Line 112:
=== What can be done within the CD/DVD tool community === === Hopeless proposal of a locking algorithm ===
Line 117: Line 114:
The following method is proposed to create a midway between the limitations of kernel and the requirements of others, also unifying the way of dealing with the device locks. We developed in dialog with Ted Ts'o a proposal which would nearly fulfill the coordination needs of well-meaning programs. Nearly. But not sufficiently and with substantial effort.
Line 119: Line 116:
The locking methods with additional lock files are identified as very cumbersome because of inconsistent security settings on different applications, see above. Instead, this proposal relies on a two-step locking method and does not yet solve the ambiguity problem of sr-scd-sg paths: We finally failed due to the coarseness of O_EXCL and the implementation of fcntl(F_SETLK) which is not really suitable for a modular software architecture.
Line 121: Line 118:
 1. Open the device. For applications that operate in a delicate way (burning tools), O_EXCL shall be set. For others, it may be omitted.
 2. Set or check the additional fcntl lock on the device file. It must be exclusive! Sample code (by Thomas Schmitt)
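{{{
        #include <fcntl.h>
        #include <string.h>
        #include <unistd.h>

        struct flock lockthing;
        ...
        /* Step 1: open the device; O_EXCL guards against other O_EXCL openers */
        f = open(device, mode|O_EXCL);
        if (f != -1) {
             /* Step 2: request an exclusive advisory lock on the whole file */
             memset(&lockthing, 0, sizeof(lockthing));
             lockthing.l_type = F_WRLCK;
             lockthing.l_whence = SEEK_SET;
             lockthing.l_start = 0;
             lockthing.l_len = 0;       /* 0 means: up to end of file */
             if (fcntl(f, F_SETLK, &lockthing) == -1) {
                close(f);
                /* user feedback, report error, etc... */
                f = -1;
             }
        }
}}}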

Normally, fcntl(2) imposes advisory locking. But the sysadmin can make this mandatory locking by a mount option. (Is /dev mount(8)ed at all ?)

Unique path resolution is necessary so all interested processes come together at the same inode where this locking is guaranteed to collide. (Does F_SETLK work with all the implementations of directory /dev/ ?)

Links or additional device paths created by mknod(1) can be translated into /dev/sg* or /dev/sr* respectively with the help of stat(2) and its result element .st_rdev. But the jump from /dev/sg* to /dev/sr* is not possible via stat(2). For that we need open(2) O_RDONLY, ioctl(SCSI_IOCTL_GET_IDLUN), close(2).

The translation is done by obtaining info from the given path, by iterating over the desired device paths /dev/hd%c , /dev/sg%d , /dev/sr%d, and by comparing their info with the one we look for. Kernel 2.6: If the result is a /dev/sg%d then it has to be translated into a /dev/sr%d in another step.
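
As a minimal sketch of the stat(2) step described above, the following C function compares two paths by device type and .st_rdev. The function name same_device_node is hypothetical and not taken from any of the burn programs. It only shows that links and mknod(1) duplicates resolve to the same device number. It cannot bridge from /dev/sg* to /dev/sr*, because those nodes belong to different drivers and carry different device numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same device node
   (same device type, same major/minor), 0 if not, -1 if stat(2) fails.
   stat(2) follows symbolic links, so links are resolved automatically. */
int same_device_node(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) == -1 || stat(path_b, &b) == -1)
        return -1;
    /* st_rdev is only meaningful for character and block devices */
    if (S_ISCHR(a.st_mode) && S_ISCHR(b.st_mode))
        return a.st_rdev == b.st_rdev;
    if (S_ISBLK(a.st_mode) && S_ISBLK(b.st_mode))
        return a.st_rdev == b.st_rdev;
    return 0;
}
```

A caller would iterate over the candidate paths and stop at the first one for which the function returns 1.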

NOTE: there are sysfs symlinks that set up a path usable to map exactly. However, this depends on a mounted sysfs and the required symlinks have also been declared deprecated in the recent Linux kernel versions.

Kernel 2.4 imposes the problem that ioctl(SG_IO) is not possible with sr, so most of the burn programs have to use sg. But growisofs uses sr via ioctl(CDROM_SEND_PACKET) which does not work with sg. We will possibly not come to a completely sufficient agreement under these circumstances. Well, we 2.4ers are used to suffer neglect. (sob, hehe, see below)

== Obstacles for using FHS compliant /var/lock/ files ==

First: races

Second: unclear or unreliable cleanup technique, dangling bad lockfiles possible

Third: The most obvious problem is the usual permission setting of /var/lock :
{{{
SuSE 9.0 (kernel 2.4):
  drwxrwxr-x 3 root uucp 4096 Apr 4 05:07 /var/lock
SuSE 9.3 (kernel 2.6):
  drwxrwxr-t 4 root uucp 4096 2007-04-04 17:50 /var/lock
Fedora Core 3.x:
  drwxrwxr-x 5 root lock 4096 Apr 4 04:03 /var/lock
Debian gives rw-permission to anybody and thus would be no problem.
}}}
This system may work with the plain old UUCP program and a few other programs with low device opening activity AND administered by root but is a real PITA nowadays.
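
The race named as the first obstacle can at least be avoided at creation time by letting open(2) create the lock file atomically with O_CREAT|O_EXCL. The following C sketch does that and writes the PID in the HDB UUCP format which FHS 2.3 prescribes (ten-character decimal PID plus newline). The function name create_lock_file is hypothetical. The sketch addresses neither stale locks nor the permission problems of /var/lock shown above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the lock file was created by us,
   1 if it already exists (somebody else holds or abandoned the lock),
   -1 on any other error. */
int create_lock_file(const char *lock_path)
{
    char buf[16];
    int len, fd;

    /* O_CREAT|O_EXCL makes "check for existence" and "create" one atomic step */
    fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return errno == EEXIST ? 1 : -1;

    /* HDB UUCP format: PID as ten-character decimal number, then newline */
    len = snprintf(buf, sizeof(buf), "%10ld\n", (long) getpid());
    if (write(fd, buf, (size_t) len) != (ssize_t) len) {
        close(fd);
        unlink(lock_path);
        return -1;
    }
    close(fd);
    return 0;
}
```

The lock is released by unlink(2) of the same path. A program that dies before doing so leaves the stale lock file behind.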

=== And the better solution would be ... ===

* to adopt the FHS idea of locking a proxy before any open(2)
is performed, but to avoid the known drawbacks of FHS /var/lock/
protocol.

* to allow the use of any of the sg, sr, scd device drivers at
the discretion of the programs.

----------------------------------------------------------------------------

=== Proposal for unambiguous advisory device locking on Linux kernel 2.4 and 2.6 ===

It is inspired in part by traditional UUCP locking in /var/lock/ as described
by http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLOCKLOCKFILES

----------------------------------------------------------------------------
See the detailed specification and declaration of failure at
http://libburnia.pykix.org/browser/libburn/trunk/doc/ddlp.txt?format=txt

On Locking Schemes on Linux Device Drivers

Hello fellow application developer or maintainer,

recently we (cdrkit and libburnia developers) came across increasing problems with reliable and safe device locking. This paper collects our ponderings after having received this advice from Alan Cox on LKML: http://lkml.org/lkml/2007/3/31/175 and having sincerely attempted to solve the problem in user space.

Introduction

Our concern is the influence of even read-only operations on optical media drives (recorders) during their duty as recorders -- depending on the device model such interference can spoil the process of recording, eventually wasting the medium.

Since many programs already act on such devices we see the need for reliable communication in order to allow proper device locking if good will for cooperation is present.

But in short: Good will seems not to be enough. We failed to find a viable method for the necessary coordination of the participants.

State of the practice

There are various locking techniques used in other areas which are more or less applicable in our case.

Path/Inode based locking mechanisms

In general, these mechanisms are not well suited to our purpose. They use the file name or inode as identity, which causes problems in our case. They fall short in two respects, which makes them unreliable when used alone:

  • they do not cope with multiple device files that give access to the same driver through different file paths
  • they do not automatically cope with multiple device drivers accessible through different co-existing user space interfaces, as with the sg and sr drivers.

We evaluated:

  • Lock files associated with target file

    Principle: an additional file is created during the action on the real target file. See http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLOCKLOCKFILES

    Pros:
    • regular filesystem operation, no additional infrastructure required
    Cons:
    • The location and name of the lock file need to be known and discussed upfront among all application developers, or be documented excessively
    • Permission problems may disallow the creation of lock files (security issues), especially for self-compiled applications installed without the root permissions needed to set them up in the required way
    • Special precautions are necessary against stale locks
  • fcntl(2) exclusive file locking

    Principle: lock applied on open file handles. Thus probably referring to an inode. See fcntl(2) for details.

    Pros:
    • POSIX
    Cons:
    • needs open(2) as precondition which has to be avoided on unlocked device files
    • locks can be released inadvertently by submodules which just open and close the same file (inode ?).
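
The last point about fcntl(2) can be demonstrated with a short C sketch. It relies on the POSIX fcntl(2) semantics by which a process loses all its locks on a file as soon as it closes any file descriptor referring to that file. All function names are hypothetical. A regular temporary file stands in for the device file, and a forked child plays the role of the competing program.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take an exclusive lock on the whole file. 0 = success, -1 = failure. */
static int lock_whole_file(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                      /* 0 means: up to end of file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Let a child process probe the lock, because a process never collides
   with its own locks. Returns 1 = locked, 0 = lockable, -1 = error. */
static int locked_for_other_process(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd == -1)
            _exit(2);
        _exit(lock_whole_file(fd) == -1 ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Stand-in for a library routine which merely inspects the same file. */
static void innocent_submodule(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd != -1)
        close(fd);                     /* drops ALL fcntl locks we hold on this file */
}

/* Records the lock state seen by another process before and after the
   submodule ran. Returns 0 on success, -1 on setup failure. */
int demo_lost_lock(int *before, int *after)
{
    char path[] = "/tmp/fcntl_demo_XXXXXX";
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    if (lock_whole_file(fd) == -1) {
        close(fd);
        unlink(path);
        return -1;
    }
    *before = locked_for_other_process(path);
    innocent_submodule(path);
    *after = locked_for_other_process(path);
    close(fd);
    unlink(path);
    return 0;
}
```

The descriptor which originally took the lock is still open when the second probe runs, yet the lock is gone. This is the reason why a burn program cannot safely combine fcntl(2) locks with libraries that open the drive on their own.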

Other locking mechanisms

  • O_EXCL locking

    Principle: passing of the O_EXCL flag to the open call of a device file. The device is locked exclusively for the calling PID, the lock is maintained in the device driver for the particular major/minor combination.

    Pros:
    • reliable advisory exclusive locking for a device within one device driver
    Cons:
    • for sr it requires kernel 2.6.x (x>=7 or so), with sg it might work on 2.4.
    • O_EXCL already has a meaning for software like libblkid and this is not the same as we would need.
  • System V Semaphores

    See man semget(2), semop(2) SEM_UNDO. They have been considered and rejected mainly because of too many potential device names which would need pre-allocated semaphore objects.


None of the mechanisms above solves the problem with the co-existing drivers for sr and sg, anyway.
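
For illustration, the O_EXCL mechanism listed above might be used as in the following C sketch. The function name open_drive_exclusive and its return codes are hypothetical. Adding O_NONBLOCK is an assumption made here so that the open does not depend on a medium being present. The lock is advisory: only other callers which also pass O_EXCL get EBUSY, and only within the same driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns a file descriptor >= 0 on success,
   -2 if another O_EXCL holder (or a mount) already owns the device,
   -1 on any other error (errno is left as set by open(2)). */
int open_drive_exclusive(const char *device_path)
{
    /* O_NONBLOCK is an assumption of this sketch, see the text above */
    int fd = open(device_path, O_RDWR | O_EXCL | O_NONBLOCK);

    if (fd == -1)
        return errno == EBUSY ? -2 : -1;
    return fd;
}
```

A program that opens the same drive without O_EXCL, or through the other driver (sg versus sr), is not stopped by this.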

Applicability on CD/(HD)DVD/BD drives

As explained in the introduction, locking is important for optical media recording because the drive operates in a delicate mode while it records. Ideally, no other application should touch the drive. Even reading info from the drive can spoil the recording run. Currently we are aware of at least the following participants in drive collisions. They take differing precautions for this case, none of which can really prevent an inadvertent open(2) of a busy drive under all circumstances.

  • mount: the block device is mounted with the O_EXCL flag but the mount executable also uses libblkid which opens the devices without locking and reads magic data from it. (The problem is not with mutual exclusion of mount(8) and burn programs but with libblkid justifiably misunderstanding the meaning of our O_EXCL lock.)
  • hald (HAL daemon): frequently opens the block devices with O_EXCL flag.
  • wodim: opens the devices with O_EXCL flag. Opening /dev/sg is possible and is more likely to happen with versions prior to 1.1.4.
  • growisofs: opens the block devices with O_EXCL flag. Opening /dev/sg was never encouraged and does not work on kernel 2.4 (not tested yet on 2.6).
  • cdrskin (via libburn): opens the devices with O_EXCL flag. It uses either /dev/sr* or /dev/hd* (never both) for serious operations on the drive. Operations on other path representations of the same device are restricted to open(2) O_RDONLY and to obtaining SCSI parameters host,channel,id,lun.
  • cdrecord: no locking. Its author recommends doing it the way Solaris does (which seems to do explicit locking, maintained internally on device driver or on major/minor pairs).

Any of the listed programs is currently able to spoil a recording run just by its proper operation if only the circumstances are unfortunate enough. This compilation is mostly heuristic and may be erroneous in details. In any case, the problems and the users' disappointment are real.

Hopeless proposal of a locking algorithm

We developed in dialog with Ted Ts'o a proposal which would nearly fulfill the coordination needs of well-meaning programs. Nearly. But not sufficiently and with substantial effort.

We finally failed due to the coarseness of O_EXCL and the implementation of fcntl(F_SETLK) which is not really suitable for a modular software architecture.

See the detailed specification and declaration of failure at http://libburnia.pykix.org/browser/libburn/trunk/doc/ddlp.txt?format=txt