Differences between revisions 42 and 55 (spanning 13 versions)
Revision 42 as of 2007-04-07 13:37:19
Size: 10307
Comment:
Revision 55 as of 2009-03-16 03:30:58
Size: 6249
Editor: anonymous
Comment: converted to 1.6 markup
Line 8: Line 8:
on LKML: http://lkml.org/lkml/2007/3/31/175 on LKML: http://lkml.org/lkml/2007/3/31/175 and having sincerely attempted to solve the problem in user space.
Line 14: Line 14:
depending on the device model such read-only work may interrupt the
process badly, spoiling the result, eventually wasting the medium.
depending on the device model such interference can spoil the
process of recording, eventually wasting the medium.
Line 19: Line 19:
good will for cooperation is present. After such a locking mechanism is
implemented, we will invite any project to join it.
good will for cooperation is present.
Line 22: Line 21:
In the following document, at first a few possible mechanisms are
evaluated. Then a suitable locking algorithm is proposed which needs
coordination with the Linux community, nevertheless.
But in short: Good will seems not to be enough. We failed to find a viable method for the necessary coordination of the participants.
Line 26: Line 23:
The remaining open question is:
 ''Where to create lock files under a protocol that is not (yet) covered by FHS ?''

We do not want to make this choice a reason for the rejection of our proposal.
Line 49: Line 42:
 Pros: regular filesystem operation, no additional infrastructure required  Pros:
  *
regular filesystem operation, no additional infrastructure required
Line 52: Line 46:
  * Possible races unless OS mechanisms are used for exclusive operation on the lock file, see below
Line 62: Line 55:

  * known (POSIX.1-2001), usually reliable mechanism
  * POSIX
Line 66: Line 58:
Line 68: Line 59:
  * locks can be released inadvertently by submodules which just open and close the same file (inode ?).
Line 76: Line 68:
  * reliable for a device accessible through one driver   * reliable advisory exclusive locking for a device within one device driver
Line 79: Line 71:
  * for sr it requires kernel 2.6.x (x>=7 or so), with sg it might work on 2.4. 
  * it does not solve the problem with the co-existing drivers for sr and sg
  * for sr it requires kernel 2.6.x (x>=7 or so), with sg it might work on 2.4.
  * O_EXCL already has a meaning for software like libblkid, and that meaning is not the one we would need.
Line 84: Line 77:
 See man semget(2), semop(2) SEM_UNDO. They have been considered and rejected
mainly because of too many potential device names which would need
pre-allocated semaphore objects.
 See man semget(2), semop(2) SEM_UNDO. They have been considered and rejected mainly because of too many potential device names which would need pre-allocated semaphore objects.

----------------------------------------------------------------

None of the mechanisms above solves the problem with the co-existing drivers for sr and sg, anyway.
 
Line 99: Line 95:
 * mount: the block device is mounted with the O_EXCL flag but the mount executable also uses libblkid which opens the devices without locking and reads magic data from it. (We understand that for the duration of a recording run, mounting should best be prevented in total.)
 
 * mount: the block device is mounted with the O_EXCL flag but the mount executable also uses libblkid which opens the devices without locking and reads magic data from it. (The problem is not with mutual exclusion of mount(8) and burn programs but with libblkid justifiably misunderstanding the meaning of our O_EXCL lock.)
Line 107: Line 103:
 * cdrskin (via libburn): opens the devices with O_EXCL flag. It uses a unique device file path for serious operations on the drive. This is /dev/sg* on kernel 2.4, and recently has become /dev/sr* on kernel 2.6. The only other permissible paths are /dev/hd* . Operations on other path representations of the same device are restricted to open(2) O_RDONLY and to obtaining SCSI parameters host,channel,id,lun.  Mapping to the unique device names is strictly enforced.  * cdrskin (via libburn): opens the devices with O_EXCL flag. It uses only /dev/sr* exor /dev/hd* for serious operations on the drive. Operations on other path representations of the same device are restricted to open(2) O_RDONLY and to obtaining SCSI parameters host,channel,id,lun.
Line 116: Line 112:
== Obstacles for using FHS compliant /var/lock/ files == === Hopeless proposal of a locking algorithm ===
Line 118: Line 114:
First: race conditions We developed, in dialog with Ted Ts'o, a proposal which would nearly fulfill the coordination needs of programs of good will. Nearly. But not sufficiently, and only with substantial effort.
Line 120: Line 116:
Second: unclear or unreliable cleanup technique, dangling bad lockfiles possible We finally failed due to the coarseness of O_EXCL and the implementation of fcntl(F_SETLK) which is not really suitable for a modular software architecture.
Line 122: Line 118:
Third: The most obvious problem is the usual permission setting of /var/lock :
{{{
SuSE 9.0 (kernel 2.4):
  drwxrwxr-x 3 root uucp 4096 Apr 4 05:07 /var/lock
SuSE 9.3 (kernel 2.6):
  drwxrwxr-t 4 root uucp 4096 2007-04-04 17:50 /var/lock
Fedora Core 3.x:
  drwxrwxr-x 5 root lock 4096 Apr 4 04:03 /var/lock
Debian gives rw-permission to anybody and thus would be no problem.
}}}

=== Proposed locking algorithm ===

It adopts the FHS idea of locking a proxy before any open(2)
is performed, but avoids the known drawbacks of the FHS /var/lock/
protocol.

It is designed to allow the use of any of the sg, sr, scd device drivers at
the discretion of the programs. It is also designed to include the less
ambiguous situation of drive access via /dev/hd*.

----------------------------------------------------------------------------

Compliant processes apply open(2) to suspected CD/DVD burner device files
only if they are able to do this via one of the following paths:
 /dev/sg[0..N] , /dev/scd[0..N] , /dev/sr[0..N] , /dev/hd[a-z]
and only after they have obtained a lock on them. (N= 31 or 255 ?)

Locking is performed similar to UUCP tradition but without the potential race
conditions or potential stale locks: Other than with FHS /var/lock, not the
mere existence of the lock file establishes the lock state. It is instead
implemented by open(2) with O_RDWR and then fcntl(2) with F_SETLK. The lock
file descriptor is held open until the lock is obsolete.
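
A minimal sketch of this proxy locking in Python, assuming a lock directory passed in by the caller and a hypothetical `ddlp_` file name prefix (the text above names neither). The sketch also adds O_CREAT so that a missing proxy file gets created, which goes beyond the plain O_RDWR mentioned above. `fcntl.lockf()` with `LOCK_NB` is Python's wrapper for fcntl(2) with F_SETLK.

```python
import fcntl
import os

def acquire_proxy_lock(device_path, lock_dir):
    """Try to lock the proxy file for device_path.

    Returns an open file descriptor on success, or None if another
    process holds the lock. The lock lives only as long as the holder
    keeps the descriptor open, so a crashed process leaves no stale lock.
    """
    # Hypothetical naming scheme: one proxy file per device base name.
    proxy = os.path.join(lock_dir, "ddlp_" + os.path.basename(device_path))
    fd = os.open(proxy, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        # Non-blocking exclusive lock: fail at once instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd

def release_proxy_lock(fd):
    """Closing the descriptor drops the lock; the proxy file stays in place."""
    os.close(fd)
```

The existence of the proxy file means nothing here. Only the kernel-held lock counts, which is why the file can be left in place after release.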

Paths other than the permissible ones have to be translated. The element
.st_rdev of the stat(2) result allows searching for a matching device file
among the permissible ones. So /dev/nec_burner can be translated to exactly one
of /dev/sr0, /dev/sg2, /dev/hdd. (If not, then it is hardly a burner device.)
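
The translation step could be sketched like this in Python. The function names are made up for illustration, and N is assumed to be 31 since the text leaves that open. The sketch also compares the file type (character vs. block device), because device numbers are only meaningful within one type. It returns a list rather than a single path, because several names may refer to the same device number.

```python
import os
import stat

def candidate_paths(max_index=31):
    """Build the list of permissible device paths (N assumed to be 31)."""
    paths = []
    for prefix in ("/dev/sg", "/dev/scd", "/dev/sr"):
        paths += [prefix + str(i) for i in range(max_index + 1)]
    paths += ["/dev/hd" + chr(c) for c in range(ord("a"), ord("z") + 1)]
    return paths

def translate_device_path(path, candidates):
    """Return the candidates which have the same device number as path."""
    target = os.stat(path)
    if not (stat.S_ISCHR(target.st_mode) or stat.S_ISBLK(target.st_mode)):
        return []  # not a device file at all
    matches = []
    for cand in candidates:
        try:
            st = os.stat(cand)
        except OSError:
            continue  # candidate does not exist on this system
        same_type = stat.S_IFMT(st.st_mode) == stat.S_IFMT(target.st_mode)
        if same_type and st.st_rdev == target.st_rdev:
            matches.append(cand)
    return matches
```

A call such as `translate_device_path("/dev/nec_burner", candidate_paths())` needs only stat(2), so it does not touch the drive.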

To circumvent the sg-sr-scd ambiguity, those devices must get locked in all
their three permissible path instances.
E.g. not only /dev/sr0 has to be locked before open(2) for serious usage is
allowed, but also /dev/sg2 and /dev/scd0.

The device triples are formed from those device files which have the same SCSI
parameters Host,Channel,Id,Lun from ioctl(SCSI_IOCTL_GET_IDLUN). Since this
needs open(2), the search has to be accompanied by the locking of the tested
files. Those which do not match get released immediately. If all three files
are found and locked, it is guaranteed that any of them is free for usage.
If any of the three is not found, then the lock is not granted due to
a suspected collision between two locking contestants.
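
As a rough illustration of the address comparison, the following Python sketch queries and decodes the SCSI parameters. The request number 0x5382 and the packing of the first result word (id in bits 0-7, lun in bits 8-15, channel in bits 16-23, host in bits 24-31) reflect our understanding of the Linux kernel and should be checked against its headers. The function names are hypothetical.

```python
import fcntl
import struct

SCSI_IOCTL_GET_IDLUN = 0x5382  # assumed value, see Linux <scsi/scsi.h>

def decode_idlun(dev_id):
    """Split the packed dev_id word into (host, channel, id, lun)."""
    return ((dev_id >> 24) & 0xFF,
            (dev_id >> 16) & 0xFF,
            dev_id & 0xFF,
            (dev_id >> 8) & 0xFF)

def scsi_address(fd):
    """Query an already opened, already locked device file descriptor."""
    buf = bytearray(8)  # struct { int dev_id; int host_unique_id; }
    fcntl.ioctl(fd, SCSI_IOCTL_GET_IDLUN, buf)  # fills buf in place
    dev_id, _host_unique_id = struct.unpack("Ii", buf)
    return decode_idlun(dev_id)
```

Two device files belong to the same triple if `scsi_address()` returns equal tuples for both.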

This cannot disturb a serious drive operation because such is allowed to
start only if all three paths are locked. Thus there would be no starting
point for a device-triple search at all.

Further precautions like open(O_EXCL) or fcntl(F_SETLK) on the device file
itself are allowed. Programs are asked politely to offer expert options to
disable them. In general a program is free to use a device in any way after
a lock has been obtained successfully.

----------------------------------------------------------------------------

All we need for this is a directory which is present on any Linux system and is
supposed to offer rwx-permissions to anybody who is allowed to access the
devices.

As an application programmer I would propose /tmp/ and some file name prefix.
It would work, after all. It would be covered by the FHS specs, except for the
fact that /var/lock is the paragraph which matches our problem more specifically
- and fails to solve it.

To perform the sketched algorithm in /var/lock would violate FHS. The often
restrictive permission settings of /var/lock would also make an additional
rule necessary: If a lock file is missing and cannot be created, the device may
be used as if a lock had been granted. (Provident sysadmins would then create
the lock files in /var/lock/ once and allow rw-permission for all intended
users.)


'''This is where we should ask the broad Linux public for opinions and advice.'''
We are not in much of a hurry and therefore should ponder duly over every aspect.
See the detailed specification and declaration of failure at
http://libburnia.pykix.org/browser/libburn/trunk/doc/ddlp.txt?format=txt

On Locking Schemes on Linux Device Drivers

Hello fellow application developer or maintainer,

recently we (cdrkit and libburnia developers) came across increasing problems with reliable and safe device locking. This paper collects our ponderings after having received this advice from Alan Cox on LKML: http://lkml.org/lkml/2007/3/31/175 and having sincerely attempted to solve the problem in user space.

Introduction

Our concern is the influence of even read-only operations on optical media drives (recorders) while they are recording. Depending on the device model, such interference can spoil the recording process and possibly waste the medium.

Since many programs already act on such devices we see the need for reliable communication in order to allow proper device locking if good will for cooperation is present.

But in short: Good will seems not to be enough. We failed to find a viable method for the necessary coordination of the participants.

State of the practice

There are various locking techniques used in other areas which are more or less applicable in our case.

Path/Inode based locking mechanisms

In general, these mechanisms are not well suited to our purpose. They use the file name or inode as identity. In our case this causes two problems which make them unreliable when used alone:

  • they do not cope with multiple device files which give access to the same driver
  • they do not automatically cope with multiple device drivers which are accessible through different co-existing user space interfaces, as with the sg vs. sr drivers.

We evaluated:

  • Lock files associated with target file

    Principle: an additional file is created during the action on the real target file. See http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLOCKLOCKFILES
    Pros:
    • regular filesystem operation, no additional infrastructure required
    Cons:
    • The location and name of the lock file need to be known and agreed upon upfront among all application developers, or be documented extensively
    • Permission problems may prevent the creation of lock files (security issues), especially for users who compile applications themselves and have no root permissions to install them in the required way
    • Special precautions are necessary against stale locks
  • fcntl(2) exclusive file locking

    Principle: the lock is applied to an open file handle and thus probably refers to an inode. See fcntl(2) for details.
    Pros:
    • POSIX
    Cons:
    • needs open(2) as precondition which has to be avoided on unlocked device files
    • locks can be released inadvertently by submodules which just open and close the same file (inode ?).
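
For illustration, the lock file approach evaluated above can be sketched in Python as follows. The `LCK..` name prefix and the ten-digit PID format are our reading of the FHS section linked above. The function names and the `lock_dir` parameter are hypothetical. The sketch shows both the atomic creation via O_CREAT|O_EXCL and the kind of extra precaution needed against stale locks.

```python
import os

def create_lock_file(lock_dir, device_name):
    """Atomically create LCK..<device_name>.

    Returns the path of the lock file, or None if it already exists.
    """
    path = os.path.join(lock_dir, "LCK.." + device_name)
    try:
        # O_CREAT | O_EXCL makes "test and create" one atomic operation.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as f:
        # PID as ten-byte ASCII decimal number plus trailing newline.
        f.write("%10d\n" % os.getpid())
    return path

def lock_is_stale(path):
    """Return True if the process recorded in the lock file is gone."""
    with open(path) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the PID exists
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # the process exists but belongs to another user
    return False
```

The stale check is itself racy: the PID may be reused, and two processes may both decide to remove the same stale file. The permission problem remains untouched as well, since `os.open()` simply fails in a directory which the user may not write to.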

Other locking mechanisms

  • O_EXCL locking

    Principle: the O_EXCL flag is passed to the open(2) call of a device file. The device is locked exclusively for the calling PID; the lock is maintained in the device driver for the particular major/minor combination.
    Pros:
    • reliable advisory exclusive locking for a device within one device driver
    Cons:
    • for sr it requires kernel 2.6.x (x>=7 or so), with sg it might work on 2.4.

    • O_EXCL already has a meaning for software like libblkid, and that meaning is not the one we would need.
  • System V Semaphores

    See semget(2) and semop(2) with SEM_UNDO. They have been considered and rejected, mainly because the many potential device names would each need a pre-allocated semaphore object.
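
A minimal Python sketch of the O_EXCL approach from the list above. The function name is hypothetical, and the default flags (read-only, non-blocking) are an assumption, not something this paper prescribes. We assume that a device which is already held exclusively answers with EBUSY.

```python
import errno
import os

def open_exclusive(path, flags=os.O_RDONLY | os.O_NONBLOCK):
    """Open a device file with O_EXCL.

    Returns the file descriptor, or None if the driver reports the
    device as busy. Any other error is passed on to the caller.
    """
    try:
        return os.open(path, flags | os.O_EXCL)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return None
        raise
```

Note that this protects only against other programs which also use O_EXCL on a device file served by the same driver. A plain open(2) without the flag still succeeds.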


None of the mechanisms above solves the problem with the co-existing drivers for sr and sg, anyway.

Applicability on CD/(HD)DVD/BD drives

As explained in the introduction, locking is important for optical media recording because of the delicate operation mode during recording. Ideally, no other application should touch the drive. Even reading info from the drive can spoil the recording run. Currently we are aware of at least the following participants in drive collisions. They take differing precautions for this case, none of which is really able to prevent an inadvertent open(2) of a busy drive under all circumstances.

  • mount: the block device is mounted with the O_EXCL flag but the mount executable also uses libblkid which opens the devices without locking and reads magic data from it. (The problem is not with mutual exclusion of mount(8) and burn programs but with libblkid justifiably misunderstanding the meaning of our O_EXCL lock.)
  • hald (HAL daemon): frequently opens the block devices with O_EXCL flag.
  • wodim: opens the devices with O_EXCL flag. Opening /dev/sg is possible and is more likely to happen with versions prior to 1.1.4.
  • growisofs: opens the block devices with O_EXCL flag. Opening /dev/sg was never encouraged and does not work on kernel 2.4 (not tested yet on 2.6).
  • cdrskin (via libburn): opens the devices with O_EXCL flag. It uses either /dev/sr* or /dev/hd* for serious operations on the drive, and nothing else. Operations on other path representations of the same device are restricted to open(2) O_RDONLY and to obtaining SCSI parameters host,channel,id,lun.
  • cdrecord: no locking. The author recommends doing it like Solaris does (which seems to do explicit locking, maintained internally in the device driver or on major/minor pairs).

Any of the listed programs is currently able to spoil a recording run just by its proper operation if the circumstances are unfortunate enough. This compilation is mostly heuristic and may be erroneous in details. In any case, the problems and the users' disappointment are real.

Hopeless proposal of a locking algorithm

We developed, in dialog with Ted Ts'o, a proposal which would nearly fulfill the coordination needs of programs of good will. Nearly. But not sufficiently, and only with substantial effort.

We finally failed due to the coarseness of O_EXCL and the implementation of fcntl(F_SETLK) which is not really suitable for a modular software architecture.
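
The problem with fcntl(F_SETLK) in modular software can be reproduced with a short Python sketch. All names are hypothetical, and the forked child merely stands in for a competing program. The point is that the process loses its lock as soon as any part of it closes any file descriptor for the same file.

```python
import fcntl
import os

def other_process_can_lock(path):
    """Fork a child which tries a non-blocking exclusive lock on path."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0  # lock obtained: nobody else holds it
        except OSError:
            status = 1  # lock refused: somebody else holds it
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0

def demonstrate_lost_lock(path):
    """Return (free_before, free_after) as seen by another process."""
    main_fd = os.open(path, os.O_RDWR)
    fcntl.lockf(main_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    free_before = other_process_can_lock(path)

    # A "submodule" opens and closes the same file for its own purposes.
    os.close(os.open(path, os.O_RDONLY))

    free_after = other_process_can_lock(path)
    os.close(main_fd)
    return free_before, free_after
```

On Linux we would expect `demonstrate_lost_lock()` to return `(False, True)` for an existing regular file: the lock holds at first and is silently gone after the unrelated open and close, although `main_fd` is still open.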

See the detailed specification and declaration of failure at http://libburnia.pykix.org/browser/libburn/trunk/doc/ddlp.txt?format=txt