Do not use Edit(GUI) button.


Copyright 2007, 2008 Osamu Aoki, GPL. (Please agree to GPL, GPL2, and any version of GPL which is compatible with DFSG if you update any part of this wiki page.)

Generated HTML is at "[http://people.debian.org/~osamu/pub/getwiki/html/ch11.en.html Debian Reference: Chapter 11. Data management]".

I welcome your contributions to update this wiki page. You must follow these rules:

Data management

Sharing, copying, and archiving

The security of the data and its controlled sharing have several aspects:

These can be realized by using some combination of:

Archive and compression tools

Here is a summary of archive and compression tools available on the Debian system:

List of archive and compression tools.

||package||popcon||size||command||comment||extension||
||tar||29915||-||tar(1)||the standard archiver (de facto standard)||.tar||
||cpio||15940||-||cpio(1)||Unix System V style archiver, use with find command||.cpio||
||binutils||15167||-||ar(1)||archiver for the creation of static libraries||.ar||
||fastjar||2307||-||fastjar(1)||archiver for Java (zip like)||.jar||
||pax||530||-||pax(1)||new POSIX standard archiver, compromise between tar and cpio||.pax||
||afio||308||-||afio(1)||extended cpio with per-file compression etc.||.afio||
||gzip||38002||-||gzip(1), zcat(1), ...||GNU [http://en.wikipedia.org/wiki/LZ77_and_LZ78 LZ77] compression utility (de facto standard)||.gz||
||bzip2||25807||-||bzip2(1), bzcat(1), ...||[http://en.wikipedia.org/wiki/Burrows-Wheeler_transform Burrows-Wheeler block-sorting compression] utility with higher compression ratio than gzip(1) (slower than gzip with similar syntax)||.bz2||
||lzma||-||-||lzma(1)||[http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm LZMA] compression utility with higher compression ratio than gzip(1) (slower than gzip with similar syntax)||.lzma||
||p7zip||-||-||7zr(1), p7zip(1)||[http://en.wikipedia.org/wiki/7-Zip 7-Zip] file archiver with high compression ratio ([http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm LZMA] compression)||.7z||
||p7zip-full||-||-||7z(1), 7za(1)||[http://en.wikipedia.org/wiki/7-Zip 7-Zip] file archiver with high compression ratio ([http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm LZMA] compression and others)||.7z||
||lzop||-||-||lzop(1)||[http://en.wikipedia.org/wiki/Lempel-Ziv-Oberhumer LZO] compression utility with higher compression and decompression speed than gzip(1) (lower compression ratio than gzip with similar syntax)||.lzo||
||zip||-||-||zip(1)||[http://en.wikipedia.org/wiki/Info-ZIP InfoZIP]: DOS archive and compression tool||.zip||
||unzip||-||-||unzip(1)||[http://en.wikipedia.org/wiki/Info-ZIP InfoZIP]: DOS unarchive and decompression tool||.zip||

/!\ Do not set the "$TAPE" variable unless you know what to expect. It will change tar(1) behavior.

(!) The gzipped .tar archive sometimes uses the file extension .tgz.

(!) The cp, scp and tar commands may have some limitations for special files. The cpio and afio commands are the most versatile.

(!) The cpio and afio commands are designed to be used with find and other commands. They are suitable for creating backup scripts since the file selection part of the script can be tested independently.

(!) afio compresses each file in the archive individually. This makes afio much safer against file corruption than the globally compressed tar or cpio archives, and makes it the best archive engine for backup scripts.

(!) OpenOffice data files are internally .jar files.

Copy and synchronization tools

Here is a summary of simple copy and backup tools available on the Debian system:

List of copy and synchronization tools.

||package||popcon||size||tool||function||
||coreutils||37945||-||GNU cp||Locally copy files and directories ("-a" for recursive).||
||openssh-client||29037||-||scp||Remotely copy files and directories (client). "-r" for recursive.||
||openssh-server||22918||-||sshd||Remotely copy files and directories (remote server).||
||rsync||6383||-||Rsync||1-way remote synchronization and backup.||
||unison||634||-||Unison||2-way remote synchronization and backup.||
||pdumpfs||51||-||pdumpfs||Daily local backup using hardlinks, similar to Plan9's dumpfs.||

{i} Running the bkup script mentioned in @{@acopyscriptforthedatabackup@}@ with the "-gl" option under cron(8) should provide functionality very similar to pdumpfs for the static data archive.

{i} Version control system (VCS) tools in @{@listofversioncontrolsystemtools@}@ can function as the multi-way copy and synchronization tools.

Idioms for the archive

Here are several ways to archive and unarchive the entire contents of the directory "/source".

With GNU tar:

$ tar cvzf archive.tar.gz /source
$ tar xvzf archive.tar.gz
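
The tar idiom can be tried safely on throwaway data before it is pointed at a real "/source". The following is only a sketch under assumptions: the scratch directory, the file names and the variable names are made up for illustration, and "-C" is used so that the archive stores relative paths.

```shell
#!/bin/sh
# Round-trip check for the tar idiom, run entirely inside a scratch directory.
work=$(mktemp -d)                       # hypothetical scratch location
mkdir -p "$work/source/sub" "$work/restore"
echo "hello" > "$work/source/a.txt"
echo "world" > "$work/source/sub/b.txt"

# Archive relative to $work so that no absolute path is stored.
tar -C "$work" -czf "$work/archive.tar.gz" source
# Extract into a separate directory.
tar -C "$work/restore" -xzf "$work/archive.tar.gz"

# diff -r exits with 0 only when both trees have identical contents.
if diff -r "$work/source" "$work/restore/source" >/dev/null; then
  tar_roundtrip=ok
else
  tar_roundtrip=failed
fi
echo "tar round trip: $tar_roundtrip"
rm -rf "$work"
```

Comparing the restored tree with the original is a cheap way to gain confidence in a backup command before relying on it.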

With cpio:

$ find /source -xdev -print0 | cpio -ov --null > archive.cpio; gzip archive.cpio
$ zcat archive.cpio.gz | cpio -i

With afio:

$ find /source -xdev -print0 | afio -ovZ0 archive.afio
$ afio -ivZ archive.afio

Idioms for the copy

Here are several ways to copy the entire contents of the directory "/source" to "/dest", either locally or to a remote host.

With GNU cp and openSSH scp:

# cp -a /source /dest
# scp -pr /source user@host.dom:/dest

With GNU tar:

# (cd /source && tar cf - . ) | (cd /dest && tar xvfp - )
# (cd /source && tar cf - . ) | ssh user@host.dom '(cd /dest && tar xvfp - )'
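
The local tar pipe can be checked without root on a scratch tree. This is a minimal sketch, assuming GNU tar and GNU stat; the directory names and the 640 permission are arbitrary choices for illustration.

```shell
#!/bin/sh
# Copy a small tree through a tar pipe and check that the file mode survives.
work=$(mktemp -d)                       # hypothetical scratch location
mkdir -p "$work/source/sub" "$work/dest"
echo "payload" > "$work/source/sub/file"
chmod 640 "$work/source/sub/file"

# Same shape as the idiom above: pack in one directory, unpack in another.
(cd "$work/source" && tar cf - . ) | (cd "$work/dest" && tar xpf - )

mode_copy=$(stat -c %a "$work/dest/sub/file")   # permission bits of the copy
if diff -r "$work/source" "$work/dest" >/dev/null; then
  tar_copy=ok
else
  tar_copy=failed
fi
echo "copy: $tar_copy, mode: $mode_copy"
rm -rf "$work"
```

The "p" option on the extracting side is what keeps the permission bits independent of the current umask.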

With cpio:

# cd /source; find . -print0 | cpio -pvdm --null --sparse /dest

With afio:

# cd /source; find . -print0 | afio -pv0a /dest

The scp command can even copy files between remote hosts:

# scp -pr user1@host1.dom:/source user2@host2.dom:/dest

Idioms for the selection of files

The find(1) command is used to select files for archive and copy commands (see @{@idiomsforthearchive@}@ and @{@idiomsforthecopy@}@) or for the xargs(1) command (see @{@repeatingacommandloopingoverfiles@}@). The file selection can be refined by using its command arguments.

Basic syntax of find(1) can be summarized as:

This find(1) command is often used with an idiomatic style. For example:

# find /path/to \
    -xdev -regextype posix-extended \
    -type f -regex ".*\.afio|.*~" -prune -o \
    -type d -regex ".*/\.git" -prune -o \
    -type f -size +99M -prune -o \
    -type f -newer /path/to/timestamp -print0

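This performs the following actions:

  1. Search all files starting from "/path/to".
  2. Limit the search to the starting filesystem ("-xdev") and use POSIX extended regular expressions ("-regextype posix-extended").
  3. Exclude files matching the regex ".*\.afio" or ".*~".
  4. Exclude directories matching the regex ".*/\.git" and do not descend into them.
  5. Exclude files larger than 99 MiB ("-size +99M").
  6. Print the NUL-terminated names of the remaining regular files which are newer than "/path/to/timestamp".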

Please note the idiomatic use of "-prune -o" to exclude files in the above example.
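
The effect of "-prune -o" can be checked on a small scratch tree. The following is a minimal sketch assuming GNU find; the directory layout and file names are made up for illustration, and "-print" is used instead of "-print0" only to keep the output readable.

```shell
#!/bin/sh
# Build a scratch tree containing files that should and should not be selected.
work=$(mktemp -d)                       # hypothetical scratch location
mkdir -p "$work/tree/.git" "$work/tree/src"
touch "$work/tree/.git/config" \
      "$work/tree/src/main.c" \
      "$work/tree/src/main.c~" \
      "$work/tree/old.afio"

# Each "-prune -o" branch drops what it matched; only the last branch prints.
selected=$(cd "$work/tree" && find . -regextype posix-extended \
    -type f -regex ".*\.afio|.*~" -prune -o \
    -type d -regex ".*/\.git" -prune -o \
    -type f -print | sort)
echo "$selected"
rm -rf "$work"
```

Only "./src/main.c" is selected: the backup file, the afio archive and everything under ".git" are pruned before the final "-print" is reached.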

(!) On non-Debian Unix-like systems, some options may not be supported by find(1). In such a case, consider adjusting the matching methods and replacing "-print0" with "-print". You may need to adjust related commands too.

Backup and recovery

We all know that computers fail sometimes and that human errors damage systems and data. Backup and recovery operations are an essential part of successful system administration. All possible failure modes will hit you some day.

There are 3 key factors which determine actual backup and recovery policy:

  1. Knowing what to back up and recover.
    • Data files directly created by you: data in "~/"
    • Data files created by applications used by you: data in "/var/" (except "/var/cache/", "/var/run/", and "/var/tmp/").
    • System configuration files: data in "/etc/"
    • Local software: data in "/usr/local/" or "/opt/"
    • System installation information: a memo in plain text on key steps (partition, ...).
    • Proven set of data: experimenting with recovery operations in advance.
  2. Knowing how to back up and recover.
    • Secure storage of data: protection from overwrite and system failure.
    • Frequent backup: scheduled backup.
    • Redundant backup: data mirroring.
    • Foolproof process: easy single command backup.
  3. Assessing risks and costs involved.
    • Failure modes and their likelihood.
    • Value of data when lost.
    • Required resources for backup: human, hardware, software, ...

As for secure storage of data, data should be kept at least on different disk partitions, preferably on different disks and machines, to withstand filesystem corruption. Important data are best stored on write-once media such as CD/DVD-R to prevent overwrite accidents. (See @{@thebinarydata@}@ for how to write to the storage media from the shell command line. The Gnome desktop GUI environment gives you easy access via the menu: "Places->CD/DVD Creator".)

(!) You may wish to stop some application daemons such as MTA (see @{@mta@}@) while backing up data.

(!) You should take extra care with the backup and restoration of identity related data files such as "/etc/ssh/ssh_host_dsa_key", "/etc/ssh/ssh_host_rsa_key", "~/.gnupg/*", "~/.ssh/*", "/etc/passwd", "/etc/shadow", "/etc/fetchmailrc", "popularity-contest.conf", "/etc/ppp/pap-secrets", and "/etc/exim4/passwd.client". Some of these data cannot be regenerated by entering the same input string to the system.

(!) If you run a cron job as a user process, you need to restart it after the system restoration. See @{@scheduletasksregularly@}@ for cron(8) and crontab(1).

Backup utility suites

Here is a select list of notable backup utility suites available on the Debian system:

List of backup suite utilities.

||package||popcon||size||description||
||rdiff-backup||-||-||remote incremental backup||
||backupninja||-||-||lightweight, extensible meta-backup system||
||mondo||-||-||[http://en.wikipedia.org/wiki/Mondo_Rescue Mondo Rescue]: disaster recovery backup suite||
||dump||-||-||4.4[http://en.wikipedia.org/wiki/Berkeley_Software_Distribution BSD] dump(8) and restore(8) for [http://en.wikipedia.org/wiki/Ext2 ext2]/[http://en.wikipedia.org/wiki/Ext3 ext3] filesystems||
||sbackup||-||-||Simple Backup Suite for Gnome desktop||
||keep||-||-||backup system for KDE||
||bacula-common||-||-||[http://en.wikipedia.org/wiki/Bacula Bacula]: network backup, recovery and verification - common support files||
||bacula-client||-||-||[http://en.wikipedia.org/wiki/Bacula Bacula]: network backup, recovery and verification - client meta-package||
||bacula-console||-||-||[http://en.wikipedia.org/wiki/Bacula Bacula]: network backup, recovery and verification - text console||
||bacula-server||-||-||[http://en.wikipedia.org/wiki/Bacula Bacula]: network backup, recovery and verification - server meta-package||
||amanda-common||-||-||[http://en.wikipedia.org/wiki/Advanced_Maryland_Automatic_Network_Disk_Archiver Amanda]: Advanced Maryland Automatic Network Disk Archiver (Libs)||
||amanda-client||-||-||[http://en.wikipedia.org/wiki/Advanced_Maryland_Automatic_Network_Disk_Archiver Amanda]: Advanced Maryland Automatic Network Disk Archiver (Client)||
||amanda-server||-||-||[http://en.wikipedia.org/wiki/Advanced_Maryland_Automatic_Network_Disk_Archiver Amanda]: Advanced Maryland Automatic Network Disk Archiver (Server)||
||cdrw-taper||-||-||taper replacement for [http://en.wikipedia.org/wiki/Advanced_Maryland_Automatic_Network_Disk_Archiver Amanda] to support backups to CD-RW or DVD+RW||
||backuppc||-||-||[http://en.wikipedia.org/wiki/Backuppc BackupPC] is a high-performance, enterprise-grade system for backing up PCs (disk based)||
||backup-manager||-||-||command-line backup tool||
||backup2l||-||-||low-maintenance backup/restore tool for mountable media (disk based)||
||faubackup||-||-||backup system using a filesystem for storage (disk based)||

[http://en.wikipedia.org/wiki/Mondo_Rescue Mondo Rescue] facilitates restoration of complete system from backup CD/DVD etc. without going through normal system installation processes.

The dump package enables backup and restore of filesystems themselves with feature for incremental archiving and facilitates restoration of complete system too (see [http://dump.sourceforge.net/isdumpdeprecated.html "Is dump really deprecated?"]).

The sbackup and keep packages provide easy GUI access to regular backups of user data for desktop users. An equivalent function can be realized by a simple script (@{@anexamplescriptforthesystembackup@}@) and cron(8).

[http://en.wikipedia.org/wiki/Bacula Bacula], [http://en.wikipedia.org/wiki/Advanced_Maryland_Automatic_Network_Disk_Archiver Amanda], and [http://en.wikipedia.org/wiki/Backuppc BackupPC] are full featured backup suite utilities which are focused on regular backups over network.

An example script for the system backup

For a personal Debian desktop system running the unstable suite, I only need to protect personal and critical data. I reinstall the system once a year anyway. Thus I see no reason to back up the whole system or to install a full featured backup utility.

I use a simple script to make a backup archive and burn it onto CD/DVD using a GUI. Here is an example script for this.

# Copyright (C) 2007-2008 Osamu Aoki <osamu@debian.org>, Public Domain
BUUID=1000; USER=osamu # UID and name of a user who accesses backup files
BUDIR="/var/backups"
XDIR0=".+/Mail|.+/Desktop"
XDIR1=".+/\.thumbnails|.+/\.?Trash|.+/\.?[cC]ache|.+/\.gvfs|.+/sessions"
XDIR2=".+/CVS|.+/\.git|.+/\.svn|.+/Downloads|.+/Archive|.+/Checkout|.+/tmp"
XSFX=".+\.iso|.+\.tgz|.+\.tar\.gz|.+\.tar\.bz2|.+\.afio|.+\.tmp|.+\.swp|.+~"
SIZE="+99M"
DATE=$(date --utc +"%Y%m%d-%H%M")
[ -d "$BUDIR" ] || mkdir -p "$BUDIR"
umask 077
dpkg --get-selections \* > /var/lib/dpkg/dpkg-selections.list
debconf-get-selections > /var/cache/debconf/debconf-selections

{
find /etc /usr/local /opt /var/lib/dpkg/dpkg-selections.list \
     /var/cache/debconf/debconf-selections -xdev -print0
find /home/$USER /root -xdev -regextype posix-extended \
  -type d -regex "$XDIR0|$XDIR1" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
find /home/$USER/Mail/Inbox /home/$USER/Mail/Outbox -print0
find /home/$USER/Desktop  -xdev -regextype posix-extended \
  -type d -regex "$XDIR2" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
} | cpio -ov --null -O $BUDIR/BU$DATE.cpio
chown $BUUID $BUDIR/BU$DATE.cpio
touch $BUDIR/backup.stamp

This is meant to be a script example executed from root:

{i} You can recover debconf configuration data with "debconf-set-selections debconf-selections" and dpkg selection data with "dpkg --set-selections <dpkg-selections.list".

A copy script for the data backup

For the set of data under a directory tree, the copy with "cp -a" provides the normal backup.

For the set of large non-overwritten static data under a directory tree such as the data under the "/var/cache/apt/archives/" directory, hardlinks with "cp -al" provide an alternative to the normal backup with efficient use of the disk space.
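
That "cp -al" shares the data instead of duplicating it can be seen from the inode numbers. This is a minimal sketch assuming GNU cp and GNU stat; the directory and file names are made up for illustration.

```shell
#!/bin/sh
# Show that "cp -al" creates hardlinks: both names point to the same inode.
work=$(mktemp -d)                       # hypothetical scratch location
mkdir "$work/data"
echo "static payload" > "$work/data/pkg.deb"

cp -al "$work/data" "$work/data.bkup"   # directories are created, files are linked

inode_orig=$(stat -c %i "$work/data/pkg.deb")
inode_copy=$(stat -c %i "$work/data.bkup/pkg.deb")
links=$(stat -c %h "$work/data/pkg.deb")        # hardlink count
echo "inodes: $inode_orig $inode_copy, link count: $links"
rm -rf "$work"
```

Because both names refer to the same data, modifying such a file in place changes the "backup" as well. This is why the approach fits only data that are never overwritten.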

Here is a copy script, which I named bkup, for the data backup. This script copies all (non-VCS) files under the current directory to a dated directory in the parent directory or on a remote host.

# Copyright (C) 2007-2008 Osamu Aoki <osamu@debian.org>, Public Domain
function fdot(){ find . -type d \( -iname ".?*" -o -iname "CVS" \) -prune -o -print0;}
function fall(){ find . -print0;}
function mkdircd(){ mkdir -p "$1";chmod 700 "$1";cd "$1">/dev/null;}
FIND="fdot";OPT="-a";MODE="CPIOP";HOST="localhost";EXTP="$(hostname -f)"
BKUP="$(basename $(pwd)).bkup";TIME="$(date  +%Y%m%d-%H%M%S)";BU="$BKUP/$TIME"
while getopts gcCsStrlLaAxe:h:T f; do case $f in
g)  MODE="GNUCP";; # cp (GNU)
c)  MODE="CPIOP";; # cpio -p
C)  MODE="CPIOI";; # cpio -i
s)  MODE="CPIOSSH";; # cpio/ssh
S)  MODE="AFIOSSH";; # afio/ssh
t)  MODE="TARSSH";; # tar/ssh
r)  MODE="RSYNCSSH";; # rsync/ssh
l)  OPT="-alv";; # hardlink (GNU cp)
L)  OPT="-av";;  # copy (GNU cp)
a)  FIND="fall";; # find all
A)  FIND="fdot";; # find non CVS/ .???/
x)  set -x;; # trace
e)  EXTP="${OPTARG}";; # hostname -f
h)  HOST="${OPTARG}";; # user@remotehost.example.com
T)  MODE="TEST";; # test find mode
\?) echo "use -x for trace."
esac; done
shift $(expr $OPTIND - 1)
if [ $# -gt 0 ]; then
  for x in "$@"; do cp $OPT "$x" "$x.$TIME"; done
elif [ $MODE = GNUCP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU";cp $OPT . "../$BU/"
elif [ $MODE = CPIOP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU"
  $FIND|cpio --null --sparse -pvd ../$BU
elif [ $MODE = CPIOI ]; then
  $FIND|cpio -ov --null | ( mkdircd "../$BU"&&cpio -i )
elif [ $MODE = CPIOSSH ]; then
  $FIND|cpio -ov --null|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&&cpio -i )"
elif [ $MODE = AFIOSSH ]; then
  $FIND|afio -ov -0 -|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&&afio -i - )"
elif [ $MODE = TARSSH ]; then
  (tar cvf - . )|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&& tar xvfp - )"
elif [ $MODE = RSYNCSSH ]; then
  rsync -rlpt ./ "${HOST}:${EXTP}-${BKUP}-${TIME}"
else
  echo "Any other idea to backup?"
  $FIND |xargs -0 -n 1 echo
fi

These are meant to be command examples. Please read the script and test it yourself.

{i} I keep this bkup in my "/usr/local/bin/" directory. I issue bkup command without any option in the working directory whenever I need a temporary snapshot backup.

{i} For making snapshot history of a source file tree or a configuration file tree, it is easier and space efficient to use git(7) (see @{@gitforrecordingcigurationhistory@}@).

Removable mass storage device

Removable mass storage devices may be any one of

which are connected via [http://en.wikipedia.org/wiki/Universal_Serial_Bus USB], [http://en.wikipedia.org/wiki/IEEE_1394_interface IEEE 1394 / Firewire], [http://en.wikipedia.org/wiki/PC_card PC Card], etc.

These removable mass storage devices can be automatically mounted by a normal user under a modern desktop environment, such as Gnome using gnome-mount(1).

(!) Automounting under a modern desktop environment happens only when those removable media devices are not listed in "/etc/fstab".

{i} If a wrong mount option causes problems, erase the corresponding setting under "/system/storage/" via gconf-editor(1).

List of packages which permit normal users to mount removable devices without a matching "/etc/fstab" entry.

||package||popcon||size||description||
||gnome-mount||-||-||wrapper for (un)mounting and ejecting storage devices (used by Gnome)||
||pmount||-||-||mount removable devices as normal user (used by KDE)||
||cryptmount||-||-||Management and user-mode mounting of encrypted file systems||
||usbmount||-||-||automatically mount and unmount USB mass storage devices||

When sharing data with another system via a removable mass storage device, you should format it with a common [http://en.wikipedia.org/wiki/File_system filesystem] supported by both systems. Here is a list of filesystem choices.

List of filesystem choices for removable storage devices with typical usage scenarios.

||filesystem||typical usage scenario||
||[http://en.wikipedia.org/wiki/File_allocation_table FAT12]||Cross platform sharing of data on the floppy disk. (<=32MiB)||
||[http://en.wikipedia.org/wiki/File_allocation_table FAT16]||Cross platform sharing of data on the small harddisk like device. (<=2GiB)||
||[http://en.wikipedia.org/wiki/File_allocation_table FAT32]||Cross platform sharing of data on the large harddisk like device. (<=8TiB, supported by MS Windows95 OSR2 and newer)||
||[http://en.wikipedia.org/wiki/NTFS NTFS]||Cross platform sharing of data on the large harddisk like device. (supported natively on [http://en.wikipedia.org/wiki/Windows_NT MS Windows NT] and later versions, and supported by [http://en.wikipedia.org/wiki/NTFS-3G NTFS-3G] via [http://en.wikipedia.org/wiki/Filesystem_in_Userspace FUSE] on Linux)||
||[http://en.wikipedia.org/wiki/ISO_9660 ISO9660]||Cross platform sharing of static data on CD-R and DVD+/-R||
||[http://en.wikipedia.org/wiki/Universal_Disk_Format UDF]||Incremental data writing on CD-R and DVD+/-R (new)||
||[http://en.wikipedia.org/wiki/Minix_file_system MINIX filesystem]||Space efficient unix file data storage on the floppy disk.||
||[http://en.wikipedia.org/wiki/Ext2 ext2 filesystem]||Sharing of data on the harddisk like device with older Linux systems.||
||[http://en.wikipedia.org/wiki/Ext3 ext3 filesystem]||Sharing of data on the harddisk like device with current Linux systems. (Journaling file system)||

{i} See @{@removablediskencnwithdmcryptluks@}@ for cross platform sharing of data using device level encryption.

The FAT filesystem is supported by almost all modern operating systems and is quite useful for exchanging data via removable harddisk like media.

When formatting removable harddisk like devices for cross platform sharing of data with the FAT filesystem, the following should be safe choices:

When using the FAT or ISO9660 filesystems for sharing data, the following should be the safe considerations:

(!) For FAT filesystems, by design, the maximum file size is (2^32 - 1) bytes = (4GiB - 1 byte). For some applications on the older 32 bit OSs, the maximum file size was even smaller: (2^31 - 1) bytes = (2GiB - 1 byte). Debian does not suffer from the latter problem.
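
The two limits follow directly from the width of the size field and can be recomputed with shell arithmetic. This is a small sketch; the variable names are made up, and it assumes a shell with at least 64 bit arithmetic, as on current Debian systems.

```shell
#!/bin/sh
# Recompute the file size limits: an unsigned 32 bit size field gives 2^32 - 1,
# a signed 32 bit file offset gives 2^31 - 1.
gib=$(( 1 << 30 ))                      # bytes in 1 GiB
fat_max=$(( (1 << 32) - 1 ))            # FAT limit: 4 GiB - 1 byte
old_max=$(( (1 << 31) - 1 ))            # older 32 bit applications: 2 GiB - 1 byte
echo "FAT maximum file size:   $fat_max bytes"
echo "old 32 bit maximum size: $old_max bytes"
echo "in GiB (rounded up): $(( (fat_max + 1) / gib )) and $(( (old_max + 1) / gib ))"
```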

(!) Microsoft itself does not recommend using FAT for drives or partitions of over 200 MB. Microsoft highlights its shortcomings, such as inefficient disk space usage, in their "[http://support.microsoft.com/kb/100108/EN-US/ Overview of FAT, HPFS, and NTFS File Systems]". Of course, for Linux, we should normally use the ext3 filesystem.

{i} For more on filesystems and accessing filesystems, please read "[http://tldp.org/HOWTO/Filesystems-HOWTO.html Filesystems HOWTO]".

Sharing data via network

When sharing data with another system via a network, you should use a common service. Here are some hints.

List of network services to choose with the typical usage scenario.

||network service||typical usage scenario||
||[http://en.wikipedia.org/wiki/Server_Message_Block SMB/CIFS] network mounted filesystem with [http://en.wikipedia.org/wiki/Samba_(software) Samba]||Sharing files via "Microsoft Windows Network". See smb.conf(5) and [http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/ The Official Samba 3.2.x HOWTO and Reference Guide] or the samba-doc package.||
||[http://en.wikipedia.org/wiki/Network_File_System_(protocol) NFS] network mounted filesystem with the Linux kernel||Sharing files via "Unix/Linux Network". See exports(5) and [http://tldp.org/HOWTO/NFS-HOWTO/index.html Linux NFS-HOWTO].||
||[http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol HTTP] service||Sharing files between the web server/client.||
||[http://en.wikipedia.org/wiki/Https HTTPS] service||Sharing files between the web server/client with encrypted Secure Sockets Layer (SSL) or [http://en.wikipedia.org/wiki/Transport_Layer_Security Transport Layer Security] (TLS).||
||[http://en.wikipedia.org/wiki/File_Transfer_Protocol FTP] service||Sharing files between the FTP server/client.||

Although these filesystems mounted over network or file transfer methods over network are quite convenient for sharing data, these may be insecure. Their network connection must be secured by:

See also @{@othernetworkapplicationservers@}@ and @{@othernetworkapplicationclients@}@.

Archive media

When choosing [http://en.wikipedia.org/wiki/Computer_data_storage computer data storage media] for an important data archive, you should be careful about their limitations. For small personal data backups, I use CD-R and DVD-R from a brand name company and store them in a cool, dry, clean environment. (Tape archive media seem to be popular for professional use.)

(!) [http://en.wikipedia.org/wiki/Safe Fire-resistant safes] are usually meant for paper documents. Most computer data storage media have less temperature tolerance than paper. I usually rely on multiple secure encrypted copies stored in multiple secure locations.

Optimistic storage life of archive media seen on the net (mostly from vendor info):

These do not count on the mechanical failures due to handling etc.

Optimistic write cycle of archive media seen on the net (mostly from vendor info):

<!> Figures of storage life and write cycle here should not be used for decisions on any critical data storage. Please consult the specific product information provided by the manufacturer.

{i} Since CD/DVD-R and paper have only 1 write cycle, they inherently prevent accidental data loss by overwriting. This is an advantage!

{i} If you need fast and frequent backup of a large amount of data, a harddisk on a remote host linked by a fast network connection may be the only realistic option.

The binary data

Here, we discuss direct manipulation of the binary data on storage media. See @{@datastoragetips@}@, too.

Make the disk image file

The disk image file, disk.img, of an unmounted device, e.g., the second SCSI drive "/dev/sdb", can be made using cp(1) or dd(1):

# cp /dev/sdb disk.img
# dd if=/dev/sdb of=disk.img

The disk image of the traditional PC's [http://en.wikipedia.org/wiki/Master_boot_record master boot record (MBR)] (see @{@partitionconfiguration@}@), which resides in the first sector of the primary IDE disk, can be made in whole or in part by using dd(1):

# dd if=/dev/hda of=mbr.img bs=512 count=1
# dd if=/dev/hda of=mbr-nopart.img bs=446 count=1
# dd if=/dev/hda of=mbr-part.img skip=446 bs=1 count=66

If you have a SCSI device (including the new serial ATA drive) as the boot disk, substitute "/dev/hda" with "/dev/sda".

If you are making an image of a disk partition of the original disk, substitute "/dev/hda" with "/dev/hda1" etc.
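
The byte counts in the three dd(1) commands follow the traditional MBR layout: 446 bytes of boot code, then a 64 byte partition table and a 2 byte signature (66 bytes together), 512 bytes in total. The slicing can be rehearsed on an ordinary file without touching a real disk. This is only a sketch: "sector0.img" is a made-up stand-in filled with random bytes, and GNU stat is assumed.

```shell
#!/bin/sh
# Rehearse the MBR slicing on a 512 byte stand-in file instead of /dev/hda.
work=$(mktemp -d)                       # hypothetical scratch location
dd if=/dev/urandom of="$work/sector0.img" bs=512 count=1 2>/dev/null

dd if="$work/sector0.img" of="$work/mbr.img"        bs=512 count=1 2>/dev/null
dd if="$work/sector0.img" of="$work/mbr-nopart.img" bs=446 count=1 2>/dev/null
dd if="$work/sector0.img" of="$work/mbr-part.img"   skip=446 bs=1 count=66 2>/dev/null

size_full=$(stat -c %s "$work/mbr.img")
size_nopart=$(stat -c %s "$work/mbr-nopart.img")
size_part=$(stat -c %s "$work/mbr-part.img")

# The two partial images put back together must equal the full sector.
if cat "$work/mbr-nopart.img" "$work/mbr-part.img" | cmp -s - "$work/sector0.img"; then
  mbr_rejoin=ok
else
  mbr_rejoin=failed
fi
echo "sizes: $size_full $size_nopart $size_part, rejoin: $mbr_rejoin"
rm -rf "$work"
```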

Writing directly to the disk

The disk image file, "disk.img" can be written to an unmounted device, e.g., the second SCSI drive "/dev/sdb" with matching size, by dd(1):

# dd if=disk.img of=/dev/sdb

Similarly, the disk partition image file, "disk.img", can be written to an unmounted partition, e.g., the first partition of the second SCSI drive "/dev/sdb1" with matching size, by dd(1):

# dd if=disk.img of=/dev/sdb1

View and edit binary data

The most basic viewing method of binary data is to use "od -t x1" command.

List of packages which view and edit binary data.

||package||popcon||size||description||
||coreutils||-||-||This basic package has od(1) command to dump files in octal and other formats.||
||bsdmainutils||-||-||This utility package has hd(1) command to dump files in ASCII, decimal, hexadecimal, and octal formats.||
||hexedit||-||-||View and edit files in hexadecimal or in ASCII||
||bless||-||-||Full featured hexadecimal editor (Gnome)||
||khexedit||-||-||Full featured hexadecimal editor (KDE).||
||ncurses-hexedit||-||-||Edit files/disks in HEX, ASCII and EBCDIC||
||lde||-||-||Linux Disk Editor||
||beav||-||-||Binary editor and viewer for HEX, ASCII, EBCDIC, OCTAL, DECIMAL, and BINARY formats.||
||hexcat||-||-||Hexadecimal dumping utility||
||hex||-||-||Hexadecimal dumping tool for Japanese||

{i} HEX is used as an acronym for hexadecimal format.

Mount the disk image file

If disk.img contains an image of the disk contents and the original disk had a disk configuration which gives xxxx = (bytes/sector) * (sectors/cylinder), then the following will mount it to "/mnt":

# mount -o loop,offset=xxxx disk.img /mnt

Note that most hard disks have 512 bytes/sector. This offset is to skip MBR of the hard disk. You can skip offset in the above example, if "disk.img" contains

Manipulating files without mounting disk

There are tools to read and write files without mounting the disk.

List of packages to manipulate files without mounting.

||package||popcon||size||description||
||mtools||-||-||Utilities for MSDOS files without mounting them.||
||hfsutils||-||-||Utilities for HFS and HFS+ files without mounting them.||

Make the ISO9660 image file

The [http://en.wikipedia.org/wiki/ISO_9660 ISO9660] image file, cd.iso, from the source directory tree at source_directory can be made using genisoimage(1) command:

#  genisoimage -r -J -T -V volume_id -o cd.iso source_directory

Similarly, the bootable ISO9660 image file, cdboot.iso, can be made from a debian-installer like directory tree at source_directory:

#  genisoimage -r -o cdboot.iso -V volume_id \
   -b isolinux/isolinux.bin -c isolinux/boot.cat \
   -no-emul-boot -boot-load-size 4 -boot-info-table source_directory

Here [http://en.wikipedia.org/wiki/SYSLINUX Isolinux boot loader] (see @{@stagecthebootloader@}@) is used for booting.

Making the disk image directly from the CD-ROM device using cp(1) or dd(1) has a few problems. The first run of the dd(1) command may cause an error message and may yield a shorter disk image with a lost tail-end. The second run of the dd(1) command may yield a larger disk image with garbage data attached at the end on some systems if the data size is not specified. Only the second run of the dd(1) command with the correct data size specified, and without ejecting the CD after an error message, seems to avoid these problems. If, for example, the image size displayed by df(1) is 46301184 blocks, use the following command twice to get the right image (this is my empirical information):

# dd if=/dev/cdrom of=cd.iso bs=2048 count=$((46301184/2))

Writing directly to the CD/DVD-R/RW

{i} DVD is only a large CD to wodim(1).

You can find a usable device by:

# wodim --devices

Then insert a blank CD-R into the device and write the ISO9660 image file, "cd.iso", to this device, e.g., "/dev/hda", by wodim(1):

# wodim -v -eject dev=/dev/hda cd.iso

If CD-RW is used instead of CD-R, do this instead:

# wodim -v -eject blank=fast dev=/dev/hda cd.iso

{i} If your desktop system mounts the CD automatically, unmount it with "sudo umount /dev/hda" before issuing the wodim(1) command.

Mount the ISO9660 image file

If "cd.iso" contains an ISO9660 image, then the following will manually mount it to "/cdrom":

# mount -t iso9660 -o ro,loop cd.iso /cdrom

{i} Modern desktop systems mount removable media automatically (see @{@removablemassstoragedevice@}@).

Split a large file into small files

When a file is too big to back up as a single file, you can split it into, e.g., 2000MiB chunks before the backup and later merge those chunks back into the original large file.

$ split -b 2000m large_file
$ cat x* >large_file

<!> Please make sure you do not have any other file starting with "x" in the directory, to avoid a file name clash.
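
The split and merge cycle can be verified on a small file first. This is a minimal sketch: the 5 KiB test file, the 2048 byte chunk size and the "chunk." prefix are arbitrary choices for illustration. Giving split(1) an explicit prefix is one way to stay clear of other files whose names start with "x".

```shell
#!/bin/sh
# Split a small test file into chunks, merge them again, and compare.
work=$(mktemp -d)                       # hypothetical scratch location
dd if=/dev/urandom of="$work/large_file" bs=1024 count=5 2>/dev/null

# Produces chunk.aa, chunk.ab, chunk.ac (2048 + 2048 + 1024 bytes).
split -b 2048 "$work/large_file" "$work/chunk."
chunks=$(ls "$work"/chunk.* | wc -l)

# The shell expands the glob in sorted order, which is the order split wrote.
cat "$work"/chunk.* > "$work/merged_file"

if cmp -s "$work/large_file" "$work/merged_file"; then
  split_roundtrip=ok
else
  split_roundtrip=failed
fi
echo "chunks: $chunks, round trip: $split_roundtrip"
rm -rf "$work"
```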

Clear file contents

In order to clear the contents of a file such as a log file, do not use rm to delete the file and then create a new empty file, because the file may still be accessed in the interval between commands. The following is the safe way to clear the contents of the file.

$ :>file_to_be_cleared
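
The reason this is safe is that the file is truncated in place: it keeps its inode, so a process which has it open keeps writing to the same file. The following sketch shows this, assuming GNU stat; the file name is made up for illustration.

```shell
#!/bin/sh
# Truncate a file in place and confirm that the inode does not change.
work=$(mktemp -d)                       # hypothetical scratch location
log="$work/app.log"
echo "old entries" > "$log"

inode_before=$(stat -c %i "$log")
: > "$log"                              # the null command with a redirection
inode_after=$(stat -c %i "$log")
size_after=$(stat -c %s "$log")

echo "inode $inode_before -> $inode_after, size now $size_after"
rm -rf "$work"
```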

Dummy files

The following commands will create dummy or empty files:

$ dd if=/dev/zero    of=5kb.file bs=1k count=5
$ dd if=/dev/urandom of=7mb.file bs=1M count=7
$ touch zero.file
$ : > alwayszero.file
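
The difference between the last two commands shows up only when the file already exists: touch(1) keeps the contents and updates the time stamp, while ": >" always leaves an empty file. This is a small sketch assuming GNU dd and GNU stat; the scratch directory is made up, and the file names follow the example above.

```shell
#!/bin/sh
# Compare the resulting sizes of the dummy file commands.
work=$(mktemp -d)                       # hypothetical scratch location
dd if=/dev/zero of="$work/5kb.file" bs=1k count=5 2>/dev/null
size_5kb=$(stat -c %s "$work/5kb.file")         # 5 * 1024 bytes of zeros

# Start both files with 5 bytes of content ("data" plus a newline).
echo "data" > "$work/zero.file"
echo "data" > "$work/alwayszero.file"

touch "$work/zero.file"                 # contents are kept
: > "$work/alwayszero.file"             # contents are discarded

size_touch=$(stat -c %s "$work/zero.file")
size_trunc=$(stat -c %s "$work/alwayszero.file")
echo "5kb.file: $size_5kb, after touch: $size_touch, after ': >': $size_trunc"
rm -rf "$work"
```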

Erase entire harddisk

There are several ways to completely erase data from an entire harddisk-like device, e.g., USB memory stick at "/dev/sda".

<!> Check your USB memory stick location with the "mount" command first before executing commands here. The device pointed to by "/dev/sda" may be the SCSI harddisk or serial-ATA harddisk where your entire system resides.

# dd if=/dev/zero of=/dev/sda

# dd if=/dev/urandom of=/dev/sda

# shred -v -n 1 /dev/sda

Since the dd command is available from the shell of many bootable Linux CDs such as Debian installer CD, you can erase your installed system completely by running an erase command from such media on the system hard disk, e.g., "/dev/hda", "/dev/sda", etc.

Undelete deleted but still open file

Even if you have accidentally deleted a file, as long as that file is still being used by some application (read or write mode), it is possible to recover such a file.

$ echo foo > bar
$ less bar

$ ps aux | grep ' less[ ]'
bozo    4775  0.0  0.0  92200   884 pts/8    S+   00:18   0:00 less bar
$ rm bar
$ ls -l /proc/4775/fd | grep bar
lr-x------ 1 bozo bozo 64 2008-05-09 00:19 4 -> /home/bozo/bar (deleted)
$ cat /proc/4775/fd/4 >bar
$ ls -l
-rw-r--r-- 1 bozo bozo 4 2008-05-09 00:25 bar
$ cat bar
foo

$ ls -li bar
2228329 -rw-r--r-- 1 bozo bozo 4 2008-05-11 11:02 bar
$ lsof |grep bar|grep less
less 4775 bozo 4r REG 8,3 4 2228329 /home/bozo/bar
$ rm bar
$ lsof |grep bar|grep less
less 4775 bozo 4r REG 8,3 4 2228329 /home/bozo/bar (deleted)
$ cat /proc/4775/fd/4 >bar
$ ls -li bar
2228302 -rw-r--r-- 1 bozo bozo 4 2008-05-11 11:05 bar
$ cat bar
foo
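
The same recovery can be scripted without a second terminal by letting the shell itself hold the file open. This is a minimal sketch that relies on the Linux "/proc" filesystem; the file names and the use of descriptor 3 are arbitrary choices for illustration.

```shell
#!/bin/sh
# Recover a deleted file through a descriptor which is still open.
work=$(mktemp -d)                       # hypothetical scratch location
echo foo > "$work/bar"

exec 3< "$work/bar"                     # this shell now holds the file open
rm "$work/bar"                          # the name is gone, the data is not

# /proc/<pid>/fd/<n> still leads to the data while the descriptor is open.
cat "/proc/$$/fd/3" > "$work/bar.recovered"
exec 3<&-                               # close the descriptor again

recovered=$(cat "$work/bar.recovered")
echo "recovered contents: $recovered"
rm -rf "$work"
```

Once the last process closes the file, the data can no longer be reached this way.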

Files with hardlinks can be identified by "ls -li", e.g.:

$ ls -li
total 0
2738405 -rw-r--r-- 1 root root 0 2008-09-15 20:21 bar
2738404 -rw-r--r-- 2 root root 0 2008-09-15 20:21 baz
2738404 -rw-r--r-- 2 root root 0 2008-09-15 20:21 foo

Both "baz" and "foo" have a link count of "2" (>1), showing them to have hardlinks. They share the inode number "2738404", which means they are the same hardlinked file. If you cannot easily locate all the hardlinked files, you can search for them by the inode number, e.g., "2738404":

# find /path/to/mount/point -xdev -inum 2738404 
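
The lookup can be tried without root in a scratch directory. This is a small sketch using GNU stat(1) to read the inode number and link count; the file names mirror the listing above and the directory is a temporary one.

```shell
# Create two names for one inode and find them again by inode number.
workdir=$(mktemp -d)
touch "$workdir/foo" "$workdir/bar"
ln "$workdir/foo" "$workdir/baz"        # baz becomes a second name for foo's inode

inode=$(stat -c %i "$workdir/foo")      # inode number of foo
nlink=$(stat -c %h "$workdir/foo")      # link count of foo

# List every name in the tree that shares the inode, staying on one filesystem.
names=$(find "$workdir" -xdev -inum "$inode" | sort)
echo "$names"
```

The "-xdev" option matters because inode numbers are unique only within a single filesystem.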

Invisible disk space consumption

All deleted but still open files consume disk space although they are not visible to normal du(1). They can be listed with their size by:

# lsof -s -X / |grep deleted
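
Where lsof is not at hand, the effect can be reproduced with "/proc" alone. The following sketch assumes Linux; the file name "big.log", the 1 MiB size, and descriptor 3 are arbitrary.

```shell
# Show that a deleted but still open file keeps its size while its name is gone.
workdir=$(mktemp -d)

# Hold the file open for writing on descriptor 3 and write 1 MiB into it.
exec 3> "$workdir/big.log"
dd if=/dev/zero bs=1024 count=1024 >&3 2>/dev/null
rm "$workdir/big.log"

# The directory now looks empty ...
visible_files=$(ls -A "$workdir" | wc -l)
# ... but the open descriptor still refers to 1048576 bytes of data.
hidden_bytes=$(stat -L -c %s /proc/self/fd/3)
link_target=$(readlink /proc/self/fd/3)
echo "$visible_files visible file(s), $hidden_bytes hidden bytes: $link_target"

# Closing the descriptor releases the space.
exec 3>&-
```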

Data security infrastructure

The data security infrastructure is provided by a combination of data encryption tools, message digest tools, and signature tools.

List of data security infrastructure tools.

||package||popcon||size||function||
||gnupg||-||-||[http://en.wikipedia.org/wiki/GNU_Privacy_Guard GNU Privacy Guard] - OpenPGP encryption and signing tool, gpg(1)||
||gnupg-doc||-||-||GNU Privacy Guard documentation||
||gpgv||-||-||GNU Privacy Guard - signature verification tool||
||cryptsetup||-||-||Utilities for [http://en.wikipedia.org/wiki/Dm-crypt dm-crypt] block device encryption supporting [http://en.wikipedia.org/wiki/Linux_Unified_Key_Setup LUKS]||
||ecryptfs-utils||-||-||Utilities for [http://ecryptfs.sourceforge.net/ ecryptfs] stacked filesystem encryption||
||coreutils||-||-||The md5sum command computes and checks MD5 message digests||
||coreutils||-||-||The sha1sum command computes and checks SHA1 message digests||
||openssl||-||-||The "openssl dgst" command computes message digests (OpenSSL), dgst(1ssl)||

See @{@dataencryptiontips@}@ on [http://en.wikipedia.org/wiki/Dm-crypt dm-crypt] and [http://ecryptfs.sourceforge.net/ ecryptfs], which implement automatic data encryption infrastructure via Linux kernel modules.

Key management for GnuPG

Here are [http://en.wikipedia.org/wiki/GNU_Privacy_Guard GNU Privacy Guard] commands for basic key management:

List of GNU Privacy Guard commands for the key management

||command||effects||
||gpg --gen-key||generate a new key||
||gpg --gen-revoke my_user_ID||generate a revocation certificate for my_user_ID||
||gpg --edit-key user_ID||edit the key interactively, "help" for help||
||gpg -o file --export||export all keys to file||
||gpg --import file||import all keys from file||
||gpg --send-keys user_ID||send the key of user_ID to the keyserver||
||gpg --recv-keys user_ID||receive the key of user_ID from the keyserver||
||gpg --list-keys user_ID||list the keys of user_ID||
||gpg --list-sigs user_ID||list the signatures of user_ID||
||gpg --check-sigs user_ID||check the signatures of user_ID||
||gpg --fingerprint user_ID||check the fingerprint of user_ID||
||gpg --refresh-keys||update the local keyring||

Here is the meaning of the trust codes:

List of the meaning of trust code.

||code||trust||
||-||No owner trust assigned / not yet calculated.||
||e||Trust calculation has failed.||
||q||Not enough information for calculation.||
||n||Never trust this key.||
||m||Marginally trusted.||
||f||Fully trusted.||
||u||Ultimately trusted.||

The following will upload my key "A8061F32" to the popular keyserver "hkp://subkeys.pgp.net":

$ gpg --keyserver hkp://subkeys.pgp.net --send-keys A8061F32

A good default keyserver setup in "~/.gnupg/gpg.conf" (or the old location "~/.gnupg/options") contains the following:

keyserver hkp://subkeys.pgp.net

The following will obtain unknown keys from the keyserver:

$ gpg --list-sigs | \
  sed -n '/^sig.*\[User ID not found\]/s/^sig..........\(\w\w*\)\W.*/\1/p' |\
  sort | uniq | xargs gpg --recv-keys

There was a bug in [http://sourceforge.net/projects/pks/ OpenPGP Public Key Server] (versions before 0.9.6) which corrupted keys with multiple subkeys. Newer gnupg (>1.2.1-2) can handle these corrupted subkeys. See the repair-pks-subkey-bug option in the gpg(1) manpage.

Using GnuPG with files

File handling:

List of GNU Privacy Guard commands on files

||command||effects||
||gpg -a -s file||sign file into ASCII-armored file.asc||
||gpg --armor --sign file||ditto||
||gpg --clearsign file||clear-sign message||
||gpg --clearsign --not-dash-escaped patchfile||clear-sign patchfile||
||gpg --verify file||verify clear-signed file||
||gpg -o file.sig -b file||create detached signature||
||gpg -o file.sig --detach-sig file||ditto||
||gpg --verify file.sig file||verify file with file.sig||
||gpg -o crypt_file.gpg -r name -e file||public-key encryption intended for name from file to binary crypt_file.gpg||
||gpg -o crypt_file.gpg --recipient name --encrypt file||ditto||
||gpg -o crypt_file.asc -a -r name -e file||public-key encryption intended for name from file to ASCII-armored crypt_file.asc||
||gpg -o crypt_file.gpg -c file||symmetric encryption from file to crypt_file.gpg||
||gpg -o crypt_file.gpg --symmetric file||ditto||
||gpg -o crypt_file.asc -a -c file||symmetric encryption from file to ASCII-armored crypt_file.asc||
||gpg -o file -d crypt_file.gpg||decryption||
||gpg -o file --decrypt crypt_file.gpg||ditto||

Using GnuPG with Mutt

Add the following to ~/.muttrc to keep a slow GnuPG from automatically starting, while allowing it to be used by typing "S" at the index menu.

macro index S ":toggle pgp_verify_sig\n"
set pgp_verify_sig=no

Using GnuPG with Vim

The gnupg plugin lets you run GnuPG transparently for files with the extensions .gpg, .asc, and .pgp.

# aptitude install vim-scripts vim-addon-manager
$ vim-addons install gnupg

The MD5 sum

The md5sum program provides a utility to make a digest file using the method in [http://tools.ietf.org/html/rfc1321 RFC 1321] and to verify each file with it.

$ md5sum foo bar >baz.md5
$ cat baz.md5
d3b07384d113edec49eaa6238ad5ff00  foo
c157a79031e1c40f85931829bc5fc552  bar
$ md5sum -c baz.md5
foo: OK
bar: OK

(!) The computation of the MD5 sum is less CPU intensive than that of a cryptographic signature by GnuPG. Usually, only the top level digest file is cryptographically signed to ensure data integrity.
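
What "md5sum -c" reports when a file has changed can be seen in a scratch directory. This sketch reuses the file names from the example above and appends a line to "bar" to simulate corruption; LC_ALL=C is set only to keep the report in English.

```shell
# Detect a modified file with a previously recorded digest file.
workdir=$(mktemp -d)
echo foo > "$workdir/foo"
echo bar > "$workdir/bar"

# Record the digests with relative names, as in the example above.
(cd "$workdir" && md5sum foo bar > baz.md5)

# Simulate corruption of one file.
echo tampered >> "$workdir/bar"

# md5sum -c exits non-zero if any file fails; keep its report and exit status.
check_status=0
check_report=$(cd "$workdir" && LC_ALL=C md5sum -c baz.md5 2>/dev/null) || check_status=$?
echo "$check_report"
```

The non-zero exit status makes the check easy to use in scripts.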

Source code merge tools

There are many merge tools for source code. The following commands caught my eye:

List of source code merge tools.

||command||package||popcon||size||description||
||diff(1)||diff||37745||-||This compares files line by line.||
||diff3(1)||diff||37745||-||This compares and merges three files line by line.||
||vimdiff(1)||vim||15655||-||This compares 2 files side by side in vim.||
||patch(1)||patch||8068||-||This applies a diff file to an original.||
||dpatch(1)||dpatch||1446||-||This manages a series of patches for a Debian package.||
||diffstat(1)||diffstat||1008||-||This produces a histogram of changes by the diff.||
||combinediff(1)||patchutils||759||-||This creates a cumulative patch from two incremental patches.||
||dehtmldiff(1)||patchutils||x||-||This extracts a diff from an HTML page.||
||filterdiff(1)||patchutils||x||-||This extracts or excludes diffs from a diff file.||
||fixcvsdiff(1)||patchutils||x||-||This fixes diff files created by CVS that "patch" misinterprets.||
||flipdiff(1)||patchutils||x||-||This exchanges the order of two patches.||
||grepdiff(1)||patchutils||x||-||This shows which files are modified by a patch matching a regex.||
||interdiff(1)||patchutils||x||-||This shows differences between two unified diff files.||
||lsdiff(1)||patchutils||x||-||This shows which files are modified by a patch.||
||recountdiff(1)||patchutils||x||-||This recomputes counts and offsets in unified context diffs.||
||rediff(1)||patchutils||x||-||This fixes offsets and counts of a hand-edited diff.||
||splitdiff(1)||patchutils||x||-||This separates out incremental patches.||
||unwrapdiff(1)||patchutils||x||-||This demangles patches that have been word-wrapped.||
||wiggle(1)||wiggle||451||-||This applies rejected patches.||
||quilt(1)||quilt||430||-||This manages a series of patches.||
||meld(1)||meld||256||-||This is a GTK graphical file comparator and merge tool.||
||xxdiff(1)||xxdiff||182||-||This is a plain X graphical file comparator and merge tool.||
||dirdiff(1)||dirdiff||61||-||This displays and merges changes between directory trees.||
||docdiff(1)||docdiff||38||-||This compares two files word by word / char by char.||
||imediff2(1)||imediff2||24||-||This is an interactive full screen 2-way merge tool.||
||makepatch(1)||makepatch||20||-||This generates extended patch files.||
||applypatch(1)||makepatch||20||-||This applies extended patch files.||
||wdiff(1)||wdiff||16||-||This displays word differences between text files.||

Extract differences for source files

Either of the following commands extracts the differences between two source files and creates a unified diff file, file.patch0 or file.patch1, depending on the file location:

$ diff -u file.old file.new > file.patch0
$ diff -u old/file new/file > file.patch1

Merge updates for source files

The diff file (alternatively called a patch file) is used to send a program update. The receiving party applies this update to their copy of the file by:

$ patch -p0 file < file.patch0
$ patch -p1 file < file.patch1
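
The effect of "-p1" is easiest to see in a complete round trip. In this sketch the directory names "old", "new", and "target" and the three-line file are made up for illustration, with "target" playing the receiving party's copy.

```shell
# Create a patch from old/file and new/file, then apply it to a third copy.
workdir=$(mktemp -d)
mkdir "$workdir/old" "$workdir/new" "$workdir/target"
printf 'alpha\nbeta\ngamma\n' > "$workdir/old/file"
printf 'alpha\nBETA\ngamma\n' > "$workdir/new/file"
cp "$workdir/old/file" "$workdir/target/file"    # the recipient's unpatched copy

# diff exits with 1 when the files differ, so that status is not an error here.
(cd "$workdir" && diff -u old/file new/file > file.patch1) || true

# -p1 strips the leading "old/" or "new/" component from the names in the patch.
(cd "$workdir/target" && patch -s -p1 < ../file.patch1)

if cmp -s "$workdir/target/file" "$workdir/new/file"; then
    patch_result=identical
else
    patch_result=different
fi
echo "$patch_result"
```

With "-p0" the names in the patch would be used unchanged, which is why file.patch0 above is created without directory prefixes.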

3 way merge updates

If you have three versions of source code, you can merge them more effectively using diff3:

$ diff3 -m file.mine file.old file.yours > file
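
A tiny worked case, with made-up five-line files, shows what the merge does when the two sides change different lines of the common ancestor:

```shell
# Merge two independent edits of the same ancestor file with diff3 -m.
workdir=$(mktemp -d)
printf '%s\n' a b c d e > "$workdir/file.old"      # common ancestor
printf '%s\n' A b c d e > "$workdir/file.mine"     # my edit: first line
printf '%s\n' a b c d E > "$workdir/file.yours"    # your edit: last line

(cd "$workdir" && diff3 -m file.mine file.old file.yours > file)

# Join the merged lines with spaces for compact display.
merged=$(paste -s -d ' ' "$workdir/file")
echo "$merged"
```

When both sides change the same lines, diff3 instead writes conflict markers into the output and exits with status 1, and the conflict has to be resolved by hand.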

Version control systems

Here is a summary of the [http://en.wikipedia.org/wiki/Revision_control version control systems (VCS)] on the Debian system:

(!) If you are new to VCS, you should start learning with Git, which is growing fast in popularity.

List of version control system tools.

||package||popcon||size||tool||VCS type||comment||
||cssc||7||-||[http://cssc.sourceforge.net/ CSSC]||local||Clone of the [http://en.wikipedia.org/wiki/Source_Code_Control_System Unix SCCS] (deprecated)||
||rcs||1658||-||[http://en.wikipedia.org/wiki/Revision_Control_System RCS]||local||"[http://en.wikipedia.org/wiki/Source_Code_Control_System Unix SCCS] done right"||
||cvs||4265||-||[http://en.wikipedia.org/wiki/Concurrent_Versions_System CVS]||remote||The previous standard remote VCS||
||subversion||5276||-||[http://en.wikipedia.org/wiki/Subversion_(software) Subversion]||remote||"CVS done right", the new de facto standard remote VCS||
||git-core||512||-||[http://en.wikipedia.org/wiki/Git_(software) Git]||distributed||fast DVCS in C (used by the Linux kernel and others)||
||mercurial||256||-||[http://en.wikipedia.org/wiki/Mercurial_(software) Mercurial]||distributed||DVCS in Python and some C.||
||bzr||158||-||[http://en.wikipedia.org/wiki/Bazaar_(software) Bazaar]||distributed||DVCS influenced by tla, written in Python (used by [http://www.ubuntu.com/ Ubuntu])||
||darcs||-||-||[http://en.wikipedia.org/wiki/Darcs Darcs]||distributed||DVCS with smart algebra of patches (slow).||
||tla||-||-||[http://en.wikipedia.org/wiki/GNU_arch GNU arch]||distributed||DVCS mainly by Tom Lord. (Historic)||
||monotone||88||-||[http://en.wikipedia.org/wiki/Monotone_(software) Monotone]||distributed||DVCS in C++||

VCS is sometimes known as a revision control system (RCS) or software configuration management (SCM).

Distributed VCSs such as Git are the tools of choice these days. CVS and Subversion may still be useful for joining some existing open source projects.

Debian provides free VCS services via [http://alioth.debian.org/ Debian Alioth service]. It supports practically all VCSs. Its documentation can be found at http://wiki.debian.org/Alioth .

<!> The git package is "GNU Interactive Tools" which is not the DVCS.

Native VCS commands

Here is an oversimplified comparison of native VCS commands to provide the big picture. The typical command sequence may require options and arguments.

Comparison of native VCS commands.

||CVS||Subversion||Git||function||
||cvs init||svnadmin create||git init||create the (local) repository||
||cvs login||-||-||login to the remote repository||
||cvs co||svn co||git clone||check out the remote repository as the working tree||
||cvs up||svn up||git pull||update the working tree by merging the remote repository||
||cvs add||svn add||git add .||add file(s) in the working tree to the VCS||
||cvs rm||svn rm||git rm||remove file(s) in the working tree from the VCS||
||cvs ci||svn ci||-||commit changes to the remote repository||
||-||-||git commit -a||commit changes to the local repository||
||-||-||git push||update the remote repository by the local repository||
||cvs status||svn status||git status||display the working tree status from the VCS||
||cvs diff||svn diff||git diff||diff <reference_repository> <working_tree>||
||-||-||git repack -a -d; git prune||repack the local repository into a single pack.||

<!> Invoking a git subcommand as "git-xyzzy" from the command line has been deprecated since early 2006.

{i} Git can work directly with different VCS repositories such as ones provided by CVS and Subversion, and provides the local repository for local changes with the git-cvs and git-svn packages. See [http://www.kernel.org/pub/software/scm/git/docs/gitcvs-migration.html git for CVS users], [http://live.gnome.org/GitForGnomeDevelopers Git for GNOME developers] and @{@git@}@.

{i} Git has commands which have no equivalents in CVS and Subversion, e.g., "fetch", "rebase", "cherry-pick", ...
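
The split between "git commit" (local) and "git push" (remote) in the comparison above can be tried entirely on one machine. In this sketch a bare repository in a temporary directory stands in for the remote server; the paths, the README file, and the committer identity are illustrative assumptions.

```shell
# Commit locally first, then publish the commit to a stand-in "remote".
workdir=$(mktemp -d)
git init -q --bare "$workdir/remote.git"                         # plays the remote repository
git clone -q "$workdir/remote.git" "$workdir/work" 2>/dev/null   # warning about the empty repository silenced

echo hello > "$workdir/work/README"
git -C "$workdir/work" add .
# The identity is passed per command so the sketch does not depend on ~/.gitconfig.
git -C "$workdir/work" -c user.name="Example User" -c user.email=user@example.com \
    commit -q -m "Initial commit"

# The commit exists only in the local repository until it is pushed.
local_commits=$(git -C "$workdir/work" rev-list --count HEAD)
git -C "$workdir/work" push -q origin HEAD
remote_commits=$(git --git-dir="$workdir/remote.git" rev-list --count --all)
echo "local: $local_commits, remote: $remote_commits"
```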

CVS

Check

for detailed information.

Installing a CVS server

The following setup will allow commits to the CVS repository only by a member of the "src" group, and administration of CVS only by a member of the "staff" group, thus reducing the chance of shooting yourself in the foot.

# cd /var/lib; umask 002; mkdir cvs
# export CVSROOT=/var/lib/cvs
# cd $CVSROOT
# chown root:src .
# chmod 2775 .
# cvs -d $CVSROOT init
# cd CVSROOT
# chown -R root:staff .
# chmod 2775 .
# touch val-tags
# chmod 664 history val-tags
# chown root:src history val-tags

You may restrict the creation of new projects by changing the owner of the "$CVSROOT" directory to "root:staff" and its permissions to "3775".

Use local CVS server

The following will set up shell environments for the local access to the CVS repository:

$ export CVSROOT=/var/lib/cvs

Use remote CVS pserver

The following will set up the shell environment for remote access to the CVS repository without SSH (using the pserver protocol of cvs):

$ export CVSROOT=:pserver:account@cvs.foobar.com:/var/lib/cvs
$ cvs login

This is prone to eavesdropping attack.

Anonymous CVS (download only)

The following will set up shell environments for the read-only remote access to the CVS repository:

$ export CVSROOT=:pserver:anonymous@cvs.sf.net:/cvsroot/qref
$ cvs login
$ cvs -z3 co qref

Use remote CVS through ssh

The following will set up the shell environment for remote access to the CVS repository with SSH:

$ export CVSROOT=:ext:account@cvs.foobar.com:/var/lib/cvs

or for SourceForge:

$ export CVSROOT=:ext:account@cvs.sf.net:/cvsroot/qref

You can also use public key authentication for SSH which eliminates the password prompt.

Create a new CVS archive

Assume the following values for the new CVS archive:

Assumption for the CVS archive.

||ITEM||VALUE||MEANING||
||source tree||~/project-x||All source code||
||Project name||project-x||Name for this project||
||Vendor Tag||Main-branch||Tag for the entire branch||
||Release Tag||Release-initial||Tag for a specific release||

Then, import the source tree:

$ cd ~/project-x

$ cvs import -m "Start project-x" project-x Main-branch Release-initial
$ cd ..; rm -R ~/project-x

Work with CVS

To work with project-x using the local CVS repository:

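$ mkdir -p /path/to; cd /path/to
$ cvs co project-x                      # get sources from the repository
$ cd project-x
$ cvs diff -u                           # show local changes as a unified diff
$ cvs up -C modified_file               # undo changes to a file
$ cvs ci -m "Describe change"           # save local sources to CVS

$ vi newfile_added
$ cvs add newfile_added
$ cvs ci -m "Added newfile_added"
$ cvs up                                # merge the latest version from CVS

$ cvs tag Release-1                     # add release tag
$ cvs tag -d Release-1                  # remove release tag
$ cvs ci -m "more comments"
$ cvs tag Release-1                     # re-add release tag

$ cd /path/to
$ cvs co -r Release-initial -d old project-x   # check out the original version into "old"
$ cd old
$ cvs tag -b Release-initial-bugfixes   # create a branch (-b) tag
$ cvs update -d -P                      # -d builds new directories, -P prunes empty ones
$ cvs up -d -P                          # sync with files modified by others
$ cvs ci -m "check into this branch"
$ cvs update -kk -A -d -P               # reset sticky tags (-A), no keyword expansion (-kk)
$ cvs update -kk -d -P -j Release-initial-bugfixes   # merge the branch into the working tree
$ cvs ci -m "merge Release-initial-bugfixes"

$ cd /path/to
$ tar -cvzf old-project-x.tar.gz old    # make an archive of the working tree
$ cvs release -d old                    # remove the local working tree (optional)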

Notable options for CVS commands (use as first argument(s) to cvs).

||option||meaning||
||-n||dry run, no effect||
||-t||display messages showing steps of cvs activity||

Export files from CVS

To get the latest version from CVS, use "tomorrow":

$ cvs ex -D tomorrow module_name

Administer CVS

Add alias to a project (local server):

$ export CVSROOT=/var/lib/cvs
$ cvs co CVSROOT/modules
$ cd CVSROOT
$ echo "px -a project-x" >>modules
$ cvs ci -m "Now px is an alias for project-x"
$ cvs release -d .
$ cvs co -d project px

$ cd project

To perform the above procedure, you need the appropriate file permissions.

File permissions in repository

CVS does not overwrite the current repository file but replaces it with another one. Thus, write permission to the repository directory is critical. For every new repository, run the following to ensure this condition if needed.

# cd /var/lib/cvs
# chown -R root:src repository
# chmod -R ug+rwX   repository
# chmod    2775     repository
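
The combined effect of "ug+rwX" and "2775" can be checked on a scratch tree without root. In this sketch the directory layout and the file name "file,v" are made up, and the chown step is left out because it needs root.

```shell
# Apply the repository permission fix to a scratch tree and inspect the modes.
workdir=$(mktemp -d)
repo="$workdir/repository"
mkdir -p "$repo/module"
echo data > "$repo/module/file,v"

# Start from owner-only permissions.
chmod 700 "$repo" "$repo/module"
chmod 600 "$repo/module/file,v"

chmod -R ug+rwX "$repo"    # X adds execute only to directories and already executable files
chmod 2775 "$repo"         # setgid bit: new entries inherit the directory's group

top_mode=$(stat -c %a "$repo")
dir_mode=$(stat -c %a "$repo/module")
file_mode=$(stat -c %a "$repo/module/file,v")
echo "$top_mode $dir_mode $file_mode"
```

The capital "X" is what keeps plain repository files from becoming executable while still making every directory searchable by the group.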

Execution bit

A file's execution bit is retained when checked out. Whenever you see execution permission problems in checked-out files, change permissions of the file in the CVS repository with the following command.

# chmod ugo-x filename

Subversion

Subversion is a "next-generation" version control system, intended to replace CVS, so it has most of CVS's features. Generally, Subversion's interface to a particular feature is similar to CVS's, except where there's a compelling reason to do otherwise.

Installing a Subversion server

You need to install the subversion, libapache2-svn and subversion-tools packages to set up a server.

Setting up a repository

Currently, the subversion package does not set up a repository, so one must be set up manually. One possible location for a repository is in "/var/local/repos".

Create the directory:

# mkdir -p /var/local/repos

Create the repository database:

# svnadmin create /var/local/repos

Make the repository writable by the WWW server:

# chown -R www-data:www-data /var/local/repos

Configuring Apache2

To allow access to the repository via user authentication, add (or uncomment) the following in "/etc/apache2/mods-available/dav_svn.conf":

<Location /repos>
  DAV svn
  SVNPath /var/local/repos
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /etc/subversion/passwd
  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>

Then, create a user authentication file with the command:

# htpasswd2 -c /etc/subversion/passwd some-username

Restart Apache2, and your new Subversion repository will be accessible with the URL http://hostname/repos.

Subversion usage examples

The following sections teach you how to use different commands in Subversion.

Create a new Subversion archive

To create a new Subversion archive, type the following:

$ cd ~/your-project         # go to your source directory
$ svn import . http://localhost/repos/project-name -m "initial project import"

This creates a directory named project-name in your Subversion repository which contains your project files. Look at http://localhost/repos/ to see if it's there.

Working with Subversion

Working with project-y using Subversion:

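$ mkdir -p /path/to; cd /path/to
$ svn co http://localhost/repos/project-y     # check out sources
$ cd project-y
$ svn diff                                    # show local changes
$ svn revert modified_file                    # undo changes to a file
$ svn ci -m "Describe changes"                # commit changes to the repository

$ vi newfile_added
$ svn add newfile_added
$ svn add new_dir                             # add new_dir and everything in it
$ svn add -N new_dir2                         # add new_dir2 only, non-recursively
$ svn ci -m "Added newfile_added, new_dir, new_dir2"
$ svn up                                      # merge the latest version from the repository
$ svn log                                     # show the change history

$ svn copy http://localhost/repos/project-y \
      http://localhost/repos/project-y-branch \
      -m "creating my branch of project-y"    # a branch is a copy

$ svn copy http://localhost/repos/project-y \
      http://localhost/repos/project-y-release1.0 \
      -m "project-y 1.0 release"              # a release tag is a copy, too

$ svn merge http://localhost/repos/project-y \
   http://localhost/repos/project-y-branch    # merge the differences into the working copy

$ svn co -r 4 http://localhost/repos/project-y   # check out revision 4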

Git

Git can do everything for both local and remote source code management. This means that you can record the source code changes without needing network connectivity to the remote repository.

Before using Git

You may wish to set several global configuration values in ~/.gitconfig, such as your name and email address used by Git:

$ git config --global user.name "Name Surname"
$ git config --global user.email yourname@example.com

If you are too used to CVS or Subversion commands, you may wish to set several command aliases:

$ git config --global alias.ci "commit -a"
$ git config --global alias.co checkout

You can check your global configuration by:

$ git config --global --list

Git references

There are good references for Git.

The git-gui and gitk commands make using Git very easy.

/!\ Do not use a tag string with spaces in it, even if some tools such as gitk allow it. It will choke some other git commands.

Git commands

Even if your upstream uses a different VCS, it is a good idea to use git(1) for local activity, since you can manage your local copy of the source tree without a network connection to the upstream. Here are the packages and commands used with git(1).

List of git packages and commands.

||command||package||popcon||size||description||
||N/A||git-doc||*862||-||This provides the documentation for Git.||
||git(7)||git-core||512||-||The main command for Git.||
||gitk(1)||gitk||94||-||The GUI Git repository browser with history.||
||git-gui(1)||git-gui||28||-||The GUI for Git. (No history)||
||git-svnimport(1)||git-svn||68||-||This imports the data out of Subversion into Git.||
||git-svn(1)||git-svn||68||-||This provides bidirectional operation between Subversion and Git.||
||git-cvsimport(1)||git-cvs||49||-||This imports the data out of CVS into Git.||
||git-cvsexportcommit(1)||git-cvs||49||-||This exports a commit to a CVS checkout from Git.||
||git-cvsserver(1)||git-cvs||49||-||A CVS server emulator for Git.||
||git-send-email(1)||git-email||37||-||This sends a collection of patches as email from Git.||
||stg(1)||stgit||31||-||This is quilt on top of git. (Python)||
||git-buildpackage(1)||git-buildpackage||17||-||This automates Debian packaging with Git.||
||guilt(7)||guilt||9||-||This is quilt on top of git. (SH/AWK/SED/...)||

Git for recording configuration history

You can manually record the chronological history of configuration using [http://en.wikipedia.org/wiki/Git_(software) Git] tools. Here is a simple example for practice, recording the contents of "/etc/apt/":

$ cd /etc/apt/
$ sudo git init
$ sudo chmod 700 .git
$ sudo git add .
$ sudo git commit -a          # record the initial state

$ cd /etc/apt/
$ sudo git commit -a          # record later changes with a description

$ cd /etc/apt/
$ sudo gitk --all             # browse the full history

(!) The sudo(8) command is needed to work with permissions of configuration data. For user configuration data, you may skip the sudo(8) command.

(!) The "chmod 700 .git" command in the above example is needed to protect archive data from unauthorized read access.

{i} For more complete setup for recording configuration history, please look for the etckeeper package: @{@recordingchangesinconfigurationfiles@}@.
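
The same recording cycle can be practiced without sudo on a scratch copy. In this sketch the directory, the sample "sources.list" lines, the helper function g, and the committer identity are hypothetical stand-ins for a real "/etc/apt/" setup.

```shell
# Record two states of a configuration directory and inspect the history.
workdir=$(mktemp -d)
conf="$workdir/apt"                      # hypothetical stand-in for /etc/apt
mkdir "$conf"
echo "deb http://deb.debian.org/debian stable main" > "$conf/sources.list"

# Helper: run git inside $conf with an explicit identity, independent of ~/.gitconfig.
g() { git -C "$conf" -c user.name="Example Admin" -c user.email=admin@example.com "$@"; }

g init -q
chmod 700 "$conf/.git"                   # keep the recorded history private
g add .
g commit -q -m "Initial configuration"

# Change the configuration and record the new state.
echo "deb http://deb.debian.org/debian stable-updates main" >> "$conf/sources.list"
g commit -q -a -m "Enable stable-updates"

history_length=$(g rev-list --count HEAD)
git_dir_mode=$(stat -c %a "$conf/.git")
g log --oneline
```

Note that "git commit -a" records changes to already tracked files only; new files still need "git add" first.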