Revision 12 as of 2008-01-26 10:34:09
Size: 78947
Editor: OsamuAoki
Revision 13 as of 2008-01-26 12:54:24
Size: 79127
Editor: OsamuAoki
(!) BASH can be tweaked to change its glob behavior with its shopt builtin options such as dotglob, nocaseglob, nullglob, and extglob, and with "set -o noglob". See {{{bash}}}(1).


Copyright 2007 Osamu Aoki GPL (Please agree to GPL, GPL2, and any version of GPL which is compatible with DFSG if you update any part of this wiki page)

The GNU/Linux tutorials

I think learning a computer system is like learning a new foreign language. Although tutorial books and documentation are helpful, you have to practice by yourself. In order to help you get started smoothly, I will elaborate on a few basic points.

The powerful design of Debian GNU/Linux comes from the [http://en.wikipedia.org/wiki/Unix Unix] operating system, i.e., a multiuser, multitasking operating system. You must learn to take advantage of the power of these features.

"[http://packages.debian.org/search?keywords=rutebook Rute User's Tutorial and Exposition]", available in the Debian non-free archive as the rutebook package (popcon: 182), provides a good online resource for generic system administration.

(!) If you have been using any [http://en.wikipedia.org/wiki/Unix-like Unix-like] system for a while with command line tools, you probably know everything I explain here. Please use this as a reality check.

The console basics

The shell prompt

Upon starting the system, you are presented with the character-based login screen if you did not install the X Window System with a display manager such as gdm. Supposing your hostname is foo, the login prompt looks like:

foo login:

Type the username you chose during the installation process, e.g. penguin, and press the Enter-key; then type your password and press the Enter-key again.

(!) Following the Unix tradition, the username and password of the Debian system are case sensitive.

The system then displays the greeting message stored in /etc/motd, followed by the command prompt:

Debian GNU/Linux lenny/sid foo tty1
foo login: penguin
Password:
Last login: Sun Apr 22 09:29:34 2007 on tty1
Linux foo 2.6.20-1-amd64 #1 SMP Sun Apr 15 20:25:49 UTC 2007 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
foo:~$

Here the main part of the greeting message can be customized by editing the /etc/motd.tail file. The first line is generated from the system information using "uname -snrvm".

The shell prompt under X

If you installed the [http://en.wikipedia.org/wiki/X_Window_System X Window System] with a display manager such as gdm by selecting the "Desktop environment" task during the installation, you are presented with a graphical login screen upon starting your system. Type your username and your password into the GUI to log in to the non-privileged user account.

/!\ Never start the X server under the root account by typing root at the prompt of a display manager such as gdm, because this is considered unsafe. This is true even when you plan to perform administrative activities using GUI tools.

You can get a shell prompt under X by starting an x-terminal-emulator program such as gnome-terminal, rxvt or xterm. Under the Gnome Desktop environment, clicking "Applications" -> "Accessories" -> "Terminal" does the trick.

Under some other Desktop systems, there may be no obvious starting point for the menu. If this happens, just try (right) clicking the center of the screen and hope for a menu to pop up.

The root account

The root account is also called [http://en.wikipedia.org/wiki/Superuser superuser] or privileged user. From this account, you can perform the following system administration activities:

  • read, write, and remove any files on the system irrespective of their file permissions
  • set file ownership and permission of any files on the system
  • set the password of any non-privileged users on the system
  • login to any accounts without their passwords

This unlimited power of the root account requires you to be considerate and responsible when using it.

(!) The file permissions of a file (including hardware devices such as CD-ROM etc., which are just another kind of file on the Debian system) may make it unusable by non-root users. Although using the root account is a quick way to test for this kind of situation, the situation should be resolved through proper setting of the access permissions and the group.

/!\ Never share the root password with others.

The root shell prompt

There are a few basic methods to gain the root shell prompt using the root password:

  • At the character based login prompt, you simply type root.

  • From any user shell prompt, type "su -l".

  • Under the Gnome Desktop environment, click "Applications" -> "Accessories" -> "Root Terminal".

The virtual console

In the default Debian system, there are six switchable VT-100-like character consoles available to start the command shell. You can switch between them by pressing the Left-Alt-key and one of the F1--F6 keys simultaneously. Each character console allows independent login to the account and offers the multiuser environment. This multiuser environment is a great Unix feature, and very addictive.

If you are under the X Window System, you can regain access to the character console by pressing Ctrl-Alt-F1, i.e., the left-Ctrl-key, the left-Alt-key, and the F1-key pressed together. You can get back to the X Window System, normally running on the virtual console 7, by pressing Alt-F7.

You can alternatively change to another virtual console, e.g. to the console 1, by the command:

# chvt 1

How to leave the command prompt

Type Ctrl-D, i.e., the left-Ctrl-key and the d-key pressed together, at the command prompt to close the shell activity. If you are at the character console, this returns you to the login prompt. Even though these control characters are referred to as "control D" with the upper case, you do not need to press the Shift-key. The shorthand expression, ^D, is also used for Ctrl-D.

If you are at an x-terminal-emulator, this closes the x-terminal-emulator window.

How to shut down the system

Just like any other modern OS where file operations involve caching data in memory, the Debian system needs a proper shutdown procedure before power can safely be turned off, in order to maintain the integrity of files.

Use the following command from the root command prompt to shut down the system:

# shutdown -h now

This is for the normal multiuser mode.

If you are in the single-user mode, use the following from the root command prompt:

# poweroff -i -f

Alternatively, you may type Ctrl-Alt-Delete (the left-Ctrl-key, the left-Alt-key, and the Delete-key pressed together) to shut down the system if /etc/inittab contains "ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -h now". See the manpage for inittab(5) for details.

Recovering the sane console

When the screen goes berserk after doing some funny things such as "cat <some-binary-file>", type "reset" at the command prompt. You may not be able to see the command echoed as you type. You may also issue "clear" to clean up the screen.

Additional package suggestions for the newbie

Although even the minimal installation of the Debian system without any desktop environment tasks provides the basic Unix functionality, it is a good idea for beginners to install a few additional command-line and curses-based character terminal packages such as mc and vim with aptitude to get started. From the shell prompt as root:

# aptitude update
...
# aptitude install mc vim sudo
...

If you already had these packages installed, nothing will be installed.

List of interesting text-mode program packages.

||package||popcon||description||
||mc||5740||A text-mode full-screen file manager||
||vim||15655||Unix text editor Vi IMproved, a programmers text editor||
||vim-tiny|| ||Unix text editor Vi IMproved, a programmers text editor (compact version)||
||emacs21|| ||GNU project Emacs, the Lisp based extensible text editor||
||emacs22|| ||GNU project Emacs, the Lisp based extensible text editor||
||w3m||6313||Text-mode WWW browsers||
||gpm||2542||The Unix style cut-and-paste on the text console (daemon)||

It may be a good idea to read some informative documentation.

List of informative documentation packages.

||package||popcon||description||
||doc-debian||*42664||Debian Project documentation, Debian FAQ and other documents||
||debian-policy||*2288||Debian Policy Manual and related documents||
||developers-reference||*1058||Guidelines and information for Debian developers||
||maint-guide||*896||Debian New Maintainers' Guide||
||doc-linux-text||*42187||Linux HOWTOs and FAQ (text)||
||doc-linux-html||613||Linux HOWTOs and FAQ (html)||
||sysadmin-guide||*283||The Linux System Administrators' Guide||
||rutebook||182||Linux: Rute User's Tutorial and Exposition (in Debian non-free archive)||

You can install some of these packages by issuing the following command from the root shell prompt:

# aptitude install package_name

An extra user account

If you do not want to use your main user account for the following training activities, you can create a training user account, e.g. fish. Type at the root shell prompt:

# adduser fish
  • answer all the questions

This creates a new account named fish. After your practice, you can remove this user account and its home directory by:

# deluser --remove-home fish

The sudo configuration

For the typical single-user workstation such as the desktop Debian system on a laptop PC, it is common to deploy a simple configuration of sudo(8) as follows to let a non-privileged user, e.g. penguin, gain administrative privileges with just his user password (not with the root password).

# echo "penguin  ALL=(ALL) ALL" >> /etc/sudoers

/!\ This is handy, but for the sake of system security do not set up the accounts of non-administrative users like this.

<!> The password and the account of penguin in the above example require as much protection as the root password and the root account.

(!) To provide access to a limited set of devices and files, you should consider using a group to grant limited access instead of using the root privilege via sudo(8).

(!) With a more complicated and careful configuration, sudo can grant limited administrative privileges to other users on a shared system without sharing the root password.

Play time

Now you are ready to play with the Debian system without risks as long as you use the non-privileged user account.

This is because the Debian system is, even just after the default installation, configured with proper file permissions which prevent non-privileged users from damaging the system. Of course, there may still exist some holes which can be exploited, but those who worry about these issues should not be reading this section but should be reading the [http://www.debian.org/doc/manuals/securing-debian-howto/ Securing Debian Manual].

We will review basic Unix filesystem theory first. Then we will learn the Debian system the easy way using Midnight Commander (MC), and then the proper Unix-like ways.

Unix-like filesystem

In GNU/Linux and other Unix-like operating systems, files are organized into directories. All files and directories are arranged in one big tree, the file hierarchy, rooted at /.

These files and directories can be spread out over several devices. The mount(8) command attaches the filesystem found on some device to the big file tree. Conversely, the umount(8) command detaches it again. On recent Linux kernels, mount(8) can also bind part of the file hierarchy somewhere else, or mount a filesystem as shared, private, slave, or unbindable. Supported mount options for each filesystem are documented in /usr/share/doc/linux-doc-2.6.*/Documentation/filesystems/.
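
To see what is currently attached at the root of the tree without changing anything, you can read /proc/mounts, which the Linux kernel provides. This is only a sketch; the awk filter and the variable name are my own choices for illustration.

```shell
# Columns of /proc/mounts: device, mount point, filesystem type, options, ...
# Print the entry for the root of the tree; the last match is the one in effect.
root_entry=$(awk '$2 == "/" { print $1, $3 }' /proc/mounts | tail -n 1)
echo "/ is mounted from: $root_entry"
```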

Directories on Unix systems are called folders on some other systems. Please also note that there is no concept of a drive such as A: on Unix systems.

Unix file basics

Here are the basics:

  • Filenames are case sensitive. That is, MYFILE and MyFile are different files.

  • The root directory is referred to as simply /. Don't confuse this with the root user or the home directory for the root user: /root.

  • Every directory has a name which can contain any letters or symbols except /. The root directory is an exception; its name is / (pronounced "slash" or "the root directory") and it cannot be renamed.

  • Each file or directory is designated by a fully-qualified filename, absolute filename, or path, giving the sequence of directories which must be passed through to reach it. The three terms are synonymous.

  • All fully-qualified filenames begin with the / directory, and there's a / between each directory or file in the filename. The first / is the name of a directory, but the others are simply separators to distinguish the parts of the filename. The words used here can be confusing. Take the following fully-qualified filename as an example: /usr/share/keytables/us.map.gz. However, people will also refer to its basename us.map.gz alone as a filename.

  • The root directory has a number of branches, such as /etc/ and /usr/. These subdirectories in turn branch into still more subdirectories, such as /etc/init.d/ and /usr/local/. The whole thing together is called the directory tree. You can think of an absolute filename as a route from the base of the tree (/) to the end of some branch (a file). You will also hear people talk about the directory tree as if it were a family tree: thus subdirectories have parents, and a path shows the complete ancestry of a file. There are also relative paths that begin somewhere other than the root directory. You should remember that the directory ../ refers to the parent directory.

  • There's no directory that corresponds to a physical device, such as your hard disk. This differs from RT-11 (OS for PDP-11), CP/M, VMS, MS-DOS, AmigaOS, and Windows, where the path contains a device name such as C:\.
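
As a small illustration of the path terms above, the following sketch splits the example filename with dirname(1) and basename(1) and resolves a relative ../ path. The scratch directory layout created with mktemp(1) is purely hypothetical, and the us.map.gz file does not need to exist.

```shell
# Split a fully-qualified filename into its directory part and its basename.
path=/usr/share/keytables/us.map.gz
dir=$(dirname "$path")
base=$(basename "$path")
echo "directory: $dir"
echo "basename:  $base"

# Build a hypothetical scratch tree and resolve ".." relative to a branch.
top=$(mktemp -d)
mkdir -p "$top/etc/init.d"
cd "$top/etc/init.d"
parent=$(cd .. && pwd)
echo "parent of $PWD is $parent"
```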

(!) While you can use almost any letters or symbols in a file name, in practice it is a bad idea to do so. It is better to avoid any characters that often have special meanings on the command line, including spaces, tabs, newlines, and other special characters:  { } ( ) [ ] ' ` " \ / > < | ; !  # & ^ * % @ $  . If you want to separate words in a name, good choices are the period, hyphen, and underscore. You could also capitalize each word, "LikeThis".
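
To see why such characters are troublesome, here is a minimal sketch run in a scratch directory (the file names are made up for illustration). The quoted name creates one file, while the unquoted one is split by the shell into two.

```shell
work=$(mktemp -d)
cd "$work"
touch "my file.txt"     # quoted: one file named "my file.txt"
touch my file.txt       # unquoted: two files named "my" and "file.txt"
ls -1
count=$(ls -1 | wc -l)
echo "number of files: $count"
# Quoting is needed again to remove the file with the space in its name.
rm -- "my file.txt" my file.txt
```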

(!) The word path is used not only for fully-qualified filename as above but also for the command search path. The intended meaning is usually clear from the context.

The detailed best practices for the file hierarchy are described in the Filesystem Hierarchy Standard (/usr/share/doc/debian-policy/fhs/fhs.txt.gz). You should remember the following facts as a starting point:

List of usage of key directories.

||directory||usage||
||/||A simple / represents the root directory.||
||/etc/||This is the place for the system wide configuration files.||
||/var/log/||This is the place for the system log files.||
||/home/||This is the directory which contains all the home directories for all non-privileged users.||

The filesystem internals

Following the Unix tradition, the Debian GNU/Linux system provides the [http://en.wikipedia.org/wiki/File_system filesystem] under which physical data on hard disks and other storage devices, and the interaction with hardware devices such as console screens and remote serial consoles, are represented in a unified manner.

Each file, directory, named pipe, or physical device on a Debian GNU/Linux system has a data structure called an [http://en.wikipedia.org/wiki/Inode inode] which describes its associated attributes such as the user who owns it (owner), the group that it belongs to, the time last accessed, etc. See /usr/include/linux/fs.h for the exact definition of struct inode in the Debian GNU/Linux system.

This abstract and unified representation of physical entities is very powerful since this allows us to use the same command for the same kind of operation on many totally different devices.

All your files could be on one disk --- or you could have 20 disks, some of them connected to a different computer elsewhere on the network. You can't tell just by looking at the directory tree, and nearly all commands work just the same way no matter what physical device(s) your files are really on.

The filesystem permission

The [http://en.wikipedia.org/wiki/File_system_permissions filesystem permissions] of Unix-like systems are defined for three categories of affected users:

  • the user who owns the file (u),

  • other users in the group which the file belongs to (g), and

  • all other users (o).

For the file, each corresponding permission allows:

  • read (r): to examine contents of the file,

  • write (w): to modify the file, and

  • execute (x): to run the file as a command.

For the directory, each corresponding permission allows:

  • read (r): to list contents of the directory,

  • write (w): to add or remove files in the directory, and

  • execute (x): to access files in the directory.

Here, the execute permission on a directory not only allows reading files in the directory but also allows viewing their attributes, such as the size and the modification time.

To display permission information (and more) for files and directories, ls(1) is used. When ls is invoked with the -l option, it displays the following information in the order given:

  • the type of file (first character)

  • the file's access permissions (the next nine characters, consisting of three characters each for user, group, and other in this order)

  • the number of hard links to the file

  • the name of the user who owns the file

  • the name of the group which the file belongs to

  • the size of the file in characters (bytes)

  • the date and time of the file (mtime)

  • the name of the file.
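
The same fields can also be printed one by one with stat(1), which may make the "ls -l" columns easier to recognize. This is only a sketch and assumes the GNU coreutils version of stat with its -c format option; the temporary file is created just for the demonstration.

```shell
demo=$(mktemp)
printf 'hello\n' > "$demo"       # 6 bytes including the newline
chmod 640 "$demo"
ls -l "$demo"
# %A: type and permissions, %h: hard links, %U: owner, %G: group,
# %s: size in bytes, %y: mtime, %n: name
stat -c '%A %h %U %G %s %y %n' "$demo"
fields=$(stat -c '%A %h %s' "$demo")
echo "$fields"
```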

List of the first character of "ls -l" output

||character||meaning||
||-||normal file||
||d||directory||
||l||symlink||
||c||character device node||
||b||block device node||
||p||named pipe||
||s||socket||

To change the owner of a file, chown(1) is used from the root account. To change the group of a file, chgrp(1) is used from the file's owner or root account. To change file and directory access permissions, chmod(1) is used from the file's owner or root account. The basic syntax to manipulate the file foo is:

# chown <newowner> foo
# chgrp <newgroup> foo
# chmod  [ugoa][+-=][rwx][,...] foo

For example, in order to make a directory tree owned by a user foo and shared by a group bar, issue the following commands from the root account:

# cd /some/location/
# chown -R foo:bar .
# chmod -R ug+rwX,o=rX .
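
The capital X in the chmod mode above sets the execute bit only on directories (and on files that already have some execute bit set), so regular data files do not become executable. Here is a minimal sketch in a scratch directory, assuming GNU stat; the directory and file names are made up, and no root privilege is needed since you own the files.

```shell
tree=$(mktemp -d)
mkdir "$tree/sub"
touch "$tree/sub/data.txt"
# Start from restrictive, known permissions.
chmod 700 "$tree" "$tree/sub"
chmod 600 "$tree/sub/data.txt"
# Apply the same symbolic mode as in the example above.
chmod -R ug+rwX,o=rX "$tree"
dir_mode=$(stat -c %a "$tree/sub")
file_mode=$(stat -c %a "$tree/sub/data.txt")
echo "directory: $dir_mode"
echo "file:      $file_mode"
```

The directory ends up searchable by everyone, while the data file gains read and write bits only.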

There are three more special permission bits:

  • set user ID (s or S instead of user's x),

  • set group ID (s or S instead of group's x), and

  • sticky bit (t or T instead of other's x).

Here the output of ls -l for these bits is capitalized if execution bits hidden by these outputs are unset.

Setting set user ID on an executable file allows a user to execute the executable file with the owner ID of the file (for example root). Similarly, setting set group ID on an executable file allows a user to execute the executable file with the group ID of the file (for example root). Because these settings can cause security risks, enabling them requires extra caution.

Setting set group ID on a directory enables the BSD-like file creation scheme where all files created in the directory belong to the group of the directory.

Setting the sticky bit on a directory prevents a file in the directory from being removed by a user who is not the owner of the file. In order to secure the contents of a file in world-writable directories such as /tmp or in group-writable directories, one must not only set write permission off for the file but also set the sticky bit on the directory. Otherwise, the file can be removed and a new file can be created with the same name by any user who has write access to the directory.

Here are a few interesting examples of the file permissions.

$ ls -l /etc/passwd /etc/shadow /dev/ppp /usr/sbin/exim4
crw------- 1 root root   108, 0 2007-04-29 07:00 /dev/ppp
-rw-r--r-- 1 root root     1427 2007-04-16 00:19 /etc/passwd
-rw-r----- 1 root shadow    943 2007-04-16 00:19 /etc/shadow
-rwsr-xr-x 1 root root   700056 2007-04-22 05:29 /usr/sbin/exim4
$ ls -ld /tmp /var/tmp /usr/local /var/mail /usr/src
drwxrwxrwt 10 root root  4096 2007-04-29 07:59 /tmp
drwxrwsr-x 10 root staff 4096 2007-03-24 18:48 /usr/local
drwxrwsr-x  4 root src   4096 2007-04-27 00:31 /usr/src
drwxrwsr-x  2 root mail  4096 2007-03-28 23:33 /var/mail
drwxrwxrwt  2 root root  4096 2007-04-29 07:11 /var/tmp

There is an alternative numeric mode to describe file permissions in chmod(1) commands. This numeric mode uses 3 to 4 digit wide octal (radix=8) numbers.

The numeric mode for file permissions in chmod(1) commands.

||digit||meaning||
||1st optional digit||sum of set user ID (=4), set group ID (=2), and sticky bit (=1)||
||2nd digit||sum of read (=4), write (=2), and execute (=1) permissions for user||
||3rd digit||ditto for group||
||4th digit||ditto for other||

This sounds complicated but it is actually quite simple. If you look at the first few (2-10) columns of the "ls -l" command output and read them as a binary (radix=2) representation of file permissions ("-" being "0" and "rwx" being "1"), the last 3 digits of the numeric mode value should make sense to you as an octal (radix=8) representation of file permissions. For example, try:

$ touch foo bar
$ chmod u=rw,go=r foo
$ chmod 644 bar
$ ls -l foo bar
-rw-r--r-- 1 penguin penguin 0 2007-04-29 08:22 bar
-rw-r--r-- 1 penguin penguin 0 2007-04-29 08:22 foo
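
The optional first digit works the same way for the special permission bits. The following sketch, with made-up names in a scratch directory and assuming GNU stat, sets the sticky bit on a directory and the set user ID bit on a non-executable file to show the lower-case and upper-case letters.

```shell
d=$(mktemp -d)
mkdir "$d/shared"
touch "$d/plain"
chmod 1777 "$d/shared"    # sticky bit (1) + rwx for user, group, and other
chmod 4644 "$d/plain"     # set user ID (4) + rw-r--r--, no execute bit for the user
sticky=$(stat -c '%A' "$d/shared")
suid=$(stat -c '%A' "$d/plain")
echo "$sticky  shared"
echo "$suid  plain"
```

Here t is in lower case because the execute bit for other is set, while S is in upper case because the execute bit for the user is unset.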

The umask

The permissions applied to a newly created file or directory are restricted by the umask shell built-in command. See dash(1), bash(1), and builtins(7).

 (file permission) = (requested file permission) & ~(umask value)

The umask value examples.

||umask||usage||file permission created||directory permission created||
||0022||writable only by the user||-rw-r--r--||drwxr-xr-x||
||0002||writable by the group||-rw-rw-r--||drwxrwxr-x||

The Debian system uses a user private group (UPG) scheme as its default. A UPG is created whenever a new user is added to the system. A UPG has the same name as the user for which it was created, and that user is the only member of the UPG. The UPG scheme makes it safe to set umask to 0002 since every user has their own private group. (In some Unix variants, it is quite common to set up all normal users as members of a single users group, and it is a good idea to set umask to 0022 for security in such cases.)
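
You can check the effect of umask yourself. The following is a minimal sketch, assuming GNU stat and a scratch directory without default ACLs; the file names are made up, and the subshell keeps the umask changes from leaking into your current shell.

```shell
work=$(mktemp -d)
(
  cd "$work" || exit 1
  umask 0022
  touch file-0022; mkdir dir-0022   # touch requests 0666, mkdir requests 0777
  umask 0002
  touch file-0002; mkdir dir-0002
)
stat -c '%A %n' "$work"/*
f22=$(stat -c %A "$work/file-0022")
d22=$(stat -c %A "$work/dir-0022")
f02=$(stat -c %A "$work/file-0002")
d02=$(stat -c %A "$work/dir-0002")
```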

The group

In order to have group permissions applied to a particular user, that user needs to be made a member of the group using "sudo vigr". (Alternatively, you may dynamically add users to groups during the authentication process by adding the "auth optional pam_group.so" line to /etc/pam.d/common-auth and configuring /etc/security/group.conf. See the authentication.)

The hardware devices are just another kind of file on the Debian system. If you have problems accessing devices such as CD-ROM and USB memory stick from a user account, you should make that user a member of the pertinent group.

Some notable system provided groups allow their members to access particular files and devices without root privilege.

List of notable system provided groups for file access.

||group||accessible files and devices||
||dialout||Full and direct access to serial ports. (reconfigure modem, dial anywhere, etc.)||
||dip||"Dialup IP", enough to run ppp or dip commands.||
||cdrom||CD-ROM, DVD+/-RW drives.||
||audio||An audio device.||
||video||A video device.||
||scanner||Scanner(s).||
||adm||System monitoring logs.||
||staff||Some directories for junior administrative work: /usr/local, /home .||

Some notable system provided groups allow their members to execute particular commands without root privilege.

List of notable system provided groups for particular command executions.

||group||accessible commands||
||sudo||execute sudo without their password.||
||lpadmin||execute commands to add, modify, and remove printers from printer databases.||
||plugdev||execute pmount(1) for removable devices such as USB memories.||

For the full listing of the system provided users and groups, see the recent version of the "Users and Groups" (/usr/share/doc/base-passwd/users-and-groups.html) document provided by the base-passwd package.

See manpages of passwd(5), group(5), shadow(5), vipw(8), vigr(8), and pam_group(8) for the management commands of the user and group system.

Timestamps

There are three types of timestamps for a GNU/Linux file.

List of types of timestamps.

||type||meaning||
||mtime||the file modification time (ls -l)||
||ctime||the file status change time (ls -lc)||
||atime||the last file access time (ls -lu)||

Note that ctime is not file creation time.

  • Overwriting a file will change all of mtime, ctime, and atime of the file.

  • Changing permission or owner of a file will change ctime of the file.

  • Reading a file will change atime of the file.

Note that even simply reading a file on the Debian system will normally cause a file write operation to update atime information in the inode. Mounting a filesystem with the noatime option will let the system skip this operation and will result in faster file access for the read. See mount(8).

Use the touch(1) command to change timestamps of existing files.
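
The difference between mtime and ctime can be observed with stat(1). This is a minimal sketch assuming GNU stat and GNU touch (for the -c and -d options); the temporary file and the chosen date are only for illustration.

```shell
f=$(mktemp)
echo "data" > "$f"
mtime_before=$(stat -c %y "$f")
ctime_before=$(stat -c %z "$f")
sleep 1
chmod 644 "$f"                    # changes the inode status only, not the content
mtime_after=$(stat -c %y "$f")
ctime_after=$(stat -c %z "$f")
echo "mtime: $mtime_before -> $mtime_after"
echo "ctime: $ctime_before -> $ctime_after"
touch -d '2007-04-29 08:15' "$f"  # set atime and mtime to a chosen time
touched=$(stat -c %y "$f")
echo "mtime after touch: $touched"
```

Only ctime moves after chmod, which shows that ctime tracks status changes rather than file creation.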

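Links

There are two methods of associating a file foo with a different filename bar:

  • a hard link: a duplicate name for an existing file, created by "ln foo bar", and

  • a symbolic link (symlink): a special file that points to another file by name, created by "ln -s foo bar".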

See the following example for the changes in link counts and the subtle differences in the result of the rm command.

$ echo "Original Content" > foo
$ ls -li foo
2398521 -rw-r--r-- 1 penguin penguin 17 2007-04-29 08:15 foo
$ ln foo bar     # hard link
$ ln -s foo baz  # symlink
$ ls -li foo bar baz
2398521 -rw-r--r-- 2 penguin penguin 17 2007-04-29 08:15 bar
2398538 lrwxrwxrwx 1 penguin penguin  3 2007-04-29 08:16 baz -> foo
2398521 -rw-r--r-- 2 penguin penguin 17 2007-04-29 08:15 foo
$ rm foo
$ echo "New Content" > foo
$ ls -li foo bar baz
2398521 -rw-r--r-- 1 penguin penguin 17 2007-04-29 08:15 bar
2398538 lrwxrwxrwx 1 penguin penguin  3 2007-04-29 08:16 baz -> foo
2398540 -rw-r--r-- 1 penguin penguin 12 2007-04-29 08:17 foo
$ cat bar
Original Content
$ cat baz
New Content

A hard link can be made only within the same filesystem; it shares the inode number of the original file, as the "-i" option of the ls command reveals.

The symlink always has nominal file access permissions of "rwxrwxrwx", as shown in the above example, with the effective access permissions dictated by the permissions of the file that it points to.

The "." directory links to the directory that it appears in, thus the link count of any new directory starts at 2. The ".." directory links to the parent directory, thus the link count of the directory increases with the addition of new subdirectories.

Named pipes (FIFOs)

A [http://en.wikipedia.org/wiki/Named_pipe named pipe] is a file that acts like a pipe. You put something into the file, and it comes out the other end. Thus it's called a FIFO, or First-In-First-Out: the first thing you put in the pipe is the first thing to come out the other end.

If you write to a named pipe, the process which is writing to the pipe doesn't terminate until the information being written is read from the pipe. If you read from a named pipe, the reading process waits until there is something to read before terminating. The size of the pipe is always zero --- it does not store data, it just links two processes like the shell "|". However, since this pipe has a name, the two processes don't have to be on the same command line or even be run by the same user.

You can try it by doing the following:

$ cd; mkfifo mypipe
$ echo "hello" >mypipe & # put into background
[1] 8022
$ ls -l mypipe
prw-r--r-- 1 penguin penguin 0 2007-04-29 08:25 mypipe
$ cat mypipe
hello
[1]+  Done                    echo "hello" >mypipe
$ ls mypipe
mypipe
$ rm mypipe

Sockets

The socket is similar to the named pipe (FIFO) and allows processes to exchange information. For the socket, those processes do not need to be running at the same time nor need to be the children of the same ancestor process. This is the endpoint for the inter process communication. The exchange of information may occur over the network between different hosts. The two most common ones are [http://en.wikipedia.org/wiki/Internet_socket the Internet socket] and [http://en.wikipedia.org/wiki/Unix_domain_socket the Unix domain socket].

Device files

[http://en.wikipedia.org/wiki/Device_file Device files] refer to physical or virtual devices on your system, such as your hard disk, video card, screen, or keyboard. An example of a virtual device is the console, represented by /dev/console.

The device types.

||device type||meaning||
||character device||This can be accessed one character at a time, that is, the smallest unit of data which can be written to or read from the device is a character (byte).||
||block device||This must be accessed in larger units called blocks, which contain a number of characters. Your hard disk is a block device.||

You can read and write device files, though the file may well contain binary data which may be an incomprehensible-to-humans gibberish. Writing data directly to these files is sometimes useful for the troubleshooting of hardware connections. For example, you can dump a text file to the printer device /dev/lp0 or send modem commands to the appropriate serial port /dev/ttyS0. But, unless this is done carefully, it may cause a major disaster. So be cautious.

The device node numbers are displayed by executing ls as:

$ ls -l /dev/hda /dev/ttyS0 /dev/zero
brw-rw---- 1 root cdrom   3,  0 2007-04-29 07:00 /dev/hda
crw-rw---- 1 root dialout 4, 64 2007-04-29 07:00 /dev/ttyS0
crw-rw-rw- 1 root root    1,  5 2007-04-29 07:00 /dev/zero

Here,

  • /dev/hda has the major device number 3 and the minor device number 0. This is read/write accessible by the user who belongs to cdrom group,

  • /dev/ttyS0 has the major device number 4 and the minor device number 64. This is read/write accessible by the user who belongs to dialout group, and

  • /dev/zero has the major device number 1 and the minor device number 5. This is read/write accessible by anyone.

In the Linux 2.6 system, the filesystem under /dev is automatically populated by the udev(7) mechanism.
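
The device numbers can also be read with stat(1) instead of parsing the ls output. This sketch assumes GNU stat, whose %t and %T formats print the major and minor numbers in hexadecimal; the variable names are mine.

```shell
ls -l /dev/null /dev/zero
# First character of %A is the file type, as in the "ls -l" output.
zero_type=$(stat -c %A /dev/zero | cut -c1)
# %t and %T print the major and minor device numbers in hexadecimal.
zero_nums=$(stat -c '%t,%T' /dev/zero)
echo "/dev/zero: type=$zero_type major,minor=$zero_nums"
```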

Special device files

There are some special device files.

List of special device files.

||device file||action||response||
||/dev/null||read||it returns end-of-file (EOF) immediately.||
||/dev/null||write||it is a bottomless data dump pit.||
||/dev/zero||read||it returns the \0 (NUL) character (not the same as the ASCII digit zero).||
||/dev/random||read||it returns random characters from a true random number generator, delivering real entropy. (slow)||
||/dev/urandom||read||it returns random characters from a cryptographically secure pseudorandom number generator.||
||/dev/full||write||it returns the disk-full (ENOSPC) error.||
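
These behaviors are easy to try from a non-privileged shell. The following is a small sketch using wc(1), head(1), and od(1); the variable names are made up for illustration.

```shell
# Reading /dev/null yields no data at all.
null_bytes=$(wc -c < /dev/null)
echo "bytes read from /dev/null: $null_bytes"

# Reading /dev/zero yields NUL bytes; dump the first four in hexadecimal.
zero_hex=$(head -c 4 /dev/zero | od -An -tx1 | tr -d ' \n')
echo "first bytes of /dev/zero: $zero_hex"

# A few random bytes from /dev/urandom (different on every run).
head -c 8 /dev/urandom | od -An -tx1

# Writing to /dev/full always fails as if the disk were full.
if echo test > /dev/full 2>/dev/null; then
  full_result=ok
else
  full_result=failed
fi
echo "write to /dev/full: $full_result"
```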

procfs and sysfs

The [http://en.wikipedia.org/wiki/Procfs procfs] and [http://en.wikipedia.org/wiki/Sysfs sysfs], mounted on /proc and /sys, are pseudo-filesystems that expose internal data structures of the kernel to the userspace.

The directory /proc contains (among other things) one subdirectory for each process running on the system, which is named after the process ID (PID).

The directories under /proc/sys/ contain interface to change certain kernel parameters at run time. (You may do the same through specialized command sysctl(8) or its preload/configuration file /etc/sysctrl.conf.)

(!) The Linux kernel may complain "Too many open files". You can avoid this by executing "echo "65536"  > /proc/sys/fs/file-max" from the root shell to increase the file-max value.
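Before changing such a value, it is worth looking at the current one, which needs no root privilege. This is a minimal sketch, assuming a Linux system with /proc mounted; the variable names are only for illustration.

```shell
# Read the current system-wide limit on open file handles.
file_max=$(cat /proc/sys/fs/file-max)
echo "fs.file-max is currently $file_max"

# Every running process has a directory named after its PID.
# $$ is the PID of the current shell.
if [ -d "/proc/$$" ]; then
  own_dir=present
else
  own_dir=missing
fi
echo "/proc/$$ is $own_dir"
```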

People frequently panic when they notice one file in particular - /proc/kcore - which is generally huge. This is (more or less) a copy of the contents of your computer's memory. It's used to debug the kernel. It doesn't actually exist anywhere, so don't worry about its size.

The sysfs provides a means to export kernel data structures, their attributes, and the linkages between them to userspace. The directories under /sys also contain interfaces to change certain kernel parameters at run time.

See proc.txt(.gz), sysfs.txt(.gz) and other related documents in the Linux kernel documentation (/usr/share/doc/linux-doc-2.6.*/Documentation/filesystems/*) provided by the linux-doc-2.6.* package.

Midnight Commander (MC)

Midnight Commander (MC) is a GNU "Swiss army knife" for the Linux console and other terminal environments. It gives newbies a menu-driven console experience which is much easier to learn than standard Unix commands.

Use this command to explore the Debian system. This is the best way to learn. Please explore a few interesting locations just using the cursor keys and Enter key:

  • /etc and its subdirectories.

  • /var/log and its subdirectories.

  • /usr/share/doc and its subdirectories.

  • /sbin and /bin

Customization of MC

In order to make MC change the working directory upon exit, and to cd to frequently used directories, I suggest modifying ~/.bashrc to include:

. /usr/share/mc/bin/mc.sh                                                                                                        

See mc(1) (under the "-P" option) for the reason. (If you do not understand what exactly I am talking about here, you can do this later.)

Starting MC

MC can be started by:

$ mc

MC takes care of all file operations through its menu, requiring minimal user effort. Just press F1 to get the help screen. You can play with MC just by pressing cursor-keys and function-keys.

In some consoles such as gnome-terminal, key strokes of function-keys may be stolen by the console program. You can disable these features by "Edit" -> "Keyboard Shortcuts" for gnome-terminal.

If you encounter a character encoding problem which displays garbage characters, adding "-a" to MC's command line may help.

File manager in MC

The default is two directory panels containing file lists. Another useful mode is to set the right window to "information" to see file access privilege information, etc. Following are some essential keystrokes. With the gpm daemon running, one can use a mouse, too. (Make sure to press the shift-key to obtain the normal behavior of cut and paste in MC.)

The key bindings of MC.

key

key binding

F1

Help menu

F3

Internal file viewer

F4

Internal editor

F9

Activate pull down menu

F10

Exit Midnight Commander

Tab

Move between two windows

Insert or Ctrl-T

Mark file for a multiple-file operation such as copy

Del

Delete file (be careful---set MC to safe delete mode)

Cursor keys

Self-explanatory

Command-line tricks in MC

  • Any cd command will change the directory shown on the selected screen.

  • Ctrl-Enter or Alt-Enter will copy a filename to the command line. Use this with the cp or mv command together with command-line editing.

  • Alt-Tab will show shell filename expansion choices.

  • One can specify the starting directory for both windows as arguments to MC; for example, mc /etc /root.

  • Esc + n-key == Fn (i.e., Esc + 1 = F1, etc.; Esc + 0 = F10)

  • Pressing Esc before a key has the same effect as pressing Alt and that key together; i.e., type Esc + c for Alt-C. Esc is called the meta-key and is sometimes noted as "M-".

Editor in MC

The internal editor has an interesting cut-and-paste scheme. Pressing F3 marks the start of a selection, a second F3 marks the end of selection and highlights the selection. Then you can move your cursor. If you press F6, the selected area will be moved to the cursor location. If you press F5, the selected area will be copied and inserted at the cursor location. F2 will save the file. F10 will get you out. Most cursor keys work intuitively.

This editor can be directly started on a file:

$ mc -e filename_to_edit
$ mcedit filename_to_edit

This is not a multi-window editor, but one can use multiple Linux consoles to achieve the same effect. To copy between windows, use Alt-F<n> keys to switch virtual consoles and use "File->Insert file" or "File->Copy to file" to move a portion of a file to another file.

This internal editor can be replaced with any external editor of choice.

Also, many programs use the environment variables EDITOR or VISUAL to decide which editor or viewer to use. If you are uncomfortable with vim or nano initially, you may set these to mcedit by adding these lines to ~/.bashrc:

...
export EDITOR=mcedit
export VISUAL=mcedit
...

I do recommend setting these to vim if possible.

If you are uncomfortable with vim, you can keep using mcedit for most system maintenance tasks. Since mcedit is 8-bit clean and dumb (it does not care about the text encoding), it sometimes has advantages when editing unknown encoding files. mcedit cannot display UTF-8 files with Asian characters correctly. (At least it was so, as of 2007, lenny/testing.)

Viewer in MC

MC has a very smart viewer. This is a great tool for searching words in documents. I always use this for files in the /usr/share/doc directory. This is the fastest way to browse through masses of Linux information. This viewer can be directly started like so:

$ mc -v path/to/filename_to_view
$ mcview path/to/filename_to_view

Auto-start features of MC

Press Enter on a file, and the appropriate program will handle the content of the file. This is a very convenient MC feature.

The reaction to the enter key in MC.

file type

reaction to enter key

executable file

Execute command

man file

Pipe content to viewer software

html file

Pipe content to web browser

.tar.gz or .deb file

Browse its contents as if it were a subdirectory

In order to allow these viewer and virtual file features to function, viewable files should not be set as executable. Change their status using the chmod command or via the MC file menu.

FTP virtual filesystem of MC

MC can be used to access files over the Internet using FTP. Go to the menu by pressing F9, then type "p" to activate the FTP virtual filesystem. Enter a URL in the form "username:passwd@hostname.domainname", which will retrieve a remote directory that appears like a local one.

Try "http.us.debian.org/debian" as the URL and browse the Debian archive.

The basic unix-like work environment

Although MC enables you to do almost everything, it is very important for you to learn how to use the command line tools invoked from the shell prompt and become familiar with the unix-like work environment.

The login shell

You can select your login shell with the chsh command.

List of shell programs.

1

2

package

popcon

POSIX shell

description

bash

38091

Yes

The GNU Bourne Again SHell. (de facto standard)

tcsh

6855

No

TENEX C Shell, an enhanced version of Berkeley csh.

dash

2624

Yes

The Debian Almquist Shell. Good for shell script.

zsh

1639

Yes

The standard shell with many enhancements.

pdksh

290

Yes

A public domain version of the Korn shell.

csh

256

No

OpenBSD C Shell, a version of Berkeley csh.

ksh

161

Yes

The real, AT&T version of the Korn shell.

In this tutorial chapter, the interactive shell always means bash.

Customizing bash

You can customize bash behavior through ~/.bashrc. For example, I added the following to ~/.bashrc:

# CD upon exiting MC
. /usr/share/mc/bin/mc.sh

# set CDPATH to good one
CDPATH=.:/usr/share/doc:~/Desktop/src:~/Desktop:~
export CDPATH

PATH="${PATH}":/usr/sbin:/sbin
# set PATH so it includes user's private bin if it exists
if [ -d ~/bin ] ; then
  PATH=~/bin:"${PATH}"
fi
export PATH

EDITOR=vim
export EDITOR

Special key strokes

In the unix-like environment, there are a few key strokes which have special meanings. Please note that on a normal Linux character console, only the left-hand Ctrl and Alt keys work as expected. Here are a few notable key strokes to remember.

List of key bindings for bash.

key

key binding

Ctrl-U

Erase line before cursor.

Ctrl-H

Erase a character before cursor.

Ctrl-D

Terminate input. (exit shell if you are using shell)

Ctrl-C

Terminate a running program.

Ctrl-Z

Temporarily stop a program by suspending it as a background job.

Ctrl-S

Halt output to screen.

Ctrl-Q

Reactivate output to screen.

Ctrl-Alt-Del

Reboot/halt the system, see manpage for inittab.

Left-Alt-key (optionally, Windows-key)

Meta-key for Emacs and the similar UI.

Up-arrow

Start command history search under bash.

Ctrl-R

Start incremental command history search under bash.

Tab

Complete input of the filename to the command line under bash.

Ctrl-V Tab

Input Tab without expansion to the command line under bash.

{i} The terminal feature of Ctrl-S can be disabled using stty(1) command.

Unix style mouse operations

The Unix style mouse operations are based on the 3 button mouse system.

The Unix style mouse operations.

action

response

Left-click-and-drag mouse

Select and copy to the clipboard.

Left-click

Select the start of selection.

Right-click

Select the end of selection and copy to the clipboard.

Middle-click

Paste clipboard at the cursor.

The center wheel on the modern wheel mouse is considered middle mouse button and can be used for middle-click. Clicking left and right mouse buttons together serves as the middle-click under the 2 button mouse system situation. In order to use a mouse in the Linux character console, you need to have gpm running as daemon.

The pager

The less program is the enhanced pager (file content browser). Hit "h" for help. It can do much more than more. This less command can be supercharged by executing eval $(lesspipe) or eval $(lessfile) in the shell startup script. See more in /usr/share/doc/less/LESSOPEN. The -R option allows raw character output and enables ANSI color escape sequences. See less(1).

The text editor

You should become proficient in one of the variants of the [http://www.vim.org/ vim] or [http://www.gnu.org/software/emacs/ emacs] programs, which are popular on unix-like systems.

I think getting used to vim commands is the right thing to do, since a Vi-editor is always there in the Linux/unix world. (Actually, vi or nvi are the programs you find everywhere. I chose vim for newbies instead since it offers help through the F1 key while it is similar enough and more powerful.)

If you chose either [http://www.gnu.org/software/emacs/ emacs] or [http://www.xemacs.org/ xemacs] instead as your choice of the editor, that is another good choice indeed.

All these programs usually come with tutoring program for you to learn them by practice. For vim, start it and press F1-key. You should at least read the first 35 lines. Then do the online training course by moving cursor to |tutor| and pressing Ctrl-].

(!) The good editors, such as vim and emacs, can be used to handle UTF-8 and other exotic encoding texts correctly with proper option in the x-terminal-emulator on X under UTF-8 locale with proper font settings. Please refer to their documentation on multibyte text.

Customizing vim

You can customize vim behavior by ~/.vimrc. For example, I use:

set nocompatible
set nopaste
set pastetoggle=<f2>
syn on

Recording the shell activities

The output of the shell command may roll off your screen and may be lost forever. It is good practice to log shell activities into the file for you to review them later. This kind of record is essential when you perform any system administration tasks.

The basic method of recording the shell activity is to run it under the script(1) command.

$ script
Script started, file is typescript
  • do whatever shell commands ...
  • press Ctrl-D to exit script.

$ vim typescript

Basic Unix commands

Let's learn the basic Unix commands. Here I use "Unix" in its generic sense. Any Unix clone OS usually offers the equivalent commands. The Debian system is no exception. Do not worry if some commands do not work as you wish now. If an alias is used in the shell, the corresponding command output will be different. These examples are not meant to be executed in this order.

Try all the following commands from the non-privileged user account:

List of basic Unix commands.

command

description

pwd

Display name of current/working directory.

whoami

Display current user name.

file <foo>

Display a type of file for the file <foo>.

type -p <commandname>

Display a file location of command <commandname>.

which <commandname>

, ,

type <commandname>

Display information on command <commandname>.

apropos <key-word>

Find commands related to <key-word>.

man -k <key-word>

, ,

whatis <commandname>

Display one line explanation on command <commandname>.

man -a <commandname>

Display explanation on command <commandname>. (Unix style)

info <commandname>

Display rather long explanation on command <commandname>. (GNU style)

ls

List contents of directory. (non-dot files and directories)

ls -a

List contents of directory. (all files and directories)

ls -A

List contents of directory. (almost all files and directories, i.e., skip ".." and ".")

ls -la

List all contents of directory with detail information.

ls -lai

List all contents of directory with inode number and detail information.

ls -d

List directories themselves, not their contents. (Plain "ls -d" shows only "."; use "ls -d */" to list all directories under the current directory.)

tree

Display file tree contents.

lsof <foo>

List open status of file <foo>.

mkdir <foo>

Make a new directory <foo> in the current directory.

rmdir <foo>

Remove a directory <foo> in the current directory.

cd <foo>

Change directory to the directory <foo> in the current directory or in the directory listed in the variable CDPATH.

cd /

Change directory to the root directory.

cd

Change directory to the current user's home directory.

cd /<foo>

Change directory to the absolute path directory /<foo>.

cd ..

Change directory to the parent directory.

cd ~<foo>

Change directory to the home directory of the user <foo>.

cd -

Change directory to the previous directory.

</etc/motd pager

Display contents of /etc/motd using the default pager.

touch <junkfile>

Create an empty file <junkfile>.

cp <foo> <bar>

Copy an existing file <foo> to a new file <bar>.

rm <junkfile>

Remove a file <junkfile>.

mv <foo> <bar>

Rename an existing file <foo> to a new name <bar>. The directory <bar> must not exist.

mv <foo> <bar>

Move an existing file <foo> to a new location <bar>/<foo>. The directory <bar> must exist.

mv <foo> <bar>/<baz>

Move an existing file <foo> to a new location with a new name <bar>/<baz>. The directory <bar> must exist but the directory <bar>/<baz> must not exist.

chmod 600 <foo>

Make an existing file <foo> to be non-readable and non-writable by the other people. (non-executable for all)

chmod 644 <foo>

Make an existing file <foo> to be readable but non-writable by the other people. (non-executable for all)

chmod 755 <foo>

Make an existing file <foo> to be readable but non-writable by the other people. (executable for all)

find .  -name <pattern>

find matching filenames using shell <pattern>. (slower)

locate <pattern>

find matching filenames using shell <pattern>. (quicker using regularly generated database)

grep -e "<pattern>" *.html

Find a "<pattern>" in all of the files ending with ".html" in current directory and display them all.

top

Display process information using full screen. Type "q" to quit.

ps aux | pager

Display information on all the running processes using BSD style output.

ps -ef | pager

Display information on all the running processes using Unix system-V style output.

ps aux | grep -e "[e]xim4*"

Display all processes running exim or exim4.

ps axf | pager

Display information on all the running processes with ASCII art output.

kill <1234>

Kill a process identified by the process ID: <1234>.

gzip <foo>

Compress <foo> to create <foo>.gz using the Lempel-Ziv coding (LZ77).

gunzip <foo>.gz

Decompress <foo>.gz to create <foo>.

bzip2 <foo>

Compress <foo> to create <foo>.bz2 using the Burrows-Wheeler block sorting text compression algorithm, and Huffman coding. (Better compression than gzip)

bunzip2 <foo>.bz2

Decompress <foo>.bz2 to create <foo>.

tar -xvf <foo>.tar

Extract files from <foo>.tar archive.

tar -xvzf <foo>.tar.gz

Extract files from gzipped <foo>.tar.gz archive.

tar -xvjf <foo>.tar.bz2

Extract files from <foo>.tar.bz2 archive.

tar -cvf <foo>.tar <bar>/

Archive contents of folder <bar>/ in <foo>.tar archive.

tar -cvzf <foo>.tar.gz <bar>/

Archive contents of folder <bar>/ in compressed <foo>.tar.gz archive.

tar -cvjf <foo>.tar.bz2 <bar>/

Archive contents of folder <bar>/ in <foo>.tar.bz2 archive.

zcat README.gz | pager

Display contents of compressed README.gz using the default pager.

zcat README.gz > foo

Create a file foo with the decompressed content of README.gz.

zcat README.gz >> foo

Append the decompressed content of README.gz to the end of the file foo. (If it does not exist, create it first.)

(!) Unix has a tradition to hide filenames which start with ".". They are traditionally files that contain configuration information and user preferences.

(!) For cd command, see manpage of builtins(7).

(!) The default pager of the bare bone Debian system is more, which cannot scroll back. By installing the less package using the command line "aptitude install less", less becomes the default pager and you can scroll back with cursor keys.

(!) The "[" and "]" in the regular expression of the "ps aux | grep -e "[e]xim4*"" command above enable grep to avoid matching itself. The "4*" in the regular expression means 0 or more repeats of the character "4" and thus enables grep to match both "exim" and "exim4". Although "*" is used in both the shell filename glob and the regular expression, its meaning is different in each. Learn the regular expression from the manpage of grep(1).

Please traverse directories and peek into the system using the above commands as training. If you have questions on any of the console commands, please make sure to read the manual page. For example, these commands are a good start:

$ man man
$ man bash
$ man builtins
$ man grep
$ man ls

Please note that many Unix-like commands including ones from GNU and BSD will display brief help information if you invoke them in one of the following ways (or without any arguments in some cases):

$ <commandname> --help
$ <commandname> -h

The simple shell command

Now you have some feel for how to use the Debian system. Let's look deeper into the mechanism of command execution in the Debian system. Here, I have simplified reality for the newbie. See the manpage for bash(1) for the exact explanation.

A simple command is a sequence of

  1. variable assignments (optional)
  2. command name
  3. arguments (optional)
  4. redirections (optional: > , >> , < , << , etc.)

  5. control operator (optional: && , || , <newline> , ; , & , ( , ) )
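The five parts listed above can all be seen in one line. This is only a sketch; the scratch files created with mktemp and the variable names are made up for this illustration.

```shell
in=$(mktemp); out=$(mktemp)      # hypothetical scratch files
printf 'b\na\nc\n' >"$in"

# 1. variable assignment  LC_ALL=C   (affects this one command only)
# 2. command name         sort
# 3. argument             -r
# 4. redirections         <"$in" >"$out"
# 5. control operator     &&         (run the next command only on success)
LC_ALL=C sort -r <"$in" >"$out" && sorted=$(cat "$out")

echo "$sorted"
rm -f "$in" "$out"
```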

Command execution and environment variable

The values of some [http://en.wikipedia.org/wiki/Environment_variable environment variables] change the behavior of some unix commands.

The default values of environment variables are initially set by the PAM system and then some of them may be reset by some application programs:

  • the display manager such as gdm, and

  • the shell in its start up codes ~/.bash_profile and ~/.bashrc.

LANG and LC_*

Typical command execution uses a shell line sequence like the following:

$ date
Sun Jun  3 10:27:39 JST 2007
$ LANG=fr_FR.UTF-8 date
dimanche 3 juin 2007, 10:27:33 (UTC+0900)

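Here, the program date is executed as a foreground job. The environment variable "LANG" is left at its current value for the first command, and set to "fr_FR.UTF-8" (French with UTF-8 encoding) for the second command only.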

Most command executions usually do not have preceding environment variable definition. For the above example, you can alternatively execute:

$ LANG=fr_FR.UTF-8
$ date
dimanche 3 juin 2007, 10:27:33 (UTC+0900)

As you can see here, the output of the command is affected by the environment variable to produce French output. If you want the environment variable to be inherited by subprocesses (e.g., when calling a shell script), you need to "export" it instead by using:

$ export LANG
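The effect of export is easiest to see with a variable which is not already in the environment (LANG usually is, thanks to the login setup). The following is a sketch only; GREETING and FAREWELL are hypothetical variable names used for this illustration.

```shell
unset GREETING
GREETING=hello
# A plain shell variable is not passed to child processes.
before=$(sh -c 'echo "${GREETING:-unset}"')

export GREETING
# After export, child processes inherit it.
after=$(sh -c 'echo "${GREETING:-unset}"')

# A prefix assignment reaches only that one command, without export.
oneshot=$(FAREWELL=bye sh -c 'echo "$FAREWELL"')

echo "before export: $before, after export: $after, one-shot: $oneshot"
```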

{i} When filing a bug report, running the command under "LANG=en_US.UTF-8" is a good idea if you use a non-English environment.

See locale(5) and locale(7) for LANG and related environment variables.

PATH

When you type a command into the shell, the shell searches for the command in the list of directories contained in the PATH environment variable. The value of the PATH environment variable is also called the shell's search path.

In the default Debian installation, the PATH environment variable of user accounts may not include /sbin/. For example, the ifconfig command needs to be issued with the full path as /sbin/ifconfig.

You can change the PATH environment variable in the ~/.bash_profile or ~/.bashrc files.
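To see what the search path looks like on your system and where a command is actually found, something like the following can be used. This is a minimal sketch; ls_path is just a name chosen for this illustration.

```shell
# Show each directory of the search path on its own line.
echo "$PATH" | tr ':' '\n'

# Ask the shell where it finds a command; "command -v" is the POSIX way.
ls_path=$(command -v ls)
echo "ls is found at: $ls_path"
```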

HOME

Many commands store user specific configuration in the home directory and change their behavior according to its contents. The home directory is identified by the environment variable HOME:

List of HOME values.

situation

value of HOME

program run by the init process (daemon)

/

program run from the normal root shell

/root

program run from the normal user shell

/home/<normal_user>

program run from the normal user GUI desktop menu

/home/<normal_user>

program run as root with "sudo program"

/home/<normal_user>

program run as root with "sudo -H program"

/root

Command line options

Some commands take arguments. The arguments starting with "-" or "--" are called options and control the behavior of the command.

$ date
Mon Oct 27 23:02:09 CET 2003
$ date -R
Mon, 27 Oct 2003 23:02:40 +0100

Here the command-line argument "-R" changes the date command behavior to output RFC-2822 compliant date string.

Shell glob

Often you want a command to work with a group of files without typing all of them. The filename expansion pattern using the shell glob (sometimes referred to as wildcards) facilitates this need.

The shell glob patterns.

shell glob pattern

match

*

This matches any filename (segment) not starting with ".".

.*

This matches any filename (segment) starting with ".".

?

This matches exactly one character.

[...]

This matches exactly one character with any character enclosed in brackets.

[a-z]

This matches exactly one character with any character between "a" and "z".

[^...]

This matches exactly one character other than any character enclosed in brackets (excluding "^").

For example, try the following and think for yourself:

$ mkdir junk; cd junk; touch 1.txt 2.txt 3.c 4.h .5.txt ..6.txt
$ echo *.txt
1.txt 2.txt
$ echo *
1.txt 2.txt 3.c 4.h
$ echo *.[hc]
3.c 4.h
$ echo .*
. .. .5.txt ..6.txt
$ echo .*[^.]*
.5.txt ..6.txt
$ echo [^1-3]*
4.h
$ cd ..; rm -rf junk

See "man 7 glob" for more.

(!) Unlike normal filename expansion by the shell, the shell pattern "*" tested in the find command with -name test etc., matches the initial "." of the filename. (New feature)

(!) BASH can be tweaked to change its glob behavior with its shopt builtin options such as dotglob, noglob, nocaseglob, nullglob, extglob, etc. See bash(1).
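Here is a sketch of what two of these options do. It assumes bash is installed and runs each test in a child bash, so the options of your current shell are left untouched. The scratch directory and file names are made up for this illustration.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
touch "$dir/visible.txt" "$dir/.hidden.txt"

# Default: "*" skips dot files, and an unmatched pattern is left as typed.
default_out=$(cd "$dir" && bash -c 'echo * *.none')

# dotglob: "*" also matches dot files; nullglob: unmatched patterns vanish.
tweaked_out=$(cd "$dir" && bash -c 'shopt -s dotglob nullglob; echo * *.none')

echo "default: $default_out"
echo "tweaked: $tweaked_out"
rm -rf "$dir"
```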

Return value of the command

Each command returns its exit status as the return value.

Command exit code.

command exit state

numeric return value

logical return value

command executed successfully.

$? = 0

TRUE

command exited with error.

$? != 0

FALSE

Thus:

$ [ 1 = 1 ] ; echo $?
0
$ [ 1 = 2 ] ; echo $?
1

Please note that, in the logical context for the shell, success is treated as the logical TRUE which has 0 (zero) as its value. This is somewhat non-intuitive and is worth remembering.
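The exit status is what the shell "if" construct and the "&&" and "||" operators actually test. The following is a small sketch; the variable names are only for illustration.

```shell
# grep -q prints nothing; it only reports through its exit status.
if echo "hello world" | grep -q "world"; then
  found=yes      # exit status 0: treated as TRUE
else
  found=no       # non-zero exit status: treated as FALSE
fi
echo "found: $found"

# The commands true and false do nothing except return 0 and 1.
true
status_true=$?
false || status_false=$?   # "||" runs its right side because false failed
echo "true returned $status_true, false returned $status_false"
```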

Typical command sequences and shell redirection

Let's try to remember following shell command idioms.

The shell command idioms.

command idiom (type in one line)

effects

command &

The command is executed in the subshell in the background.

command1 | command2

The standard output of command1 is piped to the standard input of command2 . Both commands may be running concurrently.

command1 2>&1 | command2

Both standard output and standard error of command1 are piped to the standard input of command2. Both commands may be running concurrently.

command1 ; command2

The command1 and command2 are executed sequentially.

command1 && command2

The command1 is executed. If successful, command2 is also executed sequentially. Return success if both command1 and command2 are successful.

command1 || command2

The command1 is executed. If not successful, command2 is also executed sequentially. Return success if command1 or command2 is successful.

command > foo

Redirect standard output of command to a file foo. (overwrite)

command 2> foo

Redirect standard error of command to a file foo. (overwrite)

command >> foo

Redirect standard output of command to a file foo. (append)

command 2>> foo

Redirect standard error of command to a file foo. (append)

command > foo 2>&1

Redirect both standard output and standard error of command to a file foo.

command < foo

Redirect standard input of command from a file foo.

command << delimiter

Redirect standard input of command to the following lines until delimiter is met. (Here documents)

command <<- delimiter

Redirect standard input of command to the following lines until delimiter is met. The leading tab characters are stripped from input lines. (Here documents)

The Debian system is a multi-tasking system. Background jobs allow users to run multiple programs in a single shell. The management of background processes involves the shell built-ins: jobs, fg, bg, and kill. Please read the sections of the bash(1) manpage under "SIGNALS" and "JOB CONTROL", and the builtins(7) manpage.
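The jobs, fg, and bg built-ins are meant for interactive use. In a script, the same idea can be sketched with the special parameter "$!" and the wait built-in; the sleep command here merely stands in for any long-running program.

```shell
# Start a slow command in the background; the shell continues at once.
sleep 1 &
bg_pid=$!                 # $! holds the process ID of the last background job
echo "started background job with PID $bg_pid"

# Meanwhile the foreground shell is free to do other work.
echo "doing other work in the foreground"

# wait blocks until the given background job ends and returns its status.
wait "$bg_pid"
bg_status=$?
echo "background job exited with status $bg_status"
```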

Let's try simple examples of the redirect:

$ </etc/motd pager

$ pager </etc/motd

$ pager /etc/motd

$ cat /etc/motd | pager

Although all 4 syntaxes display the same thing, the last example runs an extra cat command and wastes resources for no reason.
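Two more idioms from the table, capturing both output streams and the here document, can be tried as follows. This is a sketch only; the scratch file and the path /nonexistent-path are made up for this illustration.

```shell
log=$(mktemp)    # hypothetical scratch file

# ls succeeds for "/" and fails for the missing path, so it writes to both
# standard output and standard error; "> file 2>&1" captures both streams.
ls -d / /nonexistent-path >"$log" 2>&1 || true
captured_lines=$(wc -l <"$log")
echo "lines captured: $captured_lines"

# A here document feeds the lines up to the delimiter to standard input.
heredoc_lines=$(wc -l <<EOF
first line
second line
EOF
)
echo "here document lines: $heredoc_lines"
rm -f "$log"
```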

The shell allows you to open files using the exec built-in with an arbitrary file descriptor.

$ echo Hello >foo
$ exec 3<foo 4>bar  # open files
$ cat <&3 >&4       # redirect stdin to 3, stdout to 4
$ exec 3<&- 4>&-    # close files
$ cat bar
Hello

Here, "n<&-" and "n>&-" mean to close the file descriptor "n".

The file descriptors 0-2 are predefined:

The predefined file descriptors.

device

description

file descriptor

stdin

standard input

0

stdout

standard output

1

stderr

standard error

2

Command alias

You can set an alias for the frequently used command. For example:

$ alias la='ls -la'

Now, la works as a short hand for "ls -la" which lists all files in the long listing format.

You can identify the exact path or identity of a command using the type command. For example:

$ type ls
ls is hashed (/bin/ls)
$ type la
la is aliased to `ls -la'
$ type echo
echo is a shell builtin
$ type file
file is /usr/bin/file

Here ls was recently searched while file was not, thus ls is "hashed", i.e., the shell has an internal record for the quick access to the location of the ls command.

Unix-like text processing

In the unix-like work environment, text processing is done by piping text through chains of standard text processing tools.

Unix text tools

There are a few standard text processing tools which are used very often on the Unix-like system.

  • No regular expression is used:
    • cat(1) concatenates files and outputs the whole content.

    • tac(1) concatenates files and outputs in reverse.

    • cut(1) selects parts of lines and outputs.

    • head(1) outputs the first part of files.

    • tail(1) outputs the last part of files.

    • sort(1) sorts lines of text files.

    • uniq(1) removes duplicate lines from a sorted file.

    • tr(1) translates or deletes characters.

    • diff(1) compares files line by line.

  • Basic regular expression (BRE) is used:

    • grep(1) matches text with the pattern.

    • ed(1) is a primitive line editor.

    • sed(1) is a stream editor.

    • vim(1) is a screen editor.

    • emacs(1) is a screen editor. (somewhat extended BRE)

  • Extended regular expression (ERE) is used:

    • egrep(1) matches text with pattern.

    • awk(1) does simple text processing.

    • tcl does every conceivable text processing. See re_syntax(3).

    • perl(1) does every conceivable text processing. perlre(1).

    • python with re module does every conceivable text processing. See /usr/share/doc/python/html/index.html .

If you are not sure what exactly these commands do, please use "man command" to figure it out by yourself.
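As a taste of how these tools are chained with pipes, here is a small sketch which finds the most frequent word in a line of text. The sample words and the variable name are made up for this illustration.

```shell
# Count how often each word occurs and keep the most frequent one.
# tr puts one word per line, sort groups duplicates, uniq -c counts them,
# sort -rn orders by count, and head keeps the top entry.
top_word=$(printf 'apple pear apple fig pear apple\n' |
  tr ' ' '\n' | sort | uniq -c | sort -rn | head -n 1)
echo "$top_word"
```

The count is printed before the word, padded with leading spaces by uniq.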

Regular expressions

[http://en.wikipedia.org/wiki/Regular_expression Regular expressions] are used in many text processing tools. They are analogous to the shell globs, but they are both more complicated and more powerful.

The regular expression describes the matching pattern and is made up of text characters and metacharacters.

The metacharacter is just a character with a special meaning. There are 2 major styles, BRE and ERE, depending on the text tools as described above.

The metacharacters for BRE and ERE.

BRE

ERE

The meaning of the regular expression

 \ .  [ ] ^ $ *  

 \ .  [ ] ^ $ * 

common metacharacters

 \+ \? \( \) \{ \} \| 

BRE only "\" quoted metacharacters

 + ? ( ) { } | 

ERE only non-"\" quoted metacharacters

c

c

This matches the non-metacharacter "c".

\c

\c

This sequence matches the literal character "c" even if "c" is metacharacter by itself.

.

.

This matches any character including newline.

^

^

This matches the beginning of a string.

$

$

This matches the end of a string.

\<

\<

This matches the beginning of a word.

\>

\>

This matches the end of a word.

[abc...]

[abc...]

This character list matches any of the characters "abc...".

[^abc...]

[^abc...]

This negated character list matches any character except "abc...".

r*

r*

This matches zero or more regular expressions identified by "r".

r\+

r+

This matches one or more regular expressions identified by "r".

r\?

r?

This matches zero or one regular expressions identified by "r".

r1\|r2

r1|r2

This matches one of the regular expressions identified by "r1" or "r2".

\(r1\|r2\)

(r1|r2)

This matches one of the regular expressions identified by "r1" or "r2" and treats it as a bracketed regular expression.

The regular expression of emacs is basically BRE but has been extended to treat "+" and "?" as metacharacters as in ERE. Thus, there is no need to quote them with "\" in the regular expression of emacs.

For example, grep can be used to perform the text search using the regular expression:

$ egrep 'GNU.*LICENSE|Yoyodyne' /usr/share/common-licenses/GPL
GNU GENERAL PUBLIC LICENSE
GNU GENERAL PUBLIC LICENSE
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
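The difference between the BRE and ERE styles can be seen by writing the same pattern both ways. This is a sketch assuming GNU grep (the "\?" form in BRE is a GNU extension); the sample words are made up for this illustration.

```shell
sample='color colour colr'

# BRE (plain grep): "?" must be written as "\?" to act as a metacharacter.
bre_hits=$(echo "$sample" | tr ' ' '\n' | grep -c 'colou\?r')

# ERE (grep -E, the same as egrep): "?" is a metacharacter as is.
ere_hits=$(echo "$sample" | tr ' ' '\n' | grep -E -c 'colou?r')

echo "BRE matches: $bre_hits, ERE matches: $ere_hits"
```

Both forms match "color" and "colour" but not "colr".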

Replacement expressions

For the replacement expression, following characters have special meanings:

The replacement expression.

character

meaning

&

This represents what the regular expression matched. (use \& in emacs)

\n

This represents what the n-th _bracketed_ regular expression matched. ("n" being number)

For the Perl replacement string, $n is used instead of \n, and & has no special meaning.

For example:

$ echo zzz1abc2efg3hij4 | \
sed -e 's/\(1[a-z]*\)[0-9]*\(.*\)$/=&=/'
zzz=1abc2efg3hij4=
$ echo zzz1abc2efg3hij4 | \
sed -e 's/\(1[a-z]*\)[0-9]*\(.*\)$/\2===\1/'
zzzefg3hij4===1abc
$ echo zzz1abc2efg3hij4 | \
perl -pe 's/(1[a-z]*)[0-9]*(.*)$/$2===$1/'
zzzefg3hij4===1abc
$ echo zzz1abc2efg3hij4 | \
perl -pe 's/(1[a-z]*)[0-9]*(.*)$/=&=/'
zzz=&=

Here please pay extra attention to the style of the bracketed regular expression and how the matched strings are used in the text replacement process on different tools.

These regular expressions can also be used for cursor movements and text replacement actions in editors.

The backslash "\" at the end of a line on the shell command line escapes the newline, so the shell continues reading the same command from the next line.

Please read all the related manual pages to learn these commands.

Extract data from text file table

Let's consider a text file called DPL in which the names of all previous Debian project leaders and the dates they took office are listed in a space-separated format.

Ian     Murdock   August  1993
Bruce   Perens    April   1996
Ian     Jackson   January 1998
Wichert Akkerman  January 1999
Ben     Collins   April   2001
Bdale   Garbee    April   2002
Martin  Michlmayr March   2003

Awk is frequently used to extract data from these types of files.

$ awk '{ print $3 }' <DPL                   # month started
August
April
January
January
April
April
March
$ awk '($1=="Ian") { print }' <DPL          # DPL called Ian
Ian     Murdock   August  1993
Ian     Jackson   January 1998
$ awk '($2=="Perens") { print $3,$4 }' <DPL # When Perens started
April 1996
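Awk can also compare fields numerically and keep counters. The following sketch recreates the DPL table in a temporary file (the variable names dpl, recent, and april are only illustrative) and runs two such queries:

```shell
# Recreate the DPL table from the text in a temporary file.
dpl=$(mktemp)
cat >"$dpl" <<'EOF'
Ian     Murdock   August  1993
Bruce   Perens    April   1996
Ian     Jackson   January 1998
Wichert Akkerman  January 1999
Ben     Collins   April   2001
Bdale   Garbee    April   2002
Martin  Michlmayr March   2003
EOF

# Numeric comparison on the fourth field: leaders who started in 2000 or later.
recent=$(awk '$4 >= 2000 { print $2 ", " $1 }' "$dpl")

# Counter incremented per matching line, printed once at the end.
april=$(awk '$3 == "April" { n++ } END { print n }' "$dpl")

echo "$recent"
echo "Terms that started in April: $april"
rm -f "$dpl"
```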

Shells such as Bash can also be used to parse this kind of file:

$ while read first last month year; do
    echo $month
  done <DPL
... same output as the first Awk example

Here, the read built-in command uses the characters in $IFS (internal field separators) to split lines into words.

If you change IFS to ":", you can parse /etc/passwd nicely with the shell:

$ oldIFS="$IFS"   # save old value
$ IFS=":"
$ while read user password uid gid rest_of_line; do
    if [ "$user" = "osamu" ]; then
      echo "$user's ID is $uid"
    fi
  done < /etc/passwd
osamu's ID is 1000
$ IFS="$oldIFS"   # restore old value

(If Awk is used to do the equivalent, set the field separator with FS=":" or the -F: option.)
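A common alternative is to set IFS only for the read command, which avoids saving and restoring the global value. The sketch below uses made-up passwd-style sample data in the hypothetical variable passwd_sample instead of the real /etc/passwd, and also shows the Awk equivalent:

```shell
# Two sample lines in /etc/passwd format (illustrative data only).
passwd_sample='root:x:0:0:root:/root:/bin/bash
osamu:x:1000:1000:Osamu Aoki,,,:/home/osamu:/bin/bash'

# "IFS=: read ..." changes IFS for this one command only.
result=$(printf '%s\n' "$passwd_sample" |
  while IFS=: read -r user password uid gid rest_of_line; do
    if [ "$user" = "osamu" ]; then
      echo "$user's ID is $uid"
    fi
  done)
echo "$result"

# The same lookup with Awk, using -F to set the field separator.
awk_uid=$(printf '%s\n' "$passwd_sample" | awk -F: '$1 == "osamu" { print $3 }')
echo "$awk_uid"
```

Because the assignment is attached to read, the IFS of the surrounding shell stays at its previous value.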

IFS is also used by the shell to split results of parameter expansion, command substitution, and arithmetic expansion. These do not occur within double or single quoted words. The default value of IFS is <space>, <tab>, and <newline> combined.

Be careful when using these shell IFS tricks. Strange things may happen when the shell interprets some parts of the script as its input.

$ IFS=":,"                        # use ":" and "," as IFS
$ echo IFS=$IFS,   IFS="$IFS"     # echo is a Bash built-in
IFS=  , IFS=:,
$ date -R                         # just a command output
Sat, 23 Aug 2003 08:30:15 +0200
$ echo $(date -R)                 # sub shell --> input to main shell
Sat  23 Aug 2003 08 30 36 +0200
$ unset IFS                       # reset IFS to the default
$ echo $(date -R)
Sat, 23 Aug 2003 08:30:50 +0200
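Double quotes suppress this word splitting, so quoting the expansion protects the text even while IFS is changed. The following sketch uses a fixed string in the hypothetical variable stamp in place of the date output so that the result is reproducible:

```shell
stamp='Sat, 23 Aug 2003 08:30:15 +0200'

IFS=':,'                     # use ":" and "," as IFS
unquoted=$(echo $stamp)      # expansion is split at ":" and ","
quoted=$(echo "$stamp")      # double quotes prevent the splitting
unset IFS                    # reset IFS to the default

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted form loses the comma and the colons, while the quoted form keeps the string intact.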

Script snippets for piping commands

The following scripts will do nice things as a part of a pipe.

The script snippets for piping commands.

||script snippet (type in one line)||effect||
||{{{find /usr | egrep -v "/usr/var|/usr/tmp|/usr/local"}}}||find all files in /usr, excluding paths that match /usr/var, /usr/tmp, or /usr/local||
||{{{xargs -n 1 <command>}}}||run <command> once for each item from stdin||
||{{{xargs -n 1 echo |}}}||split white-space-separated items into lines||
||{{{xargs echo |}}}||merge all lines into one line||
||{{{grep -e <regex_pattern>|}}}||extract lines containing <regex_pattern>||
||{{{cut -d: -f3 -|}}}||extract third field separated by : (passwd file etc.)||
||{{{awk '{ print $3 }' |}}}||extract third field separated by whitespace||
||{{{awk -F'\t' '{ print $3 }' |}}}||extract third field separated by tab||
||{{{col -bx |}}}||remove backspaces and expand tabs to spaces||
||{{{expand -|}}}||expand tabs||
||{{{sort|uniq|}}}||sort and remove duplicates||
||{{{tr 'A-Z' 'a-z'|}}}||convert uppercase to lowercase||
||{{{tr -d '\n'|}}}||concatenate lines into one line||
||{{{tr -d '\r'|}}}||remove CR||
||{{{sed 's/^/# /'|}}}||make each line a comment||
||{{{sed 's/\.ext//g'|}}}||remove .ext||
||{{{sed -n -e 2p|}}}||print the second line||
||{{{head -n 2 -|}}}||print the first 2 lines||
||{{{tail -n 2 -|}}}||print the last 2 lines||
||{{{seq 1 100 |}}}||print 1 to 100||
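These snippets are meant to be chained. As an illustrative sketch (the sample words and variable names are made up), the following combines several of them into pipelines:

```shell
words='Apple banana
apple Cherry  banana'

# Split into one item per line, lowercase, sort, remove duplicates,
# then merge the result back into a single line.
merged=$(printf '%s\n' "$words" |
  xargs -n 1 echo |
  tr 'A-Z' 'a-z' |
  sort | uniq |
  xargs echo)
echo "$merged"

# Pick single lines out of a generated sequence.
second=$(seq 1 100 | sed -n -e 2p)
last_two=$(seq 1 100 | tail -n 2 | xargs echo)
echo "$second / $last_two"
```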

Perl one liner for the regular-expression substitution

The following perl(1) one-liner replaces all instances of FROM_REGEX with TO_TEXT in all of the files <target_file> ...:

$ perl -i -p -e 's/FROM_REGEX/TO_TEXT/g;' <target_file> ...

"-i" is for "in-place editing", "-p" is for "implicit loop over <target_file> ...". If the substitution is complex, you can make recovery from errors easier by using the parameter "-i.bak" instead of "-i"; this will keep each original file, adding ".bak" as a file extension.

(!) Although this is somewhat wasteful of resources, it is used frequently to change file contents across a whole directory with minimal typing.

You can do something similar with the ed(1) command too.

$ ed <target_file> <<EOF
,s/FROM_REGEX/TO_TEXT/g
w
q
EOF

Here, the ed commands are practically the same as the command-mode commands of vi.

The comparison of ed vs perl for in-place editing.

||command||type||argument||regex||script||
||ed||lighter and faster||works on one file||BRE||read from stdin||
||perl||heavier and slower||works on multiple files||ERE||can be given as part of the argument||