Take Command: Init

(November 1998)

Reprinted with permission of the Linux Journal

Init is the driving force that keeps our Linux box alive, and it is the one that can put it to death. This article is meant to summarize why Init is so powerful and how you can instruct it to behave differently from its default behaviour (yes, Init is powerful, but the superuser rules over Init).

by Alessandro Rubini

Which Init?

In Unix parlance, the word ``init'' doesn't identify a specific program, but rather a class of programs. The name ``init'' is generically used to refer to the first process that is executed at system boot -- actually, the only process that is executed at system boot. When the kernel is done setting up the computer's hardware, it invokes init and gives up control of the computer. From then on the kernel only processes system calls, without taking any decision-making role in system operation. After the kernel is done mounting the root filesystem, everything is controlled by init.

Currently, there are several choices as far as init is concerned: you can use the now-classic program that comes in the SysVinit package by Miquel van Smoorenburg, or simpleinit by Peter Orbaek (found in the source package of util-linux), or a simple shell script (like the one shown in this article, which has far less functionality than any C-language implementation). If you set up embedded systems, you can even just run the target application as if it were init. Insane people who dislike multitasking could even port command.com to Linux and run it as the init process, although you won't ever be able to restrict yourself to 640k when running a Linux kernel.

Whichever program you choose, it needs to be accessible under the pathname /sbin/init, /etc/init or /bin/init, because these pathnames are compiled into the kernel. If none of them can be executed, then the system is severely broken, and the kernel will spawn a root shell to allow interactive recovery (i.e., /bin/sh is used as the init process).

To achieve maximum flexibility, kernel developers offered a way to select a different pathname for the init process. The kernel accepts a command line option of init= exactly for that purpose. Kernel options can be passed interactively at boot time, or you can use the append= directive in /etc/lilo.conf. Silo, Milo, Loadlin and other loaders allow specifying kernel options as well.
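
To make the init= mechanism a little more concrete, here is a minimal shell sketch that picks the option out of a kernel command line, the way one might inspect it by hand. The function name init_from_cmdline is made up for this illustration, and taking the last occurrence when the option is repeated is an assumption of the sketch, not a statement about the kernel's own parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the value of the
# init= option found in a kernel command line, or nothing if it is absent.
# Assumption of this sketch: if init= appears more than once, report the last.
init_from_cmdline() {
    for word in $1; do
        case "$word" in
            init=*) echo "${word#init=}" ;;
        esac
    done | tail -n 1
}

# Example with a made-up command line. On a running system with /proc
# mounted, one could pass "$(cat /proc/cmdline)" instead.
init_from_cmdline "root=/dev/hda1 ro init=/bin/sh"
```

With the made-up command line above the function prints /bin/sh; with a command line that carries no init= option it prints nothing, which corresponds to the kernel falling back on its compiled-in pathnames.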

As you may imagine, the easiest way to get root access to a Linux box is by typing init=/bin/sh at the Lilo prompt. Note that this is not a security hole per se, because the real security hole here is physical access to the console. If you are concerned about the init= option, Lilo can prevent interaction using its own password protection.

The task of init

Ok, so init is a generic name, and almost anything can be used as init. The question now is what a real init is supposed to do.

Being the first (and only) process spawned by the kernel, init has the task of spawning every other process in the system. This usually includes the various daemons used in system operation as well as any login session on the text console.

Init is also expected to restart some of its child processes as soon as they exit. This typically applies to the login sessions running on the text consoles: as soon as you log out, the system should run another ``getty'' to allow starting another session.

Init should also collect dead processes and dispose of them. In the Unix abstraction of processes, a process can't be removed from the process table until its death is reported to its parent (or to another ancestor if its parent doesn't exist anymore). Whenever a process dies, by calling exit or otherwise, it remains around as a zombie process until someone collects it. Init, being the ancestor of every other process, is expected to collect the exit status of any orphaned zombie process. Note that every well-written program should reap its own children; zombies only linger when some program is misbehaving. If init didn't collect zombies, lazy programmers could easily consume system resources and hang the system by filling the process table.

The last task of init is handling system shutdown. The init program must stop every process and unmount all the filesystems when the superuser says that shutdown time has arrived. The shutdown executable, actually, doesn't do anything but tell init that everything is over.

As we have seen, the task of init is not too hard to implement, and a shell script could well perform most of the required tasks. Note that every decent shell collects its dead children, so this is not a problem with shell scripts.

What real init implementations add to the simple shell script approach is greater control over system activity, and thus a huge benefit in overall flexibility. This article will now proceed by showing different implementations of the init concept, in ascending order of complexity.
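
The remark that a shell collects its dead children can be tried out with the wait builtin. The following is only a small sketch of the idea; the function name collect_child is invented for the example, and a real init does this job for every orphaned process, not for a single child it started itself.

```shell
#!/bin/sh
# Hypothetical helper: start a child in the background, then collect its
# exit status with the shell's wait builtin and print it.
collect_child() {
    "$@" &
    wait "$!"
    echo "child exited with status $?"
}

# The child here is a throwaway shell that exits with status 3.
collect_child sh -c 'exit 3'
```

Once wait has returned, the child no longer occupies a slot in the process table; until then it would be listed as a zombie.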

Using /bin/sh as a minimal choice

As suggested above, the shell can be used as an init program. Using a bare shell, in the init=/bin/sh way, only opens a root shell in a completely unconfigured system. This section shows how a shell script can perform all of the tasks you need to have a minimal running system. This kind of tiny init can be used in embedded systems or similar reduced environments, where you must squeeze every single byte out of the system. Note that the most radical approach to embedded systems is directly running the target application as the init process; this results in a closed system (no way for the administrator to interact should problems arise), but it sometimes suits the setup. The typical example of a non-init-driven Linux system is the installation environment of most modern distributions, where /sbin/init is a symbolic link to the installation program.

    #!/bin/sh

    # avoid typing full pathnames
    export PATH=/usr/bin:/bin:/sbin:/usr/sbin

    # remount root read-write, and mount all
    mount -n -o remount,rw /
    mount -a
    swapon -a

    # system log
    syslogd
    klogd

    # start your lan
    modprobe eth0 2> /dev/null
    ifconfig eth0 192.168.0.1
    route add 192.168.0.0 eth0
    route add default gw 192.168.0.254

    # start lan services
    inetd
    sendmail -bd -q30m

    # Anything else: crond, named, ...

    # And run one getty with a sane path
    export PATH=/usr/bin:/bin
    /sbin/mingetty tty1

   Listing 1

To make a long story short, Listing 1 shows a script that can perform acceptably as init. The script is very short and incomplete; in particular, note that it only runs one getty, which isn't restarted when it terminates. Be careful if you try to use this script, as each Linux distribution chooses its own flavour of getty. Try

    grep getty /etc/inittab

to find out what you have and how to call it. The script shown has another misfeature: it doesn't deal with system shutdown. Adding shutdown support, however, is pretty easy; just bring everything down after the interactive shell terminates. Adding the text shown in Listing 2 at the end of Listing 1 does the trick.

    # kill anything you started
    killall inetd
    killall sendmail
    killall klogd
    killall syslogd

    # kill anything else
    kill -TERM -1
    sleep 2
    kill -KILL -1

    # release the disks
    swapoff -a
    umount -a
    mount -n -o remount,ro /
    echo "The system is halted"
    exit

   Listing 2
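
Listing 1 runs its getty only once. The respawning behaviour that a real init provides could be approximated with a loop such as the one sketched here. The function name respawn and its run counter are inventions of this example; the counter is there only so that the sketch can be tried safely from an ordinary shell.

```shell
#!/bin/sh
# Hypothetical helper: run a command again each time it exits, the way a
# real init respawns getty. The first argument bounds the number of runs so
# that the sketch terminates; a real init would loop forever.
respawn() {
    runs=$1
    shift
    while [ "$runs" -gt 0 ]; do
        "$@"
        runs=$((runs - 1))
    done
}

# Harmless demonstration: "respawn" an echo command three times.
respawn 3 echo "session ended, starting another one"
```

In a script like Listing 1 the command would be the getty line. Keep in mind that a loop that never ends would also never reach the shutdown commands of Listing 2, so some exit condition is still needed.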

Whenever you boot with a plain init=/bin/sh, you should at least remount the root filesystem read-write before you'll be able to do anything; you should also remember to run umount -a before pressing ctrl-alt-del, because the shell doesn't intercept the three-finger salute.

Simpleinit, from util-linux

The util-linux package includes a C version of an init program. It's fairly full-featured and can work well for most personal systems, although it doesn't offer the huge amount of configurability offered by the SysVinit package, which is the default on modern distributions. The role of simpleinit (which should be called init to work properly, as suggested above) is very similar to the shell script just shown, with the added capability of managing single-user mode and repeated invocation of console sessions. It also correctly processes shutdown requests.

Simpleinit is interesting to look at, and well documented too, so you might just enjoy reading the documentation; I suggest using the source distribution of util-linux to get up-to-date information. The implementation of simpleinit is actually simple, as its name suggests. The program executes a shell script (/etc/rc) and parses a configuration file to know what processes need to be respawned. The configuration file is called /etc/inittab, like the one used by the full-featured init; note however that its format is different.

If you plan to install simpleinit in your system (which most likely already includes SysVinit) you must proceed with great care, and be prepared to reboot with a kernel argument of ``init=/bin/sh'' to recover from unstable situations.

The Real Thing: SysVinit

Most Linux distributions come with the version of init written by Miquel van Smoorenburg; this version is similar to the approach taken by System-V (five) Unix. The main idea here is that the user of a computer system may wish to operate the box in one of several different ways (not just single-user and multi-user). Although this feature is not usually exploited, it's not as crazy as you might imagine. When the computer is shared by two or more people in the family, different setups can be needed; a network server and a standalone playstation can happily coexist in the same computer as different runlevels. And although I'm the only user of my laptop, I sometimes want a network server (through PLIP) and sometimes a netless environment, to save resources when I'm working on the train.

Each operating mode is called a ``runlevel'', and you can choose the runlevel to use either at boot or at runtime. The main configuration file for init is called /etc/inittab, which defines what to do at boot, when entering a runlevel or when switching from one runlevel to another. It also tells init how to handle the three-finger salute and how to deal with power failures, although you'll need a power daemon and a UPS to benefit from this feature.

The inittab file is organized by lines, where each line is made up of several colon-separated fields: ``id:runlevel:action:command''. The inittab(5) man page is well written and comprehensive, like a man page should be, but I feel it is worth repeating here one of its examples: a stripped-down /etc/inittab that implements the same features and misfeatures of the shell script shown above:

    id:1:initdefault:
    rc::bootwait:/etc/rc
    1:1:respawn:/sbin/getty 9600 tty1

This simple inittab tells init that the default runlevel is ``1'', that at system boot it must execute /etc/rc waiting for its completion, and that when in runlevel 1 it must respawn forever the command ``/sbin/getty 9600 tty1''. As you may suspect, you're not expected to test this out, because it doesn't handle the shutdown procedure. Before proceeding further, however, I must fill a couple of gaps I left behind. Let's reply to the questions you keep asking:

``How can I boot into a different runlevel than the default?'' That's easy, just add the runlevel on the kernel command line; for example type ``Linux 2'' at your Lilo prompt, if ``Linux'' is the name of your kernel.
``How can I switch from a runlevel to another one?'' That's easy too; as root call ``telinit 5'' to tell the init process to switch to runlevel 5. Different numbers are different runlevels.
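
The colon-separated layout of an inittab line can be taken apart with the shell's own read builtin. What follows is only an illustrative sketch: parse_inittab_line is a made-up name, and the real init of course does its parsing in C.

```shell
#!/bin/sh
# Hypothetical helper: split one inittab line into its four fields.
# The command is the last field, so the blanks it contains are preserved.
parse_inittab_line() {
    echo "$1" | {
        IFS=: read -r id runlevels action command
        echo "id=$id runlevels=$runlevels action=$action command=$command"
    }
}

# The getty line of the three-line inittab shown above.
parse_inittab_line "1:1:respawn:/sbin/getty 9600 tty1"
```

Fields that are left empty in the file, like the runlevel of the ``bootwait'' line, simply come out as empty strings.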

Configuring Init

Naturally, the typical /etc/inittab file is much richer than the three-liner shown above. Although ``bootwait'' and ``respawn'' are the most important actions, several other actions exist to deal with various issues related to system management, but I won't detail them here. Note that SysVinit can deal with ctrl-alt-del, whereas the versions of init shown earlier didn't catch the three-finger salute (i.e., the machine would reboot if you pressed the key sequence). Anyone interested in how this is done can check sys_reboot in /usr/src/linux/kernel/sys.c (if you look in the code you'll note the use of a magic number of 672274793: can you imagine why Linus chose this very number? I think I know what it is, but you'll enjoy finding it by yourself).

So, let's see how a fairly complete /etc/inittab can take care of everything that's needed during a system's lifetime, including different runlevels. Although the magic of the game is always on show in /etc/inittab, you can choose between several different approaches to system configuration, the simplest being the three-liner shown above. In my opinion, two approaches are worth discussing in some detail: I'll call them ``the Slackware way'' and ``the Debian way'' after two renowned Linux distributions that chose to follow them.

The Slackware way

Although it has been quite some time since I last installed Slackware, the documentation included in SysVinit-2.74 says that it still works the same way: less featured but much faster than the Debian way described later. My personal 486 box runs a Slackware-like /etc/inittab just for the speed benefit. The core of an /etc/inittab as used by a Slackware system is shown in Listing 3.

    # Default runlevel.
    id:5:initdefault:

    # System initialization (runs when system boots).
    si:S:sysinit:/etc/rc.d/rc.S

    # Script to run when going single user (runlevel 1).
    su:1S:wait:/etc/rc.d/rc.K

    # Script to run when going multi user.
    rc:2345:wait:/etc/rc.d/rc.M

    # What to do at the "Three Finger Salute".
    ca::ctrlaltdel:/sbin/shutdown -t5 -rf now

    # Runlevel 0 halts the system.
    l0:0:wait:/etc/rc.d/rc.0

    # Runlevel 6 reboots the system.
    l6:6:wait:/etc/rc.d/rc.6

    # Runlevel 1,2,3,5 have text login
    c1:1235:respawn:/sbin/agetty 38400 tty1 linux

    # Runlevel 4 is X only
    x1:4:wait:/etc/rc.d/rc.4
    # But run a getty on /dev/tty4 just in case...
    c4:4:respawn:/sbin/agetty 38400 tty4 linux

   Listing 3

You should note right away that the runlevels 0, 1 and 6 have a predefined meaning. This is hardwired into the init command (or better, into the shutdown command, part of the same package). Whenever you want to halt or reboot the system, init is told to switch to runlevel 0 or 6, thus executing /etc/rc.d/rc.0 or /etc/rc.d/rc.6. This works flawlessly because whenever init switches to a different runlevel it stops respawning any task that is not defined for the new runlevel; actually, it even kills the running copy of the task (in this case, the active /sbin/agetty). Configuring this setup is pretty simple, as the role of the different files is pretty clear:

/etc/rc.d/rc.S is run at system boot independently of the runlevel. Add here anything you want to execute right away.
/etc/rc.d/rc.M is run after rc.S is over, only when the system is going to runlevels 2-5. If you boot in runlevel 1 (single user) this is not executed. Add here anything you only run when multiuser.
/etc/rc.d/rc.K deals with killing processes when going from multi-user to single-user. If you added anything in rc.M you'll probably want to stop it from rc.K.
/etc/rc.d/rc.0 and /etc/rc.d/rc.6 shut down and reboot the computer, respectively.
/etc/rc.d/rc.4 is only executed when runlevel 4 is entered. The file runs the ``xdm'' process, to allow graphic login. Note that no getty is run on /dev/tty1 when in runlevel 4 (but you can change this if you want).
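
The way the runlevel field selects entries can be sketched in a few lines of shell. This is a simplified illustration and not how init itself is written: the function name entries_for_runlevel is invented, and entries with an empty runlevel field (like the ctrlaltdel line) are simply skipped here, although init treats them specially.

```shell
#!/bin/sh
# Hypothetical helper: read inittab-style lines on standard input and print
# the id of every entry whose runlevel field contains the given runlevel.
# Simplification: entries with an empty runlevel field are not reported.
entries_for_runlevel() {
    level=$1
    while IFS=: read -r id runlevels action command; do
        case "$id" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        case "$runlevels" in
            *"$level"*) echo "$id" ;;
        esac
    done
}

# A few lines taken from Listing 3.
entries_for_runlevel 4 <<'EOF'
rc:2345:wait:/etc/rc.d/rc.M
c1:1235:respawn:/sbin/agetty 38400 tty1 linux
x1:4:wait:/etc/rc.d/rc.4
EOF
```

For runlevel 4 the sketch reports the entries rc and x1, while the text login c1 is left out, matching the description of rc.4 above.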

This kind of setup is easy to understand, and you can differentiate between runlevels 2, 3 and 5 by adding proper ``wait'' (execute once waiting for termination) and ``respawn'' (execute forever) entries. By the way, if you ever wondered what ``rc'' means, it's the short form for ``run command''. I had been editing my ``.cshrc'' and ``.twmrc'' for years before being told what this arcane ``rc'' suffix is -- there's something in the Unix world that is only handed on by oral tradition. I hope I'm now saving someone from years of unneeded darkness -- and I hope I won't be punished for writing it down.

The Debian way

Although simple, the Slackware way to set up /etc/inittab doesn't scale well when you add new software packages to the system. Let's imagine, for example, that someone distributes an ssh package (not unlikely, as ssh can't be distributed on official disks due to the insane US rules about crypto). The program sshd is a standalone server that must be invoked at system boot; this means that the package should patch /etc/rc.d/rc.M or one of the scripts it invokes to add ssh support. This is clearly a problem in a world where packages are typically archives of files; add to this that you can't assume that rc.local is always unchanged from the stock distribution, so even a post-install script that patches the file will miserably fail most of the time. You should also consider that adding a new server program is only part of the job; the server must also be stopped in rc.K, rc.0 and rc.6. As you see, things are getting pretty tricky.

The solution to this problem is both clean and elaborate. The idea is that each package that includes a server must provide the system with a script to start and stop the service; each runlevel will then start or stop the services that are associated with that very runlevel. Associating a service and a runlevel can be as easy as creating files in a runlevel-specific directory. This setup is common to Debian and Red Hat, and possibly other distributions that I never ran. The core of the /etc/inittab used by Debian-1.3 is shown in Listing 4.

    # The default runlevel.
    id:2:initdefault:

    # This is run first
    si::sysinit:/etc/init.d/boot

    # What to do in single-user mode.
    ~~:S:wait:/sbin/sulogin

    # Enter each runlevel
    l0:0:wait:/etc/init.d/rc 0
    l1:1:wait:/etc/init.d/rc 1
    l2:2:wait:/etc/init.d/rc 2
    l3:3:wait:/etc/init.d/rc 3
    l4:4:wait:/etc/init.d/rc 4
    l5:5:wait:/etc/init.d/rc 5
    l6:6:wait:/etc/init.d/rc 6

    # getty
    1:2345:respawn:/sbin/getty 38400 tty1

   Listing 4

The Red Hat setup features exactly the same structure for system initialization but uses different pathnames; you'll be able to map one structure over the other. Let's list the role of the different files:

/etc/init.d/boot is the exact counterpart of rc.S. It typically checks local filesystems and mounts them, but the real thing is much more featured than that.
/sbin/sulogin allows root to log in to a single-user workstation. It is only shown in Listing 4 because single-user mode is so important for system maintenance.
/etc/init.d/rc is a script that runs any start/stop script that belongs to the runlevel being entered.

The last item, the ``rc'' program, is the main character of this environment: its task consists of scanning the directory /etc/rc$runlevel.d and invoking any script that appears in the directory. A stripped-down version of ``rc'' would look like the following:

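    #!/bin/sh
    # Runlevel to enter, given as the first argument
    level=$1
    cd /etc/rc$level.d || exit 1
    # Stop the services marked with K (kill)...
    for i in K*; do
        [ -x "$i" ] && ./$i stop
    done
    # ...then start the ones marked with S (start)
    for i in S*; do
        [ -x "$i" ] && ./$i start
    done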

What does it mean? It means that /etc/rc2.d (for example) includes files called K* and S*; the former identify services that must be stopped, and the latter identify services that must be started. Ok, but I didn't tell you where K* and S* come from. This is the smart part of it all: every software package that needs to be run for some runlevel adds itself to all the /etc/rc?.d directories, either as a ``start'' entry or as a ``kill'' (stop) entry. To avoid code duplication, the package installs a script in /etc/init.d and several symbolic links from the various /etc/rc?.d. To show a real-life example, let's see what is included in two ``rc'' directories of Debian:

    rc1.d:
    K11cron         K20sendmail
    K12kerneld      K25netstd_nfs
    K15netstd_init  K30netstd_misc
    K18netbase      K89atd
    K20gpm          K90sysklogd
    K20lpd          S20single
    K20ppp

    rc2.d:
    S10sysklogd     S20sendmail
    S12kerneld      S25netstd_nfs
    S15netstd_init  S30netstd_misc
    S18netbase      S89atd
    S20gpm          S89cron
    S20lpd          S99rmnologin
    S20ppp

This shows how entering runlevel 1 (single-user) kills all the services and starts a ``single'' script; entering runlevel 2 (the default level) starts all the services. The number that appears next to the K or the S is used to order the birth or death of the various services, as the shell expands the wildcards appearing in /etc/init.d/rc in ASCII order. Invoking an ls -l command confirms that all of these files are symlinks, like the following:

    rc2.d/S10sysklogd -> ../init.d/sysklogd
    rc1.d/K90sysklogd -> ../init.d/sysklogd

To summarize, adding a new software package in this environment means adding a file in /etc/init.d and the proper symbolic link from each of the /etc/rc?.d directories. To make different runlevels behave differently (2, 3, 4 and 5 are configured in the same way by default), just remove or add symlinks in the proper /etc/rc?.d directories. If this scares you as too difficult, not all is lost. If you use Red Hat (or Slackware), you can think of /etc/rc.d/rc.local as if it were autoexec.bat -- if you are old enough to remember the pre-Linux age. If you run Debian, you could create /etc/rc2.d/S95local and use it as your own rc.local; note however that Debian is very clean about system setup and I had better not have put such a heresy in print. You know, powerful and trivial seldom match; you have been warned.
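
The whole scheme can be tried out without touching the real /etc. The sketch below builds a scratch tree with mktemp, installs a made-up service called ``demo'' together with its symlinks, and then walks two runlevel directories with a stripped-down rc loop. All the names (run_level, demo_tree, demo) are inventions of this example, and the scratch location is simply whatever mktemp hands out.

```shell
#!/bin/sh
# Stripped-down rc: stop the K* entries of a directory, then start its S*
# entries, in the order produced by the shell's wildcard expansion.
run_level() {
    for i in "$1"/K*; do
        [ -x "$i" ] && "$i" stop
    done
    for i in "$1"/S*; do
        [ -x "$i" ] && "$i" start
    done
    return 0
}

# Build a scratch tree so that nothing under the real /etc is touched.
demo_tree() {
    root=$(mktemp -d) || return 1
    mkdir "$root/init.d" "$root/rc1.d" "$root/rc2.d"

    # The package ships one script that understands "start" and "stop".
    cat > "$root/init.d/demo" <<'EOF'
#!/bin/sh
echo "demo: $1"
EOF
    chmod +x "$root/init.d/demo"

    # One symlink per runlevel decides what happens to the service there.
    ln -s ../init.d/demo "$root/rc2.d/S20demo"
    ln -s ../init.d/demo "$root/rc1.d/K20demo"

    run_level "$root/rc2.d"   # entering runlevel 2 starts the service
    run_level "$root/rc1.d"   # entering runlevel 1 stops it
    rm -rf "$root"
}

demo_tree
```

The single script in init.d is reached through both links, which is what lets a package add or remove itself by dropping files instead of patching someone else's script.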

Debian-2.0 (hamm)

As I write this article, Debian 2.0 is being released to the public, and I suspect it will be in wide use by the time you read it. Although the structure of system initialization is the same, it's interesting to note that the developers managed to make it faster. Instead of always executing the files in /etc/rc2.d, the script /etc/init.d/rc can now run them in the same process, without spawning another shell; whether to execute them or source them is controlled by the filename: executables whose name ends in .sh are sourced, the other ones are executed. The trick is shown in the following few lines, and the speed benefit is non-negligible:

    case "$i" in
        *.sh)
            # Source shell script for speed.
            (
            trap - INT QUIT TSTP
            set start; . $i
            ) ;;
        *)
            # No sh extension, so fork subprocess.
            $i start ;;
    esac
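
The same dispatch can be exercised in isolation. The following sketch wraps it in a function and feeds it two throwaway scripts created under a mktemp directory; start_script, demo_start and the file names are made up for the illustration and are not taken from Debian's rc.

```shell
#!/bin/sh
# Hypothetical helper: start one init script, sourcing it when its name
# ends in .sh and executing it otherwise.
start_script() {
    script=$1
    case "$script" in
        *.sh)
            # Sourced in a subshell: no new shell program is executed, and
            # the script cannot alter the caller's variables.
            ( set start; . "$script" ) ;;
        *)
            # No .sh extension: run it as a separate program.
            "$script" start ;;
    esac
}

# Create one script of each kind in a scratch directory and start both.
demo_start() {
    dir=$(mktemp -d) || return 1
    echo 'echo "sourced: $1"' > "$dir/one.sh"
    printf '#!/bin/sh\necho "executed: $1"\n' > "$dir/two"
    chmod +x "$dir/two"
    start_script "$dir/one.sh"
    start_script "$dir/two"
    rm -rf "$dir"
}

demo_start
```

Both scripts see ``start'' as their first argument; the difference is only in how they are run, which is why a sourced script must be written so that it behaves well inside someone else's shell.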

Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.