Using Devfs
(May 2000)

by Alessandro Rubini

Reprinted with permission of Linux Magazine

As everybody knows, the role of the kernel is mostly related to hardware control, and user-space programs need entry points to talk with each hardware device. While every Unix offers a /dev directory where such entry points are collected, there are several different ways to lay out /dev, and each has its own advantages and disadvantages.

Simple systems, like Linux-1.0 and maybe Linux-2.0, are best served by an on-disk /dev directory and 8-bit-wide major and minor numbers. But as the number of supported devices grows and new entry points to low-level information are conceived, the old and easy layout may not fit any more. That's why version 2.3.46 finally introduced devfs support in the official kernel tree. The facility is marked as experimental, and its use is expected to remain optional, as some environments (embedded systems being the most notable) may still prefer the old approach.

In this article I'm only going to give a brief introduction to the tool, as there is plenty of documentation about setting it up (Documentation/filesystems/devfs/README, for instance, is a good read). Instead, I'll show how device programmers can write code that fits in the devfs environment. The discussion and the sample code are based on version 2.2.14 of the kernel, patched with devfs-patch-v99.11.gz, available from

The sample module is called drums, short for "Devfs Resources in User Module Sample", and is available with the Makefile and this article as

Registering an entry point

A device driver that wants to register its entry point within the devfs filesystem should call one of the forms of the devfs_register function. The devfs kernel interface is prototyped in the header file <linux/devfs_fs_kernel.h>. Let's imagine, for example, that we want to register a character device driver. The function to call is:

      devfs_handle_t devfs_register (devfs_handle_t dir,
            const char *name, unsigned int namelen,
            unsigned int flags,
            unsigned int major, unsigned int minor,
            umode_t mode, uid_t uid, gid_t gid,
            void *ops, void *info);

Given the huge list of arguments, the function can register pretty much anything and can assign the desired ownership and permissions to the file. The current version of devfs (at the time of writing) doesn't allow registration of directories and symbolic links using this function, but there are other functions to create such files.

In a perfectly devfs-ized world, devfs_register would be everything that's needed to create an entry point for a device. However, you may want to allow the superuser to create a non-devfs entry point, using the conventional mknod command. To this end, you need to register the file operations associated with your major number by calling devfs_register_chrdev, which takes the same arguments you used to pass to register_chrdev.

Both devfs_register_chrdev and devfs_register_blkdev are simple wrappers around register_chrdev and register_blkdev. They either call the old-style function or do nothing, according to whether the devfs=only command-line option has been passed to the kernel at boot time. If devfs is the only way to access devices, the functions don't do anything, so any device file created outside of devfs will not be associated with any device driver.
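To make the effect of devfs=only concrete, here is a minimal user-space sketch of the wrapper logic just described. It is a model, not kernel code: every name prefixed with model_, the devfs_only variable, and the fixed-size table are assumptions made for illustration, and the real kernel functions return proper error codes rather than -1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's table of character drivers,
 * indexed by major number. */
#define MODEL_MAX_MAJOR 256
static const void *model_chrdev_table[MODEL_MAX_MAJOR];

/* Models the boot-time "devfs=only" switch (an assumption of this sketch). */
static bool devfs_only = false;

/* Models old-style registration: remember the operations for a major. */
static int model_register_chrdev(unsigned int major, const void *fops)
{
    if (major >= MODEL_MAX_MAJOR || model_chrdev_table[major] != NULL)
        return -1;              /* out of range, or major already taken */
    model_chrdev_table[major] = fops;
    return 0;
}

/* Models the wrapper: a successful no-op when devfs is the only path. */
static int model_devfs_register_chrdev(unsigned int major, const void *fops)
{
    if (devfs_only)
        return 0;               /* nothing is recorded for this major */
    return model_register_chrdev(major, fops);
}

/* Models opening a node created with mknod: the lookup is by major only. */
static const void *model_open_by_major(unsigned int major)
{
    return major < MODEL_MAX_MAJOR ? model_chrdev_table[major] : NULL;
}
```

In this model, a lookup by major finds the driver only when the wrapper ran with devfs_only unset. With the switch set, the same lookup returns NULL, which mirrors a mknod-created file that no driver answers for.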

With this background, listing 1 shows how drums registers its entry points: a /dev/drums directory and a few files in it. While the real source code has complete error checking and recovery, I'd rather not print those lines here, as they may be distracting.

      devfs_register_chrdev(DRUMS_MAJOR, "drums", &drums_fops);
      drums_dir = devfs_mk_dir(NULL, "drums", 0, NULL);
      for (i=0; i<DRUMS_NR_DEV; i++) {
          drums_devs[i] = devfs_register(drums_dir /* parent dir */,
                              drums_strings[i], DRUMS_NAME_LEN,
                              DEVFS_FL_NONE, DRUMS_MAJOR, i /* minor */,
                              S_IFCHR | S_IRUGO, 0, 0,
                              &drums_fops, NULL);
      }

Once registered, the devices behave pretty much like any conventional device, and you can even chown and chmod them. The sample drums are not very refined, and if you listen to them they repeat the same note over and over.

borea.root# ls -l /dev/drums
total 0
cr--r--r--   1 root     root      60,   0 Jan  1  1970 bam
cr--r--r--   1 root     root      60,   1 Jan  1  1970 bum
cr--r--r--   1 root     root      60,   2 Jan  1  1970 pam
cr--r--r--   1 root     root      60,   3 Jan  1  1970 pum
cr--r--r--   1 root     root      60,   4 Jan  1  1970 tam
cr--r--r--   1 root     root      60,   5 Jan  1  1970 tum
borea.root# head -2 /dev/drums/bam
borea.root# head -100 /dev/drums/tum | uniq

The implementation of the drums is pretty standard: the minor number of the device being read is used to choose which string to return to user space, and the string being returned is the same drums_strings[i] used in registering the device name.

    int minor = MINOR(inode->i_rdev);

    if (count > DRUMS_TXT_LEN) count = DRUMS_TXT_LEN;
    if (copy_to_user(buf, drums_strings[minor], count)) return -EFAULT;

Unregistering the devices at unload time is easy: you just need to call devfs_unregister for each entry point you registered. Also, if you called devfs_register_chrdev you should now call devfs_unregister_chrdev. Unregistering is shown in listing 2.

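    for (i=0; i<DRUMS_NR_DEV; i++)
        devfs_unregister(drums_devs[i]);   /* each file in /dev/drums */
    devfs_unregister(drums_dir);           /* then the directory itself */
    devfs_unregister_chrdev(DRUMS_MAJOR, "drums");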

Working without major and minor numbers

If your device is meant to be available only via devfs, you can avoid dealing with major and minor numbers altogether. Actually, when a devfs node is opened, the kernel doesn't need to use the device numbers, as the driver has already provided the file_operations structure that must be used to act on that device.

To get automatic device numbers, the only thing that's needed is specifying DEVFS_FL_AUTO_DEVNUM in the flags argument. The major and minor arguments are then unused, and the filesystem will automatically choose a major/minor pair for your device.

What is most interesting in using automatic device numbers is that the driver writer can't use the drums_read approach any more (choosing what to do according to the minor number), as the minor number isn't known at compile time.

What comes to the rescue is the private_data field that is part of the file structure. Most drivers that use the field internally assign its value at open time, based on the minor number being opened, and use it in the other device methods (read, write, etc.). With devfs you are allowed to choose your private_data pointer before the device is opened, and the chosen value can be passed to devfs_register as the last argument.

Devfs takes away the historical role of device numbers: the major number is unneeded because each device declares its operations, and the minor number is unneeded because each device declares its private information. The only remaining problem may lie in user-space programs that expect the major number to be constant across similar devices, but this behavior doesn't affect new devices (whose applications have not yet been written), so the problem doesn't really apply.

In the drums module, you'll find tambourine and timpani as examples of automatic assignment of device numbers. The code lines that implement them are shown in listing 3, and their appearance in the system is shown in this screenshot.

borea.root# ls -l /devfs/timpani /devfs/tambourine
cr--r--r--   1 root     root     144,   3 Jan  1  1970 /devfs/tambourine
cr--r--r--   1 root     root     144,   4 Jan  1  1970 /devfs/timpani
borea.root# head -1 /devfs/timpani
borea.root# head -1 /devfs/tambourine

    /* init_module: register tambourine and timpani */
    drums_tambourine = devfs_register(NULL, "tambourine", 0,
				      DEVFS_FL_AUTO_DEVNUM, 0, 0,
				      S_IFCHR | S_IRUGO, 0, 0,
				      &drums_fops, (void *)"rattle\n");
    drums_timpani = devfs_register(NULL, "timpani", 0,
				   DEVFS_FL_AUTO_DEVNUM, 0, 0,
				   S_IFCHR | S_IRUGO, 0, 0,
				   &drums_fops, (void *)"boom\n");

    /* this is the read() implementation */
    txt = filp->private_data;
    if (count > strlen(txt)) count = strlen(txt);
    if (copy_to_user(buf, txt, count)) return -EFAULT;
    *offp += count;
    return count;
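The path from the last argument of devfs_register to filp->private_data can also be modelled in plain user-space C. The sketch below is not the devfs implementation: the model_ names, the fixed-size table, and the lookup by name are all assumptions chosen to show the idea that the information is bound to the entry at registration time and no device number is consulted when reading.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model, not the kernel API. */
struct model_node {             /* one registered entry */
    const char *name;
    void *info;                 /* value supplied at registration time */
};

struct model_file {             /* stands in for struct file */
    void *private_data;
    size_t pos;
};

#define MODEL_MAX_NODES 8
static struct model_node model_nodes[MODEL_MAX_NODES];
static size_t model_node_count;

/* Register a name together with its private information. */
static int model_register(const char *name, void *info)
{
    if (model_node_count == MODEL_MAX_NODES)
        return -1;              /* table full */
    model_nodes[model_node_count].name = name;
    model_nodes[model_node_count].info = info;
    model_node_count++;
    return 0;
}

/* Open by name: the stored info becomes the file's private_data. */
static int model_open(const char *name, struct model_file *filp)
{
    for (size_t i = 0; i < model_node_count; i++) {
        if (strcmp(model_nodes[i].name, name) == 0) {
            filp->private_data = model_nodes[i].info;
            filp->pos = 0;
            return 0;
        }
    }
    return -1;                  /* no such entry */
}

/* Read: only private_data is used, as in the listing above. */
static size_t model_read(struct model_file *filp, char *buf, size_t count)
{
    const char *txt = filp->private_data;
    size_t len = strlen(txt);

    if (count > len)
        count = len;
    memcpy(buf, txt, count);
    filp->pos += count;
    return count;
}
```

Registering "timpani" with the string "boom\n" and then opening and reading it hands back that same string, whatever position the entry occupies in the table. That position plays the role of the automatically assigned minor number: it exists, but the driver never looks at it.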

The ability to work without device numbers is very important, because the Linux device space is not far from exhaustion, due to its sparse nature: a major number is assigned to every important-enough device driver, even though most systems only have a dozen or so drivers installed. With devfs, drivers can work independently of major assignment, without resorting to hairy scripts that call mknod at load time when a dynamic major number is used.
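To see why the space is so tight, recall that major and minor numbers are 8 bits wide each, so a device number fits in 16 bits and there are only 256 majors to hand out. The following sketch packs and unpacks such a number in user space. The model_ names are hypothetical and only mirror the spirit of the kernel's MAJOR and MINOR macros. I'm assuming the conventional layout with the major in the high byte.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit device number: 8-bit major, 8-bit minor. */
#define MODEL_MINORBITS 8u
#define MODEL_MINORMASK ((1u << MODEL_MINORBITS) - 1u)
#define MODEL_NR_MAJORS (1u << 8)   /* only 256 majors exist in total */

typedef uint16_t model_dev_t;

/* Pack a major/minor pair: major in the high byte, minor in the low byte. */
static model_dev_t model_mkdev(unsigned int major, unsigned int minor)
{
    return (model_dev_t)(((major & 0xffu) << MODEL_MINORBITS) |
                         (minor & MODEL_MINORMASK));
}

/* Extract the major number (high byte). */
static unsigned int model_major(model_dev_t dev)
{
    return (unsigned int)dev >> MODEL_MINORBITS;
}

/* Extract the minor number (low byte). */
static unsigned int model_minor(model_dev_t dev)
{
    return (unsigned int)dev & MODEL_MINORMASK;
}
```

Each driver that claims a major takes all 256 minors under it, whether or not it uses them, which is what makes the assignment sparse.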

More than that

While the sample drums module only shows the basic functionality of devfs, the interface exported by devfs_fs_kernel.h offers much more. The filesystem can host conventional files, symbolic links and everything that can live in a conventional filesystem.

The filesystem is currently marked as experimental, even though the current devfs implementation is pretty stable. The problem with devfs as I write this is that actual use of its features must somehow be standardized, to prevent cluttering of the devfs name space. As a matter of fact, kernel developers are still discussing the suitability of procfs and devfs for all system configuration, in order to find the best and cleanest way to access system configuration and resources.

While non-devfs systems use less memory, and a tiny conventional /dev directory is currently still the best option for small embedded systems, the availability of devfs opens a new range of options for driver developers and simplifies users' lives when adding a new device driver to their system, as the driver module can now do everything that is needed to grant user-space access to the hardware.

Alessandro is an independent consultant based in Italy. He writes uninteresting device drivers and uninteresting applications like GNU barcode, which gained him his preferred email address: rubini-at-gnu-dot-org.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved