Virtual Network Interfaces

by Alessandro Rubini

In the Linux (or Unix) world, most network interfaces, such as eth0 and ppp0, are associated with a physical device that is in charge of transmitting and receiving data packets. However, there are exceptions to this rule, and some logical network interfaces don't feature any physical packet transmission; the best-known examples are the shaper and eql interfaces. This article shows how such ``virtual'' interfaces attach to the kernel and to the packet transmission mechanism.


From the kernel's point of view, a network interface is a software object that can process outgoing packets, and the actual transmission mechanism remains hidden inside the interface driver. Even though most interfaces are associated with physical devices (or, for the loopback interface, with a software-only data loop), it is possible to design network interface drivers that rely on other interfaces to perform actual packet transmission. A ``virtual'' interface is useful for implementing special-purpose processing on data packets without hacking the network subsystem of the kernel. To support this discussion with a real-world example, I wrote an insane (INterface SAmple for Network Errors) driver, available as insane.tar.gz. The interface simulates semi-random packet loss or intermittent network failures. The code fragments shown here are part of the insane driver and have been tested with Linux-2.3.41.

While the following description is rather terse, the sample code is well commented and tries to fill the gaps left open by this quick tour of the topic.

How an Interface Plugs into the Kernel

Like other kinds of device drivers, a network interface module connects to the rest of Linux by registering its own data structure within the kernel. The insane driver, for example, registers itself by calling ``register_netdev(&insane_dev);''.

The device structure being registered, insane_dev, is a struct net_device object (Linux 2.3.13 and earlier called it struct device), and it must feature at least two valid fields: the interface name and a pointer to its initialization function:


	static struct net_device insane_dev = {
		name: "insane",
		init: insane_init,
	};

The init callback is meant for internal use by the driver: it usually fills other fields of the data structure with pointers to device methods, the functions that perform the real work during the interface's lifetime. When an interface driver is linked into the kernel (instead of being loaded as a module), the first task of the init function is to check whether the interface hardware is there.

As you may imagine, the interface can be removed by calling unregister_netdev(), usually invoked by cleanup_module() (or not invoked at all if the driver is not modularized).

The net_device structure includes, in addition to all the standardized fields, a ``private'' pointer (a void *) that the driver can use for its own purposes. For virtual interfaces, the private field is the best place to host configuration information; the insane sample interface follows the good practice of allocating its own priv structure at initialization time:


    /* priv is used to host the statistics, and packet dropping policy */
    dev->priv = kmalloc(sizeof(struct insane_private), GFP_USER);
    if (!dev->priv) return -ENOMEM;
    memset(dev->priv, 0, sizeof(struct insane_private));

The allocation is released at interface shutdown (i.e., when the module is removed from the kernel).
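The life cycle described so far (register, run init, allocate private data, unregister, release) can be tried out in user space. The sketch below is not kernel code and is not taken from the insane package: toy_dev, toy_private, toy_register() and toy_unregister() are hypothetical stand-ins for struct net_device, struct insane_private, register_netdev() and unregister_netdev(), while calloc() and free() play the role of the kernel allocator. It only illustrates the order of events.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_DEVS 8

/* Hypothetical stand-in for struct net_device: a name, an init
 * callback and a private pointer owned by the driver. */
struct toy_dev {
    const char *name;
    int (*init)(struct toy_dev *dev);
    void *priv;
};

/* Hypothetical private data, playing the role of insane_private. */
struct toy_private {
    unsigned long tx_packets;
    int mode;
};

static struct toy_dev *toy_registry[TOY_MAX_DEVS];

/* Plays the role of insane_init(): allocate zeroed private data. */
int toy_init(struct toy_dev *dev)
{
    dev->priv = calloc(1, sizeof(struct toy_private));
    return dev->priv ? 0 : -ENOMEM;
}

/* Look a device up by name; NULL if it is not registered. */
struct toy_dev *toy_find(const char *name)
{
    int i;

    for (i = 0; i < TOY_MAX_DEVS; i++)
        if (toy_registry[i] && !strcmp(toy_registry[i]->name, name))
            return toy_registry[i];
    return NULL;
}

/* In this model, registration runs init first and publishes the
 * device only if init succeeded. */
int toy_register(struct toy_dev *dev)
{
    int i, err;

    if (!dev->name || !dev->init)
        return -EINVAL;
    if (toy_find(dev->name))
        return -EEXIST;
    for (i = 0; i < TOY_MAX_DEVS; i++) {
        if (toy_registry[i])
            continue;
        err = dev->init(dev);
        if (err)
            return err;
        toy_registry[i] = dev;
        return 0;
    }
    return -ENOSPC;
}

/* Remove the device, then release what init allocated. */
void toy_unregister(struct toy_dev *dev)
{
    int i;

    for (i = 0; i < TOY_MAX_DEVS; i++)
        if (toy_registry[i] == dev)
            toy_registry[i] = NULL;
    free(dev->priv);
    dev->priv = NULL;
}
```

Registering a toy_dev whose init field points to toy_init leaves a zeroed toy_private behind dev->priv; unregistering it frees the allocation, which mirrors what the module does at load and unload time.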

Device Methods

A network interface object, like most kernel objects, exports a list of methods so the rest of the kernel can use it. These methods are function pointers located in fields of the object data structure, here struct net_device.

An interface can be perfectly functional by exporting just a subset of all the methods; the recommended minimum subset includes open, stop (i.e., ``close''), do_ioctl and get_stats. These methods are directly related to system calls invoked by a user program (such as ifconfig). With the exception of ioctl, which needs some detailed discussion, their implementation is pretty trivial, and they turn out to be just a few lines of code.

      int insane_open(struct net_device *dev)
      {
          dev->start = 1;
          MOD_INC_USE_COUNT;
          return 0;
      }
      int insane_close(struct net_device *dev)
      {
          dev->start = 0;
          MOD_DEC_USE_COUNT;
          return 0;
      }
      struct net_device_stats *insane_get_stats(struct net_device *dev)
      {
          return &((struct insane_private *)dev->priv)->priv_stats;
      }

The open method is called when you run ``ifconfig insane up'', and close deals with ``ifconfig insane down''; get_stats returns a pointer to the local statistics structure and is used by ifconfig as well as by the /proc informative files. The driver is responsible for filling in the statistics (although it may choose not to), whose fields are defined in <linux/netdevice.h>.

There are other methods, more related to the low level details of packet transmission, but they fall outside of the scope of this discussion (they are on show in the source package, though). The only interesting low-level method is hard_start_xmit, discussed later.

ioctl

The do_ioctl entry point is the most important one for virtual interfaces. When a user program configures the behavior of the interface, it does its task by invoking the ioctl() system call. This is how shapecfg defines network shaping and how eql_enslave attaches real interfaces to the load-balancing interface eql. Similarly, the insanely application configures the insane behavior on the insane virtual interface.

Unlike what happens for ``normal'' device drivers (char and block drivers), the implementation of ioctl for interfaces is pretty well-defined: the invoking file descriptor must be a socket, the available commands are only SIOCDEVPRIVATE to SIOCDEVPRIVATE+15, and the infamous ``third argument'' of the system call is always a struct ifreq * pointer, instead of the generic void * pointer. This ``restriction'' in ioctl arguments exists because socket ioctl commands span several logical layers and several protocols; the predefined values are reserved for device private use, and are unique throughout the protocol stack (note that no other ioctl command will be delivered to the network interface method, so you really cannot choose your own values). Passing a predefined data structure to ioctl doesn't limit the flexibility of interface configuration, as the ifreq structure includes a data field, a caddr_t value that can point to arbitrary configuration information.

Based on the information above, the insane interface can be controlled using these commands (defined in "insane.h"):

      #define SIOCINSANESETINFO SIOCDEVPRIVATE
      #define SIOCINSANEGETINFO (SIOCDEVPRIVATE+1)

Actual use of the command, within the user-space program insanely, turns out to be pretty simple:

      int sock = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
      struct insane_userinfo info; /* configuration data in/out */
      struct ifreq req;

      strcpy(req.ifr_name, "insane");
      req.ifr_data = (caddr_t)&info;

      /* fill info structure... */

      if (ioctl(sock, SIOCINSANESETINFO, &req)<0) {
           /* deal with error */
      }

The kernel-space counterpart of the configuration process is slightly more complex, but only because it must deal with permission checks and copying data.

      struct insane_userinfo info;
      struct insane_userinfo *uptr;

      /* only authorized users can control the interface */
      if (cmd == SIOCINSANESETINFO && !capable(CAP_NET_ADMIN))
          return -EPERM;
    
      /* retrieve the data structure from user space */
      uptr = (struct insane_userinfo *)ifr->ifr_data;
      /* copy_from_user() returns the number of bytes left uncopied */
      if (copy_from_user(&info, uptr, sizeof(info)))
          return -EFAULT;

      /* deal with the information */
      return 0;
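The ``deal with the information'' step deserves a closer look, because nothing coming from user space should reach the configuration unchecked. The following user-space model shows one possible ordering: command check, permission check, copy, validation, and only then the update. Everything in it is an assumption made for illustration: the demo_ names, the command and mode numbers, the error codes chosen, and the layout of demo_userinfo (the article's fragments only show that the real structure has a name field). A NULL pointer stands in for a failing copy_from_user(), and the is_admin flag for capable(CAP_NET_ADMIN).

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NAMELEN 16

/* Hypothetical command numbers and modes, for this model only. */
enum { DEMO_SETINFO = 0, DEMO_GETINFO = 1 };
enum { DEMO_PASS, DEMO_PERCENT, DEMO_TIME };

/* Hypothetical layout of the configuration record. */
struct demo_userinfo {
    char name[DEMO_NAMELEN];  /* interface used for transmission */
    int mode;
    int arg1;                 /* percent, or on-time  */
    int arg2;                 /* unused,  or off-time */
};

/* Reject requests that make no sense for the selected mode. */
int demo_validate(const struct demo_userinfo *info)
{
    switch (info->mode) {
    case DEMO_PASS:
        return 0;
    case DEMO_PERCENT:
        return (info->arg1 >= 0 && info->arg1 <= 100) ? 0 : -EINVAL;
    case DEMO_TIME:
        return (info->arg1 > 0 && info->arg2 > 0) ? 0 : -EINVAL;
    default:
        return -EINVAL;
    }
}

/* Model of the handler: cfg is the driver's current configuration. */
int demo_ioctl(int cmd, struct demo_userinfo *uptr, int is_admin,
               struct demo_userinfo *cfg)
{
    struct demo_userinfo info;
    int err;

    if (cmd != DEMO_SETINFO && cmd != DEMO_GETINFO)
        return -EINVAL;
    /* only authorized users can change the configuration */
    if (cmd == DEMO_SETINFO && !is_admin)
        return -EPERM;
    if (!uptr)
        return -EFAULT;       /* models a failed user-space copy */

    if (cmd == DEMO_GETINFO) {
        *uptr = *cfg;
        return 0;
    }

    info = *uptr;                         /* work on a private copy */
    info.name[DEMO_NAMELEN - 1] = '\0';   /* never trust termination */
    err = demo_validate(&info);
    if (err)
        return err;
    *cfg = info;              /* commit only after every check passed */
    return 0;
}
```

Working on a local copy means that a rejected request leaves the active configuration untouched.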

Packet Transmission

The most important entry point for a network interface driver is hard_start_xmit, where hard is a shorthand for hardware. The device method gets called whenever a network packet gets routed through the interface. Unlike the methods described above (and like the ones not discussed here), this one is not directly related to any system call or application; rather, it is used by the network subsystem of the Linux kernel according to its own policies.

When virtual interfaces are concerned, no actual hardware transmission takes place in the interface itself. The interface will instead resort to another network interface to perform transmission. Packet passing is implemented in two steps: first (usually at configuration time, within ioctl), the interface must connect to another interface, the one that can transmit packets; then, its own hard_start_xmit must take proper action to pass the packet.

      /* look for the hardware interface */
      slave = __dev_get_by_name(info.name);
      if (!slave) return -ENODEV;
      priv->priv_device = slave;

            /* .... */

      /* update your statistic counters */
      priv->priv_stats.tx_packets++;
      priv->priv_stats.tx_bytes += skb->len;

      /* assign the packet to the hw interface */
      skb->dev = priv->priv_device;

      /* and tell Linux to pass it to its device */
      dev_queue_xmit (skb);
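The two-step hand-off can also be followed in a self-contained user-space model. The sim_ types and functions below are hypothetical simplifications of struct sk_buff, struct net_device and dev_queue_xmit(), written only to show how a packet queued on the virtual interface ends up being counted twice: once by the virtual interface and once by the interface that really transmits it. Dropping packets while no slave is configured is a choice of this sketch.

```c
#include <stddef.h>

struct sim_dev;

/* Hypothetical, stripped-down packet: a length and an owning device. */
struct sim_skb {
    unsigned int len;
    struct sim_dev *dev;
};

struct sim_stats {
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_dropped;
};

struct sim_dev {
    const char *name;
    int (*hard_start_xmit)(struct sim_skb *skb, struct sim_dev *dev);
    struct sim_stats stats;
    struct sim_dev *slave;    /* set at configuration time, else NULL */
};

/* Plays the role of dev_queue_xmit(): give the packet to skb->dev. */
int sim_queue_xmit(struct sim_skb *skb)
{
    return skb->dev->hard_start_xmit(skb, skb->dev);
}

/* "Hardware" driver: just count the packet as transmitted. */
int sim_hw_xmit(struct sim_skb *skb, struct sim_dev *dev)
{
    dev->stats.tx_packets++;
    dev->stats.tx_bytes += skb->len;
    return 0;
}

/* Virtual driver: account for the packet, then re-queue it on the
 * slave interface. */
int sim_virtual_xmit(struct sim_skb *skb, struct sim_dev *dev)
{
    if (!dev->slave) {        /* not configured yet: drop the packet */
        dev->stats.tx_dropped++;
        return 0;
    }
    dev->stats.tx_packets++;
    dev->stats.tx_bytes += skb->len;
    skb->dev = dev->slave;    /* assign the packet to the hw interface */
    return sim_queue_xmit(skb);
}
```

A packet-dropping policy would sit in sim_virtual_xmit() just before the hand-off, which is why the virtual interface still sees (and counts) packets that never reach the wire.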

In a perfect world, the virtual interface should also register a notifier callback, so Linux will tell the driver when the physical hardware interface goes away -- if the slave interface is a module, its removal will make insane unhappy. The released insane implementation doesn't register any callback, and making it saner is left as an exercise for the reader.
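As a hint for that exercise, here is the idea reduced to a user-space model. In the kernel, the facility to look at is the netdevice notifier chain (register_netdevice_notifier() and the NETDEV_UNREGISTER event); the code below does not use it, and every watch_ name is a hypothetical stand-in. It only shows the shape of the solution: the virtual interface registers a callback and forgets its slave as soon as it is told that the slave is going away.

```c
#include <stddef.h>

#define WATCH_MAX 4

/* Hypothetical device: only its identity matters here. */
struct watch_dev {
    const char *name;
};

/* A virtual interface that remembers which device it transmits through. */
struct watch_virtual {
    struct watch_dev *slave;
};

/* Callback type: invoked with the device that is going away. */
typedef void (*watch_fn)(void *ctx, struct watch_dev *gone);

static struct {
    watch_fn fn;
    void *ctx;
} watchers[WATCH_MAX];

/* Add a callback to the list; returns -1 when the list is full. */
int watch_register(watch_fn fn, void *ctx)
{
    int i;

    for (i = 0; i < WATCH_MAX; i++) {
        if (!watchers[i].fn) {
            watchers[i].fn = fn;
            watchers[i].ctx = ctx;
            return 0;
        }
    }
    return -1;
}

/* Called when a device is removed: tell every registered watcher. */
void watch_device_removed(struct watch_dev *dev)
{
    int i;

    for (i = 0; i < WATCH_MAX; i++)
        if (watchers[i].fn)
            watchers[i].fn(watchers[i].ctx, dev);
}

/* The virtual interface's reaction: forget a slave that disappeared. */
void virtual_on_removed(void *ctx, struct watch_dev *gone)
{
    struct watch_virtual *v = ctx;

    if (v->slave == gone)
        v->slave = NULL;
}
```

With the slave pointer cleared, the transmission path can check for NULL and drop packets instead of following a stale pointer.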

Packet Reception

When network packets hit an interface board, they generate an interrupt so that the Operating System can handle packet arrival (the only exception is the loopback interface, whose reception mechanism is part of packet transmission).

A virtual interface, on the other hand, has no way to receive interrupts, and thus it cannot receive any network packet. This can be perceived as unfortunate, because it would be nice to attach the same software operations to both directions of data flow. But the mechanics of packet reception don't allow virtual interfaces to enter the game, and whoever needs to intercept incoming packets must use other ways to hook into the packets' path. This kind of functionality falls outside the scope of this discussion and leans very much towards the way netfilter works.

Using Insane

All of this talking may look rather pointless, unless we can see it at work. The insane interface relies on an Ethernet interface for physical transmission, and it can be configured to operate in one of three insane modes. It can relay every packet (``pass'' mode), or relay only some percent of packets (``percent'' mode, with an integer parameter), or turn relaying on and off periodically (``time'' mode -- with two parameters, on-time and off-time, specified as jiffy counts, architecture-dependent time quanta that correspond to 10ms each for the PC platform). Here are three examples of use of insanely:

      # insanely eth0 pass         ; # relay everything to eth0
      # insanely eth0 percent 80   ; # drop 20% (pseudo random)
      # insanely eth0 time 50 100  ; # relay for .5 seconds, drop for 1s
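The three modes boil down to a yes/no decision taken for each outgoing packet. The function below is a minimal sketch of such a decision, not the code found in the insane package: the relay_ names are made up, and the tick counter and the pseudo-random number are passed in by the caller (in the kernel they would come from jiffies and from whatever random source the driver prefers), which keeps the function deterministic and easy to test.

```c
enum relay_mode { RELAY_PASS, RELAY_PERCENT, RELAY_TIME };

struct relay_policy {
    enum relay_mode mode;
    unsigned long arg1;   /* percent to relay, or on-time in ticks */
    unsigned long arg2;   /* off-time in ticks (time mode only)    */
};

/*
 * Return 1 to relay the packet, 0 to drop it.
 *   now: current tick count
 *   rnd: a pseudo-random number supplied by the caller
 */
int relay_decide(const struct relay_policy *p, unsigned long now,
                 unsigned long rnd)
{
    unsigned long period;

    switch (p->mode) {
    case RELAY_PERCENT:
        /* rnd % 100 takes 100 values; arg1 of them pass */
        return rnd % 100 < p->arg1;
    case RELAY_TIME:
        period = p->arg1 + p->arg2;
        if (period == 0)
            return 1;     /* degenerate setup: behave like pass */
        /* relay during the first arg1 ticks of each period */
        return now % period < p->arg1;
    case RELAY_PASS:
    default:
        return 1;
    }
}
```

With ``time 50 100'' the period is 150 ticks: packets sent during ticks 0 to 49 of each period are relayed, and those sent during ticks 50 to 149 are dropped, which at 10ms per tick gives the half second on and one second off shown above.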

In order to connect insane to the network, you need to assign a ``local'' IP address to the interface (that IP address will be used as the ``source address'' and by remote hosts to send their replies) and route some packets through it. Current versions of Linux automatically associate a network route to each device, and this routing cannot be removed. Therefore, we can't re-route all of the LAN through insane at once, and the following example re-routes a single host, called "morgana", in the routing table of the host "borea".

      borea# insmod insane                 ; # load module
      borea# ifconfig insane borea         ; # give same IP as eth0
      borea# route add morgana dev insane  ; # re-route this host
      borea# ./insanely eth0 percent 60    ; # set dropping rate

Unfortunately, due to a glitch in Linux-2.3.41, you'll also need to disable reverse-path filtering (rp_filter) on the Ethernet interface used by insane. The following command worked for me: ``echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter''.

With this setup, you can connect to morgana with any protocol you like, and experience a 40% packet loss -- only on transmitted packets, though, unless morgana runs another instance of insane with a similar configuration.

An interesting effect of this transmission path through two interfaces is that you can run tcpdump on both eth0 and insane and see different results. While ``tcpdump -i eth0'' shows the packets being transmitted, ``tcpdump -i insane'' displays every packet sent out by the protocol layers, before any dropping is applied.





The figure on this page explains this behavior by showing the path taken by a packet being transmitted through insane. You can find the PostScript version here: vinter.ps

Alessandro is an independent consultant based in Italy. He writes uninteresting device drivers and uninteresting applications like GNU barcode, which gained him his preferred email address: rubini@gnu.org.

Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.