The Design of kHTTPd

by Alessandro Rubini

Based on last month's article about invoking system calls from within kernel code, this month's column shows how a complete network server can be implemented as a kernel thread. The sample code shown implements the skeleton of a simplified tftp server.

While normally you shouldn't bring user-space processes to kernel space, there are times when this may be a good choice, either for performance or for size. The former reason is what led to kHTTPd; the latter may be relevant when you want to avoid libc altogether in small embedded systems devoted to a single task.

The discussion in this column is based on the kernel-based web server released within version 2.4.0-test9 of the kernel, in the net/khttpd directory. This is the kHTTPd program by Arjan van de Ven, not the Tux reimplementation by Ingo Molnar. The latter is both faster and more complex, and can be found at http://people.redhat.com/mingo/TUX-patches/.

Features of the Sample Code

Code excerpts included in this column are part of a ktftpd skeletal module, available from http://ar.linux.it/docs/khttpd/ktftpd.tar.gz. The kernel-space daemon loosely mimics what kHTTPd implements, but with simplicity as a primary target. The choice to use UDP instead of TCP is partly for simplicity and partly to avoid replicating too much of kHTTPd. Serving a file through TCP can be performed using do_generic_file_read (``sendfile''), but this is yet another can of worms.

Our tftp daemon shall be able to serve world-readable files from the /tftp file tree, where /tftp could also be a symbolic link to another directory. Even though the tftp protocol supports data transfers in both directions, the sample daemon will refuse to write to its own filesystem. Finally, it won't implement any packet retransmission (contrary to what RFC 783 requires).
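
As a rough user-space illustration of the read-only policy just described, the sketch below classifies an incoming request packet. The opcode values and the packet layout are those of the tftp protocol; the function name, the verdict names and the check against ".." are assumptions made for this example and are not taken from ktftpd.

```c
#include <stddef.h>
#include <string.h>

/* Request opcodes of the tftp protocol (16 bits, big-endian) */
enum { OP_RRQ = 1, OP_WRQ = 2 };

/* Hypothetical verdicts, invented for this sketch */
enum verdict { REQ_SERVE, REQ_REFUSE, REQ_MALFORMED };

/*
 * Examine a request: a 2-byte opcode, a NUL-terminated file name,
 * then a NUL-terminated transfer mode.  On REQ_SERVE, *name points
 * at the file name inside the packet.
 */
static enum verdict tftp_classify(const unsigned char *pkt, size_t len,
                                  const char **name)
{
    unsigned opcode;
    const unsigned char *end;

    if (len < 4)
        return REQ_MALFORMED;
    opcode = (pkt[0] << 8) | pkt[1];
    if (opcode == OP_WRQ)
        return REQ_REFUSE;          /* the daemon never writes files */
    if (opcode != OP_RRQ)
        return REQ_MALFORMED;
    end = memchr(pkt + 2, '\0', len - 2);
    if (end == NULL || end == pkt + 2)
        return REQ_MALFORMED;       /* missing or empty file name */
    /* assumption of this sketch: refuse names that could leave the tree */
    if (strstr((const char *)pkt + 2, "..") != NULL)
        return REQ_REFUSE;
    *name = (const char *)pkt + 2;
    return REQ_SERVE;
}
```

A caller would prepend the served directory to the returned name before opening the file, and answer a refused request with an error packet.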

The daemon won't keep real logs. It'll merely print a little information using conventional printk calls.

The structure of the skeletal module has been designed so that it can be thoroughly understood in a reasonable time, but the module has not been completely implemented. In my opinion, reading this column, ktftpd, kHTTPd, and the Tux patch (in that order) makes a good learning path.

Kernel Threads

The first step a programmer must take to run a server in kernel space is forking a process. To create a new thread, you must call the function kernel_thread. Since this is usually performed by the initialization function of a kernel module, the programmer must also make sure the thread is detached from the context of the insmod or modprobe command that was executing the initialization function.

Listing 1 shows how module initialization forks a new thread and how the thread detaches itself from the user process that forked it.

    /*
     * in init_module(): fork the main thread
     */
    kernel_thread(ktftpd_main, NULL /* no arg */,
                  0 /* no clone flags */);

    /*
     * in ktftpd_main(): detach from the original process
     */
    sprintf(current->comm,"ktftpd-main"); /* comm is 16 bytes */
    lock_kernel();   /* This seems to be required for exit_mm */
    exit_mm(current);
    /* close open files too (stdin/out/err are open) */
    exit_files(current);

In order to handle several clients at the same time, a daemon usually forks several copies of itself, each of them in charge of a single connection. This is accomplished by calling kernel_thread again from within the main loop each time a new connection is accepted. This time, however, there is no need to do special processing at the beginning of the thread, since there is no user process to detach from.

You shouldn't be shy about forking copies of your kernel daemon, as the resources consumed by each of them are almost negligible when compared to the cost of forking a user-space server. A kernel thread requires no memory-management overhead: it only consumes a pair of memory pages for the stack and a few data structures that are replicated for each thread.

To count the number of running threads, an atomic_t data item is used. I called it DaemonCount, the same name used by kHTTPd.

Just before unloading the module you'll need to stop all the threads, as the code they are executing is bound to disappear. There are several ways to accomplish the task. The kHTTPd server uses a sysctl entry point (/proc/sys/net/khttpd/stop), so the user can tell kernel code to stop server activity before unloading the module (and each thread increments the module's usage count, to prevent users from unloading it before the threads are all stopped).

Sysctl is an interesting feature, but it would increase the complexity of the sample ktftpd module. Readers interested in sysctl can refer to http://www.linux.it/~rubini/docs/sysctl/sysctl.html for more information. To keep the code shorter and simpler I chose a different approach: the individual thread doesn't add to the usage count for the module, and the cleanup function sets a global flag and then waits for all the threads to terminate.

Listing 2 shows the code that deals with thread termination.

    int ktftpd_shutdown = 0; /* set at unload time */
    DECLARE_WAIT_QUEUE_HEAD(ktftpd_wait_threads);

    /*
     * In the code of each thread, the main loop depends
     * on the value of ktftpd_shutdown
     */
    while (!signal_pending(current) && !ktftpd_shutdown) {
        /* .... */
    }

    /*
     * The following code is part of the cleanup function
     */
    /* tell all threads to quit */
    ktftpd_shutdown = 1;
    /* kill the one listening (it would take too much time to exit) */
    kill_proc(DaemonPid, SIGTERM, 1);
    /* and wait for them to terminate (no signals accepted) */
    wait_event(ktftpd_wait_threads, !atomic_read(&DaemonCount));
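
The same pattern (a shared flag, an atomic counter, and a waiter that sleeps until the counter drops to zero) can be tried out in user space. The following is only an analogy built on POSIX threads and C11 atomics, not kernel code; all the names are invented for the sketch, with a condition variable standing in for the wait queue.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int stop_flag;      /* plays the role of ktftpd_shutdown */
static atomic_int thread_count;   /* plays the role of DaemonCount */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  all_gone   = PTHREAD_COND_INITIALIZER;

/* Body of each worker: loop until told to stop, then sign off */
static void *worker(void *unused)
{
    (void)unused;
    while (!atomic_load(&stop_flag))
        sched_yield();            /* stand-in for serving a client */

    pthread_mutex_lock(&count_lock);
    if (atomic_fetch_sub(&thread_count, 1) == 1)
        pthread_cond_signal(&all_gone);   /* last one wakes the waiter */
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Start up to n detached workers; returns how many were started */
static int start_workers(int n)
{
    int started = 0;
    pthread_t tid;

    while (started < n) {
        atomic_fetch_add(&thread_count, 1);
        if (pthread_create(&tid, NULL, worker, NULL) != 0) {
            atomic_fetch_sub(&thread_count, 1);
            break;
        }
        pthread_detach(tid);
        started++;
    }
    return started;
}

/* Counterpart of the cleanup function: raise the flag, wait for zero */
static void stop_workers(void)
{
    atomic_store(&stop_flag, 1);
    pthread_mutex_lock(&count_lock);
    while (atomic_load(&thread_count) != 0)
        pthread_cond_wait(&all_gone, &count_lock);
    pthread_mutex_unlock(&count_lock);
}
```

The counter is incremented before the thread is created, so the waiter can never see zero while a worker is still about to start.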

Additionally, the user is allowed to terminate each thread by sending it a signal (as you may have imagined by looking at the condition around the main loop above): when a signal is pending, the thread exits. This is the same signal handling implemented in the kernel web server, and it boils down to the few lines of code shown in Listing 3. The instructions shown are part of the initialization code of the main thread. Other threads are created with the CLONE_SIGHAND flag, so sending a signal to any of them will kill them all.

    /* Block all signals except SIGKILL, SIGTERM */
    spin_lock_irq(&current->sigmask_lock);
    siginitsetinv(&current->blocked, sigmask(SIGKILL) | sigmask(SIGTERM));
    recalc_sigpending(current);
    spin_unlock_irq(&current->sigmask_lock);

Managing Network Connections

The main task of kHTTPd (and of most similar network services) consists of transferring data between the network and the local filesystem. While I'm going to describe network access, I won't discuss filesystem access, as system calls like open, read and close are accessible from kernel space and last month we saw how to invoke them.

As far as network access is concerned, what a server should generally do reduces to the following few system calls:

    fd = socket();
    bind(fd); listen(fd);
    while (1) {
        newfd = accept(fd);
        if (fork()) {
            /* parent: keep listening */
            close(newfd);
        } else {
            /* child: serve this connection */
            close(fd);
            /* .... */
            exit();
        }
    }

Performing the same task from kernel space reduces to similar code, with fork replaced by kernel_thread. The main difference is that there is no need to use file descriptors, because socket structures can be manipulated directly (thus avoiding the need to look up the file descriptor table every time).

The file net/khttpd/main.c as found in the kernel source shows how to handle a TCP session from kernel space. The implementation for UDP is pretty similar and Listing 4 shows how the first part of the task (preparing to receive packets) is implemented in the sample code.

    /* Open and bind a listening socket */
    error = sock_create(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
    if (error < 0) {
	printk(KERN_ERR "ktftpd: can't create socket: errno == %i\n", -error);
	goto out;
    }

    /* Same as setsockopt(SO_REUSEADDR). Actually not needed for tftpd */
    /* sock->sk->reuse   = 1; ---  needed for multi-thread TCP servers */

    sin.sin_family       = AF_INET;
    sin.sin_addr.s_addr  = INADDR_ANY;
    sin.sin_port         = htons((unsigned short)KTFTPD_PORT);
    error = sock->ops->bind(sock,(struct sockaddr*)&sin,sizeof(sin));
    if (error < 0) {
	printk(KERN_ERR "ktftpd: can't bind UDP port %i\n", KTFTPD_PORT);
	goto out;
    }

#if 0 /* There is no need to listen() for UDP. It would be needed for TCP */
    error = sock->ops->listen(sock,5);  /* "5" is the standard value */
    if (error < 0) {
	printk(KERN_ERR "ktftpd: can't listen()\n");
	goto out;
    }
#endif

Next, a TCP server would sleep in the accept system call. But unless you want to run a separate thread for every active network connection, you'll need to multiplex the operation of a single thread across several file descriptors, by calling either select or poll.

A kernel-space server shouldn't resort to either select or poll, as these calls carry non-negligible overhead. Instead, kHTTPd performs non-blocking calls for all pending operations, counting the successes. If no operation was successful, the daemon sleeps for at least one timer tick, giving other processes the chance to run before it polls its file descriptors again.
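
The control flow of such a loop can be sketched in plain C. Everything in this example is hypothetical: the conn structure, the function names, and the nap function that stands in for a one-tick sleep. It only illustrates the idea of counting successes and sleeping after a pass that did no work; it is not the kHTTPd code.

```c
#include <stddef.h>

/* Hypothetical connection state: number of operations still queued */
struct conn { int pending; };

static int naps;                     /* how many times we "slept" */
static void nap(void) { naps++; }    /* stand-in for a one-tick sleep */

/* Non-blocking attempt: 1 if some work was done, 0 if it would block */
static int try_once(struct conn *c)
{
    if (c->pending == 0)
        return 0;
    c->pending--;
    return 1;
}

/* Run a fixed number of passes; sleep only after a pass with no success */
static void run_passes(struct conn *conns, size_t n, int passes)
{
    while (passes-- > 0) {
        int done = 0;
        size_t i;

        for (i = 0; i < n; i++)
            done += try_once(&conns[i]);
        if (done == 0)
            nap();
    }
}
```

A real daemon would loop until told to stop rather than for a fixed number of passes; the bound is there only to keep the sketch finite.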

If the service is based on the UDP protocol, the thread will usually sleep on recvfrom right after bind returns. This is what the sample server does. It avoids both select and a polling loop like the one described above by forking a new thread for each new connection being processed.

Sleeping on a system call invoked from kernel space is no different from sleeping in user space: the system call handles its own wait queue. The difference is, as outlined last month, in the need to use set_fs and get_fs if the system call is expected to read or write a ``user'' buffer.

The kHTTPd daemon uses two functions to receive data. The first, DecodeHeader, collects all the HTTP headers; it uses the MSG_PEEK flag so that the input queue is not flushed until all headers have been received. The second, ReadRest, uses the MSG_DONTWAIT flag so that it does not block when no data is available.
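
The effect of the two flags is easy to observe from user space, where recv accepts the same MSG_PEEK and MSG_DONTWAIT values. The two wrappers below are a minimal sketch with invented names; they are not the kHTTPd functions.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Look at queued data without consuming it (blocks if nothing is there) */
static ssize_t peek_bytes(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK);
}

/* Consume queued data; returns -1 at once if the queue is empty */
static ssize_t read_nowait(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_DONTWAIT);
}
```

Calling peek_bytes repeatedly returns the same bytes each time, which is what lets a server wait for the end of the headers before consuming anything.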

The procedure used by ktftpd to receive a data packet is somewhat simpler and uses no socket flags. It sleeps until a packet is received or a signal is caught. It is shown in Listing 5. It is not as generic as recvfrom, since it only works with IP addresses.

/*
 * This procedure is used as a replacement for recvfrom(). Actually it
 * is based on the one in kHTTPd, which in turn is based on sys_recvfrom.
 * The address is passed by the caller since it receives the peer's
 * address, and the buffer is passed by the caller because it can't be
 * global (all threads share the same address space)
 */
static inline int ktftpd_recvfrom(struct socket *sock,
				  struct sockaddr_in *addr,
				  unsigned char *buf)
{
    struct msghdr msg;
    struct iovec iov;
    int len;
    mm_segment_t oldfs;

    if (sock->sk==NULL) return 0;

    msg.msg_flags = 0;
    msg.msg_name = addr;
    msg.msg_namelen  = sizeof(struct sockaddr_in);
    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    msg.msg_iov	 = &iov;
    msg.msg_iovlen = 1;
    msg.msg_iov->iov_base = buf;
    msg.msg_iov->iov_len = PKTSIZE;

    oldfs = get_fs(); set_fs(KERNEL_DS);
    len = sock_recvmsg(sock,&msg,1024,0);
    set_fs(oldfs);
    return len;
}

Sending Packets

The code to transmit packets is similar to the code receiving them. It exploits sock_sendmsg, which mimics sock_recvmsg.

The main difference is in how blocking is managed. A kernel thread that pushes data to a TCP socket should avoid ending up with partially written data, as that situation would require extra data management.

In the implementation of kHTTPd (in the file called datasending.c), the program asks how much free space there is in the TCP window associated with the socket, and only tries to push that many bytes. The following lines are the core of the code:

    int ReadSize,Space;
    int retval;

    Space = sock_wspace(sock->sk);
    ReadSize = min(4*4096, FileLength - BytesSent);
    ReadSize = min(ReadSize , Space );

    if (ReadSize>0) {
	oldfs = get_fs(); set_fs(KERNEL_DS);
	retval = filp->f_op->read(filp, buf, ReadSize, &filp->f_pos);
	set_fs(oldfs);
	if (retval>0) {
	    retval = SendBuffer(sock, buf, (size_t)retval);
	    if (retval>0) {
		BytesSent += retval;
	    }
	}
    }
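
The size computation at the top of the excerpt can be isolated into a small function to make its behavior explicit. This is a stand-alone restatement for illustration only; the function and macro names are invented and the types are simplified to long.

```c
/* Upper bound used by the excerpt above: 4*4096 bytes per round */
#define MAX_CHUNK (4 * 4096)

static long min_long(long a, long b)
{
    return a < b ? a : b;
}

/*
 * How many bytes to read and send next.  The result never exceeds the
 * per-round limit, what is left of the file, or the free socket space.
 * A result of 0 means nothing can be pushed right now.
 */
static long next_chunk(long file_length, long bytes_sent, long space)
{
    long size = min_long(MAX_CHUNK, file_length - bytes_sent);

    size = min_long(size, space);
    return size > 0 ? size : 0;
}
```

With a 100000-byte file and plenty of socket space, the first round moves 16384 bytes; if only 1000 bytes of space are free, the round shrinks to 1000 bytes, so the send cannot be left half done.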

With UDP each packet is sent as an individual item, so there is no need to check sock_wspace. The ktftpd sample daemon reads 512 bytes at a time from the filesystem and builds a packet according to the amount read (512 bytes or fewer). It then waits for the acknowledgment packet using the same ktftpd_recvfrom function shown above. A TCP server doesn't deal with acknowledgments, since reliability of the connection is built into the TCP protocol stack.
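
To make the packet handling concrete, here is a user-space sketch of how a data packet can be laid out and how an acknowledgment can be recognized. The opcodes and the 4-byte header follow the tftp protocol; the function and macro names are invented for this example and do not come from ktftpd.

```c
#include <stddef.h>
#include <string.h>

#define TFTP_OP_DATA 3     /* opcode of a data packet */
#define TFTP_OP_ACK  4     /* opcode of an acknowledgment */
#define TFTP_BLOCK   512   /* payload size of a full data packet */

/*
 * Build a data packet into out, which must hold 4 + TFTP_BLOCK bytes:
 * 2-byte opcode, 2-byte block number (both big-endian), then payload.
 * Returns the total packet length.
 */
static size_t tftp_build_data(unsigned char *out, unsigned block,
                              const void *payload, size_t n)
{
    if (n > TFTP_BLOCK)
        n = TFTP_BLOCK;
    out[0] = 0;
    out[1] = TFTP_OP_DATA;
    out[2] = (block >> 8) & 0xff;
    out[3] = block & 0xff;
    memcpy(out + 4, payload, n);
    return 4 + n;
}

/* Return 1 if pkt acknowledges the given block number, 0 otherwise */
static int tftp_is_ack(const unsigned char *pkt, size_t len, unsigned block)
{
    return len >= 4 && pkt[0] == 0 && pkt[1] == TFTP_OP_ACK &&
           pkt[2] == ((block >> 8) & 0xff) && pkt[3] == (block & 0xff);
}
```

A data packet carrying fewer than 512 bytes tells the client that the transfer is complete, which is why the packet length follows the amount actually read.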

Alessandro is an independent consultant based in Italy. He learns programming by reading free software and is usually late with his deadlines. He can be reached as rubini@gnu.org.

Thanks to Davide Ciminaghi <ciminaghi-at-prosa-dot-it> and Andrea Glorioso <andrea.glorioso-at-binary-only-dot-com> for their help in revising this article.