Programming with sockets

Connectionless servers

While connection-based services are the norm, some services are based on the use of datagram sockets. One, in particular, is the rwho service, which provides users with status information for hosts connected to a local area network. This service, while predicated on the ability to broadcast information to all hosts connected to a particular network, is of interest as an example usage of datagram sockets.

A user on any machine running the rwho server may find out the current status of a machine with the ruptime program. The output generated is illustrated below:

   arpa     up     9:45,       5 users, load      1.15,    1.39,    1.31
   cad      up     2+12:04,    8 users, load      4.67,    5.13,    4.59
   calder   up     10:10,      0 users, load      0.27,    0.15,    0.14
   dali     up     2+06:28,    9 users, load      1.04,    1.20,    1.65
   degas    up     25+09:48,   0 users, load      1.49,    1.43,    1.41
   ear      up     5+00:05,    0 users, load      1.51,    1.54,    1.56
   ernie    down   0:24
   esvax    down   17:04
   oz       down   16:09
   statvax  up     2+15:57,    3 users, load      1.52,    1.81,    1.86

Status information for each host is periodically broadcast by rwho server processes on each machine. The same server process also receives the status information and uses it to update a database. This database is then interpreted to generate the status information for each host. Servers operate autonomously, coupled only by the local network and its broadcast capabilities.

Note that the use of broadcast for such a task is fairly inefficient, because every host on the network must process each message, whether or not it runs an rwho server. Unless such a service is sufficiently universal and frequently used, the expense of periodic broadcasts outweighs the simplicity of the design.

The rwho server, in a simplified form, is shown below. It performs two separate tasks. The first is to act as a receiver of status information broadcast by other hosts on the network. This job is carried out in the main loop of the program. Packets received at the rwho port are examined to ensure they have been sent by another rwho server process, then are time-stamped with their arrival time and used to update a file indicating the status of the host. When a host has not been heard from for an extended period of time, the database interpretation routines assume the host is down and report this in the status reports. This algorithm is prone to error, because the rwho server on a host may be down while the host itself is up.

   main()
   {
   	...
   	sp = getservbyname("who", "udp");
   	net = getnetbyname("localnet");
   	sin.sin_len = sizeof(sin);
   	sin.sin_addr = inet_makeaddr(net->n_net, INADDR_ANY);
   	sin.sin_port = sp->s_port;
   	...
   	s = socket(AF_INET, SOCK_DGRAM, 0);
   	...
   	on = 1;
   	if (setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on,
   	 sizeof(on)) < 0) {
   		syslog(LOG_ERR, "setsockopt SO_BROADCAST: %m");
   		exit(1);
   	}
   	if (bind(s, (struct sockaddr *) &sin, sizeof(sin)) < 0) {
   		syslog(LOG_ERR, "bind: %m");
   		exit(1);
   	}
   	...
   	signal(SIGALRM, onalrm);
   	onalrm();
   	for (;;) {
   		struct whod wd;
   		int cc, whod, len = sizeof(from);
   
   		cc = recvfrom(s, (char *)&wd, sizeof(struct whod),
   			0, (struct sockaddr *)&from, &len);
   		if (cc <= 0) {
   			if (cc < 0 && errno != EINTR)
   				syslog(LOG_ERR, "rwhod: recv: %m");
   			continue;
   		}
   		if (from.sin_port != sp->s_port) {
   			syslog(LOG_ERR, "rwhod: %d: bad from port",
   				ntohs(from.sin_port));
   			continue;
   		}
   		...
   		if (!verify(wd.wd_hostname)) {
   			syslog(LOG_ERR, "rwhod: bad host name from %x",
   				ntohl(from.sin_addr.s_addr));
   			continue;
   		}
   		(void) sprintf(path, "%s/whod.%s", RWHODIR,
   			wd.wd_hostname);
   		whod = open(path, O_WRONLY|O_CREAT|O_TRUNC, 0666);
   		...
   		(void) time(&wd.wd_recvtime);
   		(void) write(whod, (char *)&wd, cc);
   		(void) close(whod);
   	}
   	exit(0);
   }
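
The rule used by the database interpretation routines, that a host not heard from for an extended period is reported as down, can be sketched as follows. This is a minimal illustration, not the actual ruptime source: the function name host_is_down and the threshold DOWN_AFTER_SECONDS are hypothetical, and the threshold value is an assumption, since the text says only "an extended period of time".

```c
#include <stdbool.h>
#include <time.h>

/* Assumed threshold; the text does not give the real value. */
#define DOWN_AFTER_SECONDS (11 * 60)

/*
 * Decide whether a host should be reported as down, given the time its
 * last status packet arrived (the value stored in wd_recvtime) and the
 * current time.  Hypothetical helper, for illustration only.
 */
static bool
host_is_down(time_t last_heard, time_t now)
{
	return difftime(now, last_heard) > DOWN_AFTER_SECONDS;
}
```

Because the test looks only at the arrival time of the last packet, it cannot distinguish a host that is down from a host whose rwho server has stopped, which is the source of error noted above.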

The second task performed by the server is to supply information regarding the status of its host. This involves periodically acquiring system status information, packaging it up in a message and broadcasting it on the local network for other rwho servers to hear. The supply function is triggered by a timer and runs off a signal. Locating the system status information is somewhat involved, but uninteresting. Deciding where to transmit the resultant packet is somewhat problematic, however.
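
A timer-driven supply function of the kind described above might look like the following sketch. It is not the real rwho server code: the names status_sock, status_dst, status_msg and supplies_sent, and the 60-second interval, are assumptions made for illustration, and gathering the status information is left out entirely.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#define SUPPLY_INTERVAL 60	/* seconds between broadcasts; assumed value */

static int status_sock = -1;		/* datagram socket with SO_BROADCAST set */
static struct sockaddr_in status_dst;	/* broadcast address and rwho port */
static char status_msg[64];		/* stand-in for the real status packet */
static volatile sig_atomic_t supplies_sent = 0;

/*
 * SIGALRM handler: send one status packet, then re-arm the timer so the
 * handler runs again after SUPPLY_INTERVAL seconds.
 */
static void
onalrm(int signo)
{
	int saved_errno = errno;	/* do not disturb the interrupted code */

	(void)signo;
	/* ... acquire system status and fill in status_msg (elided) ... */
	if (sendto(status_sock, status_msg, sizeof(status_msg), 0,
	    (struct sockaddr *)&status_dst, sizeof(status_dst)) >= 0)
		supplies_sent++;
	alarm(SUPPLY_INTERVAL);
	errno = saved_errno;
}
```

As in the server shown earlier, the handler would be installed with signal(SIGALRM, onalrm) and then called once directly to send the first packet and start the timer. Both sendto and alarm are safe to call from a signal handler under POSIX.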

Status information must be broadcast on the local network. For networks that do not support broadcast, another scheme must be used to simulate or replace it. One possibility is to keep a list of known neighbors, built from the status messages received from other rwho servers. Unfortunately, this requires some bootstrapping information, because a server has no idea which machines are its neighbors until it receives status messages from them. If all machines on a net are freshly booted, no machine has any known neighbors, so none will ever send, or receive, any status information. The routing table management process faces the identical problem in propagating routing status information. The standard solution, unsatisfactory as it may be, is to inform one or more servers of known neighbors and request that they always communicate with these neighbors. If each server is supplied with at least one neighbor, status information can then propagate through that neighbor to hosts that may not be direct neighbors. If the server is able to support networks that provide a broadcast capability, as well as those that do not, then networks with an arbitrary topology may share status information.
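
The neighbor-list scheme described above could be sketched with a small table that is seeded with the configured neighbors at startup and extended as status messages arrive. Everything here is hypothetical and for illustration only: the structure neighbor_table, the function neighbor_add, and the limit MAX_NEIGHBORS do not come from the rwho server.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAX_NEIGHBORS 64	/* assumed table size */

/* Known neighbors: seeded from configuration, then learned from packets. */
struct neighbor_table {
	struct in_addr addr[MAX_NEIGHBORS];
	size_t count;
};

/*
 * Record a neighbor's address.  Returns true if the address was added,
 * false if it was already known or the table is full.
 */
static bool
neighbor_add(struct neighbor_table *t, struct in_addr a)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->addr[i].s_addr == a.s_addr)
			return false;
	if (t->count == MAX_NEIGHBORS)
		return false;
	t->addr[t->count++] = a;
	return true;
}
```

A server using this approach would call neighbor_add once for each neighbor supplied to it at startup, and again with from.sin_addr for each valid status packet it receives. On a network without broadcast, it would then send its own status to every address in the table.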

NOTE: Programmers must be concerned about loops, however. If a host is connected to multiple networks, it will receive status information from itself. This can lead to an endless, wasteful exchange of information.
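
One simple guard against such loops, sketched below, is to discard any status packet that reports on the local host. This is an assumption about how a server might defend itself, not a description of the rwho server; the function is_own_report is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if a received status packet describes the local host.
 * "reported" is the host name carried in the packet, "self" is the
 * local host name, and "fieldlen" is the size of the packet's host
 * name field, which bounds the comparison in case the field is not
 * NUL-terminated.
 */
static bool
is_own_report(const char *reported, const char *self, size_t fieldlen)
{
	return strncmp(reported, self, fieldlen) == 0;
}
```

In the receive loop, a server could compare wd.wd_hostname against the name returned by gethostname(S) and skip the packet when they match, so that its own status is never recorded or passed on a second time.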

It is important that software operating in a distributed environment not have any site-dependent information compiled into it. Otherwise, a separate copy of the server would be required at each host, making maintenance a severe headache. The UNIX system attempts to isolate host-specific information from applications by providing system calls that return the necessary information. (An example of such a system call is the gethostname(S) call, which returns the host's official name.) The ioctl(S) call allows you to find the collection of networks to which a host is directly connected. Further, a local network broadcasting mechanism has been implemented at the socket level. Combining these two features allows a process to broadcast, in a site-independent manner, on any directly connected local network that supports broadcasting. This solves the problem of deciding how to propagate status information with rwho, or more generally of how to broadcast: status information is broadcast at the socket level to the connected networks, which are obtained through the appropriate ioctl calls. The specifics of such broadcasts are complex, however, and are covered in ``Advanced topics''.
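
As a small illustration of obtaining host-specific information at run time rather than compiling it in, the following sketch wraps gethostname(S). The wrapper name local_hostname is hypothetical, and the explicit termination is a defensive assumption, since gethostname may leave the buffer unterminated when the name is truncated.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Copy the local host's name into buf (of size len) and make sure the
 * result is NUL-terminated.  Returns 0 on success, -1 on failure.
 */
static int
local_hostname(char *buf, size_t len)
{
	if (len == 0 || gethostname(buf, len) < 0)
		return -1;
	buf[len - 1] = '\0';
	return 0;
}
```

A server would call this once at startup, for example with a 256-byte buffer, and use the result wherever it needs to identify itself, so that the same binary can run unchanged on every host.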