MINIX: A CHEAP UNIX CLONE WITH SOURCE CODE Andrew S. Tanenbaum Dept. of Mathematics and Computer Science Vrije Universiteit Amsterdam, The Netherlands 1. OVERVIEW OF THE MINIX SYSTEM ARCHITECTURE UNIX- is organized as a single executable program that is loaded into memory at system boot time and then run. MINIX is structured in a much more modular way, as a collec- tion of processes that communicate with each other and with user processes by sending and receiving messages. There are separate processes for the memory manager, the file system, for each device driver, and for certain other system func- tions. This structure enforces a better interface between the pieces. The file system cannot, for example, acciden- tally change the memory manager's tables because the file system and memory manager each have their own private address spaces. These system processes are each full-fledged processes, with their own memory allocation, process table entry and state. They can be run, blocked, and send messages, just as the user processes. In fact, the memory manager and file system each run in user space as ordinary processes. The device drivers are all linked together with the kernel into the same binary program, but they communicate with each other and with the other processes by message passing. When the system is compiled, four binary programs are independently created: the kernel (including the driver processes), the memory manager, the file system, and init (which reads /etc/ttys and forks off the login processes). In other words, compiling the system results in four dis- tinct a.out files. When the system is booted, all four of these are read into memory from the boot diskette. It is possible, and in fact, normal, to modify, recom- pile, and relink, say, the file system, without having to relink the other three pieces. This design provides a high degree of modularity by dividing the system up into indepen- dent pieces, each with a well-defined function and interface to the other pieces. The pieces communicate by sending and receiving messages. The various processes are structured in four layers: 4. The user processes (top layer). 3. The server processes (memory manager and file system). _________________________ - UNIX is a trademark of AT&T Bell Laboratories. 2. The device drivers, one process per device. 1. Process and message handling (bottom layer). Let us now briefly summarize the function of each layer. Layer 1 is concerned with doing process management including CPU scheduling and interprocess communication. When a process does a SEND or RECEIVE, it traps to the ker- nel, which then tries to execute the command. If the com- mand cannot be executed (e.g., a process does a RECEIVE and there are no messages waiting for it), the caller is blocked until the command can be executed, at which time the process is reactivated. When an interrupt occurs, layer 1 converts it into a message to the appropriate device driver, which will normally be blocked waiting for it. The decision about which process to run when is also made in layer 1. A prior- ity algorithm is used, giving device drivers higher priority over ordinary user processes, for example. Layer 2 contains the device drivers, one process per major device. These processes are part of the kernel's address space because they must run in kernel mode to access I/O device registers and execute I/O instructions. Although the IBM PC does not have user mode/kernel mode, most other machines do, so this decision has been made with an eye toward the future. To distinguish the processes within the kernel from those in user space, the kernel processes are called tasks. Layer 3 contains only two processes, the memory manager and the file system. They are both structured as servers, with the user processes as clients. When a user process (i.e., a client) wants to execute a system call, it calls, for example, the library procedure read with the file descriptor, buffer, and count. The library procedure builds a message containing the system call number and the parame- ters and sends it to the file system. The client then blocks waiting for a reply. When the file system receives the message, it carries it out and sends back a reply con- taining the number of bytes read or the error code. The library procedure gets the reply and returns the result to the caller in the usual way. The user is completely unaware of what is going on here, making it easy to replace the local file system with a remote one. Layer 4 contains the user programs. When the system comes up, init forks off login processes, which then wait for input. On a successful login, the shell is executed. Processes can fork, resulting in a tree of processes, with init at the root. When CTRL-D is typed to a shell, it exits, and init replaces the shell with another login pro- cess. 2. LAYER 1 - PROCESSES AND MESSAGES The two basic concepts on which MINIX is built are processes and messages. A process is an independently schedulable entity with its own process table entry. A mes- sage is a structure containing the sender's process number, - 2 - a message type field, and a variable part (a union) contain- ing the parameters or reply codes of the message. Message size is fixed, depending on how big the union happens to be on the machine in question. On the IBM PC it is 24 bytes. Three kernel calls are provided: - RECEIVE(source, &message) - SEND(destination, &message) - SENDREC(process, &message) These are the only true system calls (i.e., traps to the kernel). RECEIVE announces the willingness of the caller to accept a message from a specified process, or ANY, if the RECEIVER will accept any message. (From here on, ``pro- cess'' also includes the tasks.) If no message is available, the receiving process is blocked. SEND attempts to transmit a message to the destination process. If the destination process is currently blocked trying to receive from the sender, the kernel copies the message from the sender's buffer to the receiver's buffer, and then marks them both as runnable. If the receiver is not waiting for a message from the sender, the sender is blocked. The SENDREC primitive combines the functions of the other two. It sends a message to the indicated process, and then blocks until a reply has been received. The reply overwrites the original message. User processes use SENDREC to execute system calls by sending messages to the servers and then blocking until the reply arrives. There are two ways to enter the kernel. One way is by the trap resulting from a process' attempt to send or receive a message. The other way is by an interrupt. When an interrupt occurs, the registers and machine state of the currently running process are saved in its process table entry. Then a general interrupt handler is called with the interrupt number as parameter. This procedure builds a mes- sage of type INTERRUPT, copies it to the buffer of the wait- ing task, marks that task as runnable (unblocked), and then calls the scheduler to see who to run next. The scheduler maintains three queues, corresponding to layers 2, 3, and 4, respectively. The driver queue has the highest priority, the server queue has middle priority, and the user queue has lowest priority. The scheduling algo- rithm is simple: find the highest priority queue that has at least one process on it, and run the first process on that queue. In this way, a clock interrupt will cause a process switch if the file system was running, but not if the disk driver was running. If the disk driver was running, the clock task will be put at the end of the highest priority queue, and run when its turn comes. In addition to this rule, once every 100 msec, the clock task checks to see if the current process is a user process that has been running for at least 100 msec. If so, that user is removed from the front of the user queue and put on the back. In effect, compute bound user processes - 3 - are run using a round robin scheduler. Once started, a user process runs until either it blocks trying to send or receive a message, or it has had 100 msec of CPU time. This algorithm is simple, fair, and easy to implement. 3. LAYER 2 - DEVICE DRIVERS Like all versions of UNIX for the IBM PC, MINIX does not use the ROM BIOS for input or output because the BIOS does not support interrupts. Suppose a user forks off a compilation in the background and then calls the editor. If the editor tried to read from the terminal using the BIOS, the compilation (and any other background jobs such as printing) would be stopped dead in their tracks waiting for the the next character to be typed. Such behavior may be acceptable in the MS-DOS world, but it certainly is not in the UNIX world. As a result, MINIX contains a complete set of drivers that duplicate the functions of the BIOS. Like the rest of MINIX, these drivers are written in C, not assembly language. This design has important implications for running MINIX on PC clones. A clone whose hardware is not compati- ble with the PC down to the chip level, but which tries to hide the differences by making the BIOS calls functionally identical to IBM's will not run an unmodified MINIX because MINIX does not use the BIOS. Each device driver is a separate process in MINIX. At present, the drivers include the clock driver, terminal driver, various disk drivers (e.g., RAM disk, floppy disk), and printer driver. Each driver has a main loop consisting of three actions: 1. Wait for an incoming message. 2. Perform the request contained in the message. 3. Send a reply message. Request messages have a standard format, containing the opcode (e.g., READ, WRITE, or IOCTL), the minor device number, the position (e.g., disk block number), the buffer address, the byte count, and the number of the process on whose behalf the work is being done. As an example of where device drivers fit in, consider what happens when a user wants to read from a file. The user sends a message to the file system. If the file system has the needed data in its buffer cache, they are copied back to the user. Otherwise, the file system sends a mes- sage to the disk task requesting that the block be read into a buffer within the file system's address space (in its cache). Users may not send messages to the tasks directly. Only the servers may do this. MINIX supports a RAM disk. In fact, the RAM disk is always used to hold the root device. When the system is booted, after the operating system has been loaded, the user is instructed to insert the root file system diskette. The file system then sees how big it is, allocates the necessary - 4 - memory, and copies the diskette to the RAM disk. Other file systems can then be mounted on the root device. This organization puts important directories such as /bin and /tmp on the fastest device, and also makes it easy to work with either floppy disks or hard disks or a mixture of the two by mounting them on /usr or /user or elsewhere. In any event, the root device is always in the same place. In the standard distribution, the RAM disk is about 240K, most of which is full of parts of the C compiler. In the 256K system, a much smaller RAM disk has to be used, which explains why this version has no C compiler: there is no place to put it. (The /usr diskette is completely full with the other utility programs and one of the design goals was to make the system run on a 256K PC with 1 floppy disk.) Users with an unusual configuration such as 256K and three hard disks are free to juggle things around as they see fit. The terminal driver is compatible with the standard V7 terminal driver. It supports cooked mode, raw mode, and cbreak mode. It also supports several escape sequences, such as cursor positioning and reverse scrolling because the screen editor needs them. The printer driver copies its input to the printer character for character without modification. It does not even convert line feed to carriage return + line feed. This makes it possible to send escape sequences to graphics printers without the driver messing things up. MINIX does not spool output because floppy disk systems rarely have enough spare disk space for the spooling directory. Instead one normally would print a file f by saying lpr