2 State of the Art

2.1 Microkernel Technology

2.1.1 L3

L3 is a microkernel based operating system developed by Jochen Liedtke and others at GMD's SET institute. Its key features are persistence, minimality and speed. L3 features synchronous message-based IPC, a simple-to-use external pager mechanism and a domain-based security design. [1] [2]

L3 knows about only a small number of kernel objects: tasks, threads and messages. (task denotes a protection domain provided by the (μ-)kernel, and thread means an activity within a task.)

IPC Architecture

Message Structure in L3.

Message Transfer in L3.

A message is a collection of data that is passed to the kernel in one system call. The kernel is responsible for delivering this message to the intended recipient.

In L3 messages are sent by one thread to another thread without intermediate objects like ports. IPC is synchronous, i.e. the sender thread blocks until the receiver thread has received the message. These design decisions are a good base for enhancements of IPC performance. [3]

A message can contain components of four different types:

direct string: These values are stored in the message buffer. Two DWords (one DWord is 4 Byte long) are mandatory.
indirect strings: Indirect strings are specified in the message buffer by a pointer and the length of this memory area.
flex pages: Flex pages are a mechanism for transferring memory pages between tasks. [4]
data spaces: Data spaces are abstract memory objects. Sending a data space employs lazy copying.

A message dope is a structure that describes how many components of each of these types are to be processed. The buffer dope describes the layout of the provided message buffer and therefore how many items can be received, the send dope describes how many items are to be sent. (The two dopes differ when you send less data than you are prepared to receive.)

Design of the Virtual Memory Subsystem

During preparation of this document, L3's virtual memory (VM) subsystem has been comprehensively documented for the first time. This section will only explain the most important concepts of this subsystem. For a more detailed discussion please refer to the [5] and [6].

Address Spaces

An address space (aka virtual address space) defines the set of valid virtual addresses that any thread executing within the task owning the address space is allowed to reference. An address space is owned by exactly one task, and is created and destroyed along with it.

The semantics associated with a valid region of virtual memory are provided by memory managers which implement abstract memory objects called data spaces (discussed below). When a new memory region is established in an address space, a thread id of a manager providing those semantics needs to be specified.

Data Spaces

A data space is an abstract memory object which can be mapped into the address space of tasks. The semantics associated with a data space are provided by the memory manager that backs it. Data spaces can be of variable sizes up to 4 GB.

From the kernel's perspective, data spaces are uniquely identified by a <user_task, memory_manager, data_space_number> triple, where the data space number is an identifier which is never interpreted by the kernel. It is the manager's job to decide whether the same data space number in two different user tasks refer to the same memory object or not (in the standard memory manager, it does not).

The kernel should be viewed as using main memory as a (directly accessible) cache for the contents of the various data spaces.

Memory Managers

A memory manager (sometimes also called a pager) provides the semantics associated with the act of referencing memory regions which have mapped data spaces backed by it. It provides (or denies) access to pages of data that have been referenced by user tasks.

Every thread can be a memory manager for every other thread. There is no special initialization necessary to become a memory manager, and the kernel provides no special access protection for them (as with all L3 IPC). Once the id of some thread has been specified in a map system call, the kernel starts treating the thread as the memory manager for the region (and send page fault messages to it, which, of course, can be ignored or refusingly replied to by the thread).

Memory managers willing to back a data space are involved in a simple IPC dialogue with the kernel [5]. Basically, the kernel reports page faults of client tasks, and the memory manager responds with a flexpage mapping to resolve the page fault and allow the kernel to establish a temporary page mapping.

Flexpage mapping

Temporary mappings of memory regions from a memory manager to a client task are not limited to single hardware memory pages. Instead, the manager may specify any memory region to be mapped into the client (a flexpage). This is a simple optimization which may result in fewer page faults from the client because several pages can be mapped at once. [4]

Memory object manipulation in L3.

Figure [here] shows the basic memory object manipulation dialogue between the kernel and a memory manager: A client task references a portion of a memory range for which a temporary mapping has not yet been established. The kernel will send a page fault message to the respective memory manager and set up a receive operation on behalf of the client. The memory manager analyzes the request and responds with a flexpage directly to the client.

The kernel is free to forget the flexpage mapping at any time (perhaps when evicting the page from main memory) which may result in making clients send page fault messages requesting previously supplied pages again.

The memory manager may also forcibly remove a flexpage mapping established earlier (with a system call), for instance, when the manager is low on memory and wants to recycle memory regions previously committed as flexpages.

Kernel Resource Management

Except the CPU and the kernel clock, the L3 kernel doesn't manage system resources at all; that's done by of kernel-external tasks (discussed below).

The clock is managed by the kernel for two reasons: First, the system clock can well be thought of as a part of the CPU, and secondly (and more importantly) the kernel requires access to the clock to perform IPC timeouts. [7]

Kernel-external Standard Facilities

L3 comes with a number of kernel-external standard facilities which handle many of the resources conventionally managed by an operating system kernel.

Supervisor

The supervisor is L3's root task. It manages host control (definition of a station's name, and spawning sub-tasks which manage checkpointing, devices and other system services) and controls a system's task tree. The supervisor is the only task which may create new tasks and threads; other tasks have access to these services through a special message-based (IPC) protocol, the Supervisor Protocol.

Device drivers

Device drivers are (from the kernel's point of view) normal user tasks. (Well, almost: As they must not be paged out, they're memory-resident and non-persistent. Consequently, they must be re-started every time the system reboots.)

Device drivers are controlled in a uniform fashion using the Generic Driver Protocol.

Embedded ELAN compiler

Most non-kernel operating system facilities have been implemented in ELAN. The standard ELAN environment contains procedures encapsulating L3 system calls and IPC protocols to manager tasks (for example, the Supervisor Protocol and the Generic Driver Protocol).

Please note that L3 currently doesn't come with any C libraries to access system services. That's why our group has developed several libraries providing C bindings to many system services [5]; these libraries are now being exploited regularly and gladly:

libl3sys: encapsulates L3 system calls and calls to the standard memory manager.
libl3prot: encapsulates L3's standard protocols: the Supervisor Protocol and the Generic Driver Protocol, among others.
libl3serv: contains support generally required by L3 servers: startup code, trap handlers, and global initialization.

2.1.2 Mach

Mach is a microkernel originally developed at CMU. It features an object-oriented design approach, sophisticated memory management and message-based IPC.

Mach is meant as a foundation for operating systems built up on it and provides several kernel abstractions commonly required, its key elements being tasks and threads (similar to L3's), ports, messages and memory objects.

IPC

Messages are sent via unidirectional communication channels, so-called ports. The ability to send messages to ports and to receive messages is stored in port rights. Port rights are kernel-protected capabilities, the kernel enforces these capabilities and controls the transfer of port rights between tasks. Due to this access control properties ports also used to name other kernel resources: tasks, threads, devices, memory objects, among others. [8]

Unlike L3, IPC in Mach is asynchronous. When the send operation cannot deliver the message immediately it stores the message in a message queue and returns. The receive operation then dequeues this message.

Ports

A port is a unidirectional communication channel between tasks. A port has a single receiver and (potentially) multiple senders.

The state associated with a port is:

The message queue.

The reference counter. (It counts the number of send rights that currently exist for this port.)

Limits for the amount of received data.

IPC structure in Mach.

Message transmission in Mach.

Figure [here] shows a port and associated objects. These are described in detail below.

Messages

A message is a collection of typed data.

Available message types are:

pure data (integers, floats, strings, ...)

port rights

memory regions

Message Queues

A message queue is associated with a port and stores all messages that were sent to this port.

Port Rights

A port right is a secure entity that indicates the right to access a specific port.

There are three kinds of port rights:

receive right: A receive right allows the task to receive messages from this port. Only one receive right exists for a port.
send right: A send right allows the task to send messages to this port. The number of send rights per port is not restricted.
send-once rights: A send-once right allows the task to send one message to this port. The send-once right is destroyed after the message is sent.

Port rights are system-wide identifiers for ports. They are stored in kernel-protected port name spaces and can only be manipulated by tasks via port names.

Port Names

A port name is used to name a port right in a task and points to one entry in the port name space where all port rights of one task are stored. Every port name space is protected even against its task.

Port Sets

A port set is a common access point for receiving messages from ports. A receive operation retrieves a message from one of the queues of the member ports unless all of these queues are empty. A port cannot be a member of more than one port set.

Design of the Virtual Memory Subsystem

Mach's virtual memory design is one of the most remarkable aspects of the system. It is layered in a simple hardware dependent portion and a hardware independent portion providing a complicated set of high-level abstractions. It claims to feature high performance, exploiting optimizations like lazy copying and shared memory. [8]

The abstractions provided are:

Virtual Address Spaces

Virtual address spaces in Mach share many aspects with L3's address spaces: They are owned by tasks and are created and destroyed along with them. Ranges of memory within a virtual address space can be associated with abstract memory objects backed by memory managers (discussed below).

Abstract Memory Object

Abstract memory objects can be mapped into the virtual address space of tasks. The semantics associated with an abstract memory object are provided by the memory manager that backs it.

Memory objects are named by abstract memory object representative ports (the port specified to the kernel when mapping a memory object into a client task's virtual address space) and abstract memory object ports (the port the memory manager presents to the kernel when initializing a memory object; see below).

Memory Cache Object

The kernel should be viewed as using main memory as a (directly accessible) cache for the contents of the various abstract memory objects. In order to be able to control this cache's contents (i.e., to map or invalidate memory ranges), the memory manager gets a send right for a memory cache control port from the kernel at memory object initialization.

Memory object manipulation in Mach.

A memory manager is a task holding a send right to a memory cache object port. It is involved in a dialogue with the kernel to maintain the main memory cache of the abstract memory object backed by it. In this dialogue, the kernel sends requests to the memory manager's abstract memory object port, and the memory manager supplies requested data using IPC to the kernel's respective memory cache control port (see figure [here]).

Memory is passed back and forth between the kernel and the memory manager as out-of-line data in usual Mach messages. The kernel may return data it was supplied earlier at any time; when doing so, the returned data becomes backed by the default memory manager (a manager built into Mach, the ``pager of last resort'') until the manager explicitly deallocates it.

Memory object setup dialogue in Mach.

The protocol to initialize an abstract memory object has a special meaning: It is used to authenticate the kernel and the memory manager to each other. For details about the employed authentication scheme, please refer to [8].(1)

During this dialogue (figure [here]), the kernel needs to establish an association between the memory object representative port right presented in the vm_map system call and the memory object's abstract memory object port. That's why the kernel communicates a send right for its memory cache control port to the owner of the specified representative port and believes any task which can present the rights of both the representative port and the cache control port to be the memory manager for said representative port. This dialogue also has the side effect of exchanging the cache control port and abstract memory object port rights between the kernel and the memory manager so that the memory object manipulation dialogue discussed above can begin.

Kernel Resource Management

The Mach kernel virtualizes many physical resources conventionally managed by operating systems: clocks, devices, hosts, nodes, processors and processor sets (see [8] for an explanation of these resource classes). The kernel also provides for resource accounting using ledgers, a special kernel resource providing a mechanism to limit consumption of another resource.(2)

Kernel-external Standard Facilities

Mach comes with extensive C libraries and an RPC interface generator which make interfacing the microkernel easy.

Libmach: The C library Libmach provides C bindings to system calls and IPC to kernel-implemented objects. [9]
Libcthreads: The Libcthreads library is a user-level threads library. It implements C Threads which are not to be confused with kernel threads: C Threads are user-level threads, and Libcthreads manages the mapping from C Threads to kernel threads. It also offers tools to synchronize threads. [10] [11]
MIG: MIG, the Mach Interface Generator, is Mach's RPC stub generator. It generates code for communication between two tasks from an interface description. The interface is described as a set of procedures with input and output parameters. Libmach contains some MIG-specific functions that are required by MIG-generated programs. [10]

2.1.3 Comparison

This section will compare the two microkernels Mach and L3. We will identify features available in both microkernels and features available in one but missing in the other.

Later in this section, we will take a closer look at the virtual memory subsystems of the kernels.

Features Available in Both μ-kernels

Tasks, threads.

Virtual memory subsystem with external pager interface; however, the implementation and kernel interface of the VM subsystems differ a lot between the two kernels; see below.

IPC via messages. However, there is a fundamental difference between the kernels' IPC schemes: L3 implements synchronous IPC between threads while Mach offers asynchronous IPC using ports as communication channels.

Interface to clocks and other physical resources, although implemented very differently.

If one system were to be emulated on the other, a fair (albeit varying from feature to feature) amount of emulation glue would be required.

Features Only Available in Mach

The following fundamental Mach features don't have an equivalent on L3:

Ports.

MIG, the RPC interface generator.

Comparison of the Virtual Memory Subsystems of Mach and L3

External Pagers in Mach and L3. The figure shows the abstractions (objects) involved in the external pager mechanisms, and the various types of references the active elements (tasks and kernel) hold, as explained in sections [here] and [here]

Both microkernels possess virtual memory subsystems which fit well in their respective kernel design philosophies: L3 strikes for minimality, simplicity and speed, while Mach goes for object-orientation, elegance and kernel-enforced reference protection.

Figure [here] compares the elements involved in Mach's and L3's external pager mechanisms. We see that these elements have similarities to their respective counterparts in the other kernel, but their interconnections (and interactions) differ a lot.

There are further, more subtle differences:

In Mach, when resolving page faults, the memory manager passes pages to the kernel in terms of out-of-line data in memory_object_data_supply messages sent to the kernel's cache control port. After the data has been supplied, the memory remains under kernel control until returned in a memory_object_data_return message. [8]
In L3, it works differently: Memory managers merely pass their data by reference using flexpage mappings; the memory remains under control of the memory manager. Both the kernel and the memory manager can remove the mapping at any time. [4]

Mach provides an interface to operate directly on virtual memory regions (vm_*): For instance, it is possible to unmap a specified memory region, no matter which memory manager(s) backs the region. [8]
L3 provides no equivalent interface: Region management is carried out entirely within memory managers (hence only available on single data spaces, not on virtual memory regions having any number of data spaces mapped), so client tasks need to communicate with their memory manager to manipulate memory regions.

Footnotes:

The initialization dialogue discussed here is unique to the OSF version of Mach 3.0. Other Mach version employ a simpler scheme; however, with those schemes the memory manager is unable to protect itself properly.
Ledgers are only available in OSF Mach.

2.2 Unix Operating Systems as Microkernel Application Programs

2.2.1 Approaches

There are two approaches to implementing Unix operating systems on microkernels: multi-server and single-server. This work (and this section) focus on the single-server approach; for a general discussion of multi-server vs. single-server vs. monolithic Unix design, please refer to [6] and the [12].

2.2.2 Implementation of Lites on Mach

Lites is a free Unix implementation implemented as a single-server and an emulation library running under Mach. It has been developed by Johannes Helander at HUT and borrows code from CMU's Unix single-server implementation and UCB's 4.4BSD Lite. [12]

In this section we will look at certain aspects of the Lites implementation, namely the aspects relevant to porting the Lites server to another microkernel.

Types of IPC Used

Messages through ports: are used for RPC between the Unix server and the emulated Unix processes (the clients), and for delivering signals to a signal handling thread running within the task emulating the Unix process.
Shared memory: is employed for sharing various data between the server and the clients, both in read-only and read-write fashion: u area, time information and mapped files (,,vnode paging''), among others. The server acts as an external pager for the clients.
Faults, exceptions and low-level Mach system calls: Besides the usual semantics of invoking Mach mechanisms like page in/out requests to pagers and Mach system calls, these types of IPC are also used to emulate certain Unix mechanisms: Unix system call redirection/emulation and raising Unix signals. [12]

What Ports Are Used For

Ports are not just used as communication channels; they're exploited in various other ways, too:

Object referencing: Besides utilization as references to Mach kernel objects as explained in section [here], ports are also used by the clients' emulation library to reference Lites (Unix) kernel objects like process structures and files, and references to Lites' ttys, Unix devices and network interface handlers are passed to Mach's device drivers.
Mach-enforced reference protection: The Mach kernel protects port references (send and receive rights). The utilization of port rights as capabilities regarding Lites kernel objects simplifies access checking in the Lites server. [12]

Other Mach Facilities Employed

Besides ports and Mach's IPC facilities, Lites makes use of various other Mach facilities:

Libcthreads: The C Threads library is utilized to gain concurrency in the Lites server and to synchronize accesses from threads to kernel data structures.
MIG: The interfaces for communication between the server and the emulation library mapped into the user task's address space are described as MIG specification files. MIG generates the RPC stubs from these files.

2.3 Development Environment for L3 programs

The L3 kernel is implemented using a DOS 32-bit assembler by Pharlap. This assembler is run in the DOS environment and we cannot give it away at no cost because of licensing restrictions. The memory management subsystem is written in CDL, but we did not consider to use this non-widespread language. (Our group even started a project to rewrite the memory management in C.)

ELAN, an ALGOL-like language, is used as command language, for scripts and as implementation language for about all non-kernel programs. The ELAN compiler translates very efficiently into i386 assembler. [13]

A port of gcc (the GNU C Compiler) version 1.37 is available. Compared to the current gcc 2.x it lacks some features, among them limited inline assembler facilities and no C++.

We didn't find a make-like programs .

Michael Hohmuth, Sven Rudolph
April 9, 1996