Hello Fiasco.OC developers,
I would like to propose a slight architectural change regarding the roles of sigma0 and roottask in Fiasco.OC-based systems.
While using the kernel for Genode, we repeatedly encountered problems related to sigma0, in particular the priority-inversion problem we reported last year (http://os.inf.tu-dresden.de/pipermail/l4-hackers/2012/005348.html) and a recent issue with inconsistent caching attributes on ARM, which led to subtle memory corruption. Both problems were hard to debug and cost us a lot of time.
Knowing about those issues, there are of course ways to deal with them. The priority-inversion problem could be solved by assigning the highest priority to sigma0. The cache-attribute problem can in principle be dealt with by managing cache flushing manually and making sure not to touch the wrong cache lines. But this remains a minefield. Because roottask has all memory mapped, dangling pointers may go unnoticed, yet produce unwanted caching effects. In our experience, these kinds of problems remain largely invisible until the system becomes highly dynamic (e.g., if RAM switches between use as DMA buffers and use as normal memory at runtime). But once they occur, they become a nuisance.
Without sigma0, our life would have been easier. With no sigma0 thread, there wouldn't have been a priority inversion problem. And without the sigma0 protocol that is unaware of caching attributes, we could easily maintain the consistent use of those attributes among all processes including roottask.
Besides the issues mentioned above, we found that the use of sigma0 implies two further problems. First, because roottask can hand out memory only after obtaining it from sigma0, all physical memory must be mapped within roottask. So the virtual address space of roottask limits the amount of physical memory usable in the system. Second, because roottask must maintain all those mappings with maximum privileges, a bug in roottask can silently corrupt arbitrary memory.
Motivated by these observations, I conducted an experiment: remove sigma0 from the picture and see where this would lead us.
Kernel changes
--------------
The (preliminary) patch of the kernel and bootstrap is actually pretty small:
https://github.com/nfeske/foc/commit/7599e863c2feb07a34b891499982f4ffb58ff3e...
I kept the term "sigma0" in place to keep the patch simple. In the following, the terms "sigma0" and "roottask" always refer to the first user-land process started by the kernel.
Originally, sigma0 was paged with one-to-one mappings by the kernel and would use the normal map operation with a sigma0-virtual address as the source of the mapping. The new solution differs in that all memory mappings originating from sigma0 now come directly from the physical address space. This requires one kernel-internal interface change concerning the 'Mem_space::v_fabricate' function.
This function is used in two situations, map and unmap. When mapping, it is used to determine the physical frame for the virtual address specified as the source of the mapping. For the new version of sigma0, this makes no sense because the source address does not refer to sigma0's virtual address space. When unmapping, however, this function is used to look up the physical frame for the virtual page to unmap. In this case, the argument refers to an actual sigma0-virtual address. Consequently, a single function cannot accommodate both use cases. Therefore, I introduced a new 'v_fabricate_map_src' function that accompanies the 'v_fabricate' function. For all processes other than sigma0, both functions do the same thing. But for sigma0, the 'map_src' function interprets the address argument as a physical address. Because this function is used to determine the mapping source, I have used the suffix "_map_src".
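The split between the two lookups can be illustrated with a small user-space toy model. This is only a sketch and not the kernel code: apart from the names 'v_fabricate' and 'v_fabricate_map_src', everything here ('Toy_mem_space', 'Lookup', the 'is_sigma0' flag, the page-table map) is a hypothetical stand-in, and the real signatures in the kernel differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only. Not the Fiasco.OC implementation.
using Addr = std::uint64_t;

struct Lookup { bool valid; Addr phys; };   // hypothetical result type

struct Toy_mem_space
{
  bool is_sigma0;                    // assumption: a simple flag marks the sigma0 space
  std::map<Addr, Addr> page_table;   // virtual page -> physical frame

  // Unmap path: the argument is always a virtual address of this space.
  Lookup v_fabricate(Addr virt) const
  {
    std::map<Addr, Addr>::const_iterator it = page_table.find(virt);
    if (it == page_table.end())
      return Lookup{false, 0};
    return Lookup{true, it->second};
  }

  // Map path: for sigma0 the argument already names a physical frame,
  // for every other space it behaves exactly like v_fabricate.
  Lookup v_fabricate_map_src(Addr addr) const
  {
    if (is_sigma0)
      return Lookup{true, addr};
    return v_fabricate(addr);
  }
};

// Usage: the same numeric address yields different frames on the two paths.
inline void example_lookup()
{
  Toy_mem_space sigma0;
  sigma0.is_sigma0 = true;
  sigma0.page_table[0x1000] = 0x8000;

  assert(sigma0.v_fabricate(0x1000).phys == 0x8000);          // unmap: virtual lookup
  assert(sigma0.v_fabricate_map_src(0x1000).phys == 0x1000);  // map: physical address
}
```

In this model the two functions only diverge for the sigma0 space, which mirrors the description above.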
The second noteworthy change is the distinction between sigma0 threads with a pager and those without a pager. The original version of sigma0 had no notion of a pager. There was only a single thread, paged by the kernel. Now, if roottask is sigma0, there are several threads. Most of them can be paged by a local pager in roottask. To distinguish both cases, I needed to introduce an 'is_null' accessor function to the 'Context_ptr' class.
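As a rough illustration of how such an accessor could be used to tell the two kinds of sigma0 threads apart, consider the following sketch. It is a toy model under stated assumptions: 'Toy_context_ptr', its raw representation, the 'Invalid' sentinel, and 'route_fault' are all hypothetical, and only the name 'is_null' is taken from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for the kernel's Context_ptr class.
class Toy_context_ptr
{
public:
  static constexpr unsigned long Invalid = ~0UL;   // assumed "no pager" sentinel

  explicit Toy_context_ptr(unsigned long raw = Invalid) : _raw(raw) {}

  // True if no pager has been assigned.
  bool is_null() const { return _raw == Invalid; }

private:
  unsigned long _raw;
};

enum class Fault_route
{
  Kernel_fabricated,   // sigma0 thread without pager: resolved by the kernel
  Pager_ipc,           // thread with pager: fault is reflected to the pager
  Unhandled            // non-sigma0 thread without pager
};

// Decide how a page fault of a thread would be handled in this model.
inline Fault_route route_fault(bool is_sigma0_thread, Toy_context_ptr const &pager)
{
  if (!pager.is_null())
    return Fault_route::Pager_ipc;
  return is_sigma0_thread ? Fault_route::Kernel_fabricated
                          : Fault_route::Unhandled;
}

// Usage: a sigma0 thread is paged by the kernel only while it has no pager.
inline void example_fault_route()
{
  assert(route_fault(true, Toy_context_ptr()) == Fault_route::Kernel_fabricated);
  assert(route_fault(true, Toy_context_ptr(7)) == Fault_route::Pager_ipc);
}
```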
User-land changes
-----------------
The implications for Genode's version of roottask (called core) are more substantial, but in very positive ways:
The initialization of core's allocators used to require an interplay between core and sigma0. Because there is no longer a need to have all memory mapped in core, we can simply drop this whole procedure and just use the memory descriptors provided by the KIP.
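To give an idea of what deriving an allocator's initial state from memory descriptors alone might look like, here is a minimal sketch. It does not reflect core's actual code or the real KIP descriptor layout: 'Toy_mem_desc', 'Mem_type', 'Range', and 'usable_ram' are hypothetical, the model knows only two descriptor types, and it uses half-open ranges for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical, simplified descriptor: half-open range [start, end).
enum class Mem_type { Conventional, Reserved };
struct Toy_mem_desc { Mem_type type; Addr start; Addr end; };

struct Range { Addr start; Addr end; };   // half-open range [start, end)

// Collect conventional memory and cut out every reserved range.
// Overlapping conventional descriptors are not merged in this sketch.
inline std::vector<Range> usable_ram(std::vector<Toy_mem_desc> const &descs)
{
  std::vector<Range> free_ranges;
  for (Toy_mem_desc const &d : descs)
    if (d.type == Mem_type::Conventional && d.start < d.end)
      free_ranges.push_back(Range{d.start, d.end});

  for (Toy_mem_desc const &d : descs) {
    if (d.type != Mem_type::Reserved)
      continue;

    std::vector<Range> next;
    for (Range const &r : free_ranges) {
      // No overlap: keep the range unchanged.
      if (d.end <= r.start || d.start >= r.end) {
        next.push_back(r);
        continue;
      }
      // Keep the parts left and right of the reserved range, if any.
      if (r.start < d.start)
        next.push_back(Range{r.start, d.start});
      if (d.end < r.end)
        next.push_back(Range{d.end, r.end});
    }
    free_ranges = next;
  }
  return free_ranges;
}

// Usage: one reserved hole splits a conventional region into two ranges.
inline void example_usable_ram()
{
  std::vector<Toy_mem_desc> descs;
  descs.push_back(Toy_mem_desc{Mem_type::Conventional, 0x0, 0x10000});
  descs.push_back(Toy_mem_desc{Mem_type::Reserved, 0x4000, 0x6000});
  assert(usable_ram(descs).size() == 2);
}
```

No mapping request is involved at any point in this model; the allocator state follows from the descriptors alone.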
After an initialization phase in which core faults in its own image and the KIP via the kernel, core drops its privileges by assigning a core-local pager to all core threads. So any invalid access gets detected right away. Browsing through the page table of core using the kernel debugger is like visiting a desert. In contrast to the original version, it has become easy to maintain an overview of the mappings within core.
The revocation of memory mappings used to rely on the in-kernel mapping database. This won't work for core anymore because core does not maintain mappings for the memory handed out to other processes. Instead, core uses 'l4_task_unmap' to flush mappings from non-core processes as needed. This is similar to how Genode works on OKL4. Still, non-core processes may create further mappings, which are captured by the in-kernel mapping database.
Current state and open questions
--------------------------------
With the current state of the implementation, the complete software stack of Genode runs without sigma0. This includes L4Linux. So I am pretty confident that the removal of sigma0 from the system does not imply functional disadvantages.
That said, there are a few remaining questions that I'd like to discuss.
First, the sole use of 'l4_task_unmap' to remotely flush memory mappings in other processes means that we must no longer provide the option of granting memory mappings. Otherwise, a process that received a mapping from roottask could "steal" the physical memory by granting it to someone else. Roottask would not know about that, and an attempt to flush the mappings in the original receiver of the mapping would just target a hole in the address space. My question is:
"Can we live without granting memory?"
or the other way: "What is a known use case for granting memory?"
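The concern behind this question can be made concrete with a toy model of core's bookkeeping. This is only a sketch: 'Toy_system' and all of its members are hypothetical, 'core_revoke' merely stands in for a flush via 'l4_task_unmap', and derived mappings tracked by the in-kernel mapping database are not modeled.

```cpp
#include <cassert>
#include <map>
#include <set>

using Frame = unsigned;   // physical frame number
using Task  = int;        // task identifier

struct Toy_system
{
  std::map<Task, std::set<Frame> > mapped;   // frames present in each task
  std::map<Frame, Task> core_records;        // core's view: who received the frame

  // Core hands out a frame and remembers the receiver.
  void core_map(Task to, Frame f)
  {
    mapped[to].insert(f);
    core_records[f] = to;
  }

  // Grant moves the mapping to another task; core is not informed.
  void grant(Task from, Task to, Frame f)
  {
    if (mapped[from].erase(f))
      mapped[to].insert(f);
  }

  // Core flushes the frame from the task it remembers as the receiver.
  void core_revoke(Frame f)
  {
    std::map<Frame, Task>::iterator it = core_records.find(f);
    if (it == core_records.end())
      return;
    mapped[it->second].erase(f);
    core_records.erase(it);
  }

  // True if any task still has the frame mapped.
  bool frame_in_use(Frame f) const
  {
    std::map<Task, std::set<Frame> >::const_iterator it;
    for (it = mapped.begin(); it != mapped.end(); ++it)
      if (it->second.count(f))
        return true;
    return false;
  }
};

// Usage: after a grant, core's revocation hits a hole and the frame survives.
inline void example_grant_steals_frame()
{
  Toy_system sys;
  sys.core_map(1, 42);
  sys.grant(1, 2, 42);
  sys.core_revoke(42);
  assert(sys.frame_in_use(42));
}
```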
Second, the new situation triggers some code paths in the kernel that were not used before. Apparently, the unmapping of memory from sigma0 was not considered. This is where I hit a few assertions in the kernel. Right now, I have just worked around these assertions by commenting out the offending code in 'kernel/fiasco/src/kern/map_util.cpp'. I understand that this is just a stop-gap solution. Would you like to lend a helping hand to find out...
"How to support unmap from sigma0 in a clean way?"
I would like to prevent Genode from diverging too much from the semantics of the official Fiasco.OC kernel. Hence, I would appreciate your consideration:
"Would you like to follow a similar path with L4Re?"
That would be very nice. But even if this is not the case, I would very much like to know your rationale for sticking with sigma0, e.g., so that I can reconsider my plan.
Regards
Norman