Hello Fiasco.OC developers,
hereby, I'd like to propose a slight architectural change regarding the roles of sigma0 and roottask in Fiasco.OC-based systems.
While using the kernel for Genode, we repeatedly encountered problems related to sigma0, in particular the priority inversion problem we reported last year (http://os.inf.tu-dresden.de/pipermail/l4-hackers/2012/005348.html) and a recent issue with inconsistent caching attributes on ARM, which led to subtle memory corruption. Both problems were hard to debug and cost us a lot of time.
Knowing about those issues, there are of course ways to deal with them. The priority inversion problem could be solved by assigning the highest priority to sigma0. The cache attribute problem can in principle be dealt with by managing the cache flushing manually and making sure not to touch the wrong cache lines. But this remains a minefield. Because roottask has all memory mapped, dangling pointers may go unnoticed, yet produce unwanted caching effects. In our experience, these kinds of problems remain largely invisible until the system gets highly dynamic (e.g., if the role of RAM changes between DMA buffers and normal memory at runtime). But once they occur, they become a nuisance.
Without sigma0, our life would have been easier. With no sigma0 thread, there wouldn't have been a priority inversion problem. And without the sigma0 protocol that is unaware of caching attributes, we could easily maintain the consistent use of those attributes among all processes including roottask.
Besides hitting the issues mentioned above, we found that the use of sigma0 implies two further problems. First, because roottask can hand out memory only after obtaining it from sigma0, all physical memory must be mapped within roottask. So the virtual memory of roottask limits the amount of physical memory usable in the system. Second, because roottask must maintain all those mappings with maximum privileges, a bug in roottask can silently corrupt arbitrary memory.
Motivated by these observations, I conducted the experiment of removing sigma0 from the picture to see where this would lead us.

Kernel changes
--------------
The (preliminary) patch of the kernel and bootstrap is actually pretty small:
https://github.com/nfeske/foc/commit/7599e863c2feb07a34b891499982f4ffb58ff3e...
I kept the term "sigma0" in place to keep the patch simple. In the following, the terms "sigma0" and "roottask" always refer to the first user-land process started by the kernel.
Originally, sigma0 was paged with one-to-one mappings by the kernel and would use the normal map operation with a sigma0-virtual address as source of the mapping. Here, the new solution differs in that all memory mappings originating from sigma0 are now directly coming from the physical address space. This requires one kernel-internal interface change concerning the 'Mem_space::v_fabricate_map' function.
This function is used in two situations, map and unmap. When mapping, it is used to determine the physical frame for the virtual address specified as source for the mapping. For the new version of sigma0, this makes no sense because the source address does not refer to sigma0's virtual address space. When unmapping, however, this function is used to look up the physical frame for the virtual page to unmap. In this case, the argument refers to an actual sigma0-virtual address. Consequently, the function cannot accommodate both use cases. Therefore, I introduced a new 'v_fabricate_map_src' function that accompanies the 'v_fabricate' function. For all processes other than sigma0, both functions are doing the same thing. But for sigma0, the 'map_src' function interprets the address argument as a physical address. Because this function is used to determine the mapping source, I have used the suffix "_map_src".
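To illustrate the split between the two lookup functions, here is a minimal toy model in C++. It is not Fiasco.OC code: 'Toy_mem_space', its 'page_table' member, and the simplified signatures are assumptions made up for this sketch. Only the function names and the rule "for sigma0, the map source is a physical address" are taken from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only: the type, members, and signatures are hypothetical and
// merely mirror the behaviour described in the mail.
using Addr = std::uint64_t;

struct Toy_mem_space {
    bool is_sigma0 = false;
    std::map<Addr, Addr> page_table;  // virtual page -> physical frame

    // Unmap path: look up the frame behind an actual virtual address.
    bool v_fabricate(Addr virt, Addr *phys) const {
        std::map<Addr, Addr>::const_iterator it = page_table.find(virt);
        if (it == page_table.end()) return false;
        *phys = it->second;
        return true;
    }

    // Map path: resolve the source of a mapping. For sigma0 the argument
    // is taken as a physical address; all other spaces use the normal lookup.
    bool v_fabricate_map_src(Addr addr, Addr *phys) const {
        if (is_sigma0) { *phys = addr; return true; }
        return v_fabricate(addr, phys);
    }
};
```

In this model, both functions return the same result for ordinary tasks and differ only when 'is_sigma0' is set.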
The second noteworthy change is the distinction between sigma0 threads with a pager and those without a pager. The original version of sigma0 had no notion of a pager. There was only a single thread, paged by the kernel. Now, if roottask is sigma0, there are several threads. Most of them can be paged by a local pager in roottask. To distinguish the two cases, I needed to introduce an 'is_null' accessor function to the 'Context_ptr' class.

User-land changes
-----------------
The implications for Genode's version of roottask (called core) are more substantial but in very positive ways:
The initialization of core's allocators used to require an interplay between core and sigma0. Because there is no longer a need to have all memory mapped in core, we can simply drop this whole procedure and just use the memory descriptors provided by the KIP.
After an initialization phase where core faults in its own image and the KIP via the kernel, core drops its privileges by assigning a core-local pager to all core threads. So any invalid access gets detected right away. Browsing through the page table of core using the kernel debugger is like visiting a desert. In contrast to the original version, it has become easy to maintain an overview of the mappings within core.
The revocation of memory mappings used to rely on the in-kernel mapping database. This won't work for core anymore because core does not maintain mappings for the memory handed out to other processes. Instead, core uses 'l4_task_unmap' to flush mappings from non-core processes as needed. This is similar to how Genode works on OKL4. Still, non-core processes may create further mappings, which are captured by the in-kernel mapping database.
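The bookkeeping this implies could look roughly like the following C++ sketch. All names ('Remote_mapping', 'Mapping_registry', 'revoke') are hypothetical and not taken from Genode's core. The actual flush is passed in as a callback, which in a real system would presumably wrap 'l4_task_unmap'; that call is deliberately not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not Genode code.
using Addr = std::uint64_t;
using Task_id = unsigned;

struct Remote_mapping {
    Task_id     task;  // receiver of the mapping
    Addr        virt;  // address inside the receiver
    std::size_t size;
    Addr        phys;  // backing physical range
};

class Mapping_registry {
    std::vector<Remote_mapping> records_;

public:
    // Remember where a physical range was installed in another task.
    void record(Task_id task, Addr virt, std::size_t size, Addr phys) {
        records_.push_back(Remote_mapping{task, virt, size, phys});
    }

    // Flush every recorded mapping that overlaps the physical range and
    // drop its record. Returns the number of flushed mappings.
    template <typename Flush_fn>
    std::size_t revoke(Addr phys, std::size_t size, Flush_fn flush) {
        std::size_t n = 0;
        for (auto it = records_.begin(); it != records_.end();) {
            bool overlaps = it->phys < phys + size && phys < it->phys + it->size;
            if (overlaps) {
                flush(it->task, it->virt, it->size);  // stand-in for the remote unmap
                it = records_.erase(it);
                ++n;
            } else {
                ++it;
            }
        }
        return n;
    }

    std::size_t size() const { return records_.size(); }
};
```

The sketch only covers mappings that core installed itself. Mappings that receivers propagate further stay the business of the in-kernel mapping database.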

Current state and open questions
--------------------------------
With the current state of the implementation, the complete software stack of Genode runs without sigma0. This includes L4Linux. So I am pretty confident that the removal of sigma0 from the system does not imply functional disadvantages.
That said, there are a few remaining questions that I'd like to discuss.
First, the sole use of 'l4_task_unmap' to remotely flush memory mappings in other processes means that we must no longer provide the option of granting memory mappings. Otherwise, a process that received a mapping from roottask could "steal" the physical memory by granting it to someone else. Roottask would not know about that, and an attempt to flush the mappings in the original receiver of the mapping would just target a hole in the address space. My question is:
"Can we live without granting memory?"
or the other way: "What is a known use case for granting memory?"
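To make the concern concrete, here is a toy C++ model of the loophole. It models only the scheme proposed here, where roottask keeps no mapping node of its own and flushes by (task, virtual address). It does not model the kernel's mapping database, and all names ('Toy_task', 'map_phys', 'grant', 'flush', 'holds_frame') are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only; hypothetical names, no real kernel API.
using Addr = std::uint64_t;

struct Toy_task {
    std::map<Addr, Addr> pages;  // virtual page -> physical frame
};

// Roottask installs a frame directly from the physical address space.
inline void map_phys(Toy_task &dst, Addr virt, Addr frame) {
    dst.pages[virt] = frame;
}

// Grant: the mapping moves to the receiver, the sender loses it.
inline bool grant(Toy_task &src, Addr src_virt, Toy_task &dst, Addr dst_virt) {
    std::map<Addr, Addr>::iterator it = src.pages.find(src_virt);
    if (it == src.pages.end()) return false;
    dst.pages[dst_virt] = it->second;
    src.pages.erase(it);
    return true;
}

// Remote flush by (task, virtual address). Returns false if the
// address turns out to be a hole.
inline bool flush(Toy_task &task, Addr virt) {
    return task.pages.erase(virt) == 1;
}

inline bool holds_frame(const Toy_task &task, Addr frame) {
    for (const auto &entry : task.pages)
        if (entry.second == frame) return true;
    return false;
}
```

If task B receives a frame at 0x1000 and grants it to task C, a later 'flush(b, 0x1000)' returns false while 'holds_frame(c, ...)' is still true. That is the "stealing" described above.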
Second, the new situation triggers some code paths in the kernel that were not used before. Apparently, the unmapping of memory from sigma0 was not considered. This is where I hit a few assertions in the kernel. Right now, I have just worked around these assertions by commenting out the offending code in 'kernel/fiasco/src/kern/map_util.cpp'. I understand that this is just a stop-gap solution. Would you like to lend a helping hand to find out...
"How to support unmap from sigma0 in a clean way?"
I would like to keep Genode from diverging too much from the semantics of the official Fiasco.OC kernel. Hence, I would appreciate your consideration:
"Would you like to follow a similar path with L4Re?"
That would be very nice. But even if this should not be the case, I would find your rationale behind sticking with sigma0 very valuable to know, e.g., for reconsidering my plan.
Regards
Norman
On Thu, 2013-03-14 at 23:17 +0100, Norman Feske wrote:
Hello Fiasco.OC developers,
I'll skip the whole first part of the mail; maybe we can discuss it later...
Current state and open questions
With the current state of the implementation, the complete software stack of Genode runs without sigma0. This includes L4Linux. So I am pretty confident that the removal of sigma0 from the system does not imply functional disadvantages.
That said, there are a few remaining questions that I'd like to discuss.
First, the sole use of 'l4_task_unmap' to remotely flush memory mappings in other processes means that we must no longer provide the option of granting memory mappings. Otherwise, a process that received a mapping from roottask could "steal" the physical memory by granting it to someone else. Roottask would not know about that, and an attempt to flush the mappings in the original receiver of the mapping would just target a hole in the address space. My question is:
"Can we live without granting memory?"
or the other way: "What is a known use case for granting memory?"
Yes, it is possible to live without grant for memory, however this would make it impossible to implement a pass-through pager that does not keep mappings on its own. And second, grant does not allow stealing anything: if task A wants to revoke memory it mapped to task B, it can always use its local address to do that, regardless of what task B did by granting or mapping the memory somewhere else. In other words, the 'l4_task_unmap' call is useful for tasks that are fully controlled by some other task (tasks that cannot map or grant by themselves), for example L4Linux user-level tasks or virtual machines.
So there is no functional benefit from removing 'grant', but there are legitimate use-cases that are impossible without grant.
Second, the new situation triggers some code paths in the kernel that were not used before. Apparently, the unmapping of memory from sigma0 was not considered. This is where I hit a few assertions in the kernel. Right now, I have just worked around these assertions by commenting out the offending code in 'kernel/fiasco/src/kern/map_util.cpp'. I understand that this is just a stop-gap solution. Would you like to lend a helping hand to find out...
"How to support unmap from sigma0 in a clean way?"
So we currently do not consider removing sigma0 as a user-level process. Nor do we consider changing the semantics that sigma0 has access to all physical memory resources. Hence there is currently no good reason to support unmapping memory from sigma0 because it could regain access to that memory by accessing it.
Modifying the Fiasco interface in the proposed way would make the semantics for the sigma0 task very special and different from all other tasks on Fiasco, which is currently not the case.
I would like to keep Genode from diverging too much from the semantics of the official Fiasco.OC kernel. Hence, I would appreciate your consideration:
"Would you like to follow a similar path with L4Re?"
That would be very nice. But even if this should not be the case, I would find your rationale behind sticking with sigma0 very valuable to know, e.g., for reconsidering my plan.
Currently not, we consider sigma0 being part of our architecture and will probably stay with it. However, there could be possible enhancements to the sigma0 interface that support your use-cases. And there could also be some kernel-interface enhancements that allow more effective and robust user-level memory management.
regards
Hello Alex,
thanks for your reply, but as you might have expected there are some follow-up questions...
On Mon, Mar 18, 2013 at 05:19:08PM +0100, Alexander Warg wrote:
Yes, it is possible to live without grant for memory, however this would make it impossible to implement a pass-through pager that does not keep mappings on its own.
Is the pass-through pager really a use case or more like a theoretical construct? I was always wondering what the scenario would look like that needs this kind of pager. Could you please elaborate more on this?
And second, grant does not allow stealing anything: if task A wants to revoke memory it mapped to task B, it can always use its local address to do that, regardless of what task B did by granting or mapping the memory somewhere else. In other words, the 'l4_task_unmap' call is useful for tasks that are fully controlled by some other task (tasks that cannot map or grant by themselves), for example L4Linux user-level tasks or virtual machines.
Hm, I'm not that familiar with the current mapping interface/rights, but does that mean there exists a feature to prevent mappees from further delegating page-frame access rights via map or grant?
So we currently do not consider removing sigma0 as a user-level process. Nor do we consider changing the semantics that sigma0 has access to all physical memory resources. Hence there is currently no good reason to support unmapping memory from sigma0 because it could regain access to that memory by accessing it.
Modifying the Fiasco interface in the proposed way would make the semantics for the sigma0 task very special and different from all other tasks on Fiasco, which is currently not the case.
I doubt that Norman's proposal changes the "semantics" of sigma0 that much, as it is special anyway and can map any physical frame into its address space by just touching the corresponding address. Please consider that the proposed changes could lead to a noticeable gain in robustness for the most-privileged user-level process on Fiasco.
Currently not, we consider sigma0 being part of our architecture and will probably stay with it. However, there could be possible enhancements to the sigma0 interface that support your use-cases. And there could also be some kernel-interface enhancements that allow more effective and robust user-level memory management.
Do you already have some ideas to discuss here? I'd highly appreciate a lively discussion on this (at least for us) important topic.
Regards
On Tue, 2013-03-19 at 09:34 +0100, Christian Helmuth wrote:
Hello Alex,
thanks for your reply, but as you might have expected there are some follow-up questions...
On Mon, Mar 18, 2013 at 05:19:08PM +0100, Alexander Warg wrote:
Yes, it is possible to live without grant for memory, however this would make it impossible to implement a pass-through pager that does not keep mappings on its own.
Is the pass-through pager really a use case or more like a theoretical construct? I was always wondering what the scenario would look like that needs this kind of pager. Could you please elaborate more on this?
Perhaps this example is too theoretical, and yes, we currently have no such construct in use. There is a different use case facilitated by grant: a local move operation, e.g., for an mremap-like operation, where a local region mapper can use grant to move mappings to a different virtual address.
And second, grant does not allow stealing anything: if task A wants to revoke memory it mapped to task B, it can always use its local address to do that, regardless of what task B did by granting or mapping the memory somewhere else. In other words, the 'l4_task_unmap' call is useful for tasks that are fully controlled by some other task (tasks that cannot map or grant by themselves), for example L4Linux user-level tasks or virtual machines.
Hm, I'm not that familiar with the current mapping interface/rights, but does that mean there exists a feature to prevent mappees from further delegating page-frame access rights via map or grant?
No, at least not directly. However, you can prevent threads in an address space from making system calls at all (either using VCPU features or the 'alien' flag).
So we currently do not consider removing sigma0 as a user-level process. Nor do we consider changing the semantics that sigma0 has access to all physical memory resources. Hence there is currently no good reason to support unmapping memory from sigma0 because it could regain access to that memory by accessing it.
Modifying the Fiasco interface in the proposed way would make the semantics for the sigma0 task very special and different from all other tasks on Fiasco, which is currently not the case.
I doubt that Norman's proposal changes the "semantics" of sigma0 that much, as it is special anyway and can map any physical frame into its address space by just touching the corresponding address. Please consider that the proposed changes could lead to a noticeable gain in robustness for the most-privileged user-level process on Fiasco.
sigma0 is not really special. From the user-level perspective, it just has a pager that resolves its page faults transparently.
Currently not, we consider sigma0 being part of our architecture and will probably stay with it. However, there could be possible enhancements to the sigma0 interface that support your use-cases. And there could also be some kernel-interface enhancements that allow more effective and robust user-level memory management.
Do you already have some ideas to discuss here? I'd highly appreciate a lively discussion on this (at least for us) important topic.
The ideas are not yet in a shape in which I want to discuss them here.
regards
Thanks Alex for your response,
mappings on its own. And second, grant does not allow stealing anything: if task A wants to revoke memory it mapped to task B, it can always use its local address to do that, regardless of what task B did by granting or mapping the memory somewhere else. In other words, the 'l4_task_unmap' call is useful for tasks that are fully controlled by some other task (tasks that cannot map or grant by themselves), for example L4Linux user-level tasks or virtual machines.
I am afraid that I did not express the fundamental idea of my proposal well enough. In the new version of roottask, roottask maps directly from the physical address space to the virtual address space of all other processes (in your example, this is task B) without keeping a local mapping. So roottask does not possess a mapping node to revoke such mappings. In order to be able to revoke mappings, it keeps records of the virtual addresses to where it installed the mappings in the other tasks. For revoking a mapping, roottask performs 'l4_task_unmap' with the remote task capability and the remote virtual address range as arguments.
The "stealing" would happen if task B grants the mapping to somewhere else. My incentive behind the removal of the memory-granting mechanism is solely to avoid this loophole. I would appreciate knowing more specifically which "legitimate use cases" this would break. For example, does L4Re rely on granting memory?
Please note that the mapping database is still in effect for mappings further propagated by task B. But task B would be the root of the mapping tree.
So we currently do not consider removing sigma0 as a user-level process. Nor do we consider changing the semantics that sigma0 has access to all physical memory resources. Hence there is currently no good reason to support unmapping memory from sigma0 because it could regain access to that memory by accessing it.
I did not suggest at all to change sigma0/roottask's ultimate power over physical memory. Sigma0/roottask can still map any physical page to its local address space using l4_task_map with the physical address as source and its local virtual address as destination. The two fundamental differences are that those mappings are not installed auto-magically but explicitly, and that the mappings are not necessarily identity mappings. By explicitly installing the mappings, roottask gains robustness. By removing the identity-mapping policy of sigma0, roottask can use its virtual memory more flexibly.
My line of thinking is that the less memory is shared between roottask and other user-space processes, the better. The current architecture shares all memory of all processes with roottask. In the variant I suggest, roottask shares no memory with other user processes.
Currently not, we consider sigma0 being part of our architecture and will probably stay with it. However, there could be possible enhancements to the sigma0 interface that support your use-cases. And there could also be some kernel-interface enhancements that allow more effective and robust user-level memory management.
Even though sigma0 and roottask happen to be executed in user space, both are logically part of the operating system's kernel because all other processes ultimately depend on them. According to the principles of microkernel construction, the kernel should be free from policy. Yet, you support keeping policy in the form of sigma0, which can easily be removed as my experiment suggests. According to the minimality principle, the kernel should contain functionalities only if they are strictly needed. I fail to grasp how sigma0 qualifies for that. When looking at the problem from this perspective, "enhancing" the sigma0 interface looks like going in the wrong direction. It would make the kernel of the system (which is not merely the code executed in privileged mode) not less complex but more complex.
From your mail, I understand that you do not feel any urge to change the status quo as sigma0 apparently causes no problems for you. Even though I was hoping for a different response, thank you for stating your stance.
Regards
Norman
On Tue, 2013-03-19 at 12:29 +0100, Norman Feske wrote:
Thanks Alex for your response,
mappings on its own. And second, grant does not allow stealing anything: if task A wants to revoke memory it mapped to task B, it can always use its local address to do that, regardless of what task B did by granting or mapping the memory somewhere else. In other words, the 'l4_task_unmap' call is useful for tasks that are fully controlled by some other task (tasks that cannot map or grant by themselves), for example L4Linux user-level tasks or virtual machines.
I am afraid that I did not express the fundamental idea of my proposal well enough. In the new version of roottask, roottask maps directly from the physical address space to the virtual address space of all other processes (in your example, this is task B) without keeping a local mapping. So roottask does not possess a mapping node to revoke such mappings. In order to be able to revoke mappings, it keeps records of the virtual addresses to where it installed the mappings in the other tasks. For revoking a mapping, roottask performs 'l4_task_unmap' with the remote task capability and the remote virtual address range as arguments.
I think I already got this. My proposal is not to use l4_task_unmap to unmap memory from an arbitrary address space; doing this needs either cooperation from the target or full control over the target address space (using VCPU or alien). However, you can use unmap with a local address in the roottask (pager) and unmap from all children.
The "stealing" would happen if task B grants the mapping to somewhere else. My incentive behind the removal of the memory-granting mechanism is solely to avoid this loophole. I would appreciate knowing more specifically which "legitimate use cases" this would break. For example, does L4Re rely on granting memory? Please note that the mapping database is still in effect for mappings further propagated by task B. But task B would be the root of the mapping tree.
In Fiasco, the root of a mapping tree is always sigma0 as long as it does not grant, and our implementation doesn't. If an intermediate task uses grant, it does not disconnect a subtree; it just moves the mapping node to a different location (virtual address and/or task).
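As a rough illustration of this point, the following C++ toy models a mapping tree in which grant merely relocates a node. It is one possible reading of the behaviour described above, not the Fiasco mapping database; 'Map_node', 'map_to', 'grant_to', 'count_below', and 'revoke_below' are hypothetical names, and the assumption that a granted node keeps its children is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy model only; hypothetical names, not kernel code.
using Addr = std::uint64_t;

struct Map_node {
    unsigned task;  // task that currently holds the mapping
    Addr     virt;  // location of the mapping in that task
    std::vector<std::unique_ptr<Map_node>> children;

    Map_node(unsigned t, Addr v) : task(t), virt(v) {}
};

// Map: the receiver's mapping becomes a child of the sender's node.
inline Map_node *map_to(Map_node &parent, unsigned task, Addr virt) {
    parent.children.push_back(std::unique_ptr<Map_node>(new Map_node(task, virt)));
    return parent.children.back().get();
}

// Grant: the node changes its location but keeps its place in the tree
// (assumption of this sketch: its children stay attached).
inline void grant_to(Map_node &node, unsigned task, Addr virt) {
    node.task = task;
    node.virt = virt;
}

// Count all mappings derived from 'node'.
inline std::size_t count_below(const Map_node &node) {
    std::size_t n = 0;
    for (const auto &child : node.children) n += 1 + count_below(*child);
    return n;
}

// Revoke everything derived from 'node'; returns the number of removed mappings.
inline std::size_t revoke_below(Map_node &node) {
    std::size_t n = count_below(node);
    node.children.clear();
    return n;
}
```

In this model, a revocation started at the root still reaches a mapping that an intermediate task granted away, because the node never left the tree.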
So we currently do not consider removing sigma0 as a user-level process. Nor do we consider changing the semantics that sigma0 has access to all physical memory resources. Hence there is currently no good reason to support unmapping memory from sigma0 because it could regain access to that memory by accessing it.
I did not suggest at all to change sigma0/roottask's ultimate power over physical memory. Sigma0/roottask can still map any physical page to its local address space using l4_task_map with the physical address as source and its local virtual address as destination. The two fundamental differences are that those mappings are not installed auto-magically but explicitly, and that the mappings are not necessarily identity mappings. By explicitly installing the mappings, roottask gains robustness. By removing the identity-mapping policy of sigma0, roottask can use its virtual memory more flexibly.
The semantic change is that currently there is no notion of physical addresses in the Fiasco API, and your proposal introduces this notion for a single task, call it sigma0.
My line of thinking is that the less memory is shared between roottask and other user-space processes, the better. The current architecture shares all memory of all processes with roottask. In the variant I suggest, roottask shares no memory with other user processes.
I totally agree that for robustness reasons it would be good not to map all the memory to the roottask. In fact, you can already do this by changing the pager of the roottask thread itself to point to an invalid capability, for example, to prevent unwanted page faults from being resolved by sigma0. The tricky part is to map memory to clients that is not mapped in the server: one would have to request the memory on demand from sigma0 and then grant it to the client task. If you want revocation, you would either need to enhance the sigma0 protocol so that sigma0 unmaps memory on behalf of a sigma0 client, or use an extra task that has the whole memory mapped, use l4_task_map to map this memory to your roottask, and then grant it to your client. For revocation, you can then use l4_task_unmap with that helper task.
Currently not, we consider sigma0 being part of our architecture and will probably stay with it. However, there could be possible enhancements to the sigma0 interface that support your use-cases. And there could also be some kernel-interface enhancements that allow more effective and robust user-level memory management.
Even though sigma0 and roottask happen to be executed in user space, both are logically part of the operating system's kernel because all other processes ultimately depend on them. According to the principles of microkernel construction, the kernel should be free from policy. Yet, you support keeping policy in the form of sigma0, which can easily be removed as my experiment suggests. According to the minimality principle, the kernel should contain functionalities only if they are strictly needed. I fail to grasp how sigma0 qualifies for that. When looking at the problem from this perspective, "enhancing" the sigma0 interface looks like going in the wrong direction. It would make the kernel of the system (which is not merely the code executed in privileged mode) not less complex but more complex.
This will be an endless discussion I think...
Our policy is that everybody can run their own implementation of sigma0 on Fiasco.OC. And yes, it is part of the core functionality of any OS running on Fiasco, but this does not mean that we should put its functionality into the microkernel.
regards
l4-hackers@os.inf.tu-dresden.de