Similar to Rudy, I will try to answer multiple mails at once. (Sorry for the long post!!!)
First, a general statement. It appears to me that our goals are very different. As Rudy pointed out, one of the main goals of L4 (at least as pursued in Karlsruhe) is to be a universal and high-performance uK. Universal here means that it is applicable to any domain. That also answers Rudy's question of who is going to use a microkernel without a security model: anybody who does not require a security model. And that may be your wrist watch, your cell phone, or a medical appliance.
Jonathan wrote:
Please consider that "clever" is a four letter word (a curse). All of these optimizations are, in my opinion, justifiable reasons to fire a programmer with a strongly negative recommendation. The resulting systems are unmaintainable.
Speed is not the primary goal. Efficient robustness (measured end to end) is the primary goal.
That may be true for your research agenda. However, in investigating the limits of uK-based systems, I would be very disappointed if you wouldn't even consider these optimizations. Otherwise we are back to the good old Mach problem: uKs are inherently slow by nature, full stop. And we are talking research here, not business (btw, I would fire those guys too). We may also simply not have the right tools to do this robustly and efficiently today, but then we should look into better tools.
Excuse me, but this is utter nonsense. You are making a quantitative argument using qualitative and unsubstantiated arguments. You (and I) need numbers in order to evaluate this issue.
The correct way to approach this is to ask:
- What percentage of invocations dynamically incur the cost of checking?
- What is the respective cost of this checking at user and at supervisor level?
- Given that the supervisor-mode implementation *is* incurred by all cases, what is the weighted cost of the mechanism in both cases (user and supervisor).
Once you have this calculation, you have a correct engineering basis for making a decision.
Good that we are on common ground now. We also have to agree on what system you and I have in mind, and from that we can derive the overall costs. It seems to me you are implying a system design and claiming that L4's lack of certain features makes it perform as well/badly or worse. However, keeping the feature out of the kernel and in user land allows it to be modified. The question then changes to how efficiently we can implement it in user land compared to the in-kernel version. If it is part of the kernel you can't change it anymore, which means you always pay the cost whether you want to or not. And here you also have to evaluate _all_ potential systems, even those without security requirements.
And I should acknowledge that I disagree with Volkmar very fundamentally about the "performance uber alles" assumption. Performance is important, but from a research perspective it is *more* important to understand what the fundamental architectural issues are. Once we do, we can step back and decide what to take out. At the moment, we have no common base on which direct comparison is possible.
I think Jonathan misunderstood me on that. I think we have a set of core requirements for a kernel which are probably almost identical for EROS and L4. However, we have a different focus. And performance is still one of the main reasons why monolithic kernels are preferred over uKs. That does not mean that security is less important.
I don't think that the L4 kernel has all the answers, but I also think that the EROS kernel does not have all the answers. Each, I think, has important strengths. EROS lacks a certain elegance of minimality. L4 (as I understand it today) lacks a credible story about security, access control, and denial of service. Perhaps it is time for both groups to step back and make a serious attempt to learn from each other.
I agree that one of the current weaknesses of L4 is its very rudimentary security model. And each L4 group seems to have its own perspective on how this should be solved.
From the conversations on this list, however, it appears to me that the L4 research groups do not agree universally on what the IPC control mechanism is, and this may be contributing to some difficulty in the discussion. One group clearly advocates thread spaces. Volkmar (if I understand him) currently advocates IPC indirection. The two designs have very different implications for performance, implementation, and security.
No, I'm not advocating this. So far nobody has a reasonable cost-benefit analysis of the thread mappings, and as long as that is the case I'm not in favor of anything. However, considering the complexity and elegance of the different alternatives and the performance impact on a wide variety of hardware architectures (specifically MP and NUMA systems), I'm very skeptical about proposals which bind security principals to address spaces. That would just open another can of worms.
The redirection model has the disadvantage of fundamentally changing the semantics of IPC. With a purely synchronous IPC model timing is a fundamental part of the communication protocol. Transparent interception is not possible since the message gets successfully delivered to the redirector, which, however, may discard it. This alters guarantees usually given by the kernel.
Volkmar's technique is quite elegant. The only problem I see is that it places the burden of verification on the wrong party -- it should be on the client. That is, the design invites DoS attacks.
If it does not matter where the check code is executed (client/kernel/server) but only who is charged for the resources used (CPU, cache, whatever), this is not the case. (I'm not claiming L4 can do that (yet).)
However, Volkmar is making an assumption that is problematic. He is assuming that the passed ID is a pointer. From a security perspective this is VERY bad. First, it invites memory attacks. Second, it discloses information about the internal implementation of the server. Third, it makes selective rescind of authority very hard to do.
No, it can be an object handle which can be translated in some way (e.g. cryptographically modified, or a hash). The fundamental difference is that it is not a first-class kernel object. This means you have the freedom to choose the appropriate encoding (which could, of course, be a pointer).
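To make the idea concrete, here is a minimal sketch of such a non-first-class handle encoding. All names and the XOR-with-a-secret scheme are purely illustrative (a real server might use a cryptographic MAC or a hash, as mentioned above); the point is only that the handle is a server-chosen encoding, not a raw pointer, and can be revoked by invalidating a table slot.

```c
/* Illustrative sketch: an object handle that is not a raw pointer.
   The server XORs a per-client secret into a table index, so the
   handle discloses nothing about internal layout and can be revoked
   by clearing the table slot. Names are hypothetical. */
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE 64

typedef struct {
    int valid;      /* cleared to revoke the handle */
    void *object;   /* the server's internal object */
} slot_t;

static slot_t table[TABLE_SIZE];               /* zero-initialized */
static const uint32_t client_secret = 0xC0FFEE42u;

/* Encode a table index as an opaque handle for the client. */
uint32_t handle_encode(uint32_t index) {
    return index ^ client_secret;
}

/* Translate a client-supplied handle back to the object, or NULL
   if the handle is malformed or has been revoked. */
void *handle_lookup(uint32_t handle) {
    uint32_t index = handle ^ client_secret;
    if (index >= TABLE_SIZE || !table[index].valid)
        return NULL;
    return table[index].object;
}
```

A malformed or guessed handle simply fails the table check, so the server never dereferences client-supplied memory.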
As a backwards compatibility matter, and as a validation of the L4 nucleus, running Linux on an L4 is interesting. As a matter of future research, and as a matter of forward-looking system architecture, it is boring.
Yes, I agree. However, a gradually decomposed Linux can answer many research questions without re-implementing a fully fledged OS.
The right forward-looking research question is:
*Given* a fast nucleus in the style of L4, what is the most effective way to structure a native operating environment? What fundamental and novel leverage do such kernels provide?
That was one of the goals of SawMill.
==== Rudy wrote:
Sorry, but I do not see how the version part could be used as the server defined word. You can't have multiple versions (read server defined words) of the same thread-id at the same time.
Correct. I had a different usage scenario in mind--sorry for the confusion.
I believe these costs are exaggerated. To convert a capability into a sender-id, a simple table lookup can be used. This requires a single TLB entry and a single L2 cache line (worst case). If IPC is frequent, which is the case in which you want high performance, the TLB entry is very likely to be present, and probably even the L2 cache line.
This assumes that you have a small TLB working set and a small cache working set. However, if communication is frequent, that also means you are probably communicating with _many_ partners. Then you probably need a TLB entry per table, which again increases the TLB footprint. That means the likelihood that you replace user TLB entries goes up--the problem is the user's active working set that you replace.
==== Jonathan wrote:
Volkmar is combining two things in one, and seems to be forgetting that he does so. You are correct. Once you have the capability, obtaining the target thread id is quite fast. No table lookup is required. The capability is kernel-protected state, and can simply contain a direct pointer to the recipient PCB.
I think that the costs that Volkmar is identifying come from the need to lookup the thread-id or capability in the thread mapping space. That is, these are the costs of the address space traversal. The costs are the same in either case (thread-id or capability).
If you add a level of indirection (thread id or cap) that is the case. However, if you pay for a level of indirection anyhow, then it does not matter whether the name specifies a capability or a thread. I have the (maybe unreasonable) feeling that it should be possible to express both in one model with the same costs associated.
One of the features of EROS is an indirection object. This object can stand in front of a start capability transparently. The client invokes the indirection object capability rather than the start capability. Given this construction, a service can selectively rescind that client capability by destroying the indirection object.
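A toy model may help readers unfamiliar with the construction. This is not EROS code; the struct and function names are invented. It shows only the essential property: the client's path to the server runs through the indirection object, so destroying that object rescinds this one client's access without disturbing the start capability or any other client.

```c
/* Illustrative sketch (not EROS code) of the indirection-object idea. */
#include <stddef.h>

typedef struct { int counter; } server_t;   /* stand-in for a service */

typedef struct {
    int alive;          /* cleared when the indirection object is destroyed */
    server_t *target;   /* the start capability it stands in front of */
} indirection_t;

#define EC_DEAD_CAP (-1)

/* Client invocation path: always goes through the indirection object. */
int invoke(indirection_t *ind, int msg) {
    if (!ind->alive)
        return EC_DEAD_CAP;     /* behaves like an invalid capability */
    ind->target->counter += msg;
    return 0;
}

/* Selective rescind: destroy only this client's path to the server. */
void rescind(indirection_t *ind) {
    ind->alive = 0;
}
```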
This is something that L4 presently has no means to do. Of course, L4 does not need it, because it makes no assumption about controlling client invocations at all.
In the current L4 model this would be a proxy thread, assuming that IPC is cheap and redirection happens infrequently. In the case of frequent redirection you would extend the IPC protocol to take the reconfiguration into account. This boils down to how much transparency we want. Jonathan mentioned on the EROS list that transparent persistence is something which is not feasible. The same question can be asked in this case: is transparent redirection something we want/need? And are we willing to pay the overhead? One could construct a system where communication can only be restricted (for enforcement) and apps must be able to dynamically re-configure.
==== Jonathan
In practice, I would use a cache for an L1 software probe first in order to avoid all marginal TLB and cache misses in the usual case, but this would result in high variance of IPC times.
Which means you add a new policy to the kernel, something we strictly try to avoid with L4.
- Volkmar
On Thu, 2004-01-01 at 18:09, Volkmar Uhlig wrote:
Similar to Rudy, I will try to answer multiple mails at once. (Sorry for the long post!!!)
First, a general statement. It appears to me that our goals are very different. As Rudy pointed out, one of the main goals of L4 (at least as pursued in Karlsruhe) is to be a universal and high-performance uK. Universal here means that it is applicable to any domain.
Actually, this is the goal that I am hoping to achieve. Today, L4 seems a very inefficient platform for constructing a protected, object-based operating system.
I won't suggest (and I don't believe) that EROS is the ultimate example of such a system, but perhaps you will consider the following as a challenge question:
Assertion [yours]: L4 is a universal uK.
Test: implement the semantics of EROS correctly and with reasonable efficiency on top of L4.
Hint: Today, you cannot, because L4 does not provide adequately protected IPC.
I can say confidently that neither Jochen nor I could figure out how to do it. The EROS reliance on protected descriptors is too deeply embedded and too critical to performance. You could do it without all of the EROS kernel object types -- Leendert and I figured that part out. What we can't figure out is how to do it without (a) protected thread-ids, and (b) that in-kernel word per descriptor.
That also answers Rudy's question of who is going to use a microkernel without a security model: anybody who does not require a security model. And that may be your wrist watch, your cell phone, or a medical appliance.
The wrist watch I might believe. Cell phones and medical appliances are both areas that are suffering badly from the absence of protection. Recent attacks on phones have made this pretty clear. In medical appliances, I'm not aware of security issues yet, but I'm definitely aware of *reliability* issues, and the underlying requirements for truly high-reliability systems are very very similar to the underlying requirements for secure systems. They begin with protection.
Perhaps there should be two specializations of the kernel API: one where protection must be supported and one where it is not needed. I am prepared to believe that unprotected systems should not pay the cost of protection. My problem is that protected systems similarly should not pay the cost of non-protection.
We also have to agree on what system you and I have in mind and from that we can derive the overall costs. To me it seems you are implying a system design and claim that L4's lack of certain features makes it perform as good/bad or worse.
Yes, in part, I am. But if L4 is indeed a universal microkernel -- or even a reasonable approximation to a universal microkernel -- then it should be able to efficiently support any particular system.
My claim is that L4 is missing one essential abstraction in the IPC mechanism: protection.
Now the question changes to how efficiently can we implement it in user land compared to the in-kernel version.
That is one important question. The other important question is: if we implement it in user land, can we implement it with correct protection.
Performance is still one of the main reasons why monolithic kernels are preferred over uKs.
Based on experience, I am not convinced of this. I have been deeply involved in three very large-scale decision processes about uK acquisitions and transitions. In each case, the outcome went *against* the uK for good reasons. As far as I remember, IPC performance was raised in the discussions only as a curiosity. It never once played a significant role in the decisions.
The problem, in each case, was that the systems (including their microkernels) had not been effectively engineered in an end to end way. We saw evidence in each case that *overall* performance was terrible, but this was usually NOT due to IPC -- IPC frequency was actually quite low. It was the result of bad systemic design (certainly including the microkernel).
There are two possible conclusions that can be drawn from this:
1. Microkernels need to be faster than they used to be.
2. Microkernels and the systems on top of them need to be engineered with a better view of end to end design.
Up to a point, I think that (1) is correct. Past that point, (2) becomes dominant. This is why EROS uses an object-based rather than a process-based invocation system, and it is why we are willing to pay the cost of protection in the invocation mechanism. It remains to be seen whether we got it right, but recent performance numbers for our window system and our network system look very very promising.
In actual fact, I have only once seen a figure that breaks down the total time spent in the kernel for any production microkernel (KeyKOS: 40%). That figure is comparable to monolithic kernels. The KeyKOS implementation was definitely too slow, and in that generation it was pretty much a monolithic kernel.
shap
In this mail I am trying to address two very specific problems in L4 that are impediments in high-assurance certification. While fixing these issues would improve the likelihood of supporting a system like EROS on L4, both issues need to be addressed even if EROS is not a consideration at all.
I should explain that I have been involved in some high assurance work in connection with a couple of consulting clients. In each case, L4 was seriously considered and was discarded because of the issues I will describe below. Setting aside any question of running EROS on top of L4, I would like to see it possible for a good uKernel (L4) to be used as the base of a serious high-assurance system.
I should also emphasize that I may be wrong about some details of L4, and that if this is true I would like to be corrected!
Issue 1: Need for Protected IPC
On Thu, 2004-01-01 at 18:09, Volkmar Uhlig wrote:
If it does not matter where the check code is executed (client/kernel/server) but only who is charged for the resources used (CPU, cache, whatever), this is not the case. (I'm not claiming L4 can do that (yet).)
In answer to Volkmar's question: it matters very fundamentally where the check is performed. It is the difference between mandatory and discretionary controls.
The current L4 IPC is discretionary: the sender can invoke anyone, and the recipient checks the sender identity and decides whether to accept the IPC.
There are a variety of secure system designs that impose some form of mandatory information flow policy. These include multilevel secure systems, but more generally, almost any system that uses reference monitor(s) for any reason (including recoverability).
In such systems, it is a fundamental requirement that the reference monitor be able to PREVENT (absolutely) communication between processes that are under its control. If a sender S is not permitted to send to a recipient R, the behavior must be exactly as if the send was performed to a non-existent ID.
Such systems include any system that seeks A1, EAL6, or EAL7 certification (or equivalent -- including FAA level A and British and German equivalents). In a system of this sort, the sender is not even permitted to know of the *existence* of R indirectly. Disclosing recipient thread-ids indirectly discloses this information (it is a failure of encapsulation). This means that the current L4 architecture cannot obtain an EAL6 or EAL7 certification today.
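The requirement can be stated very compactly in code. The sketch below uses hypothetical names (this is not the L4 API): the only property that matters is that a send denied by the reference monitor returns exactly the same result as a send to a non-existent ID, so the sender cannot use error codes to probe for the recipient's existence.

```c
/* Minimal sketch of the mandatory-control requirement. The toy
   policy tables and all names are hypothetical. */
#define EC_NO_SUCH_THREAD (-2)
#define NUM_THREADS 4

/* Toy state: which thread IDs exist, and which sender->recipient
   pairs the reference monitor permits. */
static const int exists[NUM_THREADS] = { 1, 1, 1, 0 };
static const int allowed[NUM_THREADS][NUM_THREADS] = {
    { 0, 1, 0, 0 },   /* thread 0 may send only to thread 1 */
    { 1, 0, 1, 0 },
    { 0, 0, 0, 0 },
    { 0, 0, 0, 0 },
};

int monitored_send(int sender, int recipient, int msg) {
    (void)msg;
    if (recipient < 0 || recipient >= NUM_THREADS || !exists[recipient])
        return EC_NO_SUCH_THREAD;
    if (!allowed[sender][recipient])
        return EC_NO_SUCH_THREAD;  /* indistinguishable from the case above */
    return 0;                      /* deliver the message */
}
```

Note that both failure paths share one error code; returning distinct codes for "denied" and "no such thread" would leak exactly the existence information that high-assurance certification forbids.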
Setting aside any question of supporting EROS, this inability to meet high assurance security requirements violates the "L4 is universal" argument in a very basic way. This issue and one other (discussed below) have forced me (as a consultant) to advise two large and serious US companies against using L4 in high-assurance products. [To be clear: EROS wasn't even a candidate. L4 was frustratingly close to possible, and I would have loved to recommend L4.]
The main change that is needed in L4 to resolve this is to define the recipient-id and sender-id fields as opaque fields. Under high-assurance requirements, the sender is not entitled to know how many threads execute within the recipient. The sender-id and recipient-id therefore must not encode "thread within process". Similarly, it must not encode "process ID".
Of these two requirements, I suspect that eliminating "thread within process" is the harder part. If the recipient-id and sender-id today were simply process id's, L4 could architecturally redefine them as opaque values. A low-assurance implementation could simply use the process control block address as the value, while a high-assurance implementation would implement software protection on the value.
While I do not advocate any particular implementation strategy, let me give an *example* of one that might suffice: a simple hash table. Instead of using the PCB address as the process ID, the kernel could use H(sender kernel ID, requested-recipient-ID) as an index into a hash table and perform a single indirection (and possible hash bucket chasing) to find the process address. This is fundamentally the design proposed by Trent several years ago.
This implementation would clearly be more expensive than the current one (hash computation, extra TLB miss into the indirection table, extra D-cache reference), but I think that it is cheaper than thread spaces.
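The translation scheme described above might be sketched as follows. This is a toy model, not a proposed implementation: the hash function, field names, and fixed bucket count are all illustrative. It shows the lookup path, including the bucket chasing mentioned above, and that an unmapped (sender, recipient) pair resolves to nothing, which the kernel would report as a non-existent recipient.

```c
/* Toy sketch of H(sender kernel ID, requested-recipient-ID) indexing
   a hash table of bucket chains. All names are illustrative. */
#include <stdint.h>
#include <stddef.h>

#define BUCKETS 128

typedef struct entry {
    uint32_t sender_id;      /* sender's kernel ID */
    uint32_t recipient_id;   /* opaque ID the sender presented */
    void *pcb;               /* resolved process control block */
    struct entry *next;      /* hash bucket chain */
} entry_t;

static entry_t *buckets[BUCKETS];

static uint32_t hash(uint32_t s, uint32_t r) {
    return (s * 2654435761u ^ r * 40503u) % BUCKETS;
}

/* Resolve an opaque recipient ID to a PCB, or NULL if no mapping
   exists -- which the caller must treat as "no such recipient". */
void *resolve(uint32_t sender, uint32_t recipient) {
    for (entry_t *e = buckets[hash(sender, recipient)]; e; e = e->next)
        if (e->sender_id == sender && e->recipient_id == recipient)
            return e->pcb;   /* found after chasing the bucket chain */
    return NULL;
}

/* Install a (sender, recipient) -> PCB mapping. */
void insert(entry_t *e) {
    uint32_t h = hash(e->sender_id, e->recipient_id);
    e->next = buckets[h];
    buckets[h] = e;
}
```

Keying the table on the sender as well as the recipient is what makes the IDs opaque per-sender: two senders can hold different names for the same PCB, and rescinding one sender's name does not affect the other.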
Perhaps there is some simpler solution to this. If so, I would be *very* interested to know, because I would like to be able to help some of my clients!
2: Restricted Mapping
This is a very small issue, and there may be some way around it.
In general, I like the map/grant model very much, but in some systems it is necessary for the manager to know who has what mappings. In these systems, having applications perform mappings directly to each other creates a consistency problem.
I can imagine two ways to enforce this policy:
1. Introduce a bit somewhere in the protected thread descriptor that prevents map/grant operations.
2. Introduce a bit in the map descriptor indicating that the recipient may not perform further map/grant operations from this region.
My preference would be to have *both* controls, because they serve slightly different purposes. The first lets me virtualize the map/grant operations (in order to keep manager metadata updated), while the second prevents map recursion.
The problem with the first method is that it requires protected recipient descriptors.
The problem with implementing only the second method is: what thread id should a page fault handler receive from a faulting thread? Restricted or non-restricted? I can argue for either depending on what kind of system I am trying to build.
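For concreteness, the second control might look roughly like the following. The descriptor layout and names are invented for illustration; this is not the L4 mapping database. The essential behavior is that the restriction bit blocks further map/grant and is inherited by any mapping that is derived while the bit is absent.

```c
/* Hypothetical sketch of a no-further-map bit in the map descriptor. */
#include <stdint.h>

#define MAP_NO_REMAP 0x1u   /* holder may not map/grant this region onward */

typedef struct {
    uint32_t base;    /* start of the mapped region */
    uint32_t flags;   /* restriction bits */
} map_desc_t;

/* Attempt to derive a child mapping from 'parent'. Fails if the
   parent carries the no-remap restriction; otherwise the child
   inherits the parent's restriction bits. */
int derive_mapping(const map_desc_t *parent, map_desc_t *child) {
    if (parent->flags & MAP_NO_REMAP)
        return -1;            /* further map/grant is forbidden */
    *child = *parent;         /* restriction bits are inherited */
    return 0;
}
```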
shap
l4-hackers@os.inf.tu-dresden.de