Like Rudy, I will try to answer multiple mails at once. (Sorry for the long post!)
First a general statement. It appears to me that the goals are very different. As Rudy pointed out, one of the main goals of L4 (at least as pursued in Karlsruhe) is to be a universal, high-performance uK. Universal here means that it is applicable to any domain. That also answers Rudy's question of who is going to use a microkernel without a security model: anybody who does not require a security model. That may be your wrist watch, your cell phone, or a medical appliance.
Jonathan wrote:
Please consider that "clever" is a four letter word (a curse). All of these optimizations are, in my opinion, justifiable reasons to fire a programmer with a strongly negative recommendation. The resulting systems are unmaintainable.
Speed is not the primary goal. Efficient robustness (measured end to end) is the primary goal.
That may be true for your research agenda. However, to investigate the limits of uK-based systems, I would be very disappointed if you wouldn't even consider these optimizations. Otherwise we are back to the good old Mach problem: uKs are inherently slow by nature, full stop. And we are talking research here, not business (btw, I would fire those guys too). We may also simply not have the right tools today to do this robustly and efficiently--but then we should look into better tools.
Excuse me, but this is utter nonsense. You are making a quantitative argument using qualitative and unsubstantiated arguments. You (and I) need numbers in order to evaluate this issue.
The correct way to approach this is to ask:
- What percentage of invocations dynamically incur the cost of checking?
- What is the respective cost of this checking at user and at supervisor level?
- Given that the supervisor-mode implementation *is* incurred by all cases, what is the weighted cost of the mechanism in both cases (user and supervisor)?
Once you have this calculation, you have a correct engineering basis for making a decision.
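The weighted-cost comparison Jonathan asks for reduces to a small expected-value calculation. A sketch (the fractions and cycle counts are invented purely for illustration; the real numbers are exactly what the measurement would have to supply):

```python
def weighted_cost(fraction_checked, cycles_per_check):
    """Expected per-invocation cost of a check that only a given
    fraction of invocations actually incurs."""
    return fraction_checked * cycles_per_check

# User-level checking: only boundary-crossing invocations pay
# (assumed 20% of invocations, at an assumed 100 cycles each).
user_level = weighted_cost(0.2, 100)    # 20.0 cycles/invocation

# Kernel-level checking: *every* invocation pays (assumed 30 cycles).
kernel_level = weighted_cost(1.0, 30)   # 30.0 cycles/invocation
```

With these made-up numbers the user-level scheme wins; with a higher check frequency or per-check cost the conclusion flips. The numbers, not the rhetoric, decide.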
Good that we are on common ground now. We also have to agree on what system you and I have in mind; from that we can derive the overall costs. It seems to me you are implying a particular system design and claiming that L4's lack of certain features makes it perform as well or worse. However, having the feature in user land rather than in the kernel allows it to be modified. The question then becomes how efficiently we can implement it in user land compared to the in-kernel version. If it is part of the kernel, you can't change it anymore, which means you always pay the cost whether you want to or not. And here you also have to evaluate _all_ potential systems, even those without security requirements.
And I should acknowledge that I disagree with Volkmar very fundamentally about the "performance uber alles" assumption. Performance is important, but from a research perspective it is *more* important to understand what the fundamental architectural issues are. Once we do, we can step back and decide what to take out. At the moment, we have no common base on which direct comparison is possible.
I think Jonathan misunderstood me on that point. We have a set of core requirements for a kernel which are probably almost identical for EROS and L4; however, we have a different focus. Performance is still one of the main reasons why monolithic kernels are preferred over uKs. That does not mean that security is less important.
I don't think that the L4 kernel has all the answers, but I also think that the EROS kernel does not have all the answers. Each, I think, has important strengths. EROS lacks a certain elegance of minimality. L4 (as I understand it today) lacks a credible story about security, access control, and denial of service. Perhaps it is time for both groups to step back and make a serious attempt to learn from each other.
I agree that one of the current weaknesses of L4 is its very rudimentary security model. And each L4 group seems to have its own perspective on how this should be solved.
From the conversations on this list, however, it appears to me that the L4 research groups do not agree universally on what the IPC control mechanism is, and this may be contributing to some difficulty in the discussion. One group clearly advocates thread spaces. Volkmar (if I understand him) currently advocates IPC indirection. The two designs have very different implications for performance, implementation, and security.
No, I'm not advocating this. So far nobody has a reasonable cost-benefit analysis of thread mappings, and as long as that is the case I'm not in favor of anything. However, considering the complexity and elegance of the different alternatives and the performance impact on a wide variety of hardware architectures (specifically MP and NUMA systems), I'm very skeptical about proposals that bind security principals to address spaces. That will just open another can of worms.
The redirection model has the disadvantage of fundamentally changing the semantics of IPC. With a purely synchronous IPC model, timing is a fundamental part of the communication protocol. Transparent interception is not possible, since the message gets successfully delivered to the redirector, which, however, may discard it. This alters guarantees usually given by the kernel.
Volkmar's technique is quite elegant. The only problem I see is that it places the burden of verification on the wrong party -- it should be on the client. That is, the design invites DoS attacks.
If it does not matter where the check code is executed (client/kernel/server) but only who is charged for the resources used (CPU, cache, whatever), this is not the case. (I'm not claiming L4 can do that (yet).)
However, Volkmar is making an assumption that is problematic. He is assuming that the passed ID is a pointer. From a security perspective this is VERY bad. First, it invites memory attacks. Second, it discloses information about the internal implementation of the server. Third, it makes selective rescind of authority very hard to do.
No, it can be an object handle which can be translated in some way (e.g., cryptographically transformed, or hashed). The fundamental difference is that it is not a first-class kernel object. This means you have the freedom to choose the appropriate encoding (which could, of course, be a pointer).
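One possible encoding of such a non-pointer handle, sketched below. This is my invention for illustration, not anything L4 or EROS actually does: the handle is a table slot plus a keyed MAC, so it discloses nothing about the server's memory layout and cannot be forged or guessed.

```python
import hmac, hashlib, os

SECRET = os.urandom(16)   # server-private key; never leaves the server
objects = []              # server-internal objects, indexed by slot

def _tag(idx):
    # Keyed MAC over the slot index; 8 bytes is enough for a sketch.
    return hmac.new(SECRET, idx.to_bytes(8, "little"), hashlib.sha256).digest()[:8]

def make_handle(obj):
    """Register an object and hand out an opaque (slot, tag) handle."""
    objects.append(obj)
    idx = len(objects) - 1
    return (idx, _tag(idx))

def lookup(handle):
    """Translate a handle back to the object, rejecting forgeries."""
    idx, tag = handle
    if not hmac.compare_digest(tag, _tag(idx)):
        raise PermissionError("bad handle")
    return objects[idx]
```

The same table could just as well store raw pointers internally; the point is only that the *externally visible* name need not be one.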
As a backwards compatibility matter, and as a validation of the L4 nucleus, running Linux on an L4 is interesting. As a matter of future research, and as a matter of forward-looking system architecture, it is boring.
Yes, I agree. However, a gradually decomposed Linux can answer many research questions without re-implementing a fully fledged OS.
The right forward-looking research question is:
*Given* a fast nucleus in the style of L4, what is the most effective way to structure a native operating environment? What fundamental and novel leverage do such kernels provide?
That was one of the goals of SawMill.
==== Rudy wrote:
Sorry, but I do not see how the version part could be used as the server defined word. You can't have multiple versions (read server defined words) of the same thread-id at the same time.
Correct. I had a different usage scenario in mind--sorry for the confusion.
I believe these costs are exaggerated. To convert a capability into a sender-id, a simple table lookup can be used. This requires a single TLB entry and a single L2 cache line (worst case). If IPC is frequent, which is the case in which you want high performance, the TLB entry is very likely to be present, and probably even the L2 cache line.
This assumes that you have a small TLB working set and a small cache working set. However, if communication is frequent, that also means you are probably communicating with _many_ partners. Then you probably need a TLB entry per table, which again increases the TLB footprint. That means the likelihood that you replace user TLB entries goes up--and the problem is that it is the user's active working set you replace.
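To make the disagreement concrete, here is a toy model of the lookup table's TLB footprint. Page size, entry size, and the partner distributions are all assumptions; the real numbers depend on the architecture and workload.

```python
PAGE_SIZE = 4096
ENTRY_SIZE = 8   # assumed bytes per capability-table entry

def table_pages_touched(slots):
    """Distinct pages of the lookup table touched when talking to the
    given slots -- a rough proxy for the extra TLB entries the scheme
    costs, over and above the user's own working set."""
    return len({(slot * ENTRY_SIZE) // PAGE_SIZE for slot in slots})

# A few clustered partners: Rudy's single-TLB-entry argument holds.
few = table_pages_touched(range(8))                   # 1 page

# Many partners spread across per-task tables, modeled here as widely
# scattered slots: one page (one TLB entry) per partner, each of which
# can evict part of the user's active working set.
many = table_pages_touched(range(0, 64 * 512, 512))   # 64 pages
```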
==== Jonathan wrote:
Volkmar is combining two things in one, and seems to be forgetting that he does so. You are correct. Once you have the capability, obtaining the target thread id is quite fast. No table lookup is required. The capability is kernel-protected state, and can simply contain a direct pointer to the recipient PCB.
I think that the costs that Volkmar is identifying come from the need to look up the thread-id or capability in the thread mapping space. That is, these are the costs of the address-space traversal. The costs are the same in either case (thread-id or capability).
If you add a level of indirection (thread-id or cap), that is the case. However, if you pay for a level of indirection anyhow, then it does not matter whether the name specifies a capability or a thread. I have the (maybe unreasonable) feeling that it should be possible to express both in one model with the same associated costs.
One of the features of EROS is an indirection object. This object can stand in front of a start capability transparently. The client invokes the indirection object capability rather than the start capability. Given this construction, a service can selectively rescind that client capability by destroying the indirection object.
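The construction, sketched (the class and method names are mine, not EROS's; method calls stand in for capability invocation):

```python
class Revoked(Exception):
    pass

class IndirectionObject:
    """Stands transparently in front of a start capability. Destroying
    the indirection object rescinds exactly the client capabilities
    routed through it, while the start capability itself stays valid."""
    def __init__(self, target):
        self._target = target
    def invoke(self, *args):
        if self._target is None:
            raise Revoked("capability was selectively rescinded")
        return self._target(*args)   # forward transparently
    def destroy(self):
        self._target = None

def service(msg):                    # stands in for the server entry point
    return "served: " + msg

client_cap = IndirectionObject(service)   # this, not `service`, goes to the client
```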
This is something that L4 presently has no means to do. Of course, L4 does not need it, because it makes no assumption about controlling client invocations at all.
In the current L4 model this would be a proxy thread, assuming that IPC is cheap and redirection happens infrequently. In the case of frequent redirection you would extend the IPC protocol to take the reconfiguration into account. This boils down to how much transparency we want. Jonathan mentioned on the EROS list that transparent persistence is something that is not feasible. The same question can be asked here: is transparent redirection something we want/need? And are we willing to pay the overhead? One could construct a system where communication can only be restricted (for enforcement) and apps must be able to reconfigure dynamically.
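The proxy-thread alternative, as a sketch. Function calls crudely stand in for synchronous IPC, and the policy hook is hypothetical; note that it also exhibits the semantic point raised above--the send to the proxy always "succeeds", even when the message is then dropped.

```python
class Proxy:
    """User-level redirection: the client addresses its IPC to the
    proxy, which applies a policy and then forwards -- or discards."""
    def __init__(self, server, policy):
        self._server = server
        self._policy = policy     # per-message decision: forward or not
    def ipc(self, msg):
        if self._policy(msg):
            return self._server(msg)
        return None               # delivered to the proxy, then discarded

def server(msg):                  # stands in for the real server thread
    return msg.upper()

filtered = Proxy(server, policy=lambda m: not m.startswith("deny"))
```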
==== Jonathan
In practice, I would use a cache for an L1 software probe first in order to avoid all marginal TLB and cache misses in the usual case, but this would result in high variance of IPC times.
Which means you add a new policy to the kernel--something we strictly try to avoid with L4.
- Volkmar