IPC/Capabilities Overview

Volkmar Uhlig volkmar at ira.uka.de
Wed Dec 31 16:25:08 CET 2003

> -----Original Message-----
> From: Jonathan S. Shapiro [mailto:shap at eros-os.org] 
> Sent: Wednesday, December 31, 2003 6:32 AM
> First, some corrections on specific statements that you made, because
> they are potentially distracting from the real issues. Then I will try
> to answer your questions.
> Corrections:
> 1. L4 IPC speed has relatively little to do with the "direct lookup"
> aspect of thread ids. Any indirect encoding will carry a cost, but in
> terms of the overall performance of IPC this cost is quite small.

That is wrong.  The direct lookup drastically reduces cache and TLB
footprint.  For a full IPC we have to access two TCBs (which are
virtually mapped and have the stack in the same page) which costs two
TLB entries.  The complete lookup is therefore a simple mask (plus maybe
a shift), a relative mov (e.g. mov 0xe0000000(%eax), %ebx) and a
compare.  Overall costs therefore (on IA32):
- 2 TLB entries (but we need them anyway for the stack, they could be
reduced to one TLB entry when using 4M pages for all TCBs, but that
would add an indirection table and therefore a cache line); refetch
costs ~80 cycles/entry
- shift and move (~3 cycles)
- 1 cache line for the thread id (which is shared with thread state)
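
To make the lookup steps concrete, here is a minimal sketch that models the mask/shift, the relative mov and the compare. It is an illustration only: the 14-bit version field, the 16-bit thread number, the 4 KiB TCB size and the helper names are assumptions, and memory is modelled as a plain dictionary. Only the 0xe0000000 base comes from the mov example above.

```python
# Model of a direct thread-id -> TCB lookup (illustrative, not kernel code).
TCB_AREA_BASE = 0xE0000000   # base of the virtually mapped TCB array
VERSION_BITS = 14            # assumed width of the version field
THREAD_NO_BITS = 16          # assumed, keeps the TCB area inside 32 bits
TCB_SIZE_BITS = 12           # assumed: one 4 KiB page per TCB plus stack


def tcb_address(thread_id):
    """Mask and shift: derive the TCB address from the thread id alone."""
    thread_no = (thread_id >> VERSION_BITS) & ((1 << THREAD_NO_BITS) - 1)
    return TCB_AREA_BASE + (thread_no << TCB_SIZE_BITS)


def lookup(memory, thread_id):
    """Return the TCB address if the id stored there matches, else None."""
    addr = tcb_address(thread_id)
    stored_id = memory.get(addr)      # the "relative mov"
    if stored_id == thread_id:        # the compare
        return addr
    return None
```

No table is walked here: the only memory touched is the destination TCB itself, which the IPC path has to touch anyway.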

Assume you add 2 more TLB entries and 5 more L2 cache lines--your
follow-up costs for IPC go up by 2*80 + 5*80 = 560 cycles.
Considering overall IPC costs of 1000 cycles on a P4 with all those
nasty cache and TLB flushes, you add an overhead of >50%.
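
The arithmetic can be replayed with a few lines of code. This is only a sketch of the estimate above: the 80 cycles per TLB refetch and the 1000-cycle IPC come from this message, the 80 cycles per L2 cache line is the value the calculation uses, and the function names are made up for the illustration.

```python
# Replays the back-of-the-envelope estimate; the constants are rough figures.
TLB_REFETCH_CYCLES = 80      # per TLB entry
L2_LINE_FETCH_CYCLES = 80    # per L2 cache line, as used in the estimate
BASE_IPC_CYCLES = 1000       # full IPC on a P4 including flushes


def added_cycles(extra_tlb_entries, extra_cache_lines):
    """Extra refetch cost caused by a larger TLB and cache footprint."""
    return (extra_tlb_entries * TLB_REFETCH_CYCLES
            + extra_cache_lines * L2_LINE_FETCH_CYCLES)


def relative_overhead(extra_cycles, base_cycles=BASE_IPC_CYCLES):
    """Extra cost as a fraction of the base IPC cost."""
    return extra_cycles / base_cycles
```

With 2 extra TLB entries and 5 extra cache lines, added_cycles gives 560 and relative_overhead gives 0.56.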

> 2. Security checks done in the server vs. the kernel are not 
> necessarily slower or faster. It depends greatly on what security 
> checks you wish to do. I argue that:
>    a) *many* (not all) of the security checks currently done in L4
>       servers could be eliminated if kernel-protected bits existed
>       in each descriptor
>    b) For some types of systems (capability systems), disclosure of
>       the sender id is an absolute violation of design requirements,
>       so any microkernel that relies exclusively on server-side
>       security checks based on sender-id is not a universal
>       microkernel.
>    c) More specifically, any microkernel that requires checks based
>       on sender-id is entirely unsuitable as a platform for EROS-NG.

And as you stated this is _also_ a limited view, because you only look
from the EROS and capability point of view.  By throwing away one
register for an identifier you reduce your register real estate by 33%.
That can be particularly hurtful in the local case, where a very simple
check (or none) is sufficient.  
And the argument that everything should go in memory (one of your last
emails) is not convincing--register-based IPC is still much faster, it
is mostly a question of a reasonable IDL compiler.  And we are all aware
that IA32 is crippled from that perspective.  Take any other
architecture (worst case: IA64) and the argument becomes completely bogus.

> Ignoring clans and chiefs (which we all agree is too expensive and
> inflexible), here is how the three schemes break down:
> Thread IDs:
>    No restriction who can send.
>    Server makes decisions based on sender-id

You forgot sender restriction and redirection.

> Hybrid:
>    Sender can only invoke a thread descriptor that is mapped in their
>       thread descriptor space (thread space)
>    Server makes decisions based on either (a) a field that is encoded
>       within the descriptor, or (b) the sender-id.
>    ** Sender-id is software controlled by the thread manager, and can
>       be set to zero for all threads to simulate capability behavior.

A possibility you did not mention is a hybrid thread id which has a
thread and a descriptor part.  The descriptor is kernel enforced.  That
is what we currently have in V4 (please note that V2 and X.0 are
completely outdated!!!)--the version part of the thread id.
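
As an illustration of such a two-part id, the following sketch packs a thread number and a version into one 32-bit word and splits it again. The 18/14 bit split and the function names are assumptions made for the example; check the V4 specification for the actual layout.

```python
# Two-part thread id: thread number in the high bits, version in the low bits.
THREAD_NO_BITS = 18   # assumed width of the thread part
VERSION_BITS = 14     # assumed width of the version (descriptor) part


def make_thread_id(thread_no, version):
    """Pack both parts into one word, rejecting values that do not fit."""
    if not 0 <= thread_no < (1 << THREAD_NO_BITS):
        raise ValueError("thread number out of range")
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version out of range")
    return (thread_no << VERSION_BITS) | version


def split_thread_id(thread_id):
    """Return (thread_no, version) for a packed thread id."""
    return thread_id >> VERSION_BITS, thread_id & ((1 << VERSION_BITS) - 1)
```

Since the kernel compares the whole word, an id whose thread part is right but whose version part is wrong names no thread at all.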

> In the thread-IDs design, there are two distinguished phases in the
> server-side security checks:
>    1. Object resolution. Based on sender-id and arguments, determines
>       the identity and permissions of the server-implemented object
>       that has been invoked. This phase may conclude that no such
>       object exists.
>    2. Permissions check. Given the object identity and permissions,
>       make a decision about whether the particular operation is to be
>       permitted.
> All of the bits needed for phase 2 can be encoded in the 
> descriptor. All of the expensive parts of the current L4 protocol 
> lie in phase 1 (object resolution).

See above.  Furthermore, your suggestion is to move that part into the
kernel... Then the overhead is on _every_ invocation, not just for the
ones where you need it.

> > Now I want to make clear why capabilities are much better than
> > virtual thread objects:
> >
> > * The extra word does not seem to decrease performance in any way (is
> > this true?) so it is a free feature, that can be used but doesn't
> > have to.
> I believe that this is true, and the evidence of the EROS
> implementation seems to support this view.

Where are benchmarks with cold caches?  Do you have a detailed analysis
of the cache and TLB footprint?

> This is a possible usage. In our experience, the more common behavior
> is to have a pointer to some data structure that describes a server
> implemented object (i.e. has nothing to do with any particular
> client), and reuse the low bits for permissions. For example, the
> pointer might point to a file metadata structure, and the low bits
> might indicate read and write permissions.

In our case we use an object identifier in the message, which is a
handle to an object descriptor, and do a reverse check (i.e. whether the
thread is allowed to invoke that object).  Costs: a boundary check (can
be a simple AND), one MOV and a CMP.  The permissions can go into the
same cache line.
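
A rough sketch of that server-side check, to show how little work it is. The table size, the descriptor fields and the permission bits are all hypothetical; the table size is a power of two so that the boundary check really is a single AND.

```python
from collections import namedtuple

# Hypothetical object descriptor: the allowed thread and the permission
# bits sit next to each other, so one cache line covers both.
ObjectDescriptor = namedtuple("ObjectDescriptor", "allowed_thread permissions")

TABLE_SIZE = 256          # assumed; must be a power of two
PERM_READ = 0x1           # assumed permission encoding
PERM_WRITE = 0x2


def may_invoke(table, handle, sender_id, wanted):
    """Reverse check: is sender_id allowed to invoke the object in handle?"""
    index = handle & (TABLE_SIZE - 1)          # boundary check (AND)
    descriptor = table[index]                  # load descriptor (MOV)
    if descriptor is None or descriptor.allowed_thread != sender_id:  # CMP
        return False
    return (descriptor.permissions & wanted) == wanted
```

Note that masking wraps an out-of-range handle into the table instead of rejecting it; that is harmless because the sender comparison still has to succeed for whatever descriptor is found.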

Happy new year!!!

- Volkmar
