New subject: Introducing a cmp() operation

9 Jun 2005


      Hi,
Last week when Neal was visiting Dresden, he explained how rights
amplification (i.e. passing capabilities as arguments to capability
object invocation) doesn't work with the current L4 capability
framework.  The specific example he gave was that of a memory server
which provides so-called memory containers and the ability to copy
memory between memory containers using something like:
container_copy (A, a_start, B, b_start, length)
where A and B are capabilities/end points presumably to the memory
containers.
To perform a container_copy, the client would send an IPC to the
capability/end point A and map B to the server.  The server is told
the id of the receive end of the end point A and thus can easily find
the server object associated with A, however, it has no way to derive
the receive id of B (even though it serves B) and thus has no way to
find the associated server object.
The following discussion presents the problem more generally, shows
that the above is a special case of a larger problem and presents a
solution which minimally modifies the current L4 capability proposal.
The main purpose of the suggested solution is to further illustrate
the semantics that we consider necessary, and to start a discussion.
Rights Amplification and future L4 designs (Marcus Brinkmann, Neal Walfield)
============================================================================
Not much can be said about a capability system without first
determining how capabilities are represented.  The current proposals
assume that server objects will be represented by communication end
points (more precisely, by the receive end).  The send sides of the
end points then assume the role of the capability.  This is not the
only solution.  Other primary (kernel) objects following the
map/grant/unmap semantics could be used to represent objects and
capabilities.
Within this framework, the only directly supported operations on such
capabilities are map, grant, unmap and IPC.  The map operation can be
used to delegate the capability, grant can be used to move the
capability and unmap to revoke it.  The IPC operation allows clients
to send messages to an object, for example, to invoke operations on
the object.
The server can identify the object only when it receives a message on
it: on message receipt, the receive side of the end point is presented
to the receiving thread.  There exists (by assumption) a one-to-one
relationship in the server between receive sides of end points and
server objects.  Thus, given the receive end point, the server can
determine to which object any given message was sent.
(Note that if you do not use communication end points as object
identifiers via a 1:1 relationship, determining to which object a
message should go is an open question whose answer depends on how
objects identifiers are represented in the first place.)
The problem for us is that identifying the object only in the server
and only when a message is sent to the object is not sufficient to
implement the system we have in mind.  In general, we need to be able
to know if two capabilities (that is, mappings), are, from a certain
perspective, "identical".  Identical in this context means that one of
the mappings is an ancestor of the other mapping, i.e. that both are
associated with the same communcation end point.  For our purposes, we
don't need to compare _any_ two mappings to the same communication end
point; we only need to be able to compare two mappings in the case
that one is an ancestor of the other.  The following examples will
make it clearer.
Usage scenario 1: Operations on multiple objects.
Consider a server providing multiple objects of the same or different
types and operations which require access to more than one of the
objects at the same time, i.e. right amplification.  The classic
example is that of the can and the can opener: an operation such as
"open_can (can_opener_t opener, can_t can);" or equivalently "open_can
(can_t can, can_opener_t opener);" can only be invoked on either of
the two objects but requires capabilities to both.  The second object
is passed as an argument in the RPC.
The server receives the message on one of the two objects which it can
readily identify but is now in the peculiar situation of needing to
identify the other object.  Within the proposed framework, the only
low-level mechanism that seems feasible at all here is that the caller
_maps_ the second object to the server so as to prove that it has the
capability in question.  This proof is without meaning if the server
cannot establish that this mapped capability is indeed associated with
an object it provides; the server must be able to lookup the object
based on the mapping from the caller.
Usage scenario 2: Reference counting.
Modern microkernel based operating systems usually ask for explicit
creation and destruction of resources, and rightly so.  But at least
for legacy support it must be possible to implement reference counting
on top of them.  Because references must be deallocated automatically
on task death, the support for this must be integrated---at least to
some extent---into the trusted computing base.
Within the initial assumptions, a simple design for a reference
counting server readily suggests itself: a reference counting server
positions itself---mapping-wise---between a server and its clients.
The servers map capabilities to the reference counter which in turn
maps them to clients doing the necessary accounting along the way.
There are a number of protocols which can be used to copy references
from one task to another, etc.  The following situation arises when
copying a reference from a client, A, to a second client, B:
A server, S, has mapped a capability (to a server object) to the
reference counter C which has in turn mapped the capability to A.  A
then maps the capability to B thereby delegating it.  B now wants to
acquire its own reference and mapping from the reference counter in
order to become independent of A.
Situation:  S -> C -> (1 reference) A -> B
Goal:             /-> (1 reference) A
            S-> C
                  -> (1 reference) B
So, B makes an RPC to C:
insert_reference (reference_space_t space, cap_t cap, cap_t *new_cap);
(The reference space is an object provided by the reference counter
and is a name space for all the capabilities to which B has
references.)  What is important here is that this operation takes a
capability as an argument.  In this way, B proves that it has access
to the capability it got from A.  The reference counter will reply
with a new mapping that B can use which is independent of the mapping
that it got from A - if B then unmaps the mapping it got from A, the
situation is symmetrical and the goal accomplished.
Shifting perspective from B to the reference counter, the reference
counter is in a peculiar position: it is _not_ the server providing
the capability CAP.  Also, it didn't give a mapping for the capability
to client B previously.  So it is in a totally different position than
the can opener server above: the reference counter sees only send
sides, not receive sides, of the communication end points for which it
manages references and furthermore it receives capabilities as
arguments from tasks to which it didn't previously give mappings.
However, there is still some structure left: the reference counter is
a mapper of the capability CAP itself: it has a mapping of this
capability which it got directly from the server.  In addition, the
mapping CAP is derived from this mapping in the reference counter
itself via A.  This is what we want to exploit: we want a way for the
reference counter to find the mapping it got from the server for this
object based on the mapping it got from B for the same object.
The solution:
Finding a solution to usage scenario 1 is not too difficult.  One
approach is protected payloads (a data word chosen by the server)
which can be associated with the receive side of a communication end
point and which can be read out based on the send side (the mapping,
aka capability) if you are sufficiently privileged (i.e., if the
receive side resides in the same address space).  The protected
payload can be set to an internal OID.
This solution is inadequate for the second usage scenario: the thread
doing the lookup (in the reference counter) did not create the
communication end point and thus had no say in the protected payload.
Moreover, it shouldn't be able to read it out even if there was one.
So, the protected payload idea doesn't seem sufficient.
There seems to be a tangible solution though: if there was a system
call which allowed the caller to check if a mapping was derived from
another mapping in the same address space, then we can use that to
"unroll" mapping loops like the one in the first scenario, i.e. Server
-> Client -> Server, or in the second, i.e. Server -> Reference
Counter -> Client A -> Client B -> Reference Counter.  More
specifically:
Each mapping is identified in the mapper's address space via a virtual
address (or index, or slot number, ...).  Then there could be a kernel
system call
vaddr_t map_lookup (vaddr_t addr);
which walks up the mapping tree, starting at the mapping entry for
ADDR and going to the parent of the current node in each iteration
until a mapping in the same address space is encountered.  The vaddr
of this mapping is returned.
Solution in usage scenario 1:
Say the send side of the communication point created in the server for
the object is located at the vaddr 10.  The server records this
information in some sort of table.  Then the capability is mapped to
the client.  The client maps it back into the server as part of an
RPC.  This mapping from the client is installed into the server at
vaddr 67.  Now the server does:
map_lookup (67) -> 10
The map_lookup in the kernel starts from the mapping the server has to
the client.  It then goes to the parent of that mapping which is the
mapping B has from the server.  B has a different address space from
the server, so the kernel doesn't stop there and continues to the next
parent.  The next parent is the mapping the server has from the kernel
(the communication end point).  As address spaces match, the vaddr of
this mapping, 10, is returned.
The server can then look up the object data for the object associated
with the vaddr 10.
Solution in usage scenario 2:
The server creates a communication end point and maps it to the
reference counter which receives it, at say, vaddr 23.  The reference
counter writes in a table that vaddr 23 corresponds to this server's
capability.  Now A gets a mapping, delegates it to B, which in turn
delegates it to the reference counter which receives it at, for
example, vaddr 127.  The reference counter does:
map_lookup (127) -> 23
The map_lookup operation in the kernel starts with the mapping the
reference counter has from the client B.  It then walks up the tree
first finding the mapping client B has from A, then the mapping
client A has from the reference counter and eventually the mapping
the reference counter has from the server.  The reference counter
address space is found and thus the vaddr of this mapping from the
server is returned to the reference counter.
The reference counter can then record a reference and create a mapping
from the mapping "23" to the client B thus creating a symmetrical
situation between client B and A.
Conclusion
----------
The operation map_lookup achieves our goal of being able to implement
the desired functionality.  Note that it does not require the kernel
to search the whole mapping tree: the tree is only walked from the
current node up until the root; it is never walked down or sideways.
This means that, for example, if the reference counter has multiple
mappings from the server for the same communication point, it will see
them all as being distinct (at least it can not establish the
identity).  It also means that if B gets a direct mapping from the
server without going through the reference counter, then the reference
counter will not be able to identify this capability as belonging to
the same communication point as the mapping it already has.
All of these limitations are perfectly acceptable and indeed
beneficial.  We don't want global identification; we just want to be
able to detect mappings _back_ to us which somehow originated from us.
I said this should be a system call but in fact that is an
implementation detail.  It may just as well be done via some other
mechanism, for example, in transit during an IPC, if requested.  That
would actually potentially help.
It's unfortunate that the tree has to be walked.  Information that is
associated strictly with the receiving side ("protected payloads"
above) is faster to lookup and has a fixed cost, while walking a tree
to the root grows with the depth of the tree.  For a negative reply,
the whole depth of the tree up to the root has to be walked.  In
common usage scenarios, however, the tree will not be very deep.  I am
no expert on mapping databases, so maybe playing around with the data
structures can further optimize this path (for example, keeping linked
lists per address space in the mapping tree would allow fast lookup of
anchestors within the same address space, but my feeling is that
maintaining such lists would be more expensive than doing the lookup
when it is needed only).
There may be security implications in revealing mapping "identities"
at all.  Maybe this feature needs to be restricted for confined tasks.
However, some sort of rights amplification is essential and it is OK
to ask for the cooperation of all tasks involved in determining the
identity.  If for example client A in the reference counter example
can deny the map_lookup through its branch of the mapping tree then
the reference counter will simply deny its service.  This is perfectly
acceptable to me; the important thing is that it must not be possible
to trick some other task in believing an identity that doesn't exist:
to hide identities is fine.