Hi,
Last week when Neal was visiting Dresden, he explained how rights amplification (i.e. passing capabilities as arguments to capability object invocation) doesn't work with the current L4 capability framework. The specific example he gave was that of a memory server which provides so-called memory containers and the ability to copy memory between memory containers using something like:
container_copy (A, a_start, B, b_start, length)
where A and B are capabilities/end points presumably to the memory containers.
To perform a container_copy, the client sends an IPC to the capability/end point A and maps B to the server. The server is told the id of the receive end of end point A and can thus easily find the server object associated with A. However, it has no way to derive the receive id of B (even though it serves B) and thus no way to find the associated server object.
The following discussion presents the problem more generally, shows that the above is a special case of a larger problem and presents a solution which minimally modifies the current L4 capability proposal. The main purpose of the suggested solution is to further illustrate the semantics that we consider necessary, and to start a discussion.
Rights Amplification and future L4 designs (Marcus Brinkmann, Neal Walfield)
============================================================================
Not much can be said about a capability system without first determining how capabilities are represented. The current proposals assume that server objects will be represented by communication end points (more precisely, by the receive end). The send sides of the end points then assume the role of the capability. This is not the only solution. Other primary (kernel) objects following the map/grant/unmap semantics could be used to represent objects and capabilities.
Within this framework, the only directly supported operations on such capabilities are map, grant, unmap and IPC. The map operation can be used to delegate the capability, grant can be used to move the capability and unmap to revoke it. The IPC operation allows clients to send messages to an object, for example, to invoke operations on the object.
The server can identify the object only when it receives a message on it: on message receipt, the receive side of the end point is presented to the receiving thread. There exists (by assumption) a one-to-one relationship in the server between receive sides of end points and server objects. Thus, given the receive end point, the server can determine to which object any given message was sent.
(Note that if you do not use communication end points as object identifiers via a 1:1 relationship, determining to which object a message should go is an open question whose answer depends on how objects identifiers are represented in the first place.)
The problem for us is that identifying the object only in the server and only when a message is sent to the object is not sufficient to implement the system we have in mind. In general, we need to be able to know if two capabilities (that is, mappings) are, from a certain perspective, "identical". Identical in this context means that one of the mappings is an ancestor of the other mapping, i.e. that both are associated with the same communication end point. For our purposes, we don't need to compare _any_ two mappings to the same communication end point; we only need to be able to compare two mappings in the case that one is an ancestor of the other. The following examples will make this clearer.
Usage scenario 1: Operations on multiple objects.
Consider a server providing multiple objects of the same or different types, and operations which require access to more than one of the objects at the same time, i.e. rights amplification. The classic example is that of the can and the can opener: an operation such as "open_can (can_opener_t opener, can_t can);" or equivalently "open_can (can_t can, can_opener_t opener);" can be invoked on only one of the two objects but requires capabilities to both. The second object is passed as an argument in the RPC.
The server receives the message on one of the two objects, which it can readily identify, but is now in the peculiar situation of needing to identify the other object. Within the proposed framework, the only low-level mechanism that seems feasible at all here is that the caller _maps_ the second object to the server so as to prove that it has the capability in question. This proof is without meaning if the server cannot establish that this mapped capability is indeed associated with an object it provides; the server must be able to look up the object based on the mapping from the caller.
Usage scenario 2: Reference counting.
Modern microkernel-based operating systems usually require explicit creation and destruction of resources, and rightly so. But at least for legacy support, it must be possible to implement reference counting on top of them. Because references must be deallocated automatically on task death, the support for this must be integrated---at least to some extent---into the trusted computing base.
Within the initial assumptions, a simple design for a reference counting server readily suggests itself: a reference counting server positions itself---mapping-wise---between a server and its clients. The servers map capabilities to the reference counter which in turn maps them to clients doing the necessary accounting along the way.
There are a number of protocols which can be used to copy references from one task to another, etc. The following situation arises when copying a reference from a client, A, to a second client, B:
A server, S, has mapped a capability (to a server object) to the reference counter C which has in turn mapped the capability to A. A then maps the capability to B thereby delegating it. B now wants to acquire its own reference and mapping from the reference counter in order to become independent of A.
Situation: S -> C -> (1 reference) A -> B
Goal:           /-> (1 reference) A
       S -> C -<
                \-> (1 reference) B
So, B makes an RPC to C:
insert_reference (reference_space_t space, cap_t cap, cap_t *new_cap);
(The reference space is an object provided by the reference counter and is a name space for all the capabilities to which B has references.) What is important here is that this operation takes a capability as an argument. In this way, B proves that it has access to the capability it got from A. The reference counter will reply with a new mapping that B can use, independent of the mapping it got from A. If B then unmaps the mapping it got from A, the situation is symmetrical and the goal is accomplished.
Shifting perspective from B to the reference counter, the reference counter is in a peculiar position: it is _not_ the server providing the capability CAP. Also, it didn't give a mapping for the capability to client B previously. So it is in a totally different position than the can opener server above: the reference counter sees only send sides, not receive sides, of the communication end points for which it manages references and furthermore it receives capabilities as arguments from tasks to which it didn't previously give mappings.
However, there is still some structure left: the reference counter is a mapper of the capability CAP itself: it has a mapping of this capability which it got directly from the server. In addition, the mapping CAP is derived from this mapping in the reference counter itself via A. This is what we want to exploit: we want a way for the reference counter to find the mapping it got from the server for this object based on the mapping it got from B for the same object.
The solution:
Finding a solution to usage scenario 1 is not too difficult. One approach is protected payloads (a data word chosen by the server) which can be associated with the receive side of a communication end point and which can be read out based on the send side (the mapping, aka capability) if you are sufficiently privileged (i.e., if the receive side resides in the same address space). The protected payload can be set to an internal OID.
This solution is inadequate for the second usage scenario: the thread doing the lookup (in the reference counter) did not create the communication end point and thus had no say in the protected payload. Moreover, it shouldn't be able to read it out even if there were one. So, the protected payload idea doesn't seem sufficient.
There seems to be a tangible solution, though: if there were a system call which allowed the caller to check whether a mapping was derived from another mapping in the same address space, then we could use it to "unroll" mapping loops like the one in the first scenario, i.e. Server -> Client -> Server, or in the second, i.e. Server -> Reference Counter -> Client A -> Client B -> Reference Counter. More specifically:
Each mapping is identified in the mapper's address space via a virtual address (or index, or slot number, ...). Then there could be a kernel system call
vaddr_t map_lookup (vaddr_t addr);
which walks up the mapping tree, starting at the mapping entry for ADDR and going to the parent of the current node in each iteration until a mapping in the same address space is encountered. The vaddr of this mapping is returned.
Solution in usage scenario 1:
Say the send side of the communication end point created in the server for the object is located at vaddr 10. The server records this information in some sort of table. Then the capability is mapped to the client. The client maps it back into the server as part of an RPC. This mapping from the client is installed in the server at vaddr 67. Now the server does:
map_lookup (67) -> 10
The map_lookup in the kernel starts from the mapping the server has from the client. It then goes to the parent of that mapping, which is the mapping the client has from the server. The client has a different address space from the server, so the kernel doesn't stop there and continues to the next parent. The next parent is the mapping the server has from the kernel (the communication end point). As the address spaces match, the vaddr of this mapping, 10, is returned.
The server can then look up the object data for the object associated with the vaddr 10.
Solution in usage scenario 2:
The server creates a communication end point and maps it to the reference counter which receives it, at say, vaddr 23. The reference counter writes in a table that vaddr 23 corresponds to this server's capability. Now A gets a mapping, delegates it to B, which in turn delegates it to the reference counter which receives it at, for example, vaddr 127. The reference counter does:
map_lookup (127) -> 23
The map_lookup operation in the kernel starts with the mapping the reference counter has from the client B. It then walks up the tree first finding the mapping client B has from A, then the mapping client A has from the reference counter and eventually the mapping the reference counter has from the server. The reference counter address space is found and thus the vaddr of this mapping from the server is returned to the reference counter.
The reference counter can then record a reference and create a mapping from the mapping "23" to the client B thus creating a symmetrical situation between client B and A.
Conclusion
----------
The operation map_lookup achieves our goal of being able to implement the desired functionality. Note that it does not require the kernel to search the whole mapping tree: the tree is only walked from the current node up to the root; it is never walked down or sideways. This means, for example, that if the reference counter has multiple mappings from the server for the same communication end point, it will see them all as distinct (at least it cannot establish their identity). It also means that if B gets a direct mapping from the server without going through the reference counter, then the reference counter will not be able to identify this capability as belonging to the same communication end point as the mapping it already has.
All of these limitations are perfectly acceptable and indeed beneficial. We don't want global identification; we just want to be able to detect mappings _back_ to us which somehow originated from us.
I said this should be a system call, but in fact that is an implementation detail. It may just as well be done via some other mechanism, for example in transit during an IPC, if requested. That might actually be an improvement.
It's unfortunate that the tree has to be walked. Information that is associated strictly with the receiving side ("protected payloads" above) is faster to look up and has a fixed cost, while walking a tree to the root grows with the depth of the tree. For a negative reply, the whole depth of the tree up to the root has to be walked. In common usage scenarios, however, the tree will not be very deep. I am no expert on mapping databases, so maybe playing around with the data structures can further optimize this path (for example, keeping per-address-space linked lists in the mapping tree would allow fast lookup of ancestors within the same address space, but my feeling is that maintaining such lists would be more expensive than doing the lookup only when it is needed).
There may be security implications in revealing mapping "identities" at all. Maybe this feature needs to be restricted for confined tasks. However, some sort of rights amplification is essential, and it is OK to ask for the cooperation of all tasks involved in determining the identity. If, for example, client A in the reference counter example can deny the map_lookup through its branch of the mapping tree, then the reference counter will simply deny its service. This is perfectly acceptable to me; the important thing is that it must not be possible to trick some other task into believing an identity that doesn't exist: to hide identities is fine.