In this mail I am trying to address two very specific problems in L4 that are impediments to high-assurance certification. While fixing these issues would improve the likelihood of supporting a system like EROS on L4, both issues need to be addressed even if EROS is not a consideration at all.
I should explain that I have been involved in some high-assurance work for a couple of consulting clients. In each case, L4 was seriously considered and was discarded because of the issues I will describe below. Setting aside any question of running EROS on top of L4, I would like it to be possible for a good uKernel (L4) to be used as the base of a serious high-assurance system.
I should also emphasize that I may be wrong about some details of L4, and that if this is true I would like to be corrected!
Issue 1: Need for Protected IPC
On Thu, 2004-01-01 at 18:09, Volkmar Uhlig wrote:
> If it does not matter where the check code is executed (client/kernel/server) but only who gets accounted for the used resources (CPU, cache, whatever) this is not the case. (I'm not claiming L4 can do that (yet).)
In answer to Volkmar's question: it matters very fundamentally where the check is performed. It is the difference between mandatory and discretionary controls.
The current L4 IPC is discretionary: the sender can invoke anyone, and the recipient checks the sender identity and decides whether to accept the IPC.
There are a variety of secure system designs that impose some form of mandatory information flow policy. These include multilevel secure systems, but more generally, almost any system that uses reference monitor(s) for any reason (including recoverability).
In such systems, it is a fundamental requirement that the reference monitor be able to PREVENT (absolutely) communication between processes that are under its control. If a sender S is not permitted to send to a recipient R, the behavior must be exactly as if the send had been performed to a non-existent ID.
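To make the distinction concrete, here is a toy Python model of the two behaviors. It is not L4 code: the `Kernel` class, the result strings, and the assumption that a recipient's refusal is visible to the sender as something other than "no such thread" are all simplifications invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical result codes, invented for this sketch.
OK = "ok"
NO_SUCH_THREAD = "no-such-thread"
REJECTED = "rejected-by-recipient"


@dataclass
class Kernel:
    threads: set = field(default_factory=set)   # IDs that really exist
    allowed: set = field(default_factory=set)   # (sender, recipient) pairs the monitor permits
    accepts: dict = field(default_factory=dict) # recipient -> senders it chooses to accept

    def send_discretionary(self, sender, recipient):
        """Kernel delivers to any existing thread; the recipient filters."""
        if recipient not in self.threads:
            return NO_SUCH_THREAD
        if sender not in self.accepts.get(recipient, set()):
            return REJECTED   # assumption: the sender can observe this outcome
        return OK

    def send_mandatory(self, sender, recipient):
        """Kernel enforces the monitor's policy before anything reaches R."""
        if recipient not in self.threads or (sender, recipient) not in self.allowed:
            return NO_SUCH_THREAD   # same answer as for a non-existent ID
        return OK
```

In the discretionary model a sender that is refused by R still learns that R exists, because the outcome differs from a send to a bogus ID. In the mandatory model the two cases are indistinguishable to the sender.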
Such systems include any system that seeks A1, EAL6, or EAL7 certification (or equivalent -- including FAA level A and British and German equivalents). In a system of this sort, the sender is not permitted to learn of the *existence* of R, even indirectly. Disclosing recipient thread-ids to the sender reveals exactly this information (it is a failure of encapsulation). This means that the current L4 architecture cannot obtain an EAL6 or EAL7 certification today.
Setting aside any question of supporting EROS, this inability to meet high assurance security requirements violates the "L4 is universal" argument in a very basic way. This issue and one other (discussed below) have forced me (as a consultant) to advise two large and serious US companies against using L4 in high-assurance products. [To be clear: EROS wasn't even a candidate. L4 was frustratingly close to possible, and I would have loved to recommend L4.]
The main change that is needed in L4 to resolve this is to define the recipient-id and sender-id fields as opaque fields. Under high-assurance requirements, the sender is not entitled to know how many threads execute within the recipient. The sender-id and recipient-id therefore must not encode "thread within process". Similarly, they must not encode "process ID".
Of these two requirements, I suspect that eliminating "thread within process" is the harder part. If the recipient-id and sender-id today were simply process IDs, L4 could architecturally redefine them as opaque values. A low-assurance implementation could simply use the process control block address as the value, while a high-assurance implementation would implement software protection on the value.
While I do not advocate any particular implementation strategy, let me give an *example* of one that might suffice: a simple hash table. Instead of using the PCB address as the process ID, the kernel could use H(sender kernel ID, requested-recipient-ID) as an index into a hash table and perform a single indirection (and possible hash bucket chasing) to find the process address. This is fundamentally the design proposed by Trent several years ago.
This implementation would clearly be more expensive than the current one (hash computation, extra TLB miss into the indirection table, extra D-cache reference), but I think that it is cheaper than thread spaces.
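The hash-table idea can be sketched as a small Python model. This is only an illustration under stated assumptions: `RecipientTable`, `install`, and `lookup` are hypothetical names, SHA-256 stands in for whatever cheap hash H a kernel would really use, and PCB "addresses" are plain Python values.

```python
import hashlib


class RecipientTable:
    """Toy indirection table: (sender kernel ID, requested recipient ID) -> PCB."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, sender_kid, requested_id):
        # Stand-in for H(sender kernel ID, requested-recipient-ID).
        digest = hashlib.sha256(f"{sender_kid}:{requested_id}".encode()).digest()
        return int.from_bytes(digest[:4], "little") % len(self.buckets)

    def install(self, sender_kid, requested_id, pcb):
        """Called by the (hypothetical) manager to authorize one sender/ID pair."""
        bucket = self.buckets[self._index(sender_kid, requested_id)]
        # Drop any previous entry for the same pair, then add the new one.
        bucket[:] = [e for e in bucket if e[:2] != (sender_kid, requested_id)]
        bucket.append((sender_kid, requested_id, pcb))

    def lookup(self, sender_kid, requested_id):
        """One indirection plus bucket chasing; None means 'no such recipient'."""
        for s, r, pcb in self.buckets[self._index(sender_kid, requested_id)]:
            if s == sender_kid and r == requested_id:
                return pcb
        return None
```

Because the sender's kernel ID is part of the key, the value a sender uses to name R means nothing when another sender presents it, and a lookup that fails for lack of authorization returns the same result as a lookup of an ID that was never assigned.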
Perhaps there is some simpler solution to this. If so, I would be *very* interested to know, because I would like to be able to help some of my clients!
Issue 2: Restricted Mapping
This is a very small issue, and there may be some way around it.
In general, I like the map/grant model very much, but in some systems it is necessary for the manager to know who has what mappings. In these systems, having applications perform mappings directly to each other creates a consistency problem.
I can imagine two ways to enforce this policy:
1. Introduce a bit somewhere in the protected thread descriptor that prevents map/grant operations.
2. Introduce a bit in the map descriptor indicating that the recipient may not perform further map/grant operations from this region.
My preference would be to have *both* controls, because they serve slightly different purposes. The first lets me virtualize the map/grant operations (in order to keep manager metadata updated), while the second prevents map recursion.
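A toy Python model of the two bits may help show how they differ. Everything here is hypothetical: the names `map_disabled` and `no_remap`, the `map_page` function, and the choice to simply refuse a blocked operation (a real system would more likely redirect it to the manager) are assumptions made for the sketch, not L4 behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    page: int
    no_remap: bool = False      # control 2: bit carried in the map descriptor


@dataclass
class Thread:
    name: str
    map_disabled: bool = False  # control 1: bit in the protected thread descriptor
    mappings: dict = field(default_factory=dict)  # page number -> Mapping


def map_page(src, dst, page, restrict=False):
    """Return True if src may map `page` to dst, False if the kernel refuses."""
    m = src.mappings.get(page)
    if m is None:
        return False            # src does not hold the page at all
    if src.map_disabled:
        return False            # control 1: src may not map/grant anything directly
    if m.no_remap:
        return False            # control 2: this region may not be passed on
    dst.mappings[page] = Mapping(page, no_remap=restrict)
    return True
```

For example, a manager can hand page 1 to thread `a` with `restrict=True`; `a` can use the page but a later `map_page(a, b, 1)` is refused. Setting `map_disabled` on a thread instead blocks every direct map from it, which is the hook that would let a manager virtualize the operation.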
The problem with the first method is that it requires protected recipient descriptors.
The problem with implementing only the second method is: what thread id should a page fault handler receive from a faulting thread? Restricted or non-restricted? I can argue for either depending on what kind of system I am trying to build.
shap