L4, High Assurance, and Protection

5 Jan 2004

In this mail I am trying to address two very specific problems in L4
that are impediments to high-assurance certification. While fixing these
issues would improve the likelihood of supporting a system like EROS on
L4, both issues need to be addressed even if EROS is not a consideration
at all.

I should explain that I have been involved in some high-assurance work
for a couple of consulting clients. In each case, L4 was seriously
considered and was rejected because of the issues I will describe
below. Setting aside any question of running EROS on top of L4, I would
like it to be possible for a good microkernel (L4) to serve as the base
of a serious high-assurance system.

I should also emphasize that I may be wrong about some details of L4,
and that if this is true I would like to be corrected!

Issue 1: Need for Protected IPC

On Thu, 2004-01-01 at 18:09, Volkmar Uhlig wrote:
> ...
> If it does not matter where the check code is executed
> (client/kernel/server) but only who gets accounted for the used
> resources (CPU, cache, whatever) this is not the case.  (I'm not
> claiming L4 can do that (yet).)

In answer to Volkmar's question: it matters very fundamentally where the
check is performed. It is the difference between mandatory and
discretionary controls.

The current L4 IPC is discretionary: the sender can invoke anyone, and
the recipient checks the sender identity and decides whether to accept
the IPC.

There are a variety of secure system designs that impose some form of
mandatory information flow policy. These include multilevel secure
systems, but more generally, almost any system that uses reference
monitor(s) for any reason (including recoverability).

In such systems, it is a fundamental requirement that the reference
monitor be able to PREVENT (absolutely) communication between processes
that are under its control. If a sender S is not permitted to send to a
recipient R, the behavior must be exactly as if the send was performed
to a non-existent ID. 
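
A minimal C sketch of what a kernel-side (mandatory) check could look like, assuming a hypothetical kernel state that holds a small per-pair policy matrix. The names `kernel_state`, `kernel_send`, and the status codes are illustrative only, not L4's API. The sketch shows just one property: a denied send and a send to a non-existent ID return the same status, so the sender cannot tell them apart from the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes.  A denied send deliberately reuses
 * IPC_NO_SUCH_DEST so the sender cannot tell denial from absence. */
typedef enum { IPC_OK = 0, IPC_NO_SUCH_DEST = 1 } ipc_status;

#define MAX_PROCS 8

/* Hypothetical kernel state: which process slots exist, plus the
 * reference monitor's policy as a matrix may_send[sender][recipient]. */
typedef struct {
    bool exists[MAX_PROCS];
    bool may_send[MAX_PROCS][MAX_PROCS];
} kernel_state;

/* Mandatory control: the kernel applies the policy before the recipient
 * is involved at all.  (In the discretionary model, the message would be
 * delivered and the recipient would inspect the sender ID itself.) */
ipc_status kernel_send(const kernel_state *k, int sender, int recipient)
{
    assert(k != NULL);
    assert(sender >= 0 && sender < MAX_PROCS && k->exists[sender]);

    if (recipient < 0 || recipient >= MAX_PROCS || !k->exists[recipient])
        return IPC_NO_SUCH_DEST;   /* recipient genuinely absent */
    if (!k->may_send[sender][recipient])
        return IPC_NO_SUCH_DEST;   /* denied: identical answer */

    /* ... a real kernel would transfer the message here ... */
    return IPC_OK;
}
```

This equalizes only the returned status; it is a sketch of the requirement, not a complete design.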

Such systems include any system that seeks A1, EAL6, or EAL7
certification (or equivalent -- including FAA level A and British and
German equivalents). In a system of this sort, the sender is not
permitted to learn of the *existence* of R, even indirectly. Disclosing
recipient thread-ids discloses exactly this kind of information (it is
a failure of encapsulation). This means that the current L4 architecture
cannot achieve EAL6 or EAL7 certification today.

Setting aside any question of supporting EROS, this inability to meet
high assurance security requirements violates the "L4 is universal"
argument in a very basic way. This issue and one other (discussed below)
have forced me (as a consultant) to advise two large and serious US
companies against using L4 in high-assurance products. [To be clear:
EROS wasn't even a candidate. L4 was frustratingly close to possible,
and I would have loved to recommend L4.]

The main change needed in L4 to resolve this is to define the
recipient-id and sender-id fields as opaque values. Under high-assurance
requirements, the sender is not entitled to know how many threads
execute within the recipient, so the sender-id and recipient-id must
not encode "thread within process". Nor may they encode "process ID".
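
To make the leak concrete, here is a sketch assuming a hypothetical structured ID in which the upper bits name the process and the low 8 bits name the thread within it. This layout is purely illustrative and is not L4's actual thread-ID format. A sender holding even one such ID can tell whether two IDs belong to the same process and can manufacture IDs it was never given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not L4's real format):
 * bits 31..8 = process number, bits 7..0 = thread within process. */
#define THREAD_BITS 8u
#define THREAD_MASK ((1u << THREAD_BITS) - 1u)

uint32_t make_id(uint32_t pid, uint32_t tid)
{
    assert(tid <= THREAD_MASK);
    return (pid << THREAD_BITS) | tid;
}

/* Leak 1: the sender can tell whether two IDs it holds name threads
 * in the same process. */
bool same_process(uint32_t a, uint32_t b)
{
    return (a >> THREAD_BITS) == (b >> THREAD_BITS);
}

/* Leak 2: from one legitimately obtained ID, the sender can construct
 * the ID of any other thread in the same process. */
uint32_t sibling_id(uint32_t known, uint32_t tid)
{
    return make_id(known >> THREAD_BITS, tid);
}
```

With opaque IDs, neither `same_process` nor `sibling_id` can be written: the only way to obtain a usable ID is to be given one.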

Of these two requirements, I suspect that eliminating "thread within
process" is the harder part. If the recipient-id and sender-id today
were simply process IDs, L4 could architecturally redefine them as
opaque values. A low-assurance implementation could simply use the
process control block address as the value, while a high-assurance
implementation would implement software protection on the value.

While I do not advocate any particular implementation strategy, let me
give an *example* of one that might suffice: a simple hash table.
Instead of using the PCB address as the process ID, the kernel could use
H(sender kernel ID, requested-recipient-ID) as an index into a hash
table and perform a single indirection (plus, possibly, some hash-bucket
chasing) to find the process address. This is fundamentally the design
proposed by Trent several years ago.
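
A sketch of such a table in C, under stated assumptions: `id_table`, `binding`, `id_bind`, `id_resolve`, and the mixing function standing in for H are all hypothetical, and `struct pcb` is a placeholder for the real process control block. The lookup is one hash, one indirection, then chain walking on collisions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder process control block; a real one holds much more. */
typedef struct pcb { int pid; } pcb;

/* One binding per (sender, opaque name) pair the policy allows. */
typedef struct binding {
    uint32_t sender;          /* kernel-internal ID of the sender */
    uint32_t name;            /* opaque recipient ID as the sender sees it */
    pcb *target;              /* recipient's PCB */
    struct binding *next;     /* hash-bucket chain */
} binding;

#define NBUCKETS 64u          /* hypothetical size; must be a power of two */

typedef struct { binding *bucket[NBUCKETS]; } id_table;

/* Stand-in for H(sender kernel ID, requested-recipient-ID). */
static uint32_t H(uint32_t sender, uint32_t name)
{
    uint32_t h = (sender * 0x9E3779B1u) ^ name;
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    return h & (NBUCKETS - 1u);
}

/* Record that b->sender may reach b->target under b->name. */
void id_bind(id_table *t, binding *b)
{
    assert(t != NULL && b != NULL);
    uint32_t i = H(b->sender, b->name);
    b->next = t->bucket[i];
    t->bucket[i] = b;
}

/* NULL means "no such recipient", whether the name is unbound
 * everywhere or merely not bound for this sender. */
pcb *id_resolve(const id_table *t, uint32_t sender, uint32_t name)
{
    assert(t != NULL);
    for (const binding *b = t->bucket[H(sender, name)]; b != NULL; b = b->next)
        if (b->sender == sender && b->name == name)
            return b->target;
    return NULL;
}
```

Because the key includes the sender, the same opaque value can name different recipients for different senders, and a sender with no binding for a name learns nothing about whether that name is in use elsewhere.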

This implementation would clearly be more expensive than the current one
(hash computation, extra TLB miss into the indirection table, extra
D-cache reference), but I think that it is cheaper than thread spaces.

Perhaps there is some simpler solution to this. If so, I would be *very*
interested to know, because I would like to be able to help some of my
clients!

Issue 2: Restricted Mapping

This is a very small issue, and there may be some way around it.

In general, I like the map/grant model very much, but in some systems
the manager must know who holds which mappings. In these systems,
allowing applications to map directly to each other lets the manager's
records fall out of step with the mappings that actually exist.

I can imagine two ways to enforce this policy:

  1. Introduce a bit somewhere in the protected thread descriptor
     that prevents map/grant operations.

  2. Introduce a bit in the map descriptor indicating that the
     recipient may not perform further map/grant operations from
     this region.

My preference would be to have *both* controls, because they serve
slightly different purposes. The first lets me virtualize the map/grant
operations (in order to keep manager metadata updated), while the second
prevents map recursion.
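
The two controls might combine roughly as follows. `thread_desc.may_map`, `mapping.no_remap`, and `try_map` are hypothetical names, and checking the region bit before the thread bit is one design choice among several.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Option 1 (hypothetical): a bit in the protected thread descriptor. */
typedef struct {
    bool may_map;          /* false: map/grant is redirected to the manager */
} thread_desc;

/* Option 2 (hypothetical): a bit carried on the mapping itself. */
typedef struct {
    bool no_remap;         /* true: recipient may not map/grant this region onward */
} mapping;

typedef enum {
    MAP_DONE,              /* kernel performs the map/grant directly */
    MAP_TO_MANAGER,        /* virtualized so the manager can update its metadata */
    MAP_REFUSED            /* region is marked non-remappable */
} map_result;

map_result try_map(const thread_desc *mapper, const mapping *src)
{
    assert(mapper != NULL && src != NULL);
    if (src->no_remap)
        return MAP_REFUSED;        /* option 2: stops map recursion */
    if (!mapper->may_map)
        return MAP_TO_MANAGER;     /* option 1: virtualize map/grant */
    return MAP_DONE;
}
```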

The problem with the first method is that it requires protected
recipient descriptors.

The problem with implementing only the second method is: what thread id
should a page fault handler receive from a faulting thread? Restricted
or non-restricted? I can argue for either depending on what kind of
system I am trying to build.

shap

Jonathan S. Shapiro