-----Original Message----- From: Jonathan S. Shapiro [mailto:shap@eros-os.org] Sent: Thursday, December 11, 2003 9:57 AM
By all means we should discuss, but let me attempt to clarify.
ALL communications are invocations on objects. The only questions that exist in principle are:
What restrictions are imposed on the TYPE of object that can be invoked?
Is the object namespace extensible by user-mode code? That is, can user-mode servers present objects or interfaces that appear to the invoker to be "first class" in the same sense that kernel-supported objects are first class?
L4 imposes the restriction that the only invocable object type is "process" (or in some cases thread).
We only have threads, there is no notion of processes (I'm talking about V4/X.2).
L4 does *not* (today) provide means to allow a server to extend the object name space.
There is no reason to: "objects" are user-managed things. Allowing extension doesn't bring you any benefit. Transparency can be implemented in a user-level library (without any overhead). As I understood your description, you cache information in the kernel about what user object types you have. That costs you another check on the critical path.
You argue that single method of invocation is a good thing--yes. We have that, it is called IPC.
For all kernel objects (we have threads, address spaces, and memory) I don't see a benefit of a unified interface. That leads to ioctl-style interfaces, and that is a mess. If you want to use a generic interface (i.e., IPC) to _all_ kernel objects, you can always place a protocol-translation layer in between at almost no cost (threads and address spaces are manipulated very infrequently). This allows for optimizations you can't perform otherwise.
My first point is probably self-explanatory, so I will expand only on the second.
In L4, if a client wishes to perform an operation on a file, the "name" of the file must be passed as an argument to an IPC. The invocation is something like:
file_server->invoke(file-id, operation-id, ... other args ...)
Because "file" is not a kernel-supported object, the protocol mandates that the sender provide an additional argument in the IPC invocation. In the EROS philosophy, we would argue that these objects are therefore "second class" and that this is bad for several reasons:
- The invoker should not know the server identity. That should be known only to the file object.
- It is difficult to transparently virtualize objects when their invocation patterns are different.
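As a C sketch of the pattern above (all types and names are invented for exposition, not taken from the real L4 API): the client must name the server explicitly, and the object-id travels as ordinary, client-supplied message payload.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical second-class invocation: the client must know the
 * server's identity, and the object-id is just a word in the message
 * that the kernel neither interprets nor protects. */
typedef struct {
    uint32_t server_tid;   /* client must know who the server is */
    uint32_t object_id;    /* e.g. a file-id; forgeable by the client */
    uint32_t operation_id; /* e.g. READ, WRITE */
} l4_style_msg;

static l4_style_msg invoke_file(uint32_t server_tid, uint32_t file_id,
                                uint32_t op)
{
    l4_style_msg m = { server_tid, file_id, op };
    return m;
}
```

Nothing here prevents a buggy or malicious client from substituting any file-id it likes, so the server must re-validate the id on every call -- which is exactly the check discussed next.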
1) Assume threads have zero cost. That means you can have one thread per object, and the thread id is your identifier.
2) Assume you want to share a thread for multiple object invocations (and threads have costs > 0). The additional check you have to perform on _every_ object invocation in EROS (code for checking, table access, I- and D-cache footprint, etc.) probably has a higher overhead than the additional parameter on the invocation, which is untouched by the kernel. For concurrency (multiple worker threads) you can use LIPC in L4, so you can perform load distribution in userland.
3) Virtualizing objects is only a question of protocols. That is something _userland_ defines. So why should virtualization be difficult? (I'm puzzled.) Furthermore, how do you deal with the situation where your object space is full? Is the object space local or global? We can dynamically extend and shrink the object space (e.g., use a single bit if we have only 2 object types).
Next problem:
The server must then run some function:
get_permissions(sender-id, file-id) -> permissions
to determine what operations are permitted. Note that if this operation is performed faithfully and correctly, it is impossible to emulate correctly the behavior of the UNIX I_SENDFD socket operation without many additional calls to a shared service -- the design of the operation makes descriptor transfer an inherently expensive operation.
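A minimal C sketch (names and data structures hypothetical) of the server-side check described above, and of why descriptor transfer is expensive in this design: because rights are keyed on sender-id, handing a "descriptor" to another thread means mutating shared authority state -- the in-process table here standing in for the extra IPCs to a shared service that a real system would need.

```c
#include <assert.h>
#include <stdint.h>

#define PERM_NONE  0u
#define PERM_READ  1u
#define PERM_WRITE 2u

#define MAX_GRANTS 16

/* One (sender, file) -> permissions entry. */
struct grant { uint32_t sender_id, file_id, perms; };

static struct grant table[MAX_GRANTS];
static int n_grants;

static void add_grant(uint32_t sender, uint32_t file, uint32_t perms)
{
    table[n_grants].sender_id = sender;
    table[n_grants].file_id   = file;
    table[n_grants].perms     = perms;
    n_grants++;
}

/* The check the server must run on EVERY invocation. */
static uint32_t get_permissions(uint32_t sender, uint32_t file)
{
    for (int i = 0; i < n_grants; i++)
        if (table[i].sender_id == sender && table[i].file_id == file)
            return table[i].perms;
    return PERM_NONE;
}

/* Emulating I_SENDFD-style descriptor passing: the receiver gains
 * rights only after this extra update of the authority state. */
static void transfer_descriptor(uint32_t from, uint32_t to, uint32_t file)
{
    add_grant(to, file, get_permissions(from, file));
}
```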
I would say that is a weak argument considering all the shortcomings of the POSIX API. Implementing fork within a distributed system is very expensive--so what? We have known for more than 10 years that fork is broken. I will look into I_SENDFD in more detail and try to give you a satisfactory answer.
Descriptors, which *can* be used as a foundation for certain kinds of security, suddenly become extremely inefficient because they cannot be passed without consulting a third party.
You can cache that information in user land after validation or use shared memory with the authentication server. That is what you basically do in the kernel.
The server-defined-bits portion cannot be examined by the client. It is provided to the server during invocation. The server can interpret these bits in any way desired: as an object id, as a facet id, as permission bits, as some mix of these.
The presence of these bits does not preclude invocation of the server qua server; the server merely assigns to itself an arbitrarily chosen value of "server-defined-bits" to name the server itself.
So you provide an in-kernel cache for some identifiers (call them bits) which are unforgeable. How much of your register real estate do you give up for that? What happens when the size is exceeded?
Because these are architecturally insufficient to implement an efficient, secure, object-based operating system.
Hmm, actually that is a question of what you try to implement. What if you don't want an object-based OS? Do you incur a significant overhead with your model? I'm curious how a Linux kernel would perform on top of EROS--I could imagine that your security model has a measurable overhead. And then we have to start a discussion on generality of uKs.
Our experience has been that relying on such clients to specify the intended operation is not robust. The flow of permissions in complex programs is not well localized, and it is very easy to write a subroutine designed for one purpose that does some mildly dangerous thing and then call it (by programmer error) in the middle of some sequence of code where care is required.
Tying permissions to the object descriptor does not prevent the programmer from passing the wrong descriptor, but it does help a great deal in localizing the scope of programmer attention that is required to resolve these problems.
This sounds like you are suggesting kernel design based on bad programming habits. Are you willing to pay the overhead? We don't.
- Volkmar
"Volkmar Uhlig" volkmar@ira.uka.de writes:
Next problem:
The server must then run some function:
get_permissions(sender-id, file-id) -> permissions
to determine what operations are permitted. Note that if this operation is performed faithfully and correctly, it is impossible to emulate correctly the behavior of the UNIX I_SENDFD socket operation without many additional calls to a shared service -- the design of the operation makes descriptor transfer an inherently expensive operation.
I would say that is a weak argument considering all the shortcomings of the POSIX API. Implementing fork within a distributed system is very expensive--so what? We have known for more than 10 years that fork is broken. I will look into I_SENDFD in more detail and try to give you a satisfactory answer.
This is related to the problem the L4/Hurd people discussed some months ago. They also faced the problem of how to transfer access rights from one thread to another in a safe way. If I remember correctly, they came up with a protocol solving this problem. Maybe a short review of that discussion will help.
Regards, Jean
This is a response to several messages (from Volkmar, Rudy, Hermann) at once. The delay has partly been due to other demands on my time, and partly because I wanted to consider how to answer.
First, let me make sure that we are debating the same issue by giving it a precise description.
Currently, L4 invocations invoke:
thread-id
There is a proposal for thread address spaces. Under this proposal, the invocation argument becomes an *index* (equivalently: an address) for a thread-id. I will write this as:
[thread-id]
Note that once the indexing mechanism is in place, the application no longer has access to the thread-ids per se. Thus, semantically, the ID bits no longer name a thread from the application perspective -- this is strictly a detail of implementation. From a semantics perspective, it is clearer to rewrite this as:
[server-id]
This leaves us the freedom to change later how "server-id" is demultiplexed, e.g. in order to have a default demultiplexing policy for multithreaded services if one were ever desired.
Today, when an L4 client wishes to invoke an object, it performs an IPC of the form:
IPC : [server-id], object-id, { args ...} => [caller-id], principal-id, object-id, { args... }
Our debate is whether we should consider adding a server-controlled ID field into the descriptor. To avoid confusion, I will call this new ID the "if-id" (for "interface-id"). This would revise the invocation above into:
IPC : [server-id, if-id], object-id, {args...} => [caller-id], principal-id, if-id, object-id, {args ...}
If this characterization does NOT capture the discussion, please read no further and let us first agree on what the question is. The balance of this note ASSUMES that this is a correct characterization of the question.
Separately, I am proposing that the revealed principal-id should be set in software by the thread manager, and should NOT be simply the sender thread-id. Current behavior can be maintained by setting principal-id=thread-id. EROS behavior requires setting principal-id to some fixed value shared by all threads.
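The two invocation shapes can be sketched as hypothetical C message layouts (field names invented here; fields marked [kernel] are filled in or guarded by the kernel, everything else is ordinary client payload). The delta between the two forms is a single word, which bears on the cost discussion further below.

```c
#include <assert.h>
#include <stdint.h>

/* Today's invocation: [server-id], object-id, args. */
struct ipc_today {
    uint32_t server_id; /* [kernel] demultiplexed from the index */
    uint32_t object_id; /* client payload, forgeable */
    /* args ... */
};

/* Proposed invocation: [server-id, if-id], object-id, args. */
struct ipc_proposed {
    uint32_t server_id; /* [kernel] */
    uint32_t if_id;     /* [kernel-guarded], assigned by the server */
    uint32_t object_id; /* client payload, forgeable */
    /* args ... */
};
```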
I should emphasize that the term "interface-id" is quite misleading. Just as the interpretation of the object-id bits lies completely in the discretion of the server, so does the interpretation of the interface-id bits.
The critical difference is that the interface-id bits are guarded by the kernel on behalf of the service. The service therefore can rely on the fact that these bits have not been tampered with by the client, and can (depending on the interpretation assigned to these bits) omit any check of their security.
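A small C sketch (the encoding and all names are invented for illustration; the kernel never interprets these bits) of how a server might exploit that guarantee: pack an object index and permission bits into the if-id when handing out a descriptor, then decode them on invocation with no table lookup, precisely because the client cannot have altered them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical server-chosen encoding: low 2 bits are permissions,
 * the remaining bits are an object index. */
#define IFID_PERM_MASK 0x3u
#define IFID_READ      0x1u
#define IFID_WRITE     0x2u

static uint32_t make_ifid(uint32_t obj_index, uint32_t perms)
{
    return (obj_index << 2) | (perms & IFID_PERM_MASK);
}

static uint32_t ifid_object(uint32_t if_id) { return if_id >> 2; }
static uint32_t ifid_perms(uint32_t if_id)  { return if_id & IFID_PERM_MASK; }

/* Server-side dispatch: the decoded bits are trusted directly. A
 * write is allowed iff the descriptor itself carries IFID_WRITE; no
 * get_permissions(sender, object) lookup is needed. */
static int may_write(uint32_t if_id)
{
    return (ifid_perms(if_id) & IFID_WRITE) != 0;
}
```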
Volkmar has replied:
Allowing extension doesn't bring you any benefit. Transparency can be implemented in a user-level library (without any overhead). How I understood your description is that you cache information about what user object types you have in the kernel.
There *is* a clear advantage: these bits are guarded by the kernel, which eliminates the need for extra checks or awkward transfer protocols.
That costs you another check on the critical path.
From the description above, it should be clear that there is NO additional check on the critical path. I suspect Volkmar is thinking of the capability type field, which is a completely separate issue, and one that I agree we should try to avoid.
Hermann has replied:
L4 does *not* (today) provide means to allow a server to extend the object name space.
But it allows servers to build arbitrary name spaces on top of L4. It is not the kernel's business to provide a name space for user-land objects. Name spaces are often defined by user-level standards (e.g., file ids).
I think that there is a second misunderstanding here. Nothing in my proposal alters this at all. The bits stored in the kernel are not interpreted by the kernel in any way. Therefore, the name space that they represent remains a user-land name space. They are merely *carried* by the kernel, protected on behalf of the server. You can think of them as a small piece of secure storage.
The problem is that in the *absence* of this secure storage, it is necessary to introduce complex multi-party protocols at user level in order to support descriptors correctly. L4 has embedded a policy that descriptor architectures should be penalized. Given the presence of this policy, no claim can be sustained that L4 is policy-neutral.
MOTIVATION:
The motivation for this feature is the need to be able to implement an access control model that is decidable and potentially correct. L4 today fundamentally does not support this efficiently. What Volkmar may not know is that this is also the ONLY reason that EROS was not built on L4 years ago.
Some time around 1995 or 1996, Jochen came to visit me at the University of Pennsylvania to explore several topics, among them moving EROS to L4. At the time, there seemed to be many impediments, but Leendert van Doorn and I would later resolve most of them in the paper design for the Obsidian kernel. The one matter that Leendert and I could NOT resolve was the absence of the interface-id bits and the (then) need to transition from "thread-id" to "[thread-id]". At the time, Trent had not yet started his work on IPC indirection.
As Volkmar says, UNIX fork() performance sucks, and the interface-id issue may not help -- there are already many IPCs that need to be done for UNIX fork(), and the extra ones needed to validate/cache the user-supplied object-id are not significant from a performance perspective.
However, Jeff is also right that I am describing lambda binding. This is fundamentally powerful, and Jeff is right that it is very useful in eliminating some important programming errors. The interface-id additionally improves end-to-end performance in a number of significant situations -- most notably checking of descriptor protection bits (e.g. read-only).
The EROS problem in particular is that descriptor copy is not an occasional thing. It is *ubiquitous*. Every CALL/RETURN pair that we do transfers at least one descriptor, and our entire design rests on being able to examine the interface-id. We absolutely CANNOT replace this with a multi-IPC sequence that relies on some third party to validate a user-supplied argument.
Further, it is UNACCEPTABLE in the EROS design to perform ANY checking based on the sender-id. Indeed, if we were to re-implement EROS on top of L4, we would be forced to set the revealed sender-id to zero in all cases.
Ultimately, the L4 design has a deeply embedded assumption about access control: that access control should be performed based on subject ID. That is, it is an ACL design. ACLs have been formally proven to be a broken model for access control. I am advocating that L4 needs to adopt a change that will admit the possibility of implementing at least one access control model that is formally decidable and correct: capabilities.
I am trying to be very careful NOT to propose a change that will violate any of the current L4 programming model (at least, no more than a recompilation).
OTHER
[Volkmar:]
So you provide an in-kernel cache for some identifiers (call them bits) which are unforgeable. How much of your register real estate do you give up for that? What happens when the size is exceeded?
Register real-estate: I believe none. It is simply an additional word to be copied within the descriptor map/grant path.
When the size is exceeded, EROS falls back to a nasty hack that lets us extend this field to 48 bits. We have never found an application where 48 bits was insufficient. Beyond that, we would start using multiple, distinguished threads so that we could leverage the thread-id for additional bits.
If I had it to do again I would probably simply define this part of our descriptor to be 48 or 64 bits long. The need for the nasty trick is truly ugly.
[Volkmar:]
Because these are architecturally insufficient to implement an efficient, secure, object-based operating system.
Hmm, actually that is a question of what you try to implement. What if you don't want an object-based OS? Do you incur a significant overhead with your model? I'm curious how a Linux kernel would perform on top of EROS--I could imagine that your security model has a measurable overhead.
Now that the proposal has been articulated more clearly, are you still concerned about this? It is very difficult for me to imagine that adding 64 bits (max) to the IPC protocol payload would actually matter.
It certainly creates register pressure on the x86, but you might wish to have a look at:
http://www.eros-os.org/pipermail/eros-arch/2003-December/004249.html
We have decided that register-optimized transfer is probably a bad idea. Moving to a mapped page scheme essentially eliminates the register pressure, and probably simplifies the IDL code enough that it improves end to end invocation time.
[Volkmar:]
Our experience has been that relying on such clients to specify the intended operation is not robust. The flow of permissions in complex programs is not well localized, and it is very easy to write a subroutine designed for one purpose that does some mildly dangerous thing and then call it (by programmer error) in the middle of some sequence of code where care is required.
Tying permissions to the object descriptor does not prevent the programmer from passing the wrong descriptor, but it does help a great deal in localizing the scope of programmer attention that is required to resolve these problems.
This sounds like you are suggesting kernel design based on bad programming habits. Are you willing to pay the overhead? We don't.
One of Jochen's beliefs was that performance is more important than any other consideration. He passed this strong belief on to his students. In my opinion he was deeply wrong about this belief.
There are many kinds of overhead:
1. The difficulty of writing good programs using bad APIs is an overhead.
2. The fact that the resulting systems are demonstrably unsecurable, and that many of the most common problems can be traced to (1) is an overhead.
3. Performance cost is certainly an overhead.
.. and of course, lots of others
I believe that the correct overhead to optimize is the end-to-end runtime cost of a system measured in dollars, not cycles.
With that as preamble, let me answer your question:
If, at the performance cost of one or two additionally transferred words, we provide a foundation that can eliminate millions of dollars of daily security flaws, then I submit that this was a very good engineering decision, and yes, I think the "overhead" is justified.
If, at the performance cost of one or two additionally transferred words in the kernel we can eliminate complex validation code at user level in a significant number of cases, then yes, I believe that the "overhead" is good engineering -- in this case, even if it merely "breaks even".
From a research perspective, if at the performance cost of one or two additionally transferred words in the kernel we create a platform that facilitates a much broader space of research operating systems, then UNQUESTIONABLY I believe that the "overhead" is justified.
And realistically, taking into account the cache line effects that will arise, we are probably not talking about more than one or two cycles. Given superscalar execution and the nature of the copy control loop, we may be talking about ZERO.
And then two answers that are much more subjective:
When one discards 30 years of experience with insecure code without serious consideration, one is engaging in ideology rather than engineering, and our proper business is engineering. Let us try to avoid ideology on all sides of this discussion.
It is not bad programming practice to follow the most natural path that is dictated by a given interface. It is *inevitable* programming practice, and the fault, if any, must rest entirely with the designer of the interface. Your value judgment is that it is good engineering to require millions of programmers to write complex code so that ten system architects can save a small number of cycles. This is absurd, and it ignores every piece of empirical evidence about human behavior that we have. As a group, humans will seek to behave in the way that gives the greatest short-term benefit for the least energy. Any other expectation is wishful thinking. Therefore, the behavior that you label "bad programming practice" intrinsically justifies labeling the interface a "bad interface design".
Most system designers lack the capacity to engineer in a way that accounts for this, but it is one of the marks of a good system designer that they do so successfully more often than not.
CLOSING
If the L4 community eventually feels that this is not a reasonable change, that is okay. However, it is absolutely impossible for a system like EROS to be efficiently implemented on L4 without it. This means that if the decision is negative, we are also deciding not to merge the communities.
This is also okay, but we should clearly understand what is at stake in the discussion.
shap
Jonathan S. Shapiro wrote:
This is a response to several messages (from Volkmar, Rudy, Hermann) at once. The delay has partly been due to other demands on my time, and partly because I wanted to consider how to answer.
It seems to me (but this needs to be confirmed in face-to-face discussions in January; I am now confused by the many "id"s) that both EROS and the "Generalized Mappings" proposal replace (relative to original L4) the thread-id with some other "id" as an unforgeable part of messages. The "other id" is protected by the kernel and managed by user-level processes. In EROS, it is managed by the server; in "Generalized Mappings", by the pager providing the mappings. In "Generalized Mappings", the size of this "id" is under user control as well. Thus there is no problem with register payload; the "id" can become even smaller than the original thread-id in specific cases. In EROS, these "id"s are under complete control of the process providing a service; in L4 "Generalized Mappings", the pagers can restrict the "id" spaces of their clients in a pager hierarchy.
I do not understand the relation of this "id" to "user-level object invocation" as a first class EROS citizen.
Merry Christmas to all of you
--hermann
l4-hackers@os.inf.tu-dresden.de