On Wed, 2005-02-23 at 21:52 +0100, Marcus Brinkmann wrote:
At Wed, 23 Feb 2005 19:49:44 +0100 (CET), "Ronald Aigner" ra3@os.inf.tu-dresden.de wrote:
It was brought to my attention that pagefault timeouts _are_ important to enforce the trust relation with your communication partner. I don't know what the semantics of a zero pagefault timeout are. If it means that the page has to be present, and an infinite pagefault timeout means that you don't care, then finite pagefault timeouts seem reasonable. Still, defining a useful value seems impractical to me.
If you use string items in a reply from a server to the client, I think even small timeouts can be used for DoS attacks. This is why I use timeout and transfer timeout 0 for all IPC from the server to the client. The client just has to be ready, and all buffers to receive string items need to be wired down (or other mechanisms need to be used, like trusted buffer objects, or resuming the operation for the not-transferred data). Of course, other systems may have different trust considerations.
[In the following, please note that when I use the term "paging" I mean the virtualization of available physical memory by migrating pages of content between store and memory. This should not be confused with "page fault handling", which is the implementation of a policy that defines the validity and protection of locations in an address space. Allowing a timeout for the recipient page definition policy engine (which in L4 is a pager) is potentially useful, but not compellingly so.]
Marcus is exactly correct. The problem with the need for a zero timeout on the pager is that it violates the encapsulation of paging. A paging system is supposed to be able to page out portions of a process without altering the semantics of its behavior (ignoring latency). An immediate consequence of this definition is that it should not be necessary for a server transmitting a string item to a client to consider the behavior of the paging agent to be part of the threat model. Because the L4 memory model cannot distinguish between the logical presence/absence of a mapping and the physical presence/absence of a mapping, it is not possible to accurately capture the semantics of paging within the L4 operational semantics.
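As a rough illustration of the distinction, here is a toy sketch in Python. The names `Fault`, `classify`, and `single_bit_view` are hypothetical and do not correspond to any L4 or EROS interface; the two booleans are simply an assumed encoding of "logically mapped" and "physically resident".

```python
from enum import Enum


class Fault(Enum):
    NONE = "none"        # access proceeds
    PAGING = "paging"    # mapping is valid, contents are paged out
    POLICY = "policy"    # no valid mapping: a real page fault


def classify(mapped, resident):
    """Classify an access when logical and physical presence are separate."""
    if not mapped:
        return Fault.POLICY
    return Fault.NONE if resident else Fault.PAGING


def single_bit_view(mapped, resident):
    """What a one-bit model can observe: present or not, with no cause."""
    return mapped and resident
```

With two bits, a paged-out page (`classify(True, False)`) is distinguishable from an unmapped one (`classify(False, False)`). In the one-bit view both cases look identical, which is the sense in which paging cannot be captured in the operational semantics.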
Some might object that the pager should not have to be trusted. What is emerging from the discussion at hand is that this architectural view does not map well to real usage. Marcus's approach of using a zero pager timeout in server->client sends implies that the client must have means to pin its receive area (a client-defined number of page frames) for an indefinite time (there may be no contract on how fast the server replies). This can induce a very real and urgent resource denial of service problem, and may be in contradiction with the residency policy requirements under which the client must operate. It works very well in non-paging environments such as embedded applications.
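To make the pinning requirement concrete, here is a minimal model of a zero-timeout reply, written in Python. It is a sketch under stated assumptions, not L4 code: `Client`, `reply_zero_timeout`, and the three result strings are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Toy client: tracks readiness and which receive-buffer pages are wired."""
    ready: bool = False
    wired_pages: set = field(default_factory=set)


def reply_zero_timeout(client, buffer_pages):
    """Model a server->client reply with send and transfer timeouts of 0.

    The server never waits: the reply is delivered only if the client is
    already receiving and every page of its receive buffer is resident.
    """
    if not client.ready:
        return "send-timeout"       # client not waiting; server moves on
    if not set(buffer_pages) <= client.wired_pages:
        return "transfer-timeout"   # a page fault would be needed; abort
    return "delivered"
```

In this model the only way for the client to be sure of getting `"delivered"` is to keep every page in `buffer_pages` wired from the moment it sends its request until the reply arrives, however long that takes.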
Just to be clear: I'm certainly not claiming for an instant that EROS got this stuff perfect. And I think it would definitely be nice to have a paging agent that could be outside of the universally trusted computing base. Unfortunately, nobody has been able to suggest a paging architecture in which this has turned out to be feasible. In all current L4-derived systems, the region manager (or the software that plays the equivalent functional role) is necessarily trusted completely by all of the programs that it serves.
It appears to me that there is no behavioral difference that motivates the desire for untrusted pagers. All pagers do basically the same thing: move pages into and out of memory. The rationale for untrusted pagers usually turns out to be a desire to have distinguishable *policies* for page replacement (equivalently: for residency retention). Separating the policy of paging from the mechanism of paging is definitely possible, and it simplifies the trust problem greatly. Once these things are separated, we may reduce the trust assumption to:
1. The *mechanism* of paging must be universally trusted.
2. The relevant *policy* under which the page fault is triggered must ensure progress of transfer for the string item that is being returned to the client. The latency characteristics of this policy must be understood by the sender (the server).
We have spent some time thinking about this in the Coyotos design, and we have concluded that the *second* requirement (knowledge of the policy) is still problematic. Our conclusion is that the sender must always be in charge of the policy under which any page (including a receiver page) involved in an IPC is fetched in. Only if this is true can the server know whether the delay characteristics of the recipient page faults will be acceptable. We have therefore concluded that the *sender* must have the ability to specify the working set (or whatever page replacement policy embodiment is used) that should be used to bring in recipient pages if the receiver is untrusted. The reverse is also true -- the receiver must be able to deny the sender control over receiver residency policy. The end result is very much like schedule donation by mutual agreement.
Once the IPC is done, these pages become subject to cleaning in the usual way, and will remain resident only if the recipient has ensured adequate working set guarantees to ensure their residence.
This is not a simple mechanism, and there are serious difficulties in reasoning about residence behavior of redundantly sponsored pages, but it is the best that we have been able to come up with.
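A very rough sketch of the sponsorship idea, in Python, may help. Everything here is hypothetical: `WorkingSet`, `transfer`, and the capacity-based admission rule are stand-ins chosen for brevity and are not the Coyotos design. A `sponsor_ws` of `None` models a sender that offers no sponsorship.

```python
class WorkingSet:
    """Hypothetical working set: a bounded set of resident page numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = set()

    def admit(self, page):
        """Try to hold `page`; return True if the set has room for it."""
        if page in self.pages:
            return True
        if len(self.pages) >= self.capacity:
            return False
        self.pages.add(page)
        return True


def transfer(pages, sponsor_ws, receiver_ws, receiver_accepts):
    """Fault in receiver pages for one IPC.

    Returns (sponsor, resident_after): whose policy governed the faults,
    and which pages remain resident once the IPC is done.
    """
    # Sponsorship needs both sides: the sender offers, the receiver accepts.
    sponsored = sponsor_ws is not None and receiver_accepts
    paying_ws = sponsor_ws if sponsored else receiver_ws
    for page in pages:
        if not paying_ws.admit(page):
            raise MemoryError("working set too small for this transfer")
    if not sponsored:
        return "receiver", set(pages)
    # IPC done: sponsorship ends, so each page survives only if the
    # receiver's own working set can hold it.
    sponsor_ws.pages.clear()
    resident = {p for p in pages if receiver_ws.admit(p)}
    return "sender", resident
```

For example, with a sender-supplied `WorkingSet(4)` and a receiver that only has room for one page, a two-page transfer completes under the sender's policy, but only one of the two pages stays resident afterwards.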
Finite IPC timeouts seem to be necessary to sleep for a specified time (receiving from yourself), for implementing functions like sleep() and timed waits on a synchronization primitive.
This is indeed a reasonable way to do these things in L4. As an alternative approach to consider, these functions are provided in EROS by kernel-implemented services. In the case of both kernel-implemented and process-implemented services, what the invoker sees is that they are invoking a capability.
Apologies if this aside is off topic. My point is only that there is a choice of design spaces, and embedding the timeout in the IPC specification may be a reasonable choice, but that the desire for delays of the type that Marcus identifies for self-send and sleep does not imply a requirement for timeout in the IPC primitive.
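For readers who want to see the "receive from yourself" pattern in isolation, here is an analogy in Python using the standard library's `queue.Queue`. It is only an analogy: the mailbox stands in for an IPC receive with a finite timeout, and `sleep_via_receive` and `timed_wait` are names made up for this sketch.

```python
import queue
import time


def sleep_via_receive(seconds):
    """Sleep by waiting on a private mailbox that nobody ever sends to."""
    mailbox = queue.Queue()          # stands in for "receive from yourself"
    try:
        mailbox.get(timeout=seconds)
    except queue.Empty:
        return "timeout"             # the expected way out: time elapsed
    return "message"


def timed_wait(mailbox, seconds):
    """Timed wait on a primitive: return the wake-up message, or None."""
    try:
        return mailbox.get(timeout=seconds)
    except queue.Empty:
        return None


if __name__ == "__main__":
    start = time.monotonic()
    print(sleep_via_receive(0.1), round(time.monotonic() - start, 2))
```

The same two functions could equally sit behind a separate timer service that the caller invokes, which is the alternative described above; the caller's view is the same either way.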
The timer event is not dropped, but instead deferred - the next time the thread does an IPC, and there are no pending partners, it is canceled immediately and does not block.
It appears to me that this amounts to a special case of a more general problem: non-blocking reliable delivery of event notification. We have run into places in EROS where there is a serious need for such a mechanism, and we have been contemplating how to achieve this in Coyotos. We have now concluded that it should *not* be done in endpoints, because it is extremely desirable for endpoints to be stateless and pending events must be recorded somewhere. It is unclear at this point what alternative will emerge.
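The deferred-timer behavior Marcus describes can be modeled in a few lines of Python. This is a toy with hypothetical names (`Thread`, `timer_fired`, `ipc_receive`); the pending flag is kept on the thread object purely for illustration, and where such state should really live is exactly the open question.

```python
class Thread:
    """Toy thread with one sticky 'timer fired' flag (hypothetical model)."""

    def __init__(self):
        self.pending_timer = False
        self.partners = []           # senders already waiting on this thread

    def timer_fired(self):
        """Record the event instead of dropping it when the thread is busy."""
        self.pending_timer = True

    def ipc_receive(self):
        """Return a waiting partner, or cancel at once if a timer is pending."""
        if self.partners:
            return ("message", self.partners.pop(0))
        if self.pending_timer:
            self.pending_timer = False
            return ("canceled", None)   # deferred event delivered; no blocking
        return ("would-block", None)
```

Note that the event is reliable only because some object remembers it between the timer firing and the next IPC, which is why a stateless endpoint cannot be the place to record it.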
shap