During my presentation of the design of the Hurd on L4 to the Dresden group, Lars Reuther asked me if I had considered Sawmill's dataspace model and, if so, why I had rejected it. My answer was that we want to be able to catch any errors at the time of data acquisition, and that relying on a file system server to provide a mapping to the data does not guarantee this: a malicious server can unmap mappings and refuse to remap them; or, if the server is shorter lived than the client, the data becomes inaccessible to the client when the server exits.
Lars asked for more details. Unfortunately, I failed to provide a coherent and complete argument: my problem was mainly that I came to this conclusion fairly early in the design process and had since forgotten too many details of Sawmill's architecture to reconstruct my argument. I've recently revisited some related issues and am now in a position to better answer Lars's question. My primary reference for the Sawmill framework is [1].
I think I can best explain the problem with an illustration: consider a server providing access to a file system backed by a disk.
In the Sawmill dataspace model [1], the file system server is a dataspace manager which likely provides a dataspace for each file. When a task wants to use a file, it first identifies the dataspace associated with the file (e.g. gets a capability to the file from the DM) and attaches it to its address space (e.g. tells its pager to associate a portion of the VM with the capability). "After an attach, the region mapping forwards all page fault requests in that region to the dataspace manager. The dataspace manager resolves the faults with map or grant operations."
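To make the protocol concrete, here is a minimal sketch of the client's side as I understand it. All of the names (ds_cap_t, dm_open, rm_attach) are hypothetical; they are not the actual Sawmill interfaces:

  #include <stddef.h>

  typedef int ds_cap_t;          /* capability naming a dataspace (hypothetical) */

  /* Ask the file system DM for the dataspace backing NAME.  */
  extern ds_cap_t dm_open (const char *name);

  /* Tell the region mapper (the task's pager) to associate the virtual
     region [ADDR, ADDR + SIZE) with dataspace DS.  Subsequent page
     faults in the region are forwarded to the DM, which resolves them
     with map or grant operations.  */
  extern int rm_attach (void *addr, size_t size, ds_cap_t ds);

  static void
  example (void *region)
  {
    ds_cap_t ds = dm_open ("some-file");
    rm_attach (region, 8192, ds);

    /* Touching the region faults; the region mapper forwards the fault
       to the DM, which maps the page in -- and may unmap it again at
       any later time.  */
    char c = ((char *) region)[0];
    (void) c;
  }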
I understand this model to mean that a client depends on a DM to:
- provide mappings to data
- provide resources backing the data
When a client requests some data from a dataspace, the DM provides a mapping to the client. The client can then proceed to use the data; however, at any point, the server could cause the mapping to be unmapped, possibly rendering the data inaccessible. The implication is that the client must either trust the server to always provide the mapping or be prepared to recover should the data disappear. The latter approach can be simplified by making a physical copy of the data before committing to using it (which can be done by interposing a second DM between the DM and the client). General use of this tactic means that many cycles are spent copying bytes and that the amount of physical memory sharing in the system is reduced.
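To illustrate the copying tactic, a defensive client might do something like the following sketch. commit_to_data is an invented helper; a robust version would also have to catch the fault raised if the DM revokes the mapping mid-copy:

  #include <stdlib.h>
  #include <string.h>

  /* Before committing to data mapped from an untrusted DM, copy it
     into memory the client itself owns; if the DM later revokes the
     mapping, the private copy survives.  */
  static void *
  commit_to_data (const void *mapped, size_t size)
  {
    void *copy = malloc (size);
    if (copy != NULL)
      memcpy (copy, mapped, size);  /* may fault if the DM unmaps now */
    return copy;
  }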
DMs appear to use their own resources to fetch and store data as well as to hold it (neither [1] nor [2] mentions any mechanism for the client to specify to the DM which memory or data space to read data into). I assume that the normal mode of operation is that a file system DM has a certain amount of physical memory available from a "physical memory" DM, which it uses to hold data from backing store. Once that memory is exhausted, it must choose some page to evict. There are several problems with this model: because DMs allocate resources on behalf of clients, resources are allocated with the priority of the DM, and resource accounting is extremely difficult. We know this from our experience with the Hurd on Mach. Moreover, the DM controls the paging policy, not the clients who are actively using the memory. To control the availability of memory, it would seem that a client would again have to copy the data.
[ Physmem DM ]
      |
      v
[ FS DM ]
      |
      v
[ client ]
The framework that I have developed for the Hurd avoids these dependencies on file system servers. The physical memory manager ("physmem") is part of the TCB. physmem provides capabilities to so-called containers, which identify memory reservations (either specific, e.g. a particular set of frames, or general, e.g. a particular number of frames). Given a container capability, the holder can map the contents or logically copy the contents to or from a second container.
When a task wants to read data from a file, it passes a container capability to the file system server. The file system server stores the data in the container (if it has already read the data into memory, it can logically copy it) and then returns to the client.
If the task wants a mapping to the data, it requests one from physmem. Thus, tasks on the Hurd do not depend on file system servers to provide mappings. After the read is complete, the task knows whether or not the data is available. If the file system server exits, this does not affect the data that the client has.
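To illustrate the flow, a read might look roughly like this. This is a sketch with invented identifiers (cap_t, physmem_container_alloc, physmem_container_map, fs_read); the real physmem interfaces may differ:

  #include <stddef.h>

  typedef int cap_t;             /* capability handle (hypothetical) */

  extern cap_t physmem_container_alloc (size_t frames);  /* reserve frames */
  extern void *physmem_container_map (cap_t container);  /* map via physmem */
  extern int fs_read (cap_t fs, const char *name,
                      cap_t container, size_t amount);   /* FS fills container */

  static void
  example (cap_t fs)
  {
    /* The client allocates the memory; the allocation is accounted to
       the client, not to the file system server.  */
    cap_t c = physmem_container_alloc (2);

    /* The FS stores the data in the client's container (logically
       copying it if it is already in memory) and returns.  */
    fs_read (fs, "some-file", c, 8192);

    /* The mapping comes from physmem (part of the TCB), not from the
       FS; if the FS exits, the data remains accessible.  */
    char *data = physmem_container_map (c);
    (void) data;
  }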
    [ physmem ]
     /       \
    |_       _|
[ client ]  [ FS ]
Because the client passes containers to the server, the server does not allocate memory to store the data on behalf of the client. (The server may need other resources such as CPU time, I/O bandwidth and state; however, we have other provisions for that.) Thus memory allocations are directly attributed to the client, which is vital to the correct functioning of accounting. Also, when the client has exhausted its memory quota, it must free some memory before it can allocate a container with the required reserve. Thus, clients fully control their own paging policy.
One of the goals of the Hurd framework is to minimize the number of dependencies that a client has on the behavior of servers. This is important to us because clients often interact with other users' servers, which may be malicious. Another goal is to more directly link consumers of resources with resource allocators. Our observation is that many applications are hurt by policies such as the eviction scheme imposed by the OS. Moreover, applications such as garbage collectors and multimedia applications could benefit from knowing how much real resource is actually available to them. (Applications should work in harmony with the mechanisms provided to them and the policies imposed on them; they should not have to work around them.)
Thanks. I have tried to be as concise as possible, which means I may have missed some details. I am particularly interested in the thoughts of the Sawmill and DROPS developers.
Neal
[1] http://l4ka.org/publications/2001/sawmill-framework.pdf
[2] http://os.inf.tu-dresden.de/l4env/doc/l4env-concept/l4env.pdf
Hi,
Neal H. Walfield wrote on 09/04/2005 09:45 PM:

<snip>
> In the Sawmill dataspace model [1], the file system server is a dataspace manager which likely provides a dataspace for each file. When a task wants to use a file, it first identifies the dataspace associated with the file (e.g. gets a capability to the file from the DM) and attaches it to its address space (e.g. tells its pager to associate a portion of the VM with the capability). "After an attach, the region mapping forwards all page fault requests in that region to the dataspace manager. The dataspace manager resolves the faults with map or grant operations."
>
> I understand this model to mean that a client depends on a DM to:
>
> - provide mappings to data
> - provide resources backing the data
>
> When a client requests some data from a dataspace, the DM provides a mapping to the client. The client can then proceed to use the data; however, at any point, the server could cause the mapping to be unmapped, possibly rendering the data inaccessible. The implication is that the client must either trust the server to always provide the mapping or be prepared to recover should the data disappear. The latter approach can be simplified by making a physical copy of the data before committing to using it (which can be done by interposing a second DM between the DM and the client). General use of this tactic means that many cycles are spent copying bytes and that the amount of physical memory sharing in the system is reduced.

<snip>
It appears to me that a file system server providing a file to a client always belongs to that client's trusted computing base. The FS server has to belong to the client's TCB, because it will provide the client with the content of a file. It may alter that content in any possible way before handing it to the client.
Given that trust relationship, the revocation of pages may or may not be part of the protocol the client and the server agreed upon. If no pages shall be revoked, the client *knows* that the server will not revoke pages, because the client trusts the server.
Therefore, the FS server can be the DM for the file the client requested: no need to drop that approach.
If my assumption is false, then your argument seems plausible.
Greetings, Ron.
> It appears to me that a file system server providing a file to a client always belongs to that client's trusted computing base. The FS server has to belong to the client's TCB, because it will provide the client with the content of a file. It may alter that content in any possible way before handing it to the client.
>
> Given that trust relationship, the revocation of pages may or may not be part of the protocol the client and the server agreed upon. If no pages shall be revoked, the client *knows* that the server will not revoke pages, because the client trusts the server.
>
> Therefore, the FS server can be the DM for the file the client requested: no need to drop that approach.
Data integrity is an issue orthogonal to providing data (i.e. transferring it) and holding data (e.g. mapping it). Data integrity can be guaranteed using cryptographic means. (Indeed, so can confidentiality, another separate issue.)
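For example, a client can verify data received from an untrusted server against a digest obtained over a trusted channel. This is a sketch: hash stands in for any cryptographic hash (e.g. SHA-256) and is assumed to be provided; data_is_intact is an invented helper:

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  #define DIGEST_SIZE 32

  /* Stand-in for a cryptographic hash; assumed provided.  */
  extern void hash (const void *data, size_t size,
                    unsigned char digest[DIGEST_SIZE]);

  /* EXPECTED must arrive over a trusted channel; the data itself can
     then come from an entirely untrusted server.  */
  static bool
  data_is_intact (const void *data, size_t size,
                  const unsigned char expected[DIGEST_SIZE])
  {
    unsigned char actual[DIGEST_SIZE];
    hash (data, size, actual);
    return memcmp (actual, expected, DIGEST_SIZE) == 0;
  }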
Consider "Reducing TCB size by using untrusted components---small kernels versus virtual-machine monitors" by Hohmuth et al: untrusted L4Linux servers are securely used to fulfill operation dependencies. In this model, I think having the trusted components depend on the untrusted L4Linux servers to provide mappings of data may violate these security requirements.
Thanks, Neal
On Mon, Sep 05, 2005 at 11:15:46AM +0200, Ronald Aigner wrote:

<snip>
> It appears to me that a file system server providing a file to a client always belongs to that client's trusted computing base. The FS server has to belong to the client's TCB, because it will provide the client with the content of a file. It may alter that content in any possible way before handing it to the client.
There are several levels of trust. The client must trust the filesystem to give it the data it wants to handle, no matter which route it uses to actually get the data. Trusting the server so much that it's allowed to hang, crash, or even take over the client is a completely different level of trust.
System servers such as physmem automatically get that trust, because there is nothing you can do about it: physmem can simply change your executing code if it wants, for example. However, for a filesystem (and especially one run by another normal user) such trust is not a good idea.
What we call "trusting a process" in the Hurd (which is something we usually want to avoid) is a lot more than accepting data for display to the user, for example. If the user wants to start executing that data, then apparently he trusts the source, so it should be fine. But if he doesn't, then we shouldn't force that trust on him.
Thanks, Bas Wijnen
> It appears to me that a file system server providing a file to a client always belongs to that client's trusted computing base. The FS server has to belong to the client's TCB, because it will provide the client with the content of a file. It may alter that content in any possible way before handing it to the client.
I'd like to add that we often don't even care about the correctness of content. Consider the web: I don't trust web servers to provide me with correct data and I generally have no way to computationally verify that the data is correct. Nevertheless, I find the web useful with the caveat that the data may be either malicious or incorrect.
Thanks, Neal
Neal H. Walfield wrote:
> I understand this model to mean that a client depends on a DM to:
>
> - provide mappings to data
> - provide resources backing the data

*snip*

> [ Physmem DM ]
>       |
>       v
> [ FS DM ]
>       |
>       v
> [ client ]
The design for a dataspace environment would be as follows. Simplified, you've got:

- a dataspace manager providing open / close / map / grant
- a region mapper providing attach / detach and PF handling
- a client
The client and the region mapper are in the same address space (AS). Think of a region as a contiguous area of virtual memory in the client's AS that makes a part of a dataspace available to the client.
[ dataspace manager ]
          |
 request page for region
          |
[ region mapper | client ]
      |             |
      |--<--- PF -<-|
A page fault scenario would be as follows:

1. The client triggers a page fault.
2. The region mapper (the client's pager) receives the PF.
3. The region mapper requests the page from the dataspace manager (possibly with a timeout).
4.A. The dataspace manager maps/grants the page to the region mapper and thus to the client (same AS).
4.B. The dataspace manager declines to map the page / the timeout fires.
5. The region mapper is free to act upon the behaviour of the dataspace manager.
An open - attach scenario would be:

1. The client sends an open call to the DS manager and receives a dataspace ID (DSID).
2. The client attaches the DSID at the region mapper.
3. Now the region mapper can resolve PFs in the region where the DS is mapped.
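Putting the two scenarios into code, the region mapper's side might look roughly like this. All names (dsid_t, ds_open, rm_attach, ds_request_page) are invented, since the dataspace papers do not fix a concrete API:

  #include <stddef.h>

  typedef int dsid_t;            /* dataspace id (hypothetical) */

  extern dsid_t ds_open (const char *name);                   /* scenario step 1 */
  extern int rm_attach (void *addr, size_t size, dsid_t ds);  /* scenario step 2 */
  extern int ds_request_page (dsid_t ds, size_t offset,
                              unsigned timeout_ms);           /* used on PF */

  /* Called by the region mapper when the client faults at ADDR inside
     a region attached at REGION_START (steps 1-5 above).  */
  static void
  handle_page_fault (dsid_t ds, void *region_start, void *addr)
  {
    size_t offset = (size_t) ((char *) addr - (char *) region_start);

    if (ds_request_page (ds, offset, 100) == 0)
      return;  /* 4.A: the page was mapped/granted into our AS */

    /* 4.B/5: the DM declined or the timeout fired; the region mapper
       must now decide how to react (retry, signal the client, ...).  */
  }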
I hope I've pointed out where your interpretation was misdirected.
In fact, my opinion is that the dataspaces paper is very vague on some points, especially concerning the protocols for page faults etc. and the implementation details.
See http://i30www.ira.uka.de/teaching/coursedocuments/117/week10-dataspaces.pdf for more details on implementing dataspaces.
*snip*

>     [ physmem ]
>      /       \
>     |_       _|
> [ client ]  [ FS ]
Besides the general discussion about dataspaces here: isn't your approach very centralized and thus a possible bottleneck for the system? One of the ideas behind dataspaces was in fact to decentralize memory management, e.g. by stacking dataspace managers.
Regards, Bernhard
At Mon, 05 Sep 2005 21:59:30 +0200, Bernhard Pöss wrote:
> The design for a dataspace environment would be as follows. Simplified, you've got:
>
> - a dataspace manager providing open / close / map / grant
> - a region mapper providing attach / detach and PF handling
> - a client
>
> The client and the region mapper are in the same address space (AS). Think of a region as a contiguous area of virtual memory in the client's AS that makes a part of a dataspace available to the client.
>
> [ dataspace manager ]
>           |
>  request page for region
>           |
> [ region mapper | client ]
>       |             |
>       |--<--- PF -<-|
>
> A page fault scenario would be as follows:
>
> 1. The client triggers a page fault.
> 2. The region mapper (the client's pager) receives the PF.
> 3. The region mapper requests the page from the dataspace manager (possibly with a timeout).
> 4.A. The dataspace manager maps/grants the page to the region mapper and thus to the client (same AS).
> 4.B. The dataspace manager declines to map the page / the timeout fires.
> 5. The region mapper is free to act upon the behaviour of the dataspace manager.
>
> An open - attach scenario would be:
>
> 1. The client sends an open call to the DS manager and receives a dataspace ID (DSID).
> 2. The client attaches the DSID at the region mapper.
> 3. Now the region mapper can resolve PFs in the region where the DS is mapped.
>
> I hope I've pointed out where your interpretation was misdirected.
Thanks for the explanation, but I don't understand what in your text you think is inconsistent with my understanding of dataspaces. The issue that I have noted is that after a DM maps an fpage to a client, the DM can unmap it at any time. My claim is that, as a result, the client of the DM must either trust the DM to always provide a mapping or make a physical copy of the fpage.
> Besides the general discussion about dataspaces here: isn't your approach very centralized and thus a possible bottleneck for the system? One of the ideas behind dataspaces was in fact to decentralize memory management, e.g. by stacking dataspace managers.
I think it is fair to say that the root of the DS hierarchy is also centralized. I've designed the physmem interfaces to provide what I view as the minimum mechanisms required to maximize sharing and flexibility, minimize trust, and permit accountability. As I understand the DS model, my approach is better on each of these four points; I may, of course, be overlooking something. This email thread is specifically about the security issue; however, I'm also interested in discussing whether my approach is really minimal and what alternative approaches there are.
Thanks, Neal