Sawmill's dataspaces and the Hurd's physmem

yuan Darwin-r62832 darwin.yuan at freescale.com
Wed Oct 19 10:20:44 CEST 2005


Hi Neal,

	After reading the Hurd VMM documentation, I realize I had a misunderstanding about the Hurd's approach. Sorry for the bother.

Best Regards,
Darwin

-----Original Message-----
From: Neal H. Walfield [mailto:neal at walfield.org] 
Sent: Tuesday, October 18, 2005 9:40 PM
To: yuan Darwin-r62832
Cc: l4-hackers at os.inf.tu-dresden.de; l4-hurd at gnu.org
Subject: Re: Sawmill's dataspaces and the Hurd's physmem


At Tue, 18 Oct 2005 19:42:27 +0800,
yuan Darwin-r62832 wrote:
> 	I think these two approaches are not incompatible. Here are the reasons:
> 
> 	1. In the Hurd's approach, every application can manage its own
> physical memory. However, most application developers do not want to
> take care of the VM replacement policy. To solve this problem, the
> Hurd has to provide a general VM server to act as the pager for this
> kind of application. But given the Hurd's philosophy, should these
> applications trust this server?

Physical memory management needn't be an all-or-nothing deal.  Certainly, an application might wish to completely manage its paging policy and its address space layout; however, I tend to think that this is the exception.  And as we will provide a POSIX personality, we need to have some sort of default VM manager.

The solution that I've opted for is a library-based one: a default memory management library will, for instance, implement an LRU-based eviction scheme and require no application input.  For many applications this will be appropriate and sufficient.  Those applications wishing to take complete control will be able to replace the library entirely.
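
To give a rough idea of the shape of such a library, here is a minimal sketch; the names (vm_policy, vm_set_policy, vm_default_lru_policy) are invented purely for illustration and are not an existing physmem or library interface:

    #include <stddef.h>

    /* Sketch of a replaceable policy interface: the default instance
       implements LRU and requires no application input; an application
       wanting full control installs its own implementation instead.  */
    struct vm_policy
    {
      /* Asked by the library, under memory pressure, to give back
         PAGES frames; returns the number actually released.  */
      int (*evict) (size_t pages);

      /* Notified on each fault so the policy can track recency.  */
      void (*touch) (void *addr);
    };

    /* The library's default, LRU-based policy.  */
    extern struct vm_policy vm_default_lru_policy;

    /* Replace the policy wholesale, e.g. vm_set_policy (&my_policy).  */
    void vm_set_policy (struct vm_policy *policy);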

I think that some applications can provide useful hints in relatively concise ways.  A document viewer, for instance xpdf, might want to cache previously rendered pages.  It does not make sense to send these to swap if rereading the data and rerendering is cheaper.  In this case, the application can attach a cache-draining function to a library-provided hook which is called when there is memory pressure.  I think that this type of small change may offer dramatic results.  Moreover, if the change is highly isolated (which in this case seems feasible), it will be easily accepted upstream.
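
As a sketch of how small the change could be (vm_pressure_hook_add and the page_cache_* helpers are invented names for this illustration, not an existing interface):

    #include <stddef.h>

    /* Hypothetical library hook: FUNC is called when there is memory
       pressure; it should try to release up to GOAL bytes and return
       the amount actually freed.  */
    extern void vm_pressure_hook_add (size_t (*func) (size_t goal));

    /* Assumed viewer-side cache primitives.  */
    extern int page_cache_empty (void);
    extern size_t page_cache_drop_oldest (void);

    /* Drop rendered pages under pressure; rerendering them later is
       cheaper than sending them to swap and reading them back.  */
    static size_t
    drain_page_cache (size_t goal)
    {
      size_t freed = 0;
      while (freed < goal && ! page_cache_empty ())
        freed += page_cache_drop_oldest ();
      return freed;
    }

    void
    page_cache_init (void)
    {
      /* The only change visible to the rest of the viewer.  */
      vm_pressure_hook_add (drain_page_cache);
    }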

An even less invasive hint would be to set some environment variables.  Clearly we wouldn't expect most users to set these, but an application's packager could, based on observed behavior.  In the case of e.g. grep or cat we might want to set the read-ahead parameter to "very aggressive".
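
For instance, the default library might consult a variable along these lines (VM_READAHEAD is an invented name, purely for illustration):

    #include <stdlib.h>
    #include <string.h>

    /* A packager could ship a wrapper that sets VM_READAHEAD=aggressive
       for grep, cat and friends; users need never touch it.  */
    int
    vm_readahead_pages (void)
    {
      const char *hint = getenv ("VM_READAHEAD");

      if (hint == NULL)
        return 8;                       /* conservative default */
      if (strcmp (hint, "aggressive") == 0)
        return 256;
      if (strcmp (hint, "off") == 0)
        return 0;
      return atoi (hint);               /* explicit page count */
    }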

If a developer so desires, a more aggressive, but more invasive, approach can also be adopted.  Instead of using malloc and free, the application can use a slab allocator.  I think this can only be done effectively if the slab allocator participates in the eviction scheme.  Again, this is possible in our case with a number of library-provided hooks, but not for user applications running on a traditional Unix-like core.  This can be made backwards compatible by having the configure script check for the required mechanisms and, if they are not available, redefine slab_alloc and slab_free to malloc and free.
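
The glue could be as simple as the following (HAVE_VM_SLAB_HOOKS, <vm-slab.h> and the slab_alloc/slab_free names are illustrative; the point is only the configure-time fallback):

    /* config.h defines HAVE_VM_SLAB_HOOKS when configure finds the
       eviction-aware slab mechanism; on a traditional Unix-like core
       it stays undefined and we fall back to malloc and free.  */
    #include <config.h>
    #include <stdlib.h>

    #ifdef HAVE_VM_SLAB_HOOKS
    # include <vm-slab.h>               /* hypothetical header */
    #else
    # define slab_alloc(size)  malloc (size)
    # define slab_free(ptr)    free (ptr)
    #endif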

> 	2. In Sawmill's DS approach, every task (address space) has a
> dedicated thread, the "region mapper", which acts as the pager for
> the other threads. It catches a page fault, decides which server to
> forward it to, and gets the page mapped. So, from a higher-level
> point of view, these servers are the pagers of the task. If a Hurd
> application must trust that general VM pager, then applications using
> Sawmill's DS framework should trust these servers as well.

I hope it is now clear that there is no general VM server.


> 	3. Compared to Sawmill's approach, the Hurd provides a clean
> physical memory server, which allows the platform's entire physical
> memory to be used fairly by all of the servers and applications.
> 
> 	Therefore, we can use the Hurd's physmem server as the central
> controller. Sawmill's DSMs would obtain physical memory from it.
> Applications that want to use Sawmill's approach could still go their
> own way, and applications that want to manage their own physical
> memory could obtain it from the physmem server directly.

I see a number of problems with SawMill's dataspaces.  The root of this thread is the presentation of a potential security flaw in the design of dataspaces.  (Whether this is important or not depends on the assumed trust model and security goals.)  Another is that, as far as I can tell, paging decisions are made towards the root of a dataspace hierarchy and not by the applications themselves.

Hopefully it is clear why I've chosen to reject this scheme.

Thanks,
Neal



