[Jonathan S Shapiro]
I am trying to understand the implications of the "mapping is a cache" design argument. I suspect that this design can only be upheld if encapsulation is violated. First, however, I would like to understand the sequence of events in the following scenario:
Consider a situation in which:

  o A maps some region to B
  o B completes the receive operation, and therefore now has a copy of the mapping
  o B is immediately preempted, before it can do any user-level book keeping about the mapping
  o ... other stuff runs ...
  o kernel runs out of mapping cache space, chooses to evict the mapping just received by B
  o ... other stuff runs ...
  o B attempts to reference the region that it believes should be mapped, and page faults.
Can someone explain the process by which B is able to get the mapping reconstructed?
A really quick answer:
  1. B's pager, Pb, receives the page fault.
  2. Pb requests the mapping from A.
Note that Pb and A could here be the same thread, in which case A must know how to translate a virtual address in B's space to some region. In practice, however, B will not use A as its pager because:
  o A might be an untrusted entity.
  o Allowing the virtual address in the page fault to be first somehow translated into a higher-level object allows for much greater flexibility.
Of course, in order for this scheme to work, A and Pb (not B) must keep data structures that allow page faults to be resolved. These data structures must be initialized *before* A actually maps the memory region to B.
A longer answer would require a better understanding of our concept of "data spaces", "data space managers", and "region maps" [1]. Here's a rather shortish explanation of this scheme:
Data space: An unstructured data container, e.g., a file, anonymous memory, pinned memory, etc.
Data space manager: A server that manages accesses to a particular data space. The data space manager will typically have parts (or the whole) of the data space mapped into its own address space. It will map these parts off to clients.
Region map: A region map is a part of the client's address space that contains parts (or the whole) of a data space. Note that the region map need not be fully populated. If the client accesses a part of the region which is not mapped, a page fault will be generated.
Region mapper: The region mapper serves as the page fault handler for the threads within the client. The region mapper keeps track of all region maps attached to the address space. When the region mapper catches page faults, it translates them into requests that are forwarded to the respective data space manager.
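To make the region mapper's translation step concrete, here is a minimal sketch in Python. It is not the SawMill interface: the Region fields, the RegionMapper class, the "Dm" manager name, and the 4 KB page size are all assumptions made for illustration. It only shows how a faulting virtual address could be turned into a request against a data space.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class Region:
    base: int          # first virtual address of the region
    size: int          # length of the region in bytes
    dataspace_id: int  # hypothetical handle naming the attached data space
    manager: str       # hypothetical name of the data space manager to ask


class RegionMapper:
    """Keeps track of the regions attached to one client address space."""

    def __init__(self) -> None:
        self.regions: List[Region] = []

    def attach(self, region: Region) -> None:
        self.regions.append(region)

    def translate(self, fault_addr: int) -> Optional[Tuple[str, int, int]]:
        """Turn a faulting address into (manager, dataspace_id, page-aligned offset)."""
        for r in self.regions:
            if r.base <= fault_addr < r.base + r.size:
                offset = (fault_addr - r.base) // PAGE_SIZE * PAGE_SIZE
                return (r.manager, r.dataspace_id, offset)
        return None  # the address is not covered by any attached region


rm = RegionMapper()
rm.attach(Region(base=0x10000, size=4 * PAGE_SIZE, dataspace_id=7, manager="Dm"))
request = rm.translate(0x11234)  # fault in the second page of the region
```

The point of the sketch is that the client never has to remember anything about individual mappings; the region table alone is enough to rebuild the request.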
Data spaces are typically constructed recursively. At the bottom (or top, depending on your point of view) there is a data space that manages the complete physical memory. On top of this data space one can build data spaces that handle anonymous memory, pinned memory, frame buffer memory, etc. The anonymous memory data spaces can implement various policies for paging, and on top of these one can build data spaces that provide access to files, distributed shared memory, etc.
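The layering can be pictured with a small Python sketch. The three classes and their methods (alloc_frame, get_page) are hypothetical names, and the 8-byte page is chosen only to keep the example readable; real data space managers are separate servers talking over IPC, not objects in one program.

```python
from typing import Dict, Set

PAGE = 8  # tiny page size, purely for illustration


class PhysicalMemory:
    """Bottom data space: hands out raw frames."""

    def __init__(self, num_frames: int) -> None:
        self.frames = [bytearray(PAGE) for _ in range(num_frames)]
        self.next_free = 0

    def alloc_frame(self) -> bytearray:
        if self.next_free >= len(self.frames):
            raise MemoryError("out of physical frames")
        frame = self.frames[self.next_free]
        self.next_free += 1
        return frame


class AnonymousMemory:
    """Built on physical memory: zero-filled pages, allocated on first touch."""

    def __init__(self, backing: PhysicalMemory) -> None:
        self.backing = backing
        self.pages: Dict[int, bytearray] = {}

    def get_page(self, page: int) -> bytearray:
        if page not in self.pages:
            self.pages[page] = self.backing.alloc_frame()
        return self.pages[page]


class FileSpace:
    """Built on anonymous memory: pages are filled from file contents on demand."""

    def __init__(self, backing: AnonymousMemory, contents: bytes) -> None:
        self.backing = backing
        self.contents = contents
        self.loaded: Set[int] = set()

    def get_page(self, page: int) -> bytearray:
        buf = self.backing.get_page(page)
        if page not in self.loaded:
            chunk = self.contents[page * PAGE:(page + 1) * PAGE]
            buf[:len(chunk)] = chunk  # copy file data into the backing page
            self.loaded.add(page)
        return buf


phys = PhysicalMemory(num_frames=4)
anon = AnonymousMemory(phys)
fs = FileSpace(anon, b"hello, data spaces")
page1 = bytes(fs.get_page(1))  # second 8-byte page of the "file"
```

Each layer only talks to the one directly beneath it, which is what lets different paging policies be swapped in without the upper layers noticing.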
Now, to map our concepts of data spaces onto your question. The thread A in your scheme would correspond to a data space manager, B would correspond to a client thread, and Pb would correspond to the region mapper. For B to access parts of the data space, the following steps would typically be taken (Rm = region mapper, Dm = data space manager):
  1. Rm: Create region (R).
  2. Rm: Request data space manager (Dm) to attach a data space (D) to R.
  3. B: Touch some memory in R. Nothing is mapped yet and a page fault is therefore raised.
  4. Rm: Receive page fault and use virtual address to identify region.
  5. Rm: Request Dm to map parts of the data space to R.
  6. Dm: Map parts of D to R.
An obvious optimization here is for Rm to request parts of the region map to be pre-populated before step 3.
Whenever B attempts to access a part of R that is not mapped, the region mapper will translate the page fault into a data space request. It does not matter why the memory is not mapped. All that matters is that Rm and Dm keep data structures that allow the page fault to be resolved.
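The whole sequence, including the eviction from the original question, can be simulated in a few lines of Python. Everything here is a toy model under stated assumptions: Kernel, DataSpaceManager, RegionMapper, touch, and the "frame:D:1" strings are made-up names, and a dictionary stands in for the kernel's mapping cache.

```python
PAGE_SIZE = 4096  # assumed page size


class Kernel:
    """Stand-in for the kernel's mapping cache: any entry may be dropped."""

    def __init__(self):
        self.mappings = {}  # (address space, virtual page) -> frame

    def map(self, space, vpage, frame):
        self.mappings[(space, vpage)] = frame

    def evict(self, space, vpage):
        self.mappings.pop((space, vpage), None)


class DataSpaceManager:
    """Dm: remembers which frame backs each page of each data space."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.frames = {}  # (dataspace, page index) -> frame

    def map_page(self, dataspace, page, space, vpage):
        key = (dataspace, page)
        frame = self.frames.setdefault(key, f"frame:{dataspace}:{page}")
        self.kernel.map(space, vpage, frame)  # step 6


class RegionMapper:
    """Rm: remembers which data space is attached to which region."""

    def __init__(self, space):
        self.space = space
        self.regions = []  # (base page, page count, Dm, dataspace)
        self.faults = 0

    def attach(self, base_page, num_pages, dm, dataspace):
        self.regions.append((base_page, num_pages, dm, dataspace))

    def handle_fault(self, vpage):
        self.faults += 1
        for base, count, dm, dataspace in self.regions:  # step 4
            if base <= vpage < base + count:
                dm.map_page(dataspace, vpage - base, self.space, vpage)  # step 5
                return
        raise MemoryError(f"no region covers page {vpage}")


def touch(kernel, rm, addr):
    """Model client B referencing addr; a missing mapping raises a fault to Rm."""
    vpage = addr // PAGE_SIZE
    if (rm.space, vpage) not in kernel.mappings:
        rm.handle_fault(vpage)
    return kernel.mappings[(rm.space, vpage)]


kernel = Kernel()
dm = DataSpaceManager(kernel)
rm = RegionMapper(space="B")
rm.attach(base_page=16, num_pages=4, dm=dm, dataspace="D")  # steps 1-2

first = touch(kernel, rm, 17 * PAGE_SIZE)   # steps 3-6: first fault
kernel.evict("B", 17)                       # kernel drops the cached mapping
second = touch(kernel, rm, 17 * PAGE_SIZE)  # B faults again, same path
```

In this model B gets the same frame back after the eviction, and B itself holds no state about the mapping: the second fault is resolved purely from the tables kept by Rm and Dm.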
[Hmm... my "short" answer turned out to be a bit longer than expected.]
eSk
[1] http://i30www.ira.uka.de/research/documents/l4ka/sawmill-framework.pdf