Re: Question on "mappings as cache"

8 Dec 2003

      [Jonathan S Shapiro]
...
I am trying to understand the implications of the "mapping is a
cache" design argument. I suspect that this design can only be
upheld if encapsulation is violated. First, however, I would like to
understand the sequence of events in the following scenario:
...
Consider a situation in which
...
A maps some region to B
   B completes the receive operation, and therefore
     now has a copy of the mapping
   B is immediately preempted, before it can do any user-level
     book keeping about the mapping
   ... other stuff runs ...
   kernel runs out of mapping cache space, chooses to evict
     the mapping just received by B
   ... other stuff runs ...
   B attempts to reference the region that it believes should
     be mapped, and page faults.
...
Can someone explain the process by which B is able to get the
mapping reconstructed?

A really quick answer:

   B's pager, Pb, receives the page fault
   Pb requests the mapping from A

Note that Pb and A could here be the same thread, in which case A must
know how to translate a virtual address in B's space to some region.
In practice, however, B will not use A as its pager because:

  o A might be an untrusted entity.
  o Translating the virtual address in the page fault into a
    higher-level object first allows for much greater flexibility.

Of course, in order for this scheme to work, A (and Pb), not B, must
keep some sort of data structures that allow page faults to be
resolved.  These data structures must be initialized *before* A
actually maps the memory region to B.
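
As a toy model of the scenario in the question (this is not L4 code,
and every name in it is made up for illustration), the mapping cache
can be treated as a dictionary that the kernel may empty at any time.
B keeps no state at all, and A's table is filled in before the first
map operation:

```python
# Toy model only: Provider, Pager and the dict-based "mapping cache" are
# hypothetical stand-ins, not L4 kernel interfaces.

class Provider:
    """Plays the role of A: knows which frame backs each page of B."""
    def __init__(self):
        self.backing = {}          # B's page number -> frame

    def prepare(self, page, frame):
        # Must happen *before* the first map operation.
        self.backing[page] = frame

    def request_mapping(self, page):
        return self.backing[page]


class Pager:
    """Plays the role of Pb: resolves B's page faults by asking A."""
    def __init__(self, provider):
        self.provider = provider
        self.faults = 0

    def handle_fault(self, mapping_cache, page):
        self.faults += 1
        mapping_cache[page] = self.provider.request_mapping(page)


def touch(mapping_cache, pager, page):
    """B references a page; a miss in the cache is a page fault."""
    if page not in mapping_cache:
        pager.handle_fault(mapping_cache, page)
    return mapping_cache[page]


a = Provider()
a.prepare(page=7, frame="frame-42")
pb = Pager(a)

mapping_cache = {7: a.request_mapping(7)}   # A maps the region to B
mapping_cache.clear()                       # kernel evicts it while B is preempted
result = touch(mapping_cache, pb, 7)        # B faults; Pb rebuilds the mapping
```

B never did any bookkeeping in this model, yet the access succeeds
after exactly one fault, because everything needed to rebuild the
mapping lives with A and Pb.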

A longer answer would require a better understanding of our concept of
"data spaces", "data space managers", and "region maps" [1].  Here's a
rather shortish explanation of this scheme:

   Data space: An unstructured data container, e.g., a file, anonymous
      memory, pinned memory, etc.

   Data space manager: A server that manages accesses to a particular
      data space.  The data space manager will typically have parts
      (or the whole) of the data space mapped into its own address
      space.  It will map these parts off to clients.

   Region map: A region map is a part of the client's address space
      that contains parts (or the whole) of a data space.  Note that
      the region map need not be fully populated.  If the client
      accesses a part of the region which is not mapped, a page fault
      will be generated.

   Region mapper: The region mapper serves as the page fault handler
      for the threads within the client.  The region mapper keeps
      track of all region maps attached to the address space.  When
      the region mapper catches page faults it translates these page
      faults into requests that are forwarded to the respective data
      space manager.

Data spaces are typically constructed recursively.  At the bottom (or
top depending on your point of view) there is a data space that
manages the complete physical memory.  On top of this data space one
can build data spaces that handle anonymous memory, pinned memory,
frame buffer memory, etc.  The anonymous memory data spaces can
implement various policies for paging, and one can build data spaces
on top of these that provide access to files, distributed shared
memory, etc.
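
The layering can be sketched roughly as follows.  The class names and
the page-granular get_page/get_frame interface are assumptions made
for this illustration only and are not taken from the framework
described in [1]:

```python
# Hypothetical sketch: class names and the page-granular interface are
# assumptions made for illustration.

class PhysicalMemory:
    """Bottom data space: hands out raw frames."""
    def __init__(self, num_frames):
        self.free = list(range(num_frames))

    def get_frame(self):
        return self.free.pop(0)


class AnonymousMemory:
    """Data space built on physical memory; allocates frames on demand."""
    def __init__(self, below):
        self.below = below
        self.pages = {}            # page offset -> frame

    def get_page(self, offset):
        if offset not in self.pages:
            self.pages[offset] = self.below.get_frame()
        return self.pages[offset]


class FileSpace:
    """Data space built on anonymous memory; one file per instance."""
    def __init__(self, below, base):
        self.below = below
        self.base = base           # where this file starts in the lower space

    def get_page(self, offset):
        return self.below.get_page(self.base + offset)


phys = PhysicalMemory(num_frames=8)
anon = AnonymousMemory(phys)
file_a = FileSpace(anon, base=0)
file_b = FileSpace(anon, base=100)

# Two files share one anonymous-memory space, which in turn draws
# frames from the single physical-memory space.
frames = [file_a.get_page(0), file_b.get_page(0), file_a.get_page(0)]
```

Each layer only talks to the one directly below it, so a paging policy
could be changed in the anonymous memory layer without the file layer
noticing.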

Now, to map our concepts of data spaces onto your question.  The
thread A in your scheme would correspond to a data space manager, B
would correspond to a client thread, and Pb would correspond to the
region mapper.  For B to access parts of the data space, the following
steps would typically be taken (Rm = region mapper, Dm = data space
manager):

   1. Rm: Create region (R)
   2. Rm: Request data space manager (Dm) to attach a data space (D)
      to R.
   3. B: Touch some memory in R.  Nothing is mapped yet and a page
      fault is therefore raised.
   4. Rm: Receive page fault and use virtual address to identify
      region.
   5. Rm: Request Dm to map parts of the data space to R.
   6. Dm: Map parts of D to R.

An obvious optimization here is for Rm to request parts of the region
map to be pre-populated before step 3.

At any time when B attempts to access a part of R that is not mapped,
the region mapper will translate the page fault into a data space
request.  It does not matter why the memory is not mapped.  All that
matters is that Rm and Dm keep data structures that allow the page
fault to be resolved.
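
The six steps above can be sketched as a small simulation.  All names,
the 4 KiB page size, the page-aligned region start, and the dictionary
standing in for B's current mappings are assumptions made for the
example, not part of any real interface:

```python
# Hypothetical sketch of steps 1-6; names, the 4 KiB page size, and the
# dict standing in for B's mappings are all assumptions.

PAGE = 4096

class DataSpaceManager:                     # Dm
    def __init__(self):
        self.spaces = {}                    # data space id -> list of pages

    def create(self, ds_id, num_pages):
        self.spaces[ds_id] = [f"{ds_id}:page{i}" for i in range(num_pages)]

    def map_page(self, ds_id, index):       # step 6
        return self.spaces[ds_id][index]


class RegionMapper:                         # Rm
    def __init__(self):
        self.regions = []                   # (start, end, dm, ds_id)
        self.mapped = {}                    # page-aligned vaddr -> page

    def attach(self, start, num_pages, dm, ds_id):   # steps 1 and 2
        # Assumes start is page-aligned.
        self.regions.append((start, start + num_pages * PAGE, dm, ds_id))

    def handle_fault(self, vaddr):          # steps 4 and 5
        for start, end, dm, ds_id in self.regions:
            if start <= vaddr < end:
                index = (vaddr - start) // PAGE
                self.mapped[vaddr - vaddr % PAGE] = dm.map_page(ds_id, index)
                return
        raise MemoryError(f"no region covers {vaddr:#x}")

    def touch(self, vaddr):                 # step 3, from B's point of view
        page_addr = vaddr - vaddr % PAGE
        if page_addr not in self.mapped:
            self.handle_fault(vaddr)
        return self.mapped[page_addr]


dm = DataSpaceManager()
dm.create("D", num_pages=4)
rm = RegionMapper()
rm.attach(0x10000, 4, dm, "D")

first = rm.touch(0x10000 + PAGE + 12)       # faults, then resolves
rm.mapped.clear()                           # mapping lost for any reason
second = rm.touch(0x10000 + PAGE + 12)      # resolved the same way
```

In this model, pre-populating the region before step 3 would amount to
calling handle_fault for each page right after attach.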

[Hmm... my "short" answer turned out to be a bit longer than
expected.]

	eSk

[1] http://i30www.ira.uka.de/research/documents/l4ka/sawmill-framework.pdf

Espen Skoglund