Hi Paul,
On 3/2/21 12:42 AM, Paul Boddie wrote:
On Monday, 1 March 2021 21:30:11 CET Philipp Eppelt wrote:
On 2/24/21 10:10 PM, Paul Boddie wrote:
[...]
However, if there are other regions attached, e.g. (a2, s2) -> (d1, o2), this mapping will remain, and as soon as you unmap the d1 capability, you have stale entries in your region map.
What happens when a task tries to access the memory within a2 to a2+s2? Are there virtual memory associations that may still provide access to the memory exported by the now-unmapped capability?
This I actually don't know. I'll investigate. I hope the mappings are gone and you'll get a page fault, though.
So do I. :-)
[Strange behaviour]
This was actually wrong. So assume you get a DS capability from some other task. You then use the DS cap to obtain mappings from the dataspace, either through rm->attach() and page faults or through direct ds->map() calls. As a result, you have several mappings in your task.
Then you unmap the DS cap from your object space. And ... nothing happens, or does it? You might still have access to the memory mappings, or you might not. You did not unmap the memory from your address space, but someone else, the dataspace provider, might destroy the DS in its address space and unmap the corresponding memory, leading to removal of the memory in all other tasks (i.e. the branch is removed from the mapping tree).
Applied to our example above:
* l4re_rm_detach(a1): (a1, s1) -> (d1, o1) is gone.
* free_um(d1)
* the region map still contains (a2, s2) -> (d1, o2): page faults will fail, but if the memory was already mapped, it might still be there.
I also saw it with a region that overlapped the old one instead of having precisely the same base address:
(a1+0x1000, s2) -> d2 -> mem[o2:o2+s2]
Here, an access to the new base of a1+0x1000 appeared to expose mem[o1+0x1000] instead of mem[o2].
Are you certain that d1 and d2 are actually different dataspaces? Are you getting only d1 data or only d2 data? Are you getting a mix of d1 and d2 data?
It is, of course, always possible that I have been making a mistake - this being the usual discovery when I report strange behaviour - but the means of acquiring dataspaces d1 and d2 may involve distinct objects, and it involves creating further distinct objects to act as dataspaces. So, something like this would occur:
d1 = c1.open()
d2 = c2.open()
Here, c1 and c2 may even be the same object, but even then they should still allocate a new object for each invocation of the open operation, yielding two distinct dataspaces d1 and d2.
What I would observe is d1 data even after d2 was attached. I was somewhat confused as to whether d1 might still be active or not. But if it is, then d2 should not be allocated an address region coinciding with that of d1. If it isn't, then d2 should be unaffected by whatever d1 had been doing.
Let me summarize the steps I think are necessary during the lifetime of the dataspace:
- Allocate a capability index for the dataspace
- Allocate the memory and receive the dataspace capability in the allocated index (see http://l4re.org/doc/classL4Re_1_1Mem__alloc.html#a44b301573ae859e8406400338cc8e924), or do something alike to get the mapping for the dataspace capability under the allocated capability index. (To be sure, use: http://l4re.org/doc/group__l4__task__api.html#ga829a1b5cb4d5dba33ffee57534a505af)
Do I need to use the memory allocation interface if the dataspace is sending flexpage items? I have previously used the l4re_ma functions (and possibly C++ equivalents) to allocate memory, but this was mostly useful for device drivers where physical addresses may need to be obtained for hardware peripheral usage, plus convenient sharing of entire memory regions between tasks without any of my tasks needing to act as dataspaces.
My strategy with this work is to implement paging by sending flexpage items to satisfy paging requests and thus provide a dataspace implementation. In the dataspace itself, I actually use posix_memalign to obtain memory, but that is ultimately going to be using l4re_ma functions at the lowest level, I imagine.
No, I used Mem_alloc as an example on how to obtain an actual capability behind your allocated index. If you get the capability mapping by other means, this is fine.
Hopefully, this helps you as a baseline. I'm a bit puzzled by the mem[o1+0x1000] case. I went through the code and I don't see how this can happen unless the "task" capability given to l4re_rm_detach_unmap is invalid, however, l4re_rm_detach is using the correct capability. Which code version are you working on? Maybe I'm looking at the wrong code?
I'm still using the Subversion distribution (version 83) of L4Re. I know I should be following the different GitHub repositories but I find the Subversion distribution more convenient and I have not wanted to introduce too many different variables in my own experiments. Plus, it seems to be reliable enough for my needs.
No worries, SVN is fine.
Over the weekend, I tried to troubleshoot this issue and investigate the nature of it. I then retraced my steps, introducing wrapper functions around l4re_rm_attach and l4re_rm_detach to see if the region manager was giving out duplicate addresses. This seemed to indicate that it was indeed doing so. If I introduced synchronisation around the l4re_rm calls (effectively extending the synchronisation already in place around the STL data structure recording active regions), the observed problem went away.
Now, this is not consistent with what Christian wrote a few weeks ago, where he also noted that the capability slot allocator is not thread-safe. But I imagine that either my own code somehow uses the region manager API in a thread-unsafe way (although I cannot see exactly how that might be), or there is some element of using this API where a degree of "thread unsafety" exists. So, I have now added synchronisation around both the capability slot allocator and the region manager operations.
Thread safety again. Nothing springs to mind, but this is certainly interesting. I'll mull over it a bit.
Cheers,
Philipp