Hi Paul,
On 3/2/21 12:42 AM, Paul Boddie wrote:
On Monday, 1 March 2021 21:30:11 CET Philipp Eppelt wrote:
On 2/24/21 10:10 PM, Paul Boddie wrote:
[...]
However, if there are other regions attached, e.g. (a2, s2) -> (d1, o2), this mapping will remain, and as soon as you unmap the d1 capability, you have stale entries in your region map.
What happens when a task tries to access the memory within a2 to a2+s2? Are there virtual memory associations that may still provide access to the memory exported by the now-unmapped capability?
This I actually don't know. I'll investigate. I hope the mappings are gone and you'll get a page fault, though.
So do I. :-)
[Strange behaviour]
This was actually wrong. So assume you get a DS capability from some other task. You then use the DS cap to obtain mappings from the dataspace, either through rm->attach() and page faults or through direct ds->map() calls. As a result, you have several mappings in your task.
Then you unmap the DS cap from your object space. And ... nothing happens, or does it? You might still have access to the memory mappings, or you might not. You did not unmap the memory from your address space, but someone else, the dataspace provider, might destroy the DS in its address space and unmap the corresponding memory, leading to removal of the memory in all other tasks (i.e. the branch is removed from the mapping tree).
Applied to our example above:
* l4re_rm_detach(a1): (a1, s1) -> (d1, o1) is gone.
* free_um(d1)
* the region map still contains (a2, s2) -> (d1, o2): page faults will fail, but if the memory was already mapped, it might still be there.
I also saw it with a region that overlapped the old one instead of having precisely the same base address:
(a1+0x1000, s2) -> d2 -> mem[o2:o2+s2]
Here, an access to the new base of a1+0x1000 appeared to expose mem[o1+0x1000] instead of mem[o2].
Are you certain that d1 and d2 are actually different dataspaces? Are you getting only d1 data or only d2 data? Are you getting a mix of d1 and d2 data?
It is, of course, always possible that I have been making a mistake - this being the usual discovery when I report strange behaviour - but the means of acquiring dataspaces d1 and d2 may involve distinct objects, and it involves creating further distinct objects to act as dataspaces. So, something like this would occur:
d1 = c1.open()
d2 = c2.open()
Here, c1 and c2 may even be the same object, but even then they should still allocate a new object for each invocation of the open operation, yielding two distinct dataspaces d1 and d2.
What I would observe is d1 data even after d2 was attached. I was somewhat confused as to whether d1 might still be active or not. But if it is, then d2 should not be allocated an address region coinciding with that of d1. If it isn't, then d2 should be unaffected by whatever d1 had been doing.
Let me summarize the steps I think are necessary during the lifetime of the dataspace:
- Allocate a capability index for the dataspace
- Allocate the memory and receive the dataspace capability in the allocated index (see http://l4re.org/doc/classL4Re_1_1Mem__alloc.html#a44b301573ae859e8406400338cc8e924), or do something alike to get the mapping for the dataspace capability under the allocated capability index. (To be sure, use: http://l4re.org/doc/group__l4__task__api.html#ga829a1b5cb4d5dba33ffee57534a505af)
Do I need to use the memory allocation interface if the dataspace is sending flexpage items? I have previously used the l4re_ma functions (and possibly C++ equivalents) to allocate memory, but this was mostly useful for device drivers where physical addresses may need to be obtained for hardware peripheral usage, plus convenient sharing of entire memory regions between tasks without any of my tasks needing to act as dataspaces.
My strategy with this work is to implement paging by sending flexpage items to satisfy paging requests and thus provide a dataspace implementation. In the dataspace itself, I actually use posix_memalign to obtain memory, but that is ultimately going to be using l4re_ma functions at the lowest level, I imagine.
No, I used Mem_alloc as an example on how to obtain an actual capability behind your allocated index. If you get the capability mapping by other means, this is fine.
Hopefully, this helps you as a baseline. I'm a bit puzzled by the mem[o1+0x1000] case. I went through the code and I don't see how this can happen unless the "task" capability given to l4re_rm_detach_unmap is invalid, however, l4re_rm_detach is using the correct capability. Which code version are you working on? Maybe I'm looking at the wrong code?
I'm still using the Subversion distribution (version 83) of L4Re. I know I should be following the different GitHub repositories but I find the Subversion distribution more convenient and I have not wanted to introduce too many different variables in my own experiments. Plus, it seems to be reliable enough for my needs.
No worries, SVN is fine.
Over the weekend, I tried to troubleshoot this issue and investigate the nature of it. I then retraced my steps, introducing wrapper functions around l4re_rm_attach and l4re_rm_detach to see if the region manager was giving out duplicate addresses. This seemed to indicate that it was indeed doing so. If I introduced synchronisation around the l4re_rm calls (effectively extending the synchronisation already in place around the STL data structure recording active regions), the observed problem went away.
Now, this is not consistent with what Christian wrote a few weeks ago, where he also noted that the capability slot allocator is not thread-safe. But I imagine that either my own code somehow uses the region manager API in a thread-unsafe way (although I cannot see exactly how that might be), or there is some element of using this API where a degree of "thread unsafety" exists. So, I have now added synchronisation around both the capability slot allocator and the region manager operations.
Thread safety again. Nothing springs to mind, but this is certainly interesting. I'll mull over it a bit.
Cheers,
Philipp