Dear L4-Hackers,
Recently, I started to upgrade the Fiasco.OC kernel version that is used by the Genode OS framework to the most recently released version (r72). I took the opportunity to upgrade because the upcoming Genode release uses a fresh compiler toolchain that refused to build the very old Fiasco.OC kernel version that was used until now (r56). Everything went quite smoothly, and I'm glad to see how the kernel develops further. Thanks to all developers at this point!
Unfortunately, I stumbled across an issue when it comes to thread destruction. In our system, all threads are created and destroyed by the roottask, which is called 'core'. In some cases, not always but quite often, the Ram_quota pointer of the thread object is zero during the call of the Thread_object's delete operator, which leads to a page fault within the kernel code. A simple check[1] before dereferencing the pointer solves the problem, but I wonder whether we will then leak quota or memory, or more generally mask a more serious problem.
Obviously, our usage patterns of syscalls differ, e.g., in the order in which IPC gates, threads, IRQs, and tasks are destroyed. Moreover, we still carry a very few patches[2] so that the kernel meets our requirements. But none of them explains why the thread's Ram_quota pointer becomes zero. The page fault triggers across all x86 and ARM platforms that we use.
Any hint would be very much appreciated, all the best!
Stefan
[1] https://github.com/skalk/foc/commit/2b01c9d16fd8e29e6af18fe750be2c8a312b4762
[2] https://github.com/skalk/foc/commits/r72
Hi Stefan,
On 05/08/2017 09:09 AM, Stefan Kalkowski wrote:
> Unfortunately, I stumbled across an issue when it comes to thread destruction. [...]
Thank you for reporting this issue. I will forward this to our kernel maintainer.
Could you elaborate a little bit more on the circumstances leading to this issue? I wonder whether we can come up with a simple test case triggering the page fault.
Best, Matthias.
Hi Stefan,
On 05/08/2017 03:36 PM, Matthias Lange wrote:
> Could you elaborate a little bit more on the circumstances leading to this issue? I wonder whether we can come up with a simple test case triggering the page fault.
No need to come up with a test case. It turns out that your problem originates in an unfortunate combination of "old" sources and a new toolchain. C++ allows the compiler to elide writes to an object's storage that happen before the object is initialized by its constructor, which leads to the _quota member not being initialized correctly under all circumstances.
That also answers your initial question: yes, your check covers a more serious problem :).
Could you please try the attached patch? It should fix the problem.
Best, Matthias.
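To illustrate the diagnosis above: a store that operator new makes into the object's own storage may be treated as dead once the constructor starts, because the object's lifetime begins only then (GCC documents this optimization under -flifetime-dse). The following is a minimal sketch of one well-defined alternative, assuming the allocator needs to remember which quota to credit on deletion. It is not the attached patch and not Fiasco.OC code; Quota, Alloc_header, and Thread_like are hypothetical names chosen for illustration. The bookkeeping is kept in a header in front of the object, so no constructor can legally clobber it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a kernel quota object: counts bytes in use.
struct Quota {
    std::size_t used = 0;
};

// Bookkeeping stored in front of the object, outside the object's lifetime.
struct alignas(std::max_align_t) Alloc_header {
    Quota *quota;
    std::size_t size;
};

// Hypothetical class whose instances are charged to a quota.
class Thread_like {
public:
    explicit Thread_like(int id) : _id(id) {}
    int id() const { return _id; }

    // Placement-style allocation: charge the quota and record it in the
    // header instead of writing it into the not-yet-constructed object.
    static void *operator new(std::size_t size, Quota &q) {
        void *raw = std::malloc(sizeof(Alloc_header) + size);
        if (!raw) throw std::bad_alloc();
        Alloc_header *h = ::new (raw) Alloc_header{&q, size};
        q.used += size;
        return h + 1;  // object storage starts right after the header
    }

    // Usual deallocation function: recover the quota from the header.
    static void operator delete(void *p) noexcept {
        if (!p) return;
        Alloc_header *h = static_cast<Alloc_header *>(p) - 1;
        // A null quota here would be a bug to fix, not a case to skip.
        assert(h->quota != nullptr);
        h->quota->used -= h->size;
        std::free(h);
    }

    // Matching placement delete, used only if the constructor throws.
    static void operator delete(void *p, Quota &) noexcept {
        Thread_like::operator delete(p);
    }

private:
    int _id;
};
```

With these definitions, `new (q) Thread_like(7)` charges `sizeof(Thread_like)` bytes to `q`, and a later `delete` credits them back. Passing the quota to the constructor and initializing the member there would be another option; the header variant is shown here because it does not depend on reading a member after the destructor has run.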
Hi Matthias,
What a cool remote diagnosis! Indeed, the patch solved the _quota initialization problem.
Thanks & best regards,
Stefan
On 05/08/2017 09:31 PM, Matthias Lange wrote:
> Could you please try the attached patch? It should fix the problem.
l4-hackers@os.inf.tu-dresden.de