Hi,
I have an application which crashes in slab_alloc (slab.c:358). After some investigation, I've found the faulty slab: l4rm_region_cache. Its state is: slabs_part != NULL, free_objs == NULL and num_free == 2.
My first question is: do you agree that this state should not happen?
If yes, I see the following possible causes: 1. there's a bug in the slab library, 2. there's a bug in the l4rm grow function, 3. a synchronization problem.
I don't believe in 1 or 3. But, in fact, I don't understand the l4rm grow mechanism.
One more piece of information: one of the analysed crashes happened in l4thread_create.
Regards Marc
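The reported state (a non-zero num_free with an empty free_objs list) can be expressed as a violated invariant. Below is a minimal sketch of such a check, assuming an intrusive free list in which each free object stores a pointer to the next one. The struct and function names (toy_slab, toy_slab_consistent) are hypothetical and only borrow the field names quoted in the report; this is not the L4Env slab definition.

```c
#include <stddef.h>

/* Hypothetical, simplified slab descriptor (assumption, not the L4Env type). */
struct toy_slab {
    int    num_free;   /* number of objects the slab believes are free  */
    void **free_objs;  /* head of the free list; each free object holds */
                       /* a pointer to the next free object, or NULL    */
};

/* Returns 1 if num_free equals the length of the free list, 0 otherwise.
 * The walk stops once it has seen more than num_free entries, so a list
 * that is too long (or cyclic) is reported as inconsistent. */
static int toy_slab_consistent(const struct toy_slab *s)
{
    int n = 0;
    void **p = s->free_objs;

    while (p != NULL && n <= s->num_free) {
        n++;
        p = (void **)*p;   /* follow the link stored inside the object */
    }
    return p == NULL && n == s->num_free;
}
```

With this check, a slab with num_free == 2 and free_objs == NULL is reported as inconsistent, which matches the state described in the message above.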
Hi,
On Fri Jan 18, 2008 at 11:36:53 +0100, Marc CHALAND wrote:
> I have an application which crashes in slab_alloc (slab.c:358). After some investigation, I've found the faulty slab: l4rm_region_cache. Its state is: slabs_part != NULL, free_objs == NULL and num_free == 2.
> My first question is: do you agree that this state should not happen?
> If yes, I see the following possible causes:
> 1. there's a bug in the slab library,
> 2. there's a bug in the l4rm grow function,
> 3. a synchronization problem.
> I don't believe in 1 or 3. But, in fact, I don't understand the l4rm grow mechanism.
> One more piece of information: one of the analysed crashes happened in l4thread_create.
By any chance, can you provide a small test case which triggers this?
Adam
Adam Lackorzynski adam@os.inf.tu-dresden.de:
> By any chance, can you provide a small test case which triggers this?
This is quite difficult to produce. I guessed that this was due to a memory leak, and I've found some. So it would be easy to make a sample by allocating memory in the same manner. Tomorrow, I will try to write an example.
Regards Marc
After more investigation, I've found that the slab_alloc and slab_free calls on l4rm_region_cache are not always inside a critical section (for example, l4rm_attach may be called by several threads). At first, I thought region_list_lock was meant for that, but in fact this is not the case.
My first idea is to put slab_alloc between l4rm_lock_region_list and l4rm_unlock_region_list in l4rm_region_desc_alloc. Same for slab_free. But I'm not sure this is a good idea. What is your opinion?
Regards Marc
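The idea proposed above, running every allocation and release on a shared cache inside one critical section, can be sketched in isolation as follows. This is only an illustration under stated assumptions: it uses a POSIX mutex and a toy free-list cache, and all names (toy_cache, toy_cache_init, toy_cache_alloc, toy_cache_free) are hypothetical. It does not use the l4rm or slab APIs and is not the fix that was eventually applied.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical toy cache: a free list and its counter, guarded by one lock. */
struct toy_cache {
    pthread_mutex_t lock;
    void **free_objs;  /* head of the intrusive free list */
    int    num_free;   /* must always match the list length */
};

/* Carve `count` objects of `obj_size` bytes out of `buf`.
 * Assumes buf is pointer-aligned and obj_size is a multiple of
 * sizeof(void *), so each object can hold the free-list link. */
static void toy_cache_init(struct toy_cache *c, void *buf,
                           size_t obj_size, int count)
{
    pthread_mutex_init(&c->lock, NULL);
    c->free_objs = NULL;
    c->num_free = 0;
    for (int i = 0; i < count; i++) {
        void **obj = (void **)((char *)buf + (size_t)i * obj_size);
        *obj = c->free_objs;
        c->free_objs = obj;
        c->num_free++;
    }
}

/* Pop one object; list head and counter change under the same lock. */
static void *toy_cache_alloc(struct toy_cache *c)
{
    pthread_mutex_lock(&c->lock);
    void **obj = c->free_objs;
    if (obj != NULL) {
        c->free_objs = (void **)*obj;
        c->num_free--;
    }
    pthread_mutex_unlock(&c->lock);
    return obj;   /* NULL if the cache is empty */
}

/* Push an object back; again both fields change under the lock. */
static void toy_cache_free(struct toy_cache *c, void *p)
{
    void **obj = p;
    pthread_mutex_lock(&c->lock);
    *obj = c->free_objs;
    c->free_objs = obj;
    c->num_free++;
    pthread_mutex_unlock(&c->lock);
}
```

Without the lock, two threads could both read the same list head, or update free_objs and num_free in interleaved order, which is one way a counter and a free list can drift apart.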
On Tue Jan 22, 2008 at 12:37:45 +0100, Marc CHALAND wrote:
> After more investigation, I've found that the slab_alloc and slab_free calls on l4rm_region_cache are not always inside a critical section (for example, l4rm_attach may be called by several threads). At first, I thought region_list_lock was meant for that, but in fact this is not the case.
> My first idea is to put slab_alloc between l4rm_lock_region_list and l4rm_unlock_region_list in l4rm_region_desc_alloc. Same for slab_free. But I'm not sure this is a good idea. What is your opinion?
You're right that the slab objects themselves were not locked. The fix was not as easy as proposed but I think I have fixed this (update l4env and l4rm packages). I hope this fixes your issue. Thanks for reporting, very valuable!
Adam
2008/1/22, Adam Lackorzynski adam@os.inf.tu-dresden.de:
> You're right that the slab objects themselves were not locked. The fix was not as easy as proposed but I think I have fixed this (update l4env and l4rm packages). I hope this fixes your issue.
Thank you for this patch. We will test it this afternoon and tonight.
Regards Marc
Hi,
The patch fixes the crash. The test ran all night without any problem. Thank you Adam.
I will try to get a test machine to gather more information about the semaphore log.
Regards Marc
l4-hackers@os.inf.tu-dresden.de