"Paul Phillips" pphillips@ivue.com writes:
I believe I have found a couple of bugs in Fiasco which were causing the multiple server problem, and probably some other problems. In /fiasco/src/time.h : the code for set( microseconds):
At first glance, your fix looks right. Thanks for tracking
this down!
I still wonder why I haven't ever seen this problem... I think I tried booting L4Linux and `hello' concurrently, but maybe the order in which they're started makes a difference.
I almost can't believe that I wrote that routine... :-) It must have been late at night. I probably didn't invest much thought into it because I was thinking I was going to rewrite it soon.
The timeout handling also suffers from at least two other problems. First, it disables interrupts while traversing a linear linked list of unknown length, that is, potentially for an unacceptably long time. Second, it suffers from the ``page fault may re-enable interrupts'' problem because it potentially touches a TCB in a 4-MB kernel region for which a reference to its shared page table hasn't yet been inserted into the current task's page directory (this is done ``on demand'' in the page fault handler). This is just one more example of why doing kernel synchronization using interrupt-disabling is a bad idea.
[ from a different email ]
Why did you relocate it to 0x2400000? It should be able to run (even concurrently with L4Linux) at the memory location it is linked to using the distributed version of l4/server/hello/Makefile, 0x200000.
I thought perhaps GRUB was trying to load both 'hello' and L4Linux at the same physical address and that was the reasoning behind my relocating it. It makes no difference.
GRUB never tries to load boot modules to conflicting memory addresses; the modules (in this case, ELF binaries) are not interpreted at all. GRUB just places them linearly after what has been loaded as the ``kernel,'' which in our case is Rmgr, the resource and boot manager.
Once it has been started by GRUB, Rmgr interprets the ELF binaries and loads them to their destination addresses. Rmgr makes sure these addresses don't conflict---otherwise you will see an ugly (and not very informative, I'm afraid) assertion failure.
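To illustrate the kind of check meant here, the following is a minimal sketch of a conflict test between destination ranges. It is not Rmgr's actual code: LoadRegion, overlaps() and any_conflict() are hypothetical names, and the module sizes used with them are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of where one boot module wants to be loaded.
struct LoadRegion {
    std::uint32_t start;  // first byte of the destination address range
    std::uint32_t size;   // length of the range in bytes
};

// Two half-open ranges [start, start + size) overlap exactly when each
// one starts before the other one ends.  The ends are computed in 64 bits
// so that a range reaching the top of the 32-bit space does not wrap.
bool overlaps(const LoadRegion &a, const LoadRegion &b) {
    std::uint64_t a_end = std::uint64_t(a.start) + a.size;
    std::uint64_t b_end = std::uint64_t(b.start) + b.size;
    return a.start < b_end && b.start < a_end;
}

// Pairwise check over all modules; returns true if any two collide.
bool any_conflict(const std::vector<LoadRegion> &regions) {
    for (std::size_t i = 0; i < regions.size(); ++i)
        for (std::size_t j = i + 1; j < regions.size(); ++j)
            if (overlaps(regions[i], regions[j]))
                return true;
    return false;
}
```

With an assumed size of 0x10000 bytes, a module at 0x200000 and one at 0x2400000 do not collide, so under this sketch relocating `hello' would not be needed to avoid a conflict.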
Thanks again!
Michael
[ code included for reference so that the bug tracking system sees it, too ]
inline void
timeout_t::set(unsigned long long abs_microsec)
{
  // XXX uses cli/sti

  _wakeup = abs_microsec;
  _flags.set = 1;

  unsigned flags = get_eflags();
  cli();

  if (!timer::first_timeout)
    {
      timer::first_timeout = this;
      _prev = _next = 0;
    }
  else
    {
      timeout_t *ti = timer::first_timeout;

      for (;;)
        {
          if (ti->_wakeup >= _wakeup)
            {
              // insert before ti
              _next = ti;
              _prev = ti->_prev;
              ti->_prev = this;
              if (_prev)
                _prev->_next = this;
              /***** here you need to set timer::first_timeout
                     since you just replaced it */
              else
                timer::first_timeout = this;
              /**********/
              goto done;
            }

          if (ti->_next)
            {
              ti = ti->_next;
              continue;
            }
          /**** here you need to put a break in case
                ti->_wakeup < _wakeup and ti->_next == 0.
                Otherwise you loop here with interrupts
                disabled forever */
          else
            break;
          /********************/
        }

      // insert as last item after ti
      ti->_next = this;
      _prev = ti;
      _next = 0;
    }

done:
  set_eflags(flags);
}
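
For experimenting with the two fixes outside the kernel, here is a minimal user-space sketch of the same sorted insertion. It is an assumption-laden stand-in, not Fiasco code: Timeout and TimeoutList are hypothetical replacements for timeout_t and timer::first_timeout, the cli/sti and eflags handling is left out entirely, and is_sorted_with_length() exists only for checking the result.

```cpp
#include <cassert>

// Hypothetical stand-in for timeout_t: only the wakeup time and list links.
struct Timeout {
    unsigned long long wakeup = 0;
    Timeout *prev = nullptr;
    Timeout *next = nullptr;
};

// Hypothetical stand-in for the list headed by timer::first_timeout.
struct TimeoutList {
    Timeout *first = nullptr;

    // Insert t so that the list stays sorted by ascending wakeup time.
    void insert(Timeout *t, unsigned long long abs_microsec) {
        t->wakeup = abs_microsec;
        t->prev = t->next = nullptr;

        if (!first) {               // empty list: t becomes the only item
            first = t;
            return;
        }

        Timeout *ti = first;
        for (;;) {
            if (ti->wakeup >= t->wakeup) {
                // insert before ti
                t->next = ti;
                t->prev = ti->prev;
                ti->prev = t;
                if (t->prev)
                    t->prev->next = t;
                else
                    first = t;      // fix 1: t replaced the old head
                return;
            }
            if (!ti->next)
                break;              // fix 2: reached the tail, stop walking
            ti = ti->next;
        }

        // insert as last item after ti
        ti->next = t;
        t->prev = ti;
    }
};

// Checking helper: true if the list has n items, is sorted by wakeup time,
// and every prev pointer matches the item actually preceding it.
bool is_sorted_with_length(const TimeoutList &list, int n) {
    int count = 0;
    const Timeout *prev = nullptr;
    for (const Timeout *ti = list.first; ti; prev = ti, ti = ti->next) {
        if (ti->prev != prev) return false;
        if (prev && prev->wakeup > ti->wakeup) return false;
        ++count;
    }
    return count == n;
}
```

Inserting an item that sorts before the current head exercises the first fix, and inserting one that sorts after every existing item exercises the second; without the break, that second case never leaves the loop.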