Hey,
I'm having trouble booting the hello example on real hardware. The following config is booted via PXE:
addr 0x2000000
exec mparthey/foc/bootstrap -serial
load mparthey/foc/fiasco -serial -serial_esc -esc
load mparthey/foc/sigma0
load mparthey/foc/moe
load mparthey/foc/l4re
load mparthey/foc/hello
But it hangs after "MOE: Hello world". I can still enter the kernel debugger, which lists these task and thread objects:
[Objects]
1 f007e020 [Task ] {KERNEL} R=2
6 ffdc6134 [Thread ] {KERNEL} C=0 R=1 current
7 fffe8f70 [Task ] {sigma0 } R=3
8 ffd80134 [Thread ] {sigma0 } C=0 S=D:7 R=3
9 fffe8f18 [Task ] R=3
a ffd83134 [Thread ] C=0 S=D:9 R=4
Backtrace on thread a yields an address 0x143d7d in moe (?), belonging to
void List_alloc::merge()
{
  List_alloc_sanity_guard __attribute__((unused)) guard(this, __func__);
  Mem_block *c = _first;
  while (c && c->next)
  143d7d: 8b 00    mov (%eax),%eax
  143d7f: 85 c0    test %eax,%eax
  143d81: 75 ee    jne 143d71 <_ZN22Single_page_alloc_base5_freeEPvmb+0x51>
_ZN22Single_page_alloc_base5_freeEPvmb():
[...]/src/l4/pkg/moe/server/src/page_alloc.cc:109
}
On a different machine the exact same setup works fine. Has anyone got a clue what goes wrong here? If I should get more information out of the kernel debugger, just let me know.
Cheers
Markus
On Mon Jun 03, 2013 at 16:22:54 +0200, Markus Partheymueller wrote:
On a different machine the exact same setup works fine. Has anyone got a clue what goes wrong here? If I should get more information out of the kernel debugger, just let me know.
My first guess would be that it's pagefaulting on that instruction. Could you check whether this is the case and what the value of eax/pfa is?
Adam
When I enable pagefault tracing, the last one I see is a fault at address 0xffffffff on that instruction. The respective source code is in /pkg/cxx/lib/tl/include/list_alloc:L182. I assume dereferencing c in c->next is the problem?
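To illustrate why the null test in `while (c && c->next)` does not help here, a minimal sketch follows. The `Mem_block` layout and both helper names are assumptions made for illustration and are not taken from list_alloc. The point is only that an all-ones pointer (0xffffffff on a 32-bit target) is non-null, so the loop proceeds to the load of `c->next`, which is where the fault is reported.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the allocator's list node; the real Mem_block
// in list_alloc may be laid out differently.
struct Mem_block
{
  Mem_block *next;
  unsigned long size;
};

// True if a test such as `while (c && ...)` would let this pointer through
// to the dereference that follows it.
inline bool passes_null_check(Mem_block const *c)
{
  return c != nullptr;
}

// Builds the all-ones pointer value (0xffffffff on a 32-bit target).
// The pointer is only constructed here, never dereferenced.
inline Mem_block *all_ones_pointer()
{
  return reinterpret_cast<Mem_block *>(~std::uintptr_t{0});
}
```

With these helpers, `passes_null_check(all_ones_pointer())` is true, which matches the observed fault address: the check succeeds and the following load from `(%eax)` touches 0xffffffff.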
On 4 June 2013 00:40, Adam Lackorzynski adam@os.inf.tu-dresden.de wrote:
My first guess would be that it's pagefaulting on that instruction. Could you check whether this is the case and what the value of eax/pfa is?
Adam
Adam Lackorzynski
adam@os.inf.tu-dresden.de
http://os.inf.tu-dresden.de/~adam/
On Tue Jun 04, 2013 at 08:33:02 +0200, Markus Partheymueller wrote:
When I enable pagefault tracing, the last one I see is a fault at address 0xffffffff on that instruction. The respective source code is in /pkg/cxx/lib/tl/include/list_alloc:L182. I assume dereferencing c in c->next is the problem?
Yes, seems so. Coming back to the memory map provided by the bootloader, is there maybe a difference between the machine that works and this one? Maybe you could also try to nail down where that possible -1 is written e.g. with some if's at the write locations.
Adam
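One possible shape for such checks at the write locations is sketched below. This is an assumption-laden illustration, not code from moe or list_alloc: `Mem_block`, `is_all_ones` and `checked_set_next` are hypothetical names, and `std::fprintf` stands in for whatever output or debugger-entry facility is available in the real environment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the allocator's list node.
struct Mem_block
{
  Mem_block *next;
  unsigned long size;
};

// True if the pointer has the suspicious all-ones (-1) value.
inline bool is_all_ones(Mem_block const *p)
{
  return reinterpret_cast<std::uintptr_t>(p) == ~std::uintptr_t{0};
}

// Debug wrapper for a store to `next`: reports the call site and skips the
// store if the value being written is -1. Returns true if the store happened.
inline bool checked_set_next(Mem_block *b, Mem_block *n, char const *where)
{
  if (is_all_ones(n))
    {
      std::fprintf(stderr, "suspicious next pointer (-1) written at %s\n", where);
      return false;
    }
  b->next = n;
  return true;
}
```

Replacing each direct assignment to `next` with a call such as `checked_set_next(c, n, __func__)` would show which write location produces the bad value first.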
It seems to me as if bootstrap lists RAM regions beyond addressable memory. I have a region that is being limited:
Limiting 'RAM' region [100000000, 43e5fffff] {33e600000} to [100000000, 10ea7c3ff] { ea7c400} due to 3024 MB limit
but in fact has to be dropped if I am not completely mistaken. When dropping regions above ~0UL, everything works fine.
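The workaround can be sketched roughly as follows. This is not the bootstrap code: `Region`, `Max_addr` and `drop_unaddressable` are hypothetical names, regions are assumed to have inclusive bounds, and clipping a region that straddles the limit is my own assumption (only the dropping of regions above ~0UL is described above).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical memory region with inclusive bounds, wide enough to hold
// addresses above 4 GiB as reported by the boot loader.
struct Region
{
  std::uint64_t begin;
  std::uint64_t end;
};

// Highest address a 32-bit unsigned long can express (~0UL on ia32).
constexpr std::uint64_t Max_addr = 0xffffffffULL;

// Returns the regions usable on a 32-bit target: regions starting above
// Max_addr are dropped, regions straddling it are clipped (an assumption).
inline std::vector<Region> drop_unaddressable(std::vector<Region> const &in)
{
  std::vector<Region> out;
  for (Region r : in)
    {
      if (r.begin > Max_addr)
        continue;                 // entirely unaddressable: drop
      if (r.end > Max_addr)
        r.end = Max_addr;         // partially addressable: clip
      out.push_back(r);
    }
  return out;
}
```

Applied to the region from the log line, [100000000, 43e5fffff] starts above `Max_addr` and is removed entirely instead of being shortened to fit the 3024 MB limit.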
On 4 June 2013 19:27, Adam Lackorzynski adam@os.inf.tu-dresden.de wrote:
Yes, seems so. Coming back to the memory map provided by the bootloader, is there maybe a difference between the machine that works and this one? Maybe you could also try to nail down where that possible -1 is written e.g. with some if's at the write locations.
Adam
On Wed Jun 05, 2013 at 11:05:32 +0200, Markus Partheymueller wrote:
It seems to me as if bootstrap lists RAM regions beyond addressable memory. I have a region that is being limited:
Limiting 'RAM' region [100000000, 43e5fffff] {33e600000} to [100000000, 10ea7c3ff] { ea7c400} due to 3024 MB limit
but in fact has to be dropped if I am not completely mistaken. When dropping regions above ~0UL, everything works fine.
Thanks, I'll investigate/fix.
Adam
l4-hackers@os.inf.tu-dresden.de