Booting L4Re on the CI20: Panic in sigma0

Fri Jul 21 00:06:27 CEST 2017

On Thu Jul 20, 2017 at 22:10:48 +0200, Paul Boddie wrote:
> On Wednesday 19. July 2017 19.40.23 Paul Boddie wrote:
> > 
> > It always seems to involve an address of 0x8, which seems rather bizarre.
> > Again, I think I must be missing something fundamental and must only be
> > seeing the consequences.
> 
> So, I adjusted the kernel code, putting back in a commented-out debugging 
> statement found in the Thread::handle_page_fault method which looks like this 
> (having changed some of the details):
> 
>   printf("Translation error ? %p\n"
>          "  is_kmem_page_fault ? %x\n"
>          "  is_sigma0 ? %x\n"
>          "  program counter: %p\n"
>          "  regs->ip(): %p\n"
>          "  page fault address: %p\n",
>          (void *) PF::is_translation_error(error_code),
>          Kmem::is_kmem_page_fault(pfa, error_code),
>          !PF::is_translation_error(error_code) && mem_space()->is_sigma0(),
>          (void *) pc,
>          (void *) regs->ip(),
>          (void *) pfa);
> 
> I also introduced a statement in Thread::handle_page_fault_pager as follows:
> 
>   printf("handle_page_fault_pager: pfa=" L4_PTR_FMT
>          ", errorcode=" L4_PTR_FMT ", pc=%lx, bad_v_addr=%lx\n",
>          pfa, error_code, regs()->ip(), regs()->bad_v_addr);
> 
> I then observe some strange behaviour:
> 
> Translation error ? 0x1
>   is_kmem_page_fault ? 0
>   is_sigma0 ? 0
>   program counter: 0x80019c8c
>   regs->ip(): 0x80019c8c
>   page fault address: 0xc
>   regs->bad_v_addr: 0xc
> handle_page_fault_pager: pfa=0000000c, errorcode=00000009, pc=103502c, 
> bad_v_addr=8cc4
> L4Re[svr]: request: tag=0xfffe0002 proto=-2 obj=0x0
> L4Re: page fault: 9 pc=103502c
> L4Re[rm]: unhandled read page fault at 0x8 pc=0x103502c
> 
> In the above, the last three lines are normal debugging output. The (wrapped) 
> line above those is from my statement in handle_page_fault_pager.
> 
> For some reason, the presumably correct bad_v_addr (bad virtual address, 
> 0x8cc4) arising in the apparent initial page fault (at 0x0103502c) does not 
> get propagated back to L4Re alongside the associated program counter value. 
> Instead, 0x8 gets reported in the L4Re logging output.
> 
> While handling this page fault, there appears to be another page fault in the 
> kernel (at 0x80019c8c). This latter fault can't be handled (as discussed 
> below) and so the original exception is eventually exposed in L4Re with the 
> confused mix of details noted above.
> 
> The unlikely address of 0x8 reported by L4Re may be related to the kernel 
> fault address of 0xc, which according to the above details occurs in the 
> following code (found in Ram_quota::alloc):

It looks like you need the patch from
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008005.html
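
As a general note on the addresses themselves: fault addresses as small as
0x8 and 0xc are what one typically sees when a member is accessed through a
null object pointer, since the faulting address is then just the member's
offset within the object. The sketch below only illustrates that arithmetic.
The Quota_like struct and the fault_address helper are hypothetical names
made up for this example; this is not the layout of the real Ram_quota class.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout for illustration only; NOT the real Ram_quota class.
struct Quota_like
{
  std::uint32_t parent;   // offset 0x0
  std::uint32_t current;  // offset 0x4
  std::uint32_t max;      // offset 0x8
  std::uint32_t flags;    // offset 0xc
};

// Address a load or store would touch for a member located at
// 'member_offset' inside an object that starts at address 'base'.
constexpr std::uintptr_t fault_address(std::uintptr_t base,
                                       std::size_t member_offset)
{
  return base + member_offset;
}

// With a null object pointer (base == 0), the faulting address is simply
// the member offset, which is why such faults report tiny addresses.
static_assert(fault_address(0, offsetof(Quota_like, max)) == 0x8,
              "member at offset 8 faults at 0x8 through a null pointer");
static_assert(fault_address(0, offsetof(Quota_like, flags)) == 0xc,
              "member at offset 12 faults at 0xc through a null pointer");
```

If the object pointer used at the faulting kernel instruction turns out to be
null, the reported address would match the offset of whichever member is
being read there.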


Adam