Booting L4Re on the CI20: Panic in sigma0
paul at boddie.org.uk
Thu Jul 20 22:10:48 CEST 2017
On Wednesday 19. July 2017 19.40.23 Paul Boddie wrote:
> It always seems to involve an address of 0x8, which seems rather bizarre.
> Again, I think I must be missing something fundamental and must only be
> seeing the consequences.
So, I adjusted the kernel code, putting back in a commented-out debugging
statement found in the Thread::handle_page_fault method which looks like this
(having changed some of the details):
  printf("Translation error ? %p\n"
         "  is_kmem_page_fault ? %x\n"
         "  is_sigma0 ? %x\n"
         "  program counter: %p\n"
         "  regs->ip(): %p\n"
         "  page fault address: %p\n",
         (void *) PF::is_translation_error(error_code),
         !PF::is_translation_error(error_code) && mem_space()->is_sigma0(),
         (void *) pc,
         (void *) regs->ip(),
         (void *) pfa);
I also introduced a statement in Thread::handle_page_fault_pager as follows:
  printf("handle_page_fault_pager: pfa=" L4_PTR_FMT
         ", errorcode=" L4_PTR_FMT ", pc=%lx, bad_v_addr=%lx\n",
         pfa, error_code, regs()->ip(), regs()->bad_v_addr);
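As a rough illustration of what that statement prints, here is a minimal user-space sketch of the same format string. PTR_FMT_SKETCH and format_fault are hypothetical names, and the assumption that L4_PTR_FMT expands to "%08lx" on a 32-bit target is mine; the real definition lives in the L4 system headers.

```cpp
#include <cstdio>
#include <string>

// Assumption: stand-in for L4_PTR_FMT on a 32-bit target (zero-padded,
// eight hex digits). The real macro comes from the L4 headers.
#define PTR_FMT_SKETCH "%08lx"

// Hypothetical helper: formats the fault details in the same layout as the
// printf added to handle_page_fault_pager, but into a string.
std::string format_fault(unsigned long pfa, unsigned long error_code,
                         unsigned long pc, unsigned long bad_v_addr)
{
  char buf[160];
  std::snprintf(buf, sizeof buf,
                "handle_page_fault_pager: pfa=" PTR_FMT_SKETCH
                ", errorcode=" PTR_FMT_SKETCH ", pc=%lx, bad_v_addr=%lx\n",
                pfa, error_code, pc, bad_v_addr);
  return std::string(buf);
}
```

Calling format_fault(0xc, 0x9, 0x103502c, 0x8cc4) would give a single line with pfa and errorcode padded to eight digits and the other two values unpadded.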
I then observe some strange behaviour:
Translation error ? 0x1
is_kmem_page_fault ? 0
is_sigma0 ? 0
program counter: 0x80019c8c
page fault address: 0xc
handle_page_fault_pager: pfa=0000000c, errorcode=00000009, pc=103502c,
bad_v_addr=8cc4
L4Re[svr]: request: tag=0xfffe0002 proto=-2 obj=0x0
L4Re: page fault: 9 pc=103502c
L4Re[rm]: unhandled read page fault at 0x8 pc=0x103502c
In the above, the last three lines are normal debugging output. The (wrapped)
line above those is from my statement in handle_page_fault_pager.
For some reason, the presumably correct bad_v_addr (bad virtual address,
0x8cc4) arising in the apparent initial page fault (at 0x0103502c) does not
get propagated back to L4Re alongside the associated program counter value.
Instead, 0x8 gets reported in the L4Re logging output.
While handling this page fault, there appears to be another page fault in the
kernel (at 0x80019c8c). This latter fault can't be handled (as discussed
below) and so the original exception is eventually exposed in L4Re with the
confused mix of details noted above.
The unlikely address of 0x8 reported by L4Re may be related to the kernel
fault address of 0xc, which according to the above details occurs in the
following code (found in Ram_quota::alloc):
80019c7c: 40036000 mfc0 v1,c0_status
80019c80: 30670001 andi a3,v1,0x1
80019c84: 41606000 di
80019c88: 000000c0 ehb
80019c8c: 8c82000c lw v0,12(a0)
Note how the final, fault-causing instruction involves an offset of 12 (0xc),
suggesting that a0 is set to zero. That is not an expected value, given that
a0 refers to a function/method parameter block and that a parameter is
expected by the method.
Unfortunately, I don't know the invocation chain responsible for this, and it
doesn't appear to be very obvious how I might discover it efficiently.
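The reading of the faulting instruction can be checked by decoding the word 0x8c82000c by hand. The following is a minimal sketch, assuming the standard MIPS32 I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit signed immediate); IType, decode_itype and effective_address are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>

// Fields of a MIPS32 I-type instruction word.
struct IType
{
  unsigned opcode;  // bits 31..26 (0x23 is lw)
  unsigned rs;      // bits 25..21, base register for a load
  unsigned rt;      // bits 20..16, destination register for a load
  int16_t imm;      // bits 15..0, signed offset
};

// Split an instruction word into its I-type fields.
IType decode_itype(uint32_t word)
{
  IType t;
  t.opcode = (word >> 26) & 0x3f;
  t.rs = (word >> 21) & 0x1f;
  t.rt = (word >> 16) & 0x1f;
  t.imm = static_cast<int16_t>(word & 0xffff);
  return t;
}

// Address accessed by a load: base register value plus the
// sign-extended offset.
uint32_t effective_address(uint32_t base, const IType &insn)
{
  return base + static_cast<uint32_t>(static_cast<int32_t>(insn.imm));
}
```

For 0x8c82000c this yields opcode 0x23 (lw), rs = 4 (a0), rt = 2 (v0) and an offset of 12, so a base of zero in a0 gives an effective address of 0xc, matching the reported kernel fault address.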