strange Bug

Mon Nov 17 10:01:04 CET 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Alright, let's see:
> 
> ---------------------------------------------------------------------
>
> CPU 0 [f002e898]: IRQ ENTRY
> CPU(s) 0-1 entered JDB
> jdb: l
>  id  cpu  name        pr  sp  wait  to  state
>  6e    0  &-           2  58     -      rcv_wait
>  6b    0  -----       10  58     -      rcv_wait
>  62    0  &-           2  49     -      rcv_wait
>  60    0  -----        2  58    6e      rcv_wait,exc_progr
>  5c    0  -----       10  49     -      rcv_wait
>  59    0  #myClient   ff  58     -      rcv_wait
>  4d    0  -----        2  49     -      rcv_wait,fpu
>  4a    0  #myServer   ff  49     -      rcv_wait
>  40    0  -----        2  31     -      rcv_wait
>  3d    0  -----       10  31     -      rcv_wait
>  35    0  -----        2  31    35      rcv_wait
>  32    0  #ned        ff  31     -      rcv_wait
>   c    1  -----        0   1            ready
>   a    0  moe         ff   9     -      rcv_wait
>   8    0  sigma0       1   7     -      rcv_wait
>   6    0  -----        0   1            ready
> 
>> One of them will be the one that is in the debugger now. It
>> should be one of the L4Re threads (those have names starting with
>> # in the thread list).
> 
> Is it thread 60 with 'exc_progr'? Or one of the 'ready' threads?

Very good. The interesting parts of this table are:

* id -> is the respective thread's debug ID
* name -> the thread name set using l4_debugger_set_name(). Usually,
  L4Re applications' first thread is one with a name starting with
  # followed by the binary name.
* sp -> the address space (L4: task) this thread is running in
* wait -> the ID of the thread / object this thread is currently
  blocked waiting for
* state -> the thread's scheduling state

As you guessed correctly, 60 is the interesting one here. It is stuck
in an exception sent to thread 6e. Both are running in address space
58. The L4Re thread for this space is 59, which indicates that this is
the myClient application.
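
The reasoning above can be sketched as a small Python script. This is
only an illustration and not part of JDB or L4Re: the row layout
(id, cpu, name, pr, sp, wait, state) is assumed from the thread list
quoted in this mail, and parse_row() and find_faulting() are
hypothetical helper names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    tid: str             # debug ID, hex string as printed by JDB
    name: str            # thread name, '-----' if none was set
    space: str           # address space (task) the thread runs in
    wait: Optional[str]  # ID the thread is blocked on, None if '-'
    state: str           # comma-separated scheduling state flags


def parse_row(line: str) -> Row:
    # Assumed column order: id cpu name pr sp wait state
    fields = line.split()
    if len(fields) == 6:
        # Empty wait column, as seen for the 'ready' threads
        fields.insert(5, "-")
    tid, _cpu, name, _pr, space, wait, state = fields
    return Row(tid, name, space, None if wait == "-" else wait, state)


def find_faulting(rows: List[Row]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    # The first thread of an L4Re application is named '#<binary>',
    # so it tells us which application owns an address space.
    apps = {r.space: r.name[1:] for r in rows if r.name.startswith("#")}
    hits = []
    for r in rows:
        if "exc_progr" in r.state.split(","):
            hits.append((r.tid, r.wait, apps.get(r.space)))
    return hits


# A few rows taken from the thread list in this mail
SAMPLE = """\
6e 0 &- 2 58 - rcv_wait
60 0 ----- 2 58 6e rcv_wait,exc_progr
59 0 #myClient ff 58 - rcv_wait
4a 0 #myServer ff 49 - rcv_wait
"""

rows = [parse_row(line) for line in SAMPLE.splitlines()]
print(find_faulting(rows))  # [('60', '6e', 'myClient')]
```

The result reads as: thread 60 is in exception processing, blocked on
thread 6e, and belongs to the myClient application.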

One more interesting thing here:
  * 60 is stuck in a message sent to a thread named '&-'. This is the
    name used by L4Re's POSIX signal library for the thread responsible
    for handling exceptions and generating signals from them. This
    indicates that the actual page fault is already gone and has been
    transformed into an exception message to this thread.

>> What is this thread's state (command 't<id>')? At what
>> instruction is the thread stuck?
> 
> jdb: t60
> thread  :  60 <0xf1195000>      CPU: 0:0        prio: 02  mode: Con
> state   : 40008 rcv_wait,exc_progr
> wait for:   6e  polling:        rcv descr: 00000000
> lcked by:       timeout  :
> cpu time:  21.000 ms            timeslice: 8000/-1 µs
> pager   : [C:   3] D:  59       task     : D:  58
> exc-hndl: [C: 415] D:  6e       UTCB     : f118a200/b3000200
> vCPU    : ---                   vCPU     : ---
>
> PC=0015c7a8 USP=7ffff2f8
> [0] 000a4a0c 00091f5c 0000050c 00025848
> [4] 00031c20 ffffdabc 000aea70 00000001
> [8] 80003e64 00091f5c 00000000 00000000
> [c] 000ae9b8 0007cb14 004094b4 60000010
> 
> 
> At least this thread has the PC of the Error-Message:

Exactly. 60 is the one that caused the page fault. This means your
myClient application binary should contain the respective PC and allow
you to figure out where the faulting access happened.
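
As a cross-check, the PC from the JDB register dump can be compared
with the pc= value in the L4Re[rm] fault message. The following is a
minimal Python sketch, assuming the two line formats look exactly as
quoted in this mail; parse_jdb_regs() and parse_fault() are
hypothetical helper names, not part of any L4Re tool.

```python
import re

# Both strings are copied from the output quoted in this mail
JDB_REGS = "PC=0015c7a8 USP=7ffff2f8"
FAULT_MSG = "myClient| L4Re[rm]: unhandled write page fault @7ffff2e0 pc=15c7a8"


def parse_jdb_regs(text):
    """Return (pc, usp) from a JDB 'PC=... USP=...' line."""
    m = re.search(r"PC=([0-9a-fA-F]+)\s+USP=([0-9a-fA-F]+)", text)
    if m is None:
        raise ValueError("no PC/USP pair found")
    return int(m.group(1), 16), int(m.group(2), 16)


def parse_fault(text):
    """Return (access, fault_addr, pc) from the page fault message."""
    m = re.search(
        r"unhandled (\w+) page fault @([0-9a-fA-F]+) pc=([0-9a-fA-F]+)", text
    )
    if m is None:
        raise ValueError("no page fault message found")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


pc, usp = parse_jdb_regs(JDB_REGS)
access, fault_addr, fault_pc = parse_fault(FAULT_MSG)

same_pc = pc == fault_pc            # does thread 60 sit at the faulting PC?
usp_minus_fault = usp - fault_addr  # distance between stack pointer and fault

print(same_pc, access, hex(usp_minus_fault))  # True write 0x18
```

With the values from this mail the PCs match. The faulting address
lies 0x18 bytes below the user stack pointer, which hints at (but does
not prove) a write into the stack region.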

> myClient| L4Re[rm]: unhandled write page fault @7ffff2e0 pc=15c7a8 
> myClient| No signal handler found
> 
> 
> Well, is this the bad guy? If so, then where does it come from? And
> what to check next?

Ideally, you now look up the faulting PC in the myClient binary and
figure out where the access happens. Keep us posted if you need help
with that.
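
One possible way to do this lookup is binutils' addr2line. The Python
sketch below only builds and runs the command line. It assumes that
the myClient binary is unstripped and contains debug information, that
"myClient" is replaced with the real path to the binary in your build
directory, and that the right addr2line for your target is used (a
cross toolchain may need a prefixed tool name). addr2line_command()
and lookup() are hypothetical helper names.

```python
import subprocess


def addr2line_command(binary, pc, tool="addr2line"):
    # -e: binary to inspect, -f: print function names,
    # -C: demangle C++ names, -i: also show inlined call sites
    return [tool, "-e", binary, "-f", "-C", "-i", hex(pc)]


def lookup(binary, pc, tool="addr2line"):
    """Run addr2line and return its output (function and file:line)."""
    result = subprocess.run(
        addr2line_command(binary, pc, tool),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# PC taken from the fault message in this mail; the path is a placeholder
cmd = addr2line_command("myClient", 0x15C7A8)
print(" ".join(cmd))  # addr2line -e myClient -f -C -i 0x15c7a8
```

Calling lookup() with the real binary path should then print the
function and source line for the faulting instruction, provided the
debug information is available.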

Bjoern
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlRpuU0ACgkQP5ijxgQLUNlesACgl2yZal0VeDg66THF8iSEK9pZ
Z40An2sb/Nw2jquTWtFQ1nTlhbU3f3RT
=GSKS
-----END PGP SIGNATURE-----