Re: Building programs with MODE=shared in L4Re

17 Jun 2018

On Thursday 14. June 2018 15.38.33 Paul Boddie wrote:
> ...
> Is there some way of interpreting the "PFA" value or getting more
> information about where the exception really occurs?

Well, I got some off-list help/encouragement (many thanks!) and put in some 
debugging statements to see what the cause of the exception might be.

In the Region_map::op_exception method definition (found in the file
pkg/l4re-core/l4re_kernel/server/src/region.cc), modifying the debugging
output yields the following information:

pc=0x800000      (program counter)
gp=0x82dd30      (global pointer)
sp=0x8d7a        (stack pointer, called "PFA" in the default output)
ra=0x802f6c      (return address)
cause=0x1000002c (exception cause)

Initially, I thought this might just be a stray memory access, not really 
knowing the significance of 0x800000 and whether it might be valid for the 
program counter. However, further investigation showed that it is the base
address of the loaded object. The stack pointer, meanwhile, is fine.

The clue to the actual cause of the exception is the "cause" register, whose
ExcCode field (bits 6..2) indicates the nature of the exception. Here it holds
11, which is a "coprocessor unusable" exception, and the CE field (bits 29..28)
identifies coprocessor 1.
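As a rough illustration of that decoding, the following C sketch extracts the relevant fields from the observed cause value. The helper names are made up for this example; the field positions (ExcCode in bits 6..2, CE in bits 29..28) are those of the standard MIPS32 Cause register, where ExcCode 11 means "coprocessor unusable".

```c
#include <stdint.h>

/* ExcCode: bits 6..2 of the MIPS32 Cause register. */
static unsigned cause_exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1f;
}

/* CE: bits 29..28, naming the coprocessor that was unusable
   when ExcCode is 11 (CpU). */
static unsigned cause_coprocessor(uint32_t cause)
{
    return (cause >> 28) & 0x3;
}

enum { EXC_CPU = 11 };  /* "coprocessor unusable" */
```

With cause=0x1000002c, ExcCode comes out as 11 and CE as 1, so the CPU was objecting to a coprocessor 1 (floating-point) instruction.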

It was suggested that I output the details of the instruction causing the 
exception (which I should really have thought of doing myself, but I guess it 
has been a while since I did this kind of debugging), and this yielded the 
following value:

464c457f

The significance of this value may be obvious to people here, especially given
where it was found, but for the rest of us, I can reveal that it is just the
ELF magic number (0x7f 'E' 'L' 'F') read as a little-endian word. It is a
"happy" coincidence that this value looks like a "coprocessor 1" instruction
for the MIPS32 architecture: bits 31..26 hold 010001, the COP1 opcode, which
causes the exception on this SoC because it has no such coprocessor.
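To see the coincidence concretely, this sketch assembles the four magic bytes into a 32-bit word the way a little-endian CPU would fetch it, and then extracts the major opcode. It assumes the SoC runs little-endian, which the observed word 0x464c457f suggests; the function names are hypothetical.

```c
#include <stdint.h>

/* Combine four bytes into a word, least significant byte first,
   as a little-endian MIPS CPU fetching an instruction would. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The major opcode occupies bits 31..26 of a MIPS32 instruction. */
static unsigned mips_opcode(uint32_t insn)
{
    return insn >> 26;
}

enum { OP_COP1 = 0x11 };  /* binary 010001 */

/* e_ident[0..3] of every ELF file. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
```

Calling `load_le32(elf_magic)` yields 0x464c457f, and `mips_opcode` of that word is 0x11, the COP1 opcode.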

I dumped and disassembled the code around the call site, which yielded this:

8f998250 # lw $t9, -32176($gp)
24a55fa8 # addiu $a1, $a1, 0x5fa8
0320f809 # jalr $t9
24844ee4 # addiu $a0, $a0, 0x4ee4
8fbc0010 # lw $gp, 16($sp)
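The hand disassembly of the first word can be checked by splitting it into the MIPS32 I-type fields. This is a minimal sketch with hypothetical names; register numbers 25 and 28 are t9 and gp in the standard MIPS register naming.

```c
#include <stdint.h>

/* Fields of a MIPS32 I-type instruction such as lw or addiu. */
struct itype {
    unsigned opcode;  /* bits 31..26 */
    unsigned rs;      /* bits 25..21: base/source register */
    unsigned rt;      /* bits 20..16: destination register */
    int32_t  imm;     /* bits 15..0, sign-extended */
};

static struct itype decode_itype(uint32_t insn)
{
    struct itype d;
    d.opcode = insn >> 26;
    d.rs = (insn >> 21) & 0x1f;
    d.rt = (insn >> 16) & 0x1f;
    d.imm = (int32_t)(insn & 0xffff);
    if (d.imm & 0x8000)          /* sign-extend the 16-bit immediate */
        d.imm -= 0x10000;
    return d;
}

enum { OP_LW = 0x23, OP_ADDIU = 0x09, REG_T9 = 25, REG_GP = 28 };
```

Decoding 0x8f998250 gives opcode 0x23 (lw), base register 28 (gp), destination 25 (t9) and offset -32176, matching the listing; 0x24a55fa8 likewise decodes as addiu with an immediate of 24488 (0x5fa8).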

With these details, and using objdump to dump all the programs and libraries,
I discovered that it comes from the code following the _ftext label in
libld-l4.so:

    2f5c:       8f998250        lw      t9,-32176(gp)
    2f60:       24a55fa8        addiu   a1,a1,24488
    2f64:       0320f809        jalr    t9
    2f68:       24844ee4        addiu   a0,a0,20196
    2f6c:       8fbc0010        lw      gp,16(sp)
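Mapping the run-time return address back to objdump's link-time addresses is a subtraction of the load base. The sketch below assumes, as observed, that libld-l4.so was loaded at 0x800000; the helper names are hypothetical. Because of the branch delay slot, jalr stores in ra the address just past both itself and the instruction in its delay slot.

```c
#include <stdint.h>

/* Translate a run-time address inside a shared object back to the
   link-time address shown by objdump, given the object's load base. */
static uint32_t link_address(uint32_t runtime, uint32_t load_base)
{
    return runtime - load_base;
}

/* jalr sets ra to (address of jalr) + 8, skipping the delay slot,
   so the call instruction itself is 8 bytes before ra. */
static uint32_t call_site(uint32_t ra)
{
    return ra - 8;
}
```

So ra=0x802f6c corresponds to 0x2f6c, the lw gp,16(sp) that follows the call, and the jalr itself sits at 0x2f64.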

So, what appears to be happening is that the "jalr t9" instruction is using a 
value for t9 that is 0x800000, which causes a jump to the object header and 
the subsequent failure. Here, I started to suspect a problem with the gp 
register initialisation, but this appears completely reasonable:

00002780 <_ftext>:
    2780:       3c1c0003        lui     gp,0x3
    2784:       279cb5b0        addiu   gp,gp,-19024
    2788:       0399e021        addu    gp,gp,t9

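This provides a value of 0x30000 - 19024 + t9 for gp. Assuming t9 holds the
address of _ftext itself (0x2780), as the MIPS PIC calling convention expects,
that is 0x2b5b0 + 0x2780 == 0x2dd30. (I guess that it is really 0x82dd30 at
run time, with t9 having been adjusted.) The global offset table resides at
0x25d40: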

00025d40 a _GLOBAL_OFFSET_TABLE_

And the difference between gp and the table (0x2dd30 - 0x25d40 == 0x7ff0) is
as expected:

#define OFFSET_GP_GOT 0x7ff0

(See: pkg/l4re-core/uclibc/lib/contrib/uclibc/ldso/ldso/mips/elfinterp.c)
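The gp arithmetic can be replayed in C. This sketch assumes that t9 holds the address of _ftext on entry, as the MIPS PIC calling convention expects: 0x2780 at link time, and 0x802780 at run time given the 0x800000 load base. The function name is hypothetical.

```c
#include <stdint.h>

#define OFFSET_GP_GOT 0x7ff0

/* Replay the _ftext prologue:
     lui   gp, 0x3
     addiu gp, gp, -19024
     addu  gp, gp, t9
   Unsigned arithmetic wraps modulo 2^32, like the 32-bit registers. */
static uint32_t prologue_gp(uint32_t t9)
{
    uint32_t gp = (uint32_t)0x3 << 16;  /* lui: 0x30000 */
    gp -= 19024;                        /* addiu with -19024: 0x2b5b0 */
    gp += t9;                           /* addu */
    return gp;
}
```

With t9 = 0x2780 this gives 0x2dd30, which sits exactly OFFSET_GP_GOT above the table at 0x25d40; with t9 = 0x802780 it gives the observed 0x82dd30.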

Where things seem to go wrong is with the computation of t9 before the call:

gp - 32176 == 0x2dd30 - 32176 == 0x25f80

The next symbol/section after the table is this one:

00025f90 g __dso_handle
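The location of the slot that feeds t9 follows from the same numbers. This sketch (hypothetical helper names) assumes 4-byte GOT entries, as on MIPS32, and uses the link-time gp of 0x2dd30.

```c
#include <stdint.h>

/* Address read by "lw t9, offset(gp)"; converting the negative offset
   to uint32_t and adding wraps around to gp - 32176 as intended. */
static uint32_t got_slot_address(uint32_t gp, int32_t offset)
{
    return gp + (uint32_t)offset;
}

/* Index of a slot within the GOT, assuming 4-byte entries. */
static uint32_t got_slot_index(uint32_t slot, uint32_t got_start)
{
    return (slot - got_start) / 4;
}
```

The slot at 0x25f80 is 0x240 bytes, or 144 entries, into the table, and 0x10 bytes before __dso_handle at 0x25f90.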

So, the location of the address to be used (0x25f80) is within the region of 
the table. However, dumping the memory from the start of the table until the 
next section indicates that the address lies within an area that seems to be 
padding, appearing after all the meaningful entries and featuring only 
0x800000 for each such entry.

I imagine that code fixes up the table, adding the object base address to each 
entry, and the padding is also adjusted because the loop just proceeds until 
it encounters the next section (or the end of the .got section).
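The suspected behaviour can be sketched as follows. This is a hypothetical loop, not the uClibc ld.so code: it adds the load base to every entry in a range, so if the range extends past the meaningful entries, zero-filled padding ends up holding exactly the load base. On MIPS, the number of local GOT entries to relocate is normally given by the DT_MIPS_LOCAL_GOTNO dynamic tag, so an over-long range would be one possible way to arrive at the observed 0x800000 values.

```c
#include <stdint.h>

/* Hypothetical local GOT fix-up: add the load base to each entry
   in [first, end). Entries that were zero padding become load_base. */
static void relocate_got(uint32_t *first, uint32_t *end, uint32_t load_base)
{
    for (uint32_t *entry = first; entry < end; entry++)
        *entry += load_base;
}
```

For example, relocating the four-entry table {0x1000, 0x2000, 0, 0} with a base of 0x800000 leaves the two former zero entries holding 0x800000.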

What I cannot figure out is where the _ftext code actually comes from. It 
seems to be some kind of initialisation code, but the only things containing 
_ftext in the distribution are linker scripts. So I don't know where to find 
the offending operations.

I did test shared executables successfully on the CI20, which uses a different
MIPS32 architecture revision. The code is rather different, perhaps employing 
different styles of code generation that never got applied to the earlier 
architecture revision, but I notice that the global offset tables are 
similarly sized and have the last meaningful entry referring to 
__dl_runtime_pltresolve.

I suppose that I need to figure out which code is responsible for the failing 
invocation and why the generated code is trying to access an uninitialised 
table entry. Although I suspect that some change I have made is responsible, 
there doesn't seem to be anything really obvious amongst my patches. The 
patches I needed to fix t9 initialisation are also in use for the CI20, so I 
doubt that they would have an effect, even if they were relevant here.

Sorry for the long message, but not being particularly familiar with the way 
dynamic linking works, I feel that reporting my observations might trigger the 
memories of those who have seen such problems before.

Paul