Hello,
I have been busy writing libraries and programs for L4Re and recently had to think about the size of the payloads I have been deploying. It appears that in the build system, one can use MODE=shared to build dynamically-linked programs, and there is a shared version of the hello example in the examples package.
However, I don't see any recipe for doing this for other programs. My impression is that the dynamic linker must have all the required libraries deployed as modules, and so I wrote a script to generate a list of such modules using readelf. In addition, it appears that libl4sys-direct.so is required (which readelf does not detect).
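The script itself isn't shown. Below is a minimal sketch of its core step in C. It pulls the library name out of one line of `readelf -d` output, such as `0x00000001 (NEEDED)  Shared library: [libl4util.so]`, so that the name can be written out as a `module` line. It assumes the textual format that GNU readelf prints, and the function name `needed_library` is hypothetical. A complete tool would also repeat this for each library's own dependencies. libl4sys-direct.so would still have to be added by hand, because it never appears as a NEEDED entry.

```c
#include <stddef.h>
#include <string.h>

/* Extracts the library name from one line of "readelf -d" output, e.g.
 *   0x00000001 (NEEDED)  Shared library: [libl4util.so]
 * Copies the name into out (size outsz) and returns 1, or returns 0 if the
 * line is not a NEEDED entry or the name does not fit. */
int needed_library(const char *line, char *out, size_t outsz)
{
    const char *start, *end;
    size_t len;

    if (strstr(line, "(NEEDED)") == NULL)
        return 0;
    start = strchr(line, '[');
    if (start == NULL)
        return 0;
    end = strchr(start + 1, ']');
    if (end == NULL)
        return 0;
    len = (size_t)(end - (start + 1));
    if (len + 1 > outsz)          /* leave room for the terminating NUL */
        return 0;
    memcpy(out, start + 1, len);
    out[len] = '\0';
    return 1;
}
```

Each name found this way can then be printed as `module <name>` for conf/modules.list.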
Adding these modules to my conf/modules.list should yield success, but within my constrained debugging environment, all I can see is that "shared" programs fail to start. I really need to find out whether they start and then experience some initialisation problem, or whether they actually crash while Mag keeps their viewports around.
Are there any other things that I need to consider when building and deploying such programs? I searched for guidance on this topic and only found the following:
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/007094.html
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2016/007830.html
Both of these discussion threads only provide some details and are obviously not tutorials.
Thanks for any clues you might be able to provide!
Paul
P.S. I guess I could also see if the CI20 produces useful console-level details, perhaps indicating an architecture-related problem. Usually, however, the problem is caused by some simple mistake I have made, so I suspect that there is merely some detail I have forgotten.
Hi Paul,
On Fri May 11, 2018 at 01:00:11 +0200, Paul Boddie wrote:
> I have been busy writing libraries and programs for L4Re and recently had to think about the size of the payloads I have been deploying. It appears that in the build system, one can use MODE=shared to build dynamically-linked programs, and there is a shared version of the hello example in the examples package.
> However, I don't see any recipe for doing this for other programs. My impression is that the dynamic linker must have all the required libraries deployed as modules, and so I wrote a script to generate a list of such modules using readelf. In addition, it appears that libl4sys-direct.so is required (which readelf does not detect).
It's an available option, but by default most/all programs are linked statically.
> Adding these modules to my conf/modules.list should yield success, but within my constrained debugging environment, I only see that "shared" programs fail to start. I really need to see if they start but then experience some initialisation problem, or whether they actually crash but Mag keeps their viewports around.
> Are there any other things that I need to consider when building and deploying such programs? I searched for guidance on this topic and only found the following:
> http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/007094.html
> http://os.inf.tu-dresden.de/pipermail/l4-hackers/2016/007830.html
> Both of these discussion threads only provide some details and are obviously not tutorials.
> Thanks for any clues you might be able to provide!
You can set LD_DEBUG=1 in the environment of your program to make the dynamic loader tell you something. LD_TRACE_LOADED_OBJECTS=1 might also be of help. Add environment variable settings after the program's cmdline:
...:start({ ...}, "rom/myprog arg", { LD_DEBUG=1, ... });
Adam
On Monday 14. May 2018 21.18.04 Adam Lackorzynski wrote:
> You can set LD_DEBUG=1 in the environment of your program to make the dynamic loader tell you something. LD_TRACE_LOADED_OBJECTS=1 might also be of help. Add environment variable settings after the program's cmdline: ...:start({ ...}, "rom/myprog arg", { LD_DEBUG=1, ... });
I imagine I could direct log information to a suitable program for display on the screen rather than via the serial console. Can I do this using something like fbterminal? I'm still getting to grips with the different mechanisms accessed and exposed by the different programs.
Paul
On Monday 14. May 2018 22.06.06 Paul Boddie wrote:
> On Monday 14. May 2018 21.18.04 Adam Lackorzynski wrote:
>> You can set LD_DEBUG=1 in the environment of your program to make the dynamic loader tell you something. LD_TRACE_LOADED_OBJECTS=1 might also be of help. Add environment variable settings after the program's cmdline: ...:start({ ...}, "rom/myprog arg", { LD_DEBUG=1, ... });
> I imagine I could direct log information to a suitable program for display on the screen rather than via the serial console. Can I do this using something like fbterminal? I'm still getting to grips with the different mechanisms accessed and exposed by the different programs.
Maybe the matter of configuring fbterminal as a logging destination is obvious to those better acquainted with L4Re, but I eventually figured it out in a way. What I needed to do was to provide a different "log" capability to the hello program, but this is perhaps easier said than done.
According to the documentation...
http://l4re.org/doc/l4re_servers_ned.html#l4re_ned_startup
...it is supposed to be possible to indicate a log factory that creates a suitable object using the "log_fab" property/attribute of the loader.
Looking at the Lua code (pkg/l4re-core/ned/server/src/ned.lua), the App_env:log function seems to be the thing that gets called, returning the desired capability:
return self.loader.log_fab:create(Proto.Log, table.unpack(self.log_args));
While I guess I could make a suitable loader available with code like this...
local l2 = L4.Loader.new({loader = l, log_fab = term});
...and then use it to start the hello program...
l2:start({ log = { "hello", "b" }, }, "rom/hello");
...the challenge is to specify something that can *create* the logging destination. In my .cfg script, "term" is actually the IPC gate (or "channel") capability exposing the fbterminal:
local term = l:new_channel();
l:start({
  caps = {
    fb = mag_caps.svc:create(L4.Proto.Goos, "g=800x460+0+0", "barheight=20"),
    term = term:svr(),
  },
}, "rom/fbterminal");
So, as far as I can see, I would need something that acts as a factory capable of providing an fbterminal capability when its create method is invoked (specifying the log protocol). This seems like a lot of effort.
It then occurred to me that I only really want a way of presenting the "term" capability to the hello program. Since this isn't obviously possible in the loader, I modified the App_env:log function to support an additional case:
if self.log_cap then
  return self.log_cap
elseif self.loader.log_fab == nil or self.loader.log_fab.create == nil then
  error ("Starting an application without valid log factory", 4);
end
Thus, if "log_cap" is indicated when starting a program, it just uses this instead of trying to conjure up another capability. So the hello program is started as follows:
l:start({ log_cap = term, }, "rom/hello");
This manages to log to fbterminal, much to my relief. I hope this helps anyone else who struggled with this or just wondered about it.
Paul
On Monday 14. May 2018 21.18.04 Adam Lackorzynski wrote:
> You can set LD_DEBUG=1 in the environment of your program to make the dynamic loader tell you something. LD_TRACE_LOADED_OBJECTS=1 might also be of help. Add environment variable settings after the program's cmdline: ...:start({ ...}, "rom/myprog arg", { LD_DEBUG=1, ... });
So, with the help of fbterminal (mentioned in my last message), I managed to get some debugging information out of the loader:
L4Re: rom/ex_hello_shared: Unhandled exception: PC=0x800000 PFA=8d7a LdrFlgs=0
This appears to be generated in pkg/l4re-core/l4re_kernel/server/src/region.cc inside the Region_map::op_exception method. I'm not really sure what I can do with this information, though.
Alongside ex_hello_shared and the stack of programs supporting the framebuffer environment, I have the following libraries included in the modules.list:
module lib4re.so
module lib4re-util.so
module libc_be_l4refile.so
module libc_be_l4re.so
module libc_be_socket_noop.so
module libc_support_misc.so
module libdl.so
module libl4sys-direct.so
module libl4sys.so
module libl4util.so
module libld-l4.so
module libpthread.so
module libsupc++.so
module libuc_c.so
I list these for reference only, since my first guess would be that the loader fails before it even considers any of the libraries.
Paul
On Sun May 20, 2018 at 00:47:02 +0200, Paul Boddie wrote:
> On Monday 14. May 2018 21.18.04 Adam Lackorzynski wrote:
>> You can set LD_DEBUG=1 in the environment of your program to make the dynamic loader tell you something. LD_TRACE_LOADED_OBJECTS=1 might also be of help. Add environment variable settings after the program's cmdline: ...:start({ ...}, "rom/myprog arg", { LD_DEBUG=1, ... });
> So, with the help of fbterminal (mentioned in my last message), I managed to get some debugging information out of the loader:
> L4Re: rom/ex_hello_shared: Unhandled exception: PC=0x800000 PFA=8d7a LdrFlgs=0
> This appears to be generated in pkg/l4re-core/l4re_kernel/server/src/region.cc inside the Region_map::op_exception method. I'm not really sure what I can do with this information, though.
> Alongside ex_hello_shared and the stack of programs supporting the framebuffer environment, I have the following libraries included in the modules.list:
> module lib4re.so
> module lib4re-util.so
> module libc_be_l4refile.so
> module libc_be_l4re.so
> module libc_be_socket_noop.so
> module libc_support_misc.so
> module libdl.so
> module libl4sys-direct.so
> module libl4sys.so
> module libl4util.so
> module libld-l4.so
> module libpthread.so
> module libsupc++.so
> module libuc_c.so
> This being for reference, since my first guess would be that the loader is failing before even considering any of the libraries.
Is the "Unhandled exception" message the first one, or are there other messages? This works for me in QEMU. If a library were missing or similar, it would complain differently.
Adam
On Friday 8. June 2018 00.32.04 Adam Lackorzynski wrote:
> On Sun May 20, 2018 at 00:47:02 +0200, Paul Boddie wrote:
>> So, with the help of fbterminal (mentioned in my last message), I managed to get some debugging information out of the loader:
>> L4Re: rom/ex_hello_shared: Unhandled exception: PC=0x800000 PFA=8d7a LdrFlgs=0
>> [...]
> So the "Unhandled exception" message is the first one, or are there other messages? For me this works, in QEMU. If there would be a lib missing or similar it would also complain differently.
This is the first and only message I see.
I have exercised the dynamic loading functionality successfully elsewhere, using dlopen and dlsym to obtain libraries from the "rom" directory, so it probably isn't the act of loading libraries that causes this problem.
Is there some way of interpreting the "PFA" value or getting more information about where the exception really occurs?
Paul
On Thursday 14. June 2018 15.38.33 Paul Boddie wrote:
> Is there some way of interpreting the "PFA" value or getting more information about where the exception really occurs?
Well, I got some off-list help/encouragement (many thanks!) and put in some debugging statements to see what the cause of the exception might be.
In the Region_map::op_exception method definition (found in the file pkg/l4re-core/l4re_kernel/server/src/region.cc), modifying the debugging output yields the following information:
pc=0x800000 (program counter)
gp=0x82dd30 (global pointer)
sp=0x8d7a (stack pointer, called "PFA" in the default output)
ra=0x802f6c (return address)
cause=0x1000002c (exception cause)
Initially, I thought this might just be a stray memory access, not really knowing the significance of 0x800000 and whether it might be valid for the program counter. However, further investigation indicated that it is clearly the base address for the loaded object. Also, the stack pointer is fine.
The clue to the actual cause of the exception is the "cause" register, whose ExcCode field (bits 6..2) indicates the nature of the exception. This turned out to be a "coprocessor unusable" exception.
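To make the decoding concrete, here is a small C sketch that splits the reported value 0x1000002c into two MIPS32 Cause register fields. ExcCode (bits 6..2) gives the kind of exception. CE (bits 29..28) gives the number of the coprocessor involved in a coprocessor unusable exception. The helper names are made up for illustration. The field positions follow the MIPS32 privileged architecture.

```c
#include <stdint.h>

/* MIPS32 Cause register fields (MIPS32 privileged resource architecture). */
enum { EXC_CPU = 11 };  /* ExcCode value for "Coprocessor Unusable" */

/* ExcCode: bits 6..2, the kind of exception. */
unsigned cause_exccode(uint32_t cause) { return (cause >> 2) & 0x1f; }

/* CE: bits 29..28, the coprocessor referenced by a CpU exception. */
unsigned cause_ce(uint32_t cause) { return (cause >> 28) & 0x3; }
```

For 0x1000002c, ExcCode is 11 (coprocessor unusable) and CE is 1. That already points at coprocessor 1, the FPU.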
It was suggested that I output the details of the instruction causing the exception (which I should really have thought of doing myself, but I guess it has been a while since I did this kind of debugging), and this yielded the following value:
464c457f
The significance of this value may be obvious to people here, especially given where it was found, but for the rest of us, I can reveal that it is just the ELF magic number (0x7f 'E' 'L' 'F'). It is a "happy" coincidence that this value looks somewhat like a "coprocessor 1" instruction for the MIPS32 architecture, with the appropriate bits (31..26) indicating a COP1 instruction type, causing the exception on this SoC without such a coprocessor.
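The coincidence can be checked in a few lines of C. The sketch assumes a little-endian (mipsel) target, which the observed word implies. On that assumption, the four magic bytes fetched as one instruction give 0x464c457f. The primary opcode field (bits 31..26) of that word is 0x11, the COP1 opcode. The helper names are hypothetical.

```c
#include <stdint.h>

enum { MIPS_OP_COP1 = 0x11 };  /* primary opcode of coprocessor 1 instructions */

/* Assemble four bytes into a 32-bit word as a little-endian CPU would
 * fetch them from memory. */
uint32_t le_word(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Primary opcode field (bits 31..26) of a MIPS32 instruction word. */
unsigned mips_opcode(uint32_t insn) { return insn >> 26; }
```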
I dumped and disassembled the calling region of the code which yielded this:
8f998250 # lw $t9, -32176($gp)
24a55fa8 # addiu $a1, $a1, 0x5fa8
0320f809 # jalr $t9
24844ee4 # addiu $a0, $a0, 0x4ee4
8fbc0010 # lw $gp, 16($sp)
With these details, and using objdump to dump all the programs and libraries, I discovered that it comes from the _ftext section of libld-l4.so:
2f5c: 8f998250 lw t9,-32176(gp)
2f60: 24a55fa8 addiu a1,a1,24488
2f64: 0320f809 jalr t9
2f68: 24844ee4 addiu a0,a0,20196
2f6c: 8fbc0010 lw gp,16(sp)
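Decoding the I-type fields of these words confirms this reading of the disassembly. The hypothetical C helpers below extract the opcode, rs, rt and the sign-extended immediate. For 0x8f998250 they give lw (0x23), base register 28 (gp), target register 25 (t9) and offset -32176.

```c
#include <stdint.h>

enum { OP_LW = 0x23, OP_ADDIU = 0x09, REG_T9 = 25, REG_GP = 28 };

/* Fields of a MIPS32 I-type instruction word. */
unsigned itype_op(uint32_t insn) { return insn >> 26; }
unsigned itype_rs(uint32_t insn) { return (insn >> 21) & 0x1f; }  /* base/source */
unsigned itype_rt(uint32_t insn) { return (insn >> 16) & 0x1f; }  /* target */

/* 16-bit immediate, sign-extended as lw and addiu do. */
int32_t itype_simm(uint32_t insn)
{
    int32_t imm = (int32_t)(insn & 0xffff);
    return imm >= 0x8000 ? imm - 0x10000 : imm;
}
```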
So, what appears to be happening is that the "jalr t9" instruction is using a value for t9 that is 0x800000, which causes a jump to the object header and the subsequent failure. Here, I started to suspect a problem with the gp register initialisation, but this appears completely reasonable:
00002780 <_ftext>:
    2780: 3c1c0003 lui gp,0x3
    2784: 279cb5b0 addiu gp,gp,-19024
    2788: 0399e021 addu gp,gp,t9
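This gives gp = 0x30000 - 19024 + t9 == 0x2b5b0 + t9. Since t9 holds the address of _ftext (0x2780) on entry, the link-time value of gp is 0x2b5b0 + 0x2780 == 0x2dd30. (At run time, t9 is 0x802780, so gp is really 0x82dd30, matching the value reported with the exception.) The global offset table resides at 0x25d40: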
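The gp set-up can be replayed in C as a sanity check. This sketch simply repeats the three instructions at _ftext, using the link-time constants from the dumps in this message. Passing the relocated t9 (load base 0x800000 plus 0x2780) reproduces the 0x82dd30 seen at run time. Subtracting the GOT address gives the expected OFFSET_GP_GOT.

```c
#include <stdint.h>

/* Link-time values taken from the objdump/nm output in this thread. */
enum {
    FTEXT_ADDR    = 0x2780,   /* address of _ftext; t9 points here on entry */
    GOT_ADDR      = 0x25d40,  /* _GLOBAL_OFFSET_TABLE_ */
    OFFSET_GP_GOT = 0x7ff0,   /* from uClibc ldso/mips/elfinterp.c */
    LOAD_BASE     = 0x800000  /* base address observed at run time */
};

/* Replays the _ftext prologue:
 *   lui   gp, 0x3          -> gp = 0x30000
 *   addiu gp, gp, -19024   -> gp = 0x2b5b0
 *   addu  gp, gp, t9       -> gp = 0x2b5b0 + t9 */
uint32_t ftext_gp(uint32_t t9)
{
    uint32_t gp = 0x3u << 16;
    gp += (uint32_t)-19024;   /* addiu sign-extends; wraps modulo 2^32 */
    gp += t9;
    return gp;
}
```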
00025d40 a _GLOBAL_OFFSET_TABLE_
And the difference between gp and the table (0x2dd30 - 0x25d40 == 0x7ff0) is as expected:
#define OFFSET_GP_GOT 0x7ff0
(See: pkg/l4re-core/uclibc/lib/contrib/uclibc/ldso/ldso/mips/elfinterp.c)
Where things seem to go wrong is with the computation of t9 before the call:
gp - 32176 == 0x2dd30 - 32176 == 0x25f80
The next symbol/section after the table is this one:
00025f90 g __dso_handle
So, the location of the address to be used (0x25f80) is within the region of the table. However, dumping the memory from the start of the table until the next section indicates that the address lies within an area that seems to be padding, appearing after all the meaningful entries and featuring only 0x800000 for each such entry.
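The position of the slot can be worked out the same way. With 4-byte GOT entries, 0x25f80 is entry 144 of the table, four entries before __dso_handle. An entry's gp-relative offset is its byte offset minus OFFSET_GP_GOT. For entry 144 this gives exactly the -32176 used by the lw instruction. The constants come from the symbol dump in this message. The helper names are hypothetical.

```c
#include <stdint.h>

enum {
    GOT_START      = 0x25d40, /* _GLOBAL_OFFSET_TABLE_ */
    DSO_HANDLE     = 0x25f90, /* next symbol after the table */
    OFFSET_GP_GOT  = 0x7ff0,  /* gp points this far past the GOT start */
    GOT_ENTRY_SIZE = 4        /* MIPS32: one 32-bit word per entry */
};

/* Index of the GOT entry at a given link-time address. */
uint32_t got_index(uint32_t addr) { return (addr - GOT_START) / GOT_ENTRY_SIZE; }

/* gp-relative offset that "lw reg, off(gp)" uses to load a given entry. */
int32_t gp_offset(uint32_t index)
{
    return (int32_t)(index * GOT_ENTRY_SIZE) - OFFSET_GP_GOT;
}
```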
I imagine that code fixes up the table, adding the object base address to each entry, and the padding is also adjusted because the loop just proceeds until it encounters the next section (or the end of the .got section).
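The following is a hypothetical, much-simplified illustration of that guess. It is not the actual uClibc ldso code, which treats local and global entries differently. If a loader added the load base to every entry of a GOT image, any zero-filled entry would become exactly 0x800000, the value found in the suspicious slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation pass: add the load base to each of
 * the first n GOT entries. Real MIPS loaders only do this for local
 * entries and resolve global entries through the symbol table. */
void relocate_got(uint32_t *got, size_t n, uint32_t base)
{
    for (size_t i = 0; i < n; i++)
        got[i] += base;
}
```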
What I cannot figure out is where the _ftext code actually comes from. It seems to be some kind of initialisation code, but the only things containing _ftext in the distribution are linker scripts. So I don't know where to find the offending operations.
I did test shared executables successfully on the CI20 which uses a different MIPS32 architecture revision. The code is rather different, perhaps employing different styles of code generation that never got applied to the earlier architecture revision, but I notice that the global offset tables are similarly sized and have the last meaningful entry referring to __dl_runtime_pltresolve.
I suppose that I need to figure out which code is responsible for the failing invocation and why the generated code is trying to access an uninitialised table entry. Although I suspect that some change I have made is responsible, there doesn't seem to be anything really obvious amongst my patches. The patches I needed to fix t9 initialisation are also in use for the CI20, so I doubt that they would have an effect, even if they were relevant here.
Sorry for the long message, but not being particularly familiar with the way dynamic linking works, I feel that reporting my observations might trigger the memories of those who have seen such problems before.
Paul
On Sunday 17. June 2018 01.07.44 Paul Boddie wrote:
> With these details, and using objdump to dump all the programs and libraries, I discovered that it comes from the _ftext section of libld-l4.so:
> 2f5c: 8f998250 lw t9,-32176(gp)
> 2f60: 24a55fa8 addiu a1,a1,24488
> 2f64: 0320f809 jalr t9
> 2f68: 24844ee4 addiu a0,a0,20196
> 2f6c: 8fbc0010 lw gp,16(sp)
> So, what appears to be happening is that the "jalr t9" instruction is using a value for t9 that is 0x800000, which causes a jump to the object header and the subsequent failure.
Some more investigation (and off-list encouragement) eventually led me to realise what might be happening here. The table entry providing the value for t9 happens to be a "global" entry in the global offset table. Looking at the output from "readelf -a" for libld-l4.so yields the following:
Global entries:
   Address     Access  Initial Sym.Val. Type    Ndx Name
  00025f80 -32176(gp) 00000000 00000000 FUNC    UND __register_frame_info
Here, we see the offending entry and what the program apparently expects to find. Other entries that are apparently undefined are the following:
_ITM_deregisterTMCloneTable
_ITM_registerTMCloneTable
__deregister_frame_info
(It is worth noting at this point that the "_ITM" symbols are also undefined in the CI20's libld-l4.so, whereas the other symbols do not appear at all.)
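Some background may help here. In the MIPS ELF ABI, the GOT begins with DT_MIPS_LOCAL_GOTNO local entries. These are followed by one global entry for each dynamic symbol from index DT_MIPS_GOTSYM onwards, in symbol table order. Slots after the last meaningful local entry can therefore belong to symbols such as __register_frame_info rather than being padding. The sketch below computes that index. The function name and the numbers in the test are hypothetical.

```c
#include <stdint.h>

/* MIPS ELF ABI GOT layout: local_gotno local entries (DT_MIPS_LOCAL_GOTNO)
 * come first, then one global entry per dynamic symbol starting at index
 * gotsym (DT_MIPS_GOTSYM). Returns the GOT index used for dynamic symbol
 * symndx, or -1 if that symbol has no global GOT entry. */
long mips_global_got_index(uint32_t local_gotno, uint32_t gotsym, uint32_t symndx)
{
    if (symndx < gotsym)
        return -1;
    return (long)local_gotno + (long)(symndx - gotsym);
}
```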
What I can see is that various libgcc binaries produced for my toolchain provide __register_frame_info, notably libgcc_eh.a and libgcc_s.so. However, amongst the libgcc binaries in the L4Re build directory, only the following provides the object:
pkg/l4re-core/libgcc/lib/OBJ-mips_32-l4f/libgcc-l4.so
It isn't clear to me what relation this L4Re-built object has to the other libraries, however. The build system logic in the mk directory doesn't refer to such libraries in an obvious way. Maybe all I am missing is some linking option, but I cannot be certain of this.
Paul
On Wednesday 27. June 2018 00.43.33 Paul Boddie wrote:
> Global entries:
>    Address     Access  Initial Sym.Val. Type    Ndx Name
>   00025f80 -32176(gp) 00000000 00000000 FUNC    UND __register_frame_info
> Here, we see the offending entry and what the program apparently expects to find. Other entries that are apparently undefined are the following:
> _ITM_deregisterTMCloneTable
> _ITM_registerTMCloneTable
> __deregister_frame_info
Here, I should have realised that the declarations and "null" definitions in pkg/l4re-core/ldso/ldso/fixup.c were important in this context, having already looked at it amongst many other files while trying to understand what was happening.
Fortunately, my off-list helper, Jean Wolter, kindly pointed me to this file and suggested that I add definitions for these symbols being produced by my compiler (GCC 6.4.0 from Buildroot). Presumably, these symbols are not produced by earlier GCC versions, or perhaps not by the compilers employed in the MIPS porting exercise.
I did see if I could configure GCC to not use these symbols at all, given that there are tests in the code incorporated into executables (libgcc/crtstuff.c) for the presence of such symbols, but this was beyond me. So, I added the following to fixup.c:
void __deregister_frame_info(void);
void __register_frame_info(void);
void _ITM_deregisterTMCloneTable(void);
void _ITM_registerTMCloneTable(void);

void __deregister_frame_info(void) {}
void __register_frame_info(void) {}
void _ITM_deregisterTMCloneTable(void) {}
void _ITM_registerTMCloneTable(void) {}
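For context, crtstuff.c declares these hooks as weak references and only calls them when their address is non-null, which is why an unresolved entry matters. The C sketch below shows that pattern with a made-up hook name. Suppose a loader filled such an entry with something non-null, such as the bare load base. The check would then pass and the call would jump to garbage, which would be consistent with the crash described in this thread. Defining the stubs in fixup.c sidesteps this by making the calls harmless.

```c
#include <stddef.h>

/* Weak reference to a hypothetical hook that nothing in this program
 * defines: the linker resolves it to a null address instead of failing. */
extern void hypothetical_hook(void) __attribute__((weak));

/* The crtstuff.c-style check: only call the hook if it was resolved.
 * Returns 1 if the hook was called, 0 if it was absent. */
int call_hook_if_present(void)
{
    if (!hypothetical_hook)
        return 0;
    hypothetical_hook();
    return 1;
}
```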
Anyway, as a consequence, I got the shared-hello example to function and to write its output to fbterminal. Many thanks must go to Jean for suggesting things to look at over the course of a few mails.
It was, I suppose, interesting to go from a coprocessor error, related to the accidental execution of an ELF magic number word that happens to resemble a MIPS floating point instruction, through the different routines related to library loading and initialisation in an attempt to identify some mystery code (from crtstuff.c), examining global offset table entries and their potential misinterpretation, before arriving at such a simple solution.
Paul
l4-hackers@os.inf.tu-dresden.de