Hello,
I've been trying to coerce L4Re and Fiasco.OC to work with the Ben NanoNote, which is related to the MIPS Creator CI20 that is (mostly - see earlier discussions) supported by this software. There are a few challenges involved, such as avoiding MIPS32r2 instructions that the NanoNote's SoC doesn't support (JZ4740 in the NanoNote versus JZ4780 in the CI20), and positioning things appropriately to avoid upsetting the installed bootloader.
Since I last posted anything about this, I've managed to get Fiasco to go about its bootstrapping business, proceeding from the bootstrap package, entering the kernel, running kernel_main, doing various initialisation tasks, enabling interrupts, and ending up in the init_workload method where a sigma0 thread and a boot thread are created and activated.
It is at this point that I seem to have encountered a particularly stubborn problem: how to get the sigma0 thread to activate and return control to init_workload for the boot thread to be activated, then going on to finish what little remains of the bootstrapping process. Strangely, the boot thread has no difficulty in being activated and returning control if I put it first in the sequence, but sigma0 seems to cause something different to happen.
Now, for this activation, it seems that the real action begins in Context::switch_cpu, where an exchange of stack pointers occurs and a branch is made to the Context::switchin_context method, with the old context presumably being made to resume after the branchpoint. As far as I can tell, both threads manage to set this up correctly. I see that the new context then causes Thread::user_invoke to be called, with execution proceeding via the ret_from_user_invoke routine and ultimately into the user task. Looking at the address to which the CPU will "return" to in user mode, all seems reasonable and consistent with the configuration of sigma0.
However, one thing that baffles me somewhat is the way that interrupts are disabled in Thread::user_invoke. Given that the CPU ends up in the task in user mode, it would surely need some kind of exception or interrupt for the kernel to be re-entered, yet I don't see any operation that re-enables interrupts anywhere. Maybe this is the cause of my problem, but I don't understand how sigma0 would be different from anything else.
I have needed to change some CPU-level operations to adapt the code to the earlier microarchitecture revision, but I don't think I've made any obvious mistakes, and I have seemingly figured out how to describe the interrupt system because for a while I couldn't get past the Delay::init invocation in the bootstrap process which relies on interrupts working. (Some comments in the code would have helped me guess what numbers to use in certain invocations.)
Does anyone have any thoughts or guidance about what I should be looking at?
Thanks in advance and sorry for the wall of text!
Paul
Hi Paul,
On Sun Mar 04, 2018 at 22:25:12 +0100, Paul Boddie wrote:
> I've been trying to coerce L4Re and Fiasco.OC to work with the Ben NanoNote, which is related to the MIPS Creator CI20 that is (mostly - see earlier discussions) supported by this software. There are a few challenges involved, such as avoiding MIPS32r2 instructions that the NanoNote's SoC doesn't support (JZ4740 in the NanoNote versus JZ4780 in the CI20), and positioning things appropriately to avoid upsetting the installed bootloader.
> Since I last posted anything about this, I've managed to get Fiasco to go about its bootstrapping business, proceeding from the bootstrap package, entering the kernel, running kernel_main, doing various initialisation tasks, enabling interrupts, and ending up in the init_workload method where a sigma0 thread and a boot thread are created and activated.
> It is at this point that I seem to have encountered a particularly stubborn problem: how to get the sigma0 thread to activate and return control to init_workload for the boot thread to be activated, then going on to finish what little remains of the bootstrapping process. Strangely, the boot thread has no difficulty in being activated and returning control if I put it first in the sequence, but sigma0 seems to cause something different to happen.
> Now, for this activation, it seems that the real action begins in Context::switch_cpu, where an exchange of stack pointers occurs and a branch is made to the Context::switchin_context method, with the old context presumably being made to resume after the branchpoint. As far as I can tell, both threads manage to set this up correctly. I see that the new context then causes Thread::user_invoke to be called, with execution proceeding via the ret_from_user_invoke routine and ultimately into the user task. Looking at the address to which the CPU will "return" to in user mode, all seems reasonable and consistent with the configuration of sigma0.
> However, one thing that baffles me somewhat is the way that interrupts are disabled in Thread::user_invoke. Given that the CPU ends up in the task in user mode, it would surely need some kind of exception or interrupt for the kernel to be re-entered, yet I don't see any operation that re-enables interrupts anywhere. Maybe this is the cause of my problem, but I don't understand how sigma0 would be different from anything else.
> I have needed to change some CPU-level operations to adapt the code to the earlier microarchitecture revision, but I don't think I've made any obvious mistakes, and I have seemingly figured out how to describe the interrupt system because for a while I couldn't get past the Delay::init invocation in the bootstrap process which relies on interrupts working. (Some comments in the code would have helped me guess what numbers to use in certain invocations.)
> Does anyone have any thoughts or guidance about what I should be looking at?
All what you write sounds good. In any case the eret must restore state including setting the right interrupt state. Are you getting timer interrupts when sigma0 shall run, or is there silence? Is ESC working to get into jdb?
Adam
On Tuesday 6. March 2018 00.46.29 Adam Lackorzynski wrote:
> All what you write sounds good. In any case the eret must restore state including setting the right interrupt state. Are you getting timer interrupts when sigma0 shall run, or is there silence? Is ESC working to get into jdb?
Thanks for the reply as usual! :-)
After Proc::cli is called in user_invoke, I don't think any interrupts will be delivered, and if I display the status register, the IE (interrupt enable) bit is indeed not set. So I wouldn't expect any timer interrupts unless something else enables interrupts again, but I can't find any statement where this gets done.
Here, I think that I *might* have transcribed some operation incorrectly, leaving interrupts disabled when they should be re-enabled. The eret shouldn't itself re-enable interrupts, as far as I remember from messing around with my own boot payloads, since it merely clears the EXL (exception level) bit which prevents interrupts if set (and then jumps to EPC, of course).
(Thinking about it, EXL isn't even set when I check the status register, but if allowing interrupts in kernel mode, it is customary to clear it, from what I have read, so maybe Fiasco does that.)
Now, I have transcribed the di instruction to the supposedly-equivalent status register operations that clear IE, and the ei instruction to the operations that set IE, both of these featuring in the Proc::cli and Proc::sti methods. Maybe these instructions should be transcribed to set and clear EXL, however, even though that is not what di and ei do.
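The substitution described above can be modelled as pure functions on the Status register value. This is a minimal sketch, not the Fiasco code: the function names are made up for illustration, and on real hardware each would be wrapped in an mfc0/mtc0 pair on CP0 register 12. The bit positions are those defined by the MIPS32 architecture. The last helper shows why toggling IE is a different operation from toggling EXL, even though either bit can block interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CP0 Status bits as defined by the MIPS32 architecture. */
#define ST_IE  (1u << 0)  /* global interrupt enable */
#define ST_EXL (1u << 1)  /* exception level */
#define ST_ERL (1u << 2)  /* error level */

/* Hypothetical helper: value to write back in place of "di" (IE cleared). */
uint32_t status_after_di(uint32_t status)
{
    return status & ~ST_IE;
}

/* Hypothetical helper: value to write back in place of "ei" (IE set). */
uint32_t status_after_ei(uint32_t status)
{
    return status | ST_IE;
}

/* Interrupts can only be taken when IE is set and both EXL and ERL are
 * clear (the per-source IM mask bits are ignored in this sketch). */
bool interrupts_enabled(uint32_t status)
{
    return (status & (ST_IE | ST_EXL | ST_ERL)) == ST_IE;
}

/* Usage example with an assumed starting value where only IM2 is set. */
void example_di_ei(void)
{
    uint32_t status = 0x00000400u;

    status = status_after_ei(status);
    assert(interrupts_enabled(status));

    /* Setting EXL blocks interrupts even though IE stays set. */
    assert(!interrupts_enabled(status | ST_EXL));

    status = status_after_di(status);
    assert(!interrupts_enabled(status));
}
```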
As for jdb and UART interactions, I've had to use more primitive techniques because I can't establish a reliable physical connection to the relevant pins. Fortunately, I can take over the framebuffer and display simple bit patterns (to keep debugging code at a minimum), and this is how I can comment on things like the status register. Yes, it is a slow and tedious way of working, but I've used it successfully before. :-)
Do you have any idea where this missing re-enabling statement might be, or should I really be manipulating EXL instead of IE?
Thanks once again for indulging me!
Paul
On Tue Mar 06, 2018 at 01:14:25 +0100, Paul Boddie wrote:
> On Tuesday 6. March 2018 00.46.29 Adam Lackorzynski wrote:
> > All what you write sounds good. In any case the eret must restore state including setting the right interrupt state. Are you getting timer interrupts when sigma0 shall run, or is there silence? Is ESC working to get into jdb?
> Thanks for the reply as usual! :-)
> After Proc::cli is called in user_invoke, I don't think any interrupts will be delivered, and if I display the status register, the IE (interrupt enable) bit is indeed not set. So I wouldn't expect any timer interrupts unless something else enables interrupts again, but I can't find any statement where this gets done.
> Here, I think that I *might* have transcribed some operation incorrectly, leaving interrupts disabled when they should be re-enabled. The eret shouldn't itself re-enable interrupts, as far as I remember from messing around with my own boot payloads, since it merely clears the EXL (exception level) bit which prevents interrupts if set (and then jumps to EPC, of course).
> (Thinking about it, EXL isn't even set when I check the status register, but if allowing interrupts in kernel mode, it is customary to clear it, from what I have read, so maybe Fiasco does that.)
> Now, I have transcribed the di instruction to the supposedly-equivalent status register operations that clear IE, and the ei instruction to the operations that set IE, both of these featuring in the Proc::cli and Proc::sti methods. Maybe these instructions should be transcribed to set and clear EXL, however, even though that is not what di and ei do.
> As for jdb and UART interactions, I've had to use more primitive techniques because I can't establish a reliable physical connection to the relevant pins. Fortunately, I can take over the framebuffer and display simple bit patterns (to keep debugging code at a minimum), and this is how I can comment on things like the status register. Yes, it is a slow and tedious way of working, but I've used it successfully before. :-)
> Do you have any idea where this missing re-enabling statement might be, or should I really be manipulating EXL instead of IE?
The asm code sets cp0_status upon exit which includes enabling interrupts. Are you sure you're not getting any timer interrupts when supposedly running inside sigma0? (Flipping some pixels in the timer handler...)
Adam
On Wednesday 7. March 2018 00.27.34 Adam Lackorzynski wrote:
> The asm code sets cp0_status upon exit which includes enabling interrupts. Are you sure you're not getting any timer interrupts when supposedly running inside sigma0? (Flipping some pixels in the timer handler...)
You beat me to a reply! What I was writing just now was that I found the place where IE gets set by searching for EXL and (re)discovering the Cp0_status::status_eret_to_user_ei method, which provides the appropriate value for the status register, incorporating UM (KSU=1), EXL and IE.
This value gets stored in the copy of the register for the thread, and then in the assembly language routine containing eret, it gets transferred to the actual status register by the instructions in the restore_cp0_status macro.
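To make the sequence concrete, here is a small model of what happens to the Status value. It is only a sketch under stated assumptions: user_return_status is a hypothetical stand-in for what Cp0_status::status_eret_to_user_ei is described as producing (it is not copied from Fiasco), and status_after_eret follows the architectural definition of eret, which clears ERL if it is set and otherwise clears EXL. With UM, EXL and IE all written before the eret, the instruction drops EXL and the CPU arrives in user mode with interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CP0 Status bits as defined by the MIPS32 architecture. */
#define ST_IE  (1u << 0)  /* global interrupt enable */
#define ST_EXL (1u << 1)  /* exception level */
#define ST_ERL (1u << 2)  /* error level */
#define ST_UM  (1u << 4)  /* user mode (upper bit of the KSU field) */

/* Hypothetical stand-in for the value stored in the thread's saved
 * Status copy before returning to user mode. */
uint32_t user_return_status(uint32_t base)
{
    return base | ST_UM | ST_EXL | ST_IE;
}

/* Effect of eret on Status: clear ERL if set, otherwise clear EXL. */
uint32_t status_after_eret(uint32_t status)
{
    if (status & ST_ERL)
        return status & ~ST_ERL;
    return status & ~ST_EXL;
}

/* The CPU runs in user mode only when UM is set and EXL/ERL are clear. */
bool in_user_mode(uint32_t status)
{
    return (status & (ST_UM | ST_EXL | ST_ERL)) == ST_UM;
}

/* Interrupts are taken only when IE is set and EXL/ERL are clear. */
bool interrupts_enabled(uint32_t status)
{
    return (status & (ST_IE | ST_EXL | ST_ERL)) == ST_IE;
}

/* Usage example: start from a Status value with only IM2 set. */
void example_eret_to_user(void)
{
    uint32_t saved = user_return_status(0x00000400u);

    /* Still at exception level: kernel mode, interrupts blocked. */
    assert(!in_user_mode(saved));
    assert(!interrupts_enabled(saved));

    uint32_t after = status_after_eret(saved);
    assert(in_user_mode(after));
    assert(interrupts_enabled(after));
}
```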
This contradicts what I wrote earlier because I had tested the status register before the restore_cp0_status macro, not realising that it might set IE. At that point, only IM2 is set (indicating which interrupt source should be enabled). Sorry for the inadvertent misdirection!
So, the conditions for returning to user mode seem to be present together with the conditions for subsequent interrupts, and for re-entering the kernel on timer interrupts, but somehow the activation of the sigma0 thread doesn't succeed.
Currently, I have reason to believe that an exception occurs causing the sigma0 thread to terminate, but it's getting late and my debugging efficiency is suffering. I think that when the thread terminates, it has the following cause register flags set:
ExcCode = 0b01011 (= 11, coprocessor unusable)
IP2 = 1
CE = 0b01
The error exception program counter seems to be given as 0x80210000, which doesn't sound consistent with a user mode address, but perhaps the kernel is using that register for something else.
So maybe there's some FPU stuff that I haven't managed to eradicate in the L4Re code.
Paul
On Wednesday 7. March 2018 01.22.46 Paul Boddie wrote:
> Currently, I have reason to believe that an exception occurs causing the sigma0 thread to terminate, but it's getting late and my debugging efficiency is suffering. I think that when the thread terminates, it has the following cause register flags set:
> ExcCode = 0b01011 (= 11, coprocessor unusable)
> IP2 = 1
> CE = 0b01
> The error exception program counter seems to be given as 0x80210000, which doesn't sound consistent with a user mode address, but perhaps the kernel is using that register for something else.
> So maybe there's some FPU stuff that I haven't managed to eradicate in the L4Re code.
None of this appeared to be accurate, probably because I was reading directly from the cause register when I should have been reading the saved version. Anyway, I decided to make the kernel stop upon receiving the first thread-halting condition, and this yielded the following details:
* Stored cause has ExcCode 4 (address error, load/instruction fetch)
* Stored exception program counter is 0x2092ac
* Stored bad virtual address is 0xfff2aff4
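For reference, the fields mentioned in this thread can be pulled out of a saved Cause value as follows. This is a sketch with made-up helper names; the field positions (ExcCode in bits 6..2, IP in bits 15..8, CE in bits 29..28) are those of the MIPS32 Cause register, and the name table covers only a handful of codes. The example value is constructed for illustration and is not a dump from the NanoNote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ExcCode occupies bits 6..2 of the Cause register. */
unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1fu;
}

/* CE (coprocessor number for "coprocessor unusable") is bits 29..28. */
unsigned cause_ce(uint32_t cause)
{
    return (cause >> 28) & 0x3u;
}

/* Pending interrupt bit IPn lives at bit 8 + n, for n in 0..7. */
bool cause_ip(uint32_t cause, unsigned n)
{
    return ((cause >> (8u + n)) & 1u) != 0;
}

/* Mnemonics for a few of the architecturally defined exception codes. */
const char *exccode_name(unsigned code)
{
    switch (code) {
    case 0:  return "Int";   /* interrupt */
    case 4:  return "AdEL";  /* address error, load or instruction fetch */
    case 5:  return "AdES";  /* address error, store */
    case 10: return "RI";    /* reserved instruction */
    case 11: return "CpU";   /* coprocessor unusable */
    default: return "other";
    }
}

/* Usage example with an illustrative value: ExcCode 4 and IP2 pending. */
void example_cause(void)
{
    uint32_t cause = (4u << 2) | (1u << 10);

    assert(strcmp(exccode_name(cause_exccode(cause)), "AdEL") == 0);
    assert(cause_ip(cause, 2));
    assert(cause_ce(cause) == 0);
}
```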
In sigma0, I found the following rather interesting, produced using objdump with some comments added by me:
  209290: 3c02fff3  lui   v0,0xfff3
  209294: 24422000  addiu v0,v0,8192      // 0xfff32000
  209298: afc2011c  sw    v0,284(s8)
  20929c: 7c02e83b  0x7c02e83b            // rdhwr v0, $29 (ULR)
  2092a0: afc20118  sw    v0,280(s8)
  2092a4: 8fc20118  lw    v0,280(s8)
  2092a8: 8fc30158  lw    v1,344(s8)
  2092ac: 8c508ff4  lw    s0,-28684(v0)   // 0xfff2aff4
Evidently, v0 was not getting updated by the rdhwr instruction, which should be handled as a reserved instruction. Thus, the previous value of v0 was being combined with the offset in the final instruction above, causing a read from a completely invalid address that happens to feature in the saved bad virtual address register.
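The arithmetic can be checked mechanically. The sketch below uses hypothetical helper names and is not the kernel's emulation code: it decodes the undisassembled word 0x7c02e83b using the MIPS32 release 2 rdhwr encoding (SPECIAL3 opcode, function 0x3b, destination in rt, hardware register number in rd), and then computes the effective address of the faulting lw from the stale v0 value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract 'width' bits of an instruction word starting at bit 'lo'. */
uint32_t insn_field(uint32_t insn, unsigned lo, unsigned width)
{
    return (insn >> lo) & ((1u << width) - 1u);
}

/* rdhwr rt, rd: opcode SPECIAL3 (0x1f), rs = 0, bits 10..6 = 0,
 * function 0x3b, per the MIPS32 release 2 encoding. */
bool is_rdhwr(uint32_t insn)
{
    return insn_field(insn, 26, 6) == 0x1fu &&
           insn_field(insn, 21, 5) == 0 &&
           insn_field(insn, 6, 5) == 0 &&
           insn_field(insn, 0, 6) == 0x3bu;
}

/* Destination general-purpose register (rt) and hardware register (rd). */
unsigned rdhwr_rt(uint32_t insn) { return insn_field(insn, 16, 5); }
unsigned rdhwr_rd(uint32_t insn) { return insn_field(insn, 11, 5); }

/* Effective address of a load: base plus sign-extended 16-bit offset. */
uint32_t load_address(uint32_t base, uint32_t insn)
{
    int32_t offset = (int32_t)(insn & 0xffffu);
    if (offset & 0x8000)
        offset -= 0x10000;
    return base + (uint32_t)offset;
}

/* Usage example with the words from the disassembly above. */
void example_rdhwr(void)
{
    assert(is_rdhwr(0x7c02e83bu));
    assert(rdhwr_rt(0x7c02e83bu) == 2);   /* v0 is register 2 */
    assert(rdhwr_rd(0x7c02e83bu) == 29);  /* hardware register 29, ULR */

    /* With v0 still holding 0xfff32000, the lw faults at this address. */
    assert(load_address(0xfff32000u, 0x8c508ff4u) == 0xfff2aff4u);
}
```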
So, I had a look at my code and discovered that I had indeed transcribed an operation incorrectly: it was related to implementing the ins instruction for this SoC; I had employed the wrong temporary register and was losing the original instruction details when attempting to get the target register.
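For anyone making the same substitution, the semantics that a replacement sequence for ins has to reproduce can be written down in C. This is only a reference model with a made-up function name, not the assembly that went into the port: ins rt, rs, pos, size overwrites size bits of rt, starting at bit pos, with the low size bits of rs and leaves every other bit of rt alone. A hand-written sequence needs a scratch register for the mask, which is where a register mix-up of the kind described above can creep in.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of "ins rt, rs, pos, size": returns the new value of rt.
 * Requires 1 <= size and pos + size <= 32, as the instruction does. */
uint32_t emulate_ins(uint32_t rt, uint32_t rs, unsigned pos, unsigned size)
{
    assert(size >= 1 && pos + size <= 32);

    /* Mask with 'size' low bits set, avoiding an undefined 32-bit shift. */
    uint32_t low = (size == 32) ? 0xffffffffu : ((1u << size) - 1u);
    uint32_t mask = low << pos;

    /* Keep the bits of rt outside the field, take the field from rs. */
    return (rt & ~mask) | ((rs << pos) & mask);
}

/* Usage example: clear an 8-bit field, then fill a 4-bit field. */
void example_ins(void)
{
    assert(emulate_ins(0xffffffffu, 0x0u, 4, 8) == 0xfffff00fu);
    assert(emulate_ins(0x00000000u, 0xabcdu, 8, 4) == 0x00000d00u);
}
```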
At this point, I think Fiasco now starts up. However, I now need to figure out how to perform the necessary operations to reinitialise the framebuffer and do interesting things like provide feedback on what is actually going on in my example programs. I hope it gets a bit easier from this point, though. :-)
Paul
On Thu Mar 08, 2018 at 00:52:55 +0100, Paul Boddie wrote:
> On Wednesday 7. March 2018 01.22.46 Paul Boddie wrote:
> > Currently, I have reason to believe that an exception occurs causing the sigma0 thread to terminate, but it's getting late and my debugging efficiency is suffering. I think that when the thread terminates, it has the following cause register flags set:
> > ExcCode = 0b01011 (= 11, coprocessor unusable)
> > IP2 = 1
> > CE = 0b01
> > The error exception program counter seems to be given as 0x80210000, which doesn't sound consistent with a user mode address, but perhaps the kernel is using that register for something else.
> > So maybe there's some FPU stuff that I haven't managed to eradicate in the L4Re code.
> None of this appeared to be accurate, probably because I was reading directly from the cause register when I should have been reading the saved version. Anyway, I decided to make the kernel stop upon receiving the first thread-halting condition, and this yielded the following details:
> - Stored cause has ExcCode 4 (address error, load/instruction fetch)
> - Stored exception program counter is 0x2092ac
> - Stored bad virtual address is 0xfff2aff4
> In sigma0, I found the following rather interesting, produced using objdump with some comments added by me:
>   209290: 3c02fff3  lui   v0,0xfff3
>   209294: 24422000  addiu v0,v0,8192      // 0xfff32000
>   209298: afc2011c  sw    v0,284(s8)
>   20929c: 7c02e83b  0x7c02e83b            // rdhwr v0, $29 (ULR)
>   2092a0: afc20118  sw    v0,280(s8)
>   2092a4: 8fc20118  lw    v0,280(s8)
>   2092a8: 8fc30158  lw    v1,344(s8)
>   2092ac: 8c508ff4  lw    s0,-28684(v0)   // 0xfff2aff4
> Evidently, v0 was not getting updated by the rdhwr instruction, which should be handled as a reserved instruction. Thus, the previous value of v0 was being combined with the offset in the final instruction above, causing a read from a completely invalid address that happens to feature in the saved bad virtual address register.
> So, I had a look at my code and discovered that I had indeed transcribed an operation incorrectly: it was related to implementing the ins instruction for this SoC; I had employed the wrong temporary register and was losing the original instruction details when attempting to get the target register.
> At this point, I think Fiasco now starts up. However, I now need to figure out how to perform the necessary operations to reinitialise the framebuffer and do interesting things like provide feedback on what is actually going on in my example programs. I hope it gets a bit easier from this point, though. :-)
Good!
Adam
l4-hackers@os.inf.tu-dresden.de