Hi, I have just managed to boot L4-Linux and run "ls" etc. on a 486 without it rebooting. At present it seems that Fiasco must be a recent version, but the version of L4-Linux does not matter: both the previous snapshot (L4-Linux 2.0.21) and the anonymous CVS version (linux22, which I checked out on July 26) work. I remember that after linux22 was announced as available via CVS, I saw the 486 reboot several times when running "ls". So I will keep searching for the critical point.
After several attempts at remote debugging, I think the kernel debugger is not powerful enough to show how the L4-Linux server breaks down. It is not easy for me to understand what happens inside the server program from the system calls alone. Therefore I am now looking for a remote debug stub for server programs on Fiasco. The OSKit documentation says that the debug stub in the OSKit is easy to reuse for user-space programs, so I will try that. But is there already a working debug stub?
suzuki
P.S. By the way, does wait_for_keypress() (linux22/drivers/char/tty_io.c) in the latest linux22 work well? It is called when mounting the root filesystem from a floppy, or from a ramdisk loaded via floppy, like this:

    printk(KERN_NOTICE "VFS: Insert root floppy and press ENTER\n");
    wait_for_keypress();

(from linux/fs/super.c). In my testing, the latest linux22 is killed when it calls wait_for_keypress(), whereas the previous snapshot of L4-Linux (2.0.21) gets past wait_for_keypress() safely. The debugger output when linux22 is killed looks like this:
grub> kernel=(fd0)/rmgr -nopentium -configfile -sigma0
   [Multiboot-elf, <0x100000:0x20320:0x0>,<0x121320:0x418:0x26dac>,entry=0x100000]
grub> module=(fd0)/main -nokdb -nojdb
   [Multiboot-module @ 0x149000, 0x24200 bytes]
grub> module=(fd0)/sigma0
   [Multiboot-module @ 0x16e000, 0xcb26 bytes]
grub> module=(fd0)/rmgr.cfg
   [Multiboot-module @ 0x17b000, 0x56d bytes]
grub> module=(fd0)/glinux.gz init=/bin/sh root=/dev/fd1
   [Multiboot-module @ 0x17c000, 0xe428 bytes]

RMGR: loading task (fd0)/glinux.gz init=/bin/sh root=/dev/fd1 from 0x17c000-0x260238 to [ 0x3ff000-0x4b2170 0x4b4000-0x4fe034 ]
RMGR: starting task (fd0)/glinux.gz init=/bin/sh root=/dev/fd1 from 0x17c000-0x260238 at entry 0x3ff000 via trampoline page code 0x26114c
[...snip...]
VFS: Insert root floppy and press ENTER
Dump of trap_state at 0xc0141fb4:
EAX 00000000 EBX 00000001 ECX 004e1646 EDX 00000001
ESI 00000000 EDI 00000000 EBP 00000000 ESP 00d05f5c
EIP 00000003 EFLAGS 00013a92
CS 0023 SS 002b DS 002b FS 002b GS 002b
trapno 6, error 00000000, from user mode
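For reading a dump like the one above, a small decoder can help. The following is only a minimal sketch in C; it is not part of Fiasco or L4-Linux, and the function and macro names are made up for illustration. The bit positions and exception names are the architectural x86 ones. Applied to the values above, it reports trap 6 as an invalid-opcode exception, the interrupt-enable flag as set, IOPL 3, and CS 0x23 as a ring-3 selector. An EIP of 3 would be consistent with a jump through a bad or null function pointer, though the dump alone does not prove that.

```c
#include <assert.h>
#include <stdint.h>

/* Selected architectural EFLAGS bits. */
#define EFLAGS_IF (1u << 9)   /* interrupt-enable flag */
#define EFLAGS_RF (1u << 16)  /* resume flag */
#define EFLAGS_VM (1u << 17)  /* virtual-8086 mode */

/* I/O privilege level: EFLAGS bits 12 and 13. */
unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags >> 12) & 3u;
}

/* Requested privilege level: the low two bits of a segment selector. */
unsigned selector_rpl(uint16_t selector)
{
    return selector & 3u;
}

/* Names of the first eight x86 exception vectors (hypothetical helper). */
const char *trap_name(unsigned trapno)
{
    static const char *const names[] = {
        "divide error", "debug", "NMI", "breakpoint", "overflow",
        "BOUND range exceeded", "invalid opcode", "device not available",
    };
    assert(sizeof names / sizeof names[0] == 8);
    return trapno < 8 ? names[trapno] : "other";
}
```

For example, `trap_name(6)` gives "invalid opcode" and `eflags_iopl(0x00013a92)` gives 3.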
Just I could boot L4-Linux and execute "ls" etc on 486, without rebooting.
L4-Linux is running quite nicely for me now, but I can still crash it by doing cksum /usr/bin/* over CVS. It dies after saying "irq still active". (To get that message I'm using the patch that Michael Hohmuth posted to the list on July 25.)
I think I may have found a bug in the SANITY code in linux22/arch/l4-i386/kernel/irq.c, by the way. See below.
After several trying of remote debug, I think "kernel"-debugger is not so much powerful to detect how L4-Linux "server" breaks down. It's not easy for me to understand what happens in the server program from the system calls.
What I have been doing is telling GDB "symbol-file .../vmlinux" to inspect what's happening in Linux, then "symbol-file .../kernel.image" to go back to inspecting Fiasco. There's probably some way of defining macros to make that switching a bit easier. I don't know if GDB could use both sets of symbols at the same time.
It took me ages to discover how to find out the current location (program counter) of a given thread. (If you've got thread_t *t, the thread is running at *(void **)((char *)t+0x7ec), IIRC.)
If there's one good thing to come out of my playing around with L4-Linux, I've learnt a bit about GDB ...
(This is a bit off topic, but does anyone know a better solution to this one? I tell GDB "p spong" and get a value that is totally wrong. This is because the variable has been optimised into a register. So to find out its value I use "disassemble" and "info registers". Is there a quicker way?)
Edmund
This is what I did to linux22/arch/l4-i386/kernel/irq.c:
In timer_irq_thread(), I moved
#ifdef SANITY
    irqs_in_progress &= ~(1 << TIMER_IRQ);
#endif
from its present position just before the call to l4_i386_ipc_receive() to just before the call to execute_bottom_halves(). This got rid of the "irq active" messages which I think are spurious.
Sorry for the lack of patch but I've rather lost control of the versions of that file ...
1000 Thanks to Edmund!
It dies after saying "irq still active". (To get that message I'm using the patch that Michael Hohmuth posted to the list on July 25.) I think I may have found a bug in the SANITY code in linux22/arch/l4-i386/kernel/irq.c, by the way. See below.
I remember seeing the same message, followed by a crash. Thanks for the patch.
What I have been doing is telling GDB "symbol-file .../vmlinux" to inspect what's happening in Linux, then "symbol-file .../kernel.image" to go back to inspecting Fiasco. There's probably some way of defining macros to make that switching a bit easier.
Ahh, thanks! Following your instructions, I was able to trace L4-Linux. I see: L4-Linux is loaded into real, wired memory without virtualization, so the kernel debugger can inspect and modify the text/data of L4-Linux directly.
suzuki
It dies after saying "irq still active". (To get that message I'm using the patch that Michael Hohmuth posted to the list on July 25.) I think I may have found a bug in the SANITY code in linux22/arch/l4-i386/kernel/irq.c, by the way. See below.
I remember, I've ever seen same message & following crash, thanks for the patch.
Unfortunately I still don't understand why the irq is "still active".
Here's the corresponding bit of code in irq.c. (The lines which aren't indented as much as they should be came from Michael's debugging patch.)
mask = 1 << (irq & 7);
if (irq < 8) {
    outb(inb(0x21) | mask, 0x21); /* block the irq */
    outb(0x20, 0x20);             /* acknowledge the irq */
    outb(0x0B, 0x20);
    if (inb(0x20) & mask)
        enter_kdebug("irq still active");
} else {
    unsigned foo;
    outb(inb(0xA1) | mask, 0xA1); /* block */
    outb(0x20, 0xA0);             /* acknowledge */
    outb(0x0B, 0xA0);
    if ((foo = inb(0xA0)) == 0)
        outb(0x20, 0x20);
    if (foo & mask)
        enter_kdebug("irqslave still active");
}
If you look at the bottom of kernel/fiasco/src/irq.h you'll find an unused function irq_ack() which does roughly the same thing, because acknowledging the IRQ is something that ought to be done by L4/Fiasco, but is at present done by Linux.
So why do we get "irq still active" during heavy use of the network card? Is there a PIC expert in the house?
Is it possible that the problem is caused by the interrupt not being acknowledged quickly enough? If so, maybe I should move the ack from Linux into Fiasco ...
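For anyone experimenting with where the acknowledge should live, the port and mask arithmetic used in the code above can be written down in one place. This is a minimal sketch in C, assuming the standard PC layout (master 8259 at 0x20/0x21, slave at 0xA0/0xA1); the struct and function names are hypothetical and do not come from Fiasco or L4-Linux. It also computes the OCW2 "specific EOI" byte (0x60 plus the line number), which the code above does not use; the code above sends the non-specific EOI 0x20.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one interrupt line on the PC's two 8259s. */
typedef struct {
    uint16_t cmd_port;      /* 0x20 for the master, 0xA0 for the slave */
    uint16_t data_port;     /* 0x21 / 0xA1: the interrupt mask register */
    uint8_t  mask;          /* bit of this line within its controller */
    uint8_t  specific_eoi;  /* OCW2 specific EOI for this line */
} pic_line;

pic_line pic_line_for_irq(unsigned irq)
{
    pic_line line;

    assert(irq < 16);
    line.cmd_port     = irq < 8 ? 0x20 : 0xA0;
    line.data_port    = irq < 8 ? 0x21 : 0xA1;
    line.mask         = (uint8_t)(1u << (irq & 7));
    line.specific_eoi = (uint8_t)(0x60u | (irq & 7));
    return line;
}
```

For IRQ 10 this yields command port 0xA0, data port 0xA1, mask 0x04 and specific EOI 0x62.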
Edmund
So why do we get "irq still active" during heavy use of the network card? Is there a PIC expert in the house?
Is it possible that the problem is caused by the interrupt not being acknowledged quickly enough? If so, maybe I should move the ack from Linux into Fiasco ...
If you know an easy way to reproduce the interrupt bug, please let me know. I want to try it, because I cannot remember under what conditions I saw such a crash.
However, I'm testing Fiasco on an old 486 PC and the ethernet card is a 3c507 on ISA. I'm afraid the ethernet device is too slow to overload the IRQ handler...
suzuki
Are interrupts disabled continuously from the raising of the interrupt until that code is executed? (An unspecific EOI (0x20) always clears the highest-priority IRQ in service. If an interrupt can happen between raising the original interrupt and executing the unspecific EOI, the EOI might clear a higher IRQ instead of the currently handled IRQ. Consequences: arbitrary confusion: the higher IRQ is cleared too early, the next unspecific EOI clears a lower IRQ, ...)
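The effect described here can be reproduced with a toy model of a single 8259's in-service register. This is only a sketch in C under simplifying assumptions (fixed priorities with IRQ0 highest, no cascade, no rotation); all names are invented for illustration. If IRQ 3 gets into service while the handler for IRQ 5 is still running, the non-specific EOI issued on behalf of IRQ 5 clears IRQ 3, and a check of the form "ISR & mask" for IRQ 5 then still sees the bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: bit n of isr is set while IRQ n is "in service". */
typedef struct {
    uint8_t isr;
} pic_model;

/* The CPU has acknowledged this IRQ: its in-service bit becomes set. */
void pic_begin_service(pic_model *pic, int irq)
{
    assert(irq >= 0 && irq < 8);
    pic->isr |= (uint8_t)(1u << irq);
}

/* Non-specific EOI (0x20): clears the highest-priority in-service bit,
 * i.e. the lowest-numbered one. Returns the IRQ cleared, or -1 if none. */
int pic_nonspecific_eoi(pic_model *pic)
{
    for (int irq = 0; irq < 8; irq++) {
        if (pic->isr & (1u << irq)) {
            pic->isr &= (uint8_t)~(1u << irq);
            return irq;
        }
    }
    return -1;
}

/* Specific EOI (0x60 | irq): clears exactly the named in-service bit. */
void pic_specific_eoi(pic_model *pic, int irq)
{
    assert(irq >= 0 && irq < 8);
    pic->isr &= (uint8_t)~(1u << irq);
}
```

With IRQ 5 and then IRQ 3 in service, `pic_nonspecific_eoi()` returns 3 and leaves bit 5 set, whereas `pic_specific_eoi(&pic, 5)` would clear bit 5 regardless of what else is in service.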
Jochen
-----Original Message-----
From: owner-l4-hackers@os.inf.tu-dresden.de [mailto:owner-l4-hackers@os.inf.tu-dresden.de] On Behalf Of edmundo@rano.demon.co.uk
Sent: Sunday, 1 August 1999 17:46
To: suzukis@file.phys.tohoku.ac.jp
Cc: l4-hackers@os.inf.tu-dresden.de
Subject: Re: L4-Linux worked on 486!
It dies after saying "irq still active". (To get that message I'm using the patch that Michael Hohmuth posted to the list on July 25.) I think I may have found a bug in the SANITY code in linux22/arch/l4-i386/kernel/irq.c, by the way. See below.
I remember, I've ever seen same message & following crash, thanks for the patch.
Unfortunately I still don't understand why the irq is "still active".
Here's the corresponding bit of code in irq.c. (The lines which aren't indented as much as they should be came from Michael's debugging patch.)
mask = 1 << (irq & 7);
if (irq < 8) {
    outb(inb(0x21) | mask, 0x21); /* block the irq */
    outb(0x20, 0x20);             /* acknowledge the irq */
    outb(0x0B, 0x20);
    if (inb(0x20) & mask)
        enter_kdebug("irq still active");
} else {
    unsigned foo;
    outb(inb(0xA1) | mask, 0xA1); /* block */
    outb(0x20, 0xA0);             /* acknowledge */
    outb(0x0B, 0xA0);
    if ((foo = inb(0xA0)) == 0)
        outb(0x20, 0x20);
    if (foo & mask)
        enter_kdebug("irqslave still active");
}
If you look at the bottom of kernel/fiasco/src/irq.h you'll find an unused function irq_ack() which does roughly the same thing, because acknowledging the IRQ is something that ought to be done by L4/Fiasco, but is at present done by Linux.
So why do we get "irq still active" during heavy use of the network card? Is there a PIC expert in the house?
Is it possible that the problem is caused by the interrupt not being acknowledged quickly enough? If so, maybe I should move the ack from Linux into Fiasco ...
Edmund
Jochen Liedtke asked, and I think it's a crucial question:
are the interrupts constantly disabled from raising the interrupt until executing that code?
Looking at the code, I think the answer is yes. (There is, however, quite a lot of code between the interrupt happening and the non-specific EOI being issued, and there are lots of macros and conditional compilations, so I might be wrong. Is it possible to inspect the state of the interrupt-disable flag using remote gdb?)
In which case I no longer understand what Jean Wolter wrote about priority assignment to interrupt threads. If interrupts are constantly disabled from raising the interrupt until the non-specific EOI, how can the priorities matter? If the PIC can "deliver" an interrupt with higher priority while interrupts are disabled, and if the non-specific EOI then acknowledges the highest-priority interrupt that was delivered, as opposed to the highest-priority interrupt that is being serviced, then we are always in trouble if we issue a non-specific EOI with interrupts disabled. In this case, should we reenable interrupts before doing the EOI?
But I can't think why the PIC should have been designed this way.
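As far as I understand the 8259's fully nested mode, the distinction that matters here is between the request register (IRR) and the in-service register (ISR): a request moves from IRR to ISR only when the CPU acknowledges it, and a non-specific EOI looks only at the ISR. The following is a toy model in C of that behaviour, not a description of Fiasco; the names are hypothetical, and the model assumes that taking an interrupt clears the CPU's interrupt flag, as an interrupt gate does. In the model, a higher-priority request that arrives while the CPU has interrupts disabled stays pending in the IRR and is not touched by the EOI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t irr;     /* requests raised by devices, not yet acknowledged */
    uint8_t isr;     /* requests acknowledged by the CPU, not yet EOI'd */
    bool    cpu_if;  /* CPU interrupt-enable flag */
} irq_model;

void device_raises(irq_model *m, int irq)
{
    assert(irq >= 0 && irq < 8);
    m->irr |= (uint8_t)(1u << irq);
}

/* Acknowledge cycle: delivers the highest-priority pending request, but
 * only if the CPU accepts interrupts and nothing of equal or higher
 * priority is in service. Returns the IRQ delivered, or -1. */
int cpu_try_accept(irq_model *m)
{
    if (!m->cpu_if)
        return -1;
    for (int irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (m->isr & bit)
            return -1;
        if (m->irr & bit) {
            m->irr &= (uint8_t)~bit;
            m->isr |= bit;
            m->cpu_if = false;  /* assumption: entry via an interrupt gate */
            return irq;
        }
    }
    return -1;
}

/* Non-specific EOI: clears the highest-priority in-service bit only. */
int nonspecific_eoi(irq_model *m)
{
    for (int irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (m->isr & bit) {
            m->isr &= (uint8_t)~bit;
            return irq;
        }
    }
    return -1;
}
```

In this model, if IRQ 5 is accepted and IRQ 3 is raised while the flag is still clear, `nonspecific_eoi()` returns 5 and IRQ 3 remains pending. The EOI goes wrong only if the flag is set again before the EOI, so that IRQ 3 can enter service first.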
Also Jochen Liedtke suggests the problem might be "caused" by interrupts not being disabled all the way until the EOI:
(Unspecific eoi (0x20) always clears the highest irq. If an interrupt can happen between raising the original interrupt and executing the unspecific eoi, this might clear a higher irq instead of the currently handled irq. Consequences: arbitrary confusion: the higher irq is cleared too early, the next unspecific eoi clears a lower irq, ...)
To reconcile this, I am led to conclude that my inspection of the code is mistaken: interrupts are not disabled constantly from raising the interrupt until issuing the EOI, and this is deliberately so, but somehow we are ending up in the wrong interrupt thread after a higher-priority interrupt interrupts a lower-priority one.
I'm really out of my depth here. Can anyone recommend a good book, or other source of information, about x86 and PIC programming? (I don't need an introduction to assembler programming, as I did a lot of 6502 and 8080 as a child, but I don't know much about x86 specifics or about interrupt details.)
Edmund
While I was still working out that "PIC" here means not position-independent code but the Intel 8259 Programmable Interrupt Controller (a few books call it the ICU), the discussion seems to have progressed a lot :-).
I'm really out of my depth here. Can anyone recommend a good book, or other source of information, about x86 and PIC programming?
When I was trying to run Fiasco on the BOCHS emulator, I referred to...
"Operating Systems: Design and Implementation" by Andrew S. Tanenbaum and Albert S. Woodhull. I have the 2nd edition in Japanese translation. Section 2.6.7 comments on the MINIX implementation.
"The Basic Kernel Source Code Secrets" by William F. Jolitz and Lynne G. Jolitz. I also have this one in the Japanese edition. Section 2.6 comments on the 386BSD implementation. (It says that the 8259 cannot rotate the priority...?)
and a few web pages
http://www.brl.ntt.co.jp/people/takehiko/interrupt/PORTS.LST.txt
Sorry, at present my 3c507 is very slow and I could not generate enough load to reproduce the problem. I will replace it with a faster ethernet card.
suzuki