Hi,
I'm working on implementing the Linux OProfile functionality in the L4/Fiasco environment. OProfile configures the performance counter to raise an NMI whenever the counter overflows. I modified the NMI handling code in handle_slow_trap() to collect samples, but I encountered strange system behavior: the machine reboots without printing any information. If there is no user application activity, the system looks fine. Otherwise the system reboots, even when only a very simple application like hello runs, and the time until the reboot varies. I don't know what causes this behavior. IMHO, it looks like a triple fault. Can anyone give me some advice?
Thanks in advance. -- Jugwan Eom
On Wed, 15 Nov 2006 15:25:11 +0900 Jugwan Eom (JE) wrote:
JE> I'm working to implement the Linux Oprofile functionality on L4/Fiasco
JE> environment.
JE> OProfile configures the performance counter to raise a NMI whenever the
JE> counter overflows. I modified the NMI handling codes in
JE> handle_slow_trap() to collect samples but I encountered an strange
JE> system behavior, REBOOTING without any information. If there is no user
JE> application activity, the system looks fine. Otherwise, the system
JE> reboots even if a very simple application like hello runs and the
JE> required time before reboot varies.
JE> I don't know what causes this behavior. IMHO, it seems a kind of triple
JE> fault. Is there any one give me an advice?
It's rather unlikely that anyone will be able to help, unless you make your code and configuration available.
- Udo
Udo A. Steinberg wrote:
> It's rather unlikely that anyone will be able to help, unless you make your code and configuration available.
Ok, Udo. I'd like to show small patches instead of my full code. I think their operation is the same as my code's, and they reproduce the same result.
-------------------------------------------------------------------------------------------------------------------
Index: src/kern/shared/thread-ia32-ux.cpp
===================================================================
RCS file: /home/remote-cvs/l4/kernel/fiasco/src/kern/shared/thread-ia32-ux.cpp,v
retrieving revision 1.138
diff -u -r1.138 thread-ia32-ux.cpp
--- src/kern/shared/thread-ia32-ux.cpp 7 Nov 2006 18:34:03 -0000 1.138
+++ src/kern/shared/thread-ia32-ux.cpp 15 Nov 2006 11:04:33 -0000
@@ -40,6 +40,7 @@
#include "timer.h"
#include "trap_state.h"
#include "vmem_alloc.h"
+#include "watchdog.h"

#ifdef CONFIG_KDB
extern unsigned gdb_trap_recover; // in gdb_trap.c
@@ -294,6 +295,15 @@
if (!check_trap13_kernel (ts, from_user))
return 0;

+ if (ts->_trapno == 2) { // NMI
+#if 0
+ printf ("%d: CS=%lx %x.%02x IP="L4_PTR_FMT" Trap=%02lx \n",
+ __LINE__, ts->cs(), d_taskno(), d_threadno(), ts->ip (), ts->_trapno);
+#endif
+ Watchdog::enable();
+ goto success;
+ }
+
if (EXPECT_FALSE (!from_user))
{
// small space faults can be raised in kernel mode, too (long IPC)
Index: src/kern/shared/thread-ia32-amd64.cpp
===================================================================
RCS file: /home/remote-cvs/l4/kernel/fiasco/src/kern/shared/thread-ia32-amd64.cpp,v
retrieving revision 1.14
diff -u -r1.14 thread-ia32-amd64.cpp
--- src/kern/shared/thread-ia32-amd64.cpp 23 Oct 2006 12:12:38 -0000 1.14
+++ src/kern/shared/thread-ia32-amd64.cpp 15 Nov 2006 11:04:23 -0000
@@ -104,12 +104,13 @@
if(Kconsole::console()->char_avail()==1)
kdb_ke("SERIAL_ESC");
}
-
+#if 0
if (Config::watchdog)
{
// tell doggy that we are alive
Watchdog::touch();
}
+#endif
}

IMPLEMENTATION[ia32]:
Index: src/kern/shared/perf_cnt-ia32-ux.cpp
===================================================================
RCS file: /home/remote-cvs/l4/kernel/fiasco/src/kern/shared/perf_cnt-ia32-ux.cpp,v
retrieving revision 1.15
diff -u -r1.15 perf_cnt-ia32-ux.cpp
--- src/kern/shared/perf_cnt-ia32-ux.cpp 21 Dec 2005 17:38:28 -0000 1.15
+++ src/kern/shared/perf_cnt-ia32-ux.cpp 15 Nov 2006 11:04:00 -0000
@@ -749,13 +749,13 @@
// is 0x7ffffffff. The 31st bit is extracted to the bits 32-39 (see
// "IA-32 Intel Architecture Software Developer's Manual. Volume 3:
// Programming Guide" section 14.10.2: PerfCtr0 and PerfCtr1 MSRs.
- if (hold_watchdog > 0x7fffffff)
- hold_watchdog = 0x7fffffff;
+ //if (hold_watchdog > 0x7fffffff)
+ hold_watchdog = 3000; // oprofile: P4 GLOBAL_POWER_EVENTS default
hold_watchdog = -hold_watchdog;
init_watchdog();
touch_watchdog();
- start_watchdog();
- start_pmc(pmc_watchdog);
+ //start_watchdog();
+ //start_pmc(pmc_watchdog);
}
}
---------------------------------------------------------------------------------------------------------------------
GRUB Menulist

kernel (nd)/l4/bin/bootstrap -serial modaddr 0x02000000
module (nd)/l4/fiasco/fiasco -nokdb -serial -serial_esc -comport 1 -watchdog
module (nd)/l4/bin/sigma0
module (nd)/l4/bin/roottask
module (nd)/l4/bin/names
module (nd)/l4/bin/log --prio 0xA1
module (nd)/l4/bin/dm_phys
module (nd)/l4/bin/enable_watchdog
module (nd)/l4/bin/hello
The watchdog is enabled by enable_watchdog via fiasco_watchdog_enable() in sys/kdebug.h. The kernel configuration file is attached. Is this enough for someone to help me? :) Please give me any advice you can.
Regards, -- Jugwan Eom
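For readers following the hold_watchdog change in the patch above: the counter is loaded with the negated sampling period, so that it overflows (and raises the NMI) after that many events. The following is a minimal sketch of that arithmetic only. The helper names counter_preload and events_until_overflow are hypothetical and not part of Fiasco, and the 40-bit width used in the note below is an assumption for illustration; the actual counter width depends on the CPU model.

```cpp
#include <cstdint>

// Hypothetical helper, not Fiasco code: the value to load into a
// performance counter of `width_bits` bits so that it overflows after
// `period` events. Assumes 0 < period < 2^width_bits and width_bits < 64.
inline std::uint64_t counter_preload(std::uint64_t period, unsigned width_bits)
{
  const std::uint64_t mask = (std::uint64_t{1} << width_bits) - 1;
  // Two's-complement negation, truncated to the counter width.
  return (~period + 1) & mask;
}

// Inverse direction: how many events remain until a counter holding
// `preload` wraps around to zero. Same assumptions as above.
inline std::uint64_t events_until_overflow(std::uint64_t preload, unsigned width_bits)
{
  const std::uint64_t mask = (std::uint64_t{1} << width_bits) - 1;
  return (~preload + 1) & mask;
}
```

Under the 40-bit assumption, counter_preload(3000, 40) yields 0xFFFFFFF448, and events_until_overflow() applied to that value gives back 3000.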
#
# Automatically generated, don't edit
#
# Generated on: ubuntu
# At: Wed, 15 Nov 2006 11:29:44 +0000
# Linux version 2.6.17-10-generic (root@vernadsky) (gcc version 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)) #2 SMP Fri Oct 13 18:45:35 UTC 2006 (Ubuntu 2.6.17-10.33-generic)

#
# Main menu
#
CONFIG_EXPERIMENTAL=n

#
# Target System Options
#

#
# Target Platform
#
CONFIG_PF_PC=y
CONFIG_PF_UX=n
CONFIG_PF_SA1100=n
CONFIG_PF_XSCALE=n
CONFIG_PF_ISG=n
CONFIG_PF_INTEGRATOR=n
CONFIG_PF_REALVIEW=n

#
# Target CPU Family
#
CONFIG_IA32=y
CONFIG_ARM=n
CONFIG_AMD64=n

#
# Target processor
#
CONFIG_IA32_486=n
CONFIG_IA32_586=y
CONFIG_IA32_686=n
CONFIG_IA32_P2=n
CONFIG_IA32_P3=n
CONFIG_IA32_P4=n
CONFIG_IA32_PM=n
CONFIG_IA32_K6=n
CONFIG_IA32_K7=n
CONFIG_IA32_K8=n

CONFIG_REGPARM3=y
CONFIG_WORKAROUND_AMD_FPU_LEAK=n

#
# Kernel Options
#

#
# Kernel ABI Version
#
CONFIG_ABI_V2=y
CONFIG_ABI_X0=n

#
# ABI Extensions
#
CONFIG_DECEIT_BIT_DISABLES_SWITCH=y
CONFIG_TASK_CAPS=n

CONFIG_SMALL_SPACES=n
CONFIG_CONTEXT_4K=n

#
# Scheduling Timer
#
CONFIG_SCHED_PIT=n
CONFIG_SCHED_RTC=n
CONFIG_SCHED_APIC=y

CONFIG_ONE_SHOT=n
CONFIG_SYNC_TSC=n
CONFIG_FINE_GRAINED_CPUTIME=n
CONFIG_IO_PROT=n
CONFIG_UX_CON=n
CONFIG_UX_NET=n

#
# Kernel Debugging
#
CONFIG_INLINE=y
CONFIG_NDEBUG=n
CONFIG_PROFILE=n
CONFIG_NO_FRAME_PTR=n
CONFIG_STACK_DEPTH=n
CONFIG_LIST_ALLOC_SANITY=n
CONFIG_BEFORE_IRET_SANITY=n
CONFIG_GSTABS=y
CONFIG_POWERSAVE_GETCHAR=y
CONFIG_SERIAL=y
CONFIG_KDB=n
CONFIG_JDB=y
CONFIG_JDB_LOGGING=n
CONFIG_JDB_ACCOUNTING=n
CONFIG_JDB_MISC=n
CONFIG_WATCHDOG=y

#
# Runtime warning level
#
CONFIG_WARN_NONE=n
CONFIG_WARN_ANY=y

#
# Compiling and Building
#
CONFIG_CC="gcc-3.4"
CONFIG_CXX="g++-3.4"
CONFIG_HOST_CXX="g++-3.4"
CONFIG_VERBOSE=n
CONFIG_MAINTAINER_MODE=y

#
# Derived symbols
#
CONFIG_BIT32=y
CONFIG_XARCH="ia32"
CONFIG_IA32_TARGET="Intel Pentium"
CONFIG_WARN_LEVEL=2
CONFIG_BIT64=n
CONFIG_PERF_CNT=y
CONFIG_ABI="v2"
#
# That's all, folks!
On Wed, 15 Nov 2006 20:34:56 +0900 Jugwan Eom (JE) wrote:
JE> > Ok, Udo. I'd like to show small patches instead of showing my code but
JE> > I think it's operation is the same with my code and it reproduced the
JE> > same result.
Your mailer has munged the patch such that it cannot be applied. A pristine patch does not have any characters other than + or - or space in the first column. Yours has the leading space for unchanged lines removed.
If your mailer cannot be fixed, you could gzip your patch and attach it.
- Udo
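The rule described above (inside a hunk, every line starts with +, - or a space) can be checked mechanically before a patch is mailed. The following is a rough sketch under stated assumptions, not a replacement for patch(1): the function name hunk_lines_well_formed is hypothetical, only hunk headers of the form "@@ -a,b +c,d @@" with explicit counts are recognized, and everything outside a hunk is ignored.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical checker, for illustration only. Returns true if every line
// inside each unified-diff hunk starts with ' ', '+' or '-' and the hunk
// contains as many lines as its "@@ -a,b +c,d @@" header announces.
inline bool hunk_lines_well_formed(const std::string &patch)
{
  std::istringstream in(patch);
  std::string line;
  unsigned old_left = 0, new_left = 0;  // lines still expected in the hunk

  while (std::getline(in, line))
    {
      if (old_left == 0 && new_left == 0)
        {
          // Outside a hunk: only a hunk header changes state. File headers
          // (Index:, ---, +++) and surrounding mail text are skipped.
          unsigned o_start, o_cnt, n_start, n_cnt;
          if (std::sscanf(line.c_str(), "@@ -%u,%u +%u,%u @@",
                          &o_start, &o_cnt, &n_start, &n_cnt) == 4)
            {
              old_left = o_cnt;
              new_left = n_cnt;
            }
          continue;
        }

      const char c = line.empty() ? '\0' : line[0];
      if (c == ' ' && old_left > 0 && new_left > 0)
        { --old_left; --new_left; }      // unchanged context line
      else if (c == '-' && old_left > 0)
        --old_left;                      // removed line
      else if (c == '+' && new_left > 0)
        --new_left;                      // added line
      else if (c == '\\')
        continue;                        // "\ No newline at end of file"
      else
        return false;                    // munged line, e.g. stripped space
    }
  return old_left == 0 && new_left == 0; // false if a hunk was cut short
}
```

A context line whose leading space was stripped by the mailer begins with ordinary source text and is rejected, as is a blank context line that has become completely empty.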
On Wed, 15 Nov 2006 20:34:56 +0900 Jugwan Eom (JE) wrote:
JE> > I think it's operation is the same with my code and it reproduced the
JE> > same result.

JE> + if (ts->_trapno == 2) { // NMI
JE> +#if 0
JE> + printf ("%d: CS=%lx %x.%02x IP="L4_PTR_FMT" Trap=%02lx \n",
JE> + __LINE__, ts->cs(), d_taskno(), d_threadno(), ts->ip (), ts->_trapno);
JE> +#endif
JE> + Watchdog::enable();
JE> + goto success;
JE> + }
Why do you have a call to Watchdog::enable() in there?
- Udo
Udo A. Steinberg wrote:
> On Wed, 15 Nov 2006 20:34:56 +0900 Jugwan Eom (JE) wrote:
>
> JE> > I think it's operation is the same with my code and it reproduced the
> JE> > same result.
>
> JE> + if (ts->_trapno == 2) { // NMI
> JE> +#if 0
> JE> + printf ("%d: CS=%lx %x.%02x IP="L4_PTR_FMT" Trap=%02lx \n",
> JE> + __LINE__, ts->cs(), d_taskno(), d_threadno(), ts->ip (), ts->_trapno);
> JE> +#endif
> JE> + Watchdog::enable();
> JE> + goto success;
> JE> + }
>
> Why do you have a call to Watchdog::enable() in there?
Each time an NMI is raised, a sample is collected before the call to Watchdog::enable(). The sample includes the instruction pointer, the CPU mode (kernel/user), the event type (in this case, GLOBAL_POWER_EVENTS), and so on. Watchdog::enable() then resets the counter value (Perf_cnt::touch_watchdog()), clears the overflow flag (Perf_cnt::start_watchdog()), and thereby restarts the performance counter for the next sample.
Regards, -- Jugwan Eom
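The sequence described above (record a sample, then re-arm the counter) can be modelled in user space as follows. This is only a sketch under assumptions, not the code from the patch or from OProfile: Sample, Sample_buffer, Fake_counter and on_overflow are hypothetical names, the counter is a plain integer standing in for the hardware MSR, and the fixed-size buffer merely reflects the general constraint that an NMI handler should not allocate memory or block.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sample record with the fields named in the message above.
struct Sample
{
  std::uintptr_t ip;   // interrupted instruction pointer
  bool from_user;      // CPU mode at the time of the NMI
  std::uint8_t event;  // event identifier (illustrative encoding)
};

// Fixed-size FIFO: no allocation and no blocking in push().
template <std::size_t N>
class Sample_buffer
{
public:
  // Stores the sample; returns false and counts a drop when full.
  bool push(const Sample &s)
  {
    if (_count == N) { ++_dropped; return false; }
    _slots[(_head + _count) % N] = s;
    ++_count;
    return true;
  }

  // Removes the oldest sample; returns false when empty.
  bool pop(Sample &out)
  {
    if (_count == 0) return false;
    out = _slots[_head];
    _head = (_head + 1) % N;
    --_count;
    return true;
  }

  std::size_t dropped() const { return _dropped; }

private:
  std::array<Sample, N> _slots{};
  std::size_t _head = 0, _count = 0, _dropped = 0;
};

// Stand-in for the hardware counter: counts up from -period to zero.
struct Fake_counter
{
  std::int64_t period;
  std::int64_t value;
  void rearm() { value = -period; }
  bool tick() { return ++value == 0; }  // true on overflow
};

// Model of the handler body: record the sample first, then re-arm.
template <std::size_t N>
void on_overflow(Sample_buffer<N> &buf, Fake_counter &ctr,
                 std::uintptr_t ip, bool from_user)
{
  buf.push(Sample{ip, from_user, 0});
  ctr.rearm();
}
```

In this model the samples would be drained with pop() from ordinary (non-NMI) context; how the real implementation hands samples over is not stated in the thread.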
Hi,
Is there any information about this behavior? Udo, could you reproduce it?
Regards, -- Jugwan Eom