[Cristiano Ligieri Pereira]
Thanks for answering. My schedule is too tight to try such a major change, so I guess I will have to forget about it for now.
How was this done in the previous version of L4, the one written in assembly? And does the assembly version (Hazelnut?) work fine on Pentium 4 machines?
The problem is that Fiasco seems to perform quite poorly on the same benchmarks presented in the paper "The performance of micro-kernel Based Systems", and I'm wondering why. My first guess was the trampoline mechanism. I'm using a 1.3 GHz Pentium 4 machine and was expecting a milder performance degradation (compared to the paper), even though I'm running a different version of L4 (Fiasco instead of the assembly version).
Of course(!) Hazelnut works fine on Pentium 4 machines. Even though our group in Karlsruhe now focuses on developing the Pistachio kernel, the Hazelnut kernel is definitely not something to frown upon. It is more than stable enough to do a decent job (at least for benchmarking purposes). We've even been running our web server on top of L4Linux/Hazelnut for several months.
The Dresden people will probably kill me for saying so :-), but if you want to do performance measurements (except for measurements concerning real-time workloads, e.g., interrupt latency), you should definitely go for the Hazelnut kernel.
I can't remember the performance numbers for the original asm kernel on the Pentium 4, but on the Pentium III the Hazelnut kernel actually performed slightly better than the original asm kernel. (This applies only to pure IPC times; I don't have other metrics, such as cache footprint, at hand.) Hazelnut performed better because it was optimized with the newer Pentium III (and Pentium 4) chips in mind, whereas the asm kernel had not been updated to take advantage of the newer CPU architectures. You can therefore expect the difference to be even larger on the Pentium 4.
Another reason to use Hazelnut for performance measurements is that it supports small spaces (emulation of tagged TLBs). The importance of small spaces has increased over the years: TLBs have grown larger and hence impose a larger indirect penalty when they are flushed, and the TLB miss penalty has grown due to longer pipelines and a greater disparity between CPU speed and memory access speed. In addition, the Pentium 4 contains a 12K-µop trace cache (i.e., an instruction cache) which is virtually tagged and therefore flushed on cr3 reloads (address space switches). Small spaces also avoid such trace cache flushes. As an example of the benefit, an L4Linux getpid() syscall is 70% slower on a kernel without small spaces than on one with small spaces enabled [1].
When compiling Hazelnut to run your benchmarks, remember to turn off tracepoints, IPC tracing, spin wheels, etc., and make sure that FastPath IPC and small spaces are enabled. All of this can be configured with make (x)config.
eSk
[1] http://i30www.ira.uka.de/research/documents/l4ka/smallspaces.pdf