enabling/disabling trampoline

Espen Skoglund esk at ira.uka.de
Thu Mar 13 11:33:09 CET 2003

[Cristiano Ligieri Pereira]
> thanks for answering. I'm really tight on my schedule to try such
> a major change. I guess I will have to just forget about it for now.

> How was it done in the previous version of l4? The one written in
> assembly. And does the assembly version (hazelnut?) work fine on
> pentium 4 machines?

> the problem is that Fiasco seems to be performing pretty badly on the
> same benchmarks presented in the paper "The performance of
> micro-kernel Based Systems" and I'm wondering why. My first guess
> was the trampoline mechanism. I am using a Pentium 4 1.3 GHz machine
> and was expecting a smoother performance degradation (compared to the
> paper) even though executing a different version of l4 (fiasco
> instead of the assembly version).

Of course(!) Hazelnut works fine with Pentium 4 machines.  Even though
our (the group in Karlsruhe) focus is now on developing the Pistachio
kernel, the Hazelnut kernel is definitely not something to frown upon.
It is more than stable enough to do a decent job (at least for
benchmarking purposes).  We've even been running our web server on top
of L4Linux/Hazelnut for several months.

The Dresden people will probably kill me for saying so :-), but if you
want to do performance measurements (except for measurements concerning
real-time workloads, e.g., interrupt latency), you should definitely
go for the Hazelnut kernel.

I can't seem to remember what the performance numbers for the original
asm kernel are on the Pentium 4, but for Pentium III the Hazelnut
kernel was actually performing slightly better than the original asm
kernel.  (This only goes for pure IPC times.  I don't have other
metrics like cache footprint, etc., at hand.)  The reason why Hazelnut
was performing better was that it was optimized with the newer Pentium
III (and Pentium 4) chips in mind, whereas the asm kernel had not been
updated to take advantage of the newer CPU architectures.  You can
therefore expect the difference to be even larger for Pentium 4.

Another reason why you might want to use Hazelnut for performance
measurements is that it supports small spaces (emulation of tagged
TLBs).  The importance of having small spaces has increased over the
years: the TLBs have gotten larger, and hence impose a larger
indirect penalty when they are flushed; and the TLB miss penalty has
gotten higher due to longer pipelines and a greater disparity between
CPU speed and memory access speed.  In addition, the Pentium 4 also
contains a 12K u-ops Trace Cache (i.e., an instruction cache) which is
virtually tagged and therefore flushed on cr3 reloads (address space
switches).  Having small spaces also avoids such trace cache flushes.
As an example of the benefit of small spaces, an L4Linux getpid()
syscall is 70% slower on a kernel without small spaces compared to a
kernel with small spaces enabled [1].
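
If you want to check a getpid() number like this on your own setup, a
minimal user-level sketch could look like the following.  This is not
the benchmark used in [1]; it assumes a Linux-compatible userland (such
as L4Linux) that provides clock_gettime() and syscall(), and the
function name avg_getpid_ns is made up for illustration.  It issues the
system call directly so that no library-level caching of the pid can
hide the kernel entry.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper: average wall-clock cost of one getpid()
 * system call in nanoseconds, measured over `iterations` calls.
 * The loop overhead is included in the result.
 */
double avg_getpid_ns(unsigned long iterations)
{
    struct timespec start, end;
    unsigned long i;
    double elapsed_ns;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        syscall(SYS_getpid);   /* raw syscall, bypasses any libc caching */
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
               + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Call it with a large iteration count (say, one million) on a kernel
built with small spaces and again on one built without, and compare the
two averages.  Run it a few times and discard the first run so that
cold caches do not skew the comparison.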

When compiling Hazelnut to run your benchmarks, remember to turn off
tracepoints, IPC tracing, spin wheels, etc., and make sure that the
FastPath IPC and small spaces are enabled.  This can all be done using
make (x)config.


[1] http://i30www.ira.uka.de/research/documents/l4ka/smallspaces.pdf
