enabling/disabling trampoline

Cristiano Ligieri Pereira

10 Mar 2003

8:33 a.m.

Hi all,

I understand that the trampoline mechanism slows down execution because it adds one more level of "indirection" when handling system calls on L4Linux. I would like to run some experiments with and without it, but I'm not quite sure how to achieve that, i.e., how to disable it and enable it again. Do I have to recompile the whole kernel? Do I have to recompile only the Linux application I will run on top of L4Linux? What exactly do I have to do?

I'm pasting part of a message from the mailing list in which Frank Mehnert mentions something about a patch, which I could not find.

Any help is appreciated.

Thanks,
Cristiano.

...

Additional speedup for L4Linux could be achieved by patching the syscalls in the libc to jump directly into the emulib of the process, bypassing the trampoline mechanism (int 0x80 => general protection => int 0x30 => l4linux server). I don't know if such a patch is floating around somewhere. See linux22/arch/l4/x86/emulib/int_entry.S, function entry13.

------------------------------------------------------------ Cristiano Ligieri Pereira - http://www.ics.uci.edu/~cpereira


Frank Mehnert

10 Mar 2003

10:02 a.m.

On Monday 10 March 2003 08:33, Cristiano Ligieri Pereira wrote:

...

I understand that the trampoline mechanism slows down execution because it adds one more level of "indirection" when handling system calls on L4Linux. I would like to run some experiments with and without it, but I'm not quite sure how to achieve that, i.e., how to disable it and enable it again. Do I have to recompile the whole kernel? Do I have to recompile only the Linux application I will run on top of L4Linux? What exactly do I have to do?

I'm pasting part of a message from the mailing list in which Frank Mehnert mentions something about a patch, which I could not find.

Please wait some days. Currently we are working on a glibc which does exactly what you want.

Frank
--
## Dept. of Computer Science, Dresden University of Technology, Germany ##
## E-Mail: fm3@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/~fm3 ##

Cristiano Ligieri Pereira

2:27 p.m.

Isn't there a quick (but perhaps not very nice or friendly) way to do it? I'm kind of tight on a schedule to have this data collected.

Thanks again,
Cristiano.

------------------------------------------------------------
Cristiano Ligieri Pereira - http://www.ics.uci.edu/~cpereira

Frank Mehnert

3:22 p.m.

On Monday 10 March 2003 14:27, Cristiano Ligieri Pereira wrote:

...

Isn't there a quick (but perhaps not very nice or friendly) way to do it? I'm kind of tight on a schedule to have this data collected.

Hmm. There isn't a quick hack. You have to replace every occurrence of "int $0x80" in your programs by

    "1:                 \n\t"
    "pushf              \n\t"
    "push %%cs          \n\t"
    "pushl $2f          \n\t"
    "pushl $0x402       \n\t"
    "pushl $0xa0008000  \n\t"
    "ret                \n\t"
    "jmp 1b             \n\t"
    "2:                 \n\t"

So the best is to generate an extra C library for this, right?

Frank
--
## Dept. of Computer Science, Dresden University of Technology, Germany ##
## E-Mail: fm3@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/~fm3 ##
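For anyone who wants to try the replacement on compiler-generated assembly rather than by hand, the substitution could be sketched as a small text filter. This is only an illustration, not a tested L4Linux tool: the helper name `patch_syscall_line` is hypothetical, the match is a literal string match on "int $0x80" (a compiler may emit a tab instead of a space), and the sequence is written with "%cs" because the "%%cs" form above is only needed inside GCC extended asm strings. The numeric labels 1 and 2 may also clash with local labels already used near the patched instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sequence quoted from the message above, rewritten for a plain .s file
 * (so "%cs" rather than the "%%cs" used inside GCC extended asm). */
static const char EMULIB_ENTRY[] =
    "1:\n"
    "\tpushf\n"
    "\tpush %cs\n"
    "\tpushl $2f\n"
    "\tpushl $0x402\n"
    "\tpushl $0xa0008000\n"
    "\tret\n"
    "\tjmp 1b\n"
    "2:";

/* Hypothetical helper: copy one line of assembly text from `in` to `out`,
 * replacing the first literal "int $0x80" with EMULIB_ENTRY.
 * Returns 1 if a replacement was made, 0 if the line was copied unchanged,
 * and -1 if `out` is too small to hold the result. */
static int patch_syscall_line(const char *in, char *out, size_t out_size)
{
    static const char TRAP[] = "int $0x80";
    const size_t trap_len = sizeof TRAP - 1;
    const size_t entry_len = sizeof EMULIB_ENTRY - 1;
    const char *hit;
    size_t head, tail;

    assert(in != NULL && out != NULL);

    hit = strstr(in, TRAP);
    if (hit == NULL) {
        if (strlen(in) + 1 > out_size)
            return -1;
        strcpy(out, in);
        return 0;
    }

    head = (size_t)(hit - in);          /* text before the trap instruction */
    tail = strlen(hit + trap_len);      /* text after it (comments etc.)    */
    if (head + entry_len + tail + 1 > out_size)
        return -1;

    memcpy(out, in, head);
    memcpy(out + head, EMULIB_ENTRY, entry_len);
    memcpy(out + head + entry_len, hit + trap_len, tail + 1); /* copies NUL */
    return 1;
}
```

A driver would read the output of `gcc -S` line by line, pass each line through this helper, and assemble the result. The patched code can only run inside an L4Linux process, where the emulib pages are mapped at 0xa0008000.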

Cristiano Ligieri Pereira

3:54 p.m.

Actually, what I'm doing is trying to reproduce some of the results from the paper "The performance of micro-kernel Based Systems" using the lmbench/hbench/AIM benchmarks on a P4 machine.

I'm not quite sure I understood what you mean by generating a new C library. Where is the code for the C library in the source tree? Isn't there a piece of code handling all system calls that I could change, instead of changing every occurrence of "int 0x80" in my code?

Cristiano.

------------------------------------------------------------
Cristiano Ligieri Pereira - http://www.ics.uci.edu/~cpereira

Jacob Gorm Hansen

5:29 p.m.

On Mon, 2003-03-10 at 15:54, Cristiano Ligieri Pereira wrote:

...

Actually, what I'm doing is trying to reproduce some of the results from the paper "The performance of micro-kernel Based Systems" using the lmbench/hbench/AIM benchmarks on a P4 machine. I'm not quite sure I understood what you mean by generating a new C library. Where is the code for the C library in the source tree? Isn't there a piece of code handling all system calls that I could change, instead of changing every occurrence of "int 0x80" in my code?

Most programs are dynamically linked to libc, which performs the actual Linux syscalls. What you need is a version of glibc which invokes L4 by calling into the trampoline code already present in the address space of each L4Linux process, in the 'emulib' pages at 0xa0008000, instead of using the normal int $0x80 trap instruction.

Unfortunately, I do not think a patched version of glibc exists. If you wish, you can roll your own by downloading the source of glibc, patching it according to Frank's instructions, compiling it, and installing it into the library path of your benchmark applications.

If anyone has or is able to produce a patch and more elaborate installation instructions, I would be very interested in receiving a copy.

Best regards,
Jacob

Cristiano Ligieri Pereira

13 Mar 2003

1:24 a.m.

Thanks for answering. I'm really too tight on my schedule to try such a major change. I guess I will have to just forget about it for now.

How was it done in the previous version of L4? The one written in assembly. And does the assembly version (Hazelnut?) work fine on Pentium 4 machines?

The problem is that Fiasco seems to be performing pretty badly on the same benchmarks presented in the paper "The performance of micro-kernel Based Systems" and I'm wondering why. My first guess was the trampoline mechanism. I am using a Pentium 4 1.3 GHz machine and was expecting a smoother performance degradation (compared to the paper), even though I am running a different version of L4 (Fiasco instead of the assembly version).

Cristiano.

------------------------------------------------------------
Cristiano Ligieri Pereira - http://www.ics.uci.edu/~cpereira

Cristiano Ligieri Pereira

3:26 a.m.

It's not Hazelnut... which one is it then? And is it available for downloading?

Cristiano.

------------------------------------------------------------
Cristiano Ligieri Pereira - http://www.ics.uci.edu/~cpereira

Frank Mehnert

10:36 a.m.

On Thursday 13 March 2003 01:24, Cristiano Ligieri Pereira wrote:

...

thanks for answering. I'm really tight on my schedule to try such major change. I guess I will have to just forget about it for now.

How was it done in the previous version of l4? The one written in assembly. And does the assembly version (hazelnut?) work fine on pentium 4 machines?

Hazelnut is not the assembly version. Hazelnut from Karlsruhe (L4Ka project) is written in C++ without using special C++ features. The assembly version of L4 is not freely available.

...

The problem is that Fiasco seems to be performing pretty badly on the same benchmarks presented in the paper "The performance of micro-kernel Based Systems" and I'm wondering why. My first guess was the trampoline

Which benchmark are you talking about? AIM? Where did you get it from? Which version of Fiasco are you using? What did you measure? Which compiler did you use for compiling Fiasco? Please post your config file! Could you please give some more details about your test scenario?

...

mechanism. I am using a Pentium 4 1.3 GHz machine and was expecting a smoother performance degradation (compared to the paper), even though I am running a different version of L4 (Fiasco instead of the assembly version).

Why do you expect a smoother performance degradation?

We still have some minor problems building a libc for L4Linux that does not use the trampoline mechanism. In fact, we built such a libc, but some tests after the build process fail. However, you can only expect notably better performance if you use the right benchmark, that is, one that makes many Linux system calls. You will not notice a big difference when compiling the Linux kernel, since most of the work is done in userland.

Frank
--
## Dept. of Computer Science, Dresden University of Technology, Germany ##
## E-Mail: fm3@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/~fm3 ##
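To make the point about system-call-heavy benchmarks concrete, a minimal timing loop in the spirit of a "null syscall" test might look like the sketch below. It is an assumption-laden illustration, not part of lmbench or of any L4Linux tooling: the function name `ns_per_getpid` is made up here, the loop goes through the C library's `syscall()` wrapper (so on L4Linux it exercises whatever syscall entry path that libc uses), and older C libraries may need `-lrt` for `clock_gettime`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: issue `iterations` getpid system calls and return
 * the average wall-clock cost of one call in nanoseconds. */
static double ns_per_getpid(long iterations)
{
    struct timespec start, end;
    double elapsed_ns;
    long i;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        (void)syscall(SYS_getpid);  /* each iteration is a real kernel entry */
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
               + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling `ns_per_getpid(1000000)` from a small `main` and printing the result gives a per-call figure that is dominated by the system call entry and exit path, which is where the trampoline adds its cost. A kernel compile, by contrast, spends most of its time in userland, so the same change would barely show up there.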

Espen Skoglund

11:33 a.m.

[Cristiano Ligieri Pereira]

...

thanks for answering. I'm really tight on my schedule to try such major change. I guess I will have to just forget about it for now.

...

How was it done in the previous version of l4? The one written in assembly. And does the assembly version (hazelnut?) work fine on pentium 4 machines?

...

The problem is that Fiasco seems to be performing pretty badly on the same benchmarks presented in the paper "The performance of micro-kernel Based Systems" and I'm wondering why. My first guess was the trampoline mechanism. I am using a Pentium 4 1.3 GHz machine and was expecting a smoother performance degradation (compared to the paper), even though I am running a different version of L4 (Fiasco instead of the assembly version).

Of course(!) Hazelnut works fine with Pentium 4 machines. Even though our (the group in Karlsruhe) focus is now on developing the Pistachio kernel, the Hazelnut kernel is definitely not something to frown upon. It is more than stable enough to do a decent job (at least for benchmarking purposes). We've even been running our web server on top of L4Linux/Hazelnut for several months.

The Dresden people will probably kill me for saying so :-), but if you want to do performance measurements (except for measurements concerning real-time workloads, e.g., interrupt latency), you should definitely go for the Hazelnut kernel.

I can't seem to remember what the performance numbers for the original asm kernel are on the Pentium 4, but on the Pentium III the Hazelnut kernel was actually performing slightly better than the original asm kernel. (This only goes for pure IPC times. I don't have other metrics like cache footprint, etc., at hand.) The reason why Hazelnut was performing better was that it was optimized with the newer Pentium III (and Pentium 4) chips in mind, whereas the asm kernel had not been updated to take advantage of the newer CPU architectures. You can therefore expect the difference to be even larger on the Pentium 4.

Another reason why you might want to use Hazelnut for performance measurements is that it supports small spaces (emulation of tagged TLBs). The importance of small spaces has increased over the years: TLBs have gotten larger and hence impose a larger indirect penalty when they are flushed, and the TLB miss penalty has gotten higher due to longer pipelines and a greater disparity between CPU speed and memory access speed. In addition, the Pentium 4 contains a 12K u-ops Trace Cache (i.e., an instruction cache) which is virtually tagged and therefore flushed on cr3 reloads (address space switches). Having small spaces also avoids such trace cache flushes. As an example of the benefit of small spaces, an L4Linux getpid() syscall is 70% slower on a kernel without small spaces compared to a kernel with small spaces enabled [1].

When compiling Hazelnut to run your benchmarks, remember to turn off tracepoints, IPC tracing, spin wheels, etc., and make sure that the FastPath IPC and small spaces are enabled. This can all be done using make (x)config.

eSk

[1] http://i30www.ira.uka.de/research/documents/l4ka/smallspaces.pdf

Cristiano Ligieri Pereira

9 p.m.

I'm not managing to get Hazelnut working properly. It boots, but when Linux is started it panics with the following messages:

    Kernel panic: failed to create ping pong task
    In swapper task = no syncing
    --- Linux panic ---

It might be something stupid, but I can't figure it out. Any suggestions?

Cristiano.

------------------------------------------------------------
Cristiano Ligieri Pereira - http://www.ics.uci.edu/~cpereira
