hi,
the original l4linux papers mention performance about 4% lower than native linux. When running l4linux on top of Fiasco, performance seems to be more like 50% of what we get with native 2.2. For example, encoding a .wav-file to ogg-vorbis takes 2 minutes native, and almost 4 minutes with l4linux. LMBench also gives way-different figures.
Is there anything we can do to tune Fiasco for better performance? Are other people experiencing the same speed differences as we are?
Best regards, Jacob
Hi,
On Tuesday 24 September 2002 23:53, Jacob Gorm Hansen wrote:
> the original l4linux papers mention performance about 4% lower than native linux. When running l4linux on top of Fiasco, performance seems to be more like 50% of what we get with native 2.2. For example, encoding a .wav-file to ogg-vorbis takes 2 minutes native, and almost 4 minutes with l4linux. LMBench also gives way-different figures.
We have not run benchmarks for a long time because Fiasco is still under heavy development. In the coming weeks we want to merge several CVS trees into the main tree, adding support for additional architectures and speeding up the common IPC path.
> Is there anything we can do to tune Fiasco for better performance? Are other people experiencing the same speed differences as we are?
Some tuning tips for Fiasco:
Create a file l4/kernel/fiasco/src/Makeconf.local:

  EXTRA_FLAGS += -march=i686 -malign-functions=4      # optimize for >= PPro
  EXTRA_FLAGS += -DNDEBUG                             # don't compile assertions
  EXTRA_FLAGS += -DNO_FRAME_PTR -fomit-frame-pointer  # frame pointer is only needed for debugging
Further, edit
l4/kernel/fiasco/src/kern/config.cpp
and replace
kernel_mem_per_cent = 20;
by
kernel_mem_per_cent = 10;
This gives additional memory to user-space applications. Keep in mind, though, that Fiasco + L4Linux still has significantly less memory available than plain Linux.
In the linux22 configuration, disable "L4Linux/Compile kernel with frame pointer" and enable "L4Linux/Use Assembler version of l4_idle".
And last but not least, don't enable any tracing events in Fiasco, since they add overhead.
An additional speedup for L4Linux could be achieved by patching the syscalls in the libc to jump directly into the emulib of the process, bypassing the trampoline mechanism (int 0x80 => general protection => int 0x30 => l4linux server). I don't know if such a patch is floating around somewhere. See linux22/arch/l4/x86/emulib/int_entry.S, function entry13.
Frank
Hi Jacob,
On Tuesday 24 September 2002 23:53, Jacob Gorm Hansen wrote:
> the original l4linux papers mention performance about 4% lower than native linux. When running l4linux on top of Fiasco, performance seems to be more like 50% of what we get with native 2.2. For example, encoding a .wav-file to ogg-vorbis takes 2 minutes native, and almost 4 minutes with l4linux. LMBench also gives way-different figures.
Encoding of a 185 MB .wav file to mp3 at 128 kBit/s with lame 3.92 on a Pentium III Coppermine 800 MHz:

  Plain Linux 2.2.21:
    real 6m32.000s
    user 4m16.290s
    sys  0m7.360s

  L4Linux 2.2.21 on Fiasco (current CVS version):
    real 6m46.943s
    user 0m0.010s
    sys  0m0.000s
Don't rely on the user/sys times! The only time you can trust is the one after "real". L4Linux doesn't have its own scheduler; L4Linux tasks are scheduled by the scheduler of the underlying L4 implementation. The time an L4Linux task has been running is determined by the l4_thread_schedule() system call. But thread time accounting is disabled by default in Fiasco because of the additional cost. You can enable it by setting config::fine_grained_cputime to TRUE (but it isn't fully tested yet, so without any warranty :-)).
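Since only the "real" figures are comparable, the relative slowdown can be worked out from them directly. The following is a minimal sketch of that arithmetic; the helper name `parse_time` is hypothetical, and the two input strings are the "real" values quoted above.

```python
def parse_time(s: str) -> float:
    """Convert a `time`-style string such as '6m46.943s' to seconds."""
    minutes, seconds = s.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" (wall-clock) times from the lame benchmark above.
native = parse_time("6m32.000s")    # plain Linux 2.2.21
l4linux = parse_time("6m46.943s")   # L4Linux 2.2.21 on Fiasco

# Relative slowdown of L4Linux compared with native Linux.
overhead = (l4linux - native) / native
print(f"native={native:.3f}s l4linux={l4linux:.3f}s overhead={overhead:.1%}")
```

For this one run the slowdown comes out at roughly 3.8%, which is in the same range as the figure of about 4% mentioned at the start of the thread.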
BTW: What about your L4Linux migration? Do you have a website? We are very interested ...
Frank
On Tue, Oct 01, 2002 at 02:50:07PM +0200, Frank Mehnert wrote:
> Hi Jacob,
> On Tuesday 24 September 2002 23:53, Jacob Gorm Hansen wrote:
>> the original l4linux papers mention performance about 4% lower than native linux. When running l4linux on top of Fiasco, performance seems to be more like 50% of what we get with native 2.2. For example, encoding a .wav-file to ogg-vorbis takes 2 minutes native, and almost 4 minutes with l4linux. LMBench also gives way-different figures.
> Encoding of 185MB .wav-file into mp3/128kBit/s with lame 3.92 on Pentium III Coppermine 800 MHz:
>
>   Plain Linux 2.2.21:
>     real 6m32.000s
>     user 4m16.290s
>     sys  0m7.360s
>
>   L4Linux 2.2.21 on Fiasco (current CVS version):
>     real 6m46.943s
>     user 0m0.010s
>     sys  0m0.000s
This looks good. I've rerun the benchmarks after your original tuning advice, and things look much better now.
> Don't rely on the user/sys times! The only time you can trust is the one after "real". L4Linux doesn't have its own scheduler; L4Linux tasks are scheduled by the scheduler of the underlying L4 implementation. The time an L4Linux task has been running is determined by the l4_thread_schedule() system call. But thread time accounting is disabled by default in Fiasco because of the additional cost. You can enable it by setting config::fine_grained_cputime to TRUE (but it isn't fully tested yet, so without any warranty :-)).
OK, we'll pull out a real clock then ;-)
> BTW: What about your L4Linux migration? Do you have a website? We are very interested ...
Things are going very well indeed. Today, we migrated a running (though not very busy) kernel with 16 MB of RAM to another computer, with a downtime of around 1/10th of a second, on a 100 Mbit network. We're using precopy: first all pages get copied, then the pages dirtied in the meantime are copied over. This is repeated a few times to minimize the size of the last transfer, and with it the downtime.
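The precopy loop described here can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual migration code: the function `precopy_migrate`, its parameters, and the dirtying model (a fixed fraction of the pages just sent gets dirtied during each round) are all hypothetical.

```python
import random


def precopy_migrate(num_pages, dirty_rate, max_rounds=5, stop_threshold=8, seed=0):
    """Simulate iterative pre-copy.

    Returns (pages sent in each live round, pages sent while the guest is paused).
    """
    rng = random.Random(seed)
    to_send = set(range(num_pages))  # first round: every page
    rounds = []
    for _ in range(max_rounds):
        rounds.append(len(to_send))
        # The guest keeps running while a round is on the wire and dirties pages.
        # Assumed model: the number of dirtied pages is proportional to the
        # number of pages that were just sent.
        n_dirty = int(len(to_send) * dirty_rate)
        to_send = set(rng.sample(range(num_pages), n_dirty))
        if len(to_send) <= stop_threshold:
            break  # remaining set is small enough for the final transfer
    # Stop-and-copy: the guest is paused and the remaining dirty pages are sent.
    return rounds, len(to_send)


# 16 MB of RAM in 4k pages, with an assumed 10% dirtying ratio per round.
rounds, final = precopy_migrate(num_pages=4096, dirty_rate=0.1)
print("pages per live round:", rounds)
print("pages sent during downtime:", final)
```

Each live round shrinks the set that still has to be sent, so only the last, small set contributes to the downtime.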
We currently have the following features:
* An OSkit-based 'bios' which abstracts an Ethernet device and has its own IP stack, so that a kernel can be sent to it for boot or resume via sockets. We do not have zero-copy networking, though :-(
* Run multiple l4linuxes atop the bios, on the same cpu (nfs-booted).
* Migrate running l4linuxes between bioses on different CPUs, with minimal downtime (the lower bound is probably the transfer of one 4k page per Linux process).
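For a rough feel of the numbers involved, the transfer times can be estimated from the page size and the link speed. This is a back-of-the-envelope sketch that assumes the full 100 Mbit/s is usable and ignores protocol overhead and latency; the names are hypothetical.

```python
PAGE_BYTES = 4096          # one 4k page
LINK_BITS_PER_S = 100e6    # 100 Mbit network, assumed fully usable


def transfer_seconds(num_pages: int) -> float:
    """Time to push num_pages over the link at line rate."""
    return num_pages * PAGE_BYTES * 8 / LINK_BITS_PER_S


per_page = transfer_seconds(1)
full_copy = transfer_seconds(16 * 1024 * 1024 // PAGE_BYTES)  # 16 MB guest
pages_in_downtime = int(0.1 / per_page)  # pages that fit into 1/10th of a second

print(f"one page:         {per_page * 1e3:.3f} ms")
print(f"full 16 MB copy:  {full_copy:.2f} s")
print(f"pages in 0.1 s:   {pages_in_downtime}")
```

Under these assumptions a single page takes about a third of a millisecond, so the per-process lower bound stays in the millisecond range for a modest number of processes, while a full copy of the 16 MB guest takes well over a second.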
We do not have a website (yet?) and the setup is a bit complicated. You need an NFS-server for the filesystems, and you probably need etherbooted grub as well. Bootp in oskit is broken, so right now some manual editing of the bios will be needed to get things running.
There is also a small problem when migrating to a different set of task numbers, due to the lack of Clans & Chiefs (C&C) in Fiasco. I've hacked it to return 0x10 for out-of-clan IPCs here, though.
We would like to make the sources available, but we've been a bit brutal to l4linux, so we had better keep it in our own CVS for now.
Best, Jacob
l4-hackers@os.inf.tu-dresden.de