Well, then you should probably start reading the publications about L3 and L4 from the last 7 years; you will find an in-detail analysis of why your statement could be taken as an offensive rant... ;-) There are pretty exact numbers on Mach's cache utilization, on IPC performance, and on the structural problems of the communication mechanism.
Well, I have done that (read the papers). Cache utilization is very easy to fix; in fact, when I did the Mach VM overhaul (a couple of years ago) it took me maybe one day to implement coloring. The IPC issue was not discovered by the L3/L4 team; it had been a known issue for a very long time, and there are much older Utah and UW papers that address it. Does the L3/L4 team have anything else up its sleeve?

If I were to say that being tightly coupled to the architecture is strange, or that since L4 took almost everything out of the kernel, memory management could have been evicted as well (as the EROS microkernel does), then you might need to get offended. Otherwise, I don't see your point.

The L4 papers state that since there is no way to prove that Mach causes only 5% degradation, the statement is not valid. I say the same logic applies to MkLinux vs. L4Linux. I am not trying to start a flame war, nor do I claim that L4 is worthless. I simply want to see if there is a way to validate a specific statement. If there is, that means the statement is valid; if there is not, then you tell me... As simple as that. IS.
"Igor Shmukler" shmukler@mail.ru writes:
Well, then you should probably start reading the publications about L3 and L4 from the last 7 years; you will find an in-detail analysis of why your statement could be taken as an offensive rant... ;-) There are pretty exact numbers on Mach's cache utilization, on IPC performance, and on the structural problems of the communication mechanism.
Well, I have done that (read the papers). Cache utilization is very easy to fix. In fact, when I did the Mach VM overhaul (a couple of years ago) it took me maybe one day to implement coloring.
Maybe I am missing something, but what does cache coloring have to do with cache utilization? Or do you mean you could solve the cache problems of Mach by simply reserving a fraction of the cache for the kernel?
The IPC issue was not discovered by the L3/L4 team; it had been a known issue for a very long time. There are much older Utah and UW papers that address it.
The issue was known before. That was the reason for papers like "The Increasing Irrelevance of IPC Performance" by Brian Bershad, which essentially claimed: since IPC performance is so bad, applications have found other ways to communicate with each other, and therefore there is no reason to try to achieve better IPC performance.
But did anyone provide a solution? Surely you can point us to the Utah and UW papers which describe IPC performance comparable to L3/L4 IPC performance and which were published before 1993. The program committee of the '93 SOSP surely thought otherwise, and the audience was "shocked" when Jochen presented his 10-fold increase in IPC performance (L3 compared to Mach). The first thing Brian Ford did was ask Jochen for a copy of L3 to be able to verify the results. (It was quite funny, since L3 had a really strange user land and was mostly in German, so Jochen had to coach Brian through the installation process.)
that since L4 took almost everything out of the kernel, memory management could have been evicted as well (as the EROS microkernel does), then you may need to get offended.
You don't mean that EROS allows user-level processes to directly manipulate kernel page tables? You are kidding, aren't you?
EROS does even more memory-management work inside the kernel than L4. If L4 catches a page fault, it directly sends a message to the pager. EROS first tries to parse the capability tree to check whether there is a mapping present which isn't in its hardware page table, and only if it finds no mapping does it invoke the user-level page fault handler.
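To make the contrast concrete, the two fault paths can be sketched as a toy model (all names here are invented for illustration; this is not actual L4 or EROS code, and the capability "tree" is flattened to a list):

```c
#include <stddef.h>

/* Who ends up resolving a fault in each design. */
enum resolver { USER_PAGER, KERNEL_CAP_WALK };

struct cap_node {
    unsigned long vpage;          /* virtual page this capability maps */
    struct cap_node *next;        /* next node in the capability tree  */
};

/* L4-style: the kernel does no lookup of its own; every fault becomes
 * an IPC to the user-level pager associated with the faulting thread. */
static enum resolver l4_fault(unsigned long vpage)
{
    (void)vpage;
    return USER_PAGER;            /* message goes to the pager immediately */
}

/* EROS-style (as described above): first walk the in-kernel capability
 * tree for a mapping not yet in the hardware page table; only when
 * nothing is found does the user-level fault handler get invoked. */
static enum resolver eros_fault(unsigned long vpage,
                                const struct cap_node *tree)
{
    const struct cap_node *n;
    for (n = tree; n != NULL; n = n->next)
        if (n->vpage == vpage)
            return KERNEL_CAP_WALK;   /* mapping rebuilt inside the kernel */
    return USER_PAGER;                /* no capability found: go to user level */
}
```

The point of the sketch is only that EROS's kernel does more work per fault before user level is involved, exactly as stated above.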
The L4 papers state that since there is no way to prove that Mach causes only 5% degradation, the statement is not valid.
There was a rumor and we tried to verify it. No-one working with Mach was able to substantiate this rumor; they simply refused to confirm it. So we stated: "We found no substantiation for the ``common knowledge'' that early Mach 3.0-based Unix single-server implementations achieved a performance penalty of only 10% compared to bare Unix on the same hardware." There is no paper, no tech report, nothing that we are aware of. Feel free to prove us wrong and point us to any document published before 1997.
I say the same logic applies to MkLinux vs. L4Linux.
Here in contrast we actually measured the two systems.
I am not trying to start a flame war. Nor do I claim that L4 is worthless. I simply want to see if there is a way to validate a specific statement.
Have you tried to redo our measurements? Just pick a machine with the same configuration as we used for our measurements (a 133 MHz Pentium PC based on an ASUS P55TP4N motherboard using Intel's 430FX chipset, equipped with a 256 KB pipeline-burst second-level cache and 64 MB of 60 ns Fast Page Mode RAM) and do them again. If you come up with different numbers, then we can start to discuss the differences.
http://www.ibiblio.org/pub/historic-linux/early-ports/mach-linux/intel/
Google will point you to the location of hbench...
If there is, that means the statement is valid; if there is not, then you tell me... As simple as that. IS.
We will see.
Regards, Jean
Well, I have done that (read the papers). Cache utilization is very easy to fix. In fact, when I did the Mach VM overhaul (a couple of years ago) it took me maybe one day to implement coloring.
Maybe I am missing something, but what does cache coloring have to do with cache utilization? Or do you mean you could solve the cache problems of Mach by simply reserving a fraction of the cache for the kernel?
That's not what I tried to say; my English must be bad, sorry. I tried to say that carefully selecting which pages are placed on the queue gave Mach a serious performance boost.
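For readers unfamiliar with the technique: the page-selection idea can be sketched as color-binned free lists, where the allocator prefers a physical page whose cache color matches the virtual page being backed, so a contiguous virtual region spreads evenly across the cache. This is a minimal hypothetical model, not Mach's actual VM code; all names and the color count are invented:

```c
#define PAGE_SHIFT 12
#define NCOLORS    8   /* e.g. cache_size / (associativity * page_size) */

/* Cache color of a virtual page: its page number modulo the color count. */
static int vpage_color(unsigned long vaddr)
{
    return (int)((vaddr >> PAGE_SHIFT) % NCOLORS);
}

/* free_lists[c] holds the number of free physical pages of color c
 * (a real allocator would hold actual page queues, one per color).
 * Returns the color of the page handed out, or -1 if none are free. */
static int alloc_colored(int free_lists[NCOLORS], unsigned long vaddr)
{
    int c, want = vpage_color(vaddr);
    if (free_lists[want] > 0) {        /* preferred color available */
        free_lists[want]--;
        return want;
    }
    for (c = 0; c < NCOLORS; c++)      /* fall back to any free color */
        if (free_lists[c] > 0) {
            free_lists[c]--;
            return c;
        }
    return -1;                         /* out of memory */
}
```

With random page placement, two hot virtual pages can collide in the same cache bins; with coloring, the collision only happens once each color's list runs dry.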
The IPC issue was not discovered by the L3/L4 team; it had been a known issue for a very long time. There are much older Utah and UW papers that address it.
The issue was known before. That was the reason for papers like "The Increasing Irrelevance of IPC Performance" by Brian Bershad, which essentially claimed: since IPC performance is so bad, applications have found other ways to communicate with each other, and therefore there is no reason to try to achieve better IPC performance.
I know about Brian's paper, but there are other papers that addressed this. For instance, there was a UW paper where researchers successfully tried replacing Mach RPC with LRPC for co-located objects (the work was not finished then, but it could be extended even today).
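The co-location shortcut can be sketched roughly as follows. This is an illustrative toy, not the UW paper's code: when caller and callee live in the same protection domain, dispatch bypasses the message-based RPC path and invokes the method directly, which is the essence of the LRPC-style optimization:

```c
static int msg_sends;   /* counts trips through the message-based path */

struct object {
    int domain_id;              /* protection domain the object lives in */
    int (*method)(int arg);     /* the exported operation                */
};

/* Stand-in for the full marshal/send/receive/unmarshal RPC path. */
static int send_message_rpc(const struct object *o, int arg)
{
    msg_sends++;
    return o->method(arg);
}

/* Dispatch: co-located objects get a direct (LRPC-like) call; anything
 * in another domain goes through ordinary message-based RPC. */
static int invoke(int caller_domain, const struct object *o, int arg)
{
    if (o->domain_id == caller_domain)
        return o->method(arg);          /* same domain: no message at all */
    return send_message_rpc(o, arg);    /* cross-domain: message path */
}
```

The result is identical either way; only the cost differs, which is exactly why the optimization is invisible to correct clients.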
But did anyone provide a solution? Surely you can point us to the Utah and UW papers which describe IPC performance comparable to L3/L4 IPC performance and which were published before 1993. The program committee of the '93 SOSP surely thought otherwise, and the audience was "shocked" when Jochen presented his 10-fold increase in IPC performance (L3 compared to Mach). The first thing Brian Ford did was ask Jochen for a copy of L3 to be able to verify the results. (It was quite funny, since L3 had a really strange user land and was mostly in German, so Jochen had to coach Brian through the installation process.)
Well, the obvious solution to Mach's problems is critical-path optimization.
that since L4 took almost everything out of the kernel, memory management could have been evicted as well (as the EROS microkernel does), then you may need to get offended.
You don't mean that EROS allows user-level processes to directly manipulate kernel page tables? You are kidding, aren't you?
Well, I was talking about the "First class flexpage-based address spaces" paper presented by Shapiro in (or around) 2000.
EROS does even more memory-management work inside the kernel than L4. If L4 catches a page fault, it directly sends a message to the pager. EROS first tries to parse the capability tree to check whether there is a mapping present which isn't in its hardware page table, and only if it finds no mapping does it invoke the user-level page fault handler.
If the basic assertion is "the microkernel is only to provide enough protection so that applications can provide abstractions", then resource allocation may be safely moved out of the microkernel. Correct me if I am wrong.
The L4 papers state that since there is no way to prove that Mach causes only 5% degradation, the statement is not valid.
There was a rumor and we tried to verify it. No-one working with Mach was able to substantiate this rumor; they simply refused to confirm it. So we stated: "We found no substantiation for the ``common knowledge'' that early Mach 3.0-based Unix single-server implementations achieved a performance penalty of only 10% compared to bare Unix on the same hardware." There is no paper, no tech report, nothing that we are aware of. Feel free to prove us wrong and point us to any document published before 1997.
I am not trying to prove somebody wrong (or right, for that matter). I want to perform the evaluation and then report the results to those who may be interested. I will make sure to share the results with you when they are available.
I am not trying to start a flame war. Nor do I claim that L4 is worthless. I simply want to see if there is a way to validate a specific statement.
Have you tried to redo our measurements? Just pick a machine with the same configuration as we used for our measurements (a 133 MHz Pentium PC based on an ASUS P55TP4N motherboard using Intel's 430FX chipset, equipped with a 256 KB pipeline-burst second-level cache and 64 MB of 60 ns Fast Page Mode RAM) and do them again. If you come up with different numbers, then we can start to discuss the differences.
http://www.ibiblio.org/pub/historic-linux/early-ports/mach-linux/intel/
I will try to redo your tests, now that I have the resources. I may have to use different hardware, but in that case I will note it in the report.
google will point you to the location of hbench...
Thanks.
We will see.
I could not agree with you more, and we shall see relatively soon too. Just give me a little time. Thank you for providing the link to the needed files. Sincerely, IS.
"Igor Shmukler" shmukler@mail.ru writes:
But did anyone provide a solution? Surely you can point us to the Utah and UW papers which describe IPC performance comparable to L3/L4 IPC performance and which were published before 1993. The program committee of the '93 SOSP surely thought otherwise, and the audience was "shocked" when Jochen presented his 10-fold increase in IPC performance (L3 compared to Mach). The first thing Brian Ford did was ask Jochen for a copy of L3 to be able to verify the results. (It was quite funny, since L3 had a really strange user land and was mostly in German, so Jochen had to coach Brian through the installation process.)
Well, the obvious solution to Mach's problems is critical-path optimization.
wow, /me wonders why no-one ever thought about that...
that since L4 took almost everything out of the kernel, memory management could have been evicted as well (as the EROS microkernel does), then you may need to get offended.
You don't mean that EROS allows user-level processes to directly manipulate kernel page tables? You are kidding, aren't you?
Well, I was talking about the "First class flexpage-based address spaces" paper presented by Shapiro in (or around) 2000.
(Unfortunately I was only able to browse the text version provided by Google, since the server providing the PostScript version is down.)
And? Did anybody implement it? EROS doesn't use it...
http://www.eros-os.org/pipermail/eros-arch/2002-July/003425.html
EROS does even more memory-management work inside the kernel than L4. If L4 catches a page fault, it directly sends a message to the pager. EROS first tries to parse the capability tree to check whether there is a mapping present which isn't in its hardware page table, and only if it finds no mapping does it invoke the user-level page fault handler.
If the basic assertion is "the microkernel is only to provide enough protection so that applications can provide abstractions", then resource allocation may be safely moved out of the microkernel. Correct me if I am wrong.
Suggest an alternative way to implement address spaces and everybody will listen.
Regards, Jean
First, I'd like to note that my domain has apparently been compromised and Jean's server bounces my mail, which makes replying a little more difficult than it could be.
In a previous email Jean suggested checking a string to ensure that I have the correct kernel version, which I will do once I get around to it. To be honest, this raises a question: if I in fact have the wrong kernel version, could I get that particular one? I also think that for the tests to make sense I'd need the compiler version and the arguments passed to it. I don't think an explanation of the purpose of that is needed.
Also, please be advised that I am not yet ready to run benchmarks, and it may take *more* than a day or so before I am ready. Not because I am missing software, but rather because Mach has a paging issue and I need to set everything up and recompile with compatible options.
Well, the obvious solution to Mach's problems is critical-path optimization.
wow, /me wonders why no-one ever thought about that...
The fact that people thought about it does not mean that it has been done, or that it would be ineffective.
Well, I was talking about the "First class flexpage-based address spaces" paper presented by Shapiro in (or around) 2000.
(Unfortunately I was only able to browse the text version provided by Google, since the server providing the PostScript version is down.)
I am attaching the paper to this message. I don't know whether the mailing list server will strip it.
Sincerely, IS.
"Igor Shmukler" shmukler@mail.ru writes:
First, I'd like to note that my domain has apparently been compromised and Jean's server bounces my mail, which makes replying a little more difficult than it could be.
The mailing list is subscriber-only, so simply subscribe to it and your problems go away.
In a previous email Jean suggested checking a string to ensure that I have the correct kernel version, which I will do once I get around to it. To be honest, this raises a question: if I in fact have the wrong kernel version, could I get that particular one?
As far as I understand, you want to find out whether:
- MkLinux is really as bad as published
- L4Linux is really as fast as published
Start with the first one.
Then fetch a version of L4 from l4ka, build L4Linux 2.0 against it, and try to find out about the second point. (I don't know whether l4ka's version supports 4 MB pages and small address spaces, but you will find out whether they do or not.)
I also think that for the tests to make sense I'd need the compiler version and the arguments passed to it. I don't think an explanation of the purpose of that is needed.
Standard Linux makefile options; L4Linux used 4 MB pages and small address spaces (there were some config options, but I think they are set in the makefile).
Well, the obvious solution to Mach's problems is critical-path optimization.
wow, /me wonders why no-one ever thought about that...
The fact that people thought about it does not mean that it has been done, or that it would be ineffective.
No comment...
Well, I was talking about the "First class flexpage-based address spaces" paper presented by Shapiro in (or around) 2000.
(Unfortunately I was only able to browse the text version provided by Google, since the server providing the PostScript version is down.)
I am attaching the paper to this message. I don't know whether the mailing list server will strip it.
- Read 4.1 and try to implement a non-trivial pager like the Linux server. 16 slots (16 mappings) per address space, with a pager which uses single pages to resolve page faults (and therefore needs one slot per page), doesn't seem to work very well.
- Address space construction becomes a complicated operation (do I have enough slots, or do I have to use two address spaces, or even more, to construct the real address space?).
- What about recursive address space construction? How do I remove mappings created by another process based on the mappings I gave it? That's the whole reason for the mapping tree Shapiro is complaining about, and I don't see how this situation is handled by his proposed solution.
- Shapiro tries to remove the mapping database but introduces capabilities into the kernel, which are dynamically created and have to be managed somehow.
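The first point can be illustrated with a toy model: 16 mapping slots per address space, one slot consumed per resolved page, FIFO replacement. All details here are invented for illustration; the actual proposal and the Linux server behave differently in particulars, but the arithmetic is the same: any working set larger than the slot count forces constant eviction.

```c
#define NSLOTS 16

struct aspace {
    unsigned long slot[NSLOTS];   /* virtual page held by each slot, 0 = free */
    int next_victim;              /* FIFO replacement pointer                 */
    int evictions;                /* mappings torn down to make room          */
};

/* Reference a virtual page: no-op if already mapped, fill a free slot
 * if one exists, otherwise evict an existing mapping FIFO-style. */
static void touch(struct aspace *as, unsigned long vpage)
{
    int i;
    for (i = 0; i < NSLOTS; i++)
        if (as->slot[i] == vpage)
            return;                          /* already mapped: no fault */
    for (i = 0; i < NSLOTS; i++)
        if (as->slot[i] == 0) {
            as->slot[i] = vpage;             /* free slot: just map */
            return;
        }
    as->slot[as->next_victim] = vpage;       /* full: evict a mapping */
    as->next_victim = (as->next_victim + 1) % NSLOTS;
    as->evictions++;
}
```

A single-page-per-fault pager working on a 17-page working set already starts thrashing; a Linux-server-sized working set would never fit.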
There are other proposals which handle the resource allocation problem in a better way...
Regards, Jean
l4-hackers@os.inf.tu-dresden.de