blaine at mac.com
Sat Feb 15 19:15:44 CET 2014
Awesome! Thanks for the references.
The fact that L4 hasn’t been beaten is quite significant to me and so I’ll dig in further.
My intuition is that we have to turn our thinking upside down: there are no threads. Keep a few stacks for I/O interrupts in the kernel, do all stack scheduling in user space, and let IPC be as pure as trap, swap MMU, jump. If this model has been explored, I'll be fascinated to learn where it ran aground.
Again, thanks for the references; this is essential work which will take a long time to fully realize. I now know that L4 has set the bar, and I have some papers and code to peruse, so I have refined my starting position. It's a good result!
On Feb 14, 2014, at 9:18 PM, Gernot Heiser <gernot at unsw.edu.au> wrote:
> On Fri Feb 14, 2014 at 10:39:49 -0800, Blaine Garst wrote:
>> At first glance I suspect that my architectural work will improve L4
>> IPC times.
>> The premise is/was that threads don’t belong to address spaces but
>> instead wander with the IPC from one address domain to another
>> carrying their arguments in registers.
> You’re talking about a migrating-threads model. Bryan Ford implemented that in Mach in the ’90s; it improved Mach IPC (from a very low baseline), but still came nowhere close to L4’s. (And note that they don’t compare to L4, bit of a benchmarking crime…) Pebble was a from-scratch kernel using a migrating-threads model; it got within 10% of L4’s IPC performance, but not better. More recently, Gabe Parmer’s and Rich West’s Composite OS tried the same, and their IPC costs are also higher than L4’s.
>> IPC is a trap, adjust mmu,
>> proceed. If the IPC is carrying an IPC end-point, e.g. a capability,
>> it’s a different trap and some bookkeeping must be done, but it can
>> also be blindingly fast. The hard question is and was, well, if you
>> don’t have a blocking thread waiting for the IPC, how do you manage
>> all these spontaneous “up-calls”.
> You’ll find that it ain’t that easy. On the one hand, L4 IPC is designed to be little more than a context switch, so, as Adam says, there isn’t much to shave off. (In fact, about 10–15 years ago, when we were building Mungi on L4, some of my students argued that we should be moving to a kernel with a migrating threads model as this would map more efficiently onto Mungi’s migrating threads model. But when going through the operations that needed to be performed, no-one could show me how it would end up faster than using L4.)
> On the other hand, you have to do considerably more than switch page tables. In particular, while logically the thread continues executing on its old stack, in reality that doesn’t work: the thread switches protection domains, and its old stack is no longer accessible. While logically the whole stack moves between protection domains, in practice this means that you need to provide a new stack on the fly. Obviously, the stack will be cached, so it can be re-used on a repeat call, but it isn’t as easy as only changing the page table.
> And, there is no guarantee (except if you’re in a single-address-space OS like Mungi) that you actually *can* allocate a new stack where you need it: as you’re switching to a new AS, the address range used by the original stack might be in use by something else, which means you’re hosed.
> Plus, maintaining a cache of stacks introduces resource-management policies into the kernel, in violation of microkernel principles.
> Bryan Ford and Jay Lepreau, Evolving Mach 3.0 to a Migrating Thread Model, USENIX Winter Technical Conference, 1994
> Eran Gabber, Christopher Small, John Bruno, José Brustoloni and Avi Silberschatz, The Pebble Component-Based Operating System, USENIX Annual Technical Conference, 1999
> Gabriel Parmer, Composite: A Component-Based Operating System for Predictable and Dependable Computing, PhD thesis, Boston University, 2009
> l4-hackers mailing list
> l4-hackers at os.inf.tu-dresden.de