Hi all.
I've been looking at Google and Wikipedia for information about L4, and it's fairly thin. What I've gathered is that L4 can read and write thread registers and performs IPC through fixed-length register messages, with all operations synchronous (I guess that means you send a message and block until the target thread reads the message).
Is there a more concrete document anywhere about implementing an L4 microkernel and what makes it L4?
I'm surveying the field at the moment, looking at the advances made in computer software—security, managed language runtimes, hypervisors, real-time OSes—and trying to project the possibilities for a next-generation operating system. There's enough divergence that a rewrite from scratch might make sense. Besides, I've drawn up a method for getting a self-hosting CLR running with only anonymous memory allocation and VFS file page mapping, and for getting it up and running before an actual OS kernel.
In other words: it's readily achievable to write a CLR in C#, get it running on itself before the kernel comes up, then load a kernel, VMM, VFS, ABI, scheduler, and so forth—all written in C#, compiled to CIL, and totally independent of the hardware architecture underneath—and then throw away the loader and bring up init. The kernel would need to use an abstract factory model for common tasks that require hardware-specific code, and the CLR would need to load an assembly with a concrete factory for the particular hardware plus a few built-in native assembly calls; but it's doable.
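To make the abstract-factory part concrete, here's a minimal sketch of the shape I have in mind (all type names are placeholders, not a real design):

    using System;

    // Hardware-neutral kernel code programs only against interfaces.
    public interface ITimer { void SetOneShot(long nanoseconds); }

    public interface IHardwareFactory
    {
        ITimer CreateTimer();
        // ...one Create method per hardware-specific concern
    }

    // One concrete factory per platform, shipped as an ordinary assembly.
    // Only its internals (and a few built-in native calls) are arch-specific.
    public sealed class X86TimerStub : ITimer
    {
        // A real implementation would program the local APIC through a
        // built-in native call; this stub only illustrates the layering.
        public void SetOneShot(long nanoseconds) =>
            Console.WriteLine($"arm one-shot timer: {nanoseconds} ns");
    }

    public sealed class X86Factory : IHardwareFactory
    {
        public ITimer CreateTimer() => new X86TimerStub();
    }

The loader's only hardware-specific job is then to pick the right concrete factory assembly for the machine it finds itself on.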
This is all theoretical at the moment, and I want to explore the particular edges a little to see what I find out.
Hi John,
In terms of what makes L4, the most comprehensive recent document is this: https://ts.data61.csiro.au/publications/nictaabstracts/Heiser_Elphinstone_16.abstract.pml
Gernot
Thanks! That's a lot clearer than the rambling muck I'd read before. This is all highly technical stuff, and it does require some understanding of computers and operating systems; but it's so far out on the fringe that many authors fail to consider an audience that doesn't already know what they're talking about, as if everyone is a kernel programmer who has simply never stepped back to get the big picture.
Wow, this document is really clear. It also makes it clear that L4 is still evolving, which suddenly makes a lot of things fall into place.
Also, perhaps I should look into seL4, although I'm not sure I can leverage it. Long-term, I want to get .NET Core running on something akin to L4 implementing Minix (reliability features) and Linux (userland ABI), with a capability-based security model. That's years out; short-term, I'm probably just using Alpine Linux as a base for things like electronic voting machines (everybody is doing election integrity wrong; I started with an integrity model and built around that).
On 9/13/18, John john.r.moser@gmail.com wrote:
I'm surveying the field at the moment, looking at the advances made in computer software—security, managed language runtimes, hypervisors, real-time OSes—and trying to project the possibilities for a next-generation operating system. There's enough divergence that a rewrite from scratch might make sense. Besides, I've drawn up a method for getting a self-hosting CLR running with only anonymous memory allocation and VFS file page mapping, and for getting it up and running before an actual OS kernel.
It seems like things might be moving away from managed code somewhat. Safer native-code languages like Rust have become more popular in recent years. I think safer native code is a better approach than Java/.NET-style managed code, since there's no performance penalty and the runtime is just a library rather than a more complex VM (which is often written in an unsafe native-code language, leaving more attack surface than a system in which everything security-critical is written in a safer language). I'm taking the safer native code approach in the OS that I'm writing (a Rust-based next-generation Unix-like OS that will somewhat resemble QNX and Plan 9; https://gitlab.com/uxrt).
On Thu, Sep 13, 2018 at 10:34 PM Andrew Warkentin andreww591@gmail.com wrote:
It seems like things might be moving away from managed code somewhat. Safer native-code languages like Rust have become more popular in recent years. I think safer native code is a better approach than Java/.NET-style managed code, since there's no performance penalty and the runtime is just a library rather than a more complex VM (which is often written in an unsafe native-code language, leaving more attack surface than a system in which everything security-critical is written in a safer language).
Eh, I happen to like C# as a language.
Also, there's not necessarily a performance penalty with the CLR, because it JITs to native code. The CLR isn't actually interpreting bytecode as it goes; rather, it compiles the program, stores the compiled code in memory, and enters the compiled native code. The outcome is the same as compiling a language to a native machine language.
I designed a way for the CLR to self-host: it'd be written in C#, and running on itself. This is still theoretical, of course.
At its heart, the CLR is just converting bytecode to machine code. The thing people call a "VM" is basically a compiler and a garbage collector.
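You can watch that happen from ordinary C#: System.Reflection.Emit builds CIL at runtime, and CreateDelegate hands back the freshly JIT-compiled native code as a plain delegate (this is the standard .NET API, nothing hypothetical):

    using System;
    using System.Reflection.Emit;

    class JitDemo
    {
        static void Main()
        {
            // Build CIL at runtime for: int Add(int a, int b) => a + b;
            var dm = new DynamicMethod("Add", typeof(int),
                                       new[] { typeof(int), typeof(int) });
            var il = dm.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);

            // Creating the delegate triggers JIT compilation; the call
            // below runs native machine code, not interpreted bytecode.
            var add = (Func<int, int, int>)dm.CreateDelegate(
                typeof(Func<int, int, int>));
            Console.WriteLine(add(2, 3)); // prints 5
        }
    }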
I'm taking the safer native code approach in the OS that I'm writing (a Rust-based next-generation Unix-like OS that will somewhat resemble QNX and Plan 9; https://gitlab.com/uxrt).
(Accidentally replied privately instead of to the list; I really don't get why "reply to list" isn't usually a thing; it should be the default when replying to a message from a list.)
On 9/13/18, John john.r.moser@gmail.com wrote:
Also, there's not necessarily a performance penalty with the CLR, because it JITs to native code. The CLR isn't actually interpreting bytecode as it goes; rather, it compiles the program, stores the compiled code in memory, and enters the compiled native code. The outcome is the same as compiling a language to a native machine language.
It's certainly closer to native code than a pure interpreter in performance and may be comparable in some situations, but I was pretty sure that the overall performance was still somewhat weaker, especially when you're dealing with real-time code.
I designed a way for the CLR to self-host: it'd be written in C#, and running on itself. This is still theoretical, of course.
I'd think you'd have to statically compile it to native code first.
To me, language-based OSes have always seemed interesting but a bit impractical and limiting. A language-based OS would face more of a challenge when it comes to success in the real world than a conventional OS would (that's one reason why I'm writing an advanced Unix-like OS, since that is what is most likely to be used in the real world).
On 14 Sep 2018, at 15:00, Andrew Warkentin andreww591@gmail.com wrote:
To me, language-based OSes have always seemed interesting but a bit impractical and limiting. A language-based OS would face more of a challenge when it comes to success in the real world than a conventional OS would (that's one reason why I'm writing an advanced Unix-like OS, since that is what is most likely to be used in the real world).
I think when you say “language-based OS” you mean type-system enforced security rather than MMU-enforced security.
I haven't seen any approach of this sort that didn't end up with an unsafe but trusted blob of code (the language run-time) that's as big as, if not bigger than, your typical L4 kernel, besides having higher run-time overheads. I fail to see how this would be better in any way, particularly when you can have your L4 kernel formally verified (i.e. seL4).
Gernot
On Thu, Sep 13, 2018, 11:15 PM Gernot Heiser gernot@cse.unsw.edu.au wrote:
I think when you say “language-based OS” you mean type-system enforced security rather than MMU-enforced security
Yes, basically. I'm talking about OSes that integrate some form of language-specific VM at a fundamental level, often depending on the VM for security.
I haven’t seen any approach of this sort that didn’t end up with an unsafe but trusted blob of code (language run-time) that’s as big if not bigger than your typical L4 kernel, besides having higher run-time overheads. I fail to see how this would be better in any way. Particularly when you can have your L4 kernel formally verified (i.e. seL4).
Yes, that's basically the same thing I was saying. Managed code has higher overhead and a bigger TCB (which often contains a lot of unsafe code) than safer native code (either formally verified or just written in a safer language like Rust) does.
Hello
Yes, basically. I'm talking about OSes that integrate some form of language-specific VM at a fundamental level, often depending on the VM for security.
Theseus OS [1], Microsoft Singularity [2]?
1. http://www.ruf.rice.edu/~mobile/publications/boos2017plos.pdf 2. https://www.microsoft.com/en-us/research/project/singularity/
On 14 Sep 2018, at 16:41, Vasily A. Sartakov sartakov@ksyslabs.org wrote:
Theseus OS [1], Microsoft Singularity [2] ?
If you're interested in a more detailed comparison between L4 and Singularity, here's a blog (fairly dated, but so is Singularity by now):
Part 1: https://microkerneldude.wordpress.com/2008/03/13/q-what-is-the-difference-between-a-microkernel/
Part 2: https://microkerneldude.wordpress.com/2008/04/10/about-security-singularity-vs-l4-part-2/
Gernot
On Fri, Sep 14, 2018 at 09:00:25AM +0200, Gábor Wacha wrote:
Hello,
I do not really want to start a flamewar about JIT and AOT, but AFAIK using a JIT can introduce nondeterministic behavior (in runtime performance), which is not really wanted in a kernel. Am I correct?
Gabor Wacha
Mainly, JIT is also associated with runtime garbage collection (GC).
JIT can even happen ahead of time, just like a normal compiler, though without profiling (which must necessarily run at run time or with test/sample data). When JIT happens at runtime, later executions of a JITed code path will be quicker than the first execution(s) of that same path, depending on how the JIT is configured to run (either always on first use, or only after a path has been taken, say, 10 times).
On Fri, Sep 14, 2018 at 3:42 AM Zenaan Harkness zen@freedbms.net wrote:
Mainly, JIT is also associated with runtime garbage collection (GC).
Which is actually interesting in an OS-level JIT because you can do fun things like garbage collect based around LRU pages, use memory protections to track access, and so forth.
On one hand, it's possible for a compacting GC to allocate additional physical RAM and compact into that, marking pages as read-only in the target program (service) before starting to rewrite them. If the service writes to a page involved in GC, it faults to the GC service, which makes a note and corrects the protection on the page (relying on fast IPC and a fast page fault handler for this path). Meanwhile, the GC writes the rewritten pages to newly-allocated physical RAM.
In the end, the GC settles at a consistent state (potentially partially collected) at a time when each of the modified pages is read-only in the target service. It can then identify strongly-connected components to find the fewest pages which, when replaced, leave the service in a consistent state. For each such set, it first makes a pass removing all permissions on each page (unless handling a fault on an unmapped page is just as fast); this phase is still interruptible by handling the page fault on access and restoring permissions. It then makes a second pass replacing all of those pages, handling page faults in this phase by replacing the particular page in question.
This form of garbage collection has the same impact on the kernel service being collected as swapping its pages out to RAM (that is: unmapping the pages from the process, but not doing anything else with them except mapping them back in upon access).
You can stick to LRU pages for active processes, and even garbage collect any pages going in and out of swap if you're doing that with kernel services. Really, you're going to need to GC active pages at some point, and can target processes which are sleeping.
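As a toy sketch of that protect/copy/flip protocol (arrays standing in for physical pages, method calls standing in for fault IPC; the real thing would live on the pager path):

    using System;
    using System.Collections.Generic;

    enum Access { ReadWrite, ReadOnly, Unmapped }

    class Page
    {
        public Access Access = Access.ReadWrite;
        public byte[] Data = new byte[4096];
        public byte[] Shadow;   // the GC's rewritten copy, if any
    }

    class ConcurrentCompactor
    {
        readonly List<Page> pages;
        public ConcurrentCompactor(List<Page> pages) => this.pages = pages;

        // Phase 1: mark read-only and rewrite into shadow copies.
        // The service keeps running; a write faults into OnWriteFault.
        public void CopyPhase()
        {
            foreach (var p in pages)
            {
                p.Access = Access.ReadOnly;
                p.Shadow = (byte[])p.Data.Clone(); // compacted rewrite goes here
            }
        }

        // Phase 2: unmap the consistent set, then flip the shadows in.
        public void FlipPhase()
        {
            foreach (var p in pages) p.Access = Access.Unmapped;
            foreach (var p in pages)
            {
                if (p.Shadow != null) { p.Data = p.Shadow; p.Shadow = null; }
                p.Access = Access.ReadWrite;
            }
        }

        // Fault handler: the service wrote mid-collection, so this page's
        // shadow is stale; drop it and restore access immediately.
        public void OnWriteFault(Page p)
        {
            p.Shadow = null;
            p.Access = Access.ReadWrite;
        }
    }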
JIT can even happen ahead of time, just like a normal compiler, though without profiling (which must necessarily run at run time or with test/sample data). When JIT happens at runtime, later executions of a JITed code path will be quicker than the first execution(s) of that same path, depending on how the JIT is configured to run (either always on first use, or only after a path has been taken, say, 10 times).
With the same considerations as above: no need to stop the world for the whole process; you rewrite a shadow copy of the pages, then turn them over. At any stage, you're one page fault away from simply executing in a consistent state, and you can fight the uphill battle of getting all relevant pages unmapped at once as long as you want.
These JIT and GC strategies give full priority to performance: the service being altered keeps running, and in the worst case hits an unmapped page and triggers a page fault. That page fault is satisfied immediately by mapping the correct page and continuing execution. In theory, it's fastest to have the GC itself handle that mapping; in practice, you'd likely have the VMM notify the GC of the fault (synchronously) and then take action, which makes this not the fastest page fault. There may be other ways to achieve a minimum-performance-impact page fault in this scenario.
Obviously, idle CPU becomes GC and JIT CPU, so this all eats more electricity.
On Fri, Sep 14, 2018 at 1:00 AM Andrew Warkentin andreww591@gmail.com wrote:
It's certainly closer to native code than a pure interpreter in performance and may be comparable in some situations, but I was pretty sure that the overall performance was still somewhat weaker, especially when you're dealing with real-time code.
It's a compiler, the same as gcc. In some comparisons of equivalent C# and C++ code, the C# program has come out faster in business-logic sections.
I'd think you'd have to statically compile it to native code first.
You'd have to perform the translation ahead of time and then store the CIL, native code, and instrumentation data in an image. When you load the image, it's the same as if a CLR had loaded the original CIL and gotten it JIT'd; except the CLR is the JIT result, and the CIL is the CLR. It runs on itself.
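Something like this, shape-wise (a hypothetical layout invented here for illustration, not an existing format):

    // One self-hosting image: the CIL section is the CLR's own
    // source of truth; the native section is its earlier JIT output
    // for the boot ISA, so the loader can enter it directly.
    public sealed record SelfHostImage(
        byte[] Cil,              // the CLR itself, as portable CIL
        byte[] NativeCode,       // prior JIT/AOT result for the target ISA
        byte[] Instrumentation,  // profile data for later re-JITs
        string TargetIsa);       // e.g. "x86-64"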
To me, language-based OSes have always seemed interesting but a bit impractical and limiting. A language-based OS would face more of a challenge when it comes to success in the real world than a conventional OS would (that's one reason why I'm writing an advanced Unix-like OS, since that is what is most likely to be used in the real world).
The plan is to run a Linux userspace by providing the Linux-specific ABI.
On 9/14/18, John john.r.moser@gmail.com wrote:
It's a compiler, the same as gcc. In some comparisons of equivalent C# and C++ code, the C# program has come out faster in business-logic sections.
Is that on highly CPU-bound code, or just I/O-bound code?
You'd have to perform the translation ahead of time and then store the CIL, native code, and instrumentation data in an image. When you load the image, it's the same as if a CLR had loaded the original CIL and gotten it JIT'd; except the CLR is the JIT result, and the CIL is the CLR. It runs on itself.
Yes, I guess that would work, but you still have the problem of a rather complex language runtime being part of the TCB, which you don't get with an OS based on safer native code.
The plan is to run a Linux userspace by providing the Linux-specific ABI.
Developers probably aren't going to port to (or completely rewrite for in the case of non-.NET languages) some obscure alternative OS. They'll just say "oh, it runs Linux applications, so our Linux port is good enough", and users will probably pass it over because there will be few native applications, and Linux applications will be second-class citizens (unless you actually manage to integrate Linux applications and allow them to take advantage of your OS's advanced features, but that sounds like it might be tricky in a non-Unix-like language-based OS).
An advanced Unix-like OS would have a much easier time integrating Linux applications (even in its native environment, UX/RT will implement Linux-specific APIs where it makes sense to do so, and its Linux binary compatibility layer will be more or less transparent, allowing Linux programs to integrate reasonably well with the rest of the OS and take advantage of many of its advanced features).
However, if you're writing your OS for some reason other than practical real-world usage, then I guess compatibility doesn't matter as much.
On Fri, Sep 14, 2018 at 7:58 AM Andrew Warkentin andreww591@gmail.com wrote:
Is that on highly CPU-bound code, or just I/O-bound code?
CPU-bound code. Things like flat array loops. When you get into implementing stuff on List<T> and the like, there's a lot of overhead because of how C# implements its linked lists; although you can say the same for arrays versus std::list in C++. std::list is apparently faster than List<T>.
The usual analysis is that you can't make the comparison because some code will be faster and some slower; this is the same as Objective-C vs. C++, or Rust vs. C++. C# has a garbage collector, which adds another system that may stop the world and otherwise create interruptions; although there are ways around that.
Yes, I guess that would work, but you still have the problem of a rather complex language runtime being part of the TCB, which you don't get with an OS based on safer native code.
Fair enough. You're overestimating that particular problem, though: besides the fact that the Kernel-CLR itself would be fairly small (the .NET runtime is mostly huge standard libraries), it only runs against microkernel services. If you can load a microkernel service, you have kernel-level access and don't need to exploit the CLR.
Basically, the CLR is equivalent to Linux's loadable kernel module loader.
Developers probably aren't going to port to (or completely rewrite for in the case of non-.NET languages) some obscure alternative OS. They'll just say "oh, it runs Linux applications, so our Linux port is good enough", and users will probably pass it over because there will be few native applications, and Linux applications will be second-class citizens (unless you actually manage to integrate Linux applications and allow them to take advantage of your OS's advanced features, but that sounds like it might be tricky in a non-Unix-like language-based OS).
Potentially. Things like systemd, udev, iptables, and direct rendering should work out of the box. Many advanced subsystems, such as KVM-flipping hypervisors (you push a key, and the monitor and keyboard are now natively another OS domain) and virtual file systems (you can mount device:/mnt/foo at /mnt/bar on several systems, and the kernel handles a call to open /mnt/bar/qux by passing it through: multiple OS domains can natively use the same file system in read-write mode), are transparent to applications. The Local Security Authentication Service—an abstraction which allows an OS to provide or use a particular security domain as a source of users and groups, such that one LDAP connection or /etc/passwd source is accessed through the device on multiple OS domains—requires a PAM module. KVM would use the native HVM (non-paravirt).
Things get advanced when you start playing with capability-based security or using OS domains for strong sandboxing. The control OS domain (like Xen Dom0) could pass an OS domain the capability to create paravirtualized domains. An Ubuntu Linux system running on this and specifically aware of the capability could then not only provide a PAM module for LSAS, but also create an OS domain with reduced capabilities sharing the same network interface (it can listen on the same adapter, MAC, and IP) with restricted ports. That OS domain isn't run under the Ubuntu domain, but directly next to it; the Ubuntu domain simply has a capability allowing it to access that domain, and the init and login systems strip that capability from child processes except for the management application, etc. The new OS domain has a read-only mount of Ubuntu's own root, with its own /tmp, and writable areas for Samba's dynamic data files in /var/lib/samba and whatnot: it provides Active Directory authentication.
To do all that, you need to be aware of things like LSAS, capability-based security, and the capability to create paravirtualized domains. You could, instead, just use the cgroups interface and run it all under Docker.
An advanced Unix-like OS would have a much easier time integrating Linux applications (even in its native environment, UX/RT will implement Linux-specific APIs where it makes sense to do so, and its Linux binary compatibility layer will be more or less transparent, allowing Linux programs to integrate reasonably well with the rest of the OS and take advantage of many of its advanced features).
That's the idea: if we implement Minix's resurrection service architecture, L4's high-performance architecture, and Linux's complete ABI, we can run a Linux userspace natively on an L4/Minix system. Any special things like advanced schedulers, ZFS, paravirt, and the like are either magic under the hood or available to binaries targeting the L4Mx ABI or using exposed devices.
However, if you're writing your OS for some reason other than practical real-world usage, then I guess compatibility doesn't matter as much.
Eh, right now I'm writing up an outline of what can be done based on what has been done in the real world and what is accomplished by combining those things in interesting new ways. If one of my business ventures takes off, I'll set cash aside to fund a research group to actually implement it, which will include releasing it as yet another Linux distribution.
At the moment, this is just feeding my addiction for all kinds of knowledge. I've burned out a few times from putting too much stuff in my brain at once.
On Friday 14. September 2018 05.58.09 Andrew Warkentin wrote:
Yes, I guess that would work, but you still have the problem of a rather complex language runtime being part of the TCB, which you don't get with an OS based on safer native code.
I am very fond of dynamic languages, which usually have their own runtimes, but I would also be somewhat wary of deploying some of these runtimes in any kind of privileged role. (I don't know if that is what is being suggested, not having digested every word of this discussion - sorry!)
As I recall, there are just-in-time compilers being used in the Linux kernel, maybe in the network firewall/rules functionality, and it is possible that this has already caused security problems. Of course, there are so many sources of such problems that it might be mean-spirited to single out just one of them.
Developers probably aren't going to port to (or completely rewrite for in the case of non-.NET languages) some obscure alternative OS.
I see that nobody (prominently) mentioned JX whose documentation was quite approachable, I thought:
https://en.wikipedia.org/wiki/JX_(operating_system)
But one of the lessons from the phenomenon of Java is that people don't like to be told to rewrite all their stuff in a particular language, and although you can try and make the runtime language-neutral, some languages end up being more equal than others.
They'll just say "oh, it runs Linux applications, so our Linux port is good enough", and users will probably pass it over because there will be few native applications, and Linux applications will be second-class citizens (unless you actually manage to integrate Linux applications and allow them to take advantage of your OS's advanced features, but that sounds like it might be tricky in a non-Unix-like language-based OS).
It is interesting to consider Nemesis, which evolved into Xen, in this regard:
https://en.wikipedia.org/wiki/Nemesis_(operating_system)
One of the goals was to support POSIX applications, and it was apparently a usable system in its day, maybe still is.
An advanced Unix-like OS would have a much easier time integrating Linux applications (even in its native environment, UX/RT will implement Linux-specific APIs where it makes sense to do so, and its Linux binary compatibility layer will be more or less transparent, allowing Linux programs to integrate reasonably well with the rest of the OS and take advantage of many of its advanced features).
It is also interesting to see the UX/RT label given the heritage of Unix systems:
https://en.wikipedia.org/wiki/Multi-Environment_Real-Time
In the L4 microkernel-related materials I have seen, nobody seems to mention this work or even the foundational influences on it. Indeed, one set of slides calls the Mach microkernel "The mother of all microkernels", later to discuss IBM Workplace OS and to note that "OS personalities did not work". Presumably such remarks are made in that specific context given that DMERT apparently supported multiple personalities (Unix-RTR and RSX-11).
However, if you're writing your OS for some reason other than practical real-world usage, then I guess compatibility doesn't matter as much.
Who is to say that an incompatible system does not have "practical real-world usage", though? I agree that compatibility would be helpful, and there are plenty of systems that exist only through the perseverance of their creators, only to fade away at some point, partly due to a lack of interest and difficulties integrating mainstream software.
Personally, I am also interested in what it would take to develop a vaguely standard multiserver system, albeit leveraging L4 technologies. A while back, people tried to get the Hurd working on L4 (Pistachio, I think), and I think there is potential to get something like that going on Fiasco.OC, particularly since the lack of capability support was supposedly the reason why the Hurd-L4 work was abandoned.
On that note, since documentation was discussed at the start of this thread, I should note that for Fiasco.OC, things like the register usage employed in IPC appear different from certain specifications. So, for example, the L4 X.2 (revision 6) and V2/MIPS specifications differ from what Fiasco.OC uses, defined here in the sources:
pkg/l4re-core/l4sys/include/ARCH-mips/L4API-l4f/ipc.h
Obviously, I'm only focusing on MIPS for various reasons. Maybe there is similar documentation describing these interfaces and a particular specification that Fiasco.OC implements, but I didn't find it.
Paul
On Fri, Sep 14, 2018 at 9:51 AM Paul Boddie paul@boddie.org.uk wrote:
I am very fond of dynamic languages, which usually have their own runtimes, but I would also be somewhat wary of deploying some of these runtimes in any kind of privileged role. (I don't know if that is what is being suggested, not having digested every word of this discussion - sorry!)
My scratch paper is this https://drive.google.com/open?id=1enJbj6oeeYYIuUieRlockrnUFbCowwVPx5n2MDMW56...
The Kernel-CLR runtime is basically a fancy privileged service loader, and doesn't run userspace applications. Basically, if you can load a driver, you can get Kernel-CLR to process arbitrary input.
The simplest CLR would be fairly small: it only needs to read in the instructions, map them to native instructions, and spit out an unoptimized blob of memory. You could, conceptually, provide things like optimization and the garbage collector as separate services; even then, those might be fairly small. Being strict about what you interpret and produce (input validation) helps; and the CLR itself would be C#, managed by itself (yep), so it's easier to verify.
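As a sketch of how small that core loop can be, here's a toy translator from three CIL opcodes (opcode values per ECMA-335) to x86-64 bytes, keeping the evaluation stack on the real stack; no optimizer, no verifier, purely illustrative:

    using System;
    using System.Collections.Generic;

    static class ToyJit
    {
        public static byte[] Translate(byte[] cil)
        {
            var native = new List<byte>();
            for (int i = 0; i < cil.Length; i++)
            {
                switch (cil[i])
                {
                    case 0x20: // ldc.i4 <int32>  ->  push imm32
                        native.Add(0x68);
                        for (int k = 1; k <= 4; k++) native.Add(cil[i + k]);
                        i += 4;
                        break;
                    case 0x58: // add  ->  pop rax; pop rdx; add rax,rdx; push rax
                        native.AddRange(new byte[] { 0x58, 0x5A, 0x48, 0x01, 0xD0, 0x50 });
                        break;
                    case 0x2A: // ret  ->  pop rax; ret (result returned in RAX)
                        native.AddRange(new byte[] { 0x58, 0xC3 });
                        break;
                    default:
                        throw new NotSupportedException($"opcode 0x{cil[i]:X2}");
                }
            }
            return native.ToArray();
        }
    }

Everything else (optimization, GC, verification) can layer on top of, or beside, that loop.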
As I recall, there are just-in-time compilers being used in the Linux kernel, maybe in the network firewall/rules functionality, and it is possible that this has already caused security problems. Of course, there are so many sources of such problems that it might be mean-spirited to single out just one of them.
There are. I'd actually be looking at implementing iptables with underlying high-performance approaches such as nf-HiPAC. https://www.hipac.org/
I see that nobody (prominently) mentioned JX whose documentation was quite approachable, I thought:
https://en.wikipedia.org/wiki/JX_(operating_system)
But one of the lessons from the phenomenon of Java is that people don't like to be told to rewrite all their stuff in a particular language, and although you can try and make the runtime language-neutral, some languages end up being more equal than others.
Yeah. L4Mx would need to port in Minix and BSD drivers (I use the MIT License), and there would be a lot of rewriting of hardware drivers into C#.
I've long considered the prospect of reverse paravirtualization, whereby you put a paravirtualizing driver inside an HVM OS and pass the OS direct IO mapping to a hardware device. The PV driver then lets the hypervisor send it requests, and it arbitrates with the local operating system. In other words: the hypervisor is paravirtualized on the guest OS (reverse paravirtualization). This would enable using Linux as a mega-driver, while L4Mx/Linux running a Linux kernel paravirtualized could do the opposite.
Who is to say that an incompatible system does not have "practical real-world usage", though? I agree that compatibility would be helpful, and there are plenty of systems that exist only through the perseverance of their creators, only to fade away at some point, partly due to a lack of interest and difficulties integrating mainstream software.
Linux extended POSIX. Extending what is now the Linux ABI seems an excellent way to dive right in.
Personally, I am also interested in what it would take to develop a vaguely standard multiserver system, albeit leveraging L4 technologies. A while back, people tried to get the Hurd working on L4 (Pistachio, I think), and I think there is potential to get something like that going on Fiasco.OC, particularly since the lack of capability support was supposedly the reason why the Hurd-L4 work was abandoned.
I came up with the Kernel-CLR approach to abstract all platform-specific implementations of generic kernel functions into interchangeable objects. That way you run the same code but with a different Type implementing the same interface and all the right stuff happens.
By building a Loader specifically for the target platform, creating an assembly that implements those lowest-level functions correctly for the platform, and providing an AOT of the Kernel-CLR translator in the machine's native ISA, you can boot the same kernel (no recompile) and load the same services (no recompile here, either). Each binary runs on all platforms.
If you have a Type that sends messages and does the right thing under the hood, you can change how the underlying message-passing architecture physically works (so long as you keep the same semantics): what would normally be an ABI change isn't even an API change. Conceptually, a service could run on completely different microkernels with the same API if it's written using Kernel-CLR—and even Kernel-CLR and the kernel itself use an assembly providing those classes that implement the underlying details of IPC and the native ISA.
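A sketch of what that contract might look like (interface and class names are hypothetical, not a real Kernel-CLR API):

    using System;

    // Services program against the semantic contract only.
    public interface IMessagePort
    {
        // Synchronous L4-style call: send, then block for the reply.
        ReadOnlyMemory<byte> Call(ReadOnlyMemory<byte> message);
    }

    // One binding per kernel/ISA, chosen by the loader. Swapping this
    // type can change register conventions, message layout, or even the
    // underlying microkernel without recompiling any service.
    public sealed class LoopbackPort : IMessagePort
    {
        private readonly Func<ReadOnlyMemory<byte>, ReadOnlyMemory<byte>> handler;
        public LoopbackPort(Func<ReadOnlyMemory<byte>, ReadOnlyMemory<byte>> handler)
            => this.handler = handler;
        public ReadOnlyMemory<byte> Call(ReadOnlyMemory<byte> message)
            => handler(message);
    }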
Imagine releasing one driver that runs on Windows, Linux, MacOSX, and HURD, and on any of these on x86, x86-64, IA64, ARM, ARM64, MIPS, SPARC, SPARC64, and RISC-V. That would be impossible today; but with L4Mx and any Kernel-CLR OS implementing L4Mx semantics, those any-OS-to-any-Architecture mappings are implied. Porting L4/Linux to L4Mx/Linux (by paravirtualizing Linux and providing an adapter to communicate between the two architectures) would allow Linux to take advantage of L4Mx drivers as well.
Of course, you'd need to recompile all userspace software to run on a new CPU architecture. A native x86-64 program isn't going to run on RISC-V. Yes, of course, if you compile .NET Core and CoreFX Linux binaries for RISC-V, you can load up .NET Core applications without a recompile; that's downstream from the Kernel, and not everything is Python or Java or .NET.
You can get pretty close to "we rewrote a few thousand lines of code and now the entire operating system and all software run on a totally new ISA, tuned as well as an optimizing compiler can manage for the specifics and enabled features of that platform." I'm looking at that in a kernel.
On 9/14/18, Paul Boddie paul@boddie.org.uk wrote:
But one of the lessons from the phenomenon of Java is that people don't like to be told to rewrite all their stuff in a particular language, and although you can try and make the runtime language-neutral, some languages end up being more equal than others.
Yeah, that would be the biggest thing that limits the appeal of language-based OSes.
It is interesting to consider Nemesis, which evolved into Xen, in this regard:
https://en.wikipedia.org/wiki/Nemesis_(operating_system)
One of the goals was to support POSIX applications, and it was apparently a usable system in its day, maybe still is.
I knew Nemesis and Xen were from the same group at Cambridge but I didn't know that there was much of a connection other than both being somewhat exokernel-ish.
It is also interesting to see the UX/RT label given the heritage of Unix systems:
I'm well aware of MERT/Unix-RTR, although UX/RT won't really be similar beyond being a microkernel-based Unix-like RTOS. It will have a general structure like that of QNX and a VFS model similar to that of Plan 9 (the name is kind of an indirect reference to QNX's name originally being derived from "Quick uNiX").
In the L4 microkernel-related materials I have seen, nobody seems to mention this work or even the foundational influences on it. Indeed, one set of slides calls the Mach microkernel "The mother of all microkernels", later to discuss IBM Workplace OS and to note that "OS personalities did not work". Presumably such remarks are made in that specific context given that DMERT apparently supported multiple personalities (Unix-RTR and RSX-11).
Yeah, a lot of people seem to think that Mach was the first microkernel, but nothing could be further from the truth. QNX predates Mach by several years, and it has always been somewhat similar to L3/L4. I think Mach was the single worst thing to happen to microkernels. It completely destroyed the reputation of microkernels with its slow, heavyweight IPC. I think it may have hurt the reputation of OS research in general as well, creating more of a divide between research and mainstream OSes. I wonder if microkernels would be widespread on an alternate timeline where QNX had somehow had significant influence on OS research as soon as it came out (Mach may not have been written, or may have been QNX-like instead).
Another thing that has hurt performance on microkernel OSes that I haven't heard discussed is component-level versus subsystem-level modularization. It seems like most microkernel OSes go for component-level modularization everywhere (e.g. separate processes for the disk device driver and disk filesystem, or the process manager and VFS). The problem with component-level modularization is that it multiplies the number of context switches for a system call in many cases, and often provides little benefit to security or stability (e.g. a disk device driver and a disk filesystem are both dealing with the exact same data, just on different levels, and compromising either would be basically equivalent; similarly, an OS with restartable servers should allow recovery from either a disk device driver or a disk filesystem crash, so splitting them up provides no benefit there).
UX/RT will take a similar approach to QNX and will use subsystem-level modularization in many places (e.g. there will be a disk server incorporating both device drivers and filesystems, with support for multiple instances; similarly, the memory manager, VFS, and certain in-memory special filesystems will all be part of a single root server).
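To put rough numbers on the context-switch point (an illustration, not a benchmark): a read() that traverses application → VFS server → filesystem server → disk driver as separate processes costs at least six context switches for the round trip, while the same call against a combined disk server (filesystem plus driver, addressed directly) costs two.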
Who is to say that an incompatible system does not have "practical real-world usage", though? I agree that compatibility would be helpful, and there are plenty of systems that exist only through the perseverance of their creators, only to fade away at some point, partly due to a lack of interest and difficulties integrating mainstream software.
Yes, an incompatible OS may be able to achieve success in some niches, but even that will be made more difficult by the incompatibility.
Personally, I am also interested in what it would take to develop a vaguely standard multiserver system, albeit leveraging L4 technologies. A while back, people tried to get the Hurd working on L4 (Pistachio, I think), and I think there is potential to get something like that going on Fiasco.OC, particularly since the lack of capability support was supposedly the reason why the Hurd-L4 work was abandoned.
I've never really been a fan of the Hurd. It gets its VFS model backwards by requiring you to bind servers to disk files. That causes major problems with determining the context that passive translators should be started in, and it also requires the disk filesystem to support storing the translator as file metadata.
UX/RT will use pure in-memory mounts like QNX and Plan 9, and the contexts in which demand-started servers run will be explicitly specified by the init system configuration rather than inherited from another server (no need for the elaborate process-persistence schemes that have been discussed for the Hurd to resolve the context problem).
On 9/14/18, John john.r.moser@gmail.com wrote:
The Kernel-CLR runtime is basically a fancy privileged service loader, and doesn't run userspace applications. Basically, if you can load a driver, you can get Kernel-CLR to process arbitrary input.
Then you effectively have a monolithic kernel, not a microkernel, if you have a kernel module loader and drivers run in the kernel's context rather than as normal processes. The whole point of a microkernel is to make an OS that's extensible through normal processes. A kernel module loader greatly increases the attack surface, even if you are using language features to protect kernel modules from one another (as a few people here have said, hardware-based protection is generally more robust than language-based protection).
I've long considered the prospect of reverse paravirtualization, whereby you put a paravirtualizing driver inside an HVM OS and pass the OS direct IO mapping to a hardware device. The PV driver then lets the hypervisor send it requests, and it arbitrates with the local operating system. In other words: the hypervisor is paravirtualized on the guest OS (reverse paravirtualization). This would enable using Linux as a mega-driver, while L4Mx/Linux running a Linux kernel paravirtualized could do the opposite.
This is a little bit like what I was planning to do on UX/RT, although I was planning to use the LKL project https://github.com/lkl/linux, which turns the Linux kernel into a library (UX/RT will have good support for virtualization, but it won't really focus on internal full-system virtualization, and will focus on containers/overlays more as far as built-in virtualization goes; it will probably also eventually include a separate Type 1 VMM that will mostly be intended for desktops and servers rather than embedded systems).
On Fri, Sep 14, 2018 at 9:05 PM Andrew Warkentin andreww591@gmail.com wrote:
Then you effectively have a monolithic kernel, not a microkernel, if you have a kernel module loader and drivers run in the kernel's context rather than as normal processes. The whole point of a microkernel is to make an OS that's extensible through normal processes. A kernel module loader greatly increases the attack surface, even if you are using language features to protect kernel modules from one another (as a few people here have said, hardware-based protection is generally more robust than language-based protection).
It doesn't have to run at Ring-0, you know. Think about if you loaded a malicious network card driver into L4.
On Fri, Sep 14, 2018, 7:22 PM John john.r.moser@gmail.com wrote:
It doesn't have to run at Ring-0, you know. Think about if you loaded a malicious network card driver into L4.
No L4 kernel I'm aware of has any facility for loading drivers into the kernel. Drivers on L4 OSes are either regular processes that are allowed limited hardware access or are libraries loaded into such processes. From what it sounds like, you want to run all privileged services in the same address space and hardware privilege level, relying solely on the CLR to enforce protection domains, which would be less secure than a formally verified microkernel using hardware protection.
On Fri, Sep 14, 2018 at 9:36 PM Andrew Warkentin andreww591@gmail.com wrote:
No L4 kernel I'm aware of has any facility for loading drivers into the kernel. Drivers on L4 OSes are either regular processes that are allowed limited hardware access or are libraries loaded into such processes. From what it sounds like, you want to run all privileged services in the same address space and hardware privilege level, relying solely on the CLR to enforce protection domains, which would be less secure than a formally verified microkernel using hardware protection.
Those processes with limited hardware access are able to do funny things.
The process that manages virtual memory, for example, can get into the memory space of any process running on the system. It crosses all security boundaries.
If you load a rogue VFS driver, it can take over all file system access, injecting code into software and crossing all security boundaries.
Your Ring-3 process scheduler isn't some user process like init or X11; it's an OS service running at a high privilege level, able to manipulate how the system runs.
A malicious Ring-3 microkernel networking service can eavesdrop on and MITM everything going through networking. It's a packet sniffer, dumper, and network scanner running in a place with a high amount of control.
Yes, they have different virtual address spaces, they have Ring-3 execution level, and they function as part of the operating system software instead of the userland. They don't load through the POSIX ABI and make mundane calls; they PROVIDE the POSIX ABI.
So imagine if you loaded a malicious network card driver into L4. It's running Ring-3, it's passing IPC messages to the L4 kernel and to the TCP stack, it has its own memory space, and it's tampering with your connection and sending copies of bank data to a command and control server in Russia.
On 9/14/18, John john.r.moser@gmail.com wrote:
Those processes with limited hardware access are able to do funny things.
The process that manages virtual memory, for example, can get into the memory space of any process running on the system. It crosses all security boundaries.
Under UX/RT it will be impossible to load code into either the process server or the kernel (which will be the most security-critical components in the system), and both will be based on safer code (the kernel will be seL4, and the process server will be written in Rust), so the attack surface will be very small.
If you load a rogue VFS driver, it can take over all file system access, injecting code into software and crossing all security boundaries.
Not on UX/RT. Filesystem servers will not be able to control the entire VFS. They will only have control over filesystems they export. In fact, they will not be able to access the VFS at all as anything other than a client until another process mounts their filesystems (to act as a filesystem server, a process will request a "port" from a special filesystem built into the root server, and then some other process, usually the mount command, will send a request to the root server to mount the port on a directory, analogous to mounting a block device in a conventional Unix).
So imagine if you loaded a malicious network card driver into L4. It's running Ring-3, it's passing IPC messages to the L4 kernel and to the TCP stack, it has its own memory space, and it's tampering with your connection and sending copies of bank data to a command and control server in Russia.
Yes, but the malware can't make itself persistent unless it was installed some other way, since the network stack won't have privileges to install software. Under UX/RT, only the package manager or a shell session with the proper administrative role will be allowed to install software. The package manager will have a "locked open" verification scheme based on reproducible builds, where multiple mirrors controlled by different organizations in different countries will all build the same source from the same commit and sign the package if the binary matches, which will require all changes to be on the record, and replacing a verified package with a non-verified one will fail unless explicitly overridden.
They're still attacks.
Curious: how do you implement MM filters, such as paging and compressed memory? What about file system filters such as unionfs?
There are a lot of interesting things you can do in modern OSes. Those capabilities also imply code running in dangerous places, like between the FS and the user, or on the virtual memory manager.
On Fri, Sep 14, 2018, 8:33 PM John john.r.moser@gmail.com wrote:
They're still attacks.
Curious: how do you implement MM filters, such as paging and compressed memory? What about file system filters such as unionfs?
Paging and compressed memory will be implemented in the root server (for paging there will be hooks to allow it to call a disk server). File system filters will be implemented by a "firm link" or transclusion facility where one (sufficiently privileged) server can include another server's files in a directory it exports, much like the one that QNX has (servers will only need directory access to transclude files).
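If I understand the firm-link idea correctly, it amounts to a redirection table in the VFS: a lookup that matches a transcluded prefix is forwarded to the exporting server. A guess at the shape of that in C (the struct and function are mine, not UX/RT's or QNX's actual code):

#include <string.h>

/* One transclusion entry: requests under 'src_path' are forwarded
 * to another server, which really implements the files. */
struct firm_link {
    const char *src_path;      /* prefix as clients see it        */
    int         target_server; /* handle for the exporting server */
    const char *target_path;   /* prefix within that server       */
};

/* Longest-prefix match; the caller forwards the remainder of
 * 'path' to best->target_server under best->target_path. */
const struct firm_link *resolve_firm_link(const struct firm_link *tbl,
                                          size_t n, const char *path)
{
    const struct firm_link *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tbl[i].src_path);
        if (len > best_len && strncmp(path, tbl[i].src_path, len) == 0) {
            best = &tbl[i];
            best_len = len;
        }
    }
    return best;
}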
On 15 Sep 2018, at 11:22, John john.r.moser@gmail.com wrote:
On Fri, Sep 14, 2018 at 9:05 PM Andrew Warkentin <andreww591@gmail.com> wrote:
On 9/14/18, Paul Boddie <paul@boddie.org.uk> wrote:
On 9/14/18, John <john.r.moser@gmail.com> wrote:
The Kernel-CLR runtime is basically a fancy privileged service loader, and doesn't run userspace applications. Basically, if you can load a driver, you can get Kernel-CLR to process arbitrary input.
Then you effectively have a monolithic kernel, not a microkernel, if you have a kernel module loader and drivers run in the kernel's context rather than as normal processes. The whole point of a microkernel is to make an OS that's extensible through normal processes. A kernel module loader greatly increases the attack surface, even if you are using language features to protect kernel modules from one another (as a few people here have said, hardware-based protection is generally more robust than language-based protection).
It doesn't have to run at Ring-0, you know. Think about if you loaded a malicious network card driver into L4.
You. don’t. ever. Running all drivers as encapsulated usermode processes is one of the core features of L4.
E.g. seL4 has exactly two drivers in the kernel: a timer driver (for preemption interrupts) and an interrupt controller driver. Add the IOMMU, if you consider that a device (I consider it part of memory management). There's also a serial driver for console output, but only when the kernel is compiled for debugging. Certainly no NIC drivers; that would completely undermine the fundamental microkernel design.
Gernot
On Fri, Sep 14, 2018 at 9:38 PM Gernot Heiser gernot@cse.unsw.edu.au wrote:
You. don’t. ever. Running all drivers as encapsulated usermode processes is one of the core features of L4.
The fact that you separated the drivers into different process contexts doesn't mean they're now some userland program. These are privileged parts of the operating system. You're thinking about being able to directly access and write to parts of the kernel; I'm thinking about being able to control security contexts, inject malicious code into applications as they load from disk, and otherwise use the service's context as an underlying operating system component to do bad things.
Unless all of your drivers load with the boot loader and nothing can be loaded when someone plugs in a new USB device, you have some sort of driver loader to load new services into L4. Those drivers run with their own VMA, in Ring-3, sure; and they could be drivers for things like disk control and file systems, for security contexts, and the lot, thus giving a way to replace parts of the underlying OS with malicious code.
Your fences are only around memory areas and privileged instruction calls. These are still hardware drivers and software schedulers supporting userspace applications.
On Fri, Sep 14, 2018 at 10:00 PM John john.r.moser@gmail.com wrote:
The fact that you separated the drivers into different process contexts doesn't mean they're now some userland program. These are privileged parts of the operating system. You're thinking about being able to directly access and write to parts of the kernel; I'm thinking about being able to control security contexts, inject malicious code into applications as they load from disk, and otherwise use the service's context as an underlying operating system component to do bad things.
Unless all of your drivers load with the boot loader and nothing can be loaded when someone plugs in a new USB device, you have some sort of driver loader to load new services into L4. Those drivers run with their own VMA, in Ring-3, sure; and they could be drivers for things like disk control and file systems, for security contexts, and the lot, thus giving a way to replace parts of the underlying OS with malicious code.
Your fences are only around memory areas and privileged instruction calls. These are still hardware drivers and software schedulers supporting userspace applications.
This might help clear things up.
In that diagram, the proposed malicious NIC driver would be somewhere in the hardware layer. It might have its own VMA and run in Ring-3, but it's still a driver loaded into the OS. That's not a monolithic kernel unless Minix3 is a monolithic kernel.
Gernot
On 15 Sep 2018, at 12:00, John john.r.moser@gmail.com wrote:
The fact that you separated the drivers into different process contexts doesn't mean they're now some userland program.
It means exactly that.
These are privileged parts of the operating system.
Sorry, they are not. There seems to be a fundamental misunderstanding of microkernels.
A driver is a normal usermode program. Like any program, it has some privileges, authorised by caps. These include the rights to its own memory and some communication channels (like any process). In the case of a driver, it also includes rights to access device registers, and to receive interrupt notifications. That’s it.
The driver reads and writes device registers, receives and acknowledges interrupts, and communicates with other parts of the system by some system-defined protocols, which may be via IPC endpoints, notifications, shared memory, or DMA memory. It cannot do more damage than corrupting I/O buffers, sending garbage to the device, or denying service. (Yes, there are broken devices where by controlling them you can cause other damage, such as re-flashing firmware, but that's a hardware flaw against which all software is defenceless.)
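To illustrate, the skeleton of such a driver on seL4 is just this kind of loop. The seL4 calls are real; the cap slots and register offsets are invented for the example:

#include <sel4/sel4.h>

/* The only authority the driver has: caps handed to it at startup.
 * Slot numbers are made up for illustration. */
#define IRQ_HANDLER_CAP ((seL4_CPtr)10) /* IRQHandler for the device */
#define IRQ_NTFN_CAP    ((seL4_CPtr)11) /* notification bound to IRQ */

/* Device registers, mapped into this address space by the parent;
 * the offsets are invented. */
static volatile seL4_Word *regs;
enum { REG_STATUS = 0, REG_ACK = 1 };

void driver_loop(void)
{
    for (;;) {
        seL4_Word badge;
        seL4_Wait(IRQ_NTFN_CAP, &badge);      /* block until IRQ fires */
        seL4_Word status = regs[REG_STATUS];  /* read device registers */
        regs[REG_ACK] = status;               /* ack in the device     */
        /* ... hand the data to clients via an endpoint or a shared-
         * memory ring; nothing here can touch another task ... */
        seL4_IRQHandler_Ack(IRQ_HANDLER_CAP); /* re-arm the interrupt  */
    }
}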
You're thinking about being able to directly access and write to parts of the kernel; I'm thinking about being able to control security contexts, inject malicious code into applications as they load from disk, and otherwise use the service's context as an underlying operating system component to do bad things.
Yes, the driver can launch integrity, confidentiality and availability attacks against *the data that is read or written by the device*. But there are well-known means to protect against the first two: sign and encrypt all data. The only threat that cannot easily be prevented is availability, but only as far as the specific device is concerned.
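For instance, with authenticated encryption done above the driver (libsodium here, purely my choice for the sketch), the NIC driver only ever handles opaque, tamper-evident bytes:

#include <sodium.h>

/* Seal a message end-to-end so a hostile driver below us can only
 * drop or corrupt it (availability), never read or undetectably
 * modify it. Key exchange is out of scope for this sketch. */
int send_sealed(const unsigned char key[crypto_secretbox_KEYBYTES],
                const unsigned char *msg, unsigned long long len,
                unsigned char *out) /* NONCEBYTES + len + MACBYTES */
{
    randombytes_buf(out, crypto_secretbox_NONCEBYTES);  /* fresh nonce */
    return crypto_secretbox_easy(out + crypto_secretbox_NONCEBYTES,
                                 msg, len, out, key);
    /* A forged or altered packet fails crypto_secretbox_open_easy()
     * at the receiver. */
}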
I can recommend the following paper on the threat scenarios that can and cannot be dealt with by a microkernel design: https://ts.data61.csiro.au/publications/csiroabstracts/Biggs_LH_18.abstract....
Gernot
On Friday 14. September 2018 19.04.10 Andrew Warkentin wrote:
On 9/14/18, Paul Boddie paul@boddie.org.uk wrote:
It is interesting to consider Nemesis, which evolved into Xen, in this regard:
https://en.wikipedia.org/wiki/Nemesis_(operating_system)
One of the goals was to support POSIX applications, and it was apparently a usable system in its day, maybe still is.
I knew Nemesis and Xen were from the same group at Cambridge but I didn't know that there was much of a connection other than both being somewhat exokernel-ish.
Digging slightly deeper, my recollections may have been wrong: there may not be a strong technological connection between the two. But reading a little about Nemesis reminds me of certain things about L4Re:
"The guiding principle in the design of Nemesis was to structure the operating system in such a way that the majority of code could execute in the application process itself."
From what I've seen in L4Re in various places, there is a certain amount of device driver code registering itself for incorporation by application programs. I feel that this is a bit awkward - code getting run at program start-up to register itself in fixed-length arrays enumerating the available drivers (many of which may be irrelevant to a given platform) - and it surely blends different responsibilities into the same process, potentially leaving applications with access to I/O memory regions.
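The pattern I mean looks roughly like this caricature (my own reconstruction, not actual L4Re code):

/* Each driver object linked into the application contributes a
 * constructor that appends itself to a fixed-length table before
 * main() runs, whether or not the platform has the device. */
#define MAX_DRIVERS 32

struct driver_ops {
    const char *name;
    int (*probe)(void);
};

static const struct driver_ops *drivers[MAX_DRIVERS];
static int n_drivers;

static int dummy_probe(void) { return 0; }
static const struct driver_ops dummy_uart = { "dummy-uart", dummy_probe };

__attribute__((constructor))   /* GCC/Clang: runs at program start-up */
static void register_dummy_uart(void)
{
    if (n_drivers < MAX_DRIVERS)
        drivers[n_drivers++] = &dummy_uart;
    /* The application now carries driver code, and potentially I/O
     * memory mappings, that may have nothing to do with it. */
}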
What my own rather elementary work does is to separate drivers out into separate servers. Since many of them do not need to actively communicate with other components - they either initialise peripherals and do practically nothing or they share designated memory with other tasks - the only cost of employing multiple servers is the accumulation of duplicate or superfluous code contributed by each one of them in their standalone form, which is why I have been trying to use shared libraries as much as possible.
The virtual filesystem libraries also seem to promote a "client-side" approach to providing such functionality. Looking at different filesystem architectures is another of my current activities.
Paul
On 9/17/18, Paul Boddie paul@boddie.org.uk wrote:
From what I've seen in L4Re in various places, there is a certain amount of device driver code registering itself for incorporation by application programs. I feel that this is a bit awkward - code getting run at program start-up to register itself in fixed-length arrays enumerating the available drivers (many of which may be irrelevant to a given platform) - and it surely blends different responsibilities into the same process, potentially leaving applications with access to I/O memory regions.
What my own rather elementary work does is to separate drivers out into separate servers. Since many of them do not need to actively communicate with other components - they either initialise peripherals and do practically nothing or they share designated memory with other tasks - the only cost of employing multiple servers is the accumulation of duplicate or superfluous code contributed by each one of them in their standalone form, which is why I have been trying to use shared libraries as much as possible.
Yes, I'd say putting drivers into servers rather than libraries is the best idea, and putting shared state into libraries is a terrible idea, since it effectively bypasses memory protection if shared memory is used. Despite this, libraries with shared state (is there a name for this anti-pattern? I call it "cooperative state") seem to be common (on Linux, ALSA and Qt Embedded are notable examples), even though there is no reason why the shared state can't be placed in a server. I'm not at all sure why it's so common.
The only exception to this would be on a kernel that allows protecting libraries from the rest of a process (which could be rather tricky to implement and might have the same overhead as just running the drivers as servers in the first place).
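Schematically, the contrast is the following (invented names; not ALSA or Qt Embedded code):

/* "Cooperative state": the library maps one writable structure into
 * every client and mutates it in place, so memory protection between
 * clients is effectively gone. */
struct mixer_state { int volume; int muted; };
static struct mixer_state *shared;  /* same mapping in all clients,
                                       assumed set up at startup */

void coop_set_volume(int v) { shared->volume = v; } /* trusts everyone */

/* Server alternative: the state lives in one process, which validates
 * each request, so protection boundaries still apply. */
void srv_set_volume(int fd, int v)
{
    /* send_request() stands in for whatever IPC the system uses */
    extern int send_request(int fd, const char *op, int arg);
    send_request(fd, "set-volume", v);
}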
The virtual filesystem libraries also seem to promote a "client-side" approach to providing such functionality. Looking at different filesystem architectures is another of my current activities.
UX/RT will use a split architecture for its VFS, where read(), write(), and seek() will call kernel IPC APIs to communicate with the server directly, and all other filesystem-related functions will call the VFS component in the process server. This seems to be the easiest way to implement a VFS architecture that maps each file descriptor onto a capability (actually, a group of related capabilities) while still having reasonable performance.
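From libc's point of view the split would look something like this (all function names below are hypothetical placeholders for the planned design, which has no published API):

#include <sys/types.h>
#include <sys/stat.h>
#include <stddef.h>

/* Hypothetical handles and calls standing in for UX/RT's planned
 * API; only the shape of the dispatch matters here. */
extern long ipc_read(int file_cap, void *buf, size_t count);
extern int  fd_capability(int fd);  /* fd -> its capability group */
extern int  procsrv_vfs_stat(const char *path, struct stat *st);

/* Data path: straight kernel IPC to the filesystem server. */
ssize_t vfs_read(int fd, void *buf, size_t count)
{
    return ipc_read(fd_capability(fd), buf, count);
}

/* Everything else: detour through the VFS component in the process
 * server (path resolution, metadata, mounts, ...). */
int vfs_stat(const char *path, struct stat *st)
{
    return procsrv_vfs_stat(path, st);
}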
Sorry to hold off responding for so long!
On Monday 17. September 2018 16.54.14 Andrew Warkentin wrote:
UX/RT will use a split architecture for its VFS, where read(), write(), and seek() will call kernel IPC APIs to communicate with the server directly, and all other filesystem-related functions will call the VFS component in the process server. This seems to be the easiest way to implement a VFS architecture that maps each file descriptor onto a capability (actually, a group of related capabilities) while still having reasonable performance.
I sent a mail about filesystem design choices to this list back in August, but I guess all the action is elsewhere with regard to people writing code for these systems:
https://os.inf.tu-dresden.de/pipermail/l4-hackers/2018/008303.html
Recent experiments of mine have focused on separating client functionality (below the C library level, in practice) from filesystem operations (implemented in their own server) and from block device operations (also in their own server, currently faked using access to "rom" files in L4Re). The objective here was to explore mechanisms for sharing buffers between processes and figuring out what the communication has to be.
What I want to do in future is to switch out the VFS code and library functionality up to and including the C library, with musl-libc looking like a fairly clean basis for doing that. However, this is probably going to be a lot of work.
Still, I hope to look a bit more closely at MINIX 3 and the Hurd (on Mach and on L4) to see what patterns have been employed previously. I gave some relevant links to documentation in my previous message (see above).
Paul