Modern hardware is increasingly prone to faults in both functional
processing units and memory. Mechanisms to protect systems against
those faults have been developed at the software and hardware levels.
Commercial-off-the-shelf systems often avoid expensive hardware
extensions and therefore have to rely on software-implemented fault
tolerance.
Many software-level methods come as compiler extensions and thus
require the source code of all components to be available. In contrast,
replication-based approaches often work without this requirement and
can therefore protect arbitrary binary applications.
In my talk I am going to present Romain, an operating system service
that provides transparent redundant multithreading on top of the
L4/Fiasco.OC microkernel. I will discuss how Romain leverages OS-level
virtualization support to execute replicas. Replicating multithreaded
applications requires special care, because scheduling-induced
non-determinism hinders error detection. I will show how Romain
achieves average overheads ranging from 10% for triple-replicating the
single-threaded SPEC CPU 2006 benchmarks to 65% for replicating a
multithreaded benchmark suite (SPLASH2).
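
To make the idea of replication-based error detection concrete, here is a
minimal sketch of a majority vote over replica states. It is not Romain's
implementation; the function name vote and the representation of a replica's
state as a tuple are assumptions made purely for illustration.

```python
from collections import Counter

def vote(replica_states):
    """Compare the states of all replicas (hypothetical helper).

    Returns (majority_state, faulty_indices). Raises RuntimeError if no
    strict majority exists, i.e. the error is detected but not correctable.
    """
    counts = Counter(replica_states)
    state, n = counts.most_common(1)[0]
    if n <= len(replica_states) // 2:
        raise RuntimeError("no majority among replicas")
    # Replicas that disagree with the majority are considered faulty.
    faulty = [i for i, s in enumerate(replica_states) if s != state]
    return state, faulty

# Three replicas, one of which suffered a simulated bit flip.
good = (0x2A, 0x10)
flipped = (0x2A, 0x11)
state, faulty = vote([good, good, flipped])
```

With three replicas, a single deviating replica can be identified and
outvoted; with only two replicas, a mismatch can be detected but not
attributed to one of them.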
OS-level fault tolerance methods only protect user code, while
other software components that are of critical importance for the
functioning of the whole system are left unprotected. I denote these
components as the Reliable Computing Base (RCB) of a system and will
discuss case studies that give hints on how to achieve full-system
protection against hardware faults.
Operating System Support for Redundant Multithreading
Special session: 13:30, INF/1004