Operating Systems Group · Institute of Systems Architecture · Faculty of Computer Science · TU Dresden

25 November 2014

Operating System Support for Redundant Multithreading

Björn Döbel

TU Dresden

Special time slot: 13:30, room INF/1004

Modern hardware is increasingly prone to faults in both functional processing units and memory. Mechanisms to protect systems against these faults have been developed at both the software and hardware levels. Commercial off-the-shelf systems often avoid expensive hardware extensions and therefore have to rely on software-implemented fault tolerance. Many software-level methods come as compiler extensions, which require the source code of all components to be available. In contrast, replication-based approaches often work without this requirement and can therefore protect arbitrary binary applications. In my talk I will present Romain, an operating system service that provides transparent redundant multithreading on top of the L4/Fiasco.OC microkernel. I will discuss how Romain leverages OS-level virtualization support to execute replicas. Replicating multithreaded applications requires special care, because scheduling-induced non-determinism hinders error detection. I will show that Romain achieves average overheads ranging from 10% for triple-replicating the single-threaded SPEC CPU 2006 benchmarks to 65% for replicating a multithreaded benchmark suite (SPLASH2). OS-level fault tolerance methods protect only user code, while other software components that are critical to the functioning of the whole system are left unprotected. I refer to these components as the Reliable Computing Base (RCB) of a system and will discuss case studies that hint at how to achieve full-system protection against hardware faults.
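As a rough illustration of the replication idea in the abstract, the following Python sketch runs the same computation in several replicas and compares their outputs by majority vote. It is not Romain's implementation: the names `vote`, `run_replicated`, `checksum`, and the `fault_in` parameter are hypothetical, the replicas run sequentially in one process, and the fault is a simulated single-bit flip in one replica's result.

```python
from collections import Counter


def vote(outputs):
    """Return (majority_value, indices_of_disagreeing_replicas).

    Raises RuntimeError if no value is produced by a strict majority,
    i.e. the error is detected but cannot be corrected by voting.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: error detected but not correctable")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


def run_replicated(fn, arg, n_replicas=3, fault_in=None):
    """Run fn(arg) once per replica and vote on the integer results.

    fault_in (hypothetical test hook) selects one replica whose result
    gets a single bit flipped, standing in for a hardware fault.
    """
    outputs = []
    for i in range(n_replicas):
        result = fn(arg)
        if i == fault_in:
            result ^= 1 << 3  # simulated single-bit flip
        outputs.append(result)
    return vote(outputs)


def checksum(data):
    """Toy deterministic workload: 16-bit sum of a list of integers."""
    return sum(data) & 0xFFFF


if __name__ == "__main__":
    value, faulty = run_replicated(checksum, [1, 2, 3], fault_in=1)
    print(value, faulty)
```

With three replicas a single faulty result is outvoted and the deviating replica is identified. With only two replicas a mismatch can be detected but not corrected, which is why the sketch raises an error in that case. The comparison only works because the toy workload is deterministic; the scheduling-induced non-determinism mentioned in the abstract is exactly what breaks this assumption for multithreaded programs.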