[Jacob Gorm Hansen]
On Mon, 2003-01-13 at 16:06, Josh English wrote:
The largest problem I see with this scheme is that I cannot conceive of a mechanism by which data can be sent to both the primary CPU and the failover CPU without causing enormous delays. Any failover system will require confirmation that the backup CPU has received the data. Even if the transport and confirmation mechanism were implemented in hardware, the bottleneck inherent in the design would render the system infeasible in a real-world implementation.
I imagine that if the system is only ever accessed over the network, all the data that is to be duplicated comes from there. One could then use an Ethernet hub or similar hardware to make sure all Ethernet frames arrive at both hosts, coupled with a dedicated low-latency link between the hosts used for synchronisation (which, given certain guarantees about the performance of the local Ethernet segment, could be as simple as each host counting all incoming frames and sharing the counter over the dedicated link). A rough sketch of that counting scheme is below.
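For illustration, here is a minimal sketch of the counting scheme, assuming Linux raw packet sockets, root privileges, and made-up addresses for the dedicated sync link (none of this comes from an existing system):

import socket
import struct

ETH_P_ALL = 0x0003                     # receive frames of every protocol
SYNC_PEER = ("192.168.100.2", 9999)    # assumed address of the other host on the sync link
DIVERGENCE_LIMIT = 16                  # how far the counters may drift before we worry

def monitor(iface="eth0"):
    # Raw packet socket sees every frame the hub duplicates to this host.
    sniff = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sniff.bind((iface, 0))

    # Cheap UDP exchange of the counter over the dedicated low-latency link.
    sync = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sync.bind(("0.0.0.0", 9999))
    sync.setblocking(False)

    frames_seen = 0
    while True:
        sniff.recv(65535)              # only counting; the frame contents don't matter here
        frames_seen += 1
        sync.sendto(struct.pack("!Q", frames_seen), SYNC_PEER)
        try:
            data, _ = sync.recvfrom(8)
            peer_count = struct.unpack("!Q", data)[0]
            if abs(peer_count - frames_seen) > DIVERGENCE_LIMIT:
                print("hosts have diverged; trigger resynchronisation")
        except BlockingIOError:
            pass                       # no counter from the peer right now

if __name__ == "__main__":
    monitor()

In practice one would probably compare counters per flow or add checksums, but the point is that the only traffic on the critical path is a tiny counter exchange over the dedicated link, not an acknowledgement of every frame.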
In many situations you can also achieve fault-tolerance using logging, and exploit some sort of causal logging technique to avoid unnecessary acknowledgements from the logger (a toy sketch of the piggybacking idea follows below the links). I suspect that the following two projects might give some useful solutions and/or pointers:
Lightweight Fault-Tolerance URL:http://www.cs.utexas.edu/users/lorenzo/lft.html
WAFT: Support for Fault-Tolerance in Wide-Area Object-Oriented Systems URL:http://www-cse.ucsd.edu/users/marzullo/WAFT/index.html
In particular, there are proposals on how to achieve fault-tolerance (through logging) for TCP-like protocols.
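For what it is worth, here is a toy sketch of the piggybacking idea behind causal logging; the names and structure are mine, not taken from those projects. Each process keeps determinants (its message delivery order) in volatile memory and piggybacks the not-yet-stable ones on outgoing messages, so no synchronous acknowledgement from a logger sits on the critical path:

import itertools

class CausalLoggingProcess:
    def __init__(self, name):
        self.name = name
        self.recv_seq = itertools.count()   # delivery order at this process
        self.determinants = []              # determinants not yet known to be stable
        self.log = []                       # everything learned here (usable for replay)

    def deliver(self, sender, msg, piggyback):
        # Determinants piggybacked by the sender are now replicated at this process,
        # which is what makes them recoverable without a logger acknowledgement.
        self.log.extend(piggyback)
        det = (self.name, next(self.recv_seq), sender, msg)
        self.determinants.append(det)
        self.log.append(det)

    def send(self, msg):
        # Piggyback our not-yet-stable determinants instead of waiting for a logger.
        return (self.name, msg, list(self.determinants))

# Toy usage: p's delivery order reaches q purely by piggybacking.
p, q = CausalLoggingProcess("p"), CausalLoggingProcess("q")
p.deliver("client", "request-1", [])
sender, msg, piggyback = p.send("reply-1")
q.deliver(sender, msg, piggyback)
print(q.log)   # q now holds p's determinant and could replay p after a crash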
eSk