Deterministic scheduling with local APIC

Espen Skoglund esk at ira.uka.de
Mon Jan 13 19:49:59 CET 2003


[Jacob Gorm Hansen]
> On Mon, 2003-01-13 at 16:06, Josh English wrote:
>> The largest problem I see with this scheme is that I cannot
>> conceive of a mechanism where the data sent to the CPU and the
>> failover CPU will not cause enormous delays.  This is because any
>> failover system will require a confirmation of the data received by
>> the backup CPU.  Even if the transport and confirmation mechanism
>> was hardware, the bottleneck inherent in the design would render
>> the system unfeasible in a real-world implementation.

> I imagine that if the system is only ever accessed from the network,
> all the data that is to be duplicated comes from there. I suppose
> one could just use an Ethernet hub or similar hardware to make sure
> all Ethernet frames arrive at both hosts, coupled with a dedicated
> low-latency link between the hosts used for synchronisation (which
> given certain guarantees about the performance of the local Ethernet
> segment could be as simple as just each host counting all incoming
> frames and sharing the counter via the dedicated link).
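A minimal Python sketch of the frame-counting idea quoted above, under loudly stated assumptions. The hub is assumed to deliver every frame to both hosts in the same order (standing in for the "certain guarantees" about the segment), and the dedicated link is modelled as a plain function call. `Replica`, `sync` and `deliverable` are hypothetical names. A host hands a frame to its application only once it knows the peer has counted that frame too, so both replicas process the same prefix of the input stream:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """One host of a primary/backup pair (hypothetical model).

    Assumes the hub delivers every Ethernet frame to both hosts in the
    same order; only the frame counters are exchanged between hosts.
    """
    name: str
    frames: List[bytes] = field(default_factory=list)  # frames seen on the wire
    peer_count: int = 0  # latest counter received over the dedicated link
    delivered: int = 0   # frames already handed to the application

    def on_frame(self, frame: bytes) -> None:
        # Count (buffer) every incoming frame; nothing is processed yet.
        self.frames.append(frame)

    def local_count(self) -> int:
        return len(self.frames)

    def on_peer_counter(self, count: int) -> None:
        # Counters only grow, so keep the largest value seen.
        self.peer_count = max(self.peer_count, count)

    def deliverable(self) -> List[bytes]:
        # Release only frames both hosts are known to have received.
        limit = min(self.local_count(), self.peer_count)
        batch = self.frames[self.delivered:limit]
        self.delivered = max(self.delivered, limit)
        return batch


def sync(a: Replica, b: Replica) -> None:
    # Exchange counters over the (assumed) dedicated low-latency link.
    ca, cb = a.local_count(), b.local_count()
    a.on_peer_counter(cb)
    b.on_peer_counter(ca)


if __name__ == "__main__":
    primary, backup = Replica("primary"), Replica("backup")
    for f in [b"f1", b"f2", b"f3"]:
        primary.on_frame(f)
    for f in [b"f1", b"f2"]:  # backup has not yet counted f3
        backup.on_frame(f)
    sync(primary, backup)
    print(primary.deliverable())  # [b'f1', b'f2']
    print(backup.deliverable())   # [b'f1', b'f2']
    backup.on_frame(b"f3")
    sync(primary, backup)
    print(primary.deliverable())  # [b'f3']
```

Only counts cross the link, never frame contents, which keeps the synchronisation traffic small. The price is that every frame is held back until the slower host has seen it.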

In many situations you can also achieve fault-tolerance through logging,
using a causal logging technique to avoid waiting for unnecessary
acknowledgements from the logger.  I suspect that the following two
projects might provide some useful solutions and/or pointers:

   Lightweight Fault-Tolerance
   <URL:http://www.cs.utexas.edu/users/lorenzo/lft.html>

   WAFT: Support for Fault-Tolerance in Wide-Area Object-Oriented Systems
   <URL:http://www-cse.ucsd.edu/users/marzullo/WAFT/index.html>

In particular, there are proposals on how to achieve fault-tolerance
(through logging) for TCP-like protocols.
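To illustrate how causal logging avoids waiting for the logger, here is a minimal Python sketch of the general technique. It is not the LFT or WAFT implementation, and every name in it is hypothetical. Each delivery produces a determinant (receiver, sender, send sequence number, receive sequence number). Instead of blocking on an acknowledgement from a stable logger before sending, a process piggybacks its not-yet-stable determinants on outgoing messages. If it crashes, its receive order can then be rebuilt from determinants held by other processes. This sketch piggybacks every unstable determinant. Practical protocols bound that overhead by tracking which processes already hold each determinant.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# (receiver, sender, send_seq, recv_seq): which message was delivered where, in what order.
Determinant = Tuple[str, str, int, int]


@dataclass
class Message:
    sender: str
    send_seq: int
    payload: str
    piggyback: Set[Determinant]  # unstable determinants carried along


@dataclass
class Process:
    name: str
    send_seq: int = 0
    recv_seq: int = 0
    known: Set[Determinant] = field(default_factory=set)   # determinants held here
    stable: Set[Determinant] = field(default_factory=set)  # acknowledged by the logger

    def send(self, payload: str) -> Message:
        # No blocking on the logger: piggyback everything not yet stable.
        self.send_seq += 1
        return Message(self.name, self.send_seq, payload, self.known - self.stable)

    def deliver(self, msg: Message) -> None:
        self.recv_seq += 1
        self.known |= msg.piggyback
        self.known.add((self.name, msg.sender, msg.send_seq, self.recv_seq))

    def on_logger_ack(self, dets: Set[Determinant]) -> None:
        # The logger wrote these asynchronously; stop piggybacking them.
        self.stable |= dets


def recover_order(failed: str, survivors: List[Process]) -> List[Tuple[str, int]]:
    # Rebuild the failed process's delivery order from surviving copies.
    dets = {d for p in survivors for d in p.known if d[0] == failed}
    return [(s, ss) for _, s, ss, _ in sorted(dets, key=lambda d: d[3])]


if __name__ == "__main__":
    a, b, c = Process("A"), Process("B"), Process("C")
    b.deliver(a.send("m1"))  # B records ("B", "A", 1, 1)
    c.deliver(b.send("m2"))  # m2 carries B's unstable determinant to C
    print(recover_order("B", [a, c]))  # [('A', 1)]
```

B never waits for an acknowledgement before sending m2. The determinant needed to replay B travels with m2, so it survives a crash of B as long as C does.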

	eSk

More information about the l4-hackers mailing list