Deterministic scheduling with local APIC
Espen Skoglund
esk at ira.uka.de
Mon Jan 13 19:49:59 CET 2003
[Jacob Gorm Hansen]
> On Mon, 2003-01-13 at 16:06, Josh English wrote:
>> The largest problem I see with this scheme is that I cannot
>> conceive of a mechanism where the data sent to the CPU and the
>> failover CPU will not cause enormous delays. This is because any
>> failover system will require a confirmation of the data received by
>> the backup CPU. Even if the transport and confirmation mechanism
>> was hardware, the bottleneck inherent in the design would render
>> the system unfeasible in a real-world implementation.
> I imagine that if the system is only ever accessed from the network,
> all the data that is to be duplicated comes from there. I suppose
> one could just use an Ethernet hub or similar hardware to make sure
> all Ethernet frames arrive at both hosts, coupled with a dedicated
> low-latency link between the hosts used for synchronisation (which
> given certain guarantees about the performance of the local Ethernet
> segment could be as simple as just each host counting all incoming
> frames and sharing the counter via the dedicated link).
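
To make the frame-counting idea above concrete, here is a minimal sketch.
It assumes the hub delivers every frame to both hosts in the same order
and without loss (the "certain guarantees" mentioned above). The Replica
class and its method names are hypothetical, and the dedicated link is
modelled as a plain method call rather than real hardware.

```python
class Replica:
    """One of two hosts listening on the same Ethernet segment (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.received = []    # frames seen on the shared segment, in arrival order
        self.delivered = 0    # number of frames already handed to the application
        self.peer_count = 0   # highest frame counter reported by the other host

    def on_frame(self, frame):
        # Called for every incoming frame; the local counter is len(self.received).
        self.received.append(frame)

    def local_count(self):
        # Value that would be shared over the dedicated low-latency link.
        return len(self.received)

    def on_peer_counter(self, count):
        # Counter update arriving from the other host; counters only grow.
        self.peer_count = max(self.peer_count, count)

    def deliverable(self):
        # Only frames that both hosts are known to have seen are released.
        safe = min(len(self.received), self.peer_count)
        out = self.received[self.delivered:safe]
        self.delivered = safe
        return out


primary, backup = Replica("primary"), Replica("backup")
primary.on_frame("f1")
primary.on_frame("f2")
backup.on_frame("f1")                          # backup has not yet seen f2
primary.on_peer_counter(backup.local_count())
first_batch = primary.deliverable()            # only f1 is safe to process
backup.on_frame("f2")
primary.on_peer_counter(backup.local_count())
second_batch = primary.deliverable()           # now f2 is released as well
```

The only traffic on the dedicated link is a single integer per update,
so no frame data has to be copied between the hosts.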
In many situations you can also achieve fault-tolerance using logging,
and exploit some sort of causal logging technique to avoid unnecessary
acknowledgements from the logger. I suspect that the following two
projects might give some useful solutions and/or pointers:
Lightweight Fault-Tolerance
<URL:http://www.cs.utexas.edu/users/lorenzo/lft.html>
WAFT: Support for Fault-Tolerance in Wide-Area Object-Oriented Systems
<URL:http://www-cse.ucsd.edu/users/marzullo/WAFT/index.html>
In particular, there are proposals on how to achieve fault-tolerance
(through logging) for TCP-like protocols.
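
As a rough illustration of what causal logging means here, the sketch
below piggybacks receive-order records ("determinants") on outgoing
messages instead of waiting for an acknowledgement from a logger. It is
a much simplified toy under my own assumptions, not the protocol of
either project above; the Process class and all field names are
hypothetical, and a real scheme would prune determinants once they are
stable rather than resend all of them.

```python
class Process:
    """Toy process that logs receive order causally (hypothetical model)."""

    def __init__(self, pid):
        self.pid = pid
        self.send_seq = 0       # sequence number of the last message sent
        self.recv_count = 0     # number of messages received so far
        # (receiver, receive index) -> (sender, sender's sequence number)
        self.determinants = {}

    def send(self, payload):
        # No wait for a logger ack: known determinants travel with the message.
        self.send_seq += 1
        return {"src": self.pid, "seq": self.send_seq,
                "payload": payload, "dets": dict(self.determinants)}

    def receive(self, msg):
        # Keep the determinants the sender carried, then record our own.
        self.determinants.update(msg["dets"])
        self.determinants[(self.pid, self.recv_count)] = (msg["src"], msg["seq"])
        self.recv_count += 1

    def determinants_for(self, pid):
        # Receive order of process `pid` as known here, for replay after a crash.
        keys = sorted(k for k in self.determinants if k[0] == pid)
        return [self.determinants[k] for k in keys]


p, q, r = Process("p"), Process("q"), Process("r")
m1 = q.send("a")
m2 = r.send("b")
p.receive(m2)              # p happens to receive r's message first
p.receive(m1)
q.receive(p.send("c"))     # p's receive order reaches q on this message
replay_order = q.determinants_for("p")
```

If p crashes after sending "c", q holds enough information to tell a
restarted p in which order to replay its receives, and p never had to
block on a logger to get there.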
eSk