On Mon, 2003-01-13 at 16:06, Josh English wrote:
> Jacob,
Josh, I hope you don't mind me CCing this to the L4 list.
> The largest problem I see with this scheme is that I cannot conceive of a mechanism by which the data sent to the primary CPU and the failover CPU will not cause enormous delays. Any failover system will require confirmation that the backup CPU has received the data. Even if the transport and confirmation mechanism were implemented in hardware, the bottleneck inherent in the design would render the system unfeasible in a real-world implementation.
I imagine that if the system is only ever accessed from the network, all the data to be duplicated comes from there. One could then use an Ethernet hub or similar hardware to make sure all Ethernet frames arrive at both hosts, coupled with a dedicated low-latency link between the hosts used for synchronisation. Given certain guarantees about the performance of the local Ethernet segment, that link could be as simple as each host counting all incoming frames and sharing the counter with its peer.
> My understanding of deterministic scheduling is that in multiplicity scheduling, jobs are partitioned into job types where the jobs within each type are identical, and each job is then sent to a single machine as determined by the scheduling mechanism. Thus sending every job to every machine would cause much greater overhead than a failover mechanism justifies.
What I meant by 'deterministic' is just that all threads always get their full time slice. For this to happen, one needs a means of determining, when an interrupt arrives, exactly how many cycles of its slice the running thread has left, and a means of granting it precisely that many cycles before it is preempted. I suppose you could use the TSC for the former and perhaps the local APIC for the latter, but I am unsure whether trying to do so would be crazy.
> Well, unless you are building systems to withstand nuclear attack or something. So, if you want to write the code, I think that your system would work provided your 'master' scheduling mechanism resides above the kernel scheduler. I think that this would prevent race conditions from forming because hardware delays would be invisible to the kernel at that point.
I don't know if I will ever find time to write something like this, even if it were possible, but it struck me as an interesting future project.
best, Jacob
[Jacob Gorm Hansen]
> On Mon, 2003-01-13 at 16:06, Josh English wrote:
> > The largest problem I see with this scheme is that I cannot conceive of a mechanism by which the data sent to the primary CPU and the failover CPU will not cause enormous delays. Any failover system will require confirmation that the backup CPU has received the data. Even if the transport and confirmation mechanism were implemented in hardware, the bottleneck inherent in the design would render the system unfeasible in a real-world implementation.
> I imagine that if the system is only ever accessed from the network, all the data to be duplicated comes from there. One could then use an Ethernet hub or similar hardware to make sure all Ethernet frames arrive at both hosts, coupled with a dedicated low-latency link between the hosts used for synchronisation. Given certain guarantees about the performance of the local Ethernet segment, that link could be as simple as each host counting all incoming frames and sharing the counter with its peer.
In many situations you can also achieve fault tolerance using logging, and exploit some sort of causal logging technique to avoid unnecessary acknowledgements from the logger. I suspect that the following two projects might give some useful solutions and/or pointers:
Lightweight Fault-Tolerance: http://www.cs.utexas.edu/users/lorenzo/lft.html
WAFT: Support for Fault-Tolerance in Wide-Area Object-Oriented Systems: http://www-cse.ucsd.edu/users/marzullo/WAFT/index.html
In particular, there are proposals on how to achieve fault tolerance (through logging) for TCP-like protocols.
eSk
On Mon, 2003-01-13 at 17:16, Jacob Gorm Hansen wrote:
> On Mon, 2003-01-13 at 16:06, Josh English wrote:
> What I meant by 'deterministic' is just that all threads always get their full time slice. For this to happen, one needs a means of determining, when an interrupt arrives, exactly how many cycles of its slice the running thread has left, and a means of granting it precisely that many cycles before it is preempted. I suppose you could use the TSC for the former and perhaps the local APIC for the latter, but I am unsure whether trying to do so would be crazy.
> > Well, unless you are building systems to withstand nuclear attack or something. So, if you want to write the code, I think that your system would work provided your 'master' scheduling mechanism resides above the kernel scheduler. I think that this would prevent race conditions from forming because hardware delays would be invisible to the kernel at that point.
It seems my idea is similar to the HP Hypervisor project by Bressoud and Schneider (1996). Their effort, however, benefited from the presence of a 'recovery register' on the PA-RISC, which counts completed instructions rather than cycles and would allow you to implement deterministic scheduling (though that is not their exact approach). I cannot find information on a similar register in the P3 or P4; does anyone know whether one exists? If not, is anyone aware of other modern CPUs that sport such a register? My guess is that the Itanium might have one, given its PA-RISC heritage.
I am afraid that just relying on the TSC will lead to unpredictable results on a modern CPU.
Best, Jacob
l4-hackers@os.inf.tu-dresden.de