On Mon, 2003-01-13 at 16:06, Josh English wrote:
> Jacob,
Josh, I hope you don't mind me CCing this to the L4 list.
> The largest problem I see with this scheme is that I cannot conceive of a mechanism by which sending the data to both the CPU and the failover CPU would not cause enormous delays. This is because any failover system will require confirmation that the backup CPU has received the data. Even if the transport and confirmation mechanism were implemented in hardware, the bottleneck inherent in the design would make the system infeasible in a real-world implementation.
If the system is only ever accessed over the network, then all the data that has to be duplicated arrives from there. I suppose one could simply use an Ethernet hub or similar hardware to make sure every Ethernet frame arrives at both hosts, coupled with a dedicated low-latency link between the hosts used for synchronisation. Given certain guarantees about the performance of the local Ethernet segment, that synchronisation could be as simple as each host counting all incoming frames and sharing the counter over the dedicated link.
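The frame-counting idea can be sketched roughly as follows. This is a minimal Python simulation, not networking code: `Replica`, `in_sync` and `frames_behind` are hypothetical names, and the sketch assumes the hub delivers every frame to both hosts in the same order, so that comparing the two counters over the dedicated link is enough to spot a host that has not yet seen a frame.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One of the two hosts behind the hub (hypothetical model)."""
    name: str
    frames_seen: int = 0
    log: list = field(default_factory=list)

    def receive(self, frame: bytes) -> None:
        # The hub repeats every frame to both hosts; each host counts and
        # records the frames it sees, in arrival order.
        self.frames_seen += 1
        self.log.append(frame)


def in_sync(primary: Replica, backup: Replica) -> bool:
    # Over the dedicated link the hosts exchange only their counters.
    # Assuming both see frames in the same order, equal counters mean
    # both have consumed the same input.
    return primary.frames_seen == backup.frames_seen


def frames_behind(primary: Replica, backup: Replica) -> int:
    # How many frames the backup still has to see (negative if it is ahead).
    return primary.frames_seen - backup.frames_seen


if __name__ == "__main__":
    primary, backup = Replica("primary"), Replica("backup")
    for frame in (b"frame-1", b"frame-2", b"frame-3"):
        primary.receive(frame)
        backup.receive(frame)
    print(in_sync(primary, backup))        # True
    primary.receive(b"frame-4")            # backup has not seen this one yet
    print(frames_behind(primary, backup))  # 1
```

The sketch only detects a mismatch; a real design would still have to decide what to do about it, for example holding back output until the backup has caught up.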
> My understanding of deterministic scheduling is that in multiplicity scheduling, jobs are partitioned into job types, where the jobs within each type are identical. Each job is then sent to whichever machine the scheduling mechanism selects. Sending every job to every machine would therefore cause far more overhead than having a failover mechanism justifies.
What I meant by 'deterministic' is just that all threads always get their full time slice. For this to happen, two things are needed: a way of determining, when an interrupt arrives, exactly how many cycles the running thread has left of its slice, and a way of granting it precisely that number of cycles before it gets preempted. I suppose you could use the TSC for the former and perhaps the local APIC timer for the latter, but I am not sure whether trying to do so would be crazy.
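The per-slice accounting described above might look roughly like this. It is a hypothetical Python model rather than kernel code: `SliceAccountant` is an illustrative name, the `now` arguments stand in for TSC readings, and the value returned by `dispatch` is what would be loaded into a one-shot timer such as the local APIC timer. For simplicity the sketch assumes that timer counts CPU cycles; a real local APIC timer runs from its own clock, so a conversion factor would be needed.

```python
class SliceAccountant:
    """Tracks how much of its time slice the running thread has used.

    Hypothetical model: `now` values stand in for TSC readings, and the
    budget returned by dispatch() is what a one-shot timer would be armed
    with (assumed here to count CPU cycles).
    """

    def __init__(self, slice_cycles: int):
        self.slice_cycles = slice_cycles
        self.remaining = slice_cycles
        self.started_at = None  # TSC value when the thread last resumed

    def dispatch(self, now: int) -> int:
        """Start or resume the thread; return the cycles to arm the timer with."""
        self.started_at = now
        return self.remaining

    def interrupt(self, now: int) -> int:
        """An interrupt arrived: charge the thread only for cycles it actually ran."""
        if self.started_at is None:
            raise RuntimeError("thread is not running")
        used = now - self.started_at
        self.remaining = max(0, self.remaining - used)
        self.started_at = None
        return self.remaining

    def exhausted(self) -> bool:
        return self.remaining == 0


if __name__ == "__main__":
    acct = SliceAccountant(slice_cycles=1000)
    print(acct.dispatch(now=0))      # 1000: arm the timer for the full slice
    print(acct.interrupt(now=300))   # 700: interrupted after 300 cycles of work
    # Cycles 300..450 are spent in the interrupt handler and are not charged.
    print(acct.dispatch(now=450))    # 700: resume with exactly what is left
    print(acct.interrupt(now=1150))  # 0: the timer fires at the end of the slice
    print(acct.exhausted())          # True
```

The thread ends up running 300 + 700 = 1000 cycles in total regardless of when the interrupt arrived, which is the "full time slice" property; the hard part in practice is reading the counter and arming the timer without the overhead itself eating into the slice.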
> Well, unless you are building systems to withstand a nuclear attack or something. So, if you want to write the code, I think your system would work provided that your 'master' scheduling mechanism resides above the kernel scheduler. I think this would prevent race conditions from forming, because hardware delays would be invisible to the kernel at that point.
I don't know if I will ever find time to write something like this, even if it were possible, but it struck me as an interesting future project.
best,
Jacob