On 06/02/2011, at 10:36 , Chen Tian wrote:
Well, I did run it on a real machine (a dual-core processor with hyper-threading). One-way IPC takes more than one million cycles, which seems unusual. Do you think using affinity will affect the results? I pinned the ping thread and the pong thread to different cores by setting their affinities before they started making send/wait IPC calls. I am not sure if I did something wrong there.
Cross-core IPC is more expensive than core-local IPC, as cache lines must be migrated and, depending on how it's implemented, you may get inter-processor interrupts (IPIs), which are expensive on x86. And you don't get any benefit from parallelism, as one thread is always blocked.
But none of that should result in such bad latencies. I could see a few tens of thousands of cycles, but not millions.
Gernot
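
For comparison, the pin-then-ping-pong measurement discussed above can be sketched at user level on Linux. This is only a rough analogue under stated assumptions, not the L4 IPC path: it uses two processes and a pair of pipes instead of send/wait IPC calls, the function names are hypothetical, and the clock frequency passed to ns_to_cycles is something you have to supply for your own machine. Interpreter and pipe overhead are included in the result, so treat the number as an order-of-magnitude check at best.

```python
import os
import time


def pin_to_core(core):
    """Best-effort pin of the calling process to one core; True on success."""
    try:
        os.sched_setaffinity(0, {core})  # Linux-only call
        return True
    except (AttributeError, OSError, ValueError):
        return False


def ns_to_cycles(ns, clock_ghz):
    """Convert nanoseconds to cycles; clock_ghz is an assumed, fixed frequency."""
    return ns * clock_ghz


def one_way_latency_ns(ping_core, pong_core, rounds=1000, warmup=100):
    """Estimate one-way latency as half the mean pipe round-trip time."""
    to_pong_r, to_pong_w = os.pipe()  # ping -> pong
    to_ping_r, to_ping_w = os.pipe()  # pong -> ping
    pid = os.fork()
    if pid == 0:
        # Child is the "pong" side: pin first, then echo every byte back.
        try:
            os.close(to_pong_w)
            os.close(to_ping_r)
            pin_to_core(pong_core)
            for _ in range(warmup + rounds):
                if not os.read(to_pong_r, 1):
                    break  # ping side went away
                os.write(to_ping_w, b"x")
        finally:
            os._exit(0)

    # Parent is the "ping" side.
    os.close(to_pong_r)
    os.close(to_ping_w)
    has_mask = hasattr(os, "sched_getaffinity")
    old_mask = os.sched_getaffinity(0) if has_mask else None
    pin_to_core(ping_core)  # pin before the timed loop starts
    try:
        for _ in range(warmup):
            os.write(to_pong_w, b"x")
            os.read(to_ping_r, 1)
        start = time.perf_counter_ns()
        for _ in range(rounds):
            os.write(to_pong_w, b"x")
            os.read(to_ping_r, 1)
        elapsed = time.perf_counter_ns() - start
    finally:
        os.close(to_pong_w)
        os.close(to_ping_r)
        os.waitpid(pid, 0)
        if old_mask is not None:
            os.sched_setaffinity(0, old_mask)  # restore the caller's affinity
    return elapsed / (2 * rounds)


if __name__ == "__main__":
    ns = one_way_latency_ns(ping_core=0, pong_core=1)
    # 2.0 GHz is a placeholder; substitute the real clock of the test machine.
    print(f"one-way: {ns:.0f} ns, roughly {ns_to_cycles(ns, 2.0):.0f} cycles")
```

Running it once with two different cores and once with both sides on the same core gives a crude feel for the cross-core penalty. If pinning fails (for example, the requested core does not exist), the sketch carries on unpinned, so check the return value of pin_to_core when the numbers look odd.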