Hi All,
I now want to know how many IPIs are sent during an IPC. So I added a counter for IPIs in the kernel, and at user level there is a client sending IPC to a server on another core. But I found some very strange behavior.
1. The total number of IPIs is not a multiple of the number of IPCs. It seems that there are some "extra IPIs", and as the number of IPCs increases, the number of "extra IPIs" also increases. So my question is whether there is some background program in the kernel or L4Re that sends IPIs.
2. If I add a printf statement in the kernel, the number of IPIs also increases. Does printf in the kernel cause an IPI to be sent? If it does, is there any method to avoid this, like the printk function in the Linux kernel?
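To make question 1 concrete, the counter I have in mind is roughly like the following. This is only a minimal standalone sketch with made-up names (Max_cpus, count_ipi), not the actual Fiasco code; the hook would sit in the kernel's IPI send path:

    // Sketch of a per-CPU IPI counter (illustrative names, not Fiasco code).
    #include <atomic>
    #include <cstdio>

    enum { Max_cpus = 8 };                        // assumption: fixed CPU count
    static std::atomic<unsigned long> ipi_count[Max_cpus];

    // Hypothetical hook, called right before the hardware IPI is triggered.
    inline void count_ipi(unsigned target_cpu)
    { ipi_count[target_cpu].fetch_add(1, std::memory_order_relaxed); }

    int main()
    {
      count_ipi(1);
      count_ipi(1);
      std::printf("IPIs sent to CPU 1: %lu\n", ipi_count[1].load());
      return 0;
    }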
Thanks a lot!! Yuxin
On Thu Jul 17, 2014 at 13:03:31 -0400, Yuxin Ren wrote:
I now want to know how many IPIs are sent during an IPC. So I added a counter for IPIs in the kernel, and at user level there is a client sending IPC to a server on another core. But I found some very strange behavior.
- The total number of IPIs is not a multiple of the number of IPCs. It seems that there are some "extra IPIs", and as the number of IPCs increases, the number of "extra IPIs" also increases.
Well, as we already discussed, depending on the situation there might be two or four IPIs. Is that what you mean by 'extra IPIs'?
So my question is whether there is some background program in the kernel or L4Re that sends IPIs.
In user-land there is nothing by default that runs on other CPUs, so no. The kernel will also only send IPIs around when necessary (what else :) ), such as when doing IPC.
- If I add a printf statement in the kernel, the number of IPIs also increases. Does printf in the kernel cause an IPI to be sent? If it does, is there any method to avoid this, like the printk function in the Linux kernel?
printf in the kernel does not use IPIs. But I could imagine that it alters timing and thus ordering, which can lead to more IPIs for IPC (as above).
Adam
Thank you so much. I think I have found the reason, but I cannot understand the logic of that code. In the Context::enqueue_drq method there is a piece of code that determines whether an IPI is needed:

  if (!_pending_rq.queued())
    {
      if (!q.first())
        ipi = true;
      q.enqueue(&_pending_rq);
    }

In my understanding, the logic behind this is to first check whether _pending_rq is already queued; if it is, we are done, otherwise we enqueue it. Before enqueuing we check whether the queue is empty: if it is empty, an IPI is needed, otherwise this request can be picked up by the IPI of an earlier request.

The problem is that I cannot imagine in what case _pending_rq could already be queued, at least in my program. My scenario is: there is only one client and one server, on different cores, and the client sends IPC in a tight loop. I think each time the server receives a request it should dequeue _pending_rq, so it should not be possible for the client to find _pending_rq still queued when it wants to do IPC. But I observed that this does indeed happen sometimes.

I hope I have described my question clearly. Could you give me some hints?
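To check my understanding of that pattern I reduced it to the following standalone toy. This is purely my own illustrative code (Request and RequestQueue are made-up names, not the kernel's Queue/Drq types). The point it demonstrates: only the enqueue that takes the remote queue from empty to non-empty has to raise an IPI, every later request piggybacks on the IPI already in flight, and a request that is still queued needs no action at all.

    #include <atomic>
    #include <cstdio>

    struct Request { std::atomic<bool> queued{false}; };

    struct RequestQueue
    {
      std::atomic<int> len{0};

      // Returns true when the caller must send an IPI: only the enqueue
      // that makes the queue non-empty has to wake the remote CPU.
      bool enqueue(Request &r)
      {
        if (r.queued.load())          // still pending from last time: nothing to do
          return false;
        r.queued.store(true);
        return len.fetch_add(1) == 0; // queue was empty -> we must send the IPI
      }

      void dequeue(Request &r)
      {
        r.queued.store(false);
        len.fetch_sub(1);
      }
    };

    int main()
    {
      RequestQueue q;
      Request a, b;
      std::printf("first enqueue needs IPI:  %d\n", q.enqueue(a)); // 1
      std::printf("second enqueue needs IPI: %d\n", q.enqueue(b)); // 0, piggybacks
      q.dequeue(a); q.dequeue(b);
      std::printf("after drain, needs IPI:   %d\n", q.enqueue(a)); // 1 again
      return 0;
    }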
Best, Yuxin
On Mon Jul 21, 2014 at 10:09:27 +0800, Yuxin Ren wrote:
As I read the code, this part handles the case of a migration that might happen to the thread. Your description does not sound like threads would move between cores. Do you see this during your IPC loop or before?
Adam
Hi,
That happens during the IPC loop. I am pretty sure there is no migration during my IPC. However, the helping lock can cause migration, right? But I do not think this happens in my case. Do you think something could delay the dequeue operation so that _pending_rq is still in the queue when the next request happens? Because after I add some delay in the IPC path (adding a printf somewhere), the number of IPIs is exactly 4 for a flex-page mapping IPC and 2 for a simple data-only IPC.
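To test that theory at user level I played with a small toy along these lines; it is plain C++ threads, nothing to do with the kernel internals, and only illustrates the timing I suspect: when the consumer drains slowly, the producer's tight loop regularly finds its previous request still queued.

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    std::atomic<bool> queued{false};
    std::atomic<bool> stop{false};

    int main()
    {
      // Consumer: drains the "queue" with an artificial delay, standing in
      // for a server CPU that is slow to process its pending request.
      std::thread consumer([]{
        while (!stop.load())
          if (queued.load())
          {
            std::this_thread::sleep_for(std::chrono::microseconds(50));
            queued.store(false);
          }
      });

      // Producer: a tight loop, standing in for the client's IPC loop.
      unsigned still_queued = 0;
      for (int i = 0; i < 100000; ++i)
      {
        if (queued.load())        // previous request has not been drained yet
          ++still_queued;
        else
          queued.store(true);     // enqueue a new request
      }
      stop.store(true);
      consumer.join();
      std::printf("iterations that found the request still queued: %u\n",
                  still_queued);
      return 0;
    }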
Thanks a lot. Yuxin
Hi,
On Thu Jul 24, 2014 at 17:41:14 -0400, Yuxin Ren wrote:
The printfs slow down the IPCs incredibly, so I'm not sure this tells us anything. Are you sure that your IPC loops run at the highest priority (so that nothing else can run) and that they are static, i.e. no memory allocation, no creation/destruction of anything, no other syscall? The request you see in there, which request is it? There is a function attached to each request; is it IPC related?
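To illustrate what I mean by the attached function (illustrative types only, not the actual Drq declaration, so please check the definition in your kernel tree): each request stores a pointer to the operation the target CPU shall run, so comparing or printing that pointer tells you where the request comes from.

    #include <cstdio>

    // Illustrative stand-in for the kernel's request type.
    struct Remote_request
    {
      typedef unsigned (Func)(Remote_request *self, void *arg);
      Func *func;   // the operation the target CPU shall execute
      void *arg;
    };

    static unsigned handle_remote_ipc(Remote_request *, void *) { return 0; }
    static unsigned handle_migration(Remote_request *, void *)  { return 0; }

    int main()
    {
      Remote_request ipc_rq = { handle_remote_ipc, nullptr };
      Remote_request mig_rq = { handle_migration, nullptr };
      // Comparing the stored pointer against the known handlers (or printing
      // its address and looking it up in the kernel map file) identifies the
      // origin of the request.
      std::printf("first request:  %s\n",
                  ipc_rq.func == handle_remote_ipc ? "IPC related" : "something else");
      std::printf("second request: %s\n",
                  mig_rq.func == handle_remote_ipc ? "IPC related" : "something else");
      return 0;
    }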
Adam