Fiasco.OC performance issues

Thu Feb 7 21:38:03 CET 2013

On Wed Feb 06, 2013 at 11:27:51 +0100, Sebastian Sumpf wrote:
> On 01/29/2013 12:09 AM, Adam Lackorzynski wrote:
> > On Fri Jan 25, 2013 at 00:00:26 +0100, Sebastian Sumpf wrote:
> >> On 01/17/2013 11:31 PM, Adam Lackorzynski wrote:
> >>> On Thu Jan 17, 2013 at 17:03:36 +0100, Sebastian Sumpf wrote:
> >>>> I recently upgraded Fiasco.OC to SVN revision 42 and experience some
> >>>> pretty severe performance degradation compared to revision 40 on the
> >>>> Pandaboard (SMP). It seems that 'sigma0' and the root task stall for 5
> >>>> to 10 seconds during boot up. I tracked the issue down to the initial
> >>>> mapping operations, in particular because our root task maps all the
> >>>> available memory during bootstrap. Within the kernel,
> >>>> 'Context::xcpu_tlb_flush' is called for each mapping. The function sends
> >>>> an IPI (to CPU1 which is idle) and then waits for an IPI in order to
> >>>> signal the end of the operation. The whole operation seems to have
> >>>> gotten slower compared to revision 40, but I could not find many
> >>>> differences in the IPI-handling code. Do you have any ideas or
> >>>> suggestions what could cause the delay (maybe scheduling changes) and
> >>>> how to fix it?
> >>>
> >>> I noticed a similar/same thing but haven't had time to investigate yet.
> >>
> >> Okay, I just wanted to make sure that the problem is neither on our side
> >> nor in our usage pattern.
> >> Another thing I wonder is: Since you now have second level cache support
> >> for the PandaBoard, how do I map DMA memory to a client? The problem
> >> seems to be that sigma0 maps all memory as cached. So what we have been
> >> trying to do is this: When someone requests DMA memory we map the page
> >> as uncached and then call 'l4_cache_dma_coherent' afterwards. This
> >> doesn't seem to work out well for our drivers. As far as I could gather,
> >> memory that is mapped cached (sigma0, roottask) and uncached (client) at
> >> the same time has undefined behavior on ARM (I might be wrong here). So,
> >> what is the protocol to implement this on Fiasco.OC/L4RE setups?
> > 
> > Indeed, mapping the same memory with different attributes must be avoided.
> > But it's also about accessing that memory. So for example for sigma0
> > this isn't a problem because sigma0 does not touch the memory itself.
> > Is your roottask accessing the memory, i.e. pulling it into caches?
> 
> Yes, this is the problem we're trying to solve. We don't have a notion of
> normal RAM and DMA pools within our roottask (if we had, the question of
> how to dimension DMA pools would arise; also, this seems to be an
> ARM-only issue). So here is what we did with only the L1 caches enabled:
> acquire the memory in roottask, map it to the client as non-cacheable,
> zero out the memory (for security reasons, as it might previously have
> been used as normal RAM), then clean and invalidate the data cache. With
> the L2 cache enabled on the PandaBoard, I tried to use the
> 'l4_cache_dma_coherent' function to get the same behavior that worked
> well in the L1-only case, but it didn't work. So, what am I doing wrong
> here, or is this not supported at all?

Sounds reasonable to me. Could you check whether
l4_cache_dma_coherent_full makes a difference in your setup?
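
To illustrate why the order of operations matters when a page is mapped
cached in roottask and uncached in the client, here is a toy model in
Python. It is only a sketch: the class and function names are made up
for illustration, it models a single write-back cache with byte-sized
lines, and it does not call or represent any Fiasco.OC or L4Re
interface.

```python
class ToyWriteBackCache:
    """Toy model of a write-back cache in front of RAM (hypothetical, not an L4 API)."""

    def __init__(self, size, fill):
        self.ram = [fill] * size   # RAM contents left over from earlier use
        self.lines = {}            # addr -> value for dirty cached data

    def cached_write(self, addr, value):
        # A write through a cached mapping lands in the cache only.
        self.lines[addr] = value

    def uncached_read(self, addr):
        # An uncached mapping (or a DMA device) bypasses the cache.
        return self.ram[addr]

    def clean_and_invalidate(self, start, end):
        # Write dirty data in [start, end) back to RAM, then drop the lines.
        for addr in range(start, end):
            if addr in self.lines:
                self.ram[addr] = self.lines.pop(addr)


def stale_bytes(mem, size):
    # Count bytes that the uncached side still sees as non-zero.
    return sum(1 for addr in range(size) if mem.uncached_read(addr) != 0)


def run_scenario(size=64):
    mem = ToyWriteBackCache(size, fill=0xAA)
    for addr in range(size):
        mem.cached_write(addr, 0)          # roottask zeroes via its cached mapping
    before = stale_bytes(mem, size)        # client looks through its uncached mapping
    mem.clean_and_invalidate(0, size)      # cache maintenance over the whole range
    after = stale_bytes(mem, size)
    return before, after


if __name__ == "__main__":
    before, after = run_scenario()
    print("stale bytes before maintenance:", before)
    print("stale bytes after maintenance: ", after)
```

In the model the client keeps seeing the old contents until the dirty
lines are cleaned. On real hardware with an outer L2 cache the
maintenance has to reach that level as well, which is presumably what
the comparison with l4_cache_dma_coherent_full would show.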

Adam
-- 
Adam                 adam at os.inf.tu-dresden.de
  Lackorzynski         http://os.inf.tu-dresden.de/~adam/