memory problems on Fiasco porting

Thu Feb 19 09:01:03 CET 2009

Hi,

On Wed Feb 18, 2009 at 02:34:34 +0800, Tsai, Tung-Chieh wrote:
> On Mon, Feb 16, 2009 at 6:41 PM, Adam Lackorzynski
> <adam at os.inf.tu-dresden.de> wrote:
> > Hi,
> >
> > On Sat Feb 14, 2009 at 20:25:19 +0800, Tsai, Tung-Chieh wrote:
> >> On Fri, Feb 13, 2009 at 5:32 AM, Adam Lackorzynski
> >> <adam at os.inf.tu-dresden.de> wrote:
> >> > When the data abort happens it should end up in the page fault handler
> >> > and further on make a page visible at 0xc0080000. Does this happen or
> >> > not?
> >>
> >> Yes, and then it gets stuck between irq_handler() and
> >> Timeslice_timeout::expired().
> >
> > Hmm, so after the page fault happened it is properly resolved, and after
> > the page fault handling is done there is a page at 0xc0080000? I'm just
> > asking because when the irq_handler is invoked this does not really have
> > anything to do with page fault handling but is probably a timer
> > interrupt. Of course this timer interrupt is executed in a specific
> > context and might also access the same address range. Might it be
> > something cache related? At which instruction is it stuck? Is there anything
> > special?
> >
> 
> After the page fault handler runs, there's a page at 0xc0080000 ~ 0xc0081000.
> 
> Initially, I thought this data abort exception was an error... Now I
> understand it's not a problem.

Yes.

> But after I skip these exceptions, it gets stuck between irq_handler()
> and Timeslice_timeout::expired(). I found that this is because, in the while
> loop of Timeout_q::do_timeouts(), traversing the queue Timeout_q::_q via
> Timeout::_next exceeds Timeout_q::Wakeup_queue_count.
> 
> I put an assertion on how many times this while loop is executed, and
> this assertion does not fail when I run it on the QEMU integratorcp
> platform. After this assertion failed, I entered jdb, and jdb showed that
> sigma0 had been added to the ready/present list.
> 
> Since the timer interrupt had been enabled before
> Kernel_thread::init_workload(), I thought this error may occur at the
> first timer interrupt after sigma0 had been added to the ready queue.

Ok, if it runs in Qemu it might very well be a cache issue...
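
The loop-count assertion described above can be sketched in isolation as
follows. This is only a minimal illustration, not Fiasco code: Node and
bounded_walk() are hypothetical stand-ins for Timeout and the loop in
Timeout_q::do_timeouts(). The idea is to stop walking the next pointers
once a step limit is passed, so a cyclic or corrupted list shows up as a
failed check instead of a hang.

```cpp
#include <cstddef>

// Hypothetical stand-in for a queue entry; this is not Fiasco's Timeout class.
struct Node {
    Node *next = nullptr;
};

// Walk the list starting at head, following next pointers.
// Returns the number of nodes visited, or limit + 1 as soon as the walk
// needs more than `limit` steps (a hint that the list is cyclic or that a
// next pointer holds stale data).
std::size_t bounded_walk(const Node *head, std::size_t limit) {
    std::size_t steps = 0;
    for (const Node *n = head; n != nullptr; n = n->next) {
        if (++steps > limit)
            return steps;  // gave up: more entries than the queue should hold
    }
    return steps;
}
```

A caller would assert that bounded_walk(head, limit) <= limit, with the
limit chosen from the known maximum queue length.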

> I will check the cache & TLB related parts again. Currently, for the cache
> & TLB part, I have written the following functions for arm922t:
> 
>   * Mmu<Flush_area, Ram>::flush_cache()
>       clean D cache (write back), invalidate I & D cache, drain write buffer
>   * Mmu<Flush_area, Ram>::clean_dcache()
>       clean D cache, drain write buffer
>   * Mmu<Flush_area, Ram>::flush_dcache()
>       clean and invalidate D cache, drain write buffer
>   * Mem_unit::tlb_flush( void* va, unsigned long)
>       I just flush the whole TLB here, since the arm922t does not seem to
>       have any instruction to flush a specific TLB entry without knowing
>       whether it's in the instruction TLB or the data TLB.
>   * The other functions of Mmu and Mem_unit that I didn't mention
>   use the code of arm926/armv5.

Ok, this is the stuff you need to change, let's hope it's right :)
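
For the "clean D cache" steps in the list above, one thing worth
double-checking is that the loop really covers every cache line. Below is a
rough host-side sketch that only generates the operand words for a
clean-by-index loop. The geometry and bit positions are assumptions (an
8 KB D cache with 4 segments of 64 entries, index in bits [31:26], segment
in bits [6:5]) and should be verified against the ARM922T TRM; the names
are made up for illustration and are not taken from Fiasco.

```cpp
#include <cstdint>
#include <vector>

// Assumed ARM922T D-cache geometry: 8 KB, 32-byte lines. Verify in the TRM.
constexpr std::uint32_t kSegments = 4;      // assumption
constexpr std::uint32_t kEntries = 64;      // assumption: entries per segment
constexpr unsigned kIndexShift = 26;        // assumption: index in bits [31:26]
constexpr unsigned kSegmentShift = 5;       // assumption: segment in bits [6:5]

// Build the operand word for every (segment, index) pair. On the target,
// each word would be handed to the CP15 "clean D entry by index" operation
// (or the clean-and-invalidate variant), followed by a drain write buffer
// once the loop is done.
std::vector<std::uint32_t> dcache_index_operands() {
    std::vector<std::uint32_t> ops;
    ops.reserve(kSegments * kEntries);
    for (std::uint32_t seg = 0; seg < kSegments; ++seg)
        for (std::uint32_t idx = 0; idx < kEntries; ++idx)
            ops.push_back((idx << kIndexShift) | (seg << kSegmentShift));
    return ops;
}
```

If the segment count or bit positions used in the real loop are copied from
a core with a different cache size, some lines are never cleaned, which
would fit a problem that does not show up in Qemu.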

> Did I miss something? Or is there any other possible direction?
> Any advice is appreciated. Thanks.

One thing you should try is the following: in
src/kern/arm/config-arm.c there's a constant called 'cache_enabled'.
It's set to true (obviously). Do things change (and start to work
better) if you set it to false?

Adam
-- 
Adam                 adam at os.inf.tu-dresden.de
  Lackorzynski         http://os.inf.tu-dresden.de/~adam/