RE: IPC/Capabilities Overview

5 Jan 2004


      ...
That is wrong.  The direct lookup drastically reduces cache and TLB
footprint.  For a full IPC we have to access two TCBs (which are
virtually mapped and have the stack in the same page) which costs two
TLB entries.  The complete lookup is therefore a simple mask (plus maybe
a shift), a relative mov (e.g. mov 0xe0000000(%eax), %ebx) and a
compare.  Overall costs therefore (on IA32):

  - 2 TLB entries (but we need them anyway for the stack, they could be
    reduced to one TLB entry when using 4M pages for all TCBs, but that
    would add an indirection table and therefore a cache line); refetch
    costs ~80 cycles/entry
  - shift and move (~3 cycles)
  - 1 cache line for the thread id (which is shared with thread state
    etc).
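
The lookup described above (mask, shift, relative load, compare) can be sketched in portable C. This is only an illustration under assumptions that are not in the post: the id layout (VERSION_BITS), the TCB size, the table size, and the names tcb_area, tcb_lookup and make_tid are all hypothetical, and a static array stands in for the fixed virtual TCB region at 0xe0000000.

```c
#include <stddef.h>
#include <stdint.h>

/* All constants below are assumptions made for this sketch. */
#define TCB_SIZE_LOG2 12u                /* one 4 KiB page per TCB (TCB plus stack) */
#define TCB_COUNT     64u                /* small table, power of two */
#define VERSION_BITS  8u                 /* low bits of the id hold a version */
#define INDEX_MASK    (TCB_COUNT - 1u)

typedef struct tcb {
    uint32_t thread_id;                  /* full id, same cache line as the state */
    uint32_t state;
    unsigned char stack[(1u << TCB_SIZE_LOG2) - 8u]; /* rest of the page */
} tcb_t;

/* Stands in for the virtually mapped TCB array at a fixed base address. */
static tcb_t tcb_area[TCB_COUNT];

/* Build an id from a table index and a version number. */
static uint32_t make_tid(uint32_t index, uint32_t version)
{
    return (index << VERSION_BITS) | (version & ((1u << VERSION_BITS) - 1u));
}

/* Direct lookup: shift and mask to get the index, load the id stored in
 * the TCB, compare it with the requested id. No intermediate table is
 * touched, so the only memory access is the TCB's own cache line. */
static tcb_t *tcb_lookup(uint32_t tid)
{
    uint32_t index = (tid >> VERSION_BITS) & INDEX_MASK;
    tcb_t *tcb = &tcb_area[index];
    return tcb->thread_id == tid ? tcb : NULL;
}
```

A stale id (same index, different version) fails the compare and yields NULL, which is the whole validity check in this sketch.
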
Assume you add 2 more TLB entries and 5 more L2 cache lines--your
aftercosts for IPC go up by 2*80 + 5*80 = 560 cycles.
Considering overall IPC costs of 1000 cycles on a P4, with all those
nasty cache and TLB flushes, you add an overhead of >50%.
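
The arithmetic above can be written out as a small C helper. The 80-cycle figure is the one quoted in the post, applied to both TLB refills and L2 cache line refetches as the post does; the function and macro names are made up for this sketch.

```c
/* Refetch cost per TLB entry or L2 cache line, as quoted in the post. */
#define REFETCH_CYCLES 80u

/* Extra cycles caused by additional TLB entries and cache lines. */
static unsigned extra_cycles(unsigned tlb_entries, unsigned cache_lines)
{
    return (tlb_entries + cache_lines) * REFETCH_CYCLES;
}

/* Overhead relative to a baseline IPC cost, in whole percent (rounded down). */
static unsigned overhead_percent(unsigned extra, unsigned baseline)
{
    return extra * 100u / baseline;
}
```

With 2 extra TLB entries and 5 extra cache lines, extra_cycles gives 560, and against a 1000-cycle baseline overhead_percent gives 56, matching the ">50%" in the text.
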

The "but we need them anyway for the stack, they could be reduced to one TLB
entry when using 4M pages for all TCBs, but that would add an indirection
table and therefore a cache line" part seems very interesting. I've seen
something about it in your microkernel presentations, so I guess you have
done some speed measurements.
Now if you would give each address space its own indirection table, you
would have a thread space.
Have you done measurements? What does this method cost?
-- Rudy Koot
