Hi Adam
Yes, the caches are enabled: I manually set the I and C enable bits in the SCTLR_EL1 register. While debugging, I can see that some instructions are cached.
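For reference, the bits I am referring to can be written out as a small C sketch. The macro and function names are my own illustration and are not taken from my boot code. The bit positions (M = bit 0, C = bit 2, I = bit 12) are those documented for SCTLR_EL1. On the target the value would be read with mrs x0, sctlr_el1; here it is passed in as a plain integer so the check can be compiled anywhere.

```c
#include <stdint.h>
#include <stdbool.h>

/* SCTLR_EL1 control bits (macro names are illustrative). */
#define SCTLR_M (UINT64_C(1) << 0)   /* stage 1 MMU enable */
#define SCTLR_C (UINT64_C(1) << 2)   /* data cacheability enable */
#define SCTLR_I (UINT64_C(1) << 12)  /* instruction cacheability enable */

/* Return the register value with I and C set, all other bits untouched. */
static inline uint64_t sctlr_enable_caches(uint64_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_C;
}

/* True only if both the I and the C bit are set. */
static inline bool sctlr_caches_enabled(uint64_t sctlr)
{
    return (sctlr & (SCTLR_I | SCTLR_C)) == (SCTLR_I | SCTLR_C);
}

/* True if the stage 1 MMU enable bit is set. */
static inline bool sctlr_mmu_enabled(uint64_t sctlr)
{
    return (sctlr & SCTLR_M) != 0;
}
```

The M bit is listed only because it lives in the same register; the I and C bits are the ones I set by hand.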
While analysing this, I noticed that the latency seems to come from accesses to the stack or other memory areas. As a test, I rewrote the time calculation function in assembly, and that version is considerably faster (from 4650 ticks down to 79 ticks).
Digging deeper, I disassembled the code containing the for loop and noticed that the difference is that the faster version contains no loads of the form ldr x0, [sp, #104], i.e. no accesses to the stack.
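To illustrate the pattern I mean, here is a minimal sketch. It is not my actual timing code, and the function names and the loop are hypothetical. Declaring the local as volatile forces the compiler to keep it in memory, so on AArch64 each iteration typically goes through sp-relative ldr/str instructions like the one above. The second version lets an optimising compiler keep the accumulator in a register.

```c
#include <stdint.h>

/* Accumulator forced into memory: every iteration reloads and stores it. */
uint64_t sum_via_stack(uint32_t n)
{
    volatile uint64_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc = acc + i;   /* one load and one store per iteration */
    }
    return acc;
}

/* Accumulator free to live in a register when optimisation is enabled. */
uint64_t sum_via_register(uint32_t n)
{
    uint64_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}
```

Both functions return the same result; only the number of memory accesses per iteration differs, which is what I would compare in the disassembly.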
The stack is an area of memory placed by the linker and set up as described in the application note (ARM DAI 0527A Non-Confidential - Application Note - Bare-metal Boot Code for ARMv8-A Processors).
Could there be something I need to manage on the hypervisor side, or settings I need to change, to optimise these accesses?
Thanks again for your support and courtesy.
Gianluca