Dear L4 hackers,
I have implemented basic support for the HiKey960 board (based on the Kirin 960 SoC) [1] in Fiasco.OC and L4Re. If you are interested, please feel free to examine the code on GitHub [2]. I would be very happy to receive a code review and suggestions towards a potential upstream merge.
The code is based on the latest L4Re base 21.07.0 snapshot. I wanted to rebase the implementation on the latest revisions of the respective GitHub repositories, but unfortunately the current upstream Fiasco does not work properly for me, even in QEMU.
The implementation is very basic (fixed physical memory layout, UART support only, etc.), but it seems to work fine with the default examples. Unfortunately, I have been struggling for some time to get the additional CPU cores working. I would greatly appreciate your assistance in this matter.
In a nutshell, the 3 additional little cores (A53) are woken up correctly using the PSCI call (I am not tackling the 4 big A73 cores yet), but the Fiasco code livelocks in the loop around the STXR instruction in src/kern/arm/64/tramp-mp.S [3]. The STXR instruction always reports that the exclusive access to _tramp_mp_spinlock failed, even though no other accesses to the spinlock occurred (confirmed using a JTAG debugger).
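For readers less familiar with the pattern, here is a minimal C11 sketch of the kind of acquire/release spinlock that guards a secondary-CPU trampoline. It is not the tramp-mp.S code; the names tramp_lock, tramp_lock_acquire, tramp_lock_release and tramp_lock_try are hypothetical. When built for plain ARMv8.0 without LSE atomics, GCC and Clang typically lower atomic_exchange_explicit to an LDAXR/STXR retry loop, so an STXR that never reports success keeps the core inside that inner loop forever, which is the kind of livelock described above.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for _tramp_mp_spinlock (0 = free, 1 = held). */
static atomic_uint tramp_lock = 0;

/* Acquire. On ARMv8.0 without LSE, the exchange is typically compiled to
 * an LDAXR/STXR loop that retries while STXR returns status 1 (failure).
 * If the exclusive store can never succeed, that retry loop never exits. */
static void tramp_lock_acquire(atomic_uint *lock)
{
    while (atomic_exchange_explicit(lock, 1u, memory_order_acquire) != 0u)
        ; /* lock held by another core: keep spinning */
}

/* Release with store-release semantics (typically an STLR on AArch64). */
static void tramp_lock_release(atomic_uint *lock)
{
    atomic_store_explicit(lock, 0u, memory_order_release);
}

/* Non-blocking variant: returns 1 if the lock was taken, 0 otherwise. */
static int tramp_lock_try(atomic_uint *lock)
{
    return atomic_exchange_explicit(lock, 1u, memory_order_acquire) == 0u;
}
```

Whether the exclusive store can succeed at all depends on the exclusive monitors, which for Normal shareable memory typically rely on the caches and coherency of the core being set up correctly.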
I have checked all the usual culprits, such as proper virtual memory mapping attributes. I don't see any significant difference between the initialization code of Fiasco and that of other kernels that support the HiKey960. I am also aware of Adam's recent fix to the _tramp_mp_spinlock code [4], but even that does not fix the livelock on the HiKey960.
Does anyone have an idea what the root cause of the livelock might be? Thanks in advance for any input.
[1] https://www.96boards.org/product/hikey960/
[2] https://github.com/martin-decky/L4Re-HiKey960
[3] https://github.com/martin-decky/L4Re-HiKey960/blob/master/src/fiasco/src/kern/arm/64/tramp-mp.S#L247
[4] https://github.com/kernkonzept/fiasco/commit/6bc2f7132982bc1e159abb716eb902671d381e9e
Best regards
Martin Decky
Hi Martin,
On Fri Sep 17, 2021 at 13:05:38 +0000, Martin Decky wrote:
I'm not sure about the QEMU part (works fine for me) but on the HiKey my guess would be that the cache on those cores is not enabled? Could you check this?
Adam
Dear Adam,
thanks for your reply!
> I'm not sure about the QEMU part (works fine for me)
For the record, this is what I did:
(1) Start with the vanilla base 21.07.0 snapshot.
(2) Manually update the respective components (e.g. src/fiasco, src/l4/mk, src/l4/pkg/bootstrap, etc.) from the respective GitHub repositories. Maybe I've missed something, but I was unable to find any integration repository that would simply point to the other repos as submodules or something in that style. And I don't know where the upstream of things like bin/setup.d is.
(3) Compile the resulting composition for arm64-virt-el2. The compilation finishes OK.
(4) Run the compilation output in QEMU. See the log attached. The UART still echoes the input, thus I believe the kernel does not crash, but there is no forward progress.
I am sure that I would be able to find and fix the problem eventually. But I simply prefer a working baseline before doing any development, and therefore I have stuck with the vanilla snapshot :)
> but on the HiKey my guess would be that the cache on those cores is not enabled? Could you check this?
As I wrote before, I have tried to confirm that the memory is mapped with the correct attributes. The JTAG debugger reports that _tramp_mp_spinlock is located in a memory region that is inner shareable, inner write-back, outer write-back, read-allocate, write-allocate, non-transient.
I have also tried to confirm this from the code:
(a) MAIR_EL2 is set to 0x00ff4400, which means that attribute index 2 represents normal cacheable memory.
(b) TCR_EL2 is set to 0x80853510, which means that the translation table walks are inner shareable, normal outer write-back read-allocate write-allocate, normal inner write-back read-allocate write-allocate.
(c) SCTLR_EL2 is set to 0x30c51835, which means that the instruction cache, the data cache and the memory translation are enabled.
(d) I must say that the code in kern/arm/paging-arm.cpp is extremely hard to understand and analyze (compared to most kernels I've ever seen), and not just because it is plagued by non-symbolic constants. But I still believe that the value 0x008 on line 835 translates to using attribute index 2 (see above).
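The register values above can be cross-checked mechanically. The C sketch below decodes them following the ARMv8-A field layouts: the Attr&lt;n&gt; bytes of MAIR_EL2, the non-VHE TCR_EL2 layout, the M/C/I bits of SCTLR_EL2, and the AttrIndx/SH fields of a stage 1 block or page descriptor (4 KB granule, without FEAT_LPA2). All helper names are hypothetical. Note that the IRGN0/ORGN0/SH0 fields of TCR_EL2 describe the translation table walks, whereas the memory type and shareability of a mapped page come from its own descriptor (AttrIndx and SH).

```c
#include <stdint.h>

/* Attr<idx> byte of MAIR_EL2 (8 bits per attribute index). */
static unsigned mair_attr(uint64_t mair, unsigned idx)
{
    return (unsigned)((mair >> (8u * idx)) & 0xffu);
}

/* 0xff = Normal, inner and outer write-back non-transient, R/W-allocate. */
static int mair_is_normal_wb_rwalloc(unsigned attr)
{
    return attr == 0xffu;
}

/* TCR_EL2 fields (non-VHE layout) that control translation table walks. */
struct tcr_walk {
    unsigned t0sz, irgn0, orgn0, sh0, tg0, ps;
};

static struct tcr_walk tcr_decode(uint64_t tcr)
{
    struct tcr_walk w;
    w.t0sz  = (unsigned)(tcr & 0x3fu);        /* bits [5:0]   */
    w.irgn0 = (unsigned)((tcr >> 8) & 0x3u);  /* bits [9:8]   */
    w.orgn0 = (unsigned)((tcr >> 10) & 0x3u); /* bits [11:10] */
    w.sh0   = (unsigned)((tcr >> 12) & 0x3u); /* bits [13:12] */
    w.tg0   = (unsigned)((tcr >> 14) & 0x3u); /* bits [15:14] */
    w.ps    = (unsigned)((tcr >> 16) & 0x7u); /* bits [18:16] */
    return w;
}

/* SCTLR_EL2 enable bits: M (bit 0), C (bit 2), I (bit 12). */
static int sctlr_mmu_on(uint64_t s)    { return (int)(s & 1u); }
static int sctlr_dcache_on(uint64_t s) { return (int)((s >> 2) & 1u); }
static int sctlr_icache_on(uint64_t s) { return (int)((s >> 12) & 1u); }

/* Stage 1 descriptor: AttrIndx in bits [4:2], SH in bits [9:8]. */
static unsigned desc_attr_idx(uint64_t d) { return (unsigned)((d >> 2) & 0x7u); }
static unsigned desc_sh(uint64_t d)       { return (unsigned)((d >> 8) & 0x3u); }
```

With the values quoted above, this decoding gives Attr2 = 0xff, T0SZ = 16 with a 4 KB granule and a 48-bit physical address size, inner shareable write-back table walks (SH0 = 3, IRGN0 = ORGN0 = 1), the MMU and both caches enabled, and AttrIndx = 2 for the 0x008 descriptor bits.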
Am I missing something?
Best regards
Martin Decky
Hi Martin!
On 9/20/21 18:44, Martin Decky wrote:
> (2) Manually update the respective components (e.g. src/fiasco, src/l4/mk, src/l4/pkg/bootstrap, etc.) from the respective GitHub repositories. Maybe I've missed something, but I was unable to find any integration repository that would simply point to the other repos as submodules or something in that style. And I don't know where the upstream of things like bin/setup.d is.
Do you need something from the snapshot other than the packages you then update from GitHub? If you only need the GitHub packages, the tool for maintaining all of the individual repos at once is called ham and its use is described in our tutorial at:
https://github.com/kernkonzept/manifest/wiki/BUILDING
Cheers, Jakub
Dear Jakub,
> Do you need something from the snapshot other than the packages you then update from GitHub?
Yes, I actually need the gnu-efi package to build the UEFI boot images.
> If you only need the GitHub packages, the tool for maintaining all of the individual repos at once is called ham and its use is described in our tutorial at:
OK, thanks for the hint! That actually worked nicely and I can build a working L4Re both for QEMU and for HiKey960.
Now I have individual repositories [1][2][3][4] that can closely follow the upstream. Unfortunately, the problem with the additional CPU cores still persists.
[1] https://github.com/martin-decky/fiasco-hikey960
[2] https://github.com/martin-decky/mk-hikey960
[3] https://github.com/martin-decky/bootstrap-hikey960
[4] https://github.com/martin-decky/L4Re-HiKey960-deploy
Best regards
Martin Decky
On 9/20/21 18:44, Martin Decky wrote:
> (4) Run the compilation output in QEMU. See the log attached. The UART still echoes the input, thus I believe the kernel does not crash, but there is no forward progress.
From the symptoms in the log it seems to me like you were running a virt-enabled kernel with the L4Re build configured for a non-virt-enabled kernel. Can you check that CONFIG_KERNEL_CPU_VIRT is set to y in your .kconfig?
Cheers, Jakub
Dear Jakub,
> From the symptoms in the log it seems to me like you were running a virt-enabled kernel with the L4Re build configured for a non-virt-enabled kernel. Can you check that CONFIG_KERNEL_CPU_VIRT is set to y in your .kconfig?
Yes, you are correct. I confirm that this configuration mismatch was the root cause of the issue. I must have overlooked something while updating the files in the snapshot with the files from the upstream repos.
I'm wondering whether it would be useful to have a compile-time (or at least run-time) check to catch such configuration mismatch between Fiasco and L4Re.
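As a rough illustration of such a run-time check, here is a C sketch that compares what the user-level build was configured for with what the kernel reports when the system starts. Everything in it is hypothetical and does not describe an existing L4Re mechanism: USERLAND_EXPECTS_VIRT, config_matches and check_virt_config are invented names, and how the kernel would actually advertise virtualization support is left as an assumption (the caller simply passes a boolean).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: derived from the user-level build configuration
 * (CONFIG_KERNEL_CPU_VIRT from .kconfig in the scenario above). */
#ifdef CONFIG_KERNEL_CPU_VIRT
#define USERLAND_EXPECTS_VIRT true
#else
#define USERLAND_EXPECTS_VIRT false
#endif

/* True if the kernel mode and the user-level expectation agree. */
static bool config_matches(bool kernel_is_virt, bool userland_expects_virt)
{
    return kernel_is_virt == userland_expects_virt;
}

/* Hypothetical start-up check. kernel_is_virt is assumed to come from
 * some kernel-provided feature flag. Returns 0 if consistent, otherwise
 * prints a diagnostic and returns -1 instead of hanging silently. */
static int check_virt_config(bool kernel_is_virt)
{
    if (config_matches(kernel_is_virt, USERLAND_EXPECTS_VIRT))
        return 0;
    fprintf(stderr,
            "config mismatch: kernel %s virtualization support, "
            "user level built %s CONFIG_KERNEL_CPU_VIRT\n",
            kernel_is_virt ? "has" : "lacks",
            USERLAND_EXPECTS_VIRT ? "with" : "without");
    return -1;
}
```

A check along these lines could turn the silent lack of progress seen in QEMU into an explicit error message; a purely compile-time check would additionally require the kernel configuration to be visible to the user-level build.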
Well, updating the snapshot was a poor man's solution anyway and using Ham is IMHO a better approach. Thanks!
Best regards
Martin Decky
Hi Martin!
On 9/17/21 15:05, Martin Decky wrote:
> In a nutshell, the 3 additional little cores (A53) are woken up correctly using the PSCI call (I am not tackling the 4 big A73 cores yet), but the Fiasco code livelocks in the loop around the STXR instruction in src/kern/arm/64/tramp-mp.S [3]. The STXR instruction always reports that the exclusive access to _tramp_mp_spinlock failed, even though no other accesses to the spinlock occurred (confirmed using a JTAG debugger).
Does the behavior change if you enable only 2 CPUs in your Fiasco config so that you have the BSP and only one AP?
Jakub
l4-hackers@os.inf.tu-dresden.de