Hi Adam,
We’re using the Radxa SiRider S1 development board, which is based on the Rockchip RK3588 SoC. You can find more details here: https://arace.tech/products/radxa-sirider-s1. The CPU consists of 4 Cortex-A76 cores and 2 Cortex-A55 cores.
Best, Stephen.yang
At 2025-10-06 06:00:44, "Adam Lackorzynski" adam@l4re.org wrote:
Hi Stephen,
I doubt this is hardware, it seldom is. Would you be able to share which Arm core it is? I'll try to reproduce here; your indication of increasing the timer frequency is a good hint, and knowing which core, or at least which category of core, can be helpful.
BR, Adam
On Thu Oct 02, 2025 at 13:02:50 +0800, yy18513676366 wrote:
Hi Adam,
I truly appreciate your reply. I actually encountered this issue on real hardware rather than QEMU. May I ask if this problem could be related to the hardware itself? I’m not quite sure I fully understand.
Best regards, Stephen.yang
On 2025-09-30 15:50:22, "Adam Lackorzynski" adam@l4re.org wrote:
Hi Stephen,
ok, thanks, that's tricky indeed.
In case you are doing this with QEMU, could you please make sure you have the following change in your QEMU?: https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg02207.html
Or do you see this on hardware?
Thanks, Adam
On Tue Sep 30, 2025 at 11:14:57 +0800, yy18513676366 wrote:
Hi Adam,
Thank you very much for your reply — it really gave me some hope.
This issue is indeed difficult to reproduce reliably, which has been one of the main challenges during my debugging. So far, I have found that increasing the vtimer interrupt frequency, while keeping the traditional handling mode (i.e., without direct injection), makes the problem significantly easier to reproduce.
The relevant changes are as follows.

1. In this setup, the vtimer is adjusted from roughly one trigger per millisecond to approximately one trigger per microsecond, and the system remains stable and functional:

diff --git a/src/kern/arm/timer-arm-generic.cpp b/src/kern/arm/timer-arm-generic.cpp
index a040cf46..b4cbbceb 100644
--- a/src/kern/arm/timer-arm-generic.cpp
+++ b/src/kern/arm/timer-arm-generic.cpp
@@ -64,7 +64,8 @@ void Timer::init(Cpu_number cpu)
   if (cpu == Cpu_number::boot_cpu())
     {
       _freq0 = frequency();
-      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
+      //_interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
+      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000000;
       printf("ARM generic timer: freq=%ld interval=%ld cnt=%lld\n", _freq0, _interval, Gtimer::counter());
       assert(_freq0);

2. In addition, I selected the mode where interrupts are not directly injected:

diff --git a/src/Kconfig b/src/Kconfig
index 4391c996..55deeb1c 100644
--- a/src/Kconfig
+++ b/src/Kconfig
@@ -367,7 +367,7 @@ config IOMMU
 config IRQ_DIRECT_INJECT
 	bool "Support direct interrupt forwarding to guests"
 	depends on CPU_VIRT && HAS_IRQ_DIRECT_INJECT_OPTION
-	default y
+	default n
 	help
 	  Adds support in the kernel to allow the VMM to let Fiasco directly
 	  forward hardware interrupts to a guest. This enables just the [...]

At the moment, this is the only way I have found that can noticeably increase the reproduction rate. Once again, thank you for your valuable time and feedback!
Best regards, Stephen.yang
At 2025-09-29 00:11:41, "Adam Lackorzynski" adam@l4re.org wrote:
Hi,
On Wed Sep 17, 2025 at 13:57:43 +0800, yy18513676366 wrote:
When running a virtual machine, I encounter an assertion failure after the VM has been up for some time. The kernel crashes in src/kern/arm/thread-arm-hyp.cpp, specifically in the function vcpu_vgic_upcall(unsigned virq):

vcpu_vgic_upcall(unsigned virq)
{
  ......
  assert(state() & Thread_vcpu_user);
  ......
}
Based on source code inspection and preliminary debugging, the problem seems to be related to the management of the Thread_vcpu_user state.
1. Under normal circumstances, the vcpu_resume path (transitioning from the kernel back to the guest OS) updates the vCPU state to include Thread_vcpu_user. However, if an interrupt is delivered during this transition while the receiving side is not yet ready, the vCPU frequently returns to the kernel (via vcpu_return_to_kernel) and subsequently processes the interrupt through guest_irq in vcpu_entries. In this situation, the expected update of Thread_vcpu_user may not yet have taken place, which seems to result in the assert being triggered when a VGIC interrupt is involved.
2. A similar condition seems to occur in the vcpu_async_ipc path. At the end of IPC handling, this function explicitly clears the Thread_vcpu_user flag. If a VGIC interrupt is delivered during this phase, the absence of the expected Thread_vcpu_user state seems to lead to the same assertion failure.
I would like to confirm if the two points above are correct, and what steps I should take next to further debug this issue.
Thanks for reporting. At least the description sounds reasonable to me.
Do you have a good way of reliably reproducing this situation?
In addition, I have some assumptions I would like to confirm:
First, for IPC between non-vcpu threads, the L4 microkernel handles message delivery and scheduling (wake/schedule) directly, without requiring any forwarding through uvmm. Similarly, interrupts bound via the interrupt controller (ICU) to a non-vcpu thread or handler are also managed by the kernel and scheduler, and therefore do not necessarily involve uvmm.
IPCs between threads are handled by the microkernel; whether the receiver is a vcpu thread or a non-vcpu thread only makes a difference in how the message is delivered. A non-vcpu thread has to wait in IPC to receive it; in vcpu mode, the IPC is received by causing a vcpu event and bringing the vcpu to its entry. This also works without virtualization (note that vcpus also work without hw-virtualization). For interrupts it is the same: non-vcpu threads have to block in IPC to get an interrupt, while vcpu threads are brought to their entry.
Second, passthrough interrupts, when not delivered in direct-injection mode, are routed to uvmm for handling if they are bound to a vCPU. Likewise, services provided by uvmm (such as virq) are also bound to a vCPU and therefore require forwarding through uvmm.
Yes. Direct injection will only happen when the vcpu is running.
There seems to have been a similar question in the past, but it does not seem to have been resolved.
Re: Assertion failure error in kernel vgic interrupt processing - l4-hackers - OS Site
I wonder if my questions are related to that post, and if any solutions exist.
Thanks, we need to work on it. Reproducing this situation on our side would be very valuable.
Thanks, Adam
l4-hackers mailing list -- l4-hackers@os.inf.tu-dresden.de To unsubscribe send an email to l4-hackers-leave@os.inf.tu-dresden.de