At 2025-10-06 06:00:44, "Adam Lackorzynski" <adam@l4re.org> wrote:
>Hi Stephen,
>
>I doubt this is hardware, it seldom is. Could you share which Arm core
>it is, if you can?
>I'll try to reproduce it here; your hint about increasing the timer
>frequency is a good one. Knowing which core, or at least which category
>of core, would be helpful.
>
>
>BR, Adam
>
>On Thu Oct 02, 2025 at 13:02:50 +0800, yy18513676366 wrote:
>> Hi Adam,
>> 
>> 
>> I truly appreciate your reply. 
>> I actually encountered this issue on real hardware rather than QEMU. 
>> May I ask if this problem could be related to the hardware itself? I’m not quite sure I fully understand.
>> 
>> 
>> Best regards
>> Stephen.yang
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> At 2025-09-30 15:50:22, "Adam Lackorzynski" <adam@l4re.org> wrote:
>> >Hi Stephen,
>> >
>> >ok, thanks, that's tricky indeed.
>> >
>> >In case you are doing this with QEMU, could you please make sure you
>> >have the following change in your QEMU?: https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg02207.html
>> >
>> >Or do you see this on hardware?
>> >
>> >
>> >Thanks,
>> >Adam
>> >
>> >On Tue Sep 30, 2025 at 11:14:57 +0800, yy18513676366 wrote:
>> >> Hi Adam,
>> >> 
>> >> 
>> >> Thank you very much for your reply — it really gave me some hope.
>> >> 
>> >> 
>> >> This issue is indeed difficult to reproduce reliably, which has been one of the main challenges during my debugging.
>> >> So far, I have found that increasing the vtimer interrupt frequency, while keeping the traditional handling mode (i.e., without direct injection),
>> >> makes the problem significantly easier to reproduce. 
>> >> 
>> >> 
>> >> The relevant changes are as follows. 
>> >> 1. In this setup, the vtimer is adjusted from roughly one trigger per millisecond to approximately one trigger per microsecond,
>> >> and the system remains stable and functional:
>> >> 
>> >> 
>> >> diff --git a/src/kern/arm/timer-arm-generic.cpp b/src/kern/arm/timer-arm-generic.cpp
>> >> index a040cf46..b4cbbceb 100644
>> >> --- a/src/kern/arm/timer-arm-generic.cpp
>> >> +++ b/src/kern/arm/timer-arm-generic.cpp
>> >> @@ -64,7 +64,8 @@ void Timer::init(Cpu_number cpu)
>> >>    if (cpu == Cpu_number::boot_cpu())
>> >>      {
>> >>        _freq0 = frequency();
>> >> -      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
>> >> +      //_interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
>> >> +      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000000;
>> >>        printf("ARM generic timer: freq=%ld interval=%ld cnt=%lld\n",
>> >>               _freq0, _interval, Gtimer::counter());
>> >>        assert(_freq0);
>> >> 
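The effect of the changed divisor can be checked with a little arithmetic. The sketch below is not Fiasco code: `scheduler_granularity_us` is a hypothetical stand-in for Config::Scheduler_granularity, assumed here to be 1000 (microseconds), which matches the "roughly one trigger per millisecond" described above, and the 24 MHz counter frequency used in the comments is only an example value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Config::Scheduler_granularity, assumed to be
// given in microseconds (1000 us = 1 ms).
constexpr std::uint64_t scheduler_granularity_us = 1000;

// Counter ticks between two timer interrupts, using the same formula as the
// patched line: freq * granularity / divisor (integer division).
inline std::uint64_t interval_ticks(std::uint64_t freq_hz, std::uint64_t divisor)
{
  assert(divisor != 0);
  return freq_hz * scheduler_granularity_us / divisor;
}

// Converts an interval in counter ticks back to nanoseconds.
inline std::uint64_t period_ns(std::uint64_t ticks, std::uint64_t freq_hz)
{
  assert(freq_hz != 0);
  return ticks * 1000000000ULL / freq_hz;
}

// With an assumed 24 MHz counter:
//   interval_ticks(24000000, 1000000)    == 24000 ticks, i.e. 1000000 ns (1 ms)
//   interval_ticks(24000000, 1000000000) == 24 ticks,    i.e. 1000 ns (1 us)
```

Under these assumptions the patch shortens the interval by a factor of 1000. Because the division is an integer division, a counter frequency below 1 MHz would give an interval of 0 ticks with the new divisor.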
>> >> 
>> >> 2. In addition, I selected the mode where interrupts are not directly injected:
>> >> diff --git a/src/Kconfig b/src/Kconfig
>> >> index 4391c996..55deeb1c 100644
>> >> --- a/src/Kconfig
>> >> +++ b/src/Kconfig
>> >> @@ -367,7 +367,7 @@ config IOMMU
>> >>  config IRQ_DIRECT_INJECT
>> >>         bool "Support direct interrupt forwarding to guests"
>> >>         depends on CPU_VIRT && HAS_IRQ_DIRECT_INJECT_OPTION
>> >> -       default y
>> >> +      default n
>> >>         help
>> >>           Adds support in the kernel to allow the VMM to let Fiasco directly
>> >>           forward hardware interrupts to a guest. This enables just the
>> >> 
>> >> At the moment, this is the only way I have found that can noticeably increase the reproduction rate.
>> >> Once again, thank you for your valuable time and feedback!
>> >> 
>> >> Best regards,
>> >> Stephen.yang
>> >> 
>> >> 
>> >> 
>> >> 
>> >> At 2025-09-29 00:11:41, "Adam Lackorzynski" <adam@l4re.org> wrote:
>> >> >Hi,
>> >> >
>> >> >On Wed Sep 17, 2025 at 13:57:43 +0800, yy18513676366 wrote:
>> >> >> When running a virtual machine, I encounter an assertion failure after the VM
>> >> >> has been up for some time. The kernel crashes in src/kern/arm/
>> >> >> thread-arm-hyp.cpp, specifically in the function vcpu_vgic_upcall(unsigned
>> >> >> virq):
>> >> >> 
>> >> >> vcpu_vgic_upcall(unsigned virq)
>> >> >> {
>> >> >>    ......
>> >> >>    assert(state() & Thread_vcpu_user);
>> >> >>    ......
>> >> >> }
>> >> >> 
>> >> >> Based on source code inspection and preliminary debugging, the problem seems to
>> >> >> be related to the management of the Thread_vcpu_user state.
>> >> >> 
>> >> >>   1  Under normal circumstances, the vcpu_resume path (transitioning from the
>> >> >> kernel back to the guest OS) updates the vCPU state to include
>> >> >> Thread_vcpu_user. However, if an interrupt is delivered during this transition
>> >> >> while the receiving side is not yet ready, the vCPU frequently returns to the
>> >> >> kernel (via vcpu_return_to_kernel) and subsequently processes the interrupt
>> >> >> through guest_irq in vcpu_entries. In this situation, the expected update of
>> >> >> Thread_vcpu_user may not yet have taken place, which seems to result in the assert
>> >> >> being triggered when a VGIC interrupt is involved.
>> >> >> 
>> >> >>   2  A similar condition seems to occur in the vcpu_async_ipc path. At the end
>> >> >> of IPC handling, this function explicitly clears the Thread_vcpu_user flag. If
>> >> >> a VGIC interrupt is delivered during this phase, the absence of the expected
>> >> >> Thread_vcpu_user state seems to lead to the same assertion failure.
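The two suspected windows can be written down as a small state model. This is only a sketch of the hypothesis as described in points 1 and 2, not Fiasco code: `Vcpu_model`, `Irq_arrival`, `upcall_ok` and the flag value are hypothetical names and values chosen for illustration, and the model does not confirm that the kernel actually behaves this way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag value; the real Thread_vcpu_user constant is defined in Fiasco.
constexpr std::uint32_t Thread_vcpu_user = 1u << 0;

struct Vcpu_model
{
  std::uint32_t state = 0;

  // Models the end of the resume path: the flag is set once the vCPU is
  // considered to be running guest code.
  void finish_resume() { state |= Thread_vcpu_user; }

  // Models the end of vcpu_async_ipc as described in point 2: the flag is cleared.
  void finish_async_ipc() { state &= ~Thread_vcpu_user; }

  // Same condition as assert(state() & Thread_vcpu_user) in vcpu_vgic_upcall.
  bool upcall_precondition_holds() const { return (state & Thread_vcpu_user) != 0; }
};

// Points in time at which a VGIC interrupt might arrive.
enum class Irq_arrival { Before_resume_done, After_resume_done, After_async_ipc };

// Returns true if an upcall arriving at the given point would pass the check.
inline bool upcall_ok(Irq_arrival when)
{
  Vcpu_model v;
  switch (when)
    {
    case Irq_arrival::Before_resume_done:
      break; // point 1: the interrupt arrives before the flag is set
    case Irq_arrival::After_resume_done:
      v.finish_resume();
      break;
    case Irq_arrival::After_async_ipc:
      v.finish_resume();
      v.finish_async_ipc(); // point 2: the flag was cleared again
      break;
    }
  return v.upcall_precondition_holds();
}

// Usage: only the steady state passes; both suspected windows fail the check.
inline void self_check()
{
  assert(upcall_ok(Irq_arrival::After_resume_done));
  assert(!upcall_ok(Irq_arrival::Before_resume_done));
  assert(!upcall_ok(Irq_arrival::After_async_ipc));
}
```

If the hypothesis holds, any fix would have to either close these windows or make the upcall path tolerate a cleared flag; which of the two is appropriate is not something this model can answer.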
>> >> >> 
>> >> >> I would like to confirm if the two points above are correct, and what steps I
>> >> >> should take next to further debug this issue.
>> >> >
>> >> >Thanks for reporting. At least the description sounds reasonable to me.
>> >> >
>> >> >Do you have a good way of reliably reproducing this situation?
>> >> >
>> >> >> In addition, I have some assumptions I would like to confirm:
>> >> >> 
>> >> >> First, for IPC between non-vcpu threads, the L4 microkernel handles message
>> >> >> delivery and scheduling (wake/schedule) directly, without requiring any
>> >> >> forwarding through uvmm. Similarly, interrupts bound via the interrupt
>> >> >> controller (ICU) to a non-vcpu thread or handler are also managed by the kernel
>> >> >> and scheduler, and therefore do not necessarily involve uvmm.
>> >> >
>> >> >IPCs between threads are handled by the microkernel. Whether the
>> >> >receiver is a vcpu thread or a non-vcpu thread only affects how the
>> >> >IPC is delivered to it. For a non-vcpu thread the receiver has to wait
>> >> >in IPC to get it, in vcpu mode the IPC is received by causing a vcpu
>> >> >event and bringing the vcpu to its entry. This also works without
>> >> >virtualization (note that vcpus also work without hw-virtualization).
>> >> >For interrupts it is the same. For non-vcpu threads they have to block
>> >> >in IPC to get an interrupt, or for vcpu threads, they will be brought to
>> >> >their entry.
>> >> >
>> >> >> Second, passthrough interrupts, when not delivered in direct-injection mode,
>> >> >> are routed to uvmm for handling if they are bound to a vCPU. Likewise, services
>> >> >> provided by uvmm (such as virq) are also bound to a vCPU and therefore require
>> >> >> forwarding through uvmm.
>> >> >
>> >> >Yes. Direct injection will only happen when the vcpu is running.
>> >> >
>> >> >> There seems to have been a similar question in the past, but it does not seem
>> >> >> to have been resolved.
>> >> >> 
>> >> >> Re: Assertion failure error in kernel vgic interrupt processing - l4-hackers -
>> >> >> OS Site
>> >> >> 
>> >> >> I wonder if my questions are related to that post, and if any solutions exist.
>> >> >
>> >> >Thanks, we need to work on it. Reproducing this situation on our side
>> >> >would be very valuable.
>> >> >
>> >> >
>> >> >Thanks, Adam
>> >_______________________________________________
>> >l4-hackers mailing list -- l4-hackers@os.inf.tu-dresden.de
>> >To unsubscribe send an email to l4-hackers-leave@os.inf.tu-dresden.de