KVM: x86: Clear bit12 of ICR after APIC-write VM-exit
author Tao Su <tao1.su@linux.intel.com>
Thu, 14 Sep 2023 05:55:04 +0000 (13:55 +0800)
committer Sean Christopherson <seanjc@google.com>
Thu, 28 Sep 2023 17:42:16 +0000 (10:42 -0700)

When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.

Under the x2APIC section, regarding ICR, the SDM says:

  It remains readable only to aid in debugging; however, software should
  not assume the value returned by reading the ICR is the last written
  value.

I.e. the guest is allowed to set bit 12.  However, the SDM also gives KVM
free rein to do whatever it wants with the bit, so long as KVM's behavior
doesn't confuse userspace or break KVM's ABI.

Clear bit 12 so that it reads back as '0'. This approach is safer than
"do nothing" and is consistent with the case where IPI virtualization is
disabled or not supported, i.e.,

  handle_fastpath_set_x2apic_icr_irqoff() -> kvm_x2apic_icr_write()
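
As a rough illustration of the masking described above, the following
standalone userspace sketch clears bit 12 from a 64-bit x2APIC ICR value and
splits the result into the two 32-bit halves that an IPI send routine would
consume. It is not the kernel code: ICR_BUSY mirrors the value of
APIC_ICR_BUSY (bit 12), and the helper names icr_clear_busy(), icr_low() and
icr_high() are hypothetical, introduced only for this example.

```c
#include <stdint.h>

/* Bit 12 of the ICR; same bit position as the kernel's APIC_ICR_BUSY. */
#define ICR_BUSY (UINT64_C(1) << 12)

/* Hypothetical helper: drop the busy bit so the ICR reads back with bit 12 = 0. */
static inline uint64_t icr_clear_busy(uint64_t icr)
{
	return icr & ~ICR_BUSY;
}

/* Hypothetical helper: low 32 bits (vector, delivery mode, and so on). */
static inline uint32_t icr_low(uint64_t icr)
{
	return (uint32_t)icr;
}

/* Hypothetical helper: high 32 bits (destination in x2APIC mode). */
static inline uint32_t icr_high(uint64_t icr)
{
	return (uint32_t)(icr >> 32);
}
```

With a guest-written value of 0x3000010fd, for instance, icr_clear_busy()
yields 0x3000000fd, so the stored ICR and the value used to send the IPI agree
and bit 12 is never observed as set.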

Opportunistically replace the TODO with a comment calling out that eating
the write is likely faster than a conditional branch around the busy bit.

Link: https://lore.kernel.org/all/ZPj6iF0Q7iynn62p@google.com/
Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
Cc: stable@vger.kernel.org
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20230914055504.151365-1-tao1.su@linux.intel.com
[sean: tweak changelog, replace TODO with comment, drop local "val"]
Signed-off-by: Sean Christopherson <seanjc@google.com>
arch/x86/kvm/lapic.c

index c6a5a59450210a809a27dd010fcc89f5239420dc..953700e68cf6e552d5050341408bb1399b6a5fe7 100644
@@ -2444,22 +2444,22 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 {
        struct kvm_lapic *apic = vcpu->arch.apic;
-       u64 val;
 
        /*
-        * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
-        * xAPIC, ICR writes need to go down the common (slightly slower) path
-        * to get the upper half from ICR2.
+        * ICR is a single 64-bit register when x2APIC is enabled, all other
+        * registers hold 32-bit values.  For legacy xAPIC, ICR writes need to
+        * go down the common path to get the upper half from ICR2.
+        *
+        * Note, using the write helpers may incur an unnecessary write to the
+        * virtual APIC state, but KVM needs to conditionally modify the value
+        * in certain cases, e.g. to clear the ICR busy bit.  The cost of extra
+        * conditional branches is likely a wash relative to the cost of the
+        * maybe-unnecessary write, and both are in the noise anyways.
         */
-       if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
-               val = kvm_lapic_get_reg64(apic, APIC_ICR);
-               kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
-               trace_kvm_apic_write(APIC_ICR, val);
-       } else {
-               /* TODO: optimize to just emulate side effect w/o one more write */
-               val = kvm_lapic_get_reg(apic, offset);
-               kvm_lapic_reg_write(apic, offset, (u32)val);
-       }
+       if (apic_x2apic_mode(apic) && offset == APIC_ICR)
+               kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR));
+       else
+               kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
 }
 EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);