Merge tag 'pci-v6.9-changes' of git://git./linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate interrupt related code in irq.c (Ilpo Järvinen) - Reduce kernel size by replacing sysfs resource macros with functions (Ilpo Järvinen) - Reduce kernel size by compiling sysfs support only when CONFIG_SYSFS=y (Lukas Wunner) - Avoid using Extended Tags on 3ware-9650SE Root Port to work around an apparent hardware defect (Jörg Wedekind) Resource management: - Fix an MMIO mapping leak in pci_iounmap() (Philipp Stanner) - Move pci_iomap.c and other PCI-specific devres code to drivers/pci (Philipp Stanner) - Consolidate PCI devres code in devres.c (Philipp Stanner) Power management: - Avoid D3cold on Asus B1400 PCI-NVMe bridge, where firmware doesn't know how to return correctly to D0, and remove previous quirk that wasn't as specific (Daniel Drake) - Allow runtime PM when the driver enables it but doesn't need any runtime PM callbacks (Raag Jadav) - Drain runtime-idle callbacks before driver removal to avoid races between .remove() and .runtime_idle(), which caused intermittent page faults when the rtsx .runtime_idle() accessed registers that its .remove() had already unmapped (Rafael J. Wysocki) Virtualization: - Avoid Secondary Bus Reset on LSI FW643 so it can be assigned to VMs with VFIO, e.g., for professional audio software on many Apple machines, at the cost of leaking state between VMs (Edmund Raile) Error handling: - Print all logged TLP Prefixes, not just the first, after AER or DPC errors (Ilpo Järvinen) - Quirk the DPC PIO log size for Intel Raptor Lake Root Ports, which still don't advertise a legal size (Paul Menzel) - Ignore expected DPC Surprise Down errors on hot removal (Smita Koralahalli) - Block runtime suspend while handling AER errors to avoid races that prevent the device form being resumed from D3hot (Stanislaw Gruszka) Peer-to-peer DMA: - Use atomic XA allocation in RCU read section (Christophe JAILLET) ASPM: - Collect bits of ASPM-related code that we need even without CONFIG_PCIEASPM into aspm.c (David E. Box) - Save/restore L1 PM Substates config for suspend/resume (David E. Box) - Update save_save when ASPM config is changed, so a .slot_reset() during error recovery restores the changed config, not the .probe()-time config (Vidya Sagar) Endpoint framework: - Refactor and improve pci_epf_alloc_space() API (Niklas Cassel) - Clean up endpoint BAR descriptions (Niklas Cassel) - Fix ntb_register_device() name leak in error path (Yang Yingliang) - Return actual error code for pci_vntb_probe() failure (Yang Yingliang) Broadcom STB PCIe controller driver: - Fix MDIO write polling, which previously never waited for completion (Jonathan Bell) Cadence PCIe endpoint driver: - Clear the ARI "Next Function Number" of last function (Jasko-EXT Wojciech) Freescale i.MX6 PCIe controller driver: - Simplify by replacing switch statements with function pointers for different hardware variants (Frank Li) - Simplify by using clk_bulk*() API (Frank Li) - Remove redundant DT clock and reg/reg-name details (Frank Li) - Add i.MX95 DT and driver support for both Root Complex and Endpoint mode (Frank Li) Microsoft Hyper-V host bridge driver: - Reduce memory usage by limiting ring buffer size to 16KB instead of 4 pages (Michael Kelley) Qualcomm PCIe controller driver: - Add X1E80100 DT and driver support (Abel Vesa) - Add DT 'required-opps' for SoCs that require a minimum performance level (Johan Hovold) - Make DT 'msi-map-mask' optional, depending on how MSI interrupts are mapped (Johan Hovold) - Disable ASPM L0s for sc8280xp, sa8540p and sa8295p because the PHY configuration isn't tuned correctly for L0s (Johan Hovold) - Split dt-binding qcom,pcie.yaml into qcom,pcie-common.yaml and separate files for SA8775p, SC7280, SC8180X, SC8280XP, SM8150, SM8250, SM8350, SM8450, SM8550 for easier reviewing (Krzysztof Kozlowski) - Enable BDF to SID translation by disabling bypass mode (Manivannan Sadhasivam) - Add endpoint MHI support for Snapdragon SA8775P SoC (Mrinmay Sarkar) Synopsys DesignWare PCIe controller driver: - Allocate 64-bit MSI address if no 32-bit address is available (Ajay Agarwal) - Fix endpoint Resizable BAR to actually advertise the required 1MB size (Niklas Cassel) MicroSemi Switchtec management driver: - Release resources if the .probe() fails (Christophe JAILLET) Miscellaneous: - Make pcie_port_bus_type const (Ricardo B. Marliere)" * tag 'pci-v6.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (77 commits) PCI/ASPM: Update save_state when configuration changes PCI/ASPM: Disable L1 before configuring L1 Substates PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() PCI/ASPM: Save L1 PM Substates Capability for suspend/resume PCI: hv: Fix ring buffer size calculation PCI: dwc: endpoint: Fix advertised resizable BAR size PCI: cadence: Clear the ARI Capability Next Function Number of the last function PCI: dwc: Strengthen the MSI address allocation logic PCI: brcmstb: Fix broken brcm_pcie_mdio_write() polling PCI: qcom: Add X1E80100 PCIe support dt-bindings: PCI: qcom: Document the X1E80100 PCIe Controller PCI: qcom: Enable BDF to SID translation properly PCI/AER: Generalize TLP Header Log reading PCI/AER: Use explicit register size for PCI_ERR_CAP PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p dt-bindings: PCI: qcom: Do not require 'msi-map-mask' dt-bindings: PCI: qcom: Allow 'required-opps' PCI/AER: Block runtime suspend when handling errors PCI/ASPM: Move pci_save_ltr_state() to aspm.c PCI/ASPM: Always build aspm.c ...
Merge tag 'pm-6.9-rc1' of git://git./linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "From the functional perspective, the most significant change here is the addition of support for Energy Models that can be updated dynamically at run time. There is also the addition of LZ4 compression support for hibernation, the new preferred core support in amd-pstate, new platforms support in the Intel RAPL driver, new model-specific EPP handling in intel_pstate and more. Apart from that, the cpufreq default transition delay is reduced from 10 ms to 2 ms (along with some related adjustments), the system suspend statistics code undergoes a significant rework and there is a usual bunch of fixes and code cleanups all over. Specifics: - Allow the Energy Model to be updated dynamically (Lukasz Luba) - Add support for LZ4 compression algorithm to the hibernation image creation and loading code (Nikhil V) - Fix and clean up system suspend statistics collection (Rafael Wysocki) - Simplify device suspend and resume handling in the power management core code (Rafael Wysocki) - Fix PCI hibernation support description (Yiwei Lin) - Make hibernation take set_memory_ro() return values into account as appropriate (Christophe Leroy) - Set mem_sleep_current during kernel command line setup to avoid an ordering issue with handling it (Maulik Shah) - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a driver's system suspend callback (Qingliang Li) - Simplify pm_runtime_get_if_active() usage and add a replacement for pm_runtime_put_autosuspend() (Sakari Ailus) - Add a tracepoint for runtime_status changes tracking (Vilas Bhat) - Fix section title markdown in the runtime PM documentation (Yiwei Lin) - Enable preferred core support in the amd-pstate cpufreq driver (Meng Li) - Fix min_perf assignment in amd_pstate_adjust_perf() and make the min/max limit perf values in amd-pstate always stay within the (highest perf, lowest perf) range (Tor Vic, Meng Li) - Allow intel_pstate to assign model-specific values to strings used in the EPP sysfs interface and make it do so on Meteor Lake (Srinivas Pandruvada) - Drop long-unused cpudata::prev_cummulative_iowait from the intel_pstate cpufreq driver (Jiri Slaby) - Prevent scaling_cur_freq from exceeding scaling_max_freq when the latter is an inefficient frequency (Shivnandan Kumar) - Change default transition delay in cpufreq to 2ms (Qais Yousef) - Remove references to 10ms minimum sampling rate from comments in the cpufreq code (Pierre Gondois) - Honour transition_latency over transition_delay_us in cpufreq (Qais Yousef) - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar) - General enhancements / cleanups to ARM cpufreq drivers (tianyu2, Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia Belova) - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan) - Make the SCMI cpufreq driver get a transition delay value from firmware (Pierre Gondois) - Prevent the haltpoll cpuidle governor from shrinking guest poll_limit_ns below grow_start (Parshuram Sangle) - Avoid potential overflow in integer multiplication when computing cpuidle state parameters (C Cheng) - Adjust MWAIT hint target C-state computation in the ACPI cpuidle driver and in intel_idle to return a correct value for C0 (He Rongguang) - Address multiple issues in the TPMI RAPL driver and add support for new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui) - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel Lezcano) - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li) - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth Norway Ananda) - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil) - Fix a couple of warnings in the OPP core code related to W=1 builds (Viresh Kumar) - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar) - Extend dev_pm_opp_data with turbo support (Sibi Sankar) - dt-bindings: drop maxItems from inner items (David Heidelberg)" * tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits) dt-bindings: opp: drop maxItems from inner items OPP: debugfs: Fix warning around icc_get_name() OPP: debugfs: Fix warning with W=1 builds cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h OPP: Extend dev_pm_opp_data with turbo support Fix cpupower-frequency-info.1 man page typo cpufreq: scmi: Set transition_delay_us firmware: arm_scmi: Populate fast channel rate_limit firmware: arm_scmi: Populate perf commands rate_limit cpuidle: ACPI/intel: fix MWAIT hint target C-state computation PM: sleep: wakeirq: fix wake irq warning in system suspend powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function cpufreq: Don't unregister cpufreq cooling on CPU hotplug PM: suspend: Set mem_sleep_current during kernel command line setup cpufreq: Honour transition_latency over transition_delay_us cpufreq: Limit resolving a frequency to policy min/max Documentation: PM: Fix runtime_pm.rst markdown syntax cpufreq: amd-pstate: adjust min/max limit perf cpufreq: Remove references to 10ms min sampling rate cpufreq: intel_pstate: Update default EPPs for Meteor Lake ...
Merge branch 'pci/enumeration' - Collect interrupt-related code in irq.c (Ilpo Järvinen) - Mark 3ware-9650SE Root Port Extended Tags as broken (Jörg Wedekind) * pci/enumeration: PCI: Mark 3ware-9650SE Root Port Extended Tags as broken PCI: Place interrupt related code into irq.c # Conflicts: # drivers/pci/Makefile
Merge branch 'pci/devres' - Unmap MMIO mappings in pci_iounmap() to avoid a leak when ARCH_HAS_GENERIC_IOPORT_MAP is defined (Philipp Stanner) - Move pci_iomap.c to drivers/pci/ since it's all PCI-related (Philipp Stanner) - Move other PCI-related devres code from lib/devres.c to drivers/pci/ (Philipp Stanner) - Move other devres code from pci.c to devres.c (Philipp Stanner) * pci/devres: PCI: Move devres code from pci.c to devres.c PCI: Move PCI-specific devres code to drivers/pci/ PCI: Move pci_iomap.c to drivers/pci/ pci_iounmap(): Fix MMIO mapping leak
Merge branch 'pci/aspm' - Collect ASPM-related code into aspm.c (David E. Box) - Save and restore ASPM L1 PM Substates configuration so these states continue working after suspend/resume (David E. Box) - Move the ASPM L1.2-related LTR save/restore next to the ASPM save/restore (David E. Box) - Move the required L1 disable before L1 Substate configuration into pci_restore_aspm_l1ss_state() (Bjorn Helgaas) - Update save_save when ASPM config is changed, so a .slot_reset() during error recovery restores the changed config, not the .probe()-time config (Vidya Sagar) * pci/aspm: PCI/ASPM: Update save_state when configuration changes PCI/ASPM: Disable L1 before configuring L1 Substates PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() PCI/ASPM: Save L1 PM Substates Capability for suspend/resume PCI/ASPM: Move pci_save_ltr_state() to aspm.c PCI/ASPM: Always build aspm.c PCI/ASPM: Move pci_configure_ltr() to aspm.c
PCI/ASPM: Disable L1 before configuring L1 Substates Per PCIe r6.1, sec 5.5.4, L1 must be disabled while setting ASPM L1 PM Substates enable bits. Previously this was enforced by clearing PCI_EXP_LNKCTL_ASPMC before calling pci_restore_aspm_l1ss_state(). Move the L1 (and L0s, although that doesn't seem required) disable into pci_restore_aspm_l1ss_state() itself so it's closer to the code that depends on it. Link: https://lore.kernel.org/r/20240223213733.GA115410@bhelgaas Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() ASPM state is saved and restored from pci_save/restore_pcie_state(). Since the LTR Capability is linked with ASPM, move the LTR save and restore calls there as well. No functional change intended. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20240128233212.1139663-6-david.e.box@linux.intel.com Link: https://lore.kernel.org/r/20240223205851.114931-6-helgaas@kernel.org Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") restored the L1 PM Substates Capability after resume, which reduced power consumption by making the ASPM L1.x states work after resume. a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"") reverted 4ff116d0d5fd because resume failed on some systems, so power consumption after resume increased again. a7152be79b62 mentioned that we restore L1 PM substate configuration even though ASPM L1 may already be enabled. This is due the fact that the pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state(). Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec 5.5.4 more closely by: 1) Do not restore ASPM configuration in pci_restore_pcie_state() but do that after PCIe capability is restored in pci_restore_aspm_state() following PCIe r6.1, sec 5.5.4. 2) If BIOS reenables L1SS, particularly L1.2, we need to clear the enables in the right order, downstream before upstream. Defer restoring the L1SS config until we are at the downstream component. Then update the config for both ends of the link in the prescribed order. 3) Program ASPM L1 PM substate configuration before L1 enables. 4) Program ASPM L1 PM substate enables last, after rest of the fields in the capability are programmed. [bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores in pci_restore_pcie_state()] Link: https://lore.kernel.org/r/20240128233212.1139663-3-david.e.box@linux.intel.com Link: https://lore.kernel.org/r/20240128233212.1139663-4-david.e.box@linux.intel.com Link: https://lore.kernel.org/r/20240223205851.114931-5-helgaas@kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 Co-developed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Co-developed-by: David E. Box <david.e.box@linux.intel.com> Reported-by: Koba Ko <koba.ko@canonical.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Tasev Nikola <tasev.stefanoska@skynet.be> # Asus UX305FA Cc: Mark Enriquez <enriquezmark36@gmail.com> Cc: Thomas Witt <kernel@witt.link> Cc: Werner Sembach <wse@tuxedocomputers.com> Cc: Vidya Sagar <vidyas@nvidia.com>
mm: Introduce vmap_page_range() to map pages in PCI address space ioremap_page_range() should be used for ranges within vmalloc range only. The vmalloc ranges are allocated by get_vm_area(). PCI has "resource" allocator that manages PCI_IOBASE, IO_SPACE_LIMIT address range, hence introduce vmap_page_range() to be used exclusively to map pages in PCI address space. Fixes: 3e49a866c9dc ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.") Reported-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/bpf/CANiq72ka4rir+RTN2FQoT=Vvprp_Ao-CvoYEkSNqtSY+RZj+AA@mail.gmail.com
Merge branch 'pm-runtime' Merge changes related to the runtime power management of devices for 6.9-rc1: - Simplify pm_runtime_get_if_active() usage and add a replacement for pm_runtime_put_autosuspend() (Sakari Ailus). - Add a tracepoint for runtime_status changes tracking (Vilas Bhat). - Fix section title markdown in the runtime PM documentation (Yiwei Lin). * pm-runtime: Documentation: PM: Fix runtime_pm.rst markdown syntax PM: runtime: add tracepoint for runtime_status changes PM: runtime: Add pm_runtime_put_autosuspend() replacement PM: runtime: Simplify pm_runtime_get_if_active() usage
PCI/AER: Generalize TLP Header Log reading Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs 7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after AER as the struct aer_header_log_regs. Also, not all places that handle TLP Header Log use the struct and the struct members are named individually. Generalize the struct name and members, and use it consistently where TLP Header Log is being handled so that a pcie_read_tlp_log() helper can be easily added. Link: https://lore.kernel.org/r/20240206135717.8565-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: drop ixgbe changes for now, tidy whitespace] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI/ASPM: Move pci_save_ltr_state() to aspm.c Even when CONFIG_PCIEASPM is not set, we save and restore the LTR Capability so that if ASPM L1.2 and LTR were configured by the platform, ASPM L1.2 will still work after suspend/resume, when that platform configuration may be lost. See dbbfadf23190 ("PCI/ASPM: Save LTR Capability for suspend/resume"). Since ASPM L1.2 depends on the LTR Capability, move the save/restore code to the part of aspm.c that is always compiled regardless of CONFIG_PCIEASPM. No functional change intended. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20240128233212.1139663-5-david.e.box@linux.intel.com [bhelgaas: commit log, reorder to make this a pure move] Link: https://lore.kernel.org/r/20240223205851.114931-4-helgaas@kernel.org Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI/ASPM: Move pci_configure_ltr() to aspm.c The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2 state and is only configured when CONFIG_PCIEASPM is set. Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since they only build when CONFIG_PCIEASPM is set. No functional change intended. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20240128233212.1139663-2-david.e.box@linux.intel.com [bhelgaas: commit log, split build change from function moves] Link: https://lore.kernel.org/r/20240223205851.114931-2-helgaas@kernel.org Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Merge tag 'pci-v6.8-fixes-3' of git://git./linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Keep bridges in D0 if we need to poll downstream devices for PME to resolve a v6.6 regression where we failed to enumerate devices below bridges put in D3hot by runtime PM, e.g., NVMe drives connected via Thunderbolt or USB4 docks (Alex Williamson) - Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer * tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: MAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer PCI: Fix active state requirement in PME polling
PCI: Move devres code from pci.c to devres.c The file pci.c is very large and contains a number of devres functions. These functions should now reside in devres.c. Move as much devres-specific code from pci.c to devres.c as possible. There are a few callers left in pci.c that do devres operations. These should be ported in the future. Add corresponding TODOs. The reason they are not moved right now in this commit is that PCI's devres currently implements a sort of "hybrid-mode": pci_request_region(), for instance, does not have a corresponding pcim_ equivalent, yet. Instead, the function can be made managed by previously calling pcim_enable_device() (instead of pci_enable_device()). This makes it unreasonable to move pci_request_region() to devres.c. Moving the functions would require changes to PCI's API and is, therefore, left for future work. In summary, this commit serves as a preparation step for a following patch series that will cleanly separate the PCI's managed and unmanaged API. Link: https://lore.kernel.org/r/20240131090023.12331-5-pstanner@redhat.com Suggested-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PM: runtime: Simplify pm_runtime_get_if_active() usage There are two ways to opportunistically increment a device's runtime PM usage count, calling either pm_runtime_get_if_active() or pm_runtime_get_if_in_use(). The former has an argument to tell whether to ignore the usage count or not, and the latter simply calls the former with ign_usage_count set to false. The other users that want to ignore the usage_count will have to explicitly set that argument to true which is a bit cumbersome. To make this function more practical to use, remove the ign_usage_count argument from the function. The main implementation is in a static function called pm_runtime_get_conditional() and implementations of pm_runtime_get_if_active() and pm_runtime_get_if_in_use() are moved to runtime.c. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Takashi Iwai <tiwai@suse.de> # sound/ Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> # drivers/accel/ivpu/ Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # drivers/gpu/drm/i915/ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PCI: Fix active state requirement in PME polling The commit noted in fixes added a bogus requirement that runtime PM managed devices need to be in the RPM_ACTIVE state for PME polling. In fact, only devices in low power states should be polled. However there's still a requirement that the device config space must be accessible, which has implications for both the current state of the polled device and the parent bridge, when present. It's not sufficient to assume the bridge remains in D0 and cases have been observed where the bridge passes the D0 test, but the PM state indicates RPM_SUSPENDING and config space of the polled device becomes inaccessible during pci_pme_wakeup(). Therefore, since the bridge is already effectively required to be in the RPM_ACTIVE state, formalize this in the code and elevate the PM usage count to maintain the state while polling the subordinate device. This resolves a regression reported in the bugzilla below where a Thunderbolt/USB4 hierarchy fails to scan for an attached NVMe endpoint downstream of a bridge in a D3hot power state. Link: https://lore.kernel.org/r/20240123185548.1040096-1-alex.williamson@redhat.com Fixes: d3fcd7360338 ("PCI: Fix runtime PM race with PME polling") Reported-by: Sanath S <sanath.s@amd.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218360 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Sanath S <sanath.s@amd.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
PCI/ASPM: Fix deadlock when enabling ASPM A last minute revert in 6.7-final introduced a potential deadlock when enabling ASPM during probe of Qualcomm PCIe controllers as reported by lockdep: ============================================ WARNING: possible recursive locking detected 6.7.0 #40 Not tainted -------------------------------------------- kworker/u16:5/90 is trying to acquire lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc but task is already holding lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(pci_bus_sem); lock(pci_bus_sem); *** DEADLOCK *** Call trace: print_deadlock_bug+0x25c/0x348 __lock_acquire+0x10a4/0x2064 lock_acquire+0x1e8/0x318 down_read+0x60/0x184 pcie_aspm_pm_state_change+0x58/0xdc pci_set_full_power_state+0xa8/0x114 pci_set_power_state+0xc4/0x120 qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom] pci_walk_bus+0x64/0xbc qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom] The deadlock can easily be reproduced on machines like the Lenovo ThinkPad X13s by adding a delay to increase the race window during asynchronous probe where another thread can take a write lock. Add a new pci_set_power_state_locked() and associated helper functions that can be called with the PCI bus semaphore held to avoid taking the read lock twice. Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> # 6.7
PCI: Place interrupt related code into irq.c Interrupt related code is spread into irq.c, pci.c, and setup-irq.c. Group them into pre-existing irq.c. Link: https://lore.kernel.org/r/20240129113655.3368-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Merge tag 'cxl-for-6.8' of git://git./linux/kernel/git/cxl/cxl Pull CXL (Compute Express Link) updates from Dan Williams: "The bulk of this update is support for enumerating the performance capabilities of CXL memory targets and connecting that to a platform CXL memory QoS class. Some follow-on work remains to hook up this data into core-mm policy, but that is saved for v6.9. The next significant update is unifying how CXL event records (things like background scrub errors) are processed between so called "firmware first" and native error record retrieval. The CXL driver handler that processes the record retrieved from the device mailbox is now the handler for that same record format coming from an EFI/ACPI notification source. This also contains miscellaneous feature updates, like Get Timestamp, and other fixups. Summary: - Add support for parsing the Coherent Device Attribute Table (CDAT) - Add support for calculating a platform CXL QoS class from CDAT data - Unify the tracing of EFI CXL Events with native CXL Events. - Add Get Timestamp support - Miscellaneous cleanups and fixups" * tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (41 commits) cxl/core: use sysfs_emit() for attr's _show() cxl/pci: Register for and process CPER events PCI: Introduce cleanup helpers for device reference counts and locks acpi/ghes: Process CXL Component Events cxl/events: Create a CXL event union cxl/events: Separate UUID from event structures cxl/events: Remove passing a UUID to known event traces cxl/events: Create common event UUID defines cxl/events: Promote CXL event structures to a core header cxl: Refactor to use __free() for cxl_root allocation in cxl_endpoint_port_probe() cxl: Refactor to use __free() for cxl_root allocation in cxl_find_nvdimm_bridge() cxl: Fix device reference leak in cxl_port_perf_data_calculate() cxl: Convert find_cxl_root() to return a 'struct cxl_root *' cxl: Introduce put_cxl_root() helper cxl/port: Fix missing target list lock cxl/port: Fix decoder initialization when nr_targets > interleave_ways cxl/region: fix x9 interleave typo cxl/trace: Pass UUID explicitly to event traces cxl/region: use %pap format to print resource_size_t cxl/region: Add dev_dbg() detail on failure to allocate HPA space ...