summaryrefslogtreecommitdiff
path: root/drivers/pci
AgeCommit message (Collapse)Author
2022-05-11PCI: mediatek-gen3: Assert resets to ensure expected init stateAngeloGioacchino Del Regno
The controller may have been left out of reset by the bootloader, in which case, before the powerup sequence, the controller will be found preconfigured with values that were set before booting the kernel: this produces a controller failure, with the result of a failure during the mtk_pcie_startup_port() sequence as the PCIe link never gets up. To ensure that we get a clean start in an expected state, assert both the PHY and MAC resets before executing the controller power-up sequence. Link: https://lore.kernel.org/r/20220404144858.92390-1-angelogioacchino.delregno@collabora.com Fixes: d3bf75b579b9 ("PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-05-11PCI: microchip: Add missing chained_irq_enter()/exit() callsConor Dooley
Two of the chained IRQ handlers miss their chained_irq_enter()/chained_irq_exit() calls, so add them in to avoid potentially lost interrupts. Reported by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/linux-pci/87h76b8nxc.wl-maz@kernel.org Link: https://lore.kernel.org/r/20220511095504.2273799-1-conor.dooley@microchip.com Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-05-11PCI: imx6: Fix PERST# start-up sequenceFrancesco Dolcini
According to the PCIe standard the PERST# signal (reset-gpio in fsl,imx* compatible dts) should be kept asserted for at least 100 usec before the PCIe refclock is stable, should be kept asserted for at least 100 msec after the power rails are stable and the host should wait at least 100 msec after it is de-asserted before accessing the configuration space of any attached device. From PCIe CEM r2.0, sec 2.6.2 T-PVPERL: Power stable to PERST# inactive - 100 msec T-PERST-CLK: REFCLK stable before PERST# inactive - 100 usec. From PCIe r5.0, sec 6.6.1 With a Downstream Port that does not support Link speeds greater than 5.0 GT/s, software must wait a minimum of 100 ms before sending a Configuration Request to the device immediately below that Port. Failure to do so could prevent PCIe devices to be working correctly, and this was experienced with real devices. Move reset assert to imx6_pcie_assert_core_reset(), this way we ensure that PERST# is asserted before enabling any clock, move de-assert to the end of imx6_pcie_deassert_core_reset() after the clock is enabled and deemed stable and add a new delay of 100 msec just afterward. Link: https://lore.kernel.org/all/20220211152550.286821-1-francesco.dolcini@toradex.com Link: https://lore.kernel.org/r/20220404081509.94356-1-francesco.dolcini@toradex.com Fixes: bb38919ec56e ("PCI: imx6: Add support for i.MX6 PCIe controller") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
2022-05-11PCI: qcom-ep: Move enable/disable resources code to common functionsDmitry Baryshkov
Remove code duplication by moving the code related to enabling/disabling the resources (PHY, CLK, Reset) to common functions so that they can be called from multiple places. [mani: renamed the functions and reworded the commit message] Link: https://lore.kernel.org/r/20220502104938.97033-1-manivannan.sadhasivam@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-05-05PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq()Rafael J. Wysocki
Calling pci_set_power_state() to put the given device into D0 in pci_pm_thaw_noirq() may cause it to restore the device's BARs, which is redundant before calling pci_restore_state(), so replace it with a direct pci_power_up() call followed by pci_update_current_state() if it returns a nonzero value, in analogy with pci_pm_default_resume_early(). Avoid code duplication by introducing a wrapper function to contain the repeating pattern and calling it in both places. Link: https://lore.kernel.org/r/3639079.MHq7AAxBmi@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Rearrange pci_set_power_state()Rafael J. Wysocki
The part of pci_set_power_state() related to transitions into low-power states is unnecessary convoluted, so clearly divide it into the D3cold special case and the general case covering all of the other states. Also fix a potential issue with calling pci_bus_set_current_state() to set the current state of all devices on the subordinate bus to D3cold without checking if the power state of the parent bridge has really changed to D3cold. Link: https://lore.kernel.org/r/2139440.Mh6RI2rZIc@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-05PCI/PM: Clean up pci_set_low_power_state()Rafael J. Wysocki
Make the following assorted non-essential changes in pci_set_low_power_state(): 1. Drop two redundant checks from it (the caller takes care of these conditions). 2. Change the log level of a messages printed by it to "debug", because it only indicates a programming mistake. Link: https://lore.kernel.org/r/2539071.Lt9SDvczpP@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-05PCI/PM: Do not restore BARs if device is not in D0Rafael J. Wysocki
Do not attempt to restore the device's BARs in pci_set_full_power_state() if the actual current power state of the device is not D0. Link: https://lore.kernel.org/r/1849718.CQOukoFCf9@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Split pci_power_up()Rafael J. Wysocki
One of the two callers of pci_power_up() invokes pci_update_current_state() and pci_restore_state() right after calling it, in which case running the part of it happening after the mandatory transition delays is redundant, so move that part out of it into a new function called pci_set_full_power_state() that will be invoked from pci_set_power_state() which is the other caller of pci_power_up(). Link: https://lore.kernel.org/r/1942150.usQuhbGJ8B@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Write 0 to PMCSR in pci_power_up() in all casesRafael J. Wysocki
Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in order to put it into D0 regardless of the power state returned by the previous read from that register which should not matter. Link: https://lore.kernel.org/r/5748066.MhkbZ0Pkbq@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Do not call pci_update_current_state() from pci_power_up()Rafael J. Wysocki
Notice that calling pci_update_current_state() from pci_power_up() is redundant and may be harmful in some cases. First, if the device is in a low-power state before pci_power_up() gets called for it and platform_pci_set_power_state() successfully changes its power state to D0, pci_update_current_state() will update current_state to reflect that and pci_power_up() will return success right away without restoring the device's BARs or reconfiguring ASPM which may be necessary. This is arguably incorrect and definitely inconsistent with the case when platform_pci_set_power_state() returns an error (for example, because the device is not power-manageable by the platform firmware). Second, current_state should not be overwritten until the decision whether or not to restore the device's BARs is made, because that decision generally depends on its value. Again, calling pci_update_current_state() in pci_power_up() is not consistent with this observation. Next, pci_power_up() attempts to read from the device's PCI_PM_CTRL register regardless of the current_state value unless it is PCI_D0, including the case when pci_update_current_state() sets current_state to PCI_D3cold to indicate that the device is not accessible. If the register read is not successful, current_state will be set to PCI_D3cold anyway, so that pci_update_current_state() action is redundant. Further, if pci_update_current_state() reads the device's PCI_PM_CTRL register, pci_power_up() will repeat that read going forward and it is not necessary to update current_state in the meantime. Finally, if pm_cap is not set (in which case the PCI_PM_CTRL register is not present), the power state of the device should be determined with the help of the platform firmware or set to D0 if that's not possible and pci_update_current_state() does not do that. Accordingly, rearrange pci_power_up() so as to address the above shortcomings. Link: https://lore.kernel.org/r/3695055.kQq0lBPeGt@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Unfold pci_platform_power_transition() in pci_power_up()Rafael J. Wysocki
Some actions carried out by pci_platform_power_transition(() in pci_power_up() are redundant, but before dealing with them, make pci_power_up() call the pci_platform_power_transition() code directly (and avoid a redundant check when pm_cap is unset while at it). Link: https://lore.kernel.org/r/1922486.PYKUYFuaPT@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Set current_state to D3cold if the device is not accessibleRafael J. Wysocki
Make pci_power_up() and pci_set_low_power_state() change current_state to PCI_D3cold when the device is not accessible along the lines of pci_update_current_state(). Link: https://lore.kernel.org/r/10104376.nUPlyArG6x@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Relocate pci_set_low_power_state()Rafael J. Wysocki
Because pci_set_power_state() is the only caller of pci_set_low_power_state(), put the latter next to the former. No functional impact. Link: https://lore.kernel.org/r/3202976.44csPzL39Z@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Split pci_raw_set_power_state()Rafael J. Wysocki
The transitions from low-power states to D0 and the other way around are unnecessarily tangled in pci_raw_set_power_state() which makes it rather hard to follow. Moreover, the only caller of pci_raw_set_power_state() passing PCI_D0 as its state argument is pci_power_up(), so the code carrying out transitions into D0 can be put directly into that function. Accordingly, move the code handling transitions from low-power states into D0 directly into pci_power_up() and rename the remaining part of pci_raw_set_power_state() to pci_set_low_power_state(), because it only handles transitions into low-power state now. While at it, fix up some white space, update some comments and modify messages printed by pci_power_up() and pci_set_low_power_state() to be less confusing (which is the only expected functional impact of this change). Link: https://lore.kernel.org/r/13038676.uLZWGnKmhe@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Rearrange pci_update_current_state()Rafael J. Wysocki
Save one config space access in pci_update_current_state() by testing the retrieved PCI_PM_CTRL register value against PCI_POSSIBLE_ERROR() instead of invoking pci_device_is_present() separately. While at it, drop a pair of unnecessary parens. No expected functional impact. Link: https://lore.kernel.org/r/1917095.PYKUYFuaPT@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-05PCI/PM: Drop the runtime_d3cold device flagRafael J. Wysocki
The runtime_d3cold flag is not needed any more, so drop it. Link: https://lore.kernel.org/r/8077784.T7Z3S40VBb@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-05PCI/PM: Resume subordinate bus in bus type callbacksRafael J. Wysocki
Calling pci_resume_bus() on the secondary bus from pci_power_up() as it is done now is questionable, because it depends on the mandatory bridge power-up delays that are only covered by the PCI bus type PM callbacks. For this reason, move the subordinate bus resume to those callbacks too and use the observation that if a bridge is being powered-up during resume from system-wide suspend, it may be still desirable to runtime-resume its subordinate bus after completing the system-wide transition (in case the resume of the devices on that bus is skipped during it). Link: https://lore.kernel.org/r/3190097.aeNJFYEL58@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-05PCI/PM: Power up all devices during runtime resumeRafael J. Wysocki
Currently, endpoint devices may not be powered up entirely during runtime resume that follows a D3hot -> D0 transition of the parent bridge. Namely, even if the power state of an endpoint device, as indicated by its PCI_PM_CTRL register, is D0 after powering up its parent bridge, it may be still necessary to bring its ACPI companion into D0 and that should be done before accessing it. However, the current code assumes that reading the PCI_PM_CTRL register is sufficient to establish the endpoint device's power state, which may lead to problems. Address that by forcing a power-up of all PCI devices, including the platform firmware part of it, during runtime resume. Link: https://lore.kernel.org/linux-pm/11967527.O9o76ZdvQC@kreacher Fixes: 5775b843a619 ("PCI: Restore config space on runtime resume despite being unbound") Link: https://lore.kernel.org/r/2652115.mvXUDI8C0e@kreacher Reported-by: Abhishek Sahu <abhsahu@nvidia.com> Tested-by: Abhishek Sahu <abhsahu@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-05PCI/PM: Define pci_restore_standard_config() only for CONFIG_PM_SLEEPKrzysztof Kozlowski
pci_restore_standard_config() was defined under CONFIG_PM but called only by pci_pm_resume() (defined under CONFIG_SUSPEND) and pci_pm_restore() (defined under CONFIG_HIBERNATE_CALLBACKS). A configuration with only CONFIG_PM leads to a warning: drivers/pci/pci-driver.c:533:12: error: ‘pci_restore_standard_config’ defined but not used [-Werror=unused-function] CONFIG_PM_SLEEP depends on CONFIG_SUSPEND and CONFIG_HIBERNATE_CALLBACKS, so define pci_restore_standard_config() under that instead. Link: https://lore.kernel.org/r/20220420141135.444820-1-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-05-03PCI: qcom: Remove ddrss_sf_tbu clock from SC8180XBjorn Andersson
The Qualcomm SC8180X platform was piggy-backing on the SM8250 qcom_pcie_cfg, but SC8180X doesn't have the ddrss_sf_tbu clock, so it now fails to probe due to the missing clock. Give SC8180X its own qcom_pcie_cfg, without the ddrss_sf_tbu flag set. Fixes: 0614f98bbb9f ("PCI: qcom: Add ddrss_sf_tbu flag") Link: https://lore.kernel.org/r/20220331013415.592748-1-bjorn.andersson@linaro.org Tested-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com>
2022-05-03PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot timeDexuan Cui
Currently when the pci-hyperv driver finishes probing and initializing the PCI device, it sets the PCI_COMMAND_MEMORY bit; later when the PCI device is registered to the core PCI subsystem, the core PCI driver's BAR detection and initialization code toggles the bit multiple times, and each toggling of the bit causes the hypervisor to unmap/map the virtual BARs from/to the physical BARs, which can be slow if the BAR sizes are huge, e.g., a Linux VM with 14 GPU devices has to spend more than 3 minutes on BAR detection and initialization, causing a long boot time. Reduce the boot time by not setting the PCI_COMMAND_MEMORY bit when we register the PCI device (there is no need to have it set in the first place). The bit stays off till the PCI device driver calls pci_enable_device(). With this change, the boot time of such a 14-GPU VM is reduced by almost 3 minutes. Link: https://lore.kernel.org/lkml/20220419220007.26550-1-decui@microsoft.com/ Tested-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Jake Oshins <jakeo@microsoft.com> Link: https://lore.kernel.org/r/20220502074255.16901-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-05-02pci/msi: Stop warning for MSI enabling failureChris Wilson
If the MSI is already enabled, trying to enable it again results in an -EINVAL and on the first attempt a WARN. That WARN causes our CI to abort the run [on each first attempt to suspend]: <4> [463.142025] WARNING: CPU: 0 PID: 2225 at drivers/pci/msi.c:1074 __pci_enable_msi_range+0x3cb/0x420 <4> [463.142026] Modules linked in: snd_hda_intel i915 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec btusb btrtl btbcm btintel e1000e bluetooth snd_hwdep snd_hda_core ptp ecdh_generic snd_pcm ecc pps_core mei_me mei prime_numbers [last unloaded: i915] <4> [463.142045] CPU: 0 PID: 2225 Comm: kworker/u8:14 Tainted: G U 5.7.0-rc2-CI-CI_DRM_8350+ #1 <4> [463.142046] Hardware name: Intel Corporation NUC7i5BNH/NUC7i5BNB, BIOS BNKBL357.86A.0060.2017.1214.2013 12/14/2017 <4> [463.142049] Workqueue: events_unbound async_run_entry_fn <4> [463.142051] RIP: 0010:__pci_enable_msi_range+0x3cb/0x420 <4> [463.142053] Code: 76 58 49 8d 56 48 48 89 df e8 31 73 fd ff e9 20 fe ff ff 31 f6 48 89 df e8 c2 e9 fd ff e9 d6 fe ff ff 45 89 fc e9 1a ff ff ff <0f> 0b 41 bc ea ff ff ff e9 0d ff ff ff 41 bc ea ff ff ff e9 02 ff <4> [463.142054] RSP: 0018:ffffc90000593cd0 EFLAGS: 00010202 <4> [463.142056] RAX: 0000000000000010 RBX: ffff888274051000 RCX: 0000000000000000 <4> [463.142057] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff888274051000 <4> [463.142058] RBP: ffff888238aa1018 R08: 0000000000000001 R09: 0000000000000001 <4> [463.142060] R10: ffffc90000593d90 R11: 00000000c79cdfd5 R12: ffff8882740510b0 <4> [463.142061] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 <4> [463.142062] FS: 0000000000000000(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000 <4> [463.142064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [463.142065] CR2: 000055706f347d80 CR3: 0000000005610003 CR4: 00000000003606f0 <4> [463.142066] Call Trace: <4> [463.142073] pci_enable_msi+0x11/0x20 <4> [463.142077] azx_resume+0x1ab/0x200 [snd_hda_intel] <4> [463.142080] ? pci_pm_thaw+0x80/0x80 <4> [463.142084] dpm_run_callback+0x64/0x280 <4> [463.142089] device_resume+0xd4/0x1c0 <4> [463.142093] ? dpm_watchdog_set+0x60/0 While this would appear to be a bug in snd-hda, it does appear inconsequential, at least for gfx-ci. Downgrade the warning to an info, like the other already-enabled error for MSI-X. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1687 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200423082753.3899018-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-04-28PCI: hv: Fix hv_arch_irq_unmask() for multi-MSIJeffrey Hugo
In the multi-MSI case, hv_arch_irq_unmask() will only operate on the first MSI of the N allocated. This is because only the first msi_desc is cached and it is shared by all the MSIs of the multi-MSI block. This means that hv_arch_irq_unmask() gets the correct address, but the wrong data (always 0). This can break MSIs. Lets assume MSI0 is vector 34 on CPU0, and MSI1 is vector 33 on CPU0. hv_arch_irq_unmask() is called on MSI0. It uses a hypercall to configure the MSI address and data (0) to vector 34 of CPU0. This is correct. Then hv_arch_irq_unmask is called on MSI1. It uses another hypercall to configure the MSI address and data (0) to vector 33 of CPU0. This is wrong, and results in both MSI0 and MSI1 being routed to vector 33. Linux will observe extra instances of MSI1 and no instances of MSI0 despite the endpoint device behaving correctly. For the multi-MSI case, we need unique address and data info for each MSI, but the cached msi_desc does not provide that. However, that information can be gotten from the int_desc cached in the chip_data by compose_msi_msg(). Fix the multi-MSI case to use that cached information instead. Since hv_set_msi_entry_from_desc() is no longer applicable, remove it. Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1651068453-29588-1-git-send-email-quic_jhugo@quicinc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-04-28PCI: portdrv: Set driver_managed_dmaLu Baolu
If a switch lacks ACS P2P Request Redirect, a device below the switch can bypass the IOMMU and DMA directly to other devices below the switch, so all the downstream devices must be in the same IOMMU group as the switch itself. The existing VFIO framework allows the portdrv driver to be bound to the bridge while its downstream devices are assigned to user space. The pci_dma_configure() marks the IOMMU group as containing only devices with kernel drivers that manage DMA. Avoid this default behavior for the portdrv driver in order for compatibility with the current VFIO usage. We achieve this by setting ".driver_managed_dma = true" in pci_driver structure. It is safe because the portdrv driver meets below criteria: - This driver doesn't use DMA, as you can't find any related calls like pci_set_master() or any kernel DMA API (dma_map_*() and etc.). - It doesn't use MMIO as you can't find ioremap() or similar calls. It's tolerant to userspace possibly also touching the same MMIO registers via P2P DMA access. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20220418005000.897664-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28PCI: pci_stub: Set driver_managed_dmaLu Baolu
The current VFIO implementation allows pci-stub driver to be bound to a PCI device with other devices in the same IOMMU group being assigned to userspace. The pci-stub driver has no dependencies on DMA or the IOVA mapping of the device, but it does prevent the user from having direct access to the device, which is useful in some circumstances. The pci_dma_configure() marks the iommu_group as containing only devices with kernel drivers that manage DMA. For compatibility with the VFIO usage, avoid this default behavior for the pci_stub. This allows the pci_stub still able to be used by the admin to block driver binding after applying the DMA ownership to VFIO. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20220418005000.897664-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28bus: platform,amba,fsl-mc,PCI: Add device DMA ownership managementLu Baolu
The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers with the device DMA managed by kernel drivers or user-space applications. Unfortunately, multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. The DMA on these devices must either be entirely under kernel control or userspace control, never a mixture. Otherwise the driver integrity is not guaranteed because they could access each other through the peer-to-peer accesses which by-pass the IOMMU protection. This checks and sets the default DMA mode during driver binding, and cleanups during driver unbinding. In the default mode, the device DMA is managed by the device driver which handles DMA operations through the kernel DMA APIs (see Documentation/core-api/dma-api.rst). For cases where the devices are assigned for userspace control through the userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/ vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this default setting in the assumption that the drivers know what they are doing with the device DMA. Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure is currently a problem. As things stand, the IOMMU driver ignored the initial iommu_probe_device() call when the device was added, since at that point it had no fwspec yet. In this situation, {of,acpi}_iommu_configure() are retriggering iommu_probe_device() after the IOMMU driver has seen the firmware data via .of_xlate to learn that it actually responsible for the given device. As the result, before that gets fixed, iommu_use_default_domain() goes at the end, and calls arch_teardown_dma_ops() if it fails. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28PCI: versatile: Remove redundant variable retvalColin Ian King
Variable retval is being assigned a value that is never read, the variable is redundant and can be removed. Cleans up clang scan build warning: drivers/pci/controller/pci-versatile.c:37:10: warning: Although the value stored to 'retval' is used in the enclosing expression, the value is never actually read from 'retval' [deadcode.DeadStores] Link: https://lore.kernel.org/r/20220418144416.86121-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-04-25PCI: hv: Fix synchronization between channel callback and hv_compose_msi_msg()Andrea Parri (Microsoft)
Dexuan wrote: "[...] when we disable AccelNet, the host PCI VSP driver sends a PCI_EJECT message first, and the channel callback may set hpdev->state to hv_pcichild_ejecting on a different CPU. This can cause hv_compose_msi_msg() to exit from the loop and 'return', and the on-stack variable 'ctxt' is invalid. Now, if the response message from the host arrives, the channel callback will try to access the invalid 'ctxt' variable, and this may cause a crash." Schematically: Hyper-V sends PCI_EJECT msg hv_pci_onchannelcallback() state = hv_pcichild_ejecting hv_compose_msi_msg() alloc and init comp_pkt state == hv_pcichild_ejecting Hyper-V sends VM_PKT_COMP msg hv_pci_onchannelcallback() retrieve address of comp_pkt 'free' comp_pkt and return comp_pkt->completion_func() Dexuan also showed how the crash can be triggered after introducing suitable delays in the driver code, thus validating the 'assumption' that the host can still normally respond to the guest's compose_msi request after the host has started to eject the PCI device. Fix the synchronization by leveraging the requestor lock as follows: - Before 'return'-ing in hv_compose_msi_msg(), remove the ID (while holding the requestor lock) associated to the completion packet. - Retrieve the address *and call ->completion_func() within a same (requestor) critical section in hv_pci_onchannelcallback(). Reported-by: Wei Hu <weh@microsoft.com> Reported-by: Dexuan Cui <decui@microsoft.com> Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220419122325.10078-7-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-04-25PCI: hv: Use vmbus_requestor to generate transaction IDs for VMbus hardeningAndrea Parri (Microsoft)
Currently, pointers to guest memory are passed to Hyper-V as transaction IDs in hv_pci. In the face of errors or malicious behavior in Hyper-V, hv_pci should not expose or trust the transaction IDs returned by Hyper-V to be valid guest memory addresses. Instead, use small integers generated by vmbus_requestor as request (transaction) IDs. Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220419122325.10078-3-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-04-25PCI: hv: Fix multi-MSI to allow more than one MSI vectorJeffrey Hugo
If the allocation of multiple MSI vectors for multi-MSI fails in the core PCI framework, the framework will retry the allocation as a single MSI vector, assuming that meets the min_vecs specified by the requesting driver. Hyper-V advertises that multi-MSI is supported, but reuses the VECTOR domain to implement that for x86. The VECTOR domain does not support multi-MSI, so the alloc will always fail and fallback to a single MSI allocation. In short, Hyper-V advertises a capability it does not implement. Hyper-V can support multi-MSI because it coordinates with the hypervisor to map the MSIs in the IOMMU's interrupt remapper, which is something the VECTOR domain does not have. Therefore the fix is simple - copy what the x86 IOMMU drivers (AMD/Intel-IR) do by removing X86_IRQ_ALLOC_CONTIGUOUS_VECTORS after calling the VECTOR domain's pci_msi_prepare(). Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1649856981-14649-1-git-send-email-quic_jhugo@quicinc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-04-25PCI: microchip: Add a missing semicolonUwe Kleine-König
If the driver is configured as a module (after allowing this by changing PCIE_MICROCHIP_HOST from bool to tristate) the missing semicolon makes the compiler very unhappy. While there isn't a real problem as MODULE_DEVICE_TABLE always evaluates to nothing for a built-in driver, do it right for consistency with other drivers. Link: https://lore.kernel.org/r/20220420065832.14173-1-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Daire McNamara <daire.mcnamara@microchip.com>
2022-04-25PCI: mvebu: Add support for sending Set_Slot_Power_Limit messagePali Rohár
If DT supplies the 'slot-power-limit-milliwatt' property, program the value in the Slot Power Limit in the Slot Capabilities register and program the Root Port to send a Set_Slot_Power_Limit Message when the Link transitions to DL_Up. Link: https://lore.kernel.org/r/20220412094946.27069-5-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org>
2022-04-25PCI: Add function for parsing 'slot-power-limit-milliwatt' DT propertyPali Rohár
Add function of_pci_get_slot_power_limit(), which parses the 'slot-power-limit-milliwatt' DT property, returning the value in milliwatts and in format ready for the PCIe Slot Capabilities Register. Link: https://lore.kernel.org/r/20220412094946.27069-4-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
2022-04-22PCI: Use driver_set_override() instead of open-codingKrzysztof Kozlowski
Use a helper to set driver_override to the reduce amount of duplicated code. Make the driver_override field const char, because it is not modified by the core and it matches other subsystems. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220419113435.246203-6-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13PCI: ACPI: PM: Power up devices in D3cold before scanning themRafael J. Wysocki
The initial configuration of ACPI power resources on some systems implies that some PCI devices on them are initially in D3cold. In some cases, especially for PCIe Root Ports, this is a "logical" D3cold, meaning that the configuration space of the device is accessible, but some of its functionality may be missing, but it very well may be real D3cold, in which case the device will not be accessible at all. However, the PCI bus type driver will need to access its configuration space in order to enumerate it. To prevent possible device enumeration failures that may ensue as a result of ACPI power resources being initially in the "off" state, power up all children of the host bridge ACPI device object that hold valid _ADR objects (which indicates that they will be enumerated by the PCI bus type driver) and do that to all children of the ACPI device objects corresponding to PCI bridges (including PCIe ports). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-04-11PCI/P2PDMA: Whitelist Intel Skylake-E Root Ports at any devfnShlomo Pongratz
In 7b94b53db34f ("PCI/P2PDMA: Add Intel Sky Lake-E Root Ports B, C, D to the whitelist"), Andrew Maier added Skylake-E 2031, 2032, and 2033 Root Ports to the pci_p2pdma_whitelist[], so we assume P2PDMA between devices below these ports works. Previously we only checked the whitelist for a device at devfn 00.0 on the root bus, which is often a "host bridge". But these Skylake Root Ports may be at any devfn and there may be no "host bridge" device. Generalize pci_host_bridge_dev() so we check the first device on the root bus, whether it is devfn 00.0 or a PCIe Root Port, against the whitelist. [bhelgaas: commit log, comment] Link: https://lore.kernel.org/r/20220410105213.690-2-shlomop@pliops.com Tested-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Shlomo Pongratz <shlomop@pliops.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Andrew Maier <andrew.maier@eideticom.com>
2022-04-11PCI/PM: Avoid putting Elo i2 PCIe Ports in D3coldRafael J. Wysocki
If a Root Port on Elo i2 is put into D3cold and then back into D0, the downstream device becomes permanently inaccessible, so add a bridge D3 DMI quirk for that system. This was exposed by 14858dcc3b35 ("PCI: Use pci_update_current_state() in pci_enable_device_flags()"), but before that commit the Root Port in question had never been put into D3cold for real due to a mismatch between its power state retrieved from the PCI_PM_CTRL register (which was accessible even though the platform firmware indicated that the port was in D3cold) and the state of an ACPI power resource involved in its power management. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215715 Link: https://lore.kernel.org/r/11980172.O9o76ZdvQC@kreacher Reported-by: Stefan Gottwald <gottwald@igel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v5.15+
2022-04-11PCI: tegra194: Remove unnecessary MSI enable reg save and restoreJisheng Zhang
The integrated MSI Receiver enable register is always initialized in dw_pcie_setup_rc() which is also called in resume code path, so we don't need to save/restore the enable register during suspend/resume. Link: https://lore.kernel.org/r/20211226074910.2722-1-jszhang@kernel.org Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Vidya Sagar <vidyas@nvidia.com>
2022-04-08PCI: dwc: Fix setting error return on MSI DMA mapping failureJiantao Zhang
When dma_mapping_error() returns error because of no enough memory, but dw_pcie_host_init() returns success, which will mislead the callers. Link: https://lore.kernel.org/r/30170911-0e2f-98ce-9266-70465b9073e5@huawei.com Fixes: 07940c369a6b ("PCI: dwc: Fix MSI page leakage in suspend/resume") Signed-off-by: Jianrong Zhang <zhangjianrong5@huawei.com> Signed-off-by: Jiantao Zhang <water.zhangjiantao@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org>
2022-04-08PCI: mediatek: Fix refcount leak in mtk_pcie_subsys_powerup()Miaoqian Lin
The of_find_compatible_node() function returns a node pointer with refcount incremented, We should use of_node_put() on it when done Add the missing of_node_put() to release the refcount. Link: https://lore.kernel.org/r/20220309091953.5630-1-linmq006@gmail.com Fixes: 87e8657ba99c ("PCI: mediatek: Add new method to get shared pcie-cfg base address") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Rob Herring <robh@kernel.org>
2022-04-08PCI: rockchip: Fix find_first_zero_bit() limitDan Carpenter
The ep->ob_region_map bitmap is a long and it has BITS_PER_LONG bits. Link: https://lore.kernel.org/r/20220315065944.GB13572@kili Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-04-08PCI: cadence: Fix find_first_zero_bit() limitDan Carpenter
The ep->ob_region_map bitmap is a long and it has BITS_PER_LONG bits. Link: https://lore.kernel.org/r/20220315065829.GA13572@kili Fixes: 37dddf14f1ae ("PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-04-07PCI/ASPM: Make Intel DG2 L1 acceptable latency unlimitedMika Westerberg
Intel DG2 discrete graphics PCIe endpoints advertise L1 acceptable exit latency to be < 1us even though they can actually tolerate unlimited exit latencies just fine. Quirk the L1 acceptable exit latency for these endpoints to be unlimited so ASPM L1 can be enabled. [bhelgaas: use FIELD_GET/FIELD_PREP, wordsmith comment & commit log] Link: https://lore.kernel.org/r/20220405093810.76613-1-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-04-07Merge tag 'hyperv-fixes-signed-20220407' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Correctly propagate coherence information for VMbus devices (Michael Kelley) - Disable balloon and memory hot-add on ARM64 temporarily (Boqun Feng) - Use barrier to prevent reording when reading ring buffer (Michael Kelley) - Use virt_store_mb in favour of smp_store_mb (Andrea Parri) - Fix VMbus device object initialization (Andrea Parri) - Deactivate sysctl_record_panic_msg on isolated guest (Andrea Parri) - Fix a crash when unloading VMbus module (Guilherme G. Piccoli) * tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() Drivers: hv: balloon: Disable balloon and hot-add accordingly Drivers: hv: balloon: Support status report for larger page sizes Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer PCI: hv: Propagate coherence from VMbus device to PCI device Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device Drivers: hv: vmbus: Fix potential crash on module unload Drivers: hv: vmbus: Fix initialization of device object in vmbus_device_register() Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
2022-04-05PCI: hotplug: Clean up include filesChristophe Leroy
arch/powerpc/include/asm/prom.h includes some headers that it doesn't need itself. Add the missing headers to files that include prom.h so we can remove them from prom.h. Link: https://lore.kernel.org/r/79201f5fae8d003164ac36ed3be7789db1bc5ab4.1648833421.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-04-04PCI/ACPI: Allow D3 only if Root Port can signal and wake from D3Mario Limonciello
acpi_pci_bridge_d3(dev) returns "true" if "dev" is a hotplug bridge that can handle hotplug events while in D3. Previously this meant either: - "dev" has a _PS0 or _PR0 method (acpi_pci_power_manageable()), or - The Root Port above "dev" has a _DSD with a "HotPlugSupportInD3" property with value 1. This did not consider _PRW, which tells us about wakeup GPEs (ACPI v6.4, sec 7.3.13). Without a wakeup GPE, from an ACPI perspective the Root Port has no way of generating wakeup signals, so hotplug events will be lost if we use D3. Similarly, it did not consider _S0W, which tells us the deepest D-state from which a device can wake itself (sec 7.3.20). If _S0W tells us the device cannot wake from D3, hotplug events will again be lost if we use D3. Some platforms, e.g., AMD Yellow Carp, supply "HotPlugSupportInD3" without _PRW or with an _S0W that says the Root Port cannot wake from D3. On those platforms, we previously put bridges in D3hot, hotplug events were lost, and hotplugged devices would not be recognized without manually rescanning. Allow bridges to be put in D3 only if the Root Port can generate wakeup GPEs (wakeup.flags.valid), it can wake from D3 (_S0W), AND it has the "HotPlugSupportInD3" property. Neither Windows 10 nor Windows 11 puts the bridge in D3 when the firmware is configured this way, and this change aligns the handling of the situation to be the same. [bhelgaas: commit log, tidy "HotPlugSupportInD3" check and comment] Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/07_Power_and_Performance_Mgmt/device-power-management-objects.html?highlight=s0w#s0w-s0-device-wake-state Link: https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3 Link: https://lore.kernel.org/r/20220401034003.3166-1-mario.limonciello@amd.com Fixes: 26ad34d510a87 ("PCI / ACPI: Whitelist D3 for more PCIe hotplug ports") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-02Merge tag 'pci-v5.18-changes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fix from Bjorn Helgaas: - Fix Hyper-V "defined but not used" build issue added during merge window (YueHaibing) * tag 'pci-v5.18-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: hv: Remove unused hv_set_msi_entry_from_desc()
2022-03-31PCI: hv: Remove unused hv_set_msi_entry_from_desc()YueHaibing
Fix the following build error: drivers/pci/controller/pci-hyperv.c:769:13: error: ‘hv_set_msi_entry_from_desc’ defined but not used [-Werror=unused-function] 769 | static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, The arm64 implementation of hv_set_msi_entry_from_desc() is not used after d06957d7a692 ("PCI: hv: Avoid the retarget interrupt hypercall in irq_unmask() on ARM64"), so remove it. Fixes: d06957d7a692 ("PCI: hv: Avoid the retarget interrupt hypercall in irq_unmask() on ARM64") Link: https://lore.kernel.org/r/20220317085130.36388-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Boqun Feng <boqun.feng@gmail.com>
2022-03-29PCI: hv: Propagate coherence from VMbus device to PCI deviceMichael Kelley
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus device and as a PCI device. The coherence of the VMbus device is set based on the VMbus node in ACPI, but the PCI device has no ACPI node and defaults to not hardware coherent. This results in extra software coherence management overhead on ARM64 when devices are hardware coherent. Fix this by setting up the PCI host bus so that normal PCI mechanisms will propagate the coherence of the VMbus device to the PCI device. There's no effect on x86/x64 where devices are always hardware coherent. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Boqun Feng <boqun.feng@gmail.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/1648138492-2191-3-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>