Age | Commit message (Collapse) | Author |
|
The IPIs come in as HVI not EE, so we need to test the appropriate
SRR1 bits. The encoding is such that it won't have false positives
on P7 and P8 so we can just test it like that. We also need to handle
the icp-opal variant of the flush.
Fixes: d74361881f0d ("powerpc/xics: Add ICP OPAL backend")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for the kexec_file_load() syscall, which is a prereq for
secure and trusted boot.
- Prevent kernel execution of userspace on P9 Radix (similar to
SMEP/PXN).
- Sort the exception tables at build time, to save time at boot, and
store them as relative offsets to save space in the kernel image &
memory.
- Allow building the kernel with thin archives, which should allow us
to build an allyesconfig once some other fixes land.
- Build fixes to allow us to correctly rebuild when changing the
kernel endian from big to little or vice versa.
- Plumbing so that we can avoid doing a full mm TLB flush on P9
Radix.
- Initial stack protector support (-fstack-protector).
- Support for dumping the radix (aka. Linux) and hash page tables via
debugfs.
- Fix an oops in cxl coredump generation when cxl_get_fd() is used.
- Freescale updates from Scott: "Highlights include 8xx hugepage
support, qbman fixes/cleanup, device tree updates, and some misc
cleanup."
- Many and varied fixes and minor enhancements as always.
Thanks to:
Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman
Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz,
Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar
Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff
Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin,
Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N.
Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica
Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj
Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain"
[ And thanks to Michael, who took time off from a new baby to get this
pull request done. - Linus ]
* tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits)
powerpc/fsl/dts: add FMan node for t1042d4rdb
powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb
powerpc/fsl/dts: add QMan and BMan nodes on t1024
powerpc/fsl/dts: add QMan and BMan nodes on t1023
soc/fsl/qman: test: use DEFINE_SPINLOCK()
powerpc/fsl-lbc: use DEFINE_SPINLOCK()
powerpc/8xx: Implement support of hugepages
powerpc: get hugetlbpage handling more generic
powerpc: port 64 bits pgtable_cache to 32 bits
powerpc/boot: Request no dynamic linker for boot wrapper
soc/fsl/bman: Use resource_size instead of computation
soc/fsl/qe: use builtin_platform_driver
powerpc/fsl_pmc: use builtin_platform_driver
powerpc/83xx/suspend: use builtin_platform_driver
powerpc/ftrace: Fix the comments for ftrace_modify_code
powerpc/perf: macros for power9 format encoding
powerpc/perf: power9 raw event format encoding
powerpc/perf: update attribute_group data structure
powerpc/perf: factor out the event format field
powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This release has a few updates:
- STM can hook into the function tracer
- Function filtering now supports more advance glob matching
- Ftrace selftests updates and added tests
- Softirq tag in traces now show only softirqs
- ARM nop added to non traced locations at compile time
- New trace_marker_raw file that allows for binary input
- Optimizations to the ring buffer
- Removal of kmap in trace_marker
- Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
- Other various fixes and clean ups"
* tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
selftests: ftrace: Shift down default message verbosity
kprobes/trace: Fix kprobe selftest for newer gcc
tracing/kprobes: Add a helper method to return number of probe hits
tracing/rb: Init the CPU mask on allocation
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
fgraph: Handle a case where a tracer ignores set_graph_notrace
tracing: Replace kmap with copy_from_user() in trace_marker writing
ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
tracing: Allow benchmark to be enabled at early_initcall()
tracing: Have system enable return error if one of the events fail
tracing: Do not start benchmark on boot up
tracing: Have the reg function allow to fail
ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
ring-buffer: Froce rb_update_write_stamp() to be inlined
ring-buffer: Force inline of hotpath helper functions
tracing: Make __buffer_unlock_commit() always_inline
tracing: Make tracepoint_printk a static_key
ring-buffer: Always inline rb_event_data()
ring-buffer: Make rb_reserve_next_event() always inlined
...
|
|
Some tracepoints have a registration function that gets enabled when the
tracepoint is enabled. There may be cases that the registraction function
must fail (for example, can't allocate enough memory). In this case, the
tracepoint should also fail to register, otherwise the user would not know
why the tracepoint is not working.
Cc: David Howells <dhowells@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Commit 2965faa5e03d ("kexec: split kexec_load syscall from kexec core
code") introduced CONFIG_KEXEC_CORE so that CONFIG_KEXEC means whether
the kexec_load system call should be compiled-in and CONFIG_KEXEC_FILE
means whether the kexec_file_load system call should be compiled-in.
These options can be set independently from each other.
Since until now powerpc only supported kexec_load, CONFIG_KEXEC and
CONFIG_KEXEC_CORE were synonyms. That is not the case anymore, so we
need to make a distinction. Almost all places where CONFIG_KEXEC was
being used should be using CONFIG_KEXEC_CORE instead, since
kexec_file_load also needs that code compiled in.
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Merge the topic branch we're sharing with the kvm-ppc tree.
|
|
This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi()
and opal_int_set_mfrr(), for use by KVM real-mode code.
It also exports opal_int_set_mfrr() so that the modular part of KVM
can use it to send IPIs.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
PHB, PE (and by association MVE) numbers are printed as a mix of decimal
and hexadecimal throughout the kernel. This can be misleading, so make
them all hexadecimal.
Standardising on hex instead of dec because:
- PHB numbers are presented in hex in sysfs/debugfs (and lspci, etc)
- PE numbers are presented as hex in sysfs and parsed in hex in debugfs
The only place I think this could cause confusing are the messages during
boot, i.e.
pci 000a:01 : [PE# 000] Secondary bus 1 associated with PE#0
which can be a quick way to check PE numbers. pe_level_printk() will
only print two characters instead of three, so the above would be
pci 000a:01 : [PE# 00] Secondary bus 1 associated with PE#0
which gives a hint it's in hex.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Whenever a PE is initialised in powernv, opal_pci_eeh_freeze_clear() is
called. This is to remove any existing freeze, and has no negative side
effects if the PE is already in an unfrozen state. On PHB backends that
don't support this operation and return OPAL_UNSUPPORTED, this creates a
scary and misleading warning message.
Skip the warning message on init if OPAL_UNSUPPORTED is returned.
As far as I'm aware, this currently only affects NPUs.
Fixes: 313483d ("powerpc/powernv: Unfreeze PE on allocation")
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This condenses the opal node searching into a single function that finds
all compatible nodes, instead of just searching the ibm,opal children,
for ipmi, flash, and prd similar to how opal-i2c nodes are found.
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Major rework of Book3S 64-bit exception vectors (Nicholas Piggin)
- Use gas sections for arranging exception vectors et. al.
- Large set of TM cleanups and selftests (Cyril Bur)
- Enable transactional memory (TM) lazily for userspace (Cyril Bur)
- Support for XZ compression in the zImage wrapper (Oliver
O'Halloran)
- Add support for bpf constant blinding (Naveen N. Rao)
- Beginnings of upstream support for PA Semi Nemo motherboards
(Darren Stevens)
Fixes:
- Ensure .mem(init|exit).text are within _stext/_etext (Michael
Ellerman)
- xmon: Don't use ld on 32-bit (Michael Ellerman)
- vdso64: Use double word compare on pointers (Anton Blanchard)
- powerpc/nvram: Fix an incorrect partition merge (Pan Xinhui)
- powerpc: Fix usage of _PAGE_RO in hugepage (Christophe Leroy)
- powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K
(Aneesh Kumar K.V)
- Fix memory leak in queue_hotplug_event() error path (Andrew
Donnellan)
- Replay hypervisor maintenance interrupt first (Nicholas Piggin)
Various performance optimisations (Anton Blanchard):
- Align hot loops of memset() and backwards_memcpy()
- During context switch, check before setting mm_cpumask
- Remove static branch prediction in atomic{, 64}_add_unless
- Only disable HAVE_EFFICIENT_UNALIGNED_ACCESS on POWER7 little
endian
- Set default CPU type to POWER8 for little endian builds
Cleanups & features:
- Sparse fixes/cleanups (Daniel Axtens)
- Preserve CFAR value on SLB miss caused by access to bogus address
(Paul Mackerras)
- Radix MMU fixups for POWER9 (Aneesh Kumar K.V)
- Support for setting used_(vsr|vr|spe) in sigreturn path (for CRIU)
(Simon Guo)
- Optimise syscall entry for virtual, relocatable case (Nicholas
Piggin)
- Optimise MSR handling in exception handling (Nicholas Piggin)
- Support for kexec with Radix MMU (Benjamin Herrenschmidt)
- powernv EEH fixes (Russell Currey)
- Suprise PCI hotplug support for powernv (Gavin Shan)
- Endian/sparse fixes for powernv PCI (Gavin Shan)
- Defconfig updates (Anton Blanchard)
- KVM: PPC: Book3S HV: Migrate pinned pages out of CMA (Balbir Singh)
- cxl: Flush PSL cache before resetting the adapter (Frederic Barrat)
- cxl: replace loop with for_each_child_of_node(), remove unneeded
of_node_put() (Andrew Donnellan)
- Fix HV facility unavailable to use correct handler (Nicholas
Piggin)
- Remove unnecessary syscall trampoline (Nicholas Piggin)
- fadump: Fix build break when CONFIG_PROC_VMCORE=n (Michael
Ellerman)
- Quieten EEH message when no adapters are found (Anton Blanchard)
- powernv: Add PHB register dump debugfs handle (Russell Currey)
- Use kprobe blacklist for exception handlers & asm functions
(Nicholas Piggin)
- Document the syscall ABI (Nicholas Piggin)
- MAINTAINERS: Update cxl maintainers (Michael Neuling)
- powerpc: Remove all usages of NO_IRQ (Michael Ellerman)
Minor cleanups:
- Andrew Donnellan, Christophe Leroy, Colin Ian King, Cyril Bur,
Frederic Barrat, Pan Xinhui, PrasannaKumar Muralidharan, Rui Teng,
Simon Guo"
* tag 'powerpc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
powerpc/bpf: Add support for bpf constant blinding
powerpc/bpf: Implement support for tail calls
powerpc/bpf: Introduce accessors for using the tmp local stack space
powerpc/fadump: Fix build break when CONFIG_PROC_VMCORE=n
powerpc: tm: Enable transactional memory (TM) lazily for userspace
powerpc/tm: Add TM Unavailable Exception
powerpc: Remove do_load_up_transact_{fpu,altivec}
powerpc: tm: Rename transct_(*) to ck(\1)_state
powerpc: tm: Always use fp_state and vr_state to store live registers
selftests/powerpc: Add checks for transactional VSXs in signal contexts
selftests/powerpc: Add checks for transactional VMXs in signal contexts
selftests/powerpc: Add checks for transactional FPUs in signal contexts
selftests/powerpc: Add checks for transactional GPRs in signal contexts
selftests/powerpc: Check that signals always get delivered
selftests/powerpc: Add TM tcheck helpers in C
selftests/powerpc: Allow tests to extend their kill timeout
selftests/powerpc: Introduce GPR asm helper header file
selftests/powerpc: Move VMX stack frame macros to header file
selftests/powerpc: Rework FPU stack placement macros and move to header file
selftests/powerpc: Check for VSX preservation across userspace preemption
...
|
|
Pull KVM updates from Radim Krčmář:
"All architectures:
- move `make kvmconfig` stubs from x86
- use 64 bits for debugfs stats
ARM:
- Important fixes for not using an in-kernel irqchip
- handle SError exceptions and present them to guests if appropriate
- proxying of GICV access at EL2 if guest mappings are unsafe
- GICv3 on AArch32 on ARMv8
- preparations for GICv3 save/restore, including ABI docs
- cleanups and a bit of optimizations
MIPS:
- A couple of fixes in preparation for supporting MIPS EVA host
kernels
- MIPS SMP host & TLB invalidation fixes
PPC:
- Fix the bug which caused guests to falsely report lockups
- other minor fixes
- a small optimization
s390:
- Lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check deliver
- cleanups and fixes
x86:
- IOMMU part of AMD's AVIC for vmexit-less interrupt delivery
- Hyper-V TSC page
- per-vcpu tsc_offset in debugfs
- accelerated INS/OUTS in nVMX
- cleanups and fixes"
* tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits)
KVM: MIPS: Drop dubious EntryHi optimisation
KVM: MIPS: Invalidate TLB by regenerating ASIDs
KVM: MIPS: Split kernel/user ASID regeneration
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
KVM: arm64: Require in-kernel irqchip for PMU support
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL
KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie
KVM: PPC: BookE: Fix a sanity check
KVM: PPC: Book3S HV: Take out virtual core piggybacking code
KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread
ARM: gic-v3: Work around definition of gic_write_bpr1
KVM: nVMX: Fix the NMI IDT-vectoring handling
KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive
KVM: nVMX: Fix reload apic access page warning
kvmconfig: add virtio-gpu to config fragment
config: move x86 kvm_guest.config to a common location
arm64: KVM: Remove duplicating init code for setting VMID
ARM: KVM: Support vgic-v3
...
|
|
This fixes warning reported from sparse:
pci-ioda.c:451:49: warning: incorrect type in argument 2 (different base types)
Fixes: 262af557dd75 ("powerpc/powernv: Enable M64 aperatus for PHB3")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This fixes the warnings reported from sparse:
pci.c:312:33: warning: restricted __be64 degrades to integer
pci.c:313:33: warning: restricted __be64 degrades to integer
Fixes: cee72d5bb489 ("powerpc/powernv: Display diag data on p7ioc EEH errors")
Cc: stable@vger.kernel.org # v3.3+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
pnv_eeh_get_and_dump_hub_diag()
The hub diag-data type is filled with big-endian data by OPAL call
opal_pci_get_hub_diag_data(). We need convert it to CPU-endian value
before using it. The issue is reported by sparse as pointed by Michael
Ellerman:
eeh-powernv.c:1309:21: warning: restricted __be16 degrades to integer
This converts hub diag-data type to CPU-endian before using it in
pnv_eeh_get_and_dump_hub_diag().
Fixes: 2a485ad7c88d ("powerpc/powernv: Drop PHB operation next_error()")
Cc: stable@vger.kernel.org # v4.1+
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The PE number (@frozen_pe_no), filled by opal_pci_next_error() is in
big-endian format. It should be converted to CPU-endian before it is
passed to opal_pci_eeh_freeze_clear() when clearing the frozen state if
the PE is invalid one. As Michael Ellerman pointed out, the issue is
also detected by sparse:
eeh-powernv.c:1541:41: warning: incorrect type in argument 2 (different base types)
This passes CPU-endian PE number to opal_pci_eeh_freeze_clear() and it
should be part of commit <0f36db77643b> ("powerpc/eeh: Fix wrong printed
PE number"), which was merged to 4.3 kernel.
Fixes: 71b540adffd9 ("powerpc/powernv: Don't escalate non-existing frozen PE")
Cc: stable@vger.kernel.org # v4.3+
Suggested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This unfreezes PE when it's initialized because the PE might be put
into frozen state in the last hot remove path. It's not harmful to
do so if the PE is already in unfrozen state.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When issuing PHB reset, OPAL API opal_pci_poll() is called to drive
the state machine in OPAL forward. However, we needn't always call
the function under some circumstances like reset deassert.
This avoids calling opal_pci_poll() when OPAL_SUCCESS is returned
from opal_pci_reset(). Except the overhead introduced by additional
one unnecessary OPAL call, I didn't run into real issue because of
this.
Reported-by: Pridhiviraj Paidipeddi <ppaiddipe@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When the PE used in pnv_eeh_reset() is that of a VF,
pnv_eeh_reset_vf_pe() is used. Unlike the other reset functions called
in pnv_eeh_reset(), the VF reset doesn't require a bus, and if a bus was
missing the function would error out before resetting the VF PE.
To avoid this, reorder the VF reset function to occur before finding and
checking the bus.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE.
Some callers don't check this, and can cause a null pointer dereference
under certain circumstances.
Fix this by checking NULL everywhere eeh_pe_bus_get() is called.
Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
On EEH events the kernel will print a dump of relevant registers.
If EEH is unavailable (i.e. CONFIG_EEH is disabled, a new platform
doesn't have EEH support, etc) this information isn't readily available.
Add a new debugfs handler to trigger a PHB register dump, so that this
information can be made available on demand.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Commit 5958d19a143e checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags. This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.
The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.
Revert these cases to the previous behaviour, adding a new helper function
to do so. This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.
Fixes: 5958d19a143e ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
NO_IRQ has been == 0 on powerpc for just over ten years (since commit
0ebfff1491ef ("[POWERPC] Add new interrupt mapping core and change
platforms to use it")). It's also 0 on most other arches.
Although it's fairly harmless, every now and then it causes confusion
when a driver is built on powerpc and another arch which doesn't define
NO_IRQ. There's at least 6 definitions of NO_IRQ in drivers/, at least
some of which are to work around that problem.
So we'd like to remove it. This is fairly trivial in the arch code, we
just convert:
if (irq == NO_IRQ) to if (!irq)
if (irq != NO_IRQ) to if (irq)
irq = NO_IRQ; to irq = 0;
return NO_IRQ; to return 0;
And a few other odd cases as well.
At least for now we keep the #define NO_IRQ, because there is driver
code that uses NO_IRQ and the fixes to remove those will go via other
trees.
Note we also change some occurrences in PPC sound drivers, drivers/ps3,
and drivers/macintosh.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In commit f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE
invalidations"), we added logic to fallback to OPAL for doing TCE
invalidations if we can't do it in Linux.
Ben sent a v2 of the patch, containing these additional call sites, but
I had already applied v1 and didn't notice. So fix them now.
Fixes: f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The PCI hotplug can be part of EEH error recovery. The @pdn and
the device's PE number aren't removed and added afterwords. The
PE number in @pdn should be set to an invalid one. Otherwise, the
PE's device count is decreased on removing devices while failing
to be increased on adding devices. It leads to unbalanced PE's
device count and make normal PCI hotplug path broken.
Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The PE for root bus (root PE) can be removed because of PCI hot
remove in EEH recovery path for fenced PHB error. We need update
@phb->root_pe_populated accordingly so that the root PE can be
populated again in forthcoming PCI hot add path. Also, the PE
shouldn't be destroyed as it's global and reserved resource.
Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Squash a bunch of sparse warnings by making things static.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When a guest has a PCI pass-through device with an interrupt, it
will direct the interrupt to a particular guest VCPU. In fact the
physical interrupt might arrive on any CPU, and then get
delivered to the target VCPU in the emulated XICS (guest interrupt
controller), and eventually delivered to the target VCPU.
Now that we have code to handle device interrupts in real mode
without exiting to the host kernel, there is an advantage to having
the device interrupt arrive on the same sub(core) as the target
VCPU is running on. In this situation, the interrupt can be
delivered to the target VCPU without any exit to the host kernel
(using a hypervisor doorbell interrupt between threads if
necessary).
This patch aims to get passed-through device interrupts arriving
on the correct core by setting the interrupt server in the real
hardware XICS for the interrupt to the first thread in the (sub)core
where its target VCPU is running. We do this in the real-mode H_EOI
code because the H_EOI handler already needs to look at the
emulated ICS state for the interrupt (whereas the H_XIRR handler
doesn't), and we know we are running in the target VCPU context
at that point.
We set the server CPU in hardware using an OPAL call, regardless of
what the IRQ affinity mask for the interrupt says, and without
updating the affinity mask. This amounts to saying that when an
interrupt is passed through to a guest, as a matter of policy we
allow the guest's affinity for the interrupt to override the host's.
This is inspired by an earlier patch from Suresh Warrier, although
none of this code came from that earlier patch.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
This adds a new function pnv_opal_pci_msi_eoi() which does the part of
end-of-interrupt (EOI) handling of an MSI which involves doing an
OPAL call. This function can be called in real mode. This doesn't
just export pnv_ioda2_msi_eoi() because that does a call to
icp_native_eoi(), which does not work in real mode.
This also adds a function, is_pnv_opal_msi(), which KVM can call to
check whether an interrupt is one for which we should be calling
pnv_opal_pci_msi_eoi() when we need to do an EOI.
[paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
In pnv_ioda_free_pe(), the PE object (including the associated PE
number) is cleared before resetting the corresponding bit in the
PE allocation bitmap. It means PE#0 is always released to the bitmap
wrongly.
This fixes above issue by caching the PE number before the PE object
is cleared.
Fixes: 1e9167726c41 ("powerpc/powernv: Use PE instead of number during setup and release"
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The compound PE is created to accommodate the devices attached to
one specific PCI bus that consume multiple M64 segments. The compound
PE is made up of one master PE and possibly multiple slave PEs. The
slave PEs should be destroyed when releasing the master PE. A kernel
crash happens when derferencing @pe->pdev on releasing the slave PE
in pnv_ioda_deconfigure_pe().
# echo 0 > /sys/bus/pci/slots/C7/power
iommu: Removing device 0000:01:00.1 from group 0
iommu: Removing device 0000:01:00.0 from group 0
Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xc00000000005d898
cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
sp: c000000fe82178a0
msr: 9000000000009033
dar: 10
dsisr: 40000000
current = 0xc000000fe815ab80
paca = 0xc00000000ff00400 softe: 0 irq_happened: 0x01
pid = 2709, comm = sh
Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
(gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
Tue Sep 6 13:37:29 AEST 2016
enter ? for help
[c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
[c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
[c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
[c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
[c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
[c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
[c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
[c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
[c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
[c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
[c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
[c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
[c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
[c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
[c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
[c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
[c000000fe8217e30] c000000000009524 system_call+0x38/0x108
It fixes the kernel crash by bypassing releasing resources (DMA,
IO and memory segments, PELTM) because there are no resources assigned
to the slave PE.
Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In a situation, where Linux kernel gets notified about duplicate error log
from OPAL, it is been observed that kernel fails to remove sysfs entries
(/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is because,
we currently search the error log/dump kobject in the kset list via
'kset_find_obj()' routine. Which eventually increment the reference count
by one, once it founds the kobject.
So, unless we decrement the reference count by one after it found the kobject,
we would not be able to release the kobject properly later.
This patch adds the 'kobject_put()' which was missing earlier.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Unsigned type is always non-negative, so the loop could not end in case
condition is never true.
The problem has been detected using semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The generic allocation code may sometimes decide to assign a prefetchable
64-bit BAR to the M32 window. In fact it may also decide to allocate
a 64-bit non-prefetchable BAR to the M64 one ! So using the resource
flags as a test to decide which window was used for PE allocation is
just wrong and leads to insane PE numbers.
Instead, compare the addresses to figure it out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rename the function as agreed by Ben & Gavin]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When machine check occurs with MSR(RI=0), it means MC interrupt is
unrecoverable and kernel goes down to panic path. But the console
message still shows it as recovered. This patch fixes the MCE console
messages.
Fixes: 36df96f8acaf ("powerpc/book3s: Decode and save machine check event.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Commit fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE
invalidate register") broke TCE invalidation on IODA2/PHB3 for real
mode.
This makes invalidate work again.
Fixes: fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This sets the type of the interrupt appropriately. We set it as follow:
- If not mapped from the device-tree, we use edge. This is the case
of the virtual interrupts and PCI MSIs for example.
- If mapped from the device-tree and #interrupt-cells is 2 (PAPR
compliant), we use the second cell to set the appropriate type
- If mapped from the device-tree and #interrupt-cells is 1 (current
OPAL on P8 does that), we assume level sensitive since those are
typically going to be the PSI LSIs which are level sensitive.
Additionally, we mark the interrupts requested via the opal_interrupts
property all level. This is a bit fishy but the best we can do until we
fix OPAL to properly expose them with a complete descriptor. It is also
correct for the current HW anyway as OPAL interrupts are currently PCI
error and PSI interrupts which are level.
Finally now that edge interrupts are properly identified, we can enable
CONFIG_HARDIRQS_SW_RESEND which will make the core re-send them if
they occur while masked, which some drivers rely upon.
This fixes issues with lost interrupts on some Mellanox adapters.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull KVM updates from Paolo Bonzini:
- ARM: GICv3 ITS emulation and various fixes. Removal of the
old VGIC implementation.
- s390: support for trapping software breakpoints, nested
virtualization (vSIE), the STHYI opcode, initial extensions
for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots
of cleanups, preliminary to this and the upcoming support for
hardware virtualization extensions.
- x86: support for execute-only mappings in nested EPT; reduced
vmexit latency for TSC deadline timer (by about 30%) on Intel
hosts; support for more than 255 vCPUs.
- PPC: bugfixes.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
KVM: PPC: Introduce KVM_CAP_PPC_HTM
MIPS: Select HAVE_KVM for MIPS64_R{2,6}
MIPS: KVM: Reset CP0_PageMask during host TLB flush
MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
MIPS: KVM: Sign extend MFC0/RDHWR results
MIPS: KVM: Fix 64-bit big endian dynamic translation
MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
MIPS: KVM: Use 64-bit CP0_EBase when appropriate
MIPS: KVM: Set CP0_Status.KX on MIPS64
MIPS: KVM: Make entry code MIPS64 friendly
MIPS: KVM: Use kmap instead of CKSEG0ADDR()
MIPS: KVM: Use virt_to_phys() to get commpage PFN
MIPS: Fix definition of KSEGX() for 64-bit
KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
kvm: x86: nVMX: maintain internal copy of current VMCS
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
KVM: arm64: vgic-its: Simplify MAPI error handling
KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
...
|
|
The iommu_table_ops::exchange() callback writes new TCE to the table and
returns old value and permission mask. The old TCE value is correctly
converted from BE to CPU endian; however permission mask was calculated
from BE value and therefore always returned DMA_NONE which could cause
memory leak on LE systems using VFIO SPAPR TCE IOMMU v1 driver.
This fixes pnv_tce_xchg() to have @oldtce a CPU endian.
Fixes: 05c6cfb9dce0 ("powerpc/iommu/powernv: Release replaced TCE")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
It is now called right after platform probe, so the probe function
can just do the job.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We no long need the machine type that early, so we can move probe_machine()
to after the device-tree has been expanded. This will allow further
consolidation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We move it into early_mmu_init() based on firmware features. For PS3,
we have to move the setting of these into early_init_devtree().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The various calls to establish exception endianness and AIL are
now done from a single point using already established CPU and FW
feature bits to decide what to do.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
pnv_cxl_enable_phb_kernel_api() grabs a reference to the cxl module to
prevent it from being unloaded after the PHB has been switched to CX4
mode. This breaks the build when CONFIG_MODULES=n as module_mutex
doesn't exist.
However, if we don't have modules, we don't need to protect against the
case of the cxl module being unloaded. As such, split the relevant code
out into a function surrounded with #if IS_MODULE(CXL) so we don't try
to compile it if cxl isn't being compiled as a module.
Fixes: 5918dbc9b4ec ("powerpc/powernv: Add support for the cxl kernel api on the real phb")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
If the firmware encounters an error (internal or HW) during initialization
of a PHB, it might leave the device-node in the tree but mark it disabled
using the "status" property. We should check it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
M64's are the configurable 64-bit windows that cover the 64-bit MMIO
space. We used to hard code 16 windows. Newer chips might have a
variable number and might need to reserve some as well (for example
on PHB4/POWER9, M32 and M64 are actually unified and we use M64#0
to map the 32-bit space).
So newer OPALs will provide a property we can use to know what range
of windows is available. The property is named so that it can
eventually support multiple ranges but we only use the first one for
now.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
If we don't find registers for the PHB or don't know the model
specific invalidation method, use OPAL calls instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|