summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2012-07-18Merge branch 'linus' into perf/coreIngo Molnar
Pick up the latest ring-buffer fixes, before applying a new fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-16KVM guest: switch to apic_set_eoi_write, apic_writeMichael S. Tsirkin
Use apic_set_eoi_write, apic_write to avoid meedling in core apic driver data structures directly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-16apic: add apic_set_eoi_write for PV useMichael S. Tsirkin
KVM PV EOI optimization overrides eoi_write apic op with its own version. Add an API for this to avoid meddling with core x86 apic driver data structures directly. For KVM use, we don't need any guarantees about when the switch to the new op will take place, so it could in theory use this API after SMP init, but it currently doesn't, and restricting callers to early init makes it clear that it's safe as it won't race with actual APIC driver use. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-14vsyscall_64: add missing ifdef CONFIG_SECCOMPWill Drewry
vsyscall_seccomp introduced a dependency on __secure_computing. On configurations with CONFIG_SECCOMP disabled, compilation will fail. Reported-by: feng xiangjun <fengxj325@gmail.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13x86/vsyscall: allow seccomp filter in vsyscall=emulateWill Drewry
If a seccomp filter program is installed, older static binaries and distributions with older libc implementations (glibc 2.13 and earlier) that rely on vsyscall use will be terminated regardless of the filter program policy when executing time, gettimeofday, or getcpu. This is only the case when vsyscall emulation is in use (vsyscall=emulate is the default). This patch emulates system call entry inside a vsyscall=emulate by populating regs->ax and regs->orig_ax with the system call number prior to calling into seccomp such that all seccomp-dependencies function normally. Additionally, system call return behavior is emulated in line with other vsyscall entrypoints for the trace/trap cases. [ v2: fixed ip and sp on SECCOMP_RET_TRAP/TRACE (thanks to luto@mit.edu) ] Reported-and-tested-by: Owen Kibel <qmewlo@gmail.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11Merge branch 'mce-ripvfix' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce Merge memory fault handling fix from Tony Luck. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-11x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faultsTony Luck
In commit dad1743e5993f1 ("x86/mce: Only restart instruction after machine check recovery if it is safe") we fixed mce_notify_process() to force a signal to the current process if it was not restartable (RIPV bit not set in MCG_STATUS). But doing it here means that the process doesn't get told the virtual address of the fault via siginfo_t->si_addr. This would prevent application level recovery from the fault. Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so that we will provide the right information with the signal. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org # 3.4+
2012-07-11KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform checkPrarit Bhargava
While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marcelo Tostatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11Merge tag 'v3.5-rc6' into x86/mceIngo Molnar
Merge Linux 3.5-rc6 before merging more code. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-10x86/mm/mtrr: Slightly simplify print_mtrr_state()Jan Beulich
high_width can be easily calculated in a single expression when making use of __ffs64(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4FF71053020000780008E1B5@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-10x86/mm/mtrr: Fix alignment determination in range_to_mtrr()Jan Beulich
With the variable operated on being of "unsigned long" type, neither ffs() nor fls() are suitable to use on them, as those truncate their arguments to 32 bits. Using __ffs() and __fls() respectively at once eliminates the need to subtract 1 from their results. Additionally, with the alignment value subsequently used as a shift count, it must be enforced to be less than BITS_PER_LONG (and on 64-bit there's no need for it to be any smaller). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4FF70D54020000780008E179@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06perf/x86: Fix intel_perfmon_event_mapformattingPekka Enberg
Use tabs for "intel_perfmon_event_map" formatting in perf_event_intel.c. Signed-off-by: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1341568786-7045-1-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06x86/apic/x2apic: Use multiple cluster members for the irq destination only ↵Suresh Siddha
with the explicit affinity During boot or driver load etc, interrupt destination is setup using default target cpu's. Later the user (irqbalance etc) or the driver (irq_set_affinity/ irq_set_affinity_hint) can request the interrupt to be migrated to some specific set of cpu's. In the x2apic cluster routing, for the default scenario use single cpu as the interrupt destination and when there is an explicit interrupt affinity request, route the interrupt to multiple members of a x2apic cluster specified in the cpumask of the migration request. This will minmize the vector pressure when there are lot of interrupt sources and relatively few x2apic clusters (for example a single socket server). This will allow the performance critical interrupts to be routed to multiple cpu's in the x2apic cluster (irqbalance for example uses the cache siblings etc while specifying the interrupt destination) and allow non-critical interrupts to be serviced by a single logical cpu. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-4-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06x86/apic/x2apic: Limit the vector reservation to the user specified maskSuresh Siddha
For the x2apic cluster mode, vector for an interrupt is currently reserved on all the cpu's that are part of the x2apic cluster. But the interrupts will be routed only to the cluster (derived from the first cpu in the mask) members specified in the mask. So there is no need to reserve the vector in the unused cluster members. Modify __assign_irq_vector() to reserve the vectors based on the user specified irq destination mask. If the new mask is a proper subset of the currently used mask, cleanup the vector allocation on the unused cpu members. Also, allow the apic driver to tune the vector domain based on the affinity mask (which in most cases is the user-specified mask). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-3-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain ↵Suresh Siddha
membership Currently __assign_irq_vector() goes through each cpu in the specified mask until it finds a free vector in all the cpu's that are part of the same interrupt domain. We visit all the interrupt domain sibling cpus to reserve the free vector. So, when we fail to find a free vector in an interrupt domain, it is safe to continue our search with a cpu belonging to a new interrupt domain. No need to go through each cpu, if the domain containing that cpu is already visited. Use the irq_cfg's old_domain to track the visited domains and optimize the cpu traversal while finding a free vector in the given cpumask. NOTE: We can also optimize the search by using for_each_cpu() and skip the current cpu, if it is not the first cpu in the mask returned by the vector_allocation_domain(). But re-using the cfg->old_domain to track the visited domains will be slightly faster. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Uncore filter support for SandyBridge-EPYan, Zheng
This patch adds C-Box and PCU filter support for SandyBridge-EP uncore. We can filter C-Box events by thread/core ID and filter PCU events by frequency/voltage. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341381616-12229-5-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Detect number of instances of uncore CBoxYan, Zheng
The CBox manages the interface between the core and the LLC, so the instances of uncore CBox is equal to number of cores. Reported-by: Andrew Cooks <acooks@gmail.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341381616-12229-4-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Fix event constraint for SandyBridge-EP C-BoxYan, Zheng
The constraint for C-Box event 0x1f should have overlap flag set. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340866596-22502-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Use 0xff as pseudo code for fixed uncore eventYan, Zheng
Stephane Eranian suggestted using 0xff as pseudo code for fixed uncore event and using the umask value to determine which of the fixed events we want to map to. So far there is at most one fixed counter in a uncore PMU. So just change the definition of UNCORE_FIXED_EVENT to 0xff. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340780953-21130-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Save a few bytes in 'struct x86_pmu'Peter Zijlstra
All these are basically boolean flags, use a bitfield to save a few bytes. Suggested-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-vsevd5g8lhcn129n3s7trl7r@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Add a microcode revision check for SNB-PEBSPeter Zijlstra
Recent Intel microcode resolved the SNB-PEBS issues, so conditionally enable PEBS on SNB hardware depending on the microcode revision. Thanks to Stephane for figuring out the various microcode revisions. Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-v3672ziwh9damwqwh1uz3krm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Improve debug output in check_hw_exists()Robert Richter
It might be of interest which perfctr msr failed. Signed-off-by: Robert Richter <robert.richter@amd.com> [ added hunk to avoid GCC warn ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-5-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86/amd: Unify AMD's generic and family 15h pmusRobert Richter
There is no need for keeping separate pmu structs. We can enable amd_{get,put}_event_constraints() functions also for family 15h event. The advantage is that there is only a single pmu struct for all AMD cpus. This patch introduces functions to setup the pmu to enabe core performance counters or counter constraints. Also, cpuid checks are used instead of family checks where possible. Thus, it enables the code independently of cpu families if the feature flag is set. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Move Intel specific code to intel_pmu_init()Robert Richter
There is some Intel specific code in the generic x86 path. Move it to intel_pmu_init(). Since p4 and p6 pmus don't have fixed counters we may skip the check in case such a pmu is detected. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Rename Intel specific macrosRobert Richter
There are macros that are Intel specific and not x86 generic. Rename them into INTEL_*. This patch removes X86_PMC_IDX_GENERIC and does: $ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g' \ arch/x86/include/asm/kvm_host.h \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_intel_ds.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'x86/microcode' into perf/coreIngo Molnar
Merge this branch because we want to rely on the newer (and saner) microcode loading and checking facilities. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'x86/cpu' into perf/coreIngo Molnar
Merge this branch because we changed the wrmsr*_safe() API and there's a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge this branch to pick up a fixlet and to update to a more recent base. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Fix USER/KERNEL tagging of samplesPeter Zijlstra
Several perf interrupt handlers (PEBS,IBS,BTS) re-write regs->ip but do not update the segment registers. So use an regs->ip based test instead of an regs->cs/regs->flags based test. Reported-and-tested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-xxrt0a1zronm1sm36obwc2vy@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-01x86, microcode: Make reload interface per systemBorislav Petkov
The reload interface should be per-system so that a full system ucode reload happens (on each core) when doing echo 1 > /sys/devices/system/cpu/microcode/reload Move it to the cpu subsys directory instead of it being per-cpu. Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1340280437-7718-3-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-01x86, microcode: Sanitize per-cpu microcode reloading interfaceBorislav Petkov
Microcode reloading in a per-core manner is a very bad idea for both major x86 vendors. And the thing is, we have such interface with which we can end up with different microcode versions applied on different cores of an otherwise homogeneous wrt (family,model,stepping) system. So turn off the possibility of doing that per core and allow it only system-wide. This is a minimal fix which we'd like to see in stable too thus the more-or-less arbitrary decision to allow system-wide reloading only on the BSP: $ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload ... and disable the interface on the other cores: $ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload -bash: echo: write error: Invalid argument Also, allowing the reload only from one CPU (the BSP in that case) doesn't allow the reload procedure to degenerate into an O(n^2) deal when triggering reloads from all /sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes simultaneously. A more generic fix will follow. Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
2012-06-30Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull ACPI & Power Management patches from Len Brown. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: acpi_pad: fix power_saving thread deadlock ACPI video: Still use ACPI backlight control if _DOS doesn't exist ACPI, APEI, Avoid too much error reporting in runtime ACPI: Add a quirk for "AMILO PRO V2030" to ignore the timer overriding ACPI: Remove one board specific WARN when ignoring timer overriding ACPI: Make acpi_skip_timer_override cover all source_irq==0 cases ACPI, x86: fix Dell M6600 ACPI reboot regression via DMI ACPI sysfs.c strlen fix
2012-06-30Merge branches 'acpi_pad-bugzilla-42981', 'apei-bugzilla-43282', ↵Len Brown
'video-bugzilla-43168', 'bugzilla-40002' and 'bugfix-misc' into release bug fixes
2012-06-29x86/copy_user_generic: Optimize copy_user_generic with CPU erms featureFenghua Yu
According to Intel 64 and IA-32 SDM and Optimization Reference Manual, beginning with Ivybridge, REG string operation using MOVSB and STOSB can provide both flexible and high-performance REG string operations in cases like memory copy. Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/ STOSB). If CPU erms feature is detected, patch copy_user_generic with enhanced fast string version of copy_user_generic. A few new macros are defined to reduce duplicate code in ALTERNATIVE and ALTERNATIVE_2. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1337908785-14015-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-29Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: Remove stray %s, add -w to mkcapflags.pl x86, cpufeature: Catch duplicate CPU feature strings x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM x86: Fix kernel-doc warnings x86, compat: Use test_thread_flag(TIF_IA32) in compat signal delivery
2012-06-27x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTORAlex Shi
There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big amount of vector in IDT. But it is still not enough, since modern x86 sever has more cpu number. That still causes heavy lock contention in TLB flushing. The patch using generic smp call function to replace it. That saved 32 vector number in IDT, and resolved the lock contention in TLB flushing on large system. In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread has 3% performance increase. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27x86/tlb: add tlb_flushall_shift for specific CPUAlex Shi
Testing show different CPU type(micro architectures and NUMA mode) has different balance points between the TLB flush all and multiple invlpg. And there also has cases the tlb flush change has no any help. This patch give a interface to let x86 vendor developers have a chance to set different shift for different CPU type. like some machine in my hands, balance points is 16 entries on Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing help. For untested machine, do a conservative optimization, same as NHM CPU. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27x86/tlb_info: get last level TLB entry number of CPUAlex Shi
For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and instruction TLB, second level is shared TLB for both data and instructions. For hupe page TLB, usually there is just one level and seperated by 2MB/4MB and 1GB. Although each levels TLB size is important for performance tuning, but for genernal and rude optimizing, last level TLB entry number is suitable. And in fact, last level TLB always has the biggest entry number. This patch will get the biggest TLB entry number and use it in furture TLB optimizing. Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other function and data will be released after system boot up. For all kinds of x86 vendor friendly, vendor specific code was moved to its specific files. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-26x86, cpufeature: Remove stray %s, add -w to mkcapflags.plH. Peter Anvin
There was a stray %s left from testing, remove it. Add -w to the #! line (which is parsed by Perl even if the Perl interpreter is invoked explicitly on the command line) to catch these kinds of errors in the future. Reported-by: Jean Delvare <khali@linux-fr.org> Link: http://lkml.kernel.org/r/20120626143246.0c9bf301@endymion.delvare Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-25x86, cpufeature: Catch duplicate CPU feature stringsH. Peter Anvin
We had a case of duplicate CPU feature strings, a user space ABI violation, for almost two years. Make it a build error so that doesn't happen again. Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com Cc: Jan Beulich <JBeulich@suse.com> Cc: Jean Delvare <khali@linux-fr.org>
2012-06-25x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERMH. Peter Anvin
It makes sense to label "Digital Thermal Sensor" as "DTS", but unfortunately the string "dts" was already used for "Debug Store", and /proc/cpuinfo is a user space ABI. Therefore, rename this to "dtherm". This conflict went into mainline via the hwmon tree without any x86 maintainer ack, and without any kind of hint in the subject. a4659053 x86/hwmon: fix initialization of coretemp Reported-by: Jean Delvare <khali@linux-fr.org> Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com Cc: Jan Beulich <JBeulich@suse.com> Cc: <stable@vger.kernel.org> v2.6.36..v3.4 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-25iommu: Remove group_mfAlex Williamson
The iommu=group_mf is really no longer needed with the addition of ACS support in IOMMU drivers creating groups. Most multifunction devices will now be grouped already. If a device has gone to the trouble of exposing ACS, trust that it works. We can use the device specific ACS function for fixing devices we trust individually. This largely reverts bcb71abe. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-06-25KVM guest: guest side for eoi avoidanceMichael S. Tsirkin
The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. Guest tests it using a single est and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. I run a simple microbenchmark to show exit reduction (note: for testing, need to apply follow-up patch 'kvm: host side for eoi optimization' + a qemu patch I posted separately, on host): Before: Performance counter stats for 'sleep 1s': 47,357 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 5,001 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 22,124 kvm:kvm_apic [99.98%] 49,849 kvm:kvm_exit [99.98%] 21,115 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 22,937 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.98%] 22,207 kvm:kvm_apic_accept_irq [99.98%] 22,421 kvm:kvm_eoi [99.98%] 0 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 57 kvm:kvm_emulate_insn [99.99%] 0 kvm:vcpu_match_mmio [99.99%] 0 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 23,609 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [99.99%] 226 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002100578 seconds time elapsed After: Performance counter stats for 'sleep 1s': 28,354 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 1,347 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 1,931 kvm:kvm_apic [99.98%] 29,595 kvm:kvm_exit [99.98%] 24,884 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 1,986 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.99%] 25,953 kvm:kvm_apic_accept_irq [99.99%] 26,132 kvm:kvm_eoi [99.99%] 26,593 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 284 kvm:kvm_emulate_insn [99.99%] 68 kvm:vcpu_match_mmio [99.99%] 68 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 28,288 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [100.00%] 588 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002039622 seconds time elapsed We see that # of exits is almost halved. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-25perf/x86: Fix section mismatch in uncore_pci_init()Robert Richter
Fix section mismatch in uncore_pci_init(): WARNING: vmlinux.o(.init.text+0x9246): Section mismatch in reference from the function uncore_pci_init() to the function .devexit.text:uncore_pci_remove() The function __init uncore_pci_init() references a function __devexit uncore_pci_remove(). [...] Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: <a.p.zijlstra@chello.nl> Cc: <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/20120620163927.GI5046@erda.amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20x86, reboot: Drop redundant write of reboot_modeH. Peter Anvin
We write reboot_mode to BIOS location 0x472 in native_machine_emergency_restart() (reboot.c:542) already, there is no need to then write it again in machine_real_restart(). This means nothing gets written there for MRR_APM, but the APM call is a poweroff call and doesn't use this memory location. Link: http://lkml.kernel.org/n/tip-3i0pfh44c1e3jv5lab0cf7sc@git.kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-20x86: Move call to print_modules() out of show_regs()Jan Beulich
Printing the list of loaded modules is really unrelated to what this function is about, and is particularly unnecessary in the context of the SysRQ key handling (gets printed so far over and over). It should really be the caller of the function to decide whether this piece of information is useful (and to avoid redundantly printing it). Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4FDF21A4020000780008A67F@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20x86/microcode: Mark microcode_id[] as __initconstJan Beulich
It's not being used for other than creating module aliases (i.e. no loadable section has any reference to it). Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4FDF1EFD020000780008A65D@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20x86/nmi: Clean up register_nmi_handler() usageLi Zhong
Implement a cleaner and easier to maintain version for the section warning fixes implemented in commit eeaaa96a3a21 ("x86/nmi: Fix section mismatch warnings on 32-bit"). Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Jan Beulich <JBeulich@suse.com> Link: http://lkml.kernel.org/r/1340049393-17771-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20Merge commit 'v3.5-rc3' into x86/debugIngo Molnar
Merge it in to pick up a fix that we are going to clean up in this branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18perf/x86: Lowercase uncore PMU event namesPeter Zijlstra
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-ucnds8gkve4x3s4biuukyph3@git.kernel.org [ Trivial build fix ] Signed-off-by: Ingo Molnar <mingo@kernel.org>