Age | Commit message (Collapse) | Author |
|
[ Upstream commit c7103f650a11328f28b9fa1c95027db331b7774b ]
Broadwell made a small change to the rank target register moving the
target rank ID field up from bits 16:19 to bits 20:23.
Also found that the offset field grew by one bit in the IVY_BRIDGE to
HASWELL transition, so fix the RIR_OFFSET() macro too.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org # v3.19+
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit c4fc1956fa31003bfbe4f597e359d751568e2954 ]
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit ff15e95c82768d589957dbb17d7eb7dba7904659 ]
In commit:
eb1af3b71f9d ("Fix computation of channel address")
I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.
Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: eb1af3b71f9d ("Fix computation of channel address")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit eb1af3b71f9d83e45f2fd2fd649356e98e1c582c ]
Large memory Haswell-EX systems with multiple DIMMs per channel were
sometimes reporting the wrong DIMM.
Found three problems:
1) Debug printouts for socket and channel interleave were not interpreting
the register fields correctly. The socket interleave field is a 2^X
value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
2=3. 3=4).
2) Actual use of the socket interleave value didn't interpret as 2^X
3) Conversion of address to channel address was complicated, and wrong.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 6f3508f61c814ee852c199988a62bd954c50dfc1 ]
dct_sel_base_off is declared as a u64 but we're only using the lower 32
bits because of a shift wrapping bug. This can possibly truncate the
upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS
row.
Fixes: c8e518d5673d ('amd64_edac: Sanitize f10_get_base_addr_offset')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit fcd5c4dd8201595d4c598c9cca5e54760277d687 ]
EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.
Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.
EDAC i7core: Driver loaded.
general protection fault: 0000 [#1] SMP
CPU 12
Modules linked in:
Supported: Yes
Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
< ... regs ...>
Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
Stack:
...
Call Trace:
call_timer_fn
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
intel_idle
cpuidle_idle_call
cpu_idle
Code: ...
RIP __queue_work
RSP <...>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit 12e26969b32c79018165d52caff3762135614aa1 ]
I get the splat below when modprobing/rmmoding EDAC drivers. It happens
because bus->name is invalid after bus_unregister() has run. The Code: section
below corresponds to:
.loc 1 1108 0
movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
.loc 1 1109 0
popq %rbx #
.loc 1 1108 0
movq (%rax), %rdi # _7->name,
jmp kfree #
and %rax has some funky stuff 2030203020312030 which looks a lot like
something walked over it.
Fix that by saving the name ptr before doing stuff to string it points to.
general protection fault: 0000 [#1] SMP
Modules linked in: ...
CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48
Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
RIP: 0010:[<ffffffffa019da92>] [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292
RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
Stack:
Call Trace:
i7core_unregister_mci.isra.9
i7core_remove
pci_device_remove
__device_release_driver
driver_detach
bus_remove_driver
pci_unregister_driver
i7core_exit
SyS_delete_module
system_call_fastpath
0x7fc9bf426536
Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
RIP [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
RSP <ffff88030da3fe28>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: <stable@vger.kernel.org> # v3.6..
Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
commit 2900ea609616c2651dec65312beeb2a6e536bc50 upstream.
In commit
7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket")
NUM_CHANNELS was changed to 8 and the channel space was renumerated to
handle EN, EP, and EX configurations.
The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() -
got a new device presence check in the form of saw_chan_mask. However,
sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop.
With the increase in NUM_CHANNELS, this loop fails at index 4 since
SB only has 4 TADs. This results in the following error on SB machines:
EDAC sbridge: Some needed devices are missing
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handle
This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as
well.
After this patch:
EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5c16179b550b9fd8114637a56b153c9768ea06a5 upstream.
The commit
de3910eb79ac ("edac: change the mem allocation scheme to
make Documentation/kobject.txt happy")
changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.
Signed-off-by: Michael Walle <michael@walle.cc>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 75a15a7864c9e281c74a1670b10b69d1d7ff1c82 upstream.
Commit
debe6a623d3c ("MIPS: OCTEON: Update octeon-model.h code for new SoCs.")
renamed some SoC model helper functions, but forgot to update the EDAC
drivers resulting in build failures. Fix that.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/1435747132-10954-1-git-send-email-aaro.koskinen@nokia.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
of_device_id is always used as const. See driver.of_match_table and open
firmware functions.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Johannes Thumshirn <johannes.thumshirn@men.de>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Sören Brinkmann <soren.brinkmann@xilinx.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1426535685-25996-10-git-send-email-fabf@skynet.be
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
type T;
identifier f;
@@
static T f (...) { ... }
@@
identifier r.f;
declarer name EXPORT_SYMBOL_GPL;
@@
-EXPORT_SYMBOL_GPL(f);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tim Small <tim@buttersideup.com>
Link: http://lkml.kernel.org/r/1426092997-30605-13-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
... and do the proper thing using EDAC core facilities.
Cc: Daniel J Blueman <daniel@numascale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
edac_init() does not deallocate already allocated resources on failure
path.
Found by Linux Driver Verification project (linuxtesting.org).
[ Boris: The unwind path functions have __exit annotation but are being
used in an __init function, leading to section mismatches. Drop the
section annotation and make them normal functions. ]
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
... instead of manual device_create_file() and device_remove_file()
calls.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-9-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
... instead of manual device_create_file() and device_remove_file()
calls.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-8-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
... instead of manual device_create_file() and device_remove_file()
calls.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@men.de>
Link: http://lkml.kernel.org/r/1423046938-18111-7-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
... instead of manual device_create_file() and device_remove_file()
calls.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ Add NULL terminator to i7core_dev_attrs[] caught by the build robot. ]
Reported-by: Huang Ying <ying.huang@intel.com>
Link: http://lkml.kernel.org/r/1423046938-18111-6-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
... instead of possibly uninitialized return value.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-5-git-send-email-tiwai@suse.de
[ Add a commit message, albeit a small one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Instead of calling device_create_file() and device_remove_file()
manually, pass the static attribute groups with the new
edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs
files is done by a proper is_visible callback.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups. This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.
edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Instead of manual calls of device_create_file() and
device_remove_file(), use static attribute groups with proper
is_visible callbacks for managing the sysfs entries.
This simplifies the code a lot and avoids the possible races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The pci_dev_put() function tests whether its argument is NULL and thus
the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/54CFC12C.9010002@users.sourceforge.net
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull two EDAC fixes from Borislav Petkov:
- A fix to sb_edac for proper detection on SNB machines
- A fix to amd64_edac to not explode on Numascale machines with more
than 16 memory controllers, from Daniel J Blueman.
* tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
sb_edac: Fix detection on SNB machines
|
|
When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16),
the kernel fatally dereferences unallocated structures, see splat below;
this occurs on at least NumaConnect systems.
Fix by checking if a memory controller info structure was found.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in:
CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1
Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015
task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
Stack:
0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
Call Trace:
<IRQ>
? vprintk_default
? printk
amd_decode_mce
notifier_call_chain
atomic_notifier_call_chain
mce_log
machine_check_poll
mce_timer_fn
? mce_cpu_restart
call_timer_fn.isra.29
run_timer_softirq
__do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
<EOI>
? down_read_trylock
__do_page_fault
? __schedule
do_page_fault
page_fault
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
Cc: stable@vger.kernel.org
[ Boris: massage commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
d0585cd815fa ("sb_edac: Claim a different PCI device") changed the
probing of sb_edac to look for PCI device 0x3ca0:
3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
...
but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
in sbridge_probe() therefore the probing fails.
Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
.i.e., the 14.0 device, fixes the issue and driver loads successfully
again:
[ 2449.013120] EDAC DEBUG: sbridge_init:
[ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
[ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
[ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
...
Add a debug printk while at it to be able to catch the failure in the
future and dump driver version on successful load.
Fixes: d0585cd815fa ("sb_edac: Claim a different PCI device")
Cc: stable@vger.kernel.org # 3.18
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
If edac_mc_add_mc() fails then we should preserve the error code, but
instead the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20150128191351.GC10259@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fix sparse warnings.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Also use goto labels for all failure paths in
edac_create_sysfs_mci_device and update meaningless labels.
Signed-off-by: Junjie Mao <junjie.mao@hotmail.com>
Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl
[ Boris: Use ! for 0 checks and add newlines for less crammed code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Remove the function i5100_recmema_dm_buf_id() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Link: http://lkml.kernel.org/r/1420999030-21770-1-git-send-email-rickard_strandqvist@spectrumdigital.se
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Add EDAC support for ecc errors reporting on the synopsys ddr
controller. The ddr ecc controller corrects single bit errors and
detects double bit errors.
Selected important-ish notes from the changelog:
- I have not taken care of spliting synps_edac_geterror_info function as
it adds additional indentation levels and moreover the existing changes
were made as part of the v2 review comments
- Removed dt binding info as already there is a binding info available
under memorycontroller. so, updated ecc info there.
- Shortened the prefix "sysnopsys" to "synps"
Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Link: http://lkml.kernel.org/r/a728a8d4678f4dbf9de189a480297c3d@BY2FFO11FD034.protection.gbl
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
s/kenel/kernel/g
[ Boris: massage commit message a bit ]
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Johannes Thumshirn <johannes.thumshirn@men.de>
Link: http://lkml.kernel.org/r/1419749085-7128-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fixes the following sparse warnings:
drivers/edac/mce_amd_inj.c:204:3: warning:
symbol 'dfs_fls' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Link: http://lkml.kernel.org/r/1418087095-14174-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac updates from Mauro Carvalho Chehab:
- Broadwell-DE support on sb-edac driver
- Some fixes at sb-edac driver
* tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: Fix typo computing number of banks
sb_edac: Add support for Broadwell-DE processor
sb_edac: Fix discovery of top-of-low-memory for Haswell
sb_edac: Fix erroneous bytes->gigabytes conversion
sb_edac: Fix off-by-one error in number of channels
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
"The biggest change in this cycle is better support for UCNA
(UnCorrected No Action) events:
"Handle all uncorrected error reports in the same way (soft
offline the page). We used to only do that for SRAO
(software recoverable action optional) machine checks, but
it makes sense to also do it for UCNA (UnCorrected No
Action) logs found by CMCI or polling."
plus various x86 MCE handling updates and fixes"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Spell "panicked" correctly
x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
x86, MCE, AMD: Assign interrupt handler only when bank supports it
x86, MCE, AMD: Drop software-defined bank in error thresholding
x86, MCE, AMD: Move invariant code out from loop body
x86, MCE, AMD: Correct thresholding error logging
x86, MCE, AMD: Use macros to compute bank MSRs
RAS, HWPOISON: Fix wrong error recovery status
GHES: Make ghes_estatus_caches static
APEI, GHES: Cleanup unnecessary function for lockless list
|
|
Pull EDAC updates from Borislav Petkov:
"EDAC updates all over the place:
- Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM
support. Out of tree stuff is adding the required PCI IDs. From
Aravind Gopalakrishnan.
- Enable amd64_edac for 32-bit due to popular demand. From Tomasz
Pala.
- Convert the AMD MCE injection module to debugfs, where it belongs.
- Misc EDAC cleanups"
* tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, MCE, AMD: Correct formatting of decoded text
EDAC, mce_amd_inj: Add an injector function
EDAC, mce_amd_inj: Add hw-injection attributes
EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
EDAC: Delete unnecessary check before calling pci_dev_put()
EDAC, pci_sysfs: remove unneccessary ifdef around entire file
ghes_edac: Use snprintf() to silence a static checker warning
amd64_edac: Build module on x86-32
EDAC, MCE, AMD: Add decoding table for MC6 xec
amd64_edac: Add F15h M60h support
{mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED
EDAC: Sync memory types and names
EDAC: Add DDR3 LRDIMM entries to edac_mem_types
x86, amd_nb: Add device IDs to NB tables for F15h M60h
pci_ids: Add PCI device IDs for F15h M60h
|
|
Code will always think there are 16 banks because of a typo
Reported-by: Misha
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
|
|
Broadwell-DE is the microserver version of next generation Xeon
processors. A whole bunch of new PCIe device ids, but otherwise
pretty much the same as Haswell.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
|
|
Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset. There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).
This resulted in a bogus value for the top of low memory:
EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)
which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:
sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523eac86
sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)
With the fix we see the correct TOLM value:
DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)
and we decode address 2000000 correctly:
sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523e1086
sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
|
|
Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
|
|
This prevented edac sysfs code from properly handling 6 channels
per memory controller.
Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
|
|
Write out MCx_ADDR into the more humanly readable "MCx Error Address"
and remove double colon in the output.
Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Selectively inject either a real MCE or a sw-only version which
exercises the decoding path only. The hardware-injected MCE triggers a
machine check exception (#MC) so that the MCE handler can be bothered to
do something too.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Expose struct mce->inject_flags.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Normally, writing those causes a #GP but HWCR[McStatusWrEn] controls
that. Provide a knob.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This module's interface belongs in debugfs, not in sysfs.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
UCNA/DEFERRED error
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.
This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.
In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test before the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/546CB20D.4070808@users.sourceforge.net
[ Boris: commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The file edac_pci_sysfs.c is dependent on CONFIG_PCI. This is already
modelled in the Makefile, but edac_pci_sysfs.o is still contained in
the list of files compiled even without CONFIG_PCI.
This change removes edac_pci_sysfs.o from the list of built objects
when not having CONFIG_PCI enabled and removes the then-unnecessary
ifdef from the source file.
Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Link: http://lkml.kernel.org/r/1407697803-3837-1-git-send-email-rupran@einserver.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|