Age | Commit message (Collapse) | Author |
|
In protected nVHE mode the host cannot directly access
hypervisor memory, so we will dump the hypervisor stacktrace
to a shared buffer with the host.
The minimum size for the buffer required, assuming the min frame
size of [x29, x30] (2 * sizeof(long)), is half the combined size of
the hypervisor and overflow stacks plus an additional entry to
delimit the end of the stacktrace.
The stacktrace buffers are used later in the series to dump the
nVHE hypervisor stacktrace when using protected-mode.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-14-kaleshsingh@google.com
|
|
This can be used to disable stacktrace for the protected KVM
nVHE hypervisor, in order to save on the associated memory usage.
This option is disabled by default, since protected KVM is not widely
used on platforms other than Android currently.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-13-kaleshsingh@google.com
|
|
In non-protected nVHE mode, unwinds and dumps the hypervisor backtrace
from EL1. This is possible beacause the host can directly access the
hypervisor stack pages in non-protected mode.
The nVHE backtrace is dumped on hyp_panic(), before panicking the host.
[ 101.498183] kvm [377]: nVHE call trace:
[ 101.498363] kvm [377]: [<ffff8000090a6570>] __kvm_nvhe_hyp_panic+0xac/0xf8
[ 101.499045] kvm [377]: [<ffff8000090a65cc>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
[ 101.499498] kvm [377]: [<ffff8000090a61e4>] __kvm_nvhe_recursive_death+0x24/0x34
. . .
[ 101.524929] kvm [377]: [<ffff8000090a61e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 101.525062] kvm [377]: [<ffff8000090a61e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 101.525195] kvm [377]: [<ffff8000090a5de4>] __kvm_nvhe___kvm_vcpu_run+0x30/0x40c
[ 101.525333] kvm [377]: [<ffff8000090a8b64>] __kvm_nvhe_handle___kvm_vcpu_run+0x30/0x48
[ 101.525468] kvm [377]: [<ffff8000090a88b8>] __kvm_nvhe_handle_trap+0xc4/0x128
[ 101.525602] kvm [377]: [<ffff8000090a7864>] __kvm_nvhe___host_exit+0x64/0x64
[ 101.525745] kvm [377]: ---[ end nVHE call trace ]---
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-12-kaleshsingh@google.com
|
|
Implements the common framework necessary for unwind() to work
for non-protected nVHE mode:
- on_accessible_stack()
- on_overflow_stack()
- unwind_next()
Non-protected nVHE unwind() is used to unwind and dump the hypervisor
stacktrace by the host in EL1
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-11-kaleshsingh@google.com
|
|
In non-protected nVHE mode (non-pKVM) the host can directly access
hypervisor memory; and unwinding of the hypervisor stacktrace is
done from EL1 to save on memory for shared buffers.
To unwind the hypervisor stack from EL1 the host needs to know the
starting point for the unwind and information that will allow it to
translate hypervisor stack addresses to the corresponding kernel
addresses. This patch sets up this book keeping. It is made use of
later in the series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-10-kaleshsingh@google.com
|
|
Add stub implementations of non-protected nVHE stack unwinder, for
building. These are implemented later in this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-9-kaleshsingh@google.com
|
|
On hyp stack overflow switch to 16-byte aligned secondary stack.
This provides us stack space to better handle overflows; and is
used in a subsequent patch to dump the hypervisor stacktrace.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-8-kaleshsingh@google.com
|
|
Add brief description on how to use stacktrace/common.h to implement
a stack unwinder.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-7-kaleshsingh@google.com
|
|
Move unwind() to stacktrace/common.h, and as a result
the kernel unwind_next() to asm/stacktrace.h. This allow
reusing unwind() in the implementation of the nVHE HYP
stack unwinder, later in the series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-6-kaleshsingh@google.com
|
|
The unwinder code is made reusable so that it can be used to
unwind various types of stacks. One usecase is unwinding the
nVHE hyp stack from the host (EL1) in non-protected mode. This
means that the unwinder must be able to translate HYP stack
addresses to kernel addresses.
Add a callback (stack_trace_translate_fp_fn) to allow specifying
the translation function.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-5-kaleshsingh@google.com
|
|
Move common unwind_next logic to stacktrace/common.h. This allows
reusing the code in the implementation the nVHE hypervisor stack
unwinder, later in this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-4-kaleshsingh@google.com
|
|
Move common on_accessible_stack checks to stacktrace/common.h. This is
used in the implementation of the nVHE hypervisor unwinder later in
this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-3-kaleshsingh@google.com
|
|
In order to reuse the arm64 stack unwinding logic for the nVHE
hypervisor stack, move the common code to a shared header
(arch/arm64/include/asm/stacktrace/common.h).
The nVHE hypervisor cannot safely link against kernel code, so we
make use of the shared header to avoid duplicated logic later in
this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-2-kaleshsingh@google.com
|
|
Copy the task argument passed to arch_stack_walk() to unwind_state so that
it can be passed to unwind functions via unwind_state rather than as a
separate argument. The task is a fundamental part of the unwind state.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-3-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
unwind_init() is currently a single function that initializes all of the
unwind state. Split it into the following functions and call them
appropriately:
- unwind_init_from_regs() - initialize from regs passed by caller.
- unwind_init_from_caller() - initialize for the current task
from the caller of arch_stack_walk().
- unwind_init_from_task() - initialize from the saved state of a
task other than the current task. In this case, the other
task must not be running.
This is done for two reasons:
- the different ways of initializing are clear
- specialized code can be added to each initializer in the future.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-2-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Use the non-atomic version of set_bit() in arch/arm64/kernel/stacktrace.c,
as there is no concurrent accesses to frame->prev_type.
This speeds up stack trace collection and improves the boot time of
Generic KASAN by 2-5%.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/23dfa36d1cc91e4a1059945b7834eac22fb9854d.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Disable KASAN instrumentation of arch/arm64/kernel/stacktrace.c.
This speeds up Generic KASAN by 5-20%.
As a side-effect, KASAN is now unable to detect bugs in the stack trace
collection code. This is taken as an acceptable downside.
Also replace READ_ONCE_NOCHECK() with READ_ONCE() in stacktrace.c.
As the file is now not instrumented, there is no need to use the
NOCHECK version of READ_ONCE().
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/c4c944a2a905e949760fbeb29258185087171708.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Make RESERVE_BRK() work again with older binutils. The recent
'simplification' broke that.
- Make early #VE handling increment RIP when successful.
- Make the #VE code consistent vs. the RIP adjustments and add
comments.
- Handle load_unaligned_zeropad() across page boundaries correctly in
#VE when the second page is shared.
* tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
x86/tdx: Clarify RIP adjustments in #VE handler
x86/tdx: Fix early #VE handling
x86/mm: Fix RESERVE_BRK() for older binutils
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull build tooling updates from Thomas Gleixner:
- Remove obsolete CONFIG_X86_SMAP reference from objtool
- Fix overlapping text section failures in faddr2line for real
- Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
with finegrained annotations so objtool can validate that code
correctly.
* tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage
faddr2line: Fix overlapping text section failures, the sequel
objtool: Fix obsolete reference to CONFIG_X86_SMAP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A single scheduler fix plugging a race between sched_setscheduler()
and balance_push().
sched_setscheduler() spliced the balance callbacks accross a lock
break which makes it possible for an interleaving schedule() to
observe an empty list"
* tag 'sched-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix balance_push() vs __sched_setscheduler()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull lockdep fix from Thomas Gleixner:
"A RT fix for lockdep.
lockdep invokes prandom_u32() to create cookies. This worked until
prandom_u32() was switched to the real random generator, which takes a
spinlock for extraction, which does not work on RT when invoked from
atomic contexts.
lockdep has no requirement for real random numbers and it turns out
sched_clock() is good enough to create the cookie. That works
everywhere and is faster"
* tag 'locking-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Use sched_clock() for random numbers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of interrupt subsystem updates:
Core:
- Ensure runtime power management for chained interrupts
Drivers:
- A collection of OF node refcount fixes
- Unbreak MIPS uniprocessor builds
- Fix xilinx interrupt controller Kconfig dependencies
- Add a missing compatible string to the Uniphier driver"
* tag 'irq-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/loongson-liointc: Use architecture register to get coreid
irqchip/uniphier-aidet: Add compatible string for NX1 SoC
dt-bindings: interrupt-controller/uniphier-aidet: Add bindings for NX1 SoC
irqchip/realtek-rtl: Fix refcount leak in map_interrupts
irqchip/gic-v3: Fix refcount leak in gic_populate_ppi_partitions
irqchip/gic-v3: Fix error handling in gic_populate_ppi_partitions
irqchip/apple-aic: Fix refcount leak in aic_of_ic_init
irqchip/apple-aic: Fix refcount leak in build_fiq_affinity
irqchip/gic/realview: Fix refcount leak in realview_gic_of_init
irqchip/xilinx: Remove microblaze+zynq dependency
genirq: PM: Use runtime PM for chained interrupts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes for real from Greg KH:
"Let's tag the proper branch this time...
Here are some small char/misc driver fixes for 5.19-rc3 that resolve
some reported issues.
They include:
- mei driver fixes
- comedi driver fix
- rtsx build warning fix
- fsl-mc-bus driver fix
All of these have been in linux-next for a while with no reported
issues"
This is what the merge in commit f0ec9c65a8d6 _should_ have merged, but
Greg fat-fingered the pull request and I got some small changes from
linux-next instead there. Credit to Nathan Chancellor for eagle-eyes.
Link: https://lore.kernel.org/all/Yqywy+Md2AfGDu8v@dev-arch.thelio-3990X/
* tag 'char-misc-5.19-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
bus: fsl-mc-bus: fix KASAN use-after-free in fsl_mc_bus_remove()
mei: me: add raptor lake point S DID
mei: hbm: drop capability response on early shutdown
mei: me: set internal pg flag to off on hardware reset
misc: rtsx: Fix clang -Wsometimes-uninitialized in rts5261_init_from_hw()
comedi: vmk80xx: fix expression for tx buffer size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"MAINTAINERS rectifications and a few minor driver fixes"
* tag 'i2c-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mediatek: Fix an error handling path in mtk_i2c_probe()
i2c: designware: Use standard optional ref clock implementation
MAINTAINERS: core DT include belongs to core
MAINTAINERS: add include/dt-bindings/i2c to I2C SUBSYSTEM HOST DRIVERS
i2c: npcm7xx: Add check for platform_driver_register
MAINTAINERS: Update Synopsys DesignWare I2C to Supported
|
|
Pull xfs fixes from Darrick Wong:
"There's not a whole lot this time around (I'm still on vacation) but
here are some important fixes for new features merged in -rc1:
- Fix a bug where inode flag changes would accidentally drop nrext64
- Fix a race condition when toggling LARP mode"
* tag 'xfs-5.19-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: preserve DIFLAG2_NREXT64 when setting other inode attributes
xfs: fix variable state usage
xfs: fix TOCTOU race involving the new logged xattrs control knob
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a variety of bugs, many of which were found by folks using fuzzing
or error injection.
Also fix up how test_dummy_encryption mount option is handled for the
new mount API.
Finally, fix/cleanup a number of comments and ext4 Documentation
files"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix a doubled word "need" in a comment
ext4: add reserved GDT blocks check
ext4: make variable "count" signed
ext4: correct the judgment of BUG in ext4_mb_normalize_request
ext4: fix bug_on ext4_mb_use_inode_pa
ext4: fix up test_dummy_encryption handling for new mount API
ext4: use kmemdup() to replace kmalloc + memcpy
ext4: fix super block checksum incorrect after mount
ext4: improve write performance with disabled delalloc
ext4: fix warning when submitting superblock in ext4_commit_super()
ext4, doc: remove unnecessary escaping
ext4: fix incorrect comment in ext4_bio_write_page()
fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment
|
|
Pull cifs client fixes from Steve French:
"Two cifs debugging improvements - one found to deal with debugging a
multichannel problem and one for a recent fallocate issue
This does include the two larger multichannel reconnect (dynamically
adjusting interfaces on reconnect) patches, because we recently found
an additional problem with multichannel to one server type that I want
to include at the same time"
* tag '5.19-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: when a channel is not found for server, log its connection id
smb3: add trace point for SMB2_set_eof
|
|
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220605091503.12513-1-wangxiang@cdjrlc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We capture a NULL pointer issue when resizing a corrupt ext4 image which
is freshly clear resize_inode feature (not run e2fsck). It could be
simply reproduced by following steps. The problem is because of the
resize_inode feature was cleared, and it will convert the filesystem to
meta_bg mode in ext4_resize_fs(), but the es->s_reserved_gdt_blocks was
not reduced to zero, so could we mistakenly call reserve_backup_gdb()
and passing an uninitialized resize_inode to it when adding new group
descriptors.
mkfs.ext4 /dev/sda 3G
tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
mount /dev/sda /mnt
resize2fs /dev/sda 8G
========
BUG: kernel NULL pointer dereference, address: 0000000000000028
CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748
...
RIP: 0010:ext4_flex_group_add+0xe08/0x2570
...
Call Trace:
<TASK>
ext4_resize_fs+0xbec/0x1660
__ext4_ioctl+0x1749/0x24e0
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xa6/0x110
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2dd739617b
========
The fix is simple, add a check in ext4_resize_begin() to make sure that
the es->s_reserved_gdt_blocks is zero when the resize_inode feature is
disabled.
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since dx_make_map() may return -EFSCORRUPTED now, so change "count" to
be a signed integer so we can correctly check for an error code returned
by dx_make_map().
Fixes: 46c116b920eb ("ext4: verify dir block before splitting it")
Cc: stable@kernel.org
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4_mb_normalize_request() can move logical start of allocated blocks
to reduce fragmentation and better utilize preallocation. However logical
block requested as a start of allocation (ac->ac_o_ex.fe_logical) should
always be covered by allocated blocks so we should check that by
modifying and to or in the assertion.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
ext4_mb_new_blocks+0x9df/0x5d30
ext4_ext_map_blocks+0x1803/0x4d80
ext4_map_blocks+0x3a4/0x1a10
ext4_writepages+0x126d/0x2c30
do_writepages+0x7f/0x1b0
__filemap_fdatawrite_range+0x285/0x3b0
file_write_and_wait_range+0xb1/0x140
ext4_sync_file+0x1aa/0xca0
vfs_fsync_range+0xfb/0x260
do_fsync+0x48/0xa0
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
do_fsync
vfs_fsync_range
ext4_sync_file
file_write_and_wait_range
__filemap_fdatawrite_range
do_writepages
ext4_writepages
mpage_map_and_submit_extent
mpage_map_one_extent
ext4_map_blocks
ext4_mb_new_blocks
ext4_mb_normalize_request
>>> start + size <= ac->ac_o_ex.fe_logical
ext4_mb_regular_allocator
ext4_mb_simple_scan_group
ext4_mb_use_best_found
ext4_mb_new_preallocation
ext4_mb_new_inode_pa
ext4_mb_use_inode_pa
>>> set ac->ac_b_ex.fe_len <= 0
ext4_mb_mark_diskspace_used
>>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
we can easily reproduce this problem with the following commands:
`fallocate -l100M disk`
`mkfs.ext4 -b 1024 -g 256 disk`
`mount disk /mnt`
`fsstress -d /mnt -l 0 -n 1000 -p 1`
The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.
Cc: stable@kernel.org
Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since ext4 was converted to the new mount API, the test_dummy_encryption
mount option isn't being handled entirely correctly, because the needed
fscrypt_set_test_dummy_encryption() helper function combines
parsing/checking/applying into one function. That doesn't work well
with the new mount API, which split these into separate steps.
This was sort of okay anyway, due to the parsing logic that was copied
from fscrypt_set_test_dummy_encryption() into ext4_parse_param(),
combined with an additional check in ext4_check_test_dummy_encryption().
However, these overlooked the case of changing the value of
test_dummy_encryption on remount, which isn't allowed but ext4 wasn't
detecting until ext4_apply_options() when it's too late to fail.
Another bug is that if test_dummy_encryption was specified multiple
times with an argument, memory was leaked.
Fix this up properly by using the new helper functions that allow
splitting up the parse/check/apply steps for test_dummy_encryption.
Fixes: cebe85d570cf ("ext4: switch to the new mount api")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220526040412.173025-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Replace kmalloc + memcpy with kmemdup()
Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525030120.803330-1-zhangshuqi3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got issue as follows:
[home]# mount /dev/sda test
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[home]# dmesg
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda): Errors on filesystem, clearing orphan list.
EXT4-fs (sda): recovery complete
EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none.
[home]# debugfs /dev/sda
debugfs 1.46.5 (30-Dec-2021)
Checksum errors in superblock! Retrying...
Reason is ext4_orphan_cleanup will reset ‘s_last_orphan’ but not update
super block checksum.
To solve above issue, defer update super block checksum after
ext4_orphan_cleanup.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525012904.1604737-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
cifs_ses_get_chan_index gets the index for a given server pointer.
When a match is not found, we warn about a possible bug.
However, printing details about the non-matching server could be
more useful to debug here.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.
In TDX guests, the second page can be shared page and a VMM may configure
it to trigger #VE.
The kernel assumes that #VE on a shared page is an MMIO access and tries to
decode instruction to handle it. In case of load_unaligned_zeropad() it
may result in confusion as it is not MMIO access.
Fix it by detecting split page MMIO accesses and failing them.
load_unaligned_zeropad() will recover using exception fixups.
The issue was discovered by analysis and reproduced artificially. It was
not triggered during testing.
[ dhansen: fix up changelogs and comments for grammar and clarity,
plus incorporate Kirill's off-by-one fix]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220614120135.14812-4-kirill.shutemov@linux.intel.com
|
|
Pull NFS client fixes from Anna Schumaker:
- Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail
- Fix trunking detection & cl_max_connect setting
- Avoid pnfs_update_layout() livelocks
- Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE
* tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file
sunrpc: set cl_max_connect when cloning an rpc_clnt
pNFS: Avoid a live lock condition in pnfs_update_layout()
pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert clipping of PCI host bridge windows to avoid E820 regions,
which broke several machines by forcing unnecessary BAR reassignments
(Hans de Goede)"
* tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
"Make the global console_sem available for CPU that is handling panic()
or shutdown.
This is an old problem when an existing console lock owner might block
console output, but it became more visible with the kthreads"
* tag 'printk-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Wait for the global console lock when the system is going down
printk: Block console kthreads when direct printing will be required
|
|
This reverts commit 4c5e242d3e93.
Prior to 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820
regions"), E820 regions did not affect PCI host bridge windows. We only
looked at E820 regions and avoided them when allocating new MMIO space.
If firmware PCI bridge window and BAR assignments used E820 regions, we
left them alone.
After 4c5e242d3e93, we removed E820 regions from the PCI host bridge
windows before looking at BARs, so firmware assignments in E820 regions
looked like errors, and we moved things around to fit in the space left
(if any) after removing the E820 regions. This unnecessary BAR
reassignment broke several machines.
Guilherme reported that Steam Deck fails to boot after 4c5e242d3e93. We
clipped the window that contained most 32-bit BARs:
BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved
acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff]
which forced us to reassign all those BARs, for example, this NVMe BAR:
pci 0000:00:01.2: PCI bridge to [bus 01]
pci 0000:00:01.2: bridge window [mem 0x80600000-0x806fffff]
pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit]
pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window
pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit]
All the reassignments were successful, so the devices should have been
functional at the new addresses, but some were not.
Andy reported a similar failure on an Intel MID platform. Benjamin
reported a similar failure on a VMWare Fusion VM.
Note: this is not a clean revert; this revert keeps the later change to
make the clipping dependent on a new pci_use_e820 bool, moving the checking
of this bool to arch_remove_reservations().
[bhelgaas: commit log, add more reporters and testers]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Reported-by: Jongman Heo <jongman.heo@gmail.com>
Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Revert the moving of the jump labels initialisation before
setup_machine_fdt(). The bug was fixed in drivers/char/random.c.
- Ftrace fixes: branch range check and consistent handling of PLTs.
- Clean rather than invalidate FROM_DEVICE buffers at start of DMA
transfer (safer if such buffer is mapped in user space). A cache
invalidation is done already at the end of the transfer.
- A couple of clean-ups (unexport symbol, remove unused label).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
arm64/cpufeature: Unexport set_cpu_feature()
arm64: ftrace: remove redundant label
arm64: ftrace: consistently handle PLTs.
arm64: ftrace: fix branch range checks
Revert "arm64: Initialize jump labels before setup_machine_fdt()"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Add missing ELF_DETAILS in vmlinux.lds.S and fix document rendering"
* tag 'loongarch-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
docs/zh_CN/LoongArch: Fix notes rendering by using reST directives
docs/LoongArch: Fix notes rendering by using reST directives
LoongArch: vmlinux.lds.S: Add missing ELF_DETAILS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the PolarFire SOC's device tree
- A handful of fixes for the recently added Svpmbt support
- An improvement to the Kconfig text for Svpbmt
* tag 'riscv-for-linus-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
riscv: drop cpufeature_apply_feature tracking variable
riscv: fix dependency for t-head errata
riscv: dts: microchip: re-add pdma to mpfs device tree
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix hv_init_clocksource annotation (Masahiro Yamada)
- Two bug fixes for vmbus driver (Saurabh Sengar)
- Fix SEV negotiation (Tianyu Lan)
- Fix comments in code (Xiang Wang)
- One minor fix to HID driver (Michael Kelley)
* tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM
Drivers: hv: vmbus: Release cpu lock in error case
HID: hyperv: Correctly access fields declared as __le16
clocksource: hyper-v: unexport __init-annotated hv_init_clocksource()
Drivers: hv: Fix syntax errors in comments
Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph
- Quirks, quirks, quirks to work around buggy consumer grade
devices (Keith Bush, Ning Wang, Stefan Reiter, Rasheed Hsueh)
- Better kernel messages for devices that need quirking (Keith
Bush)
- Make a kernel message more useful (Thomas Weißschuh)
- MD pull request from Song, with a few fixes
- blk-mq sysfs locking fixes (Ming)
- BFQ stats fix (Bart)
- blk-mq offline queue fix (Bart)
- blk-mq flush request tag fix (Ming)
* tag 'block-5.19-2022-06-16' of git://git.kernel.dk/linux-block:
block/bfq: Enable I/O statistics
blk-mq: don't clear flush_rq from tags->rqs[]
blk-mq: avoid to touch q->elevator without any protection
blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
block: Fix handling of offline queues in blk_mq_alloc_request_hctx()
md/raid5-ppl: Fix argument order in bio_alloc_bioset()
Revert "md: don't unregister sync_thread with reconfig_mutex held"
nvme-pci: disable write zeros support on UMIC and Samsung SSDs
nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDs
nvme-pci: sk hynix p31 has bogus namespace ids
nvme-pci: smi has bogus namespace ids
nvme-pci: phison e12 has bogus namespace ids
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S50
nvme-pci: add trouble shooting steps for timeouts
nvme: add bug report info for global duplicate id
nvme: add device name to warning in uuid_show()
|
|
Pull io_uring fixes from Jens Axboe:
"Bigger than usual at this time, both because we missed -rc2, but also
because of some reverts that we chose to do. In detail:
- Adjust mapped buffer API while we still can (Dylan)
- Mapped buffer fixes (Dylan, Hao, Pavel, me)
- Fix for uring_cmd wrong API usage for task_work (Dylan)
- Fix for bug introduced in fixed file closing (Hao)
- Fix race in buffer/file resource handling (Pavel)
- Revert the NOP support for CQE32 and buffer selection that was
brought up during the merge window (Pavel)
- Remove IORING_CLOSE_FD_AND_FILE_SLOT introduced in this merge
window. The API needs further refining, so just yank it for now and
we'll revisit for a later kernel.
- Series cleaning up the CQE32 support added in this merge window,
making it more integrated rather than sitting on the side (Pavel)"
* tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block: (21 commits)
io_uring: recycle provided buffer if we punt to io-wq
io_uring: do not use prio task_work_add in uring_cmd
io_uring: commit non-pollable provided mapped buffers upfront
io_uring: make io_fill_cqe_aux honour CQE32
io_uring: remove __io_fill_cqe() helper
io_uring: fix ->extra{1,2} misuse
io_uring: fill extra big cqe fields from req
io_uring: unite fill_cqe and the 32B version
io_uring: get rid of __io_fill_cqe{32}_req()
io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT
Revert "io_uring: add buffer selection support to IORING_OP_NOP"
Revert "io_uring: support CQE32 for nop operation"
io_uring: limit size of provided buffer ring
io_uring: fix types in provided buffer ring
io_uring: fix index calculation
io_uring: fix double unlock for pbuf select
io_uring: kbuf: fix bug of not consuming ring buffer in partial io case
io_uring: openclose: fix bug of closing wrong fixed file
io_uring: fix not locked access to fixed buf table
io_uring: fix races with buffer table unregister
...
|
|
Invalidating the buffer memory in arch_sync_dma_for_device() for
FROM_DEVICE transfers
When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU
cachelines so that they will not be written back during the transfer and
corrupt the buffer contents written by the DMA. This, however, poses two
potential problems:
(1) If the DMA transfer does not write to every byte in the buffer,
then the unwritten bytes will contain stale data once the transfer
has completed.
(2) If the buffer has a virtual alias in userspace, then stale data
may be visible via this alias during the period between performing
the cache invalidation and the DMA writes landing in memory.
Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull writeback and ext2 fixes from Jan Kara:
"A fix for writeback bug which prevented machines with kdevtmpfs from
booting and also one small ext2 bugfix in IO error handling"
* tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
init: Initialize noop_backing_dev_info early
ext2: fix fs corruption when trying to remove a non-empty directory with IO error
|