|
If the RCU_EXPERT Kconfig option is not set (the default), then the
RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause
the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems
and 64 on 64-bit systems. This can result in excessive lock contention.
This commit therefore changes the computation of the leaf-level rcu_node
tree fanout so that the result will be 16 unless an explicit Kconfig or
kernel-boot setting says otherwise.
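A sketch of the intended selection logic (the governing macros live in
kernel/rcu/tree.h; the exact in-tree form may differ):
    #ifdef CONFIG_RCU_FANOUT_LEAF
    #define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
    #else
    /* Formerly 32 (32-bit) or 64 (64-bit); now 16 to limit contention. */
    #define RCU_FANOUT_LEAF 16
    #endif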
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The rcu_all_qs() and rcu_note_context_switch() functions do a series
of checks, taking various actions to supply RCU with quiescent states,
depending on the outcomes of those checks. This is a bit much for scheduling
fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
the rcu_dynticks structure that acts as a global guard for these checks.
Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
check the ->rcu_urgent_qs field, find it false, and simply return.
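A minimal sketch of the resulting fastpath, assuming the accessor shown
(the in-tree check may differ slightly):
    void rcu_note_context_switch(void)
    {
            if (!raw_cpu_read(rcu_dynticks.rcu_urgent_qs))
                    return; /* Common case: nothing urgent to do. */
            /* Otherwise, run the full series of quiescent-state checks. */
    }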
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
|
|
The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
that one of them still needs a quiescent state before doing an expensive
atomic operation on the ->dynticks counter. However, this check reduces
overhead only after a rare race condition, and increases complexity. This
commit therefore removes the scan and the mechanism enabling the scan.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The rcu_qs_ctr variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.
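A sketch of the resulting consolidation (field placement is illustrative):
    struct rcu_dynticks {
            long long dynticks_nesting;     /* Pre-existing fields... */
            atomic_t dynticks;
            unsigned long rcu_qs_ctr;       /* ...plus the pulled-in counter. */
    };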
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
If a CONFIG_RCU_NOCB_CPU kernel invokes call_rcu() with interrupts
disabled, wakeups must be deferred in order to avoid self-deadlock in the
cases where the disabled interrupts are due to scheduler locks being held.
In this case, a flag is set and is checked on entry to extended quiescent
states (usermode, idle), on exit from the RCU_SOFTIRQ handler, when the
CPU in question goes offline, on a subsequent invocation of call_rcu(),
and from rcu_all_qs(). However, a given CPU could avoid all of those
states for a considerable length of time.
This commit therefore allows an invocation of rcu_note_context_switch()
to do the wakeup. It also makes the wakeup function clear the
deferred-wakeup flag.
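A sketch of the deferred-wakeup helper that rcu_note_context_switch()
can now invoke; the flag and wakeup-function names here are illustrative:
    static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
    {
            if (!READ_ONCE(rdp->nocb_defer_wakeup))
                    return;
            WRITE_ONCE(rdp->nocb_defer_wakeup, false); /* Clear the flag... */
            wake_nocb_leader(rdp);                     /* ...then do the wakeup. */
    }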
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
If a CONFIG_RCU_NOCB_CPU kernel invokes call_rcu() with interrupts
disabled, wakeups must be deferred in order to avoid self-deadlock in
the cases where the disabled interrupts are due to scheduler locks being
held. In this case, a flag is set and is checked on entry to extended
quiescent states (usermode, idle), on exit from the RCU_SOFTIRQ handler,
when the CPU in question goes offline, and on a subsequent invocation
of call_rcu(). However, a given CPU could avoid all of those states
for a considerable length of time.
This commit therefore allows an invocation of rcu_all_qs() to do the
wakeup. It also makes the wakeup function clear the deferred-wakeup flag.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
If a CONFIG_RCU_NOCB_CPU kernel invokes call_rcu() with interrupts
disabled, wakeups must be deferred in order to avoid self-deadlock in the
cases where the disabled interrupts are due to scheduler locks being held.
In this case, a flag is set and is checked on entry to extended quiescent
states (usermode, idle), on exit from the RCU_SOFTIRQ handler, and when
the CPU in question goes offline. However, a given CPU could avoid all
of those states for a considerable length of time.
This commit therefore allows a subsequent invocation of call_rcu() with
interrupts enabled to do the wakeup. It also makes the wakeup function
clear the deferred-wakeup flag.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
where "statement" is a local-variable declaration, as it can leave a
misplaced ";" in the source code. This commit therefore converts these
to "RCU_TRACE(statement;)", which avoids the misplaced ";".
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
where "statement" is a local-variable declaration, as it can leave a
misplaced ";" in the source code. This commit therefore converts these
to "RCU_TRACE(statement;)", which avoids the misplaced ";".
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
where "statement" is a local-variable declaration, as it can leave a
misplaced ";" in the source code. This commit therefore converts these
to "RCU_TRACE(statement;)", which avoids the misplaced ";".
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Users of SRCU are obliged to complete all grace-period activity before
invoking cleanup_srcu_struct(). This means that all calls to either
synchronize_srcu() or synchronize_srcu_expedited() must have returned,
and all calls to call_srcu() must have returned, and the last call to
call_srcu() must have been followed by a call to srcu_barrier().
Furthermore, the caller must have done something to prevent any
further calls to synchronize_srcu(), synchronize_srcu_expedited(),
and call_srcu().
Therefore, if there has ever been an invocation of call_srcu() on
the srcu_struct in question, the sequence of events must be as
follows:
1. Prevent any further calls to call_srcu().
2. Wait for any pre-existing call_srcu() invocations to return.
3. Invoke srcu_barrier().
4. It is now safe to invoke cleanup_srcu_struct().
On the other hand, if there has ever been a call to synchronize_srcu()
or synchronize_srcu_expedited(), the sequence of events must be as
follows:
1. Prevent any further calls to synchronize_srcu() or
synchronize_srcu_expedited().
2. Wait for any pre-existing synchronize_srcu() or
synchronize_srcu_expedited() invocations to return.
3. It is now safe to invoke cleanup_srcu_struct().
If there have been calls to both types of functions (call_srcu()
and either synchronize_srcu() or synchronize_srcu_expedited()), then
the caller must do the first three steps of the call_srcu() procedure
above and the first two steps of the synchronize_s*() procedure above,
and only then invoke cleanup_srcu_struct().
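A sketch of the combined teardown sequence (the quiescing steps are
necessarily caller-specific, and my_srcu is a placeholder):
    /*
     * Steps 1-2: caller-specific code stops and drains all further
     * call_srcu(), synchronize_srcu(), and synchronize_srcu_expedited()
     * invocations on my_srcu.
     */
    srcu_barrier(&my_srcu);         /* Step 3: wait for pending callbacks. */
    cleanup_srcu_struct(&my_srcu);  /* Step 4: now safe. */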
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The srcu_reschedule() function invokes rcu_batch_empty() on each of
the four rcu_batch structures in the srcu_struct in question twice.
Given that this check will also be needed in cleanup_srcu_struct(), this
commit consolidates these four checks into a new rcu_all_batches_empty()
function.
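A sketch of the consolidated check, assuming the four batch fields of
the current srcu_struct:
    static bool rcu_all_batches_empty(struct srcu_struct *sp)
    {
            return rcu_batch_empty(&sp->batch_done) &&
                   rcu_batch_empty(&sp->batch_check1) &&
                   rcu_batch_empty(&sp->batch_check0) &&
                   rcu_batch_empty(&sp->batch_queue);
    }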
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Currently, IPIs are used to force other CPUs to invalidate their TLBs
in response to a kernel virtual-memory mapping change. This works, but
degrades both battery lifetime (for idle CPUs) and real-time response
(for nohz_full CPUs), and in addition results in unnecessary IPIs due to
the fact that CPUs executing in usermode are unaffected by stale kernel
mappings. It would be better to cause a CPU executing in usermode to
wait until it is entering kernel mode to do the flush, first to avoid
interrupting usermode tasks and second to handle multiple flush requests
with a single flush in the case of a long-running user task.
This commit therefore reserves a bit at the bottom of the ->dynticks
counter, which is checked upon exit from extended quiescent states.
If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is
invoked, which, if not supplied, is an empty single-pass do-while loop.
If this bottom bit is set on -entry- to an extended quiescent state,
then a WARN_ON_ONCE() triggers.
This bottom bit may be set using a new rcu_eqs_special_set() function,
which returns true if the bit was set, or false if the CPU turned
out to not be in an extended quiescent state. Please note that this
function refuses to set the bit for a non-nohz_full CPU when that CPU
is executing in usermode because usermode execution is tracked by RCU
as a dyntick-idle extended quiescent state only for nohz_full CPUs.
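A sketch of the bit-setting function, with RCU_DYNTICK_CTRL_MASK naming
the reserved bottom bit and RCU_DYNTICK_CTRL_CTR the lowest counter bit
(names illustrative):
    bool rcu_eqs_special_set(int cpu)
    {
            int old, new;
            struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);

            do {
                    old = atomic_read(&rdtp->dynticks);
                    if (old & RCU_DYNTICK_CTRL_CTR)
                            return false; /* Not in an extended quiescent state. */
                    new = old | RCU_DYNTICK_CTRL_MASK;
            } while (atomic_cmpxchg(&rdtp->dynticks, old, new) != old);
            return true;
    }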
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Merge branches 'doc.2017.01.15b', 'dyntick.2017.01.23a', 'fixes.2017.01.23a', 'srcu.2017.01.25a' and 'torture.2017.01.15b' into HEAD
doc.2017.01.15b: Documentation updates
dyntick.2017.01.23a: Dyntick tracking consolidation
fixes.2017.01.23a: Miscellaneous fixes
srcu.2017.01.25a: SRCU rewrite, fixes, and verification
torture.2017.01.15b: Torture-test updates
|
|
Because there are no memory barriers between the srcu_flip() ->completed
increment and the summation of the read-side ->unlock_count[] counters,
both the compiler and the CPU can reorder the summation with the
->completed increment. If the updater is preempted long enough during
this process, the read-side counters could overflow, resulting in a
too-short grace period.
This commit therefore adds a memory barrier just after the ->completed
increment, ensuring that if the summation misses an increment of
->unlock_count[] from __srcu_read_unlock(), the next __srcu_read_lock()
will see the new value of ->completed, thus bounding the number of
->unlock_count[] increments that can be missed to NR_CPUS. The actual
overflow computation is more complex due to the possibility of nesting
of __srcu_read_lock().
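A sketch of the fix (the barrier placement is the point; comments abbreviated):
    static void srcu_flip(struct srcu_struct *sp)
    {
            WRITE_ONCE(sp->completed, sp->completed + 1);

            /*
             * Ensure that if the summation misses an __srcu_read_unlock()
             * increment, the next __srcu_read_lock() sees the new
             * ->completed value, bounding the missed increments.
             */
            smp_mb();
    }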
Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
If a process invokes synchronize_srcu(), is delayed just the right amount
of time, and thus does not sleep when waiting for the grace period to
complete, there is no ordering between the end of the grace period and
the code following the synchronize_srcu(). Similarly, there can be a
lack of ordering between the end of the SRCU grace period and callback
invocation.
This commit adds the necessary ordering.
Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]
|
|
SRCU uses two per-cpu counters: a nesting counter to count the number of
active critical sections, and a sequence counter to ensure that the nesting
counters don't change while they are being added together in
srcu_readers_active_idx_check().
This patch instead uses per-cpu lock and unlock counters. Because both
counters only increase and srcu_readers_active_idx_check() reads the unlock
counter before the lock counter, this achieves the same end without having
to increment two different counters in srcu_read_lock(). This also saves a
smp_mb() in srcu_readers_active_idx_check().
Possible bug: There is no guarantee that the lock counter won't overflow
during srcu_readers_active_idx_check(), as there are no memory barriers
around srcu_flip() (see comment in srcu_readers_active_idx_check() for
details). However, this problem was already present before this patch.
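A sketch of the resulting check, assuming helpers that sum the per-cpu
lock and unlock counters for the given index:
    static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
    {
            unsigned long unlocks = srcu_readers_unlock_idx(sp, idx);

            smp_mb(); /* Read unlock counts before lock counts. */
            return srcu_readers_lock_idx(sp, idx) == unlocks;
    }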
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Lance Roy <ldr709@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Commit 7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU"),
as its title suggests, got rid of RCU's remaining CPU-hotplug timing
guesswork. This commit therefore removes the one-jiffy kludge that was
used to paper over this guesswork.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Commit 4a81e8328d37 ("rcu: Reduce overhead of cond_resched() checks
for RCU") moved quiescent-state generation out of cond_resched()
and commit bde6c3aa9930 ("rcu: Provide cond_resched_rcu_qs() to force
quiescent states in long loops") introduced cond_resched_rcu_qs(), and
commit 5cd37193ce85 ("rcu: Make cond_resched_rcu_qs() apply to normal RCU
flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently
polled by the RCU core state machine.
This frequent polling can increase grace-period rate, which in turn
increases grace-period overhead, which is visible in some benchmarks
(for example, the "open1" benchmark in Anton Blanchard's "will it scale"
suite). This commit therefore reduces the rate at which rcu_qs_ctr
is polled by moving that polling into the force-quiescent-state (FQS)
machinery, and by further polling it only after the grace period has
been in effect for at least jiffies_till_sched_qs jiffies.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit is the fourth step towards full abstraction of all accesses
to the ->dynticks counter, implementing previously open-coded checks and
comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since()
functions. This abstraction will ease changes to the ->dynticks counter
operation.
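A sketch of the two new helpers (the low-order-bit convention is the
existing one: an odd value means non-idle):
    static bool rcu_dynticks_in_eqs(int snap)
    {
            return !(snap & 0x1); /* Even value: extended quiescent state. */
    }

    static bool rcu_dynticks_in_eqs_since(struct rcu_dynticks *rdtp, int snap)
    {
            return snap != rcu_dynticks_snap(rdtp); /* Counter advanced? */
    }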
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit is the third step towards full abstraction of all accesses
to the ->dynticks counter, implementing the previously open-coded atomic
add of 1 and entry checks in a new rcu_dynticks_eqs_enter() function, and
the same but with exit checks in a new rcu_dynticks_eqs_exit() function.
This abstraction will ease changes to the ->dynticks counter operation.
Note that this commit gets rid of the smp_mb__before_atomic() and the
smp_mb__after_atomic() calls that were previously present. The reason
that this is OK from a memory-ordering perspective is that the atomic
operation is now atomic_add_return(), which, as a value-returning atomic,
guarantees full ordering.
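A sketch of the entry-side helper (the exit side mirrors it, checking
for an odd result instead):
    static void rcu_dynticks_eqs_enter(void)
    {
            struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
            int special;

            /* atomic_add_return() supplies the needed full ordering. */
            special = atomic_add_return(1, &rdtp->dynticks);
            RCU_TRACE(WARN_ON_ONCE(special & 0x1);) /* Must now be even. */
    }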
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fixed RCU_TRACE() statements added by this commit. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The non-expedited synchronize_*rcu() primitives have lockdep checks, but
their expedited counterparts lack these checks. This commit therefore
adds these checks to the expedited synchronize_*rcu() primitives.
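For example, the expedited primitives can now open with the same check
as their normal counterparts (sketch):
    void synchronize_rcu_expedited(void)
    {
            RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map) ||
                             lock_is_held(&rcu_bh_lock_map) ||
                             lock_is_held(&rcu_sched_lock_map),
                             "Illegal synchronize_rcu_expedited() in RCU read-side critical section");
            /* ... drive the expedited grace period ... */
    }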
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Expedited grace periods no longer fall back to normal grace periods
in response to lock contention, given that expedited grace periods
now use the rcu_node tree so as to avoid contention. This commit
therefore removes the expedited_normal counter.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
It used to be that the rcuo callback-offload kthreads were spawned
in rcu_organize_nocb_kthreads(), and the comment before the "for"
loop says as much. However, this spawning has long since moved to
the CPU-hotplug code, so this commit fixes this comment.
Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
incoming CPU's rcu_data structure. This works for the boot CPU and for
all CPUs onlined after rcu_init() executes (during very early boot).
Currently, this is the full set of CPUs, so all is well. But if
anyone ever parallelizes boot before rcu_init() time, it will fail.
This commit therefore replaces the rcu_cpu_starting() function's
this_cpu_ptr() with per_cpu_ptr(), future-proofing the code and
(arguably) improving readability.
This commit inadvertently fixes a latent bug: If there ever had been
more than just the boot CPU online at rcu_init() time, the old code
would not initialize the non-boot CPUs, but rather would repeatedly
initialize the boot CPU.
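A sketch of the change in context (loop structure illustrative):
    void rcu_cpu_starting(unsigned int cpu)
    {
            struct rcu_data *rdp;
            struct rcu_state *rsp;

            for_each_rcu_flavor(rsp) {
                    /* Was: rdp = this_cpu_ptr(rsp->rda); */
                    rdp = per_cpu_ptr(rsp->rda, cpu);
                    /* ... mark this CPU online in the rcu_node tree ... */
            }
    }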
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
These functions (rcu_exp_gp_seq_start(), rcu_exp_gp_seq_end(),
rcu_exp_gp_seq_snap(), and rcu_exp_gp_seq_done()) seemed too obvious
to comment when written, but not so much when being documented.
This commit therefore adds header comments to each of them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Chris Friesen noticed that rcuc/X kthreads were consuming CPU even on
NOCB CPUs. This makes no sense because the only purpose of these
kthreads is to invoke normal (non-offloaded) callbacks, of which there
will never be any on NOCB CPUs. This problem was due to a bug in
cpu_has_callbacks_ready_to_invoke(), which should have been checking
->nxttail[RCU_NEXT_TAIL] for NULL, but which was instead (incorrectly)
checking ->nxttail[RCU_DONE_TAIL]. Because ->nxttail[RCU_DONE_TAIL] is
never NULL, the only effect is to cause the rcuc/X kthread to execute
when it should not do so.
This commit therefore checks ->nxttail[RCU_NEXT_TAIL], which is NULL
for NOCB CPUs.
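The fixed check, per the description above (sketch):
    static int cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp)
    {
            return &rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL] &&
                   rdp->nxttail[RCU_NEXT_TAIL] != NULL; /* NULL on NOCB CPUs. */
    }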
Reported-by: Chris Friesen <chris.friesen@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit is for all intents and purposes a revert of bc1dce514e9b
("rcu: Don't use NMIs to dump other CPUs' stacks"). The reason to suppose
that this can now safely be reverted is the presence of 42a0bb3f7138
("printk/nmi: generic solution for safe printk in NMI"), which is said
to have made NMI-based stack dumps safe.
However, this reversion keeps one nice property of bc1dce514e9b
("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that
only those CPUs blocking the grace period are dumped. The new
trigger_single_cpu_backtrace() is used to make this happen, as
suggested by Josh Poimboeuf.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Commit 4914950aaa12d ("rcu: Stop treating in-kernel CPU-bound workloads
as errors") added a (relatively) short-timeout call to resched_cpu().
This was inspired by an issue that was fixed by b7e7ade34e61 ("sched/core:
Fix remote wakeups"). But given that this issue was fixed, it is time
for the current commit to remove this call to resched_cpu().
Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit prepares for the removal of short-term CPU kicking (in a
subsequent commit). It does so by starting to invoke resched_cpu()
for each holdout at each force-quiescent-state interval that is more
than halfway through the stall-warning interval.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Since commit 7ec99de36f40 ("rcu: Provide exact CPU-online tracking for
RCU"), the variable mask in rcu_init_percpu_data is set but no longer
used. Remove it to fix the following warning when building with 'W=1':
kernel/rcu/tree.c: In function ‘rcu_init_percpu_data’:
kernel/rcu/tree.c:3765:16: warning: variable ‘mask’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The declarations of __rcu_process_callbacks() and rcu_process_callbacks()
are not needed, as the definitions of both functions appear before
any uses. This commit therefore removes both declarations.
Reported-by: "Ahmed, Iftekhar" <ahmedi@oregonstate.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The print_other_cpu_stall() function currently unconditionally invokes
rcu_print_detail_task_stall(). This is OK because if there was a stall
sufficient to cause print_other_cpu_stall() to be invoked, that stall
is very likely to persist through the entire print_other_cpu_stall()
execution. However, if the stall did not persist, the variable ndetected
will be zero, and that variable is already tested in an "if" statement.
Therefore, this commit moves the call to rcu_print_detail_task_stall()
under that pre-existing "if" to improve readability, with a very rare
reduction in overhead.
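A sketch of the resulting control flow:
    if (ndetected) {
            rcu_dump_cpu_stacks(rsp);
            rcu_print_detail_task_stall(rsp); /* Moved under the "if". */
    } else {
            /* Stall ended before the state dump started; note that instead. */
    }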
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ paulmck: Reworked commit log. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
RCU_EXPEDITE_BOOT speeds up the boot process by enforcing
synchronize_rcu_expedited() instead of synchronize_rcu() during boot.
There is no reason not to want this, and there is no need to worry
about real-time latency at this point in boot. This commit therefore
makes this behavior the default.
Note that users wishing to avoid expediting entirely, for example when
bringing up new hardware possibly having flaky IPIs, can use the
rcu_normal boot parameter to override boot-time expediting.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ paulmck: Reworded commit log. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit is the second step towards full abstraction of all accesses to
the ->dynticks counter, implementing the previously open-coded atomic
add of zero in a new rcu_dynticks_snap() function. This abstraction will
ease changes to the ->dynticks counter operation.
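A sketch of the new accessor:
    static int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
    {
            return atomic_add_return(0, &rdtp->dynticks); /* Fully ordered load. */
    }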
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit is the first step towards full abstraction of all accesses to
the ->dynticks counter, implementing the previously open-coded atomic add
of two in a new rcu_dynticks_momentary_idle() function. This abstraction
will ease changes to the ->dynticks counter operation.
Note that this commit gets rid of the smp_mb__before_atomic() and the
smp_mb__after_atomic() calls that were previously present. The reason
that this is OK from a memory-ordering perspective is that the atomic
operation is now atomic_add_return(), which, as a value-returning atomic,
guarantees full ordering.
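A sketch of the new function:
    static void rcu_dynticks_momentary_idle(void)
    {
            struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
            /* atomic_add_return() supplies the needed full ordering. */
            int special = atomic_add_return(2, &rdtp->dynticks);

            /* It is illegal to call this from an extended quiescent state. */
            WARN_ON_ONCE(!(special & 0x1));
    }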
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The current preemptible RCU implementation goes through three phases
during bootup. In the first phase, there is only one CPU that is running
with preemption disabled, so that a synchronous grace period is a no-op.
In the second mid-boot phase, the scheduler is running, but RCU has
not yet gotten its kthreads spawned (and, for expedited grace periods,
workqueues are not yet running). During this time, any attempt to do
a synchronous grace period will hang the system (or complain bitterly,
depending). In the third and final phase, RCU is fully operational and
everything works normally.
This has been OK for some time, but there have recently been some
synchronous grace periods showing up during the second mid-boot phase.
This code worked "by accident" for a while, but started failing as soon
as expedited RCU grace periods switched over to workqueues in commit
8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue").
Note that the code was buggy even before this commit, as it was subject
to failure on real-time systems that forced all expedited grace periods
to run as normal grace periods (for example, using the rcu_normal ksysfs
parameter). The callchain from the failure case is as follows:
early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited
The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
which caused the code to try using workqueues before they were
initialized, which did not go well.
This commit therefore reworks RCU to permit synchronous grace periods
to proceed during this mid-boot phase. This commit is therefore a
fix to a regression introduced in v4.9, and is therefore being put
forward post-merge-window in v4.10.
This commit sets a flag from the existing rcu_scheduler_starting()
function which causes all synchronous grace periods to take the expedited
path. The expedited path now checks this flag, using the requesting task
to drive the expedited grace period forward during the mid-boot phase.
Finally, this flag is updated by a core_initcall() function named
rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.
Note that this arrangement assumes that tasks are not sent POSIX signals
(or anything similar) from the time that the first task is spawned
through core_initcall() time.
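A sketch of the runtime-mode switch (flag names per the description
above; exact values may differ):
    static int __init rcu_exp_runtime_mode(void)
    {
            rcu_scheduler_active = RCU_SCHEDULER_RUNNING; /* Runtime codepaths. */
            return 0;
    }
    core_initcall(rcu_exp_runtime_mode);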
Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Stan Kain <stan.kain@gmail.com>
Tested-by: Ivan <waffolz@hotmail.com>
Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
Tested-by: Bruno Pesavento <bpesavento@infinito.it>
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Frederic Bezies <fredbezies@gmail.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
|
|
It is now legal to invoke synchronize_sched() at early boot, which causes
Tiny RCU's synchronize_sched() to emit spurious splats. This commit
therefore removes the cond_resched() from Tiny RCU's synchronize_sched().
Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
|
|
The current code can result in spurious kicks when there are no grace
periods in progress and no grace-period-related requests. This is
sort of OK for a diagnostic aid, but the resulting ftrace-dump messages
in dmesg are annoying. This commit therefore avoids spurious kicks
in the common case.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Expedited grace periods check dyntick-idle state, and avoid sending
IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
kernels, nohz_full CPUs. However, the kernel has been observed checking
a CPU while it was non-idle, but sending the IPI after it has gone
idle. This commit therefore rechecks idle state immediately before
sending the IPI, refraining from IPIing CPUs that have since gone idle.
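A sketch of the recheck, as it might appear in the CPU-selection loop
(variable names illustrative):
    /* Recheck immediately before IPIing; skip CPUs that went idle. */
    snap = atomic_add_return(0, &rdtp->dynticks);
    if (!(snap & 0x1))
            mask_ofl_test |= mask; /* Idle: report QS instead of IPIing. */
    else
            ret = smp_call_function_single(cpu, func, rsp, 0);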
Reported-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Although rcutorture will occasionally do a 50-millisecond grace-period
delay, these delays are quite rare. And rightly so, because otherwise
the read rate would be quite low. This means that it can be important
to identify whether or not a given run contained a long-delay read.
This commit therefore inserts a trace_rcu_torture_read() event to flag
runs containing long delays.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The __call_rcu() comment about opportunistically noting grace period
beginnings and endings is obsolete. RCU still does such opportunistic
noting, but in __call_rcu_core() rather than __call_rcu(), and there
already is an appropriate comment in __call_rcu_core(). This commit
therefore removes the obsolete comment.
Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
In the deep past, rcu_check_callbacks() was only invoked if rcu_pending()
returned true. Which was fine, but these days rcu_check_callbacks()
is invoked unconditionally. This commit therefore removes the obsolete
sentence from the header comment.
Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Commit 720abae3d68ae ("rcu: force alignment on struct
callback_head/rcu_head") forced the rcu_head (AKA callback_head)
structure's alignment to pointer size, that is, to 4-byte boundaries on
32-bit systems and to 8-byte boundaries on 64-bit systems. This
commit therefore checks for this same alignment in __call_rcu(),
which used to contain a looser check for two-byte alignment.
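The tightened check (sketch):
    /* Was: WARN_ON_ONCE((unsigned long)head & 0x1); */
    WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));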
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc plugins update from Kees Cook:
"This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot
time as possible, hoping to capitalize on any possible variation in
CPU operation (due to runtime data differences, hardware differences,
SMP ordering, thermal timing variation, cache behavior, etc).
At the very least, this plugin is a much more comprehensive example
for how to manipulate kernel code using the gcc plugin internals"
* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
latent_entropy: Mark functions with __latent_entropy
gcc-plugins: Add latent_entropy plugin
|
|
The __latent_entropy gcc attribute can be used only on functions and
variables. If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents. The variable must
be an integer, an integer array type or a structure with integer fields.
These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or have variable loops, each of which provides some level of
latent entropy.
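Hypothetical illustrations of the attribute's two uses (names made up):
    static u64 entropy_pool[8] __latent_entropy; /* Gets random initial contents. */

    static int __init __latent_entropy my_driver_init(void)
    {
            return 0; /* Body instrumented to accumulate control-flow entropy. */
    }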
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: expanded commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- rwsem micro-optimizations (Davidlohr Bueso)
- Improve the implementation and optimize the performance of
percpu-rwsems. (Peter Zijlstra.)
- Convert all lglock users to better facilities such as percpu-rwsems
or percpu-spinlocks and remove lglocks. (Peter Zijlstra)
- Remove the ticket (spin)lock implementation. (Peter Zijlstra)
- Korean translation of memory-barriers.txt and related fixes to the
English document. (SeongJae Park)
- misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cmpxchg, locking/atomics: Remove superfluous definitions
x86, locking/spinlocks: Remove ticket (spin)lock implementation
locking/lglock: Remove lglock implementation
stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
fs/locks: Use percpu_down_read_preempt_disable()
locking/percpu-rwsem: Add down_read_preempt_disable()
fs/locks: Replace lg_local with a per-cpu spinlock
fs/locks: Replace lg_global with a percpu-rwsem
locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEM and percpu_rwsem_assert_held()
locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()
locking/rwsem, x86: Drop a bogus cc clobber
futex: Add some more function commentry
locking/hung_task: Show all locks
locking/rwsem: Scan the wait_list for readers only once
locking/rwsem: Remove a few useless comments
locking/rwsem: Return void in __rwsem_mark_wake()
locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
locking/Documentation: Add Korean translation
locking/Documentation: Fix a typo of example result
locking/Documentation: Fix wrong section reference
...
|
|
Merge branches 'doc.2016.08.22c', 'exp.2016.08.22c', 'fixes.2016.09.14a', 'hotplug.2016.08.22c' and 'torture.2016.08.22c' into HEAD
doc.2016.08.22c: Documentation updates
exp.2016.08.22c: Expedited grace-period updates
fixes.2016.09.14a: Miscellaneous fixes
hotplug.2016.08.22c: CPU-hotplug changes
torture.2016.08.22c: Torture-test changes
|
|
A few rcuperf dmesg output messages have no space between the flag and
the start of the message. In contrast, every other message consistently
supplies a single space. This difference makes rcuperf dmesg output
hard to read and to mechanically parse. This commit therefore fixes
this problem by modifying a pr_alert() call and the PERFOUT_STRING()
macro to provide that single space.
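A sketch of the kind of change involved (the exact macro body may differ):
    /* Before: the flag butts up against the message. */
    #define PERFOUT_STRING(s) pr_alert("%s" PERF_FLAG "%s\n", perf_type, s)
    /* After: a single space separates flag and message. */
    #define PERFOUT_STRING(s) pr_alert("%s" PERF_FLAG " %s\n", perf_type, s)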
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|