summaryrefslogtreecommitdiff
path: root/kernel/sched_rt.c
AgeCommit message (Collapse)Author
2008-01-25sched: de-SCHED_OTHER-ize the RT pathGregory Haskins
The current wake-up code path tries to determine if it can optimize the wake-up to "this_cpu" by computing load calculations. The problem is that these calculations are only relevant to SCHED_OTHER tasks where load is king. For RT tasks, priority is king. So the load calculation is completely wasted bandwidth. Therefore, we create a new sched_class interface to help with pre-wakeup routing decisions and move the load calculation as a function of CFS task's class. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: clean up this_rq use in kernel/sched_rt.cGregory Haskins
"this_rq" is normally used to denote the RQ on the current cpu (i.e. "cpu_rq(this_cpu)"). So clean up the usage of this_rq to be more consistent with the rest of the code. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: add RT-balance cpu-weightGregory Haskins
Some RT tasks (particularly kthreads) are bound to one specific CPU. It is fairly common for two or more bound tasks to get queued up at the same time. Consider, for instance, softirq_timer and softirq_sched. A timer goes off in an ISR which schedules softirq_thread to run at RT50. Then the timer handler determines that it's time to smp-rebalance the system so it schedules softirq_sched to run. So we are in a situation where we have two RT50 tasks queued, and the system will go into rt-overload condition to request other CPUs for help. This causes two problems in the current code: 1) If a high-priority bound task and a low-priority unbounded task queue up behind the running task, we will fail to ever relocate the unbounded task because we terminate the search on the first unmovable task. 2) We spend precious futile cycles in the fast-path trying to pull overloaded tasks over. It is therefore optimial to strive to avoid the overhead all together if we can cheaply detect the condition before overload even occurs. This patch tries to achieve this optimization by utilizing the hamming weight of the task->cpus_allowed mask. A weight of 1 indicates that the task cannot be migrated. We will then utilize this information to skip non-migratable tasks and to eliminate uncessary rebalance attempts. We introduce a per-rq variable to count the number of migratable tasks that are currently running. We only go into overload if we have more than one rt task, AND at least one of them is migratable. In addition, we introduce a per-task variable to cache the cpus_allowed weight, since the hamming calculation is probably relatively expensive. We only update the cached value when the mask is updated which should be relatively infrequent, especially compared to scheduling frequency in the fast path. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: disable standard balancer for RT tasksSteven Rostedt
Since we now take an active approach to load balancing, we don't need to balance RT tasks via the normal task balancer. In fact, this code was found to pull RT tasks away from CPUS that the active movement performed, resulting in large latencies. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: push RT tasks from overloaded CPUsSteven Rostedt
This patch adds pushing of overloaded RT tasks from a runqueue that is having tasks (most likely RT tasks) added to the run queue. TODO: We don't cover the case of waking of new RT tasks (yet). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: pull RT tasks from overloaded runqueuesSteven Rostedt
This patch adds the algorithm to pull tasks from RT overloaded runqueues. When a pull RT is initiated, all overloaded runqueues are examined for a RT task that is higher in prio than the highest prio task queued on the target runqueue. If another runqueue holds a RT task that is of higher prio than the highest prio task on the target runqueue is found it is pulled to the target runqueue. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: add rt-overload trackingSteven Rostedt
This patch adds an RT overload accounting system. When a runqueue has more than one RT task queued, it is marked as overloaded. That is that it is a candidate to have RT tasks pulled from it. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: add RT task pushingSteven Rostedt
This patch adds an algorithm to push extra RT tasks off a run queue to other CPU runqueues. When more than one RT task is added to a run queue, this algorithm takes an assertive approach to push the RT tasks that are not running onto other run queues that have lower priority. The way this works is that the highest RT task that is not running is looked at and we examine the runqueues on the CPUS for that tasks affinity mask. We find the runqueue with the lowest prio in the CPU affinity of the picked task, and if it is lower in prio than the picked task, we push the task onto that CPU runqueue. We continue pushing RT tasks off the current runqueue until we don't push any more. The algorithm stops when the next highest RT task can't preempt any other processes on other CPUS. TODO: The algorithm may stop when there are still RT tasks that can be migrated. Specifically, if the highest non running RT task CPU affinity is restricted to CPUs that are running higher priority tasks, there may be a lower priority task queued that has an affinity with a CPU that is running a lower priority task that it could be migrated to. This patch set does not address this issue. Note: checkpatch reveals two over 80 character instances. I'm not sure that breaking them up will help visually, so I left them as is. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: track highest prio task queuedSteven Rostedt
This patch adds accounting to each runqueue to keep track of the highest prio task queued on the run queue. We only care about RT tasks, so if the run queue does not contain any active RT tasks its priority will be considered MAX_RT_PRIO. This information will be used for later patches. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: count # of queued RT tasksSteven Rostedt
This patch adds accounting to keep track of the number of RT tasks running on a runqueue. This information will be used in later patches. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: group scheduling, change how cpu load is calculatedSrivatsa Vaddagiri
This patch changes how the cpu load exerted by fair_sched_class tasks is calculated. Load exerted by fair_sched_class tasks on a cpu is now a summation of the group weights, rather than summation of task weights. Weight exerted by a group on a cpu is dependent on the shares allocated to it. This version of patch has a minor impact on code size, but should have no runtime/functional impact for !CONFIG_FAIR_GROUP_SCHED. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-20sched: rt: account the cpu time during the tickPeter Zijlstra
Realtime tasks would not account their runtime during ticks. Which would lead to: struct sched_param param = { .sched_priority = 10 }; pthread_setschedparam(pthread_self(), SCHED_FIFO, &param); while (1) ; Not showing up in top. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-02sched: cpu accounting controller (V2)Srivatsa Vaddagiri
Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for us, which provided a cpu accounting resource controller. This feature would be useful if someone wants to group tasks only for accounting purpose and doesnt really want to exercise any control over their cpu consumption. The patch below reintroduces the feature. It is based on Paul Menage's original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with these differences: - Removed load average information. I felt it needs more thought (esp to deal with SMP and virtualized platforms) and can be added for 2.6.25 after more discussions. - Convert group cpu usage to be nanosecond accurate (as rest of the cfs stats are) and invoke cpuacct_charge() from the respective scheduler classes - Make accounting scalable on SMP systems by splitting the usage counter to be per-cpu - Move the code from kernel/cpu_acct.c to kernel/sched.c (since the code is not big enough to warrant a new file and also this rightly needs to live inside the scheduler. Also things like accessing rq->lock while reading cpu usage becomes easier if the code lived in kernel/sched.c) The patch also modifies the cpu controller not to provide the same accounting information. Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran some simple tests like cpuspin (spin on the cpu), ran several tasks in the same group and timed them. Compared their time stamps with cpuacct.usage. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24sched: isolate SMP balancing code a bit morePeter Williams
At the moment, a lot of load balancing code that is irrelevant to non SMP systems gets included during non SMP builds. This patch addresses this issue and reduces the binary size on non SMP systems: text data bss dec hex filename 10983 28 1192 12203 2fab sched.o.before 10739 28 1192 11959 2eb7 sched.o.after Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24sched: reduce balance-tasks overheadPeter Williams
At the moment, balance_tasks() provides low level functionality for both move_tasks() and move_one_task() (indirectly) via the load_balance() function (in the sched_class interface) which also provides dual functionality. This dual functionality complicates the interfaces and internal mechanisms and makes the run time overhead of operations that are called with two run queue locks held. This patch addresses this issue and reduces the overhead of these operations. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: tidy up SCHED_RRDmitry Adamushko
- make timeslices of SCHED_RR tasks constant and not dependent on task's static_prio [1] ; - remove obsolete code (timeslice related bits); - make sched_rr_get_interval() return something more meaningful [2] for SCHED_OTHER tasks. [1] according to the following link, it's not compliant with SUSv3 (not sure though, what is the reference for us :-) http://lkml.org/lkml/2007/3/7/656 [2] the interval is dynamic and can be depicted as follows "should a task be one of the runnable tasks at this particular moment, it would expect to run for this interval of time before being re-scheduled by the scheduler tick". (i.e. it's more precise if a task is runnable at the moment) yeah, this seems to require task_rq_lock/unlock() but this is not a hot path. results: (SCHED_FIFO) dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval time_slice: 0 : 0 (SCHED_RR) dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval time_slice: 0 : 99984800 (SCHED_NORMAL) dimm@earth:~/storage/prog$ ./rr_interval time_slice: 0 : 19996960 (SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should be a half of the previous result) dimm@earth:~/storage/prog$ taskset 1 ./rr_interval time_slice: 0 : 9998480 Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: uninline schedulerAlexey Dobriyan
* save ~300 bytes * activate_idle_task() was moved to avoid a warning bloat-o-meter output: add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295) <=== function old new delta __enqueue_entity - 165 +165 finish_task_switch - 110 +110 update_curr_rt - 79 +79 __load_balance_iterator - 32 +32 __task_rq_unlock - 28 +28 find_process_by_pid - 24 +24 do_sched_setscheduler 133 123 -10 sys_sched_rr_get_interval 176 165 -11 sys_sched_getparam 156 145 -11 normalize_rt_tasks 482 470 -12 sched_getaffinity 112 99 -13 sys_sched_getscheduler 86 72 -14 sched_setaffinity 226 212 -14 sched_setscheduler 666 642 -24 load_balance_start_fair 33 9 -24 load_balance_next_fair 33 9 -24 dequeue_task_rt 133 67 -66 put_prev_task_rt 97 28 -69 schedule_tail 133 50 -83 schedule 682 594 -88 enqueue_entity 499 366 -133 task_new_fair 317 180 -137 Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: mark scheduling classes as constIngo Molnar
mark scheduling classes as const. The speeds up the code a bit and shrinks it: text data bss dec hex filename 40027 4018 292 44337 ad31 sched.o.before 40190 3842 292 44324 ad24 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: revert recent removal of set_curr_task()Srivatsa Vaddagiri
Revert removal of set_curr_task. Use put_prev_task/set_curr_task when changing groups/policies Signed-off-by: Srivatsa Vaddagiri < vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()Dmitry Adamushko
rework enqueue/dequeue_entity() to get rid of sched_class::set_curr_task(). This simplifies sched_setscheduler(), rt_mutex_setprio() and sched_move_tasks(). text data bss dec hex filename 24330 2734 20 27084 69cc sched.o.before 24233 2730 20 26983 6967 sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: simplify sched_class::yield_task()Dmitry Adamushko
the 'p' (task_struct) parameter in the sched_class :: yield_task() is redundant as the caller is always the 'current'. Get rid of it. text data bss dec hex filename 24341 2734 20 27095 69d7 sched.o.before 24330 2734 20 27084 69cc sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: group-scheduler coreSrivatsa Vaddagiri
Add interface to control cpu bandwidth allocation to task-groups. (not yet configurable, due to missing CONFIG_CONTAINERS) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-08-24sched: optimize task_tick_rt() a bitDmitry Adamushko
Mitchell Erblich suggested a quality-of-implementation change to not requeue SCHED_RR tasks if there's only a single task on the runqueue, by checking for rq->nr_running == 1. provide a more efficient implementation of that, to check that particular RT priority-queue only. [ From: mingo@elte.hu ] Also first requeue the task then set need_resched - results in slightly better machine-instruction ordering. Also clean up the code a bit. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: remove the 'u64 now' parameter from ->put_prev_task()Ingo Molnar
remove the 'u64 now' parameter from ->put_prev_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: remove the 'u64 now' parameter from ->pick_next_task()Ingo Molnar
remove the 'u64 now' parameter from ->pick_next_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: remove the 'u64 now' parameter from ->dequeue_task()Ingo Molnar
remove the 'u64 now' parameter from ->dequeue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: remove the 'u64 now' parameter from ->enqueue_task()Ingo Molnar
remove the 'u64 now' parameter from ->enqueue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: remove the 'u64 now' parameter from update_curr_rt()Ingo Molnar
remove the 'u64 now' parameter from update_curr_rt(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: remove 'now' use from assignmentsIngo Molnar
change all 'now' timestamp uses in assignments to rq->clock. ( this is an identity transformation that causes no functionality change: all such new rq->clock is necessarily preceded by an update_rq_clock() call. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: fix bug in balance_tasks()Peter Williams
There are two problems with balance_tasks() and how it used: 1. The variables best_prio and best_prio_seen (inherited from the old move_tasks()) were only required to handle problems caused by the active/expired arrays, the order in which they were processed and the possibility that the task with the highest priority could be on either. These issues are no longer present and the extra overhead associated with their use is unnecessary (and possibly wrong). 2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same this_best_prio variable needs to be used by all scheduling classes or there is a risk of moving too much load. E.g. if the highest priority task on this at the beginning is a fairly low priority task and the rt class migrates a task (during its turn) then that moved task becomes the new highest priority task on this_rq but when the sched_fair class initializes its copy of this_best_prio it will get the priority of the original highest priority task as, due to the run queue locks being held, the reschedule triggered by pull_task() will not have taken place. This could result in inappropriate overriding of skip_for_load and excessive load being moved. The attached patch addresses these problems by deleting all reference to best_prio and best_prio_seen and making this_best_prio a reference parameter to the various functions involved. load_balance_fair() has also been modified so that this_best_prio is only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should preserve the effect of helping spread groups' higher priority tasks around the available CPUs while improving system performance when CONFIG_FAIR_GROUP_SCHED isn't set. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-09sched: simplify move_tasks()Peter Williams
The move_tasks() function is currently multiplexed with two distinct capabilities: 1. attempt to move a specified amount of weighted load from one run queue to another; and 2. attempt to move a specified number of tasks from one run queue to another. The first of these capabilities is used in two places, load_balance() and load_balance_idle(), and in both of these cases the return value of move_tasks() is used purely to decide if tasks/load were moved and no notice of the actual number of tasks moved is taken. The second capability is used in exactly one place, active_load_balance(), to attempt to move exactly one task and, as before, the return value is only used as an indicator of success or failure. This multiplexing of sched_task() was introduced, by me, as part of the smpnice patches and was motivated by the fact that the alternative, one function to move specified load and one to move a single task, would have led to two functions of roughly the same complexity as the old move_tasks() (or the new balance_tasks()). However, the new modular design of the new CFS scheduler allows a simpler solution to be adopted and this patch addresses that solution by: 1. adding a new function, move_one_task(), to be used by active_load_balance(); and 2. making move_tasks() a single purpose function that tries to move a specified weighted load and returns 1 for success and 0 for failure. One of the consequences of these changes is that neither move_one_task() or the new move_tasks() care how many tasks sched_class.load_balance() moves and this enables its interface to be simplified by returning the amount of load moved as its result and removing the load_moved pointer from the argument list. This helps simplify the new move_tasks() and slightly reduces the amount of work done in each of sched_class.load_balance()'s implementations. Further simplification, e.g. changes to balance_tasks(), are possible but (slightly) complicated by the special needs of load_balance_fair() so I've left them to a later patch (if this one gets accepted). NB Since move_tasks() gets called with two run queue locks held even small reductions in overhead are worthwhile. [ mingo@elte.hu ] this change also reduces code size nicely: text data bss dec hex filename 39216 3618 24 42858 a76a sched.o.before 39173 3618 24 42815 a73f sched.o.after Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-02[PATCH] sched: reduce debug codeIngo Molnar
move the rest of the debugging/instrumentation code to under CONFIG_SCHEDSTATS too. This reduces code size and speeds code up: text data bss dec hex filename 33044 4122 28 37194 914a sched.o.before 32708 4122 28 36858 8ffa sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-02[PATCH] sched: ->task_new cleanupIngo Molnar
make sched_class.task_new == NULL a 'default method', this allows the removal of task_rt_new. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09sched: cfs core, kernel/sched_rt.cIngo Molnar
add kernel/sched_rt.c: SCHED_FIFO/SCHED_RR support. The behavior and semantics of SCHED_FIFO/SCHED_RR tasks is unchanged. Signed-off-by: Ingo Molnar <mingo@elte.hu>