From 888a214dc4c65e1486ef7d3ac6b0cf93ae3cc2c7 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Thu, 12 Jan 2012 17:17:39 -0800 Subject: slub: document setting min order with debug_guardpage_minorder > 0 Acked-by: David Rientjes Cc: Pekka Enberg Cc: "Rafael J. Wysocki" Signed-off-by: Stanislaw Gruszka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/ABI/testing/sysfs-kernel-slab | 4 ++++ Documentation/vm/slub.txt | 5 ++++- 2 files changed, 8 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab index 8b093f8222d..91bd6ca5440 100644 --- a/Documentation/ABI/testing/sysfs-kernel-slab +++ b/Documentation/ABI/testing/sysfs-kernel-slab @@ -346,6 +346,10 @@ Description: number of objects per slab. If a slab cannot be allocated because of fragmentation, SLUB will retry with the minimum order possible depending on its characteristics. + When debug_guardpage_minorder=N (N > 0) parameter is specified + (see Documentation/kernel-parameters.txt), the minimum possible + order is used and this sysfs entry can not be used to change + the order at run time. What: /sys/kernel/slab/cache/order_fallback Date: April 2008 diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt index 2acdda9601b..6752870c497 100644 --- a/Documentation/vm/slub.txt +++ b/Documentation/vm/slub.txt @@ -131,7 +131,10 @@ slub_min_objects. slub_max_order specified the order at which slub_min_objects should no longer be checked. This is useful to avoid SLUB trying to generate super large order pages to fit slub_min_objects of a slab cache with -large object sizes into one high order page. +large object sizes into one high order page. Setting command line +parameter debug_guardpage_minorder=N (N > 0), forces setting +slub_max_order to 0, what cause minimum possible order of slabs +allocation. SLUB Debug output ----------------- -- cgit v1.2.3 From d66c1ce7df6ceb94d1902ac865ad50df4ffdb95f Mon Sep 17 00:00:00 2001 From: Zhu Yanhai Date: Thu, 12 Jan 2012 17:18:24 -0800 Subject: Documentation/cgroups/memory.txt: fix typo It should be memsw.max_usage_in_bytes. This typo has been there for a really long time. Signed-off-by: Zhu Yanhai Acked-by: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 4d8774f6f48..c7de7dac43b 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -61,7 +61,7 @@ Brief summary of control files. memory.failcnt # show the number of memory usage hits limits memory.memsw.failcnt # show the number of memory+Swap hits limits memory.max_usage_in_bytes # show max memory usage recorded - memory.memsw.usage_in_bytes # show max memory+Swap usage recorded + memory.memsw.max_usage_in_bytes # show max memory+Swap usage recorded memory.soft_limit_in_bytes # set/show soft limit of memory usage memory.stat # show various statistics memory.use_hierarchy # set/show hierarchical account enabled -- cgit v1.2.3 From 0527b6903b2838e214c895d527806aefd61b3c8f Mon Sep 17 00:00:00 2001 From: Ying Han Date: Thu, 12 Jan 2012 17:18:27 -0800 Subject: memcg: fix pgpgin/pgpgout documentation The two memcg stats pgpgin/pgpgout have different meaning than the ones in vmstat, which indicates that we picked a bad naming for them. It might be late to change the stat name, but better documentation is always helpful. Signed-off-by: Ying Han Acked-by: KAMEZAWA Hiroyuki Acked-by: Johannes Weiner Acked-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index c7de7dac43b..4c95c0034a4 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -410,8 +410,11 @@ memory.stat file includes following statistics cache - # of bytes of page cache memory. rss - # of bytes of anonymous and swap cache memory. mapped_file - # of bytes of mapped file (includes tmpfs/shmem) -pgpgin - # of pages paged in (equivalent to # of charging events). -pgpgout - # of pages paged out (equivalent to # of uncharging events). +pgpgin - # of charging events to the memory cgroup. The charging + event happens each time a page is accounted as either mapped + anon page(RSS) or cache page(Page Cache) to the cgroup. +pgpgout - # of uncharging events to the memory cgroup. The uncharging + event happens each time a page is unaccounted from the cgroup. swap - # of bytes of swap usage inactive_anon - # of bytes of anonymous memory and swap cache memory on LRU list. -- cgit v1.2.3 From b8f566b04d3cddd192cfd2418ae6d54ac6353792 Mon Sep 17 00:00:00 2001 From: Pavel Emelyanov Date: Thu, 12 Jan 2012 17:20:27 -0800 Subject: sysctl: add the kernel.ns_last_pid control The sysctl works on the current task's pid namespace, getting and setting its last_pid field. Writing is allowed for CAP_SYS_ADMIN-capable tasks thus making it possible to create a task with desired pid value. This ability is required badly for the checkpoint/restore in userspace. This approach suits all the parties for now. Signed-off-by: Pavel Emelyanov Acked-by: Tejun Heo Cc: Oleg Nesterov Cc: Cyrill Gorcunov Cc: "Eric W. Biederman" Cc: Serge Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/kernel.txt | 8 ++++++++ kernel/pid.c | 4 +++- kernel/pid_namespace.c | 31 +++++++++++++++++++++++++++++++ 3 files changed, 42 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 6d8cd8b2c30..8c20fbd8b42 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -415,6 +415,14 @@ PIDs of value pid_max or larger are not allocated. ============================================================== +ns_last_pid: + +The last pid allocated in the current (the one task using this sysctl +lives in) pid namespace. When selecting a pid for a next task on fork +kernel tries to allocate a number starting from this one. + +============================================================== + powersave-nap: (PPC only) If set, Linux-PPC will use the 'nap' mode of powersaving, diff --git a/kernel/pid.c b/kernel/pid.c index fa5f72227e5..ce8e00deacc 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -137,7 +137,9 @@ static int pid_before(int base, int a, int b) } /* - * We might be racing with someone else trying to set pid_ns->last_pid. + * We might be racing with someone else trying to set pid_ns->last_pid + * at the pid allocation time (there's also a sysctl for this, but racing + * with this one is OK, see comment in kernel/pid_namespace.c about it). * We want the winner to have the "later" value, because if the * "earlier" value prevails, then a pid may get reused immediately. * diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index e9c9adc84ca..a8968396046 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -191,9 +191,40 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns) return; } +static int pid_ns_ctl_handler(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + struct ctl_table tmp = *table; + + if (write && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + /* + * Writing directly to ns' last_pid field is OK, since this field + * is volatile in a living namespace anyway and a code writing to + * it should synchronize its usage with external means. + */ + + tmp.data = ¤t->nsproxy->pid_ns->last_pid; + return proc_dointvec(&tmp, write, buffer, lenp, ppos); +} + +static struct ctl_table pid_ns_ctl_table[] = { + { + .procname = "ns_last_pid", + .maxlen = sizeof(int), + .mode = 0666, /* permissions are checked in the handler */ + .proc_handler = pid_ns_ctl_handler, + }, + { } +}; + +static struct ctl_path kern_path[] = { { .procname = "kernel", }, { } }; + static __init int pid_namespaces_init(void) { pid_ns_cachep = KMEM_CACHE(pid_namespace, SLAB_PANIC); + register_sysctl_paths(kern_path, pid_ns_ctl_table); return 0; } -- cgit v1.2.3 From b3f7f573a20081910e34e99cbc91831f4f02f1ff Mon Sep 17 00:00:00 2001 From: Cyrill Gorcunov Date: Thu, 12 Jan 2012 17:20:53 -0800 Subject: c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4 The mm->start_code/end_code, mm->start_data/end_data, mm->start_brk are involved into calculation of program text/data segment sizes (which might be seen in /proc//statm) and into brk() call final address. For restore we need to know all these values. While mm->start_code/end_code already present in /proc/$pid/stat, the rest members are not, so this patch brings them in. The restore procedure of these members is addressed in another patch using prctl(). Signed-off-by: Cyrill Gorcunov Acked-by: Serge Hallyn Reviewed-by: Kees Cook Reviewed-by: KAMEZAWA Hiroyuki Cc: Alexey Dobriyan Cc: Tejun Heo Cc: Andrew Vagin Cc: Vasiliy Kulikov Cc: Alexey Dobriyan Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 3 +++ fs/proc/array.c | 7 +++++-- 2 files changed, 8 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 12fee132fbe..a76a26a1db8 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -307,6 +307,9 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) blkio_ticks time spent waiting for block IO gtime guest time of the task in jiffies cgtime guest time of the task children in jiffies + start_data address above which program data+bss is placed + end_data address below which program data+bss is placed + start_brk address above which program heap can be expanded with brk() .............................................................................. The /proc/PID/maps file containing the currently mapped memory regions and diff --git a/fs/proc/array.c b/fs/proc/array.c index 8c344f037bd..9252ee3b71e 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -464,7 +464,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, seq_printf(m, "%d (%s) %c %d %d %d %d %d %u %lu \ %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \ -%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld\n", +%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld %lu %lu %lu\n", pid_nr_ns(pid, ns), tcomm, state, @@ -511,7 +511,10 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, task->policy, (unsigned long long)delayacct_blkio_ticks(task), cputime_to_clock_t(gtime), - cputime_to_clock_t(cgtime)); + cputime_to_clock_t(cgtime), + (mm && permitted) ? mm->start_data : 0, + (mm && permitted) ? mm->end_data : 0, + (mm && permitted) ? mm->start_brk : 0); if (mm) mmput(mm); return 0; -- cgit v1.2.3