summaryrefslogtreecommitdiff
path: root/mm/percpu.c
AgeCommit message (Collapse)Author
2011-01-13Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-07Merge branch 'for-2.6.38' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.
2010-12-22percpu: print out alloc information with KERN_DEBUG instead of KERN_INFOTejun Heo
Now that percpu allocator is mostly stable, there is no reason to print alloc information with KERN_INFO and clutter the boot messages. Switch it to KERN_DEBUG. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Travis <travis@sgi.com>
2010-12-07percpu: zero memory more efficiently in mm/percpu.c::pcpu_mem_alloc()Jesper Juhl
Don't do vmalloc() + memset() when vzalloc() will do. tj: dropped unnecessary temp variable ptr. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-11-01tree-wide: fix comment/printk typosUwe Kleine-König
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-24Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing|ed] => interest[ing|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c
2010-10-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: update comments to reflect that percpu allocations are always zero-filled percpu: Optimize __get_cpu_var() x86, percpu: Optimize this_cpu_ptr percpu: clear memory allocated with the km allocator percpu: fix build breakage on s390 and cleanup build configuration tests percpu: use percpu allocator on UP too percpu: reduce PCPU_MIN_UNIT_SIZE to 32k vmalloc: pcpu_get/free_vm_areas() aren't needed on UP Fixed up trivial conflicts in include/linux/percpu.h
2010-09-21percpu: fix pcpu_last_unit_cpuTejun Heo
pcpu_first/last_unit_cpu are used to track which cpu has the first and last units assigned. This in turn is used to determine the span of a chunk for man/unmap cache flushes and whether an address belongs to the first chunk or not in per_cpu_ptr_to_phys(). When the number of possible CPUs isn't power of two, a chunk may contain unassigned units towards the end of a chunk. The logic to determine pcpu_last_unit_cpu was incorrect when there was an unused unit at the end of a chunk. It failed to ignore the unused unit and assigned the unused marker NR_CPUS to pcpu_last_unit_cpu. This was discovered through kdump failure which was caused by malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible CPUs by CAI Qian. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: CAI Qian <caiqian@redhat.com> Cc: stable@kernel.org
2010-09-10percpu: update comments to reflect that percpu allocations are always ↵Tejun Heo
zero-filled Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephane Eranian <eranian@google.com>
2010-09-09percpu: fix build breakage on s390 and cleanup build configuration testsTejun Heo
Commit bbddff05 (percpu: use percpu allocator on UP too) incorrectly excluded pcpu_build_alloc_info() on SMP configurations which use generic setup_per_cpu_area() like s390. The config ifdefs are becoming confusing. Fix and clean it up by, * Move pcpu_build_alloc_info() right on top of its two users - pcpu_{embed|page}_first_chunk() which are already in CONFIG_SMP block. * Define BUILD_{EMBED|PAGE}_FIRST_CHUNK which indicate whether each first chunk function needs to be included and use them to control inclusion of the three functions to reduce confusion. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sachin Sant <sachinp@in.ibm.com>
2010-09-08percpu: use percpu allocator on UP tooTejun Heo
On UP, percpu allocations were redirected to kmalloc. This has the following problems. * For certain amount of allocations (determined by PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu allocator can be used before the usual kernel memory allocator is brought online. On SMP, this is used to initialize the kernel memory allocator. * percpu allocator honors alignment upto PAGE_SIZE but kmalloc() doesn't. For example, workqueue makes use of larger alignments for cpu_workqueues. Currently, users of percpu allocators need to handle UP differently, which is somewhat fragile and ugly. Other than small amount of memory, there isn't much to lose by enabling percpu allocator on UP. It can simply use kernel memory based chunk allocation which was added for SMP archs w/o MMUs. This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and makes UP build use percpu-km. As percpu addresses and kernel addresses are always identity mapped and static percpu variables don't need any special treatment, nothing is arch dependent and mm/percpu.c implements generic setup_per_cpu_areas() for UP. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-08-27percpu: fix a mismatch between code and commentNamhyung Kim
When pcpu_build_alloc_info() searches best_upa value, it ignores current value if the number of waste units exceeds 1/3 of the number of total cpus. But the comment on the code says that it will ignore if wastage is over 25%. Modify the comment. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-08-27percpu: fix a memory leak in pcpu_extend_area_map()Huang Shijie
The original code did not free the old map. This patch fixes it. tj: use @old as memcpy source instead of @chunk->map, and indentation and description update Signed-off-by: Huang Shijie <shijie8@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org
2010-08-11fix a typo on comments in mm/percpu.cNamhyung Kim
'eqaul' should be 'equal'. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-06-27percpu: allow limited allocation before slab is onlineTejun Heo
This patch updates percpu allocator such that it can serve limited amount of allocation before slab comes online. This is primarily to allow slab to depend on working percpu allocator. Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and SLOTS, determine how much memory space and allocation map slots are reserved. If this reserved area is exhausted, WARN_ON_ONCE() will trigger and allocation will fail till slab comes online. The following changes are made to implement early alloc. * pcpu_mem_alloc() now checks slab_is_available() * Chunks are allocated using pcpu_mem_alloc() * Init paths make sure ai->dyn_size is at least as large as PERCPU_DYNAMIC_EARLY_SIZE. * Initial alloc maps are allocated in __initdata and copied to kmalloc'd areas once slab is online. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org>
2010-06-27percpu: make @dyn_size always mean min dyn_size in first chunk init functionsTejun Heo
In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was ssize_t, -1 meant auto-size, 0 forced 0 and positive meant minimum size. There's no use case for forcing 0 and the upcoming early alloc support always requires non-zero dynamic size. Make @dyn_size always mean minimum dyn_size. While at it, make pcpu_build_alloc_info() static which doesn't have any external caller as suggested by David Rientjes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com>
2010-06-18percpu: fix first chunk match in per_cpu_ptr_to_phys()Tejun Heo
per_cpu_ptr_to_phys() determines whether the passed in @addr belongs to the first_chunk or not by just matching the address against the address range of the base unit (unit0, used by cpu0). When an adress from another cpu was passed in, it will always determine that the address doesn't belong to the first chunk even when it does. This makes the function return a bogus physical address which may lead to crash. This problem was discovered by Cliff Wickman while investigating a crash during kdump on a SGI UV system. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Cliff Wickman <cpw@sgi.com> Tested-by: Cliff Wickman <cpw@sgi.com> Cc: stable@kernel.org
2010-06-17percpu: fix trivial bugs in pcpu_build_alloc_info()Pavel V. Panteleev
Fix the following two trivial bugs in pcpu_build_alloc_info() * we should memset group_cnt to 0 by size of group_cnt, not size of group_map (both are of the same size, so the bug isn't dangerous) * we can delete useless variable group_cnt_max. Signed-off-by: Pavel V. Panteleev <pp_84@mail.ru> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-05-01percpu: implement kernel memory based chunk allocationTejun Heo
Implement an alternate percpu chunk management based on kernel memeory for nommu SMP architectures. Instead of mapping into vmalloc area, chunks are allocated as a contiguous kernel memory using alloc_pages(). As such, percpu allocator on nommu will have the following restrictions. * It can't fill chunks on-demand page-by-page. It has to allocate each chunk fully upfront. * It can't support sparse chunk for NUMA configurations. SMP w/o mmu is crazy enough. Let's hope no one does NUMA w/o mmu. :-P * If chunk size isn't power-of-two multiple of PAGE_SIZE, the unaligned amount will be wasted on each chunk. So, archs which use this better align chunk size. For instructions on how to use this, read the comment on top of mm/percpu-km.c. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Graff Yang <graff.yang@gmail.com> Cc: Sonic Zhang <sonic.adi@gmail.com>
2010-05-01percpu: move vmalloc based chunk management into percpu-vm.cTejun Heo
Separate out and move chunk management (creation/desctruction and [de]population) code into percpu-vm.c which is included by percpu.c and compiled together. The interface for chunk management is defined as follows. * pcpu_populate_chunk - populate the specified range of a chunk * pcpu_depopulate_chunk - depopulate the specified range of a chunk * pcpu_create_chunk - create a new chunk * pcpu_destroy_chunk - destroy a chunk, always preceded by full depop * pcpu_addr_to_page - translate address to physical address * pcpu_verify_alloc_info - check alloc_info is acceptable during init Other than wrapping vmalloc_to_page() inside pcpu_addr_to_page() and dummy pcpu_verify_alloc_info() implementation, this patch only moves code around. This separation is to allow alternate chunk management implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Graff Yang <graff.yang@gmail.com> Cc: Sonic Zhang <sonic.adi@gmail.com>
2010-05-01percpu: misc preparations for nommu supportTejun Heo
Make the following misc preparations for percpu nommu support. * Remove refernces to vmalloc in common comments as nommu percpu won't use it. * Rename chunk->vms to chunk->data and make it void *. Its use is determined by chunk management implementation. * Relocate utility functions and add __maybe_unused to functions which might not be used by different chunk management implementations. This patch doesn't cause any functional change. This is to allow alternate chunk management implementation for percpu nommu support. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Graff Yang <graff.yang@gmail.com> Cc: Sonic Zhang <sonic.adi@gmail.com>
2010-05-01percpu: reorganize chunk creation and destructionTejun Heo
Reorganize alloc/free_pcpu_chunk() such that chunk struct alloc/free live in pcpu_alloc/free_chunk() and the rest in pcpu_create/destroy_chunk(). While at it, add missing error handling for chunk->map allocation failure. This is to allow alternate chunk management implementation for percpu nommu support. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Graff Yang <graff.yang@gmail.com> Cc: Sonic Zhang <sonic.adi@gmail.com>
2010-05-01percpu: factor out pcpu_addr_in_first/reserved_chunk() and update ↵Tejun Heo
per_cpu_ptr_to_phys() Factor out pcpu_addr_in_first/reserved_chunk() from pcpu_chunk_addr_search() and use it to update per_cpu_ptr_to_phys() such that it handles first chunk differently from the rest. This patch doesn't cause any functional change and is to prepare for percpu nommu support. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Graff Yang <graff.yang@gmail.com> Cc: Sonic Zhang <sonic.adi@gmail.com>
2010-03-29percpu, module: implement and use is_kernel/module_percpu_address()Tejun Heo
lockdep has custom code to check whether a pointer belongs to static percpu area which is somewhat broken. Implement proper is_kernel/module_percpu_address() and replace the custom code. On UP, percpu variables are regular static variables and can't be distinguished from them. Always return %false on UP. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>
2010-02-17percpu: add __percpu sparse annotations to core kernel subsystemsTejun Heo
Add __percpu sparse annotations to core subsystems. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-mm@kvack.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Biederman <ebiederm@xmission.com>
2010-02-02Merge branch 'master' into percpuTejun Heo
2010-01-11percpu: avoid calling __pcpu_ptr_to_addr(NULL)Andrew Morton
__pcpu_ptr_to_addr() can be overridden by the architecture and might not behave well if passed a NULL pointer. So avoid calling it until we have verified that its arg is not NULL. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-05Merge branch 'master' into percpuTejun Heo
Conflicts: arch/powerpc/platforms/pseries/hvCall.S include/linux/percpu.h
2009-12-08percpu: refactor the code in pcpu_[de]populate_chunk()WANG Cong
Using break statement at the end of a for loop is confusing, refactor it by replacing the for loop. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2009-12-08Merge branch 'for-linus' into for-nextTejun Heo
Conflicts: mm/percpu.c
2009-11-25percpu: Fix kdump failure if booted with percpu_alloc=pageVivek Goyal
o kdump functionality reserves a per cpu area at boot time and exports the physical address of that area to user space through sys interface. This area stores some dump related information like cpu register states etc at the time of crash. o We were assuming that per cpu area always come from linearly mapped meory region and using __pa() to determine physical address. With percpu_alloc=page, per cpu area can come from vmalloc region also and __pa() breaks. o This patch implments a new function to convert per cpu address to physical address. Before the patch, crash_notes addresses looked as follows. cpu0 60fffff49800 cpu1 60fffff60800 cpu2 60fffff77800 These are bogus phsyical addresses. After the patch, address are following. cpu0 13eb44000 cpu1 13eb43000 cpu2 13eb42000 cpu3 13eb41000 These look fine. I got 4G of memory and /proc/iomem tell me following. 100000000-13fffffff : System RAM tj: * added missing asm/io.h include reported by Stephen Rothwell * repositioned per_cpu_ptr_phys() in percpu.c and added comment. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2009-11-13percpu: restructure pcpu_extend_area_map() to fix bugs and improve readabilityTejun Heo
pcpu_extend_area_map() had the following two bugs. * It should return 1 if pcpu_lock was dropped and reacquired but it returned 0. This could lead to oops if free_percpu() races with area map extension. * pcpu_mem_free() was called under pcpu_lock. pcpu_mem_free() might end up calling vfree() which isn't IRQ safe. This could lead to deadlock through lock order inversion via IRQ. In addition, Linus pointed out that the temporary lock dropping and subtle three-way return value of pcpu_extend_area_map() was very ugly and suggested to split the function into two - pcpu_need_to_extend() and pcpu_extend_area_map(). This patch restructures pcpu_extend_area_map() as suggested and fixes the two bugs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-10-29percpu: remove some sparse warningsTejun Heo
Make the following changes to remove some sparse warnings. * Make DEFINE_PER_CPU_SECTION() declare __pcpu_unique_* before defining it. * Annotate pcpu_extend_area_map() that it is entered with pcpu_lock held, releases it and then reacquires it. * Make percpu related macros use unique nested variable names. * While at it, add pcpu prefix to __size_call[_return]() macros as to-be-implemented sparse annotations will add percpu specific stuff to these macros. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-10-29percpu: allow pcpu_alloc() to be called with IRQs offJiri Kosina
pcpu_alloc() and pcpu_extend_area_map() perform a series of spin_lock_irq()/spin_unlock_irq() calls, which make them unsafe with respect to being called from contexts which have IRQs off. This patch converts the code to perform save/restore of flags instead, making pcpu_alloc() (or __alloc_percpu() respectively) to be called from early kernel startup stage, where IRQs are off. This is needed for proper initialization of per-cpu rq_weight data from sched_init(). tj: added comment explaining why irqsave/restore is used in alloc path. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-12Merge branch 'for-linus' into for-nextTejun Heo
2009-10-12percpu: fix compile warningsTejun Heo
Fix the following two compile warnings which show up on i386. mm/percpu.c:1873: warning: comparison of distinct pointer types lacks a cast mm/percpu.c:1879: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'size_t' Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
2009-10-02percpu: kill legacy percpu allocatorTejun Heo
With ia64 converted, there's no arch left which still uses legacy percpu allocator. Kill it. Signed-off-by: Tejun Heo <tj@kernel.org> Delightedly-acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org>
2009-09-29percpu: make allocation failures more verboseTejun Heo
Warn and dump stack when percpu allocation fails. percpu allocator is still young and unchecked NULL percpu pointer usage can result in random memory corruption when combined with the pointer shifting in access macros. Allocation failures should be rare and the warning message will be disabled after certain times. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-09-29percpu: make pcpu_setup_first_chunk() failures more verboseTejun Heo
The parameters to pcpu_setup_first_chunk() come from different sources depending on architecture and can be quite complex. The function runs various sanity checks on the parameters and triggers BUG() if something isn't right. However, this is very early during the boot and not reporting exactly what the problem is makes debugging even harder. Add PCPU_SETUP_BUG() macro which prints out enough information about the parameters. As the macro still puts separate BUG() for each check, it won't lose any information even on the situations where only the program counter can be retrieved. While at it, also bump pcpu_dump_alloc_info() message to KERN_INFO so that it's visible on the console if boot fails to complete. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-09-29percpu: make embedding first chunk allocator check vmalloc space sizeTejun Heo
Embedding first chunk allocator maintains the distances between units in the vmalloc area and thus needs vmalloc space to be larger than the maximum distances between units; otherwise, it wouldn't be able to create any dynamic chunks. This patch makes the embedding first chunk allocator check vmalloc space size and if the maximum distance between units is larger than 75% of it, print warning and, if page mapping allocator is available, fail initialization so that the system falls back onto it. This should work around percpu allocation failure problems on certain sparc64 configurations where distances between NUMA nodes are larger than the vmalloc area and makes percpu allocator more robust for future configurations. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-09-29percpu: make pcpu_build_alloc_info() clear static buffersTejun Heo
pcpu_build_alloc_info() may be called multiple times when percpu is falling back to different first chunk allocator. Make it clear static buffers so that they don't contain values from previous runs. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-09-29percpu: fix unit_map[] verification in pcpu_setup_first_chunk()Tejun Heo
pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible unit number while unit number can equal and go over NR_CPUS with sparse unit map. This triggers BUG_ON() spuriously on machines which have non-power-of-two number of cpus. Use UINT_MAX instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Tony Vroon <tony@linx.net>
2009-09-15Merge branch 'for-next' into for-linusTejun Heo
* pcpu_chunk_page_occupied() doesn't exist in for-next. * pcpu_chunk_addr_search() updated to use raw_smp_processor_id(). Conflicts: mm/percpu.c
2009-09-01percpu: don't assume existence of cpu0Tejun Heo
percpu incorrectly assumed that cpu0 was always there which led to the following warning and eventual oops on sparc machines w/o cpu0. WARNING: at mm/percpu.c:651 pcpu_map+0xdc/0x100() Modules linked in: Call Trace: [000000000045eb70] warn_slowpath_common+0x50/0xa0 [000000000045ebdc] warn_slowpath_null+0x1c/0x40 [00000000004d493c] pcpu_map+0xdc/0x100 [00000000004d59a4] pcpu_alloc+0x3e4/0x4e0 [00000000004d5af8] __alloc_percpu+0x18/0x40 [00000000005b112c] __percpu_counter_init+0x4c/0xc0 ... Unable to handle kernel NULL pointer dereference ... I7: <sysfs_new_dirent+0x30/0x120> Disabling lock debugging due to kernel taint Caller[000000000053c1b0]: sysfs_new_dirent+0x30/0x120 Caller[000000000053c7a4]: create_dir+0x24/0xc0 Caller[000000000053c870]: sysfs_create_dir+0x30/0x80 Caller[00000000005990e8]: kobject_add_internal+0xc8/0x200 ... Kernel panic - not syncing: Attempted to kill the idle task! This patch fixes the problem by backporting parts from devel branch to make percpu core not depend on the existence of cpu0. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Meelis Roos <mroos@linux.ee> Cc: David Miller <davem@davemloft.net>
2009-08-14percpu: kill lpage first chunk allocatorTejun Heo
With x86 converted to embedding allocator, lpage doesn't have any user left. Kill it along with cpa handling code. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com>
2009-08-14percpu: update embedding first chunk allocator to handle sparse unitsTejun Heo
Now that percpu core can handle very sparse units, given that vmalloc space is large enough, embedding first chunk allocator can use any memory to build the first chunk. This patch teaches pcpu_embed_first_chunk() about distances between cpus and to use alloc/free callbacks to allocate node specific areas for each group and use them for the first chunk. This brings the benefits of embedding allocator to NUMA configurations - no extra TLB pressure with the flexibility of unified dynamic allocator and no need to restructure arch code to build memory layout suitable for percpu. With units put into atom_size aligned groups according to cpu distances, using large page for dynamic chunks is also easily possible with falling back to reuglar pages if large allocation fails. Embedding allocator users are converted to specify NULL cpu_distance_fn, so this patch doesn't cause any visible behavior difference. Following patches will convert them. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14percpu: use group information to allocate vmap areas sparselyTejun Heo
ai->groups[] contains which units need to be put consecutively and at what offset from the chunk base address. Compile this information into pcpu_group_offsets[] and pcpu_group_sizes[] in pcpu_setup_first_chunk() and use them to allocate sparse vm areas using pcpu_get_vm_areas(). This will be used to allow directly using sparse NUMA memories as percpu areas. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Nick Piggin <npiggin@suse.de>
2009-08-14percpu: add chunk->base_addrTejun Heo
The only thing percpu allocator wants to know about a vmalloc area is the base address. Instead of requiring chunk->vm, add chunk->base_addr which contains the necessary value. This simplifies the code a bit and makes the dummy first_vm unnecessary. This change will ease allowing a chunk to be mapped by multiple vms. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14percpu: add pcpu_unit_offsets[]Tejun Heo
Currently units are mapped sequentially into address space. This patch adds pcpu_unit_offsets[] which allows units to be mapped to arbitrary offsets from the chunk base address. This is necessary to allow sparse embedding which might would need to allocate address ranges and memory areas which aren't aligned to unit size but allocation atom size (page or large page size). This also simplifies things a bit by removing the need to calculate offset from unit number. With this change, there's no need for the arch code to know pcpu_unit_size. Update pcpu_setup_first_chunk() and first chunk allocators to return regular 0 or -errno return code instead of unit size or -errno. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David S. Miller <davem@davemloft.net>
2009-08-14percpu: introduce pcpu_alloc_info and pcpu_group_infoTejun Heo
Till now, non-linear cpu->unit map was expressed using an integer array which maps each cpu to a unit and used only by lpage allocator. Although how many units have been placed in a single contiguos area (group) is known while building unit_map, the information is lost when the result is recorded into the unit_map array. For lpage allocator, as all allocations are done by lpages and whether two adjacent lpages are in the same group or not is irrelevant, this didn't cause any problem. Non-linear cpu->unit mapping will be used for sparse embedding and this grouping information is necessary for that. This patch introduces pcpu_alloc_info which contains all the information necessary for initializing percpu allocator. pcpu_alloc_info contains array of pcpu_group_info which describes how units are grouped and mapped to cpus. pcpu_group_info also has base_offset field to specify its offset from the chunk's base address. pcpu_build_alloc_info() initializes this field as if all groups are allocated back-to-back as is currently done but this will be used to sparsely place groups. pcpu_alloc_info is a rather complex data structure which contains a flexible array which in turn points to nested cpu_map arrays. * pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to help dealing with pcpu_alloc_info. * pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info, generalized and renamed to pcpu_build_alloc_info(). @cpu_distance_fn may be NULL indicating that all cpus are of LOCAL_DISTANCE. * pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info, generalized and renamed to pcpu_dump_alloc_info(). It now also prints which group each alloc unit belongs to. * pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the separate parameters. All first chunk allocators are updated to use pcpu_build_alloc_info() to build alloc_info and call pcpu_setup_first_chunk() with it. This has the side effect of packing units for sparse possible cpus. ie. if cpus 0, 2 and 4 are possible, they'll be assigned unit 0, 1 and 2 instead of 0, 2 and 4. * x86 setup_pcpu_lpage() is updated to deal with alloc_info. * sparc64 setup_per_cpu_areas() is updated to build alloc_info. Although the changes made by this patch are pretty pervasive, it doesn't cause any behavior difference other than packing of sparse cpus. It mostly changes how information is passed among initialization functions and makes room for more flexibility. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>