summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMel Gorman <mgorman@suse.de>2011-07-08 15:39:36 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2011-07-08 21:14:42 -0700
commit08951e545918c1594434d000d88a7793e2452a9b (patch)
tree56f91aba454751e2b6bcd67945ed2d4ebbeb2025
parent902daf6580cffe04721250fb71b5527a98718b11 (diff)
mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely
During allocator-intensive workloads, kswapd will be woken frequently causing free memory to oscillate between the high and min watermark. This is expected behaviour. Unfortunately, if the highest zone is small, a problem occurs. This seems to happen most with recent sandybridge laptops but it's probably a co-incidence as some of these laptops just happen to have a small Normal zone. The reproduction case is almost always during copying large files that kswapd pegs at 100% CPU until the file is deleted or cache is dropped. The problem is mostly down to sleeping_prematurely() keeping kswapd awake when the highest zone is small and unreclaimable and compounded by the fact we shrink slabs even when not shrinking zones causing a lot of time to be spent in shrinkers and a lot of memory to be reclaimed. Patch 1 corrects sleeping_prematurely to check the zones matching the classzone_idx instead of all zones. Patch 2 avoids shrinking slab when we are not shrinking a zone. Patch 3 notes that sleeping_prematurely is checking lower zones against a high classzone which is not what allocators or balance_pgdat() is doing leading to an artifical belief that kswapd should be still awake. Patch 4 notes that when balance_pgdat() gives up on a high zone that the decision is not communicated to sleeping_prematurely() This problem affects 2.6.38.8 for certain and is expected to affect 2.6.39 and 3.0-rc4 as well. If accepted, they need to go to -stable to be picked up by distros and this series is against 3.0-rc4. I've cc'd people that reported similar problems recently to see if they still suffer from the problem and if this fixes it. This patch: correct the check for kswapd sleeping in sleeping_prematurely() During allocator-intensive workloads, kswapd will be woken frequently causing free memory to oscillate between the high and min watermark. This is expected behaviour. A problem occurs if the highest zone is small. balance_pgdat() only considers unreclaimable zones when priority is DEF_PRIORITY but sleeping_prematurely considers all zones. It's possible for this sequence to occur 1. kswapd wakes up and enters balance_pgdat() 2. At DEF_PRIORITY, marks highest zone unreclaimable 3. At DEF_PRIORITY-1, ignores highest zone setting end_zone 4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from highest zone, clearing all_unreclaimable. Highest zone is still unbalanced 5. kswapd returns and calls sleeping_prematurely 6. sleeping_prematurely looks at *all* zones, not just the ones being considered by balance_pgdat. The highest small zone has all_unreclaimable cleared but the zone is not balanced. all_zones_ok is false so kswapd stays awake This patch corrects the behaviour of sleeping_prematurely to check the zones balance_pgdat() checked. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Pádraig Brady <P@draigBrady.com> Tested-by: Pádraig Brady <P@draigBrady.com> Tested-by: Andrew Lutomirski <luto@mit.edu> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r--mm/vmscan.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f49535d4cd..04c49fe781f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2326,7 +2326,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
return true;
/* Check the watermark levels */
- for (i = 0; i < pgdat->nr_zones; i++) {
+ for (i = 0; i <= classzone_idx; i++) {
struct zone *zone = pgdat->node_zones + i;
if (!populated_zone(zone))