summaryrefslogtreecommitdiff
path: root/mm
diff options
context:
space:
mode:
authorDennis Zhou (Facebook) <dennisszhou@gmail.com>2018-08-31 16:22:43 -0400
committerJens Axboe <axboe@kernel.dk>2018-08-31 14:48:56 -0600
commit59b57717fff8b562825d9d25e0180ad7e8048ca9 (patch)
treed3e3158cd0f5e9ff6340ba0f9d070772429401b2 /mm
parent6b06546206868f723f2061d703a3c3c378dcbf4c (diff)
blkcg: delay blkg destruction until after writeback has finished
Currently, blkcg destruction relies on a sequence of events: 1. Destruction starts. blkcg_css_offline() is called and blkgs release their reference to the blkcg. This immediately destroys the cgwbs (writeback). 2. With blkgs giving up their reference, the blkcg ref count should become zero and eventually call blkcg_css_free() which finally frees the blkcg. Jiufei Xue reported that there is a race between blkcg_bio_issue_check() and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent on the completion of all writeback associated with the blkcg. A count of the number of cgwbs is maintained and once that goes to zero, blkg destruction can follow. This should prevent premature blkg destruction related to writeback. The new process for blkcg cleanup is as follows: 1. Destruction starts. blkcg_css_offline() is called which offlines writeback. Blkg destruction is delayed on the cgwb_refcnt count to avoid punting potentially large amounts of outstanding writeback to root while maintaining any ongoing policies. Here, the base cgwb_refcnt is put back. 2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called and handles destruction of blkgs. This is where the css reference held by each blkg is released. 3. Once the blkcg ref count goes to zero, blkcg_css_free() is called. This finally frees the blkg. It seems in the past blk-throttle didn't do the most understandable things with taking data from a blkg while associating with current. So, the simplification and unification of what blk-throttle is doing caused this. Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Diffstat (limited to 'mm')
-rw-r--r--mm/backing-dev.c5
1 files changed, 5 insertions, 0 deletions
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index f5981e9d6ae2..8a8bb8796c6c 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -491,6 +491,7 @@ static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
release_work);
+ struct blkcg *blkcg = css_to_blkcg(wb->blkcg_css);
mutex_lock(&wb->bdi->cgwb_release_mutex);
wb_shutdown(wb);
@@ -499,6 +500,9 @@ static void cgwb_release_workfn(struct work_struct *work)
css_put(wb->blkcg_css);
mutex_unlock(&wb->bdi->cgwb_release_mutex);
+ /* triggers blkg destruction if cgwb_refcnt becomes zero */
+ blkcg_cgwb_put(blkcg);
+
fprop_local_destroy_percpu(&wb->memcg_completions);
percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
@@ -597,6 +601,7 @@ static int cgwb_create(struct backing_dev_info *bdi,
list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
list_add(&wb->memcg_node, memcg_cgwb_list);
list_add(&wb->blkcg_node, blkcg_cgwb_list);
+ blkcg_cgwb_get(blkcg);
css_get(memcg_css);
css_get(blkcg_css);
}