diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-11-16 11:41:22 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-11-16 11:41:22 -0800 |
commit | 487e2c9f44c4b5ea23bfe87bb34679f7297a0bce (patch) | |
tree | e9dcf16175078ae2bed9a2fc120e6bd0b28f48e9 /fs/afs/callback.c | |
parent | b630a23a731a436f9edbd9fa00739aaa3e174c15 (diff) | |
parent | 98bf40cd99fcfed0705812b6cbdbb3b441a42970 (diff) |
Merge tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells:
"kAFS filesystem driver overhaul.
The major points of the overhaul are:
(1) Preliminary groundwork is laid for supporting network-namespacing
of kAFS. The remainder of the namespacing work requires some way
to pass namespace information to submounts triggered by an
automount. This requires something like the mount overhaul that's
in progress.
(2) sockaddr_rxrpc is used in preference to in_addr for holding
addresses internally and add support for talking to the YFS VL
server. With this, kAFS can do everything over IPv6 as well as
IPv4 if it's talking to servers that support it.
(3) Callback handling is overhauled to be generally passive rather
than active. 'Callbacks' are promises by the server to tell us
about data and metadata changes. Callbacks are now checked when
we next touch an inode rather than actively going and looking for
it where possible.
(4) File access permit caching is overhauled to store the caching
information per-inode rather than per-directory, shared over
subordinate files. Whilst older AFS servers only allow ACLs on
directories (shared to the files in that directory), newer AFS
servers break that restriction.
To improve memory usage and to make it easier to do mass-key
removal, permit combinations are cached and shared.
(5) Cell database management is overhauled to allow lighter locks to
be used and to make cell records autonomous state machines that
look after getting their own DNS records and cleaning themselves
up, in particular preventing races in acquiring and relinquishing
the fscache token for the cell.
(6) Volume caching is overhauled. The afs_vlocation record is got rid
of to simplify things and the superblock is now keyed on the cell
and the numeric volume ID only. The volume record is tied to a
superblock and normal superblock management is used to mediate
the lifetime of the volume fscache token.
(7) File server record caching is overhauled to make server records
independent of cells and volumes. A server can be in multiple
cells (in such a case, the administrator must make sure that the
VL services for all cells correctly reflect the volumes shared
between those cells).
Server records are now indexed using the UUID of the server
rather than the address since a server can have multiple
addresses.
(8) File server rotation is overhauled to handle VMOVED, VBUSY (and
similar), VOFFLINE and VNOVOL indications and to handle rotation
both of servers and addresses of those servers. The rotation will
also wait and retry if the server says it is busy.
(9) Data writeback is overhauled. Each inode no longer stores a list
of modified sections tagged with the key that authorised it in
favour of noting the modified region of a page in page->private
and storing a list of keys that made modifications in the inode.
This simplifies things and allows other keys to be used to
actually write to the server if a key that made a modification
becomes useless.
(10) Writable mmap() is implemented. This allows a kernel to be build
entirely on AFS.
Note that Pre AFS-3.4 servers are no longer supported, though this can
be added back if necessary (AFS-3.4 was released in 1998)"
* tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (35 commits)
afs: Protect call->state changes against signals
afs: Trace page dirty/clean
afs: Implement shared-writeable mmap
afs: Get rid of the afs_writeback record
afs: Introduce a file-private data record
afs: Use a dynamic port if 7001 is in use
afs: Fix directory read/modify race
afs: Trace the sending of pages
afs: Trace the initiation and completion of client calls
afs: Fix documentation on # vs % prefix in mount source specification
afs: Fix total-length calculation for multiple-page send
afs: Only progress call state at end of Tx phase from rxrpc callback
afs: Make use of the YFS service upgrade to fully support IPv6
afs: Overhaul volume and server record caching and fileserver rotation
afs: Move server rotation code into its own file
afs: Add an address list concept
afs: Overhaul cell database management
afs: Overhaul permit caching
afs: Overhaul the callback handling
afs: Rename struct afs_call server member to cm_server
...
Diffstat (limited to 'fs/afs/callback.c')
-rw-r--r-- | fs/afs/callback.c | 504 |
1 files changed, 136 insertions, 368 deletions
diff --git a/fs/afs/callback.c b/fs/afs/callback.c index 25d404d22cae..f4291b576054 100644 --- a/fs/afs/callback.c +++ b/fs/afs/callback.c @@ -20,118 +20,151 @@ #include <linux/sched.h> #include "internal.h" -#if 0 -unsigned afs_vnode_update_timeout = 10; -#endif /* 0 */ - -#define afs_breakring_space(server) \ - CIRC_SPACE((server)->cb_break_head, (server)->cb_break_tail, \ - ARRAY_SIZE((server)->cb_break)) - -//static void afs_callback_updater(struct work_struct *); - -static struct workqueue_struct *afs_callback_update_worker; - /* - * allow the fileserver to request callback state (re-)initialisation + * Set up an interest-in-callbacks record for a volume on a server and + * register it with the server. + * - Called with volume->server_sem held. */ -void afs_init_callback_state(struct afs_server *server) +int afs_register_server_cb_interest(struct afs_vnode *vnode, + struct afs_server_entry *entry) { - struct afs_vnode *vnode; - - _enter("{%p}", server); + struct afs_cb_interest *cbi = entry->cb_interest, *vcbi, *new, *x; + struct afs_server *server = entry->server; + +again: + vcbi = vnode->cb_interest; + if (vcbi) { + if (vcbi == cbi) + return 0; + + if (cbi && vcbi->server == cbi->server) { + write_seqlock(&vnode->cb_lock); + vnode->cb_interest = afs_get_cb_interest(cbi); + write_sequnlock(&vnode->cb_lock); + afs_put_cb_interest(afs_v2net(vnode), cbi); + return 0; + } - spin_lock(&server->cb_lock); + if (!cbi && vcbi->server == server) { + afs_get_cb_interest(vcbi); + x = cmpxchg(&entry->cb_interest, cbi, vcbi); + if (x != cbi) { + cbi = x; + afs_put_cb_interest(afs_v2net(vnode), vcbi); + goto again; + } + return 0; + } + } - /* kill all the promises on record from this server */ - while (!RB_EMPTY_ROOT(&server->cb_promises)) { - vnode = rb_entry(server->cb_promises.rb_node, - struct afs_vnode, cb_promise); - _debug("UNPROMISE { vid=%x:%u uq=%u}", - vnode->fid.vid, vnode->fid.vnode, vnode->fid.unique); - rb_erase(&vnode->cb_promise, &server->cb_promises); - vnode->cb_promised = false; + if (!cbi) { + new = kzalloc(sizeof(struct afs_cb_interest), GFP_KERNEL); + if (!new) + return -ENOMEM; + + refcount_set(&new->usage, 1); + new->sb = vnode->vfs_inode.i_sb; + new->vid = vnode->volume->vid; + new->server = afs_get_server(server); + INIT_LIST_HEAD(&new->cb_link); + + write_lock(&server->cb_break_lock); + list_add_tail(&new->cb_link, &server->cb_interests); + write_unlock(&server->cb_break_lock); + + x = cmpxchg(&entry->cb_interest, cbi, new); + if (x == cbi) { + cbi = new; + } else { + cbi = x; + afs_put_cb_interest(afs_v2net(vnode), new); + } } - spin_unlock(&server->cb_lock); - _leave(""); + ASSERT(cbi); + + /* Change the server the vnode is using. This entails scrubbing any + * interest the vnode had in the previous server it was using. + */ + write_seqlock(&vnode->cb_lock); + + vnode->cb_interest = afs_get_cb_interest(cbi); + vnode->cb_s_break = cbi->server->cb_s_break; + clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags); + + write_sequnlock(&vnode->cb_lock); + return 0; } /* - * handle the data invalidation side of a callback being broken + * Set a vnode's interest on a server. */ -void afs_broken_callback_work(struct work_struct *work) +void afs_set_cb_interest(struct afs_vnode *vnode, struct afs_cb_interest *cbi) { - struct afs_vnode *vnode = - container_of(work, struct afs_vnode, cb_broken_work); + struct afs_cb_interest *old_cbi = NULL; - _enter(""); - - if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) + if (vnode->cb_interest == cbi) return; - /* we're only interested in dealing with a broken callback on *this* - * vnode and only if no-one else has dealt with it yet */ - if (!mutex_trylock(&vnode->validate_lock)) - return; /* someone else is dealing with it */ - - if (test_bit(AFS_VNODE_CB_BROKEN, &vnode->flags)) { - if (S_ISDIR(vnode->vfs_inode.i_mode)) - afs_clear_permits(vnode); - - if (afs_vnode_fetch_status(vnode, NULL, NULL) < 0) - goto out; - - if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) - goto out; - - /* if the vnode's data version number changed then its contents - * are different */ - if (test_and_clear_bit(AFS_VNODE_ZAP_DATA, &vnode->flags)) - afs_zap_data(vnode); + write_seqlock(&vnode->cb_lock); + if (vnode->cb_interest != cbi) { + afs_get_cb_interest(cbi); + old_cbi = vnode->cb_interest; + vnode->cb_interest = cbi; } + write_sequnlock(&vnode->cb_lock); + afs_put_cb_interest(afs_v2net(vnode), cbi); +} -out: - mutex_unlock(&vnode->validate_lock); - - /* avoid the potential race whereby the mutex_trylock() in this - * function happens again between the clear_bit() and the - * mutex_unlock() */ - if (test_bit(AFS_VNODE_CB_BROKEN, &vnode->flags)) { - _debug("requeue"); - queue_work(afs_callback_update_worker, &vnode->cb_broken_work); +/* + * Remove an interest on a server. + */ +void afs_put_cb_interest(struct afs_net *net, struct afs_cb_interest *cbi) +{ + if (cbi && refcount_dec_and_test(&cbi->usage)) { + if (!list_empty(&cbi->cb_link)) { + write_lock(&cbi->server->cb_break_lock); + list_del_init(&cbi->cb_link); + write_unlock(&cbi->server->cb_break_lock); + afs_put_server(net, cbi->server); + } + kfree(cbi); } - _leave(""); +} + +/* + * allow the fileserver to request callback state (re-)initialisation + */ +void afs_init_callback_state(struct afs_server *server) +{ + if (!test_and_clear_bit(AFS_SERVER_FL_NEW, &server->flags)) + server->cb_s_break++; } /* * actually break a callback */ -static void afs_break_callback(struct afs_server *server, - struct afs_vnode *vnode) +void afs_break_callback(struct afs_vnode *vnode) { _enter(""); - set_bit(AFS_VNODE_CB_BROKEN, &vnode->flags); + write_seqlock(&vnode->cb_lock); + + if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) { + vnode->cb_break++; + afs_clear_permits(vnode); - if (vnode->cb_promised) { spin_lock(&vnode->lock); _debug("break callback"); - spin_lock(&server->cb_lock); - if (vnode->cb_promised) { - rb_erase(&vnode->cb_promise, &server->cb_promises); - vnode->cb_promised = false; - } - spin_unlock(&server->cb_lock); - - queue_work(afs_callback_update_worker, &vnode->cb_broken_work); if (list_empty(&vnode->granted_locks) && !list_empty(&vnode->pending_locks)) afs_lock_may_be_available(vnode); spin_unlock(&vnode->lock); } + + write_sequnlock(&vnode->cb_lock); } /* @@ -143,49 +176,31 @@ static void afs_break_callback(struct afs_server *server, static void afs_break_one_callback(struct afs_server *server, struct afs_fid *fid) { + struct afs_cb_interest *cbi; + struct afs_iget_data data; struct afs_vnode *vnode; - struct rb_node *p; - - _debug("find"); - spin_lock(&server->fs_lock); - p = server->fs_vnodes.rb_node; - while (p) { - vnode = rb_entry(p, struct afs_vnode, server_rb); - if (fid->vid < vnode->fid.vid) - p = p->rb_left; - else if (fid->vid > vnode->fid.vid) - p = p->rb_right; - else if (fid->vnode < vnode->fid.vnode) - p = p->rb_left; - else if (fid->vnode > vnode->fid.vnode) - p = p->rb_right; - else if (fid->unique < vnode->fid.unique) - p = p->rb_left; - else if (fid->unique > vnode->fid.unique) - p = p->rb_right; - else - goto found; - } - - /* not found so we just ignore it (it may have moved to another - * server) */ -not_available: - _debug("not avail"); - spin_unlock(&server->fs_lock); - _leave(""); - return; + struct inode *inode; -found: - _debug("found"); - ASSERTCMP(server, ==, vnode->server); + read_lock(&server->cb_break_lock); - if (!igrab(AFS_VNODE_TO_I(vnode))) - goto not_available; - spin_unlock(&server->fs_lock); + /* Step through all interested superblocks. There may be more than one + * because of cell aliasing. + */ + list_for_each_entry(cbi, &server->cb_interests, cb_link) { + if (cbi->vid != fid->vid) + continue; + + data.volume = NULL; + data.fid = *fid; + inode = ilookup5_nowait(cbi->sb, fid->vnode, afs_iget5_test, &data); + if (inode) { + vnode = AFS_FS_I(inode); + afs_break_callback(vnode); + iput(inode); + } + } - afs_break_callback(server, vnode); - iput(&vnode->vfs_inode); - _leave(""); + read_unlock(&server->cb_break_lock); } /* @@ -216,261 +231,14 @@ void afs_break_callbacks(struct afs_server *server, size_t count, } /* - * record the callback for breaking - * - the caller must hold server->cb_lock + * Clear the callback interests in a server list. */ -static void afs_do_give_up_callback(struct afs_server *server, - struct afs_vnode *vnode) +void afs_clear_callback_interests(struct afs_net *net, struct afs_server_list *slist) { - struct afs_callback *cb; - - _enter("%p,%p", server, vnode); - - cb = &server->cb_break[server->cb_break_head]; - cb->fid = vnode->fid; - cb->version = vnode->cb_version; - cb->expiry = vnode->cb_expiry; - cb->type = vnode->cb_type; - smp_wmb(); - server->cb_break_head = - (server->cb_break_head + 1) & - (ARRAY_SIZE(server->cb_break) - 1); - - /* defer the breaking of callbacks to try and collect as many as - * possible to ship in one operation */ - switch (atomic_inc_return(&server->cb_break_n)) { - case 1 ... AFSCBMAX - 1: - queue_delayed_work(afs_callback_update_worker, - &server->cb_break_work, HZ * 2); - break; - case AFSCBMAX: - afs_flush_callback_breaks(server); - break; - default: - break; - } - - ASSERT(server->cb_promises.rb_node != NULL); - rb_erase(&vnode->cb_promise, &server->cb_promises); - vnode->cb_promised = false; - _leave(""); -} - -/* - * discard the callback on a deleted item - */ -void afs_discard_callback_on_delete(struct afs_vnode *vnode) -{ - struct afs_server *server = vnode->server; + int i; - _enter("%d", vnode->cb_promised); - - if (!vnode->cb_promised) { - _leave(" [not promised]"); - return; - } - - ASSERT(server != NULL); - - spin_lock(&server->cb_lock); - if (vnode->cb_promised) { - ASSERT(server->cb_promises.rb_node != NULL); - rb_erase(&vnode->cb_promise, &server->cb_promises); - vnode->cb_promised = false; + for (i = 0; i < slist->nr_servers; i++) { + afs_put_cb_interest(net, slist->servers[i].cb_interest); + slist->servers[i].cb_interest = NULL; } - spin_unlock(&server->cb_lock); - _leave(""); -} - -/* - * give up the callback registered for a vnode on the file server when the - * inode is being cleared - */ -void afs_give_up_callback(struct afs_vnode *vnode) -{ - struct afs_server *server = vnode->server; - - DECLARE_WAITQUEUE(myself, current); - - _enter("%d", vnode->cb_promised); - - _debug("GIVE UP INODE %p", &vnode->vfs_inode); - - if (!vnode->cb_promised) { - _leave(" [not promised]"); - return; - } - - ASSERT(server != NULL); - - spin_lock(&server->cb_lock); - if (vnode->cb_promised && afs_breakring_space(server) == 0) { - add_wait_queue(&server->cb_break_waitq, &myself); - for (;;) { - set_current_state(TASK_UNINTERRUPTIBLE); - if (!vnode->cb_promised || - afs_breakring_space(server) != 0) - break; - spin_unlock(&server->cb_lock); - schedule(); - spin_lock(&server->cb_lock); - } - remove_wait_queue(&server->cb_break_waitq, &myself); - __set_current_state(TASK_RUNNING); - } - - /* of course, it's always possible for the server to break this vnode's - * callback first... */ - if (vnode->cb_promised) - afs_do_give_up_callback(server, vnode); - - spin_unlock(&server->cb_lock); - _leave(""); -} - -/* - * dispatch a deferred give up callbacks operation - */ -void afs_dispatch_give_up_callbacks(struct work_struct *work) -{ - struct afs_server *server = - container_of(work, struct afs_server, cb_break_work.work); - - _enter(""); - - /* tell the fileserver to discard the callback promises it has - * - in the event of ENOMEM or some other error, we just forget that we - * had callbacks entirely, and the server will call us later to break - * them - */ - afs_fs_give_up_callbacks(server, true); -} - -/* - * flush the outstanding callback breaks on a server - */ -void afs_flush_callback_breaks(struct afs_server *server) -{ - mod_delayed_work(afs_callback_update_worker, &server->cb_break_work, 0); -} - -#if 0 -/* - * update a bunch of callbacks - */ -static void afs_callback_updater(struct work_struct *work) -{ - struct afs_server *server; - struct afs_vnode *vnode, *xvnode; - time64_t now; - long timeout; - int ret; - - server = container_of(work, struct afs_server, updater); - - _enter(""); - - now = ktime_get_real_seconds(); - - /* find the first vnode to update */ - spin_lock(&server->cb_lock); - for (;;) { - if (RB_EMPTY_ROOT(&server->cb_promises)) { - spin_unlock(&server->cb_lock); - _leave(" [nothing]"); - return; - } - - vnode = rb_entry(rb_first(&server->cb_promises), - struct afs_vnode, cb_promise); - if (atomic_read(&vnode->usage) > 0) - break; - rb_erase(&vnode->cb_promise, &server->cb_promises); - vnode->cb_promised = false; - } - - timeout = vnode->update_at - now; - if (timeout > 0) { - queue_delayed_work(afs_vnode_update_worker, - &afs_vnode_update, timeout * HZ); - spin_unlock(&server->cb_lock); - _leave(" [nothing]"); - return; - } - - list_del_init(&vnode->update); - atomic_inc(&vnode->usage); - spin_unlock(&server->cb_lock); - - /* we can now perform the update */ - _debug("update %s", vnode->vldb.name); - vnode->state = AFS_VL_UPDATING; - vnode->upd_rej_cnt = 0; - vnode->upd_busy_cnt = 0; - - ret = afs_vnode_update_record(vl, &vldb); - switch (ret) { - case 0: - afs_vnode_apply_update(vl, &vldb); - vnode->state = AFS_VL_UPDATING; - break; - case -ENOMEDIUM: - vnode->state = AFS_VL_VOLUME_DELETED; - break; - default: - vnode->state = AFS_VL_UNCERTAIN; - break; - } - - /* and then reschedule */ - _debug("reschedule"); - vnode->update_at = ktime_get_real_seconds() + - afs_vnode_update_timeout; - - spin_lock(&server->cb_lock); - - if (!list_empty(&server->cb_promises)) { - /* next update in 10 minutes, but wait at least 1 second more - * than the newest record already queued so that we don't spam - * the VL server suddenly with lots of requests - */ - xvnode = list_entry(server->cb_promises.prev, - struct afs_vnode, update); - if (vnode->update_at <= xvnode->update_at) - vnode->update_at = xvnode->update_at + 1; - xvnode = list_entry(server->cb_promises.next, - struct afs_vnode, update); - timeout = xvnode->update_at - now; - if (timeout < 0) - timeout = 0; - } else { - timeout = afs_vnode_update_timeout; - } - - list_add_tail(&vnode->update, &server->cb_promises); - - _debug("timeout %ld", timeout); - queue_delayed_work(afs_vnode_update_worker, - &afs_vnode_update, timeout * HZ); - spin_unlock(&server->cb_lock); - afs_put_vnode(vl); -} -#endif - -/* - * initialise the callback update process - */ -int __init afs_callback_update_init(void) -{ - afs_callback_update_worker = alloc_ordered_workqueue("kafs_callbackd", - WQ_MEM_RECLAIM); - return afs_callback_update_worker ? 0 : -ENOMEM; -} - -/* - * shut down the callback update process - */ -void afs_callback_update_kill(void) -{ - destroy_workqueue(afs_callback_update_worker); } |