summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-08-17jbd2: Annotate transaction start also for jbd2_journal_restart()Jan Kara
lockdep annotation for a transaction start has been at the end of jbd2_journal_start(). But a transaction is also started from jbd2_journal_restart(). Move the lockdep annotation to start_this_handle() which covers both cases. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-18ext4: Show unwritten extent flag in ext4_ext_show_leaf()Mingming
ext4_ext_show_leaf() will display the leaf extents when extent debugging is enabled. Printing out the unwritten bit is useful for debugging unwritten extent, allow us to see the unwritten extents vs written extents, after the unwritten extents are splitted or converted. Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2009-09-01ext4: Compile warning fix when EXT_DEBUG enabledMingming
When EXT_DEBUG is enabled I received the following compile warning on PPC64: CC [M] fs/ext4/inode.o CC [M] fs/ext4/extents.o fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’: fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’ fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’: fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’ fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’ fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’ CC [M] fs/ext4/migrate.o The patch fixes compile warning. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Index: linux-2.6.31-rc4/fs/ext4/extents.c ===================================================================
2009-09-18ext4: Avoid group preallocation for closed filesTheodore Ts'o
Currently the group preallocation code tries to find a large (512) free block from which to do per-cpu group allocation for small files. The problem with this scheme is that it leaves the filesystem horribly fragmented. In the worst case, if the filesystem is unmounted and remounted (after a system shutdown, for example) we forget the fact that wee were using a particular (now-partially filled) 512 block extent. So the next time we try to allocate space for a small file, we will find *another* completely free 512 block chunk to allocate small files. Given that there are 32,768 blocks in a block group, after 64 iterations of "mount, write one 4k file in a directory, unmount", the block group will have 64 files, each separated by 511 blocks, and the block group will no longer have any free 512 completely free chunks of blocks for group preallocation space. So if we try to allocate blocks for a file that has been closed, such that we know the final size of the file, and the filesystem is not busy, avoid using group preallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-09ext4: Fix bugs in mballoc's stream allocation modeTheodore Ts'o
The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all screwed up. These fields were getting unconditionally all the time, set even when stream allocation had not taken place, and if they were being used when the file was smaller than s_mb_stream_request, which is when the allocation should _not_ be doing stream allocation. Fix this by determining whether or not we stream allocation should take place once, in ext4_mb_group_or_file(), and setting a flag which gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found(). This simplifies the code and assures that we are consistently using (or not using) the stream allocation logic. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-09ext4: Display the mballoc flags in mb_history in hex instead of decimalTheodore Ts'o
Displaying the flags in base 16 makes it easier to see which flags have been set. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-18ext4: Add configurable run-time mballoc debuggingTheodore Ts'o
Allow mballoc debugging to be enabled at run-time instead of just at compile time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-10ext4: fix journal ref count in move_extent_par_pagePeng Tao
move_extent_par_page calls a_ops->write_begin() to increase journal handler's reference count. However, if either mext_replace_branches() or ext4_get_block fails, the increased reference count isn't decreased. This will cause a later attempt to umount of the fs to hang forever. The patch addresses the issue by calling ext4_journal_stop() if page is not NULL (which means a_ops->write_end() isn't invoked). Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-10jbd2: round commit timer up to avoid uncommitted transactionAndreas Dilger
fix jiffie rounding in jbd commit timer setup code. Rounding down could cause the timer to be fired before the corresponding transaction has expired. That transaction can stay not committed forever if no new transaction is created or expicit sync/umount happens. Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com> Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-10ext4: remove redundant test on unsignedRoel Kluin
unsigned i_block cannot be less than 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-27ext4: fix build warning when EXT4FS_DEBUG is onPeng Tao
When compiling with EXT4FS_DEBUG on, gcc will complain with following warnings: linux-2.6/fs/ext4/ialloc.c: In function ‘ext4_count_free_inodes’: linux-2.6/fs/ext4/ialloc.c:1192: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_group_t’ So add a type cast to suppress it. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-05ext4: Fix compile warnings with MB_DEBUGAkira Fujita
When MB_DEBUG is enabled, we get some compile warnings because ext4_group_t is unsigned int. This patch fixes them. Signed-off-by Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-05ext4: Remove unnecessary semicolons in mballoc.cJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-17ext4: More buffer head reference leaksCurt Wohlgemuth
After the patch I posted last week regarding buffer head ref leaks in no-journal mode, I looked at all the code that uses buffer heads and searched for more potential leaks. The patch below fixes the issues I found; these can occur even when a journal is present. The change to inode.c fixes a double release if ext4_journal_get_create_access() fails. The changes to namei.c are more complicated. add_dirent_to_buf() will release the input buffer head EXCEPT when it returns -ENOSPC. There are some callers of this routine that don't always do the brelse() in the event that -ENOSPC is returned. Unfortunately, to put this fix into ext4_add_entry() required capturing the return value of make_indexed_dir() and add_dirent_to_buf(). Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-17jbd2: Fail to load a journal if it is too shortJan Kara
Due to on disk corruption, it can happen that journal is too short. Fail to load it in such case so that we don't oops somewhere later. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-27ext4: Avoid null pointer dereference when decoding EROFS w/o a journalTheodore Ts'o
We need to check to make sure a journal is present before checking the journal flags in ext4_decode_error(). Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-27ext4: Fix typo in ext4/KconfigManish Katiyar
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-17ext4: Fix memory leak fix when mounting an ext4 filesystemAneesh Kumar K.V
The allocation of the ext4_group_info array was moved to a new function ext4_mb_add_group_info() in commit 5f21b0e6 so that online resize would use a common (and correct) codepath. Unfortunately, the call to the new ext4_mb_add_group_info() function was added without removing the code which originally allocated the array. This caused a memory leak each time an ext4 filesystem was mounted. The fix is simple; remove the code that did the original allocation, since it is no longer needed. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev debugfs: Modify default debugfs directory for debugging pktcdvd. debugfs: Modified default dir of debugfs for debugging UHCI. debugfs: Change debugfs directory of IWMC3200 debugfs: Change debuhgfs directory of trace-events-sample.h debugfs: Fix mount directory of debugfs by default in events.txt hpilo: add poll f_op hpilo: add interrupt handler hpilo: staging for interrupt handling driver core: platform_device_add_data(): use kmemdup() Driver core: Add support for compatibility classes uio: add generic driver for PCI 2.3 devices driver-core: move dma-coherent.c from kernel to driver/base mem_class: fix bug mem_class: use minor as index instead of searching the array driver model: constify attribute groups UIO: remove 'default n' from Kconfig Driver core: Add accessor for device platform data Driver core: move dev_get/set_drvdata to drivers/base/dd.c Driver core: add new device to bus's list before probing
2009-09-16writeback: fix possible bdi writeback refcounting problemNick Piggin
wb_clear_pending AFAIKS should not be called after the item has been put on the list, except by the worker threads. It could lead to the situation where the refcount is decremented below 0 and cause lots of problems. Presumably the !wb_has_dirty_io case is not a common one, so it can be discovered when the thread wakes up to check? Also add a comment in bdi_work_clear. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: Fix bdi use after free in wb_work_complete()Nick Piggin
By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it can already have been deallocated and used for something else in the !on stack case, giving a false positive in this test and causing corruption. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: improve scalability of bdi writeback work queuesNick Piggin
If you're going to do an atomic RMW on each list entry, there's not much point in all the RCU complexities of the list walking. This is only going to help the multi-thread case I guess, but it doesn't hurt to do now. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: remove smp_mb(), it's not needed with list_add_tail_rcu()Nick Piggin
list_add_tail_rcu contains required barriers. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: use schedule_timeout_interruptible()Jens Axboe
Gets rid of a manual set_current_state(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: add comments to bdi_work structureJens Axboe
And document its retriever, get_next_work_item(). Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: separate starting of sync vs opportunistic writebackJens Axboe
bdi_start_writeback() is currently split into two paths, one for WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback() for WB_SYNC_ALL writeback and let bdi_start_writeback() handle only WB_SYNC_NONE. Push down the writeback_control allocation and only accept the parameters that make sense for each function. This cleans up the API considerably. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: inline allocation failure handling in bdi_alloc_queue_work()Jens Axboe
This gets rid of work == NULL in bdi_queue_work() and puts the OOM handling where it belongs. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: use RCU to protect bdi_listJens Axboe
Now that bdi_writeback_all() no longer handles integrity writeback, it doesn't have to block anymore. This means that we can switch bdi_list reader side protection to RCU. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeoutJens Axboe
Data integrity writeback must use bdi_start_writeback() and ensure that wbc->sb and wbc->bdi are set. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16fs: Assign bdi in super_blockJens Axboe
We do this automatically in get_sb_bdev() from the set_bdev_super() callback. Filesystems that have their own private backing_dev_info must assign that in ->fill_super(). Note that ->s_bdi assignment is required for proper writeback! Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: make wb_writeback() take an argument structureJens Axboe
We need to be able to pass in range_cyclic as well, so instead of growing yet another argument, split the arguments into a struct wb_writeback_args structure that we can use internally. Also makes it easier to just copy all members to an on-stack struct, since we can't access work after clearing the pending bit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: merely wakeup flusher thread if work allocation fails for ↵Christoph Hellwig
WB_SYNC_NONE Since it's an opportunistic writeback and not a data integrity action, don't punt to blocking writeback. Just wakeup the thread and it will flush old data. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16writeback: get rid of wbc->for_writepagesJens Axboe
It's only set, it's never checked. Kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16fs: remove bdev->bd_inode_backing_dev_infoJens Axboe
It has been unused since it was introduced in: commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac Author: Andrew Morton <akpm@osdl.org> Date: Fri May 21 00:46:17 2004 -0700 [PATCH] block device layer: separate backing_dev_info infrastructure So lets just kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-15driver model: constify attribute groupsDavid Brownell
Let attribute group vectors be declared "const". We'd like to let most attribute metadata live in read-only sections... this is a start. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-14Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...
2009-09-14Merge branch 'osync_cleanup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: fsync: wait for data writeout completion before calling ->fsync vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() fat: Opencode sync_page_range_nolock() pohmelfs: Use new syncing helper xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() ocfs2: Update syncing after splicing to match generic version ntfs: Use new syncing helpers and update comments ext4: Remove syncing logic from ext4_file_write ext3: Remove syncing logic from ext3_file_write ext2: Update comment about generic_osync_inode vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode vfs: Rename generic_file_aio_write_nolock ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write() vfs: Export __generic_file_aio_write() and add some comments vfs: Introduce filemap_fdatawait_range
2009-09-14Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Whitespace fixes GFS2: Remove unused sysfs file GFS2: Be extra careful about deallocating inodes GFS2: Remove no_formal_ino generating code GFS2: Rename eattr.[ch] as xattr.[ch] GFS2: Clean up of extended attribute support GFS2: Add explanation of extended attr on-disk format GFS2: Add "-o errors=panic|withdraw" mount options GFS2: jumping to wrong label? GFS2: free disk inode which is deleted by remote node -V2 GFS2: Add a document explaining GFS2's uevents GFS2: Add sysfs link to device GFS2: Replace assertion with proper error handling GFS2: Improve error handling in inode allocation GFS2: Add some more info to uevents GFS2: Add online uevent to GFS2
2009-09-14Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: Fix possible corruption when close races with write udf: Perform preallocation only for regular files udf: Remove wrong assignment in udf_symlink udf: Remove dead code
2009-09-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (21 commits) fs/Kconfig: move nilfs2 outside misc filesystems nilfs2: convert nilfs_bmap_lookup to an inline function nilfs2: allow btree code to directly call dat operations nilfs2: add update functions of virtual block address to dat nilfs2: remove individual gfp constants for each metadata file nilfs2: stop zero-fill of btree path just before free it nilfs2: remove unused btree argument from btree functions nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free nilfs2: shorten freeze period due to GC in write operation v3 nilfs2: add more check routines in mount process nilfs2: An unassigned variable is assigned to a never used structure member nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT nilfs2: stop using periodic write_super callback nilfs2: clean up nilfs_write_super nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs nilfs2: remove redundant super block commit nilfs2: implement nilfs_show_options to display mount options in /proc/mounts nilfs2: always lookup disk block address before reading metadata block nilfs2: use semaphore to protect pointer to a writable FS-instance nilfs2: fix format string compile warning (ino_t) ...
2009-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: consolidate reconnect logic in smb_init routines cifs: Replace wrtPending with a real reference count cifs: protect GlobalOplock_Q with its own spinlock cifs: use tcon pointer in cifs_show_options cifs: send IPv6 addr in upcall with colon delimiters [CIFS] Fix checkpatch warnings PATCH] cifs: fix broken mounts when a SSH tunnel is used (try #4) [CIFS] Memory leak in ntlmv2 hash calculation [CIFS] potential NULL dereference in parse_DFS_referrals()
2009-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.
2009-09-14udf: Fix possible corruption when close races with writeJan Kara
When we close a file, we remove preallocated blocks from it. But this truncation was not protected by i_mutex and thus it could have raced with a write through a different fd and cause crashes or even filesystem corruption. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14udf: Perform preallocation only for regular filesJan Kara
So far we preallocated blocks also for directories but that brings a problem, when to get rid of preallocated blocks we don't need. So far we removed them in udf_clear_inode() which has a disadvantage that 1) blocks are unavailable long after writing to a directory finished and thus one can get out of space unnecessarily early 2) releasing blocks from udf_clear_inode is problematic because VFS does not expect us to redirty inode there and it also slows down memory reclaim. So preallocate blocks only for regular files where we can drop preallocation in udf_release_file. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14udf: Remove wrong assignment in udf_symlinkJan Kara
Recomputation of the pointer was wrong (it should have been just increment). Luckily, we never use the computed value. Remove it. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14udf: Remove dead codeJan Kara
Remove code that gets never used. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14fsync: wait for data writeout completion before calling ->fsyncChristoph Hellwig
Currenly vfs_fsync(_range) first calls filemap_fdatawrite to write out the data, the calls into ->fsync to write out the metadata and then finally calls filemap_fdatawait to wait for the data I/O to complete. What sounds like a clever micro-optimization actually is nast trap for many filesystems. For many modern filesystems i_size or other inode information is only updated on I/O completion and we need to wait for I/O to finish before we can write out the metadata. For old fashionen filesystems that instanciate blocks during the actual write and also update the metadata at that point it opens up a large window were we could expose uninitialized blocks after a crash. While a few filesystems that need it already wait for the I/O to finish inside their ->fsync methods it is rather suboptimal as it is done under the i_mutex and also always for the whole file instead of just a part as we could do for O_SYNC handling. Here is a small audit of all fsync instances in the tree: - spufs_mfc_fsync: - ps3flash_fsync: - vol_cdev_fsync: - printer_fsync: - fb_deferred_io_fsync: - bad_file_fsync: - simple_sync_file: don't care - filesystems/drivers do't use the page cache or are purely in-memory. - simple_fsync: - file_fsync: - affs_file_fsync: - fat_file_fsync: - jfs_fsync: - ubifs_fsync: - reiserfs_dir_fsync: - reiserfs_sync_file: never touch pagecache themselves. We need to wait before if we do not want to expose stale data after an allocation. - afs_fsync: - fuse_fsync_common: do the waiting writeback itself in awkward ways, would benefit from proper semantics - block_fsync: Does a filemap_write_and_wait on the block device inode. Because we now have f_mapping that is the same inode we call it on in vfs_fsync. So just removing it and letting the VFS do the work in one go would be an improvement. - btrfs_sync_file: - cifs_fsync: - xfs_file_fsync: need the wait first and currently do it themselves. would benefit from doing it outside i_mutex. - coda_fsync: - ecryptfs_fsync: - exofs_file_fsync: - shm_fsync: only passes the fsync through to the lower layer - ext3_sync_file: doesn't seem to care, comments are confusing. - ext4_sync_file: would need the wait to work correctly for delalloc mode with late i_size updates. Otherwise the ext3 comment applies. currently implemens it's own writeback and wait in an odd way, could benefit from doing it properly. - gfs2_fsync: not needed for journaled data mode, but probably harmless there. Currently writes back data asynchronously itself. Needs some major audit. - hostfs_fsync: just calls fsync/datasync on the host FD. Without the wait before data might not even be inflight yet if we're unlucky. - hpfs_file_fsync: - ncp_fsync: no-ops. Dangerous before and after. - jffs2_fsync: just calls jffs2_flush_wbuf_gc, not sure how this relates to data. - nfs_fsync_dir: just increments stats, claims all directory operations are synchronous - nfs_file_fsync: only writes out data??? Looks very odd. - nilfs_sync_file: looks like it expects all data done, but not sure from the code - ntfs_dir_fsync: - ntfs_file_fsync: appear to do their own data writeback. Very convoluted code. - ocfs2_sync_file: does it's own data writeback, but no wait. probably needs the wait. - smb_fsync: according to a comment expects all pages written already, probably needs the wait before. This patch only changes vfs_fsync_range, removal of the wait in the methods that have it is left to the filesystem maintainers. Note that most filesystems really do need an audit for their fsync methods given the gems found in this very brief audit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()Jan Kara
Remove these three functions since nobody uses them anymore. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14fat: Opencode sync_page_range_nolock()Jan Kara
fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the only user of generic_osync_inode() which does not have a file open. So opencode needed actions for FAT so that we can convert generic_osync_inode() to a standard syncing path. Update a comment about generic_osync_inode(). CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()Jan Kara
Christoph Hellwig says that it is enough for XFS to call filemap_write_and_wait_range() instead of sync_page_range() because we do all the metadata syncing when forcing the log. CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>