btrfs and NILFS

[Posted June 19, 2007 by corbet]

Almost exactly one year ago, as the developers were discussing changes to the venerable ext3 filesystem, Andrew Morton was heard to say:

All that being said, Linux's filesystems are looking increasingly crufty and we are getting to the time where we would benefit from a greenfield start-a-new-one. That new one might even be based on reiser4 - has anyone looked? It's been sitting around for a couple of years.

Reiser4 looks like it may continue to sit around for a while yet. But that does not mean that there is no interest in the creation of interesting new filesystems. LogFS was discussed here in May, but it's not the only newcomer in the filesystem arena.

The most interesting new contender, perhaps, is btrfs, which was announced by Chris Mason on June 12. It is an entirely new filesystem intended for standard rotating storage with a number of interesting features. These include:

Btrfs is a fully extent-based filesystem, meaning that it can store large files far more efficiently than ext3 (the in-development ext4 filesystem has extent support). An extent-based filesystem does away with the long lists of pointers to the individual blocks contained within a file; instead, groups of contiguous blocks ("extents") are tracked together. The result is far less metadata overhead, especially with large files. For very small files, btrfs will store the file contents themselves within the extent structure, eliminating the need for a separate block allocation.
Filesystems can be split into "subvolumes," each of which has its own directory structure and disk quota. Subvolumes can be used to subdivide a btrfs filesystem, but there is another interesting use of them...
Btrfs can do snapshotting - freezing the state of the filesystem at any given time. Snapshots are just subvolumes; they become a separate, independent directory tree which can be navigated independently from the "live" filesystem. Interestingly, though, btrfs snapshots are also live: they can be modified after being taken, and they can themselves be snapshotted.
Supporting subvolumes and snapshots forces a copy-on-write structure onto btrfs. If a given extent is written to, it will be copied and the new data written to the copy. Extents have reference counts; creating a snapshot, for example, will cause reference counts to be incremented. When an extent contained in both a snapshot and the "real" filesystem is modified, it will be copied for whichever subvolume is being changed, while the original remains in place, unchanged, for the other. If the snapshot is eventually removed, all associated reference counts will be decremented and any unused extents will be reclaimed.
The subvolume and snapshot mechanism eliminates the need for a separate journaling feature. Changes to the filesystem can be made transactional simply by taking a snapshot which only lasts until the transaction completes.
This filesystem checksums everything - data and metadata both. As a result, it is able to detect many types of filesystem corruption on the fly.
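
The copy-on-write, reference-counting, and checksumming behavior described in the list above can be illustrated with a toy model. This is only a sketch under loud assumptions: the class and method names (Extent, Volume, snapshot, and so on) are invented for illustration, CRC32 is an arbitrary stand-in for whatever checksum btrfs uses, and nothing here reflects the actual btrfs on-disk format or code.

```python
import zlib


class Extent:
    """Hypothetical extent: a run of data, a reference count, and a checksum."""

    def __init__(self, data):
        self.data = data
        self.refs = 1                      # one owner at creation time
        self.csum = zlib.crc32(data)       # assumption: CRC32 as the checksum


class Volume:
    """Toy filesystem whose subvolumes share extents by reference count."""

    def __init__(self):
        self.extents = {}                  # extent id -> Extent
        self.subvols = {}                  # subvolume name -> {offset: extent id}
        self.next_id = 0

    def _alloc(self, data):
        eid = self.next_id
        self.next_id += 1
        self.extents[eid] = Extent(data)
        return eid

    def _put(self, eid):
        # Drop one reference; reclaim the extent when nobody uses it.
        ext = self.extents[eid]
        ext.refs -= 1
        if ext.refs == 0:
            del self.extents[eid]

    def create(self, name):
        self.subvols[name] = {}

    def write(self, name, offset, data):
        # Copy-on-write: new data always goes to a fresh extent.
        tree = self.subvols[name]
        old = tree.get(offset)
        tree[offset] = self._alloc(data)
        if old is not None:
            self._put(old)

    def read(self, name, offset):
        ext = self.extents[self.subvols[name][offset]]
        if zlib.crc32(ext.data) != ext.csum:
            raise IOError("checksum mismatch: corruption detected")
        return ext.data

    def snapshot(self, src, dst):
        # A snapshot is just another subvolume sharing the same extents.
        tree = dict(self.subvols[src])
        for eid in tree.values():
            self.extents[eid].refs += 1
        self.subvols[dst] = tree

    def delete(self, name):
        for eid in self.subvols.pop(name).values():
            self._put(eid)
```

In this model, writing to "live" after taking a snapshot leaves the snapshot's extent untouched, and deleting the snapshot reclaims any extent that only it was using. Since the snapshot is an ordinary subvolume, it can be written to or snapshotted in turn.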

Fast filesystem checking is also an important design goal for btrfs. The data and metadata are laid out in a way that allows the offline filesystem checker to read the disk in a nearly sequential manner. That should speed the process considerably; filesystem checking usually involves vast numbers of seek operations. Online filesystem checking is also in the plans, though it has not been implemented yet; once it is working, this feature could eliminate the need for separate, mount-time filesystem checks entirely.

This filesystem is in a very early state - not recommended for data which one might actually want to keep. Not much benchmarking has been done yet and, presumably, a lot of optimization work remains. For example, the entire filesystem is currently protected by a single mutex, a solution which is unlikely to perform well on those leading-edge 4096-processor systems. Little details - like not oopsing when the filesystem runs out of space, direct I/O, writing via mmap(), extended attributes, asynchronous I/O, and more - have yet to be taken care of. But btrfs has garnered a considerable amount of interest; if it lives up to its initial promise we could find ourselves using btrfs-based systems in the future.

(For more information, see the btrfs project page).

Another recently-announced filesystem is NILFS, which is now at version 2.0. NILFS is a log-structured filesystem, in that the storage medium is treated like a circular buffer and new blocks are always written to the end. These filesystems tend to do very well on benchmarks which measure write performance, since all writes go to a contiguous set of blocks; read performance is not always quite as good. Log-structured filesystems are often used for flash media since they will naturally perform wear-leveling; it would appear, however, that NILFS is not aimed at flash devices.
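
The "circular buffer" idea can be sketched in a few lines. This is a generic illustration of a log-structured store, not NILFS's actual layout: the CircularLog class, its method names, and the decision to raise an error when the head reaches a still-live block (where a real filesystem would run a cleaner) are all assumptions made for the example.

```python
class CircularLog:
    """Toy log-structured store: every write goes to the head of the log."""

    def __init__(self, nblocks):
        self.blocks = [None] * nblocks     # physical slots: (logical id, data)
        self.head = 0                      # next physical slot to write
        self.index = {}                    # logical id -> slot of newest copy

    def write(self, logical, data):
        slot = self.head
        occupant = self.blocks[slot]
        # A slot may be reused only if the copy it holds has been superseded.
        if occupant is not None and self.index.get(occupant[0]) == slot:
            raise RuntimeError("log full: a cleaner would have to run here")
        self.blocks[slot] = (logical, data)
        self.index[logical] = slot         # older copies become dead space
        self.head = (slot + 1) % len(self.blocks)
        return slot

    def read(self, logical):
        return self.blocks[self.index[logical]][1]
```

Successive writes land in consecutive slots regardless of which logical block they update, which is why write-heavy benchmarks favor this design. Reads must go through the index and may end up scattered across the log.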

Instead, NILFS emphasizes snapshots. The log-structured approach is a specific form of copy-on-write behavior, so it naturally lends itself to the creation of filesystem snapshots. The NILFS developers talk about the creation of "continuous snapshots" which can be used to recover from user-initiated filesystem problems - those of the "rm -r" variety. NILFS claims scalability through 64-bit data structures, but, interestingly, support for the x86_64 architecture remains on the "TODO list." The filesystem does not yet have support for extents.
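
Why a log lends itself to "continuous snapshots" can be seen in a small model: since nothing is overwritten in place, any earlier point in the log still describes a complete filesystem state. The sketch below is purely illustrative; the CheckpointedLog class and its methods are hypothetical and say nothing about how NILFS actually records checkpoints.

```python
class CheckpointedLog:
    """Toy append-only log in which every position is a usable checkpoint."""

    def __init__(self):
        # Append-only records of (path, contents); contents None = deletion.
        self.records = []

    def write(self, path, contents):
        self.records.append((path, contents))
        return len(self.records)           # checkpoint number after this write

    def remove(self, path):
        self.records.append((path, None))
        return len(self.records)

    def view(self, checkpoint=None):
        """Rebuild the file tree as of a checkpoint (default: latest state)."""
        end = len(self.records) if checkpoint is None else checkpoint
        files = {}
        for path, contents in self.records[:end]:
            if contents is None:
                files.pop(path, None)
            else:
                files[path] = contents
        return files
```

A user who removes a file by mistake can, in this model, ask for the view at the checkpoint taken just before the removal and find the file intact.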

More information on NILFS can be found on nilfs.org.

Index entries for this article
Kernel	Btrfs
Kernel	Filesystems/Btrfs
Kernel	NILFS


btrfs sounds mighty cool.

Posted Jun 21, 2007 12:16 UTC (Thu) by dion (guest, #2764) [Link] (3 responses)

Wow, it certainly sounds like btrfs got most of the features I've been wanting from ZFS (checksumming & snapshots), but with less of the license fuss.

All that's needed to gain coolness parity with ZFS is something like raid-z; unfortunately, I don't see how that can be done without implementing it in the fs itself.

btrfs sounds mighty cool.

Posted Jun 22, 2007 13:13 UTC (Fri) by aglet (guest, #1334) [Link]

I wondered about the ZFS "license fuss" -- there's a good precis here: http://kerneltrap.org/node/8066

btrfs sounds mighty cool.

Posted Jun 23, 2007 6:26 UTC (Sat) by Tomasu (guest, #39889) [Link] (1 responses)

The part I like about ZFS is the way you can dynamically allocate physical volume space to any logical volume (aka: filesystem) at any time.

Nothing else does that as far as I know. All you get is LVM2, EVMS, or "mdraid", none of which can dynamically resize the volume and underlying filesystem on the fly, and EASILY. Resizing an ext partition is, imo, too hard, and you _can't_ shrink an XFS filesystem. The function isn't supported.

btrfs sounds mighty cool.

Posted Jun 24, 2007 22:16 UTC (Sun) by dlang (guest, #313) [Link]

How much of this is a limitation of the technology (like the inability to shrink XFS), and how much is just a need for better userspace tools (like easily being able to resize extX)?

Don't mix one with the other.

btrfs and NILFS

Posted Jun 21, 2007 13:23 UTC (Thu) by i3839 (guest, #31386) [Link]

There's also Chunkfs, though that seems more like a research project than a real filesystem at the moment.

btrfs and NILFS

Posted Jun 21, 2007 13:47 UTC (Thu) by Tet (subscriber, #5433) [Link] (1 responses)

I admit I haven't looked into it in any depth, but from skimming the btrfs feature list, it looks like they're mostly just reinventing vxfs, partial support for which is already in the mainline kernel.

btrfs and NILFS

Posted Jun 24, 2007 21:22 UTC (Sun) by k8to (guest, #15413) [Link]

Of course, a codebase is not just a set of features. Maybe some NIH is going on here, but maybe not.

NILFS looks quite good

Posted Jun 30, 2010 20:01 UTC (Wed) by nilfsguy (guest, #68159) [Link]

I just used NILFS (it's actually NILFS2) as the filesystem for my USB stick and it looks quite good - it's fast and I haven't had problems (yet) with file corruption (as opposed to BtrFS). I am using kernel 2.6.34.
Here are the measurements I did and some tips for using it as rootFS:
http://www.blah-blah.ch/Mra/Nilfs2performance