Stratis: Easy local storage management for Linux
Stratis is a new local storage-management solution for Linux. It can be compared to ZFS, Btrfs, or LVM. Its focus is on simplicity of concepts and ease of use, while giving users access to advanced storage features. Internally, Stratis's implementation favors tight integration of existing components instead of the fully-integrated, in-kernel approach that ZFS and Btrfs use. This has benefits and drawbacks for Stratis, but also greatly decreases the overall time needed to develop a useful and stable initial version, which can then be a base for further improvement in later versions. As the Stratis team lead at Red Hat, I am hoping to raise the profile of the project a bit so that more people in our community will have it as an option.
Why make Stratis instead of working on ZFS or Btrfs?
A version of ZFS, originally developed by Sun Microsystems for Solaris (now owned by Oracle), was forked for use on other platforms including Linux (OpenZFS). However, its CDDL-licensed code cannot be merged into the GPLv2-licensed Linux source tree. Whether CDDL and GPLv2 are truly incompatible is a continuing subject for debate, but the uncertainty is enough to make some enterprise Linux vendors unwilling to adopt and support it.
Btrfs is also well-established, and has no licensing issues. It was the anointed Chosen One for years (and years) for many people, but it is a large project that duplicates existing functionality, with a potentially high cost to complete and support over the long term.
Red Hat ultimately made a choice to instead explore the proposed Stratis solution.
How Stratis is different
Both ZFS and Btrfs can be called "volume-managing filesystems" (VMFs). These combine the filesystem and volume-management layers into one. VMFs focus on managing a pool of storage created from one or more block devices, and allowing the creation of multiple filesystems whose data resides in the pool. This model of management has proven attractive to users, since it makes storage easier to use not only for basic tasks, but also for more advanced features that would otherwise be challenging to set up.
Stratis is also a VMF, but unlike the others, it is not implemented entirely as an in-kernel filesystem. Instead, Stratis is a daemon that manages existing layers of functionality in Linux — the device-mapper (DM) subsystem and the XFS non-VMF filesystem — to achieve a similar result. While these components are not part of Stratis per se (and can indeed be used directly or via LVM), Stratis takes on the entire responsibility for configuring, maintaining, and monitoring the pool's layers on behalf of the user.
Although there are drawbacks to forgoing total integration, there are benefits. The primary benefit is that Stratis doesn't need to independently develop and debug the many features a VMF is expected to have. Also, it may be easier to incorporate new capabilities more quickly when they become available. Finally, as a new consumer of these components, Stratis may participate in their common upstream development, sharing mutual benefit with the components' other users.
In addition to this main implementation difference, Stratis also makes some different design choices, based on the current state of technology. First, the widespread use of SSDs minimizes the importance of optimizing for access times on rotational media. If performance is important, SSDs should be used either as primary storage, or as a caching tier for the spinning disks. Assuming this is the case lets Stratis focus more on other requirements in the data storage tier. Second, embedded use and automated deployments are now the norm. A new implementation should include an API from the start, so other programs can also configure it easily. Lastly, storage is starting to become commoditized: big enough for most uses, and perhaps no longer something users want to actively manage. Stratis should account for this by being easy to use. Many people will only interact with Stratis when a problem arises. Poor usability feels even worse when the user is responding to a rare storage alert, and also may be worried about losing data.
Implementation
Stratis is implemented as a user-space daemon, written in the Rust language. It uses D-Bus to present a language-neutral API, and also includes a command-line tool written in Python. The API and command-line interface are focused on the three concepts that a user must be familiar with — blockdevs, pools, and filesystems. A pool is made up of one or more blockdevs (block devices, such as a disk or a disk partition); once a pool has been created, one or more filesystems can be created from it. While the pool has a total size, each filesystem does not have a fixed size.
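The relationship between the three concepts can be sketched with a toy data model. This is purely illustrative; the class, names, and sizes here are assumptions for the sketch, not Stratis's real data structures or API.

```python
# Toy model of the three Stratis concepts: blockdevs, pools, and filesystems.
# Illustrative only; not Stratis's actual implementation.

class Pool:
    def __init__(self, name, blockdevs):
        # A pool is created from one or more block devices.
        self.name = name
        self.blockdevs = dict(blockdevs)   # device path -> size in bytes
        self.filesystems = set()

    @property
    def total_size(self):
        # The pool has a fixed total size: the sum of its blockdevs.
        return sum(self.blockdevs.values())

    def create_filesystem(self, fs_name):
        # Filesystems have no fixed size of their own; they share
        # the pool's space as needed.
        self.filesystems.add(fs_name)

GiB = 2**30
pool = Pool("pool1", {"/dev/sdb": 500 * GiB, "/dev/sdc": 500 * GiB})
pool.create_filesystem("fs1")
pool.create_filesystem("fs2")
print(pool.total_size // GiB)    # 1000
print(sorted(pool.filesystems))  # ['fs1', 'fs2']
```

The key point the sketch captures is the asymmetry: blockdevs and the pool have concrete sizes, while filesystems are just names drawing on shared capacity.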
Under the hood
Although how the pool works internally is not supposed to be a concern of the user, let's look at how the pool is constructed.
Starting at the bottom of the stack, the layers that manage block devices and add value to them are called the Backstore, which is in turn divided into data and cache tiers. Stratis 1.0 will support a basic set of layers, with integration of additional optional layers that add more capabilities planned for later.
The lowest layer of the data tier is the blockdev layer, which is responsible for initializing and maintaining on-disk metadata regions that are created when a block device is added to a pool. Above that, support may be added for additional layers, such as detection of data corruption (dm-integrity) and data redundancy (dm-raid); used in tandem, these gain the ability to correct data corruption. This would also be where support for compression and deduplication, via the recently open-sourced (but not yet upstream) dm-vdo target, would sit. Since these reduce the available total capacity of the pool and may affect performance, their use will be configurable at pool-creation time.
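A small sketch can illustrate why layer choices reduce the pool's usable capacity. The overhead fractions below are made-up placeholders for the sake of the example, not real dm-integrity or dm-raid numbers, and the function is not part of Stratis.

```python
# Illustrative sketch: optional data-tier layers each keep only a fraction
# of the capacity presented by the layer below them.
# The fractions are assumptions, not real dm-integrity/dm-raid overheads.

def usable_capacity(raw_bytes, layers):
    keep = {
        "integrity": (9, 10),  # assume 10% lost to checksum metadata
        "raid1": (1, 2),       # two-way mirroring halves usable space
    }
    for layer in layers:
        num, den = keep[layer]
        raw_bytes = raw_bytes * num // den
    return raw_bytes

GiB = 2**30
# 1000 GiB raw, with both layers enabled: 1000 -> 900 -> 450 GiB usable.
print(usable_capacity(1000 * GiB, ["integrity", "raid1"]) // GiB)  # 450
```

This is the trade-off the article describes: each added capability costs capacity (and possibly performance), which is why enabling these layers is a pool-creation decision.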
Above this is the cache tier. This tier manages its own set of higher-performance block devices, to act as a non-volatile cache for the data tier. It uses the dm-cache target, but its internal management of blockdevs used for cache is similar to the data tier's management of blockdevs.
On top of the Backstore sits the Thinpool, which encompasses the data and metadata required for the thin-provisioned storage pool that individual filesystems are created from. Using dm-thin, Stratis creates thin volumes with a large virtual size and formats them with XFS. Since storage blocks are only used as needed, the actual size of a filesystem grows as data is stored on it. If this data's size approaches the filesystem's virtual size, Stratis grows the thin volume and the filesystem automatically. Stratis 1.0 will also periodically reclaim no-longer-used space from filesystems using fstrim, so it can be reused by the Thinpool.
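The grow-on-demand behavior can be modeled with a few lines of code. The 90% threshold and the doubling policy below are assumptions made for the sketch; they are not Stratis's actual growth policy.

```python
# Toy model of automatic filesystem growth on a thin volume: when used
# space approaches the virtual size, grow the volume (and the XFS
# filesystem on it). Threshold and growth factor are assumed values.

def maybe_grow(virtual_size, used, threshold=0.9, factor=2):
    """Return the (possibly grown) virtual size after data is written."""
    if used >= threshold * virtual_size:
        return virtual_size * factor
    return virtual_size

TiB = 2**40
size = 1 * TiB
size = maybe_grow(size, used=int(0.95 * TiB))  # crosses threshold: grows
print(size // TiB)  # 2
size = maybe_grow(size, used=int(0.95 * TiB))  # now well under threshold
print(size // TiB)  # 2
```

Note that growing the virtual size costs nothing up front; real blocks are only consumed from the Thinpool as data is actually written.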
Along with setting up the pool, Stratis continually monitors and maintains it. This includes watching for DM events such as the Thinpool approaching capacity, as well as udev events, for the possible arrival of new blockdevs. Finally, Stratis responds to incoming calls to its D-Bus API. Monitoring is critical because thin-provisioned storage is sensitive to running out of backing storage, and relieving this condition requires intervention from the user, either by adding more storage to the pool or by reducing the total data stored.
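The capacity-monitoring concern can be sketched as a simple classifier. The thresholds are invented for illustration, and real Stratis reacts to device-mapper and udev events rather than polling a function like this.

```python
# Sketch of thin-pool capacity monitoring. Thresholds are assumptions;
# Stratis itself is event-driven (DM and udev events), not polling.

def check_pool(used_bytes, total_bytes, warn_at=0.75, critical_at=0.90):
    """Classify how close a thin pool is to exhausting backing storage."""
    fill = used_bytes / total_bytes
    if fill >= critical_at:
        # Needs user intervention: add blockdevs or reduce stored data.
        return "critical"
    if fill >= warn_at:
        return "warning"
    return "ok"

print(check_pool(500, 1000))  # ok
print(check_pool(800, 1000))  # warning
print(check_pool(950, 1000))  # critical
```

The "critical" case is exactly the situation described above: a thin pool that runs out of backing storage cannot fix itself, so the daemon's job is to warn early enough that the user can act.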
Challenges so far
Since Stratis reuses existing kernel components, the Stratis development team's two primary challenges have been determining exactly how to use them, and then handling the issues that arise when components are used in new ways. For example, in implementing the cache tier using dm-cache, the team had to figure out how to use the DM target so that the cache device could be extended if new storage was added. Another example: snapshotting XFS on a thin volume is fast, but giving the snapshot a new UUID so it can be mounted causes the XFS log to be cleared, which increases the amount of data written.
Both of these were development hurdles, but also mostly expected, given the chosen approach. In the future, when Stratis has proven its worth and has more users and contributors, Stratis developers could also work more with upstream projects to implement and test features that Stratis could then support.
Current status and how to get involved
Stratis version 0.5 was recently released, adding support for snapshots and the cache tier; it is available now for early testing in Fedora 28. Stratis 1.0, targeted for release in September 2018, will be the first version suitable for users; its on-disk metadata format will be supported in future versions.
Stratis started as a Red Hat engineering project, but it has started to attract community involvement and hopes to attract more. If you're interested in development, testing, or offering other feedback on Stratis, please join us. To learn more about Stratis's current technical details, check out the Design document [PDF] on the web site. There is also a development mailing list.
Development is on GitHub, both for the daemon and the command-line tool. This is also where bugs should be filed. IRC users will find the team on the Freenode network, on channel #stratis-storage. For periodic news, follow StratisStorage on Twitter.
Conclusion
Stratis is a new approach to constructing a volume-managing filesystem whose primary innovation is — ironically — reusing existing components. This accelerates its development timeline, at the cost of foregoing the potential benefits of committing "rampant layering violations". Do the benefits ascribed to ZFS and Btrfs require this integration-of-implementation approach, or are these benefits also possible with integration at a slightly higher level? Stratis aims to answer this question, with the goal of providing a useful new option for local storage management to the Linux community.
| Index entries for this article | |
|---|---|
| GuestArticles | Grover, Andy |