
A desktop kernel wishlist

By Nathan Willis
October 29, 2014

GNOME developer Bastien Nocera recently shared a "wishlist" with the kernel mailing list, outlining a number of features that GNOME and other desktop-environment projects would like to see added to or enhanced in the kernel. In the resulting discussion, some of the wishlist items were subsequently crossed off, but many of the others sparked real discussion that could, in time, develop into mainline kernel features.

Nocera prefaced his list by saying that GNOME has had productive discussions with kernel developers in the past, and that the current list consisted of "items that kernel developers might not realise we'd like to rely on, or don't know that we'd make use of if merged." Most of these items fall into one of two categories: power management or filesystem features, although there are a handful of others in the mix as well.

Less power

Under power management, Nocera listed native support for hybrid suspend (known as Intel Rapid Start on certain hardware), connected standby support, "a hibernation implementation that doesn't use the swap space", and several smaller items (such as establishing uniform semantics for screen-backlight settings and better documentation for managing USB power).

Hybrid suspend is a firmware-level feature intended to minimize the time required to restore a system from the suspended state. It works by initially suspending the system to RAM, then (directly from the firmware) writing the memory contents to disk after a set time period and hibernating. Resume is fast if it happens before the timeout expires, but the system preserves its state regardless of how long it is suspended. Matthew Garrett wrote a patch implementing support for the feature in mid-2013, which Nocera subsequently made a push to merge upstream—a push that ultimately stalled. One of the critiques was that Intel's hybrid-suspend feature would eventually be made obsolete by connected standby, a newer idle-state power mode that allows certain background processes to continue running. Garrett has also worked on an approach to connected standby.

John Stultz asked for clarification about one other power-management request: exporting the cause of a wake event. Nocera elaborated, saying that the goal was to be able to determine whether the machine was awakened by a user event or by a hardware event (such as the real-time clock alarm), in order to respond appropriately to different scenarios. The use case he cited was user-space code trying to determine whether it was a good time to run a previously scheduled backup: if the user woke the machine by opening the laptop lid, it would presumably be a bad time to start a lengthy backup process; if the wake was automatic, the backup should proceed.

Stultz, however, argued that reporting the cause of the wakeup would not truly satisfy the use case—in part because any number of wake events could take place (and, being asynchronous, could arrive in an unhelpful order) between the kernel waking up and user-space code being able to run. More importantly, as Zygo Blaxell put it, which event was most recent matters far less than (for example) whether or not the user is actually using the machine—a fact that could be determined through other means, such as keyboard activity. Alan Cox, on the other hand, commented that, in the long term, most of the assumptions behind current thinking about suspend, resume, and hibernation states may go away anyhow:

- There may be no such thing as suspend or resume, just make your code very well behaved on wakeup events, and closing unneeded devices/resources whenever it can.

- On/off is an extreme action rarely taken (feature parity with 1970s VAXen ;-) )

- The "blob with a lid" model of construction is no longer useful. Even a keyboarded device is quite likely have a removable keyboard.

Filesystem issues

The second large category of feature requests concerned filesystem features and the VFS layer. First, Nocera reported that inotify is not meeting the needs of desktop utilities in a number of ways: watching large directory structures consumes too many resources, file-creation notification requires watching an entire directory, and monitoring rename and removal events is expensive on large directories. These limitations hurt the performance of filesystem indexers, backup tools, and programs that manage file "libraries" (e.g., music and video managers).
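To make the first limitation concrete, here is a minimal sketch (not any proposed replacement) of how a watcher must use inotify today. Each watch covers a single directory, never its subtree, so an indexer ends up registering one watch per directory and re-crawling to catch new subdirectories:

    /* Watch one directory for creations, deletions, and renames.
     * A watch covers only this directory, not its subtree, so a
     * desktop indexer needs one watch per directory in the tree. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/inotify.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s directory\n", argv[0]);
            return 1;
        }

        int fd = inotify_init1(IN_CLOEXEC);
        if (fd < 0) {
            perror("inotify_init1");
            return 1;
        }

        /* One watch descriptor per directory; a large tree needs thousands. */
        if (inotify_add_watch(fd, argv[1],
                              IN_CREATE | IN_DELETE |
                              IN_MOVED_FROM | IN_MOVED_TO) < 0) {
            perror("inotify_add_watch");
            return 1;
        }

        char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
        ssize_t len;
        while ((len = read(fd, buf, sizeof(buf))) > 0) {
            for (char *p = buf; p < buf + len; ) {
                struct inotify_event *ev = (struct inotify_event *) p;
                if (ev->len)
                    printf("event 0x%x on %s\n", ev->mask, ev->name);
                /* IN_CREATE|IN_ISDIR means a new subdirectory appeared; it
                 * is not watched until another inotify_add_watch() is made. */
                p += sizeof(*ev) + ev->len;
            }
        }
        return 0;
    }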

Sergey Davidoff of elementary OS elaborated on the subject, saying that desktop application developers are keen to move to the file-library concept (as used in music and video managers) in other application types as well. Presenting the user with the filesystem hierarchy, he said, is far less useful than intelligently tracking the relevant files and allowing the user to search and interact with them based on their metadata. fanotify, as both noted, lacks the proper level of detail, such as reporting rename and move events.

Nocera also asked for a way to propagate timestamp changes up a directory chain. That is, if a file located in /foo/bar/ changes, there would be a way to detect the change not only on the file itself, but also on /foo/bar and on /foo itself. He added, though, that simply updating the change time on the containing directories would clearly be the wrong solution, since it would break many programs.

In short, he said, user-space programs would benefit from a better file-change notification system—ideally one that would consolidate events and monitor a directory structure without re-crawling it periodically. The combination of improving fanotify and adding user-space glue might work, he said, as would adding the changelog features currently available in Btrfs and XFS to other filesystems.

Pavel Machek asked whether adding a (hypothetical) recursive version of the mtime timestamp would be a possible solution. Davidoff replied that he was skeptical it would work for monitoring online changes, but that monitoring Btrfs's changelog "more or less as it happens" would probably suffice. In particular, monitoring Btrfs changelogs on the fly could at least ensure that a fixed-size buffer would get the job done, as opposed to the unbounded memory that fanotify would require for the same task.

Miscellany

Nocera's final cluster of wishlist items is a bit of a grab bag. It includes a better user-space API for the industrial input/output (IIO) subsystem used for various sensors, a user-space helper for the out-of-memory (OOM) killer, a system call to poll whether any process in a set of processes has exited, and a variant of epoll_wait() that accepts an absolute time rather than a relative timeout.

As was the case with the other categories, some of these items sparked quick responses. Patrik Lundquist suggested that the desired epoll_wait() functionality could be achieved with timerfd(). To that, Nocera quoted the original source of the request, Ryan Lortie, who said that making a separate call to set up timerfd() on every instance of entering the kernel to sleep is cumbersome, and that "epoll in general suffers from being _way_ too chatty about the syscalls that you have to do." Andy Lutomirski added that he had implemented procfs polling several years ago, and would be willing to resume work on it if it were useful. procfs polling would allow a user to open a set of /proc directories corresponding to a chosen set of processes, then poll the directories to see if any exit.
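For illustration, here is a sketch of the timerfd-based workaround Lundquist suggested, assuming the goal is an epoll wait that wakes at an absolute wall-clock deadline: arm a CLOCK_REALTIME timerfd with TFD_TIMER_ABSTIME and add it to the epoll set. Note the extra system calls needed per sleep, which is exactly the chattiness Lortie complained about:

    /* Emulate an absolute-deadline epoll_wait() with a timerfd. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/epoll.h>
    #include <sys/timerfd.h>
    #include <time.h>

    int main(void)
    {
        int epfd = epoll_create1(EPOLL_CLOEXEC);
        int tfd = timerfd_create(CLOCK_REALTIME, TFD_CLOEXEC);
        if (epfd < 0 || tfd < 0) {
            perror("create");
            return 1;
        }

        /* Absolute deadline: five seconds from now, as a wall-clock time. */
        struct itimerspec deadline = { 0 };
        clock_gettime(CLOCK_REALTIME, &deadline.it_value);
        deadline.it_value.tv_sec += 5;
        if (timerfd_settime(tfd, TFD_TIMER_ABSTIME, &deadline, NULL) < 0) {
            perror("timerfd_settime");
            return 1;
        }

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

        /* Block with no relative timeout; the timerfd fires at the deadline. */
        struct epoll_event out;
        if (epoll_wait(epfd, &out, 1, -1) == 1 && out.data.fd == tfd)
            printf("absolute deadline reached\n");

        close(tfd);
        close(epfd);
        return 0;
    }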

As to the other suggestions, there has thus far been little reaction either way. Certainly a number of the wishlist items boil down to implementing friendlier user-space APIs; regarding both the IIO and wake-event-reporting items, Nocera commented that the present-day approach of examining raw sysfs files is far from sufficient.
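As an illustration of what examining raw sysfs files entails for IIO, consider the sketch below, which reads one accelerometer channel by hand. The device path and channel names here are hypothetical and depend on the driver, though the raw-value-times-scale convention follows the documented IIO sysfs ABI:

    /* Read an IIO accelerometer channel via raw sysfs attributes.
     * "iio:device0" is a hypothetical device; enumerate
     * /sys/bus/iio/devices/ to find real ones. */
    #include <stdio.h>

    static int read_attr(const char *path, double *out)
    {
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        int ok = fscanf(f, "%lf", out) == 1;
        fclose(f);
        return ok ? 0 : -1;
    }

    int main(void)
    {
        const char *base = "/sys/bus/iio/devices/iio:device0";
        char path[256];
        double raw, scale;

        snprintf(path, sizeof(path), "%s/in_accel_x_raw", base);
        if (read_attr(path, &raw) < 0) {
            perror("in_accel_x_raw");
            return 1;
        }
        snprintf(path, sizeof(path), "%s/in_accel_scale", base);
        if (read_attr(path, &scale) < 0) {
            perror("in_accel_scale");
            return 1;
        }

        /* Per the ABI, the processed value is raw * scale (m/s^2 here). */
        printf("accel x: %f\n", raw * scale);
        return 0;
    }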

On the whole, though, the kernel community has clearly been receptive to these needs of desktop-environment projects. On the wishlist wiki page, Nocera likened the exercise to the "plumbers' wishlists" submitted to kernel developers by Kay Sievers, Lennart Poettering, and Harald Hoyer. The plumbers' wishlist approach, of course, was successful enough that the plumbers in question have since repeated it. That bodes well for Nocera's desktop wishlist.

It is fairly common to hear about new kernel work that is driven by the needs of either high-end data-center users or embedded-system builders—perhaps simply because companies in those lines of business tend to hire kernel developers. Thus, it is always good to see that the kernel community is equally responsive to the needs of user-space developers working in other areas, when those developers take the time to reach out with their concerns.


IIO? Shouldn't that be I2C?

Posted Oct 30, 2014 14:29 UTC (Thu) by tau (subscriber, #79651) [Link] (2 responses)

Sensors tend to use I2C or its extended subset SMBus.

IIO? Shouldn't that be I2C?

Posted Oct 30, 2014 21:45 UTC (Thu) by flussence (guest, #85566) [Link]

Scrolling down the Kernel page a bit I see an IIO patch for Android devices: https://lwn.net/Articles/618263/ - so, odd as it may seem, it may not be a typo. Certainly news to me too.

IIO? Shouldn't that be I2C?

Posted Nov 1, 2014 23:16 UTC (Sat) by hadess (subscriber, #24252) [Link]

It's IIO. I2C and SMBus are ways to connect to those sensors. Just in the same way that there is an input sub-system in Linux, but devices are connected via USB or Bluetooth.

A desktop kernel wishlist

Posted Nov 5, 2014 19:06 UTC (Wed) by pedrocr (guest, #57415) [Link]

Another potential FS/VFS feature would be the ability to enumerate the files that point to the same inode. That would make sync/backup solutions much more efficient at handling hardlinks.

There was a discussion around this and other issues on a hacker news post about copying a very large file structure:

https://news.ycombinator.com/item?id=8305283
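For comparison, a sketch of what tools must do today: walk the entire tree and group paths by (st_dev, st_ino). st_nlink only says how many links a file has, not where the others are, hence the full scan:

    /* Find hardlinked files the slow way: walk the tree and report
     * every regular file with more than one link. A real tool would
     * insert (st_dev, st_ino) -> path into a map and report groups. */
    #define _XOPEN_SOURCE 500
    #include <ftw.h>
    #include <stdio.h>
    #include <sys/stat.h>

    static int visit(const char *path, const struct stat *sb,
                     int type, struct FTW *ftwbuf)
    {
        (void) ftwbuf;
        if (type == FTW_F && sb->st_nlink > 1)
            printf("dev=%llu ino=%llu nlink=%lu %s\n",
                   (unsigned long long) sb->st_dev,
                   (unsigned long long) sb->st_ino,
                   (unsigned long) sb->st_nlink, path);
        return 0;
    }

    int main(int argc, char **argv)
    {
        return nftw(argc > 1 ? argv[1] : ".", visit, 16, FTW_PHYS);
    }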

A desktop kernel wishlist - Big folder copy

Posted Nov 6, 2014 5:49 UTC (Thu) by ratheesh (guest, #86439) [Link] (9 responses)

In the software industry, we often create backups of our work. We need to create a backup every now and then and quickly get back to work, so two operations I often do are "cp" and "rm". Both take time; I often type those commands and go out for a walk or a chat. It takes a lot of time!

I know that companies like NetApp do a virtual copy: whenever you copy a big folder, the copy operation completes in a second, and initially you are just viewing the parent folder's data. The moment we modify any of the files, a local copy is created. It is the same as COW (copy-on-write). Why doesn't Linux implement a feature like this?

A desktop kernel wishlist - Big folder copy

Posted Nov 6, 2014 7:35 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

It does, "man cp" and search for "reflink".

A desktop kernel wishlist - Big folder copy

Posted Nov 7, 2014 4:59 UTC (Fri) by quartz (guest, #37351) [Link]

BTRFS uses COW by default. I think they call them snapshots:
http://www.linux.com/learn/tutorials/767683-how-to-create...

A desktop kernel wishlist - Big folder copy

Posted Nov 18, 2014 18:07 UTC (Tue) by nix (subscriber, #2304) [Link] (6 responses)

It sounds like you are using file copying as a poor man's version-control system. The right solution is to use a version-control system.

Myself, I'm a flaming paranoid who never wants to lose a single keystroke, whether I've saved or not. So I use a combination of:

- persistent undo-tree, which records every single change I ever make to anything in Emacs, even if later undone (well, to any buffer with a corresponding file). I erase these from cron after a month. This stops me losing stuff in files I don't later delete, though if I have a power loss I'll lose the last five minutes or so of unsaved work.

- git, for the actual source code. Obviously this never expires (unless I explicitly go around deleting branches).

- git-wip with appropriate tying to Emacs via magit, so that every file save automatically does a git commit (with no log message) to a branch in a namespace reserved for wips. This stops me losing anything I've saved, even if I later delete the file, unless I delete the whole branch.

- RAID. (RAID 5, as it happens, because the other layers provide so much resilience that I consider the possibility of drive failures while degraded to be worth risking given the power costs for running more drives.)

- backups of my working trees with bup every three hours. (When combined with an as-yet-nonexistent version of 'bup index' that used btrfs snapshot diffs to determine what to back up, this could in theory take only *seconds* for each backup run, deduplication and all. Right now it takes longer because it needs to traverse the fs hierarchy to figure out what's changed). This doesn't expire -- yet -- but at the rate my backups are growing this will become a concern in approximately a decade, by which point I expect to have replaced the backup medium more than once.

I'm strongly tempted to trickle the bup backups out across the Internet for even more resilience.

Note what is not used here: cp of any variety. It involves too much manual management and it's too easy to accidentally blow away your own work (if you've never cp'ed the wrong way because you forgot which was the backup and which the original, you're better at avoiding typos than me).

A desktop kernel wishlist - Big folder copy

Posted Nov 23, 2014 21:59 UTC (Sun) by blujay (guest, #39961) [Link] (5 responses)

Thanks for the tips about persistent undo trees and git-wip. I already use undo-tree, but didn't realize it has a persistent mode. Will have to check it out.

A couple of suggestions, though you are probably already aware of them:

> though if I have a power loss I'll lose the last five minutes or so of unsaved work.

Shouldn't the regular Emacs autosave prevent this? In my org directory I have an automatic git-commit that runs, and it picks up the autosave files for me.

There's also an Emacs package called real-auto-save that does actual saves automatically every so often. Unfortunately it doesn't monitor your input, so it just blindly tries to save modified buffers, but for some files, like org-mode ones, it's very useful.

I've long thought about using bup, but its inability to prune old backups makes me very leery, because some larger files would run into problems that way--so then I'd have to use a separate backup system for those, which makes the whole setup more complicated. Do you use it only for source-code-type files?

BTW, have you heard of Obnam?

A desktop kernel wishlist - Big folder copy

Posted Nov 25, 2014 22:37 UTC (Tue) by nix (subscriber, #2304) [Link] (4 responses)

OK, I'll lose the last 120s of unsaved work due to autosave kicking in :)

real-auto-save is... more than slightly risky. Not *everything* I touch is in git: I don't want absolutely everything really saved unless I ask for it.

I used obnam for a while, but its insistence on rolling its own data structures means that corruption is still a problem years on, and it's *fearfully* slow -- by which I mean that it took more than a week to back up a filesystem that took bup a day, and takes twenty hours or so to do the sort of routine backup run that I run every night (it takes bup an hour to do the filesystem tree walk and on the order of three minutes to do the backup).

I back up *everything* in bup. OS, virtual machine disk images (once a week, providing --smaller=500M to other runs), $HOME including things like build trees and object files from build runs, everything. The only stuff I exclude is things like DVD rips which I can just re-rip on disk failure and which never change nor benefit from deduplication.

The dedup really does save your bacon in a lot of areas: e.g. I can flip between git branches and completely ignore that this will lead to the working directory changing over and over again: the branches have been flipped to before, so they're already in the backup, so bup will deduplicate them almost entirely. Most of my virtual machine images disappear: they're neither compressed nor encrypted, so the binaries on them have largely been seen in other VM images, and vanish. Renames of huge filesystem trees just get deduplicated out of existence; huge swathes of inode changes caused by things like cp -al get almost entirely deduplicated (the ctime change is recorded, but the content gets deduplicated). And of course an ordinary cp -a won't use any more space on the backup either.

As for not being able to expire backups, y'know, that's not really a problem. As I mentioned above, even backing all that stuff up, I estimate that I'll run out of space on my backup device in ten to twelve *years*. By that point the disk will have died of old age -- and I rotate them every so often anyway for offsite backups, so it's no trouble wiping one and doing a new full backup every decade or so!

btw, expiry is being worked on: several promising approaches have been posted to the bup list of late. Personally I'd not expect you'd have to wait too long for it. I just don't think it's all that useful... your backups fill up so much more slowly than they ever did with rdiff-backup and the like.

A desktop kernel wishlist - Big folder copy

Posted Nov 26, 2014 10:48 UTC (Wed) by jezuch (subscriber, #52988) [Link] (3 responses)

Speaking of bup... I agree that it's a marvelous piece of software, but the separate steps of indexing and the actual backup are a hassle, and the last time I investigated it, it had problems with backing up metadata... so as a result of these two minor inconveniences and sheer inertia I'm still using my old setup ;) Has anything changed while I wasn't looking?

A desktop kernel wishlist - Big folder copy

Posted Nov 26, 2014 10:53 UTC (Wed) by mbunkus (subscriber, #87248) [Link]

I don't know what your exact problems were, but the README ( https://github.com/bup/bup/blob/master/README.md ) says the following about metadata:

> 'bup save' and 'bup restore' have immature metadata support.
> On the plus side, they actually do have support now, but it's new, and not remotely as well tested as tar/rsync/whatever's. However, you have to start somewhere, and as of 0.25, we think it's ready for more general use. Please let us know if you have any trouble.
> Also, if any strip or graft-style options are specified to 'bup save', then no metadata will be written for the root directory. That's obviously less than ideal.

So if your previous experience was that metadata wasn't supported at all, it may be worth taking another look.

A desktop kernel wishlist - Big folder copy

Posted Nov 27, 2014 17:43 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

Metadata support appears to be more or less flawless by this point. Normal POSIX metadata works; xattrs work (though you probably want to use trunk bup for that if you're using a big-endian host); ACLs work. I'm not aware of any other metadata it might need to save :)

The index is, to me, a nice idea simply because it means that rather than having to support umpty million different ways to select and exclude files like most backup programs do, it only needs to support a few: you can pick files additively via multiple 'bup index' runs and then do a 'bup save' once you're happy with it. (And in the future, I can imagine building the index from things like a btrfs snapshot diff -- i.e., instantly -- which would let you do things like run a backup every minute if you wanted to!)

Currently the index's *implementation* is far from ideal, which makes it less useful than it might be -- each index update requires a lengthy merge step, so additive index updates aren't very tempting, and files that are deleted are marked as deleted in the index but never actually removed from it even after the backup is run. Work is underway to fix things on that front, by making the index sqlite-backed rather than its existing rather ugly kludge. But this has no effect on the content of the backup or on the backup run's speed, only on the output of 'bup save -v', which never stops reporting things as deleted. (I 'fix' this by just deleting the index every few weeks and letting the next 'bup index' run rebuild it. It means the next 'bup save' run is slower, because it has to rescan everything and determine that, oh, it's a duplicate, rather than skipping it as unchanged, but the resulting backup is exactly the same as it would have been in any case.)

A desktop kernel wishlist - Big folder copy

Posted Nov 28, 2014 12:13 UTC (Fri) by jezuch (subscriber, #52988) [Link]

Thank you all for your answers! I'll definitely be taking another look at bup :)

