More from the testing and fuzzing microconference
A lot was discussed and presented in the three hours allotted to the Testing and Fuzzing microconference at this year's Linux Plumbers Conference (LPC), but some spilled out of that slot. We have already looked at some discussions on kernel testing that occurred both before and during the microconference. Much of the rest of the discussion will be summarized below. As it turns out, a discussion on the efforts by Intel to do continuous-integration (CI) testing of graphics hardware and drivers continued several hundred miles north the following week at the X.Org Developers Conference (XDC); that will be covered in a separate article.
Fuzzers
Two fuzzer developers, Dave Jones and Alexander Potapenko, discussed the fuzzers they work on and plans for the future. It was, in some sense, a continuation of the fuzzer discussion at last year's LPC.
Potapenko represented the syzkaller project, which is a coverage-guided fuzzer for the kernel. The project does more than simple fuzzing, though: it includes code to generate programs that reproduce the crashes it finds, as well as scripts to set up machines and send email about failures. It "plays well" with the Kernel Address Sanitizer (KASAN), runs on 32- and 64-bit x86 and ARM systems, and has support for Android devices, he said.
Jones noted that his Trinity project is a system-call fuzzer for Linux that is "dumber than syzkaller". It does not use coverage to guide its operation; in some ways it is "amazing that it still finds new bugs." Over the last year, logging over UDP has been added, as has support for the MIPS architecture ("someone got excited"). There have been lots of contributions from others in the community over the last year, he said.
Sasha Levin, one of the microconference leads, asked both developers what the next big feature for their fuzzers would be. Potapenko said that work is underway on tracking the origin of the values used in comparisons within functions. The idea is to let syzkaller increase its coverage by reversing the sense of those comparisons to take new paths.
For Trinity, Jones plans to explore BPF programs more. He wants to feed in "mangled BPF programs" to see what happens. There is limited support in Trinity for BPF fuzzing currently; it has only found two bugs, he said. Steven Rostedt suggested stressing the BPF verifier, which will require something more than simply random programs. Jones said that Trinity uses Markov chains to create the programs, but that it is "still a little too dumb".
Rostedt asked about the reproducer programs for problems that the fuzzers find; he wondered if those should be sent to the maintainers to be added to the tests they run. Or perhaps they should get added to the kernel self-tests, he said. Greg Kroah-Hartman agreed that the programs could be useful, but some of them cause a "splat" in the logs, which might make them difficult to integrate into the failure checking of the self-tests or the Linux test project.
Something that is missing, Jones said, is for the fuzzers to be run regularly. If that is not done, various problems will sneak through and end up in kernel releases. "We still see really dumb stuff", like not checking for null pointers, ending up in the mainline. Those kinds of bugs "should not hit the tree", he said, but should be caught far earlier. Running Trinity and syzkaller on the linux-next tree could be done, but it is difficult to run the kernel using that tree. That tree is "not really testable", Daniel Vetter said, because it tends to be broken fairly often. The Intel CI system for graphics can use the linux-next tree, but only because there are a bunch of "fixup patches" that get applied.
Levin asked about getting distributions involved in fuzzing. He wondered if there were ways to make it easier for distribution kernel maintainers to run the fuzzers on their kernels. He suggested a disk image that could be run in a virtual machine (VM); that would help getting more people running the fuzzers, he said. Potapenko said that there is infrastructure available to set up a few VMs to run syzkaller, but that at least two physical machines are needed. The fuzzer causes crashes, so it is best to have a separate master machine that supplies parameters to the workers.
With a grin Jones said that he "got out of the distro building game" and was not planning to get back in. Trinity is currently run as part of the Fedora test suite, but it is somewhat destructive so it is the last thing that gets run. There are going to be Fedora kernel test days for each Linux release, he said; ISO files are generated for those tests. He had not thought about adding fuzzers into that image, but it would be good to do so.
At the end, Levin asked how to get more people and companies working on fuzzers. Potapenko said that it is simple to contribute to syzkaller and he would like to see more subsystem maintainers help with the code to exercise their system calls. Jones said there is plenty of low-hanging fruit for things to be added to Trinity; as an experiment, he did not add support for certain system calls to see if someone else would, but so far that has not happened.
ktest
The ktest tool that has been in the kernel source tree for some time now was the subject of the next, fairly brief talk. Steven Rostedt, who wrote the Perl script, wanted to get information about it into more hands; he is often surprised how many people have not heard of it. It is meant to automate testing of kernels, but it does a fair amount more than that. Rostedt said that these days he rarely uses the make command for kernel builds and installs; he uses ktest to handle all of that for him.
One of the main ways he uses it is to check a patch series from a developer in the subsystems he maintains. He wants to ensure that the series is bisectable, among other things. Before developing ktest, he would apply a patch series, one patch at a time; he would then build, boot, and test the kernels built. That was a time-consuming process.
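Checking a patch series that way is what ktest's "patchcheck" test type automates. A configuration along these lines would build and boot each patch in turn (the option names come from ktest's sample.conf, but the machine name, paths, and power-control commands below are placeholders):

```conf
# Hypothetical ktest.pl configuration for verifying that each patch
# in a series builds and boots (i.e. that the series is bisectable).
MACHINE = testbox
SSH_USER = root
BUILD_DIR = /home/test/linux.git
OUTPUT_DIR = /home/test/build/${MACHINE}
BUILD_TARGET = arch/x86/boot/bzImage
TARGET_IMAGE = /boot/vmlinuz-test
POWER_CYCLE = power-switch-cycle testbox
CONSOLE = console-agent testbox

TEST_START
TEST_TYPE = patchcheck
PATCHCHECK_TYPE = boot
PATCHCHECK_START = HEAD~5
PATCHCHECK_END = HEAD
```

The remote power switch and console reader mentioned above are what the `POWER_CYCLE` and `CONSOLE` commands hook into, letting ktest recover a machine that a bad patch has wedged.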
His test setup consists of systems with a remote power switch capability, as well as a means to read the output of the boot. That can all be controlled by ktest to build, install, boot, and run tests remotely. His test suite takes 13 hours to run on a single system as it uses multiple kernel configurations. Once he could do that, he started adding more features to ktest, including bisection, reverse bisection, configuration bisection, and more.
Dhaval Giani, the other microconference lead, noted that he has found ktest to be a good test harness. He uses it to run fuzzers on various test systems and configurations. Rostedt concluded his talk by saying that he mostly just wanted more people to be aware of the tool: "ktest exists", he said with a chuckle.
Kernel unit testing
Knut Omang wanted to look at ways to add more unit testing to the kernel. He has created the Kernel Test Framework (KTF) to that end. It is a kernel module that adds some basic functionality for unit testing; he would like to see the kernel have the same unit-testing capabilities that user space has. He has integrated KTF with Google Test (gtest) as the user-space side of the framework. It communicates with the kernel using netlink to query for available tests and then to selectively run one or more tests and collect their output.
Omang wants developers to "get hooked on testing". So he tried to come up with a test suite that developers will want to use. Testing costs less the closer it is done to the developers writing the code. He is an advocate of test-driven development (TDD), but acknowledged that it is not universally popular. The basic idea behind TDD is to write tests before writing the code, but there are a number of arguments that opponents make against it. Among the complaints are that writing good tests takes a lot of time, developers do not think of themselves as testers, and that writing test code is boring.
Behan Webster said that good testers have a different mindset than developers; testers are trying to break things, while developers are trying to make something work. Rostedt added that it is better if a developer doesn't write the tests because they have too much knowledge of how the code is supposed to work, so they will overlook things. Another attendee pointed out that "if you don't have tests, you don't have working code"; it may seem to be working, but it will break at some point.
There was also some discussion about how to do unit testing for components like drivers. A lot of code infrastructure is needed before anything in a driver works at all, which limits the testing that can be done earlier. Omang and others believe that the problem can be decomposed into smaller pieces that can be individually and separately tested, though some in the audience seemed skeptical of that approach.
Kernel memory sanitizer
Finding places where uninitialized memory is used is a potent way to find bugs; finding those places is what the KernelMemorySanitizer (KMSAN) aims to do. Alexander Potapenko described the tool, which has found a lot of bugs in both upstream kernels and in the downstream Google kernels. It stems from the user-space MemorySanitizer tool that came about in 2012.
The idea is to detect the use of uninitialized values in memory, not simply uninitialized variables. Those values could be used in a branch, as an index, be dereferenced, copied to user space, or written to a hardware device. KMSAN has found 13 bugs so far, though he thought another may have been found earlier in the day of the microconference (September 15). KMSAN requires building the kernel with Clang.
To track the state of kernel memory, KMSAN uses shadow memory that is the same size as the memory used by the kernel. That allows KMSAN to track initialization of memory at the bit level (i.e. it can detect that a single bit has been used but not initialized). KASAN uses a similar technique, but tracks memory at the byte level, so its shadow memory uses one byte for each eight bytes of kernel memory.
KMSAN obviously requires a lot of memory, so Levin wondered if there could be options that used less. Potapenko said that doing so creates a lot of false positives, so it is not worth it. He also noted that kmemcheck has found five bugs in the last five years, but that KMSAN runs 5-10 times faster so it finds more bugs.
In the future, KMSAN could be used for taint analysis by using the shadow mapping to track data coming from untrusted sources. It could also help fuzzers determine which function arguments are more useful to change and to track the origin of those values back to places where they enter the kernel. He pointed to CVE-2017-1000380, which was found by KMSAN, and wondered if there is a way to kill off all of the uninitialized-memory bugs of that sort. Simply replacing calls to kmalloc() with kzalloc() may be tempting, but could be problematic.
KMSAN requires patches to Clang (and the ability to build the kernel with Clang). He hopes to see KMSAN added to the upstream kernel by the end of the year.
Conclusion
The kernel testing story is clearly getting better. There is still plenty to do, of course, but more varied and larger quantities of testing are being done, much of it automatically. That is finding more bugs; with luck, it may eventually outrun the kernel's development pace, so that it is finding more bugs than are being added every day. Kernel development proceeds apace; it is important that testing gets out ahead of it and stays there.
[I would like to thank LWN's travel sponsor, The Linux Foundation, for
assistance in traveling to Los Angeles for LPC.]
| Index entries for this article | |
|---|---|
| Security | Fuzzing |
| Conference | Linux Plumbers Conference/2017 |