Optimization-unstable code
Compilers can be tricky beasts, especially where optimizations are concerned. A recent paper from MIT highlighted some of the problems that can be caused by perfectly legitimate, if surprising, optimizations, some of which can lead to security vulnerabilities. The problem stems from C language behavior that is undefined by the standard, which allows compiler writers to optimize those statements away.
Andrew McGlashan raised the issue on the debian-security mailing list, expressing some surprise that the topic hadn't already come up. The paper specifically cites tests done on the Debian "Wheezy" (7.0) package repository, which found that 40% of 8500+ C/C++ packages have "optimization-unstable code" (or just "unstable code"). That does not mean that all of those are vulnerabilities, necessarily, but they are uses of undefined behavior—bugs, for the most part.
The unstable code was found using a static analysis tool called STACK that was written by the authors of the paper, Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. It is based on the LLVM compiler framework and checks for ten separate undefined behaviors. Since C compilers can assume that a program never invokes undefined behavior, they are free to remove code that only matters when it does, which is what can lead to vulnerabilities.
So, what kind of undefined behavior are we talking about here? Two of the examples given early in the paper help to answer that. The first is that overflowing a pointer is undefined:
char *buf = ...;
unsigned int len = ...;
if (buf + len < buf) /* overflow check */
    ...
The compiler can (and often does, depending on the -O setting) optimize the test away. On some architectures, according to the paper, that's no great loss as the test doesn't work. But on other architectures, it does protect against too large a value of len. Getting rid of the test could lead to a buffer overflow ... and buffer overflows can often be exploited.
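One way to express the same intent without depending on pointer overflow is to compare len against the space remaining in the buffer. The following is a minimal sketch, assuming the caller knows where the buffer ends; the helper name len_exceeds_buffer and the buf_end parameter are hypothetical and do not come from the code discussed above.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when len bytes starting at buf
 * would run past buf_end (one past the last valid byte of the buffer).
 * Assumes buf <= buf_end and that both point into the same object.
 */
bool len_exceeds_buffer(const char *buf, const char *buf_end,
                        unsigned int len)
{
    /* Subtracting two pointers into the same object is well defined,
     * so the compiler has no license to drop this comparison. */
    size_t remaining = (size_t)(buf_end - buf);

    return len > remaining;
}
```

Because the comparison is done on integers rather than on a pointer that may have wrapped, the check should behave the same regardless of architecture or optimization level.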
The second example is a null pointer dereference in the Linux kernel:
struct tun_struct *tun = ...;
struct sock *sk = tun->sk;
if (!tun)
    return POLLERR;
/* write to address based on tun */
Normally that code would cause a kernel oops if tun is null, but
if page zero is mapped for some reason, the code is basically harmless—as
long as the test remains. Because the compiler sees the dereference
operation, it can conclude that the pointer is always non-null and remove
the test entirely, which turns a fairly innocuous bug into a potential
kernel exploit.
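The usual remedy for this pattern is to test the pointer before the first dereference, so the compiler has nothing from which to infer that it is non-null. Here is a small sketch of that ordering; the struct definitions, DEMO_POLLERR, and demo_poll() are stand-ins invented so the example compiles on its own, and are not the kernel's actual code.

```c
#include <stddef.h>

/* Stand-in types and constant so the sketch compiles on its own;
 * they are not the kernel's definitions. */
struct sock { int id; };
struct tun_struct { struct sock *sk; };
enum { DEMO_POLLERR = 0x8 };

/* Hypothetical function: test the pointer first, dereference second. */
unsigned int demo_poll(struct tun_struct *tun)
{
    struct sock *sk;

    if (!tun)
        return DEMO_POLLERR;

    /* Only after the check is tun read through. */
    sk = tun->sk;
    return sk ? 0 : DEMO_POLLERR;
}
```

With the check placed first, no dereference precedes it, so the test cannot be treated as redundant.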
Other undefined behaviors are examined as well. Signed integer overflow, division by zero, and oversized shifts are flagged, for example. In addition, operations like an overlapping memcpy(), use after free()/realloc(), and exceeding array bounds are checked.
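Signed integer overflow is a common source of such checks: a test like "a + b < a" on signed values relies on the overflow it is trying to detect, so a compiler may remove it. A rough sketch of a check that stays within defined behavior follows; the helper name checked_add is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: stores a + b in *sum and returns true, or
 * returns false (leaving *sum untouched) if the addition would
 * overflow an int. The range test runs before the addition, so no
 * overflowing operation is ever evaluated.
 */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;

    *sum = a + b;
    return true;
}
```

The subtractions in the guard cannot themselves overflow, since INT_MAX - b is only computed for positive b and INT_MIN - b only for negative b.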
The Debian discussion turned toward how to find and fix these kinds of bugs but, of course, they mostly or completely live upstream. As Mark Haase put it:
But Paul Wise noted that there is some ongoing work by Debian and Fedora developers to package static checkers for the distributions. STACK is on the list, he said, but relies on a version of LLVM that is not yet available for Debian. He recommended that interested folks get involved in those efforts and offered a list of links to get started.
There were some who felt the optimizations removing the unstable code were actually compiler bugs. Miles Fidelman suggested the problem needed to be fixed "WAY upstream" in GCC itself: "if gcc's optimizer is opening a class of security holes - then it's gcc that has to be fixed". But Haase was quick to throw cold water on that idea, noting a GCC bug and an LLVM blog post series that pretty clearly show that compiler writers do not see these kinds of optimizations as bugs. Haase said:
The problem for programmers is a lack of warnings about these kinds of undefined constructs, Wise said. "Every use of undefined behaviour should at minimum result in a compiler warning." But even doing that is difficult (and noisy), Wade Richards said:
Joel Rees would like to see the standard rewritten "to encourage sane behavior in undefined situations". Defining "sane" might be somewhat difficult, of course.
Bernhard R. Link had a different suggestion:
Bugs in our code, many of which lead to security holes, are a never-ending problem, but over time we do at least seem to be getting some tools to assist in finding them. The fact that different compilers, optimization levels, and compiler versions give different behavior for this particular class of bugs makes them even harder to find. STACK seems like a good solution there; thankfully it is open source, unlike some other static analysis tools.