Would you like signs with those chars?
As a general rule, C integer types are signed unless specified otherwise; short, int, and long all work that way. But char, which is a single byte (eight bits on current machines), is different; it can be signed or not, depending on whatever is most convenient to implement on any given architecture. On x86 systems, a char variable is signed unless declared as unsigned char. On Arm systems, though, char variables are unsigned unless explicitly declared signed.
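A minimal sketch of how that difference can be observed from C itself. The helper names here are hypothetical and chosen only for illustration. CHAR_MIN comes from the standard <limits.h> header and is negative exactly when bare char is signed.

```c
#include <limits.h>

/* Hypothetical helper: 1 if bare char is signed for this build, else 0. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: store the byte 0xff in a bare char, then widen it. */
static int widen_high_byte(void)
{
    /* Implementation-defined conversion when char is signed; GCC and
     * Clang keep the bit pattern. */
    char c = (char)0xff;

    /* Sign-extends to -1 where char is signed, zero-extends to 255
     * where it is unsigned. */
    return c;
}
```

On a typical x86 build one would expect widen_high_byte() to return -1, and on a typical Arm build 255; the same source quietly produces different values.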
The fact that a char variable may or may not be signed is an easy
thing for a developer to forget, especially if that developer's work is
focused on a single architecture. Thus, x86 developers can get into the
habit of thinking of char as always being signed and, as a result,
write code that will misbehave on some other systems. Jason Donenfeld
recently encountered this sort of bug and, after fixing it, posted a
patch meant to address this problem kernel-wide. In an attempt to
"just eliminate this particular variety of heisensigned bugs entirely",
it added the -fsigned-char flag to the compiler command line, forcing
the bare char type to be signed across all architectures.
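To illustrate the kind of bug in question, here is a small sketch; it is not taken from the kernel, and both function names are hypothetical. It stores a negative sentinel in a bare char, a habit that works only where char happens to be signed.

```c
/* Fragile: assumes that a bare char can hold a negative value. */
static int is_error_fragile(char status)
{
    /* Where char is unsigned, a stored -1 becomes 255 and this test
     * can never be true. */
    return status < 0;
}

/* Portable: states the signedness explicitly. */
static int is_error_portable(signed char status)
{
    return status < 0;
}
```

Called with (char)-1, the fragile version reports an error on a signed-char build and silently reports success on an unsigned-char build; the portable version behaves the same either way. Building the same file with GCC's -fsigned-char and then -funsigned-char is one way to see both behaviors on a single machine.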
This change turned out to not be popular. Segher Boessenkool pointed
out that it constitutes an ABI change, and could hurt performance on
systems that naturally want char to be unsigned. Linus Torvalds
agreed, saying: "We should just accept the standard wording, and be
aware that 'char' has indeterminate signedness". He disagreed, however,
with
Boessenkool's suggestion to remove the -Wno-pointer-sign option
used now (thus enabling -Wpointer-sign warnings). That change
would enable a warning that results from the mixing of
pointers to signed and unsigned char types; Torvalds complained
that it fails to warn when using char variables, but produces
a lot of false positive warnings with correct code.
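Both halves of that complaint can be sketched in a few lines of C. The helper names are hypothetical; strlen() is the standard library function, which takes a const char * argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a NUL-terminated byte buffer. */
static size_t byte_string_length(const unsigned char *buf)
{
    /*
     * Writing strlen(buf) here would mix "unsigned char *" with the
     * "const char *" that strlen() expects; GCC reports that under
     * -Wpointer-sign even though the code is correct. The explicit
     * cast states the intent and avoids the warning.
     */
    return strlen((const char *)buf);
}

/* Hypothetical helper: true if the first byte has its top bit set. */
static int first_byte_is_high(const char *s)
{
    /*
     * No pointer types are mixed in this function, so -Wpointer-sign
     * has nothing to say, yet a direct "*s > 127" test would depend on
     * the signedness of bare char. Converting to unsigned char first
     * gives the same answer on every architecture.
     */
    return (unsigned char)*s > 127;
}
```

The first function shows correct code that needs a cast to stay quiet; the second shows where the signedness of char actually matters, in a spot the warning does not reach.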
Later in the discussion, though, Torvalds wondered
whether it might be a good idea to nail down the signedness of
char variables after all — but to force them to be unsigned
by default rather than signed. That, he said, shouldn't generate worse
code on any of the commonly used architectures. "And I do think that
having odd architecture differences is generally a bad idea, and making
the language rules stricter to avoid differences is a good thing".
Amusingly, he noted that, with this option, code like:
    const unsigned char *c = "Subscribe to LWN";
will still, with the -Wpointer-sign option, generate a warning,
since a string constant pointer is still considered to be a bare
char * type, which is then treated as being different from an explicit
unsigned char * type. "You *really* can't win this thing. The game is
rigged like some geeky carnival game".
Donenfeld saw
merit in the idea, even though he thinks that the potential to break
some code exists. He sent out a new
patch adding -funsigned-char to the compiler command line
to effect this change. He had suggested that it could perhaps be merged immediately,
given that there is time to fix any fallout before the 6.1 release, but
Torvalds declined that opportunity: "if we were still in the merge
window, I'd probably apply this, but as things stand, I think it should
go into linux-next and cook there for the next merge window". He added
that any problems that
result from the change are likely to be subtle and to be in driver code
that isn't widely used across architectures. The core kernel code,
by contrast, has always had to work across architectures, so he does
not believe that problems will show up there.
So Donenfeld's patch is sitting in linux-next instead, waiting for the 6.2 merge window in December. That gives the community until late February to find any problems that might be caused by forcing bare char variables to be unsigned across all architectures supported by Linux. That is a fair amount of time, but it is also certainly not too soon to begin testing this change in as many different environments as possible. It is, after all, a fundamental change to the language in which the kernel is written; a lack of resulting surprises would, itself, be surprising.
One way to identify potential problems is to find the places where the
generated code changes when char is forced to be unsigned.
Torvalds has already made
some efforts in that direction, and Kees Cook has used a system
designed for checking reproducible builds to find a lot of changes.
Many of those changes will turn out to be harmless, but the only way to
know for sure is to actually look at them. Meanwhile, the posting of
one fix by Alexey Dobriyan has caused Torvalds to request
that the char fixes be collected into a single tree. As those
fixes accumulate, the result should be a sign of just how much
disruption this change is actually going to cause.