
Linux and automotive computing security


Posted Oct 10, 2012 22:51 UTC (Wed) by SLi (subscriber, #53131)
Parent article: Linux and automotive computing security

I am a Linux geek, and I work with safety critical systems (mostly safety critical methodology research). I think anyone who thinks Linux, or for that matter Windows or any other operating system most people here would have heard of, could run in a safety-critical setting in the foreseeable future simply misunderstands the nature of safety critical systems.

The first thing to understand is that safety is not the same as security. Most of this article talks about security. Security can affect safety, and certainly the safety critical industry should take it better into account, but it's only a very small part of the story. Also, many of the attacks mentioned in this paper do not concern safety at all. For example, someone being able to steal your car is not a safety concern. Safety concerns are exactly those that can lead to bodily harm or death of someone operating the system or other people.

As an example, a window closing mechanism in a car might be considered very slightly safety critical if it would be possible for it to chop off a finger if it malfunctions. Brakes would have a higher safety criticality level, since a car moving at a high speed without functional brakes can cause the death of not only the driver but also other people. The car stereos would generally be considered non-critical, but to certify the entire system, you will have to show sufficient separation between critical and non-critical systems, and also between less critical and more critical systems.

There are generally certain requirements that regulators require for safety critical code. Generally any code to be run in a safety critical context needs to be developed with an extraordinarily thorough and rigorous process. The entire process must be well documented, starting from the design, but also encompassing coding, testing, etc. There is a good rationale for this: Testing will never find all your bugs. Rigorous design and such things won't find them all either, but it won't hurt. The point is this: Safety is something that you need to build in from the beginning; you just cannot add it later.

This necessarily results in safety critical code being comparatively speaking very simple. The requirements also become more stringent when the criticality level (for example, SIL levels, where 1 is the least critical, such as car windows, and 4 would be the most critical, such as aeroplane control systems and nuclear reactors) rises. I would be surprised if there are very many high-criticality systems as complex as a TCP/IP stack, let alone the Linux kernel. You can run them in the car stereos, though.

Also, barring some very significant advance in program verification, the Linux kernel can never even be tested to the level required. Generally even the lowest levels of criticality require things like a test suite with 100% coverage. To see the kind of testing required for higher levels, take a look at, for example, Modified Condition/Decision Coverage (MC/DC, see below). The only open source piece of software I know of that claims 100% MC/DC coverage is SQLite, and even they misunderstand it and basically only have plain condition/decision coverage.
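To make the distinction concrete, here is a small sketch (my own illustration, not SQLite's actual test suite). For a simple two-condition decision, just two input vectors satisfy plain condition/decision coverage, yet leave the MC/DC independence pairs entirely uncovered:

```c
/* Decision under test, wrapped so the outcome can be checked:
 * 1 = decision true, 0 = decision false. */
int decision_or(int a, int b)
{
    return (a || b) ? 1 : 0;
}

/* C/DC is satisfied by just {(1,1), (0,0)}: a, b, and the decision
 * each take both truth values. MC/DC is NOT satisfied: no pair of
 * those vectors varies exactly one condition while flipping the
 * outcome. MC/DC additionally needs e.g. (1,0) vs (0,0) to show a's
 * independent effect, and (0,1) vs (0,0) to show b's. */
```

So a suite can report 100% condition/decision coverage while never demonstrating that each condition independently affects the outcome, which is the whole point of MC/DC.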

One of the requirements in MC/DC is that, for each source-level branching condition (boolean expression), you have to show separately (with tests in your test suite), for each subexpression, that there is a pair of inputs differing only in the value of that subexpression such that different branches are taken. That is, if you have a condition: if (a || (b && c)) { ... } else { ... }

you will have to write the following tests:

  1. A test that makes a true (e.g. a=1, b=0, c=0, branch taken=then)
  2. A test that makes a false with the same b and c as in the above test AND takes the other branch (e.g. a=0, b=0, c=0, branch taken=else)
  3. A test that makes b true (e.g. a=0, b=1, c=1, branch taken=then)
  4. A test that makes b false with the same a and c as in the above test AND takes the other branch (e.g. a=0, b=0, c=1, branch taken=else)
  5. A test that makes c true (the above a=0, b=1, c=1, branch taken=then test suffices for this)
  6. A test that makes c false with the same a and b as in the above test AND takes the other branch (e.g. a=0, b=1, c=0, branch taken=else)
  7. A test that makes (b && c) true (e.g. a=0, b=1, c=1, branch taken=then)
  8. A test that makes (b && c) false with the same a as in the above test AND takes the other branch (e.g. a=0, b=0, c=1, branch taken=else)

I hope you are starting to see the hopelessness of testing Linux, or basically any other piece of code that either is moderately large or not designed from the beginning to be so tested, to such a strict standard...
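For what it's worth, the eight obligations above collapse to five distinct input vectors. A minimal sketch in C (the `decision` wrapper and `run_mcdc_suite` helper are my own illustration, just a way to make the branch outcome assertable):

```c
/* The branch condition from the tests above, wrapped in a helper:
 * returns 1 for the then-branch, 0 for the else-branch. */
int decision(int a, int b, int c)
{
    return (a || (b && c)) ? 1 : 0;
}

/* The eight MC/DC obligations collapse to five distinct vectors.
 * Returns the number of failed expectations (0 = suite passes). */
int run_mcdc_suite(void)
{
    int failures = 0;
    failures += decision(1, 0, 0) != 1;  /* test 1: a true           */
    failures += decision(0, 0, 0) != 0;  /* test 2: a false, flips   */
    failures += decision(0, 1, 1) != 1;  /* tests 3, 5 and 7         */
    failures += decision(0, 0, 1) != 0;  /* tests 4 and 8            */
    failures += decision(0, 1, 0) != 0;  /* test 6: c false, flips   */
    return failures;
}
```

Five vectors for a three-condition decision matches the usual MC/DC result of n+1 tests for n conditions; now imagine doing this for every decision in a multi-million-line kernel.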

Note that the safety critical people do not claim this process is perfect. It is just a process that results in a lot of eyeballs staring at the code, the specification and the test cases, thinking about them, and testing them from nearly every angle imaginable. Bugs still happen, but they are certainly much rarer than bugs in the Linux kernel :)



Linux and automotive computing security

Posted Oct 11, 2012 1:44 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (6 responses)

Thanks for the interesting explanation of the development process behind safety-critical systems. Would it be safe to say that for these systems, the majority of the actual effort is expended on writing testcases?

Linux and automotive computing security

Posted Oct 11, 2012 8:16 UTC (Thu) by hickinbottoms (subscriber, #14798) [Link] (1 responses)

Being involved in this world as well, I can say that whilst testing is a considerable part of the process (the back end of the development model, if you like), the majority of the effort lies in the front end, during and before the design phase.

You can't design a safety-critical system without knowing what the safety requirements are, and they're often harder to identify than you imagine. For example, a hypothetical brake-control system might have a safety requirement that the brakes are applied within X ms of being commanded, with Y reliability, which is a fairly easy requirement to spot. Slightly harder is that it's also likely to be hazardous for the brakes to be applied when not commanded, so you need to spot that and engineer the requirements appropriately; there have been aircraft losses during landing from such failures, if my memory serves correctly.
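As a sketch of what such a requirement might look like once made testable (all names, the enum values, and the 50 ms deadline are invented for illustration, not taken from any real brake system), note that it captures both hazards: late actuation and uncommanded actuation:

```c
#include <stdint.h>

#define BRAKE_DEADLINE_MS 50u  /* placeholder for the real "X ms" */

typedef enum { REQ_OK, REQ_TOO_LATE, REQ_UNCOMMANDED } req_result;

/* Hypothetical requirement check over one command/actuation event:
 * actuation must follow a command within the deadline, and must
 * never occur without a preceding command (the harder-to-spot
 * hazard mentioned above). */
req_result check_brake_event(int commanded, uint32_t t_cmd_ms,
                             int actuated, uint32_t t_act_ms)
{
    if (actuated && !commanded)
        return REQ_UNCOMMANDED;
    if (commanded && actuated &&
        (t_act_ms - t_cmd_ms) > BRAKE_DEADLINE_MS)
        return REQ_TOO_LATE;
    return REQ_OK;
}
```

The point is that the second `if` alone would pass review; it is the first, "brakes on when nobody asked" case that the requirements analysis has to surface.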

It's this identification of the requirements, and the associated safety-analysis process involving tools such as fault trees, event trees, FMEA/FMECA, hazard analysis/logs, SIL analysis and so on, that makes safety-critical development really hard and expensive. It is, however, critical to get this right before diving into coding and testing since, as we know, changing the requirements of systems after they're built is difficult and often leads to unexpected behaviours being implemented. The high-integrity world is littered with examples of failures caused by changed requirements, or by systems being used to fulfil requirements that were never identified.

Because the resulting design of the system is heavily influenced by the requirements analysis that got you there, it's also very difficult to make a convincing safety case and retrospectively develop a safety substantiation for a system that hasn't been designed that way from the outset.

As the parent poster says, you can't stop non-trivial software from having bugs and crashing, but you can build a confident argument that such failure cannot lead to a hazardous condition with an intolerable frequency. The safety analysis process lets you make such statements with evidence.

It's always a little disappointing that at the end of the day you just end up with 'normal-looking' software that isn't somehow magical and better. But it's the confidence that it's more likely to do what's expected, and that when it doesn't it can't lead to situations you've not at least considered, that's important.

Linux and automotive computing security

Posted Oct 11, 2012 15:01 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link]

You can't design a safety-critical system without knowing what the safety requirements are, and they're often harder to identify than you imagine.

Yes, and in this case, it turns out that one of the things the designers failed to identify is that they couldn't necessarily trust all of the other systems on the CAN. It's easy to understand why somebody might make that mistake, but the major thrust of the security researchers' article is that it is a mistake. Now they need to go back to the drawing board and design a better set of specifications for their networking component so it won't let the system be subverted by malicious messages.

Writing tests cases

Posted Oct 11, 2012 11:57 UTC (Thu) by man_ls (guest, #15091) [Link] (1 responses)

Would it be safe to say that for these systems, the majority of the actual effort is expended on writing testcases?
I hope that, in this day and age, the effort on writing and running test cases for any non-trivial system is the majority of the coding effort! In a recent interview Kernighan says that in his classes:
I also ask them to write tests to check their code, and a test harness so the testing can be done mechanically. These are useful skills that are pretty much independent of specific languages or environments.
Given that tests should be about half the size of the system (for a big system), and that they are run repeatedly, they should take the majority of the coding effort. For critical systems this fraction should probably be even higher.

I am just speaking about coding, but obviously it is not the only development activity. I am not surprised to learn from the above poster that analysis and design take even longer than coding.

Writing tests cases

Posted Oct 18, 2012 18:22 UTC (Thu) by TRauMa (guest, #16483) [Link]

Then again, nobody pays for test cases unless regulations force them to. :(

Linux and automotive computing security

Posted Oct 11, 2012 14:57 UTC (Thu) by ortalo (guest, #4654) [Link] (1 responses)

Certainly. And in some cases, manual coding in a conventional language is even nearly prohibited: code is generated from the specification (along with the test cases, the timing calculations, etc.). And even in this case, the testing effort is paramount.

Linux and automotive computing security

Posted Oct 11, 2012 15:00 UTC (Thu) by ortalo (guest, #4654) [Link]

The last line of the above comment disappeared mysteriously. It was:

But is that enough for security (!= safety)?

Linux and automotive computing security

Posted Oct 11, 2012 13:03 UTC (Thu) by etienne (guest, #25256) [Link] (5 responses)

May I ask, which part of your build chain do you trust? That is:
- Do you test the source code (and thus trust the compiler)? Then you can reuse that unmodified, tested source code in other parts of the software, and you can let the compiler optimise (inline function calls).
- Do you test the libraries (and thus trust only the linker)? Then you can call any function of those libraries from other parts of the software.
- Do you test the hexadecimal code (and thus trust only the hardware, i.e. FLASH + processor + memory)? Then it is really difficult to get every "if" fully checked that way...

Linux and automotive computing security

Posted Oct 11, 2012 20:37 UTC (Thu) by SLi (subscriber, #53131) [Link]

Generally speaking the compiler used also needs to be certified to the same safety critical standard, or else you will need to spend a nontrivial effort in showing equivalence of the source code and the machine code. As you can probably imagine, optimizing compilers are not used a lot :) The same goes for other parts of the toolchain that can affect the final output.
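One way to see why optimization complicates source-to-object traceability: a compiler may legally replace a short-circuiting decision with straight-line code containing no branches at all, so source-level branch obligations no longer map onto the object code. A hand-written illustration of the kind of transformation an optimizer might perform (the bitwise form is only equivalent for 0/1 inputs, as the comment notes):

```c
/* Source-level form: three conditions, multiple branches to cover. */
int branching(int a, int b, int c)
{
    return (a || (b && c)) ? 1 : 0;
}

/* A branchless form an optimizer might emit instead: observably
 * equivalent for inputs restricted to 0 and 1, but there are no
 * object-code branches left for a structural coverage tool to see. */
int branchless(int a, int b, int c)
{
    return (a | (b & c)) & 1;
}
```

Demonstrating that the two forms agree on every input is exactly the "equivalence of the source code and the machine code" argument, and it has to be made for the whole program, not an eight-row truth table.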

And the same goes for the microcontroller used. The hardware needs to be certified. You obviously also cannot use libraries that are not certified to the same standard. Though we're mostly talking about small microcontrollers anyway; generally everything is always linked in statically.

There are ways to incorporate complexity without doing it in safety critical code, though. Generally you develop as little safety-critical code as possible, and specify a simple interface over which it interfaces to non-critical code. Then you certify it with the argument that it will behave safely regardless of what input it gets from the non-trusted source (often you still don't need to consider adversarial situations). How this is accomplished really depends on the application: The simplest case is the one where you can ensure safety simply by shutting down the system in case of invalid input. For example, a nuclear reactor can be shut down, or some other heavy machine may be simply stopped (power cut). As you can imagine, this is not such a good solution for, say, aeroplanes. There the usual safe mode means falling back to manual operation.
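The "behave safely regardless of input" pattern described above can be sketched roughly as follows (the message layout, the complement check and the command set are all invented for illustration): any message the critical side cannot positively validate maps to the safe state, rather than being rejected, retried or trusted.

```c
#include <stdint.h>

typedef enum { CMD_IDLE = 0, CMD_RUN = 1, CMD_STOP = 2 } command;

typedef struct {
    uint8_t cmd;      /* expected to hold a valid `command` value   */
    uint8_t cmd_inv;  /* ones' complement of cmd, as a sanity check */
} message;

/* Interprets a message from the untrusted side. Returns the command
 * to execute; every malformed case collapses to the safe state
 * (CMD_STOP), so the critical side never acts on garbage. */
command interpret(const message *m)
{
    if (m == 0)
        return CMD_STOP;                 /* missing input            */
    if ((uint8_t)~m->cmd != m->cmd_inv)
        return CMD_STOP;                 /* corrupted in transit     */
    if (m->cmd > CMD_STOP)
        return CMD_STOP;                 /* out-of-range command     */
    return (command)m->cmd;
}
```

Note the design choice: there is no error path to certify, because the error path and the safe action are the same thing, which is what makes this workable for plants that can simply be stopped.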

Usually this separation also means separate hardware for the critical and noncritical parts. However, if you have a kernel certified to a certain level, where the certification is for noninterference of non-critical processes with critical processes, you might be able to run critical and noncritical tasks on the same microcontroller. In practice this is hard to do, as the kernel is a complex piece of software to develop.

I hear there are all kinds of crazy hardware solutions for this, especially in the automotive industry, as profit margins drive developers towards single-chip solutions: for example, microcontrollers where every other instruction has access to some privileged memory areas and operations, and the others do not. Thus you can get a very simple kind of separation of trusted and untrusted code without a full-blown MMU (or even a full MPU) and without a page table.

Linux and automotive computing security

Posted Oct 16, 2012 18:13 UTC (Tue) by Baylink (guest, #755) [Link] (3 responses)

Let us reflect...

http://lwn.net/Articles/372224/

(Alas, it appears that DejaGoogle now *requires* a login even to read news articles; shame that hasn't garnered more complaint)

Linux and automotive computing security

Posted Oct 17, 2012 9:28 UTC (Wed) by njwhite (guest, #51848) [Link] (2 responses)

> Alas, it appears that DejaGoogle now *requires* a login even to read news articles

In a characteristically sneaky way, that's only half true. It requires a login if you have a Google cookie, so they reckon you *have* a Google login; otherwise they let you through with no problem. (I haven't tested this extensively; it's just what seems to be happening in my experience. It could also be location-based: I generally use Tor, so I would expect to see inconsistent behaviour if they were doing that.)

Linux and automotive computing security

Posted Oct 17, 2012 17:01 UTC (Wed) by Baylink (guest, #755) [Link]

That. Is.

Evil.

Linux and automotive computing security

Posted Oct 18, 2012 18:26 UTC (Thu) by TRauMa (guest, #16483) [Link]

It is the same for public Google Docs documents. While I understand the rationale (logging in usually gives you more actions on the resources you see, in this case the option to copy the document to your account), making this explicit and providing a "proceed without login" button wouldn't have hurt. Not doing so suggests the real motivation is to get people logged in as often, and for as long, as possible for better data tracking (on the other hand, if you have a Google cookie, the tracked data will be high quality anyway).

Linux and automotive computing security

Posted Oct 12, 2012 13:37 UTC (Fri) by peter-b (guest, #66996) [Link] (1 responses)

> I would be surprised if there are very many high-criticality systems as complex as a TCP/IP stack, let alone the Linux kernel.

As I mentioned in another post, there's a very good counter-example: avionics subsystems within the SpaceX Falcon 9 launch vehicle all communicate over TCP/IP. And the Dragon capsule that docked at the ISS this week has avionics that run the Linux kernel exclusively. I think it would be fair to say that those more-or-less *define* "high-criticality"! ;-)

Linux and automotive computing security

Posted Oct 22, 2012 12:47 UTC (Mon) by pflugstad (subscriber, #224) [Link]

Yes, but they probably didn't need to get FAA DO-178 certification (or whatever the equivalent is for automobiles or health care instruments, etc).

Linux and automotive computing security

Posted Oct 13, 2012 2:02 UTC (Sat) by giraffedata (guest, #1954) [Link]

someone being able to steal your car is not a safety concern

I don't mean to take anything away from the conclusion, but I thought I'd point out that some people consider someone being able to steal a car to be a safety concern.

The US agency empowered to restrict, for safety reasons, what kinds of cars can be built (NHTSA, I believe) requires many anti-theft features, such as locking steering wheels. It claims this authority based on statistics that show cars are driven significantly more carefully by their owners than by their thieves. Stolen cars are especially more likely to be in high speed chases with the police that end in bloody crashes.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds