Forcing the compiler and CPU to execute assignment statements in order ?
Forcing the compiler and CPU to execute assignment statements in order ?
Posted Jan 1, 2008 23:10 UTC (Tue) by PaulMcKenney (✭ supporter ✭, #9624)In reply to: Forcing the compiler and CPU to execute assignment statements in order ? by Brenner
Parent article: What is RCU, Fundamentally?
You are welcome!
Your question about ordering is a good one, and code reordering by both CPU and compiler can be grossly counter-intuitive at times. So an explanation is indeed in order. Let's start with the compiler. The code was as follows:
11 p->a = 1; 12 p->b = 2; 13 p->c = 3; 14 gp = p;
It is possible that the compiler holds the value of gp in a register, but that it will need to spill that value in order to generate the code for lines 11-14. What to do? Well, as long as the compiler can prove that p and gp do not point to overlapping regions of memory, the compiler is within its rights to generate the code for line 14 before generating the code for the other lines. In this case, rearranging the code in this manner reduces code size, probably increases performance, and hurts no one. That is, it hurts no one in the absence of concurrency. However, please keep in mind that the C standard currently does not allow concurrency, strange though that may seem to those of us who have been writing concurrent C programs for decades.
So special non-standard compiler directives (barrier() in the Linux kernel, or, when used properly, volatile) are required to force the compiler to maintain ordering. Note that such directives are included, either directly or indirectly, in primitives that require ordering, for example, the smp_mb() memory barrier in the Linux kernel.
On to the CPU. Suppose that the compiler generates the code in order, and that the CPU executes it in order. What could possibly go wrong?
The store buffer. The CPU will likely commit the new values of p->a, p->b, p->c, and gp to the store buffer. If the cache line referenced by p happens to be owned by some other CPU (for example, if the memory returned by the preceding kmalloc() had been kfree()ed by some other CPU) and if the cache line containing gp is owned by the current CPU, the first three assignments will be flushed from the store buffer (and thus visible to other CPUs) long after the last assignment will be.
In addition, superscalar CPUs routinely execute code out of order. Such beasts might become less common as power-consumption concerns rise to the fore, but there are still quite a few such CPUs out there, dating from the mid-1990s for commodity microprocessors and to the mid-1960s for supercomputers. For example, Tomasulo's Algorithm, dating from the mid-1960s, is specifically designed to allow CPUs to execute instructions out of order. And there were super-scalar supercomputers pre-dating Tomasulo's Algorithm.
In short, if ordering is important to you, use primitives to enforce ordering. Such primitives include locking primitives (e.g., spin_lock() and spin_unlock() in the Linux kernel), the RCU publish-subscribe primitives called out in this article, and of course explicit memory barriers. (I would recommend avoiding explicit memory barriers where feasible -- they are difficult to get right.)
Hey, you asked!!! And Happy New Year!!!