Semantic patching with Coccinelle

January 20, 2009

This article was contributed by Valerie Aurora

We've all been there: You're tracking down some evil bug, and you have the sudden chilling realization that you're going to have to refactor an enormous chunk of code to fix it. You break out in a cold sweat as you run a quick grep over the source base: hundreds of lines of code to change! And the change is too complex to do with a script because it depends on the calling context, or requires adding a new variable to every caller.

This happened to me last month when I was adding support for 64-bit file systems to e2fsprogs. I thought I was nearly finished when I discovered I needed to write (yet another) new interface and convert (yet another) several hundred lines of code to it. The changes were complex enough that I couldn't use a script, and simple enough that I wanted to claw my eyes out with the soul-killing boredom of doing it by hand. That's when the maintainer, Theodore Ts'o, suggested I look at Coccinelle (a.k.a., spatch).

Coccinelle

Coccinelle is a tool to automatically analyze and rewrite C code. Coccinelle (pronounced cock'-see-nel) means "ladybug" in French, a name chosen because ladybugs eat other bugs. Coccinelle is not just another scripting language; it is aware of the structure of the C language and can make much more complex changes than are possible with pure string processing. For example, Coccinelle can make a particular change only in functions that are assigned to a given function-pointer member of a given structure type: say, the create member of struct inode_operations.
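To make the struct inode_operations example concrete, here is a minimal C sketch of the kind of code such a rule tells apart. struct demo_ops, my_ops, demo_create(), and demo_helper() are hypothetical names invented for this illustration, and the Coccinelle rule itself is not shown; the point is only that the two functions look identical to a text search, and the one assignment is what distinguishes them.

```c
#include <string.h>

/* Hypothetical operations table, standing in for something like
 * struct inode_operations: a structure whose members are function pointers. */
struct demo_ops {
	int (*create)(const char *name);
};

/* Stored in the create member below, so a rule restricted to
 * "functions used as a create operation" would match it... */
static int demo_create(const char *name)
{
	return (int)strlen(name);
}

/* ...but not this one: same signature, but never assigned
 * to a create member anywhere. */
static int demo_helper(const char *name)
{
	return (int)strlen(name) * 2;
}

/* The assignment such a rule would key on. */
static const struct demo_ops my_ops = {
	.create = demo_create,
};
```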

The input to the tool is the file(s) to be changed and a "semantic patch," written in SmPL (Semantic Patch Language). SmPL looks like a unified diff (a patch) with some C-like declarations mixed in. Here's an example:

    @@
    expression E;
    identifier fld;
    @@

    - !E && !E->fld
    + !E || !E->fld

This semantic patch fixes a bug in which a pointer is tested for NULL and then dereferenced anyway when it turns out to be NULL. Here is an example of a bug in the Linux kernel that this semantic patch found, along with the fix it generated automatically:

    --- a/drivers/pci/hotplug/cpqphp_ctrl.c
    +++ b/drivers/pci/hotplug/cpqphp_ctrl.c
    @@ -1139,7 +1139,7 @@ static u8 set_controller_speed(struct controller *ctrl, u8 adapter_speed, u8 hp_
            for(slot = ctrl->slot; slot; slot = slot->next) {
                    if (slot->device == (hp_slot + ctrl->slot_device_offset))
                            continue;
    -               if (!slot->hotplug_slot && !slot->hotplug_slot->info)
    +               if (!slot->hotplug_slot || !slot->hotplug_slot->info)
                            continue;
                    if (slot->hotplug_slot->info->adapter_status == 0)
                            continue;
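Spelling out the short-circuit logic shows why the original test is wrong. The sketch below uses simplified, hypothetical stand-ins for the cpqphp structures, not the real kernel definitions.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the hotplug structures in the diff. */
struct slot_info { int adapter_status; };
struct hotplug_slot { struct slot_info *info; };
struct slot { struct hotplug_slot *hotplug_slot; };

/* Buggy form: if s->hotplug_slot is NULL, the first operand is true, so &&
 * goes on to evaluate s->hotplug_slot->info and dereferences NULL. If the
 * pointer is non-NULL, the first operand is false, so the whole test is
 * false and the info field is never checked at all. */
static int should_skip_buggy(const struct slot *s)
{
	return !s->hotplug_slot && !s->hotplug_slot->info;
}

/* Fixed form: || stops as soon as the pointer is known to be NULL,
 * and only dereferences it when it is non-NULL. */
static int should_skip_fixed(const struct slot *s)
{
	return !s->hotplug_slot || !s->hotplug_slot->info;
}
```

Note that the buggy version can only be called safely with a non-NULL hotplug_slot, and even then it never skips a slot whose info is missing.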

(More on the semantic patch format later.)

Coccinelle is designed, written, and maintained by Julia Lawall at the Department of Computer Science at the University of Copenhagen, Gilles Muller and Yoann Padioleau at the Ecole des Mines de Nantes, and René Rydhof Hansen at the Department of Computer Science of Aalborg University. Coccinelle is licensed under the GPL; however, it is written in OCaml, so the potential developer base is somewhat limited.

The original goal of Coccinelle was to automate as much as possible the task of keeping device drivers up to date with the latest kernel interfaces. But the end result can do far more than that, including finding and fixing bugs and coding style irregularities. Over 180 patches created using Coccinelle have been accepted into the Linux kernel to date.

Coccinelle quickstart

Like many languages, SmPL is best learned through example. We'll run through one simple example here just to get started. After that, the Coccinelle web page has some documentation and a plethora of examples.

First, download Coccinelle and install it. I used the source version rather than any of the precompiled options. The Coccinelle binary is called spatch.

As our example, say we have a program with a lot of calls to alloca() that we would like to replace with malloc(). alloca() allocates space on the stack and can be more efficient and convenient than malloc(), but it is also compiler-dependent, non-standard, easy to use incorrectly, and has undefined behavior on failure. (Replacing alloca() with malloc() isn't enough on its own; we also have to check the return value, but that will come later.)
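As one concrete instance of "easy to use incorrectly," consider a helper that hands a buffer back to its caller. The sketch below is a hypothetical dup_string(), written only for illustration: it works with malloc(), whose memory outlives the function, and it can report allocation failure, which alloca() has no way to do.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of s; the caller owns the result and must free() it.
 * Writing this with alloca() instead would be a bug: alloca() memory lives
 * in dup_string()'s stack frame and is released when the function returns,
 * so the caller would receive a dangling pointer. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy == NULL)	/* malloc() reports failure; alloca() cannot */
		return NULL;
	memcpy(copy, s, len);
	return copy;
}
```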

Here is the C file we are working on:

    #include <alloca.h>

    int
    main(int argc, char *argv[])
    {
            unsigned int bytes = 1024 * 1024;
            char *buf;

            /* allocate memory */
            buf = alloca(bytes);

            return 0;
    }

We could make the replacement using a scripting language like sed:

    $ sed -i 's/alloca/malloc/g' test.c

But this will replace the string "alloca" anywhere it appears. The resulting diff:

    --- test.c
    +++ /tmp/test.c
    @@ -1,4 +1,4 @@
    -#include <alloca.h>
    +#include <malloc.h>
 
     int
     main(int argc, char *argv[])
    @@ -6,8 +6,8 @@
             unsigned int bytes = 1024 * 1024;
             char *buf;
 
    -        /* allocate memory */
    -        buf = alloca(bytes);
    +        /* mallocte memory */
    +        buf = malloc(bytes);
 
             return 0;
     }

We can tweak our script to handle 90% of the cases:

    $ sed -i 's/alloca(/malloc(/g' test.c

But this script still misfires on any function whose name ends in alloca (a call to my_alloca() would become my_malloc()), and it depends on a particular coding style in which no white space comes between the function name and the open parenthesis, and so on. By the time it handles every such case, our simple sed script is a hundred-character monster. It can be done, but it's a pain.
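To see the suffix and white-space problems concretely, here is a small C sketch of plain textual substitution, roughly what the sed script does. replace_all() is a hypothetical helper written for this illustration, not part of sed or Coccinelle.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace every occurrence of `from` (which must be
 * non-empty) in `src` with `to`, purely as text, much as
 * sed 's/alloca(/malloc(/g' would. Returns a malloc()ed string that the
 * caller must free(), or NULL if allocation fails. */
static char *replace_all(const char *src, const char *from, const char *to)
{
	size_t from_len = strlen(from), to_len = strlen(to), count = 0;
	const char *p;
	char *out, *w;

	/* First pass: count non-overlapping matches to size the result. */
	for (p = strstr(src, from); p != NULL; p = strstr(p + from_len, from))
		count++;

	out = malloc(strlen(src) + count * to_len - count * from_len + 1);
	if (out == NULL)
		return NULL;

	/* Second pass: copy the text between matches, substituting `to`. */
	w = out;
	while ((p = strstr(src, from)) != NULL) {
		memcpy(w, src, (size_t)(p - src));
		w += p - src;
		memcpy(w, to, to_len);
		w += to_len;
		src = p + from_len;
	}
	strcpy(w, src);
	return out;
}
```

Calling replace_all("buf = my_alloca(n); tmp = alloca (n);", "alloca(", "malloc(") yields "buf = my_malloc(n); tmp = alloca (n);": a call to a different function gets rewritten, while the real alloca() call is missed because of a single space.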

In Coccinelle, we'd use the following semantic patch:

    @@ expression E; @@

    -alloca(E)
    +malloc(E)

Put the C file in test.c and the above semantic patch in test.cocci and run it like so:

    $ spatch -sp_file test.cocci test.c

It should produce the following diff:

    --- test.c
    +++ /tmp/cocci-output-17416-b5450d-test.c
    @@ -7,7 +7,7 @@ main(int argc, char *argv[])
             char *buf;
 
             /* allocate memory */
    -        buf = alloca(bytes);
    +        buf = malloc(bytes);
 
             return 0;
     }

Let's look at the semantic patch line by line.

    @@ expression E; @@

This declares the "metavariable" E as a variable that can match any expression, e.g., 1 + 2, sizeof(x), or strlen(name) + sizeof(x) * 72. When spatch processes the input, it sets the value of E to the argument to alloca(). The "@@ @@" syntax is chosen to resemble the line in a unified diff describing the lines to be patched. I don't find the resemblance particularly helpful, but the intention is well-taken.

    -alloca(E)

This line says to remove any call to the function alloca(), and to save its argument in the metavariable E for later use.

    +malloc(E)

And this line says to replace the call to alloca() with a call to malloc() and use the value of metavariable E as its argument.

Now, we also want to check the return value of malloc() and return an error if it failed. We can do that too:

    @@
    expression E;
    identifier ptr;
    @@

    -ptr = alloca(E);
    +ptr = malloc(E);
    +if (ptr == NULL)
    +        return 1;

The resulting diff:

    --- test.c
    +++ /tmp/cocci-output-17494-22a573-test.c
    @@ -7,7 +7,8 @@ main(int argc, char *argv[])
             char *buf;
 
             /* allocate memory */
    -        buf = alloca(bytes);
    +        buf = malloc(bytes);
    +        if (buf == NULL)
    +                return 1;
 
             return 0;
 }
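The patched file still needs a little hand attention before it is correct C. It still includes <alloca.h>, but malloc() and free() are declared in <stdlib.h>, and nothing frees the buffer, since malloc() memory is not released automatically on return the way alloca() memory is. A minimal sketch of the finished logic, with main()'s body moved into a hypothetical allocate_buffer() function so it can be called on its own:

```c
#include <stdlib.h>	/* declares malloc() and free(); <alloca.h> does not */

/* Body of test.c's main() after the semantic patch, moved into a
 * hypothetical function so it can be called and tested. Returns 1 if the
 * allocation fails, as the patched code does, and 0 otherwise. */
static int
allocate_buffer(unsigned int bytes)
{
	char *buf;

	/* allocate memory */
	buf = malloc(bytes);
	if (buf == NULL)
		return 1;

	/* ... use buf ... */

	/* Not added by the semantic patch: alloca() memory was released
	 * automatically on return, but malloc() memory must be freed. */
	free(buf);
	return 0;
}
```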

Semantic patches can be far more complex. One of my favorite examples is the move of reference counting of the Scsi_Host structure out of drivers. Changing this required adding an argument to the function signature and removing a declaration and several other lines from each SCSI driver's proc_info function. The semantic patch, explained in detail in their OLS 2007 slides [PPT] [ODP], does all of this automatically. I recommend reading and re-reading this example until it sinks in.

Experience

My first experience with Coccinelle was mixed. In theory, Coccinelle does exactly what I want — automate complex changes to code — but in practice the implementation is beta quality. I successfully used Coccinelle to make hundreds of lines of changes with less than a hundred lines of semantic patches, but only after working directly with the developers to get bug fixes and help figuring out SmPL features. Coccinelle is one of those schizophrenic projects situated on the boundary between academic research and practical software development.

One of the first hurdles I had to overcome was teaching Coccinelle about the macros in my code. Coccinelle has to do all its own parsing and preprocessing; you can't just run the input C code through cpp, because then you'd have to map the preprocessor output back to the original code. Macros will sometimes confuse it enough that it gives up parsing a function until it reaches the next safe grammatical starting point (e.g., the next function), which may mean that it doesn't process most of the file. To get around this, you can create a list of macros and feed them to spatch with the -macro_file option. (Yes, that's one dash; one of my pet peeves about Coccinelle is its non-standard command-line option style.) For example, here are a few lines from the macro file I used for e2fsprogs:

    #define EXT2FS_ATTR(a)
    #define _INLINE_ inline
    #define ATTR(a)

You can build the list of macros by hand, but spatch has a feature that helps find them automatically. The -parse_c option makes spatch list the top ten parsing errors, which will include the macro name. For example, here is some of the output from running spatch -parse_c on e2fsprogs:

    EXT2FS_ATTR: present in 85 parsing errors
    example:

          static int check_and_change_inodes(ext2_ino_t dir,
                                      int entry EXT2FS_ATTR((unused)),
                                      struct ext2_dir_entry *dirent, int
                                      offset,
                                      int  blocksize EXT2FS_ATTR((unused)),
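Why does a macro like EXT2FS_ATTR trip up the parser? Attribute-wrapping macros of this kind are commonly defined along the lines of the sketch below; we assume e2fsprogs' definition is similar, and check_entry() is a hypothetical function modeled on the error output above.

```c
/* Assumed definition, typical of attribute-wrapping macros: with GCC it
 * expands to __attribute__, elsewhere to nothing. */
#ifdef __GNUC__
#define EXT2FS_ATTR(x) __attribute__(x)
#else
#define EXT2FS_ATTR(x)
#endif

/* Without knowing the macro, a parser sees an extra identifier and a
 * parenthesized list after the parameter name, which is not valid C.
 * The macro-file line "#define EXT2FS_ATTR(a)" tells spatch to treat
 * the macro as expanding to nothing, so the declaration parses. */
static int check_entry(int dir, int entry EXT2FS_ATTR((unused)))
{
	return dir;
}
```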

Coccinelle has improved significantly in the past few weeks. The 0.1.2 release had a number of bugs that made spatch unusable for me. The next release, 0.1.3, fixed those bugs and with it I was able to make practical, real-world patches. The 0.1.4 release will be out shortly. The developers wrote and released more documentation, including a description of all the command-line options [PDF] and a grammar for SmPL. Many more example spatch scripts are available now. The best reference for learning Coccinelle continues to be the slides from their 2007 OLS tutorial and the associated paper [PDF]. White space handling is improving; originally Coccinelle didn't care much about white space and frequently mangled transformations involving it, which is a problem if you want to take the hand out of hand-editing. One of my semantic patches left a dangling semi-colon in the middle; the developers sent me a patch to fix it within a few days.

One thing I am absolutely certain of: learning Coccinelle and writing semantic patches was way more fun than making the changes by hand or using regular expressions. I also had much greater confidence that my changes were correct; it is remarkably pleasant to make several hundred lines of changes and have the result compile cleanly and pass the regression tests the first time.

Related work

If you really want to, you can do everything Coccinelle can do by writing your own scripts — after all, code is code. But you have to deal with all the little corner cases — e.g., to C, white space is all the same, generally speaking, but regular expressions care intensely about the difference between a space, a newline, and a tab. Use the right tool for the job — if you're just replacing a variable name and your first script works, great. If you're changing a calling convention or moving the allocation and freeing of an object to another context, give a tool like Coccinelle a try.

In terms of power and flexibility, Coccinelle is similar to the Stanford compiler checker [PDF] (commercialized by Coverity). While the compiler checker is far more mature and has better flow analysis and parsing, Coccinelle can generate code to fix the bugs it finds. Most importantly, Coccinelle is open source, so developers can find and fix bugs themselves.

Some IDEs include tools to automatically refactor code, which is one aspect of what Coccinelle does. I have never personally used one of these IDE refactoring tools and can't compare them with Coccinelle, but my friends who have report that their stability leaves something to be desired. Xrefactory is a refactoring tool available on *NIX platforms which is fully integrated with Emacs and XEmacs. It is not open source and requires the purchase of a license, although one version is available for use free of charge.

Conclusion

Coccinelle is an open source tool that can analyze and transform C code according to specified rules, or semantic patches. Semantic patches are much more powerful than patches or regular expressions. The tool is beta quality right now but usable for practical tasks and the developers are very responsive. It's worth learning for any developer making a non-trivial interface change.

Index entries for this article
Kernel	Development tools/Coccinelle
GuestArticles	Aurora (Henson), Valerie


Semantic patching with Coccinelle

Posted Jan 20, 2009 22:34 UTC (Tue) by biehl (subscriber, #14636)

I wonder if someone has ever compared

Dehydra ( https://developer.mozilla.org/en/Dehydra )
GTK-rewriter ( http://people.imendio.com/richard/gtk-rewriter/ )

and maybe other tools ( http://blog.mozilla.com/tglek/2008/09/02/converging-elsa-... ) to Coccinelle?

Semantic patching with Coccinelle

Posted Jan 21, 2009 1:30 UTC (Wed) by padator (guest, #56235)

From the web pages of Dehydra and gtk-rewriter it seems Dehydra
does not support any transformation (just analysis, and you have
to write JavaScript code apparently to match over C code), and gtk-rewriter
has a few transformations hard-coded in a file. The goal of coccinelle
is to make it easy to specify code patterns and transformations. You
use a syntax you already know for that: C (not JavaScript on ASTs), and
the patch syntax.

For Elsa I cannot speak; it's not clear what their goal is
and what they can do.

Semantic patching with Coccinelle

Posted Jan 20, 2009 22:41 UTC (Tue) by Thue (guest, #14277)

it is written in OCaml, so the potential developer base is somewhat limited

Every programmer should learn to write ML or another functional language (OCaml is an object-oriented version of ML). It is the first language taught at the Department of Computer Science at the University of Copenhagen, so there are at least some people who know it.

For some kinds of programs ML code is 1/3 the size of an equivalent imperative program, as well as more readable and easier to verify for correctness. ML has excellent compile-time checking; if your ML program compiles it will usually also run correctly.

Many of the computer science students who learn ML keep it as their favorite language. In my experience it is especially the best students who 'get it' and like ML.

Semantic patching with Coccinelle

Posted Jan 21, 2009 1:33 UTC (Wed) by padator (guest, #56235)

Argh, please no programming language flame war :)

Semantic patching with Coccinelle

Posted Jan 21, 2009 18:45 UTC (Wed) by felixfix (subscriber, #242)

I agree -- Valerie's statement is fact, not a wish or flaming. There simply aren't as many people who will modify the source as would be if it were in, say, Perl or Python or C itself.

All comments to the contrary are wishful thinking (If I had some ham, I could have ham and eggs, if I had some eggs) and begging for a language war.

Semantic patching with Coccinelle

Posted Jan 21, 2009 19:45 UTC (Wed) by rwmj (guest, #5474)

Learn something new ...

Semantic patching with Coccinelle

Posted Jan 21, 2009 20:32 UTC (Wed) by felixfix (subscriber, #242)

... and miss the point completely.

You remind me of people who criticize where I go on vacation and what I do. "You could have gone to xxx and done yyy." Yeh, well, no matter where I go and what I do, I could have gone somewhere else and done something else.

Time and resources are limited. Some people would rather get on with the doing rather than learn new ways to not do things they don't have time for because they spend all their time learning new ways they won't use.

Semantic patching with Coccinelle

Posted Jan 21, 2009 22:23 UTC (Wed) by rwmj (guest, #5474)

Yeah, you're right. We should never try anything new or do anything differently from how it's always been done.

Semantic patching with Coccinelle

Posted Jan 21, 2009 23:49 UTC (Wed) by felixfix (subscriber, #242)

You just don't get it. There are more people who already know common languages than who already know OCaml, and only a fraction of those people who don't know OCaml are going to have the time and resources to learn it.

It's just a simple fact. It has nothing to do with the benefits of education, of new and improved ways of doing things. NOTHING.

Quit taking it personally. It has zero to do with you personally, your personal taste in languages or living styles, what would be best or ideal or anything. It is a simple fact of counting heads. More people know languages other than OCaml and could contribute in those languages.

Separate raw data from your personal wishes and dreams. Valerie wrote a fact. Dispute the fact if you want, but don't ramble on about learning and better ways and so on, those are not facts.

Semantic patching with Coccinelle

Posted Jan 23, 2009 1:37 UTC (Fri) by giraffedata (guest, #1954)

Looks to me like you're putting words in rwmj's mouth and then disagreeing with them. I don't see that rwmj has taken issue with Valerie's conclusions about the language choice.

Not every comment is a contradiction of the parent, and I wouldn't assume that "learn something new" was meant to say, "there's nothing unfortunate about the fact that this code is in OCaml."

Of course, I'm not really sure how "learn something new" does fit into the thread. The posts after it follow more obviously: you point out that learning something new isn't always the right thing and rwmj misreads that as learning something new is never the right thing and disagrees. While that position (learning something new is sometimes good) is obviously right, you respond as if he were arguing -- still -- that there's nothing unfortunate about the fact that this code is in OCaml.

Semantic patching with Coccinelle

Posted Jan 23, 2009 5:41 UTC (Fri) by rwmj (guest, #5474)

I was just being sarcastic in that second posting. OCaml and Haskell are something new. They're not just exotic scripting languages - in the way that Ruby is just Perl with a different syntax. They are something considerably more powerful and expressive that can take programming in new directions. Unfortunately explaining this is a bit like the Paul Graham explaining LISP to "Blub" programmers.

Semantic patching with Coccinelle

Posted Jan 23, 2009 8:09 UTC (Fri) by hppnq (guest, #14462)

Unfortunately explaining this is a bit like the Paul Graham explaining LISP to "Blub" programmers.

It's hard to find explanations that do a worse job of introducing people to Lisp. There's a good explanation of Haskell (PDF), including a bit of history, design choices and an overview of the functional programming paradigm.

Haskell, by the way, is roughly of the same age as Python, but is expected to become the next great programming language Any Moment Now. Or maybe not.

Semantic patching with Coccinelle

Posted Jan 22, 2009 6:49 UTC (Thu) by njs (subscriber, #40338)

No-one's saying that people *shouldn't* learn OCaml, or even that they personally don't wish to learn OCaml, just that in practice most of the people in the programming world perceive OCaml as exotic and do not learn it -- for whatever reason.

Writing in a niche language *can* have the opposite effect on finding contributors, though. Darcs for instance benefited quite a bit from being written in Haskell, because there were many people who had learned the language out of interest and really wanted to work on something in Haskell, but not many real-world projects to go around. Its competitors were written in better known languages, but their potential contributor base was correspondingly diluted by all the other projects also written in those languages...

This effect does exist but it's short-lived...

Posted Jan 22, 2009 10:29 UTC (Thu) by khim (subscriber, #9252)

Yes, darcs benefited for a time from the fact that it could attract all these people, but the end result was the same: when the C crowd got its shiny new bauble (Git) all other projects were left in the dust...

Sometimes it's a good idea to use a non-mainstream language because it's the only way to produce something and you don't need many contributors: one of the most popular DFT libraries (FFTW) is written in OCaml (well, kinda). But it does limit the number of potential contributors! No way to avoid this...

Semantic patching with Coccinelle

Posted Jan 21, 2009 10:48 UTC (Wed) by Yorick (guest, #19241)

First, many thanks to Valerie Henson for an excellent article and for reminding us of the existence of Coccinelle which appears to be a fine tool.

But I must agree with Thue. Statements on the form X is written in Y, so the potential developer base is somewhat limited where Y is a language not well-known by the speaker are misleading. Any competent and motivated programmer will quickly learn a language such as ML, Scheme or Haskell in order to contribute to a project.

We are not talking about exoticisms like Befunge or Brainfuck here but standard, well-known, well-documented and widely-taught languages. A notation well suited to the task makes the task easier; for nontrivial applications, the complexity lies in the problem domain. Anyone who has worked with GCC will attest that the fact that it is (mostly) written in C does not make it easier to understand.

Semantic patching with Coccinelle

Posted Jan 21, 2009 12:35 UTC (Wed) by hppnq (guest, #14462)

Statements on the form X is written in Y, so the potential developer base is somewhat limited where Y is a language not well-known by the speaker are misleading. Any competent and motivated programmer will quickly learn a language such as ML, Scheme or Haskell in order to contribute to a project.

This is not the common practice, of course, if only because there are a lot more incompetent programmers than programmers who quickly learn Haskell. Whether this is true for any language Y and any project X is an interesting question.

But of the statements S in publication P, I would say yours is more sweeping than Valerie's. ;-)

Semantic patching with Coccinelle

Posted Jan 22, 2009 4:11 UTC (Thu) by ncm (guest, #165)

Is OCaml standard now? Last I heard it was defined by its current implementation. Wikipedia seems to suggest that remains true.

A formal definition and multiple implementations help to reassure coders that time spent learning the language and writing reams of code in it won't end up wasted when, e.g., developers of the sole implementation lose interest and leave it orphaned. (NB: I am not saying I expect this to happen to OCaml.) It is precisely this quality, and nothing about the details of the language design, that make apt Ms. Henson's remark about Coccinelle's potential developer base.

Semantic patching with Coccinelle

Posted Jan 22, 2009 6:10 UTC (Thu) by shimei (guest, #54776)

I'm not going to take a side on whether OCaml limits the development base or not, but I don't think it'd be the lack of standardization stopping it in any case. Look at languages like python and ruby (or, *gasp*, PHP) that are quite popular and practical. None of those are standardized in any meaningful way, but they're doing just fine. Then look at a language like Haskell that has standardization and multiple high-quality implementations, but still sits on the sidelines of the software industry.

Semantic patching with Coccinelle

Posted Jan 22, 2009 5:43 UTC (Thu) by i3839 (guest, #31386)

You're right for (semi) long-term contributors. But for people who just want to make a quick fix or other small contribution, a language that is strange to them is a sort of hurdle. Perhaps big enough to miss quite a few long-term contributors who normally would have started with something small.

That said, how the main developers receive outside contributions has a bigger impact than what language is used...

Semantic patching with Coccinelle

Posted Jan 22, 2009 10:37 UTC (Thu) by Yorick (guest, #19241)

Thank you, that is a valid objection; small contributions are likely to be inhibited by the use of an unfamiliar language. But not all of them; trivial typo fixes, translations, ports, build and configuration changes etc do not require much understanding of the language. Nor do testing and reporting bugs, perhaps the most important class of small contributions.

But what I really wanted to challenge is the sad prevailing idea that some languages are "common" and the rest "strange". The statement in the article could be interpreted that way, although I am confident that Ms Henson does not suffer from that delusion herself. It is not helpful in making "uncommon" languages less so, even when this has great merit.

(Also, some "helpful" contributions that you receive as a maintainer of a free software package makes you wonder if the language-as-barrier is such a bad idea...)

Semantic patching with Coccinelle

Posted Jan 22, 2009 13:43 UTC (Thu) by hppnq (guest, #14462)

But what I really wanted to challenge is the sad prevailing idea that some languages are "common" and the rest "strange".

To prove your point, maybe you should write patches for Coccinelle so that it can produce semantic patches for OCaml?

Semantic patching with Coccinelle

Posted Jan 22, 2009 16:18 UTC (Thu) by rwmj (guest, #5474)

To prove your point, maybe you should write patches for Coccinelle so that it can produce semantic patches for OCaml?

OCaml actually supports the principle of semantic patching natively. You can perform almost arbitrary transformations of the abstract syntax tree at compile time, and this feature is used to implement interesting new features like Erlang-style bitstrings, type-safe access to databases, type-safe regular expressions, and much more.

Of course this is "strange" to many. (LISP programmers might recognise them as a very much more powerful version of LISP macros). But this is just one of the several ways that OCaml (and Haskell) are far beyond common programming languages.

Rich.

Semantic patching with Coccinelle

Posted Jan 22, 2009 16:28 UTC (Thu) by padator (guest, #56235)

> OCaml actually supports the principle of semantic patching natively.

This is not true. What you are talking about is different and is called
meta-programming. The need to refactor code is different. Even in OCaml
you often need to refactor code and there is no tool right now for OCaml
that does that. In fact we, in the coccinelle project, had in the past internally needed to refactor the coccinelle code and it was painful.

So I guess the comment of the other guy was right on the point; we decided to do
a semantic patching tool for C rather than a semantic patching tool for OCaml because there are more people writing C code :)

Semantic patching with Coccinelle

Posted Jan 23, 2009 0:42 UTC (Fri) by nix (subscriber, #2304)

Obviously, for symmetry, the thing to do is to write a semantic patching
tool (called, perhaps, ocamelle), in C, which carries out such
transformations on OCaml code. ;}

Semantic patching with Coccinelle

Posted Jan 23, 2009 5:38 UTC (Fri) by rwmj (guest, #5474)

I didn't mean that semantic patching was used in the same way as metaprogramming, but they are certainly analogous to each other. In one case, the transformed code is applied as a patch back on the source. In the other case, the transformed code is immediately passed to the compiler.

Anyhow .. for OCaml refactoring, Jane Street sponsored this project last summer. It's also something that Eclipse + the OCaml Eclipse plugin claims to do. I have not used either.

Semantic patching with Coccinelle

Posted Jan 22, 2009 6:36 UTC (Thu) by dirtyepic (guest, #30178)

Statements on the form X is written in Y, so the potential developer base is somewhat limited where Y is a language not well-known by the speaker are misleading. Any competent and motivated programmer will quickly learn a language such as ML, Scheme or Haskell in order to contribute to a project.

The point is that the total number of people who know ML or will learn it in the near future is less than the total number of people who know or will know C/python/etc, just as a book written in Ukrainian has a smaller potential audience than one written in English. You don't have to know Ukrainian to make that observation, just how to count.

Semantic patching with Coccinelle

Posted Jan 22, 2009 6:58 UTC (Thu) by padator (guest, #56235)

Sure, so let's all start writing chinese code.

Semantic patching with Coccinelle

Posted Jan 22, 2009 17:40 UTC (Thu) by dirtyepic (guest, #30178)

nah, let's complain about them not learning english instead. obviously if they were competent and motivated they would.

Semantic patching with Coccinelle

Posted Jan 22, 2009 12:21 UTC (Thu) by mjg59 (subscriber, #23239)

Any competent and motivated programmer can quickly learn a language such as ML, Scheme or Haskell in order to contribute to a project. However, pretty much every competent and motivated programmer I know would rather be necking Pimm's from the bottle than learning yet another language in order to satisfy somebody's academic preferences.

Semantic patching with Coccinelle

Posted Jan 23, 2009 0:40 UTC (Fri) by lysse (guest, #3190)

...and you, sir, win this thread. ;)

Semantic patching with Coccinelle

Posted Jan 21, 2009 2:23 UTC (Wed) by zooko (guest, #2589)

This was an interesting and informative article. Thanks, Valerie Henson.

Semantic patching with Coccinelle

Posted Jan 21, 2009 10:05 UTC (Wed) by wingo (guest, #26929)

Great article, Val, as always.

I also like turning tedium into tools problems -- it probably takes the same amount of time but writing tools is much more fun.

Semantic patching with Coccinelle

Posted Jan 21, 2009 18:46 UTC (Wed) by ndk (subscriber, #43509)

... or as somebody (who?) said:

"I'd rather write programs that write programs, than write programs."

Semantic patching with Coccinelle

Posted Jan 22, 2009 1:30 UTC (Thu) by sitaram (guest, #5959)

with absolutely no way to prove it, I think it was Larry Wall.

If it wasn't, it should be :-) Sounds like him...!

Semantic patching with Coccinelle

Posted Jan 22, 2009 2:32 UTC (Thu) by JoeBuck (guest, #2330)

It appears that Richard Sites, who was a student of Donald Knuth, said it first (or at least before Larry Wall did), sometime in the 1970s.

Semantic patching with Coccinelle

Posted Jan 22, 2009 5:46 UTC (Thu) by Mithrandir (guest, #3031)

Didn't someone say something like "the definition of a geek is someone who would rather write a tool to do a repetitive task than do the task itself, even if it takes longer to write the tool"? Might have been ESR.

Semantic patching with Coccinelle

Posted Jan 22, 2009 8:35 UTC (Thu) by nix (subscriber, #2304)

Douglas Adams spent several pages rhapsodizing about writing tools rather than doing the actual job, even when the job was a one-off, in _Last Chance to See_. Given that this was a book about conservation, this was a somewhat strange choice :)

Semantic patching with Coccinelle

Posted Jan 23, 2009 0:47 UTC (Fri) by lysse (guest, #3190)

Ah, yes, but Douglas Adams was a world-class procrastinator, and what else is writing a tool but the most productive form of procrastination? As for conservation... well, I suppose you could draw a long and tortured analogy about how conservation is a bit like that kind of job, and the temptation is to get all of the conditions exactly right for a creature's continued survival, and how even if it dies off in the meantime, before its habitat is properly ready, you still get that sense of achievement from having done a little something to save the world anyway... or something...

Semantic patching with Coccinelle in Fedora

Posted Jan 21, 2009 22:22 UTC (Wed) by rwmj (guest, #5474)

For anyone interested in trying this program on Fedora, there is a package here:
https://bugzilla.redhat.com/show_bug.cgi?id=481034

Semantic patching with Coccinelle in Fedora

Posted Jan 22, 2009 20:56 UTC (Thu) by vaurora (guest, #38407)

Awesome! Thanks! I wonder how long until the Debian package?

Semantic patching with Coccinelle in Fedora

Posted Jan 22, 2009 21:23 UTC (Thu) by eugeniy (subscriber, #24280)

Working on that already: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512658

Semantic patching with Coccinelle in Fedora

Posted Jan 23, 2009 3:49 UTC (Fri) by vaurora (guest, #38407)

Sweet! Thanks!

coccinelle problems

Posted Jan 22, 2009 7:57 UTC (Thu) by Octavian (guest, #7462)

I can confirm the statement that Coccinelle is currently of beta quality, as it still fails on some C constructs, e.g. those involving named (designated) initializers. The following patchlet always fails because of the union inside:

- struct foo my_foo[] = {
-     .a = 1,
-     .u.b = 42,
- }
+ FOO(1, 42)

However, after contacting Julia Lawall, I can say that she was always very helpful in fixing the issues I found.

coccinelle problems

Posted Jan 22, 2009 12:42 UTC (Thu) by lawall (guest, #56234)

Thanks for the bug report. I will take a look at it. Indeed, structure declarations of various sorts are not yet as well supported as they should be.

julia

coccinelle problems

Posted Jan 30, 2009 9:09 UTC (Fri) by lawall (guest, #56234)

> Indeed, structure declarations of various sorts are not yet as well supported as they should be.

Note that this refers to the support for structures in SmPL, i.e., the language in which semantic patches are written. The C parser supports all kinds of structures with no problem, and indeed all of C, with various extensions as used in Linux code.

julia

coccinelle problems

Posted Feb 5, 2009 8:46 UTC (Thu) by lawall (guest, #56234)

This problem has been fixed in the new release (0.1.5, Feb 5, 2009). Thanks for the bug report.

julia

Wrong Section?!

Posted Jan 22, 2009 12:07 UTC (Thu) by nikanth (guest, #50093)

I wonder why this article was published under the Kernel section?!

Wrong Section?!

Posted Jan 22, 2009 19:34 UTC (Thu) by egoforth (subscriber, #2351)

> I wonder why this article was published under the Kernel section?!

I would posit that it's because this is the Kernel development section, and there has already been a measured impact.

The original goal of Coccinelle was to automate as much as possible the task of keeping device drivers up to date with the latest kernel interfaces. But the end result can do far more than that, including finding and fixing bugs and coding style irregularities. Over 180 patches created using Coccinelle have been accepted into the Linux kernel to date.

Semantic patching with Coccinelle

Posted Jan 22, 2009 13:26 UTC (Thu) by johill (subscriber, #25196)

Thanks for the article. I anticipate the .4 release, and this is the first I've heard of it :) One thing I've been battling with recently, which unfortunately it doesn't support, is modifying printf-style formats. I'd love to write something like

@@
string A, B;
expression M, MBUF;
@@
-printk(... "%s" ..., ..., print_mac(MBUF, M), ...);
+printk(... "%pM" ..., ..., M, ...);

but obviously parameter matching is quite a hard task and definitely needs different syntax than what I just invented on the spot. But even without that, I've used it a few times already, if only to detect problems, e.g. http://thread.gmane.org/gmane.linux.kernel.wireless.general/26371 (though that case required filtering manually for the correct places)

Semantic patching with Coccinelle

Posted Jan 22, 2009 13:27 UTC (Thu) by johill (subscriber, #25196)

Well, make links clicky: http://thread.gmane.org/gmane.linux.kernel.wireless.gener...

Semantic patching with Coccinelle

Posted Jan 28, 2009 14:35 UTC (Wed) by dmk (guest, #50141)

whoa! i can do double-click;middle-click on non-clicky links... *click**click* *click*
(i feel so proud!)
...to wander further off topic..

Semantic patching with Coccinelle

Posted Jan 26, 2009 10:19 UTC (Mon) by jmmc (guest, #34939)

Just wanted to second (third, fourth, etc.) others' comments: great article, Val, and to Jon, once again a big thanks (again!) for organizing and including such great content in LWN!

Related work on C/C++ refactoring

Posted Jan 29, 2009 8:29 UTC (Thu) by tglek (guest, #56374)

I spent a couple of years writing tools to rewrite C/C++ code for Mozilla. I'm glad to finally see related work discussed on LWN, but I'm a little disappointed that Pork (my refactoring toolchain) is not famous enough to be listed as an alternative.

I wrote a blog post about it:
http://blog.mozilla.com/tglek/2009/01/29/semantic-rewriti...

As an interesting tidbit: since Pork itself is largely written in C++, I actually used it to refactor itself; I briefly ranted about it in http://blog.mozilla.com/tglek/2008/07/25/pull-pork-with-c...

With regard to the C preprocessor issues mentioned, I worked with the author of MCPP (http://sourceforge.net/projects/mcpp/) to enable MCPP to produce a special form of preprocessed files that makes dealing with macros much simpler. See the -K option in MCPP.

Related work on C/C++ refactoring

Posted Jan 30, 2009 7:03 UTC (Fri) by padator (guest, #56235)

How long does it take to express the two program transformations mentioned in Val's article using Pork? With Coccinelle, it takes 7 and 9 lines of SmPL respectively, and it can be applied to the whole kernel, including code in arch/, code protected by ifdef, etc.

Regarding CPP, how does MCPP handle ifdefs? How much of the Linux kernel can you analyze? How many lines does MCPP skip to make your parsing job easier? How do you handle iterator macros? In the case of Coccinelle, we sometimes need to express transformations on macros, iterators, and declarers, so we must try not to expand macros and instead represent them directly in the internal AST.

Related work on C/C++ refactoring

Posted Jan 30, 2009 19:24 UTC (Fri) by tglek (guest, #56374)

Not to make this a contest, but some refactorings require writing a C++ tool; others already have such tools.

So for the rename, it'd be a matter of running the following on the files of interest:

./renamer ::alloca ::malloc

For the other change, who knows: it could be a few lines of C++ or a few thousand, depending on the complexity of the attempted change (Pork's changes can be as complex as the user desires, since it works on a fully elaborated AST).

Related work on C/C++ refactoring

Posted Jan 30, 2009 19:42 UTC (Fri) by tglek (guest, #56374)

I should also mention that Pork does not require the user to teach it about the various preprocessing constructs used. I know Mozilla code features some phenomenally bad macro expansions (e.g., defining whole method bodies within macros, building function bodies out of various macros, or using macro concatenation to call functions), most of which would be extremely unpleasant to explain to any parser.

Instead, Pork can detect code where macros interfere with a particular refactoring, so it can produce an error message and inform the user that some manual help is needed for that particular bad macro. Usually that turns out to be a trivial amount of effort (in the worst case, it took a couple of days to compensate for it).

Related work on C/C++ refactoring

Posted Jan 30, 2009 19:50 UTC (Fri) by padator (guest, #56235)

And will your renamer tool also rename code inside ifdefs? If I run it on the Linux kernel, will it also rename functions in arch/alpha/... or code protected by #ifdef DEBUG?

Coccinelle also has a fully elaborated AST internally, and one can write arbitrary transformations in OCaml that work on this AST, but this is precisely what we want to avoid with Coccinelle. We don't want users to express their transformations on the AST, but instead to express them easily using our SmPL patch syntax.

Also, you didn't really answer my question: how long would it take, using Pork, to express the second program transformation mentioned in Val's article, the one that changes the allocation to malloc and also adds the NULL pointer check?

Regular Expressions Plus?

Posted Feb 1, 2009 1:52 UTC (Sun) by ldo (guest, #40946)

Instead of a bunch of language-specific tools, how about a general tool that can do search/replace on syntactic constructs? Sort of the next step beyond regular expressions?

Is this a job for a packrat parser?

Regular Expressions Plus?

Posted Feb 2, 2009 0:30 UTC (Mon) by padator (guest, #56235)

I guess the next step beyond regular expressions is grammars, and generic tools that take grammars as parameters take time to implement and are not always useful. You have to know more than just the syntactic structure of a programming language to do something interesting. Emacs and Eclipse already know the grammars of many programming languages.

Moreover, Coccinelle is not just a search/replace of syntactic constructs. You have expression, function, and statement metavariables that let you match and move code, and you can specify constraints about the context of those entities. As Val said, Coccinelle "can make a particular change only in functions which are assigned to a function pointer in a particular type of array, say, the create member of struct inode_operations." You need a way to specify such a constraint. I don't really understand how a packrat parser would help with that...

Re: Regular Expressions Plus?

Posted Feb 2, 2009 1:34 UTC (Mon) by ldo (guest, #40946)

padator wrote:

Moreover, Coccinelle is not just a search/replace of syntactic constructs. You have expression, function, and statement metavariables allowing to match and move code and you can specify constraints about the context of those entities.

Yes, but I'm pretty sure that those constraints are all, in principle, expressible using well-known techniques like two-level grammars and attribute grammars.

I'm not saying it's a simple thing to do, but nevertheless it seems useful to have such prebuilt grammars for different languages, working with a common core of code, rather than writing different code for different languages.

Just a thought.

Semantic patching with Coccinelle

Posted Feb 3, 2009 13:56 UTC (Tue) by robbe (guest, #16131)

I was a bit disheartened to see the example code transformed into code that leaks (remember that memory reserved with alloca() is automatically freed when the function returns; malloc() does not have this property).

All the while, I was hoping to see a third run of the tool that added the missing free() call before every return statement. Is that possible with Coccinelle?

Semantic patching with Coccinelle

Posted Feb 3, 2009 21:27 UTC (Tue) by padator (guest, #56235)

Nice catch :)

Yes coccinelle can do that too.
Here is an example of a better semantic patch:

@@
expression E;
identifier ptr;
identifier func;
@@
func(...) {
  ...
- ptr = alloca(E);
+ ptr = malloc(E);
+ if (ptr == NULL)
+   return 1;
  ...
+ free(ptr);
  return ...;
}

Note that the Coccinelle engine takes care of adding the call to free() on every control-flow path before a return. Here is an example of a patch produced by spatch on a simple C file:
./spatch -sp_file demos/lwn.cocci demos/lwn.c

--- demos/lwn.c 2009-02-03 15:10:38.000000000 -0600
+++ /tmp/cocci-output-22113-f80295-lwn.c 2009-02-03 15:15:05.000000000 -0600
@@ -3,12 +3,17 @@ void main(int argc, char *argv[])
char *buf;

/* allocate memory */
- buf = alloca(bytes);
+ buf = malloc(bytes);
+ if (buf == NULL)
+ return 1;

- if(argc == 0)
+ if(argc == 0) {
+ free(buf);
return 0;
+ }

+ free(buf);
return 1;
}

Note: see also how (beautifully) Coccinelle adds the necessary { } after the if to make it a compound statement. Coccinelle also puts in the correct indentation each time, even if the LWN HTML page does not show it, because of HTML whitespace mangling, I guess.

Coccinelle output

Posted Jul 10, 2015 14:10 UTC (Fri) by bou6 (guest, #103486)

I am a beginner with Coccinelle and am trying to run my first example.

I'm following the steps in this article:

1) I created the C file
2) I created the Coccinelle script
3) I ran it using

$ spatch -sp_file test.cocci test.c
In the terminal I got the expected result as mentioned in the article

--- test.c
+++ /tmp/cocci-output-17416-b5450d-test.c
@@ -7,7 +7,7 @@ main(int argc, char *argv[])
char *buf;

/* allocate memory */
- buf = alloca(bytes);
+ buf = malloc(bytes);

return 0;
}
However, the C file itself was not modified.

Can anybody tell me where I can get the changes made by the script?

Coccinelle output

Posted Oct 22, 2015 9:22 UTC (Thu) by mfrw (subscriber, #100251)

Add the --in-place option to spatch:
$ spatch --sp-file test.cocci --in-place test.c
By default, spatch just prints the resulting diff on standard output; with this option, it also modifies the file.