Free software's not-so-eXZellent adventure
The technical details of the attack are fascinating in many ways. See the companion article for more about that aspect of things.
For those needing a short overview: the XZ package performs data compression; it is widely distributed and used in many different places. Somebody known as "Jia Tan" managed to obtain maintainer-level access to this project and used that access to insert a cleverly concealed backdoor that, when the XZ library was loaded into an OpenSSH server process, would provide an attacker with the ability to run arbitrary code on the affected system. This code had found its way into some testing distributions and the openSUSE Tumbleweed rolling distribution before being discovered by Andres Freund. (See this page for a detailed timeline.)
The hostile code was quickly removed and, while it is too soon to be sure, it appears that it was caught before it could be exploited. Had that discovery not happened, the malicious code could have found its way onto vast numbers of systems. The consequences of a success at that level are hard to imagine and hard to overstate.
Social engineering
Like so many important projects, XZ was for years the responsibility of a single maintainer (Lasse Collin) who was keeping the code going on his own time. That led to a slow development pace at times, and patches sent by others often languished. That is, unfortunately, not an unusual situation in our community.
In May 2022, Collin was subjected to extensive criticism in this email thread (and others) for failing to respond quickly enough to patches. That, too, again unfortunately, is not uncommon in our community. Looking back now, though, the conversation takes on an even more sinister light; the accounts used to bully the maintainer are widely thought to have been sock puppets, created for this purpose and abandoned thereafter. In retrospect, the clear intent was to pressure Collin into accepting another maintainer into the project.
During this conversation, Collin mentioned (more than once) that he had been receiving off-list help from Tan, and that Tan might be given a bigger role in the future. That, of course, did happen, with devastating effects. Tan obtained the ability to push code into the repository, and subsequently abused that power to add the backdoor over an extended period of time. As well as adding the backdoor, Tan modified the posted security policy in an attempt to contain the disclosure of any vulnerabilities found in the code, changed the build system to silently disable the Landlock security module, redirected reports from the OSS-Fuzz effort, and more.
Once the malicious code became part of an XZ release, Tan took the campaign to distributors in an attempt to get them to ship the compromised versions as quickly as possible. There was also a series of patches submitted to the kernel that named Tan as a maintainer of the in-kernel XZ code. The patches otherwise look innocuous on their face, but do seem intended to encourage users to update to a malicious version of XZ more quickly. These patches made it as far as linux-next, but never landed in the mainline.
Much has been made of the fact that, by having an overworked and uncompensated maintainer, XZ was especially vulnerable to this type of attack. That may be true, and support for maintainers is a huge problem in general, but it is not the whole story here. Even paid and unstressed maintainers are happy to welcome help from outsiders. The ability to take contributions from — and give responsibility to — people we have never met from a distant part of the world is one of the strengths of our development model, after all. An attacker who is willing to play a long game has a good chance of reaching a position of trust in many important projects.
This whole episode is likely to make it harder for maintainers to trust helpful outsiders, even those they have worked with for years. To an extent, that may be necessary, but it is also counterproductive (we want our maintainers to get more help) and sad. Our community is built on trust, and that trust has proved to be warranted almost all of the time. If we cannot trust our collaborators, we will be less productive and have less fun.
Closing the door
As might be expected, the Internet is full of ideas about how this attack could have been prevented or detected. Some are more helpful than others.
There have been numerous comments about excess complexity in our systems. They have a point, but few people add complexity for no reason at all; features go into software because somebody needs them. This is also true of patches applied by distributors, which were a part of the complex web this attack was built on. Distributors, as a general rule, would rather carry fewer patches than more, and don't patch the software they ship without a reason. So, while both complexity and downstream patching should be examined and avoided when possible, they are a part of our world that is not going away.
In the end, the specific software components that were targeted in this attack are only so relevant. Had that vector not been available, the attacker would have chosen another. The simple truth is that there are many vectors to choose from.
There is certainly no lack of technical and process questions that should be asked with regard to this attack. What are the dependencies pulled in by critical software, do they make sense, and can they be reduced? Why do projects ship tarballs with contents that are not found in their source repository, and why do distributors build from those tarballs? How can we better incorporate the ongoing reproducible-builds work to catch subtle changes? Should testing infrastructure be somehow separated from code that is built for deployment? What are the best practices around the management of binary objects belonging to a project? Why are we still using ancient build systems that almost nobody understands? How can we get better review of the sorts of code that makes eyes glaze over? What is the proper embargo policy for a problem like this one?
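One of those questions, the mismatch between release tarballs and source repositories, lends itself to an automated check. Here is a minimal, self-contained sketch of the idea; every name in it (the toy repository, the tag, the extra m4 file) is hypothetical, and real release processes legitimately generate some files at release time, so a discrepancy is a prompt for review rather than proof of wrongdoing:

```shell
#!/bin/sh
# Hypothetical sketch: compare a release tarball's contents against the
# project's git tag, so that any tarball-only files stand out for review.
# All names here are illustrative; none come from the XZ project itself.
set -e
work=$(mktemp -d)

# Stand-in "upstream" repository with one tagged release.
git init -q "$work/project"
cd "$work/project"
echo 'int main(void) { return 0; }' > main.c
git add main.c
git -c user.name=demo -c user.email=demo@example.com commit -qm 'release 1.0'
git tag v1.0

# The tarball as shipped -- here with an extra build file that is NOT in
# the repository, the kind of discrepancy that deserves a closer look.
mkdir -p "$work/dist/tarball-1.0"
cp main.c "$work/dist/tarball-1.0/"
echo 'AC_INIT(extra)' > "$work/dist/tarball-1.0/build-extra.m4"

# Export the tagged tree and compare; diff reports any tarball-only files.
git archive --prefix=tag-1.0/ v1.0 | tar -x -C "$work/dist"
diff -r "$work/dist/tag-1.0" "$work/dist/tarball-1.0" || echo 'tarball differs from tag'
```

A distributor building from the verified tag export, rather than from the tarball, would sidestep tarball-only content entirely; where generated files are genuinely needed, the diff at least makes them visible.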
These are all useful questions that need to be discussed in depth; hopefully, some useful conclusions will come from them. But it is important to not get hung up on the details of this specific attack; the next one is likely to look different. And that next attack may well already be underway.
On the unhelpful side, the suggestion from the OpenSSF that part of the problem was the lack of an "OpenSSF Best Practices badge" is unlikely to do much to prevent the next attack. (Note that the organization seems to have figured that out; see the Wayback Machine for the original version of the post). The action by GitHub to block access to the XZ repository — after the horse had long since left the barn, moved to a new state, and raised a family — served only to inhibit the analysis of the problem while protecting nobody.
The source
This attack was carried out in a careful and patient fashion over the course of at least two years; it is not a case of a script kiddie getting lucky. Somebody — or a group of somebodies — took the time to identify a way to compromise a large number of systems, develop a complex and stealthy exploit, and carry out an extensive social-engineering campaign to get that exploit into the XZ repository and, from there, into shipping distributions. All of this was done while making a minimum of mistakes until near the end.
It seems clear that considerable resources were dedicated to this effort. Speculating on where those resources came from is exactly that — speculation. But to speculate that this may have been a state-supported effort does not seem to be going out too far on any sort of limb. There are, undoubtedly, many agencies that would have liked to obtain this kind of access to Linux systems. We may never know whether one of them was behind this attempt.
Another thing to keep in mind is that an attack of this sophistication is unlikely to limit itself to a single compromise vector in a single package. The chances are high that other tentacles, using different identities and different approaches, exist. So, while looking at what could have been done to detect and prevent this attack at an earlier stage is a valuable exercise, it also risks distracting us from attacks, future or already underway, that do not follow the same playbook.
Over the years, there have not been many attempts to insert a backdoor of this nature — at least, that we have detected — into core free-software projects. One possible explanation for that is that our software has been sufficiently porous that a suitably skilled attacker could always find an existing vulnerability to take advantage of without risking the sort of exposure that is now happening around the XZ attack. An intriguing (and possibly entirely wishful) thought is that the long and ongoing efforts to harden our systems, move to better languages, and generally handle security issues in a better way have made that avenue harder, at least some of the time, pushing some attackers into risky, multi-year backdoor attempts.
Finally
In the numerous discussions sparked by this attack, one can readily find two seemingly opposing points of view:
- The XZ episode shows the strength of the free-software community. Through our diligence and testing, we detected a sophisticated attack (probably) before it was able to achieve its objectives, analyzed it, and disabled it. (Example).
- The world was saved from a massive security disaster only by dint of incredible luck. That such an attack could get so far is a demonstration of the weakness of our community; given our longstanding sustainability problems, an attack like this was inevitable at some point. (Example).
In the end, both of those points of view can be valid at the same time. Yes, there was a massive bit of luck involved in the detection of this attack, and yes, the support for maintainers (and many other contributors) is not what it needs to be. But, to paraphrase Louis Pasteur, chance favors the prepared development community. We have skilled and curious developers who will look into anomalous behavior, and we have a software environment that facilitates that sort of investigation. We never have to accept that a given software system just behaves strangely; we have, instead, gone to great lengths to ensure that it is possible to dig in and figure out why that behavior is happening — and to fix it. That is, indeed, a major strength; that, along with luck and heroic work, is what saved us.
We must hope that we can improve our game enough that our strengths will save us the next time as well. There is a good chance that our community, made up of people who just want to get something done and most of whom are not security experts, has just been attacked by a powerful adversary with extensive capabilities and resources. We are not equipped for that kind of fight. But, then, neither is the proprietary world. Our community has muddled through a lot of challenges to get this far; we may yet muddle through this one as well.
| Index entries for this article | |
|---|---|
| Security | Backdoors |
| Security | Vulnerabilities/Social engineering |