A struct sockaddr sequel

By Jonathan Corbet
November 14, 2025
One of the many objectives of the Linux Kernel Self-Protection Project (KSPP), which just completed ten years of work, is to ensure that all array references can be bounds-checked, even in the case of flexible array members, the size of which is not known at compile time. One of the most challenging flexible array members in the kernel is not even declared as such. Almost exactly one year ago, LWN looked at the effort to increase safety around the networking subsystem's heavily used sockaddr structure. One year later, Kees Cook is still looking for a way to bring this work to a close.

In short, the problem is that struct sockaddr is traditionally defined as:

    struct sockaddr {
        short sa_family;
        char sa_data[14];
    };

The sa_data field was more than large enough to hold a network address in the early 1980s when this structure was first defined for BSD Unix, but it is not large enough now. As a result, a great deal of code, in both the kernel and user space, passes around struct sockaddr pointers that, in truth, point to different structures with more space for the addresses they need to hold. In other words, sa_data is being treated as a flexible array member, even though it is not declared as one. The prevalence of struct sockaddr has thrown a spanner into the works of many attempts to better check the uses of array members in structures.
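The pattern is easy to reproduce with the user-space headers. What follows is a minimal sketch rather than kernel code; the helper names (generic_data_size(), in6_data_size(), and example_copy_in6()) are invented for illustration. It shows an IPv6 address traveling behind a generic struct sockaddr pointer even though it does not fit in the 14-byte sa_data array:

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Bytes available for address data in the generic structure. */
static size_t generic_data_size(void)
{
    return sizeof(((struct sockaddr *)0)->sa_data);
}

/* Bytes that follow the family field in an IPv6 socket address. */
static size_t in6_data_size(void)
{
    return sizeof(struct sockaddr_in6) - offsetof(struct sockaddr, sa_data);
}

/*
 * Hypothetical helper showing the usual pattern: the caller passes a
 * generic pointer, and the callee checks the length and sa_family
 * before treating the object as the larger, family-specific structure.
 */
static int example_copy_in6(const struct sockaddr *sa, socklen_t len,
                            struct sockaddr_in6 *out)
{
    assert(sa != NULL && out != NULL);
    if (len < sizeof(*out) || sa->sa_family != AF_INET6)
        return -1;
    memcpy(out, sa, sizeof(*out));
    return 0;
}
```

The copy reads past the end of what the declared type says is there, which is exactly the kind of access that a bounds checker would flag if it trusted the declaration.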

At the end of last year's episode, much of the kernel had been changed to use struct sockaddr_storage (actually implemented as struct __kernel_sockaddr_storage), which has a data array large enough to hold any known network address. An attempt was made to change the definition of struct sockaddr to make its sa_data field into an explicit flexible array member, but that work ran into a minor snag. There are many places in the kernel where struct sockaddr is embedded within another structure. In most of these cases, sa_data is not treated as a flexible array member, so developers have freely embedded struct sockaddr anywhere within the containing structure, often not at the end.

If sa_data is redefined as a flexible array member, the compiler no longer knows how large the structure will actually be. That, in turn, means that the compiler does not know how to lay out a structure containing struct sockaddr, so it guesses and emits a warning. Or, in the case of a kernel build, tens of thousands of warnings. Kernel developers, as it turns out, would rather face the prospect of an array overflow than a warning flood of that magnitude, so this work came to a halt.

One possible solution would be to replace embedded struct sockaddr fields with struct sockaddr_storage, eliminating the flexible array member. But that would bloat the containing structures with memory that is not needed, so that approach is not popular either.
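The cost can be estimated from user space. This sketch uses two invented containing structures (struct example_req_small and struct example_req_large are not taken from the kernel) that differ only in the type of the embedded address; on a typical Linux system, struct sockaddr occupies 16 bytes and struct sockaddr_storage 128:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical containing structure embedding the classic type. */
struct example_req_small {
    int ifindex;
    struct sockaddr addr;
    int flags;
};

/* The same hypothetical structure with the embedded field swapped. */
struct example_req_large {
    int ifindex;
    struct sockaddr_storage addr;
    int flags;
};

/* Extra bytes paid by every instance after the swap. */
static size_t storage_overhead(void)
{
    assert(sizeof(struct example_req_large) >
           sizeof(struct example_req_small));
    return sizeof(struct example_req_large) -
           sizeof(struct example_req_small);
}
```

Every instance of the larger structure pays that overhead, whether or not the address it holds ever needs more than 14 bytes.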

Instead, Cook is working on a patch series that introduces yet another struct sockaddr variant:

    struct sockaddr_unsized {
        __kernel_sa_family_t sa_family;  /* address family, AF_xxx */
        char                 sa_data[];  /* flexible address data */
    };

Its purpose is to be used in internal network-subsystem interfaces where the size of sa_data needs to be flexible, but where its actual size is also known. For example, the bind() method in struct proto_ops is defined as:

    int (*bind) (struct socket *sock,
                 struct sockaddr *myaddr,
                 int sockaddr_len);

The type of myaddr can be changed to struct sockaddr_unsized * since sockaddr_len gives the real size of the sa_data array. Cook's patch series does many such replacements, eliminating the use of variably sized sockaddr structures in the networking subsystem. With that done, there are no more uses of struct sockaddr that read beyond the 14-byte sa_data array. As a result, struct sockaddr can be reverted to its classic, non-flexible definition, and array bounds checking can be applied to code using that structure.
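As a rough illustration of how a length-carrying interface keeps accesses in bounds, here is a user-space sketch. The structure mirrors the one in the patch series, with the family type simplified to unsigned short; everything else (EXAMPLE_AF, EXAMPLE_ADDR_LEN, and example_bind()) is invented for the example. It also assumes that, as with the bind() system call, the length covers the whole address, family field included:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* User-space mirror of the structure; field type simplified. */
struct sockaddr_unsized {
    unsigned short sa_family;
    char sa_data[];
};

#define EXAMPLE_AF       42  /* hypothetical address family */
#define EXAMPLE_ADDR_LEN 20  /* hypothetical payload, larger than 14 */

/*
 * Hypothetical bind()-style handler. addr_len is assumed to cover the
 * whole address, so it bounds every access to sa_data.
 */
static int example_bind(const struct sockaddr_unsized *addr, int addr_len,
                        char out[EXAMPLE_ADDR_LEN])
{
    size_t need = offsetof(struct sockaddr_unsized, sa_data) +
                  EXAMPLE_ADDR_LEN;

    assert(addr != NULL && out != NULL);
    if (addr_len < 0 || (size_t)addr_len < need)
        return -EINVAL;         /* too short for this family */
    if (addr->sa_family != EXAMPLE_AF)
        return -EAFNOSUPPORT;
    memcpy(out, addr->sa_data, EXAMPLE_ADDR_LEN);
    return 0;
}
```

The check is still written by hand here; nothing in the type itself ties addr_len to sa_data.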

That change is enough to make all of those warnings go away, so many would likely see it as a good stopping point. There is still, though, the matter of all those sockaddr_unsized structures, any of which might be the source of a catastrophic overflow at some point. So, once the dust settles from this work, we are likely to see some attention paid to implementing bounds checking for those structures. One possible approach mentioned in the patch set is to eventually add an sa_data_len field, so that the structure would contain the length of its sa_data array. That would make it easy to document the relationship between the fields with the counted_by() annotation, enabling the compiler to insert bounds checks.
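What such a structure might look like can be sketched in user space. The layout below is a guess (the patch set only mentions the idea of a length field), and struct example_sockaddr_counted, EXAMPLE_COUNTED_BY(), example_alloc(), and example_get() are all invented names. The attribute is applied only when the compiler reports support for it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Expand to nothing on compilers that lack the attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(counted_by)
#define EXAMPLE_COUNTED_BY(member) __attribute__((counted_by(member)))
#else
#define EXAMPLE_COUNTED_BY(member)
#endif

/* Hypothetical layout: the length lives next to the array it counts. */
struct example_sockaddr_counted {
    unsigned short sa_family;
    unsigned short sa_data_len;  /* number of bytes in sa_data */
    char sa_data[] EXAMPLE_COUNTED_BY(sa_data_len);
};

static struct example_sockaddr_counted *
example_alloc(unsigned short family, const void *data, unsigned short len)
{
    struct example_sockaddr_counted *sa;

    assert(data != NULL || len == 0);
    sa = malloc(sizeof(*sa) + len);
    if (!sa)
        return NULL;
    sa->sa_family = family;
    sa->sa_data_len = len;  /* set the count before touching sa_data */
    if (len)
        memcpy(sa->sa_data, data, len);
    return sa;
}

/* The manual form of the check that the annotation lets a compiler add. */
static int example_get(const struct example_sockaddr_counted *sa,
                       size_t i, char *out)
{
    if (i >= sa->sa_data_len)
        return -1;
    *out = sa->sa_data[i];
    return 0;
}
```

With the annotation in place, a compiler that supports it can relate sa_data to sa_data_len on its own when bounds-checking instrumentation is enabled, rather than relying on every caller to remember the check in example_get().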

While the ability to write new code in Rust holds promise for reducing the number of memory-safety bugs introduced into the kernel, the simple fact is that the kernel contains a huge amount of C code that will not be going away anytime soon. Anything that can be done to make that code safer is thus welcome. The many variations of struct sockaddr that have made the rounds may seem silly to some, but they are a part of the process of bringing a bit of safety to an API that was defined over 40 years ago. Ten years of KSPP have made the kernel safer, but the job is far from done.

Index entries for this article
Kernel: Flexible array members
Kernel: Networking/struct sockaddr
Kernel: Releases/6.19



sa_family

Posted Nov 14, 2025 18:56 UTC (Fri) by magfr (subscriber, #16052) [Link] (12 responses)

Does not the sa_family value define the size of sa_data, except for the AF_LOCAL case where the length is necessary?

sa_family

Posted Nov 14, 2025 19:41 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (10 responses)

Wouldn't you need a function call to determine that size though? If it was a straight array lookup, `__af_family_sizeof_sa_data[sa_family]` may itself read off the end of that array. A second bounds check on `sa_family` itself as well perhaps? Are the `AF_*` constant values dense enough to warrant it? It definitely seems like something beyond `__counted_by` at least.
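A user-space sketch of that lookup, with the extra bounds check on the family value, might look like the following. The table and function names are hypothetical, only two families are filled in, and zero is used to mean "unknown or not a fixed size":

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical table; entries not listed are zero-initialized. */
static const size_t example_sa_data_size[] = {
    [AF_INET]  = sizeof(struct sockaddr_in) -
                 offsetof(struct sockaddr, sa_data),
    [AF_INET6] = sizeof(struct sockaddr_in6) -
                 offsetof(struct sockaddr, sa_data),
};

#define EXAMPLE_TABLE_LEN \
    (sizeof(example_sa_data_size) / sizeof(example_sa_data_size[0]))

/* Check the index first, so the lookup cannot read past the table. */
static size_t example_sizeof_sa_data(unsigned int family)
{
    if (family >= EXAMPLE_TABLE_LEN)
        return 0;
    assert(family < EXAMPLE_TABLE_LEN);
    return example_sa_data_size[family];
}
```

A function like this is more than a counted_by-style annotation can express, since the size comes from a table lookup rather than from a field of the structure.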

In Rust, I feel one would have `SockAddr::new(family: AfFamily) -> SockAddr` and then make `sa_family` read-only from there on out. But C doesn't allow one to have only some public structure members (without tooling like https://www.youtube.com/watch?v=bYxn_0jupaI).

sa_family

Posted Nov 16, 2025 0:18 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (8 responses)

> In Rust, I feel one would have `SockAddr::new(family: AfFamily) -> SockAddr` and then make `sa_family` read-only from there on out.

This is an inelegant way of handling it. The more idiomatic representation would be an enum (the Rust equivalent of a tagged union), which automatically enforces that the discriminant (tag) is correct for whatever data you try to put inside of it. Then we don't need to bother with private members and checked constructors.

You would have a series of separate structs for each address family, each with their own layout, and the enum would be a thin wrapper over those. That way, for example, AF_INET can have a 32-bit address plus a separate field for the port number, AF_INET6 can be largely identical but with a bigger address, AF_UNIX can have a (heap-allocated or fixed-capacity) string for the path, and so on. If one of the address families wants to enforce some invariant at construction time (e.g. "AF_UNIX path must not contain embedded null bytes"), then you put that logic at the struct level, and the enum never has to think about it. This is possible because you always construct the inner struct first, and only then go to wrap it in the enum (so enforcing the struct's invariants is simply not the enum's problem).

Obviously this is not C-FFI compatible, but it's how you would write it from scratch in a hypothetical Rust-only universe. It is also not terribly difficult to convert between a scheme like the above and the C approach, so you might expect to find such a conversion layer in a serious Rust implementation of the sockets API.

sa_family

Posted Nov 16, 2025 22:07 UTC (Sun) by da4089 (subscriber, #1195) [Link] (3 responses)

I’m unfamiliar with Rust at this level of detail.

Is it possible for a loadable kernel module to extend that enum at runtime with a previously-unknown-to-the-kernel address family?

sa_family

Posted Nov 16, 2025 22:40 UTC (Sun) by randomguy3 (subscriber, #71063) [Link] (2 responses)

In C terms, a Rust enum is most similar to a union (plus a value to tell you which union variant is active). Imagine the potential issues with allowing new union variants to be added at runtime - most notably, how would you safely destroy such a value?

Of course (as with most things in software), this could be solved, but it would cost you (in performance, ergonomics, safety and/or another axis of flexibility).
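For readers who think in C, the union-plus-tag shape can be sketched as below. The types and names are invented for illustration and cover only two families. Unlike a Rust enum, nothing here stops code from reading the wrong union member:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The tag: which member of the union is currently valid. */
enum example_addr_kind { EXAMPLE_ADDR_V4, EXAMPLE_ADDR_V6 };

struct example_addr {
    enum example_addr_kind kind;
    union {
        struct { uint8_t addr[4];  uint16_t port; } v4;
        struct { uint8_t addr[16]; uint16_t port; } v6;
    } u;
};

/* Every consumer has to switch on the tag; the set of cases is fixed
 * when this code is compiled. */
static size_t example_addr_bytes(const struct example_addr *a)
{
    assert(a != NULL);
    switch (a->kind) {
    case EXAMPLE_ADDR_V4:
        return sizeof(a->u.v4.addr);
    case EXAMPLE_ADDR_V6:
        return sizeof(a->u.v6.addr);
    }
    return 0;  /* unknown tag */
}
```

A variant added at run time would have no case in any such switch, and the union would not have been sized with it in mind.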

sa_family

Posted Nov 17, 2025 8:29 UTC (Mon) by taladar (subscriber, #68407) [Link] (1 responses)

Destruction would be the least of the issues. Every location that uses it and matches on the variant and does something for each of them would be incomplete.

sa_family

Posted Nov 17, 2025 20:23 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

Simple code compatibility can be guarded with #[non_exhaustive], if you anticipate the problem when you define the enum.[1]

But the type system assumes that you will tell it about all of the variants at compile time. Layout optimizations are made under this assumption, so it cannot reasonably be unwound at runtime, even with #[non_exhaustive] (which is primarily meant as a backcompat lint).

If the set of possible types is unbounded at runtime, probably you need to use a dyn-compatible trait instead of an enum. Then we can define methods on addresses and dynamically dispatch to them (or statically dispatch if the type is known). But this has a number of downsides, including all the usual DST restrictions (see [2]), as well as various annoying restrictions on the method signatures.[3]

In contexts where it is acceptable to generate code handling each address type separately, the trait does not need to be dyn-compatible, because you can use ordinary generics for that (specializations will be monomorphized and compiled at the same time as the rest of the module). But that doesn't work for cases where you want a single data structure that directly or indirectly contains instances of mixed types (e.g. the file descriptor table).

It should also be emphasized that Rust is a systems programming language. If you don't like dyn Trait, you are allowed to manually re-implement your own version of it with unsafe code. There is no strict aliasing rule, so you can unsafely cast between different types as long as the object representations are valid at all of those types, and there's nothing stopping you from building a safe abstraction around that operation. This is probably not a great idea in most situations, but the language doesn't rule it out, and indeed it even provides primitives to help you do it correctly (e.g. std::any::TypeId). This might be appropriate in contexts where you only need a very small subset of dyn Trait functionality and don't want to live with all of its restrictions.

[1]: https://doc.rust-lang.org/reference/attributes/type_syste...
[2]: https://doc.rust-lang.org/reference/dynamically-sized-typ...
[3]: https://doc.rust-lang.org/reference/items/traits.html#dyn...

sa_family

Posted Nov 17, 2025 4:28 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (1 responses)

That works with a closed set of options. It may well be so in Linux. But the constructor pattern might be a better fit if modules can provide their own socket family types. There is also the C-FFI boundary compatibility to consider in Linux.

So, yes, I agree that an `enum` would be better in a vacuum, but we're not. I should have been clearer that it was a Rust spelling of a potential C API (because the syntax for this specific bit is nicer than C offers, IMO).

sa_family

Posted Nov 17, 2025 8:32 UTC (Mon) by taladar (subscriber, #68407) [Link]

Honestly, considering how rarely new address families pop up, an enum might still be the better choice. That would also have the advantage that you then have to check every location that does something with multiple enum variants to see if code for the new address family has to be added there.

sa_family

Posted Nov 18, 2025 8:35 UTC (Tue) by plugwash (subscriber, #29694) [Link] (1 responses)

Enums are great until you need to persist data to disk or pass it across a trust/version boundary.

Then they are not so great. First, the only stable ABI is "repr C", but that disables all layout optimisations. Second, even if you use "repr C", creating an invalid value is still undefined behaviour.

sa_family

Posted Nov 18, 2025 8:43 UTC (Tue) by taladar (subscriber, #68407) [Link]

Persisting data to disk or sending it around the network is no problem with enums. You just shouldn't do it by numeric discriminants; use string discriminant names for that, ideally in a self-describing format like JSON (or the various binary equivalents).

In fact what you should almost never do is persist some sort of numeric discriminant value the way most C programs do since that just leads to headaches in the future, not the least of which is that - without context - nobody has any idea which value means what. That can work fine if your enum is something like a day of the week or a month but not so much if it is something that is less well known or might change in the future.

Passing data across a trust or version boundary also doesn't get any easier if you take what is, in terms of the domain you are dealing with, basically a sum type and encode it in a less natural way, as most languages without sum types do.

sa_family

Posted Nov 17, 2025 6:06 UTC (Mon) by wahern (subscriber, #37304) [Link]

The 4.4BSD Sockets API included an .sa_len member and SA_LEN macro. The BSD derivatives, including macOS, still have it; Solaris and Linux never adopted it, understandably because it was (IME) confusing if and when the member was used, as opposed to the out-of-struct socklen_t parameters to bind(2), connect(2), etc. The .sa_data_len proposal restores this element, except it sizes the family-specific portion rather than the entire struct like .sa_len[1], and it's confined to the kernel, so there'll hopefully be less confusion about the source of truth for the object size.

[1] Analogous ISO C proposals for the counted_by attribute allow expressions for the size, similar to VLAs/VMAs, such that one could use something like, e.g., `counted_by(.sa_len - offsetof(struct sockaddr, sa_data))`.

sa_family

Posted Nov 17, 2025 19:55 UTC (Mon) by kees (subscriber, #27264) [Link]

Each SA_* is different. Some map to a fixed size, yes. But others map a variable size with a minimum bounds (either explicitly or implicitly).

Historical tidbit

Posted Dec 7, 2025 19:12 UTC (Sun) by lukeshu (guest, #105612) [Link]

> The sa_data field was more than large enough to hold a network address in the early 1980s when this structure was first defined for BSD Unix

That's not quite true! In the earliest versions there were two variants: the `struct sockaddr` that we know today, which passed the data in the struct when the data was small, and an "indirect" `struct sockaddri` variant that passed length+pointer when the data was big.

https://github.com/dspinellis/unix-history-repo/blob/3573...

(I assume there was a calling-convention reason that you couldn't pass more than 16 bytes as arguments.)


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds