
Lazy BDDs with eager literal intersections

In a previous article, we discussed how Elixir changed its set-theoretic types representation from Disjunctive Normal Forms (DNFs) to Lazy Binary Decision Diagrams (Lazy BDDs).

In a nutshell, DNFs allow us to represent unions, intersections, and negations as a flat data structure:

(c1 and not d1) or (c2 and not d2) or (c3 and not d3) or ...

This meant that any operation between complex types was immediately flattened. For example, intersections of unions, such as (foo or bar) and (baz or bat), had to be immediately flattened into the cartesian product (foo and baz) or (foo and bat) or (bar and baz) or (bar and bat). Even worse, unions of differences could lead to exponential expansion.

Elixir v1.19 then introduced BDDs with lazy unions (in short, lazy BDDs). They are trees which allow us to represent set-theoretic operations of any arbitrary depth, without flattening them, while also detecting duplicate types. A lazy BDD has type

type lazy_bdd() = :top or :bottom or
  {type(), constrained :: lazy_bdd(), uncertain :: lazy_bdd(), dual :: lazy_bdd()}

where type() is the representation of the actual type. For example, if the type being represented is a tuple, type() would be a list of all elements in the tuple. In the literature, type() is known as a literal.

Throughout this article, we will use the following notation to represent lazy BDDs:

B = {a, C, U, D}

where B stands for BDD, a is the literal, C is constrained, U is uncertain, and D is dual. Semantically, the BDD above is the same as:

B = (a and C) or U or (not a and D)

This means that the following expression, where foo, bar, baz, and bat represent types:

foo and not (bar or (baz and bat))

will be stored as:

{foo,
 {bar, :bottom, :bottom,
  {baz, :bottom,
   {bat, :bottom, :bottom, :top}, :top}},
 :bottom, :bottom}
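To make the encoding concrete, here is a small sketch, in Python purely for illustration (the actual implementation is in Elixir), that interprets a {literal, constrained, uncertain, dual} node as (a and C) or U or (not a and D) and checks the tree above against the original formula over every truth assignment of the four literals:

```python
from itertools import product

TOP, BOT = "top", "bottom"

def interp(bdd, env):
    """Evaluate a lazy BDD under an assignment of literals to booleans:
    {a, C, U, D} means (a and C) or U or (not a and D)."""
    if bdd == TOP:
        return True
    if bdd == BOT:
        return False
    a, c, u, d = bdd
    return (env[a] and interp(c, env)) or interp(u, env) \
        or (not env[a] and interp(d, env))

# The stored tree for: foo and not (bar or (baz and bat))
tree = ("foo",
        ("bar", BOT, BOT,
         ("baz", BOT,
          ("bat", BOT, BOT, TOP), TOP)),
        BOT, BOT)

names = ["foo", "bar", "baz", "bat"]
for bits in product([False, True], repeat=4):
    env = dict(zip(names, bits))
    expected = env["foo"] and not (env["bar"] or (env["baz"] and env["bat"]))
    assert interp(tree, env) == expected
```

The loop exhaustively confirms the tree and the formula denote the same predicate.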

The conversion to lazy BDDs and a few optimizations we added to their formulation in the literature allowed us to type check Elixir programs faster, despite Elixir v1.19 performing more type checks than v1.18!

However, when working on Elixir v1.20, which introduced type inference of all constructs, we noticed some of the downsides of lazy BDDs. This article explores these downsides and how we addressed them.

As with previous articles, we discuss implementation details of the type system. You don’t need to understand these internals to use the type system. Our goal is simply to document our progress and provide guidance for future maintainers and implementers. Let’s get started.

The trouble with laziness

As we described above, lazy BDDs allow us to represent set-theoretic operations at any depth. And while this is extremely useful for unions and differences, they come with a downside when it comes to intersections. For example, take this type:

(%Foo{} or %Bar{} or %Baz{} or %Bat{}) and %Bar{}

While we could store the above as a BDD, it is also clear that the above can only be equal to %Bar{}. In other words, if we resolve intersections eagerly, we will most likely reduce the tree size, speeding up all future operations! To do this, we need to compute the intersection between literal types (the first element of the BDD node), rather than intersections between BDDs. So we are naming these new optimizations eager literal intersections.

Eager literal intersections

Our goal is to apply intersections between literals as soon as possible as it helps us reduce the size of the tree whenever literal intersections are empty.

Take two BDDs:

B1 = {a1, C1, U1, D1}
B2 = {a2, C2, U2, D2}

And imagine there is a function that can compute the intersection between a1 and a2 (which is the intersection of literals, not BDDs). The optimization works as long as one of C1 or C2 is :top. In this case, let’s choose C2, so we have:

B1 = (a1 and C1) or U1 or (not a1 and D1)
B2 = a2 or U2 or (not a2 and D2)

The intersection of B1 and B2 can be computed as:

B1 and (a2 or U2 or (not a2 and D2))

Let’s distribute it:

(a2 and B1) or (U2 and B1) or ((not a2 and D2) and B1)

And expand the first B1:

(a2 and ((a1 and C1) or U1 or (not a1 and D1))) or
  (U2 and B1) or ((not a2 and D2) and B1)

And now let’s distribute a2 while reordering the arguments:

((a1 and a2) and C1) or (a2 and U1) or ((a2 and D1) and not a1) or
  (U2 and B1) or ((not a2 and D2) and B1)

In the first term of the union, we have a1 and a2 as a literal intersection. If a1 and a2 is empty, then the whole C1 node can be eliminated.

Then we proceed to recursively intersect a2 with every literal node in a2 and U1 and in a2 and D1. And, if those literal intersections are empty, those subtrees are eliminated too. This allows us to dramatically cut down the size of the tree! In our benchmarks, these optimizations allowed us to reduce type checking of a module from 10s to 25ms.
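As a sanity check, here is a toy Python sketch of this pruning. The names eager_and and union are ours, and literals are modeled as plain frozensets of atoms, so the literal intersection is set intersection and t and not a is simply set difference; real literals such as tuples and maps need proper type intersection and node reordering, as the article discusses below:

```python
TOP, BOT = "top", "bottom"

def union(b1, b2):
    """Lazy union: stash b2 in the 'uncertain' branch of b1."""
    if b1 == BOT:
        return b2
    if b2 == BOT:
        return b1
    if b1 == TOP or b2 == TOP:
        return TOP
    a, c, u, d = b1
    return (a, c, union(u, b2), d)

def eager_and(t, bdd):
    """Intersect the literal t (a frozenset) with a BDD, eagerly
    intersecting literals and dropping branches that become empty."""
    if not t or bdd == BOT:
        return BOT
    if bdd == TOP:
        return (t, TOP, BOT, BOT)
    a, c, u, d = bdd
    pos = eager_and(t & a, c)   # (t and a) and C -- pruned if t & a is empty
    mid = eager_and(t, u)       #  t and U
    neg = eager_and(t - a, d)   # (t and not a) and D -- set difference here
    return union(pos, union(mid, neg))

def lit(t):
    """A BDD node representing just the literal t."""
    return (t, TOP, BOT, BOT)

foo, bar, baz, bat = (frozenset([s]) for s in ["foo", "bar", "baz", "bat"])

# (foo or bar or baz or bat) built with lazy unions
big_union = union(lit(foo), union(lit(bar), union(lit(baz), lit(bat))))

# Intersecting with bar collapses the whole tree to a single node
assert eager_and(bar, big_union) == lit(bar)
```

Because foo, baz, and bat are disjoint from bar, every branch carrying them is pruned, leaving the single-node BDD for bar, mirroring the %Bar{} example above.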

The remaining terms of the union are U2 and B1 and (not a2 and D2) and B1. If desired, we could apply the same eager literal intersection optimization to U2 and B1 (as long as the constrained part of either U2 or B1 is :top).

This optimization has worked quite well for us, but it is important to keep in mind that, since BDDs are ordered and the literal intersection may create a new literal, this optimization must be applied semantically, so we can recompute the position of intersected literals in the tree. We cannot apply it while we are already traversing the tree using the general lazy BDD formulas from the previous article.

Finally, note this optimization may eagerly reintroduce some of the complexity seen in DNFs if applied recursively. For instance, if you have (foo or bar) and (baz or bat), the recursive application of eager literal intersections will yield (foo and baz) or (foo and bat) or (bar and baz) or (bar and bat). If most of those intersections are eliminated, then applying eager literal intersections is still beneficial, but that may not always be the case.

To discuss exactly when these trade-offs may be problematic, let’s talk about open vs closed types.

Open vs closed types

Elixir’s type system can represent both open and closed maps. When you write:

user = %{name: "john", age: 42}

we are certain the map has keys :name and :age and only those keys. We say this map is closed, as it has no other keys, and it would have the type %{name: binary(), age: integer()}.

However, when you pattern match on it:

def can_drive?(%{age: age}) when is_integer(age) and age >= 18

Because pattern matching only validates the age key, a map given as an argument may have other keys! Therefore, we say the map is open and it has the type %{..., age: integer()}. This type says the map may have any keys, but we are sure age is an integer().

The trouble is that, when we are intersecting two maps, because an open map is very broad, their intersection rarely eliminates entries. For example, the intersection between %{..., age: integer()} and %{..., name: binary()} is the map %{..., name: binary(), age: integer()}.

So when we have to compute the intersection between (foo or bar) and (baz or bat), where foo, bar, baz, and bat are open maps with different keys, it will generate a cartesian product of all combinations! However, if they were closed maps, the end result would be empty. For this reason, we recommend applying the eager literal intersection only when the intersection will often lead to empty types.
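To illustrate the contrast, here is a toy Python model of map-type intersection. This is our own simplification, not the Elixir implementation: field types are finite sets of value-type names, key optionality is ignored, and None stands for the empty type:

```python
# A map type is (fields, open?) where fields maps keys to a set of allowed
# value types, and open? says whether other keys may exist.

ANY = frozenset({"integer", "binary", "atom"})   # toy universe of value types

def intersect_maps(m1, m2):
    (f1, open1), (f2, open2) = m1, m2
    fields = {}
    for k in set(f1) | set(f2):
        # A missing key means "any type" in an open map, "forbidden" in a
        # closed one (modeled as None).
        t1 = f1.get(k, ANY if open1 else None)
        t2 = f2.get(k, ANY if open2 else None)
        if t1 is None or t2 is None:
            return None                           # empty type
        t = t1 & t2
        if not t:
            return None
        fields[k] = t
    return (fields, open1 and open2)

open_age    = ({"age": frozenset({"integer"})}, True)    # %{..., age: integer()}
open_name   = ({"name": frozenset({"binary"})}, True)    # %{..., name: binary()}
closed_age  = ({"age": frozenset({"integer"})}, False)
closed_name = ({"name": frozenset({"binary"})}, False)

# Open maps merge, closed maps with different keys are disjoint
assert intersect_maps(open_age, open_name) == (
    {"age": frozenset({"integer"}), "name": frozenset({"binary"})}, True)
assert intersect_maps(closed_age, closed_name) is None
```

The closed case collapses to empty immediately, which is exactly the situation where eager literal intersections pay off; the open case only ever grows.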

Optimizing differences

The difference B1 \ B2 can always be expressed as the intersection of B1 and not B2, which is precisely how we write differences in Elixir.

Now take the following type:

{:ok, integer()} and not {:error, integer()}

Currently, both the ok and the error tuple types are stored as nodes in the BDD. However, it is clear in the example above that the types are disjoint and the result is {:ok, integer()}.

If we know the literals a1 and a2 are disjoint (their intersection is empty), then it is likely we can avoid adding nodes to the tree.

Furthermore, imagine this type:

{:ok, integer()} and not (not {:error, integer()})

The difference will convert the negation into a positive, resulting in {:ok, integer()} and {:error, integer()}, which is empty. Therefore, if B2 contains negations (which means its D2 component is not bottom), then we can also apply the eager literal intersection optimization from the previous section.

We will explore both scenarios next, starting with the one where D2 is bottom.

When D2 is bottom

We start with B1 and not B2:

B1 and not B2

Next let’s break B1 into (a1 and C1) or B1_no_C1, where B1_no_C1 is U1 or (not a1 and D1):

((a1 and C1) or B1_no_C1) and not B2

Now we distribute the difference:

((a1 and C1) and not B2) or (B1_no_C1 and not B2)

Let’s solve the left-hand side. We know B2 = (a2 and C2) or U2 (remember D2 is bottom), so let’s add that:

(a1 and C1 and not ((a2 and C2) or U2))
or (B1_no_C1 and not B2)

Now distribute the not and remove the parenthesis:

(a1 and C1 and not (a2 and C2) and not U2)
or (B1_no_C1 and not B2)

Now, if a1 and a2 are disjoint, then a1 and not (a2 and C2) is the same as a1. This happens because if a1 and a2 are disjoint, a2 and C2 is a subset of a2, which is then also disjoint with a1. So the first part simplifies to a1 and C1 and not U2:

(a1 and C1 and not U2) or (B1_no_C1 and not B2)

Now, by expanding B1_no_C1 into U1 or (not a1 and D1) and distributing the union, we get:

(a1 and C1 and not U2) or (U1 and not B2) or (not a1 and D1 and not B2)

which is in the BDD format! So we can rewrite the difference of disjoint literals as:

{a1, C1 and not U2, U1 and not B2, D1 and not B2}

which completely avoids adding a2 to the BDD and can then continue recursively.
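The rewrite rule can be validated in a toy model where every component denotes a plain set; this Python check (illustration only, not the Elixir implementation) verifies the identity on randomly generated disjoint literals:

```python
import random

def difference_ok(a1, C1, U1, D1, a2, C2, U2, univ):
    """With every component denoting a plain set, check that B1 and not B2
    equals the node {a1, C1 and not U2, U1 and not B2, D1 and not B2}
    when a1 and a2 are disjoint and D2 is bottom."""
    B1 = (a1 & C1) | U1 | ((univ - a1) & D1)
    B2 = (a2 & C2) | U2                       # D2 is bottom
    lhs = B1 - B2
    rhs = (a1 & (C1 - U2)) | (U1 - B2) | ((univ - a1) & (D1 - B2))
    return lhs == rhs

random.seed(0)
univ = frozenset(range(8))

def rsub(pool):
    """A random subset of pool."""
    return frozenset(x for x in pool if random.random() < 0.5)

for _ in range(500):
    a1 = rsub(univ)
    a2 = rsub(univ - a1)                      # disjoint from a1 by construction
    C1, U1, D1, C2, U2 = (rsub(univ) for _ in range(5))
    assert difference_ok(a1, C1, U1, D1, a2, C2, U2, univ)
```

Disjointness of a1 and a2 is what lets C2 and a2 vanish from the first term, exactly as argued above.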

When D2 is not bottom

When D2 is not bottom, it means B2 has a negated component. Since B2 itself is negated when part of a difference, the D2 component of B2 becomes an intersection, and we can apply the same eager literal technique we applied to intersections.

Once again, we start with B1 and not B2:

B1 and not B2

Now let’s break B2 into (B2_no_D2 or (not a2 and D2)), where B2_no_D2 is (a2 and C2) or U2:

(B1) and not (B2_no_D2 or (not a2 and D2))

Now we distribute the negation all the way through:

(B1) and (not B2_no_D2 and (a2 or not D2))

And distribute B1’s intersection with (a2 or not D2):

((B1 and a2) or (B1 and not D2)) and not B2_no_D2

B1 and a2 is an eager literal intersection from the previous section, which we can reuse!

Furthermore, notice at the end we compute the difference between ((B1 and a2) or (B1 and not D2)) and B2_no_D2. Given B2_no_D2 by definition has D2 = bottom, we can apply the optimized difference for when D2 is bottom.
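This decomposition can also be checked in a toy model where each component denotes a plain set. A quick Python sketch (illustration only); note the identity holds for arbitrary sets, with no disjointness requirement:

```python
import random

def decomposition_ok(B1, a2, C2, U2, D2, univ):
    """With every component denoting a plain set, check that B1 and not B2
    equals ((B1 and a2) or (B1 and not D2)) and not B2_no_D2,
    where B2_no_D2 = (a2 and C2) or U2."""
    B2 = (a2 & C2) | U2 | ((univ - a2) & D2)
    B2_no_D2 = (a2 & C2) | U2
    lhs = B1 - B2
    rhs = ((B1 & a2) | (B1 - D2)) - B2_no_D2
    return lhs == rhs

random.seed(1)
univ = frozenset(range(8))

def rsub(pool):
    """A random subset of pool."""
    return frozenset(x for x in pool if random.random() < 0.5)

assert all(
    decomposition_ok(rsub(univ), rsub(univ), rsub(univ), rsub(univ),
                     rsub(univ), univ)
    for _ in range(500)
)
```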

At the moment, we are not applying this optimization in Elixir, as differences with negations on the right-hand side are uncommon. We may revisit this in the future.

Results

We initially implemented eager literal intersections as part of the Elixir v1.20 release, which reduced the type checking time of one of the pathological cases from 10 seconds to 25ms!

However, our initial implementation also caused a performance regression, as we did not distinguish between open and closed maps. This regression was addressed by applying the optimization only to closed maps, as discussed in the article.

Permalink

The Picture They Paint of You

I keep noticing that AI SREs and coding agents are sold quite differently: coding assistants are framed as augmenting engineers and are given names, while AI SREs are simply named “AI SRE” and generally marketed as a good way to make sure nobody is distracted by unproductive work. I don’t think giving names to and anthropomorphizing components or agents is a good thing, but the picture painted by what gets a name, and by the framing offered to tech folks, is evocative.

This isn’t new; people have already pointed out how voice assistants generally replicated perceived stereotypes and biases—both in how they’re built and in how they’re used—so all I had to do was keep seeing announcements and pitches for these tools to see the pattern emerge. Similar arguments are currently made for agents in the age of LLMs, where agents can be considered to encode specific dynamics and values as well.

And so whatever I’m going to discuss here is a small addition to the existing set of perspectives encoded in existing products, and one that is not all-inclusive (e.g. Sales Development Representatives, through AI SDRs, also join all sorts of professions, craftspeople, and artists on this list). I’m using AI SREs and Coding Assistants because I think they are a very clear example of a divide between two functions that are fairly close together within organizations.

The observations

Here’s a quick overview of various products as I browsed online and gathered news and announcements from the space. The sampling isn't scientific, but it covers a broad enough set of the players in the current market.

AI SREs

Vendor | Product Name | Framing | Comments

bacca.ai

AI SRE

“cuts downtime before it cuts your profits”, “stop firefighting, start innovating”, “frees your engineers from the grind of constant troubleshooting”

resolve.ai

AI SRE

“Machines on-call for humans”, “Removing the toil of investigations, war rooms, and on-call”, “Operates tools and reasons through complex problems like your expert engineers”

Their AI SRE buyer’s guide also provides framing such as “engineering velocity stalls because teams spend the majority of their time firefighting production issues rather than building new capabilities.”

Neubird

AI SRE, Hawkeye

“No more RCA Delays”, “No more time lost to troubleshooting”, “no more millions lost to downtime, delays, and guesswork.”

The name Hawkeye, a superhero product name, is used in press releases and one of the FAQ questions, but is otherwise absent from the product page. There is a closing frame on a video that uses the words "AI SRE Teammate."

Harness

AI SRE, AI Scribe, AI Root Cause Analysis

“Scales your response, not your team”, “Reduce MTTR”, “Standardize first response”, “Let AI Handle The Busy Work While Your Team Solves What Matters”

Their FAQ explicitly compares human and AI SREs by stating “Traditional SRE relies on manual processes and rule-based automation, while AI SRE uses machine learning to adapt, predict issues, and automate complex decision-making at scale.”

incident.io

AI SRE

“resolves incidents like your best engineer”, “The SRE that doesn't sleep”, “No need to stall the whole team”, “Keep builders building”, “AI SRE does all the grunt work [postmortems] too.”

Rootly

AI SRE, Rootly AI

“AI SRE agents and your teams resolve incidents together”, “your expert engineer in every incident”, “quickly identify root causes and the fix—even if you don't know that code”

In late 2025, the page instead had a framing of “Detect, diagnose, and remediate incidents with less effort” with no reference to teamwork

cleric.ai

Cleric

“investigates production issues, captures what works, and makes your whole team faster”, “Skip straight to the answer”, “Unblock your engineers”,

One of the few with a name, possibly a DnD support role reference.

AlertD

AI SRE

“AI Agents For SREs and DevOps”, “Stop losing hours to scripting and tool switching”, “Unite SRE and DevOps tribal knowledge with AI agents”, “Best-practice AI agent guidance for next steps by your DevOps and SREs”, “Share AI dashboards and insights to act smarter, together”, “Work smarter with your AI”

This is one of two products my summary search revealed with a framing that tries to help SREs and DevOps instead of having a focus on replacing them.

AWS

DevOps Agent

“your always-on, autonomous on-call engineer”, “resolves and proactively prevents incidents, continuously improving reliability and performance”, “reduce MTTR […] and drive operational excellence.”

Ciroos

Ciroos

“Become an SRE superhero”, “increase human ingenuity”, “AI SRE Teammate for site reliability engineering (SRE), IT Operations, and DevOps teams”, “extends the capabilities of every SRE team”

The other product that aims to help SRE and DevOps teams. The name is relatively human. The automation model described in the FAQ repeats certain myths, but it’s far more transparent and grounded than others in this list.

Disclaimer: I have not tried any of the above; this list is built from the products’ own pages.

Of all of these, only a few mention possible teamwork, and only two of these do so by being a teammate to your SRE staff. Every other one of these instead frames the work as either less important or as worth replacing, sometimes very explicitly. Some have names that refer to superheroes or DnD support classes, most are just named after the role they aim to substitute.

Coding Assistants

Vendor | Product Name | Framing | Comments

Anthropic

Claude Code

“Built for builders / programmers / creators / …”, “Describe what you need, and Claude handles the rest.”, “Stop bouncing between tools”, “meets you where you code”, “you’re in control”

Human name, emphasizes aspects of delegation

Google

Gemini code assist

“Uncap your potential and get all of your development done”, “Experience coding with fewer limits”, “Accelerate development”, “[offload] repetitive tasks”, “reduce code review time”

The name is the Latin word for “twins”; the framing seeks augmentation but also some delegation.

Zed

Zed (Editor)

“minimal code editor crafted for speed and collaboration with humans and AI”, “AI that works the way you code”, “fluent collaboration between humans and AI”

Not technically a coding assistant, but an environment to collaborate with them

Github

Copilot

“Command your craft”, “accelerator for every workflow”, “stay in your flow”, “code, command, and collaborate”, “Ship faster with AI that codes with you”

The naming fits a role that is collaborative, and both it and the positioning try to articulate collaboration while you lead.

Cline

Cline

“Your coding partner”, “Collaborative by nature, autonomous when permitted”, “fully collaborative AI partner”, “Make coordinated changes across large codebases”

Windsurf

Cascade, Editor

“most powerful way to code with AI”, “limitless power, complete flow”, “saves you time and helps you ship products faster”, “removes the vast amounts of time spent of boilerplate and menial tasks so that you can focus on the fun and creative parts of building.”

Not technically a coding assistant for the editor side, but also provides agents.

Cursor

Cursor (editor)

“Built to make you extraordinarily productive”, “accelerate development by handing off tasks”, “reviews your PRs, collaborates in Slack, and runs in your terminal”, “develop enduring software”

Also not a coding assistant, but has tabs to interact with them.

OpenAI

Codex

“Built to drive real engineering work”, “reliably completes tasks end to end, like building features, complex refactors, migrations, and more”, “command center for agentic coding”, “Adapts to how your team builds”, “Made for always-on background work”

This is one of the few AI coding tools that orients itself toward a more definitively substitutive role, even if it still pays lip service to working with your team.

Disclaimer: I have tried some of the above, but not all; this list is built from the products’ own pages.

You can see from the tables above that each of these tools has a more distinct name, with some being a person’s name. The vast majority of these are framed as tools that aim to augment an engineer or a team, to make them more productive, let them do more within their roles.

So what are the implications here?

The way these products are presented paints two very distinct pictures (even if exceptions exist in each category):

  1. Software Engineering work is perceived as valuable work; the engineer is in control and deserves more power, more control, more productivity. The AI exists to be a partner, a teammate, or an assistant.
  2. Software Reliability Engineering work is a hindrance; teams need to be distracted less by these tasks and instead focus on more valuable work. Human limitations—such as needing to sleep—need to be overcome. The AI exists to replace or be a substitute to the worker.

These models potentially replicate and project to the rest of the world the ways these roles are perceived internally.

For example, I’ve written in the past about how I see incidents and outages as worthy learning opportunities to orient organizations; this framing necessarily perceives SRE as doing important work you wouldn’t want to ignore. The vision behind AI SREs is the opposite. Incidents and outages are one-off exceptions to paper over and move on from, rather than a structural and emergent consequence of what you do (and how you do it) and from which you should learn.

This sort of thing is interesting because it can also be indicative of the split between what practitioners think of their work (learning from incidents is a necessity), and what decision-makers above them may think of the work and function (these postmortems are grunt work).

Much like AI assistants shaped after secretaries were described as showing a vision that mimics the relation between servants and masters, the way we frame AI tooling for all types of workers exposes the way their builders think about that work.

But it’s also a signal about how the buyers feel about that work. When the role sold is one of a partner or teammate, you need to sell the idea both to the employee who’ll work with the tool and to the employer who will pay for it. When you sell technology that replaces a role or function, you only need to speak to the person with the money.

The implication then is that what these tools project is a mix of how the role is perceived on either side of the transaction. If, as an employee, you feel like the tools are only doing part of the work you value, that may imply few people with power or influence actually value it the same way you do.

This does not mean organizations can fully succeed in the substitution effort. Time and time again history has shown that part of a role can be automated and centralized, and the rest of it will be piled onto fewer individuals who will do the hard-to-automate bits and will then coordinate the automation for the rest of it—something called the left-over principle.

As automation capacity increases and as organizations transform themselves to make room for it all, the dynamic evolves.

It’s already pretty clear to me that the vision many builders and buyers have of SREs is often a very reductionist and unflattering one. The role hasn’t yet gone away, possibly because there’s more to it than builders and buyers believe. I figure the evolving portrait of software engineering is equally incomplete at this point, depending on the complexity of the system you’re trying to create and control.

What are they now painting?

Just for fun, I also looked at how the frameworks that promise to automate all code generation are framed. Codex in the table above is inching that way, but the portfolio grows.

Anthropic is introducing agent teams where the teammates are below you. You are directing a team lead that in turn directs teammates. The discourse is one of control, where collaboration is delegated to agents, which you can still manage more directly. GasTown puts you in the seat of a product manager, and the entire development team is abstracted into deeper hierarchies. Amp is also about coordinating agents (of various skills, roles, and costs) while targeted to developers still, but doesn’t drive the analogy as hard.

The enthusiasm is there, and more reports are coming out around the Software Factory approach, such as StrongDM experimenting with code that must not be reviewed by humans, or the outcome engineering manifesto, which imply that the future is in being a high-level controller of large groups of faceless agents, which you must constrain and provide with enough information to act well.

The trend is seemingly moving away from a partnership between the software engineer and their automation, and into a view that reminds me far more of Taylorism. Maybe that shift is happening because that’s generally what comes to mind when people think of automating production away from manual work.

These products are conceptualized by analogy. Take a pattern you know, and replicate some key properties in a different space. This is an absolutely normal way of exploring new areas, of transferring understanding from one domain to another.

I get that spitting code fast is valuable for many. But if we believe workers can bring more to the table than Taylor did, then this vision is limiting. If we believe that this doesn’t apply because the agents are not that capable, then reductive anthropomorphism isn’t fitting either. In both cases, we should demand and seek better analogies, because a better representation of work as we do it should result in better tools.

That’s because as much as an analogy can be a lever, it can also be a straitjacket. When you’re stuck inside a model, you interpret everything in its own terms, and it becomes much harder to adopt a different perspective or to break out of the oversimplification. And once you’ve made sense of the new space well enough, you ideally don’t need to rely on the analogy anymore: your understanding stands on its own.

In accepting the Taylorist software factory frameworks or AI SREs built while framing the work as low-status, we also—at a social level—tacitly amplify these representations and give them validity. This is necessarily done at the cost of alternative designs, by settling the space with products conceived as poor caricatures of actual work. It lacks respect and is conceptually weak.

We keep being told it has never been cheaper, easier, or more accessible to create new stuff. This should give everyone involved more time to explore the problem space and learn. Yet here we are.

The picture they paint of you says a lot. Just not about you.

Permalink

Erlang/OTP 29.0 Release Candidate 1

OTP 29.0-rc1

Erlang/OTP 29.0-rc1 is the first release candidate of three before the OTP 29.0 release.

The intention with this release is to get feedback from our users. All feedback is welcome, even if it is only to say that it works for you. We encourage users to try it out and give us feedback either by creating an issue at https://github.com/erlang/otp/issues or by posting to Erlang Forums.

All artifacts for the release can be downloaded from the Erlang/OTP Github release and you can view the new documentation at https://erlang.org/documentation/doc-17.0-rc1/doc. You can also install the latest release using kerl like this:

kerl build 29.0-rc1 29.0-rc1

Erlang/OTP 29 is a new major release with new features, improvements as well as a few incompatibilities. Some of the new features are highlighted below.

Many thanks to all contributors!

General

  • In the default code path for the Erlang system, the current working directory (.) is now in the last position instead of the first.

  • There is no longer a 32-bit Erlang/OTP build for Windows.

New language features

  • Native records as described in EEP-79 have been implemented. A native record is a data structure similar to the traditional tuple-based records, except that it is a true data type. Native records are considered experimental in Erlang/OTP 29 and possibly also in Erlang/OTP 30.

  • The new is_integer/3 guard BIF makes it possible to easily verify that a value is both an integer and within a certain range. For example: is_integer(I, 0, 100)

  • Multi-valued comprehensions according to EEP 78 are now supported. For example, [-I, I || I <- [1, 2, 3]] will produce [-1,1,-2,2,-3,3].

  • By enabling the compr_assign feature, it is now possible to bind variables in a comprehension. For example: [H || E <- List, H = erlang:phash2(E), H rem 10 =:= 0]

Compiler and JIT improvements

  • In the documentation for the compile module, there is now a section with recommendations for implementors of languages running on the BEAM.

  • The JIT now generates better code for matching or creating binaries with multiple little-endian segments.

  • The compiler will generate more efficient code for map comprehensions with constant values that don’t depend on the generator. Example: #{K => 42 || K <- List}

Compiler warnings

There are several new compiler warnings enabled by default. For each such warning, there is an option to disable it.

  • There will now be a warning when using the catch operator, which has been deprecated for a long time. It is recommended to instead use try/catch, but it is also possible to disable the warning by using the nowarn_deprecated_catch option.

  • There will now be a warning when exporting variables out of a subexpression. For example: file:open(File, AllOpts = [write, {encoding,utf8}]). This warning can be disabled using the nowarn_export_var_subexpr compiler option.

  • The compiler will now warn for uses of the and and or operators. This warning can be disabled using the nowarn_obsolete_bool_op compiler option.

  • The compiler will now warn for matches such as {a,B} = {X,Y}, which is better written as {a=X,B=Y}. This warning can be disabled using the nowarn_match_alias_pats option.

For a long time, there has been a warning for using obsolete guard tests (such as list(L) instead of is_list(L)). In Erlang/OTP 30, the old guard tests will be removed from the language.

STDLIB

  • There are new functions for randomly permuting a list: rand:shuffle/1 and rand:shuffle_s/2.

SSH

  • The default key exchange algorithm is now mlkem768x25519-sha256, a hybrid quantum-resistant algorithm combining ML-KEM-768 with X25519. This provides protection against both classical and quantum computer attacks while maintaining backward compatibility through automatic fallback to other algorithms when peers don’t support it.

For more details about new features and potential incompatibilities see the README.

Permalink

Type inference of all constructs and the next 15 months

Today we celebrate 15 years since Elixir’s first commit! To mark the occasion, we are glad to announce the first release candidate for Elixir v1.20, which performs type inference of all language constructs, with increasing precision.

In this blog post, we will break down exactly what this means, and what to expect in the short and medium term of the language evolution (roughly the next 15 months).

Types, in my Elixir?

In 2022, we announced the effort to add set-theoretic types to Elixir. In June 2023, we published an award-winning paper on Elixir’s type system design and said our work was transitioning from research to development.

Our goal is to introduce a type system which is:

  • sound - the types inferred and assigned by the type system align with the behaviour of the program

  • gradual - Elixir’s type system includes the dynamic() type, which can be used when the type of a variable or expression is checked at runtime. In the absence of dynamic(), Elixir’s type system behaves as a static one

  • developer friendly - the types are described, implemented, and composed using basic set operations: unions, intersections, and negations (hence it is a set-theoretic type system)

However, I want to emphasize what gradual typing means in Elixir. Many gradual type systems have the any() type, which, from the point of view of the type system, often means “anything goes” and no type violations are reported.

On the other hand, Elixir’s gradual type is called dynamic() and it works as a range. For example, you can say dynamic(integer() or float()), which means the type is either integer() or float() at runtime. If you then pass it to a function that expects a binary(), you will get a typing violation. This allows the type system to emit warnings even in the presence of dynamism. Even if you declare a type as dynamic() and then proceed to use it as integer() and then binary(), a type violation is still reported. We have also developed new techniques that ensure our gradual typing is sound, without the need for additional runtime checks.

The type system was made possible thanks to a partnership between CNRS and Remote. The development work is currently sponsored by Fresha and Tidewave.

Let’s see how this is turning out in practice.

Inference across Elixir releases

Elixir v1.17 was the first release to introduce set-theoretic types into the compiler. Elixir v1.18 added inference of patterns and return types. Therefore, if you wrote this code:

defmodule User do
  defstruct [:age, :car_choice]

  def drive(%User{age: age, car_choice: car}, car_choices) when age >= 18 do
    if car in car_choices do
      {:ok, car}
    else
      {:error, :no_choice}
    end
  end

  def drive(%User{}, _car_choices) do
    {:error, :not_allowed}
  end
end

Elixir’s type system will infer that the drive function expects a User struct as input and returns either {:ok, dynamic()} or {:error, :no_choice} or {:error, :not_allowed}. Therefore, the following code

User.drive({:ok, %User{}}, car_choices)

will emit a warning stating that we are passing an invalid argument, both in your IDE and the shell:

Example of a warning when passing wrong argument to a function

Now consider the expression below. We are expecting the User.drive/2 call to return :error, which cannot possibly be true:

case User.drive(user, car_choices) do
  {:ok, car} -> car
  :error -> Logger.error("User cannot drive")
end

Therefore the code above would emit the following warning:

Example of a warning when a case clause won't ever match

However, Elixir v1.18 could only infer types from patterns. If you wrote this code:

def user_age_to_string(user) do
  Integer.to_string(user.age)
end

Elixir would not infer anything about the function arguments. As of Elixir v1.20-rc, Elixir correctly infers the function to be %{..., age: integer()} -> binary(), which means it expects a map with at least the age field (the leading ... indicates other keys may be present) and it returns a binary().

Or let’s see another example:

def add_rem(a, b) do
  rem(a + b, 8)
end

While a + b works with both integers and floats, because the rem (remainder) function works exclusively with integers, Elixir correctly infers that a and b must also both be integers. If you try calling the function above with a float, you will also get a type violation.
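For instance, under the inference described above, a call such as the one below (a made-up usage of add_rem; the warning wording is illustrative) would be flagged:

```elixir
add_rem(1.0, 2)
# typing violation: add_rem/2 expects integers,
# but the first argument is a float
```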

In a nutshell, we have been steadily increasing the amount of inference in Elixir programs. Our goal is to find typing violations in Elixir programs for free, without a need for developers to change existing code. And, in the last few days, we finally wrapped up the last missing piece.

Inference of guards

Elixir v1.20-rc also performs inference of guards! Let’s see some examples:

def example(x, y) when is_list(x) and is_integer(y)

The code above correctly infers x is a list and y is an integer.

def example({:ok, x} = y) when is_binary(x) or is_integer(x)

The one above infers x is a binary or an integer, and y is a two-element tuple with :ok as first element and a binary or integer as second.

def example(x) when is_map_key(x, :foo)

The code above infers x is a map which has the :foo key, represented as %{..., foo: dynamic()}. Remember the leading ... indicates the map may have other keys.

def example(x) when not is_map_key(x, :foo)

And the code above infers x is a map without the :foo key, typed as %{..., foo: not_set()} (hence x.foo will raise a typing violation).
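Putting that inference to use, a sketch like the following (the function body is our own illustration) would be caught at compile time:

```elixir
def example(x) when not is_map_key(x, :foo) do
  # x is inferred as %{..., foo: not_set()}, so this access is
  # reported as a typing violation: :foo is known to be absent
  x.foo
end
```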

You can also have expressions that assert on the size of data structures:

def example(x) when tuple_size(x) < 3

Elixir will correctly track that the tuple has at most two elements, and therefore accessing elem(x, 3) will emit a typing violation. In other words, Elixir can look at complex guards, infer types, and use this information to find bugs in our code!
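Concretely, here is a sketch of the violation described above (again, the function body is our own illustration):

```elixir
def example(x) when tuple_size(x) < 3 do
  # x is inferred as a tuple with at most two elements,
  # so this out-of-bounds access is reported as a typing violation
  elem(x, 3)
end
```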

The next ~15 weeks

As we work on the type system, we have been carefully monitoring the compiler performance. And while we have been able to develop new techniques to keep everything running smoothly, the next weeks will dramatically ramp up the amount of type information flowing through the compiler, and therefore we need your feedback.

The next Elixir release is scheduled for May. We are shipping this release candidate earlier than usual for validation. We also plan to launch at least two additional release candidates with increased type checking.

Jan/2026: inference of all constructs

The first release candidate is out right now, with type inference of all Elixir constructs. Please give it a try. However, at this stage, we expect some false positives: the type system will report warnings which are not actual violations. We will explain exactly why in the next paragraphs. So don’t change your programs yet. The most valuable feedback we want from you is performance! If everything compiles at roughly the same speed as before, then hooray!

Feb-Mar/2026: inference across clauses

The second release candidate will add type inference across clauses. Let’s see some examples. Take this code:

case some_function_call() do
  %{name: name} = user -> ...
  %{first_name: first, last_name: last} = user -> ...
end

Today, we know user in the first clause has the name field (and potentially other fields). We know that user in the second clause has first_name and last_name. The code above also implies that user in the second clause does not have the name field (after all, if it had the name field, the first clause would have matched). In other words, pattern matching order becomes a source of negative type information. In the first release candidate, the type system cannot infer this information yet, but it will be implemented in the following release candidate.

Besides giving us more precise types, the above will also allow us to perform exhaustiveness checks as well as find redundant clauses (note we already warn for clauses that won’t ever match since Elixir v1.18).

However, it is worth keeping in mind the work is a bit more complex than one might think. For example, take this code:

case some_function_call() do
  %{age: age} = user when age >= 21 -> ...
  %{name: name} = user -> ...
end

Can we say the user in the second clause does not have the age field? No, we can’t, because the first clause only matches if age is greater than or equal to 21. So the second clause will still match users with a lower age. This means we must distinguish between “surely accepted clauses” and “potentially accepted clauses”.

Apr-May/2026: inference across dependencies

Finally, we will ship a third release candidate, which enables type inference for function calls across your dependencies. In the current release candidate, Elixir can infer types from function calls, but such inference only applies to modules from Elixir’s standard library. Take the following code:

def integer_to_string(x) do
  Integer.to_string(x)
end

In the code above, we will infer x is an integer(), but if instead you call MyInteger.to_string(x) from a dependency, we only perform type checking; we won’t infer that the integer_to_string function expects an integer. Once implemented, this step will drastically increase the amount of types flowing through the compiler, hence we are dedicating a release candidate to it.

The next ~15 months

At this point, you may be wondering: when can we officially claim Elixir is statically typed?

When we first announced the type system effort, we broke it into three distinct milestones:

  1. Type inference of patterns and guards: this is our current milestone which has, since then, been extended to type inference of all language constructs

  2. Introduction of typed structs, allowing struct types to propagate throughout the system, as we pattern match on structs throughout the codebase

  3. Introduction of type signatures, including for parametric and protocol polymorphism

Assuming all release candidates above go according to plan, we will officially conclude the first milestone as part of Elixir v1.20 and start working on the subsequent ones. However, there are still challenges ahead that may prove the type system to be impractical:

  • Ergonomics: all of our improvements so far have happened behind the scenes, without changes to the language. While this has been very valuable to validate the feasibility and performance of the type system, we still need to assess its impact on the developer experience

  • Performance: our current implementation does not yet support recursive and parametric types and those may also directly impact performance and make the type system unfeasible

Our goal is to explore these problems and their solutions in the future Elixir v1.21 (Nov/2026) and v1.22 (May/2027) releases, by implementing these operations in the compiler and using it to internally type complex Elixir modules, such as the Enum module. So while we don’t have a precise date for when we will conclude these upcoming milestones, we will likely continue to see gradual improvements on every release for the next 15 months.

Wrapping up

The first release candidate for Elixir v1.20 is out and includes type inference of all constructs. We will have multiple release candidates before the final release in May/2026, and your feedback is very important:

  • Jan/2026: inference of all constructs, may have many false positives, assess performance!
  • Feb-Mar/2026: inference across clauses, few or no false positives, assess performance!
  • Apr-May/2026: inference across dependencies, assess performance!

Every release will have a thread in the Elixir Forum for discussion.

Check our documentation to learn more about our overall work on set-theoretic types. This release also includes our official types cheatsheet.

The complete CHANGELOG for this release is on GitHub.

Happy coding!


Software Acceleration and Desynchronization

A bit more than a month ago, I posted the following on social media:

Seeing more reports and industry players blaming code reviews for slowing down the quick development done with AI. It's unclear whether anyone's asking if this is just moving the cognitive bottleneck of "understanding what's happening" around. "Add AI to the reviews" seems to be the end goal here.

And I received multiple responses, some going “This is a terrible thing” and others going “yeah, that’s actually not a bad idea.” Back then I didn’t necessarily have the concepts to clarify these thoughts, but I’ve since found ways to express the issue in a clearer, more system-centric way. While this post is clearly driven by the discourse around AI (particularly LLMs), it is more of a structural argument about the kind of changes their adoption triggers, and about the broader acceleration patterns seen in the industry with other technologies and processes before; as such, I won’t really mention AI again here.

The model I’m proposing here is inspired by (or is a dangerously misapplied simplification of) the one presented by Hartmut Rosa’s Social Acceleration,1 bent out of shape to fit my own observations. A pattern I’ll start with is one of loops, or cycles.

Loops, Cycles, and Activities

Let’s start with a single linear representation of the work around writing software:

A directed graph containing: plan work → write code → code reviews → deploy

This is a simplification because we could go much deeper, such as in this image of what value stream mapping could look like in the DORA report:2

DORA report 2025 Figure 50: value stream mapping, which contains elements in a sequence such as backlog, analyse codebase, code, generate unit tests, security analysis, code commit, production, etc. The gap between code commit and production is expanded to show sub-steps like create merge, code review, build, deploy to QA, deploy bugfix, etc.

But we could also go for less linear to show a different type of complexity, even with a simplified set of steps:

A directed graph containing: Plan work → scope ticket → write tests → write code → self-review → code review (peers) → merge → ship; each step between plan work and merge also link back to multiple previous steps, representing going back to the drawing board. Specific arrows also short-circuit Plan work → write code → self-review → merge → ship (and self-review → ship). The sequence then circles back from ship to plan work as a new task is selected.

Each of the steps above can imply a skip backwards to an earlier task, and emergencies can represent skips forwards. For the sake of the argument, it doesn’t matter whether our model is adequately detailed or just a rough approximation; we could go for more or less accuracy (the “write tests” node could easily be expanded to fill a book). This is mostly for illustrative purposes.

Overall, in all versions, tasks aim to go as quickly as possible from beginning to end, with an acceptable degree of quality. In a mindset of accelerating development, we can therefore take a look at individual nodes (writing code, debugging, or reviewing code) for elements to speed up, or at overall workflows by influencing the cycles themselves.

For example, code reviews can be sped up with auto formatting and linting—automated rule checks that enforce standards or prevent some practices—which would otherwise need to be done by hand. This saves time and lets people focus on higher-level review elements. And the overall cycle can be made faster by moving these automated rules into the development environment, thereby tightening a feedback loop: fix as you write rather than accumulating flaws on top of which you build, only to then spend time undoing things to fix their foundations.

Concurrent Loops

So far, so good. The problem though is that this one isolated loop is insufficient to properly represent most of software work. Not only are there multiple tasks run in parallel by multiple people, each representing an independent loop; each person is also part of multiple loops. For example, while you might tackle a ticket for some piece of code, you may also have to write a design document for an upcoming project, provide assistance on a support ticket, attend meetings, focus on developing your career via mentorship sessions, keep up with organizational exercises through publishing and reading status reports, and so on.

Here are a few related but simplified loops, as a visual aid:

Four different loops, all disconnected, are represented here: coding, assisting support, career development, writing a design document. The graphs are fairly complex and of varying length. Coding has 8 nodes in a loop; assisting support has 5 nodes with a branching path, writing a design doc has 10 nodes in a loop, and the career development one has 10 nodes, but it is full of cycles and references (via color and labels) the coding and design document loops as if they were a subtask.

Once again, each person on your team may run multiple of these loops in parallel during their workday, of various types.

But more than that, there might be multiple loops that share sections. You can imagine how writing code in one part of the system can prepare you or improve your contributions to multiple sorts of tasks: writing code that interacts with it, modifying it, reviewing changes, writing or reviewing docs, awareness of possible edge cases for incidents, better estimates of future tasks, and so on.

You can also imagine how, in planning how to best structure code for new product changes, experience with the current structure of the code may matter, along with awareness of the upcoming product ambitions. Likewise, the input of people familiar with operational challenges of the current system can prove useful in prioritizing changes. This sort of shared set of concerns informed ideas like DevOps, propelled by the belief that good feedback and integration (not throwing things over the fence) would help software delivery.

Basically, a bunch of loops can optionally contribute to one set of shared activities, but some activities can also be contributors to multiple loops, and these loops might be on multiple time scales:

A coding loop, a career growth loop, and a high-level architecture loop are all depicted as concurrent. However, the three loops share a step. For the coding loop, this is code review, for the growth loop it is about awareness of teams work and norm enforcement, and for the architecture loop, it represents change awareness.

Here, the activity of reviewing code might be the place where the coding loop gets straightforward reviews as desired, but it is also a place where someone's career growth plans can be exercised in trying to influence or enforce norms, and where someone looking at long-term architectural and growth patterns gets to build awareness of ongoing technical changes.

These shared bits are one of the factors that can act like bottlenecks, or can counter speed improvements. To make an analogy, consider the idea that if you were cycling to work 30 minutes each way every day, and sped up your commute by going twice as fast via public or private transit, you’d save 2h30 every week; however some of that time wouldn’t be “saved” if you consider you might still need to exercise to stay as fit physically. You would either need to spend much of your time saved on exercise outside of commute, or end up incidentally trading off commute time for longer-term health factors instead.

Applied to software, we may see this pattern with the idea of “we can now code faster, but code review is the new bottleneck.” The obvious step will be to try to speed up code reviewing to match the increased speed of code writing. To some extent, parts of code reviewing can be optimized: maybe we can detect some types of errors more reliably and rapidly through better tooling. Again, like linting or type checking, these checks ideally get moved to development time rather than reviews.

But code reviewing is not just about finding errors. It is also used to discuss maintainability, operational concerns, to spread knowledge and awareness, to get external perspectives, or to foster broader senses of ownership. These purposes, even if they could be automated or sped up, can all indicate the existence of other loops that people may have to maintain regardless.

Synchronization and Desynchronization

If we decide to optimize parts of the work, we can hope for a decent speedup if we do one of:

  1. Continuously develop a proper understanding of the many purposes a task might have, to make useful, well-integrated changes
  2. Give up some of the shared purposes by decoupling loops

The first option is challenging and tends to require research, iteration, and an eye for ergonomics. Otherwise you’ll quickly run into problems of “working faster yet going the same speed”, where despite adopting new tools and methods, the bottlenecks we face remain mostly unchanged. Doing this right implies making your changes knowing they’ll structurally impact the work they attempt to speed up, and being ready to support these disruptions.

The second is easy to do in ways that accidentally slow down or damage other loops—if the other purposes still exist, new activities will need to replace the old ones—which may in turn feed back into the original loop (e.g.: code reviews may block present code writing, but also inform future code writing), with both being weakened or on two different tempos when decoupled. This latter effect is something we’ll call “desynchronization.” One risk of being desynchronized is that useful or critical feedback from one loop no longer makes it to another one.

To cope with this (but not prevent it entirely), we have a third option in terms of optimization:

  3. Adopt “norms” or practices ahead of time that ensure alignment and reduce the need for synchronization.

This is more or less what “best practices” and platforms attempt to provide: standards that, when followed, reduce the need for communication and sense-making. These tend to provide a stable foundation on which to accelerate multiple activities. They don’t fully prevent desynchronization, they just stave it off.

To illustrate desynchronization, let’s look at varied loops that could feed back into each other:

5 layers of loops are shown: platform updates, on a long cycle; repeated coding loops on short cycles; reactive ops loops that interact with coding loops; sparse configuration and optimization tasks; and a long loop about norms. The coding and ops loops interact at synchronous points for code reviews, but are otherwise asynchronous. Learnings from running the software feed back into platform and norm loops, which eventually inform the coding loops.

These show shared points where loops synchronize, across ops and coding loops, at review time. The learnings from operational work can feed back into platform and norms loops, and the code reviews with ops input are one of the places where these are "enforced".3 If you remove these synchronization points, you can move faster, but loops can also go on independently for a while and will grow further and further apart:

5 layers of loops are shown: platform updates, on a long cycle; repeated coding loops on short cycles; reactive ops loops that interact with coding loops; sparse configuration and optimization tasks; and a long loop about norms. The coding and ops loops no longer hold synchronous points for code reviews and are fully asynchronous. Learnings from running the software feed back into platform and norm loops, but they do not properly inform the coding loop anymore and the operations loop has to repeat tasks to enforce norms.

There's not a huge difference between the two images, but what I chose to display here is that the lack of dev-time ops input (during code review) might lead to duplicated batches of in-flight fixes that need to be carried and applied to code as it rolls out, with extra steps peppered through. As changes are made to the underlying platform or shared components, their socialization may lag behind as opportunities to propagate them are reduced. If development is sped up enough without a matching increase in the ability to demonstrate the code's fitness (without waiting for more time in production), the potential for surprises goes up.

Keep in mind that this is one type of synchronization across one shared task between two high-level loops. Real work has more loops, with more nodes, more connections, and many subtler synchronization points both within and across teams and roles. Real loops might be more robust, but less predictable. A loop with multiple synchronization points can remove some of them and look faster until the few remaining synchronization points either get slower (to catch up) or undone (to go fast).

Not all participants in synchronization points get the same thing out of them, either. It’s possible an engineer gets permission (and protection) from one, another gets awareness, some other team reinforces compliance, and a management layer claims accountability from it happening, for example.

It's easy to imagine both ends of a spectrum, with on one end, organizations that get bogged down on synchronous steps to avoid all surprises, and on the other, organizations that get tangled into the web of concurrent norms and never-deprecated generations of the same stuff all carried at once because none of the synchronous work happens.

Drift that accumulates across loops will create inconsistencies as mental models lag, force corner cutting to keep up with changes and pressures, widen gaps between what we think happens and what actually happens.4 It pulls subsystems apart, weakens them, and contributes to incidents—unintended points of rapid resynchronization.

I consider incidents to be points of rapid resynchronization because they're usually where you end up desynchronizing so hard that incident response forces you to suspend your usual structure, quickly reprioritize, upend your roadmap, and (ideally) have lots of people across multiple teams suddenly update their understanding of how things work and break down. That the usual silos can't keep going as usual points to forced repair after too much desynchronization.

As Rosa points out in his book, this acceleration tends to grow faster than what the underlying stable systems can support, and those systems become their own hindrances. Infrastructure and institutions are abandoned or dismantled when the systems they enabled gradually feel stalled or constrained by them and seek alternatives:

[Acceleration] by means of institutional pausing and the guaranteed maintenance of background conditions is a basic principle of the modern history of acceleration and an essential reason for its success as well. [Institutions] were themselves exempted from change and therefore helped create reliable expectations, stable planning, and predictability. [...] Only against the background of such stable horizons of expectation does it become rational to make the long-term plans and investments that were indispensable for numerous modernization processes. The erosion of those institutions and orientations as a result of further, as it were, “unbounded” acceleration [...], might undermine their own presuppositions and the stability of late modern society as a whole and thereby place the (accelerative) project of modernity in greater danger than the antimodern deceleration movement.

The need for less synchronization doesn’t mean that synchronization no longer needs to happen. The treadmill never slows down, and actors in the system must demonstrate resilience to reinvent practices and norms to meet demands. This is particularly obvious when the new pace creates new challenges: what brought us here won’t be enough to keep going, and we’ll need to overhaul a bunch of loops again.

There’s something very interesting about this observation: A slowdown in one place can strategically speed up other parts.

Is This Specific Slow a Good Slow or a Bad Slow?

There’s little doubt to me that one can go through a full cycle of the “write code” loop faster than one would go through “suffering the consequences of your own architecture” loop—generally that latter cycle depends on multiple development cycles to get adequate feedback. You can ship code every hour, but it can easily take multiple weeks for all the corner cases to shake out.

When operating at the level of system design or software architecture (“We need double-entry bookkeeping that can tolerate regional outages”), we tend to require an understanding of the system’s past, a decent sense of its present with its limitations, and an ability to anticipate future challenges to inform the directions in which to push change. This is a different cycle from everyday changes (“The feature needs a transaction in the ledger”), even if both are connected.

The implication here is that if you’re on a new code base with no history and a future that might not exist (such as short-term prototypes or experiments), you’re likely to be able to have isolated short loops. If you’re working on a large platform with thousands of users, years of accrued patterns and edge cases, and the weight of an organizational culture to fight or align with, you end up relying on the longer loops to inform the shorter ones.

The connections across loops accrue gradually over time, and people who love the short loops get very frustrated at how slow they’re starting to be:

Yet irreversible decisions require significantly more careful planning and information gathering and are therefore unavoidably more time intensive than reversible ones. In fact, other things equal, the following holds: the longer the temporal range of a decision is, the longer the period of time required to make it on the basis of a given substantive standard of rationality. This illustrates the paradox of contemporary temporal development: the temporal range of our decisions seems to increase to the same extent that the time resources we need to make them disappear.

That some folks go real fast and reap benefits while others feel bogged down in having to catch up can therefore partially be a sign that we haven’t properly handled synchronization and desynchronization. But it can also be a function of people having to deliberately slow down their work when its output either requires or provides the stability required by the fast movers. Quick iterations at the background level—what is generally taken for granted as part of the ecosystem—further increase the need for acceleration from all participants.

In a mindset of acceleration, we will seek to speed up every step we can, through optimization, technological innovation, process improvements, economies of scale, and so on. This connects to Rosa’s entire thesis of acceleration feeding into itself.5 One of the points Rosa makes, among many, is that we need to see the need for acceleration and the resulting felt pressures (everything goes faster, keeping up is harder; therefore we need to do more as well) as a temporal structure, which shapes how systems work. So while technical innovation offers opportunities to speed things up (often driven by economic forces), these innovations transform how our social structures are organized (often through specialization), which in turn, through a heightened feeling of what can be accomplished and a feeling that the world keeps going faster, provokes a need to speed things up further and fuels technological innovation. Here's the diagram provided in his book:

Three steps feed into each other: technical acceleration leads to an acceleration of social change, which leads to an acceleration of the pace of life, which leads to technical acceleration. While self-sustaining, each of these steps is also propelled by an external 'driver': the economic driver (time is money) fuels technical acceleration, functional differentiation (specialization) fuels social change, and the promise of acceleration (doing/experiencing more within your lifetime) fuels the acceleration of the pace of life.

We generally frame acceleration as an outcome of technological progress, but the idea here is that the acceleration of temporal structures is, on its own, a mechanism that shapes society (and, of course, our industry). Periods of acceleration also tend to come with multiple forms of resistance; while some are a bit of a reflex to try and keep things under control (rather than having to suffer more adaptive cycles), there are also useful forms of slowing down, those which can provide stability and lengthen horizons of other acceleration efforts.

Few tech companies have a good definition of what productivity means, but the drive to continually improve it is nevertheless real. Without a better understanding of how work happens, we’re likely to keep seeing the impact of new tech on work framed in wildly varying ways, as haphazard slashing and boosting of random parts of random work loops. I think this overall dynamic can provide a useful explanation for why some people, despite being able to make certain tasks much faster, either don’t feel more productive overall, or actually feel like they don’t save time and it creates more work. It’s hard to untangle which type of slowdown is being argued for at times, but one should be careful not to classify all demands for slowing down as useless Luddite grumblings.6 It might be more useful down the road to check whether you could be eroding your own foundations without a replacement.

What do I do with this?

A systems-thinking approach tends to require a focus on interactions over components. What the model proposed here does is bring a temporal dimension to these interactions. We may see the tasks and activities done during work as components of how we produce software; the synchronization requirements and feedback pathways across these loops, for various people, provide a way to map out where they meet.

Ultimately even the loop model is a crude oversimplification. People are influenced by their context and influence their context back in a continuous manner that isn’t possible to constrain to well-defined tasks and sequences. Reality is messier. This model could be a tool to remind ourselves that no acceleration happens in isolation. Each effort contains the potential for desynchronization, and for a resulting reorganization of related loops. In some ways, the aim is not to find specific issues, but to find potential mismatches in pacing, which suggest challenges in adapting and keeping up.

The analytical stance adopted matters. Seeking to optimize tasks in isolation can sometimes yield positive local results, within a single loop, and occasionally at a wider scale. Looking across loops in all their tangled mess, however, is more likely to let you see what's worth speeding up (or slowing down to speed other parts up!), where pitfalls may lie, and foresee where the needs for adjustments will ripple on and play themselves out. Experimentation and ways to speed things up will always happen and will keep happening, unless something drastically changes in western society; experimenting with a better idea of what to look for in terms of consequences is not a bad idea.


1: While I have not yet published a summary or quotes from it in my notes section, it's definitely one of the books that I knew from the moment I started reading it would have a huge influence on how I frame stuff, and as I promised everyone around me who saw me reading it: I'm gonna be very annoying once I'm done with it. Well, here we are. Grab a copy of Social Acceleration: A New Theory of Modernity. Columbia University Press, 2013.

2: Original report, figure 50 is on p. 75.

3: This example isn't there to imply that the synchronization point is necessary, nor that it is the only one, only that it exists and has an impact. This is based on my experience, but I have also seen multiple synchronization points either in code review or in RFC reviews whenever work crosses silo boundaries across teams and projects become larger in organizational scope.

4: I suspect it can also be seen as a contributor to concepts such as technical debt, which could be framed as a decoupling between validating a solution and engineering its sustainability.

5: I believe this also connects to the Law of Stretched Systems in cognitive systems engineering, and might overall be one of these cases where multiple disciplines find similar but distinct framings for similar phenomena.

6: Since I'm mentioning Luddism, I need to do the mandatory reference to Brian Merchant's Blood in the Machine, which does a good job at reframing Luddism in its historical context as a workers' movement trying to protect their power over their own work at the first moments of the Industrial Revolution. Luddites did not systematically resist or damage all new automation technology, but particularly targeted the factory owners that offered poor working conditions while sparing the others.

Permalink

Erlang/OTP 28.3 Release

OTP 28.3

Erlang/OTP 28.3 is the second maintenance patch package for OTP 28, with mostly bug fixes as well as improvements.

POTENTIAL INCOMPATIBILITIES

  • Adjustment in ssh_file module allowing inclusion of Erlang/OTP license in test files containing keys.

HIGHLIGHTS

ssl

  • Support for the MLKEM hybrid algorithms x25519mlkem768, secp384r1mlkem1024, and secp256r1mlkem768 in TLS-1.3.

ssl, public_key

  • Added support in public_key and ssl for the post-quantum algorithm SLH-DSA.

erts, kernel

  • Support for the socket options TCP_KEEPCNT, TCP_KEEPIDLE, and TCP_KEEPINTVL has been implemented for gen_tcp, as well as TCP_USER_TIMEOUT for both gen_tcp and socket.

OTP

  • Publish OpenVEX statements in https://erlang.org/download/vex/

    OpenVEX statements contain the same information as the OTP advisories, with the addition of vendor CVEs by which Erlang/OTP is not affected. This is important to silence vulnerability scanners that may claim Erlang/OTP is vulnerable through vendor dependency projects, e.g., openssl.

    OpenVEX statements will be published in https://erlang.org/download/vex/ where there will be an OTP file per release, e.g., https://erlang.org/download/vex/otp-28.openvex.json.

    Erlang/OTP publishes OpenVEX statements for all supported releases, that is, as of today, OTP-26, OTP-27, and OTP-28.

    The source SBOM tooling (oss-review-toolkit) has been updated to produce the source SBOM in SPDX v2.3 format, and the source SBOM now links OpenVEX statements to a security external reference. This means that by simply analyzing the source SBOM, anyone can discover the location of the OpenVEX statements and process them further.

For details about bug fixes and potential incompatibilities, see the Erlang 28.3 README.

The Erlang/OTP source can also be found at GitHub on the official Erlang repository, https://github.com/erlang/otp

Download links for this and previous versions are found here:


Lazier Binary Decision Diagrams (BDDs) for set-theoretic types

The Elixir team and the CNRS are working on a set-theoretic type system for Elixir which, simply put, is a type system powered by unions, intersections, and negations. As part of the implementation of said type system, we need an efficient way of representing said operations. This article discusses the existing approaches found in theory and practice, as well as the improvements we have introduced as part of Elixir v1.19.

This article covers the implementation details of the type system. You don’t need to understand these internals to use the type system, just as you don’t need to know virtual machine bytecodes or compiler passes to use a programming language. Our goal is to document our progress and provide guidance for future maintainers and implementers. Let’s get started.

DNFs - Disjunctive Normal Forms

A Disjunctive Normal Form (DNF) is a standardized way of expressing logical formulas using only disjunctions (unions) of conjunctions (intersections). In the context of set-theoretic type systems, DNFs provide a canonical representation for union and intersection types, represented respectively as or and and in Elixir.

In Elixir, we would represent those as lists of lists. Consider a type expression like (A and B) or (C and D). This is already in DNF: it is a union of intersections, and it would be represented as [[A, B], [C, D]]. This means performing unions between two DNFs is a simple list concatenation:

def union(dnf1, dnf2), do: dnf1 ++ dnf2

However, more complex expressions like A and (B or C) need to be converted. Using distributive laws, this becomes (A and B) or (A and C), which is now in DNF. In other words, the intersection of DNFs is a Cartesian product:

def intersection(dnf1, dnf2) do
  for intersections1 <- dnf1,
      intersections2 <- dnf2 do
    intersections1 ++ intersections2
  end
end

The advantage of DNFs is their simple structure. Every type can be represented as unions of intersecting terms, making operations like checking if a type is empty simply a matter of checking whether every intersection in the union contains at least one empty component:

def empty?(dnf) do
  Enum.all?(dnf, fn intersections ->
    Enum.any?(intersections, &empty_component?/1)
  end)
end
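The emptiness check above can be sketched in Python for illustration (the actual implementation is in Elixir; the `empty_dnf` name and the `empty_component` predicate are hypothetical stand-ins):

```python
def empty_dnf(dnf, empty_component):
    # the union is empty iff every conjunction contains at least one
    # component considered empty by the predicate
    return all(any(empty_component(c) for c in conj) for conj in dnf)

# illustrative components: the string "none" stands for an empty type
is_none = lambda c: c == "none"
assert empty_dnf([["none"], ["a", "none"]], is_none)
assert not empty_dnf([["a"], ["none"]], is_none)
```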

On the other hand, the snippets above already help us build an intuition on the drawbacks of DNFs.

First, we have seen how intersections are Cartesian products, which can lead to exponential blow-ups when performing the intersection of unions. For example, (A₁ or A₂) and (B₁ or B₂) and (C₁ or C₂) leads to (A₁ and B₁ and C₁) or (A₁ and B₁ and C₂) or (A₁ and B₂ and C₁) or ..., a union of 8 distinct intersections.
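The blow-up is easy to reproduce. Here is a Python transcription of the Cartesian-product intersection (illustrative only; the real code is Elixir), intersecting three two-element unions:

```python
from itertools import product as cartesian

def intersection(dnf1, dnf2):
    # every conjunction of dnf1 is combined with every conjunction of dnf2
    return [i1 + i2 for i1, i2 in cartesian(dnf1, dnf2)]

dnf = [["A1"], ["A2"]]                    # A1 or A2
for letter in "BC":                       # intersect with (B1 or B2), (C1 or C2)
    dnf = intersection(dnf, [[letter + "1"], [letter + "2"]])

assert len(dnf) == 8                      # 2 * 2 * 2 conjunctions
assert ["A1", "B1", "C1"] in dnf
```

Each additional union doubles the number of conjunctions, hence the exponential growth.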

Furthermore, if we implement unions as simple list concatenations, those unions can end up with duplicated entries, which exacerbates the exponential blow-up when we perform intersections of these unions. This forces us to aggressively remove duplicates in unions, making the union operation more complex and expensive than a simple concatenation.

Despite their limitations, DNFs served us well and were the data structure used as part of Elixir v1.17 and v1.18. However, since Elixir v1.19 introduced type inference of anonymous functions, negations became more prevalent in the type system, making exponential growth more frequent. Let’s understand why.

Inferring anonymous functions

Imagine the following anonymous function:

fn
  %{full_name: full} -> "#{full}"
  %{first_name: first, last_name: last} -> "#{last}, #{first}"
end

We can say the first clause accepts any map with the key full_name. The second clause accepts any map with the keys first_name and last_name which DO NOT have the key full_name (otherwise they would have matched the first clause). Therefore, the inferred type should be:

$ %{full_name: String.Chars.t()} -> String.t()
$ %{first_name: String.Chars.t(), last_name: String.Chars.t()} and not
    %{full_name: String.Chars.t()} -> String.t()

As you can see, in order to express this type, we need a negation (not). Or, more precisely, a difference since A and not B is the same as A - B.

Implementing negations/differences in DNFs is relatively straightforward. Instead of lists of lists, we now use lists of two-element tuples, where the first element is a list of positive types, and the second is a list of negative types. For example, previously we said (A and B) or (C and D) would be represented as [[A, B], [C, D]], now it will be represented as:

[{[A, B], []}, {[C, D], []}]

While (A and not B) or C or D is represented as:

[{[A], [B]}, {[C], []}, {[D], []}]

The difference between two DNFs is implemented similarly to intersections, except we now need to perform the Cartesian product over the positive and negative parts of each conjunction. And given anonymous functions have differences, inferring the types of anonymous functions is now exponentially expensive, which caused some projects to take minutes to compile. Not good!
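For reference, the positive/negative representation and its intersection can be sketched in Python (an illustration, not the Elixir implementation): each conjunction pairs a list of positive types with a list of negated ones, and intersecting concatenates both sides pairwise.

```python
def intersection(dnf1, dnf2):
    # each conjunction is (positives, negatives); intersecting two DNFs
    # takes the Cartesian product and merges both sides
    return [(p1 + p2, n1 + n2) for p1, n1 in dnf1 for p2, n2 in dnf2]

ab_not_c = [(["A", "B"], ["C"])]          # A and B and not C
d_or_e = [(["D"], []), (["E"], [])]       # D or E

result = intersection(ab_not_c, d_or_e)
assert result == [(["A", "B", "D"], ["C"]), (["A", "B", "E"], ["C"])]
```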

BDDs - Binary Decision Diagrams

Luckily, those exact issues are well documented in literature and are addressed by Binary Decision Diagrams (BDDs), introduced by Alain Frisch (2004) and later recalled and expanded by Giuseppe Castagna (2016).

BDDs represent set-theoretic operations as an ordered tree. This requires us to provide an order, any order, across all types. Given all Elixir values have a total order, that’s quite straightforward. Furthermore, by ordering it, we can detect duplicates as we introduce nodes in the tree. The tree can have three distinct node types:

type bdd() = :top or :bottom or {type(), constrained :: bdd(), dual :: bdd()}

:top represents the top type (where the intersection type and :top returns type) and :bottom represents the bottom type (where the intersection type and :bottom returns :bottom). Non-leaf nodes are represented via a three-element tuple, where the first element is the type (what we have been calling A, B… so far), the second element is called in literature the constrained branch, and the third element is the dual branch.

In order to compute the actual type of a non-leaf node, we need to compute (type() and constrained()) or (not type() and dual()) (hence the names constrained and dual). Let’s see some examples.

The type A is represented as {A, :top, :bottom}. This is because, if we compute (A and :top) or (not A and :bottom), we get A or :bottom, which is equivalent to A.

The type not A is represented as {A, :bottom, :top}, and it gives us (A and :bottom) or (not A and :top), which yields :bottom or not A, which is equivalent to not A.

The type A and B, assuming A < B according to a total order, is represented as {A, {B, :top, :bottom}, :bottom}. Expanding it node by node gives us:

(A and ((B and :top) or (not B and :bottom))) or (not A and :bottom)
(A and (B or :bottom)) or (not A and :bottom)
(A and B) or :bottom
(A and B)

While the difference A and not B is represented as {A, {B, :bottom, :top}, :bottom}, which we also expand node by node:

(A and ((B and :bottom) or (not B and :top))) or (not A and :bottom)
(A and (:bottom or not B)) or (not A and :bottom)
(A and not B) or :bottom
(A and not B)

Finally, the union A or B is implemented as {A, :top, {B, :top, :bottom}}. Let’s expand it:

(A and :top) or (not A and ((B and :top) or (not B and :bottom)))
(A and :top) or (not A and (B or :bottom))
A or (not A and B)
(A or not A) and (A or B)
:top and (A or B)
A or B
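To build intuition, the examples above can be checked mechanically. Here is a Python sketch (illustrative only; literal names are strings, nodes are tuples, and the `ev` helper is hypothetical) that evaluates a BDD node as (type and constrained) or (not type and dual) over a truth table:

```python
from itertools import product

TOP, BOT = "top", "bottom"

def ev(bdd, env):
    # (a and constrained) or (not a and dual), i.e. an if-then-else on a
    if bdd == TOP:
        return True
    if bdd == BOT:
        return False
    a, constrained, dual = bdd
    return ev(constrained, env) if env[a] else ev(dual, env)

A         = ("a", TOP, BOT)
NOT_A     = ("a", BOT, TOP)
A_AND_B   = ("a", ("b", TOP, BOT), BOT)
A_NOT_B   = ("a", ("b", BOT, TOP), BOT)
A_OR_B    = ("a", TOP, ("b", TOP, BOT))

for va, vb in product([False, True], repeat=2):
    env = {"a": va, "b": vb}
    assert ev(A, env) == va
    assert ev(NOT_A, env) == (not va)
    assert ev(A_AND_B, env) == (va and vb)
    assert ev(A_NOT_B, env) == (va and not vb)
    assert ev(A_OR_B, env) == (va or vb)
```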

In other words, Binary Decision Diagrams allow us to represent unions, intersections, and differences efficiently, removing the exponential blow-up. Guillaume Duboc implemented them as part of Elixir v1.19, addressing the bottlenecks introduced as part of the new type system features… but unfortunately BDDs introduced new slowdowns.

The issue with BDDs comes when applying unions to intersections and differences. Take the following type (A and B) or C. Since we need to preserve the order A < B < C, it would be represented as:

{A, {B, :top, {C, :top, :bottom}}, {C, :top, :bottom}}

which can be expanded as:

(A and ((B and :top) or (not B and ((C and :top) or (not C and :bottom))))) or (not A and ((C and :top) or (not C and :bottom)))
(A and (B or (not B and C))) or (not A and C)
(A and (B or C)) or (not A and C)
(A and B) or (A and C) or (not A and C)
(A and B) or C

As you can see, although the representation is correct, its expansion ends up generating too many disjunctions. And while we can simplify them back to (A and B) or C symbolically, performing such simplifications in practice is too expensive.

In other words, the BDD expansion grows exponentially in size on consecutive unions, which is particularly troublesome because we must expand the BDD every time we check for emptiness or subtyping.

At the end of the day, it seems we traded faster intersections/differences for slower unions. Perhaps we can have our cake and eat it too?

BDDs with lazy unions (or ternary decision diagrams)

Luckily, the issue above was also foreseen by Alain Frisch (2004), who suggested an additional representation, called BDDs with lazy unions.

In a nutshell, we introduce a new element, called uncertain, to each non-leaf node to represent unions:

type lazy_bdd() = :top or :bottom or
  {type(), constrained :: lazy_bdd(), uncertain :: lazy_bdd(), dual :: lazy_bdd()}

We’ll refer to the uncertain as unions going forward.

The type of each non-leaf node can be computed by (type() and constrained()) or uncertain() or (not type() and dual()). Here are some examples:

A = {A, :top, :bottom, :bottom}
A and B = {A, {B, :top, :bottom, :bottom}, :bottom, :bottom}
A or B = {A, :top, {B, :top, :bottom, :bottom}, :bottom}

And, going back to (A and B) or C, it can be represented as:

{A, {B, :top, :bottom, :bottom}, {C, :top, :bottom, :bottom}, :bottom}

The duplication of C is fully removed. With our new representation in hand, the next step is to implement union, intersection, and difference of lazy BDDs, using the formulas found in literature and described below.

Assuming that a lazy BDD B is represented as {a, C, U, D}, and therefore B1 = {a1, C1, U1, D1} and B2 = {a2, C2, U2, D2}, the union of the lazy BDDs B1 or B2 can be computed as:

{a1, C1 or C2, U1 or U2, D1 or D2} when a1 == a2
{a1, C1, U1 or B2, D1} when a1 < a2
{a2, C2, B1 or U2, D2} when a1 > a2
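The union rules above can be transcribed directly into a Python sketch (illustrative only; the real implementation is Elixir, and the `ev`/`union` names are hypothetical), which also lets us verify that (A and B) or C is stored with C in the uncertain slot, exactly as shown earlier:

```python
from itertools import product

TOP, BOT = "top", "bottom"

def ev(bdd, env):
    # (a and constrained) or uncertain or (not a and dual)
    if bdd == TOP:
        return True
    if bdd == BOT:
        return False
    a, c, u, d = bdd
    return (env[a] and ev(c, env)) or ev(u, env) or ((not env[a]) and ev(d, env))

def union(b1, b2):
    if b1 == TOP or b2 == TOP:
        return TOP
    if b1 == BOT:
        return b2
    if b2 == BOT:
        return b1
    a1, c1, u1, d1 = b1
    a2, c2, u2, d2 = b2
    if a1 == a2:
        return (a1, union(c1, c2), union(u1, u2), union(d1, d2))
    if a1 < a2:                       # B2's literal comes later in the order
        return (a1, c1, union(u1, b2), d1)
    return (a2, c2, union(b1, u2), d2)

A_AND_B = ("a", ("b", TOP, BOT, BOT), BOT, BOT)
C = ("c", TOP, BOT, BOT)

# (A and B) or C: C lands in the uncertain slot, with no duplication
result = union(A_AND_B, C)
assert result == ("a", ("b", TOP, BOT, BOT), ("c", TOP, BOT, BOT), BOT)

for va, vb, vc in product([False, True], repeat=3):
    env = {"a": va, "b": vb, "c": vc}
    assert ev(result, env) == ((va and vb) or vc)
```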

The intersection B1 and B2 is:

{a1, (C1 or U1) and (C2 or U2), :bottom, (D1 or U1) and (D2 or U2)} when a1 == a2
{a1, C1 and B2, U1 and B2, D1 and B2} when a1 < a2
{a2, B1 and C2, B1 and U2, B1 and D2} when a1 > a2

The difference B1 and not B2 is:

{a1, (C1 or U1) and not (C2 or U2), :bottom, (D1 or U1) and not (D2 or U2)} when a1 == a2
{a1, (C1 or U1) and not B2, :bottom, (D1 or U1) and not B2} when a1 < a2
{a2, B1 and not (C2 or U2), :bottom, B1 and not (D2 or U2)} when a1 > a2

Guillaume Duboc first implemented lazy BDDs to represent our function types, addressing some of the bottlenecks introduced alongside BDDs. Afterwards, we attempted to convert all types to use lazy BDDs, hoping they would address the remaining bottlenecks, but that was not the case. There were still some projects that type checked instantaneously in Elixir v1.18 (which used DNFs) but took minutes on v1.19 release candidates, which could only point to large unions still being the root cause. However, weren’t lazy BDDs meant to address the issue with unions?

That was the question ringing in Guillaume’s head and in mine after an hours-long conversation, when we decided to call it a day. Unbeknownst to each other, we both continued working on the problem that night and the following morning. Separately, we were both able to spot the issue and converge on the same solution.

Lazier BDDs (for intersections)

If you carefully look at the formulas above, you can see that intersections and differences of equal nodes cause a distribution of unions. Here is the intersection:

{a1, (C1 or U1) and (C2 or U2), :bottom, (D1 or U1) and (D2 or U2)} when a1 == a2

Notice how U1 and U2 now appear in both the constrained and dual parts, while the uncertain part of the node has disappeared, listed simply as :bottom.

In addition, considering the common case where C1 = C2 = :top and D1 = D2 = :bottom, the node above becomes {a1, :top, :bottom, U1 and U2}, which effectively moves the unions to the dual part. If you pay close attention to it, since the uncertain is now :bottom, we have reverted back to the original BDD representation. Any further union on those nodes will behave exactly as in the non-lazy BDDs, which we know to be problematic.

In other words, certain operations on lazy BDDs cause unions to revert to the previous BDD representation. So it seems lazy BDDs are not lazy enough? Could we stop this from happening?

Guillaume and I arrived at a new formula using different approaches. Given Guillaume’s approach can also be used to optimize differences, that’s the one I will show below. In particular, we know the intersection of equal nodes is implemented as:

{a1, (C1 or U1) and (C2 or U2), :bottom, (D1 or U1) and (D2 or U2)} when a1 == a2

If we distribute the intersection in the constrained part, we get:

(C1 and C2) or (C1 and U2) or (U1 and C2) or (U1 and U2)

If we distribute the intersection in the dual part, we get:

(D1 and D2) or (D1 and U2) or (U1 and D2) or (U1 and U2)

We can clearly see both parts have U1 and U2, which we can then move to the union! Leaving us with:

{a1,
 (C1 and C2) or (C1 and U2) or (U1 and C2),
 (U1 and U2),
 (D1 and D2) or (D1 and U2) or (U1 and D2)} when a1 == a2

We can then factor out C1 in the constrained and D1 in the dual (or C2 and D2 respectively), resulting in:

{a1,
 (C1 and (C2 or U2)) or (U1 and C2),
 (U1 and U2),
 (D1 and (D2 or U2)) or (U1 and D2)} when a1 == a2

While this new formula requires more operations, if we consider the common case C1 = C2 = :top and D1 = D2 = :bottom, we now have {a1, :top, U1 and U2, :bottom}, with the unions perfectly preserved in the middle. We independently implemented this formula and noticed it addressed all remaining bottlenecks!
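One way to convince ourselves the derivation is correct is to treat each part of the node as a boolean variable and exhaustively compare the original formula, the lazier formula, and the expected semantics of B1 and B2. A Python sketch of that check (illustrative only; the `node` helper is hypothetical):

```python
from itertools import product

def node(a, c, u, d):
    # semantics of a lazy BDD node: (a and c) or u or (not a and d)
    return (a and c) or u or ((not a) and d)

# all 2^7 assignments of the literal and the six parts
for a, c1, u1, d1, c2, u2, d2 in product([False, True], repeat=7):
    b1 = node(a, c1, u1, d1)
    b2 = node(a, c2, u2, d2)
    original = node(a, (c1 or u1) and (c2 or u2), False,
                       (d1 or u1) and (d2 or u2))
    lazier = node(a, (c1 and (c2 or u2)) or (u1 and c2),
                     u1 and u2,
                     (d1 and (d2 or u2)) or (u1 and d2))
    assert original == (b1 and b2) == lazier
```

Both formulas agree with the intersection of B1 and B2 on every assignment; the lazier one simply keeps U1 and U2 in the uncertain slot.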

Lazier BDDs (for differences)

The issues we outlined above for intersections are even worse for differences. Let’s check the difference formula:

{a1, (C1 or U1) and not (C2 or U2), :bottom, (D1 or U1) and not (D2 or U2)} when a1 == a2
{a1, (C1 or U1) and not B2, :bottom, (D1 or U1) and not B2} when a1 < a2
{a2, B1 and not (C2 or U2), :bottom, B1 and not (D2 or U2)} when a1 > a2

As you can see, all operations shuffle the union nodes and return :bottom. But this time, we know how to improve it! Let’s start with a1 == a2. If we expand the difference in the constrained part, we get:

(C1 and not C2 and not U2) or (U1 and not C2 and not U2)

If we do the same in the dual part, we have:

(D1 and not D2 and not U2) or (U1 and not D2 and not U2)

Unfortunately, there are no shared union terms between the constrained and dual parts, unless C2 and D2 are :bottom. Therefore, instead of fully rewriting the difference of equal nodes, we add the following special case:

{a1, C1 and not U2, U1 and not U2, D1 and not U2}
when a1 == a2 and C2 == :bottom and D2 == :bottom

We can apply a similar optimization when a1 < a2. The current formula:

{a1, (C1 or U1) and not B2, :bottom, (D1 or U1) and not B2} when a1 < a2

The constrained part can be written as (C1 and not B2) or (U1 and not B2) and the dual part as (D1 and not B2) or (U1 and not B2). Given (U1 and not B2) is shared on both parts, we can also convert it to a union, resulting in:

{a1, C1 and not B2, U1 and not B2, D1 and not B2} when a1 < a2

Unfortunately, we can’t apply this when a1 > a2, as differences are asymmetric and do not distribute over unions on the right side. Therefore, the updated formula for difference is:

{a1, C1 and not U2, U1 and not U2, D1 and not U2} when a1 == a2 and C2 == :bottom and D2 == :bottom
{a1, (C1 or U1) and not (C2 or U2), :bottom, (D1 or U1) and not (D2 or U2)} when a1 == a2
{a1, C1 and not B2, U1 and not B2, D1 and not B2} when a1 < a2
{a2, B1 and not (C2 or U2), :bottom, B1 and not (D2 or U2)} when a1 > a2
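The two new difference cases can be validated with the same exhaustive boolean check as before (a Python sketch with a hypothetical `node` helper; for a1 < a2 we may treat B2 as an independent boolean, since the literal a1 does not occur in it):

```python
from itertools import product

def node(a, c, u, d):
    # semantics of a lazy BDD node: (a and c) or u or (not a and d)
    return (a and c) or u or ((not a) and d)

# case a1 == a2 with C2 == :bottom and D2 == :bottom, so B2 reduces to U2
for a, c1, u1, d1, u2 in product([False, True], repeat=5):
    b1 = node(a, c1, u1, d1)
    b2 = node(a, False, u2, False)
    special = node(a, c1 and not u2, u1 and not u2, d1 and not u2)
    assert special == (b1 and not b2)

# case a1 < a2: B2's truth value is independent of the literal a1
for a, c1, u1, d1, b2 in product([False, True], repeat=5):
    b1 = node(a, c1, u1, d1)
    lazier = node(a, c1 and not b2, u1 and not b2, d1 and not b2)
    assert lazier == (b1 and not b2)
```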

With these new formulas, all new typing features in Elixir v1.19 perform efficiently and most projects now type check faster than in Elixir v1.18. You can derive similar properties for the cases where a1 == a2 and either U1 == :bottom or U1 == U2. Give it a try!

Acknowledgements

As there is an increasing interest in implementing set-theoretic types for other dynamic languages, we hope this article shines a brief light on the journey and advancements made by the research and Elixir teams when it comes to representing set-theoretic types.

The type system was made possible thanks to a partnership between CNRS and Remote. The development work is currently sponsored by Fresha and Tidewave.


Copyright © 2016, Planet Erlang. No rights reserved.
Planet Erlang is maintained by Proctor.