<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Oleg's gists</title>
    <link href="https://oleg.fi/gists/atom.xml" rel="self" />
    <link href="https://oleg.fi/gists" />
    <id>https://oleg.fi/gists/atom.xml</id>
    <author>
        <name>Oleg Grenrus</name>
        <email>oleg.grenrus@iki.fi</email>
    </author>
    <updated>2025-02-13T00:00:00Z</updated>
    <entry>
    <title>PHOAS to de Bruijn conversion</title>
    <link href="https://oleg.fi/gists/posts/2025-02-13-phoas-to-db.html" />
    <id>https://oleg.fi/gists/posts/2025-02-13-phoas-to-db.html</id>
    <published>2025-02-13T00:00:00Z</published>
    <updated>2025-02-13T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2025-02-13
    
        by Oleg Grenrus
    

    <a title="All pages tagged &#39;agda&#39;." href="/tags/agda.html">agda</a>
</div>

<p>Recently I looked again at PHOAS, and once again I concluded it's nice for library APIs, but so painful to do anything with inside those libraries. So let convert to something else, like de Bruijn.</p>
<p>There are standalone source files if you just want to see the code:</p>
<ul>
<li>Agda: <a href="https://github.com/phadej/nbexp/blob/master/src/NbEXP/SelfContained/Conv.agda" class="uri">https://github.com/phadej/nbexp/blob/master/src/NbEXP/SelfContained/Conv.agda</a></li>
<li>Haskell: <a href="https://github.com/phadej/nbexp/blob/master/src-hs/Conv.hs" class="uri">https://github.com/phadej/nbexp/blob/master/src-hs/Conv.hs</a></li>
</ul>
<p>How to convert PHOAS terms to de Bruijn terms?</p>
<p>The solution is hard to find.</p>
<p>You can cheat, [as mentioned by Roman on Agda mailing list <a href="https://lists.chalmers.se/pipermail/agda/2018/010033.html" class="uri">https://lists.chalmers.se/pipermail/agda/2018/010033.html</a>]:</p>
<blockquote>
There is always a way to cheat, though. You can turn the PHOAS ->
untyped de Bruijn machinery into the PHOAS -> typed de Bruijn
machinery by checking that future contexts indeed extend past contexts
and throwing an error otherwise (which can't happed, because future
contexts always extend past contexts, but it's a metatheorem).
</blockquote>
<p>In "Generic Conversions of Abstract Syntax Representation" by Steven Keuchel and Johan Jeuring, authors also "cheat" a bit. The "Parametrhic higher-order abstract syntax" section ends with a somewhat disappointing</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a>  <span class="kw">where</span> <span class="kw">postulate</span> whatever <span class="ot">:</span> <span class="ot">_</span></span></code></pre></div>
<p>Keuchel and Jeuring also mention "Unembedding Domain-Specific Languages" by Robert Atkey, Sam Lindley and Jeremy Yallop; where there is one unsatisfactory ⊥ (undefined in Haskell) hiding.</p>
<p>I think that for practical developments (say a library in Haskell), it is ok to make a small short cut; but I kept wondering isn't there is a way to make a conversion without cheating.</p>
<p>Well... it turns out that we cannot "cheat". Well-formedness of PHOAS representation depends on parametricity, and the conversion challenge seems to requires a theorem which there are no proof in Agda.</p>
<p>In unpublished (?) work Adam Chlipala shows a way to do the conversion without relying on postulates <a href="http://adam.chlipala.net/cpdt/html/Intensional.html" class="uri">http://adam.chlipala.net/cpdt/html/Intensional.html</a>; but that procedure requires an extra well formedness proof of given PHOAS term.</p>
<p>This Agda development is a translation of that developement.</p>
<div id="toc"></div>
<h2 id="common-setup">Common setup</h2>
<p>Our syntax representations will be well-typed, so we need types:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a><span class="co">-- Types</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true"></a><span class="kw">data</span> Ty <span class="ot">:</span> <span class="dt">Set</span> <span class="kw">where</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true"></a>  emp <span class="ot">:</span> Ty</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true"></a>  fun <span class="ot">:</span> Ty <span class="ot">→</span> Ty <span class="ot">→</span> Ty</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true"></a>Ctx <span class="ot">:</span> <span class="dt">Set</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true"></a>Ctx <span class="ot">=</span> List Ty</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true"></a></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true"></a><span class="kw">variable</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true"></a>  A B C <span class="ot">:</span> Ty</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true"></a>  Γ Δ Ω <span class="ot">:</span> Ctx</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true"></a>  v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span></span></code></pre></div>
<h2 id="de-bruijn-syntax">de Bruijn syntax</h2>
<div class="sourceCode" id="cb3"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a>Var <span class="ot">:</span> Ctx <span class="ot">→</span> Ty <span class="ot">→</span> <span class="dt">Set</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true"></a>Var Γ A <span class="ot">=</span> Idx A Γ <span class="co">-- from agda-np, essentially membership relation.</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true"></a></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true"></a><span class="kw">data</span> DB <span class="ot">(</span>Γ <span class="ot">:</span> Ctx<span class="ot">)</span> <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span> <span class="kw">where</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true"></a>  var <span class="ot">:</span> Var Γ A <span class="ot">→</span> DB Γ A</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true"></a>  app <span class="ot">:</span> DB Γ <span class="ot">(</span>fun A B<span class="ot">)</span> <span class="ot">→</span> DB Γ A <span class="ot">→</span> DB Γ B</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true"></a>  lam <span class="ot">:</span> DB <span class="ot">(</span>A ∷ Γ<span class="ot">)</span> B <span class="ot">→</span> DB Γ <span class="ot">(</span>fun A B<span class="ot">)</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true"></a>  abs <span class="ot">:</span> DB Γ emp <span class="ot">→</span> DB Γ A</span></code></pre></div>
<h2 id="parametric-higher-order-abstract-syntax">Parametric Higher-order abstract syntax</h2>
<div class="sourceCode" id="cb4"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a><span class="kw">data</span> PHOAS <span class="ot">(</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">)</span> <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span> <span class="kw">where</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a>  var <span class="ot">:</span> v A <span class="ot">→</span> PHOAS v A</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true"></a>  app <span class="ot">:</span> PHOAS v <span class="ot">(</span>fun A B<span class="ot">)</span> <span class="ot">→</span> PHOAS v A <span class="ot">→</span> PHOAS v B</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true"></a>  lam <span class="ot">:</span> <span class="ot">(</span>v A <span class="ot">→</span> PHOAS v B<span class="ot">)</span> <span class="ot">→</span> PHOAS v <span class="ot">(</span>fun A B<span class="ot">)</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true"></a>  abs <span class="ot">:</span> PHOAS v emp <span class="ot">→</span> PHOAS v A</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true"></a></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true"></a><span class="co">-- closed &quot;true&quot; PHOAS terms.</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true"></a>PHOAS° <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set₁</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true"></a>PHOAS° A <span class="ot">=</span> <span class="ot">∀</span> <span class="ot">{</span>v<span class="ot">}</span> <span class="ot">→</span> PHOAS v A</span></code></pre></div>
<h2 id="de-bruijn-to-phoas">de Bruijn to PHOAS</h2>
<p>This direction is trivial. An anecdotal evidence that de Bruijn representation is easier to transformation on.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a>phoasify <span class="ot">:</span> NP v Γ <span class="ot">→</span> DB Γ A <span class="ot">→</span> PHOAS v A</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true"></a>phoasify γ <span class="ot">(</span>var x<span class="ot">)</span>   <span class="ot">=</span> var <span class="ot">(</span>lookup γ x<span class="ot">)</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true"></a>phoasify γ <span class="ot">(</span>app f t<span class="ot">)</span> <span class="ot">=</span> app <span class="ot">(</span>phoasify γ f<span class="ot">)</span> <span class="ot">(</span>phoasify γ t<span class="ot">)</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true"></a>phoasify γ <span class="ot">(</span>lam t<span class="ot">)</span>   <span class="ot">=</span> lam <span class="ot">λ</span> x <span class="ot">→</span> phoasify <span class="ot">(</span>x ∷ γ<span class="ot">)</span> t</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true"></a>phoasify γ <span class="ot">(</span>abs t<span class="ot">)</span>   <span class="ot">=</span> abs <span class="ot">(</span>phoasify γ t<span class="ot">)</span></span></code></pre></div>
<h2 id="interlude-well-formedness-of-phoas-terms">Interlude: Well-formedness of PHOAS terms</h2>
<p>dam Chlipala defines an equivalence relation between two PHOAS terms, <code>exp_equiv</code> in <code>Intensional</code>, <code>wf</code> in CPDT book). e only need a single term well-formedness so can do a little less</p>
<p>The goal is to rule out standalone terms like</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a><span class="kw">module</span> Invalid <span class="kw">where</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true"></a>  <span class="kw">open</span> <span class="kw">import</span> Data<span class="ot">.</span>Unit <span class="kw">using</span> <span class="ot">(</span>⊤<span class="ot">;</span> tt<span class="ot">)</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true"></a>  invalid <span class="ot">:</span> PHOAS <span class="ot">(λ</span> <span class="ot">_</span> <span class="ot">→</span> ⊤<span class="ot">)</span> emp</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true"></a>  invalid <span class="ot">=</span> var tt</span></code></pre></div>
<p>Terms like <code>invalid</code> cannot be values of <code>PHOAS°</code>, as all values of "<code>v</code>" inside <code>PHOAS°</code> have to originated from <code>lam</code>-constructor abstractions. We really should keep <code>v</code> parameter free, i.e. parametric, when <em>constructing</em> <code>PHOAS</code> terms.</p>
<p>The idea is then to simply to track which variables (values of <code>v</code>) are intoduced by lambda abstraction.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a><span class="kw">data</span> phoasWf {v <span class="op">:</span> <span class="dt">Ty</span> <span class="ot">→</span> <span class="dt">Set</span>} (<span class="dt">G</span> <span class="op">:</span> <span class="dt">List</span> (Σ <span class="dt">Ty</span> v)) <span class="op">:</span> {<span class="dt">A</span> <span class="op">:</span> <span class="dt">Ty</span>} <span class="ot">→</span> <span class="dt">PHOAS</span> v <span class="dt">A</span> <span class="ot">→</span> <span class="dt">Set</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true"></a> <span class="kw">where</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true"></a>  varWf <span class="op">:</span> <span class="ot">∀</span> {<span class="dt">A</span>} {x <span class="op">:</span> v <span class="dt">A</span>}</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true"></a>    <span class="ot">→</span> <span class="dt">Idx</span> (<span class="dt">A</span> , x) <span class="dt">G</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true"></a>    <span class="ot">→</span> phoasWf <span class="dt">G</span> (var x)</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true"></a>  appWf <span class="op">:</span> <span class="ot">∀</span> {<span class="dt">A</span> <span class="dt">B</span>} {f <span class="op">:</span> <span class="dt">PHOAS</span> v (fun <span class="dt">A</span> <span class="dt">B</span>)} {t <span class="op">:</span> <span class="dt">PHOAS</span> v <span class="dt">A</span>}</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true"></a>    <span class="ot">→</span> phoasWf <span class="dt">G</span> f</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true"></a>    <span class="ot">→</span> phoasWf <span class="dt">G</span> t</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true"></a>    <span class="ot">→</span> phoasWf <span class="dt">G</span> (app f t)</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true"></a>  lamWf <span class="op">:</span> <span class="ot">∀</span> {<span class="dt">A</span> <span class="dt">B</span>} {f <span class="op">:</span> v <span class="dt">A</span> <span class="ot">→</span> <span class="dt">PHOAS</span> v <span class="dt">B</span>}</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true"></a>    <span class="ot">→</span> (<span class="ot">∀</span> (x <span class="op">:</span> v <span class="dt">A</span>) <span class="ot">→</span> phoasWf ((<span class="dt">A</span> , x) <span class="ot">∷</span> <span class="dt">G</span>) (f x))</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true"></a>    <span class="ot">→</span> phoasWf <span class="dt">G</span> (lam f)</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true"></a>  absWf <span class="op">:</span> <span class="ot">∀</span> {<span class="dt">A</span>} {t <span class="op">:</span> <span class="dt">PHOAS</span> v emp}</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true"></a>    <span class="ot">→</span> phoasWf <span class="dt">G</span> t</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true"></a>    <span class="ot">→</span> phoasWf <span class="dt">G</span> (<span class="fu">abs</span> {<span class="dt">A</span> <span class="ot">=</span> <span class="dt">A</span>} t)</span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true"></a></span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true"></a><span class="co">-- closed terms start with an empty G</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true"></a>phoasWf° <span class="op">:</span> <span class="dt">PHOAS</span>° <span class="dt">A</span> <span class="ot">→</span> <span class="dt">Set</span>₁</span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true"></a>phoasWf° tm <span class="ot">=</span> <span class="ot">∀</span> {v} <span class="ot">→</span> phoasWf {v <span class="ot">=</span> v} [] tm</span></code></pre></div>
<p>A meta theorem is then that all PHOASᵒ terms are well-formed, i.e.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a>meta-theorem-proposition <span class="ot">:</span> <span class="dt">Set₁</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true"></a>meta-theorem-proposition <span class="ot">=</span> <span class="ot">∀</span> <span class="ot">{</span>A<span class="ot">}</span> <span class="ot">(</span>t <span class="ot">:</span> PHOAS° A<span class="ot">)</span> <span class="ot">→</span> phoasWf° t</span></code></pre></div>
<p>As far as I'm aware this proposition cannot be proved nor refuted in Agda.</p>
<h2 id="de-bruijn-to-phoas-translation-creates-well-formed-phoas-terms">de Bruijn to PHOAS translation creates well-formed PHOAS terms.</h2>
<p>As a small exercise we can show that <code>phoasify</code> of closed de Bruijn terms creates well-formed PHOAS terms.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true"></a>toList <span class="ot">:</span> NP v Γ <span class="ot">→</span> List <span class="ot">(</span>Σ Ty v<span class="ot">)</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true"></a>toList []       <span class="ot">=</span> []</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true"></a>toList <span class="ot">(</span>x ∷ xs<span class="ot">)</span> <span class="ot">=</span> <span class="ot">(_</span> , x<span class="ot">)</span> ∷ toList xs</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true"></a>phoasifyWfVar <span class="ot">:</span> <span class="ot">(</span>γ <span class="ot">:</span> NP v Γ<span class="ot">)</span> <span class="ot">(</span>x <span class="ot">:</span> Var Γ A<span class="ot">)</span> <span class="ot">→</span> Idx <span class="ot">(</span>A , lookup γ x<span class="ot">)</span> <span class="ot">(</span>toList γ<span class="ot">)</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true"></a>phoasifyWfVar <span class="ot">(</span>x ∷ γ<span class="ot">)</span> zero    <span class="ot">=</span> zero</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true"></a>phoasifyWfVar <span class="ot">(</span>x ∷ γ<span class="ot">)</span> <span class="ot">(</span>suc i<span class="ot">)</span> <span class="ot">=</span> suc <span class="ot">(</span>phoasifyWfVar γ i<span class="ot">)</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true"></a></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true"></a>phoasifyWf <span class="ot">:</span> <span class="ot">(</span>γ <span class="ot">:</span> NP v Γ<span class="ot">)</span> <span class="ot">(</span>t <span class="ot">:</span> DB Γ A<span class="ot">)</span> <span class="ot">→</span> phoasWf <span class="ot">(</span>toList γ<span class="ot">)</span> <span class="ot">(</span>phoasify γ t<span class="ot">)</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true"></a>phoasifyWf γ <span class="ot">(</span>var x<span class="ot">)</span>   <span class="ot">=</span> varWf <span class="ot">(</span>phoasifyWfVar γ x<span class="ot">)</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true"></a>phoasifyWf γ <span class="ot">(</span>app f t<span class="ot">)</span> <span class="ot">=</span> appWf <span class="ot">(</span>phoasifyWf γ f<span class="ot">)</span> <span class="ot">(</span>phoasifyWf γ t<span class="ot">)</span></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true"></a>phoasifyWf γ <span class="ot">(</span>lam t<span class="ot">)</span>   <span class="ot">=</span> lamWf <span class="ot">λ</span> x <span class="ot">→</span> phoasifyWf <span class="ot">(</span>x ∷ γ<span class="ot">)</span> t</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true"></a>phoasifyWf γ <span class="ot">(</span>abs t<span class="ot">)</span>   <span class="ot">=</span> absWf <span class="ot">(</span>phoasifyWf γ t<span class="ot">)</span></span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true"></a></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true"></a>phoasifyWf° <span class="ot">:</span> <span class="ot">(</span>t <span class="ot">:</span> DB [] A<span class="ot">)</span> <span class="ot">→</span> phoasWf° <span class="ot">(</span>phoasify [] t<span class="ot">)</span></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true"></a>phoasifyWf° t <span class="ot">=</span> phoasifyWf [] t</span></code></pre></div>
<h2 id="phoas-to-de-bruijn">PHOAS to de Bruijn</h2>
<p>The rest deals with the opposite direction.</p>
<p>In Intensional Adam Chlipala uses <code>v = λ _ → ℕ</code> instatiation to make the translation.</p>
<p>I think that in the typed setting using <code>v = λ _ → Ctx</code> turns out nicer.</p>
<p>The idea in both is that we instantiate PHOAS variables to be de Bruijn levels.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true"></a><span class="kw">data</span> IsSuffixOf <span class="ot">{</span>ℓ<span class="ot">}</span> <span class="ot">{</span>a <span class="ot">:</span> <span class="dt">Set</span> ℓ<span class="ot">}</span> <span class="ot">:</span> List a <span class="ot">→</span> List a <span class="ot">→</span> <span class="dt">Set</span> ℓ <span class="kw">where</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true"></a>  refl <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>xs<span class="ot">}</span> <span class="ot">→</span> IsSuffixOf xs xs</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true"></a>  cons <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>xs ys<span class="ot">}</span> <span class="ot">→</span> IsSuffixOf xs ys <span class="ot">→</span> <span class="ot">∀</span> <span class="ot">{</span>y<span class="ot">}</span> <span class="ot">→</span> IsSuffixOf xs <span class="ot">(</span>y ∷ ys<span class="ot">)</span></span></code></pre></div>
<p>We need to establish well-formedness of PHOAS expression in relation to some context <code>Γ</code></p>
<p>Note that variables encode de Bruijn levels, thus the contexts we "remember" in variables should be the suffix of that outside context.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true"></a>wf <span class="ot">:</span> <span class="ot">(</span>Γ <span class="ot">:</span> Ctx<span class="ot">)</span> <span class="ot">→</span> PHOAS <span class="ot">(λ</span> <span class="ot">_</span> <span class="ot">→</span> Ctx<span class="ot">)</span> A <span class="ot">→</span> <span class="dt">Set</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true"></a>wf <span class="ot">{</span>A <span class="ot">=</span> A<span class="ot">}</span> Γ <span class="ot">(</span>var Δ<span class="ot">)</span>         <span class="ot">=</span> IsSuffixOf <span class="ot">(</span>A ∷ Δ<span class="ot">)</span> Γ</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true"></a>wf         Γ <span class="ot">(</span>app f t<span class="ot">)</span>       <span class="ot">=</span> wf Γ f × wf Γ t</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true"></a>wf         Γ <span class="ot">(</span>lam <span class="ot">{</span>A <span class="ot">=</span> A<span class="ot">}</span> t<span class="ot">)</span> <span class="ot">=</span> wf <span class="ot">(</span>A ∷ Γ<span class="ot">)</span> <span class="ot">(</span>t Γ<span class="ot">)</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true"></a>wf         Γ <span class="ot">(</span>abs t<span class="ot">)</span>         <span class="ot">=</span> wf Γ t</span></code></pre></div>
<p>And if <code>(A ∷ Δ)</code> is suffix of context <code>Γ</code>, we can convert the evidence to the de Bruijn index (i.e. variable):</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true"></a>makeVar <span class="ot">:</span> IsSuffixOf <span class="ot">(</span>A ∷ Δ<span class="ot">)</span> Γ <span class="ot">→</span> Var Γ A</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true"></a>makeVar refl     <span class="ot">=</span> zero</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true"></a>makeVar <span class="ot">(</span>cons s<span class="ot">)</span> <span class="ot">=</span> suc <span class="ot">(</span>makeVar s<span class="ot">)</span></span></code></pre></div>
<p>Given the term is well-formed in relation to context Γ we can convert it to de Bruijn representation.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true"></a>dbify <span class="ot">:</span> <span class="ot">(</span>t <span class="ot">:</span> PHOAS <span class="ot">(λ</span> <span class="ot">_</span> <span class="ot">→</span> Ctx<span class="ot">)</span> A<span class="ot">)</span> <span class="ot">→</span> wf Γ t <span class="ot">→</span> DB Γ A</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true"></a>dbify         <span class="ot">(</span>var x<span class="ot">)</span>   wf        <span class="ot">=</span> var <span class="ot">(</span>makeVar wf<span class="ot">)</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true"></a>dbify         <span class="ot">(</span>app f t<span class="ot">)</span> <span class="ot">(</span>fʷ , tʷ<span class="ot">)</span> <span class="ot">=</span> app <span class="ot">(</span>dbify f fʷ<span class="ot">)</span> <span class="ot">(</span>dbify t tʷ<span class="ot">)</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true"></a>dbify <span class="ot">{</span>Γ <span class="ot">=</span> Γ<span class="ot">}</span> <span class="ot">(</span>lam t<span class="ot">)</span>   wf        <span class="ot">=</span> lam <span class="ot">(</span>dbify <span class="ot">(</span>t Γ<span class="ot">)</span> wf<span class="ot">)</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true"></a>dbify         <span class="ot">(</span>abs t<span class="ot">)</span>   wf        <span class="ot">=</span> abs <span class="ot">(</span>dbify t wf<span class="ot">)</span></span></code></pre></div>
<p>What is left is to show that we can construct <code>wf</code> for all <code>phoasWf</code>-well-formed terms.</p>
<p>Adam Chlipala defines a helper function:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true"></a>makeG′ <span class="ot">:</span> Ctx <span class="ot">→</span> List <span class="ot">(</span>Σ Ty <span class="ot">(λ</span> <span class="ot">_</span> <span class="ot">→</span> Ctx<span class="ot">))</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true"></a>makeG′ [] <span class="ot">=</span> []</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true"></a>makeG′ <span class="ot">(</span>A ∷ Γ<span class="ot">)</span> <span class="ot">=</span> <span class="ot">(</span>A , Γ<span class="ot">)</span> ∷ makeG′ Γ</span></code></pre></div>
<p>However for somewhat technical reasons, we rather define</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true"></a>expand <span class="ot">:</span> <span class="ot">(</span>Γ <span class="ot">:</span> Ctx<span class="ot">)</span> <span class="ot">→</span> NP <span class="ot">(λ</span> <span class="ot">_</span> <span class="ot">→</span> Ctx<span class="ot">)</span> Γ</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true"></a>expand []      <span class="ot">=</span> []</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true"></a>expand <span class="ot">(_</span> ∷ Γ<span class="ot">)</span> <span class="ot">=</span> Γ ∷ expand Γ</span></code></pre></div>
<p>and use <code>expand</code> with previously defined <code>toList</code> to define our version of <code>makeG</code>:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true"></a>makeG <span class="ot">:</span> Ctx <span class="ot">→</span> List <span class="ot">(</span>Σ Ty <span class="ot">(λ</span> <span class="ot">_</span> <span class="ot">→</span> Ctx<span class="ot">))</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true"></a>makeG Γ <span class="ot">=</span> toList <span class="ot">(</span>expand Γ<span class="ot">)</span></span></code></pre></div>
<p><code>makeG</code> and <code>makeG′</code> are the same:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true"></a>toList∘expand≡makeG <span class="ot">:</span> <span class="ot">∀</span> Γ <span class="ot">→</span> makeG Γ ≡ makeG′ Γ</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true"></a>toList∘expand≡makeG []      <span class="ot">=</span> refl</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true"></a>toList∘expand≡makeG <span class="ot">(</span>A ∷ Γ<span class="ot">)</span> <span class="ot">=</span> cong <span class="ot">((</span>A , Γ<span class="ot">)</span> ∷<span class="ot">_)</span> <span class="ot">(</span>toList∘expand≡makeG Γ<span class="ot">)</span></span></code></pre></div>
<p>Then we can construct <code>wf</code> for all <code>phoasWf</code>:</p>
<pre class="agda`"><code>wfWfVar : Idx (A , Δ) (makeG Γ) → IsSuffixOf (A ∷ Δ) Γ
wfWfVar {Γ = B ∷ Γ} zero    = refl
wfWfVar {Γ = B ∷ Γ} (suc i) = cons (wfWfVar i)

wfWf : (t : PHOAS (λ _ → Ctx) A) → phoasWf (makeG Γ) t → wf Γ t
wfWf         (var x)   (varWf xʷ)    = wfWfVar xʷ
wfWf         (app f t) (appWf fʷ tʷ) = wfWf f fʷ , wfWf t tʷ
wfWf {Γ = Γ} (lam f)   (lamWf fʷ)    = wfWf (f Γ) (fʷ Γ)
wfWf         (abs t)   (absWf tʷ)    = wfWf t tʷ</code></pre>
<p>And finally we define <code>dbifyᵒ</code> for all well-formed <code>PHOASᵒ</code> terms.</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true"></a>dbify° <span class="ot">:</span> <span class="ot">(</span>t <span class="ot">:</span> PHOAS° A<span class="ot">)</span> <span class="ot">→</span> phoasWf° t <span class="ot">→</span> DB [] A</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true"></a>dbify° t w <span class="ot">=</span> dbify t <span class="ot">(</span>wfWf t w<span class="ot">)</span></span></code></pre></div>
<h2 id="bonus-section">Bonus section</h2>
<p>We can show that converting closed de Bruijn term to PHOAS and back is an identity function:</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true"></a>bonus<span class="op">-</span>var <span class="op">:</span> (x <span class="op">:</span> <span class="dt">Var</span> Γ <span class="dt">A</span>) <span class="ot">→</span> x ≡ makeVar (wfWfVar (phoasifyWfVar (expand Γ) x))</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true"></a>bonus<span class="op">-</span>var {Γ <span class="ot">=</span> <span class="dt">A</span> <span class="ot">∷</span> Γ} zero    <span class="ot">=</span> refl</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true"></a>bonus<span class="op">-</span>var {Γ <span class="ot">=</span> <span class="dt">A</span> <span class="ot">∷</span> Γ} (suc i) <span class="ot">=</span> cong suc (bonus<span class="op">-</span>var i)</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true"></a></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true"></a>bonus <span class="op">:</span> (t <span class="op">:</span> <span class="dt">DB</span> Γ <span class="dt">A</span>)</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true"></a>      <span class="ot">→</span> t ≡ dbify (phoasify (expand Γ) t)</span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true"></a>              (wfWf (phoasify (expand Γ) t) (phoasifyWf _ t))</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true"></a>bonus (var x)   <span class="ot">=</span> cong var (bonus<span class="op">-</span>var x)</span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true"></a>bonus (app f t) <span class="ot">=</span> cong₂ app (bonus f) (bonus t)</span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true"></a>bonus (lam t)   <span class="ot">=</span> cong lam (bonus t)</span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true"></a>bonus (<span class="fu">abs</span> t)   <span class="ot">=</span> cong <span class="fu">abs</span> (bonus t)</span>
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true"></a></span>
<span id="cb20-13"><a href="#cb20-13" aria-hidden="true"></a>bonus° <span class="op">:</span> <span class="ot">∀</span> (t <span class="op">:</span> <span class="dt">DB</span> [] <span class="dt">A</span>) <span class="ot">→</span> t ≡ dbify° (phoasify [] t) (phoasifyWf° t)</span>
<span id="cb20-14"><a href="#cb20-14" aria-hidden="true"></a>bonus° t <span class="ot">=</span> bonus t</span></code></pre></div>
]]></summary>
</entry>
<entry>
    <title>NbE PHOAS</title>
    <link href="https://oleg.fi/gists/posts/2025-02-11-nbe-phoas.html" />
    <id>https://oleg.fi/gists/posts/2025-02-11-nbe-phoas.html</id>
    <published>2025-02-11T00:00:00Z</published>
    <updated>2025-02-11T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2025-02-11
    
        by Oleg Grenrus
    

    <a title="All pages tagged &#39;agda&#39;." href="/tags/agda.html">agda</a>
</div>

<p>Normalization by evaluation using parametric higher order syntax. In Agda.</p>
<p>I couldn't find a self-contained example of PHOAS NbE, so here it is. I hope someone might find it useful.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="kw">module</span> NbEXP<span class="ot">.</span>PHOAS <span class="kw">where</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true"></a><span class="kw">data</span> Ty <span class="ot">:</span> <span class="dt">Set</span> <span class="kw">where</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true"></a>  emp <span class="ot">:</span> Ty</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true"></a>  fun <span class="ot">:</span> Ty <span class="ot">→</span> Ty <span class="ot">→</span> Ty</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true"></a><span class="kw">data</span> Tm <span class="ot">(</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">)</span> <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span> <span class="kw">where</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true"></a>  var <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>a<span class="ot">}</span> <span class="ot">→</span> v a <span class="ot">→</span> Tm v a</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true"></a>  app <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>a b<span class="ot">}</span> <span class="ot">→</span> Tm v <span class="ot">(</span>fun a b<span class="ot">)</span> <span class="ot">→</span> Tm v a <span class="ot">→</span> Tm v b</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true"></a>  lam <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>a b<span class="ot">}</span> <span class="ot">→</span> <span class="ot">(</span>v a <span class="ot">→</span> Tm v b<span class="ot">)</span> <span class="ot">→</span> Tm v <span class="ot">(</span>fun a b<span class="ot">)</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true"></a></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true"></a><span class="kw">data</span> Nf <span class="ot">(</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">)</span> <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true"></a><span class="kw">data</span> Ne <span class="ot">(</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">)</span> <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true"></a><span class="kw">data</span> Ne v <span class="kw">where</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true"></a>  nvar <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>a<span class="ot">}</span> <span class="ot">→</span> v a <span class="ot">→</span> Ne v a</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true"></a>  napp <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>a b<span class="ot">}</span> <span class="ot">→</span> Ne v <span class="ot">(</span>fun a b<span class="ot">)</span> <span class="ot">→</span> Nf v a <span class="ot">→</span> Ne v b</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true"></a></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true"></a><span class="kw">data</span> Nf v <span class="kw">where</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true"></a>  neut <span class="ot">:</span> Ne v emp <span class="ot">→</span> Nf v emp</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true"></a>  nlam <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>a b<span class="ot">}</span> <span class="ot">→</span> <span class="ot">(</span>v a <span class="ot">→</span> Nf v b<span class="ot">)</span> <span class="ot">→</span> Nf v <span class="ot">(</span>fun a b<span class="ot">)</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true"></a></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true"></a>Sem <span class="ot">:</span> <span class="ot">(</span>Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">)</span> <span class="ot">→</span> Ty <span class="ot">→</span> <span class="dt">Set</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true"></a>Sem v emp       <span class="ot">=</span> Ne v emp</span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true"></a>Sem v <span class="ot">(</span>fun a b<span class="ot">)</span> <span class="ot">=</span> Sem v a <span class="ot">→</span> Sem v b</span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true"></a></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true"></a>lower <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">(</span>a <span class="ot">:</span> Ty<span class="ot">)</span> <span class="ot">→</span> Sem v a <span class="ot">→</span> Nf v a</span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true"></a>raise <span class="ot">:</span> <span class="ot">∀</span> <span class="ot">{</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">(</span>a <span class="ot">:</span> Ty<span class="ot">)</span> <span class="ot">→</span> Ne v a <span class="ot">→</span> Sem v a</span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true"></a></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true"></a>lower emp       s <span class="ot">=</span> neut s</span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true"></a>lower <span class="ot">(</span>fun a b<span class="ot">)</span> s <span class="ot">=</span> nlam <span class="ot">λ</span> x <span class="ot">→</span> lower b <span class="ot">(</span>s <span class="ot">(</span>raise a <span class="ot">(</span>nvar x<span class="ot">)))</span></span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true"></a></span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true"></a>raise emp       n   <span class="ot">=</span> n</span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true"></a>raise <span class="ot">(</span>fun a b<span class="ot">)</span> n x <span class="ot">=</span> raise b <span class="ot">(</span>napp n <span class="ot">(</span>lower a x <span class="ot">))</span></span>
<span id="cb1-35"><a href="#cb1-35" aria-hidden="true"></a></span>
<span id="cb1-36"><a href="#cb1-36" aria-hidden="true"></a>eval <span class="ot">:</span> <span class="ot">{</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">{</span>a <span class="ot">:</span> Ty<span class="ot">}</span> <span class="ot">→</span> Tm <span class="ot">(</span>Sem v<span class="ot">)</span> a <span class="ot">→</span> Sem v a</span>
<span id="cb1-37"><a href="#cb1-37" aria-hidden="true"></a>eval <span class="ot">(</span>var x<span class="ot">)</span>   <span class="ot">=</span> x</span>
<span id="cb1-38"><a href="#cb1-38" aria-hidden="true"></a>eval <span class="ot">(</span>app f t<span class="ot">)</span> <span class="ot">=</span> eval f <span class="ot">(</span>eval t<span class="ot">)</span></span>
<span id="cb1-39"><a href="#cb1-39" aria-hidden="true"></a>eval <span class="ot">(</span>lam t<span class="ot">)</span> x <span class="ot">=</span> eval <span class="ot">(</span>t x<span class="ot">)</span></span>
<span id="cb1-40"><a href="#cb1-40" aria-hidden="true"></a></span>
<span id="cb1-41"><a href="#cb1-41" aria-hidden="true"></a>nf <span class="ot">:</span> <span class="ot">{</span>a <span class="ot">:</span> Ty<span class="ot">}</span> <span class="ot">→</span> <span class="ot">{</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">→</span> Tm <span class="ot">(</span>Sem v<span class="ot">)</span> a <span class="ot">→</span> Nf v a</span>
<span id="cb1-42"><a href="#cb1-42" aria-hidden="true"></a>nf <span class="ot">{</span>a<span class="ot">}</span> t <span class="ot">=</span> lower a <span class="ot">(</span>eval t<span class="ot">)</span></span>
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true"></a></span>
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true"></a>nf<span class="ot">_</span>parametric <span class="ot">:</span> <span class="ot">{</span>a <span class="ot">:</span> Ty<span class="ot">}</span> <span class="ot">→</span> <span class="ot">({</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">→</span> Tm v a<span class="ot">)</span> <span class="ot">-&gt;</span> <span class="ot">({</span>v <span class="ot">:</span> Ty <span class="ot">→</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">→</span> Nf v a<span class="ot">)</span></span>
<span id="cb1-45"><a href="#cb1-45" aria-hidden="true"></a>nf<span class="ot">_</span>parametric t <span class="ot">=</span> nf t</span></code></pre></div>
]]></summary>
</entry>
<entry>
    <title>hashable arch native</title>
    <link href="https://oleg.fi/gists/posts/2024-06-24-hashable-arch-native.html" />
    <id>https://oleg.fi/gists/posts/2024-06-24-hashable-arch-native.html</id>
    <published>2024-06-24T00:00:00Z</published>
    <updated>2024-06-24T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-06-24
    
        by Oleg Grenrus
    

    
</div>

<p>In <code>hashable-1.4.5.0</code> I made it use a XXH3 algorithm for hashing byte arrays. The version <code>1.4.5.0</code> and <code>1.4.6.0</code> backlashed, as I enabled <code>-march=native</code> by default, and that causes distribution issues. Version <code>1.4.7.0</code> doesn't enable <code>-march=native</code> by default.</p>
<p>This by default leaves some performance on the table, e.g. a quick benchmark comparison on my machine (<code>model name: AMD Ryzen Threadripper 2950X 16-Core Processor</code>) gives</p>
<pre class="plain"><code>Benchmark              without   with            
hash/Text/strict/11    1.481e-8  1.289e-8 -12.95%
hash/Text/strict/128   0.319e-7  0.263e-7 -17.73%
hash/Text/strict/2^20  2.220e-4  1.252e-4 -43.61%
hash/Text/strict/40    1.934e-8  1.714e-8 -11.37%
hash/Text/strict/5     1.194e-8  0.995e-8 -16.64%
hash/Text/strict/512   0.778e-7  0.649e-7 -16.62%
hash/Text/strict/8     1.215e-8  0.983e-8 -19.09%
Geometric mean         0.810e-7  0.644e-7 -20.47%</code></pre>
<p>i.e. the new default is 15% slower for small inputs (which is probably the use case for <code>hashable</code>), and it gets worse for larger ones.</p>
<p><a href="https://hackage.haskell.org/package/xxhash-ffi-0.3/xxhash-ffi.cabal" class="uri">https://hackage.haskell.org/package/xxhash-ffi-0.3/xxhash-ffi.cabal</a> doesn't give any control to the user, specifically; there's also a bit of non-determinism because the <code>pkg-config</code> flag is automatic - you may not notice which version you use, having <code>libxxhash-dev</code> installed is rare, but it may happen. (So if you have <code>package xxhash-ffi</code> <code>cc-options: -march=native</code>, it might be not used, if you forget to force off the <code>pkg-config</code> flag).</p>
<h2 id="architecture-selection-and-chip-optimizations">architecture selection and chip optimizations</h2>
<p>Which made me wonder, <strong>how much this kind of very low-level performance optimisation we leave on the table</strong> when we only care about running binaries locally (e.g. tests; but also benchmarks).</p>
<p>For example, <a href="https://hackage.haskell.org/package/base-4.20.0.1/docs/Data-Bits.html#v:popCount"><code>popCount</code></a> is relatively common, <a href="https://hackage-search.serokell.io/?q=popCount" class="uri">https://hackage-search.serokell.io/?q=popCount</a> says 1948 matches across 196 packages; and includes things like <code>vector-algorithms</code> which one would hope to be fast. <a href="https://hackage-search.serokell.io/?q=countLeadingZeros"><code>countLeadingZeros</code></a> is also common with 541 matches across 114 packages (and 532 matches across 82 package for <code>countTrailingZeros</code> including <code>unordered-containers</code>).</p>
<p>To get the <code>popcnt</code> operation you need to enable <code>msse4.2</code>, to get <code>lzcnt</code> instead of <code>bsr</code> you need to enable <code>-mbmi2</code>. <code>popCount</code> fallback is a loop, it's slow (I was thinking about that when I wrote <code>splitmix</code> in 2017; however there <code>popCount</code> is not used in hot path; except if you split a lot... like you do when using <code>QuickCheck's</code> <code>Gen</code>... hopefully doesn't matter... does it?). <a href="https://stackoverflow.com/a/43443701">This StackOverflow</a> answer says that there is no fallback from <code>lzcnt</code> to <code>bsr</code>, but maybe it's <code>LZCNT == (31 - BSR)</code> as accepted answer says. I'm not an expert in x86 ISA; nor I want to be writing Haskell; I hope there was some good reason to introduce LZCNT, and it's worth using when it exists.</p>
<p>I don't think many people add</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a>package <span class="op">*</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true"></a>  ghc<span class="op">-</span>options<span class="op">:</span> <span class="op">-</span>msse4<span class="op">.</span><span class="dv">2</span> <span class="op">-</span>mavx <span class="op">-</span>mbmi <span class="op">-</span>mbmi2</span></code></pre></div>
<p>to their <code>cabal.project.local</code> files. Does it matter? I hope that shouldn't make anything worse (except the portability).</p>
<p>There are few small issues with code generation like <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/25019" class="uri">https://gitlab.haskell.org/ghc/ghc/-/issues/25019</a> or <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/24989" class="uri">https://gitlab.haskell.org/ghc/ghc/-/issues/24989</a>, I'm sure these will be fixed soon.</p>
<p>However, I'm not so optimistic about bigger issues like adding <code>arch=native</code> and also <code>-mtune=...</code>; as far as I understand, architecture flags tell compiler that it can or cannot use some instructions, where <code>mtune</code> is an <em>optimization</em> flag. Even if some instruction is supported by a chip, it doesn't mean it's fast (but maybe it's more relevant for SIMD stuff). That's knowledge I hope compiler to know better than me.</p>
<p>Or even bigger ones to decide whether <code>-march=native -mtune=native</code> should be default. Arguably, e.g. GCC and Clang produce very portable binaries by default, but at least they have convenient enough ways to tune binaries for local execution too.</p>
<h2 id="text">text</h2>
<p>This low-level instruction business is surprisingly common. E.g. <a href="https://hackage.haskell.org/package/text"><code>text</code></a> uses <a href="https://github.com/simdutf/simdutf"><code>simdutf</code></a> except your <code>text</code> probably doesn't because GHC ships <code>text</code> without <code>simdutf</code> (as currently around GHC-9.10.1 time). The <code>text</code> doesn't suffer from <code>-march=native</code> issue like hashable, at least partially because of the above. I'm not sure how the things work there, it looks like <code>simdutf</code> compiles code for various processors:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a><span class="pp">#define SIMDUTF_TARGET_HASWELL SIMDUTF_TARGET_REGION(&quot;avx2,bmi,pclmul,lzcnt&quot;)</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true"></a><span class="pp">#define SIMDUTF_TARGET_WESTMERE SIMDUTF_TARGET_REGION(&quot;sse4.2,pclmul&quot;)</span></span></code></pre></div>
<p>and then uses dynamic dispatch. Or maybe the <code>sse4.2</code> is just so common nowadays, that the few rare people who compile <code>text</code> themselves don't run into portability issues. (GHC only enables sse2 for Haskell code</p>
<p><code>text</code> also has some non-simdutf code too <a href="https://github.com/haskell/text/pull/566">as e.g. the issue about avx512 detection</a> highlights; and that uses dynamic dispatch as far as I can tell. (What's the cost of dynamic dispatch? I doubt it's free, and when the operations are small it might show, does it?)</p>
<p>Given all that, I think that it won't hurt if one could compile <code>text</code> so there aren't runtime ISA-detection (so things can be tuned for your chip), even if the default were to do a runtime dispatch. (e.g. if we had that, there would be an immediate workaround for above avx512 detection issue: explicitly turn it off). And again, it would be nice if GHC and <code>cabal-install</code> had convenient ways to enable for-local-execution optimisations (and for bundled libraries like <code>text</code> it's almost impossible nowadays, due no good way to force their re-installation, <a href="https://github.com/haskell/cabal/issues/8702">Cabal#8702</a> is a related issue).</p>
<h2 id="containers">containers</h2>
<p><a href="https://hackage.haskell.org/package/containers"><code>containers</code></a> also use <code>popCount</code>, <code>countLeadingZeros</code>; but I bet that it's always used with the portable configuration in practice, as it's bundled with GHC, similarly as <code>text</code> library is. (The <code>IntSet</code> / <code>IntMap</code> implementation uses bit level operations, it might benefit from using better instructions when available).</p>
<h2 id="conclusion">Conclusion</h2>
<p>It feels that the end of compilation pipeline - the assembly generation - isn't getting as much attention as it could<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Sure, these improvements would only decrease run times constant factors only. On the other hand, if we could get 2-3% improvements in hot loops without source code changes, why not get these?</p>
<p>I'm biased, (not only) as maintainer of <code>hashable</code> I would like to see <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/24075"><code>CLMUL</code></a> instruction, and <code>AESENC</code> would be nice to play with. But if I the 99.9% used default would rely on their software fallbacks rather than fast silicon implementation, I bet there won't be anything interesting to discover.</p>
<p>And it would be nice to have CPP macros to reflect whether GHC will generate POPCNT, LZCNT, CLMUL, AESENC instruction or their fallbacks. E.g. in <code>hashable</code> it's worth using <code>AESENC</code> for mixing if it's a silicon one, otherwise it's probably better to stick to a different but simpler fallback. (Maybe we already has these: <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/7554">GHC#7554</a> suggests so, maybe it's only a documentation issue <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/25021">GHC#25021</a>).</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>I noticed <a href="https://www.reddit.com/r/haskell/comments/1dmlefx/ghc_gets_dividebyconstant_optimisation_closing_my/">the "GHC gets divide-by-constant optimisation, closing my 10 years old ticket about 10x slowdowns" post</a> on Reddit yesterday. Fun coincidence.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
]]></summary>
</entry>
<entry>
    <title>cabal fields</title>
    <link href="https://oleg.fi/gists/posts/2024-05-28-cabal-fields.html" />
    <id>https://oleg.fi/gists/posts/2024-05-28-cabal-fields.html</id>
    <published>2024-05-28T00:00:00Z</published>
    <updated>2024-05-28T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-05-28
    
        by Oleg Grenrus
    

    
</div>

<p><a href="https://github.com/phadej/cabal-fields"><code>cabal-fields</code></a> is partly motivated by the <a href="https://github.com/haskell/cabal/issues/7548">Migrate from the .cabal format to a widely supported format</a> issue. Whether it's a solution or not, it's up to you to decide.</p>
<h2 id="envelope-grammar-vs-specific-format-grammar">Envelope grammar vs. specific format grammar</h2>
<p>It is important to separate the envelope format (whether it's JSON, YAML, TOML, or cabal-like) from the actual file format (<code>package.json</code>, <code>stack.yaml</code>, <code>Cargo.toml</code>, or <code>pkg.cabal</code> for various package description formats).</p>
<p>An envelope format provides the common syntax. Often it has special support for enumerations i.e. lists. cabal-like format doesn't. All fields are just opaque text. Depending on how you look at it, that's the good or bad thing.</p>
<p>Surely, specifying build dependencies like:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="fu">dependencies</span><span class="kw">:</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true"></a><span class="at">  </span><span class="kw">-</span><span class="at"> base &gt;= 4.13 &amp;&amp; &lt; 5</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true"></a><span class="at">  </span><span class="kw">-</span><span class="at"> bytestring</span></span></code></pre></div>
<p>makes the list structure clear for consumers. However, e.g. <a href="https://github.com/sol/hpack"><code>hpack</code></a> doesn't use list syntax uniformly: <code>ghc-options</code>, which is a list-typed field, is still an opaque text field in <code>hpack</code> package description.</p>
<p>And individual package dependencies are also just opaque text fields, there isn't even a split between a package name and the version range.</p>
<p>On the other hand, in cabal-like envelope there simply aren't any built-in "types": no lists, no numbers, no booleans. As a actual file format designer you need to choose how to represent them, allowing you to pick the format best suited for the domain. For example, in <code>.cabal</code> files we don't need to write versions in quotes, even if we have single digit versions!</p>
<p>For the purpose of automatic "exact-print" editing, it would be best if envelope format supported as much of needed structure as possible (e.g. there would be package name and version range split). <a href="https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html">For example in Cargo</a></p>
<pre class="toml"><code>[dependencies]
time = &quot;0.1.12&quot;</code></pre>
<p>the separation is there.</p>
<p>OTOH, there is a gotcha:</p>
<blockquote>
Leaving off the caret is a simplified equivalent syntax to using caret requirements. While caret requirements are the default, it is recommended to use the simplified syntax when possible.
</blockquote>
<p>I'm quite sure that a lot of ad-hoc tools work only with simplified syntax.</p>
<p>Having simple envelope format is then probably the second best. If some file-format specific parsing has to be written anyway (e.g. to parse version ranges), dealing with a bit more complex stuff (like lists in <code>.cabal</code> <code>build-depends</code>) shouldn't be considerably more effort.</p>
<h2 id="greibach-lexing-parsing">Greibach lexing-parsing</h2>
<p>In formal language theory, a context-free grammar is in <a href="https://en.wikipedia.org/wiki/Greibach_normal_form">Greibach normal form (GNF)</a> if the right-hand sides of all production rules start with a terminal symbol, optionally followed by some variables:</p>
<pre class="text"><code>A → xBC
A → yDE</code></pre>
<p>This suggests a representation for parsing procedure output, which looks like token stream (can be lazily constructed and consumed), but does represent a complete parse result, not just the result of lexing.</p>
<p>The idea is to have a continuation parameter for each production, <code>A</code> may start with <code>X</code> (<code>A1</code> constructor) and then continue with <code>B</code>, which continues with <code>C</code> and then eventually with <code>k</code>.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">A</span> e k</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a>  <span class="ot">=</span> <span class="dt">A1</span> <span class="dt">X</span> (<span class="dt">B</span> e (<span class="dt">C</span> e k))</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true"></a>  <span class="op">|</span> <span class="dt">A2</span> <span class="dt">Y</span> (<span class="dt">D</span> e (<span class="dt">E</span> e k))</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true"></a>  <span class="op">|</span> <span class="dt">A_Err</span> e</span></code></pre></div>
<p>Additionally have an error constructor so the possible parse errors are embedded somewhere later in t he stream. So there is <code>A = ... | A_Err</code>, <code>B = ... | B_Err</code> etc.</p>
<p>This may sound complicated, but it isn't. For simple grammars, the tokens stream type isn't that complicated. See for example <code>aeson</code>'s <a href="https://hackage.haskell.org/package/aeson-2.2.2.0/docs/Data-Aeson-Decoding-Tokens.html"><code>Tokens</code></a>. For JSON value, the <code>Tokens</code> looks almost like the <a href="https://hackage.haskell.org/package/aeson-2.2.2.0/docs/Data-Aeson-Types.html#t:Value"><code>Value</code></a> type, but it does preserve more of the grammar. For example, the key order in maps is "as written", etc. This is sometimes important distinction: do you want a syntax representation or a value representation.</p>
<p>A cabal-like envelope format is also a simple grammar, which can be parsed into similar token stream. In <code>cabal-fields</code> it looks like</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">AnnByteString</span> ann <span class="ot">=</span> <span class="dt">ABS</span> ann <span class="ot">{-# UNPACK #-}</span> <span class="op">!</span><span class="dt">ByteString</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true"></a>  <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>, <span class="dt">Functor</span>, <span class="dt">Foldable</span>, <span class="dt">Traversable</span>)</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true"></a></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">Tokens</span> ann k e</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true"></a>    <span class="ot">=</span> <span class="dt">TkSection</span> <span class="op">!</span>(<span class="dt">AnnByteString</span> ann) <span class="op">!</span>(<span class="dt">AnnByteString</span> ann) (<span class="dt">Tokens</span> ann (<span class="dt">Tokens</span> ann k e) e)</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true"></a>    <span class="op">|</span> <span class="dt">TkField</span> <span class="op">!</span>(<span class="dt">AnnByteString</span> ann) <span class="op">!</span>ann (<span class="dt">TkFieldLines</span> ann (<span class="dt">Tokens</span> ann k e) e)</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true"></a>    <span class="op">|</span> <span class="dt">TkComment</span> <span class="op">!</span>(<span class="dt">AnnByteString</span> ann) (<span class="dt">Tokens</span> ann k e)</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true"></a>    <span class="op">|</span> <span class="dt">TkEnd</span> k</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true"></a>    <span class="op">|</span> <span class="dt">TkErr</span> e</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true"></a>  <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">TkFieldLines</span> ann k e</span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true"></a>    <span class="ot">=</span> <span class="dt">TkFieldLine</span> <span class="op">!</span>(<span class="dt">AnnByteString</span> ann) (<span class="dt">TkFieldLines</span> ann k e)</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true"></a>    <span class="op">|</span> <span class="dt">TkFieldComment</span> <span class="op">!</span>(<span class="dt">AnnByteString</span> ann) (<span class="dt">TkFieldLines</span> ann k e)</span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true"></a>    <span class="op">|</span> <span class="dt">TkFieldEnd</span> k</span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true"></a>    <span class="op">|</span> <span class="dt">TkFieldErr</span> e</span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true"></a>  <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>, <span class="dt">Functor</span>, <span class="dt">Foldable</span>, <span class="dt">Traversable</span>)</span></code></pre></div>
<p>compare this to <a href="https://hackage.haskell.org/package/Cabal-syntax-3.12.0.0/docs/Distribution-Fields-Field.html#t:Field">the <code>Field</code> type</a> in <code>Cabal-syntax</code>; not considerably more complicated.</p>
<p>A benefit of Greibach-parsing is that it's relatively easy to write FFI-able parsers in C. We don't need to create AST, we can have lexer-like interface, leaving the handling of AST creation to the host language. The parser implementation can be embedded as easily embedded into e.g. Haskell or Python.</p>
<h2 id="cabal-and-braces">Cabal and braces</h2>
<p>I must admit I like cabal-like format a lot. It's simplicity and free-formness make it good fit for almost any kind of configuration.</p>
<p>But there is a feature that I very much dislike.</p>
<p>While <code>.cabal</code> files are perceived to have white-space layout, there's actually a curly-braces option. With curly-braces you can write the whole <code>.cabal</code> file on a single line!</p>
<p>If you look (but I don't recommend) <a href="https://github.com/haskell/cabal/blob/88c81c92751c0beb36fb7c508ccf06c682a4f9a2/Cabal-syntax/src/Distribution/Fields/Parser.hs#L178">into grammar for cabal envelope, the handling of curly-braces</a>, and especially how it interacts with whitespace layout, you'll see some horrific stuff.</p>
<p>You can write</p>
<pre class="text"><code>test-suite hkd-example
  default-language: Haskell2010
  type: { exitcode-stdio-1.0 } main-is:HKD.hs 
  hs-source-dirs:   test
  build-depends:
      base
    , some</code></pre>
<p>but I kindly ask you, please don't. :P</p>
<p>In short, for a feature used (or known) as little (but surprisingly a lot, 2.5% of Hackage; more than packages using tabs!), it adds quite a lot of complexity! And I'd argue that the syntax is not natural. And if it wasn't there, it wouldn't be added today.</p>
<p>So for now, I simply don't support it. <code>Cabal-syntax</code> must support all the stuff and warts, <code>cabal-fields</code> doesn't.</p>
<h2 id="cabal-and-section-arguments">Cabal and section arguments</h2>
<p>Another gotcha in cabal envelope format is that while the field contents are opaque, the section arguments (e.g. the <code>test-suite</code> name, or expression in <code>if</code>) is actually lexed. It's non an opaque string, i.e. cannot be arbitrary.</p>
<p>The only case where it makes a difference i can think of top my head, is to allow end-of-line comments. E.g. you can today write (and some did / do on Hackage):</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode cabal"><code class="sourceCode cabal"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a>flag tagged -- toggle me</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true"></a>  default<span class="dt">:</span> True</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true"></a>  manual<span class="dt">:</span> True</span></code></pre></div>
<p>but I wouldn't recommend.</p>
<p>This is <em>the only</em> case where you can have a comment on otherwise non-whitespace line. E.g. if you write</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode cabal"><code class="sourceCode cabal"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a>  <span class="dt">build-depends:</span> base <span class="dv">&lt;5</span> -- i feel lucky</span></code></pre></div>
<p>that won't work, the <code>-- i feel lucky</code> will be considered as part of the field content.</p>
<p>Doing it so makes the envelope format simpler: there are no escaping on the envelope level. If you escape something in e.g. <code>description:</code> field, it's handled by only by haddock. That avoids double escaping head-aches.</p>
<p>So <code>cabal-fields</code> treats section arguments as an opaque text. If you have a end-of-line comment on that line, it will be included.</p>
<h2 id="c-interface">C interface</h2>
<p>The <code>cabal-fields</code> library was first prototyped in Haskell and has safe interface. However, C doesn't have sum types, nor polymorphic recursion nor many safety features at all. So the C version looks an ordinary lexer interface would. But there is a guarantee that only valid cabal-like files will be recognised, so the token stream will be well-formed; or an error token will be returned.</p>
<h2 id="python-interface">Python interface</h2>
<p>I tested the pure Haskell version against Haskell-using-C-FFI version. They behave the same (against the most of Hackage).</p>
<p>The goal however was to parse <code>.cabal</code> files with Python. People do complain that they cannot modify the <code>.cabal</code> files with Python. Why I don't understand why you'd use Python, if you can use Haskell, but not you "can" use Python too.</p>
<p>The <a href="https://github.com/phadej/cabal-fields/blob/4bcc7e02a116ef4fca912bb998f0d9fa41439c3a/python/cabalfields-demo.py"><code>cabalfields-demo.py</code></a> by default parsers and exact-prints the input files:</p>
<pre class="text"><code>% python3 cabalfields-demo.py ../cabal-fields/cabal-fields.cabal 
../cabal-fields/cabal-fields.cabal
??? same: True
cabal-version: 2.4
name:          cabal-fields
version:       0.1
synopsis:      An alternative parser for .cabal like files
description:   An alternative parser for @.cabal@ like files.
...</code></pre>
<p>with intermediate types which look like:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true"></a><span class="kw">class</span> Field:</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, name, name_pos, colon_pos, contents):</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true"></a>        <span class="va">self</span>.name <span class="op">=</span> name</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true"></a>        <span class="va">self</span>.name_pos <span class="op">=</span> name_pos</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true"></a>        <span class="va">self</span>.colon_pos <span class="op">=</span> colon_pos</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true"></a>        <span class="va">self</span>.contents <span class="op">=</span> contents</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true"></a></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true"></a><span class="kw">class</span> Section:</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, name, name_pos, args, contents):</span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true"></a>        <span class="va">self</span>.name <span class="op">=</span> name</span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true"></a>        <span class="va">self</span>.name_pos <span class="op">=</span> name_pos</span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true"></a>        <span class="va">self</span>.args <span class="op">=</span> args</span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true"></a>        <span class="va">self</span>.contents <span class="op">=</span> contents</span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true"></a></span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true"></a><span class="kw">class</span> FieldLine:</span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, contents, pos):</span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true"></a>        <span class="va">self</span>.contents <span class="op">=</span> contents</span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true"></a>        <span class="va">self</span>.pos <span class="op">=</span> pos</span>
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true"></a></span>
<span id="cb10-20"><a href="#cb10-20" aria-hidden="true"></a><span class="kw">class</span> Comment:</span>
<span id="cb10-21"><a href="#cb10-21" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, contents, pos):</span>
<span id="cb10-22"><a href="#cb10-22" aria-hidden="true"></a>        <span class="va">self</span>.contents <span class="op">=</span> contents</span>
<span id="cb10-23"><a href="#cb10-23" aria-hidden="true"></a>        <span class="va">self</span>.pos <span class="op">=</span> pos</span></code></pre></div>
<p>I haven't yet added any modification or consistency functionality.</p>
<p>It would be simpler to edit the structure if instead of absolute positions, there would be differences; and the pretty-printer would check that differences are consistent (i.e. there are needed newlines, enough indentation etc). That shouldn't be too difficult of an exercise to do. It might be easier to do on Haskell version first (types do help).</p>
<p>Perfectly, the C library would also contain a builder. But it needs a prototype first.</p>
<p>Also it's easier to write parsers in C, we don't need to think of memory allocation: the tokens returned are splices of the input byte array. Inn printing we need to have some kind of a builder abstraction: we would like to have an interface which can be used to produce both continuous strict byte-array for Python (using custom allocators), but also lazy <code>ByteString</code> in Haskell.</p>
<h2 id="conclusion">Conclusion</h2>
<p><code>cabal-fields</code> is a library for parsing <code>.cabal</code> like files. It is using Greibach lexing-parsing approach. It doesn't support curly braces, and slightly differs in how it handles section arguments. There is also a C implementation. With a Python module using it. And small demo of exact-printing <code>.cabal</code> like files from Python.</p>
]]></summary>
</entry>
<entry>
    <title>A note about coercions</title>
    <link href="https://oleg.fi/gists/posts/2024-04-21-a-note-about-coercions.html" />
    <id>https://oleg.fi/gists/posts/2024-04-21-a-note-about-coercions.html</id>
    <published>2024-04-21T00:00:00Z</published>
    <updated>2024-04-21T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-04-21
    
        by Oleg Grenrus
    

    
</div>

<p>Safe coercions in GHC are a very powerful feature. However, they are not perfect; and already many years ago I was also thinking about how we could make them more expressive.</p>
<p>In particular such things like "higher-order roles" have been buzzing. For the record, I don't think <a href="https://github.com/ghc-proposals/ghc-proposals/pull/233">Proposal #233</a> is great; but because that proposal is almost four years old, I don't remember why; nor I have tangible counter-proposal either.</p>
<p>So I try to recover my thoughts.</p>
<p>I like to build small prototypes; and I wanted to build a small language with zero-cost coercions.</p>
<p>The first approach, I present here, <em>doesn't work</em>.</p>
<p>While it allows model coercions, and very powerful ones, these coercions <em>are not zero-cost</em> as we will see. For language like GHC Haskell where being zero-cost is non-negotiable requirement, this simple approach doesn't work.</p>
<p>The small "formalisation" is in Agda file <a href="https://gist.github.com/phadej/5cf29d6120cd27eb3330bc1eb8a5cfcc" class="uri">https://gist.github.com/phadej/5cf29d6120cd27eb3330bc1eb8a5cfcc</a></p>
<h2 id="syntax">Syntax</h2>
<p>We start by defining syntax. Our language is "simple": there are types</p>
<pre class="text"><code>A, B = A -&gt; B     -- function type, &quot;arrow&quot;</code></pre>
<p>coercions</p>
<pre class="text"><code>co = refl A        -- reflexive coercion
   | sym co        -- symmetric coercions
   | arr co₁ co₂   -- coercion of arrows built from codomain and domain
                   -- type coercions</code></pre>
<p>and terms</p>
<pre class="text"><code>f, t, s = x         -- variable
        | f t       -- application
        | λ x . t   -- lambda abstraction
        | t ▹ co    -- cast</code></pre>
<p>Obviously we'd add more stuff (in particular, I'm interested in expanding coercion syntax), but these are enough to illustrate the problem.</p>
<p>Because the language is simple (i.e. not dependent), we can define typing rules and small step semantics independently.</p>
<h2 id="typing">Typing</h2>
<p>There is nothing particularly surprising in typing rules.</p>
<p>We'll need a "well-typed coercion" rules too though, but these are also very straigh-forward</p>
<pre class="text"><code>Coercion Typing:  Δ ⊢ co : A ≡ B

------------------
Δ ⊢ refl A : A ≡ A

Δ ⊢ co : A ≡ B
------------------
Δ ⊢ sym co : B ≡ A

Δ ⊢ co₁ : C ≡ A
Δ ⊢ co₂ : D ≡ B
-------------------------------------
Δ ⊢ arr co₁ co₂ : (C -&gt; D) ≡ (A -&gt; B)</code></pre>
<p>Terms typing rules are using two contexts, for term and coercion variables (GHC has them in one, but that is unhygienic, there's <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/17291">a GHC issue about that</a>). The rules for variables, applications and lambda abstractions are as usual, the only new is the typing of the cast:</p>
<pre class="text"><code>Term Typing: Γ; Δ ⊢ t : A

Γ; Δ ⊢ t : A 
   Δ ⊢ co : A ≡ B
-------------------------
Γ; Δ ⊢ t ▹ co : B </code></pre>
<p>So far everything is good.</p>
<p>But when playing with coercions, it's important to specify the reduction rules too. Ultimately it would be great to show that we could erase coercions either before or after reduction, and in either way we'll get the same result. So let's try to specify some reduction rules.</p>
<h2 id="reduction-rules">Reduction rules</h2>
<p>Probably the simplest approach to reduction rules is to try to inherit most reduction rules from the system without coercions; and consider coercions and casts as another "type" and "elimination form".</p>
<p>An elimination of refl would compute trivially:</p>
<pre class="text"><code>t ▹ refl A ~~&gt; t</code></pre>
<p>This is good.</p>
<p>But what to do when cast's coercion is headed by <code>arr</code>?</p>
<pre class="text"><code>t ▹ arr co₁ co₂ ~~&gt; ???</code></pre>
<p>One "easy" solution is to eta-expand t, and split the coercion:</p>
<pre class="text"><code>t ▹ arr co₁ co₂ ~~&gt; λ x . t (x ▹ sym co₁) ▹ co₂</code></pre>
<p>We cast an argument before applying it to the function, and then cast the result. This way the reduction is type preserving.</p>
<p><strong>But this approach is not zero-cost</strong>.</p>
<p>We could not erase coercions completely, we'll still need some indicator that there were an arrow coercion, so we'll remember to eta-expand:</p>
<pre class="text"><code>t ▹ ??? ~~&gt; λ x . t x</code></pre>
<h2 id="conclusion">Conclusion</h2>
<p>Treating coercions as another type constructor with cast operation being its elimination form may be a good first idea, but is not good enough. We won't be able to completely erase such coercions.</p>
<p>Another idea is to complicate the system a bit. We could "delay" coercion elimination until the result is scrutinised by another elimination form, e.g. in application case:</p>
<pre class="text"><code>(t ▹ arr co₁ co₂) s ~~&gt; t (s ▹ sym co₁) ▹ co₂ </code></pre>
<p>And that is the approach taken in <a href="https://www.seas.upenn.edu/~sweirich/papers/coercible-JFP.pdf">Safe Zero-cost Coercions for Haskell</a>, you'll need to look into JFP version of the paper, as that one has appendices.</p>
<blockquote>
(We do not have space to elaborate, but a key example is the use
of <code>nth</code> in rule <code>S_KPUSH</code>, presented in the extended version
of this paper.)
</blockquote>
<p>The rule <code>S_Push</code> looks some what like:</p>
<pre class="text"><code>---------------------------------------------- S_Push
(t ▹ co) s ~~&gt; t (s ▹ sym (nth₁ co)) ▹ nth₂ co</code></pre>
<p>where we additionally have <code>nth</code> coercion constructor to <em>decompose</em> coercions.</p>
<p>Incidentally there was, technically is, <a href="https://github.com/ghc-proposals/ghc-proposals/pull/276">a proposal to remove decomposition rule</a>, but it's a wrong solution to the known problem. The problem and a proper solution was kind of already identified in the original paper</p>
<blockquote>
We could similarly imagine a lattice keyed by classes whose instance
definitions are to be respected; with such a lattice, we could allow the coercion of
<code>Map Int v</code> to <code>Map Age v</code> precisely when <code>Int</code>’s and <code>Age</code>’s <code>Ord</code> instances correspond.
</blockquote>
<p>The original paper also identified the need for higher-order roles. And also identified that</p>
<blockquote>
This means that <code>Monad</code> instances could be defined only for types
that expect a representational parameter.
</blockquote>
<p>which <a href="https://oleg.fi/gists/posts/2019-07-31-fmap-coerce-coerce.html">I argue should be already required for <code>Functor</code></a> (and <code>traverseBia</code> hack with unlawful <code>Mag</code> would still work if GHC had unboxed representational coercions, i.e. GADTs with baked-in representational (not only nominal) coercions).</p>
<p>There also the mention of <em>unidirectional <code>Coercible</code></em>, which people <a href="https://github.com/ghc-proposals/ghc-proposals/issues/198">asked about later</a> and <a href="https://discourse.haskell.org/t/one-way-coercible/9242">recently</a>:</p>
<blockquote>
Such uni-directional version of <code>Coercible</code> amounts to <i>explicit
inclusive subtyping</i> and is more complicated than our current symmetric system.
</blockquote>
<p>It is fascinating that authors were able to predict the relevant future work so well. And I'm thankful that GHC got <code>Coercible</code> implemented even it was already known to not be perfect. It's useful nevertheless. But I'm sad that there haven't been any results of future work since.</p>
]]></summary>
</entry>
<entry>
    <title>What makes a good compiler warning?</title>
    <link href="https://oleg.fi/gists/posts/2024-04-18-warnings-criteria.html" />
    <id>https://oleg.fi/gists/posts/2024-04-18-warnings-criteria.html</id>
    <published>2024-04-18T00:00:00Z</published>
    <updated>2024-04-18T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-04-18
    
        by Oleg Grenrus
    

    
</div>

<p>Recently I came up with a criteria for a good warning to have in a compiler:</p>
<blockquote>
If compiler makes a choice, or has to deal with some complication, it may well tell about that.
</blockquote>
<p>That made me think about warnings I implemented into GHC over the years. They are fine.</p>
<p>Let us first understand the criteria better. It is better explained by an example which triggers few warnings:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="ot">foo ::</span> <span class="dt">Char</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true"></a>foo <span class="ot">=</span> <span class="kw">let</span> x <span class="ot">=</span> <span class="ch">&#39;x&#39;</span> <span class="kw">in</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true"></a>      <span class="kw">let</span> x <span class="ot">=</span> <span class="ch">&#39;y&#39;</span> <span class="kw">in</span> x</span></code></pre></div>
<p>First warning is <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wname-shadowing"><code>-Wname-shadowing</code></a>:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a>Shadow.hs<span class="op">:</span><span class="dv">3</span><span class="op">:</span><span class="dv">11</span><span class="op">:</span> warning<span class="op">:</span> [<span class="op">-</span><span class="dt">Wname</span><span class="op">-</span>shadowing]</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true"></a>    <span class="dt">This</span> binding for ‘x’ shadows the existing binding</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true"></a>      bound at Shadow.hs<span class="op">:</span><span class="dv">2</span><span class="op">:</span><span class="dv">11</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true"></a>  <span class="op">|</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true"></a><span class="dv">3</span> <span class="op">|</span>       <span class="kw">let</span> x <span class="ot">=</span> <span class="ch">&#39;y&#39;</span> <span class="kw">in</span> x</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true"></a>  <span class="op">|</span>           <span class="op">^</span></span></code></pre></div>
<p>When resolving names (i.e. figuring out what textual identifiers refer to) compilers have a choice what to do with duplicate names. The usual choice is to pick the closest reference, shadowing others. But it's not the only choice, and not the only choice GHC does in similar-ish situations. e.g. module's top-level definition <em>do not shadow</em> imports; instead an ambiguous name error is reported. Also <code>\ x x -&gt; x</code> is rejected (treated as a non-linear pattern), but <code>\x -&gt; \x -&gt; x</code> is accepted (two separate patterns, inner one shadows). So, in a way, <code>-Wname-shadowing</code> reminds us what GHC does.</p>
<p>Another warning in the example is <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wunused-binds"><code>-Wunused-binds</code></a>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a>Shadow.hs<span class="op">:</span><span class="dv">2</span><span class="op">:</span><span class="dv">11</span><span class="op">:</span> warning<span class="op">:</span> [<span class="op">-</span><span class="dt">Wunused</span><span class="op">-</span>local<span class="op">-</span>binds]</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true"></a>    <span class="dt">Defined</span> but <span class="fu">not</span> used<span class="op">:</span> ‘x’</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true"></a>  <span class="op">|</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true"></a><span class="dv">2</span> <span class="op">|</span> foo <span class="ot">=</span> <span class="kw">let</span> x <span class="ot">=</span> <span class="ch">&#39;x&#39;</span> <span class="kw">in</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true"></a>  <span class="op">|</span>           <span class="op">^</span></span></code></pre></div>
<p>This a kind of warning that compiler might figure out in the optimisation passes (I'm not sure if GHC always tracks usage, but IIRC GCC had some warnings triggered only when optimisations are on). When doing usage analysis, compiler may figure out that some bindings are unused, so it doesn't need to generate code for them. At the same time it may warn the user.</p>
<h2 id="more-examples">More examples</h2>
<p>Let go through few of the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#warnings-and-sanity-checking">numerous warnings GHC can emit</a>.</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Woverflowed-literals"><code>-Woverflowed-literals</code></a> causes a warning to be emitted if a literal will overflow. It's not strictly a compiler choice, but a choice nevertheless in <code>base</code>'s <code>fromInteger</code> implementations. For most types <a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> the <code>fromInteger</code> is a total function with rollover behavior: <code>300 :: Word8</code> is <code>44 :: Word8</code>. It could been chosen to not be total too, and IMO that would been ok if <code>fromInteger</code> were used <em>only</em> for desugaring literals.</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wderiving-defaults"><code>-Wderiving-defaults</code></a>: Causes a warning when both <code>DeriveAnyClass</code> and <code>GeneralizedNewtypeDeriving</code> are enabled and no explicit deriving strategy is in use. This a great example of a choice compiler makes. I actually don't remember which method GHC picks then, so it's good that compiler reminds us that it is good idea to be explicit (using <code>DerivingStrategies</code>).</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wincomplete-patterns"><code>-Wincomplete-patterns</code></a> warns about places where a pattern-match might fail at runtime. This a complication compiler has to deal with. Compiler needs to generate some code to make all pattern matches complete. An easy way would been to always implicitly default cases to all pattern matches, but that would have performance implications, so GHC checks pattern-match coverage, and as a side-product may report incomplete pattern matches (or <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Winaccessible-code"><code>-Winaccesible-code</code></a>) <a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wmissing-fields"><code>-Wmissing-fields</code></a> warns you whenever the construction of a labelled field constructor isn’t complete, missing initialisers for one or more fields. Here compiler needs to fill the missing fields with something, so it warns when it does.</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Worphans"><code>-Worphans</code></a> gets an honorary mention. Orphans cause so much incidental complexity inside the compiler, that I'd argue that <code>-Worphans</code> should be enabled by default (and not only in <code>-Wall</code>).</p>
<h2 id="bad-warnings">Bad warnings</h2>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wmissing-import-lists"><code>-Wmissing-import-lists</code></a> warns if you use an unqualified <code>import</code> declaration that does not explicitly list the entities brought into scope. I don't think that there are any complications or choices compiler needs to deal with, therefore I think this warning should been left for style checkers. (I very rarely have import lists for modules from the same package or even project; and this is mostly a style&amp;convenience choice).</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wprepositive-qualified-module"><code>-Wprepositive-qualified-module</code></a> is even more of an arbitrary style check. With <code>-Wmissing-import-lists</code> it is generally accepted that explicit import lists are better for compatibility (and for GHCs recompilation avoidance). Whether you place <code>qualified</code> before or after the module name is a style choice. I think this warning shouldn't exist in GHC. (For the opposite you'd need a style checker to warn if <code>ImportQualifiedPost</code> is enabled anywhere).</p>
<p>Note, while <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wtabs"><code>-Wtabs</code></a> is also mostly a style issue, but the compiler has to make a choice how to deal with them. Whether to always convert tabs to 8 spaces, convert to next 8 spaces boundary, require indentation to be exactly the same spaces&amp;tabs combination. All choices are sane (and I don't know which one GHC makes), so a warning to avoid tabs is justified.</p>
<h2 id="compatibility-warnings">Compatibility warnings</h2>
<p>Compatibility warnings are usually good also according to my criteria. Often it is the case that there is an old and a new way of doing things. Old way is going to be removed, but before removing it, it is deprecated.</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wsemigroup"><code>-Wsemigroup</code></a> warned about <code>Monoid</code> instances without <code>Semigroup</code> instances. (A warning which you shouldn't be able to trigger with recent GHCs). Here we could not switch to new hierarchy immediately without breaking some code, but we could check whether the preconditions are met for awhile.</p>
<p><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wtype-equality-out-of-scope"><code>-Wtype-equality-out-of-scope</code></a> is somewhat similar. For now, there is some compatibility code in GHC, and GHC warns when that fallback code path is triggered.</p>
<h2 id="my-warnings">My warnings</h2>
<p>One of the warning I added is <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wmissing-kind-signatures"><code>-Wmissing-kind-signatures</code></a>. For long time GHC didn't have a way to specify kind signatures until <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/poly_kinds.html#extension-StandaloneKindSignatures"><code>StandaloneKindSignatures</code></a> were added in GHC-8.10. Without kind signatures GHC must <em>infer</em> kind of a data type or type family declaration. With kind signature it could just check against given kind (which is a technically a lot easier). So while the warning isn't actually implemented so, it could be triggered when GHC notices it needs to infer a kind of a definition. In the implementation the warning is raised <em>after</em> the type-checking phase, so the warning can include the inferred kind. However, we can argue that when inference fails, GHC could also mention that the kind signature was missing. Adding a kind signature often results in better kind errors (c.f. adding a type signature often results in a better type error when something is wrong).</p>
<p>The <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wmissing-poly-kind-signatures"><code>-Wmissing-poly-kind-signatures</code></a> warning seems like a simple restriction of above, but it's not exactly true. There is another problem GHC deals with. When GHC infers a kind, there might be unsolved meta-kind variables left, and GHC has to do something to them. With <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/poly_kinds.html#extension-PolyKinds"><code>PolyKinds</code></a> extension on, GHC <em>generalises</em> the kind. For example when inferring a kind of <code>Proxy</code> as in</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">Proxy</span> a <span class="ot">=</span> <span class="dt">Proxy</span></span></code></pre></div>
<p>GHC infers that the kind is <code>k -&gt; Type</code> for some <code>k</code> and with <code>PolyKinds</code> it generalises it to <code>type Proxy :: forall {k}. k -&gt; Type</code>. Another option, which GHC also may do (and does when <code>PolyKinds</code> are not enabled) is to <em>default</em> kinds to <code>Type</code>, i.e. <code>type Proxy :: Type -&gt; Type</code>. There is no warning for kind defaulting, but arguable there should be as defaulted kinds may be wrong. (Haskell98 and Haskell2010 don't have a way to specify kind signatures; that is clear design deficiency; which was first resolved by <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/kind_signatures.html"><code>KindSignatures</code></a> and finally more elegantly by <code>StandaloneKindSignatures</code>).</p>
<p>There is defaulting for type variables, and (in some cases) GHC warns about them. You probably have seen <code>Defaulting the type variable ‘a0’ to type ‘Integer’</code> warnings caused by <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wtype-defaults"><code>-Wtype-defaults</code></a>. Adding <code>-Wkind-defaults</code> to GHC makes sense, even only for uniformity between (types of) terms and types; or arguably nowadays it is a sign that you should consider enabling <code>PolyKinds</code> in that module.</p>
<h2 id="about-errors">About errors</h2>
<p>The warning criteria also made me think about the following: the error <em>hints</em> are by necessity imprecise. If compiler knew exactly how to fix an issue, maybe it should just fix it and instead only raise a warning.</p>
<p>GHC has few of such errors. For example when using a syntax guarded by an extension. It can be argued (and IIRC was recently argued in discussions around GHC language editions) that another design approach would be simply accept new syntax, but just warn about it. The current design approach where extensions are "feature flags" providing some forward and backward compatibility is also defendable.</p>
<p>Conversely, if there is a case where compiler kind-of-knows what the issue is, but the language is not powerful enough for compiler to fix the problem on its own, the only solution is to raise an error. Well, there is another: (find a way to) extend the language to be more expressive, so compiler could deal with the currently erroneous case. Easier said than done, but in my opinion worth trying.</p>
<p>An example of above would be <a href="https://github.com/ghc-proposals/ghc-proposals/issues/544#issuecomment-1300407772"><code>-Wmissing-binds</code></a> . Currently writing a type signature without a corresponding binding is a hard error. But compiler could as well fill it in with a dummy one, That would complement <code>-Wmissing-methods</code> and <code>-Wmissing-fields</code>. Similarly for types, a standalone kind signature tells the compiler already a lot about the type even without an actual definition: the rest of the module can treat it as an opaque type.</p>
<p>Another example is briefly mentioned making module-top-level definitions shadow imports. That would make adding new exports (e.g. to implicitly imported <code>Prelude</code>) less affecting. While we are on topic of names, GHC could also report early when imported modules have ambiguous definitions, e.g.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Text.Lazy</span> <span class="kw">as</span> <span class="dt">Lazy</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.ByteString.Lazy</span> <span class="kw">as</span> <span class="dt">Lazy</span></span></code></pre></div>
<p>doesn't trigger any warnings. But if you try to use <code>Lazy.unpack</code> you get an ambiguous occurrence error. GHC already deals with the complications of ambiguous names, it could as well have an option to report them early.</p>
<h2 id="conclusion">Conclusion</h2>
<blockquote>
If compiler makes a choice, or has to deal with some complication, it may well tell about that.
</blockquote>
<p>Seems like a good criteria for a good compiler warning. As far as I can tell most warnings in GHC pass it; but I found few "bad" ones too. And also identified at least one warning-worthy case GHC doesn't warn about.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>With <code>-XNegativeLiterals</code> and <code>Natural</code>, <code>fromInteger</code> may result in run-time error though, for example:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a><span class="op">&lt;</span>interactive<span class="op">&gt;:</span><span class="dv">6</span><span class="op">:</span><span class="dv">1</span><span class="op">:</span> warning<span class="op">:</span> [<span class="op">-</span><span class="dt">Woverflowed</span><span class="op">-</span>literals]</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a>    <span class="dt">Literal</span> <span class="op">-</span><span class="dv">1000</span> is negative but <span class="dt">Natural</span> only supports positive numbers</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true"></a><span class="op">***</span> <span class="dt">Exception</span><span class="op">:</span> arithmetic underflow</span></code></pre></div>
<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn2" role="doc-endnote"><p>Using [<code>-fmax-pmcheck-models</code>] we could <em>almost</em> turn off GHCs pattern-match coverage checker, which will make GHC consider (almost) all pattern matches as incomplete. So <code>-Wincomplete-patterns</code> is kind of an example of a warning which is powered by an "optional" analysis is GHC.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
]]></summary>
</entry>
<entry>
    <title>Core Inspection</title>
    <link href="https://oleg.fi/gists/posts/2024-04-12-core-inspection.html" />
    <id>https://oleg.fi/gists/posts/2024-04-12-core-inspection.html</id>
    <published>2024-04-12T00:00:00Z</published>
    <updated>2024-04-12T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-04-12
    
        by Oleg Grenrus
    

    
</div>

<p><a href="https://hackage.haskell.org/package/inspection-testing"><code>inspection-testing</code></a> was created over five years ago. You may want to glance over Joachim Breitner <a href="https://dl.acm.org/doi/10.1145/3242744.3242748">A promise checked is a promise kept: inspection testing</a>) Haskell Symposium paper introducing it.</p>
<p>Already in 2018 I thought it's a fine tool, but it's more geared towards /library/ writers. They can check on (some) examples that the promises they make about the libraries they write work at least on some examples.</p>
<p>What we cannot do with current <code>inspection-testing</code> is check that the actual "real-life" use of the library works as intended.</p>
<p>Luckily, relatively recently, GHC got a feature to include <em>all Core bindings</em> in the interface files. While <a href="https://well-typed.com/blog/2023/02/interface-files-with-core/">the original motivation</a> is different (to make Template Haskell run fast), the <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/phases.html#ghc-flag--fwrite-if-simplified-core">-fwrite-if-simplified-core</a> enables us to inspect (as in inspection testing) the "production" Core (not the test examples).</p>
<p>The <a href="https://github.com/phadej/cabal-extras/tree/master/cabal-core-inspection"><code>cabal-core-inspection</code></a> is a very quick &amp; dirty proof-of-concept of this idea.</p>
<p>Let me illustrate this with two examples.</p>
<p>In neither example I need to do any test setup, other than configuring <code>cabal-core-inspection</code> (though configuration is now hardcoded). Compare that to configuring e.g. HLint (HLint has user definable rules, and these are actually powerful tool). In fact, <code>cabal-core-inspection</code> is nothing more than a linter for Core.</p>
<h2 id="countchars">countChars</h2>
<p>First example is <code>countChars</code> as in Haskell Symposium Paper.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="ot">countChars ::</span> <span class="dt">ByteString</span> <span class="ot">-&gt;</span> <span class="dt">Int</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true"></a>countChars <span class="ot">=</span> T.length <span class="op">.</span> T.toUpper <span class="op">.</span> TE.decodeUtf8</span></code></pre></div>
<p>The promise is (actually: was) that no intermediate <code>Text</code> values are created.</p>
<p>As far as I know, we cannot use <code>inspection-testing</code> in its current form to check anything about non-local bindings, so if <code>countChars</code> is defined in an application, we would need to duplicate its definition in the test-suite to inspect it. That is not great.</p>
<p>With Core inspection, we can look at the actual Core of the module (as it is in the compiler interface file).</p>
<p>The prototype doesn't have any configuration, but if we imagine it has we could ask it to check that <code>Example.countChars</code> should not contain type <code>Text</code>. The prototype prints</p>
<pre class="text"><code>Text value created with decodeUtf8With1 in countChars</code></pre>
<p>So that's not the case. The intermediate <code>Text</code> value is created. In fact, nowadays <code>text</code> doesn't promise that <code>toUpper</code> fuses with anything.</p>
<p>A nice thing about <code>cabal-core-inspection</code> that (in theory) it could check <em>any definition in any module</em> as long as it's compiled with <code>-fwrite-if-simplified-core</code>. So we could check things for our friends, if we care about something specific.</p>
<h2 id="no-generics">no Generics</h2>
<p>Second example is about GHC.Generics. I use a simple generic equality, but this could apply to any <code>GHC.Generics</code> based deriving. (You should rather use <code>deriving stock Eq</code>, but generic equality is a simplest example which I remembered for now).</p>
<p>The generic equality might be defined in a library. And library author may actually have tested it with <code>inspection-testing</code>. But does it work on our type?</p>
<p>If we have</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">T</span> <span class="kw">where</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true"></a>    <span class="dt">T1</span><span class="ot"> ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Char</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true"></a>    <span class="dt">T2</span><span class="ot"> ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">Double</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true"></a>  <span class="kw">deriving</span> <span class="dt">Generic</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true"></a><span class="kw">instance</span> <span class="dt">Eq</span> <span class="dt">T</span> <span class="kw">where</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true"></a>    (<span class="op">==</span>) <span class="ot">=</span> genericEq</span></code></pre></div>
<p>it does. The <code>cabal-core-inspection</code> doesn't complain.</p>
<p>But if we add a third constructor</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">T</span> <span class="kw">where</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a>    <span class="dt">T1</span><span class="ot"> ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Char</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true"></a>    <span class="dt">T2</span><span class="ot"> ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">Double</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true"></a>    <span class="dt">T3</span><span class="ot"> ::</span> <span class="dt">ByteString</span> <span class="ot">-&gt;</span> <span class="dt">T.Text</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span></code></pre></div>
<p><code>cabal-core-inspection</code> barfs:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a><span class="dt">Found</span> <span class="dt">L1</span> from <span class="dt">GHC.Generics</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true"></a><span class="dt">Found</span> <span class="op">:*:</span> from <span class="dt">GHC.Generics</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true"></a><span class="dt">Found</span> <span class="dt">R1</span> from <span class="dt">GHC.Generics</span></span></code></pre></div>
<p>The <code>T</code> becomes too large for GHC to want inline all the generics stuff.</p>
<p>It won't be fair to blame the library author, for example for</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">T</span> <span class="kw">where</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true"></a>    <span class="dt">T1</span><span class="ot"> ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true"></a>    <span class="dt">T2</span><span class="ot"> ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true"></a>    <span class="dt">T3</span><span class="ot"> ::</span> <span class="dt">Char</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true"></a>    <span class="dt">T4</span><span class="ot"> ::</span> <span class="dt">Double</span> <span class="ot">-&gt;</span> <span class="dt">T</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true"></a>  <span class="kw">deriving</span> <span class="dt">Generic</span></span></code></pre></div>
<p>generic equality still optimises well, and doesn't have any traces of <code>GHC.Generics</code>. We may actually need to (and may be adviced to) tune some GHC optimisation parameters. But we need a way to check whether they are enough. <code>inspection-testing</code> doesn't help, but a proper version of core inspection would be perfect for that task.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/phases.html#ghc-flag--fwrite-if-simplified-core">-fwrite-if-simplified-core</a> enables us to automate inspection of actual Core. That is a huge win. The <code>cabal-core-inspection</code> is just a proof-of-concept, and I might try to make it into a real thing, but right now I don't have a real use case for it.</p>
<p>I'm also worried about <code>Note [Interface File with Core: Sharing RHSs]</code> in GHC. It says</p>
<blockquote>
<p>In order to avoid duplicating definitions for bindings which already have unfoldings we do some minor headstands to avoid serialising the RHS of a definition if it has *any* unfolding.</p>

<ul>
<li>Only global things have unfoldings, because local things have had their unfoldings stripped.</li>
<li>For any global thing which has an unstable unfolding, we just use that.</li>
</ul>
</blockquote>
<p>Currently this optimisation is disabled, so <code>cabal-core-inspection</code> works, but if it's enabled as is; then <code>INLINE</code>d bindings won't have their simplified unfoldings preserved (but rather only "inline-RHS"), and that would destroy Core inspection possibility.</p>
<p>But until then, <a href="https://github.com/phadej/cabal-extras/tree/master/cabal-core-inspection"><code>cabal-core-inspection</code></a> idea works.</p>
]]></summary>
</entry>
<entry>
    <title>Implicit arguments</title>
    <link href="https://oleg.fi/gists/posts/2024-04-01-implicit-arguments.html" />
    <id>https://oleg.fi/gists/posts/2024-04-01-implicit-arguments.html</id>
    <published>2024-04-01T00:00:00Z</published>
    <updated>2024-04-01T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-04-01
    
        by Oleg Grenrus
    

    
</div>

<p>In programming languages with sophisticated type systems we easily run into inconvenience of providing many (often type) arguments explicitly. Let's take a simple <code>map</code> function as an example:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="fu">map</span><span class="ot"> ::</span> <span class="kw">forall</span> a b<span class="op">.</span> (a <span class="ot">-&gt;</span> b) <span class="ot">-&gt;</span> <span class="dt">List</span> a <span class="ot">-&gt;</span> <span class="dt">List</span> b</span></code></pre></div>
<p>If we had to always explicitly provide <code>map</code>'s arguments, write something like</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a>ys <span class="ot">=</span> <span class="fu">map</span> <span class="op">@</span><span class="dt">Char</span> <span class="op">@</span><span class="dt">Char</span> <span class="fu">toLower</span> xs</span></code></pre></div>
<p>we would immediately give up on types, and switch to use some dynamically typed programming language. It wouldn't be fun to state "the obvious" all the time.</p>
<p>Fortunately we know a way (<a href="https://en.wikipedia.org/wiki/Unification_(computer_science)">unification</a>) which can be used to <em>infer</em> many such argument. Therefore we can write</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a>ys <span class="ot">=</span> <span class="fu">map</span> <span class="fu">toLower</span> xs</span></code></pre></div>
<p>and the type arguments will be inferred by compiler. However we usually are able to be explicit if we want or need to be, e.g. with <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/type_applications.html"><code>TypeApplications</code></a> in GHC Haskell.</p>
<h2 id="beyond-hindley-milner">Beyond Hindley-Milner</h2>
<p>Conor McBride calls a following phenomenon <a href="https://stackoverflow.com/a/13241158">"Milner's Coincidence"</a>:</p>
<blockquote>
<p>The Hindley-Milner type system achieves the truly awesome coincidence of four distinct distinctions</p>

<ul>
<li>terms vs types</li>
<li>explicitly written things vs implicitly written things</li>
<li>presence at run-time vs erasure before run-time</li>
<li>non-dependent abstraction vs dependent quantification</li>
</ul>

<p>We’re used to writing terms and leaving types to be inferred. . . and then erased. We’re used
to quantifying over type variables with the corresponding type abstraction and application happening silently and statically.</p>
</blockquote>
<p>GHC Haskell type-system has been long far more expressive than vanilla Hindley-Milner, and the four distrinctions are already misaligned.</p>
<p>GHC developers are filling the cracks: For example we'll soon <a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> get a <code>forall a -&gt;</code> (with an arrow, not a dot) quantifier, which is <em>erased</em> (irrelevant), <em>explicit</em> (visible) dependent quantification. Later we'll get <code>foreach a.</code> and <code>foreach a -&gt;</code> which are retained (i.e. not-erased, relevant) implicit/explicit dependent quantification.</p>
<p>(Agda also has "different" quantifiers: explicit <code>(x : A) -&gt; ...</code> and implicit <code>{y : B} -&gt; ...</code> dependent quantifiers, and <a href="https://agda.readthedocs.io/en/v2.6.4.3/language/runtime-irrelevance.html">erased variants</a> look like <code>(@0 x : A) -&gt; ...</code> and <code>{@0 y : B} -&gt; ...</code>.)</p>
<p>In Haskell, if we have a term with implicit quantifier (<code>foo :: forall a. ...</code>), we can use <code>TypeApplications</code> syntax to apply the argument explicitly:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a>bar <span class="ot">=</span> foo <span class="op">@</span><span class="dt">Int</span></span></code></pre></div>
<p>If the quantifier is explicit, we'll (eventually) write just</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a>bar <span class="ot">=</span> foo <span class="dt">Int</span></span></code></pre></div>
<p>or</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a>bar <span class="ot">=</span> foo (<span class="kw">type</span> <span class="dt">Int</span>)</span></code></pre></div>
<p>for now.</p>
<h2 id="inferred-type-variables">Inferred type variables</h2>
<p>That all is great, but consider we define a kind-polymorphic<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> type like</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a><span class="kw">type</span> <span class="dt">ProxyE</span><span class="ot"> ::</span> <span class="kw">forall</span> k<span class="op">.</span> k <span class="ot">-&gt;</span> <span class="dt">Type</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">ProxyE</span> a <span class="ot">=</span> <span class="dt">MkProxyE</span></span></code></pre></div>
<p>then when used at type level, <code>forall</code> behaves as previously, constructors</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span>kind <span class="dt">ProxyE</span> <span class="dt">Int</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true"></a><span class="dt">ProxyE</span> <span class="dt">Int</span><span class="ot"> ::</span> <span class="dt">Type</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true"></a></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span>kind <span class="dt">ProxyE</span> <span class="op">@</span><span class="dt">Type</span> <span class="dt">Int</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true"></a><span class="dt">ProxyE</span> <span class="op">@</span><span class="dt">Type</span> <span class="dt">Int</span><span class="ot"> ::</span> <span class="dt">Type</span></span></code></pre></div>
<p>The type of constructor <code>MkProxyE</code> is</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span><span class="kw">type</span> <span class="dt">ProxyE</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true"></a><span class="dt">ProxyE</span><span class="ot"> ::</span> <span class="kw">forall</span> k (<span class="ot">a ::</span> k)<span class="op">.</span> <span class="dt">ProxyE</span> <span class="op">@</span>k a</span></code></pre></div>
<p>So if we want to create a term of type <code>Proxy Int</code>, we need to provide both <code>k</code> and <code>a</code> arguments:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span><span class="kw">type</span> <span class="dt">ProxyE</span> <span class="op">@</span><span class="dt">Type</span> <span class="op">@</span><span class="dt">Int</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true"></a><span class="dt">ProxyE</span> <span class="op">@</span><span class="dt">Type</span> <span class="op">@</span><span class="dt">Int</span><span class="ot"> ::</span> <span class="dt">ProxyE</span> <span class="op">@</span>(<span class="dt">Type</span>) <span class="dt">Int</span></span></code></pre></div>
<p>we could also jump over <code>k</code>:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span><span class="kw">type</span> <span class="dt">MkProxyE</span> <span class="op">@</span>_ <span class="op">@</span><span class="dt">Int</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true"></a><span class="dt">MkProxyE</span> <span class="op">@</span>_ <span class="op">@</span><span class="dt">Int</span><span class="ot"> ::</span> <span class="dt">ProxyE</span> <span class="op">@</span>(<span class="op">*</span>) <span class="dt">Int</span></span></code></pre></div>
<p>The above skipping over arguments is not convenient, luckily GHC has a feature, created for other needs, which we can (ab)use here. There are <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/type_applications.html#inferred-vs-specified-type-variables"><em>inferred</em> variables</a> (though the better name would be "very hidden"), these are arguments for which <code>TypeApplication</code> doesn't apply:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true"></a><span class="kw">type</span> <span class="dt">Proxy</span><span class="ot"> ::</span> <span class="kw">forall</span> {k}<span class="op">.</span> k <span class="ot">-&gt;</span> <span class="dt">Type</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">Proxy</span> a <span class="ot">=</span> <span class="dt">MkProxy</span></span></code></pre></div>
<p>This is the way <code>Proxy</code> is defined in <code>base</code> (but I renamed the constructor to avoid name ambiguity)</p>
<p>And while GHCi prints</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span><span class="kw">type</span> <span class="dt">MkProxy</span> <span class="op">@</span><span class="dt">Int</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true"></a><span class="dt">MkProxy</span> <span class="op">@</span><span class="dt">Int</span><span class="ot"> ::</span> <span class="dt">Proxy</span> <span class="op">@</span>{<span class="dt">Type</span>} <span class="dt">Int</span></span></code></pre></div>
<p>the <code>@{A}</code> syntax is not valid Haskell, so we cannot explicitly apply inferred variables. Neither we can in types:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">Proxy</span> <span class="op">@</span>{<span class="dt">Type</span>}</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true"></a></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true"></a><span class="op">&lt;</span>interactive<span class="op">&gt;:</span><span class="dv">1</span><span class="op">:</span><span class="dv">10</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span> parse <span class="fu">error</span> on input ‘<span class="dt">Type</span>’</span></code></pre></div>
<p>I think this is plainly wrong, we should be able to apply these "inferred" arguments too.</p>
<p>The counterargument is that, <em>inferred</em> variables weren't meant to be "more implicit" variables. As GHC manual explains, inferred variables are a solution to <code>TypeApplications</code> with inferred types. We need to know the order of variables to be able to apply them; but especially in presence of type-class constraints the order is arbitrary.</p>
<p>I'm not convinced, I think that ability to be fully explicit is way more important than a chance to write brittle code.</p>
<p>One solution, which I think would work, is simply to not generalise. This is controversial proposal, but as GHC Haskell is moving towards having fancier type system, something needs to be sacrificed. (<a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/let_generalisation.html#extension-MonoLocalBinds"><code>MonoLocalBinds</code></a> is for local bindings, but I'd argue that should be for all bindings, not only local).</p>
<p>The challenge has been that library writes may not been aware of <code>TypeApplications</code>, but today they have no choice. Changing from <code>foo :: forall a b. ...</code> to <code>foo :: forall b a. ...</code> may break some code (even though PVP doesn't explicitly write that down, that should be common sense).</p>
<p>So in the GHC manual example</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true"></a><span class="ot">f ::</span> (<span class="dt">Eq</span> b, <span class="dt">Eq</span> a) <span class="ot">=&gt;</span> a <span class="ot">-&gt;</span> b <span class="ot">-&gt;</span> <span class="dt">Bool</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true"></a>f x y <span class="ot">=</span> (x <span class="op">==</span> x) <span class="op">&amp;&amp;</span> (y <span class="op">==</span> y)</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true"></a></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true"></a>g x y <span class="ot">=</span> (x <span class="op">==</span> x) <span class="op">&amp;&amp;</span> (y <span class="op">==</span> y)</span></code></pre></div>
<p>the <code>g</code> would fail to type-check because there are unsolved type-variables. One way to think about this is that GHC would refuse to pick an order of variables. GHC could still generalise if there are no dictionary arguments, but on the other hand I don't think it would help much. It might help more if GHC wouldn't specialise as much, then</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true"></a>h <span class="ot">=</span> f</span></code></pre></div>
<p>would type-check.</p>
<p>This might sound like we would need to write much many type signatures. I don't think that is true: it's already a best practice to write type signatures for type level bindings, and for local bindings we would mostly need to give signatures to function bindings.</p>
<p>This proposal subsumes <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/monomorphism.html#extension-NoMonomorphismRestriction">monomorphism restriction</a>, recall that without type defaulting:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true"></a><span class="co">-- turn off defaulting</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true"></a>default ()</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true"></a>fooLen <span class="ot">=</span> genericLength <span class="st">&quot;foo&quot;</span></span></code></pre></div>
<p>will fail to compile with</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true"></a><span class="dt">Ambiguous</span> <span class="kw">type</span> variable ‘i0’ arising from a use <span class="kw">of</span> ‘genericLength’</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true"></a>prevents the constraint ‘(<span class="dt">Num</span> i0)’ from being solved<span class="op">.</span></span></code></pre></div>
<p>error. With <code>NoMonomophismRestriction</code> we have</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true"></a>ghci<span class="op">&gt;</span> <span class="op">:</span>t fooLen</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true"></a><span class="ot">fooLen ::</span> <span class="dt">Num</span> i <span class="ot">=&gt;</span> i</span></code></pre></div>
<p>Another, a lot simpler option, is to simply remember whether the symbols' type was inferred, and issue a warning if <code>TypeApplications</code> is used with such symbol in application head. So if user writes</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true"></a><span class="op">...</span> (g <span class="op">@</span><span class="dt">Int</span> <span class="op">@</span><span class="dt">Char</span> <span class="op">...</span>)</span></code></pre></div>
<p>GHC would warn that <code>g</code> has inferred type, and the <code>TypeApplications</code> with <code>g</code> are brittle. The solution is to give <code>g</code> a type signature. This warning could be issued early in a pipeline (maybe already in renamer), so it would explain further (possibly cryptic) type errors.</p>
<p>Let me summarise the above: If we could apply inferred variables, i.e. use curly brace application syntax, we would have complete explicit <code>forall a -&gt;</code>, implicit <code>forall a.</code> and <em>more implicit</em> <code>forall {a}.</code> dependent quantifiers. Currently the <code>forall {a}.</code> quantifier is incomplete: we can abstract, but we cannot apply. We'll also need some alternative solution to <code>TypeApplicaitons</code> and inferred types. We should be able to bind these variables explicitly in lambda abstractions as well: <code>\ a -&gt;</code>, <code>\ @a -&gt;</code> and <code>\ @{a} -&gt;</code> respectively (see <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/type_abstractions.html?highlight=typeabstractions#type-abstractions-in-functions"><code>TypeAbstractions</code></a>).</p>
<h2 id="alternatives">Alternatives</h2>
<p>The three level explicit/implicit/impliciter arguments may feel complicated. Doesn't other languages have similar problems, how they solve them?</p>
<p>As far as I'm aware Agda and Coq resolve this problem by supporting applying implicit arguments <em>by name</em>:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode agda"><code class="sourceCode agda"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true"></a><span class="co">-- using indices instead of parameters,</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true"></a><span class="co">-- to make constructor behave as in Haskell</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true"></a><span class="kw">data</span> Proxy <span class="ot">:</span> <span class="ot">{</span>k <span class="ot">:</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">(</span>a <span class="ot">:</span> k<span class="ot">)</span> <span class="ot">-&gt;</span> <span class="dt">Set1</span> <span class="kw">where</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true"></a>  MkProxy <span class="ot">:</span> <span class="ot">{</span>k <span class="ot">:</span> <span class="dt">Set</span><span class="ot">}</span> <span class="ot">{</span>a <span class="ot">:</span> k<span class="ot">}</span> <span class="ot">-&gt;</span> Proxy a</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true"></a></span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true"></a>t <span class="ot">=</span> MkProxy <span class="ot">{</span>a <span class="ot">=</span> true<span class="ot">}</span></span></code></pre></div>
<p>Just adding named arguments to Haskell would be a bad move. It would add another way where a subtle and well-meaning change in the library could break downstream. For example unifying the naming scheme of type-variables in the libraries, so they are always <code>Map k v</code> and not <code>Map k a</code> sometimes, as it is in <code>containers</code> which uses both variable namings.</p>
<p>We could require library authors to explicitly declare that bindings in a module can be applied by name (i.e. that they have thought about the names, and recognise that changing them will be breaking change). You would still be able to always explicitly apply implicit arguments, but sometimes you won't be able to use more convenient named syntax.</p>
<p>It is fair to require library authors to make adjustments so that (numerous) library users would be able to use a new language feature with that library. In a healthy ecosystem that shouldn't be a problem. Specifically it is extra fair, if the alternative is to make feature less great, as then people might not use it at all.</p>
<h2 id="infinite-level-of-implicitness">Infinite level of implicitness</h2>
<p>Another idea is to embrace implicit, more implicit and even more implicit arguments. Agda has two levels: explicit and implicit, GHC Haskell has two and a half, why stop there?</p>
<p>If we could start fresh, we could pick Agda's function application syntax and have</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true"></a>funE arg    <span class="co">-- explicit application</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true"></a>funI {arg}  <span class="co">-- explicit application of implicit argument</span></span></code></pre></div>
<p>but additionally we could add</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true"></a>funJ {{arg}}    <span class="co">-- explicit application of implicit² argument</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true"></a>funK {{{arg}}}  <span class="co">-- explicit application of implicit³ argument</span></span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true"></a><span class="op">...</span>             <span class="co">-- and so on</span></span></code></pre></div>
<p>With unlimited levels of implicitness we could define <code>Proxy</code> as</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true"></a><span class="kw">type</span> <span class="dt">Proxy</span><span class="ot"> ::</span> <span class="kw">forall</span> {k} <span class="ot">-&gt;</span> k <span class="ot">-&gt;</span> <span class="dt">Type</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true"></a><span class="kw">data</span> <span class="dt">Proxy</span> a <span class="kw">where</span></span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true"></a>    <span class="dt">MkProxy</span><span class="ot"> ::</span> <span class="kw">forall</span> {{k}} <span class="ot">-&gt;</span> {<span class="ot">a ::</span> k} <span class="ot">-&gt;</span> <span class="dt">Proxy</span> a</span></code></pre></div>
<p>and use it as <code>MkProxy</code>, <code>MkProxy {Int}</code> or <code>MkProxy {{Type}} {Int} :: Proxy Int</code>. Unlimited possibilities.</p>
<p>For what it is worth, the implementation should be even simpler than of named arguments.</p>
<p>But I'd be quite happy already if GHC Haskell had a way to explicitly apply any function arguments, be it three levels (ordinary, <code>@arg</code> and <code>@{arg}</code>) of explicitness, many or just two; and figured another way to tackle <code>TypeApplications</code> with inferred types.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p><a href="https://www.haskell.org/ghc/blog/20240313-ghc-9.10.1-alpha1-released.html">GHC-9.10.1 release notes (for alpha1)</a> mention "Partial implementation of the <a href="https://github.com/ghc-proposals/ghc-proposals/pull/281">GHC Proposal #281</a>, allowing visible quantification to be used in the types of terms."<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>kind is type of types.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
]]></summary>
</entry>
<entry>
    <title>ST with an early exit</title>
    <link href="https://oleg.fi/gists/posts/2024-03-17-st-with-early-exit.html" />
    <id>https://oleg.fi/gists/posts/2024-03-17-st-with-early-exit.html</id>
    <published>2024-03-17T00:00:00Z</published>
    <updated>2024-03-17T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-03-17
    
        by Oleg Grenrus
    

    
</div>

<h2 id="implementation">Implementation</h2>
<p>I wish there were an early exit functionality in the <code>ST</code> monad. This need comes time to time when writing imperative algorithms in Haskell.</p>
<p>It's very likely there is a functional version of an algorithm, but it might be that <code>ST</code>-version is just simply faster, e.g. by avoiding allocations (as allocating even short lived garbage is not free).</p>
<p>But there are no early exit in the <code>ST</code> monad.</p>
<p>Recent GHC added <a href="https://hackage.haskell.org/package/ghc-prim-0.11.0/docs/GHC-Prim.html#g:24"><em>delimited continuations</em></a>. The TL;DR is that delimited continuations is somewhat like <code>goto</code>:</p>
<ul>
<li><code>newPromptTag#</code> creates a label (tag)</li>
<li><code>prompt#</code> brackets the computation</li>
<li><code>control#</code> kind of jumps (goes to) the end of enclosing prompt bracket, and continues from there.</li>
</ul>
<p>So let's use this functionality to implement a version of <code>ST</code> which has an early exit. It turns out to be quite simple.</p>
<p>The <code>ST</code> monad is define like:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="kw">newtype</span> <span class="dt">ST</span> s a <span class="ot">=</span> <span class="dt">ST</span> (<span class="dt">State</span><span class="op">#</span> s <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> s, a <span class="op">#</span>)</span></code></pre></div>
<p>and we change it by adding an additional prompt tag argument:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a><span class="kw">newtype</span> <span class="dt">EST</span> e s a <span class="ot">=</span> <span class="dt">EST</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true"></a>    {<span class="ot"> unEST ::</span> <span class="kw">forall</span> r<span class="op">.</span> <span class="dt">PromptTag</span><span class="op">#</span> (<span class="dt">Either</span> e r)</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true"></a>            <span class="ot">-&gt;</span> <span class="dt">State</span><span class="op">#</span> s <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> s, a <span class="op">#</span>) </span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true"></a>    }</span></code></pre></div>
<p>(Why <code>forall r.</code>? We'll see soon).</p>
<p>It's easy to lift normal <code>ST</code> computations into <code>EST</code> ones:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a><span class="ot">liftST ::</span> <span class="dt">ST</span> s a <span class="ot">-&gt;</span> <span class="dt">EST</span> e s a</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true"></a>liftST (<span class="dt">ST</span> f) <span class="ot">=</span> <span class="dt">EST</span> (\_ <span class="ot">-&gt;</span> f)</span></code></pre></div>
<p>so <code>EST</code> is a generalisation of <code>ST</code>, good.</p>
<p>Now we need a way to run <code>EST</code> computations, and also a way to early exit in them.</p>
<p>The early exit is the simpler one. Given that tag prompt brackets the whole computation, we simply jump to the end with <code>Left e</code>. We ignore the captured continuation, we have no use for it.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a><span class="ot">earlyExitEST ::</span> e <span class="ot">-&gt;</span> <span class="dt">EST</span> e s <span class="fu">any</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a>earlyExitEST e <span class="ot">=</span> <span class="dt">EST</span> (\tag <span class="ot">-&gt;</span> control0<span class="op">##</span> tag (\_k s <span class="ot">-&gt;</span> (<span class="op">#</span> s, <span class="dt">Left</span> e <span class="op">#</span>)))</span></code></pre></div>
<p>Now, the job for <code>runEST</code> is to create the tag and prompt the computation:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a><span class="ot">runEST ::</span> <span class="kw">forall</span> e a<span class="op">.</span> (<span class="kw">forall</span> s<span class="op">.</span> <span class="dt">EST</span> e s a) <span class="ot">-&gt;</span> <span class="dt">Either</span> e a</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true"></a>runEST (<span class="dt">EST</span> f) <span class="ot">=</span> runRW<span class="op">#</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true"></a>    <span class="co">-- create tag</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true"></a>    (\s0 <span class="ot">-&gt;</span> <span class="kw">case</span> newPromptTag<span class="op">#</span> s0 <span class="kw">of</span> {</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true"></a>    <span class="co">-- prompt</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true"></a>    (<span class="op">#</span> s1, tag <span class="op">#</span>) <span class="ot">-&gt;</span> <span class="kw">case</span> prompt<span class="op">#</span> tag</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true"></a>         <span class="co">-- run the `f` inside prompt,</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true"></a>         <span class="co">-- and once we get to the end return `Right` value</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true"></a>         (\s2 <span class="ot">-&gt;</span> <span class="kw">case</span> f tag s2 <span class="kw">of</span> (<span class="op">#</span> s3, a <span class="op">#</span>) <span class="ot">-&gt;</span> (<span class="op">#</span> s3, <span class="dt">Right</span> a <span class="op">#</span>)) s1 <span class="kw">of</span> {</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true"></a>    (<span class="op">#</span> _, a <span class="op">#</span>) <span class="ot">-&gt;</span> a }})</span></code></pre></div>
<p><code>runRW#</code> and forgetting the state at the end is the same as in <code>runST</code>, for comparison:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a><span class="ot">runST ::</span> (<span class="kw">forall</span> s<span class="op">.</span> <span class="dt">ST</span> s a) <span class="ot">-&gt;</span> a</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true"></a>runST (<span class="dt">ST</span> st_rep) <span class="ot">=</span> <span class="kw">case</span> runRW<span class="op">#</span> st_rep <span class="kw">of</span> (<span class="op">#</span> _, a <span class="op">#</span>) <span class="ot">-&gt;</span> a</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true"></a><span class="co">-- See Note [runRW magic] in GHC.CoreToStg.Prep</span></span></code></pre></div>
<p>With all the pieces in place, we can run few simple examples:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a><span class="co">-- | &gt;&gt;&gt; ex1</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true"></a><span class="co">-- Left &#39;x&#39;</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true"></a><span class="ot">ex1 ::</span> <span class="dt">Either</span> <span class="dt">Char</span> <span class="dt">Bool</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true"></a>ex1 <span class="ot">=</span> runEST <span class="op">$</span> earlyExitEST <span class="ch">&#39;x&#39;</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true"></a><span class="co">-- | &gt;&gt;&gt; ex2</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true"></a><span class="co">-- Right True</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true"></a><span class="ot">ex2 ::</span> <span class="dt">Either</span> <span class="dt">Char</span> <span class="dt">Bool</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true"></a>ex2 <span class="ot">=</span> runEST (<span class="fu">return</span> <span class="dt">True</span>)</span></code></pre></div>
<h2 id="comments--wrinkles">Comments &amp; wrinkles</h2>
<p>Early exit is one of the simplest "effect" you can implement with delimited continuations. This is the throwing part of the exceptions, with only top-level exception handler. It's a nice exercise (and a brain twister) to implement catch blocks.</p>
<p>One wrinkle in this implementation is the <code>control0##</code> (not <code>control0#</code>) function I used. The delimited continuations primops are made to work only with <code>RealWorld</code>, not arbitrary <code>State#</code> tokens.</p>
<p>I think this is unnecessary specialization <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/24165">GHC issue #24165</a>, I was advice to simply use <code>unsafeIOToST</code>, so I did:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a>control0<span class="op">##</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true"></a><span class="ot">    ::</span> <span class="dt">PromptTag</span><span class="op">#</span> a</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true"></a>    <span class="ot">-&gt;</span> (((<span class="dt">State</span><span class="op">#</span> s <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> s, b <span class="op">#</span>)) <span class="ot">-&gt;</span> <span class="dt">State</span><span class="op">#</span> s <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> s, a <span class="op">#</span>))</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true"></a>                                         <span class="ot">-&gt;</span> <span class="dt">State</span><span class="op">#</span> s <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> s, a <span class="op">#</span>))</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true"></a>    <span class="ot">-&gt;</span> <span class="dt">State</span><span class="op">#</span> s <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> s, b <span class="op">#</span>)</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true"></a>control0<span class="op">##</span> <span class="ot">=</span> unsafeCoerce<span class="op">#</span> control0<span class="op">#</span></span></code></pre></div>
<p>This still feels silly, especially realizing that <a href="https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0313-delimited-continuation-primops.rst#examples">the (only) example in the delimited continuations proposal</a> goes like</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true"></a><span class="kw">type</span> role <span class="dt">CC</span> nominal representational</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true"></a><span class="kw">newtype</span> <span class="dt">CC</span> ans a <span class="ot">=</span> <span class="dt">CC</span> (<span class="dt">State</span><span class="op">#</span> <span class="dt">RealWorld</span> <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> <span class="dt">RealWorld</span>, a <span class="op">#</span>))</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true"></a>  <span class="kw">deriving</span> (<span class="dt">Functor</span>, <span class="dt">Applicative</span>, <span class="dt">Monad</span>) via <span class="dt">IO</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true"></a><span class="ot">runCC ::</span> (<span class="kw">forall</span> ans<span class="op">.</span> <span class="dt">CC</span> ans a) <span class="ot">-&gt;</span> a</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true"></a>runCC (<span class="dt">CC</span> m) <span class="ot">=</span> <span class="kw">case</span> runRW<span class="op">#</span> m <span class="kw">of</span> (<span class="op">#</span> _, a <span class="op">#</span>) <span class="ot">-&gt;</span> a</span></code></pre></div>
<p>but if you look at that, it's just a <code>ST</code> monad done weirdly:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true"></a><span class="kw">newtype</span> <span class="dt">ST</span> s a <span class="ot">=</span> <span class="dt">ST</span> (<span class="dt">State</span><span class="op">#</span> <span class="dt">RealWorld</span> <span class="ot">-&gt;</span> (<span class="op">#</span> <span class="dt">State</span><span class="op">#</span> <span class="dt">RealWorld</span>, a <span class="op">#</span>))</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true"></a><span class="co">-- not using `s` argument !?</span></span></code></pre></div>
<p>There might be a good reason why <code>CC</code> should be done like that (other than than primops are <code>RealWorld</code> specific), but the proposal doesn't explain that difference. To me having phantom <code>ans</code> instead of using nominally it as in <code>ST</code> is suspicious.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Delimited continutations are fun and could be very useful.</p>
<p>But surprisingly, at the moment of writing I cannot find any package on Hackage using them for anything! <a href="https://hackage-search.serokell.io/?q=newPromptTag">Search for newPromptTag</a> returns only false positives (<code>ghc-lib</code> etc) right now. I wonder why they are unused?</p>
<p>Please try them out!</p>
]]></summary>
</entry>
<entry>
    <title>More QualifiedDo examples</title>
    <link href="https://oleg.fi/gists/posts/2024-02-27-more-qualified-do.html" />
    <id>https://oleg.fi/gists/posts/2024-02-27-more-qualified-do.html</id>
    <published>2024-02-27T00:00:00Z</published>
    <updated>2024-02-27T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on 2024-02-27
    
        by Oleg Grenrus
    

    
</div>

<p><em>Qualified do-notation</em>, <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/qualified_do.html"><code>QualifiedDo</code></a>, is a nice syntactical extension in GHC. Probably the best its property is that it changes semantics only locally, by using explicit "annotation": by qualifying the <code>do</code> keyword<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. This means that enabling the extension doesn't change meaning of other &amp; existing code.</p>
<p>I'll give two examples of <code>QualifiedDo</code> applications.</p>
<h2 id="first-example-complete-pattern-synonyms">First example: COMPLETE pattern synonyms</h2>
<p>GHC had long had <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/pattern_synonyms.html"><code>PatternSynonyms</code></a>. One use case for pattern synonyms is to provide backward compatibility when data type constructors change: preserving old constructor names and arguments as a compatibility pattern synonym.</p>
<p>For example, we used to have <code>data Solo = Solo a</code>. Recently the constructor was renamed to <code>MkSolo</code> to avoid name punning. To not break all the code using <code>Solo</code> constructor there compatibility pattern synonym was added:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="kw">pattern</span> <span class="dt">Solo</span><span class="ot"> ::</span> a <span class="ot">-&gt;</span> <span class="dt">Solo</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true"></a><span class="kw">pattern</span> <span class="dt">Solo</span> x <span class="ot">=</span> <span class="dt">MkSolo</span> x</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true"></a><span class="ot">{-# COMPLETE Solo #-}</span></span></code></pre></div>
<p>The <code>COMPLETE</code> pragma says that a pattern match using <code>Solo</code> pattern synonym is complete, so we wouldn't get incomplete pattern match warnings<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>But <code>COMPLETE</code> support is (ironically) incomplete. If we have a <code>do</code> block like</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a><span class="ot">broken ::</span> <span class="dt">Monad</span> m <span class="ot">=&gt;</span> m (<span class="dt">Solo</span> a) <span class="ot">-&gt;</span> m a</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true"></a>broken s <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true"></a>    <span class="dt">Solo</span> x <span class="ot">&lt;-</span> s</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true"></a>    <span class="fu">return</span> x</span></code></pre></div>
<p>the GHC will <strong>error</strong> because we don't have <code>MonadFail</code> instance (to desugar <em>incomplete</em> pattern match: <code>Could not deduce (MonadFail m)</code>, that is GHC issue <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/15681">#15681</a>). There are various workarounds, but I don't remember anyone mentioning <code>QualifiedDo</code>.</p>
<p>If we write a small helper module</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a><span class="kw">module</span> <span class="dt">M</span> ((<span class="op">&gt;&gt;=</span>), (<span class="op">&gt;&gt;</span>), <span class="fu">fail</span>) <span class="kw">where</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true"></a><span class="kw">import</span> <span class="dt">Prelude</span> ((&gt;&gt;=), (&gt;&gt;), <span class="dt">Monad</span>, <span class="dt">String</span>, error)</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true"></a><span class="kw">import</span> <span class="dt">GHC.Stack</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true"></a><span class="fu">fail</span><span class="ot"> ::</span> (<span class="dt">Monad</span> m, <span class="dt">HasCallStack</span>) <span class="ot">=&gt;</span> <span class="dt">String</span> <span class="ot">-&gt;</span> m a</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true"></a><span class="fu">fail</span> <span class="ot">=</span> <span class="fu">error</span></span></code></pre></div>
<p>we can change <code>broken</code> into something which <code>works</code>:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">M</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true"></a><span class="ot">works ::</span> <span class="dt">Monad</span> m <span class="ot">=&gt;</span> m (<span class="dt">Solo</span> a) <span class="ot">-&gt;</span> m a</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true"></a>works s <span class="ot">=</span> M.do</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true"></a>    <span class="dt">Solo</span> x <span class="ot">&lt;-</span> s</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true"></a>    <span class="fu">return</span> x</span></code></pre></div>
<p>Now if GHC needs to <code>fail</code>, it will simply <code>error</code>.</p>
<p>I hope that it's obvious that this is a band-aid: if you are relying on <code>fail</code> doing something useful (e.g. in <code>Maybe</code>), this will obviously break your program. But as <code>QualifiedDo</code> usage is explicitly annotated it's not a spooky action at the distance. And <code>HasCallStack</code> annotation should help you find the mistakes if any happen.</p>
<h2 id="second-example-zero-overhead-effects">Second example: zero-overhead effects</h2>
<p>At work I have been (adjacently) working with the code building on top of <a href="https://hackage.haskell.org/package/io-sim"><code>io-sim</code></a>. TL;DR you write your code using (a lot of) type-classes, and then can either run your code in real <code>IO</code> (production) or in a simulator <code>IOSim</code> (for tests). But I'm getting slightly anxious thinking about having all I/O code being abstracted using type-classes making the true <code>IO</code> case potentially go slow. (This is <code>mtl</code>-like take on effect handling, but even <a href="https://hackage.haskell.org/package/effectful"><code>effectful</code></a> or something based on delimited continuations aren't zero-overhead: the overhead is there, just smaller).</p>
<p>What we truly want is a complete specialisation of effect-related type-classes, so there aren't any abstraction bits left when the use case is concrete (in <code>mtl</code> approach we can theoretically get there, but not in practice. In <code>effectful</code> or delimited-continuations a small cost is always there, but it doesn't rely that much on compiler optimising well).</p>
<p>Most likely, if your code isn't pushing both the I/O and CPU utilization at the same time, either approach will work ok. Compare that to data science done in Python: Python is a quite slow glue language, but it's combining bigger fast running "primitive" blocks. So if there is very little glue code, and the most work is done inside the abstracted primitives, the glue being tacky doesn't matter.</p>
<p>But can we do better?</p>
<p>In GHC we can do better using staging i.e. Typed Template Haskell (TTH). At first I was worried that TTH syntactic overhead will be off-putting until I remembered that <code>QualifiedDo</code> extension exists!</p>
<p>We can write code like:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">SIO</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true"></a></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true"></a><span class="ot">example ::</span> <span class="dt">SIO.SIO</span> i m <span class="ot">=&gt;</span> i <span class="dt">FilePath</span> <span class="ot">-&gt;</span> m ()</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true"></a>example fn <span class="ot">=</span> SIO.do</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true"></a>  contents <span class="ot">&lt;-</span> SIO.readFile fn</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true"></a>  SIO.putStr contents</span></code></pre></div>
<p>that looks like normal Haskell. If we were forced to use <code>&gt;&gt;=</code> like operator explicitly, e.g. writing</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a><span class="ot">example&#39; ::</span> <span class="dt">SIO.SIO</span> i m <span class="ot">=&gt;</span> i <span class="dt">FilePath</span> <span class="ot">-&gt;</span> m ()</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true"></a>example&#39; fn <span class="ot">=</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true"></a>  SIO.readFile fn <span class="op">&gt;&gt;&gt;=</span> \contents <span class="ot">-&gt;</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true"></a>  SIO.putStr contents</span></code></pre></div>
<p>it wouldn't be as nice.</p>
<p>The <code>SIO</code> type class has the part which looks almost like <code>Monad</code>, but not exactly:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a><span class="kw">class</span> <span class="dt">SIO</span> i m <span class="op">|</span> m <span class="ot">-&gt;</span> i <span class="kw">where</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true"></a><span class="ot">  (&gt;&gt;=)    ::</span> m a <span class="ot">-&gt;</span> (i a <span class="ot">-&gt;</span> m b) <span class="ot">-&gt;</span> m b</span></code></pre></div>
<p>The "pure" values are wrapped inside type constructor <code>i</code> (for identity).</p>
<p>The <code>readFile</code> and <code>putStr</code> are also in the same type-class (could be different, doesn't really matter):</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a><span class="ot">  readFile ::</span> i <span class="dt">FilePath</span> <span class="ot">-&gt;</span> m <span class="dt">ByteString</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true"></a><span class="ot">  putStr   ::</span> i <span class="dt">ByteString</span> <span class="ot">-&gt;</span> m ()</span></code></pre></div>
<p>We can have concrete instances, like <code>IO</code> (or actually <code>IOSim</code>) for tests:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true"></a><span class="kw">instance</span> <span class="dt">SIO</span> <span class="dt">Identity</span> <span class="dt">IO</span> <span class="kw">where</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true"></a><span class="ot">  (&gt;&gt;=) ::</span> <span class="kw">forall</span> a b<span class="op">.</span> <span class="dt">IO</span> a <span class="ot">-&gt;</span> (<span class="dt">Identity</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> b) <span class="ot">-&gt;</span> <span class="dt">IO</span> b</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true"></a>  (<span class="op">&gt;&gt;=</span>) <span class="ot">=</span> coerce (bindIO <span class="op">@</span>a <span class="op">@</span>b)</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true"></a>  <span class="fu">readFile</span> <span class="ot">=</span> coerce BS.readFile</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true"></a>  <span class="fu">putStr</span> <span class="ot">=</span> coerce BS.putStr</span></code></pre></div>
<p>But because we are liberated from the restricting shape of the <code>Monad</code> type class, we can have instance for <code>CodeQ</code> from <code>template-haskell</code>:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true"></a><span class="kw">newtype</span> <span class="dt">CodeIO</span> a <span class="ot">=</span> <span class="dt">CodeIO</span> {<span class="ot"> unCodeIO ::</span> <span class="dt">CodeQ</span> (<span class="dt">IO</span> a) }</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true"></a></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true"></a><span class="kw">instance</span> <span class="dt">SIO</span> <span class="dt">CodeQ</span> <span class="dt">CodeIO</span> <span class="kw">where</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true"></a>  m <span class="op">&gt;&gt;=</span> k     <span class="ot">=</span> <span class="dt">CodeIO</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true"></a>    [<span class="op">||</span> bindIO <span class="op">$$</span>(unCodeIO m) (\x <span class="ot">-&gt;</span> <span class="op">$$</span>(unCodeIO (k [<span class="op">||</span> x <span class="op">||</span>]))) <span class="op">||</span>]</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true"></a>  <span class="fu">readFile</span> fn <span class="ot">=</span> <span class="dt">CodeIO</span> [<span class="op">||</span> BS.readFile <span class="op">$$</span>fn <span class="op">||</span>]</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true"></a>  <span class="fu">putStr</span> bs   <span class="ot">=</span> <span class="dt">CodeIO</span> [<span class="op">||</span> BS.putStr <span class="op">$$</span>bs <span class="op">||</span>]</span></code></pre></div>
<p>Then in our main production module we can splice the example in like</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true"></a><span class="ot">spliced ::</span> <span class="dt">FilePath</span> <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true"></a>spliced fn <span class="ot">=</span> <span class="op">$$</span>(SIO.unCodeIO <span class="op">$</span> SIO.do</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true"></a>    example [<span class="op">||</span> fn <span class="op">||</span>]</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true"></a>    example [<span class="op">||</span> fn <span class="op">||</span>])</span></code></pre></div>
<p>and the generated code has no effect handling abstractions; in fact not even a <code>Monad</code>, as we used <code>thenIO</code> and <code>bindIO</code> building blocks:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true"></a>spliced fn_a3kY <span class="ot">=</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true"></a>    (GHC.Base.thenIO</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true"></a>       ((GHC.Base.bindIO (Data.ByteString.readFile fn_a3kY))</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true"></a>          (\ x_a3m2 <span class="ot">-&gt;</span> Data.ByteString.putStr x_a3m2)))</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true"></a>      ((GHC.Base.bindIO (Data.ByteString.readFile fn_a3kY))</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true"></a>         (\ x_a3m3 <span class="ot">-&gt;</span> Data.ByteString.putStr x_a3m3))</span></code></pre></div>
<p>We have a precise control (but also a responsibility) to control the inlining of building blocks (i.e. if we want <code>example</code> let-bound first and then called twice, we must do that manually: power comes with responsibility). This is either a pro or con, depending on your POV. I think this is a pro if you go this far caring about the performance. If GHC Haskell had a type-class like mechanism with full monomorphisation guarantee, we'd would still like to to control inlining.</p>
<p>You may also worry that "wont staging generate <em>a lot of code</em>". Yes it will, but so would full monomorphisation (of templates in C++ or traits in Rust). It's a behaviour we arguably want, but it's GHC which may be worried and don't do too good job. With staging we could also do modular code-generation too, making layered type-class hierarchy, generating i.e. "pre-splicing" intermediate layers (layers like in <a href="https://www.parsonsmatt.org/2018/03/22/three_layer_haskell_cake.html">three layer cake</a>).</p>
<h2 id="conclusion">Conclusion</h2>
<p><a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/qualified_do.html"><code>QualifiedDo</code></a> is a neat GHC extension. We saw two more examples of its usage, where we want something like regular <code>Monad</code> desugaring, but which doesn't fit the <code>Monad</code> type-class. I also think we could have more of <code>Qualified*</code> syntactic extensions.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>In comparison <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/applicative_do.html"><code>ApplicativeDo</code></a> applies globally. These design choices are probably not-so-intentional. For <code>QualifiedDo</code> it would require some additional setting to change <em>all</em> do statements in the source module (like <code>-fplugin</code> takes a module name). OTOH <code>ApplicativeDo</code> main motivation (using it with <a href="https://hackage.haskell.org/package/haxl"><code>haxl</code></a>) was to use it globally. But if you want to use it only in <em>some</em> <code>do</code> statement, you can't. Similarly <code>OverloadedStrings</code> applies to all string literals, and in the same way for all of them. Compare to Python which has kind of "QualifiedStrings" with string literals very differently: imagine writing <code>T."this is text"</code> but still having <code>"this is string" :: String</code>, without any type-class resolution.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>GHC doesn't try to reason about completeness through pattern synonyms: you may want to keep a pattern synonym group intentionally incomplete (so extending an otherwise abstract type with new ones isn't a breaking change), or to tell that something is complete (due to invariant you maintain, but GHC has no chance figuring out).<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
]]></summary>
</entry>

</feed>
