<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://docs.getdbt.com/blog</id>
    <title>dbt Developer Hub Blog</title>
    <updated>2026-03-26T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://docs.getdbt.com/blog"/>
    <subtitle>dbt Developer Hub Blog</subtitle>
    <entry>
        <title type="html"><![CDATA[The Catalog Linked Database Diaries: On Freshness and Writes]]></title>
        <id>https://docs.getdbt.com/blog/catalog-linked-databases</id>
        <link href="https://docs.getdbt.com/blog/catalog-linked-databases"/>
        <updated>2026-03-26T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Learn about catalog linked databases with Apache Iceberg.]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-catalog-linked-database-diaries-on-freshness-and-writes">The Catalog Linked Database Diaries: On Freshness and Writes<a href="https://docs.getdbt.com/blog/catalog-linked-databases#the-catalog-linked-database-diaries-on-freshness-and-writes" class="hash-link" aria-label="Direct link to The Catalog Linked Database Diaries: On Freshness and Writes" title="Direct link to The Catalog Linked Database Diaries: On Freshness and Writes">​</a></h2>
<p>Last November, at dbt Summit, Jeremy introduced dbt’s multi-platform Iceberg capabilities.</p>
<div style="display:flex;justify-content:center"><iframe width="560" height="315" src="https://www.youtube.com/embed/bRJJkeJkUsE?si=mQTD3jUNpPqvwrJb" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin"></iframe></div>
<p>What intrigued us most was the promised interconnectivity of Databricks Unity Catalog and Snowflake catalog-linked databases.</p>
<p>AI’s all the rage, but another little revolution is taking shape: Teams are breaking their data storage out of vendor-specific platforms. For months, we have been chatting with users excited to adopt Iceberg as a core pillar of their data architecture. The Iceberg table format and Iceberg REST catalogs are the emerging standards powering that flexibility.</p>
<p>For dbt’s part, this shows up in two concrete use cases:</p>
<ul>
<li><strong>dbt projects at scale</strong>: Teams share one logical database, with many schemas and hundreds to thousands of tables</li>
<li><strong>Cross-platform mesh</strong>: One project in Snowflake, one in Databricks, sharing data without juggling manual refreshes or metadata pointers</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-is-a-catalog-linked-database">What is a Catalog Linked Database?<a href="https://docs.getdbt.com/blog/catalog-linked-databases#what-is-a-catalog-linked-database" class="hash-link" aria-label="Direct link to What is a Catalog Linked Database?" title="Direct link to What is a Catalog Linked Database?">​</a></h2>
<p>In the Iceberg model, the catalog is the system of record for table metadata—schemas, snapshots, and evolution. It’s designed so multiple engines can interoperate against that same metadata layer. A catalog-linked database (CLD) is Snowflake’s way of exposing an open Apache Iceberg catalog, and all the Iceberg tables it contains, inside Snowflake as just another database.</p>
<p>The dream is for teams to share Iceberg tables across platforms without recreating their metadata pointers one by one or copying the actual data. Consider an organization whose finance team produces figures in Databricks while the marketing team works in Snowflake. The Snowflake team wants that upstream data fresh, and occasionally wants to write manual row updates back upstream. Old-school synced tables would treat Databricks as the sole source of truth. With Iceberg and a CLD-enabled architecture, both data platforms point to the same catalog-defined source of truth.</p>
<p>Some upfront configuration work in Snowflake buys you seamless cross-platform queries on seamlessly synced data objects—that’s the on-paper guarantee. (Snowflake recently published a <a href="https://docs.snowflake.com/en/user-guide/tutorials/tables-iceberg-set-up-bidirectional-access-to-unity-catalog" target="_blank" rel="noopener noreferrer">step-by-step tutorial</a> for using CLDs to enable bidirectional data sharing with Databricks — unimaginable a few years ago, and achievable today.)</p>
<p>As we developed our testing suite, we wondered what happens at scale for both reads and writes. Turn up the chaos: what happens when both are happening at once? For example, if the accounting team writes into a Databricks cluster at 6 a.m. every morning, but the synchronization step to the marketing team’s Snowflake cluster takes 2-3 hours, when will it be safe for their morning data analysis jobs to kick off?</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="our-testing-regimen">Our testing regimen<a href="https://docs.getdbt.com/blog/catalog-linked-databases#our-testing-regimen" class="hash-link" aria-label="Direct link to Our testing regimen" title="Direct link to Our testing regimen">​</a></h2>
<p>Based on our telemetry of real-world dbt projects, we see that large projects number in the hundreds of models, and some in the thousands. For the purposes of our testing, we made the worst-case assumption that <em>every single model</em> would be materialized as an Iceberg table. This is our upper bound. (It’s rare behavior in dbt projects adopting Iceberg, but a team could have legitimate reasons for choosing it.)</p>
<p>At these scales, catalog behavior, metadata operations, and refresh mechanics really start to matter. We observed latency and friction starting in the hundreds of tables, but for science, we pushed things to the extreme: we loaded 500k tables into one database and examined three questions:</p>
<ol>
<li><strong>Reading at scale:</strong> What’s the overhead when Snowflake reads tables owned elsewhere?</li>
<li><strong>Writing at scale:</strong> How does performance change when you’re creating/updating lots of tables and querying big sums of metadata?</li>
<li><strong>Freshness under change:</strong> When one platform updates data, how reliably and quickly does the other see it?</li>
</ol>
<p>We've published our full testing regimen and detailed findings, for anyone who wants to take a deeper look: <a href="https://github.com/dbt-labs/snow-dbx-iceberg-benchmark" target="_blank" rel="noopener noreferrer">https://github.com/dbt-labs/snow-dbx-iceberg-benchmark</a></p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="reads-at-scale">Reads at scale: Good performance, but only after platforms sync<a href="https://docs.getdbt.com/blog/catalog-linked-databases#reads-at-scale" class="hash-link" aria-label="Direct link to Reads at scale: Good performance, but only after platforms sync" title="Direct link to Reads at scale: Good performance, but only after platforms sync">​</a></h2>
<p>Using TPC-H queries over large benchmarking datasets, we found that once data is visible and up to date, querying those Iceberg tables from Snowflake is as fast as you’d want in any reasonable analytics workflow. Databricks querying the same data from the owning side is speedy too.</p>
<p>The catch is that “read performance” is really only half the story. In practice, what users experience is not “how fast is this query,” but “am I even querying fresh data?” When freshness slips, CLDs stop feeling like a pipe and start feeling like waiting for a package held up in customs.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="writes-and-changes-at-scale">Writes and change at scale: The compute bottleneck<a href="https://docs.getdbt.com/blog/catalog-linked-databases#writes-and-changes-at-scale" class="hash-link" aria-label="Direct link to Writes and change at scale: The compute bottleneck" title="Direct link to Writes and change at scale: The compute bottleneck">​</a></h2>
<p>When Snowflake is the one making lots of changes (creating tables, updating metadata, producing many Iceberg commits), each job runs against the upstream Databricks-owned objects. A query might take twice as long, but the data stays synchronized across both platforms. Write throughput becomes the limiting factor. In a dbt-shaped workload—many small/medium table operations rather than one giant append—this can make runs slow and fragile under contention, occasionally failing outright with errors claiming your table no longer exists.</p>
<p>Now, when Databricks is the one making changes, writes perform the same as in any ordinary Databricks workflow. The difficulty shifts to how quickly Snowflake reflects those changes.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-biggest-finding">The biggest finding: As scale increases, refresh latency does too<a href="https://docs.getdbt.com/blog/catalog-linked-databases#the-biggest-finding" class="hash-link" aria-label="Direct link to The biggest finding: As scale increases, refresh latency does too" title="Direct link to The biggest finding: As scale increases, refresh latency does too">​</a></h2>
<p>CLDs promise fast syncing. We found this largely true at small and medium scales. At larger scales, however, changes made by Databricks could take far longer than advertised to synchronize; in general, we experienced auto-refresh waits 2x longer than expected. When we dialed things up to 500k tables, the refresh on Snowflake for a trivial Databricks <code>INSERT</code> could take two days to propagate, and some tables seemed to get “stuck.” We eventually learned how to manually force refreshes for individual objects (i.e. hacking refresh-related settings to jog the system), but we found it difficult to predict when data updates would propagate from Databricks back to Snowflake; we mostly operated on a gut feeling of when data would arrive. (The good news: we hear the fine folks at Snowflake have ergonomic improvements on the way.)</p>
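<p>Without a hard freshness guarantee, a downstream consumer has to gate itself on observed freshness rather than on the clock. Here’s a minimal sketch of that pattern in Python (the check function is hypothetical; you’d implement it as a query for a sentinel row or snapshot ID written by the upstream job):</p>

```python
import time

def wait_for_freshness(is_fresh, timeout_s=3 * 60 * 60, poll_s=60):
    """Block until is_fresh() reports that an upstream write is visible.

    is_fresh: a callable you supply that returns True once a sentinel
    value written upstream (e.g. by the 6 a.m. Databricks job) becomes
    visible downstream (e.g. through the Snowflake CLD).
    Returns the observed propagation delay in seconds, or raises
    TimeoutError so downstream jobs fail loudly instead of silently
    reading stale data.
    """
    start = time.monotonic()
    while True:
        if is_fresh():
            return time.monotonic() - start
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("upstream change never became visible")
        time.sleep(poll_s)
```

<p>It’s not elegant, but an explicit gate like this beats a gut feeling about when data will arrive.</p>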
<p>Mulling over our experiences, we believe the question of whether and how you should adopt Snowflake CLDs comes down to scale and latency:</p>
<ol>
<li>How many Iceberg tables are you syncing across multiple engines?</li>
<li>Do your workflows require that Snowflake have a near-real-time view of externally managed Iceberg? Or can you treat it as a possibly stale view, accept eventual consistency, and live without clear guarantees unless you build your own manual playbook and monitoring framework?</li>
</ol>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="interoperability-friction">Interoperability friction: Why it’s not just the metadata<a href="https://docs.getdbt.com/blog/catalog-linked-databases#interoperability-friction" class="hash-link" aria-label="Direct link to Interoperability friction: Why it’s not just the metadata" title="Direct link to Interoperability friction: Why it’s not just the metadata">​</a></h2>
<p>Two non-performance issues showed up quickly:</p>
<ul>
<li>Naming, quoting, and casing differences become friction points when dbt is generating objects that need to be understood identically by two engines. Our deep dive has given us ideas for dbt to abstract over these ergonomic challenges. In the future, users shouldn’t need to memorize the casing/quoting rules of every catalog/engine combo. For now, unfortunately, that’s just the cost of doing platform-agnostic business.</li>
<li>Metadata and refresh behavior become part of your job. You’re managing tables and the system that decides when tables “exist.” And those <code>SHOW ICEBERG TABLES</code> queries are slow.</li>
</ul>
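<p>To make the casing friction concrete: each engine canonicalizes <em>unquoted</em> identifiers differently, so the same model name can land in the shared catalog as two different strings. A toy illustration (the folding rules below are the engines’ documented defaults; quoted identifiers behave differently again):</p>

```python
def stored_identifier(name: str, engine: str) -> str:
    """Return how an unquoted identifier is canonicalized by each engine.

    Snowflake folds unquoted identifiers to upper case, while Spark-based
    engines such as Databricks fold them to lower case. dbt has to bridge
    this gap when one project's objects are read by the other engine.
    """
    if engine == "snowflake":
        return name.upper()
    if engine == "databricks":
        return name.lower()
    raise ValueError(f"unknown engine: {engine}")

# The same dbt model, seen from each side of the shared catalog:
stored_identifier("Fct_Orders", "snowflake")   # 'FCT_ORDERS'
stored_identifier("Fct_Orders", "databricks")  # 'fct_orders'
```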
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-takeaway">The takeaway<a href="https://docs.getdbt.com/blog/catalog-linked-databases#the-takeaway" class="hash-link" aria-label="Direct link to The takeaway" title="Direct link to The takeaway">​</a></h2>
<p>CLDs work—up to a point. They solve real, recurring problems of keeping data connected across platforms. If the number of tables or update volume is very large (on the order of tens of thousands), the pattern stops being a useful abstraction. The same goes if you depend on per-second precision for synchronizing writes. But until you approach that edge, CLDs really do make it possible to treat external Iceberg catalogs like any other database.</p>
<p>For us, that can unlock some very exciting capabilities within customers’ dbt workflows—cross-platform mesh, external sources, and maybe even running the same dbt project / DAG against multiple warehouses. We believe that Iceberg integrations will continue to improve, becoming more performant and easier to use. We need only look to the past year of features (including Snowflake CLDs and Databricks' native managed Iceberg tables, the two features that made this story possible) to be excited for what’s coming in the next one.</p>
<p>And finally, we can’t close without giving a nod to the Unity Catalog team for partnering with Snowflake on this killer feature.</p>]]></content>
        <author>
            <name>Anna Lee</name>
        </author>
        <author>
            <name>Mila Page</name>
        </author>
        <category label="iceberg" term="iceberg"/>
        <category label="catalogs" term="catalogs"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Make your AI better at data work with dbt's agent skills]]></title>
        <id>https://docs.getdbt.com/blog/dbt-agent-skills</id>
        <link href="https://docs.getdbt.com/blog/dbt-agent-skills"/>
        <updated>2026-02-05T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[We built a collection of Agent Skills to make coding agents better at using dbt projects and doing analytics work.]]></summary>
<content type="html"><![CDATA[<p>Community-driven creation and curation of best practices is perhaps <em>the</em> driving factor behind dbt and analytics engineering’s rise - transferable workflows and processes enable everyone to create and disseminate organizational knowledge. In the early days, <del>dbt Labs’</del> Fishtown Analytics’ <a href="https://github.com/dbt-labs/corp/blob/773079cc140e8636403771e0c9f56ab9be528597/dbt_style_guide.md" target="_blank" rel="noopener noreferrer">dbt_style_guide.md</a> contained foundational guidelines for anyone adopting the dbt viewpoint for the first time.</p>
<p>Today we released a collection of <a href="https://github.com/dbt-labs/dbt-agent-skills" target="_blank" rel="noopener noreferrer">dbt agent skills</a> so that <span>AI agents</span> (like Claude Code, OpenAI's Codex, Cursor, Factory or Kilo Code) can follow the same dbt best practices you would expect of any collaborator in your codebase. This matters because by extending their baseline capabilities, <strong>skills can transform generalist coding agents into highly capable data agents</strong>.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-agent-skills#" data-featherlight="/img/blog/2026-02-03-dbt-agent-skills/skills-blog-infographic.png"><img data-toggle="lightbox" alt="Diagram showing how dbt agent skills transform generalist coding agents into specialized data agents capable of analytics engineering, semantic layer definition, testing, debugging, natural language querying, and migration workflows" title="dbt agent skills allow you to transform generalist coding agents into highly capable data agents" src="https://docs.getdbt.com/img/blog/2026-02-03-dbt-agent-skills/skills-blog-infographic.png?v=2"></a></span><span class="title_aGrV">dbt agent skills allow you to transform generalist coding agents into highly capable data agents</span></div>
<p>These skills encapsulate a broad swathe of hard-won knowledge from the dbt Community and the dbt Labs Developer Experience team. Collectively, they represent dozens of hours of focused work by dbt experts, backed by years of using dbt.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-agent-skills#" data-featherlight="/img/blog/2026-02-03-dbt-agent-skills/with_skill.gif"><img data-toggle="lightbox" alt="A gif showing Claude using the analytics engineering skill to validate its work" title="With access to skills, agents like Claude take a systematic approach to tasks" src="https://docs.getdbt.com/img/blog/2026-02-03-dbt-agent-skills/with_skill.gif?v=2"></a></span><span class="title_aGrV">With access to skills, agents like Claude take a systematic approach to tasks</span></div>
<p>The ecosystem is rapidly evolving for both authors of skills and the agents that consume them. We believe these skills are very useful today, <em>and</em> that they will become more useful over the coming weeks and months as:</p>
<ul>
<li>skills become better embedded into agent workflows, particularly increasing the rate at which they select the right skills to use at the right time</li>
<li>wider community adoption and feedback improves the breadth and depth of available skills</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="whats-included">What’s included<a href="https://docs.getdbt.com/blog/dbt-agent-skills#whats-included" class="hash-link" aria-label="Direct link to What’s included" title="Direct link to What’s included">​</a></h2>
<p>Our <a href="https://github.com/dbt-labs/dbt-agent-skills" target="_blank" rel="noopener noreferrer">agent skills repo</a> contains skills for:</p>
<ul>
<li><strong>Analytics engineering</strong>: Build and modify dbt models, write tests, explore data sources</li>
<li><strong>Semantic layer</strong>: Create metrics, dimensions, and semantic models with MetricFlow</li>
<li><strong>Platform operations</strong>: Troubleshoot job failures, configure the dbt MCP server</li>
<li><strong>Migration</strong>: Move projects from dbt Core to the dbt Fusion engine</li>
</ul>
<p>You’ll notice these skills vary in size of task and complexity. The primary <a href="https://github.com/dbt-labs/dbt-agent-skills/tree/main/skills/dbt/skills/using-dbt-for-analytics-engineering" target="_blank" rel="noopener noreferrer"><em>using dbt for analytics engineering</em> skill</a> contains information about the entire workflow loop for analytics engineering. Other skills are more focused and task dependent.</p>
<p>We plan to continue refining these and adding more skills over time. If there’s a skill that would be useful that you don’t see, please open an issue on the repo.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="quickstart">Quickstart<a href="https://docs.getdbt.com/blog/dbt-agent-skills#quickstart" class="hash-link" aria-label="Direct link to Quickstart" title="Direct link to Quickstart">​</a></h2>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="1-add-the-skills-to-your-agent">1. Add the skills to your agent<a href="https://docs.getdbt.com/blog/dbt-agent-skills#1-add-the-skills-to-your-agent" class="hash-link" aria-label="Direct link to 1. Add the skills to your agent" title="Direct link to 1. Add the skills to your agent">​</a></h3>
<p>In Claude Code, run these commands (one at a time):</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-bash codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">/plugin marketplace </span><span class="token function" style="color:rgb(130, 170, 255)">add</span><span class="token plain"> dbt-labs/dbt-agent-skills</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-bash codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">/plugin </span><span class="token function" style="color:rgb(130, 170, 255)">install</span><span class="token plain"> dbt@dbt-agent-marketplace</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>For agents other than Claude Code, use this command (<a href="https://nodejs.org/en/download" target="_blank" rel="noopener noreferrer">requires Node to be installed</a>):</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-bash codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">npx skills </span><span class="token function" style="color:rgb(130, 170, 255)">add</span><span class="token plain"> dbt-labs/dbt-agent-skills </span><span class="token parameter variable" style="color:rgb(214, 222, 235)">--global</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>or just manually copy the files you want into the <a href="https://github.com/vercel-labs/skills?tab=readme-ov-file#supported-agents" target="_blank" rel="noopener noreferrer">correct path for your agent</a>.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="2-start-a-new-agent-session">2. Start a new agent session<a href="https://docs.getdbt.com/blog/dbt-agent-skills#2-start-a-new-agent-session" class="hash-link" aria-label="Direct link to 2. Start a new agent session" title="Direct link to 2. Start a new agent session">​</a></h3>
<p>Restart your terminal to make sure the new skills are detected.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="3-try-it-yourself">3. Try it yourself<a href="https://docs.getdbt.com/blog/dbt-agent-skills#3-try-it-yourself" class="hash-link" aria-label="Direct link to 3. Try it yourself" title="Direct link to 3. Try it yourself">​</a></h3>
<p>Try giving an instruction like:</p>
<ul>
<li>Plan and build models for my new HubSpot source tables</li>
<li>Work out why my <code>dbt build</code> just failed</li>
<li>Write unit tests based on the requirements in this GitHub issue, then create a new model that passes</li>
<li>Update <code>fct_transactions</code> to become a semantic model</li>
<li>Is there a difference in bounce rate for free vs paid email domains?</li>
</ul>
<p>We focused on tasks that are either common (daily model building, debugging) or complex (semantic layer setup, unit testing edge cases). Each skill contains high-signal knowledge, and has been validated in real-world testing and against <span>ADE-bench</span>.</p>
<p>If you just want to get started today, you can stop reading now. But there’s a whole lot to say about what skills are, why they’re useful and how we expect them to plug into the dbt workflows of today and tomorrow.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>Normal cautions around agentic coding apply. Please take appropriate safeguards, particularly when working with production or sensitive data.</p></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="so-what-is-a-skill-anyway">So what is a skill, anyway?<a href="https://docs.getdbt.com/blog/dbt-agent-skills#so-what-is-a-skill-anyway" class="hash-link" aria-label="Direct link to So what is a skill, anyway?" title="Direct link to So what is a skill, anyway?">​</a></h2>
<p>You can think of skills as bundles of prompts (and scripts) which LLMs can dynamically string together to gain context or expertise on a given task.</p>
<p>In some ways, a skill is very simple - it’s a markdown file with a predefined structure. The venerable <code>dbt_style_guide.md</code> of yore would fit right in! It has a bunch of bulleted instructions, some sample code, and links out to other resources when necessary; the new Skills format does the same things. Anthropic introduced Skills in October 2025, and they are now an open standard adopted by <a href="https://github.com/vercel-labs/skills?tab=readme-ov-file#supported-agents" target="_blank" rel="noopener noreferrer">30+ agents</a>.</p>
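<p>For a feel of the format, here’s the rough shape of a (hypothetical) skill file: YAML frontmatter that tells the agent what the skill is for, followed by plain markdown instructions, with links out to heavier reference material the agent can load on demand:</p>

```markdown
---
name: dbt-style-guide
description: Conventions for naming, staging, and testing dbt models
---

# dbt style guide

- Prefix staging models with `stg_` and keep one model per source table.
- Do all renaming and type casting in staging models; join downstream.
- Before changing a model, run `dbt show` against its sources to check
  the underlying data.
- For the full testing workflow, see [testing.md](testing.md).
```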
<p>A better question than <em>what</em> might be <em>why</em>. From the <a href="https://agentskills.io/home" target="_blank" rel="noopener noreferrer">agent skills site</a>:</p>
<blockquote>
<p>Agents are increasingly capable, but often don’t have the context they need to do real work reliably. Skills solve this by giving agents access to procedural knowledge and company-, team-, and user-specific context they can load on demand.</p>
</blockquote>
<p>Here’s an <a href="https://claude.com/blog/equipping-agents-for-the-real-world-with-agent-skills" target="_blank" rel="noopener noreferrer">example skill from Anthropic</a>:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-agent-skills#" data-featherlight="/img/blog/2026-02-03-dbt-agent-skills/anthropic-skills-architecture.png"><img data-toggle="lightbox" alt="Anthropic’s diagram showing how agent skills use progressive disclosure with YAML frontmatter, markdown content, and reference files" title="An example SKILL.md file for working with PDFs, which also contains references to more complex workflows to load on-demand" src="https://docs.getdbt.com/img/blog/2026-02-03-dbt-agent-skills/anthropic-skills-architecture.png?v=2"></a></span><span class="title_aGrV">An example SKILL.md file for working with PDFs, which also contains references to more complex workflows to load on-demand</span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-do-skills-interact-with-mcp">How do skills interact with MCP?<a href="https://docs.getdbt.com/blog/dbt-agent-skills#how-do-skills-interact-with-mcp" class="hash-link" aria-label="Direct link to How do skills interact with MCP?" title="Direct link to How do skills interact with MCP?">​</a></h3>
<p>Another common question is how skills differ from MCP servers, and whether both are necessary.</p>
<ul>
<li>MCP is how you provide access to tools (especially remote tools requiring authentication)</li>
<li>Skills are how you provide context and knowledge around using those tools</li>
</ul>
<p>dbt Agent skills and the <a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server">dbt MCP server</a> are <em>complementary</em>, but you don’t have to use both to get value.</p>
<p>Consider the PDF example. Working with PDF files doesn’t require a MCP server, because the editing library can be installed locally. But you want that library to be used in a consistent way instead of the LLM inventing something from first principles every time.</p>
<p>So then why does the dbt MCP also have tools that call into the CLI? For interfaces that support MCP but not skills, it’s helpful to bake the <em>specific way the CLI commands are called</em> into the MCP server, but this is an open question and something we’re watching closely.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="from-generalist-to-specialist">From generalist to specialist<a href="https://docs.getdbt.com/blog/dbt-agent-skills#from-generalist-to-specialist" class="hash-link" aria-label="Direct link to From generalist to specialist" title="Direct link to From generalist to specialist">​</a></h3>
<p>To summarize, the best way to think of skills is as a layered training manual. If you took a very smart generalist off the street, what would they need to be able to use and implement <em>your organization's workflows?</em></p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-agent-skills#" data-featherlight="/img/blog/2026-02-03-dbt-agent-skills/skills-blog-diagram.png"><img data-toggle="lightbox" alt="A pyramid diagram showing three layers: Coding Agent at the base (takes autonomous actions, runs dbt commands, and looks up docs), dbt best practice skills in the middle (knows dbt best practices and workflows), and Project skills at the top (knows workflows unique to your team and data model)" title="Skills provide layered context that builds on an agent's baseline capabilities" src="https://docs.getdbt.com/img/blog/2026-02-03-dbt-agent-skills/skills-blog-diagram.png?v=2"></a></span><span class="title_aGrV">Skills provide layered context that builds on an agent's baseline capabilities</span></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="why-skills-matter">Why skills matter<a href="https://docs.getdbt.com/blog/dbt-agent-skills#why-skills-matter" class="hash-link" aria-label="Direct link to Why skills matter" title="Direct link to Why skills matter">​</a></h2>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="skills-allow-you-to-embed-complex-process-knowledge-that-is-non-obvious-to-agents">Skills allow you to embed complex process knowledge that is non-obvious to agents<a href="https://docs.getdbt.com/blog/dbt-agent-skills#skills-allow-you-to-embed-complex-process-knowledge-that-is-non-obvious-to-agents" class="hash-link" aria-label="Direct link to Skills allow you to embed complex process knowledge that is non-obvious to agents" title="Direct link to Skills allow you to embed complex process knowledge that is non-obvious to agents">​</a></h3>
<p>Any experienced dbt practitioner will have a number of intuitions when working with a dbt project:</p>
<ul>
<li>You want to poke around a bit and get a sense of the schema and underlying data before making any changes. Read some docs, run a couple of <code>dbt show</code> queries, that sort of thing.</li>
<li>If you’re modifying an existing model, you need to look at the underlying data and get a sense of what columns live in upstream data sources.</li>
<li>After making a new model or modifying one, you need to look at the data again, as well as run summary/aggregate statistics to see if it matches your expected shape and output.</li>
</ul>
<p>The current generation of coding agents tends not to do these things by default. Skills fix that by encoding broad dbt best practices like the ones above, but they can also provide very in-depth and nuanced guidance through supplemental reference materials, such as:</p>
<ul>
<li>Warehouse-specific configurations, like avoiding full table scans on BigQuery when discovering data</li>
<li>Variations based on the specific dbt version or engine you’re using; <code>dbt compile</code> can <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension">detect many SQL errors</a> when invoked from the dbt Fusion engine, but dbt Core needs to run <code>dbt build</code> for the same result.</li>
</ul>
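<p>As a rough sketch, guidance like the above might be encoded in a <code>SKILL.md</code> file. The frontmatter fields follow the Agent Skills convention; the skill name and instruction text here are illustrative, not the exact contents of our published skills:</p>
<pre><code class="language-markdown">---
name: dbt-explore-before-editing
description: Explore the schema and underlying data before and after changing dbt models
---

Before changing any model:

1. Read the relevant docs and schema.yml entries to understand the data model.
2. Run a few `dbt show` queries to inspect upstream columns and sample rows.

After creating or modifying a model:

1. Preview the output with `dbt show`.
2. Run summary/aggregate queries and compare row counts and shape against expectations.
</code></pre>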
<p>Skills can also evolve at a faster pace than frontier AI model releases, making it easier to update guidance and adapt to changes in the dbt authoring layer. We recently <a href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec">revamped the authoring experience for semantic models</a>; by including a skill that knows about the new syntax, we can stop your agent from using the old syntax even though the old syntax makes up the majority of training data online.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="skills-protect-against-plausible-but-incorrect-output">Skills protect against plausible but incorrect output<a href="https://docs.getdbt.com/blog/dbt-agent-skills#skills-protect-against-plausible-but-incorrect-output" class="hash-link" aria-label="Direct link to Skills protect against plausible but incorrect output" title="Direct link to Skills protect against plausible but incorrect output">​</a></h3>
<p>If you ask an LLM to add some tests to your model, it might add an <code>accepted values</code> test. dbt’s documentation on <code>accepted_values</code> tests <a href="https://docs.getdbt.com/reference/resource-properties/data-tests#accepted_values">contains an example</a> saying that the right values on an <code>order_status</code> column are <code>['placed', 'shipped', 'completed', 'returned']</code>, and we’ve seen some models replicate this or otherwise hallucinate potential column values.</p>
<p>With a skill, you can instruct the agent to <strong>preview the data before writing tests</strong> to ensure that the output matches the real data in your warehouse.</p>
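<p>For concreteness, here is the shape of test the agent tends to produce, using the illustrative column and values from the dbt docs (the <code>data_tests</code> key is the current spelling; older projects use <code>tests</code>). The whole point of the skill is that these values should come from previewing your warehouse, not from training data:</p>
<pre><code class="language-yaml">models:
  - name: orders
    columns:
      - name: order_status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
</code></pre>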
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="skills-allow-you-to-give-opinionated-guidance-to-agents">Skills allow you to give opinionated guidance to agents<a href="https://docs.getdbt.com/blog/dbt-agent-skills#skills-allow-you-to-give-opinionated-guidance-to-agents" class="hash-link" aria-label="Direct link to Skills allow you to give opinionated guidance to agents" title="Direct link to Skills allow you to give opinionated guidance to agents">​</a></h3>
<p>Beyond global best practices, there are also a number of opinionated decisions inside of a given team’s dbt project:</p>
<ul>
<li>What types of data tests should I have on my models?</li>
<li>When should I use the Semantic Layer vs. SQL for natural language questions?</li>
<li>How should the project be structured (stg/int/mart? Medallion? Data vault?)</li>
</ul>
<p>Our current skills are only semi-opinionated: they have opinions on how and where you should apply your data tests, but not on whether you should follow dbt’s recommended project structure or style guide. In the future, we anticipate releasing first-party opinionated guides on project and code structure, and we expect a thriving ecosystem of opinionated, community-sourced skills covering different dimensions of data work.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="skills-allow-you-to-give-non-public-information-to-agents">Skills allow you to give non-public information to agents<a href="https://docs.getdbt.com/blog/dbt-agent-skills#skills-allow-you-to-give-non-public-information-to-agents" class="hash-link" aria-label="Direct link to Skills allow you to give non-public information to agents" title="Direct link to Skills allow you to give non-public information to agents">​</a></h3>
<p>In addition to adopting our skills, you should add some of your own.</p>
<p>Taking a smart generalist across all disciplines and turning them into a smart generalist with a specialization in dbt still isn’t enough. They also need to become a specialist in the way your company does data.</p>
<p>Obviously we can’t include company-specific knowledge in our general best-practices skills, but this is where the composability of skills comes in. You can add context about your company, your data, and the specific ins and outs of interacting with your systems, and expect it to augment what we provide.</p>
<p>Examples of questions you might like to answer in your skills:</p>
<ul>
<li>Have any default macros been overridden in my organization’s project?</li>
<li>What is my organization’s cross-project or cross-platform mesh strategy?</li>
<li>What partitioning rules should be applied to new models for a given usage pattern?</li>
</ul>
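<p>A project-specific skill can sit alongside ours and answer exactly those questions. A hypothetical sketch (the company name, macro, and conventions below are invented placeholders, not real guidance):</p>
<pre><code class="language-markdown">---
name: acme-dbt-conventions
description: Conventions specific to Acme's dbt project
---

- `generate_schema_name` is overridden in this project; never rely on the default behavior.
- Cross-project refs must go through our published public models only.
- New high-volume models should be partitioned by event date unless the usage pattern is point lookups.
</code></pre>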
<p>More to come soon on how we might support org-level skills within dbt projects.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-we-validated-the-dbt-agent-skills">How we validated the dbt Agent Skills<a href="https://docs.getdbt.com/blog/dbt-agent-skills#how-we-validated-the-dbt-agent-skills" class="hash-link" aria-label="Direct link to How we validated the dbt Agent Skills" title="Direct link to How we validated the dbt Agent Skills">​</a></h2>
<p>It can be challenging to assess the performance of AI workflows. There are many different ways to do this and all of them are imperfect, so we have settled on a multilayered strategy for ensuring our agent skills behave the way we want them to.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="careful-expert-generation-and-curation-of-skills">Careful expert generation and curation of skills<a href="https://docs.getdbt.com/blog/dbt-agent-skills#careful-expert-generation-and-curation-of-skills" class="hash-link" aria-label="Direct link to Careful expert generation and curation of skills" title="Direct link to Careful expert generation and curation of skills">​</a></h3>
<p>While we <em>did</em> have some LLM assistance in generating some of the skills, these are very much not "oneshotted outputs". Each skill represents hours of crafting, reviewing, and refining by world-class dbt experts to ensure that our knowledge has been accurately encoded. Data work has a <em>lot</em> of tacit knowledge and edge cases, and this is where skills really shine.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="hands-on-testing-of-each-skill-in-real-life-examples">Hands-on testing of each skill in real life examples<a href="https://docs.getdbt.com/blog/dbt-agent-skills#hands-on-testing-of-each-skill-in-real-life-examples" class="hash-link" aria-label="Direct link to Hands-on testing of each skill in real life examples" title="Direct link to Hands-on testing of each skill in real life examples">​</a></h3>
<p>Nothing beats hands-on usage, and so we’ve tested each skill to see how it performs in real use cases. This has helped us tune performance and identify non-obvious gaps in our instructions.</p>
<p>We were particularly thrilled when we asked the agent to make performance recommendations on one of the largest tables in our dbt project, with and without the skill. Both runs gave plausible recommendations, but the skill-assisted recommendations were more tailored and relevant to our use case, as judged by our internal data team.</p>
<div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-agent-skills#" data-featherlight="/img/blog/2026-02-03-dbt-agent-skills/skills-validation-feedback.png"><img data-toggle="lightbox" alt="A Slack screenshot from @brandon, who says 'that version excites me much, much more. the recommendations on incremental filtering on all refs, pre-aggregated int models etc. i think would make a huge impact.'" title="" src="https://docs.getdbt.com/img/blog/2026-02-03-dbt-agent-skills/skills-validation-feedback.png?v=2"></a></span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="custom-suite-for-ab-testing-skills">Custom suite for A/B testing skills<a href="https://docs.getdbt.com/blog/dbt-agent-skills#custom-suite-for-ab-testing-skills" class="hash-link" aria-label="Direct link to Custom suite for A/B testing skills" title="Direct link to Custom suite for A/B testing skills">​</a></h3>
<p>We developed a <a href="https://github.com/dbt-labs/dbt-agent-skills/tree/main/evals" target="_blank" rel="noopener noreferrer">system for rapidly comparing different tool combinations</a> (MCP + skills, skills alone, no tools) to understand how they changed an agent’s output.</p>
<p>This library allows testing how variations of skills perform for a given scenario and reviewing in detail the skills and tools called by the agent.</p>
<p>We provide context to Claude Code (e.g. a dbt project or some YAML files) and we ask it to solve a problem with different setups:</p>
<ul>
<li>with different variations of a skill</li>
<li>with or without an MCP server connected</li>
<li>explicitly prompting the agent to use a skill, or leaving it to discover the skill on its own</li>
</ul>
<p>We can then either manually compare the conversations (which skills were called, what output was produced), or ask Claude Code to rate the different runs automatically.</p>
<p>One thing we discovered in this process is that Claude is much less willing to use skills in "headless" CLI invocations than in "interactive" ones where a user is talking back and forth. Because of this, we felt comfortable including the explicit prompt in benchmarking tasks.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="benchmarking-against-ade-bench">Benchmarking against ADE-bench<a href="https://docs.getdbt.com/blog/dbt-agent-skills#benchmarking-against-ade-bench" class="hash-link" aria-label="Direct link to Benchmarking against ADE-bench" title="Direct link to Benchmarking against ADE-bench">​</a></h3>
<p>We also ran through the <span>ADE-bench</span> tasks to assess performance with and without skills. While not every skill has corresponding tasks in the benchmark (yet!), this provides helpful signal, particularly on the primary analytics engineering skill.</p>
<p>We saw modest improvements on the benchmark with skills, rising from 56% accuracy without skills to 58.5% with them. But the bigger story is not the headline number; it’s the individual tasks that skills solved which previously had 0% success rates.</p>
<p>Notably, we found <strong>significant benefits in tasks which require iterative work</strong> on top of a dbt DAG, which is one of the most common failure points we've experienced in using coding agents with dbt.</p>
<div style="display:grid;grid-template-columns:repeat(auto-fit, minmax(300px, 1fr));gap:1rem"><div style="text-align:center"><p><strong>Without skills</strong></p><p><img decoding="async" loading="lazy" alt="Without skills, agents may skip important validation steps" src="https://docs.getdbt.com/assets/images/without_skill-da56373701544fd8711c725d25d95011.gif" width="1800" height="1000" class="img_ev3q"></p></div><div style="text-align:center"><p><strong>With skills</strong></p><p><img decoding="async" loading="lazy" alt="With access to skills, agents take a systematic approach to tasks" src="https://docs.getdbt.com/assets/images/with_skill-b62eb0689de0bb0352a714958a579f2e.gif" width="1800" height="1000" class="img_ev3q"></p></div></div>
<p>For example, when asked to <a href="https://github.com/dbt-labs/ade-bench/blob/main/tasks/airbnb007/task.yaml" target="_blank" rel="noopener noreferrer">produce multiple models based on their schema.yml definition</a>, the baseline agent created 6 models at once and declared victory. The skill-using agent worked iteratively, and successfully completed the task every time.</p>
<p>On the other hand, encouraging DRY principles led the skill-using agent to intermittently reuse a column with a logic bug in <a href="https://github.com/dbt-labs/ade-bench/blob/main/tasks/f1009/task.yaml" target="_blank" rel="noopener noreferrer">this task</a>, whereas the baseline agent noticed and corrected the bug.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="where-there-are-gaps">Where there are gaps<a href="https://docs.getdbt.com/blog/dbt-agent-skills#where-there-are-gaps" class="hash-link" aria-label="Direct link to Where there are gaps" title="Direct link to Where there are gaps">​</a></h3>
<p>Today, skill loading can be a little hit-and-miss. As with everything in AI, things are moving fast, and skills are seeing widespread adoption, so we don’t think that’s going to be a long-term issue. We’d also love to see stronger and more reliable cross-skill referencing, such as <a href="https://github.com/agentskills/agentskills/issues/90" target="_blank" rel="noopener noreferrer">what’s described here</a>.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="again-you-should-go-try-this-yourself">Again: you should go try this yourself<a href="https://docs.getdbt.com/blog/dbt-agent-skills#again-you-should-go-try-this-yourself" class="hash-link" aria-label="Direct link to Again: you should go try this yourself" title="Direct link to Again: you should go try this yourself">​</a></h2>
<p><a href="https://github.com/dbt-labs/dbt-agent-skills" target="_blank" rel="noopener noreferrer">Here’s the repo</a>, with installation instructions in the readme.</p>
<p>Agent skills have tremendous bang-for-buck for procedural tasks, especially considering how easily you can get started. We’re excited to see many people from across the Community trying them on real-world workflows, and building new skills of their own.</p>
<p>We’re also exploring ways to enable tighter integration between dbt and agent skills, as well as making it easier to manage custom skills for your specific dbt project and data.</p>
<p>The best way to stay involved is to share what you're discovering in <a href="https://getdbt.slack.com/archives/C0A1MRWEH8C/p1769918756796829" target="_blank" rel="noopener noreferrer">#topic-agentic-analytics</a> on Slack or to open up issues on the GitHub repo.</p>]]></content>
        <author>
            <name>Joel Labes</name>
        </author>
        <author>
            <name>Jason Ganz</name>
        </author>
        <category label="ai" term="ai"/>
        <category label="data_ecosystem" term="data_ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Modernizing the Semantic Layer Spec]]></title>
        <id>https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec</id>
        <link href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec"/>
        <updated>2026-01-21T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Learn about the new semantic layer specification and how it improves data modeling.]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="new-engine-who-dis">New engine, who dis?<a href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec#new-engine-who-dis" class="hash-link" aria-label="Direct link to New engine, who dis?" title="Direct link to New engine, who dis?">​</a></h2>
<p>It’s unlikely that anyone reading this blog has not heard about the new dbt Fusion engine — it’s been the talk of the data town since last January, culminating in Elias’s legendary live Coalesce 2025 demo of the incredible capabilities that native SQL comprehension in dbt can unlock. If you attended Coalesce, or have upgraded your project to Fusion already, you’ve likely also heard about the changes we’ve made to the authoring layer of dbt (the literal code you write in your project). As part of the major version upgrade, we took the opportunity to simplify + standardize the configuration language of dbt to be built to scale as we enter the next era of analytics engineering.</p>
<p>In particular, we wanted to reevaluate how metrics are defined in the dbt Semantic Layer. We’ve heard from numerous community members over the years that defining metrics was <em>just plain hard</em>. In conversation with internal + external users and our newest pals from SDF, we’ve come up with a redesigned YAML spec that is simpler, more integrated with the dbt configuration experience we’ve come to know and love, and built for the future.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="whats-changing">What’s changing?<a href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec#whats-changing" class="hash-link" aria-label="Direct link to What’s changing?" title="Direct link to What’s changing?">​</a></h3>
<p>There are three major updates to the structure of semantic modeling in dbt:</p>
<ul>
<li><strong>Measures → Metrics:</strong> Measures are removed from the authorship spec. Simple metrics can now include aggregations and expressions, and are the primary building block for more complex metrics.</li>
<li><strong>Reducing nesting:</strong> We removed as much deep dictionary nesting as possible to simplify the look and feel of the code, and renamed keys to more directly describe their behavior.</li>
<li><strong>Standardizing on models YAML entries:</strong> Semantic annotations are embedded within the model’s YAML entry to remove the need to manage many YAML entries across many files to enrich your models with semantic metadata.</li>
</ul>
<div style="display:grid;grid-template-columns:repeat(auto-fit, minmax(300px, 1fr));gap:16px;margin:0 -20px 20px -20px;max-width:calc(100% + 40px)"><div style="padding:0 12px"><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="legacy-implementation">Legacy implementation<a href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec#legacy-implementation" class="hash-link" aria-label="Direct link to Legacy implementation" title="Direct link to Legacy implementation">​</a></h3><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-yaml codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token key atrule" style="color:rgb(255, 203, 139)">models</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customers</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> Customer overview data mart</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> offering key details for each unique customer. 
One row per customer.</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customer_id</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> The unique key of the orders mart.</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> first_ordered_at</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> The timestamp when a customer placed their first order.</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span 
class="token key atrule" style="color:rgb(255, 203, 139)">semantic_models</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customers </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">model</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> ref('customers')</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> Semantic Model for Customers</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">defaults</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">agg_time_dimension</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> first_ordered_at</span><br></span><span class="token-line" 
style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">entities</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customer</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> primary</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">expr</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customer_id</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">     </span><span class="token key atrule" style="color:rgb(255, 203, 139)">dimensions</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">       </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> first_ordered_at</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">         
</span><span class="token key atrule" style="color:rgb(255, 203, 139)">type</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> time</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">         </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type_params</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">           </span><span class="token key atrule" style="color:rgb(255, 203, 139)">time_granularity</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> day</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">     </span><span class="token key atrule" style="color:rgb(255, 203, 139)">measures</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">       </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> lifetime_spend_pretax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">         </span><span class="token key atrule" style="color:rgb(255, 203, 139)">agg</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> sum</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 
139)">metrics</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">   </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> lifetime_spend_pretax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">     </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> simple</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">     </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> Customer's lifetime spend before tax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">     </span><span class="token key atrule" style="color:rgb(255, 203, 139)">label</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> LTV Pre</span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain">tax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">     </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type_params</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">measure</span><span 
class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">          </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> lifetime_spend_pretax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"> </span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div></div><div style="padding:0 12px"><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="new-implementation">New implementation<a href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec#new-implementation" class="hash-link" aria-label="Direct link to New implementation" title="Direct link to New implementation">​</a></h3><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-yaml codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token key atrule" style="color:rgb(255, 203, 139)">models</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customers</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># enable semantic modeling on this model</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">semantic_model</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">enabled</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token boolean important" style="color:rgb(255, 88, 116)">true</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># set default aggregation time dimension as a model property</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">agg_time_dimension</span><span 
class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> first_ordered_at</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> Customer overview data mart</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> offering key details for each unique customer. One row per customer.</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customer_id</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> The unique key of the orders mart.</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">     
   </span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># annotate column as a primary entity</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">entity</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">          </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> customer</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">          </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> primary</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">          </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> first_ordered_at</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> The timestamp when a customer placed their first order.</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># annotate column as a time dimension</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">granularity</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> day</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">dimension</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">          </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> time</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">          </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># define simple metric directly within the model's YAML</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">metrics</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token 
plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> lifetime_spend_pretax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> simple</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">description</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> Customer's lifetime spend before tax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">label</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> LTV Pre</span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain">tax</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token key atrule" style="color:rgb(255, 203, 139)">agg</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> sum</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path 
fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div></div></div>
<p>This has a few clear benefits:</p>
<ul>
<li><strong>DRYer code:</strong> Semantic annotations now live alongside the model’s YAML entry, reducing duplicative work. If a column within the model is a dimension or entity, you can configure it as such, and the column’s properties, like its description, carry over as the description of the dimension or entity!</li>
<li><strong>Tidier YAML:</strong> A tidier metric entry is easier to write, easier to read, and makes it easier to share context across your data team. Maintaining metric code should be as easy as possible!</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="is-this-the-osi-spec">Is this the OSI spec?<a href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec#is-this-the-osi-spec" class="hash-link" aria-label="Direct link to Is this the OSI spec?" title="Direct link to Is this the OSI spec?">​</a></h3>
<p>You may have heard some buzz that dbt joined the industry initiative called the <a href="https://www.snowflake.com/en/blog/open-semantic-interchange-ai-standard/" target="_blank" rel="noopener noreferrer">Open Semantic Interchange</a>, working with partners like Snowflake and Tableau to create an open standard for semantic metadata. This is not the OSI Spec! This is an update to the existing dbt Semantic Layer spec, designed to make it easier for dbt users to define and manage their metrics. However, we are actively exploring how we can align with the OSI spec in the future, and we see this as a step towards that goal.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="get-started-today">Get started today<a href="https://docs.getdbt.com/blog/modernizing-the-semantic-layer-spec#get-started-today" class="hash-link" aria-label="Direct link to Get started today" title="Direct link to Get started today">​</a></h3>
<p>This new spec is <strong>live on the Fusion engine today.</strong> If you’ve <a href="https://docs.getdbt.com/guides/upgrade-to-fusion?step=1">migrated</a> onto the engine, and are curious about getting started with the dbt Semantic Layer, <a href="https://docs.getdbt.com/docs/build/latest-metrics-spec">check out our docs</a> and get started defining your metrics! This new spec will also be released to dbt Core in version 1.12, coming in the near future. dbt platform users on the dbt Core engine will be able to migrate to the new spec as soon as they upgrade to the Latest dbt version!</p>
<p>Additionally, if you’re an existing user of the semantic layer, our <a href="https://github.com/dbt-labs/dbt-autofix" target="_blank" rel="noopener noreferrer"><code>dbt-autofix</code> script</a> now has support for migrating from the legacy metrics implementation to the new one! Simply run <code>dbt-autofix deprecations --semantic-layer</code>, locally or in dbt Studio on the platform, and the vast majority of the code will be migrated automatically!</p>
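<p>As a sketch, the migration command can be run from the root of your dbt project. This assumes you have installed <code>dbt-autofix</code> from the repository linked above and that it is available on your PATH:</p>

```shell
# Sketch only: assumes dbt-autofix is installed and on your PATH
# (see the dbt-labs/dbt-autofix repository for installation instructions).
cd path/to/your/dbt/project

# Rewrite legacy semantic layer definitions to the new spec
dbt-autofix deprecations --semantic-layer
```

<p>Review the resulting diff before committing, since a small portion of definitions may still need manual migration.</p>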
<p>We’re eager for feedback! Reach out in dbt Community Slack in the <a href="https://getdbt.slack.com/archives/C046L0VTVR6" target="_blank" rel="noopener noreferrer"><code>#dbt-semantic-layer</code> channel</a> and let us know how your migration / onboarding experience goes!</p>]]></content>
        <author>
            <name>Dave Connors</name>
        </author>
        <category label="ai" term="ai"/>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Building the Remote dbt MCP Server]]></title>
        <id>https://docs.getdbt.com/blog/building-the-remote-dbt-mcp-server</id>
        <link href="https://docs.getdbt.com/blog/building-the-remote-dbt-mcp-server"/>
        <updated>2025-08-26T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Learn about the new remote dbt MCP server, how it was built, and how to use it to build agents.]]></summary>
        <content type="html"><![CDATA[<p>In April, we released the local <a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server">dbt MCP (Model Context Protocol) server</a> as an open source project to connect AI agents and LLMs with direct, governed access to trusted dbt assets. The dbt MCP server provides a <a href="https://docs.anthropic.com/en/docs/mcp" target="_blank" rel="noopener noreferrer">universal, open standard</a> for bridging AI systems with your structured context that keeps your agents accurate, governed, and trustworthy. Learn more in <a href="https://docs.getdbt.com/docs/dbt-ai/about-mcp">About dbt Model Context Protocol</a>.</p>
<p>Since releasing the local dbt MCP server, the dbt community has been applying it in incredible ways, including agentic conversational analytics, data catalog exploration, and dbt project refactoring. However, a key piece of feedback we received from AI engineers was that the local dbt MCP server isn’t easy to deploy or host for multi-tenanted workloads, making it difficult to build applications on top of it.</p>
<p>This is why we are excited to announce a new way to integrate with dbt MCP: <strong>the remote dbt MCP server</strong>. The remote dbt MCP server doesn’t require installing dependencies or running the dbt MCP server in your infrastructure, making it easier than ever to build and run agents. It is <strong>available today in public beta</strong> for users with dbt Starter, Enterprise, or Enterprise+ plans, ready for you to start building AI-powered applications.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-is-the-remote-dbt-mcp-server-">What is the Remote dbt MCP Server? <a href="https://www.getdbt.com/pricing" target="_blank" rel="noopener noreferrer" class="lifecycle_J1Zi lifecycle" style="background-color:#E5E7EB;color:#030711;cursor:pointer;transition:background-color 0.2s ease, transform 0.2s ease, text-decoration 0.2s ease;text-decoration:none" title="Go to https://www.getdbt.com/pricing">Starter</a><a href="https://www.getdbt.com/pricing" target="_blank" rel="noopener noreferrer" class="lifecycle_J1Zi lifecycle" style="background-color:#E5E7EB;color:#030711;cursor:pointer;transition:background-color 0.2s ease, transform 0.2s ease, text-decoration 0.2s ease;text-decoration:none" title="Go to https://www.getdbt.com/pricing">Enterprise</a><a href="https://www.getdbt.com/pricing" target="_blank" rel="noopener noreferrer" class="lifecycle_J1Zi lifecycle" style="background-color:#E5E7EB;color:#030711;cursor:pointer;transition:background-color 0.2s ease, transform 0.2s ease, text-decoration 0.2s ease;text-decoration:none" title="Go to https://www.getdbt.com/pricing">Enterprise +</a><a href="https://docs.getdbt.com/docs/dbt-versions/product-lifecycles" target="_blank" rel="noopener noreferrer" class="lifecycle_J1Zi lifecycle" style="background-color:#bab2ff;color:#030711;cursor:pointer;transition:background-color 0.2s ease, transform 0.2s ease, text-decoration 0.2s ease;text-decoration:none" title="Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles">Beta</a><a href="https://docs.getdbt.com/blog/building-the-remote-dbt-mcp-server#what-is-the-remote-dbt-mcp-server-" class="hash-link" aria-label="Direct link to what-is-the-remote-dbt-mcp-server-" title="Direct link to what-is-the-remote-dbt-mcp-server-">​</a></h2>
<p>Commonly, agents and MCP servers run locally on your computer, but local-first agents are limited in the type of applications that can be built. With remote MCP, new experiences are possible. For instance, remote MCP enables server-side agents to perform long-running tasks, be shared across an organization, and be accessed through web applications -- all experiences that are far more difficult (or impossible) in a local agent architecture.</p>
<p>The remote dbt MCP server brings <strong>structured, governed context</strong> to these experiences and enables you to build innovative data applications on top of them. The remote dbt MCP server makes it possible for your agent to answer business questions with the <a href="https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl">dbt Semantic Layer</a>, discover data assets with the <a href="https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api">dbt Discovery API</a>, and run natural-language queries with SQL tools. Check out our docs <a href="https://docs.getdbt.com/docs/dbt-ai/about-mcp">here</a> to learn about the full list of supported tools. These capabilities are easy to integrate in various platforms with the standardized MCP specification.</p>
<p>The remote dbt MCP server is great for application builders, but there are still times when you would want to run the dbt MCP server locally. Specifically, <strong>if you are using a local coding agent like Cursor or Claude Code, we recommend the local dbt MCP server.</strong> This ensures that the code you are writing locally matches what the agent has access to.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-remote-dbt-mcp-server-architecture">The Remote dbt MCP Server Architecture<a href="https://docs.getdbt.com/blog/building-the-remote-dbt-mcp-server#the-remote-dbt-mcp-server-architecture" class="hash-link" aria-label="Direct link to The Remote dbt MCP Server Architecture" title="Direct link to The Remote dbt MCP Server Architecture">​</a></h2>
<p>Hosting your own remote MCP server is non-trivial. While a local MCP server only has to consider a single-tenant experience, remote servers need to manage concurrent connections from multiple users, as well as the deployment and maintenance of the server and its infrastructure. Additionally, connections need to be securely authenticated and isolated from each other. The latest updates to the MCP spec provide a new way to communicate with MCP servers, <a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/transports" target="_blank" rel="noopener noreferrer">Streamable HTTP</a>, allowing for stateless remote connections with agents. Streamable HTTP makes things easier, but deploying an MCP server is still a high lift for most data teams. With the remote dbt MCP server, we handle all of this complexity: if you are building an agentic application, all you need to worry about is making an HTTP connection to our API.</p>
<p>At the same time, we want the remote dbt MCP server to offer similar functionality to the local dbt MCP server without entirely reimplementing the tools. We met these requirements by running a Streamable HTTP MCP server and adding proxied versions of each dbt MCP tool to this server. The proxied version of each tool has the same parameters, description, and implementation as the open source version, ensuring a consistent experience. The difference is that the proxied versions are configured via HTTP headers rather than environment variables, and they connect directly to our internal APIs, which reduces latency.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        "><span><a href="https://docs.getdbt.com/blog/building-the-remote-dbt-mcp-server#" data-featherlight="/img/blog/2025-08-26-building-the-remote-dbt-mcp-server/remote-dbt-mcp.png"><img data-toggle="lightbox" alt="The remote dbt MCP architecture" title="The remote dbt MCP architecture" src="https://docs.getdbt.com/img/blog/2025-08-26-building-the-remote-dbt-mcp-server/remote-dbt-mcp.png?v=2"></a></span><span class="title_aGrV">The remote dbt MCP architecture</span></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-remote-dbt-mcp-server-in-action">The Remote dbt MCP Server in Action<a href="https://docs.getdbt.com/blog/building-the-remote-dbt-mcp-server#the-remote-dbt-mcp-server-in-action" class="hash-link" aria-label="Direct link to The Remote dbt MCP Server in Action" title="Direct link to The Remote dbt MCP Server in Action">​</a></h2>
<p>Now that we have a better understanding of how the remote dbt MCP server works, let's implement it in practice by creating a simple agent loop with LangGraph in Python. We are using LangGraph as an example here, but you can use whichever language or framework you would like. Check out our <a href="https://github.com/dbt-labs/dbt-mcp/tree/main/examples" target="_blank" rel="noopener noreferrer">examples directory</a> for more resources on creating agents with the dbt MCP server, including the full example shown here.</p>
<p>The agent we implement here will be able to conduct conversational analytics grounded in <strong>structured, governed context</strong> from your dbt project. This means it can receive a user's question, search for relevant metadata with the dbt Discovery API, find important metrics with the dbt Semantic Layer API, explore the data, and return an accurate, trustworthy answer. This shows how the remote dbt MCP server can power AI applications that combine the flexibility of LLMs with the trust and consistency of your dbt assets.</p>
<p>For this example to work, you will need to install LangGraph dependencies and set an environment variable for the Anthropic API key:</p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-shell codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">pip </span><span class="token function" style="color:rgb(130, 170, 255)">install</span><span class="token plain"> langgraph </span><span class="token string" style="color:rgb(173, 219, 103)">"langchain[anthropic]"</span><span class="token plain"> langchain-mcp-adapters</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token builtin class-name" style="color:rgb(255, 203, 139)">export</span><span class="token plain"> </span><span class="token assign-left variable" style="color:rgb(214, 222, 235)">ANTHROPIC_API_KEY</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token operator" style="color:rgb(127, 219, 202)">&lt;</span><span class="token plain">your-api-key</span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>First, we need to define the URL and headers that the MCP client will use. These values will depend on your specific dbt Cloud deployment. In this example, we are setting the configuration from environment variables. For more information on this configuration, refer to <a href="https://docs.getdbt.com/docs/dbt-ai/about-mcp">About dbt Model Context Protocol</a>.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-python codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">import</span><span class="token plain"> os</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">url </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">f"https://</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string-interpolation interpolation">os</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token string-interpolation interpolation">environ</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token string-interpolation interpolation">get</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string-interpolation interpolation string" style="color:rgb(173, 219, 103)">'DBT_HOST'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">/api/ai/v1/mcp/"</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">headers </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token string" style="color:rgb(173, 219, 103)">"x-dbt-user-id"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> os</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">get</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">"DBT_USER_ID"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token string" style="color:rgb(173, 219, 103)">"x-dbt-prod-environment-id"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> os</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">get</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">"DBT_PROD_ENV_ID"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token string" style="color:rgb(173, 219, 103)">"x-dbt-dev-environment-id"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> os</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">get</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">"DBT_DEV_ENV_ID"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token string" style="color:rgb(173, 219, 103)">"Authorization"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">f"token </span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string-interpolation interpolation">os</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token string-interpolation interpolation">environ</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token string-interpolation interpolation">get</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string-interpolation interpolation string" style="color:rgb(173, 219, 
103)">'DBT_TOKEN'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
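<p>If you are wiring this configuration into a larger application, it can help to wrap it in a small helper. The function name below is our own, not part of any dbt library; the header names and URL path mirror the configuration shown above:</p>

```python
import os

def build_dbt_mcp_connection(env=None):
    """Build the remote dbt MCP URL and headers from environment variables.

    Hypothetical helper; the URL path and header names mirror the
    configuration shown in the snippet above.
    """
    if env is None:
        env = os.environ
    url = f"https://{env.get('DBT_HOST')}/api/ai/v1/mcp/"
    headers = {
        "x-dbt-user-id": env.get("DBT_USER_ID"),
        "x-dbt-prod-environment-id": env.get("DBT_PROD_ENV_ID"),
        "x-dbt-dev-environment-id": env.get("DBT_DEV_ENV_ID"),
        "Authorization": f"token {env.get('DBT_TOKEN')}",
    }
    return url, headers

# Example with placeholder values (substitute your own deployment's values)
url, headers = build_dbt_mcp_connection({
    "DBT_HOST": "cloud.getdbt.com",
    "DBT_USER_ID": "123",
    "DBT_PROD_ENV_ID": "456",
    "DBT_DEV_ENV_ID": "789",
    "DBT_TOKEN": "dbt-token",
})
print(url)  # https://cloud.getdbt.com/api/ai/v1/mcp/
```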
<p>Next, we need to create an MCP client, so our agent knows how to use the remote dbt MCP server.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-python codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">from</span><span class="token plain"> langchain_mcp_adapters</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">client </span><span class="token keyword" style="color:rgb(127, 219, 202)">import</span><span class="token plain"> MultiServerMCPClient</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">client </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> MultiServerMCPClient</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token string" style="color:rgb(173, 219, 103)">"dbt"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token string" style="color:rgb(173, 219, 103)">"url"</span><span class="token punctuation" style="color:rgb(199, 
146, 234)">:</span><span class="token plain"> url</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token string" style="color:rgb(173, 219, 103)">"headers"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> headers</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token string" style="color:rgb(173, 219, 103)">"transport"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"streamable_http"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 
2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Then, we need to get the available tools from the remote dbt MCP server.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-python codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">tools </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(127, 219, 202)">await</span><span class="token plain"> client</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">get_tools</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Now, we can create our LangGraph agent.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-python codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">from</span><span class="token plain"> langgraph</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">prebuilt </span><span class="token keyword" style="color:rgb(127, 219, 202)">import</span><span class="token plain"> create_react_agent</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">from</span><span class="token plain"> langgraph</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">checkpoint</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory </span><span class="token keyword" style="color:rgb(127, 219, 202)">import</span><span class="token plain"> InMemorySaver</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">agent </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> create_react_agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  model</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token string" style="color:rgb(173, 219, 
103)">"anthropic:claude-3-7-sonnet-latest"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  tools</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain">tools</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># This allows the agent to have conversational memory.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  checkpointer</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain">InMemorySaver</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Finally, we can run the agent in a loop. This example relies on <code>print_stream_item</code>, which you can find in the full example <a href="https://github.com/dbt-labs/dbt-mcp/blob/365bc0f4c28b48510d194201370a5500d69cc5ea/examples/langgraph_agent/main.py#L11" target="_blank" rel="noopener noreferrer">here</a>. You can exit the loop by stopping the program with CTRL+C.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-python codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># This config maintains the conversation thread.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">config </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string" style="color:rgb(173, 219, 103)">"configurable"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string" style="color:rgb(173, 219, 103)">"thread_id"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"1"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">while</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">True</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  user_input </span><span 
class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">input</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">"User &gt; "</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  agent_response </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string" style="color:rgb(173, 219, 103)">"messages"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string" style="color:rgb(173, 219, 103)">"role"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"user"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"content"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> user_input</span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span 
class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    config</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  print_stream_item</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">agent_response</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>With our agent implemented, we can run the program and ask it a question. You should see an output like this:</p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-shell codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">User </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;</span><span class="token plain"> How much revenue did we </span><span class="token function" style="color:rgb(130, 170, 255)">make</span><span class="token plain"> last month?</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">Agent </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;</span><span class="token plain"> I</span><span class="token string" style="color:rgb(173, 219, 103)">'ll help you find out the revenue for last month. Let me first check what metrics are available in the dbt Semantic Layer.</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token string" style="color:rgb(173, 219, 103)">    using tool: list_metrics</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token string" style="color:rgb(173, 219, 103)">Agent &gt; I see that we have a "revenue" metric available. Let me get the dimensions for this metric to understand how I can query for last month'</span><span class="token plain">s data:</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    using tool: get_dimensions</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">Agent </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;</span><span class="token plain"> Now I</span><span class="token string" style="color:rgb(173, 219, 103)">'ll query the revenue metric for last month. 
I'</span><span class="token plain">ll use the </span><span class="token string" style="color:rgb(173, 219, 103)">"metric_time"</span><span class="token plain"> dimension with a MONTH grain:</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    using tool: query_metrics</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">Agent </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;</span><span class="token plain"> Based on the results, the total revenue </span><span class="token keyword" style="color:rgb(127, 219, 202)">for</span><span class="token plain"> last month was **</span><span class="token variable" style="color:rgb(214, 222, 235)">$102</span><span class="token plain">,379.00**.</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="future-work">Future Work<a href="https://docs.getdbt.com/blog/building-the-remote-dbt-mcp-server#future-work" class="hash-link" aria-label="Direct link to Future Work" title="Direct link to Future Work">​</a></h2>
<p>Now that remote dbt MCP is available in public beta, we encourage you to build agents to interact with your dbt resources, bringing <strong>structured, governed context</strong> into AI workflows without the overhead of local setup. Here are some ideas for types of agents you can build with the remote dbt MCP server:</p>
<ul>
<li>Answer business-related questions with accurate, governed metrics from dbt</li>
<li>Identify PII columns and enforce governance policies automatically</li>
<li>Review pull requests to improve code quality and expedite the review process</li>
<li>Explore metadata and catalog information to accelerate data discovery and troubleshooting</li>
<li>Provide on-call incident support to remediate issues faster</li>
</ul>
<p>We are continuing to invest in remote dbt MCP, with upcoming features like OAuth-based authentication to make remote MCP authentication &amp; authorization even easier. If you have any feedback, need help, or just want to chat, join us in the #tools-dbt-mcp channel in <a href="https://www.getdbt.com/community/join-the-community" target="_blank" rel="noopener noreferrer">our community Slack</a>.</p>]]></content>
        <author>
            <name>Devon Fulcher</name>
        </author>
        <category label="ai" term="ai"/>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[How to train a linear regression model with dbt and BigFrames]]></title>
        <id>https://docs.getdbt.com/blog/train-linear-dbt-bigframes</id>
        <link href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes"/>
        <updated>2025-07-11T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How to build a scalable linear regression model by combining dbt's modular orchestration with BigFrames' in-database Python execution in BigQuery.]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="introduction-to-dbt-and-bigframes">Introduction to dbt and BigFrames<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#introduction-to-dbt-and-bigframes" class="hash-link" aria-label="Direct link to Introduction to dbt and BigFrames" title="Direct link to Introduction to dbt and BigFrames">​</a></h2>
<p><strong>dbt</strong>: A framework for transforming data in modern data warehouses using modular SQL or Python. dbt enables data teams to develop analytics code collaboratively and efficiently by applying software engineering best practices such as version control, modularity, portability, CI/CD, testing, and documentation. For more information, refer to <a href="https://docs.getdbt.com/docs/introduction#dbt">What is dbt?</a></p>
<p><strong>BigQuery DataFrames (BigFrames)</strong>: An open-source Python library offered by Google. BigFrames scales Python data processing by transpiling common Python data science APIs (pandas and scikit-learn) to BigQuery SQL.</p>
<p>You can read more in the <a href="https://cloud.google.com/bigquery/docs/bigquery-dataframes-introduction" target="_blank" rel="noopener noreferrer">official BigFrames guide</a> and view the <a href="https://github.com/googleapis/python-bigquery-dataframes" target="_blank" rel="noopener noreferrer">public BigFrames GitHub repository</a>.</p>
<p>By combining dbt with BigFrames via the <code>dbt-bigquery</code> adapter (referred to as <em>"dbt-BigFrames"</em>), you gain:</p>
<ul>
<li>dbt’s modular SQL and Python modeling, dependency management with <code>dbt.ref()</code>, environment configurations, and data testing. With the cloud-based dbt platform, you also get job scheduling and monitoring.</li>
<li>BigFrames’ ability to execute complex Python transformations (including machine learning) directly in BigQuery.</li>
</ul>
<p><code>dbt-BigFrames</code> utilizes the <strong>Colab Enterprise notebook executor service</strong> in a GCP project to run Python models. These notebooks execute BigFrames code, which is translated into BigQuery SQL.</p>
<blockquote>
<p>Refer to these guides to learn more: <a href="https://cloud.google.com/bigquery/docs/dataframes-dbt" target="_blank" rel="noopener noreferrer">Use BigQuery DataFrames in dbt</a> or <a href="https://docs.getdbt.com/guides/dbt-python-bigframes?step=1">Using BigQuery DataFrames with dbt Python models</a>.</p>
</blockquote>
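<p>In practice, a BigFrames-backed dbt Python model is an ordinary <code>.py</code> file in your <code>models/</code> directory. The sketch below shows the shape of such a model; the file and upstream model names (<code>ozone_features</code>, <code>stg_air_quality</code>) are placeholders, and the snippet only runs inside a dbt project configured against BigQuery:</p>

```python
# models/ozone_features.py -- hypothetical model name.
def model(dbt, session):
    # Run this model's Python via the BigFrames notebook executor.
    dbt.config(submission_method="bigframes")

    # dbt.ref() resolves the upstream model and returns a BigFrames
    # DataFrame; pandas-style calls on it compile to BigQuery SQL.
    df = dbt.ref("stg_air_quality")  # placeholder upstream model
    return df.dropna()
```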
<p>To illustrate the practical impact of combining dbt with BigFrames, the following sections explore how this integration can streamline and scale a common machine learning task: training a linear regression model on large datasets.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-power-of-dbt-bigframes-for-large-scale-linear-regression">The power of dbt-BigFrames for large-scale linear regression<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#the-power-of-dbt-bigframes-for-large-scale-linear-regression" class="hash-link" aria-label="Direct link to The power of dbt-BigFrames for large-scale linear regression" title="Direct link to The power of dbt-BigFrames for large-scale linear regression">​</a></h2>
<p>Linear regression is a cornerstone of predictive analytics, used in:</p>
<ul>
<li>Sales forecasting</li>
<li>Financial modeling</li>
<li>Demand planning</li>
<li>Real estate valuation</li>
</ul>
<p>These tasks often require processing datasets too large for traditional in-memory Python. BigFrames alone solves this, but combining it with dbt offers a structured, maintainable, and production-ready way to train models or generate batch predictions on large data.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="dbt-bigframes-with-ml-a-practical-example">“dbt-BigFrames” with ML: A practical example<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#dbt-bigframes-with-ml-a-practical-example" class="hash-link" aria-label="Direct link to “dbt-BigFrames” with ML: A practical example" title="Direct link to “dbt-BigFrames” with ML: A practical example">​</a></h2>
<p>We’ll walk through training a linear regression model using a <strong>dbt Python model powered by BigFrames</strong>, focusing on the structure and orchestration provided by dbt.</p>
<p>We’ll use the <code>epa_historical_air_quality</code> dataset from BigQuery Public Data (courtesy of the U.S. Environmental Protection Agency).</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="problem-statement">Problem statement<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#problem-statement" class="hash-link" aria-label="Direct link to Problem statement" title="Direct link to Problem statement">​</a></h3>
<p>Develop a machine learning model to predict atmospheric ozone levels using historical air quality and environmental sensor data, enabling more accurate monitoring and forecasting of air pollution trends.</p>
<p><strong>Key stages:</strong></p>
<ol>
<li><strong>Data Foundation</strong>: Transform raw source tables into an analysis-ready dataset.</li>
<li><strong>Machine Learning Analysis</strong>: Train a linear regression model on the cleaned data.</li>
</ol>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="setting-up-your-dbt-project-for-bigframes">Setting up your dbt project for BigFrames<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#setting-up-your-dbt-project-for-bigframes" class="hash-link" aria-label="Direct link to Setting up your dbt project for BigFrames" title="Direct link to Setting up your dbt project for BigFrames">​</a></h2>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="prerequisites">Prerequisites<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#prerequisites" class="hash-link" aria-label="Direct link to Prerequisites" title="Direct link to Prerequisites">​</a></h3>
<ul>
<li>A Google Cloud account</li>
<li>A dbt platform or dbt Core setup</li>
<li>Basic to intermediate SQL and Python</li>
<li>Familiarity with dbt (see the <a href="https://docs.getdbt.com/guides?level=Beginner">beginner dbt guides</a>)</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="sample-profilesyml-for-bigframes">Sample <code>profiles.yml</code> for BigFrames<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#sample-profilesyml-for-bigframes" class="hash-link" aria-label="Direct link to sample-profilesyml-for-bigframes" title="Direct link to sample-profilesyml-for-bigframes">​</a></h3>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-yaml codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token key atrule" style="color:rgb(255, 203, 139)">my_epa_project</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token key atrule" style="color:rgb(255, 203, 139)">outputs</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">dev</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">compute_region</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:rgb(199, 146, 234)">-</span><span class="token plain">central1</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">dataset</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> your_bq_dataset</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 
139)">gcs_bucket</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> your_gcs_bucket</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">location</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> US</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">method</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> oauth</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">priority</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> interactive</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">project</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> your_gcp_project</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">threads</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">type</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> bigquery</span><br></span><span 
class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token key atrule" style="color:rgb(255, 203, 139)">target</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> dev</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="sample-dbt_projectyml">Sample <code>dbt_project.yml</code><a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#sample-dbt_projectyml" class="hash-link" aria-label="Direct link to sample-dbt_projectyml" title="Direct link to sample-dbt_projectyml">​</a></h3>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-yaml codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token key atrule" style="color:rgb(255, 203, 139)">name</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">'my_epa_project'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token key atrule" style="color:rgb(255, 203, 139)">version</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">'1.0.0'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token key atrule" style="color:rgb(255, 203, 139)">config-version</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token key atrule" style="color:rgb(255, 203, 139)">models</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token key atrule" style="color:rgb(255, 203, 
139)">my_epa_project</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">submission_method</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> bigframes</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">notebook_template_id</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> 701881164074529xxxx  </span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># Optional</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">timeout</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">6000</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token key atrule" style="color:rgb(255, 203, 139)">example</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">      </span><span class="token key atrule" style="color:rgb(255, 203, 139)">+materialized</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> view</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-dbt-python-models-for-linear-regression">The dbt Python models for linear regression<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#the-dbt-python-models-for-linear-regression" class="hash-link" aria-label="Direct link to The dbt Python models for linear regression" title="Direct link to The dbt Python models for linear regression">​</a></h2>
<p>This project uses <strong>two modular dbt Python models</strong>:</p>
<ol>
<li><code>prepare_table.py</code> — Ingests and prepares data</li>
<li><code>prediction.py</code> — Trains the model and generates predictions</li>
</ol>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="part-1-preparing-the-table-prepare_tablepy">Part 1: Preparing the table (<code>prepare_table.py</code>)<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#part-1-preparing-the-table-prepare_tablepy" class="hash-link" aria-label="Direct link to part-1-preparing-the-table-prepare_tablepy" title="Direct link to part-1-preparing-the-table-prepare_tablepy">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-python codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">import</span><span class="token plain"> bigframes</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">pandas </span><span class="token keyword" style="color:rgb(127, 219, 202)">as</span><span class="token plain"> bpd</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">model</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">dbt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> session</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    dbt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">config</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">submission_method</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token string" style="color:rgb(173, 219, 103)">"bigframes"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> timeout</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token number" style="color:rgb(247, 140, 108)">6000</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token 
plain">    dataset </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"bigquery-public-data.epa_historical_air_quality"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    index_columns </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token string" style="color:rgb(173, 219, 103)">"state_name"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"county_name"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"site_num"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"date_local"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"time_local"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    param_column </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"parameter_name"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    value_column </span><span class="token operator" style="color:rgb(127, 219, 
202)">=</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"sample_measurement"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    params_dfs </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    table_param_dict </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token string" style="color:rgb(173, 219, 103)">"co_hourly_summary"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"co"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token string" style="color:rgb(173, 219, 103)">"no2_hourly_summary"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"no2"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#d6deeb"><span class="token plain">        </span><span class="token string" style="color:rgb(173, 219, 103)">"o3_hourly_summary"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"o3"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token string" style="color:rgb(173, 219, 103)">"pressure_hourly_summary"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"pressure"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token string" style="color:rgb(173, 219, 103)">"so2_hourly_summary"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"so2"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        </span><span class="token string" style="color:rgb(173, 219, 103)">"temperature_hourly_summary"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"temperature"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token punctuation" 
style="color:rgb(199, 146, 234)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token keyword" style="color:rgb(127, 219, 202)">for</span><span class="token plain"> table</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> param </span><span class="token keyword" style="color:rgb(127, 219, 202)">in</span><span class="token plain"> table_param_dict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">items</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        param_df </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> bpd</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">read_gbq</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">f"</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string-interpolation interpolation">dataset</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">.</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token 
string-interpolation interpolation">table</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> columns</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain">index_columns </span><span class="token operator" style="color:rgb(127, 219, 202)">+</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">value_column</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        param_df </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> param_df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">sort_values</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">drop_duplicates</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">set_index</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" 
style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">rename</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">columns</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain">value_column</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> param</span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        params_dfs</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">append</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">param_df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    wind_table </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:rgb(173, 219, 103)">f"</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token string-interpolation interpolation">dataset</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token string-interpolation string" style="color:rgb(173, 219, 
103)">.wind_hourly_summary"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    wind_speed_df </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> bpd</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">read_gbq</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        wind_table</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        columns</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain">index_columns </span><span class="token operator" style="color:rgb(127, 219, 202)">+</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">value_column</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">        filters</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">param_column</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"=="</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span 
class="token string" style="color:rgb(173, 219, 103)">"Wind Speed - Resultant"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    wind_speed_df </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> wind_speed_df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">sort_values</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">drop_duplicates</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">set_index</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">rename</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">columns</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span 
class="token plain">value_column</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"wind_speed"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    params_dfs</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">append</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">wind_speed_df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    df </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> bpd</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">concat</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">params_dfs</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> axis</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> join</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token string" style="color:rgb(173, 219, 103)">"inner"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token 
punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">cache</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token keyword" style="color:rgb(127, 219, 202)">return</span><span class="token plain"> df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">reset_index</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><br></span></code></pre></div></div>
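<p>The joining pattern in this model (deduplicate each per-pollutant table on the shared key columns, index on those keys, then column-wise inner-join the results) can be sketched with plain pandas. This is a toy illustration with made-up data and hypothetical values; the model itself uses <code>bigframes.pandas</code>, whose DataFrame API mirrors pandas:</p>

```python
import pandas as pd

# A reduced version of the model's key columns, for illustration only.
index_columns = ["state_name", "date_local"]

# Two toy per-pollutant tables; the CA row is duplicated on purpose.
co = pd.DataFrame({
    "state_name": ["CA", "CA", "NY"],
    "date_local": ["2016-01-01", "2016-01-01", "2016-01-01"],
    "sample_measurement": [0.5, 0.6, 0.4],
})
o3 = pd.DataFrame({
    "state_name": ["CA", "TX"],
    "date_local": ["2016-01-01", "2016-01-01"],
    "sample_measurement": [0.03, 0.02],
})

def prepare(df, name):
    # Keep one row per key, index on the keys, and rename the value
    # column after the pollutant, mirroring the model's chained calls.
    return (df.sort_values(index_columns)
              .drop_duplicates(index_columns)
              .set_index(index_columns)
              .rename(columns={"sample_measurement": name}))

# axis=1 with join="inner" keeps only keys present in every table:
# here, only ("CA", "2016-01-01") survives.
wide = pd.concat([prepare(co, "co"), prepare(o3, "o3")], axis=1, join="inner")
print(wide.reset_index())
```

<p>The inner join along <code>axis=1</code> is what turns many long per-pollutant tables into one wide table with a column per pollutant, keeping only observations recorded for every pollutant, which is the shape the regression model expects.</p>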
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="part-2-training-the-model-and-making-predictions-predictionpy">Part 2: Training the model and making predictions (<code>prediction.py</code>)<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#part-2-training-the-model-and-making-predictions-predictionpy" class="hash-link" aria-label="Direct link to part-2-training-the-model-and-making-predictions-predictionpy" title="Direct link to part-2-training-the-model-and-making-predictions-predictionpy">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-python codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">model</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">dbt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> session</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    dbt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">config</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">submission_method</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token string" style="color:rgb(173, 219, 103)">"bigframes"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> timeout</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token number" style="color:rgb(247, 140, 108)">6000</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token 
plain">    df </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> dbt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">ref</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">"prepare_table"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    train_data_filter </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">date_local</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">dt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">year </span><span class="token operator" style="color:rgb(127, 219, 202)">&lt;</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2017</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    test_data_filter </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token 
plain">date_local</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">dt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">year </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2017</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(127, 219, 202)">&amp;</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">date_local</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">dt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">year </span><span class="token operator" style="color:rgb(127, 219, 202)">&lt;</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2020</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    predict_data_filter </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">date_local</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">dt</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span 
class="token plain">year </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2020</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    index_columns </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token string" style="color:rgb(173, 219, 103)">"state_name"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"county_name"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"site_num"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"date_local"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">"time_local"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    df_train </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">train_data_filter</span><span 
class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">set_index</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    df_test </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">test_data_filter</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">set_index</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    df_predict </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> df</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">predict_data_filter</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">set_index</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">index_columns</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    X_train</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> y_train </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> df_train</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">drop</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">columns</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token string" style="color:rgb(173, 219, 103)">"o3"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> df_train</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token string" style="color:rgb(173, 219, 103)">"o3"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    X_test</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> y_test </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> df_test</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">drop</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">columns</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token string" style="color:rgb(173, 219, 103)">"o3"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 
234)">,</span><span class="token plain"> df_test</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token string" style="color:rgb(173, 219, 103)">"o3"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    X_predict </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> df_predict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">drop</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">columns</span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token string" style="color:rgb(173, 219, 103)">"o3"</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token keyword" style="color:rgb(127, 219, 202)">from</span><span class="token plain"> bigframes</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">ml</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">linear_model </span><span class="token keyword" style="color:rgb(127, 219, 202)">import</span><span class="token plain"> LinearRegression</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    model </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> LinearRegression</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token 
punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    model</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">fit</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">X_train</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> y_train</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    df_pred </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> model</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">predict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">X_predict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token keyword" style="color:rgb(127, 219, 202)">return</span><span class="token plain"> df_pred</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="running-your-dbt-ml-pipeline">Running your dbt ML pipeline<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#running-your-dbt-ml-pipeline" class="hash-link" aria-label="Direct link to Running your dbt ML pipeline" title="Direct link to Running your dbt ML pipeline">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-bash codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># Run all models</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">dbt run</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token comment" style="color:rgb(99, 119, 119);font-style:italic"># Or run just your new models</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">dbt run </span><span class="token parameter variable" style="color:rgb(214, 222, 235)">--select</span><span class="token plain"> prepare_table prediction</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="key-advantages-of-dbt-and-bigframes-for-ml">Key advantages of dbt and BigFrames for ML<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#key-advantages-of-dbt-and-bigframes-for-ml" class="hash-link" aria-label="Direct link to Key advantages of dbt and BigFrames for ML" title="Direct link to Key advantages of dbt and BigFrames for ML">​</a></h2>
<ul>
<li><strong>Scalability &amp; Efficiency</strong>: Handle large datasets in BigQuery via BigFrames</li>
<li><strong>Simplified Workflow</strong>: Use familiar APIs like <code>pandas</code> and <code>scikit-learn</code></li>
<li><strong>dbt Orchestration</strong>:<!-- -->
<ul>
<li>Dependency management with <code>dbt.ref()</code> and <code>dbt.source()</code></li>
<li>Scheduled retraining with <code>dbt run</code></li>
<li>Testing, documentation, and reproducibility</li>
</ul>
</li>
</ul>
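<p>As a quick local check, the year-based split logic from the model above can be exercised with plain pandas on a tiny synthetic frame (BigFrames intentionally mirrors the pandas API, so the same filters carry over; the data here is invented for illustration):</p>

```python
import pandas as pd

# Local pandas sketch of the year-based train/test/predict split used in the
# model above. The DataFrame is synthetic; on BigQuery the same boolean
# filters would run against a bigframes DataFrame instead.
df = pd.DataFrame({
    "date_local": pd.to_datetime(["2015-01-01", "2018-06-01", "2021-03-01"]),
    "o3": [0.03, 0.04, 0.05],
})

train = df[df.date_local.dt.year < 2017]
test = df[(df.date_local.dt.year >= 2017) & (df.date_local.dt.year < 2020)]
predict = df[df.date_local.dt.year >= 2020]

print(len(train), len(test), len(predict))  # 1 1 1
```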
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="conclusion-and-next-steps">Conclusion and next steps<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#conclusion-and-next-steps" class="hash-link" aria-label="Direct link to Conclusion and next steps" title="Direct link to Conclusion and next steps">​</a></h2>
<p>By integrating <strong>BigFrames</strong> into your <strong>dbt workflows</strong>, you can build scalable, maintainable, and production-ready machine learning pipelines. While this example used linear regression, the same principles apply across other ML use cases with <code>bigframes.ml</code>.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="feedback-and-support">Feedback and support<a href="https://docs.getdbt.com/blog/train-linear-dbt-bigframes#feedback-and-support" class="hash-link" aria-label="Direct link to Feedback and support" title="Direct link to Feedback and support">​</a></h2>
<ul>
<li>📚 <a href="https://docs.getdbt.com/docs/dbt-support">dbt Support</a></li>
<li>📨 Email feedback on BigFrames: <a href="mailto:bigframes-feedback@google.com" target="_blank" rel="noopener noreferrer">bigframes-feedback@google.com</a></li>
<li>🛠 <a href="https://github.com/googleapis/python-bigquery-dataframes" target="_blank" rel="noopener noreferrer">File issues on GitHub</a></li>
<li>📬 <a href="https://docs.google.com/forms/d/10EnDyYdYUW9HvelHYuBRC8L3GdGVl3rX0aroinbRZyc/viewform?edit_requested=true" target="_blank" rel="noopener noreferrer">Subscribe to BigFrames updates</a></li>
</ul>]]></content>
        <author>
            <name>Jialuo Chen</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The new dbt VS Code extension: The experience we've all been waiting for]]></title>
        <id>https://docs.getdbt.com/blog/vscode-extension-experience</id>
        <link href="https://docs.getdbt.com/blog/vscode-extension-experience"/>
        <updated>2025-06-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How the new dbt VS Code extension finally delivers the dev experience we've always wanted.]]></summary>
        <content type="html"><![CDATA[<p>Hello, community!</p>
<p>My name is Bruno, and you might have seen me posting dbt content on LinkedIn. If you haven't, let me introduce myself. I started working with dbt more than 3 years ago. At that time, I was very new to the tool, and to understand it a bit better, I started creating resources to help me learn it. One of them, a dbt cheatsheet, was the starting point for my community journey.</p>
<p>From that cheatsheet, I went on to create all kinds of content, contributing to and engaging with the community, until I received the dbt community award twice, something I am very thankful for and proud of.</p>
<p>Since the acquisition of SDF Labs by dbt Labs, I have been waiting for the day that we would see what the result of the fusion of these two companies would be. Spoiler alert: It’s the dbt Fusion engine and it's better than I could have expected.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-dbt-developer-experience-in-the-pre-fusion-era">The dbt developer experience in the pre-fusion-era<a href="https://docs.getdbt.com/blog/vscode-extension-experience#the-dbt-developer-experience-in-the-pre-fusion-era" class="hash-link" aria-label="Direct link to The dbt developer experience in the pre-fusion-era" title="Direct link to The dbt developer experience in the pre-fusion-era">​</a></h2>
<p>If you've ever started a dbt project, chances are your journey began like mine did: cloning <code>jaffle_shop</code>, opening it in VS Code, and running <a href="https://docs.getdbt.com/reference/commands/run"><code>dbt run</code></a> for the first time (actually the second time, because I know you forgot to run <a href="https://docs.getdbt.com/reference/commands/deps"><code>dbt deps</code></a> in the first one). This is the dbt initiation process, our ‘hello-world’.</p>
<p>You played around with <a href="https://docs.getdbt.com/best-practices/how-we-structure/2-staging#staging-models">staging models</a>, the orders table, customers table. But let's be honest, the developer experience in that setup was always a bit... clunky.</p>
<p>You wanted to check the lineage of your project, one of the coolest features of dbt, and you had to run <a href="https://docs.getdbt.com/reference/commands/cmd-docs#dbt-docs-generate"><code>dbt docs generate</code></a>, <a href="https://docs.getdbt.com/reference/commands/cmd-docs#dbt-docs-serve"><code>serve</code></a>, and open the docs in a browser. Made some updates? Do all the steps again.</p>
<p>Did you want to check your project's metadata? You had to rely on <a href="https://docs.getdbt.com/reference/commands/cmd-docs"><code>dbt docs</code></a> (that whole process again), or build some custom solution with the <a href="https://docs.getdbt.com/reference/artifacts/manifest-json"><code>manifest.json</code></a>.</p>
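<p>For example, a bare-bones version of that kind of custom manifest solution might look like the sketch below. The <code>nodes</code> to <code>depends_on.nodes</code> shape matches recent manifest versions, but the sample manifest itself is made up, so check the schema of your own file:</p>

```python
import json

# Minimal sketch of a "custom solution" built on manifest.json: listing
# model-level dependencies. In a real project you would load the file, e.g.
# manifest = json.load(open("target/manifest.json")); the dict below is a
# made-up stand-in with the same nodes -> depends_on.nodes structure.
manifest = {
    "nodes": {
        "model.jaffle_shop.customers": {
            "depends_on": {"nodes": ["model.jaffle_shop.stg_customers"]}
        }
    }
}

# Collect (parent, child) edges from every node's declared dependencies.
edges = [
    (parent, name)
    for name, node in manifest["nodes"].items()
    for parent in node.get("depends_on", {}).get("nodes", [])
]
print(edges)  # [('model.jaffle_shop.stg_customers', 'model.jaffle_shop.customers')]
```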
<p>Moving to dbt Cloud (now called just <span>dbt platform</span>) made things smoother. It has a built-in <span>Studio IDE</span> with git integration that makes it easier to compile and preview models, an auto-updating lineage tab below the model, and much better documentation with dbt Explorer, now renamed to Catalog. And it offers a lot of other powerful features for orchestration, observability, CI/CD, and more.</p>
<p>The cloud-based <span>dbt platform</span> was a big step up, but even so, many of us still preferred to use our own dev environments. We like using our themes, our VS Code extensions, our terminals, but this would mean losing all the nice cloud features while developing. A sad trade-off.</p>
<p>We've already been to the <span>dbt platform</span> and back to the terminal, and some problems remain. Consider this all-too-common scenario when modifying a dbt model: forgetting a comma[1]. You don't learn about your mistake until dbt tries to run the model on your warehouse, and dbt can't do that until your cluster is turned on. So it's not until a full minute later that you get feedback about your missing punctuation mark.</p>
<p>[1]: because you are using trailing commas instead of leading commas, and they're harder to see, and I'm talking too much about the comma fight.</p>
<p>All this back-and-forth communication between dbt and the platform was slowing down your project.</p>
<p>That’s why this new release is such a big deal. It solves all the problems above and introduces other things I didn't know I needed until I saw it.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-new-era-of-dbt-development">The new era of dbt development<a href="https://docs.getdbt.com/blog/vscode-extension-experience#the-new-era-of-dbt-development" class="hash-link" aria-label="Direct link to The new era of dbt development" title="Direct link to The new era of dbt development">​</a></h2>
<p>With the acquisition of SDF Labs and a renewed focus on developer experience, dbt Labs announced its new engine, <a href="https://docs.getdbt.com/docs/fusion/about-fusion">Fusion</a>. This engine was built from scratch in Rust, and its intelligence will power up dbt no matter where you run it. There are different ways to use the Fusion engine, and the best one is the newly announced VS Code extension.</p>
<p>The Fusion engine with the VS Code extension is how folks will want to develop with dbt moving forward. I can say this feels like the experience we’ve all been waiting for.</p>
<p>After using it, it’s hard to imagine going back. Working with dbt in VS Code without this extension just doesn’t make sense anymore.</p>
<p>It comes with a lot of features to streamline your work and make you more efficient by developing faster and spending less. But let me tell you about my favorites:</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="catch-sql-errors-in-real-time">Catch SQL Errors in Real Time<a href="https://docs.getdbt.com/blog/vscode-extension-experience#catch-sql-errors-in-real-time" class="hash-link" aria-label="Direct link to Catch SQL Errors in Real Time" title="Direct link to Catch SQL Errors in Real Time">​</a></h3>
<p>There was no question which feature I was picking first. No more waiting for your platform to debug your code for you. If you misspell a column name or get a function's parameters in the wrong order, you catch those errors before you run anything.</p>
<p>This is because Fusion doesn't treat SQL code as just a string anymore; it really understands it. It also shows you some helpful information about the error.</p>
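<p>As a rough mental model only (the schema and column names below are invented, and Fusion's real analysis goes much deeper into types, functions, and dialects), the kind of pre-flight check this enables looks like:</p>

```python
# Toy version of a static column check: validate referenced column names
# against a known upstream schema before anything is sent to the warehouse.
# The "orders" schema and the misspelled column are made up for illustration.
upstream_schema = {"orders": {"order_id", "customer_id", "amount"}}

def unknown_columns(table: str, referenced: list[str]) -> list[str]:
    """Return referenced columns that don't exist in the table's schema."""
    return [col for col in referenced if col not in upstream_schema[table]]

# The typo is flagged locally, with no warehouse round trip.
print(unknown_columns("orders", ["order_id", "amuont"]))  # ['amuont']
```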
<div class="docImage_EYbW"><span><img alt="Showing function errors." title="Showing function errors." src="https://docs.getdbt.com/img/blog/2025-06-16-the-new-dbt-vscode-extension/vs_code_extension_function_error.png?v=2"></span><span class="title_aGrV">Showing function errors.</span></div>
<div class="docImage_EYbW"><span><img alt="Showing column name errors." title="Showing column name errors." src="https://docs.getdbt.com/img/blog/2025-06-16-the-new-dbt-vscode-extension/vs_code_extension_column_error.png?v=2"></span><span class="title_aGrV">Showing column name errors.</span></div>
<p>This is the greatest improvement of this engine, IMHO.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="model-and-column-lineage">Model and Column Lineage<a href="https://docs.getdbt.com/blog/vscode-extension-experience#model-and-column-lineage" class="hash-link" aria-label="Direct link to Model and Column Lineage" title="Direct link to Model and Column Lineage">​</a></h3>
<p>My next favorite feature is the lineage view. If you were a <span>dbt</span> platform user, you would feel at home. And if you were using dbt Core, finally, no more generating <code>dbt docs</code> to visualize lineage.</p>
<p>Now there's a lineage tab that shows your project’s lineage directly in VS Code. It’s interactive and live. You can also use the lenses feature, which is pretty cool for visualizing your project by different attributes, like resource_type or materialization.</p>
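<p>To give a rough idea of where that lineage comes from (with made-up model names and SQL, and far simpler than dbt's real parser), every <code>ref()</code> in a model's SQL becomes an edge in the project DAG:</p>

```python
import re

# Rough sketch of model-level lineage extraction: each {{ ref('...') }} in a
# model's SQL is a dependency edge. Model names and SQL are invented, and
# dbt's actual parser does far more than this regex.
models = {
    "stg_orders": "select * from raw.orders",
    "orders": "select * from {{ ref('stg_orders') }}",
    "revenue": "select sum(amount) from {{ ref('orders') }}",
}

lineage = {name: re.findall(r"ref\('([^']+)'\)", sql) for name, sql in models.items()}
print(lineage)  # {'stg_orders': [], 'orders': ['stg_orders'], 'revenue': ['orders']}
```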
<div class="docImage_EYbW"><span><img alt="Project Lineage." title="Project Lineage." src="https://docs.getdbt.com/img/blog/2025-06-16-the-new-dbt-vscode-extension/vs_code_extension_project_lineage.png?v=2"></span><span class="title_aGrV">Project Lineage.</span></div>
<p>And something I was not expecting to be here, but thankfully it is, column-level lineage! Not just where columns come from, but also how they change: renamed, transformed, or passed through.</p>
<p>This is incredibly helpful for debugging transformations or understanding how that key metric is shaped across models.</p>
<div class="docImage_EYbW"><span><img alt="Column-level Lineage." title="Column-level Lineage." src="https://docs.getdbt.com/img/blog/2025-06-16-the-new-dbt-vscode-extension/vs_code_extension_cll.png?v=2"></span><span class="title_aGrV">Column-level Lineage.</span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="instant-refactoring">Instant refactoring<a href="https://docs.getdbt.com/blog/vscode-extension-experience#instant-refactoring" class="hash-link" aria-label="Direct link to Instant refactoring" title="Direct link to Instant refactoring">​</a></h3>
<p>Ok, let me show you just one more thing! Have you ever wanted to rename a model or a column, but it's used in so many places downstream that you give up, because you don't want to refactor everything or you're afraid you'll break something?</p>
<p>Now, thanks to the deep dbt Fusion SQL understanding, you can rename your model or column, and the extension will refactor all downstream dependencies for you. But don't worry, before doing it, the extension allows you to see a preview of the changes, so you can be sure it is doing what you want.</p>
<video width="100%" height="100%" muted="" controls=""><source src="/img/blog/2025-06-16-the-new-dbt-vscode-extension/vs_code_extension_refactoring.webm" type="video/webm"></video>
<p>This extension brings many more features, like navigating through models instantly, autocompleting everything, renaming models or columns with a warning about how it will impact your project, and previewing models &amp; CTEs, all already covered in other blogs. And it just launched, so I believe we can expect more and more enhancements to come.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="conclusion-a-new-default">Conclusion: A New Default<a href="https://docs.getdbt.com/blog/vscode-extension-experience#conclusion-a-new-default" class="hash-link" aria-label="Direct link to Conclusion: A New Default" title="Direct link to Conclusion: A New Default">​</a></h2>
<p>This extension changes what using dbt feels like. It brings together performance, context, and interactivity in a way that finally makes dbt feel at home inside a modern developer environment. And the best part? It’s just getting started.</p>
<p>The Fusion engine is already powering a faster, smarter dbt under the hood. And it opens the door to a more fluid, confident, and intuitive development experience. Fewer context switches. Fewer gotchas. More time spent thinking about your data, not your tooling.</p>
<p>If you’ve ever built models in a text editor and wished dbt “just knew more,” this is for you.</p>
<p>If you’ve relied on the CLI but missed having true autocomplete, this is for you.</p>
<p>And if you’ve wanted the best of both worlds, the flexibility of Core with the power of Cloud, this might just become your new default. Even if you use dbt Cloud, it powers up local development with dbt Core to another level.</p>
<p>We’re incredibly excited to see how the community builds with this. Try it out. Push it. Share what’s working, and what’s missing.</p>
<p>This new extension will be constantly updated, so stay tuned for more improvements.</p>
<p><strong>This is the experience we’ve all been waiting for.</strong></p>
<p><em>Bruno is a lead Data Engineer at <a href="https://www.phdata.io/" target="_blank" rel="noopener noreferrer">phData</a>, and recently built a dbt learning platform called <a href="https://www.datagym.io/" target="_blank" rel="noopener noreferrer">DataGym.io</a>.</em></p>]]></content>
        <author>
            <name>Bruno Souza de Lima</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Components of the dbt Fusion engine and how they fit together]]></title>
        <id>https://docs.getdbt.com/blog/dbt-fusion-engine-components</id>
        <link href="https://docs.getdbt.com/blog/dbt-fusion-engine-components"/>
        <updated>2025-05-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[The new engine makes it possible to decouple source code from functionality, introducing new ways to distribute functionality to the Community.]]></summary>
        <content type="html"><![CDATA[<p>Today, we announced the <a href="https://docs.getdbt.com/blog/dbt-fusion-engine">dbt Fusion engine</a>.</p>
<p>Fusion isn't just one thing — it's a set of interconnected components working together to power the next generation of analytics engineering.</p>
<p>This post maps out each piece of the Fusion architecture, explains how they fit together, and clarifies what's available to you whether you're compiling from source, using our pre-built binaries, or developing within a dbt Fusion powered product experience.</p>
<p>From the Rust engine to the VS Code extension, through to new Arrow-based adapters and Apache-licensed foundational technologies, we'll break down exactly what each component does, how each component is licensed (for why, see <a href="https://www.getdbt.com/blog/new-code-new-license-understanding-the-new-license-for-the-dbt-fusion-engine" target="_blank" rel="noopener noreferrer">Tristan's accompanying post</a>), and how you can start using it and get involved today.</p>
<p><em>This post describes the state of the world as it will be when Fusion reaches General Availability. For a look at the path to GA, read <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga">this post</a>.</em></p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="ways-to-access">There are a number of different ways to access the dbt Fusion engine<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#ways-to-access" class="hash-link" aria-label="Direct link to There are a number of different ways to access the dbt Fusion engine" title="Direct link to There are a number of different ways to access the dbt Fusion engine">​</a></h2>
<p>A big change between the dbt Fusion engine and the dbt Core engine is their language. Core is Python; Fusion is Rust. This is meaningful not just because of the performance benefits, but because it creates a new way for us to distribute functionality to the community.</p>
<p>To distribute a Python program, you also have to distribute its underlying source code. But Rust is a compiled language, meaning we can share either the source code or just the compiled binaries derived from that source code.</p>
<p>This means that features which would have otherwise had to stay completely proprietary for IP reasons can instead be broadly distributed in binary form. There's also a completely source-available version of dbt Fusion which will exceed dbt Core's capabilities by the time we reach GA.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-variants-of-the-dbt-fusion-engine-exist">What variants of the dbt Fusion engine exist?<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#what-variants-of-the-dbt-fusion-engine-exist" class="hash-link" aria-label="Direct link to What variants of the dbt Fusion engine exist?" title="Direct link to What variants of the dbt Fusion engine exist?">​</a></h2>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="source-available-dbt-fusion-engine">Source-available dbt Fusion engine<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#source-available-dbt-fusion-engine" class="hash-link" aria-label="Direct link to Source-available dbt Fusion engine" title="Direct link to Source-available dbt Fusion engine">​</a></h3>
<p>Artifact type: Code</p>
<p>Available at: <a href="https://github.com/dbt-labs/dbt-fusion" target="_blank" rel="noopener noreferrer">https://github.com/dbt-labs/dbt-fusion</a> (Note: this repo currently only contains the code necessary for a <code>dbt parse</code> and <code>dbt deps</code> - more will follow!)</p>
<p>License: ELv2</p>
<p>This will be the foundation of the Fusion engine - the code that lets you:</p>
<ul>
<li>Execute your <code>dbt seed/run/test/build</code></li>
<li>Render your Jinja and create your DAG</li>
<li>Connect to the adapters that render your dbt project into the DDL and DML that hits your warehouse</li>
<li>Produce the artifacts in your dbt project</li>
</ul>
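<p>Those capabilities are easiest to see in miniature. As an illustration only (this is not Fusion's Rust internals, and the model names are invented), here's how the dependency ordering at the heart of a dbt DAG can be sketched with Python's standard library:</p>

```python
from graphlib import TopologicalSorter

# Hypothetical models and the models they ref() - i.e. depend on.
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue_report": {"orders_enriched"},
}

# static_order() yields a build order in which every model comes
# after all of its dependencies - the essence of a dbt DAG run.
order = list(TopologicalSorter(deps).static_order())
print(order)
```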
<p>To be clear, the self-compiled binary that's available today doesn't do much yet. By the time the new engine enters general availability, its source-available components will <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga">exceed the net capabilities of dbt Core</a>. <strong>If you are a data team running dbt Core, simply running the self-compiled version of dbt Fusion will be a pure upgrade.</strong></p>
<p>This repository will also include the code necessary for <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#level-1-parsing">Level 1 SQL Comprehension</a> (the ability to parse SQL into a syntax tree).</p>
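<p>To make "parse SQL into a syntax tree" concrete: the first step of any Level 1 system is tokenization. This toy Python sketch (nothing like Fusion's actual parser) splits a statement into keyword, identifier, and punctuation tokens:</p>

```python
import re

# Toy lexer: a real SQL parser handles strings, comments, operators,
# and dialect quirks; this only recognizes three token kinds.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<kw>select|from|where)\b|(?P<ident>\w+)|(?P<punct>[*,.=]))",
    re.IGNORECASE,
)

def tokenize(sql: str):
    tokens = []
    pos = 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if not m:
            break
        kind = m.lastgroup  # which named group matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

toks = tokenize("select id, name from users")
print(toks)
```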
<p>As long as you comply with the <a href="http://www.getdbt.com/licenses-faq" target="_blank" rel="noopener noreferrer">three restrictions in ELv2</a>:</p>
<ul>
<li>✅&nbsp;You can adopt the binary into your data workflows without dbt Labs' involvement</li>
<li>✅&nbsp;You&nbsp;can see and modify the code</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="precompiled-dbt-fusion-engine-binary">Precompiled dbt Fusion engine binary<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#precompiled-dbt-fusion-engine-binary" class="hash-link" aria-label="Direct link to Precompiled dbt Fusion engine binary" title="Direct link to Precompiled dbt Fusion engine binary">​</a></h3>
<p>Artifact type: Precompiled binary</p>
<p>How to access: download following the instructions <a href="https://docs.getdbt.com/docs/local/install-dbt?version=2#get-started">here</a></p>
<p>License: ELv2</p>
<p>When you download the precompiled binary created by dbt Labs, it contains:</p>
<ul>
<li>
<p><strong>All of the functionality in the Source Available Fusion</strong></p>
</li>
<li>
<p>Additional capabilities which are derived from proprietary code (such as the <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#level-2-compiling">Level 2 SQL Comprehension</a> required to compile and type-check your SQL).</p>
</li>
</ul>
<p>As long as you comply with the three restrictions in ELv2,</p>
<ul>
<li>✅&nbsp;You can&nbsp;adopt the binary into your data workflows without dbt Labs' involvement</li>
<li>❌&nbsp;But you cannot see or modify the code itself</li>
</ul>
<p><strong>The vast majority of existing dbt Core users that adopt the freely distributed components of Fusion should use the binary to do so, rather than compiling it from source code.</strong> The binary has the same permissions but more capabilities (and it saves you from having to compile it yourself). You can use it internally at your company for free, even if you are not a dbt Labs customer.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="using-the-dbt-fusion-engine-with-a-commercial-agreement">Using the dbt Fusion engine with a commercial agreement<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#using-the-dbt-fusion-engine-with-a-commercial-agreement" class="hash-link" aria-label="Direct link to Using the dbt Fusion engine with a commercial agreement" title="Direct link to Using the dbt Fusion engine with a commercial agreement">​</a></h3>
<p>Artifact type: Precompiled binary and managed service</p>
<p>Available at: <a href="https://docs.getdbt.com/docs/local/install-dbt?version=2#get-started">Download binary</a> and <a href="http://getdbt.com/signup" target="_blank" rel="noopener noreferrer">sign up for the service</a></p>
<p>License: ELv2 (binary) and Proprietary (service)</p>
<p>Organizations who <em>do</em> have a commercial agreement will unlock even more capabilities, but they'll use the exact same publicly-released binary discussed above. If you want to start using platform features, <a href="https://docs.getdbt.com/docs/mesh/govern/project-dependencies" target="_blank" rel="noopener noreferrer">such as dbt Mesh</a>, all you need to do is <a href="https://docs.getdbt.com/docs/cloud/configure-cloud-cli#configure-the-dbt-cloud-cli" target="_blank" rel="noopener noreferrer">download a configuration file</a>. <em>(Joel commentary - As someone who has been juggling the dbt Cloud CLI alongside dbt Core for the last couple of years, I cannot overstate how thrilled I am by this.)</em></p>
<p>Obviously there are additional cloud-backed services necessary to deliver platform-specific features, such as State-Aware Orchestration. That code is proprietary and governed by your agreement with dbt Labs.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="other-pieces-of-the-puzzle">Other pieces of the puzzle<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#other-pieces-of-the-puzzle" class="hash-link" aria-label="Direct link to Other pieces of the puzzle" title="Direct link to Other pieces of the puzzle">​</a></h2>
<p>The dbt Fusion engine is the headline act, but its underlying technologies can be mixed and matched in a variety of ways.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-dbt-vs-code-extension-and-language-server">The dbt VS Code Extension and Language Server<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#the-dbt-vs-code-extension-and-language-server" class="hash-link" aria-label="Direct link to The dbt VS Code Extension and Language Server" title="Direct link to The dbt VS Code Extension and Language Server">​</a></h3>
<p>Artifact type: Precompiled binaries</p>
<p>How to access: <a href="https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt" target="_blank" rel="noopener noreferrer">Install on the VS Code marketplace</a></p>
<p>License: Proprietary</p>
<p>The dbt VS Code extension is one of the first product experiences built on top of Fusion. It is not <em>part</em> of Fusion, it is <em>powered</em> by Fusion and is part of the wider dbt platform's offerings (with a generous free tier). Specifically, the VS Code extension interacts with another brand-new binary, the dbt <a href="https://microsoft.github.io/language-server-protocol/" target="_blank" rel="noopener noreferrer">Language Server</a>.</p>
<p>The Language Server is built on top of a subset of the technology powering the extended Fusion engine: as an example, it can quickly compile SQL and interact with databases, but it defers to the dbt binary when it's time to actually run a model.</p>
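<p>For the curious, the Language Server Protocol is simple at the wire level: JSON-RPC payloads framed by a <code>Content-Length</code> header. A minimal Python sketch of that framing (the request shown is a generic LSP <code>initialize</code>, not a dbt-specific message):</p>

```python
import json

def frame(payload: dict) -> bytes:
    """Frame a JSON-RPC payload the way LSP clients and servers exchange them."""
    body = json.dumps(payload).encode("utf-8")
    return b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body

msg = frame({"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}})
print(msg.decode())
```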
<div class="docImage_EYbW"><span><img alt="The VS Code extension interacts with the Language Server to understand your SQL, and the Fusion binary to execute your SQL." title="The VS Code extension interacts with the Language Server to understand your SQL, and the Fusion binary to execute your SQL." src="https://docs.getdbt.com/img/blog/2025-05-28-dbt-fusion-engine-components/vscode-ext-binary-roles.png?v=2"></span><span class="title_aGrV">The VS Code extension interacts with the Language Server to understand your SQL, and the Fusion binary to execute your SQL.</span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-dbt-authoring-layer">The dbt Authoring Layer<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#the-dbt-authoring-layer" class="hash-link" aria-label="Direct link to The dbt Authoring Layer" title="Direct link to The dbt Authoring Layer">​</a></h3>
<p>Artifact type: JSON Schema definitions</p>
<p>Available at: Git repos for <a href="https://github.com/dbt-labs/dbt-jsonschema" target="_blank" rel="noopener noreferrer">input files</a> and <a href="https://github.com/dbt-labs/schemas.getdbt.com" target="_blank" rel="noopener noreferrer">output artifacts</a></p>
<p>License: Apache 2.0</p>
<p>When you think of dbt, you're probably thinking of a combination of the Engine (described above) and the Authoring Layer.</p>
<p>The Authoring Layer is made up of everything necessary to define the <em>what</em> of a dbt project: things like the <strong>YAML specs, Artifact specs, CLI commands and flags</strong>, and <strong>macro signatures</strong>. As the user interface to dbt, the authoring layer is standard between Core and Fusion, although the Fusion engine does not include support for various behaviours and functions deprecated in earlier releases of dbt Core.</p>
<p>For the first time, we're releasing a series of definitive JSON schemas, <em>backed by the code in dbt Core and Fusion</em>, that encapsulate the acceptable content of dbt's various YAML files. These are Apache 2.0-licensed and will be particularly helpful for other tools integrating with dbt projects.</p>
<p>This joins the existing JSON schemas defining the shape of dbt's output artifacts (e.g. <code>manifest.json</code>). As we stabilize Fusion's metadata output (logging and artifacts) on the path to GA, we will update the published schemas.</p>
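<p>For a sense of what schema-backed validation buys you, here is a deliberately tiny, hand-rolled checker in Python. The real schemas are full JSON Schema documents consumed by editors and language servers; the keys below are an invented fragment of a model's properties:</p>

```python
# Toy "schema" mapping keys to expected types.
# The real dbt JSON schemas are far richer than this.
SCHEMA = {"name": str, "description": str, "config": dict}

def validate(node: dict) -> list:
    """Return a list of problems; an empty list means the node passes."""
    errors = []
    for key, expected in SCHEMA.items():
        if key not in node:
            errors.append("missing key: " + key)
        elif not isinstance(node[key], expected):
            errors.append(key + ": expected " + expected.__name__)
    return errors

model = {"name": "orders", "description": "All orders", "config": {"materialized": "table"}}
print(validate(model))
```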
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="dbt-fusion-engine-adapters">dbt Fusion engine adapters<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#dbt-fusion-engine-adapters" class="hash-link" aria-label="Direct link to dbt Fusion engine adapters" title="Direct link to dbt Fusion engine adapters">​</a></h3>
<p>Artifact type: Source code</p>
<p>Available at: Initial code in <a href="https://github.com/dbt-labs/dbt-fusion" target="_blank" rel="noopener noreferrer"><code>dbt-fusion</code> repo</a>, with more to come</p>
<p>License: Apache 2.0 (later this year)</p>
<p>Adapters are responsible for two key tasks:</p>
<ul>
<li>Knowing how to create the appropriate SQL commands (via macros and materializations) for a data platform</li>
<li>Connecting to that target data platform and sending it SQL commands</li>
</ul>
<p>Much like Fusion is the next generation engine for dbt, we also needed next-generation <em>adapters</em> for dbt. These adapters are written in Rust and built on the Apache Arrow standard.</p>
<p>The templating of SQL commands largely carries over from macros in the dbt Core adapters. Database connectivity is another story: the dbt Fusion engine cannot use the Python classes present in each adapter, for reasons both practical and performance-related.</p>
<p>Enter the Apache Arrow ecosystem at large, and the new <a href="https://arrow.apache.org/adbc/current/index.html" target="_blank" rel="noopener noreferrer">ADBC API</a> in particular. ADBC is a future-looking platform for database connectivity, and we are leaning into it heavily with these Fusion adapters.</p>
<p>Because the ADBC standard is extremely new, not all databases are compatible with ADBC yet, and using ADBC in a Rust client isn't easy. To solve both problems, we have created a Rust client library, <code>XDBC</code>, that:</p>
<ul>
<li>Supports ODBC connections to databases where Arrow is not yet provided as an output</li>
<li>Provides generic methods for creating and managing connections to databases</li>
<li>Is useful for anyone who wants to build data tooling in Rust, inside or outside of the dbt ecosystem</li>
</ul>
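<p>If you haven't used ADBC before, the mental model is a database → connection → statement lifecycle that returns columnar (Arrow) results. The real drivers ship as separate packages, so this Python sketch uses the standard library's <code>sqlite3</code> purely to show the connect/execute/fetch shape an engine depends on:</p>

```python
import sqlite3

# Stand-in for a driver connection; an ADBC driver would hand back Arrow
# record batches instead of Python tuples, but the lifecycle looks similar.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT, materialized TEXT)")
conn.execute("INSERT INTO models VALUES ('orders', 'table')")
rows = conn.execute("SELECT name, materialized FROM models").fetchall()
print(rows)
conn.close()
```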
<p>All of this will be open-sourced under the Apache 2.0 license later this year:</p>
<ul>
<li>Fusion adapters we have created</li>
<li>The XDBC library</li>
<li>We'll also continue upstreaming improvements to Apache Arrow's ADBC project</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="antlr-grammars">ANTLR Grammars<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#antlr-grammars" class="hash-link" aria-label="Direct link to ANTLR Grammars" title="Direct link to ANTLR Grammars">​</a></h3>
<p>Artifact type: g4 files</p>
<p>Available at: (repo to come, in the meantime you can discuss this in #dbt-fusion-engine in the dbt Slack)</p>
<p>License: Apache 2.0 (later this year)</p>
<p><a href="https://www.antlr.org/" target="_blank" rel="noopener noreferrer">ANTLR</a> grammars are the formal language specifications that let Fusion <a href="https://docs.getdbt.com/blog/sql-comprehension-technologies">parse</a> every SQL statement across multiple dialects. Specifically, ANTLR takes in these declarative, high-level grammars and uses them to generate a parser. The grammars have wide utility anywhere it's necessary to parse SQL – not just in Fusion – and we're releasing them under Apache 2.0 to enable the Community and others in the data ecosystem to build on top of them.</p>
<p>Most ANTLR grammars are only applicable to a single dialect, but the SDF team created a system which makes it possible to define a shared base grammar and generate each warehouse's g4 file from there. This halves the amount of work required to support a new dialect at the level of precision and robustness required.</p>
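<p>As a loose illustration of that generation step (the rule names and dialect quirks below are invented, and real g4 grammars are vastly more involved), you can think of it as a shared base rule set merged with per-dialect overrides:</p>

```python
# Invented base rules shared by all dialects.
BASE_RULES = {"IDENTIFIER": "[A-Za-z_][A-Za-z_0-9]*"}

# Invented per-dialect overrides, e.g. different quoted-identifier syntax.
DIALECT_OVERRIDES = {
    "snowflake": {"QUOTED_IDENTIFIER": "DQUOTE ~DQUOTE* DQUOTE"},
    "bigquery": {"QUOTED_IDENTIFIER": "BACKTICK ~BACKTICK* BACKTICK"},
}

def rules_for(dialect: str) -> dict:
    """Merge the shared base with one dialect's overrides."""
    merged = dict(BASE_RULES)
    merged.update(DIALECT_OVERRIDES.get(dialect, {}))
    return merged

print(sorted(rules_for("bigquery")))
```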
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="dbt-jinja">dbt-jinja<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#dbt-jinja" class="hash-link" aria-label="Direct link to dbt-jinja" title="Direct link to dbt-jinja">​</a></h3>
<p>Artifact type: Source code</p>
<p>Available at: <a href="https://github.com/dbt-labs/dbt-fusion/tree/main/dbt-jinja" target="_blank" rel="noopener noreferrer">A subdirectory of the dbt-fusion repo</a> (but there's still work to do before it's easy to use outside of the Fusion repository)</p>
<p>License: Apache 2.0</p>
<p>Since Fusion is completely Rust-based, while Jinja is a Python project, we needed a completely new way to render all the Jinja spread through users' projects. We started by switching to <a href="https://github.com/mitsuhiko/minijinja" target="_blank" rel="noopener noreferrer">minijinja</a>: a Rust port of a subset of the original Jinja project, written by Jinja's original maintainer.</p>
<p>This subset of coverage wasn't enough to support existing dbt projects, so we created Rust-native implementations of the majority of these missing features. This achieved the best of both worlds: significant performance improvements while maintaining compatibility with users' existing codebases.</p>
<p>dbt-jinja is the most feature-complete implementation of Jinja in Rust, and is available with an Apache 2.0 license today, with a more formal release (documentation etc) later this year. It's useful whether you're building tooling to operate on top of dbt projects, or working on something completely different which just needs to render Jinja quickly.</p>
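<p>If you just want a feel for what a Jinja renderer does (this toy is neither minijinja nor dbt-jinja, and handles only bare <code>{{ variable }}</code> substitution), the core idea fits in a few lines of Python:</p>

```python
import re

def render(template: str, ctx: dict) -> str:
    """Replace each {{ name }} placeholder with its value from ctx."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(ctx[m.group(1)]), template)

sql = render("select * from {{ schema }}.orders", {"schema": "analytics"})
print(sql)
```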
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-do-i-engage-with-these-components">How do I engage with these components?<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#how-do-i-engage-with-these-components" class="hash-link" aria-label="Direct link to How do I engage with these components?" title="Direct link to How do I engage with these components?">​</a></h2>
<p>Our <a href="https://docs.getdbt.com/community/resources/contributor-expectations">Contributors' Principles</a> remain: Building dbt is a team sport!</p>
<ul>
<li>If you want to open a PR against publicly-viewable code, you can.</li>
<li>If you want to open issues describing bugs during the Fusion engine's beta period, you can. (This is probably one of the highest-leverage things you can do!)</li>
<li>If you want to open a discussion and pitch a new way to use dbt more effectively in our new SQL-aware world, you can.</li>
<li>If you want to move upstream, and contribute to the standards underlying the dbt Fusion engine like Arrow, ADBC, Iceberg, or DataFusion, you can. You might see some familiar faces while you're there!</li>
<li>If you just want to let dbt get better and better in the background, you can do that too.</li>
<li>Want to get involved in the team building this? If the components here are uniquely interesting to you, email <a href="mailto:careers.fusion@dbtlabs.com" target="_blank" rel="noopener noreferrer">careers.fusion@dbtlabs.com</a>.</li>
</ul>
<p>If you need a hand wrapping your head around any of these new components, drop by #dbt-fusion-engine in the Community Slack - we'd love to chat.</p>]]></content>
        <author>
            <name>Jason Ganz</name>
        </author>
        <author>
            <name>Joel Labes</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Path to GA: How the dbt Fusion engine rolls out from beta to production]]></title>
        <id>https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga</id>
        <link href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga"/>
        <updated>2025-05-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[We're moving quickly to enable as many teams as possible to start using the new dbt Fusion engine. Check out our roadmap and learn how to follow our progress.]]></summary>
        <content type="html"><![CDATA[<p>Today, we announced that the dbt Fusion engine is <a href="https://getdbt.com/blog/get-to-know-the-new-dbt-fusion-engine-and-vs-code-extension" target="_blank" rel="noopener noreferrer">available in beta</a>.</p>
<ul>
<li>If Fusion works with your project today, great! You're in for a treat 😄</li>
<li>If it's your first day using dbt, welcome! You should start on Fusion — you're in for a treat too.</li>
</ul>
<p>Today is Launch Day —&nbsp;the first day of a new era: the Age of Fusion. We expect many teams with existing projects will encounter at least one issue that will prevent them from adopting the dbt Fusion engine in production environments. That's ok!</p>
<p>We're moving quickly to unblock more teams, and we are committing that by the time Fusion reaches General Availability:</p>
<ul>
<li>We will support Snowflake, Databricks, BigQuery, Redshift&nbsp;—&nbsp;and likely also Athena, Postgres, Spark, and Trino — with the new <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#dbt-fusion-engine-adapters">Fusion Adapter pattern</a>.</li>
<li>We will have coverage for (basically) all dbt Core functionality. Some things are impractical to replicate outside of Python, or so seldom-used that we'll be more reactive than proactive. On the other hand, many existing dbt Core behaviours will be improved by the unique capabilities of the dbt Fusion engine, such as speed and SQL comprehension. You'll see us talk about this in relevant GitHub issues, many of which we've linked below.</li>
<li>The source-available <code>dbt-fusion</code> repository will contain more total functionality than what is available in dbt Core today. (<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#ways-to-access">Read more about this here</a>.)</li>
<li>The developer experience will be even speedier and more intuitive.</li>
</ul>
<p>These statements aren't true yet —&nbsp;but you can see where we're headed. That's what betas are for, that's the journey we're going on together, and that's why we want to have you all involved.</p>
<p><strong>We will be adding functionality rapidly over the coming weeks.</strong> In particular, keep an eye out for Databricks, BigQuery and Redshift support (in that order) in the coming weeks.</p>
<p>The most popular dbt Labs packages (<code>dbt_utils</code>, <code>audit_helper</code>, <code>dbt_external_tables</code>, <code>dbt_project_evaluator</code>) are already compatible with Fusion. Some external packages may not work out of the box, but we plan to work with package maintainers to get them ready &amp; working on Fusion.</p>
<p>So when is Fusion going to be GA? We're targeting later this year for full feature parity, but we're also hoping to approach it asymptotically&nbsp;—&nbsp;meaning that many existing dbt users can start adopting Fusion much sooner.</p>
<p>During the beta period, you may run into unanticipated (and anticipated) issues when trying to run your project on Fusion. Please share any issues in the <a href="https://github.com/dbt-labs/dbt-fusion" target="_blank" rel="noopener noreferrer">dbt-fusion</a> repository or on Slack in <a href="https://getdbt.slack.com/archives/C088YCAB6GH" target="_blank" rel="noopener noreferrer">#dbt-fusion-engine</a>, and we'll do our best to unblock you.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="can-i-use-fusion-for-my-dbt-project-today">Can I use Fusion for my dbt project today?<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#can-i-use-fusion-for-my-dbt-project-today" class="hash-link" aria-label="Direct link to Can I use Fusion for my dbt project today?" title="Direct link to Can I use Fusion for my dbt project today?">​</a></h2>
<p>Maybe! The biggest first question: "Is your adapter supported yet?" (If not, sit tight, we're working fast!) If so, then it depends on the exact matrix of features you currently use in your dbt project.</p>
<p>You may be able to start using Fusion immediately, you may need to make (mostly automatic) modifications to your project to resolve deprecations, or your project may not <em>yet</em> be parsable at all:</p>
<div class="filterableTableContainer_mhtg"><table><thead><tr><th>State</th><th>Description</th><th>Workaround</th><th>Resolvable by</th></tr></thead><tbody><tr><td>Unblocked</td><td>You can adopt the dbt Fusion engine with no changes to your project</td><td></td><td></td></tr><tr><td>Soft blocked</td><td>Your project parses successfully but relies on not-yet-implemented functionality</td><td>Don't invoke unsupported functions or build unsupported models</td><td>dbt Labs</td></tr><tr><td>Hard blocked by deprecations</td><td>Your project contains <a href="https://www.getdbt.com/blog/how-to-get-ready-for-the-new-dbt-engine" target="_blank" rel="noopener noreferrer">functionality deprecated in dbt Core v1.10</a></td><td>Resolve deprecations with the <a href="https://github.com/dbt-labs/dbt-autofix" target="_blank" rel="noopener noreferrer">dbt-autofix script</a> or workflow in dbt Studio</td><td>You</td></tr><tr><td>Hard blocked by known parse issues</td><td>Your project contains Python models or uses a not-yet-supported adapter</td><td>Temporarily remove Python models</td><td>dbt Labs</td></tr><tr><td>Hard blocked by unknown parse issues</td><td>Your project is probably doing something surprising with Jinja</td><td>Create an issue, consider modifying impacted code</td><td>You &amp; dbt Labs</td></tr></tbody></table></div>
<p>We're continuously removing blockers to Fusion adoption on a rolling basis during this beta period and in the leadup to a broader release. The rest of this post will go deeper into the four thematic criteria we set out above:</p>
<ul>
<li>Adapter coverage</li>
<li>Feature coverage</li>
<li>Source-available code publishing</li>
<li>Developer experience improvements</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="requirement-for-ga-adapter-coverage">Requirement for GA: Adapter Coverage<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#requirement-for-ga-adapter-coverage" class="hash-link" aria-label="Direct link to Requirement for GA: Adapter Coverage" title="Direct link to Requirement for GA: Adapter Coverage">​</a></h2>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="databricks-bigquery-and-redshift">Databricks, BigQuery and Redshift<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#databricks-bigquery-and-redshift" class="hash-link" aria-label="Direct link to Databricks, BigQuery and Redshift" title="Direct link to Databricks, BigQuery and Redshift">​</a></h3>
<p>dbt Fusion's adapters are now based on the <a href="https://arrow.apache.org/adbc/current/driver/status.html" target="_blank" rel="noopener noreferrer">ADBC standard</a>, a modern, high-performance Apache project optimised for columnar analytical databases.</p>
<p>dbt Labs has developed new ADBC-compatible drivers (and a <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components#dbt-fusion-engine-adapters">supporting framework, XDBC</a>) to complement the existing, stable Snowflake driver.</p>
<p><strong>Target release dates:</strong> We expect to add support for <a href="https://github.com/dbt-labs/dbt-fusion/issues/4" target="_blank" rel="noopener noreferrer">Databricks</a>, <a href="https://github.com/dbt-labs/dbt-fusion/issues/5" target="_blank" rel="noopener noreferrer">BigQuery</a>, and <a href="https://github.com/dbt-labs/dbt-fusion/issues/6" target="_blank" rel="noopener noreferrer">Redshift</a> (in that order) in the coming weeks.</p>
<p>Where possible, Fusion adapters will support the same authentication methods and connection/credential configurations as dbt Core adapters. We've also heard loud &amp; clear feedback from dbt platform customers who have beta-tested the Fusion CLI —&nbsp;we want to figure out a way for Fusion CLI to use connection setup (config/creds) from the platform for local runs (<a href="https://github.com/dbt-labs/dbt-fusion/issues/23" target="_blank" rel="noopener noreferrer">tracking issue</a>).</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="athena-postgres-spark-and-trino">Athena, Postgres, Spark and Trino<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#athena-postgres-spark-and-trino" class="hash-link" aria-label="Direct link to Athena, Postgres, Spark and Trino" title="Direct link to Athena, Postgres, Spark and Trino">​</a></h3>
<p>We're aiming to support these adapters later in the year, prior to GA. Check each adapter's tracking issue (<a href="https://github.com/dbt-labs/dbt-fusion/issues/39" target="_blank" rel="noopener noreferrer">Trino</a>, <a href="https://github.com/dbt-labs/dbt-fusion/issues/39" target="_blank" rel="noopener noreferrer">Athena</a>, <a href="https://github.com/dbt-labs/dbt-fusion/issues/38" target="_blank" rel="noopener noreferrer">Spark</a>, and <a href="https://github.com/dbt-labs/dbt-fusion/issues/31" target="_blank" rel="noopener noreferrer">Postgres</a>) for specific timelines.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="custom-adapters">Custom adapters<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#custom-adapters" class="hash-link" aria-label="Direct link to Custom adapters" title="Direct link to Custom adapters">​</a></h3>
<p>The short answer: Fusion's new adapter format could be extended to support community development of third-party adapters, but it's not on the near-term roadmap before GA (<a href="https://github.com/dbt-labs/dbt-fusion/issues/46" target="_blank" rel="noopener noreferrer">tracking issue</a>).</p>
<p>The longer answer: Fusion now downloads necessary drivers (part of the adapter stack) on-demand. This dynamic linking requires the drivers to be signed by dbt Labs, meaning that we need to have a system in place to review contributions of new drivers and ensure their security.</p>
<p>In the meantime, if you want to migrate a supported project to the dbt Fusion engine but have a dependency on another project using a custom adapter, you can use a <a href="https://docs.getdbt.com/docs/deploy/hybrid-setup">Hybrid project</a> to have <span>dbt Core</span> execute the unsupported part of the pipeline and then publish artifacts for downstream projects to consume.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="requirement-for-ga-feature-coverage">Requirement for GA: Feature coverage<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#requirement-for-ga-feature-coverage" class="hash-link" aria-label="Direct link to Requirement for GA: Feature coverage" title="Direct link to Requirement for GA: Feature coverage">​</a></h2>
<p>Feature coverage includes ensuring documented features work as expected, as well as (where possible) supporting undocumented "accidental" features.</p>
<p>Most of the time, even if your project uses an unimplemented feature, you can still take Fusion for a spin. This is because as long as your project parses, you can just skip unsupported models.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="known-unimplemented-features">Known unimplemented features<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#known-unimplemented-features" class="hash-link" aria-label="Direct link to Known unimplemented features" title="Direct link to Known unimplemented features">​</a></h3>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="python-models">Python models<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#python-models" class="hash-link" aria-label="Direct link to Python models" title="Direct link to Python models">​</a></h4>
<p>Python models are the one exception to that "just skip them" advice. The dbt Fusion engine does not currently support parsing Python models, which means it cannot extract the refs or configs inside those files. Instead of potentially building models out of DAG order, <strong>we've chosen to not support Python models at all for now</strong>. They're coming back though - <a href="https://github.com/dbt-labs/dbt-fusion/issues/3" target="_blank" rel="noopener noreferrer">check out the issue</a> for details.</p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="breadth-of-materialization-support">Breadth of Materialization Support<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#breadth-of-materialization-support" class="hash-link" aria-label="Direct link to Breadth of Materialization Support" title="Direct link to Breadth of Materialization Support">​</a></h4>
<p>As of today we support the most common materializations: <code>table</code>, <code>view</code>, <code>incremental</code>, <code>ephemeral</code> for models —&nbsp;plus the materializations underlying snapshots, seeds, and tests. Other native strategies (like <a href="https://github.com/dbt-labs/dbt-fusion/issues/12" target="_blank" rel="noopener noreferrer">microbatch incremental models</a>, <a href="https://github.com/dbt-labs/dbt-fusion/issues/28" target="_blank" rel="noopener noreferrer">iceberg tables</a>, <a href="https://github.com/dbt-labs/dbt-fusion/issues/27" target="_blank" rel="noopener noreferrer">materialized views/dynamic tables</a>, or <a href="https://github.com/dbt-labs/dbt-fusion/issues/15" target="_blank" rel="noopener noreferrer">stored test failures</a>) as well as <a href="https://github.com/dbt-labs/dbt-fusion/issues/17" target="_blank" rel="noopener noreferrer">custom materializations</a> are on the roadmap — check their respective issues to see when.</p>
<p>It's worth reiterating here: Even if you have models that rely on not-yet-supported materialization strategies, you can still try the dbt Fusion engine in the rest of your project. The rest of your DAG will build as normal, but unsupported strategies will raise an error if they are included in scope of <code>dbt build</code> or <code>dbt run</code>.</p>
<p>To exclude those nodes, use a command like</p>
<ul>
<li><code>dbt build --exclude config.materialized:my_custom_mat</code></li>
<li><code>dbt build --exclude config.incremental_strategy:microbatch</code></li>
</ul>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="other-common-features">Other common features<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#other-common-features" class="hash-link" aria-label="Direct link to Other common features" title="Direct link to Other common features">​</a></h4>
<p>Did you know that there are over 400 documented features of dbt? <a href="https://github.com/dbeatty10" target="_blank" rel="noopener noreferrer">Doug</a> does, because he had to put them all into a Notion database.</p>
<p>Fusion already supports two-thirds of them, and we have a plan for the rest. You can follow along at <a href="https://github.com/dbt-labs/dbt-fusion/issues" target="_blank" rel="noopener noreferrer">the <code>dbt-fusion</code> repo</a>, where there are issues to track the outstanding behaviours. There's also a rough set of milestones attached, but those are subject to reordering as more teams start using Fusion and giving feedback.</p>
<p>Some of the most relevant ones include:</p>
<ul>
<li><a href="https://github.com/dbt-labs/dbt-fusion/issues/13" target="_blank" rel="noopener noreferrer">Exposures</a></li>
<li>A new <a href="https://github.com/dbt-labs/dbt-fusion/issues/7" target="_blank" rel="noopener noreferrer">stable logging system</a></li>
<li>A new <a href="https://github.com/dbt-labs/dbt-fusion/issues/9" target="_blank" rel="noopener noreferrer">local documentation experience</a> that replaces dbt-docs (!)</li>
<li><a href="https://github.com/dbt-labs/dbt-fusion/issues/10" target="_blank" rel="noopener noreferrer">Programmatic invocations</a></li>
<li><a href="https://github.com/dbt-labs/dbt-fusion/issues/25" target="_blank" rel="noopener noreferrer">Model governance</a> (contracts, constraints, access, deprecation_date)</li>
<li>A grab bag of CLI commands like <a href="https://github.com/dbt-labs/dbt-fusion/issues/22" target="_blank" rel="noopener noreferrer"><code>dbt clone</code></a>, <a href="https://github.com/dbt-labs/dbt-fusion/issues/33" target="_blank" rel="noopener noreferrer"><code>state:modified.subselector</code></a>, <a href="https://github.com/dbt-labs/dbt-fusion/issues/34" target="_blank" rel="noopener noreferrer"><code>--empty</code></a>, ...</li>
</ul>
<p>It's worth noting that <em>resolution</em> doesn't necessarily mean identical behaviours. As a couple of examples:</p>
<ul>
<li>Many of these behaviours have not been implemented yet because the Fusion engine introduces new capabilities, above all SQL comprehension, that we will leverage to provide a superior experience. A direct port-over of the feature would miss the point.</li>
<li>Others (like the events and logging system) are tightly coupled to dbt Core's Python roots — they're worth a rethink, and not worth shooting for exact 100% conformance.</li>
</ul>
<p>Here's a point-in-time snapshot of how we expect to tackle the known remaining work. Please refer to the <a href="https://github.com/dbt-labs/dbt-fusion/issues" target="_blank" rel="noopener noreferrer">repository's issues page</a> as the source of truth:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#" data-featherlight="/img/blog/2025-05-28-dbt-fusion-engine-path-to-ga/indicative-timeline.png"><img data-toggle="lightbox" alt="An indication of the dbt Fusion engine's path to GA" title="An indication of the dbt Fusion engine's path to GA" src="https://docs.getdbt.com/img/blog/2025-05-28-dbt-fusion-engine-path-to-ga/indicative-timeline.png?v=2"></a></span><span class="title_aGrV">An indication of the dbt Fusion engine's path to GA</span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="surprise-unimplemented-features">Surprise unimplemented features<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#surprise-unimplemented-features" class="hash-link" aria-label="Direct link to Surprise unimplemented features" title="Direct link to Surprise unimplemented features">​</a></h3>
<p>Did you know that there are also over a bajillion <em>undocumented</em> features of dbt? Since March, we've been validating the new engine's parser against projects orchestrated by the dbt platform, which has flagged hundreds of divergent behaviours and common parse bugs.</p>
<p>But we also know there is a long tail of behaviours that will only arise in the wild, and that the easiest way to get to the bottom of them will be to work with users.</p>
<p>This work will be ongoing, alongside feature support. When you start using the Fusion engine, please <a href="https://github.com/dbt-labs/dbt-fusion/issues" target="_blank" rel="noopener noreferrer">open an issue</a> if you hit an unexpected error — and please include a basic project that reproduces the error, so we can fix it!</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="requirement-for-ga-the-source-available-dbt-fusion-codebase-is-better-than-dbt-core-for-most-use-cases">Requirement for GA: The Source-available <code>dbt-fusion</code> codebase is better than <code>dbt-core</code> for most use cases<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#requirement-for-ga-the-source-available-dbt-fusion-codebase-is-better-than-dbt-core-for-most-use-cases" class="hash-link" aria-label="Direct link to requirement-for-ga-the-source-available-dbt-fusion-codebase-is-better-than-dbt-core-for-most-use-cases" title="Direct link to requirement-for-ga-the-source-available-dbt-fusion-codebase-is-better-than-dbt-core-for-most-use-cases">​</a></h2>
<p>By GA, the <a href="https://github.com/dbt-labs/dbt-fusion" target="_blank" rel="noopener noreferrer"><code>dbt-fusion</code> repository</a> will have the necessary (and fully source-available) components to compile a functional engine for the vast majority of dbt Core projects —&nbsp;and a faster one at that. That means that you will always have the ability to compile, use, and modify this code itself, without requiring access to the dbt Labs provided binary (although we think you'll probably just want to use the binary, for reasons detailed in the <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components">Components of the dbt Fusion engine</a> post).</p>
<p>So far, we've released the code necessary to self-compile a dbt binary that can run <code>dbt deps</code> and <code>dbt parse</code>. Throughout the beta period we will continue to prepare more code for use by those who want to view, contribute to, or modify the code for their own purposes, including what's necessary for the rest of the commands to work.</p>
<p>Beyond just the code necessary to produce a complete dbt binary, we've also committed to open-sourcing several of the underlying library components (such as dbt-jinja, dbt-serde-yaml, and the grammars necessary to produce a high-performance SQL parser). Again, check out the <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components">Components of the dbt Fusion engine</a> post for the details.</p>
<p>Some behaviours that worked in dbt Core won't have an equivalent in this new codebase. The most obvious examples are those which depended on the vagaries of Python: arbitrary callbacks on the EventManager (there's no longer an EventManager on which to register a callback!), the experimental <a href="https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/plugins/manager.py" target="_blank" rel="noopener noreferrer">plugins system</a> (dynamic loading of binaries works completely differently in Rust and would require signing), or the dbt templater in SQLFluff (which hooked into dbt Core beyond the exposed interfaces - although we plan to build a <a href="https://github.com/dbt-labs/dbt-fusion/issues/11" target="_blank" rel="noopener noreferrer">fast linter ourselves</a>).</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="requirement-for-ga-the-dx-rocks">Requirement for GA: The DX rocks<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#requirement-for-ga-the-dx-rocks" class="hash-link" aria-label="Direct link to Requirement for GA: The DX rocks" title="Direct link to Requirement for GA: The DX rocks">​</a></h2>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="more-speed">More speed<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#more-speed" class="hash-link" aria-label="Direct link to More speed" title="Direct link to More speed">​</a></h3>
<p>Invocations powered by the dbt Fusion engine are already significantly faster than the same invocation in dbt Core, but there's more to do here! We know that there is still a lot of low-hanging fruit, and by GA we expect to see tasks like full project compilation complete at least twice as fast for many projects.</p>
<p>If you do some benchmarking, we're particularly interested in any situations where Fusion "pauses" on a single file for a couple of seconds. Some other things to keep in mind:</p>
<ul>
<li>Writing very large manifests is pretty slow, no matter what. Try including <code>--no-write-json</code>. We're wondering whether it makes sense to have a trimmed-down manifest by default. What do you think?</li>
<li>The <code>dbt compile</code> command involves more work in Fusion than in dbt Core, because it's doing full SQL validation. To compare <em>just</em> the SQL rendering step (the equivalent of dbt Core's <code>compile</code> command), you can try <a href="https://docs.getdbt.com/docs/fusion/new-concepts">turning off static analysis</a> with the CLI flag <code>--static-analysis off</code>.</li>
</ul>
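<p>As a rough sketch of how you might compare these locally (assuming a Fusion-powered <code>dbt</code> binary on your PATH and a project directory; <code>time</code> is the standard Unix utility):</p>
<pre><code># Full compilation, skipping the (potentially slow) manifest write
time dbt compile --no-write-json

# Just the SQL rendering step — closest to dbt Core's compile —
# by turning off Fusion's static analysis
time dbt compile --static-analysis off
</code></pre>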
<p>As a sign of what's possible, take note of the incremental recompilation used to provide real-time feedback in the VS Code extension.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="a-more-info-dense-console-output">A more info-dense console output<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#a-more-info-dense-console-output" class="hash-link" aria-label="Direct link to A more info-dense console output" title="Direct link to A more info-dense console output">​</a></h3>
<p>While we were preparing for the beta release, we kept the Fusion CLI output intentionally verbose — it displays <em>everything</em> that's happening, which means errors and warnings can be pushed out of view by other status updates. We're already in the process of <a href="https://github.com/dbt-labs/dbt-fusion/issues/52" target="_blank" rel="noopener noreferrer">clearing this up a bit</a>, and we've got some funny ideas about the possibility of progress bars. However we do it, the goal should be that you see the log lines about things that need attention, and not much more.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="your-idea-here">Your idea here<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#your-idea-here" class="hash-link" aria-label="Direct link to Your idea here" title="Direct link to Your idea here">​</a></h3>
<p>What feels <em>off</em> when you're using dbt Fusion? Tell us all about it — if you've got a clear idea for what's wrong and what it should be instead, feel free to jump straight to a GitHub issue. Bonus points if you've got a minimal repro project.</p>
<p>If you need to kick an idea around before opening an issue, we'll also be actively checking in on #dbt-fusion-engine (for high-level discussions) and #dbt-fusion-engine-migration (to get into the weeds of a specific bug) on Slack.</p>
<p>From now until Fusion is GA, we will be prioritizing parity with existing framework features, <em>not adding new ones.</em> Once we hit GA, we'll think about whether to transfer existing feature requests from the <code>dbt-core</code> repo to <code>dbt-fusion</code> — or maybe a third place? — stay tuned.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="following-along">Following along<a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga#following-along" class="hash-link" aria-label="Direct link to Following along" title="Direct link to Following along">​</a></h2>
<p>The path to GA for Fusion is a Community-wide effort. We want to hear from you, work with you, and get your ideas and feedback — whether that's a bug report, a feature idea, or higher-level thoughts on where Fusion should go.</p>
<ul>
<li>In Slack, we're on <a href="https://getdbt.slack.com/archives/C088YCAB6GH" target="_blank" rel="noopener noreferrer">#dbt-fusion-engine</a> and #dbt-fusion-engine-migration</li>
<li>The GitHub repo is <a href="https://github.com/dbt-labs/dbt-fusion" target="_blank" rel="noopener noreferrer">https://github.com/dbt-labs/dbt-fusion</a></li>
<li>There are a couple of dozen <em>dbt World Circuit</em> meetups happening globally during June: <a href="https://www.meetup.com/pro/dbt/" target="_blank" rel="noopener noreferrer">https://www.meetup.com/pro/dbt/</a>. (Jeremy will be speaking in Paris, Marseille, and Boston —&nbsp;come hang out!)</li>
<li>We'll be having regular office hours with a revolving cast of characters from the Developer Experience, Engineering, and Product teams. Dates will be circulated in the #dbt-fusion-engine channel.</li>
</ul>]]></content>
        <author>
            <name>Jeremy Cohen</name>
        </author>
        <author>
            <name>Joel Labes</name>
        </author>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Meet the dbt Fusion Engine: the new Rust-based, industrial-grade engine for dbt]]></title>
        <id>https://docs.getdbt.com/blog/dbt-fusion-engine</id>
        <link href="https://docs.getdbt.com/blog/dbt-fusion-engine"/>
        <updated>2025-05-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[The dbt Fusion engine delivers a next-gen developer experience by combining high-speed execution with deep understanding of your code.]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="tldr-what-you-need-to-know">TL;DR: What You Need to Know<a href="https://docs.getdbt.com/blog/dbt-fusion-engine#tldr-what-you-need-to-know" class="hash-link" aria-label="Direct link to TL;DR: What You Need to Know" title="Direct link to TL;DR: What You Need to Know">​</a></h2>
<ul>
<li>dbt’s familiar authoring layer remains unchanged, but the execution engine beneath it is completely new.</li>
<li>The new engine is called the dbt Fusion engine — rewritten from the ground up in Rust based on technology <a href="https://www.getdbt.com/blog/dbt-labs-acquires-sdf-labs" target="_blank" rel="noopener noreferrer">from SDF</a>.  The dbt Fusion engine is substantially faster than dbt Core and has built in <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension">SQL comprehension technology</a> to power the next generation of analytics engineering workflows.</li>
<li>The dbt Fusion engine is currently in beta. You can try it today if you use Snowflake — with additional adapters coming starting in early June. Review our <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga">path to general availability</a> (GA) and <a href="https://docs.getdbt.com/guides/fusion">try the quickstart</a>.</li>
<li><strong>You do not need to be a dbt Labs customer to use Fusion - dbt Core users can adopt the dbt Fusion engine today for free in your local environment.</strong></li>
<li>You can use Fusion with the <a href="https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt" target="_blank" rel="noopener noreferrer">new dbt VS Code extension</a>, <a href="https://docs.getdbt.com/docs/local/install-dbt?version=2#get-started">directly via the CLI</a>, or <a href="https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud#dbt-fusion-engine">via dbt Studio</a>.</li>
<li>This is the beginning of a new era for analytics engineering. For a glimpse into what the Fusion engine is going to enable over the next 1 to 2 years, <a href="https://getdbt.com/blog/where-we-re-headed-with-the-dbt-fusion-engine" target="_blank" rel="noopener noreferrer">read this post</a>.</li>
</ul>
<p>Since its introduction in 2016, dbt has paved the way for the analytics engineering revolution. Teams worldwide have moved from ad hoc processes running customized SQL scripts into a mature analytics workflow based on the <a href="https://docs.getdbt.com/community/resources/viewpoint" target="_blank" rel="noopener noreferrer">dbt viewpoint</a>. dbt enables data practitioners to <em>work like software engineers</em>, building their analytics code as an asset to ship trusted data products faster.</p>
<p>dbt came to represent many things:</p>
<ul>
<li>A <strong>viewpoint</strong> on how analytics should be done</li>
<li>A <strong>workflow</strong> where data practitioners could put that viewpoint into action</li>
<li>A <strong>framework</strong> — dbt Core — that powered this workflow comprised of:<!-- -->
<ul>
<li>An authoring layer: The schema, spec, and definitions for a dbt project written in SQL, YML, and Jinja</li>
<li>An engine: The tooling via which the authoring layer was built and executed against a data platform, resolving templated code into executable SQL, building your dependency graph, and more.</li>
</ul>
</li>
</ul>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        "><span><a href="https://docs.getdbt.com/blog/dbt-fusion-engine#" data-featherlight="/img/blog/2025-05-28-dbt-fusion-engine/engine-and-authoring-layer.png"><img data-toggle="lightbox" alt="dbt is made up of two different things: authoring layer and engine." title="dbt is made up of two different things: authoring layer and engine." src="https://docs.getdbt.com/img/blog/2025-05-28-dbt-fusion-engine/engine-and-authoring-layer.png?v=2"></a></span><span class="title_aGrV">dbt is made up of two different things: authoring layer and engine.</span></div>
<p>While the authoring layer has continued to evolve nicely, giving dbt developers ever-more functionality to work with, the engine itself, dbt Core, is still built on the same technology and the same design principles it started with in 2016. This causes two fundamental problems that cannot be solved iteratively:</p>
<ol>
<li>dbt Core can be <em>slow</em>. It’s built in Python, and for larger dbt projects it can become unworkable. Even for smaller projects, powering a great developer experience requires a step change in performance.</li>
<li>The dbt engine renders SQL, but it doesn’t <em>comprehend SQL.</em> That means that any functionality relying on specifics of SQL code was impossible to build into dbt.</li>
</ol>
<p>And so it became clear that for us to power the analytics workloads of tomorrow, we weren't going to get there with incremental improvements —&nbsp;we needed to <strong>rebuild the dbt engine from scratch</strong>. We needed:</p>
<ul>
<li>An engine built for speed.</li>
<li>An engine that <em>knows about your code.</em></li>
<li>An engine that powers the next generation of developer experience.</li>
</ul>
<p>And that engine is Fusion.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-exactly-is-fusion">What exactly is Fusion?<a href="https://docs.getdbt.com/blog/dbt-fusion-engine#what-exactly-is-fusion" class="hash-link" aria-label="Direct link to What exactly is Fusion?" title="Direct link to What exactly is Fusion?">​</a></h2>
<p>Fusion is the new engine for dbt.</p>
<p>If the authoring layer is "what" your dbt project is supposed to do, then the engine is the "how." That includes:</p>
<ul>
<li>Rendering Jinja</li>
<li>Building dependency graphs</li>
<li>Creating artifact files</li>
<li>Communicating with databases</li>
</ul>
<p>At first glance, Fusion looks a lot like dbt Core. Your projects are built using the familiar dbt authoring layer. You still write SQL and Jinja. You still type <code>dbt run</code>. (To make it easier to try Fusion, we're also shipping an optional <code>dbtf</code> alias, since many users already have the <code>dbt</code> namespace in use).</p>
<p>But underneath that is a layer of technical depth and rigor that is entirely new to dbt, happening at the engine layer.</p>
<p>Fusion:</p>
<ul>
<li>Is fully rewritten in Rust, enabling a <a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust">dramatically faster dbt experience</a>. Fusion does not depend on Python at all. In fact, besides the adapter macros, not a single line of code is shared between dbt Core and the dbt Fusion engine. (For long-time dbt spelunkers, we've described the new structure in a <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-components">separate post</a>.)</li>
<li><a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension">Understands your SQL code.</a> It’s a true SQL <em>compiler</em> and gives dbt a full view on what the code in your dbt project means and how it will propagate across your entire data lineage.</li>
</ul>
<p>Based on the technology from <a href="https://www.getdbt.com/blog/dbt-labs-acquires-sdf-labs" target="_blank" rel="noopener noreferrer">SDF</a>, Fusion represents a step change increase in the technical capabilities of dbt.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        "><span><a href="https://docs.getdbt.com/blog/dbt-fusion-engine#" data-featherlight="/img/blog/2025-05-28-dbt-fusion-engine/familiar-authoring-powerful-new-engine.png"><img data-toggle="lightbox" alt="Familiar Authoring Layer, Powerful New Engine." title="Familiar Authoring Layer, Powerful New Engine." src="https://docs.getdbt.com/img/blog/2025-05-28-dbt-fusion-engine/familiar-authoring-powerful-new-engine.png?v=2"></a></span><span class="title_aGrV">Familiar Authoring Layer, Powerful New Engine.</span></div>
<p>As a result of these capabilities, Fusion can deliver new experiences. Some of these we’re releasing today, like real-time error detection in VS Code and significant cost savings in project execution.  dbt now knows about your code!</p>
<p><strong>You probably know enough now to head on over to the quickstart and get going</strong>, but if you want to know a little more about what Fusion delivers today, keep reading.</p>
<hr>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="near-term-benefits-of-adopting-fusion">Near-term benefits of adopting Fusion<a href="https://docs.getdbt.com/blog/dbt-fusion-engine#near-term-benefits-of-adopting-fusion" class="hash-link" aria-label="Direct link to Near-term benefits of adopting Fusion" title="Direct link to Near-term benefits of adopting Fusion">​</a></h2>
<p>You can think of Fusion as the same dbt you know and love, but better and faster, and you're going to see it show up in a lot of places!</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        "><span><a href="https://docs.getdbt.com/blog/dbt-fusion-engine#" data-featherlight="/img/blog/2025-05-28-dbt-fusion-engine/next-gen-star.png"><img data-toggle="lightbox" alt="Functionality powered by the dbt Fusion Engine and its components" title="Functionality powered by the dbt Fusion Engine and its components" src="https://docs.getdbt.com/img/blog/2025-05-28-dbt-fusion-engine/next-gen-star.png?v=2"></a></span><span class="title_aGrV">Functionality powered by the dbt Fusion Engine and its components</span></div>
<p>So how and why should you adopt Fusion for your dbt project?</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="just-the-new-fusion-powered-dbt-cli">Just the new Fusion-powered dbt CLI<a href="https://docs.getdbt.com/blog/dbt-fusion-engine#just-the-new-fusion-powered-dbt-cli" class="hash-link" aria-label="Direct link to Just the new Fusion-powered dbt CLI" title="Direct link to Just the new Fusion-powered dbt CLI">​</a></h3>
<ul>
<li><strong>Significant performance improvements:</strong> Up to 30x faster parsing and 2x quicker full-project compilation, with near-instant recompilation of single files in the VS Code Extension. We expect continued performance gains as part of the path to GA.</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-new-fusion-powered-dbt-fusion-cli--vs-code-extension">The new Fusion-powered dbt Fusion CLI + VS Code extension<a href="https://docs.getdbt.com/blog/dbt-fusion-engine#the-new-fusion-powered-dbt-fusion-cli--vs-code-extension" class="hash-link" aria-label="Direct link to The new Fusion-powered dbt Fusion CLI + VS Code extension" title="Direct link to The new Fusion-powered dbt Fusion CLI + VS Code extension">​</a></h3>
<p>But the real benefit of Fusion is not just going to be in the CLI itself — it’s in the ability to build net new product experiences that leverage Fusion’s capabilities. The first of these, unveiled today, is the VS Code extension, powered by <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension">dbt Fusion’s SQL Comprehension</a>. This extension could <em>only</em> be built on Fusion:</p>
<ul>
<li>It’s fast — the VS Code extension recompiles your entire dbt project in the background every time you save <em>any</em> file, as well as identifying errors instantly for the active file. For that to be workable, it needs to happen fast.</li>
<li>It understands SQL and functions as a compiler — it knows what columns exist in your project, which functions you are using, and the type signatures and outputs of those functions.</li>
</ul>
<p>There’s a whole host of features in the VS Code extension. Some early favorites:</p>
<ul>
<li>
<p><strong>Write code with confidence — live error detection and function autocomplete.</strong></p>
<ul>
<li>
<p>How many times have you hit <code>dbt run</code> only to realize that you typed <code>select * frmo</code>, misspelled a column name, or tried to sum the unsummable? No more! With the LSP-powered VS Code extension, you can immediately see when pesky errors sneak into your code.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        "><span><a href="https://docs.getdbt.com/blog/dbt-fusion-engine#" data-featherlight="/img/blog/2025-05-28-dbt-fusion-engine/you-wouldnt-sum-a-datetime.png"><img data-toggle="lightbox" alt="You wouldn't sum a datetime." title="You wouldn't sum a datetime." src="https://docs.getdbt.com/img/blog/2025-05-28-dbt-fusion-engine/you-wouldnt-sum-a-datetime.png?v=2"></a></span><span class="title_aGrV">You wouldn't sum a datetime.</span></div>
</li>
<li>
<p>Similarly — is it <code>dateadd</code> or <code>date_add</code>? And which way around do the arguments go again? Just start typing and you'll see contextual prompts and autocomplete.</p>
</li>
</ul>
</li>
<li>
<p><strong>See how the code you’ve written iteratively progresses to your transformed data:</strong> <em>previewing CTEs and viewing compiled code</em></p>
<ul>
<li>Because the VS Code extension compiles your code every time you save, you can view the compiled code from your project in real time as you’re making edits. This is a real lifesaver when working on complex macros.</li>
<li>Writing your code with CTEs allows you to modularly split up the logic in your model. The days of swapping out the <code>final</code> CTE at the end for the name of the CTE you're debugging are over; now you can just click.</li>
</ul>
</li>
<li>
<p><strong>Traverse your project:</strong> Go-to-reference and built-in lineage</p>
<ul>
<li>Need to find out how an upstream model was defined? Or where all the inputs from the model you’re working on came from? With both the ability to jump to the model and column references <em>and</em> view model and column level lineage, it’s honestly a night and day difference.</li>
</ul>
</li>
</ul>
<video width="100%" height="100%" muted="" controls=""><source src="/img/docs/extension/go-to-definition.webm" type="video/webm"></video>
<p>I could go on and on and on — there’s so much here.</p>
<p>Taken separately, these range from quality-of-life improvements to significant changes.</p>
<p>But taken together, it actually fundamentally changes the experience of writing your dbt code. There were just <em>so many things</em> that you had to constantly be juggling in the back of your head that are now offloaded to the extension. The sum change to the experience of writing dbt code... is exceptional. I already can’t imagine working without this.</p>
<p>Of course — there’s another technology changing the experience of writing dbt (and all) code — AI. The functionality that Fusion enables dovetails perfectly with AI-assisted coding by allowing you to vet, validate, and comprehend AI-generated code more easily. Moving forward, expect even tighter coupling between Fusion and AI-based coding assistants as the speed and rigor of Fusion will help produce higher quality AI-generated code.</p>
<p>The VS Code extension is one of our first product experiences exclusively powered by the dbt Fusion engine. The extension depends on the Language Server, and the Language Server depends on Fusion's SQL comprehension capabilities. We made the decision not to support dbt Core for the VS Code Extension because existing community-built extensions have already built as much as is possible on top of dbt Core's foundation.  To get to this next level of experience, we needed Fusion.</p>
<hr>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-to-get-started-with-fusion">How to get started with Fusion<a href="https://docs.getdbt.com/blog/dbt-fusion-engine#how-to-get-started-with-fusion" class="hash-link" aria-label="Direct link to How to get started with Fusion" title="Direct link to How to get started with Fusion">​</a></h3>
<p>The dbt Fusion engine is currently in beta. We've written <a href="https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga">a separate post</a> describing the path to Fusion's final release, and how you can see if your project is compatible today.</p>
<p>Whether or not you can move your existing project to Fusion today, you can jump into the VS Code extension <a href="https://docs.getdbt.com/guides/fusion">using our quickstart</a> to get a feeling for what's ahead.</p>
<ul>
<li><strong>dbt customers:</strong> Over the coming weeks, in projects eligible to start using Fusion, you’ll see a toggle in your account or receive a message from your account team. From there, <a href="https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud#dbt-fusion-engine">you can activate Fusion for your environments</a>.</li>
<li><strong>To use the VS Code extension:</strong> <a href="https://docs.getdbt.com/docs/install-dbt-extension">Install the "dbt" extension</a> directly from the marketplace for automated setup and head to the quickstart. This will also automatically install the Fusion-powered CLI for you.</li>
<li><strong>To use the dbt CLI powered by Fusion:</strong> Simply <a href="https://docs.getdbt.com/docs/local/install-dbt?version=2#get-started">install Fusion</a>.</li>
</ul>
<p><em>If you are looking to migrate an existing project to Fusion, see the <a href="https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion">migration guide</a> —&nbsp;as well as the <a href="https://github.com/dbt-labs/dbt-autofix" target="_blank" rel="noopener noreferrer"><code>dbt-autofix</code></a> helper, which automatically addresses many of the changes needed to migrate to Fusion.</em></p>
<hr>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="whats-next">What's Next?<a href="https://docs.getdbt.com/blog/dbt-fusion-engine#whats-next" class="hash-link" aria-label="Direct link to What's Next?" title="Direct link to What's Next?">​</a></h2>
<p>Today’s launch is just the start. There is much left to do in both the short term and the long term.</p>
<p>Moving forward, we’re building many net-new products and evolutions of our current products that simply wouldn’t have been possible in a pre-Fusion world. This will be particularly impactful for powering AI workflows, both assisting in the creation of high-quality dbt projects and serving as the trusted interface to structured data for AI agents.</p>
<p>We’re excited to work with the Community on the evolution of Fusion. If you’ve heard talk about the early days of the dbt Community and wished you could have been around for it, you now have the opportunity to make the deep, foundational impact that is often only possible at the start of a new technical innovation cycle.</p>
<p>So get involved!</p>
<ul>
<li>Try out <a href="https://docs.getdbt.com/guides/fusion">the Fusion quickstart</a></li>
<li><a href="https://github.com/dbt-labs/dbt-fusion/issues" target="_blank" rel="noopener noreferrer">Open up a GitHub issue in <code>dbt-fusion</code></a> to report a bug or participate in the path to GA</li>
<li>Join us <a href="https://www.getdbt.com/community/join-the-community" target="_blank" rel="noopener noreferrer">on Slack</a> in #dbt-fusion-engine and share your thoughts or questions</li>
<li>Head to an <a href="https://www.meetup.com/pro/dbt/" target="_blank" rel="noopener noreferrer">in-person dbt Meetup</a> — we’re hosting the dbt World Circuit 🏎️&nbsp;around the world, where you can come talk to one of us about Fusion!</li>
</ul>]]></content>
        <author>
            <name>Jason Ganz</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI Evaluation in dbt]]></title>
        <id>https://docs.getdbt.com/blog/ai-eval-in-dbt</id>
        <link href="https://docs.getdbt.com/blog/ai-eval-in-dbt"/>
        <updated>2025-05-04T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How to extend dbt quality testing to monitor AI Agentic Quality]]></summary>
        <content type="html"><![CDATA[<p><strong>The AI revolution is here—but are we ready?</strong><br>
<!-- -->Across the world, the excitement around AI is undeniable.  Discussions on large language models, agentic workflows, and how AI is set to transform every industry abound, yet real-world use cases of AI in production remain few and far between.</p>
<p>A common issue blocking teams from moving AI use cases to production is the inability to evaluate the validity of AI responses in a systematic, well-governed way.
Moving AI workflows from prototype to production requires rigorous evaluation, and most organizations do not have a framework to ensure AI outputs remain high-quality, trustworthy, and actionable.</p>
<p><strong>Why AI Evaluation Matters</strong><br>
<!-- -->The more conversations we have with data teams, the clearer the problem becomes: companies don’t want to move AI into production unless they can monitor and ensure its quality once it’s there. The fear of a ‘rogue AI’ still outweighs the perceived benefits.</p>
<p>The core challenge isn’t just building AI use cases; it’s about continuously monitoring their performance and ensuring the same level of quality and reliability we’ve come to expect from other data assets.
To trust AI in production, we need structured workflows that:</p>
<ul>
<li><strong>Ensure data quality</strong> before it’s fed into AI models</li>
<li><strong>Evaluate AI-generated responses</strong> against responses known to be true</li>
<li><strong>Trigger alerts or corrective actions</strong> when AI performance drifts below acceptable thresholds</li>
</ul>
<p>Without these capabilities, AI workflows remain stuck in experimental phases, unable to meet the reliability requirements of production use cases.</p>
<p><strong>Using dbt to Build AI Evaluation Workflows</strong><br>
<!-- -->Most organizations already use dbt to transform, test, and validate their data.
Since dbt is already a trusted framework for data quality, it is natural to extend its testing capabilities to evaluate and monitor AI workflows as well.</p>
<p>Let’s walk through a simple example using <strong>dbt and Snowflake Cortex</strong> for AI evaluation.</p>
<ul>
<li><strong>Ingest Data:</strong> We start by uploading a dataset of IMDB movie reviews, along with human-labeled sentiment scores (positive or negative). This serves as our source of truth.</li>
<li><strong>Run AI Workflow:</strong> As a simple example workflow, we use Snowflake Cortex’s sentiment analysis function to classify each review.</li>
<li><strong>Evaluate AI Output versus Human Review:</strong> We create an evaluation model in dbt that uses the Cortex Complete function to compare the AI-generated sentiment to the actual human-labeled sentiment.</li>
<li><strong>Define Pass/Fail Criteria:</strong> We configure a custom dbt test to set an accuracy threshold (e.g., 75% accuracy). If AI sentiment predictions fall below this level, the test triggers a warning or error.</li>
<li><strong>Store and Visualize Results:</strong> Native dbt functionality can store test failures in the warehouse, providing traceability for further investigation and data for reporting on AI accuracy.</li>
</ul>
<p><strong>Scaling AI Evaluation with dbt</strong><br>
<!-- -->This workflow naturally extends dbt’s native testing capabilities and leverages the ability to embed Snowflake Cortex calls directly in SQL models.
In this way, users can combine the power of Snowflake Cortex with dbt’s established governance and quality framework to address the issues described above.</p>
<p>By using dbt to evaluate AI, organizations can apply the same rigorous testing principles they already use for data pipelines to ensure their AI models are production-ready and maintain quality and governance of all data assets centrally.</p>
<p><strong>What We Built</strong><br>
<!-- -->Let's walk through this example step by step to give you a sense of how it all works.
For this example, we start with a test dataset containing the input to our AI workflow, as well as a ground-truth measurement given by a human reviewer. In this example, our input is the text review of different movies, and <code>actual_sentiment</code> contains -1 for negative reviews and 1 for positive reviews.
Finally, we include a timestamp indicating when our AI provided the response, which will allow us to track our AI accuracy over time.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/ai-eval-in-dbt#" data-featherlight="/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_one.png"><img data-toggle="lightbox" alt="our input data set, including actual sentiment" title="our input data set, including actual sentiment" src="https://docs.getdbt.com/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_one.png?v=2"></a></span><span class="title_aGrV">our input data set, including actual sentiment</span></div>
<p>The next step is to create another output table containing both the true measurement from our dataset and the value returned by our AI.
Since we can embed the Snowflake Cortex call directly in a SQL model, we can easily build this in dbt with a simple <code>ref</code> function.</p>
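<p>As a rough sketch of how such a model can look (the model and column names here are illustrative assumptions, not the exact project code):</p>

```sql
-- models/ai_sentiment.sql -- illustrative sketch, not the exact project code
select
    review_id,
    review_text,                                              -- the input fed to the AI workflow
    actual_sentiment,                                         -- human label: -1 (negative) or 1 (positive)
    snowflake.cortex.sentiment(review_text) as ai_sentiment,  -- Cortex score in [-1, 1]
    current_timestamp() as responded_at                       -- lets us track accuracy over time
from {{ ref('imdb_reviews') }}
```

Because the Cortex call is just SQL, this model runs, tests, and documents like any other dbt model.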
<div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/ai-eval-in-dbt#" data-featherlight="/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_two.png"><img data-toggle="lightbox" alt="results of our agentic workflow" title="results of our agentic workflow" src="https://docs.getdbt.com/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_two.png?v=2"></a></span><span class="title_aGrV">results of our agentic workflow</span></div>
<p>We also include the input to our AI workflow, along with both the AI-calculated and human-determined measurements for the dataset.
Including all these data points, while not strictly necessary, makes it clear what was fed into the AI workflow and provides easy traceability for specific responses.
We follow this same pattern again, using a dbt <code>ref</code> function to create one last dbt model in which we build the evaluation prompt, pass it to Cortex via the Cortex Complete function, and store the results.
The lion's share of the work in building this model was the prompt engineering for the evaluation prompt. We initially built the prompt directly in Snowflake Cortex to ensure it returned the type of response needed before moving the prompt into dbt.</p>
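<p>A minimal sketch of that evaluation model might look like the following. The prompt text, Cortex model name, and all object names below are illustrative assumptions, and the real evaluation prompt required considerably more engineering:</p>

```sql
-- models/ai_sentiment_eval.sql -- illustrative sketch
{% set eval_prompt = "Reply PASS if the predicted sentiment matches the actual sentiment, otherwise reply FAIL." %}

select
    review_id,
    actual_sentiment,
    ai_sentiment,
    '{{ eval_prompt }}' as evaluation_prompt,  -- materialized for full traceability
    snowflake.cortex.complete(
        'llama3.1-8b',                         -- swap models by changing this parameter
        '{{ eval_prompt }}'
            || ' Actual: ' || actual_sentiment
            || ' Predicted: ' || ai_sentiment
    ) as evaluation_result
from {{ ref('ai_sentiment') }}
```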
<div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/ai-eval-in-dbt#" data-featherlight="/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_three.png"><img data-toggle="lightbox" alt="AI generated results automatically evaluated by one or more models" title="AI generated results automatically evaluated by one or more models" src="https://docs.getdbt.com/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_three.png?v=2"></a></span><span class="title_aGrV">AI generated results automatically evaluated by one or more models</span></div>
<p>We chose to define our prompt as a Jinja variable rather than listing it directly in each dbt model.
This improves model readability, but it hides the prompt text from anyone reading the model.
To address this issue and provide full traceability, we materialize the prompt as a column in this table, so each output row contains not only the evaluation score but also the exact prompt used to produce it.
Regardless of where you define your evaluation prompt, including it in your dbt project means it benefits from the same change management and version control processes as the rest of your project, ensuring strong governance of your AI workflows.
Another benefit of this approach, and of the flexibility provided by dbt and Snowflake Cortex, is that you can easily swap the model used to run the evaluation. In this example we use a Llama model hosted in Snowflake Cortex, but using any other <a href="https://docs.snowflake.com/en/sql-reference/functions/complete-snowflake-cortex" target="_blank" rel="noopener noreferrer">supported model</a> is as easy as changing a function parameter.
You can even run multiple evaluations using different models by simply adding additional columns to your dbt model.</p>
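<p>For example, evaluating with two Cortex models side by side is just two columns. The model names and the upstream model here are hypothetical:</p>

```sql
-- Illustrative sketch: run the same evaluation prompt through two different models
select
    review_id,
    snowflake.cortex.complete('llama3.1-8b', evaluation_prompt)    as eval_llama,
    snowflake.cortex.complete('mistral-large2', evaluation_prompt) as eval_mistral
from {{ ref('ai_eval_prompts') }}  -- hypothetical model holding one fully built prompt per row
```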
<div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/ai-eval-in-dbt#" data-featherlight="/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_four.png"><img data-toggle="lightbox" alt="dbt Testing evaluates AI accuracy alongside data quality" title="dbt Testing evaluates AI accuracy alongside data quality" src="https://docs.getdbt.com/img/blog/2025-04-04-ai-evaluation-and-how-dbt-can-help/ai_eval_blog_image_four.png?v=2"></a></span><span class="title_aGrV">dbt Testing evaluates AI accuracy alongside data quality</span></div>
<p>The final step here is writing a dbt <a href="https://docs.getdbt.com/best-practices/writing-custom-generic-tests">custom test</a> to find any responses failing to meet our accuracy threshold. By creating this dbt test we can ensure issues with AI accuracy are caught and flagged as part of our standard dbt runs and quality checks.
We can also easily leverage dbt’s ability to <a href="https://docs.getdbt.com/reference/resource-configs/store_failures">store test failures</a> to record quality issues found in AI processes for further investigation and triage.</p>
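<p>A minimal sketch of such a custom generic test, assuming the evaluation model exposes a boolean column that flags whether each evaluation passed (the test name and default threshold here are illustrative):</p>

```sql
-- tests/generic/test_accuracy_above.sql -- illustrative sketch
{% test accuracy_above(model, column_name, threshold=0.75) %}

-- A generic test fails when it returns rows: here, one row whenever
-- the share of passing evaluations drops below the threshold.
select accuracy
from (
    select avg(case when {{ column_name }} then 1 else 0 end) as accuracy
    from {{ model }}
)
where accuracy < {{ threshold }}

{% endtest %}
```

Applied to a model in schema YAML, this runs alongside every other dbt test in the project.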
<p>A final benefit of capturing AI evaluations as part of your dbt project is just that: your AI quality information becomes part of your dbt project, meaning quality results are available in all the same ways as any other dbt test result.
You can view this information in <span>Catalog</span>, feed it into your data catalog of choice, use the test results to trigger additional downstream processes, or visualize it in quality dashboards through BI.
As AI workflows become more commonplace, businesses need a systematic way to evaluate and monitor AI outputs, just as they do with traditional data products. Fortunately, the same principles and tools within dbt can easily be applied to AI evaluation as well.
With dbt, data teams can bridge the gap between AI experimentation and AI in production by bringing trust, reliability, and governance to AI workflows.</p>
<p>Ready to bring AI evaluation into your dbt workflow? Get started with the dbt MCP server, which makes it easy to connect your AI systems to trusted, governed data.</p>
        <author>
            <name>Kyle Dempsey</name>
        </author>
        <author>
            <name>Luis Leon</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Scaling Data Pipelines for a Growth-Stage Fintech with Incremental Models]]></title>
        <id>https://docs.getdbt.com/blog/scaling-data-pipelines-fintech</id>
        <link href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech"/>
        <updated>2025-05-01T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How Kuda leveraged dbt incremental models to reduce costs, speed up pipelines, and scale confidently.]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="introduction">Introduction<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction">​</a></h2>
<p>Building scalable data pipelines in a fast-growing fintech can feel like fixing a bike while riding it. You must keep insights flowing even as data volumes explode. At Kuda (a Nigerian neo-bank), we faced this problem as our user base surged. Traditional batch ETL (rebuilding entire tables each run) started to buckle; pipelines took hours, and costs ballooned. We needed to keep data fresh without reprocessing everything. Our solution was to leverage dbt’s <a href="https://docs.getdbt.com/docs/build/incremental-models">incremental models</a>, which process only new or changed records. This dramatically cut run times and curbed our BigQuery costs, letting us scale efficiently.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="challenges-in-scaling">Challenges in Scaling<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#challenges-in-scaling" class="hash-link" aria-label="Direct link to Challenges in Scaling" title="Direct link to Challenges in Scaling">​</a></h2>
<p>Rapid growth brought some serious scaling challenges, and the most important were:</p>
<ul>
<li>
<p><strong>Performance</strong>: Our nightly full-refresh models that once took minutes began taking hours as data grew. For example, our core transactions table became too slow to rebuild from scratch for each update. Analytics dashboards lagged, and stakeholders lost timely insights. In real-time fintech, such latency is unacceptable.</p>
</li>
<li>
<p><strong>Cost</strong>: More data and longer processing also drove up our BigQuery bills. Scanning a 2TB table every hour to grab a few MB of new data was wasteful. Under BigQuery’s on-demand pricing model, this could rack up thousands of dollars per month. We needed to increase throughput without scaling cost linearly, which meant rethinking our processing to avoid full table scans.</p>
</li>
<li>
<p><strong>Data integrity</strong>: As pipelines and dependencies multiplied, so did the risk of inconsistencies or failures. Any new approach had to maintain accuracy and consistency even as we sped things up.</p>
</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="approach-incremental-models--key-strategies">Approach: Incremental Models &amp; Key Strategies<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#approach-incremental-models--key-strategies" class="hash-link" aria-label="Direct link to Approach: Incremental Models &amp; Key Strategies" title="Direct link to Approach: Incremental Models &amp; Key Strategies">​</a></h2>
<p>We tackled these issues by embracing dbt’s incremental models, which process only new or updated records since the last run. Instead of monolithic daily rebuilds, our models continuously ingested changes in small bites. Below, we outline our key <a href="https://docs.getdbt.com/docs/build/incremental-strategy">incremental strategies</a> —<code>append</code>, <code>insert_overwrite</code>, and <code>merge</code> — and how we tuned performance and cost.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="append-strategy">Append Strategy<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#append-strategy" class="hash-link" aria-label="Direct link to Append Strategy" title="Direct link to Append Strategy">​</a></h3>
<p>This is the simplest incremental approach: Each run adds new rows to the existing table and never touches old rows. It's ideal for append-only data (e.g. logs or transactions that never change after insertion).</p>
<p>In dbt, using append is straightforward. We configure the model as incremental and specify <code>incremental_strategy='append'</code> (supported in some adapters like Snowflake).</p>
<blockquote>
<p><strong>Note:</strong> <code>append</code> is not currently supported in BigQuery. Always confirm <a href="https://docs.getdbt.com/docs/build/incremental-strategy#supported-incremental-strategies-by-adapter">adapter support</a> before choosing an incremental strategy.</p>
</blockquote>
<p>In the SQL, we filter the source to only new records since the last load. For example, to incrementally load new transactions:</p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="code-append-incremental-strategy">Code: Append Incremental Strategy<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#code-append-incremental-strategy" class="hash-link" aria-label="Direct link to Code: Append Incremental Strategy" title="Direct link to Code: Append Incremental Strategy">​</a></h4>
<div class="language-jinja codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-jinja codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">{{ config(</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    materialized = 'incremental',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    incremental_strategy = 'append'</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">) }}</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-sql codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">SELECT</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    transaction_id</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    customer_id</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    transaction_date</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    amount</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    </span><span class="token keyword" style="color:rgb(127, 219, 202)">status</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">FROM</span><span class="token plain"> {{ source</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">'core'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span 
class="token string" style="color:rgb(173, 219, 103)">'transactions'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">{</span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(127, 219, 202)">if</span><span class="token plain"> is_incremental</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain">}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">WHERE</span><span class="token plain"> transaction_date </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">   </span><span class="token keyword" style="color:rgb(127, 219, 202)">SELECT</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">MAX</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">transaction_date</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(127, 219, 202)">FROM</span><span class="token plain"> {{ this }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">{</span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain"> endif </span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain">}</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This query appends only transactions that have a transaction date later than the maximum transaction date in the target table.</p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="append-incremental-model--before-incremental-run">Append Incremental Model – Before Incremental Run<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#append-incremental-model--before-incremental-run" class="hash-link" aria-label="Direct link to Append Incremental Model – Before Incremental Run" title="Direct link to Append Incremental Model – Before Incremental Run">​</a></h4>
<div class="filterableTableContainer_mhtg"><table><thead><tr><th>transaction_id</th><th>customer_id</th><th>transaction_date</th><th>amount</th><th>status</th></tr></thead><tbody><tr><td>10001</td><td>C001</td><td>2023-09-28</td><td>₦12,000</td><td>completed</td></tr><tr><td>10002</td><td>C002</td><td>2023-09-28</td><td>₦5,000</td><td>completed</td></tr></tbody></table></div>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="append-incremental-model--after-incremental-run">Append Incremental Model – After Incremental Run<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#append-incremental-model--after-incremental-run" class="hash-link" aria-label="Direct link to Append Incremental Model – After Incremental Run" title="Direct link to Append Incremental Model – After Incremental Run">​</a></h4>
<div class="filterableTableContainer_mhtg"><table><thead><tr><th>transaction_id</th><th>customer_id</th><th>transaction_date</th><th>amount</th><th>status</th></tr></thead><tbody><tr><td>10001</td><td>C001</td><td>2023-09-28</td><td>₦12,000</td><td>completed</td></tr><tr><td>10002</td><td>C002</td><td>2023-09-28</td><td>₦5,000</td><td>completed</td></tr><tr><td><strong>10003</strong></td><td><strong>C003</strong></td><td><strong>2023-09-29</strong></td><td><strong>₦8,500</strong></td><td><strong>completed</strong></td></tr><tr><td><strong>10004</strong></td><td><strong>C004</strong></td><td><strong>2023-09-30</strong></td><td><strong>₦7,250</strong></td><td><strong>completed</strong></td></tr><tr><td><strong>10005</strong></td><td><strong>C005</strong></td><td><strong>2023-09-30</strong></td><td><strong>₦3,100</strong></td><td><strong>pending</strong></td></tr></tbody></table></div>
<p><strong><em>Illustration: The "Before" table shows data before an incremental run; the "After" table shows new transactions (in bold) added. No historical data is touched. Append is great for immutable data streams (like transaction logs or event streams that only ever grow).</em></strong></p>
<p>Append served us well for ingestion pipelines that just accumulate history without reprocessing old data. However, we had to guard against duplicates (if the source might resend records, we applied deduplication or unique constraints). Also, pure append doesn’t handle updates or deletions to existing records. If data can change after insertion (e.g. a transaction status moves from "pending" to "completed"), a different strategy is needed.</p>
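<p>For example, on adapters that support <code>qualify</code> (such as Snowflake), one simple way to guard an append model against re-sent records is to deduplicate on the natural key within each incremental batch, as in this sketch of the model above:</p>

```sql
-- Illustrative sketch: the append model from above, with a dedup guard
select
    transaction_id,
    customer_id,
    transaction_date,
    amount,
    status
from {{ source('core', 'transactions') }}
{% if is_incremental() %}
where transaction_date > (select max(transaction_date) from {{ this }})
{% endif %}
-- keep only the latest copy of each transaction_id in this batch
qualify row_number() over (
    partition by transaction_id
    order by transaction_date desc
) = 1
```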
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="insert-overwrite-strategy">Insert Overwrite Strategy<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#insert-overwrite-strategy" class="hash-link" aria-label="Direct link to Insert Overwrite Strategy" title="Direct link to Insert Overwrite Strategy">​</a></h3>
<p>For data partitioned by date (or another key) that may need partial replacements, <code>insert_overwrite</code> is ideal. Instead of merging rows, this strategy overwrites entire partitions of the target table each run. The table must be partitioned (daily, hourly, etc.), and the model will drop and rebuild only the partitions that have new or updated data.</p>
<p>We used <code>insert_overwrite</code> for partitioned data like daily aggregates, where changes are isolated by date. For example, if a table is partitioned by <code>transaction_date</code>, an <code>insert_overwrite</code> model can refresh just the partition for <em>"2023-10-01"</em> without affecting other days.</p>
<p>Here’s how we configured a model to use <code>insert_overwrite</code> on BigQuery:</p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="code-insert-overwrite-strategy">Code: Insert Overwrite Strategy<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#code-insert-overwrite-strategy" class="hash-link" aria-label="Direct link to Code: Insert Overwrite Strategy" title="Direct link to Code: Insert Overwrite Strategy">​</a></h4>
<div class="language-jinja codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-jinja codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">{{ config(</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    materialized = 'incremental',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    incremental_strategy = 'insert_overwrite',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    partition_by = { 'field': 'transaction_date', 'data_type': 'date' }</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">) }}</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-sql codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">SELECT</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    customer_id</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    transaction_date</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    amount</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    transaction_type</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">FROM</span><span class="token plain"> {{ source</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">'core'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">'transactions'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" 
style="color:rgb(127, 219, 202)">WHERE</span><span class="token plain"> transaction_date </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;=</span><span class="token plain"> _dbt_max_partition</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Here, <code>partition_by</code> defines the table partition. The <code>WHERE</code> clause uses <code>_dbt_max_partition</code>, a scripting variable that dbt-bigquery declares with the latest partition already present in the target, so each run pulls only data for new or updated partitions. (In a real model you would wrap this filter in <code>{% if is_incremental() %} … {% endif %}</code> so the first build doesn't reference the variable.) On each run, BigQuery replaces any existing partition that the filtered results touch (e.g. the partition for the latest date) with the query results. Older partitions stay untouched.</p>
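<p>Under the hood, dbt-bigquery compiles a dynamic <code>insert_overwrite</code> run into a BigQuery script along these lines (a simplified sketch: the generated SQL varies by adapter version, and the dataset and table names here are illustrative):</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-sql codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">declare dbt_partitions_for_replacement array&lt;date&gt;;</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">-- 1. materialize the model's SELECT into a temp table</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">create temporary table transactions__dbt_tmp as (</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    select customer_id, transaction_date, amount, transaction_type</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    from core.transactions</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    where transaction_date &gt;= _dbt_max_partition</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">);</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">-- 2. collect the partitions present in the new results</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">set (dbt_partitions_for_replacement) = (</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    select as struct array_agg(distinct transaction_date)</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    from transactions__dbt_tmp</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">);</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">-- 3. replace exactly those partitions in the target</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">merge into analytics.transactions as dest</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">using transactions__dbt_tmp as src</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">on false</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">when not matched by source</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    and dest.transaction_date in unnest(dbt_partitions_for_replacement)</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    then delete</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">when not matched then insert row;</span><br></span></code></pre></div></div>
<p>The <code>on false</code> condition means no rows ever "match": rows in the affected partitions are deleted, and all new rows are inserted, which is what makes this a whole-partition replacement rather than a row-level upsert.</p>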
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="insert-overwrite-strategy--before-incremental-run">Insert Overwrite Strategy – Before Incremental Run<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#insert-overwrite-strategy--before-incremental-run" class="hash-link" aria-label="Direct link to Insert Overwrite Strategy – Before Incremental Run" title="Direct link to Insert Overwrite Strategy – Before Incremental Run">​</a></h4>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>transaction_date</th><th>transaction_id</th><th>customer_id</th><th>amount</th><th>status</th></tr></thead><tbody><tr><td>2023-09-29</td><td>11001</td><td>C011</td><td>₦14,000</td><td>completed</td></tr><tr><td>2023-09-29</td><td>11002</td><td>C012</td><td>₦6,500</td><td>completed</td></tr><tr><td><strong>2023-10-01</strong></td><td><strong>12001</strong></td><td><strong>C021</strong></td><td><strong>₦8,000</strong></td><td><strong>pending</strong></td></tr><tr><td><strong>2023-10-01</strong></td><td><strong>12002</strong></td><td><strong>C022</strong></td><td><strong>₦4,200</strong></td><td><strong>completed</strong></td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
<p><em>(Partition to be overwritten highlighted in bold.)</em></p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="insert-overwrite-strategy--after-incremental-run">Insert Overwrite Strategy – After Incremental Run<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#insert-overwrite-strategy--after-incremental-run" class="hash-link" aria-label="Direct link to Insert Overwrite Strategy – After Incremental Run" title="Direct link to Insert Overwrite Strategy – After Incremental Run">​</a></h4>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>transaction_date</th><th>transaction_id</th><th>customer_id</th><th>amount</th><th>status</th></tr></thead><tbody><tr><td>2023-09-29</td><td>11001</td><td>C011</td><td>₦14,000</td><td>completed</td></tr><tr><td>2023-09-29</td><td>11002</td><td>C012</td><td>₦6,500</td><td>completed</td></tr><tr><td><strong>2023-10-01</strong></td><td><strong>12003</strong></td><td><strong>C023</strong></td><td><strong>₦8,150</strong></td><td><strong>completed</strong></td></tr><tr><td><strong>2023-10-01</strong></td><td><strong>12004</strong></td><td><strong>C024</strong></td><td><strong>₦3,900</strong></td><td><strong>completed</strong></td></tr><tr><td><strong>2023-10-01</strong></td><td><strong>12005</strong></td><td><strong>C025</strong></td><td><strong>₦5,500</strong></td><td><strong>completed</strong></td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
<p><em>(New partition data is shown in bold, replacing the old partition.)</em></p>
<p><em>Illustration: "Before" shows a partitioned table with the October 1, 2023 partition highlighted; "After" shows that partition replaced with fresh rows. This approach lets us refresh a specific day’s data (e.g. to capture late-arriving transactions or corrections) without rebuilding the whole table.</em></p>
<p>At Kuda, <code>insert_overwrite</code> was invaluable for derived tables and rollups. For instance, our daily customer spend aggregates are updated incrementally by replacing just the latest day's data, keeping those tables accurate with minimal cost. By replacing whole partitions, we avoided complex row-by-row merges while still catching any corrections within that day (for example, a back-dated transaction on that day would be picked up when the partition is reprocessed).</p>
<p>One note on static vs. dynamic partitions: with <code>insert_overwrite</code> on BigQuery, dbt can either replace a statically configured list of partitions (via the <code>partitions</code> config) or dynamically detect which partitions the run's results touch; in both cases whole partitions are replaced, not individual rows. We mostly stuck to whole-day replacements for simplicity. It's easier to reason about "<em>each run rebuilds yesterday’s partition from scratch</em>," and it ensures we capture any late modifications for that day. This dramatically improved performance for large tables (no full table scans) while still correcting recent data when needed. We just had to align the <code>partition_by</code> field and filter logic to avoid wiping the wrong partition.</p>
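<p>When the partitions to replace are known up front (say, today and yesterday), dbt-bigquery lets you list them explicitly via the <code>partitions</code> config instead of detecting them dynamically. A sketch of the static variant (model and source names are illustrative):</p>
<div class="language-jinja codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-jinja codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">{% set partitions_to_replace = [</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    'current_date',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    'date_sub(current_date, interval 1 day)'</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">] %}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">{{ config(</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    materialized = 'incremental',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    incremental_strategy = 'insert_overwrite',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    partition_by = { 'field': 'transaction_date', 'data_type': 'date' },</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    partitions = partitions_to_replace</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">) }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">select customer_id, transaction_date, amount, transaction_type</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">from {{ source('core', 'transactions') }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">where transaction_date in ({{ partitions_to_replace | join(', ') }})</span><br></span></code></pre></div></div>
<p>Because the list is fixed, dbt can skip the scan that would otherwise detect affected partitions, which can shave cost on very large tables.</p>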
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="merge-strategy">Merge Strategy<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#merge-strategy" class="hash-link" aria-label="Direct link to Merge Strategy" title="Direct link to Merge Strategy">​</a></h3>
<p>For tables where new records arrive and existing ones can change, we used the merge strategy. It performs an upsert based on a unique key: new rows are inserted, and if a key already exists, specified fields are updated. This is perfect for data like customer profiles or account balances that evolve.</p>
<p>In dbt, using <code>incremental_strategy='merge'</code> requires a <code>unique_key</code> (on BigQuery or Snowflake, dbt compiles the run into a <code>MERGE</code> statement). We can also limit which columns get updated with <code>merge_update_columns</code>, or exclude certain fields with <code>merge_exclude_columns</code>. For example:</p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="code-merge-strategy">Code: Merge Strategy<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#code-merge-strategy" class="hash-link" aria-label="Direct link to Code: Merge Strategy" title="Direct link to Code: Merge Strategy">​</a></h4>
<div class="language-jinja codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-jinja codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">{{ config(</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    materialized = 'incremental',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    incremental_strategy = 'merge',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    unique_key = 'account_id',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    merge_update_columns = ['balance', 'last_updated']</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">) }}</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-sql codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">SELECT</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    account_id</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    balance</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    last_updated</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">FROM</span><span class="token plain"> {{ source</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">'core'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">'accounts'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">{</span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(127, 219, 202)">if</span><span class="token plain"> is_incremental</span><span class="token punctuation" 
style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain">}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">WHERE</span><span class="token plain"> last_updated </span><span class="token operator" style="color:rgb(127, 219, 202)">&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">   </span><span class="token keyword" style="color:rgb(127, 219, 202)">SELECT</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">MAX</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">last_updated</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(127, 219, 202)">FROM</span><span class="token plain"> {{ this }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">{</span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain"> endif </span><span class="token operator" style="color:rgb(127, 219, 202)">%</span><span class="token plain">}</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg 
viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This model selects only new or updated records (those with a <code>last_updated</code> more recent than the max in the target) and merges them into the accounts table on <code>account_id</code>. We chose to update only the <code>balance</code> and <code>last_updated</code> fields for existing accounts (to avoid overwriting other data). If an incoming <code>account_id</code> doesn’t exist in the target yet, a new row is inserted.</p>
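<p>For intuition, dbt compiles the config above into roughly this BigQuery <code>MERGE</code> (simplified; <code>analytics.accounts</code> and <code>new_data</code> stand in for the real compiled names):</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-sql codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">merge into analytics.accounts as DBT_INTERNAL_DEST</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">using (</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    -- the model's SELECT, filtered to new or updated records</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    select account_id, balance, last_updated from new_data</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">) as DBT_INTERNAL_SOURCE</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">on DBT_INTERNAL_DEST.account_id = DBT_INTERNAL_SOURCE.account_id</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">when matched then update set</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    balance = DBT_INTERNAL_SOURCE.balance,</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    last_updated = DBT_INTERNAL_SOURCE.last_updated</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">when not matched then insert (account_id, balance, last_updated)</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    values (account_id, balance, last_updated)</span><br></span></code></pre></div></div>
<p>Note how <code>merge_update_columns</code> shows up as the column list in the <code>update set</code> clause: existing rows have only those fields touched, while unmatched rows are inserted whole.</p>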
<p>Merge was our go-to for upserts. For example, we maintained a daily updated accounts table of customer statuses and balances using merge. Each day, new accounts were added, and any changes (balance updates, status changes) were merged into existing records. This prevented duplicates (which a naive append would create) and ensured one row per account with the latest info.</p>
<p>We learned to define unique keys and update columns carefully. In one case, we omitted <code>merge_exclude_columns</code> and accidentally overwrote a timestamp we meant to preserve—a quick lesson in being explicit. Merge also comes with a performance cost: each run joins the new data with the existing table. With proper clustering on the key and only a day's worth of new data, this was fine for us, but at very large scale it needs monitoring.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="optimizing-performance-and-cost">Optimizing Performance and Cost<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#optimizing-performance-and-cost" class="hash-link" aria-label="Direct link to Optimizing Performance and Cost" title="Direct link to Optimizing Performance and Cost">​</a></h2>
<p>Choosing the right incremental strategy was half the battle; we also employed several performance tuning and cost optimization techniques to make our pipelines truly scale:</p>
<ul>
<li>
<p><strong>Partitioning</strong>: On large tables, we partitioned by date (or another key) so incremental runs only scan the new slice. For example, partitioning the transactions table by transaction_date meant a daily incremental load only touched that day's partition. BigQuery’s partition pruning reduced scans from entire multi-terabyte tables to just a few GB per run (e.g. 2TB down to 0.01TB), yielding huge savings. Partitioning also sped up downstream queries that filter by date.</p>
</li>
<li>
<p><strong>Clustering</strong>: We added clustering on columns frequently used in filters or joins (e.g. clustering transactions by <code>customer_id</code>). In BigQuery, clustering sorts the data by those columns, so queries filtering on them scan less data within each partition. The improvements were subtle but meaningful; some queries that once scanned tens of GBs now scan only a fraction when the table is well-clustered.</p>
</li>
<li>
<p><strong>Smart Scheduling</strong>: We tuned model run frequencies to balance freshness vs. cost. Not every model needs to run constantly. Our customer-facing tables (transactions, balances) ran hourly for near-real-time updates, whereas internal analytics models ran daily or a few times per day. Adjusting schedules avoided wasteful runs and saved on compute cost. We also used dependency-based scheduling (via dbt Cloud), so heavy models ran only after upstream data was updated, preventing runs when no new source data arrived.</p>
</li>
<li>
<p><strong>Warehouse Tuning</strong>: We optimized our warehouse compute as well. Since incremental models drastically cut per-run processing, we could use smaller clusters/slots and run more often without overspending, a big win as data volumes grew.</p>
</li>
<li>
<p><strong>Monitoring &amp; Alerting</strong>: We tracked metrics like run durations and rows processed to catch anomalies. For example, if a daily incremental model that usually adds hundreds of rows suddenly adds zero, that's a red flag (upstream failure or missing source data). Similarly, if a job that normally takes 5 minutes jumps to 50, it likely did an unintended full scan. We also watched data freshness: if an hourly model hadn't loaded new data in 3 hours, we investigated. These checks helped us catch issues early (like a stale source or a broken filter) and kept data flowing reliably.</p>
</li>
</ul>
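<p>In dbt, the partitioning and clustering described above are a few lines of model config. A sketch (the column choices are illustrative):</p>
<div class="language-jinja codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-jinja codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">{{ config(</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    materialized = 'incremental',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    incremental_strategy = 'insert_overwrite',</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    partition_by = { 'field': 'transaction_date', 'data_type': 'date' },</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    cluster_by = ['customer_id', 'transaction_type']</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">) }}</span><br></span></code></pre></div></div>
<p>With this in place, an incremental run prunes to the affected date partitions, and queries that filter on <code>customer_id</code> scan less data within each partition.</p>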
<p>With these optimizations in place, we vastly improved our pipeline speed and cost-efficiency. Instead of fearing the next data surge, we were confident our system could handle growth by design.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="real-world-implementation-kuda-case-study">Real-World Implementation: Kuda Case Study<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#real-world-implementation-kuda-case-study" class="hash-link" aria-label="Direct link to Real-World Implementation: Kuda Case Study" title="Direct link to Real-World Implementation: Kuda Case Study">​</a></h3>
<p>How did these approaches work out in practice at Kuda during hyper-growth?</p>
<p>One critical dataset was our customer transactions feed. Initially, a full daily rebuild of the transactions table took over an hour and scanned the entire history. We refactored it into an incremental model (append strategy with partition pruning). The first run built the historical backlog, and subsequent runs pulled only new transactions. The difference was night and day: the incremental job ran in minutes, and data scanned per run dropped by over 90%. Analysts saw new transactions within the hour, and our monthly BigQuery cost for that table plummeted even as data continued to grow.</p>
<p>To illustrate, here’s a simplified daily transactions summary. Initially, it contained data up to <em>Sept 30, 2023</em>:</p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="daily-transactions-summary--before">Daily Transactions Summary – Before<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#daily-transactions-summary--before" class="hash-link" aria-label="Direct link to Daily Transactions Summary – Before" title="Direct link to Daily Transactions Summary – Before">​</a></h4>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>date</th><th>transactions_count</th><th>total_amount</th></tr></thead><tbody><tr><td>2023-09-28</td><td>1,045</td><td>25,100,000</td></tr><tr><td>2023-09-29</td><td>980</td><td>22,340,000</td></tr><tr><td>2023-09-30</td><td>1,102</td><td>27,500,000</td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
<p>After the next incremental load (bringing in transactions from <em>October 1, 2023</em>), the table automatically includes the new day's metrics without recomputing prior days:</p>
<h4 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="daily-transactions-summary--after">Daily Transactions Summary – After<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#daily-transactions-summary--after" class="hash-link" aria-label="Direct link to Daily Transactions Summary – After" title="Direct link to Daily Transactions Summary – After">​</a></h4>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>date</th><th>transactions_count</th><th>total_amount</th></tr></thead><tbody><tr><td>2023-09-28</td><td>1,045</td><td>25,100,000</td></tr><tr><td>2023-09-29</td><td>980</td><td>22,340,000</td></tr><tr><td>2023-09-30</td><td>1,102</td><td>27,500,000</td></tr><tr><td>2023-10-01</td><td>1,210</td><td>30,230,000</td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
<p><em>Table: Example of a daily transactions summary. After an incremental load for 2023-10-01, the new day’s data appears without reprocessing previous days.</em></p>
<p>This approach kept our teams and customers up-to-date. Customer support could view nearly real-time transactions to investigate issues, and customers could generate current account statements on the fly.</p>
<p>We also used incremental models for regulatory and finance reporting. For example, our Finance team needed a daily reconciliation of balances and an end-of-day accounts table. They were fine with data being a day old, but it had to be accurate and deduplicated. We built this with a nightly incremental merge on the accounts table, merging changes from the core accounts data into a fresh daily view of account states. It provided a reliable daily snapshot of accounts. (Our Finance team never realized any fancy incremental process was involved; they just got their report each morning!)</p>
<p>During the launch of a new card product, we needed to monitor transaction declines and errors in near real-time. Our existing daily-refreshed dashboard wasn’t enough. We set up an incremental model that ingested card transaction events every 15 minutes. To ensure that no historical fixes were missed, we also scheduled a nightly full refresh of this model during the launch period. This hybrid approach gave us timely visibility and a daily catch-up for any late-arriving corrections. It proved crucial: we spotted issues (like a spike in declines from an API glitch) early and fixed them, minimizing customer impact. After the launch, we reverted to purely incremental runs once things stabilized.</p>
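<p>Operationally, that hybrid setup amounts to two scheduled jobs: a frequent incremental run plus a nightly run with dbt's <code>--full-refresh</code> flag (the model name here is illustrative):</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-bash codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain"># every 15 minutes during the launch window</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">dbt run --select card_transaction_events</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"># nightly catch-up to pick up late-arriving corrections</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">dbt run --select card_transaction_events --full-refresh</span><br></span></code></pre></div></div>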
<p>The overall impact at Kuda was huge. Heavy transformations that had been close to failing became reliable again. Stakeholders noticed fresher data in their reports, and our customer satisfaction scores improved because no one saw stale data. By controlling costs and keeping pipelines efficient, we kept management and regulators happy.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="lessons-learned--best-practices">Lessons Learned &amp; Best Practices<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#lessons-learned--best-practices" class="hash-link" aria-label="Direct link to Lessons Learned &amp; Best Practices" title="Direct link to Lessons Learned &amp; Best Practices">​</a></h2>
<p>Throughout this scaling journey, we learned a ton about what worked and what didn’t. Here are some of the key lessons and best practices we’d recommend to any growth-stage fintech looking to implement incremental models:</p>
<ul>
<li>
<p><strong>Choose the Right Strategy</strong>: Not all tables should use the same incremental approach. Generally, use append for insert-only data, <code>insert_overwrite</code> for data partitioned by date (or ID) where you can replace whole chunks, and merge for true upsert scenarios. If source deletes are an issue, consider <code>delete+insert</code> or handle soft deletes.</p>
</li>
<li>
<p><strong>Partition Wisely</strong>: Partitioning is critical, but pick an appropriate granularity. The right grain (hour, day, month) depends on data volume and query patterns. For us, daily partitions were often the sweet spot: small enough to reduce scanned data, but not so small as to create thousands of partitions. Always align your incremental filter with the partition field to enable pruning.</p>
</li>
<li>
<p><strong>Monitor Your Models</strong>: Implement tests or alerts on incremental models. For example, check that each run's row count is within expected bounds: if a daily incremental model that usually adds hundreds of rows suddenly adds zero, that's a red flag. Catching these issues early prevents bigger problems down the line.</p>
</li>
<li>
<p><strong>Periodic Full Refreshes</strong>: Over time, even a well-built incremental model can drift due to small errors or schema changes. We scheduled occasional full refreshes for critical models to realign them with source data, essentially giving a clean slate that catches any discrepancies or missed data. Similarly, after major logic changes, we’d do a one-time full refresh to apply the new logic across all historical data and then switch back to incremental.</p>
</li>
<li>
<p><strong>Test and Document</strong>: We treated incremental models like mission-critical code. We wrote tests to ensure the logic is sound (for instance, that after an incremental run the target's record count for a period equals the source's count for that period; if not, the filter might be wrong). We also documented each model's assumptions (e.g. "<em>this model runs incrementally; do not disable the <code>is_incremental()</code> filter in development</em>"). Good documentation helped new team members avoid breaking incremental logic.</p>
</li>
<li>
<p><strong>Design for Scale Early</strong>: Our biggest lesson was to plan for scale from the start. Now, when designing a new model or pipeline, we ask, “<em>What if the data grows 10x?</em>”. If a full refresh won't be feasible at that size, we build it incrementally from day one. It's much easier than refactoring under pressure later. This mindset, combined with dbt’s flexible incremental features, has future-proofed our pipelines. As our data continues to grow, the incremental approach should keep holding up.</p>
</li>
</ul>
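<p>The count-reconciliation check described above fits naturally as a dbt singular test: the query returns the recent days whose counts diverge, and the test fails if it returns any rows. A sketch (the <code>fct_transactions</code> model name is illustrative):</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-sql codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token plain">-- tests/assert_incremental_counts_match.sql</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">with source_counts as (</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    select transaction_date, count(*) as n</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    from {{ source('core', 'transactions') }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    where transaction_date &gt;= date_sub(current_date, interval 7 day)</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    group by transaction_date</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">),</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">target_counts as (</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    select transaction_date, count(*) as n</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    from {{ ref('fct_transactions') }}</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    where transaction_date &gt;= date_sub(current_date, interval 7 day)</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">    group by transaction_date</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">)</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">select s.transaction_date</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">from source_counts as s</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">left join target_counts as t using (transaction_date)</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">where t.n is null or s.n != t.n</span><br></span></code></pre></div></div>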
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="conclusion">Conclusion<a href="https://docs.getdbt.com/blog/scaling-data-pipelines-fintech#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion">​</a></h2>
<p>Scaling a fintech data platform doesn’t have to mean scaling cost and runtime at the same pace. By using dbt’s incremental models—paired with optimizations like partitioning, clustering, and careful scheduling—Kuda transformed its pipelines to handle rapid growth. We kept data fresh and accurate for users without breaking the bank. Incremental processing let us handle ever-increasing volumes in bite-sized chunks, maintaining agility as the company grew.</p>
<p>If you’re at a growing company struggling with slow or expensive data jobs, give dbt’s incremental models a try. As we saw at Kuda, the payoff can be huge: faster insights, happier stakeholders, and a data platform ready for whatever the future brings. The future of data processing (in fintech and beyond) is incremental. With tools like dbt, you can ride the wave of growth instead of drowning in it.</p>]]></content>
        <author>
            <name>Adedamola Onabanjo</name>
        </author>
        <category label="analytics" term="analytics"/>
        <category label="dbt Cloud" term="dbt Cloud"/>
        <category label="fintech" term="fintech"/>
        <category label="BigQuery" term="BigQuery"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Introducing the dbt MCP Server – Bringing Structured Data to AI Workflows and Agents]]></title>
        <id>https://docs.getdbt.com/blog/introducing-dbt-mcp-server</id>
        <link href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server"/>
        <updated>2025-04-21T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[We’re open‑sourcing an experimental dbt MCP server so LLMs and agents can discover, query, and run your dbt project.]]></summary>
        <content type="html"><![CDATA[<p>dbt is the standard for creating governed, trustworthy datasets on top of your structured data. <a href="https://www.anthropic.com/news/model-context-protocol" target="_blank" rel="noopener noreferrer">MCP</a> is showing increasing promise as the standard for providing context to LLMs to allow them to function at a high level in real world, operational scenarios.</p>
<p>Today, we are open sourcing an experimental version of the <a href="https://github.com/dbt-labs/dbt-mcp/tree/main" target="_blank" rel="noopener noreferrer">dbt MCP server</a>. We expect that over the coming years, structured data is going to become heavily integrated into AI workflows and that dbt will play a key role in building and provisioning this data.</p>
<p>In particular, we expect both <a href="https://roundup.getdbt.com/p/how-ai-will-disrupt-bi-as-we-know" target="_blank" rel="noopener noreferrer">Business Intelligence</a> and <a href="https://roundup.getdbt.com/p/how-ai-will-disrupt-data-engineering" target="_blank" rel="noopener noreferrer">Data Engineering</a> will be driven by AI operating on top of the context defined in your dbt projects.</p>
<p><strong>We are committed to building the data control plane that enables AI to reliably access structured data from across your entire data lineage.</strong> Over the coming months and years, data teams will increasingly focus on building the rich context that feeds into the dbt MCP server.  Both AI agents and business stakeholders will then operate on top of LLM-driven systems hydrated by the dbt MCP context.</p>
<div style="margin:40px 10px"><div class="loomWrapper_TTvb"><iframe width="640" class="loomFrame_B61a" height="400" src="https://www.loom.com/embed/28cd33da8bcc41ccbe43338d327e73d8" frameborder="0" allowfullscreen="" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe></div></div>
<p>Today’s system is not a full realization of the vision in the posts shared above, but it is a meaningful step towards safely integrating your structured enterprise data into AI workflows. In this post, we’ll walk through what the dbt MCP server can do today, share some tips for getting started, and cover some of the limitations of the current implementation.</p>
<p>We believe it is important for the industry to start coalescing on best practices for safe and trustworthy ways to access your business data via LLMs.</p>
<p><strong>What is MCP?</strong></p>
<p>MCP stands for Model Context Protocol, an open protocol released by Anthropic in <a href="https://www.anthropic.com/news/model-context-protocol" target="_blank" rel="noopener noreferrer">November 2024</a> that allows AI systems to dynamically pull in context and data. Why does this matter?</p>
<blockquote>
<p>Even the most sophisticated models are constrained by their isolation from data—trapped behind information silos and legacy systems. Every new data source requires its own custom implementation, making truly connected systems difficult to scale.</p>
<p>MCP addresses this challenge. It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol. - Anthropic</p>
</blockquote>
<p>Since then, MCP has become widely adopted, with Google, Microsoft, and OpenAI all committing to support it.</p>
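<p>Concretely, MCP is built on JSON-RPC 2.0: a client asks a server which tools it offers (<code>tools/list</code>), then invokes one by name (<code>tools/call</code>). A minimal Python sketch of those two message shapes (the tool and argument names in the second call are illustrative, not the dbt MCP server's exact schema):</p>

```python
import json

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 message of the kind MCP clients send."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Ask the server which tools it exposes...
list_tools = make_request(1, "tools/list")
# ...then invoke one by name with arguments.
call_tool = make_request(2, "tools/call", {
    "name": "get_model_details",           # a dbt MCP discovery tool
    "arguments": {"model_name": "orders"}  # argument name is illustrative
})
```

In practice an MCP client library handles this framing for you; the point is that any client speaking this protocol can talk to any MCP server.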
<p><strong>What does the dbt MCP Server do?</strong></p>
<p>Think of it as the missing glue between:</p>
<ul>
<li><strong>Your dbt project</strong> (models, docs, lineage, Semantic&nbsp;Layer)</li>
<li><strong>Any MCP‑enabled <a href="https://modelcontextprotocol.io/clients" target="_blank" rel="noopener noreferrer">client</a></strong> (Claude Desktop Projects, Cursor, agent frameworks, custom apps, etc.)</li>
</ul>
<p>We’ve <a href="https://roundup.getdbt.com/p/semantic-layer-as-the-data-interface" target="_blank" rel="noopener noreferrer">known for a while</a> that the combination of structured data from your dbt project and LLMs is a potent one (particularly when using the dbt Semantic Layer). The question has been: what is the best way to provision this across a wide variety of LLM applications in a way that puts the power in the hands of the Community and the ecosystem, rather than having us build out a series of one-off integrations?</p>
<p>The dbt MCP server provides access to a set of <em>tools</em> that operate on top of your dbt project. These tools can be called by LLM systems to learn about your data and metadata.</p>
<p><strong>Remember, as with any AI workflows, to make sure that you are taking appropriate caution in terms of giving these access to production systems and data. Consider starting in a sandbox environment or only granting read permissions.</strong></p>
<p>There are three primary functions of the dbt MCP server today.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="docImage_EYbW" style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#" data-featherlight="/img/blog/2025-04-18-dbt-mcp-server/mcp_use_cases.png"><img data-toggle="lightbox" alt="Three use‑case pillars of the dbt MCP server" title="Three use‑case pillars of the dbt MCP server" src="https://docs.getdbt.com/img/blog/2025-04-18-dbt-mcp-server/mcp_use_cases.png?v=2"></a></span><span class="title_aGrV">Three use‑case pillars of the dbt MCP server</span></div>
<ul>
<li>Data discovery: Understand what data assets exist in your dbt project.</li>
<li>Data querying: Directly query the data in your dbt project. This has two components:<!-- -->
<ul>
<li>Use the dbt Semantic Layer for trustworthy, single source of truth reporting on your metrics</li>
<li>Execution of SQL queries for more freewheeling data exploration and development</li>
</ul>
</li>
<li>Project execution: access the dbt CLI to run your project and perform other operations</li>
</ul>
<div class="docImage_EYbW" style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#" data-featherlight="/img/blog/2025-04-18-dbt-mcp-server/mcp_architecture_overview.png"><img data-toggle="lightbox" alt="How the dbt MCP server fits between data sources and MCP‑enabled clients" title="How the dbt MCP server fits between data sources and MCP‑enabled clients" src="https://docs.getdbt.com/img/blog/2025-04-18-dbt-mcp-server/mcp_architecture_overview.png?v=2"></a></span><span class="title_aGrV">How the dbt MCP server fits between data sources and MCP‑enabled clients</span></div>
<p>❓Do I need to be a dbt Cloud customer to use the dbt MCP server?</p>
<ul>
<li>No. The MCP server includes functionality for both dbt Cloud and dbt Core users. Over time, Cloud-specific services will be built into the MCP server where they provide differentiated value.</li>
</ul>
<p>Let’s walk through examples of these and why each can be helpful in human-driven and agent-driven use cases:</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="using-the-dbt-mcp-server-for-data-asset-discovery"><strong>Using the dbt MCP Server for Data Asset Discovery</strong><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#using-the-dbt-mcp-server-for-data-asset-discovery" class="hash-link" aria-label="Direct link to using-the-dbt-mcp-server-for-data-asset-discovery" title="Direct link to using-the-dbt-mcp-server-for-data-asset-discovery">​</a></h2>
<p>dbt has knowledge about the data assets that exist across your entire data stack, from raw staging models to polished analytical marts. The dbt MCP server exposes this knowledge in a way that makes it accessible to LLMs and AI agents, enabling powerful discovery capabilities:</p>
<ul>
<li><strong>For human stakeholders</strong>: Learn about your production dbt project interactively through natural language. Business users can ask questions like "What customer data do we have?" or "Where do we store marketing spend information?" and receive accurate information based on your dbt project's documentation and structure.</li>
<li><strong>For AI agent workflows</strong>: Automatically discover and understand the available data models, their relationships, and their structures without human intervention. This allows agents to autonomously navigate complex data environments and produce accurate insights. This can be useful context for any agent that needs to operate on top of information in a data platform.</li>
</ul>
<p>The data discovery tools allow LLMs to understand what data exists, how it's structured, and how different data assets relate to each other. This contextual understanding is essential for generating accurate SQL, answering business questions, and providing trustworthy data insights.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="data-asset-discovery-tools"><strong>Data Asset Discovery Tools:</strong><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#data-asset-discovery-tools" class="hash-link" aria-label="Direct link to data-asset-discovery-tools" title="Direct link to data-asset-discovery-tools">​</a></h3>
<p><em>Note: you do not need to invoke any of these tools directly in your workflow. Rather, the MCP client will use the context you have provided to determine the most appropriate tool to use at a given time.</em></p>
<div class="filterableTableContainer_mhtg"><table><thead><tr><th>Tool Name</th><th>Purpose</th><th>Output</th></tr></thead><tbody><tr><td><code>get_all_models</code></td><td>Provides a complete inventory of all models in the dbt project, regardless of type</td><td>List of all model names and their descriptions</td></tr><tr><td><code>get_mart_models</code></td><td>Identifies presentation-layer models specifically designed for end-user consumption</td><td>List of mart model names and descriptions (models in the reporting layer)</td></tr><tr><td><code>get_model_details</code></td><td>Retrieves comprehensive information about a specific model</td><td>Compiled SQL, description, column names, column descriptions, and column data types</td></tr><tr><td><code>get_model_parents</code></td><td>Identifies upstream dependencies for a specific model</td><td>List of parent models that the specified model depends on</td></tr></tbody></table></div>
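<p>To make the discovery flow concrete, here is a hedged Python sketch of how an agent might chain these tools: list the available models, then pull details for one of them. The tool names come from the table above; the client object and the sample project metadata are hypothetical stand-ins for a real MCP client session:</p>

```python
class FakeMCPClient:
    """Stand-in for an MCP client session; a real client would call the server."""
    def __init__(self, project):
        self.project = project  # hypothetical project metadata

    def call_tool(self, name, arguments=None):
        if name == "get_all_models":
            return [{"name": m, "description": d["description"]}
                    for m, d in self.project.items()]
        if name == "get_model_details":
            return self.project[arguments["model_name"]]
        raise ValueError(f"unknown tool: {name}")

# Hypothetical metadata an agent might discover in a dbt project.
project = {
    "stg_payments": {"description": "Staging model for raw payments",
                     "columns": ["id", "amount"]},
    "fct_orders": {"description": "Order facts for reporting",
                   "columns": ["order_id", "revenue"]},
}

client = FakeMCPClient(project)
# Step 1: discover what exists; Step 2: drill into one model.
models = client.call_tool("get_all_models")
details = client.call_tool("get_model_details", {"model_name": "fct_orders"})
```

This two-step pattern (inventory, then drill-down) is what lets an agent answer "what customer data do we have?" without any hand-written catalog.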
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="using-the-dbt-mcp-server-for-querying-data-via-the-dbt-semantic-layer"><strong>Using the dbt MCP server for querying data via the dbt Semantic Layer</strong><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#using-the-dbt-mcp-server-for-querying-data-via-the-dbt-semantic-layer" class="hash-link" aria-label="Direct link to using-the-dbt-mcp-server-for-querying-data-via-the-dbt-semantic-layer" title="Direct link to using-the-dbt-mcp-server-for-querying-data-via-the-dbt-semantic-layer">​</a></h2>
<p>The <a href="https://www.getdbt.com/product/semantic-layer" target="_blank" rel="noopener noreferrer">dbt Semantic Layer</a> defines your organization's metrics and dimensions in a consistent, governed way. With the dbt MCP server, LLMs can understand and query these metrics directly, ensuring that AI-generated analyses are consistent with your organization's definitions.</p>
<ul>
<li><strong>For human stakeholders</strong>: Request metrics using natural language. Users can ask for "monthly revenue by region" and get accurate results that match your organization's standard metric definitions, with a <a href="https://roundup.getdbt.com/p/semantic-layer-as-the-data-interface" target="_blank" rel="noopener noreferrer">higher baseline of accuracy than LLM generated SQL queries</a>.</li>
<li><strong>For AI agent workflows</strong>: As agentic systems take action in the real world over a longer time horizon, they will need ways to understand the underlying reality of your business. From feeding into deep research style reports to feeding operational agents, the dbt Semantic Layer can provide a trusted underlying interface for LLM systems.</li>
</ul>
<p>By leveraging the dbt Semantic Layer through the MCP server, you ensure that LLM-generated analyses are based on rigorous definitions instantiated as code, flexibly available in any MCP-supported client.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="semantic-layer-tools">Semantic Layer Tools:<a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#semantic-layer-tools" class="hash-link" aria-label="Direct link to Semantic Layer Tools:" title="Direct link to Semantic Layer Tools:">​</a></h3>
<div class="filterableTableContainer_mhtg"><table><thead><tr><th>Tool Name</th><th>Purpose</th><th>Output</th></tr></thead><tbody><tr><td><code>list_metrics</code></td><td>Provides an inventory of all available metrics in the dbt Semantic Layer</td><td>Complete list of metric names, types, labels, and descriptions</td></tr><tr><td><code>get_dimensions</code></td><td>Identifies available dimensions for specified metrics</td><td>List of dimensions that can be used to group/filter the specified metrics</td></tr><tr><td><code>query_metrics</code></td><td>Executes queries against metrics in the dbt Semantic Layer</td><td>Query results based on specified metrics, dimensions, and filters</td></tr></tbody></table></div>
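<p>For example, the "monthly revenue by region" request from earlier might translate into a <code>query_metrics</code> call along these lines. The tool name comes from the table above; the argument names and the metric/dimension values are illustrative assumptions, not the server's exact schema:</p>

```python
def build_metric_query(metrics, group_by, where=None):
    """Assemble arguments for a hypothetical query_metrics tool call."""
    args = {"metrics": metrics, "group_by": group_by}
    if where:
        args["where"] = where
    return {"name": "query_metrics", "arguments": args}

# "Monthly revenue by region": one metric, grouped by a time grain
# and a dimension (names are placeholders for your project's definitions).
call = build_metric_query(
    metrics=["revenue"],
    group_by=["metric_time__month", "customer__region"],
)
```

Because the metric and dimension definitions live in the Semantic Layer, the LLM only has to pick names from <code>list_metrics</code> and <code>get_dimensions</code> rather than reinvent the aggregation logic in SQL.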
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="using-the-dbt-mcp-server-for-sql-execution-to-power-text-to-sql"><strong>Using the dbt MCP server for SQL execution to power text to sql</strong><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#using-the-dbt-mcp-server-for-sql-execution-to-power-text-to-sql" class="hash-link" aria-label="Direct link to using-the-dbt-mcp-server-for-sql-execution-to-power-text-to-sql" title="Direct link to using-the-dbt-mcp-server-for-sql-execution-to-power-text-to-sql">​</a></h2>
<p>While the dbt Semantic Layer provides a governed, metrics-based approach to data querying, there are many analytical needs that require more flexible, exploratory SQL queries. The dbt MCP server will soon include SQL validation and querying capabilities with rich context awareness.</p>
<ul>
<li><strong>For human stakeholders</strong>: Ask complex analytical questions that go beyond predefined metrics. Users can explore data freely while still benefiting from the LLM's understanding of their specific data models, ensuring that generated SQL is correct and optimized for your environment.</li>
<li><strong>For AI agent workflows</strong>: Generate and validate SQL against your data models automatically. Agents can create and execute complex queries that adapt to schema changes, optimize for performance, and follow your organization's SQL patterns and conventions.</li>
</ul>
<p>Unlike traditional SQL generation, queries created through the dbt MCP server will be aware of your specific data models, making them more accurate and useful for your particular environment. This capability is particularly valuable for data exploration, one-off analyses, and prototype development that might later be incorporated into your dbt project.</p>
<p>Currently, SQL execution is managed through the dbt show command; over the near term we expect to release tooling that is more performant and fit for this precise use case.</p>
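<p>The <code>dbt show</code> command can preview the results of an inline query, which is what makes it a workable stopgap for SQL execution. A small sketch of how a tool wrapper might assemble that invocation (<code>--inline</code> and <code>--limit</code> are real dbt CLI flags; the wrapper function itself is illustrative):</p>

```python
def dbt_show_command(sql, limit=5):
    """Build a `dbt show` invocation that previews an inline SQL query."""
    return ["dbt", "show", "--inline", sql, "--limit", str(limit)]

# The query can use Jinja, so ref() keeps it tied to your project's lineage.
cmd = dbt_show_command("select order_id, revenue from {{ ref('fct_orders') }}")
# A real tool would pass `cmd` to subprocess.run and return the preview output.
```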
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="using-the-dbt-mcp-server-for-project-execution"><strong>Using the dbt MCP server for project execution</strong><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#using-the-dbt-mcp-server-for-project-execution" class="hash-link" aria-label="Direct link to using-the-dbt-mcp-server-for-project-execution" title="Direct link to using-the-dbt-mcp-server-for-project-execution">​</a></h2>
<p>The dbt MCP server doesn't just provide access to data—it also allows LLMs and AI agents to interact directly with dbt, executing commands and managing your project.</p>
<ul>
<li><strong>For human stakeholders</strong>: Trigger dbt commands through conversational interfaces without CLI knowledge. Users can ask to "run the daily models" or "test the customer models" and get clear explanations of the results, including suggestions for fixing any issues that arise.</li>
<li><strong>For AI agent workflows</strong>: Autonomously run dbt processes in response to events. Agents can manage project execution, automatically test and validate model changes, and even debug common issues without human intervention.</li>
</ul>
<p>While the discovery and query tools operate on top of <em>environments</em> as the context source, the execution tools interact directly with the CLI, whether dbt Core or the dbt Cloud CLI.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="project-execution-tools">Project Execution Tools<a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#project-execution-tools" class="hash-link" aria-label="Direct link to Project Execution Tools" title="Direct link to Project Execution Tools">​</a></h3>
<div class="filterableTableContainer_mhtg"><table><thead><tr><th>Tool Name</th><th>Purpose</th><th>Output</th></tr></thead><tbody><tr><td><code>build</code></td><td>Executes the dbt build command to build the entire project</td><td>Results of the build process including success/failure status and logs</td></tr><tr><td><code>compile</code></td><td>Executes the dbt compile command to compile the project's SQL</td><td>Results of the compilation process including success/failure status and logs</td></tr><tr><td><code>list</code></td><td>Lists all resources in the dbt project</td><td>Structured list of resources within the project</td></tr><tr><td><code>parse</code></td><td>Parses the dbt project files</td><td>Results of the parsing process including success/failure status and logs</td></tr><tr><td><code>run</code></td><td>Executes the dbt run command to run models in the project</td><td>Results of the run process including success/failure status and logs</td></tr><tr><td><code>test</code></td><td>Executes tests defined in the dbt project</td><td>Results of test execution including success/failure status and logs</td></tr></tbody></table></div>
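<p>A request like "run the daily models" ultimately has to bottom out in a concrete CLI invocation. Here is a hedged sketch of the kind of translation an LLM performs before calling the <code>run</code> or <code>test</code> tools. The <code>--select</code> syntax is standard dbt node selection; the tag and model names are hypothetical:</p>

```python
def to_dbt_args(command, select=None):
    """Translate a tool call into dbt CLI arguments."""
    args = ["dbt", command]
    if select:
        args += ["--select", select]  # standard dbt node-selection syntax
    return args

# "Run the daily models" -> run everything tagged `daily` (hypothetical tag).
run_daily = to_dbt_args("run", select="tag:daily")
# "Test the customer models" -> test `customers` and its downstream children.
test_customers = to_dbt_args("test", select="customers+")
```

This is also where scoped permissions matter most: an agent that can emit <code>dbt run</code> against production needs far tighter guardrails than one that only reads metadata.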
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="getting-started">Getting Started<a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started">​</a></h2>
<p>The dbt MCP server is now available as an experimental release. To get started:</p>
<ol>
<li>Clone the repository from GitHub: <a href="https://github.com/dbt-labs/dbt-mcp" target="_blank" rel="noopener noreferrer">dbt-labs/dbt-mcp</a></li>
<li>Follow the installation instructions in the README</li>
<li>Connect your dbt project and start exploring the capabilities</li>
</ol>
<p>We're excited to see how the community builds with and extends the dbt MCP server. Whether you're building an AI-powered BI tool, an autonomous data agent, or just exploring the possibilities of LLMs in your data workflows, the dbt MCP server provides a solid foundation for bringing your dbt context to AI applications.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-is-the-best-workflow-for-the-current-iteration-of-the-mcp-server">What is the best workflow for the current iteration of the MCP server?<a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#what-is-the-best-workflow-for-the-current-iteration-of-the-mcp-server" class="hash-link" aria-label="Direct link to What is the best workflow for the current iteration of the MCP server?" title="Direct link to What is the best workflow for the current iteration of the MCP server?">​</a></h2>
<p>This early release is primarily meant to be used <em>on top of an existing dbt project to answer questions about your data and metadata -</em> roughly tracking towards the set of use cases described in this <a href="https://roundup.getdbt.com/p/how-ai-will-disrupt-bi-as-we-know" target="_blank" rel="noopener noreferrer">post</a> on the future of BI and data consumption.</p>
<p><em>Chat use case:</em></p>
<p>We suggest using Claude Desktop for this and creating a custom <a href="https://www.anthropic.com/news/projects" target="_blank" rel="noopener noreferrer">project</a> that includes a prompt explaining the use cases you are looking to cover.</p>
<p>To get this working:</p>
<ul>
<li>Follow the instructions in the README to install the MCP server</li>
<li>Validate that you have added the MCP config to your Claude Desktop config file. You should see ‘dbt’ when you go to Claude → Settings → Developer</li>
</ul>
<div class="docImage_EYbW" style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#" data-featherlight="/img/blog/2025-04-18-dbt-mcp-server/claudedesktop_settings_dbt_mcp.png"><img data-toggle="lightbox" alt="Claude Desktop – MCP server running in Developer settings" title="Claude Desktop – MCP server running in Developer settings" src="https://docs.getdbt.com/img/blog/2025-04-18-dbt-mcp-server/claudedesktop_settings_dbt_mcp.png?v=2"></a></span><span class="title_aGrV">Claude Desktop – MCP server running in Developer settings</span></div>
<ul>
<li>Create a new project called “analytics”. Give it a description of how an end user might interact with it.</li>
</ul>
<div class="docImage_EYbW" style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#" data-featherlight="/img/blog/2025-04-18-dbt-mcp-server/claudedesktop_project_card.png"><img data-toggle="lightbox" alt="Example Claude Desktop project connected to the dbt MCP server" title="Example Claude Desktop project connected to the dbt MCP server" src="https://docs.getdbt.com/img/blog/2025-04-18-dbt-mcp-server/claudedesktop_project_card.png?v=2"></a></span><span class="title_aGrV">Example Claude Desktop project connected to the dbt MCP server</span></div>
<ul>
<li><strong>Add a custom prompt explaining that questions in this project will likely be routed through the dbt MCP server.</strong> You’ll likely want to customize this to your particular organizational context.<!-- -->
<ul>
<li>For example: This conversation is connected to and knows about the information in your dbt Project via the dbt MCP server. When you receive a question that plausibly needs data from an external data source, you will likely want to use the tools available via the dbt MCP server to provide it.</li>
</ul>
</li>
</ul>
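<p>For reference, Claude Desktop registers MCP servers under an <code>mcpServers</code> key in its <code>claude_desktop_config.json</code> file. A Python sketch of generating that entry; the launch command, arguments, and environment variable names below are placeholders, so follow the repository README for the exact values:</p>

```python
import json

def dbt_mcp_entry(command, args, env):
    """Build the `mcpServers` config entry Claude Desktop expects."""
    return {"mcpServers": {"dbt": {"command": command, "args": args, "env": env}}}

config = dbt_mcp_entry(
    command="uv",                                # placeholder launcher
    args=["run", "dbt-mcp"],                     # placeholder args; see the README
    env={"DBT_PROJECT_DIR": "/path/to/project"}, # placeholder env var
)
print(json.dumps(config, indent=2))
```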
<p><em>Deployment considerations:</em></p>
<ul>
<li>This is an <em>experimental release</em>. We recommend that initial use should be focused on prototyping and proving value before rolling out widely across your organization.</li>
<li>Be particularly mindful with the project execution tools - remember that LLMs make mistakes and begin with permissions scoped so that you can experiment but not disrupt your data operations.</li>
<li>Start with the smallest possible use case that provides tangible value. Instead of giving this access to your entire production dbt Project, consider creating an upstream project that inherits a smaller subset of models and metrics that will power the workflow.</li>
<li>Tool selection is not yet perfectly reliable. In our testing, the model will sometimes cycle through several unnecessary tool calls or call them in the wrong order. While this can usually be fixed with more specific prompting by the end user, that goes against the spirit of letting the model dynamically select the right tool for the job. We expect this to be addressed over time via improvements in the dbt MCP server, as well as in client interfaces and the protocol itself.</li>
<li>Think carefully about when to use the Semantic Layer tools versus the SQL execution tool. SQL execution is powerful but less controllable. We’re doing a lot of hands-on testing to develop heuristics about when SQL execution is the best option, when to bake logic into the Semantic Layer, and whether new abstractions might be needed for AI workflows.</li>
<li>Tool use is powerful because it can link multiple tools together. What tools complement the dbt MCP Server? How can we use this to tie our structured data into other workflows?</li>
</ul>
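<p>One practical way to apply the scoping advice above is to expose only read-oriented tools to an experimental agent and keep the execution tools behind an explicit opt-in. A minimal sketch: the tool names come from the tables in this post, while the gating function itself is illustrative:</p>

```python
# Discovery and querying tools read metadata or data but change nothing.
READ_ONLY_TOOLS = {
    "get_all_models", "get_mart_models", "get_model_details",
    "get_model_parents", "list_metrics", "get_dimensions", "query_metrics",
}
# Execution tools can mutate warehouse state via the dbt CLI.
EXECUTION_TOOLS = {"build", "compile", "list", "parse", "run", "test"}

def allowed_tools(allow_execution=False):
    """Return the tool names an agent may call in this environment."""
    return READ_ONLY_TOOLS | (EXECUTION_TOOLS if allow_execution else set())

sandbox = allowed_tools()  # discovery and querying only, no `dbt run`
```

Starting from a read-only allowlist, then widening it per environment, mirrors the "experiment without disrupting your data operations" guidance above.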
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-future-of-the-dbt-mcp-and-the-correct-layers-of-abstraction-for-interfacing-with-your-data">The future of the dbt MCP and the correct layers of abstraction for interfacing with your data<a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server#the-future-of-the-dbt-mcp-and-the-correct-layers-of-abstraction-for-interfacing-with-your-data" class="hash-link" aria-label="Direct link to The future of the dbt MCP and the correct layers of abstraction for interfacing with your data" title="Direct link to The future of the dbt MCP and the correct layers of abstraction for interfacing with your data">​</a></h2>
<p>We are in the <em>very</em> early days of MCP as a protocol and determining how best to connect your structured data to LLM systems. This is an extremely exciting, dynamic time where we are working out, in real time, how to best serve this data and context.</p>
<p>We have high confidence that the approach of serving context to your AI systems via dbt will prove a durable piece of this stack. As we work with the Community on implementing this in real-world use cases, the details of the implementation, and how you access it, are likely to change. Here are some of the areas where we expect this to evolve.</p>
<p><strong>Determining the best source of context for the dbt MCP</strong>
You’ll notice that these tools have two broad information inputs: dbt Cloud APIs and the dbt CLI. We expect to continue to build on both, with dbt Cloud APIs serving as the abstraction of choice when it is desirable to operate off of a specific <a href="https://docs.getdbt.com/docs/dbt-cloud-environments" target="_blank" rel="noopener noreferrer">environment</a>.</p>
<p>There will be other use cases, particularly for dbt development, where you’ll want to operate off of your current working context. We’ll be releasing tooling for that in the near future (and welcome Community-submitted ideas and contributions). We’re eager to try out alternative methods here and to hear from the Community how you would like this context loaded in. Please feel free to experiment and share your findings with us.</p>
<p><strong>Determining the most useful tools for the dbt MCP</strong></p>
<p>What are the best and most useful set of tools to enable human in the loop and AI driven LLM access to structured data? The dbt MCP server presents our early explorations, but we anticipate that the Community will find many more.</p>
<p><strong>How to handle hosting, authentication, RBAC and more</strong></p>
<p>Currently the dbt MCP server is locally hosted, with access managed via scoped service tokens from dbt Cloud or configured locally via your CLI. We expect to continue building out systems at three levels to make this not only safe and secure, but also tailored to the needs of the specific user (human or agent) accessing the MCP server.</p>
<ol>
<li>Hosting of the MCP server: in the near future we will offer a Cloud-hosted version alongside the current local MCP server.</li>
<li>Managing data access with the MCP server: we are committed to offering safe and trustworthy access to data and data assets (think OAuth support and more).</li>
<li>User- and domain-level context: over the longer run we are looking into ways to provide user- and domain-specific knowledge about your data assets to these systems as they query them.</li>
</ol>
<p>Expect to hear more on this front on <a href="https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase" target="_blank" rel="noopener noreferrer">5/28</a>.</p>
<p>This is a new frontier for the whole Community. We need to be having open, honest discussions about how to integrate these systems into our existing workflows and open up new use cases.</p>
<p>To join the conversation, head over to #tools-dbt-mcp in the dbt Community Slack.</p>]]></content>
        <author>
            <name>Jason Ganz</name>
        </author>
        <category label="ai" term="ai"/>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Establishing dbt Cloud: Securing your account through SSO & RBAC]]></title>
        <id>https://docs.getdbt.com/blog/dbt-cloud-sso-rbac</id>
        <link href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac"/>
        <updated>2025-04-17T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How to configure dbt Cloud with SSO & RBAC]]></summary>
        <content type="html"><![CDATA[<p>As a dbt Cloud admin, you’ve just upgraded to dbt Cloud on the <a href="https://www.getdbt.com/pricing" target="_blank" rel="noopener noreferrer">Enterprise plan</a> - <strong>congrats</strong>! dbt Cloud has a lot to offer such as <a href="https://docs.getdbt.com/docs/deploy/about-ci">CI/CD</a>, <a href="https://docs.getdbt.com/docs/deploy/deployments">Orchestration</a>, <a href="https://docs.getdbt.com/docs/explore/explore-projects">dbt Explorer</a>, <a href="https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl">dbt Semantic Layer</a>, <a href="https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro">dbt Mesh</a>, <a href="https://docs.getdbt.com/docs/cloud/canvas">Visual Editor</a>, <a href="https://docs.getdbt.com/docs/cloud/dbt-copilot">dbt Copilot</a>, and so much more. <em><strong>But where should you begin?</strong></em></p>
<p>We strongly recommend that, as you start adopting dbt Cloud functionality, you make it a priority to set up Single Sign-On (SSO) and Role-Based Access Control (RBAC). This foundational step enables your organization to keep your data pipelines secure, onboard users into dbt Cloud with ease, and optimize cost savings for the long term.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="authentication-vs-authorization">Authentication vs. Authorization<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#authentication-vs-authorization" class="hash-link" aria-label="Direct link to Authentication vs. Authorization" title="Direct link to Authentication vs. Authorization">​</a></h2>
<p>Before we dig into SSO, RBAC, and more — let’s go over how they map into two foundational security concepts.</p>
<ul>
<li><strong>Authentication:</strong> <a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#single-sign-on-sso">SSO</a> is configured to gate authentication - it verifies (via an IdP) that users are who they say they are and can log into the specified dbt Cloud account.</li>
<li><strong>Authorization:</strong> <a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#role-based-access-control-via-idp">RBAC</a> is an authorization model - it controls what users can see and do within dbt Cloud based on their assigned licenses, groups, and permission sets.</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="single-sign-on-sso">Single Sign-On (SSO)<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#single-sign-on-sso" class="hash-link" aria-label="Direct link to Single Sign-On (SSO)" title="Direct link to Single Sign-On (SSO)">​</a></h2>
<p>Your SSO configuration steps will depend on your IdP, so we encourage you to start at our <a href="https://docs.getdbt.com/docs/cloud/manage-access/sso-overview">SSO Overview</a> page and find the doc under that section specific to your IdP.</p>
<p>Regardless of what IdP you use, one of the first things you should do as a dbt Cloud admin is set the <strong>login slug</strong> value. This should be a <em>unique company identifier</em>.</p>
<p>Keep in mind that whatever value you set will be appended to the end of the SSO login URL your users use to sign into dbt Cloud. For example:</p>
<ul>
<li>If I set my login slug to <code>mynewco</code></li>
<li>My SSO login URL will look something like <code>https://cloud.getdbt.com/enterprise-login/mynewco</code>.</li>
</ul>
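<p>As a quick illustration, the login URL is just your slug appended to a fixed path. The Python helper below is hypothetical (dbt Cloud doesn’t ship such a function), and the host may differ if your account is on a region-specific or single-tenant instance:</p>

```python
def sso_login_url(slug: str, host: str = "cloud.getdbt.com") -> str:
    """Build the SP-initiated SSO login URL for a dbt Cloud login slug.

    Hypothetical helper for illustration only; `host` varies for
    region-specific or single-tenant dbt Cloud instances.
    """
    return f"https://{host}/enterprise-login/{slug}"

print(sso_login_url("mynewco"))
# https://cloud.getdbt.com/enterprise-login/mynewco
```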
<p>At first glance, this screen has a lot of info and fields — but with the <a href="https://docs.getdbt.com/docs/cloud/manage-access/sso-overview">SSO docs</a> in hand, dbt Cloud admins are ready to start setting up smooth, scalable workflows.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#" data-featherlight="/img/blog/2025-04-10-sso-and-rbac/1_sso_config.png"><img data-toggle="lightbox" alt="dbt Cloud's SSO configuration page" title="dbt Cloud's SSO configuration page" src="https://docs.getdbt.com/img/blog/2025-04-10-sso-and-rbac/1_sso_config.png?v=2"></a></span><span class="title_aGrV">dbt Cloud's SSO configuration page</span></div>
<p>Let’s break this down at a high level to make it more digestible:</p>
<ol>
<li>After setting the desired login slug, a <em>dbt Cloud admin</em> will go to the dbt Cloud SSO configuration page and copy/paste everything under the <strong>Identity provider values</strong> section and will share the values with the <em>IdP admin</em>.</li>
<li>The <em>IdP admin</em> will create a <a href="https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-saml-2.0#creating-the-application">dbt Cloud app</a> and then provide the values under the <strong>dbt configuration</strong> section to the <em>dbt Cloud admin</em>.<!-- -->
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>Refer to the appropriate setup docs for <a href="https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-google-workspace">Google Workspace</a>, <a href="https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-okta">Okta</a>, <a href="https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-microsoft-entra-id">Microsoft Entra ID</a>, or <a href="https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-saml-2.0">SAML 2.0</a>.</p></div></div>
</li>
<li>The <em>dbt Cloud admin</em> will fill in those values into the SSO configuration page under the <strong>dbt configuration</strong> section and click <strong>Save</strong> to complete the process.</li>
</ol>
<p>After completing this process:</p>
<ul>
<li>We <em>strongly</em> advise validating that the SSO flow works: paste the SSO login URL (it should look like <code>https://cloud.getdbt.com/enterprise-login/dbtlabs</code>) into a private browser window and try to log into your account via the IdP.</li>
<li>If the SSO flow isn’t working as expected, an account admin will still be able to log in with a password to correct the configuration.</li>
</ul>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>Be aware of our <a href="https://docs.getdbt.com/docs/cloud/manage-access/sso-overview#sso-enforcement" target="_blank" rel="noopener noreferrer">SSO enforcement policy</a> — once SSO is configured, all non-admin users will have to log in via SSO as a security best practice, while account admins, by default, can still authenticate with a password in lieu of <a href="https://docs.getdbt.com/docs/cloud/manage-access/mfa">multi-factor authentication (MFA)</a>.</p></div></div>
<p>Once you've set up SSO successfully, you have additional ways to onboard your users into dbt Cloud on top of sending out an email invite:</p>
<ul>
<li>Provide users the SSO login URL to access dbt Cloud. This is also known as the <em>SP-initiated flow</em> (SP stands for Service Provider; in this case, it would be dbt Cloud).</li>
<li>Provision the dbt Cloud app for users to access on their IdP’s dashboard. This is also known as the <em>IdP-initiated flow</em>.</li>
</ul>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#" data-featherlight="/img/blog/2025-04-10-sso-and-rbac/2_sso_flows.png"><img data-toggle="lightbox" alt="SSO flows into dbt Cloud" title="SSO flows into dbt Cloud" src="https://docs.getdbt.com/img/blog/2025-04-10-sso-and-rbac/2_sso_flows.png?v=2"></a></span><span class="title_aGrV">SSO flows into dbt Cloud</span></div>
<p>Get stuck setting up SSO? <a href="mailto:support@getdbt.com" target="_blank" rel="noopener noreferrer">Open a support ticket</a>, and one of our Customer Solutions Engineers will be happy to help you!</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="licenses-and-groups">Licenses and Groups<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#licenses-and-groups" class="hash-link" aria-label="Direct link to Licenses and Groups" title="Direct link to Licenses and Groups">​</a></h2>
<p>In dbt Cloud, there are two main levers to control user access:</p>
<ul>
<li><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#licenses">Licenses</a></li>
<li><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#groups">Groups</a></li>
</ul>
<p>As a prerequisite, both of these should be set <em>before</em> configuring RBAC. Let’s get into them!</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="licenses">Licenses<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#licenses" class="hash-link" aria-label="Direct link to Licenses" title="Direct link to Licenses">​</a></h3>
<p>There are three <a href="https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users">license types</a> in dbt Cloud:</p>
<ul>
<li><strong>Developer:</strong> User can be granted&nbsp;<em>any</em>&nbsp;permissions.</li>
<li><strong>Read-Only:</strong> User has read-only permissions applied to all dbt Cloud resources regardless of the role-based permissions that the user is assigned.</li>
<li><strong>IT:</strong> User has Security Admin and Billing Admin&nbsp;permissions&nbsp;applied, regardless of the group permissions assigned.</li>
</ul>
<p>Odds are that the majority of your users will be developers or analysts who’ll need Developer licenses. You can assign default licenses to users based on the groups that they’re in on the IdP side under <strong>Account Settings</strong> --&gt; <strong>Groups &amp; Licenses</strong> --&gt; <strong>License mappings</strong>.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#" data-featherlight="/img/blog/2025-04-10-sso-and-rbac/3_license_mapping_example.png"><img data-toggle="lightbox" alt="An example license mapping" title="An example license mapping" src="https://docs.getdbt.com/img/blog/2025-04-10-sso-and-rbac/3_license_mapping_example.png?v=2"></a></span><span class="title_aGrV">An example license mapping</span></div>
<p>If a user is in multiple groups with different license types assigned, they will be granted the highest license type — Developer.</p>
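<p>In other words, license resolution takes the maximum over a precedence order. Here’s a minimal sketch of that rule. Note that the relative ranking of <code>IT</code> and <code>Read-Only</code> below is an assumption for illustration; the only documented rule is that Developer is the highest:</p>

```python
# Illustrative sketch of how a user's effective license could be resolved
# across multiple group mappings. Only "Developer is the highest license
# type" comes from the docs; the IT vs. Read-Only ordering is assumed here.
LICENSE_RANK = {"Read-Only": 0, "IT": 1, "Developer": 2}

def effective_license(licenses_from_groups):
    """Return the highest-ranked license among a user's group mappings."""
    return max(licenses_from_groups, key=LICENSE_RANK.__getitem__)

print(effective_license(["Read-Only", "Developer"]))  # Developer
```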
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="groups">Groups<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#groups" class="hash-link" aria-label="Direct link to Groups" title="Direct link to Groups">​</a></h3>
<p>Groups are used to manage permissions. They define what a user can see and do across projects and environments. We recommend reviewing our <a href="https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions" target="_blank" rel="noopener noreferrer">available permissions sets</a> and determining which are applicable to your dbt Cloud user base.</p>
<p>Keep in mind that group permissions are additive — if a user belongs to multiple groups, they’ll inherit all of the permissions assigned to each of those groups.</p>
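<p>Conceptually, additive permissions behave like a set union across the user’s groups. A minimal sketch (the group and permission names here are made up for illustration):</p>

```python
def effective_permissions(group_permissions, memberships):
    """Union the permission sets of every group a user belongs to."""
    perms = set()
    for group in memberships:
        perms |= group_permissions.get(group, set())
    return perms

# Hypothetical groups and permission names, for illustration only
group_permissions = {
    "analysts": {"view-metadata", "develop"},
    "job-admins": {"manage-jobs"},
}
print(sorted(effective_permissions(group_permissions, ["analysts", "job-admins"])))
# ['develop', 'manage-jobs', 'view-metadata']
```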
<p>Navigating to the Groups &amp; Licenses page in dbt Cloud, you’ll see three default groups — Everyone, Member, and Owner. There’s also an option to create your own groups in the top right.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#" data-featherlight="/img/blog/2025-04-10-sso-and-rbac/4_default_dbt_cloud_groups.png"><img data-toggle="lightbox" alt="The out-of-the-box dbt Cloud groups you may use" title="The out-of-the-box dbt Cloud groups you may use" src="https://docs.getdbt.com/img/blog/2025-04-10-sso-and-rbac/4_default_dbt_cloud_groups.png?v=2"></a></span><span class="title_aGrV">The out-of-the-box dbt Cloud groups you may use</span></div>
<p>Here’s a brief primer on the default groups:</p>
<ul>
<li><strong>Owner:</strong> This group is for individuals responsible for the entire account and will give them elevated account admin privileges. You cannot change the permissions.</li>
<li><strong>Member:</strong> This group is for the general members of your organization, who will also have full developer access to the account. You cannot change the permissions. By default, dbt Cloud adds new users to this group.</li>
<li><strong>Everyone:</strong> A general group for all members of your organization. Customize the permissions to fit your organizational needs. By default, dbt Cloud adds new users to this group and only grants user access to their personal profile.</li>
</ul>
<p>While we recommend creating your own groups and deleting the defaults to better tailor access to your business’s needs, you should only delete the defaults <em>after</em> your own groups have been created and permission sets have been associated with them. These default groups are a means of getting users started in dbt Cloud. To sum up what they do: the Owner group gives users full account admin access, while the Everyone and Member groups give users full developer access.</p>
<p>To help get you started, these are the main permission sets that should be assigned to most users:</p>
<table><thead><tr><th><strong>User persona</strong></th><th><strong>Permission set</strong></th></tr></thead><tbody><tr><td>dbt Cloud Admin</td><td>Account Admin</td></tr><tr><td>dbt Developer</td><td>Developer</td></tr><tr><td>dbt Analyst</td><td>Analyst</td></tr></tbody></table>
<p>You can also use groups to control which projects and environments users can access.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#" data-featherlight="/img/blog/2025-04-10-sso-and-rbac/5_new_dbt_cloud_group.png"><img data-toggle="lightbox" alt="Creating a new dbt Cloud group" title="Creating a new dbt Cloud group" src="https://docs.getdbt.com/img/blog/2025-04-10-sso-and-rbac/5_new_dbt_cloud_group.png?v=2"></a></span><span class="title_aGrV">Creating a new dbt Cloud group</span></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="role-based-access-control-via-idp">Role-Based Access Control via IdP<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#role-based-access-control-via-idp" class="hash-link" aria-label="Direct link to Role-Based Access Control via IdP" title="Direct link to Role-Based Access Control via IdP">​</a></h2>
<p>If you made it this far, thanks for staying with me! We’re now ready to configure RBAC, which assigns users to the right groups (and effectively the right permission sets) after they authenticate into dbt Cloud. This hinges on the <em>SSO group mapping(s)</em> you’ll find within a group.</p>
<p>As an example, let’s say I want specific users placed in this group, whose SSO group mapping is <code>dbt-developer</code>. Note that you can also specify more than one mapping.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#" data-featherlight="/img/blog/2025-04-10-sso-and-rbac/6_sso_group_mapping.png"><img data-toggle="lightbox" alt="Configuring a SSO group mapping within a group" title="Configuring a SSO group mapping within a group" src="https://docs.getdbt.com/img/blog/2025-04-10-sso-and-rbac/6_sso_group_mapping.png?v=2"></a></span><span class="title_aGrV">Configuring a SSO group mapping within a group</span></div>
<p>Here’s what we do to make it happen:</p>
<ol>
<li>Have your IdP admin create a <code>dbt-developer</code> group in the IdP.</li>
<li>Assign users who should be in the dbt Cloud group to that IdP group.</li>
<li>Have users sign into dbt Cloud to confirm they get assigned to that group.</li>
</ol>
<p>Easy enough, right? Just make sure these two conditions are met for RBAC to work properly between your IdP and dbt Cloud:</p>
<ul>
<li>Group names must match exactly</li>
<li>Group name casing must be identical</li>
</ul>
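<p>Because matching is exact and case-sensitive, it behaves like plain string equality with no trimming or case folding. A quick sketch (the group names are hypothetical):</p>

```python
def matched_mappings(idp_groups, sso_group_mappings):
    """Return the IdP groups that match a dbt Cloud SSO group mapping.

    Matching is plain, case-sensitive string equality: no trimming
    and no case folding.
    """
    mappings = set(sso_group_mappings)
    return [group for group in idp_groups if group in mappings]

print(matched_mappings(["dbt-developer", "DBT-Developer"], ["dbt-developer"]))
# ['dbt-developer']  -- the casing variant does not match
```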
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#" data-featherlight="/img/blog/2025-04-10-sso-and-rbac/7_okta_sso_group_mapping_example.png"><img data-toggle="lightbox" alt="Making an SSO group mapping work with your identity provider" title="Making an SSO group mapping work with your identity provider" src="https://docs.getdbt.com/img/blog/2025-04-10-sso-and-rbac/7_okta_sso_group_mapping_example.png?v=2"></a></span><span class="title_aGrV">Making an SSO group mapping work with your identity provider</span></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="automate-sso--rbac-introducing-scim">Automate SSO &amp; RBAC: Introducing SCIM<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#automate-sso--rbac-introducing-scim" class="hash-link" aria-label="Direct link to Automate SSO &amp; RBAC: Introducing SCIM" title="Direct link to Automate SSO &amp; RBAC: Introducing SCIM">​</a></h2>
<p>We have exciting news — <a href="https://docs.getdbt.com/docs/cloud/manage-access/scim">System for Cross-Domain Identity Management (SCIM)</a> support will be generally available in May 2025 (for SCIM-compliant IdPs &amp; Okta)! If you’re unfamiliar with SCIM, you can think of it as automated user provisioning in dbt Cloud. It makes user data more secure and simplifies the admin and user experience by automating the user identity and group lifecycle.</p>
<p>Here’s why you should care about SCIM as a dbt Cloud admin:</p>
<ol>
<li><strong>Improved admin and end-user experience</strong> — By automating user onboarding and offboarding, SCIM saves time for dbt Cloud admins who manage multiple users on a weekly basis. If a user is added or removed in the IdP, their license and user account are automatically added to or removed from dbt Cloud.</li>
<li><strong>Simplified RBAC with group management</strong> — Admins can simplify access control management by using SCIM to update group membership. Currently, SSO group mapping enables admins to add new users to groups when they are JIT provisioned. SCIM would build on that functionality to allow group management not only for new users but also for existing users.</li>
</ol>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="closing-thoughts">Closing thoughts<a href="https://docs.getdbt.com/blog/dbt-cloud-sso-rbac#closing-thoughts" class="hash-link" aria-label="Direct link to Closing thoughts" title="Direct link to Closing thoughts">​</a></h2>
<p>Securing your account through SSO and RBAC should be one of your first priorities after getting on the Enterprise plan.</p>
<p>Not only does it keep your data safe, it also allows you to onboard users into your account at scale. While this may be just the beginning of your dbt Cloud journey, putting in the work to check off this crucial step ensures that users are leveraging dbt responsibly at an enterprise-grade level!</p>
        <author>
            <name>Brian Jan</name>
        </author>
        <category label="dbt tutorials" term="dbt tutorials"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Getting Started with git Branching Strategies and dbt]]></title>
        <id>https://docs.getdbt.com/blog/git-branching-strategies-with-dbt</id>
        <link href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt"/>
        <updated>2025-03-10T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How to configure dbt Cloud with common git strategies]]></summary>
        <content type="html"><![CDATA[<p>Hi! We’re Christine and Carol, Resident Architects at dbt Labs. Our day-to-day
work is all about helping teams reach their technical and business-driven goals.
Collaborating with a broad spectrum of customers ranging from scrappy startups
to massive enterprises, we’ve gained valuable experience guiding teams to
implement architecture which addresses their major pain points.</p>
<p>The information we’re about to share isn't just from our own experiences - we
frequently collaborate with other experts like Taylor Dunlap and Steve Dowling,
who have contributed greatly to this guidance. They serve as the critical bridge
between implementation and business outcomes, ultimately helping teams align on
a comprehensive technical vision by identifying problems and solutions.</p>
<p><strong>Why are we here?</strong><br>
<!-- -->We help teams with dbt architecture, which encompasses the tools, processes and
configurations used to start developing and deploying with dbt. There’s a lot of
decision making that happens behind the scenes to standardize on these pieces -
much of which is informed by understanding what we want the development workflow
to look like. The focus on having the <em><strong>perfect</strong></em> workflow often gets teams
stuck in heaps of planning and endless conversations, which slows or even
stops development momentum. If this sounds familiar, we hope our guidance gives
you the confidence to take steps to unblock development - even when you don’t
have everything figured out yet!</p>
<p>There are three major tools that play an important role in dbt development:</p>
<ul>
<li><strong>A repository</strong><br>
<!-- -->Contains the code we want to change or deploy, along with tools for change management processes.</li>
<li><strong>A data platform</strong><br>
<!-- -->Contains data for our inputs (loaded from other systems) and databases/schemas for our outputs, as well as permission management for data objects.</li>
<li><strong>A dbt project</strong><br>
<!-- -->Helps us manage development and deployment processes of our code to our data platform (and other cool stuff!)</li>
</ul>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/1_dbt_eco.png"><img data-toggle="lightbox" alt="dbt's relationship to git and the data platform" title="dbt's relationship to git and the data platform" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/1_dbt_eco.png?v=2"></a></span><span class="title_aGrV">dbt's relationship to git and the data platform</span></div>
<p>No matter how you end up <strong>defining</strong> your development workflow, these major steps are always present:</p>
<ul>
<li><strong>Development</strong>: How teams make and test changes to code</li>
<li><strong>Quality Assurance</strong>: How teams ensure changes work and produce expected outputs</li>
<li><strong>Promotion</strong>: How teams move changes to the next stage</li>
<li><strong>Deployment</strong>: How teams surface changes to others</li>
</ul>
<p>This article will be focusing mainly on the topic of git and your repository, how
code corresponds to populating your data platform, and the common dbt configurations
we implement to make this happen. We’ll also be pinning ourselves to the steps of
the development workflow throughout.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="why-we-should-focus-on-git">Why we should focus on git<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#why-we-should-focus-on-git" class="hash-link" aria-label="Direct link to Why we should focus on git" title="Direct link to Why we should focus on git">​</a></h2>
<p>Source control (and git in particular) is foundational to modern development with
or without dbt. It facilitates collaboration between teams of any size and makes
it easy to maintain oversight of the code changes in your project. Understanding
these controlled processes and what code looks like at each step makes
understanding how we need to configure our data platform and dbt much easier.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="️-how-to-just-get-started-️">⭐️ How to “just get started” ⭐️<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#%EF%B8%8F-how-to-just-get-started-%EF%B8%8F" class="hash-link" aria-label="Direct link to ⭐️ How to “just get started” ⭐️" title="Direct link to ⭐️ How to “just get started” ⭐️">​</a></h2>
<p>This article will be talking about git topics in depth — this will be helpful if
your team is familiar with some of the options and needs help considering
the tradeoffs. If you’re getting started for the first time and don’t have strong
opinions, <strong>we recommend starting with Direct Promotion</strong>.</p>
<p>Direct Promotion is the foundation of all git branching strategies, works well
with basic git knowledge, requires the least amount of provisioning, and can easily
evolve into another strategy if or when your team needs it. We understand this
recommendation can provoke some thoughts of “what if?”. <strong>We urge you to think
about starting with direct promotion like getting a suit tailored</strong>. Your
developers can wear it while you’re figuring out the adjustments, and this is a much
more informative step forward because it lets us see how the suit functions
<em>in motion —</em> the resulting adjustments can be starkly different from what we
thought we’d need when it was static.</p>
<p>The best part of ‘just getting started’
is that it’s not hard to change configurations in dbt for your git strategy
later on (and we'll cover this), so don’t think of this as a critical decision
that will result in months of broken development and re-configuration if you
don’t get it right immediately. Truly, changing your git strategy can be done in
a matter of minutes in dbt Cloud.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="branching-strategies">Branching strategies<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#branching-strategies" class="hash-link" aria-label="Direct link to Branching strategies" title="Direct link to Branching strategies">​</a></h2>
<p>Once a repository has its initial commit, it always starts with one default
branch, typically called <code>main</code> or <code>master</code> — we’ll be calling the
default branch <code>main</code> in our upcoming examples. The <code>main</code> branch is <em>always the
final destination where we aim to land our changes, and it most often
corresponds to the term "production"</em> - another term you'll see us use throughout.</p>
<p><em><strong>How we want our workflow to look getting our changes from development to
<code>main</code> is the big discussion</strong></em>. Our process needs to consider all the steps in our
workflow: development, quality assurance, promotion, and deployment.
<strong>Branching Strategies</strong> define what this process looks like. We at dbt are not
reinventing the wheel - a number of common strategies have already been defined,
implemented, iterated on, and tested for at least a decade.</p>
<p>There are two major strategies that encompass all forms of branching strategies:
<strong>Direct Promotion</strong> and <strong>Indirect Promotion</strong>. We’ll start by laying these two
out simply:</p>
<ul>
<li>What is the strategy?</li>
<li>How does the development workflow of the strategy look to a team?</li>
<li>Which <strong>repository branching rules and helpers</strong> help us in this strategy?</li>
<li>How do we commonly configure <strong>dbt Cloud</strong> for this strategy?</li>
<li>How do branches and dbt processes map to our <strong>data platform</strong> with this strategy?</li>
</ul>
<p>Then, we’ll end by comparing the strategies and covering some frequently asked questions.</p>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>Know before you go</div><div class="admonitionContent_BuS1"><p>There are <em>many</em> ways to configure each tool (especially dbt) to accomplish what you need. The upcoming
strategy details were deliberately written to provide what we think are the minimal standards
to get teams up and running quickly. These are starter configurations and practices which
are easy to tweak and adjust later on. Expanding on these configurations is
an exercise left to the reader!</p></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="direct-promotion">Direct promotion<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#direct-promotion" class="hash-link" aria-label="Direct link to Direct promotion" title="Direct link to Direct promotion">​</a></h2>
<p><strong>Direct promotion</strong> means we only keep one long-lived branch
in our repository — in our case, <code>main</code>. Here’s the workflow for this strategy:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/2_direct_git.png"><img data-toggle="lightbox" alt="Direct promotion branching strategy" title="Direct promotion branching strategy" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/2_direct_git.png?v=2"></a></span><span class="title_aGrV">Direct promotion branching strategy</span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-does-the-development-workflow-look-to-a-team">How does the development workflow look to a team?<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#how-does-the-development-workflow-look-to-a-team" class="hash-link" aria-label="Direct link to How does the development workflow look to a team?" title="Direct link to How does the development workflow look to a team?">​</a></h3>
<p>Layout:</p>
<ul>
<li><code>feature</code> is the developer’s unique branch where task-related changes happen</li>
<li><code>main</code> is the branch that contains our “production” version of code</li>
</ul>
<p>Workflow:</p>
<ul>
<li><strong>Development</strong>: I create a <code>feature</code> branch from <code>main</code> to make, test, and personally review changes</li>
<li><strong>Quality Assurance</strong>: I open a pull request comparing my <code>feature</code> against <code>main</code>, which is then reviewed by peers (required), stakeholders, or subject matter experts (SMEs). We highly recommend including stakeholders or SMEs for feedback during PR in this strategy because the next step changes <code>main</code>.</li>
<li><strong>Promotion</strong>: After all required approvals and checks, I merge my changes to <code>main</code></li>
<li><strong>Deployment</strong>: Others can see and use my changes in <code>main</code> after I merge and <code>main</code> is deployed</li>
</ul>
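<p>As a sketch, the four workflow steps above map onto plain git commands roughly like this. Repository, branch, and file names here are illustrative, the pull-request review itself happens in your git host rather than on the command line, and a local bare repository stands in for the hosted remote:</p>

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"

# A local bare repository stands in for the hosted remote (e.g. GitHub).
git init -q --bare origin.git
git clone -q origin.git repo
cd repo
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "initial production state"
git branch -M main
git push -q -u origin main

# Development: create a feature branch from main and commit a change.
git checkout -qb feature/add-orders-model main
echo "select 1 as id" > orders.sql
git add orders.sql
git commit -q -m "Add orders model"
git push -q -u origin feature/add-orders-model

# Quality Assurance: open a PR of feature/add-orders-model against main
# in the git host; reviews and CI checks happen there.

# Promotion: after approval, the PR merge lands the change on main
# (simulated here with a local merge).
git checkout -q main
git merge -q --no-ff feature/add-orders-model -m "Merge feature/add-orders-model"
git push -q origin main

# Deployment: a dbt Cloud job running against main now builds the new model.
```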
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="repository-branching-rules-and-helpers">Repository branching rules and helpers<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#repository-branching-rules-and-helpers" class="hash-link" aria-label="Direct link to Repository branching rules and helpers" title="Direct link to Repository branching rules and helpers">​</a></h3>
<p>At a minimum, we like to set up:</p>
<ul>
<li><strong>Branch protection</strong> on <code>main</code> (<a href="https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches" target="_blank" rel="noopener noreferrer">like these settings for GitHub</a>), requiring:<!-- -->
<ul>
<li>a pull request (no direct commits to <code>main</code>)</li>
<li>pull requests must have at least 1 reviewer's approval</li>
</ul>
</li>
<li><strong>A PR template</strong> (<a href="https://docs.getdbt.com/blog/analytics-pull-request-template" target="_blank" rel="noopener noreferrer">such as our boiler-plate PR template</a>) for <code>feature</code> PRs against <code>main</code></li>
</ul>
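<p>Branch protection on the git host is the authoritative control, but some teams also add a local <code>pre-push</code> hook as a safety net against an accidental <code>git push origin main</code> from a laptop. Here is a minimal sketch; the hook body between the <code>HOOK</code> markers is the reusable part, and the surrounding commands just set up a throwaway repo to demonstrate it:</p>

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
git init -q --bare origin.git
git clone -q origin.git repo
cd repo
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "initial"
git branch -M main

# The hook: refuse direct pushes to protected branches.
cat > .git/hooks/pre-push <<'HOOK'
#!/bin/sh
protected="main"
while read -r local_ref local_sha remote_ref remote_sha; do
  branch="${remote_ref#refs/heads/}"
  for p in $protected; do
    if [ "$branch" = "$p" ]; then
      echo "pre-push: direct pushes to '$p' are blocked; open a pull request" >&2
      exit 1
    fi
  done
done
exit 0
HOOK
chmod +x .git/hooks/pre-push

# A direct push to main is rejected by the hook...
if git push -q origin main 2>/dev/null; then
  echo "push to main allowed"
else
  echo "push to main blocked"
fi

# ...while feature branches push normally.
git checkout -qb feature/example
git push -q -u origin feature/example
```

<p>This is only a convenience for the person pushing; the server-side protection settings linked above remain the rule that actually gets enforced.</p>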
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="dbt-cloud-processes-and-environments">dbt Cloud processes and environments<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#dbt-cloud-processes-and-environments" class="hash-link" aria-label="Direct link to dbt Cloud processes and environments" title="Direct link to dbt Cloud processes and environments">​</a></h3>
<p>Here’s our branching strategy again, but now with the dbt Cloud processes we want to incorporate:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/3_direct_dbt_deployment.png"><img data-toggle="lightbox" alt="Direct Promotion strategy with dbt cloud processes denoted" title="Direct Promotion strategy with dbt cloud processes denoted" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/3_direct_dbt_deployment.png?v=2"></a></span><span class="title_aGrV">Direct Promotion strategy with dbt cloud processes denoted</span></div>
<p>In order to create the jobs in our diagram, we need dbt Cloud environments. Here are the common configurations for this setup:</p>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>Environment Name</th><th><a href="https://docs.getdbt.com/docs/dbt-cloud-environments#types-of-environments" target="_blank" rel="noopener noreferrer">Environment Type</a></th><th><a href="https://docs.getdbt.com/docs/deploy/deploy-environments#staging-environment" target="_blank" rel="noopener noreferrer">Deployment Type</a></th><th>Base Branch</th><th>Will handle…</th></tr></thead><tbody><tr><td>Development</td><td>development</td><td>-</td><td><code>main</code></td><td>Operations done in the IDE (including creating feature branches)</td></tr><tr><td>Continuous Integration</td><td>deployment</td><td>General</td><td><code>main</code></td><td>A continuous integration job</td></tr><tr><td>Production</td><td>deployment</td><td>Production</td><td><code>main</code></td><td>A deployment job</td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="data-platform-organization">Data platform organization<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#data-platform-organization" class="hash-link" aria-label="Direct link to Data platform organization" title="Direct link to Data platform organization">​</a></h3>
<p>Now we need to focus on where we want to build things in our data platform. For that,
we need to set our <strong>database</strong> and <strong>schema</strong> settings on the environments.
Here’s our diagram again, but now mapping how we want our objects to populate
from our branches to our data platform:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/4_direct_data_population.png"><img data-toggle="lightbox" alt="Direct Promotion strategy with branch relations to data platform objects" title="Direct Promotion strategy with branch relations to data platform objects" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/4_direct_data_population.png?v=2"></a></span><span class="title_aGrV">Direct Promotion strategy with branch relations to data platform objects</span></div>
<p>Taking the table we created previously for our dbt Cloud environment, let's further
map environment configurations to our data platform:</p>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>Environment Name</th><th><strong>Database</strong></th><th><strong>Schema</strong></th></tr></thead><tbody><tr><td>Development</td><td><code>development</code></td><td>User-specified in Profile Settings &gt; Credentials</td></tr><tr><td>Continuous Integration</td><td><code>development</code></td><td>Any safe default, like <code>dev_ci</code> (it doesn’t even have to exist). The job we intend to set up will override the schema here anyway to denote the unique PR.</td></tr><tr><td>Production</td><td><code>production</code></td><td><code>analytics</code></td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>We are showing environment configurations here, but a default database will be set at the highest level in a <strong><a href="https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections" target="_blank" rel="noopener noreferrer">connection</a></strong> (which is a required setting of an environment). <em>Deployment</em> environments can override a connection's database setting when needed.</p></div></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="direct-promotion-example">Direct promotion example<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#direct-promotion-example" class="hash-link" aria-label="Direct link to Direct promotion example" title="Direct link to Direct promotion example">​</a></h3>
<p><em>In this example, Steve uses the term “QA” for the environment that builds the changed code from feature branch pull requests. This is equivalent to our ‘Continuous Integration’ environment — a great example of choosing names that make the most sense for your team!</em></p>
<div style="margin:40px 10px"><div class="loomWrapper_TTvb"><iframe width="640" class="loomFrame_B61a" height="400" src="https://www.loom.com/embed/59c71a9549b5497f99ef86622aad945e" frameborder="0" allowfullscreen="" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="indirect-promotion">Indirect promotion<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#indirect-promotion" class="hash-link" aria-label="Direct link to Indirect promotion" title="Direct link to Indirect promotion">​</a></h2>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>A note about Indirect Promotion</div><div class="admonitionContent_BuS1"><p>Indirect Promotion introduces more steps of ownership, so this branching strategy
works best when you can identify people who have a great understanding of git to
handle branch management. Additionally, the <em><strong>time from development to production
is lengthier</strong></em> due to the workload of these new steps, so it requires good
project management. We expand more on this later, but it’s an important call out
as this is where we see unprepared teams struggle most.</p></div></div>
<p><strong>Indirect promotion</strong> adds other long-lived branches that derive from <code>main</code>.
The simplest version of indirect promotion is a two-trunk <em>hierarchical</em> structure
— this is the one we see implemented most commonly in indirect workflows.</p>
<p><em>Hierarchical promotion</em> means promoting changes back along the same path the branches were derived. Example:</p>
<ul>
<li>a middle branch is derived from <code>main</code></li>
<li>feature branches derive from the middle branch</li>
<li>feature branches merge back to the middle branch</li>
<li>the middle branch merges back to <code>main</code></li>
</ul>
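<p>Sketched with plain git, the four steps above look like the following. Branch and file names are illustrative, and in practice the merges into the middle branch and <code>main</code> happen through pull requests rather than local merges:</p>

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
git init -q repo
cd repo
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "production baseline"
git branch -M main

# 1. Derive the middle branch from main.
git checkout -qb qa main

# 2. Derive a feature branch from the middle branch and land a change.
git checkout -qb feature/new-model qa
echo "select 1 as id" > new_model.sql
git add new_model.sql
git commit -q -m "Add new model"

# 3. Merge the feature back to the middle branch (the feature PR).
git checkout -q qa
git merge -q --no-ff feature/new-model -m "Merge feature/new-model into qa"

# 4. Merge the middle branch back to main (the release PR).
git checkout -q main
git merge -q --no-ff qa -m "Release: merge qa into main"
```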
<p>Some common names for a middle branch as seen in the wild are:</p>
<ul>
<li><code>qa</code>: Quality Assurance</li>
<li><code>uat</code>: User Acceptance Testing</li>
<li><code>staging</code> or <code>preprod</code>: Common software development terminology</li>
</ul>
<p>We’ll be calling our middle branch <code>qa</code> throughout the rest of this article.</p>
<p>Here’s the workflow for this strategy:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/6_indirect_git.png"><img data-toggle="lightbox" alt="Indirect Promotion branching strategy" title="Indirect Promotion branching strategy" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/6_indirect_git.png?v=2"></a></span><span class="title_aGrV">Indirect Promotion branching strategy</span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-does-the-development-workflow-look-to-a-developer">How does the development workflow look to a developer?<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#how-does-the-development-workflow-look-to-a-developer" class="hash-link" aria-label="Direct link to How does the development workflow look to a developer?" title="Direct link to How does the development workflow look to a developer?">​</a></h3>
<p>Changes from our direct promotion workflow are highlighted in <mark style="background-color:#d6eaf8">blue</mark>.</p>
<p>Layout:</p>
<ul>
<li><code>feature</code> is the developer’s unique branch where task-related changes happen</li>
<li>
<mark style="background-color:#d6eaf8"><code style="background-color:#aed6f1">qa</code> contains approved changes from developers’ <code style="background-color:#aed6f1">feature</code> branches, which will be merged to main and enter production together once additional testing is complete.<code style="background-color:#aed6f1">qa</code> is always ahead of <code style="background-color:#aed6f1">main</code> in changes.</mark>
</li>
<li><code>main</code> is the branch that contains our “production” version of code</li>
</ul>
<p>Workflow:</p>
<ul>
<li><strong>Development</strong>: I create a <code>feature</code> branch from <mark style="background-color:#d6eaf8"><code>qa</code></mark> to make, test, and personally review changes</li>
<li><strong>Quality Assurance:</strong> I open a pull request comparing my <code>feature</code> branch to <mark style="background-color:#d6eaf8"><code>qa</code></mark>, which is then reviewed by peers and <mark style="background-color:#d6eaf8"><em>optionally</em></mark> subject matter experts or stakeholders</li>
<li><strong>Promotion</strong>: After all required approvals and checks, I can merge my changes to <mark style="background-color:#d6eaf8"><code>qa</code></mark></li>
<li>
<mark style="background-color:#d6eaf8"><strong>Quality Assurance</strong>: SMEs or other stakeholders can review my changes in <code style="background-color:#aed6f1">qa</code> when I merge my <code style="background-color:#aed6f1">feature</code></mark>
</li>
<li>
<mark style="background-color:#d6eaf8"><strong>Promotion:</strong> Once QA specialists give their approval of <code style="background-color:#aed6f1">qa</code>’s version of data, a <strong>release manager</strong> opens a pull request using <code style="background-color:#aed6f1">qa</code>’s branch targeting <code style="background-color:#aed6f1">main</code> (we define this as a <strong>“release”</strong>)</mark>
</li>
<li><strong>Deployment</strong>: Others can see and use my changes (<mark style="background-color:#d6eaf8">and other’s changes</mark>) in <code>main</code> <mark style="background-color:#d6eaf8">after <code style="background-color:#aed6f1">qa</code> is merged to <code style="background-color:#aed6f1">main</code></mark> and <code>main</code> is deployed</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="repository-branching-rules-and-helpers-1">Repository branching rules and helpers<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#repository-branching-rules-and-helpers-1" class="hash-link" aria-label="Direct link to Repository branching rules and helpers" title="Direct link to Repository branching rules and helpers">​</a></h3>
<p>At a minimum, we like to set up:</p>
<ul>
<li><strong>Branch protection</strong> on <code>main</code> and <code>qa</code> (<a href="https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches" target="_blank" rel="noopener noreferrer">like these settings for GitHub</a>), requiring:<!-- -->
<ul>
<li>a pull request (no direct commits to <code>main</code> or <code>qa</code>)</li>
<li>pull requests must have at least 1 reviewer's approval</li>
</ul>
</li>
<li><strong>A PR template</strong> (<a href="https://docs.getdbt.com/blog/analytics-pull-request-template" target="_blank" rel="noopener noreferrer">such as our boiler-plate PR template</a>) for <code>feature</code> PRs against <code>qa</code></li>
<li><strong>A PR template</strong> (<a href="https://github.com/dbt-labs/dbt-proserv/blob/main/.github/release_pull_request_template.md" target="_blank" rel="noopener noreferrer">such as our boiler-plate PR template for releases</a>) for <code>qa</code> PRs against <code>main</code></li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="dbt-cloud-processes-and-environments-1">dbt Cloud processes and environments<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#dbt-cloud-processes-and-environments-1" class="hash-link" aria-label="Direct link to dbt Cloud processes and environments" title="Direct link to dbt Cloud processes and environments">​</a></h3>
<p>Here’s our branching strategy again, but now with the dbt Cloud processes we want to incorporate:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/7_indirect_dbt_deployment.png"><img data-toggle="lightbox" alt="Indirect Promotion strategy with dbt cloud processes denoted" title="Indirect Promotion strategy with dbt cloud processes denoted" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/7_indirect_dbt_deployment.png?v=2"></a></span><span class="title_aGrV">Indirect Promotion strategy with dbt cloud processes denoted</span></div>
<p>In order to create the jobs in our diagram, we need dbt Cloud environments. Here are the common configurations for this setup:</p>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>Environment Name</th><th><a href="https://docs.getdbt.com/docs/dbt-cloud-environments#types-of-environments" target="_blank" rel="noopener noreferrer">Environment Type</a></th><th><a href="https://docs.getdbt.com/docs/deploy/deploy-environments#staging-environment" target="_blank" rel="noopener noreferrer">Deployment Type</a></th><th>Base Branch</th><th>Will handle…</th></tr></thead><tbody><tr><td>Development</td><td>development</td><td>-</td><td><code>qa</code></td><td>Operations done in the IDE (including creating feature branches)</td></tr><tr><td>Feature CI</td><td>deployment</td><td>General</td><td><code>qa</code></td><td>A continuous integration job</td></tr><tr><td>Quality Assurance</td><td>deployment</td><td>Staging</td><td><code>qa</code></td><td>A deployment job</td></tr><tr><td>Release CI</td><td>deployment</td><td>General</td><td><code>main</code></td><td>A continuous integration job</td></tr><tr><td>Production</td><td>deployment</td><td>Production</td><td><code>main</code></td><td>A deployment job</td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="data-platform-organization-1">Data platform organization<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#data-platform-organization-1" class="hash-link" aria-label="Direct link to Data platform organization" title="Direct link to Data platform organization">​</a></h3>
<p>Now we need to focus on where we want to build things in our data platform. For that,
we need to set our <strong>database</strong> and <strong>schema</strong> settings on the environments.
There are two common setups for mapping code, but before we get into those,
remember this note from direct promotion:</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>We are showing environment configurations here, but a default database will be set at the highest level in a <strong><a href="https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections" target="_blank" rel="noopener noreferrer">connection</a></strong> (which is a required setting of an environment). <em>Deployment</em> environments can override a connection's database setting when needed.</p></div></div>
<ul>
<li>
<p><strong>Configuration 1</strong>: A 1:1 mapping of <code>qa</code> and <code>main</code> assets</p>
<p>In this pattern, the CI schemas are populated in a database <em>outside</em> of Production and QA. This is usually done to keep the databases aligned with what’s been merged on their corresponding branches.</p>
<p>Here’s our diagram, now mapping to the data platform with this pattern:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/8_indirect_data_population.png"><img data-toggle="lightbox" alt="Indirect Promotion branches and how they relate to 1\:1 organization in the data platform" title="Indirect Promotion branches and how they relate to 1\:1 organization in the data platform" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/8_indirect_data_population.png?v=2"></a></span><span class="title_aGrV">Indirect Promotion branches and how they relate to 1\:1 organization in the data platform</span></div>
<p>Here are our configurations for this pattern:</p>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>Environment Name</th><th><strong>Database</strong></th><th><strong>Schema</strong></th></tr></thead><tbody><tr><td>Development</td><td><code>development</code></td><td>User-specified in Profile Settings &gt; Credentials</td></tr><tr><td>Feature CI</td><td><code>development</code></td><td>Any safe default, like <code>dev_ci</code> (it doesn’t even have to exist). The job we intend to set up will override the schema here anyway to denote the unique PR.</td></tr><tr><td>Quality Assurance</td><td><code>qa</code></td><td><code>analytics</code></td></tr><tr><td>Release CI</td><td><code>development</code></td><td>A safe default</td></tr><tr><td>Production</td><td><code>production</code></td><td><code>analytics</code></td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
</li>
<li>
<p><strong>Configuration 2</strong>: A reflection of the workflow initiative</p>
<p>In this pattern, the CI schemas populate in a <code>qa</code> database because it’s a step in quality assurance.
Here’s our diagram, now mapping to the data platform with this pattern:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/9_alt_indirect_data_population.png"><img data-toggle="lightbox" alt="Indirect Promotion branches and how they relate to workflow initiative organization in the data platform" title="Indirect Promotion branches and how they relate to workflow initiative organization in the data platform" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/9_alt_indirect_data_population.png?v=2"></a></span><span class="title_aGrV">Indirect Promotion branches and how they relate to workflow initiative organization in the data platform</span></div>
<p>Here are our configurations for this pattern:</p>
<div class="filterableTableContainer_mhtg"><table style="display:none"><thead><tr><th>Environment Name</th><th><strong>Database</strong></th><th><strong>Schema</strong></th></tr></thead><tbody><tr><td>Development</td><td><code>development</code></td><td>User-specified in Profile Settings &gt; Credentials</td></tr><tr><td>Feature CI</td><td><code>qa</code></td><td>Any safe default, like <code>dev_ci</code> (it doesn’t even have to exist). The job we intend to set up will override the schema here anyway to denote the unique PR.</td></tr><tr><td>Quality Assurance</td><td><code>qa</code></td><td><code>analytics</code></td></tr><tr><td>Release CI</td><td><code>qa</code></td><td>A safe default</td></tr><tr><td>Production</td><td><code>production</code></td><td><code>analytics</code></td></tr></tbody></table><div class="tableWrapper_oiMt"><div class="searchBar_xnmH"><div class="searchContainer_fLyJ"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M416 208c0 45.9-14.9 88.3-40 122.7L502.6 457.4c12.5 12.5 12.5 32.8 0 45.3s-32.8 12.5-45.3 0L330.7 376c-34.4 25.2-76.8 40-122.7 40C93.1 416 0 322.9 0 208S93.1 0 208 0S416 93.1 416 208zM208 352a144 144 0 1 0 0-288 144 144 0 1 0 0 288z"></path></svg><input type="text" placeholder="Search table..." class="searchInput_xT8h" aria-label="Search table" value=""></div></div><table class="filterableTable_QAKT"><thead><tr></tr></thead><tbody><tr><td colspan="5" style="text-align:center;padding:20px">Loading table...</td></tr></tbody></table></div></div>
</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="indirect-promotion-example">Indirect promotion example<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#indirect-promotion-example" class="hash-link" aria-label="Direct link to Indirect promotion example" title="Direct link to Indirect promotion example">​</a></h3>
<p><em>In this example, Steve uses the term “UAT” to define the automatic deployment of the middle branch and “QA” to define what’s built from feature branch pull requests. He also defines a database for each (with four databases total - one for development schemas, one for CI schemas, one for middle branch deployments, and one for production deployments) — we wanted to show you this example as it speaks to how configurable these processes are apart from our standard examples.</em></p>
<div style="margin:40px 10px"><div class="loomWrapper_TTvb"><iframe width="640" class="loomFrame_B61a" height="400" src="https://www.loom.com/embed/0e03faf9f8f7434fbe01eaf7b818e507" frameborder="0" allowfullscreen="" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe></div></div>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-did-indirect-promotion-change">What did indirect promotion change?<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#what-did-indirect-promotion-change" class="hash-link" aria-label="Direct link to What did indirect promotion change?" title="Direct link to What did indirect promotion change?">​</a></h2>
<p>You’ve probably noticed there is one overall theme of adding our additional branch, and that’s supporting our <em>Quality Assurance</em> initiative. Let’s break it down:</p>
<ul>
<li>
<p><strong>Development</strong></p>
<p>While no one will be developing in the <code>qa</code> branch itself, it still needs oversight, just like a <code>feature</code> branch does, to stay in sync with its base branch. This is because a change made directly to <code>main</code> (like a hotfix or an accidental merge) won’t immediately surface in our <code>feature</code> branches, since they are based on <code>qa</code>’s version of the code. For this reason, <code>qa</code> needs to stay in sync with any change to <code>main</code>.</p>
</li>
<li>
<p><strong>Quality Assurance</strong></p>
<p>There are now <em>two places</em> where quality can be reviewed (<code>feature</code> and <code>qa</code>) before changes hit production. <code>qa</code> is typically leveraged in at least one of these ways for more quality assurance work:</p>
<ul>
<li>Testing and reviewing how end-to-end changes are performing over time</li>
<li>Deploying the full image of the <code>qa</code> changes to a centralized location. Some common reasons to deploy <code>qa</code> code are:<!-- -->
<ul>
<li>Leveraging <a href="https://docs.getdbt.com/reference/node-selection/defer" target="_blank" rel="noopener noreferrer">deferral</a> and <a href="https://docs.getdbt.com/docs/deploy/advanced-ci" target="_blank" rel="noopener noreferrer">Advanced</a> comparison features in CI</li>
<li>Testing builds from environment-specific data sets (dynamic sources)</li>
<li>Creating staging versions of workbooks in your BI tool.
This is most relevant when your BI tool doesn’t handle changing underlying schemas well. Some tools have good controls for grabbing a production workbook for development, switching the underlying schema to a <code>dbt_cloud_pr_#</code> schema, and reflecting those changes without breaking anything. Others will break every column selection in your workbook, even if the structure is the same. For this reason, it is sometimes easier to create one “staging” workbook and always point it at a database built from QA code; changes can then be reflected and reviewed in that workbook before the code changes reach production.</li>
<li>Sharing changes with folks who want to see or test them, but aren’t personas included in the review process.
For instance, you may have a subject matter expert who reviews and approves alongside developers and understands the process of looking at <code>dbt_cloud_pr</code> schemas. If this person then tells the teammates who will use those changes that they’ve just been approved, the team might ask if there is a way they can also see the changes. Since the CI schema is dropped after merge, they would need to wait to see the change in production if there is no process deploying the middle branch.</li>
</ul>
</li>
</ul>
</li>
<li>
<p><strong>Promotion</strong></p>
<p>There are now two places where code needs to be promoted:</p>
<ul>
<li>From <code>feature</code> to <code>qa</code> by a developer and peer (and optionally SMEs or stakeholders)</li>
<li>From <code>qa</code> to <code>main</code> by a release manager and SMEs or stakeholders</li>
</ul>
<p>Additionally, approved changes from feature branches are promoted together from <code>qa</code>.</p>
</li>
<li>
<p><strong>Deployment</strong></p>
<p>There are now two major branches code can be deployed from:</p>
<ul>
<li><code>qa</code>: The “working” version with changes; <code>feature</code> branches merge here</li>
<li><code>main</code>: The “production” version</li>
</ul>
<p>Because our changes collect on the <code>qa</code> branch, our deployment process
changes from continuous deployment (“streaming” changes to <code>main</code> through direct
promotion) to continuous delivery (“batched” changes to <code>main</code>).
Julia Schottenstein does a great job explaining the differences <a href="https://www.getdbt.com/blog/adopting-ci-cd-with-dbt-cloud" target="_blank" rel="noopener noreferrer">here</a>.</p>
</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="comparing-branching-strategies">Comparing branching strategies<a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#comparing-branching-strategies" class="hash-link" aria-label="Direct link to Comparing branching strategies" title="Direct link to Comparing branching strategies">​</a></h2>
<p>Since most teams can make <strong>direct promotion</strong> work, we’ll list some key flags for when we start thinking about <strong>indirect promotion</strong> with a team:</p>
<ul>
<li>They speak about having a dedicated environment for QA, UAT, staging, or pre-production work.</li>
<li>They ask how they can test changes end-to-end and over time before things hit production.</li>
<li>Their developers aren’t the same, or the only, folks who are checking data outputs for validity - especially if those other folks are more familiar with performing validations from other tools (like BI dashboards).</li>
<li>Their different environments aren’t working with identical data. Like software environments, they may have limited or scrubbed versions of production data depending on the environment.</li>
<li>They have a schedule in mind for making changes “public”, and want to hold features back from being seen or usable until then.</li>
<li>They have very high-stakes data consumption.</li>
</ul>
<p>If you fit any of these, you likely fit into an indirect promotion strategy.</p>
<p><strong>Strengths and Weaknesses</strong></p>
<p>We highly recommend that you choose your branching strategy based on which <em>best supports your workflow needs</em> over any perceived pros and cons — when these are put in the context of your team’s structure and technical skills, you’ll find some aren’t strengths or weaknesses at all!</p>
<ul>
<li>
<p><strong>Direct promotion</strong></p>
<p>Strengths</p>
<ul>
<li>Much faster in terms of seeing changes - once the PR is merged and deployed, the changes are “in production”.</li>
<li>Changes don’t get stuck in a middle branch that’s pending the acceptance of someone else’s validation on data output.</li>
<li>Management is mainly distributed - every developer owns their own branch and ensures it’s in sync with what’s in <code>main</code>.</li>
<li>There are no releases to worry about, so no extra processes to manage.</li>
</ul>
<p>Weaknesses</p>
<ul>
<li>It can present challenges for testing changes end-to-end or over time in an environment that isn't production. Our desire to build only modified and directly impacted models, to reduce the number of models executed in CI, goes against the grain of full end-to-end testing, and our CI mechanism (which executes only upon a pull request or new commit) won’t help us test over time.</li>
<li>It can be more difficult for differing schedules or technical abilities when it comes to review. It’s essential in this strategy to include stakeholders or subject matter experts on pull requests <em>before merge,</em> because the next step is production. Additionally, some tools aren’t great at switching databases and schemas even if the shape of the data is the same. Constant breakage of reports for review can be too much overhead.</li>
<li>It can be harder to test configurations or job changes before they hit production, especially if things function a bit differently based on environment.</li>
<li>It can be harder to share code that works fully but isn’t a full reflection of a complete task. Changes need to be agreed upon to go to production so others can pull them in, otherwise developers need to know how to pull these in from other branches that aren’t <code>main</code> (and be aware of staying in sync or risk merge conflicts).</li>
</ul>
</li>
<li>
<p><strong>Indirect promotion</strong></p>
<p>Strengths</p>
<ul>
<li>There’s a dedicated environment to test end-to-end changes over time.</li>
<li>Data outputs can be reviewed either with a developer on PR or once things are in the middle branch.</li>
<li>Review from other tools is much easier because we have the option of deploying our middle branch to a centralized location. “Staging” reports can be set up to always refer to this location for reviewing changes, and processes for creating new reports can flow from staging to production.</li>
<li>Configurations and job changes can be tested with production-like parameters before they actually hit production.</li>
<li>Changes merged to the middle branch for shared development won't be reflected in production. Consumers of <code>main</code> will be none the wiser about the things developers do for ease of collaboration.</li>
</ul>
<p>Weaknesses</p>
<ul>
<li>Changes can be slower to get to production due to the extra processes intended for the middle branch. To keep things moving, there should be someone (or a group of people) in place who fully owns managing the changes, validation status, and release cycle.</li>
<li>Changes that are valid can get stuck behind other changes that aren’t - having a good plan in place for how the team should handle this scenario is essential, because this conundrum can hold up getting things to production.</li>
<li>There’s extra management of any new trunks, which will need ownership - without someone (or a group of people) who is knowledgeable, it can be confusing to understand what needs to be done and how to do it when things get out of sync.</li>
<li>It can require additional compute in the form of scheduled jobs in the QA environment, as well as an additional CI job from <code>qa</code> &gt; <code>main</code> for testing releases before they're merged.</li>
</ul>
</li>
</ul>
<h1>Further enhancements</h1>
<p>Once you have your basic configurations in place, you can further tweak your project by considering which other features will be helpful for your needs:</p>
<ul>
<li>Continuous Integration:<!-- -->
<ul>
<li><a href="https://docs.getdbt.com/docs/deploy/ci-jobs#set-up-ci-jobs" target="_blank" rel="noopener noreferrer">Only running and testing changed models</a> and their dependencies</li>
<li>Using <a href="https://docs.getdbt.com/reference/commands/clone" target="_blank" rel="noopener noreferrer">dbt clone</a> to get a copy of large incrementals in CI</li>
</ul>
</li>
<li>Development and Deployment:<!-- -->
<ul>
<li>Using <a href="https://docs.getdbt.com/docs/build/custom-schemas" target="_blank" rel="noopener noreferrer">schema configurations</a> in the project to add more separation in a database</li>
<li>Using <a href="https://docs.getdbt.com/docs/build/custom-databases" target="_blank" rel="noopener noreferrer">database configurations</a> in the project to switch databases for model builds</li>
</ul>
</li>
</ul>
<h1>Frequently asked git questions</h1>
<p><strong>General</strong></p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do you prevent developers from changing specific files?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Many git providers have a CODEOWNERS feature which can be leveraged to tag appropriate reviewers when certain files or folders are changed.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do you execute other types of checks in the development workflow?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Auto-formatting and linting are both <a href="https://docs.getdbt.com/docs/cloud/studio-ide/lint-format#format" target="_blank" rel="noopener noreferrer">features available in dbt Cloud's IDE</a>. You can enable linting <a href="https://docs.getdbt.com/docs/deploy/continuous-integration#sql-linting" target="_blank" rel="noopener noreferrer">within your CI job</a>.</p><p>Other types of checks are typically implemented through external pipelines, and usually through the git provider due to the alignment of where these checks are desired in the development workflow. Many git providers have pipeline features available, such as GitHub Actions or GitLab CI/CD Pipelines. Here's an example that <a href="https://medium.com/@durgeshm01722/add-a-branch-naming-pattern-status-check-to-your-github-prs-660c53331b68" target="_blank" rel="noopener noreferrer">checks that a branch name follows a pattern upon a pull request event</a>.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do you revert changes?</summary><div><div class="collapsibleContent_i85q"><p></p><p>This is an action performed outside of dbt through git operations, but an immediate solution can be implemented using git tags/releases until your code is fixed to your liking:</p><ul>
<li>Apply a git tag (a feature on most git platforms) on the commit SHA that you want to roll back to</li>
<li>Use the tag as your <code>custom branch</code> on your production environment in dbt Cloud. Your jobs will now check out the code at this point in time.</li>
<li>Now you can work as normal. Fix things through the development workflow or have a knowledgeable person revert the changes through git, it doesn’t matter - production is pinned to the previous state until you change the custom branch back to <code>main</code>!</li>
</ul><p></p></div></div></details>
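To make the rollback recipe above concrete, here is a minimal sketch in a throwaway local repository (the file contents, commit messages, and the tag name "v1-rollback" are all invented for illustration):

```shell
# Minimal sketch in a throwaway repo; all names here are invented.
set -e
demo=$(mktemp -d) && cd "$demo" && git init -q
git config user.email dev@example.com && git config user.name "A Developer"

echo "select 1" > model.sql && git add . && git commit -qm "known-good state"
good_sha=$(git rev-parse HEAD)
echo "select oops" > model.sql && git commit -qam "bad change"

# Tag the last good commit; in dbt Cloud, set this tag as the production
# environment's custom branch (after pushing it: git push origin v1-rollback)
git tag v1-rollback "$good_sha"
git show v1-rollback:model.sql   # production is pinned to the known-good state
```

Once main is fixed through your normal workflow, point the custom branch back to main and clean up the tag.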
<p><strong>Indirect promotion-specific</strong></p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do you make releases?</summary><div><div class="collapsibleContent_i85q"><p></p><p>In our examples, we noted that our definition of a release is a pull request from <code>qa</code> to <code>main</code>, and this is opened from the git platform.</p><p><strong>Having the source branch as <code>qa</code> on your pull request will also incorporate any new merges to <code>qa</code> while your PR stays open, possibly resulting in other features being promoted to <code>main</code> unintentionally once you merge.</strong> Because of this, it’s important that the person opening a release stays up to date on merges and last runs to ensure the validity of changes before the release is merged. There are two options we like to implement to make this easier:</p><ul>
<li>A CI job for pull requests against <code>main</code>: this will run a CI job comparing our middle branch to <code>main</code> at release time, and will rerun when there are any new merges to <code>qa</code>. Not only that, but the status will show on our pull request and we can leverage other features like <a href="https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/about-status-checks" target="_blank" rel="noopener noreferrer">GitHub's required status checks</a> to further ensure we're only merging successful and tested changes.</li>
<li>An <a href="https://docs.getdbt.com/docs/deploy/merge-jobs" target="_blank" rel="noopener noreferrer">on-merge job</a> using our <code>qa</code> environment. This will run a job any time someone merges. You may opt for this if you’d rather not wait on a CI pipeline to finish when you open a release. However, this will not put a status on a release PR, so we wouldn't be able to block anyone from merging a release based on run status. When using this method, the release owner should still stay up to date with merges and the status of the latest run before merging.</li>
</ul><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Hierarchical promotion introduces changes that may not be ready for production yet, which holds up releases. How do you manage that?</summary><div><div class="collapsibleContent_i85q"><p></p><p>The process of choosing specific commits to move to another branch is called <strong>Cherry Picking</strong>.</p><link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/11_cherry_picking.png"><img data-toggle="lightbox" alt="Cherry Picking diagram" title="Cherry Picking diagram" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/11_cherry_picking.png?v=2"></a></span><span class="title_aGrV">Cherry Picking diagram</span></div><p>You may be tempted to change to a less standard branching strategy to avoid this - our colleague Grace Goheen has <a href="https://docs.getdbt.com/blog/the-case-against-git-cherry-picking" target="_blank" rel="noopener noreferrer">written some thoughts on this</a> and provided examples - it’s a worthwhile read!</p><p>dbt does not perform cherry-picking operations; this needs to be done from a command line interface or your git platform’s user interface, if the option is available. We align with Grace on this one — not only does cherry picking require a very good understanding of git operations and the state of the branches, but when it isn’t done with care, it can introduce a host of other issues that can be hard to resolve. What we tend to see is that the CI processes we’ve exemplified instead shift the definition of the first PR’s approval: not only can it be approved for coding and syntax by a peer, but also for its output by selecting from objects built within the CI schema. This eliminates a lot of the issues with encountering code that can’t be merged to production.</p><p>We also implement other features that help us in trying times:</p><ul>
<li>The <a href="https://docs.getdbt.com/reference/node-selection/exclude" target="_blank" rel="noopener noreferrer"><code>--exclude</code></a> command flag helps us omit building models in a job</li>
<li>The <a href="https://docs.getdbt.com/reference/resource-configs/enabled" target="_blank" rel="noopener noreferrer"><code>enabled</code></a> configuration helps us keep models from being executed in any job for a longer-term solution</li>
<li>Using <a href="https://docs.getdbt.com/docs/mesh/govern/model-contracts" target="_blank" rel="noopener noreferrer">contracts</a> and <a href="https://docs.getdbt.com/docs/mesh/govern/model-versions" target="_blank" rel="noopener noreferrer">versions</a> helps alleviate breaking code changes between teams in dbt Mesh</li>
<li><a href="https://docs.getdbt.com/docs/build/unit-tests" target="_blank" rel="noopener noreferrer">Unit tests</a> and <a href="https://docs.getdbt.com/docs/build/data-tests" target="_blank" rel="noopener noreferrer">data tests</a>, along with forming best practices around the minimum requirements for every model, helps us continuously test our expectations (see the <a href="https://hub.getdbt.com/tnightengale/dbt_meta_testing/latest/" target="_blank" rel="noopener noreferrer">dbt_meta_testing</a> package)</li>
<li>Using the <a href="https://hub.getdbt.com/dbt-labs/audit_helper/latest" target="_blank" rel="noopener noreferrer">dbt audit helper</a> package or <a href="https://docs.getdbt.com/docs/deploy/advanced-ci" target="_blank" rel="noopener noreferrer">enabling advanced CI on our continuous integration jobs</a> helps us understand the impacts our changes make to the original data set</li>
</ul><p>If you find yourself needing to cherry-pick regularly, assessing your review and quality assurance processes and where they happen in your pipeline can be very helpful in determining how to avoid it.</p><p></p></div></div></details>
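For reference, this is what the operation itself looks like in a throwaway repo (the branch names, file names, and commit messages are invented); per the advice above, reach for it sparingly:

```shell
# Sketch of cherry-picking one validated commit from qa to main.
set -e
demo=$(mktemp -d) && cd "$demo" && git init -q
git config user.email dev@example.com && git config user.name "A Developer"
git checkout -qb main
echo "select 1" > base.sql && git add . && git commit -qm "base"

# Two merged features sit on qa, but only one is validated for production
git checkout -qb qa
echo "select 2" > ready.sql && git add . && git commit -qm "feature A (validated)"
ready_sha=$(git rev-parse HEAD)
echo "select 3" > not_ready.sql && git add . && git commit -qm "feature B (blocked)"

# Apply only the validated commit to main; -x records the source commit
git checkout -q main
git cherry-pick -x "$ready_sha"
ls   # base.sql and ready.sql; not_ready.sql stays behind on qa
```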
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What if a bad change made it all the way into production?</summary><div><div class="collapsibleContent_i85q"><p></p><p>The process of fixing <code>main</code> directly is called a <strong>hotfix</strong>. This needs to be done with git locally or with your git platform’s user interface, because dbt’s IDE is based on the branch you set for your developers to base from (in our case, <code>qa</code>).</p><p>The pattern for hotfixes in hierarchical promotion looks like this:</p><link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/12_hotfixes.png"><img data-toggle="lightbox" alt="Hotfix diagram" title="Hotfix diagram" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/12_hotfixes.png?v=2"></a></span><span class="title_aGrV">Hotfix diagram</span></div><p>Here’s how it’s typically performed:</p><ol>
<li>Create a branch from <code>main</code>, then make the change and test the fix.</li>
<li>Open a PR to <code>main</code>, get the fix approved, then merged. The fix is now live.</li>
<li>Check out <code>qa</code>, and <code>git pull</code> to ensure it’s up to date with what’s on the remote.</li>
<li>Merge <code>main</code> into <code>qa</code>: <code>git merge main</code>.</li>
<li><code>git push</code> the changes back to the remote.</li>
<li>At this point in our example, developers will be flagged in dbt Cloud’s IDE that there is a change on their base branch and can “Pull from remote”. However, if you implement more than one middle branch, you will need to continue resolving your branches hierarchically until you update the branch that developers base from.</li>
</ol><p></p></div></div></details>
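The numbered steps above can be sketched end to end in a throwaway repo, with a local bare repository standing in for your git provider (all names here are invented):

```shell
# Hotfix flow for hierarchical promotion, simulated locally.
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q --bare origin.git                  # stands in for GitHub/GitLab
git clone -q origin.git work && cd work
git config user.email dev@example.com && git config user.name "A Developer"

# Seed main and the qa middle branch
git checkout -qb main
echo "select 1" > model.sql && git add . && git commit -qm "init"
git push -q origin main
git checkout -qb qa && git push -q origin qa

# 1-2. Branch from main, fix, and merge back (the PR step, simplified)
git checkout -q main && git checkout -qb hotfix
echo "select 1 -- fixed" > model.sql && git commit -qam "hotfix"
git checkout -q main && git merge -q hotfix && git push -q origin main

# 3-5. Bring qa up to date with the fix and push
git checkout -q qa && git pull -q origin qa
git merge -q main -m "sync hotfix from main" && git push -q origin qa
git show origin/qa:model.sql   # qa now contains the hotfix
```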
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What if we want to use more than one middle branch in our strategy?</summary><div><div class="collapsibleContent_i85q"><p></p><p>In our experience, using more than one middle branch is rarely needed. The more steps you are away from <code>main</code>, the more hurdles you’ll need to jump through to get back to it. If your team isn’t properly equipped, this ends up putting a lot of overhead on development operations. For this reason, we don’t recommend more branches if you can help it. The teams who are successful with more trunks have plenty of folks who can properly dedicate time and management to these processes.</p><link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/13_more_branches.png"><img data-toggle="lightbox" alt="A git strategy with more branches" title="A git strategy with more branches" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/13_more_branches.png?v=2"></a></span><span class="title_aGrV">A git strategy with more branches</span></div><p>This structure is mostly desired when there are requirements for different teams to use different versions of data (i.e., scrubbed data) while working with the same code changes. It allows each team to have a dedicated environment for deployments. Example:</p><ol>
<li>Developers work off of mocked data for their <code>feature</code> branches and merge to <code>qa</code> for end-to-end and over-time testing of all merged changes using the mocked data before releasing to <code>preproduction</code>.</li>
<li>Once <code>qa</code> is merged to <code>preproduction</code>, the underlying data being used switches to using scrubbed production data and other personas can start looking at and reviewing how this data is functioning before it hits production.</li>
<li>Once <code>preproduction</code> is merged to <code>main</code>, the underlying data being used switches to production data sets.</li>
</ol><p>To show a comparison, this same use case can be covered with a simpler branching strategy through the use of git tags and <a href="https://docs.getdbt.com/docs/build/environment-variables" target="_blank" rel="noopener noreferrer">dbt environment variables</a> to switch source data:</p><ul>
<li>
<p>Indirect Promotion:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/14_indirect_tagging.png"><img data-toggle="lightbox" alt="Tagging in Indirect Promotion" title="Tagging in Indirect Promotion" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/14_indirect_tagging.png?v=2"></a></span><span class="title_aGrV">Tagging in Indirect Promotion</span></div>
</li>
<li>
<p>Direct Promotion:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/15_direct_tagging.png"><img data-toggle="lightbox" alt="Tagging in Direct Promotion" title="Tagging in Direct Promotion" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/15_direct_tagging.png?v=2"></a></span><span class="title_aGrV">Tagging in Direct Promotion</span></div>
</li>
</ul><p>Which option you choose depends on how your team would like to manage the changes. No matter the reason for more branches, these points are always relevant to plan out:</p><ul>
<li>Can we accurately describe the use case of each branch?</li>
<li>Who owns the oversight of any new branches?</li>
<li>Who are the major players in the promotion process between each branch and what are they responsible for?</li>
<li>Which major branches do we want dbt Cloud deployment jobs for?</li>
<li>Which PR stages do we want continuous integration jobs on?</li>
<li>Which major branch rules or PR templates do we need to add?</li>
</ul><p>By answering these questions, you should be able to follow our same guidance from our examples for setting up your additional branches.</p><p></p></div></div></details>
<p><strong>Direct promotion-specific</strong></p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>We need a middle environment and don’t want to change our branching strategy! Is there any way to reflect what’s in development?</summary><div><div class="collapsibleContent_i85q"><p></p><p>git releases/tags are a mechanism that helps you label a specific commit SHA. <em>Deployment environments</em> in dbt Cloud can use these just like they can a custom branch. Teams leverage this either to pin their environments to code at a certain point in time or to keep as a roll-back option, if needed.</p><p>We can use the pinning method to create our middle environment. Example:</p><ul>
<li>We create a release tag, <code>v2</code>, from our repository.</li>
<li>We specify <code>v2</code> as our branch in our Production environment’s <strong>custom branch</strong> setting.
Jobs using Production will now check out code at <code>v2</code>.</li>
<li>We set up an environment called “QA”, with the <strong>custom branch</strong> setting as <code>main</code>. For the database and schema, we specify the <code>qa</code> database and <code>analytics</code> schema. Jobs created using this environment will check out code from <code>main</code> and build it to <code>qa.analytics</code>.</li>
</ul><link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/git-branching-strategies-with-dbt#" data-featherlight="/img/blog/2025-01-28-git-branching-strategies-and-dbt/16_direct_tagging_middle_env.png"><img data-toggle="lightbox" alt="Tagging in Direct Promotion to create a middle environment" title="Tagging in Direct Promotion to create a middle environment" src="https://docs.getdbt.com/img/blog/2025-01-28-git-branching-strategies-and-dbt/16_direct_tagging_middle_env.png?v=2"></a></span><span class="title_aGrV">Tagging in Direct Promotion to create a middle environment</span></div><div style="margin:40px 10px"><div class="loomWrapper_TTvb"><iframe width="640" class="loomFrame_B61a" height="400" src="https://www.loom.com/embed/dfe057bf92b2498eb1e653c32fc72e93" frameborder="0" allowfullscreen="" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe></div></div><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do we change from a direct promotion strategy to an indirect promotion strategy?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Here’s the additional setup steps in a nutshell (using the name <code>qa</code> for our middle environment) - for more details be sure to read through the indirect promotion section:</p><ul>
<li>git Platform<!-- -->
<ul>
<li>Create a new branch called <code>qa</code>, which is derived from <code>main</code></li>
<li>Protect <code>qa</code> with branch protection rules</li>
</ul>
</li>
<li>dbt Cloud<!-- -->
<ul>
<li>Development: Switch your environment to use the <strong>custom branch</strong> option and specify <code>qa</code>. This will base developers now off of <code>qa</code> code.</li>
<li>Continuous Integration: If you have an existing job for this, ensure the <strong>custom branch</strong> is changed to <code>qa</code>. This will change the CI job’s trigger to occur on pull requests to <code>qa</code>.</li>
</ul>
</li>
</ul><p><strong>At this point, your developers will be following the indirect promotion workflow and you can continue working on things in the background.</strong> You may still need to set up a database, database permissions, environments, deployment jobs, etc. Here is a short checklist to help you out! Refer back to our section on indirect promotion for many more details:</p><ul>
<li>
<p><strong>Decide if you want to deploy QA code.</strong> Many folks will deploy so they can make use of deferral and Advanced CI features. If so:</p>
<ul>
<li>Create the database where the objects will build</li>
<li>Set up a service account for QA and give it all the proper permissions to create and modify the contents within this database. It should also have select-only access to raw data.</li>
<li>Set up an environment for QA in dbt Cloud, being sure to connect it to the database and schema you want your deployments to build in.</li>
<li>Set up any deployment jobs using the QA environment.</li>
<li>If you want to use deferral or advanced features in CI, be sure that you first have a successful run in QA and then set your deferral setting on your CI job to the QA environment.</li>
</ul>
</li>
<li>
<p><strong>Decide if you want CI on release pull requests (from <code>qa</code> to <code>main</code>). If so:</strong></p>
<ul>
<li>Set up an environment called “Release CI”</li>
<li>Set up the continuous integration job using the “Release CI” environment</li>
<li>If you want to leverage deferral or advanced CI features, defer to your production environment.</li>
</ul>
</li>
</ul><p></p></div></div></details>]]></content>
        <author>
            <name>Christine Berger</name>
        </author>
        <author>
            <name>Carol Ohms</name>
        </author>
        <author>
            <name>Taylor Dunlap</name>
        </author>
        <author>
            <name>Steve Dowling</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Parser, Better, Faster, Stronger: A peek at the new dbt engine]]></title>
        <id>https://docs.getdbt.com/blog/faster-project-parsing-with-rust</id>
        <link href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust"/>
        <updated>2025-02-19T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Remember how dbt felt when you had a small project? You pressed enter and stuff just happened immediately? We're bringing that back.]]></summary>
        <content type="html"><![CDATA[<p>Remember how dbt felt when you had a small project? You pressed enter and stuff just happened immediately? We're bringing that back.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#" data-featherlight="/img/blog/2025-02-19-faster-project-parsing-with-rust/parsing_10k.gif"><img data-toggle="lightbox" alt="Benchmarking tip: always try to get data that's good enough that you don't need to do statistics on it" title="Benchmarking tip: always try to get data that's good enough that you don't need to do statistics on it" src="https://docs.getdbt.com/img/blog/2025-02-19-faster-project-parsing-with-rust/parsing_10k.gif?v=2"></a></span><span class="title_aGrV">Benchmarking tip: always try to get data that's good enough that you don't need to do statistics on it</span></div>
<p>After a <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension">series of deep dives</a> into the <a href="https://docs.getdbt.com/blog/sql-comprehension-technologies">guts of SQL comprehension</a>, let's talk about speed a little bit. Specifically, I want to talk about one of the most annoying slowdowns as your project grows: project parsing.</p>
<p>When you're waiting a few seconds or a few minutes for things to start happening after you invoke dbt, it's because parsing isn't finished yet. But Lukas' <a href="https://www.getdbt.com/resources/webinars/accelerating-dbt-with-sdf" target="_blank" rel="noopener noreferrer">SDF demo at last month's webinar</a> didn't have a big wait, so why not?</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="a-primer-on-parsing">A primer on parsing<a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#a-primer-on-parsing" class="hash-link" aria-label="Direct link to A primer on parsing" title="Direct link to A primer on parsing">​</a></h2>
<p>Parsing your project (remember: <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension">not your SQL</a>!) is how dbt builds the dependency graph of models and macros. If you've ever looked at a <code>manifest.json</code> and noticed all the <code>depends_on</code> blocks, that's what we're talking about.</p>
<p>Without the resolved dependencies, dbt can't filter down to a subset of your project – this is why parsing is always an all-or-nothing affair. You can't do <code>dbt parse --select my_model+</code> because parsing is what works out what's on the other side of that plus. (Of course, most projects use partial parsing, so they aren't starting from scratch every time.)</p>
<p>All those refs and macros are defined in Jinja. I don't know if you've ever thought about how Jinja gets from curly braces into text, but it's pretty weird! It's actually a two-step process: first it gets converted into Python code, and then that Python code is <em>itself run to generate a string</em>!</p>
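<p>As a toy illustration of that two-step model (invented for illustration; this is not Jinja's actual implementation), here's a sketch in which a template is first translated into Python source, and that source is then executed to produce the rendered string:</p>

```python
import re

# Toy sketch of Jinja's two-step rendering model (invented, not real Jinja):
# step 1 translates the template into Python source; step 2 runs that source.
def compile_template(template):
    """Translate '{{ name }}' placeholders into Python source that renders them."""
    body = "parts = []\n"
    pos = 0
    for match in re.finditer(r"\{\{\s*(\w+)\s*\}\}", template):
        body += f"parts.append({template[pos:match.start()]!r})\n"
        body += f"parts.append(str(context[{match.group(1)!r}]))\n"
        pos = match.end()
    body += f"parts.append({template[pos:]!r})\n"
    body += "result = ''.join(parts)\n"
    return body

def render(template, context):
    # Step 2: execute the generated Python to produce the final string.
    namespace = {"context": context}
    exec(compile_template(template), namespace)
    return namespace["result"]

print(render("select * from {{ table }}", {"table": "orders"}))  # select * from orders
```

<p>Real Jinja does considerably more (expressions, filters, control flow, sandboxing), but the shape is the same: the template becomes Python code, and rendering means running that code.</p>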
<p>This is kinda slow. Not so much as a one-off, but a project with 10,000 nodes might have 15-20,000 dependencies, so every millisecond adds up.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-if-we-wanted-it-to-be-faster">What if we wanted it to be faster?<a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#what-if-we-wanted-it-to-be-faster" class="hash-link" aria-label="Direct link to What if we wanted it to be faster?" title="Direct link to What if we wanted it to be faster?">​</a></h2>
<p>Since running the code is slow, one way to get results faster is to not run the code. Since v1.0, dbt's parser has <a href="https://github.com/dbt-labs/dbt-core/blob/main/docs/guides/parsing-vs-compilation-vs-runtime.md#:~:text=Simple%20Jinja%2DSQL%20models%20(using%20just%20ref()%2C%20source()%2C%20%26/or%20config()%20with%20literal%20inputs)%20are%20also%20statically%20analyzed%2C%20using%20a%20thing%20we%20built.%20This%20is%20very%20fast%20(~0.3%20ms)" target="_blank" rel="noopener noreferrer">used a static analyzer</a> to resolve refs when possible, which is <a href="https://docs.getdbt.com/reference/parsing#:~:text=For%20now%2C%20the%20static%20parser,speedup%20in%20the%20model%20parser" target="_blank" rel="noopener noreferrer">about 3x faster</a> than going through the whole rigmarole above.</p>
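<p>The core idea of static analysis can be sketched in a few lines (a drastically simplified, hypothetical version; dbt's actual static parser is far more sophisticated): for models that only call <code>ref()</code> with literal string arguments, the dependencies can be read straight out of the raw text without executing any Jinja at all:</p>

```python
import re

# A drastically simplified, hypothetical take on static analysis: pull
# literal ref() arguments out of the raw model text without rendering Jinja.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")

def static_refs(model_sql):
    """Return the upstream models this file depends on, found by pattern matching."""
    return REF_PATTERN.findall(model_sql)

sql = "select * from {{ ref('stg_orders') }} join {{ ref('stg_customers') }} using (order_id)"
print(static_refs(sql))  # ['stg_orders', 'stg_customers']
```

<p>Pattern matching like this only works when the argument is a literal; a ref built up from variables or macros forces a fall back to full Jinja evaluation, which is why static analysis could only ever cover a subset of models.</p>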
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#" data-featherlight="/img/blog/2025-02-19-faster-project-parsing-with-rust/evaluation_strategies_1.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-02-19-faster-project-parsing-with-rust/evaluation_strategies_1.png?v=2"></a></span></div>
<p>The other way you could get the result faster is to run the code faster.</p>
<p>The original author of Jinja also wrote <a href="https://github.com/mitsuhiko/minijinja" target="_blank" rel="noopener noreferrer">minijinja</a> – a Rust implementation of a subset of the original Jinja library.</p>
<p>This is not the post for a deep dive on <em>why</em> Rust and Python have such different performance characteristics, but the key takeaway is that <a href="https://github.com/mitsuhiko/minijinja/tree/main/benchmarks" target="_blank" rel="noopener noreferrer">minijinja can <em>fully evaluate</em> a ref 30 times faster</a> than today's dbt can even <em>statically analyze</em> it.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#" data-featherlight="/img/blog/2025-02-19-faster-project-parsing-with-rust/evaluation_strategies_2.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-02-19-faster-project-parsing-with-rust/evaluation_strategies_2.png?v=2"></a></span></div>
<p>Our analysis in the leadup to dbt v1.0 showed that the static analyzer could handle 60% of models. Evaluating refs 30x faster in 60% of models would itself be great.</p>
<p>But recall that static analysis was the workaround for evaluating Jinja being slow. Since <strong>we can now evaluate Jinja faster than we can statically analyze it</strong>, let's just<sup>†</sup> evaluate everything!</p>
<p><sup>†</sup>The word "just" is doing a <em>lot</em> of heavy lifting here. In practice, there's a lot happening behind the scenes to get both the performance of minijinja and the ability to process the full range of capabilities of a dbt project. Another story for another day.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-does-this-mean-in-practice">What does this mean in practice?<a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#what-does-this-mean-in-practice" class="hash-link" aria-label="Direct link to What does this mean in practice?" title="Direct link to What does this mean in practice?">​</a></h2>
<p>As you saw at the top of the post, I've been running some synthetic projects against an early build of the new dbt engine, and it's pretty snappy: <strong>parsing a 10,000 model project in under 600ms</strong>. Let's see how it goes with some other common project sizes:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#" data-featherlight="/img/blog/2025-02-19-faster-project-parsing-with-rust/parse_time_comparison_linear.png"><img data-toggle="lightbox" alt="You might have to squint, but I promise there's a yellow line on each of those groups" title="You might have to squint, but I promise there's a yellow line on each of those groups" src="https://docs.getdbt.com/img/blog/2025-02-19-faster-project-parsing-with-rust/parse_time_comparison_linear.png?v=2"></a></span><span class="title_aGrV">You might have to squint, but I promise there's a yellow line on each of those groups</span></div>
<p>Even a 20,000-model project finished parsing in about a second. The equivalent cold parse takes well over a minute, and a partial parse (with no changed files) took about 12 seconds.</p>
<p>Let's look at one more comparison: <strong>100k models</strong>. I need to break out the log scale for this one:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#" data-featherlight="/img/blog/2025-02-19-faster-project-parsing-with-rust/parse_time_comparison_log.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-02-19-faster-project-parsing-with-rust/parse_time_comparison_log.png?v=2"></a></span></div>
<p>The new dbt engine parsed our 100,000 model example project in under 10 seconds, compared with almost 20 minutes.</p>
<p>Let me be clear: I do not think you should put 100,000 models into your project! I mostly ran that one for the lols. But back in the realm of project sizes that actually exist:</p>
<ul>
<li>If your project isn't currently eligible for partial parsing, cold parses in Rust are fast enough to make it a moot point.</li>
<li>Regardless of how your project parses today, your project will feel like it's a couple of orders of magnitude smaller than it is.</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="were-just-getting-started">We're just getting started<a href="https://docs.getdbt.com/blog/faster-project-parsing-with-rust#were-just-getting-started" class="hash-link" aria-label="Direct link to We're just getting started" title="Direct link to We're just getting started">​</a></h2>
<p>Speed is just one benefit to come from this integration, and pales in comparison to, say, <a href="https://roundup.getdbt.com/p/the-power-of-a-plan-how-logical-plans" target="_blank" rel="noopener noreferrer">the importance of logical plans</a>. But it sure is fun!</p>
<p>The teams are still hard at work integrating the two tools, and we'll have more to share on how the developer experience will change thanks to SDF's tech at our <a href="https://www.getdbt.com/resources/webinars/dbt-developer-day" target="_blank" rel="noopener noreferrer">Developer Day event in March</a>.</p>]]></content>
        <author>
            <name>Joel Labes</name>
        </author>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The key technologies behind SQL Comprehension]]></title>
        <id>https://docs.getdbt.com/blog/sql-comprehension-technologies</id>
        <link href="https://docs.getdbt.com/blog/sql-comprehension-technologies"/>
        <updated>2025-01-24T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[The technologies that power the three levels of SQL comprehension. ]]></summary>
        <content type="html"><![CDATA[<p>You ever wonder what’s <em>really</em> going on in your database when you fire off a (perfect, efficient, full-of-insight) SQL query to your database?</p>
<p>OK, probably not 😅. Your personal tastes aside, we’ve been talking a <em>lot</em> about SQL Comprehension tools at dbt Labs in the wake of our acquisition of SDF Labs, and think that the community would benefit if we included them in the conversation too! We recently published a <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension" target="_blank" rel="noopener noreferrer">blog that talked about the different levels of SQL Comprehension tools</a>. If you read that, you may have encountered a few new terms you weren’t super familiar with.</p>
<p>In this post, we’ll talk about the technologies that underpin SQL Comprehension tools in more detail. Hopefully, you come away with a deeper understanding of and appreciation for the hard work that your computer does to turn your SQL queries into actionable business insights!</p>
<p>Here’s a quick refresher on the levels of SQL comprehension:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-23-levels-of-sql-comprehension/validation_all_levels.png"><img data-toggle="lightbox" alt="The three levels of SQL Comprehension, with example SQL." title="The three levels of SQL Comprehension, with example SQL." src="https://docs.getdbt.com/img/blog/2025-01-23-levels-of-sql-comprehension/validation_all_levels.png?v=2"></a></span><span class="title_aGrV">The three levels of SQL Comprehension, with example SQL.</span></div>
<p>Each of these levels is powered by a distinct set of technologies. It’s useful to explore these technologies in the context of the SQL Comprehension tool you are probably most familiar with: a database! A database, as you might have guessed, has the deepest possible SQL comprehension abilities as well as SQL <em>execution</em> abilities — it contains all necessary technology to translate a SQL query text into rows and columns.</p>
<p>Here’s a simplified diagram to show your query’s fantastic voyage of translation into tabular data:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-24-sql-comprehension-technologies/full_translation_flow.png"><img data-toggle="lightbox" alt="A flow chart showing a SQL query's journey to raw data." title="A flow chart showing a SQL query's journey to raw data." src="https://docs.getdbt.com/img/blog/2025-01-24-sql-comprehension-technologies/full_translation_flow.png?v=2"></a></span><span class="title_aGrV">A flow chart showing a SQL query's journey to raw data.</span></div>
<p>First, databases use a <strong>parser</strong> to translate SQL code into a <strong>syntax tree.</strong> This enables syntax validation + error handling.</p>
<p>Second, database <strong>compilers</strong> <strong>bind</strong> metadata to the syntax tree to create a fully validated <strong>logical plan.</strong> This enables a complete understanding of the operations required to generate your dataset, including information about the datatypes that are input and output during SQL execution.</p>
<p>Third, the database <strong>optimizes</strong> and <strong>plans</strong> the operations defined by a logical plan, generating a <strong>physical plan</strong> that maps the logical steps to physical hardware, then executes the steps with data to finally return your dataset!</p>
<p>Let’s explore each of these levels in more depth!</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="level-1-parsing">Level 1: Parsing<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#level-1-parsing" class="hash-link" aria-label="Direct link to Level 1: Parsing" title="Direct link to Level 1: Parsing">​</a></h2>
<p>At Level 1, SQL comprehension tools use a <strong>parser</strong> to translate SQL code into a <strong>syntax tree.</strong> This enables syntax validation + error handling. <em>Key Concepts: Intermediate Representations, Parsers, Syntax Trees</em></p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-24-sql-comprehension-technologies/parser.png"><img data-toggle="lightbox" alt="Parsers can model the grammar and structure of code." title="Parsers can model the grammar and structure of code." src="https://docs.getdbt.com/img/blog/2025-01-24-sql-comprehension-technologies/parser.png?v=2"></a></span><span class="title_aGrV">Parsers can model the grammar and structure of code.</span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="intermediate-representations">Intermediate representations<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#intermediate-representations" class="hash-link" aria-label="Direct link to Intermediate representations" title="Direct link to Intermediate representations">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p><strong>Intermediate representations</strong> are data objects created during the process of <em>compiling</em> code.</p></div></div>
<p>Before we dive into the specific technologies, we should define a key concept in computer science that’s very relevant to understanding how this entire process works under the hood: an <a href="https://en.wikipedia.org/wiki/Intermediate_representation" target="_blank" rel="noopener noreferrer"><strong>Intermediate Representation (IR)</strong></a>. When code is executed on a computer, it has to be translated from the human-readable code we write to the machine-readable code that actually does the work that the higher-level code specifies, in a process called <em>compiling</em>. As a part of this process, your code will be translated into a number of different objects as the program runs; each of these is called an <em>intermediate representation.</em></p>
<p>To provide an example / analogy that will be familiar to dbt users, think about what your intermediate models are in the context of your dbt DAG — a translated form of your source data created in the process of synthesizing your final data marts. These models are effectively an intermediate representation. We’re going to talk about a few different types of IRs in this post, so it’s useful to know about them now before we get too deep!</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="parsers">Parsers<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#parsers" class="hash-link" aria-label="Direct link to Parsers" title="Direct link to Parsers">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p><strong>Parsers</strong> are programs that translate raw code into <em>syntax trees</em>.</p></div></div>
<p>All programming languages require a parser, which is often the first step in the translation process from human-readable to machine-readable code. Parsers are programs that can map the syntax, or grammar, of your code into a syntax tree, and understand whether the code you wrote follows the basic rules of the language.</p>
<p>In computing, parsers have a few underlying pieces of technology that build the syntax tree that understands the relationships between your variables, functions, and classes, etc. The components of a parser include:</p>
<ul>
<li><strong>a lexer</strong>, which takes raw code strings and returns lists of tokens recognized in the code (in SQL, <code>SELECT</code>, <code>FROM</code>, and <code>sum</code> would be examples of tokens recognized by a lexer)</li>
<li><strong>a parser</strong>, which takes the list of tokens generated by the lexer and builds the syntax tree based on the grammatical rules of the language (e.g. a <code>SELECT</code> must be followed by one or more column expressions, a <code>FROM</code> must reference a table, CTE, or subquery, etc.).</li>
</ul>
<p>In other words, the lexer first detects the tokens that are present in a SQL query (is there a filter? which functions are called?) and the parser is responsible for mapping the dependencies between them.</p>
<p>A quick vocab note: while technically, the parser is only the component that translates tokens into a syntax tree, the word “parser” has come to be shorthand for the whole process of lexing and parsing.</p>
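<p>Here's what the lexing half might look like for a tiny slice of SQL (a toy sketch; production lexers also handle comments, quoting rules, numeric literals, and dialect quirks):</p>

```python
import re

# A toy lexer for a tiny slice of SQL. Each piece of raw text becomes a
# (kind, value) token; the parser later consumes this flat list.
KEYWORDS = {"select", "from", "where", "group", "by", "as"}
TOKEN_RE = re.compile(r"(\w+)|(\S)")  # a word, or any single non-space symbol

def lex(sql):
    tokens = []
    for word, symbol in TOKEN_RE.findall(sql):
        if word:
            kind = "KEYWORD" if word.lower() in KEYWORDS else "IDENT"
            tokens.append((kind, word))
        else:
            tokens.append(("SYMBOL", symbol))
    return tokens

print(lex("select order_id, sum(amount) as total from order_items"))
```

<p>Note that the lexer is happy to tokenize <code>from select (</code> even though it's gibberish; enforcing the grammar is the parser's job.</p>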
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="syntax-trees">Syntax trees<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#syntax-trees" class="hash-link" aria-label="Direct link to Syntax trees" title="Direct link to Syntax trees">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p><strong>Syntax trees</strong> are a representation of a unit of language according to a set of grammatical rules.</p></div></div>
<p>Your first introduction to understanding syntactical rules probably came when you learned how to diagram sentences in your grade school grammar classes! Diagramming the parts of speech in a sentence and mapping the dependencies between each of its components is precisely what a parser does — the resulting representation of the sentence is a syntax tree. Here’s a silly example:</p>
<blockquote>
<p><code>My cat jumped over my lazy dog</code></p>
</blockquote>
<p>By parsing this sentence according to the rules of the English language, we can get this syntax tree:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-24-sql-comprehension-technologies/sentence_syntax_tree.png"><img data-toggle="lightbox" alt="Apologies to my mother, an english teacher, who likely takes umbrage with this simplified example" title="Apologies to my mother, an english teacher, who likely takes umbrage with this simplified example" src="https://docs.getdbt.com/img/blog/2025-01-24-sql-comprehension-technologies/sentence_syntax_tree.png?v=2"></a></span><span class="title_aGrV">Apologies to my mother, an english teacher, who likely takes umbrage with this simplified example</span></div>
<p>Let’s do the same thing with simple SQL query:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6deeb;--prism-background-color:#011627"><div class="codeBlockContent_m3Ux"><pre tabindex="0" class="prism-code language-sql codeBlock_qGQc thin-scrollbar" style="color:#d6deeb;background-color:#011627"><code class="codeBlockLines_p187"><span class="token-line" style="color:#d6deeb"><span class="token keyword" style="color:rgb(127, 219, 202)">select</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  order_id</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  </span><span class="token function" style="color:rgb(130, 170, 255)">sum</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">amount</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(127, 219, 202)">as</span><span class="token plain"> total_order_amount</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">from</span><span class="token plain"> order_items</span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">where</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain">  date_trunc</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(173, 219, 103)">'year'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> ordered_at</span><span class="token punctuation" 
style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(127, 219, 202)">=</span><span class="token plain"> </span><span class="token string" style="color:rgb(173, 219, 103)">'2025-01-01'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#d6deeb"><span class="token plain"></span><span class="token keyword" style="color:rgb(127, 219, 202)">group</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(127, 219, 202)">by</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><br></span></code></pre><div class="buttonGroup_6DOT"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>By parsing this query according to the rules of the SQL language, we get something that looks like this:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-24-sql-comprehension-technologies/sql_syntax_tree.png"><img data-toggle="lightbox" alt="This is a simplified syntax tree — This was made by hand, and may not be exactly what the output of a real SQL parser looks like!" title="This is a simplified syntax tree — This was made by hand, and may not be exactly what the output of a real SQL parser looks like!" src="https://docs.getdbt.com/img/blog/2025-01-24-sql-comprehension-technologies/sql_syntax_tree.png?v=2"></a></span><span class="title_aGrV">This is a simplified syntax tree — This was made by hand, and may not be exactly what the output of a real SQL parser looks like!</span></div>
<p>The syntax trees produced by parsers are a very valuable type of intermediate representation; with a syntax tree, you can power features like syntax validation, code linting, and code formatting, since those tools only need knowledge of the <em>syntax</em> of the code you’ve written to work.</p>
<p>However, parsers also dutifully parse <em>syntactically correct code</em> that <em>means nothing at all</em>. To illustrate this, consider the <a href="https://en.wikipedia.org/wiki/Colorless_green_ideas_sleep_furiously" target="_blank" rel="noopener noreferrer">famous sentence</a> developed by linguistics + philosophy professor Noam Chomsky:</p>
<blockquote>
<p><code>Colorless green ideas sleep furiously</code></p>
</blockquote>
<p>That’s a perfectly valid, diagrammable, parsable sentence according to the rules of the English language. But that means <em>absolutely nothing</em>. In SQL engines, you need a way to imbue a syntax tree with additional metadata to understand whether or not it represents executable code. As described in our first post, Level 1 SQL Comprehension tools are not designed to provide this context. They can only provide pure syntax validation. Level 2 SQL Comprehension tools augment these syntax trees with <em>meaning</em> by fully <strong>compiling</strong> the SQL.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="level-2-compiling">Level 2: Compiling<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#level-2-compiling" class="hash-link" aria-label="Direct link to Level 2: Compiling" title="Direct link to Level 2: Compiling">​</a></h2>
<p>At Level 2, SQL comprehension tools use a <strong>compiler</strong> to <strong>bind</strong> metadata to the syntax tree to create a fully validated <strong>logical plan.</strong>  <em>Key concepts: Binders, Logical Plans, Compilers</em></p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-24-sql-comprehension-technologies/compiler.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-01-24-sql-comprehension-technologies/compiler.png?v=2"></a></span></div>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="binders">Binders<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#binders" class="hash-link" aria-label="Direct link to Binders" title="Direct link to Binders">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>In SQL <em>compilers</em>, <strong>binders</strong> are programs that enhance + resolve <em>syntax trees</em> into <em>logical plans.</em></p></div></div>
<p>In compilers, <em>binders</em> (also called <em>analyzers</em> or <em>resolvers</em>) combine additional metadata with a syntax tree representation and produce a richer, validated, <em>executable</em> intermediate representation. In the above English language example, in our heads, we’re <em>binding</em> our knowledge of the definitions of each of the words to the structure of the sentence, after which, we can derive <em>meaning</em>.</p>
<p>Binders are responsible for this process of resolution. They must bind additional information about the components of the written code (their types, their scopes, their memory implications) to the code you wrote to produce a valid, executable unit of computation.</p>
<p>In the case of a SQL binder, a major part of its job is to combine <em>warehouse schema information</em>, like column <em>datatypes</em>, with the <em>type signatures</em> of the warehouse operators described by the syntax tree to bring full <em>type awareness</em> to the syntax tree. It’s one thing to recognize a <code>substring</code> function in a query; it’s another to <em>understand</em> that a <code>substring</code> <em>must</em> operate on string data, and <em>always</em> produces string data, and will fail if you pass it an integer.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-24-sql-comprehension-technologies/binder.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-01-24-sql-comprehension-technologies/binder.png?v=2"></a></span></div>
<p>In this example, while the syntax tree knows only that the <code>x</code> column is aliased as <code>u</code>, the binder knows that <code>x</code> is a column of type <code>int</code> and that the resulting column <code>u</code> must therefore also be of type <code>int</code>. Similarly, it knows that the filter condition will produce a <code>bool</code> value, and therefore requires its two arguments to have compatible datatypes. Luckily, the binder can also see that <code>x</code> and <code>0</code> are both of type <code>int</code>, so we’re confident this is a fully valid expression. This layer of validation, powered by metadata, is referred to as <em>type awareness.</em></p>
<p>In addition to being able to trace the way datatypes will flow and change through a set of SQL operations, the function signatures allow the binder to fully validate that you’ve provided valid arguments to a function, inclusive of the acceptable types of columns provided to the function (e.g. <code>split_part</code> can’t work on an <code>int</code> field) as well as valid function configurations (e.g. the acceptable date parts for <code>datediff</code> includes <code>'nanosecond'</code> but not <code>'dog_years'</code>).</p>
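<p>To make the idea concrete, here’s a toy sketch of a binder’s signature check. The function catalog and type names are invented for illustration — not any real warehouse’s metadata:</p>

```python
# Toy function catalog: name -> (argument types, return type).
# These signatures are illustrative, not any real warehouse's metadata.
SIGNATURES = {
    "substring": (("string", "int", "int"), "string"),
    "split_part": (("string", "string", "int"), "string"),
}

def bind_call(name, arg_types):
    """Validate a function call against the catalog and infer its result type."""
    if name not in SIGNATURES:
        raise NameError(f"unknown function: {name}")
    expected, return_type = SIGNATURES[name]
    if tuple(arg_types) != expected:
        raise TypeError(f"{name} expects {expected}, got {tuple(arg_types)}")
    return return_type

print(bind_call("substring", ("string", "int", "int")))  # prints: string
# bind_call("split_part", ("int", "string", "int")) raises TypeError --
# split_part can't operate on an int field.
```

<p>The real machinery is far richer (scopes, nullability, implicit casts), but the shape is the same: metadata in, validated types out.</p>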
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="logical-plan">Logical plan<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#logical-plan" class="hash-link" aria-label="Direct link to Logical plan" title="Direct link to Logical plan">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>In SQL <em>compilers</em>, <strong>logical plans</strong> define the validated, resolved set of data processing operations defined by a SQL query.</p></div></div>
<p>The output of a binder is a richer intermediate representation that can be executed in a low-level language; in the case of database engines, this IR is known as a <em>logical plan</em>.</p>
<p>Critically, as a result of the binder’s work of mapping data types to the syntax tree, logical plans have <em>full data type awareness</em> — logical plans can tell you precisely how data flows through an analysis, and can pinpoint when datatypes may change as a result of, say, an aggregation operation.</p>
<div class="docImage_EYbW" style="max-width:85%"><span><a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#" data-featherlight="/img/blog/2025-01-24-sql-comprehension-technologies/logical_plan.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-01-24-sql-comprehension-technologies/logical_plan.png?v=2"></a></span></div>
<p>You can see we’ve gotten a more specific description of how to generate the dataset. Rather than simply mapping the SQL keywords and their dependencies, we have a resolved set of operations, in this case scanning a table, filtering the result, and projecting the values in the <code>x</code> column with an alias of <code>u</code>.</p>
<p>The logical plan contains a precise logical description of the computation your query defines, and validates that it can be executed. Logical plans describe their operations as <a href="https://en.wikipedia.org/wiki/Relational_algebra" target="_blank" rel="noopener noreferrer"><em>relational algebra</em></a>, which is what enables these plans to be fully optimized — the steps in a logical plan can be rearranged and reduced with mathematical equivalency to ensure they are as efficient as possible.</p>
<p>This plan can be very helpful to you as a developer, especially when it’s available before you execute the query. If you’ve ever run an <code>explain</code> statement in your database, you’ve viewed a logical plan! You know exactly what operations will be executed, and critically, you know that they are valid! This ability to validate code before computing anything is referred to as <em>static analysis</em>.</p>
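<p>You can try this yourself with SQLite from Python’s standard library. The exact shape of <code>explain</code> output varies by database, but the idea is the same: the plan is available before a single row is computed.</p>

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table some_table (x int)")

# EXPLAIN QUERY PLAN reports the resolved plan without executing the query.
plan = con.execute(
    "explain query plan select x as u from some_table where x > 0"
).fetchall()

# Each row's last column is a human-readable plan step, e.g. a scan of some_table.
for step in plan:
    print(step[-1])
```
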
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="compilers">Compilers<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#compilers" class="hash-link" aria-label="Direct link to Compilers" title="Direct link to Compilers">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p><strong>Compilers</strong> are programs that translate high-level language to low-level language. <em>Parsers</em> and <em>binders</em> together constitute compilers.</p></div></div>
<p>Taken together, a parser plus a binder constitute a <em>compiler,</em> a program that takes in high-level code (one that is optimized for human readability, like SQL) and outputs low-level code (one that is optimized for machine readability + execution).  In SQL compilers, this output is the logical plan.</p>
<p>A compiler definitionally gives you a deeper understanding of a query’s behavior than a parser alone. We’re now able to trace the data flows and operations that we were abstractly expressing when we wrote our SQL query. The compiler incrementally enriches its understanding of the original SQL string and produces a logical plan, which enables static analysis and validation of your SQL logic.</p>
<p>We are, however, not all the way down the rabbit hole — a compiler-produced logical plan contains the full instructions for how to execute a piece of code, but no sense of how to actually execute those steps! There’s one more translation required for the rubber to fully meet the motherboard.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="level-3-executing">Level 3: Executing<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#level-3-executing" class="hash-link" aria-label="Direct link to Level 3: Executing" title="Direct link to Level 3: Executing">​</a></h2>
<p><em>At Level 3, the database’s <strong>execution engine</strong> translates the logical plan into a <strong>physical plan</strong>, which can finally be executed to return a dataset.</em> <em>Key concepts: Optimization and Planning, Engines, Physical plans</em></p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="optimization-and-planning">Optimization and planning<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#optimization-and-planning" class="hash-link" aria-label="Direct link to Optimization and planning" title="Direct link to Optimization and planning">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>A logical plan goes through a process of <strong>optimization and planning</strong> that maps its operations to the physical hardware that is going to execute each step.</p></div></div>
<p>Once the database has a resolved logical plan, it goes through a process of optimization and planning. As mentioned, because logical plans are expressed as relational algebraic expressions, the database can execute equivalent steps in whichever order is most efficient.</p>
<p>Let’s think of a simple example SQL statement:</p>
<pre><code class="language-sql">select
  *
from a
join b on a.id = b.a_id
join c on b.id = c.b_id</code></pre>
<p>The logical plan will contain steps to join the tables together exactly as defined in SQL — great! Let’s suppose, however, that table <code>a</code> is several orders of magnitude larger than each of the other two. In that case, the order of joining makes a huge difference in the performance of the query! If we join <code>a</code> and <code>b</code> first, we produce an enormous intermediate result <code>ab</code> that must then be joined with <code>c</code>. If instead we join <code>b</code> and <code>c</code> first, and join the much smaller result <code>bc</code> with table <code>a</code>, we get the same result <code>abc</code> at a fraction of the cost!</p>
<p>Layering in the knowledge of the physical characteristics of the objects referenced in a query to ensure efficient execution is the job of the optimization and planning stage.</p>
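<p>Some back-of-envelope arithmetic shows why the planner cares. The cardinalities here are made up, and the cost model is deliberately crude (each join costs the sum of its input sizes, and every row finds exactly one match):</p>

```python
# Hypothetical table sizes for the query above: a is enormous, b and c are small.
rows_a, rows_b, rows_c = 100_000_000, 10_000, 1_000

# Plan 1: (a join b) join c -- the huge intermediate result ab is processed again.
rows_ab = rows_a  # assume every row of a matches exactly one row of b
cost_plan_1 = (rows_a + rows_b) + (rows_ab + rows_c)

# Plan 2: (b join c) join a -- the small join happens first.
rows_bc = rows_b  # assume every row of b matches exactly one row of c
cost_plan_2 = (rows_b + rows_c) + (rows_bc + rows_a)

print(cost_plan_1)  # prints: 200011000
print(cost_plan_2)  # prints: 100021000
```

<p>Under these toy assumptions, the second plan touches roughly half as many rows — and real optimizers weigh far more than row counts.</p>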
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="physical-plan">Physical plan<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#physical-plan" class="hash-link" aria-label="Direct link to Physical plan" title="Direct link to Physical plan">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>A <strong>physical plan</strong> is the intermediate representation that contains all the information necessary to execute the query.</p></div></div>
<p>Once we do the work to decide on the optimal plan with details about the physical characteristics of the data, we get one final intermediate representation: the physical plan. Think about the operations defined by a logical plan — we may know that we have a <code>TableScan</code> operation of a table called <code>some_table</code>. A physical plan is able to map that operation to <em>specific data partitions</em> in <em>specific data storage locations</em>. The physical plan also contains information relevant to memory allocation so the engine can plan accordingly — as in the previous example, it knows the second join will be a lot more resource intensive!</p>
<p>Think about what your data platform of choice has to do when you submit a validated SQL query: the last mile step is deciding which partitions of data on which of its servers should be scanned, how they should be joined and aggregated to ultimately generate the dataset you need. Physical plans are among the last intermediate representations created along the way to actually returning data back from a database.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="execution">Execution<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#execution" class="hash-link" aria-label="Direct link to Execution" title="Direct link to Execution">​</a></h3>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>A query engine can <strong>execute</strong> a <em>physical plan</em> and return tabular data</p></div></div>
<p>Once a physical plan is generated, all that’s left to do is run it! The database engine executes the physical plan, and fetches, combines, and aggregates your data into the format described by your SQL code. The way that the engine accomplishes this can vary significantly depending on the architecture of your database! Some databases are “single node” in that there is a single computer doing all the work; others are “distributed” and can federate the work across many working compute nodes.</p>
<p>In general, the engine must:</p>
<ol>
<li>
<p><strong>Allocate resources</strong> — In order to run your query, a computer must be online and available to do so! This step allocates CPU to each of the operations in the physical plan, whether one single node or many nodes execute the full query task.</p>
</li>
<li>
<p><strong>Read data into memory</strong> — The tables referenced are scanned as efficiently as possible, and the rows are processed. This may happen in stages, depending on whether the tasks are distributed or happening within one single node.</p>
</li>
<li>
<p><strong>Execute operations</strong> — Once the required data is read into memory, it flows through a pipeline of the nodes in your physical plan. More than 50 years of work has gone into optimizing these steps for different data structures and in-memory representations — everything from row-oriented databases, to columnar, to time series, geo-spatial, and graph. But fundamentally, there are five common operations:</p>
<ol>
<li>
<p><strong>Projection</strong> — Extract only the columns or expressions that the user requested (e.g. <code>order_id</code>).</p>
</li>
<li>
<p><strong>Filtering</strong> — Rows that don’t meet your <code>WHERE</code> condition are dropped.</p>
</li>
<li>
<p><strong>Joining</strong> — If your query involves multiple tables, the engine merges or joins them—this could be a hash join, sort-merge join, or even a nested loop join depending on data statistics.</p>
</li>
<li>
<p><strong>Aggregation</strong> — If you have an aggregation like <code>SUM(amount)</code> or <code>COUNT(*)</code>, the engine groups rows by the specified columns and calculates the aggregated values.</p>
</li>
<li>
<p><strong>Sorting / Window Functions</strong> — If the query uses <code>ORDER BY</code>, <code>RANK()</code>, or other window functions, the data flows into those operators next.</p>
</li>
</ol>
</li>
<li>
<p><strong>Merge and return results</strong> — The last mile step is generating the tabular dataset. In the case of distributed systems, this may require combining the results from several nodes into a single result.</p>
</li>
</ol>
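<p>As a mental model (nothing more — real engines are vastly more sophisticated), the scan → filter → aggregate pipeline above can be sketched over plain Python rows:</p>

```python
# Illustrative single-node pipeline: scan -> filter -> aggregate.
orders = [
    {"order_id": 1, "status": "paid", "amount": 30},
    {"order_id": 2, "status": "void", "amount": 10},
    {"order_id": 3, "status": "paid", "amount": 15},
]

def scan(table):
    """Read rows 'into memory' one at a time."""
    yield from table

def filter_rows(rows, predicate):
    """Drop rows that fail the WHERE condition."""
    return (row for row in rows if predicate(row))

def aggregate(rows, key, value):
    """GROUP BY key, SUM(value)."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals

result = aggregate(
    filter_rows(scan(orders), lambda row: row["status"] == "paid"),
    key="status",
    value="amount",
)
print(result)  # prints: {'paid': 45}
```

<p>Each operator pulls rows from the one beneath it — the same “volcano”-style shape many engines use, minus everything that makes them fast.</p>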
<p>Finally! Actionable business insights, right in the palm of your hand!</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="looking-ahead">Looking ahead<a href="https://docs.getdbt.com/blog/sql-comprehension-technologies#looking-ahead" class="hash-link" aria-label="Direct link to Looking ahead" title="Direct link to Looking ahead">​</a></h2>
<p>That’s probably more about databases than you bargained for! I know this is a lot to absorb, but the best data practitioners have a deep understanding of their tools, and all of this is extremely relevant to the next evolution of data tooling and data work. Next time you run a query, don't forget to thank your database for all the hard work it's doing for you.</p>
        <author>
            <name>Dave Connors</name>
        </author>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Three Levels of SQL Comprehension: What they are and why you need to know about them]]></title>
        <id>https://docs.getdbt.com/blog/the-levels-of-sql-comprehension</id>
        <link href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension"/>
        <updated>2025-01-23T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Parsers, compilers, executors, oh my! What it means when we talk about 'understanding SQL'.]]></summary>
        <content type="html"><![CDATA[<p>Ever since <a href="https://www.getdbt.com/blog/dbt-labs-acquires-sdf-labs" target="_blank" rel="noopener noreferrer">dbt Labs acquired SDF Labs last week</a>, I've been head-down diving into their technology and making sense of it all. The main thing I knew going in was "SDF understands SQL". It's a nice pithy quote, but the specifics are <em>fascinating.</em></p>
<p>For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a <a href="https://en.wikipedia.org/wiki/Preprocessor" target="_blank" rel="noopener noreferrer">string preprocessor</a> and into fully comprehending SQL. <strong>For the first time, SDF provides the technology necessary to make this possible.</strong> Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-is-sql-comprehension">What is SQL comprehension?<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#what-is-sql-comprehension" class="hash-link" aria-label="Direct link to What is SQL comprehension?" title="Direct link to What is SQL comprehension?">​</a></h2>
<p>Let’s call any tool that can look at a string of text, interpret it as SQL, and extract some meaning from it a <em>SQL Comprehension tool.</em></p>
<p>Put another way, SQL Comprehension tools <strong>recognize SQL code and deduce more information about that SQL than is present in the <a href="https://www.postgresql.org/docs/current/sql-syntax-lexical.html" target="_blank" rel="noopener noreferrer">tokens</a> themselves</strong>. Here’s a non-exhaustive set of behaviors and capabilities that such a tool might have for a given <a href="https://blog.sdf.com/p/sql-dialects-and-the-tower-of-babel" target="_blank" rel="noopener noreferrer">dialect</a> of SQL:</p>
<ul>
<li>Identify constituent parts of a query.</li>
<li>Create structured artifacts for their own use or for other tools to consume in turn.</li>
<li>Check whether the SQL is valid.</li>
<li>Understand what will happen when the query runs: things like what columns will be created, what datatypes they have, and what DDL is involved.</li>
<li>Execute the query and return data (unsurprisingly, your database is a tool that comprehends SQL!)</li>
</ul>
<p>By building on top of tools that truly understand SQL, it is possible to create systems that are much more capable, resilient and flexible than we’ve seen to date.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-levels-of-sql-comprehension">The Levels of SQL Comprehension<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#the-levels-of-sql-comprehension" class="hash-link" aria-label="Direct link to The Levels of SQL Comprehension" title="Direct link to The Levels of SQL Comprehension">​</a></h2>
<p>When you look at the capabilities above, you can imagine some of those outcomes being achievable with <a href="https://github.com/joellabes/mode-dbt-exposures/blob/main/generate_yaml.py#L52" target="_blank" rel="noopener noreferrer">one line of regex</a> and some that are only possible if you’ve literally built a database. Given that range of possibilities, we believe that “can you comprehend SQL” is an insufficiently precise question.</p>
<p>A better question is “to what level can you comprehend SQL?” To that end, we have identified different levels of capability. Each level deals with a key artifact (or more precisely - a specific "<a href="https://en.wikipedia.org/wiki/Intermediate_representation" target="_blank" rel="noopener noreferrer">intermediate representation</a>"). And in doing so, each level unlocks specific capabilities and more in-depth validation.</p>
<div class="filterableTableContainer_mhtg"><table><thead><tr><th>Level</th><th>Name</th><th>Artifact</th><th>Example Capability Unlocked</th></tr></thead><tbody><tr><td>1</td><td>Parsing</td><td>Syntax Tree</td><td>Know what symbols are used in a query.</td></tr><tr><td>2</td><td>Compiling</td><td>Logical Plan</td><td>Know what types are used in a query, and how they change, regardless of their origin.</td></tr><tr><td>3</td><td>Executing</td><td>Physical Plan + Query Results</td><td>Know how a query will run on your database, all the way to calculating its results.</td></tr></tbody></table></div>
<p>At Level 1, you have a baseline comprehension of SQL. By parsing the string of SQL into a Syntax Tree, it’s possible to <strong>reason about the components of a query</strong> and identify whether you've <strong>written syntactically legal code</strong>.</p>
<p>At Level 2, the system produces a complete Logical Plan. A logical plan knows about every function that’s called in your query, the datatypes being passed into them, and what every column will look like as a result (among many other things). Static analysis of this plan makes it possible to <strong>identify almost every error before you run your code</strong>.</p>
<p>Finally, at Level 3, a tool can actually <strong>execute a query and modify data</strong>, because it understands all the complexities involved in answering the question "how does the exact data passed into this query get transformed/mutated".</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="can-i-see-an-example">Can I see an example?<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#can-i-see-an-example" class="hash-link" aria-label="Direct link to Can I see an example?" title="Direct link to Can I see an example?">​</a></h2>
<p>This can feel pretty theoretical based on descriptions alone, so let’s look at a basic Snowflake query.</p>
<p>A system at each level of SQL comprehension understands progressively more about the query, and that increased understanding enables it to <strong>say with more precision whether the query is valid</strong>.</p>
<p>To tools at lower levels of comprehension, some elements of a query are effectively a black box - their syntax tree has the contents of the query but cannot validate whether everything makes sense. <strong>Remember that comprehension is deducing more information than is present in the plain text of the query; by comprehending more, you can validate more.</strong></p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="level-1-parsing">Level 1: Parsing<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#level-1-parsing" class="hash-link" aria-label="Direct link to Level 1: Parsing" title="Direct link to Level 1: Parsing">​</a></h3>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="docImage_EYbW" style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#" data-featherlight="/img/blog/2025-01-23-levels-of-sql-comprehension/level_1.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-01-23-levels-of-sql-comprehension/level_1.png?v=2"></a></span></div>
<p>A parser recognizes that a function called <code>dateadd</code> has been called with three arguments, and knows the contents of those arguments.</p>
<p>However, without knowledge of the <a href="https://en.wikipedia.org/wiki/Type_signature#Signature" target="_blank" rel="noopener noreferrer">function signature</a>, it has no way to validate whether those arguments are valid types, whether three is the right number of arguments, or even whether <code>dateadd</code> is an available function. This also means it can’t know what the datatype of the created column will be.</p>
<p>Parsers are intentionally flexible in what they will consume - their purpose is to make sense of what they're seeing, not nitpick. Most parsers describe themselves as “non-validating”, because true validation requires compilation.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="level-2-compiling">Level 2: Compiling<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#level-2-compiling" class="hash-link" aria-label="Direct link to Level 2: Compiling" title="Direct link to Level 2: Compiling">​</a></h3>
<div class="docImage_EYbW" style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#" data-featherlight="/img/blog/2025-01-23-levels-of-sql-comprehension/level_2.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-01-23-levels-of-sql-comprehension/level_2.png?v=2"></a></span></div>
<p>Extending beyond a parser, a compiler <em>does</em> know the function signatures. It knows that on Snowflake, <code>dateadd</code> is a function which takes three arguments: a <code>datepart</code>, an <code>integer</code>, and an <code>expression</code> (in that order).</p>
<p>A compiler also knows what types a function can return without actually running the code (this is called <a href="https://en.wikipedia.org/wiki/Static_program_analysis" target="_blank" rel="noopener noreferrer">static analysis</a>, we’ll get into that another day). In this case, because <code>dateadd</code>’s return type depends on the input expression and our expression isn’t explicitly cast, the compiler just knows that the <code>new_day</code> column can be <a href="https://docs.snowflake.com/en/sql-reference/functions/dateadd#returns" target="_blank" rel="noopener noreferrer">one of three possible datatypes</a>.</p>
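<p>Here’s a toy version of that signature check, with the argument kinds written as plain strings (a real compiler works over a much richer type system):</p>

```python
# Snowflake's dateadd takes (datepart, integer, expression), in that order.
DATEADD_SIGNATURE = ("datepart", "integer", "expression")

def check_dateadd(arg_kinds):
    """Return a list of binding errors for a dateadd call (empty if valid)."""
    errors = []
    for position, (got, expected) in enumerate(zip(arg_kinds, DATEADD_SIGNATURE), 1):
        if got != expected:
            errors.append(f"argument {position}: expected {expected}, got {got}")
    return errors

# dateadd('day', 1, getdate()) -- the kinds line up, so no errors:
print(check_dateadd(("datepart", "integer", "expression")))  # prints: []
# dateadd('day', getdate(), 1) -- parses fine, but fails to bind:
print(check_dateadd(("datepart", "expression", "integer")))
```
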
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="level-3-executing">Level 3: Executing<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#level-3-executing" class="hash-link" aria-label="Direct link to Level 3: Executing" title="Direct link to Level 3: Executing">​</a></h3>
<div class="docImage_EYbW" style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#" data-featherlight="/img/blog/2025-01-23-levels-of-sql-comprehension/level_3.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-01-23-levels-of-sql-comprehension/level_3.png?v=2"></a></span></div>
<p>A tool with execution capabilities knows everything about this query and the data that is passed into it, including how functions are implemented. Therefore it can perfectly represent the results as run on Snowflake. Again, that’s what databases do. A database is a Level 3 tool.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="review">Review<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#review" class="hash-link" aria-label="Direct link to Review" title="Direct link to Review">​</a></h3>
<p>Let’s review the increasing validation capabilities unlocked by each level of comprehension, and notice that over time <strong>the black boxes completely disappear</strong>:</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:100%"><span><a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#" data-featherlight="/img/blog/2025-01-23-levels-of-sql-comprehension/validation_all_levels.png"><img data-toggle="lightbox" alt="" title="" src="https://docs.getdbt.com/img/blog/2025-01-23-levels-of-sql-comprehension/validation_all_levels.png?v=2"></a></span></div>
<p>In a toy example like this one, the distinctions between the different levels might feel subtle. As you move away from a single query and into a full-scale project, the functionality gaps become more pronounced. That’s hard to demonstrate in a blog post, but fortunately there’s an easier option: look at some failing queries. How a query is broken determines what level of tool is necessary to recognize the error.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="so-lets-break-things">So let’s break things<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#so-lets-break-things" class="hash-link" aria-label="Direct link to So let’s break things" title="Direct link to So let’s break things">​</a></h2>
<p>As the great analytics engineer Tolstoy <a href="https://en.wikipedia.org/wiki/Anna_Karenina_principle" target="_blank" rel="noopener noreferrer">once noted</a>, “All correctly written queries are alike; each incorrectly written query is incorrect in its own way”.</p>
<p>Consider these three invalid queries:</p>
<ul>
<li><code>selecte dateadd('day', 1, getdate()) as tomorrow</code> (Misspelled keyword)</li>
<li><code>select dateadd('day', getdate(), 1) as tomorrow</code> (Wrong order of arguments)</li>
<li><code>select cast('2025-01-32' as date) as tomorrow</code> (Impossible date)</li>
</ul>
<p>Tools that comprehend SQL can catch errors. But they can't all catch the same errors! Each subsequent level will catch more subtle errors in addition to those from <em>all prior levels</em>. That's because the levels are additive — each level contains and builds on the knowledge of the ones below it.</p>
<p>Each of the above queries requires progressively greater SQL comprehension abilities to identify the mistake.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="parser-level-1-capture-syntax-errors">Parser (Level 1): Capture Syntax Errors<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#parser-level-1-capture-syntax-errors" class="hash-link" aria-label="Direct link to Parser (Level 1): Capture Syntax Errors" title="Direct link to Parser (Level 1): Capture Syntax Errors">​</a></h3>
<p>Example: <code>selecte dateadd('day', 1, getdate()) as tomorrow</code></p>
<p>Parsers know that <code>selecte</code> is <strong>not a valid keyword</strong> in Snowflake SQL, and will reject it.</p>
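As a toy sketch (not how any real SQL parser is implemented), Level 1 checking boils down to recognizing valid tokens before knowing anything about functions, types, or data:

```python
# A minimal sketch of Level 1 (parsing): reject a statement whose first
# token is not a recognized keyword. Real parsers build a full syntax
# tree, but the principle is the same: no knowledge of types or data.
KEYWORDS = {"select", "from", "where", "group", "by", "order"}

def starts_with_valid_keyword(sql: str) -> bool:
    first_token = sql.strip().split()[0].lower()
    return first_token in KEYWORDS

print(starts_with_valid_keyword("selecte dateadd('day', 1, getdate()) as tomorrow"))  # False
print(starts_with_valid_keyword("select dateadd('day', 1, getdate()) as tomorrow"))   # True
```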
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="compiler-level-2-capture-compilation-errors">Compiler (Level 2): Capture Compilation Errors<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#compiler-level-2-capture-compilation-errors" class="hash-link" aria-label="Direct link to Compiler (Level 2): Capture Compilation Errors" title="Direct link to Compiler (Level 2): Capture Compilation Errors">​</a></h3>
<p>Example: <code>select dateadd('day', getdate(), 1) as tomorrow</code></p>
<p>To a parser, this looks fine - all the parentheses and commas are in the right places, and we’ve spelled <code>select</code> correctly this time.</p>
<p>A compiler, on the other hand, recognizes that the <strong>function arguments are out of order</strong> because:</p>
<ul>
<li>It knows that the second argument (<code>value</code>) needs to be a number, but that <code>getdate()</code> returns a <code>timestamp_ltz</code>.</li>
<li>Likewise, it knows that a number is not a valid date/time expression for the third argument.</li>
</ul>
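The same reasoning can be sketched in a few lines of Python (a toy model, not a real compiler; real compilers track many more types and overloads). The compiler holds a signature table and an expression-type table, and checks each argument against the expected type without executing anything:

```python
# Toy Level 2 (compilation) check: compare each argument's inferred
# type against the function signature, without running the query.
SIGNATURES = {
    # dateadd(date_or_time_part, value, date_or_time_expr)
    "dateadd": ["part", "number", "timestamp"],
}
EXPR_TYPES = {"'day'": "part", "1": "number", "getdate()": "timestamp"}

def check_call(func: str, args: list) -> list:
    errors = []
    for position, (arg, expected) in enumerate(zip(args, SIGNATURES[func]), start=1):
        actual = EXPR_TYPES[arg]
        if actual != expected:
            errors.append(f"arg {position}: expected {expected}, got {actual}")
    return errors

print(check_call("dateadd", ["'day'", "getdate()", "1"]))
# ['arg 2: expected number, got timestamp', 'arg 3: expected timestamp, got number']
```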
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="executor-level-3-capture-data-errors">Executor (Level 3): Capture Data Errors<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#executor-level-3-capture-data-errors" class="hash-link" aria-label="Direct link to Executor (Level 3): Capture Data Errors" title="Direct link to Executor (Level 3): Capture Data Errors">​</a></h3>
<p>Example: <code>select cast('2025-01-32' as date) as tomorrow</code></p>
<p>Again, the parser signs off on this as valid SQL syntax.</p>
<p>But this time the compiler also thinks everything is fine! Remember that a compiler checks the signature of a function. It knows that <code>cast</code> takes a source expression and a target datatype as arguments, and it's checked that both these arguments are of the correct type.</p>
<p>It even has an overload that knows that strings can be cast into dates, but since it can’t do any validation of those strings’ <em>values</em> it doesn’t know <strong>January 32nd isn’t a valid date</strong>.</p>
<p>To actually know whether some data can be processed by a SQL query, you have to, well, process the data. Data errors can only be captured by a Level 3 system.</p>
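Here is Level 3 in miniature, with Python's datetime standing in for the database's execution engine: the invalid date only surfaces when the value is actually processed.

```python
# Only at execution time do we learn that January 32nd doesn't exist.
# Parsers and compilers see a well-formed string of the right type.
from datetime import datetime

def try_cast_date(value: str):
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return None  # a data error, invisible to parsers and compilers

print(try_cast_date("2025-01-31"))  # 2025-01-31
print(try_cast_date("2025-01-32"))  # None
```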
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="conclusion">Conclusion<a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion">​</a></h2>
<p>Building your mental model of the levels of SQL comprehension – why they matter, how they're achieved and what they’ll unlock for you – is critical to understanding the coming era of data tooling.</p>
<p>In introducing these concepts, we’re still just scratching the surface. There's a lot more to discuss:</p>
<ul>
<li>Going deeper on the specific nuances of each level of comprehension</li>
<li>How each level actually works, including the technologies and artifacts that power each level</li>
<li>How this is all going to roll into a step change in the experience of working with data</li>
<li>What it means for doing great data work</li>
</ul>
<p>To learn more, check out <a href="https://docs.getdbt.com/blog/sql-comprehension-technologies">The key technologies behind SQL Comprehension</a>.</p>
<p>Over the coming days, you'll hear more about all of this from the dbt Labs team - both familiar faces and our new friends from SDF Labs.</p>
<p>This is a special moment for the industry and the community. It's alive with possibilities, with ideas, and with new potential. We're excited to navigate this new frontier with all of you.</p>]]></content>
        <author>
            <name>Joel Labes</name>
        </author>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Why I wish I had a control plane for my renovation]]></title>
        <id>https://docs.getdbt.com/blog/wish-i-had-a-control-plane-for-my-renovation</id>
        <link href="https://docs.getdbt.com/blog/wish-i-had-a-control-plane-for-my-renovation"/>
        <updated>2025-01-21T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[When I think back to my renovation, I realize how much smoother it would've been if I’d had a control plane for the entire process.]]></summary>
        <content type="html"><![CDATA[<p>When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be.</p>
<link href="/css/featherlight-styles.css" type="text/css" rel="stylesheet"><div class="
          docImage_EYbW
          
          
          
          
        " style="max-width:70%"><span><a href="https://docs.getdbt.com/blog/wish-i-had-a-control-plane-for-my-renovation#" data-featherlight="/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png"><img data-toggle="lightbox" alt="My wife pondering our sanity" title="My wife pondering our sanity" src="https://docs.getdbt.com/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png?v=2"></a></span><span class="title_aGrV">My wife pondering our sanity</span></div>
<p>We had to coordinate multiple elements:</p>
<ul>
<li>The <strong>architects</strong>, who designed the layout, interior, and exterior.</li>
<li>The <strong>architectural plans</strong>, which outlined what the house should look like.</li>
<li>The <strong>builders</strong>, who executed those plans.</li>
<li>The <strong>inspectors</strong>, <strong>councils</strong>, and <strong>energy raters</strong>, who checked whether everything met the required standards.</li>
</ul>
<p>Each piece was critical — without the plans, there’s no shared vision; without the builders, the plans don’t come to life; and without inspections, mistakes go unnoticed.</p>
<p>But as an inexperienced project manager, I was also the one responsible for stitching everything together:</p>
<ul>
<li>Architects handed me detailed plans, builders asked for clarifications.</li>
<li>Inspectors flagged issues that were often too late to fix without extra costs or delays.</li>
<li>On top of all this, I also don't speak "builder".</li>
</ul>
<p>So what should have been quick and collaborative conversations turned into drawn-out processes because there was no unified system to keep everyone on the same page.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="in-many-ways-this-mirrors-how-data-pipelines-operate">In many ways, this mirrors how data pipelines operate<a href="https://docs.getdbt.com/blog/wish-i-had-a-control-plane-for-my-renovation#in-many-ways-this-mirrors-how-data-pipelines-operate" class="hash-link" aria-label="Direct link to In many ways, this mirrors how data pipelines operate" title="Direct link to In many ways, this mirrors how data pipelines operate">​</a></h2>
<ul>
<li>The <strong>architects</strong> are the engineers — designing how the pieces fit together.</li>
<li>The <strong>architectural plans</strong> are your dbt code — the models, tests, and configurations that define what your data should look like.</li>
<li>The <strong>builders</strong> are the compute layers (for example, Snowflake, BigQuery, or Databricks) that execute those transformations.</li>
<li>The <strong>inspectors</strong> are the monitoring tools, which focus on retrospective insights like logs, job performance, and error rates.</li>
</ul>
<p>Here’s the challenge: monitoring tools, by their nature, look backward. They’re great at telling you what happened, but they don’t help you plan or declare what should happen. And when these roles (plans, execution, and monitoring) are siloed, teams are left trying to manually stitch them together, often wasting time troubleshooting issues or coordinating workflows.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-makes-dbt-cloud-different">What makes dbt Cloud different<a href="https://docs.getdbt.com/blog/wish-i-had-a-control-plane-for-my-renovation#what-makes-dbt-cloud-different" class="hash-link" aria-label="Direct link to What makes dbt Cloud different" title="Direct link to What makes dbt Cloud different">​</a></h2>
<p><a href="https://www.getdbt.com/product/dbt-cloud" target="_blank" rel="noopener noreferrer">dbt Cloud</a> unifies these perspectives into a single <a href="https://www.getdbt.com/blog/data-control-plane-introduction" target="_blank" rel="noopener noreferrer">control plane</a>, bridging proactive and retrospective capabilities:</p>
<ul>
<li><strong>Proactive planning</strong>: In dbt, you declare the desired <a href="https://docs.getdbt.com/reference/node-selection/state-selection" target="_blank" rel="noopener noreferrer">state</a> of your data before jobs even run — your architectural plans are baked into the pipeline.</li>
<li><strong>Retrospective insights</strong>: dbt Cloud surfaces <a href="https://docs.getdbt.com/docs/deploy/run-visibility" target="_blank" rel="noopener noreferrer">job logs</a>, performance metrics, and test results, providing the same level of insight as traditional monitoring tools.</li>
</ul>
<p>But the real power lies in how dbt integrates these two perspectives. Transformation logic (the plans) and monitoring (the inspections) are tightly connected, creating a continuous feedback loop where issues can be identified and resolved faster, and pipelines can be optimized more effectively.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="why-does-this-matter">Why does this matter?<a href="https://docs.getdbt.com/blog/wish-i-had-a-control-plane-for-my-renovation#why-does-this-matter" class="hash-link" aria-label="Direct link to Why does this matter?" title="Direct link to Why does this matter?">​</a></h2>
<ol>
<li><strong>The silo problem</strong>: Many organizations rely on separate tools for transformation and monitoring. This fragmentation creates blind spots, making it harder to identify and resolve issues.</li>
<li><strong>Integrated workflows</strong>: dbt Cloud eliminates these silos by connecting transformation and monitoring logic in one place. It doesn’t just report on what happened; it ties those insights directly to the proactive plans that define your pipeline.</li>
<li><strong>Operational confidence</strong>: With dbt Cloud, you can trust that your data pipelines are not only functional but aligned with your business goals, monitored in real-time, and easy to troubleshoot.</li>
</ol>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="why-i-wish-i-had-a-control-plane-for-my-renovation">Why I wish I had a control plane for my renovation<a href="https://docs.getdbt.com/blog/wish-i-had-a-control-plane-for-my-renovation#why-i-wish-i-had-a-control-plane-for-my-renovation" class="hash-link" aria-label="Direct link to Why I wish I had a control plane for my renovation" title="Direct link to Why I wish I had a control plane for my renovation">​</a></h2>
<p>When I think back to my renovation, I realize how much smoother it would have been if I’d had a control plane for the entire process. There are firms that specialize in design-and-build projects, in-house architects, engineers, and contractors. The beauty of these firms is that everything is under one roof, so you know they’re communicating seamlessly.</p>
<p>In my case, though, my architect, builder, and engineer were all completely separate, which meant I was the intermediary. I was the pigeon service shuttling information between them, and it was exhausting. Discussions that should have taken minutes stretched into weeks and sometimes even months because there was no centralized communication.</p>
<p>dbt Cloud is like having that design-and-build firm for your data pipelines. It’s the control plane that unites proactive planning with retrospective monitoring, eliminating silos and inefficiencies. With dbt Cloud, you don’t need to play the role of the pigeon service — it gives you the visibility, integration, and control you need to manage modern data workflows effortlessly.</p>]]></content>
        <author>
            <name>Mark Wan</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
        <category label="data ecosystem" term="data ecosystem"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Test smarter not harder: Where should tests go in your pipeline?]]></title>
        <id>https://docs.getdbt.com/blog/test-smarter-where-tests-should-go</id>
        <link href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go"/>
        <updated>2024-12-09T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Testing your data should drive action, not accumulate alerts. We take our testing framework developed in our last post and make recommendations for where tests ought to go at each transformation stage.]]></summary>
        <content type="html"><![CDATA[<p>👋&nbsp;Greetings, dbt’ers! It’s Faith &amp; Jerrie, back again to offer tactical advice on <em>where</em> to put tests in your pipeline.</p>
<p>In <a href="https://docs.getdbt.com/blog/test-smarter-not-harder">our first post</a> on refining testing best practices, we developed a prioritized list of data quality concerns. We also documented first steps for debugging each concern. This post will guide you on where specific tests should go in your data pipeline.</p>
<p><em>Note that we are constructing this guidance based on how we <a href="https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview#guide-structure-overview">structure data at dbt Labs.</a></em> You may use a different modeling approach—that’s okay! Translate our guidance to your data’s shape, and let us know in the comments section what modifications you made.</p>
<p>First, here are our opinions on where specific tests should go:</p>
<ul>
<li>Source tests should be fixable data quality concerns. See the <a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#sources">callout box below</a> for what we mean by “fixable”.</li>
<li>Staging tests should be business-focused anomalies specific to individual tables, such as accepted ranges or ensuring sequential values. In addition to these tests, your staging layer should clean up any nulls, duplicates, or outliers that you can’t fix in your source system. You generally don’t need to test your cleanup efforts.</li>
<li>Intermediate and marts layer tests should be business-focused anomalies resulting specifically from joins or calculations. You may also consider adding additional primary key and not null tests on columns where it’s especially important to protect the grain.</li>
</ul>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="where-should-tests-go-in-your-pipeline">Where should tests go in your pipeline?<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#where-should-tests-go-in-your-pipeline" class="hash-link" aria-label="Direct link to Where should tests go in your pipeline?" title="Direct link to Where should tests go in your pipeline?">​</a></h2>
<p><img decoding="async" loading="lazy" alt="A horizontal, multicolored diagram that shows examples of where tests ought to be placed in a data pipeline." src="https://docs.getdbt.com/assets/images/testing_pipeline-5654a8c833a4fe25846d9b32605b7d09.png" width="2701" height="1327" class="img_ev3q"></p>
<p>This diagram above outlines where you might put specific data tests in your pipeline. Let’s expand on it and discuss where each type of data quality issue should be tested.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="sources">Sources<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#sources" class="hash-link" aria-label="Direct link to Sources" title="Direct link to Sources">​</a></h3>
<p>Tests applied to your sources should indicate <em>fixable-at-the-source-system</em> issues. If your source tests flag source system issues that aren’t fixable, remove the test and mitigate the problem in your staging layer instead.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>What does fixable mean?</div><div class="admonitionContent_BuS1"><p>We consider a "fixable-at-the-source-system" issue to be something that:</p><ul>
<li>You yourself can fix in the source system.</li>
<li>You know the right person to fix it and have a good enough relationship with them that you know you can <em>get it fixed.</em></li>
</ul><p>You may have issues that can <em>technically</em> get fixed at the source, but it won't happen till the next planning cycle, or you need to develop better relationships to get the issue fixed, or something similar. This demands a more nuanced approach than we'll cover in this post. If you have thoughts on this type of situation, let us know!</p></div></div>
<p>Here’s our recommendation for what tests belong on your sources.</p>
<ul>
<li>Source freshness: testing data freshness for sources that are critical to your pipelines.<!-- -->
<ul>
<li>If any sources feed into any of the “top 3” <a href="https://docs.getdbt.com/blog/test-smarter-not-harder#how-to-prioritize-data-quality-concerns-in-your-pipeline" target="_blank" rel="noopener noreferrer">priority categories</a> in our last post, use <a href="https://docs.getdbt.com/docs/deploy/source-freshness" target="_blank" rel="noopener noreferrer"><code>dbt source freshness</code></a> in your job execution commands and set the severity to <code>error</code>. That way, if source freshness fails, so does your job.</li>
<li>If none of your sources feed into high priority categories, set your source freshness severity to <code>warn</code> and add source freshness to your job execution commands. That way, you still get source freshness information but stale data won't fail your pipeline.</li>
</ul>
</li>
<li>Data hygiene: tests that are <em>fixable</em> in the source system (see our note above on “fixability”).<!-- -->
<ul>
<li>Examples:<!-- -->
<ul>
<li>Duplicate customer records that can be deleted in the source system</li>
<li>Missing values, such as a customer name or email address, that can be entered into the source system</li>
<li>Primary key testing where duplicates are removable in the source system</li>
</ul>
</li>
</ul>
</li>
</ul>
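As a sketch, source freshness and fixable hygiene tests might be configured like this in a sources YAML file (the source, table, and column names here are hypothetical):

```yaml
# models/staging/_sources.yml (hypothetical names)
sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at
    freshness:
      error_after: {count: 24, period: hour}  # stale data fails the job
    tables:
      - name: customers
        columns:
          - name: customer_id
            data_tests:
              - unique    # duplicates we can delete in the source system
              - not_null
```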
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="staging">Staging<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#staging" class="hash-link" aria-label="Direct link to Staging" title="Direct link to Staging">​</a></h3>
<p>In the staging layer, your models should be cleaning up or mitigating data issues that can't be fixed at the source. Your tests should be focused on business anomaly detection.</p>
<ul>
<li>Data cleanup and issue mitigation: Use our <a href="https://docs.getdbt.com/best-practices/how-we-structure/2-staging" target="_blank" rel="noopener noreferrer">best practices around staging layers</a> to clean things up. Don’t add tests to your cleanup efforts. If you’re filtering out nulls in a column, adding a not_null test is repetitive!  🌶️</li>
<li>Business-focused anomaly examples: these are data quality issues you <em>should</em> test for in your staging layer, because they fall outside of your business’s defined norms. These might be:<!-- -->
<ul>
<li>Values inside a single column that fall outside of an acceptable range. For example, a store selling a greater quantity of limited-edition items than they received in their stock delivery.</li>
<li>Values that should always be positive actually are. A negative transaction amount that isn’t classified as a return, for example, would fail this test and spur further investigation into the offending transaction.</li>
<li>An unexpected uptick in volume of a quantity column beyond a pre-defined percentage. This might look like a store’s customer volume spiking unexpectedly and outside of expected seasonal norms. This is an anomaly that could indicate a bug or modeling issue.</li>
</ul>
</li>
</ul>
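A business-focused anomaly test on a staging model might look like this (the model and column names are hypothetical, and the test assumes the dbt_utils package is installed):

```yaml
# models/staging/_stg_models.yml (hypothetical names)
models:
  - name: stg_transactions
    columns:
      - name: quantity_sold
        data_tests:
          # anomaly: quantities should fall inside the accepted range
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 10000
```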
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="intermediate-if-applicable">Intermediate (if applicable)<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#intermediate-if-applicable" class="hash-link" aria-label="Direct link to Intermediate (if applicable)" title="Direct link to Intermediate (if applicable)">​</a></h3>
<p>In your intermediate layer, focus on data hygiene and anomaly tests for new columns. Don’t re-test passthrough columns from sources or staging. Here are some examples of tests you might put in your intermediate layer based on the use cases of intermediate models we <a href="https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate#intermediate-models">outline in this guide</a>.</p>
<ul>
<li>Intermediate models often re-grain models to prepare them for marts.<!-- -->
<ul>
<li>Add a primary key test to any re-grained models.</li>
<li>Additionally, consider adding a primary key test to models where the grain <em>has remained the same</em> but has been <em>enriched.</em> This helps future-proof your enriched models against future developers who may not be able to glean your intention from SQL alone.</li>
</ul>
</li>
<li>Intermediate models may perform a first set of joins or aggregations to reduce complexity in a final mart.<!-- -->
<ul>
<li>Add simple anomaly tests to verify the behavior of your sets of joins and aggregations. This may look like:<!-- -->
<ul>
<li>An <a href="https://docs.getdbt.com/reference/resource-properties/data-tests#accepted_values">accepted_values</a> test on a newly calculated categorical column.</li>
<li>A <a href="https://github.com/dbt-labs/dbt-utils#mutually_exclusive_ranges-source" target="_blank" rel="noopener noreferrer">mutually_exclusive_ranges</a> test on two columns whose values behave in relation to one another (ex: asserting age ranges do not overlap).</li>
<li>A <a href="https://github.com/dbt-labs/dbt-utils#not_constant-source" target="_blank" rel="noopener noreferrer">not_constant</a> test on a column whose value should be continually changing (ex: page view counts on website analytics).</li>
</ul>
</li>
</ul>
</li>
<li>Intermediate models may isolate complex operations.<!-- -->
<ul>
<li>The anomaly tests we list above may suffice here.</li>
<li>You might also consider <a href="https://docs.getdbt.com/docs/build/unit-tests">unit testing</a> any particularly complex pieces of SQL logic.</li>
</ul>
</li>
</ul>
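Putting a couple of these together, an intermediate model's tests might be configured like this (the model and column names are hypothetical):

```yaml
# models/intermediate/_int_models.yml (hypothetical names)
models:
  - name: int_customers_enriched
    columns:
      - name: customer_id
        data_tests:
          - unique    # protect the grain of the re-grained model
          - not_null
      - name: customer_segment   # newly calculated categorical column
        data_tests:
          - accepted_values:
              values: ['new', 'active', 'churned']
```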
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="marts">Marts<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#marts" class="hash-link" aria-label="Direct link to Marts" title="Direct link to Marts">​</a></h3>
<p>Marts layer testing will follow the same hygiene-or-anomaly pattern as staging and intermediate. Similar to your intermediate layer, you should focus your testing on net-new columns in your marts layer. This might look like:</p>
<ul>
<li>Unit tests: validate especially complex transformation logic. For example:<!-- -->
<ul>
<li>Calculating dates in a way that feeds into forecasting.</li>
<li>Customer segmentation logic, especially logic that has a lot of CASE-WHEN statements.</li>
</ul>
</li>
<li>Primary key tests: focus on where your mart's granularity has changed from its staging/intermediate inputs.<!-- -->
<ul>
<li>Similar to the intermediate models above, you may also want to add primary key tests to models whose grain hasn’t changed, but have been enriched with other data. Primary key tests here communicate your intent.</li>
</ul>
</li>
<li>Business focused anomaly tests: focus on <em>new</em> calculated fields, such as:<!-- -->
<ul>
<li>Singular tests on high-priority, high-impact tables where you have a specific problem you want forewarning about.<!-- -->
<ul>
<li>This might be something like fuzzy matching logic to detect when the same person is using multiple email addresses to extend a free trial beyond its acceptable end date.</li>
</ul>
</li>
<li>A test for calculated numerical fields that shouldn’t vary by more than a certain percentage in a week.</li>
<li>A calculated ledger table that follows certain business rules; for example, today’s running total of spend must always be greater than yesterday’s.</li>
</ul>
</li>
</ul>
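A unit test on a complex piece of segmentation logic might be sketched like this (the model, input, columns, and rows are all hypothetical):

```yaml
# models/marts/_fct_models.yml (hypothetical names and values)
unit_tests:
  - name: test_customer_segmentation
    model: fct_customers
    given:
      - input: ref('stg_orders')
        rows:
          - {customer_id: 1, lifetime_spend: 5000}
    expect:
      rows:
        - {customer_id: 1, segment: 'vip'}
```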
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="cicd">CI/CD<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#cicd" class="hash-link" aria-label="Direct link to CI/CD" title="Direct link to CI/CD">​</a></h3>
<p>All of the testing you’ve applied in your different layers is the manual work of constructing your framework. CI/CD is where it gets automated.</p>
<p>You should run a <a href="https://docs.getdbt.com/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci">slim CI</a> to optimize your resource consumption.</p>
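In practice, a slim CI run builds only modified models and their downstream dependents, deferring unchanged upstream references to production artifacts. A typical invocation might look like this (the artifacts path is hypothetical; dbt Cloud CI jobs handle the state comparison for you):

```shell
dbt build --select state:modified+ --defer --state path/to/prod-artifacts
```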
<p>With CI/CD and your regular production runs, your testing framework can be on autopilot. 😎</p>
<p>If and when you encounter failures, consult your trusty testing framework doc you built in our <a href="https://docs.getdbt.com/blog/test-smarter-not-harder">earlier post</a>.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="advanced-ci">Advanced CI<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#advanced-ci" class="hash-link" aria-label="Direct link to Advanced CI" title="Direct link to Advanced CI">​</a></h3>
<p>In the early stages of your smarter testing journey, start with dbt Cloud’s built-in flags for <a href="https://docs.getdbt.com/docs/deploy/advanced-ci">advanced CI</a>. In PRs with advanced CI enabled, dbt Cloud will flag what has been modified, added, or removed in the “compare changes” section. These three flags offer confidence and evidence that your changes are what you expect. Then, hand your changes off for peer review. Advanced CI helps jump-start your colleague’s review by bringing all of the implications of the change into one place.</p>
<p>We consider usage of Advanced CI beyond the modified, added, or changed gut checks to be an advanced (heh) testing strategy, and look forward to hearing how you use it.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="wrapping-it-all-up">Wrapping it all up<a href="https://docs.getdbt.com/blog/test-smarter-where-tests-should-go#wrapping-it-all-up" class="hash-link" aria-label="Direct link to Wrapping it all up" title="Direct link to Wrapping it all up">​</a></h2>
<p>Judicious data testing is like training for a marathon. It’s not productive to go run 20 miles a day and hope that you’ll be marathon-ready and uninjured. Similarly, throwing data tests randomly at your data pipeline without careful thought is not going to tell you much about your data quality.</p>
<p>Runners go into marathons with training plans. Analytics engineers who care about data quality approach the issue with a plan, too.</p>
<p>As you try out some of the guidance above here, remember that your testing needs are going to evolve over time. Don’t be afraid to revise your original testing strategy.</p>
<p>Let us know your thoughts on these strategies in the comments section. Try them out, and share your thoughts to help us refine them.</p>]]></content>
        <author>
            <name>Faith McKenna</name>
        </author>
        <author>
            <name>Jerrie Kumalah Kenney</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Test smarter not harder: add the right tests to your dbt project]]></title>
        <id>https://docs.getdbt.com/blog/test-smarter-not-harder</id>
        <link href="https://docs.getdbt.com/blog/test-smarter-not-harder"/>
        <updated>2024-11-11T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Testing your data should drive action, not accumulate alerts. We synthesized countless customer experiences to build a repeatable testing framework.]]></summary>
        <content type="html"><![CDATA[<p>The <a href="https://www.getdbt.com/resources/guides/the-analytics-development-lifecycle" target="_blank" rel="noopener noreferrer">Analytics Development Lifecycle (ADLC)</a> is a workflow for improving data maturity and velocity. Testing is a key phase here. Many dbt developers tend to focus on <a href="https://www.getdbt.com/blog/building-a-data-quality-framework-with-dbt-and-dbt-cloud" target="_blank" rel="noopener noreferrer">primary keys and source freshness.</a> We think there is a more holistic and in-depth path to tread. Testing is a key piece of the ADLC, and it should drive data quality.</p>
<p>In this blog, we’ll walk through a plan to define data quality. This will look like:</p>
<ul>
<li>identifying <em>data hygiene</em> issues</li>
<li>identifying <em>business-focused anomaly</em> issues</li>
<li>identifying <em>stats-focused anomaly</em> issues</li>
</ul>
<p>Once we have <em>defined</em> data quality, we’ll move on to <em>prioritize</em> those concerns. We will:</p>
<ul>
<li>think through each concern in terms of the breadth of impact</li>
<li>decide if each concern should be at error or warning severity</li>
</ul>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="who-are-we">Who are we?<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#who-are-we" class="hash-link" aria-label="Direct link to Who are we?" title="Direct link to Who are we?">​</a></h3>
<p>Let’s start with introductions - we’re Faith and Jerrie, and we work on dbt Labs’s training and services teams, respectively. By working closely with countless companies using dbt, we’ve gained unique perspectives on the landscape.</p>
<p>The training team collates the problems organizations are thinking about today and gauges how our solutions fit. These are shorter engagements, which means we see the data world shift and change in real time. Resident Architects spend much more time with teams, crafting in-depth solutions, figuring out where those solutions are helping, and where problems still need to be addressed. In short: trainers identify patterns in the problems data teams face, and Resident Architects dive deep on solutions.</p>
<p>Today, we’ll guide you through a particularly thorny problem: testing.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="why-testing">Why testing?<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#why-testing" class="hash-link" aria-label="Direct link to Why testing?" title="Direct link to Why testing?">​</a></h2>
<p>Mariah Rogers broke early ground on data quality and testing in her <a href="https://www.youtube.com/watch?v=hxvVhmhWRJA" target="_blank" rel="noopener noreferrer">Coalesce 2022 talk</a>. We’ve seen similar talks again at Coalesce 2024, like <a href="https://www.youtube.com/watch?v=iCG-5vqMRAo" target="_blank" rel="noopener noreferrer">this one</a> from the data team at Aiven and <a href="https://www.youtube.com/watch?v=5bRG3y9IM4Q&amp;list=PL0QYlrC86xQnWJ72sJlzDqPS0peE7j9Ed&amp;index=71" target="_blank" rel="noopener noreferrer">this one</a> from the co-founder at Omni Analytics. These talks share a common theme: testing your dbt project too much can get out of control quickly, leading to alert fatigue.</p>
<p>In our customer engagements, we see <em>wildly different approaches</em> to testing data. We’ve definitely seen what Mariah, the Aiven team, and the Omni team have described, which is so many tests that errors and alerts just become noise. We’ve also seen the opposite end of the spectrum—only primary keys being tested. From our field experiences, we believe there’s room for a middle path.
This desire for a better approach to data quality and testing isn’t just anecdotal to Coalesce talks or to dbt’s training and services teams. The dbt community has long called for a more intentional approach - data quality is on the industry’s mind! In fact, <a href="https://www.getdbt.com/resources/reports/state-of-analytics-engineering-2024" target="_blank" rel="noopener noreferrer">57% of respondents</a> to dbt’s 2024 State of Analytics Engineering survey said that data quality is a predominant issue in their day-to-day work.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-does-dta-qual1ty-even-mean">What does d@tA qUaL1Ty even mean?!<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#what-does-dta-qual1ty-even-mean" class="hash-link" aria-label="Direct link to What does d@tA qUaL1Ty even mean?!" title="Direct link to What does d@tA qUaL1Ty even mean?!">​</a></h3>
<p>High-quality data is <em>trusted</em> and <em>used frequently.</em> It doesn’t get argued over or endlessly scrutinized for matching to other data. Data <em>testing</em> should lead to higher data <em>quality</em> and insights, period.</p>
<p>Best practices in data quality are still nascent. That said, a lot of important baseline work has been done here. There are <a href="https://medium.com/@AtheonAnalytics/mastering-data-testing-with-dbt-part-1-689b2a025675" target="_blank" rel="noopener noreferrer">case</a> <a href="https://medium.com/@AtheonAnalytics/mastering-data-testing-with-dbt-part-2-c4031af3df18" target="_blank" rel="noopener noreferrer">studies</a> on implementing dbt testing well. dbt Labs also has an <a href="https://learn.getdbt.com/courses/advanced-testing" target="_blank" rel="noopener noreferrer">Advanced Testing</a> course, emphasizing that testing should spur action and be focused and informative enough to help address failures. You can even enforce testing best practices and dbt Labs’s own best practices using the <a href="https://hub.getdbt.com/tnightengale/dbt_meta_testing/latest/" target="_blank" rel="noopener noreferrer">dbt_meta_testing</a> or <a href="https://github.com/dbt-labs/dbt-project-evaluator" target="_blank" rel="noopener noreferrer">dbt_project_evaluator</a> packages and dbt Explorer’s <a href="https://docs.getdbt.com/docs/explore/project-recommendations" target="_blank" rel="noopener noreferrer">Recommendations</a> page.</p>
<p>The missing piece is still cohesion and guidance for everyday practitioners to help develop their testing framework.</p>
<p>To recap, we’re going to start with:</p>
<ul>
<li>identifying <em>data hygiene</em> issues</li>
<li>identifying <em>business-focused anomaly</em> issues</li>
<li>identifying <em>stats-focused anomaly</em> issues</li>
</ul>
<p>Next, we’ll prioritize. We will:</p>
<ul>
<li>think through each concern in terms of the breadth of impact</li>
<li>decide if each concern should be at error or warning severity</li>
</ul>
<p>Get a pen and paper (or a Google Doc) and join us in constructing your own testing framework.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="identifying-data-quality-issues-in-your-pipeline">Identifying data quality issues in your pipeline<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#identifying-data-quality-issues-in-your-pipeline" class="hash-link" aria-label="Direct link to Identifying data quality issues in your pipeline" title="Direct link to Identifying data quality issues in your pipeline">​</a></h2>
<p>Let’s start our framework by <em>identifying</em> types of data quality issues.</p>
<p>In our daily work with customers, we find that data quality issues tend to fall into one of three broad buckets: <em>data hygiene, business-focused anomalies,</em> and <em>stats-focused anomalies.</em> Read the bucket descriptions below, and list 2-3 data quality concerns in your own business context that fall into each bucket.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="bucket-1-data-hygiene">Bucket 1: Data hygiene<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#bucket-1-data-hygiene" class="hash-link" aria-label="Direct link to Bucket 1: Data hygiene" title="Direct link to Bucket 1: Data hygiene">​</a></h3>
<p><em>Data hygiene</em> issues are concerns you address in your <a href="https://docs.getdbt.com/best-practices/how-we-structure/2-staging" target="_blank" rel="noopener noreferrer">staging layer.</a> Hygienic data meets your expectations around formatting, completeness, and granularity requirements. Here are a few examples.</p>
<ul>
<li><em>Granularity:</em> primary keys are unique and not null. Duplicates throw off calculations.</li>
<li><em>Completeness:</em> columns that should always contain text, <em>do.</em> Incomplete data often has to get excluded, reducing your overall analytical power.</li>
<li><em>Formatting:</em> email addresses always have a valid domain. Incorrect emails may affect things like marketing outreach.</li>
</ul>
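<p>To make these hygiene checks concrete, they map neatly onto dbt’s generic tests in a model properties file. Here’s a sketch - the model and column names are hypothetical, and the email check assumes the <a href="https://hub.getdbt.com/calogica/dbt_expectations/latest/" target="_blank" rel="noopener noreferrer">dbt-expectations</a> package is installed:</p>
<pre><code class="language-yaml"># models/staging/_stg_customers.yml (hypothetical names)
version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique      # granularity: one row per customer
          - not_null
      - name: customer_name
        tests:
          - not_null    # completeness: text columns are populated
      - name: email
        tests:
          # formatting: assumes the dbt-expectations package
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '[^@]+@[^@]+\.[^@]+'
</code></pre>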
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="bucket-2-business-focused-anomalies">Bucket 2: Business-focused anomalies<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#bucket-2-business-focused-anomalies" class="hash-link" aria-label="Direct link to Bucket 2: Business-focused anomalies" title="Direct link to Bucket 2: Business-focused anomalies">​</a></h3>
<p><em>Business-focused anomalies</em> catch unexpected behavior. You can flag unexpected behavior by clearly defining <em>expected</em> behavior. <em>Business-focused anomalies</em> are when aspects of the data differ from what you know to be typical in your business. You’ll know what’s typical either through your own analyses, your colleagues’ analyses, or things your stakeholder homies point out to you.</p>
<p>Since business-focused anomaly testing is set by a human, it will be fluid and need to be adjusted periodically. Here’s an example.</p>
<p>Imagine you’re a sales analyst. Generally, you know that if your daily sales amount goes up or down by more than 20% daily, that’s bad. Specifically, it’s usually a warning sign for fraud or the order management system (OMS) dropping orders. You set a test in dbt to fail if any given day’s sales amount is a delta of 20% from the previous day. This works for a while.</p>
<p>Then, you have a stretch of 3 months where your test fails 5 times a week! Every time you investigate, it turns out to be valid consumer behavior. You’re suddenly in hypergrowth, and sales are legitimately increasing that much.</p>
<p>Your 20%-change fraud and OMS failure detector is no longer valid. You need to investigate anew which sales spikes or drops indicate fraud or OMS problems. Once you figure out a new threshold, you’ll go back and adjust your testing criteria.</p>
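<p>One way to implement a check like this is a singular test: a SQL file in your <code>tests/</code> directory that fails when it returns rows. Here’s a sketch, with hypothetical model and column names:</p>
<pre><code class="language-sql">-- tests/assert_daily_sales_delta_within_threshold.sql
-- Fails when any day's sales swing more than 20% from the prior day.
with daily as (
    select
        order_date,
        sum(sales_amount) as daily_sales
    from {{ ref('fct_orders') }}
    group by order_date
),

deltas as (
    select
        order_date,
        daily_sales,
        lag(daily_sales) over (order by order_date) as prev_sales
    from daily
)

select *
from deltas
where prev_sales is not null
  and abs(daily_sales - prev_sales) / prev_sales > 0.20
</code></pre>
<p>When your understanding of “normal” shifts, the 0.20 threshold is the only thing you need to adjust.</p>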
<p>Although your data’s expected behavior will shift over time, you should still commit to defining business-focused anomalies to grow your understanding of what is normal for your data.</p>
<p>Here’s how to identify potential anomalies.</p>
<p>Start at your business intelligence (BI) layer. Pick 1-3 dashboards or tables that you <em>know</em> are used frequently. List these 1-3 dashboards or tables. For each dashboard or table you have, identify 1-3 “expected” behaviors that your end-users rely on.  Here are a few examples to get you thinking:</p>
<ul>
<li>Revenue numbers should not change by more than X% in Y amount of time. This could indicate fraud or OMS problems.</li>
<li>Monthly active users should not decline more than X% after the initial onboarding period. This might indicate user dissatisfaction, usability issues, or users not finding a feature valuable.</li>
<li>Exam passing rates should stay above Y%. A decline below that threshold may indicate that recent content changes or technical issues are affecting understanding or accessibility.</li>
</ul>
<p>You should also consider what data issues you have had in the past! Look through recent data incidents and pick out 3 or 4 to guard against next time. These might be in a #data-questions channel or perhaps a DM from a stakeholder.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="bucket-3-stats-focused-anomalies">Bucket 3: Stats-focused anomalies<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#bucket-3-stats-focused-anomalies" class="hash-link" aria-label="Direct link to Bucket 3: Stats-focused anomalies" title="Direct link to Bucket 3: Stats-focused anomalies">​</a></h3>
<p><em>Stats-focused anomalies</em> are fluctuations that go against your expected volumes or metrics. Some examples include:</p>
<ul>
<li>Volume anomalies. This could be site traffic spiking in a way that suggests illicit behavior, or site traffic dropping one day and then doubling the next, indicating that a chunk of data was not loaded properly.</li>
<li>Dimensional anomalies, like too many product types underneath a particular product line that may indicate incorrect barcodes.</li>
<li>Column anomalies, like sale values more than a certain number of standard deviations from a mean, that may indicate improper discounting.</li>
</ul>
<p>Overall, stats-focused anomalies can indicate system flaws, illicit site behavior, or fraud, depending on your industry. They also tend to require more advanced testing practices than we are covering in this blog. We feel stats-based anomalies are worth exploring once you have a good handle on your data hygiene and business-focused anomalies. We won’t give recommendations on stats-focused anomalies in this post.</p>
<h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-to-prioritize-data-quality-concerns-in-your-pipeline">How to prioritize data quality concerns in your pipeline<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#how-to-prioritize-data-quality-concerns-in-your-pipeline" class="hash-link" aria-label="Direct link to How to prioritize data quality concerns in your pipeline" title="Direct link to How to prioritize data quality concerns in your pipeline">​</a></h2>
<p>Now, you have a written and categorized list of data hygiene concerns and business-focused anomalies to guard against. It’s time to <em>prioritize</em> which quality issues deserve to fail your pipelines.</p>
<p>To prioritize your data quality concerns, think about real-life impact. A couple of guiding questions to consider are:</p>
<ul>
<li>Are your numbers <em>customer-facing?</em> For example, maybe you work with temperature-tracking devices. Your customers rely on these devices to show them average temperatures on perishable goods like strawberries in transit. What happens if the temperature of the strawberries reads as 300°C when they know their refrigerated truck was working just fine? How is your brand perception impacted when the numbers are wrong?</li>
<li>Are your numbers <em>used to make financial decisions?</em> For example, is the marketing team relying on your numbers to choose how to spend campaign funds?</li>
<li>Are your numbers <em>executive-facing?</em> Will executives use these numbers to reallocate funds or shift priorities?</li>
</ul>
<p>We think these 3 categories above constitute high-impact, pipeline-failing events, and should be your top priorities. Of course, adjust priority order if your business context calls for it.</p>
<p>Consult your list of data quality issues against the categories above. Decide and mark whether each one is customer-facing, used for financial decisions, or executive-facing. Mark any data quality issues in those categories as “error”. These are your pipeline-failing events.</p>
<p>If any data quality concerns fall outside of these 3 categories, we classify them as <strong>nice-to-knows</strong>. <strong>Nice-to-know</strong> data quality testing <em>can</em> be helpful. But if you don’t have a <em>specific action you can immediately take</em> when a nice-to-know quality test fails, the test <em>should be a warning, not an error.</em></p>
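<p>In dbt, this error-versus-warning decision maps directly onto a test’s <code>severity</code> config. A minimal sketch, with hypothetical model and column names:</p>
<pre><code class="language-yaml">models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                severity: error  # customer-facing: fail the pipeline
      - name: coupon_code
        tests:
          - not_null:
              config:
                severity: warn   # nice-to-know: alert, don't block
</code></pre>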
<p>You could also remove nice-to-know tests altogether. Data testing should drive action. The more alerts you have in your pipeline, the less action you will take. Configure alerts with care!</p>
<p>However, we do think nice-to-know tests are worth keeping <em>if and only if</em> you are gathering evidence for action you plan to take within the next 6 months, like product feature research. In a scenario like that, those tests should still be set to warning.</p>
<h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="start-your-action-plan">Start your action plan<a href="https://docs.getdbt.com/blog/test-smarter-not-harder#start-your-action-plan" class="hash-link" aria-label="Direct link to Start your action plan" title="Direct link to Start your action plan">​</a></h3>
<p>Now, your data quality concerns are listed and prioritized. Next, add 1 or 2 initial debugging steps you will take if/when the issues surface. These steps should get added to your framework document. Additionally, consider adding them to a <a href="https://discourse.getdbt.com/t/is-it-possible-to-add-a-description-to-singular-tests/5472/4" target="_blank" rel="noopener noreferrer">test’s description.</a></p>
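<p>Newer versions of dbt support a <code>description</code> property on data tests, which is a handy home for those first debugging steps. A sketch - the names are hypothetical, and the range test assumes the dbt_utils package is installed:</p>
<pre><code class="language-yaml">models:
  - name: fct_orders
    columns:
      - name: revenue
        tests:
          # assumes the dbt_utils package is installed
          - dbt_utils.accepted_range:
              min_value: 0
              description: >
                If this fails, run the compiled test SQL to find the
                offending rows, then trace them back through staging to
                see where negative values first appear.
</code></pre>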
<p>This step is <em>important.</em> Data quality testing should spur action, not accumulate alerts. Listing initial debugging steps for each concern will refine your list to the most critical elements.</p>
<p>If you can't identify an action step for any quality issue, <em>remove it</em>. Put it on a backlog and research what you can do when it surfaces later.</p>
<p>Here are a few examples from our list of unexpected behaviors above.</p>
<ul>
<li>For calculated field X, a value above Y or below Z is not possible.
<ul>
<li><em>Debugging initial steps</em>
<ul>
<li>Use dbt test SQL or recent test results in dbt Explorer to find problematic rows</li>
<li>Check these rows in staging and first transformed model</li>
<li>Pinpoint where unusual values first appear</li>
</ul>
</li>
</ul>
</li>
<li>Revenue shouldn’t change by more than X% in Y amount of time.
<ul>
<li><em>Debugging initial steps:</em>
<ul>
<li>Check recent revenue values in staging model</li>
<li>Identify transactions near min/max values</li>
<li>Discuss outliers with sales ops team</li>
</ul>
</li>
</ul>
</li>
</ul>
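<p>For the “find problematic rows” step, dbt’s <code>store_failures</code> config is worth knowing: it persists a test’s failing rows to a table you can query directly. A sketch for <code>dbt_project.yml</code> on newer dbt versions (older versions use the <code>tests:</code> key instead):</p>
<pre><code class="language-yaml"># dbt_project.yml (fragment)
data_tests:
  +store_failures: true  # write failing rows to a queryable table
</code></pre>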
<p>You now have written out a prioritized list of data quality concerns, as well as action steps to take when each concern surfaces. Next, consult <a href="http://hub.getdbt.com/" target="_blank" rel="noopener noreferrer">hub.getdbt.com</a> and find tests that address each of your highest priority concerns. <a href="https://hub.getdbt.com/calogica/dbt_expectations/latest/" target="_blank" rel="noopener noreferrer">dbt-expectations</a> and <a href="https://hub.getdbt.com/dbt-labs/dbt_utils/latest/" target="_blank" rel="noopener noreferrer">dbt_utils</a> are great places to start.</p>
<p>The data tests you’ve marked as “errors” above should get error-level severity. Any concerns falling into that nice-to-know category should either <em>not get tested</em> or have their tests <em>set to warning.</em></p>
<p>Your data quality priorities list is a living reference document. We recommend linking it in your project’s README so that you can go back and edit it as your testing needs evolve. Additionally, developers in your project should have easy access to this document. Maintaining good data quality is everyone’s responsibility!</p>
<p>As you try these ideas out, come to the dbt Community Slack and let us know what works and what doesn’t. Data is a community of practice, and we are eager to hear what comes out of yours.</p>]]></content>
        <author>
            <name>Faith McKenna</name>
        </author>
        <author>
            <name>Jerrie Kumalah Kenney</name>
        </author>
        <category label="analytics craft" term="analytics craft"/>
    </entry>
</feed>