<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: 樹神宇徳</title>
    <description>The latest articles on Forem by 樹神宇徳 (@kotama).</description>
    <link>https://forem.com/kotama</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867832%2Ffb2d1906-e196-421b-aa33-e8b1b49317e7.jpg</url>
      <title>Forem: 樹神宇徳</title>
      <link>https://forem.com/kotama</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kotama"/>
    <language>en</language>
    <item>
      <title>ARI — A Universal Research Automation System That Runs from Laptop to Supercomputer</title>
      <dc:creator>樹神宇徳</dc:creator>
      <pubDate>Wed, 08 Apr 2026 14:37:51 +0000</pubDate>
      <link>https://forem.com/kotama/ari-a-universal-research-automation-system-that-runs-from-laptop-to-supercomputer-32g</link>
      <guid>https://forem.com/kotama/ari-a-universal-research-automation-system-that-runs-from-laptop-to-supercomputer-32g</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The idea of automating research isn't new. Since Sakana AI's AI Scientist v2, there have been many attempts to hand the entire research process over to LLM agents. But in practice, these systems require either a cloud budget, an in-house engineering team, or domain-specific tooling — making them tools for the few who already have resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARI (Artificial Research Intelligence)&lt;/strong&gt; is an open-source research automation system designed to tear down that wall. It runs identically on a local Ollama instance on your laptop and on a SLURM supercomputer cluster with commercial APIs — using a single Markdown file. The core contains zero hardcoded domain knowledge; every decision is made by the LLM at runtime. This design means the same pipeline can handle HPC performance benchmarks, ML hyperparameter tuning, and — in principle — chemistry optimization.&lt;/p&gt;

&lt;p&gt;In this article, I'll introduce the system design and walk through a real 11-page SpMM performance analysis paper that ARI produced with zero human intervention.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project homepage: &lt;a href="https://kotama7.github.io/ARI/" rel="noopener noreferrer"&gt;https://kotama7.github.io/ARI/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/kotama7/ARI" rel="noopener noreferrer"&gt;https://github.com/kotama7/ARI&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3-Line Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt;: A Markdown file describing your research goal (minimum 3 lines)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Experiment code, measured data, figures, LaTeX paper, peer review, and reproducibility verification report&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment&lt;/strong&gt;: Seamlessly switches from laptop (local Ollama) to HPC cluster (SLURM + commercial API) with the same experiment file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Current version: &lt;strong&gt;v0.4.1&lt;/strong&gt; (released 2026-04-08). Includes a 9-page React/TypeScript web dashboard, 14 MCP skills, and documentation in 3 languages.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why ARI — Democratizing Research Automation
&lt;/h2&gt;

&lt;p&gt;Research automation has historically required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expensive cloud budgets&lt;/li&gt;
&lt;li&gt;In-house engineering teams&lt;/li&gt;
&lt;li&gt;Domain-specific tools that don't generalize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ARI is built on a single claim: &lt;em&gt;the distance from "I have an idea" to "I have results" should be measured in hours, not months — regardless of your resources.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The system scales along 5 axes with one unified codebase:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Minimal&lt;/th&gt;
&lt;th&gt;Full&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;Laptop (local process)&lt;/td&gt;
&lt;td&gt;Supercomputer (SLURM cluster)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Local Ollama (qwen3:8b)&lt;/td&gt;
&lt;td&gt;Commercial API (GPT-4, Claude)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experiment spec&lt;/td&gt;
&lt;td&gt;3-line &lt;code&gt;.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Detailed SLURM scripts + rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain&lt;/td&gt;
&lt;td&gt;Compute benchmarks&lt;/td&gt;
&lt;td&gt;Physical world (robotics, sensors, lab)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expertise&lt;/td&gt;
&lt;td&gt;Beginner (goal only)&lt;/td&gt;
&lt;td&gt;Expert (full parameter control)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The minimal experiment file really is just this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Matrix Multiply Optimization&lt;/span&gt;
&lt;span class="gu"&gt;## Research Goal&lt;/span&gt;
Maximize GFLOPS of DGEMM on this machine.
&lt;span class="c"&gt;&amp;lt;!-- metric_keyword: GFLOPS --&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From this 3-line goal, ARI runs survey → hypothesis generation → implementation → execution → figure generation → paper writing → reproducibility verification end-to-end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture — "experiment.md → paper + verification report"
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;experiment.md ──► ARI Core ──► results + paper + reproducibility report
                      │
          ┌───────────┼──────────────────────┐
          │           │                      │
     BFTS Engine   ReAct Loop         Post-BFTS Pipeline
  (Best-First    (per-node agent)   (workflow.yaml driven)
   Tree Search)        │
                  MCP Skill Servers
                  (plugin system)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ARI's core has three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BFTS (Best-First Tree Search) engine&lt;/strong&gt; — explores the hypothesis space in an evidence-driven way rather than exhaustively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ReAct loop&lt;/strong&gt; — LLM agent running per node: reasoning → tool call → observation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP skill servers&lt;/strong&gt; — purely functional tools implemented via Model Context Protocol (HPC job submission, paper generation, figure generation, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After BFTS completes, the Post-BFTS Pipeline defined in &lt;code&gt;workflow.yaml&lt;/code&gt; runs data extraction → figure generation → paper writing → peer review → reproducibility verification automatically.&lt;/p&gt;
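
&lt;p&gt;To make that concrete, here is a sketch of what such a pipeline declaration could look like. The stage names and keys below are illustrative only, not ARI's actual &lt;code&gt;workflow.yaml&lt;/code&gt; schema:&lt;/p&gt;

```yaml
# Hypothetical sketch only: stage names and keys are illustrative,
# not ARI's real workflow.yaml schema.
pipeline:
  - stage: transform      # BFTS tree -> science-ready data
  - stage: plot           # LLM-written matplotlib figures
  - stage: paper          # LaTeX writing + BibTeX
  - stage: review         # LLM referee scoring
  - stage: reproduce      # independent re-run and cross-check
```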

&lt;h3&gt;
  
  
  End-to-End Data Flow (10 Steps)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Survey&lt;/strong&gt; — fetch related work from arXiv / Semantic Scholar&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hypothesis generation&lt;/strong&gt; — VirSci-style multi-agent deliberation determines hypotheses, key metrics, and evaluation criteria&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tree search&lt;/strong&gt; — BFTS expands candidate nodes in priority order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment execution&lt;/strong&gt; — ReAct agent generates, compiles, and runs code per node (auto-polls until SLURM job completes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peer review evaluation&lt;/strong&gt; — LLMEvaluator assigns &lt;code&gt;scientific_score&lt;/code&gt; (0.0–1.0)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tree-wide analysis&lt;/strong&gt; — Transform skill BFS-traverses the tree to extract hardware/method/ablation insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Figure generation&lt;/strong&gt; — Plot skill's LLM writes matplotlib code and outputs PDF figures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaTeX paper writing&lt;/strong&gt; — Paper skill generates a full paper with BibTeX citations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paper peer review&lt;/strong&gt; — LLM acts as referee and scores the paper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility verification&lt;/strong&gt; — A separate ReAct agent reads only the paper text, re-runs the experiment, and cross-checks claimed values against actual measurements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 10 is worth highlighting: the reproducibility agent reads &lt;strong&gt;only the paper&lt;/strong&gt;, with no access to the original experiment setup. It verifies that the methods described in the paper are actually sufficient to reproduce the results, a check that human peer review cannot realistically perform.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Design — Zero Domain Knowledge Principle
&lt;/h2&gt;

&lt;p&gt;Reading the ARI source code, you'll notice something: &lt;code&gt;ari-core&lt;/code&gt; contains zero domain-specific keywords for HPC, ML, chemistry, or anything else. This is not accidental — it's a &lt;strong&gt;design invariant enforced in code review&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;❌ Forbidden&lt;/th&gt;
&lt;th&gt;✅ Correct&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;if "GFLOP" in metric_name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Use LLM's &lt;code&gt;scientific_score&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;grep -i "gcc\|openmp"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LLM inspects the environment itself&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Compare against MKL" in prompt&lt;/td&gt;
&lt;td&gt;LLM decides comparisons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardcode figure type&lt;/td&gt;
&lt;td&gt;LLM chooses from data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;+0.2&lt;/code&gt; score weight&lt;/td&gt;
&lt;td&gt;LLM scores holistically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;lscpu&lt;/code&gt; in system prompt&lt;/td&gt;
&lt;td&gt;LLM calls it if needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The core specifies only three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Format&lt;/strong&gt;: tool calls in JSON, experiment descriptions in Markdown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol&lt;/strong&gt;: skill communication via MCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signal&lt;/strong&gt;: BFTS ranking via LLM-assigned &lt;code&gt;scientific_score&lt;/code&gt; (0.0–1.0)&lt;/li&gt;
&lt;/ul&gt;
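
&lt;p&gt;For illustration, a tool call in that format is just a small JSON object. The tool and field names here are hypothetical, not ARI's actual wire format:&lt;/p&gt;

```json
{
  "tool": "hpc.submit_job",
  "arguments": {
    "script": "run_spmm.sh",
    "poll_interval_sec": 30
  }
}
```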

&lt;p&gt;Everything else — what to measure, what to compare, which hardware info matters, which figures to draw, which citations to include — is determined autonomously by the LLM at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;scientific_score&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;Earlier versions of ARI (pre-v0.2) ranked nodes using domain-specific keywords such as &lt;code&gt;gflop&lt;/code&gt; and &lt;code&gt;bandwidth&lt;/code&gt;. This worked for HPC but silently failed in other domains.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;scientific_score&lt;/code&gt; is a 0.0–1.0 quality signal assigned holistically by an LLM acting as peer reviewer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the experiment actually generate measured values?&lt;/li&gt;
&lt;li&gt;Did it compare against existing methods?&lt;/li&gt;
&lt;li&gt;Is the methodology reproducible?&lt;/li&gt;
&lt;li&gt;Do the results support a clear scientific claim?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM decides the weights; ARI only reads the number. This lets the same BFTS engine work equally well for HPC benchmarks, ML hyperparameter tuning, and chemistry optimization.&lt;/p&gt;




&lt;h2&gt;
  
  
  BFTS — Treating Failures as Information, Not Noise
&lt;/h2&gt;

&lt;p&gt;ARI's BFTS runs with a two-pool structure: &lt;strong&gt;pending&lt;/strong&gt; (nodes waiting to run) and &lt;strong&gt;frontier&lt;/strong&gt; (completed but unexpanded nodes).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bfts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;frontier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;all_nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_total_nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Step 1: LLM selects best frontier node to expand
&lt;/span&gt;        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;frontier&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_parallel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm_select_best_to_expand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frontier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;frontier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;best&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm_propose_directions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;best&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# improve / ablation / validation
&lt;/span&gt;            &lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;all_nodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Parallel batch execution
&lt;/span&gt;        &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm_select_next_nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_parallel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parallel_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eval_summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;frontier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# success or failure — both go to frontier
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_scientific_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four key design points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lazy expansion&lt;/strong&gt;: Completed nodes aren't expanded until selected by the LLM. Low-scoring nodes stay in "holding" indefinitely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failures are not retried&lt;/strong&gt;: A failed node spawns a &lt;code&gt;debug&lt;/code&gt;-labeled child that inherits the failure context. This is not a retry — retries treat failure as noise; ARI's &lt;code&gt;expand()&lt;/code&gt; treats failure as signal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict budget management&lt;/strong&gt;: &lt;code&gt;len(all_nodes) &amp;lt; max_total_nodes&lt;/code&gt; is the only termination condition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node labels&lt;/strong&gt;: &lt;code&gt;draft&lt;/code&gt;, &lt;code&gt;improve&lt;/code&gt;, &lt;code&gt;debug&lt;/code&gt;, &lt;code&gt;ablation&lt;/code&gt;, &lt;code&gt;validation&lt;/code&gt; — each communicates intent and context to the LLM&lt;/li&gt;
&lt;/ul&gt;
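
&lt;p&gt;The "failure as signal" rule can be sketched in a few lines of Python. The &lt;code&gt;Node&lt;/code&gt; fields and the &lt;code&gt;expand()&lt;/code&gt; signature below are illustrative, not ARI's real API:&lt;/p&gt;

```python
# Sketch of the failure-as-signal expansion rule described above.
# The Node fields and expand() signature are illustrative, not ARI's real API;
# the label vocabulary follows the article.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                     # draft / improve / debug / ablation / validation
    context: dict = field(default_factory=dict)
    failed: bool = False

def expand(node, proposals):
    """Create children for a completed node.

    A failed node is not retried: it spawns a single 'debug' child that
    inherits the failure context, so the error itself becomes part of
    the next attempt's prompt.
    """
    if node.failed:
        child = Node(label="debug", context=dict(node.context))
        child.context["parent_failure"] = node.context.get("error", "unknown")
        return [child]
    # Successful nodes branch into the LLM-proposed directions.
    return [Node(label=lbl, context=dict(node.context)) for lbl in proposals]
```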

&lt;h3&gt;
  
  
  Ancestor-Chain-Scoped Memory
&lt;/h3&gt;

&lt;p&gt;Each node has memory it can only read from its own ancestor chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root ──▶ memory["root"]
  ├─ node_A ──▶ memory["node_A"]
  │   ├─ node_A1 (reads: root + node_A)
  │   └─ node_A2 (reads: root + node_A, NOT node_A1)
  └─ node_B (reads: root only, NOT node_A branch)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sibling nodes don't share memory, so parallel branches can't contaminate each other. Memory queries use the node's own &lt;code&gt;eval_summary&lt;/code&gt; (not domain keywords), keeping search results semantically relevant to that node's work.&lt;/p&gt;
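
&lt;p&gt;The scoping rule is easy to sketch: walk the parent chain and nothing else. The helper and the parent-pointer encoding below are my illustration, not ARI's actual memory skill:&lt;/p&gt;

```python
# Sketch of ancestor-chain-scoped memory reads, assuming a simple
# parent-pointer tree; names are illustrative.
def readable_scopes(node_id, parent_of):
    """Walk from a node up to the root, collecting the memory scopes
    this node may read: its own ancestor chain only, never siblings."""
    chain = []
    current = node_id
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return list(reversed(chain))  # root first

# The example tree from the diagram above.
parents = {"root": None, "node_A": "root", "node_A1": "node_A",
           "node_A2": "node_A", "node_B": "root"}
```

&lt;p&gt;With that tree, &lt;code&gt;readable_scopes("node_A2", parents)&lt;/code&gt; yields &lt;code&gt;["root", "node_A", "node_A2"]&lt;/code&gt;, and &lt;code&gt;node_B&lt;/code&gt; never sees the &lt;code&gt;node_A&lt;/code&gt; branch.&lt;/p&gt;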




&lt;h2&gt;
  
  
  MCP Skills — "LLM Reasons, Skills Execute"
&lt;/h2&gt;

&lt;p&gt;All side effects in ARI go through MCP skill servers. Each skill is an independent process exposing functions via the &lt;code&gt;@mcp.tool()&lt;/code&gt; decorator.&lt;/p&gt;

&lt;p&gt;This gives three properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt;: Each skill runs in its own process. A bug in paper generation can't break HPC job submission&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replaceability&lt;/strong&gt;: Any skill can be swapped without touching others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discoverability&lt;/strong&gt;: LLM agents discover available tools at runtime. Adding a skill = adding a capability, no agent reprogramming needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;14 skills are available; 9 are registered by default in &lt;code&gt;workflow.yaml&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Uses LLM?&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-hpc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SLURM submission / polling / Singularity / bash&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-evaluator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Metric spec extraction from experiment file&lt;/td&gt;
&lt;td&gt;△&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-idea&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;arXiv survey + VirSci hypothesis generation&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-web&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DuckDuckGo / arXiv / Semantic Scholar / citation crawl&lt;/td&gt;
&lt;td&gt;△&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ancestor-scoped node memory (JSONL)&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-transform&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BFTS tree → science-ready data format&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-plot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Matplotlib / seaborn figure generation&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-paper&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LaTeX writing + BibTeX + peer review&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ari-skill-paper-re&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ReAct reproducibility verification&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Skills that can be implemented deterministically (&lt;code&gt;ari-skill-hpc&lt;/code&gt;, &lt;code&gt;ari-skill-memory&lt;/code&gt;) use no LLM at all. LLM calls add both cost and latency, so ARI's policy is simple: use a pure function whenever you can.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension Path to the Physical World
&lt;/h3&gt;

&lt;p&gt;The MCP plugin architecture is intentionally designed to grow beyond computation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Today (compute):
  ari-skill-hpc    → SLURM job submission
  ari-skill-evaluator → metric extraction from stdout
  ari-skill-paper  → LaTeX paper writing

Tomorrow (physical world):
  ari-skill-robot  → robot arm control via ROS2 MCP bridge
  ari-skill-sensor → temperature / pressure sensor reads
  ari-skill-labware → pipette control, plate reader integration
  ari-skill-camera → experiment observation via computer vision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adding these requires &lt;strong&gt;no changes to &lt;code&gt;ari-core&lt;/code&gt;&lt;/strong&gt;. Write a &lt;code&gt;server.py&lt;/code&gt; with &lt;code&gt;@mcp.tool()&lt;/code&gt; functions, register it in &lt;code&gt;workflow.yaml&lt;/code&gt;. The same infrastructure that optimizes compiler flags today can optimize reaction temperatures tomorrow.&lt;/p&gt;
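
&lt;p&gt;As a sketch of how small such a skill can be, here is a mock of the registration pattern. The decorator below is a stand-in registry imitating &lt;code&gt;@mcp.tool()&lt;/code&gt;; a real skill would use the MCP SDK, and the sensor tool itself is hypothetical:&lt;/p&gt;

```python
# Mock of the skill-registration pattern, for illustration only.
# A real server.py would use the MCP SDK's tool decorator instead of
# this stand-in registry, and the sensor tool here is hypothetical.
TOOLS = {}

def tool():
    """Register a function as a callable skill tool (mock of @mcp.tool())."""
    def decorator(fn):
        TOOLS[fn.__name__] = fn
        return fn
    return decorator

@tool()
def read_temperature(sensor_id: str) -> float:
    """Return a reading for the given sensor (stubbed for illustration)."""
    readings = {"reactor_probe": 21.5}
    return readings.get(sensor_id, float("nan"))
```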




&lt;h2&gt;
  
  
  Web Dashboard — The Main Interface
&lt;/h2&gt;

&lt;p&gt;ARI v0.4.x ships a 9-page React/TypeScript SPA dashboard as the main interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ari viz ./checkpoints/ &lt;span class="nt"&gt;--port&lt;/span&gt; 8765  &lt;span class="c"&gt;# http://localhost:8765&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Page&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Home&lt;/td&gt;
&lt;td&gt;Quick actions, recent experiments, system status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Experiment&lt;/td&gt;
&lt;td&gt;4-step wizard (chat / write / upload goal → scope → resources → launch)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experiments&lt;/td&gt;
&lt;td&gt;List / delete / resume all checkpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitor&lt;/td&gt;
&lt;td&gt;Real-time phase stepper (Idle → Idea → BFTS → Paper → Review), SSE live log, cost tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tree&lt;/td&gt;
&lt;td&gt;Interactive BFTS node tree — open any node to see metrics, tool call trace, generated code, stdout&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Results&lt;/td&gt;
&lt;td&gt;View/download paper (PDF/TeX), review report, reproducibility results, generated figures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ideas&lt;/td&gt;
&lt;td&gt;VirSci-generated hypotheses with novelty / feasibility scores and gap analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workflow&lt;/td&gt;
&lt;td&gt;Edit post-BFTS pipeline stages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Settings&lt;/td&gt;
&lt;td&gt;LLM provider, API keys, SLURM partition auto-detect, SSH remote test&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Real-time updates via WebSocket (tree changes) + SSE (log streaming).&lt;/p&gt;




&lt;h2&gt;
  
  
  Results — A Paper ARI Wrote Entirely on Its Own
&lt;/h2&gt;

&lt;p&gt;Here's what ARI autonomously produced: &lt;strong&gt;"Stoch-Loopline: Burstiness- and Tail-Latency-Aware Loopline Modeling for Robust Multi-Core CPU CSR SpMM Scaling"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Artificial Research Intelligence — April 6, 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research theme&lt;/strong&gt;: Performance modeling of CSR SpMM (sparse matrix × dense matrix multiply) on multi-core CPU&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware&lt;/strong&gt;: Fujitsu fx700 node, OpenMP 32 threads&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Existing roofline models predict average throughput only — they can't capture the non-monotonic performance variation depending on sparsity patterns and dense width N. The goal: model bursty, irregular memory access and associated tail latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  What ARI Autonomously Produced
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New analytical model&lt;/td&gt;
&lt;td&gt;Stoch-Loopline — extends loopline/roofline with burstiness, tail latency, and "scaling collapse risk"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 kernel implementations&lt;/td&gt;
&lt;td&gt;Variant-1 (row-parallel gather + explicit unroll) / Variant-3 (rows-in-flight window)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ablation study&lt;/td&gt;
&lt;td&gt;K-blocking / N-tiling+packing / scalar / no-AVX / prefetch disabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Synthetic CSR generator&lt;/td&gt;
&lt;td&gt;Uniform and Zipf modes (lognormal-based), with CV / skewness / Gini statistics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experiment sweep&lt;/td&gt;
&lt;td&gt;Up to M = K = 200,000 (~3.2M nonzeros), dense width N ∈ {4, 8, 16, 32, 64, 128}&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 figures&lt;/td&gt;
&lt;td&gt;Throughput/bandwidth curves, operating point scatter plot, prefetch ablation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;References&lt;/td&gt;
&lt;td&gt;Alappat et al. (2020, 2021), Trotter et al. (2020), Lei et al. (2025) — auto-collected via Semantic Scholar&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Numerical Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;GFLOP/s&lt;/th&gt;
&lt;th&gt;Effective BW&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;K-blocked CSR SpMM (peak)&lt;/td&gt;
&lt;td&gt;23.82&lt;/td&gt;
&lt;td&gt;58.30 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validation sweep (N=16, 32 threads)&lt;/td&gt;
&lt;td&gt;26.22&lt;/td&gt;
&lt;td&gt;65.55 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max measured BW (root sweep)&lt;/td&gt;
&lt;td&gt;17.17&lt;/td&gt;
&lt;td&gt;105.18 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Software prefetch improvement (width avg)&lt;/td&gt;
&lt;td&gt;+3.53&lt;/td&gt;
&lt;td&gt;+8.18 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most interesting: ARI autonomously discovered a &lt;strong&gt;"scaling collapse"&lt;/strong&gt; phenomenon. Increasing dense width N from 64 → 256 causes throughput to &lt;em&gt;drop&lt;/em&gt; from 26.22 → ~18.3 GFLOP/s and bandwidth from 65.5 → 41–42 GB/s. This is counterintuitive — you'd expect higher N to improve compute density. The paper explains this via Stoch-Loopline's "tail latency amplification" and "collapse risk" concepts.&lt;/p&gt;

&lt;p&gt;The pseudocode (Algorithm 1 / Algorithm 2) matches the actual compiled &lt;code&gt;spmm_stoch_loopline.cpp&lt;/code&gt; source structure and unroll counts (&lt;code&gt;unroll ∈ {4, 8}&lt;/code&gt;) exactly — because the Transform skill actually reads the source code from the BFTS tree before writing the paper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not Just "Write a Paper and Stop" — Reproducibility Loop
&lt;/h3&gt;

&lt;p&gt;After writing the paper, &lt;code&gt;ari-skill-paper-re&lt;/code&gt; automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Text-extracts the paper PDF&lt;/li&gt;
&lt;li&gt;Reads the configuration&lt;/li&gt;
&lt;li&gt;Re-runs the job&lt;/li&gt;
&lt;li&gt;Cross-checks claimed values against actual measurements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the paper claims "26.22 GFLOP/s", a separate agent independently verifies that number is reproducible using only the paper text as its information source. This makes reproducibility a &lt;strong&gt;first-class design principle&lt;/strong&gt;, not an afterthought.&lt;/p&gt;
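
&lt;p&gt;The numeric cross-check in step 4 boils down to comparing a claimed value against a fresh measurement within a tolerance. A minimal sketch, where the tolerance and report shape are my illustration rather than ARI's actual ones:&lt;/p&gt;

```python
# Sketch of a per-metric reproducibility check. The relative tolerance
# and report fields are illustrative, not ARI's actual criteria.
def cross_check(claimed, measured, rel_tol=0.05):
    """Return a verdict dict comparing one claimed metric to a re-measurement."""
    if claimed == 0:
        deviation = abs(measured)
    else:
        deviation = abs(measured - claimed) / abs(claimed)
    return {
        "claimed": claimed,
        "measured": measured,
        "relative_deviation": deviation,
        # reproduced when the deviation stays within the tolerance
        "reproduced": not deviation > rel_tol,
    }
```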




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install&lt;/span&gt;
git clone https://github.com/kotama7/ARI &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;ARI
bash setup.sh

&lt;span class="c"&gt;# 2. Set up AI model (choose one)&lt;/span&gt;
ollama pull qwen3:8b              &lt;span class="c"&gt;# free, local&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARI_BACKEND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openai &lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-…  &lt;span class="c"&gt;# or cloud API&lt;/span&gt;

&lt;span class="c"&gt;# 3a. Launch dashboard&lt;/span&gt;
ari viz ./checkpoints/ &lt;span class="nt"&gt;--port&lt;/span&gt; 8765
&lt;span class="c"&gt;# Open http://localhost:8765 → use wizard to create and launch experiments&lt;/span&gt;

&lt;span class="c"&gt;# 3b. Or run directly from CLI&lt;/span&gt;
ari run experiment.md             &lt;span class="c"&gt;# local experiment&lt;/span&gt;
ari run experiment.md &lt;span class="nt"&gt;--profile&lt;/span&gt; hpc  &lt;span class="c"&gt;# SLURM cluster (auto-detect + profile)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
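&lt;p&gt;For a feel of what "a single Markdown file" means in practice, here is a purely illustrative sketch of an experiment definition. The section names and fields below are hypothetical, not ARI's documented schema; see the repository for the actual format it expects.&lt;/p&gt;

```markdown
<!-- Hypothetical experiment.md shape; consult the ARI repo for the real schema -->
# Experiment: SpMM throughput sweep

## Goal
Measure how throughput and bandwidth of a sparse-dense matrix multiply
change as the dense width N grows.

## Variables
- N: 64, 128, 256

## Metrics
- throughput (GFLOP/s)
- bandwidth (GB/s)
```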



&lt;p&gt;Three environment profiles are provided: &lt;code&gt;laptop.yaml&lt;/code&gt; / &lt;code&gt;hpc.yaml&lt;/code&gt; / &lt;code&gt;cloud.yaml&lt;/code&gt;. &lt;code&gt;ari/env_detect.py&lt;/code&gt; auto-detects the job scheduler (SLURM / PBS / LSF / SGE / Kubernetes) and the container runtime (Docker / Singularity / Apptainer).&lt;/p&gt;
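&lt;p&gt;Probing for well-known scheduler and container commands on &lt;code&gt;PATH&lt;/code&gt; is one simple way such detection can work. The sketch below is a hypothetical simplification, not the actual logic of &lt;code&gt;ari/env_detect.py&lt;/code&gt;:&lt;/p&gt;

```python
import shutil

# Map each environment to a command whose presence on PATH indicates it.
SCHEDULER_PROBES = {
    "slurm": "sbatch",
    "pbs": "qsub",
    "lsf": "bsub",
    "sge": "qconf",
    "kubernetes": "kubectl",
}

CONTAINER_PROBES = {
    "docker": "docker",
    "singularity": "singularity",
    "apptainer": "apptainer",
}

def detect(probes):
    """Return the first environment whose probe command is on PATH, else None."""
    for name, cmd in probes.items():
        if shutil.which(cmd):
            return name
    return None

scheduler = detect(SCHEDULER_PROBES)
runtime = detect(CONTAINER_PROBES)
print(f"scheduler={scheduler}, container={runtime}")
```

&lt;p&gt;Real detection has to be more careful (PBS and SGE both ship a &lt;code&gt;qsub&lt;/code&gt;, for example), which is why probing a distinguishing command per scheduler matters.&lt;/p&gt;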

&lt;p&gt;After a run, output is organized in &lt;code&gt;./checkpoints/&amp;lt;run_id&amp;gt;/&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tree.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full BFTS node tree (all nodes, metrics, parent-child links)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;science_data.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Science-ready formatted data (no internal BFTS terminology)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;full_paper.tex&lt;/code&gt; / &lt;code&gt;.pdf&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Generated LaTeX paper and PDF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;review_report.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LLM peer review scores and feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reproducibility_report.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Independent reproducibility verification results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;figures_manifest.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Figure paths and captions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cost_trace.jsonl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-call LLM cost tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;experiments/&amp;lt;slug&amp;gt;/&amp;lt;node_id&amp;gt;/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-node working directory and generated code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
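&lt;p&gt;Because files like &lt;code&gt;cost_trace.jsonl&lt;/code&gt; hold one JSON record per line, they are easy to post-process with a few lines of Python. A hypothetical sketch (the &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;cost_usd&lt;/code&gt; field names are assumptions, not ARI's documented schema) that totals cost per model:&lt;/p&gt;

```python
import json
import tempfile
from collections import defaultdict

def summarize_costs(path):
    """Aggregate per-call LLM cost records (one JSON object per line) by model name."""
    totals = defaultdict(float)
    with open(path) as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                totals[rec.get("model", "unknown")] += rec.get("cost_usd", 0.0)
    return dict(totals)

# Demo with a synthetic trace file (field names are assumed, see lead-in).
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for rec in [{"model": "qwen3:8b", "cost_usd": 0.0},
                {"model": "gpt-4o", "cost_usd": 0.12}]:
        f.write(json.dumps(rec) + "\n")
print(summarize_costs(f.name))  # → {'qwen3:8b': 0.0, 'gpt-4o': 0.12}
```

&lt;p&gt;The same pattern works for walking &lt;code&gt;tree.json&lt;/code&gt; or any of the other JSON artifacts in the checkpoint directory.&lt;/p&gt;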




&lt;h2&gt;
  
  
  Design Principles Summary
&lt;/h2&gt;

&lt;p&gt;Five principles ARI never violates:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Principle&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P1&lt;/td&gt;
&lt;td&gt;Domain-agnostic core&lt;/td&gt;
&lt;td&gt;Zero experiment-specific knowledge in &lt;code&gt;ari-core&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P2&lt;/td&gt;
&lt;td&gt;Deterministic by default&lt;/td&gt;
&lt;td&gt;MCP tools are deterministic by default; LLM-using tools are explicitly annotated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P3&lt;/td&gt;
&lt;td&gt;Multi-purpose metrics&lt;/td&gt;
&lt;td&gt;No hardcoded scalar scores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P4&lt;/td&gt;
&lt;td&gt;Dependency injection&lt;/td&gt;
&lt;td&gt;Switching experiments = editing &lt;code&gt;.md&lt;/code&gt; only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P5&lt;/td&gt;
&lt;td&gt;Reproducibility first&lt;/td&gt;
&lt;td&gt;Hardware described by specs, not cluster names&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And the anti-goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not a replacement for experts (an amplifier)&lt;/li&gt;
&lt;li&gt;Not operated without human supervision at physical risk boundaries&lt;/li&gt;
&lt;li&gt;Not a black box (every decision is logged and traceable)&lt;/li&gt;
&lt;li&gt;Not hardcoding "what good science looks like"&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌐 Project homepage (demo + sample paper viewer): &lt;a href="https://kotama7.github.io/ARI/" rel="noopener noreferrer"&gt;https://kotama7.github.io/ARI/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📄 Sample paper PDF: &lt;a href="https://kotama7.github.io/ARI/sample_paper.pdf" rel="noopener noreferrer"&gt;https://kotama7.github.io/ARI/sample_paper.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 GitHub (MIT): &lt;a href="https://github.com/kotama7/ARI" rel="noopener noreferrer"&gt;https://github.com/kotama7/ARI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎬 Dashboard demo (EN): &lt;a href="https://github.com/kotama7/ARI/raw/main/docs/movie/en/ari_dashboard_demo.mp4" rel="noopener noreferrer"&gt;https://github.com/kotama7/ARI/raw/main/docs/movie/en/ari_dashboard_demo.mp4&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>autoresearch</category>
      <category>ai4science</category>
    </item>
  </channel>
</rss>
