yesterday i wrote about why writing a biology benchmark isn't like writing a test suite: tolerances need to encode biological variation - some of it methodological, some of it genuinely wrong. but that's only half of it.

each scBench eval is a snapshot of experimental data immediately before an analysis step: prior preprocessing already applied, target analysis not yet performed. that design choice creates a problem we had to engineer against. AnnData objects accumulate state, and when you snapshot real workflows, that state comes with the data. you might leave adata.obsm["X_pca"] sitting in a file where the eval tests whether the agent can compute that representation itself. if you don't remove those fields, the eval measures whether the agent can find the answer in a file.

so every eval in our benchmark had to pass three adversarial checks before it was allowed in: can you answer from .obs/.uns directly? from textbook knowledge? do wrong answers leak the right one by being biologically implausible? any problem that failed a check was revised or cut.

one example: an agent is given DRG (peripheral nervous system) data and asked which brain-region signature scores highest. the correct answer is microglia homeostatic - not because microglia are present, but because shared macrophage-like markers make that signature score highest in the data. textbook biology gets it wrong. computation gets it right.

tolerances encode what variation is methodological. shortcut removal encodes what the agent is actually being asked to do. both are required before your benchmark accuracy means anything. the best model hit 52.8%. it's not much but it's honest work.
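the state-stripping step can be sketched roughly like this. a minimal, hedged sketch - the AnnData object is stood in for by a plain dict of slot -> {key: value}, and the helper name and leak map are illustrative assumptions, not scBench's actual code; a real implementation would mutate adata.obsm / adata.varm / adata.uns directly:

```python
# hypothetical helper: remove fields that would let an agent read the
# answer out of the snapshot instead of computing it. "snapshot" mimics
# an AnnData object as a dict of slots (obsm, uns, ...), each mapping
# keys to stored arrays/metadata.

def strip_leaked_state(snapshot: dict, leaked: dict) -> dict:
    """Drop every key listed in `leaked` (slot -> set of keys) from the snapshot."""
    cleaned = {}
    for slot, entries in snapshot.items():
        drop = leaked.get(slot, set())
        # keep only fields that do not reveal the target analysis result
        cleaned[slot] = {k: v for k, v in entries.items() if k not in drop}
    return cleaned

# example: the eval asks the agent to compute PCA, so precomputed PCA
# results (X_pca in .obsm, pca params in .uns) must not ship with the data
snapshot = {
    "obsm": {"X_pca": "...", "X_umap": "..."},
    "uns": {"pca": "...", "neighbors": "..."},
}
leaked = {"obsm": {"X_pca"}, "uns": {"pca"}}

cleaned = strip_leaked_state(snapshot, leaked)
# cleaned keeps X_umap and neighbors but drops X_pca and pca
```

the point of the leak map being per-eval is that what counts as a shortcut depends on the target analysis: X_pca is fine to keep in an eval about differential expression, fatal in one about dimensionality reduction.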
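the three-check gate is just an all-or-nothing filter over candidate evals. a minimal sketch, assuming each check is a predicate that returns True when the eval passes (no shortcut found) - the check names and the flag fields on the eval dict are illustrative assumptions, not scBench internals:

```python
from typing import Callable

Check = Callable[[dict], bool]

def passes_gate(eval_item: dict, checks: list[Check]) -> bool:
    # an eval enters the benchmark only if every adversarial check passes;
    # anything that fails gets revised or cut
    return all(check(eval_item) for check in checks)

# illustrative checks mirroring the three questions above
def not_answerable_from_metadata(e: dict) -> bool:
    return not e.get("answer_in_obs_uns", False)

def not_answerable_from_textbook(e: dict) -> bool:
    return not e.get("textbook_answerable", False)

def distractors_plausible(e: dict) -> bool:
    # wrong answers must be biologically plausible, or they leak the right one
    return e.get("distractors_plausible", True)

checks = [not_answerable_from_metadata,
          not_answerable_from_textbook,
          distractors_plausible]
```

a usage note: `passes_gate({"textbook_answerable": True}, checks)` returns False - the DRG example above is exactly an eval engineered to survive the textbook check, because textbook reasoning picks the wrong signature.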