Blog posts
Guest posts & Talks
-
2026-06-03
[Talk] LLM benchmarks in the time of agents
-
2026-05-16
Artifacts 21: Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.
-
2026-04-10
MirrorCode: Evidence that AI can already do some weeks-long coding tasks
-
2026-03-30
Artifacts 20: New orgs! New types of models! With Nemotron Super, Sarvam, Cohere Transcribe, & others
-
2026-03-03
Artifacts 19: Qwen 3.5, GLM 5, MiniMax 2.5 - Chinese labs' latest push of the frontier
-
2026-02-13
What do "economic value" benchmarks tell us?
-
2026-02-02
Artifacts 18: Arcee's 400B MoE, LiquidAI's underrated 1B model, new Kimi, and anticipation of a busy month
-
2026-01-05
Artifacts 17: NVIDIA, Arcee, MiniMax, DeepSeek, Z.ai and others close an eventful year on a high note
-
2025-12-31
17 predictions for AI in 2026
-
2025-12-23
Why benchmarking is hard
-
2025-12-18
Open models: Hot or Not with Nathan Lambert & Florian Brand
-
2025-12-14
2025 Open Models Year in Review
-
2025-11-23
Artifacts 16: Who's building models in the U.S., China's model release playbook, and a resurgence of truly open models
-
2025-10-30
What does OSWorld tell us about AI's ability to use computers?
-
2025-10-18
Artifacts 15: It's Qwen's world and we get to live in it, on CAISI's report, & GPT-OSS update
-
2025-09-11
Artifacts 14: NVIDIA's rise, "Swiss & UAE DeepSeek," and a resurgence of open data
-
2025-08-17
Ranking the Chinese Open Model Builders
-
2025-08-11
Artifacts 13: The abundance era of open models
-
2025-07-22
Artifacts 12: Chinese models continue to dominate throughout the summer 🦦
-
2025-06-26
Artifacts 11: Visualizing China's open models market share, Arcee's models, and VLAs for robotics
-
2025-06-13
What skills does SWE-bench Verified evaluate?
-
2025-05-29
Artifacts 10: New DeepSeek R1 0528!, more permissive licenses, everything as a reasoner, and from artifacts to agents
-
2025-04-21
Artifacts 09: RLHF book draft, where the open reasoning race is going, and unsung heroes of open LM work
-
2025-03-20
Artifacts 08: The return of ~30B models, side effects of OpenAI's proposed DeepSeek ban, and yet another reasoning roundup
-
2025-02-19
Artifacts 07: Alpaca era of reasoning models, China's continued dominance, and tons of multimodal advancements
-
2025-01-27
Artifacts 06: Reasoning models, China's lead in open-source, and a growing multimodal space
Interviews and media mentions