
DEV Community

# benchmark

Posts

- I benchmarked GPT-4o, Claude 3.5, and Gemini 1.5 for security — the results (2 min read)
- NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison (4 min read)
- I published my benchmark scores. Your turn. (4 min read)
- 🚀 8x Faster Than ONNX Runtime: Zero-Allocation AI Inference in Pure C# (3 min read)
- ARC-AGI V3 Explained: The New AI Benchmark That Breaks Every Agent (3 min read)
- GPT-5.1 scored 26%. Gemini 3 Flash scored 74%. Same prompt, same tools. (8 min read)
- AI Gateways Are Not I/O-Bound Proxies — I Benchmarked 5 of Them to Prove It (9 min read)
- I Tried Speculative Decoding on RTX 4060 8GB — Every Config Was Slower Than Baseline (8 min read)
- FTS vs Hybrid Memory Search: A Real-World Benchmark (4 min read)
- Token Efficiency: 16 Algorithms, 5 Languages, Zero Guesswork (4 min read)
- I Built an Auto-Updating Archive of Every AI Arena Leaderboard (2 min read)
- DGX Spark Inference Performance: Local LLM vs Cloud Benchmarks (2026) (5 min read)
- Running Qwen2.5-32B on RTX 4060 8GB — Beating M4 at 10.8 t/s with llama.cpp (7 min read)
- Benchmarking the Model Is the Wrong Abstraction (4 min read)
- 2.78 TFLOPS on a Fanless MacBook Air? Benchmarking Apple's M4 with MLX (4 min read)