Absolutely humongous model. It's a mixture of 256 experts, with 8 activated per token.
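For anyone unfamiliar with what "8 of 256 activated" means in practice, here is a rough sketch of generic top-k expert routing. Only the numbers 256 and 8 come from the post; the function name `route_token`, the random logits, and the softmax-over-selected-experts weighting are assumptions for illustration, and DeepSeek's actual router may well differ.

```python
import math
import random

NUM_EXPERTS = 256  # from the post
TOP_K = 8          # from the post


def route_token(router_logits, top_k=TOP_K):
    """Hypothetical helper: pick the top_k experts and normalize their weights."""
    # Indices of the top_k largest router scores.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Softmax over the selected scores only (subtract the max for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Made-up router scores for a single token (an assumption, not real model data).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)

for expert_id, weight in routing:
    print(f"expert {expert_id:3d} weight {weight:.3f}")
```

The point is that each token only passes through the 8 selected experts, so the compute per token is a small fraction of what the total parameter count suggests.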
Aider leaderboard:
The only model above 🐋 v3 here is OpenAI o1. DeepSeek is known to make amazing models and Aider rotates their benchmark over time, so it is unlikely that this is a train-on-benchmark situation.
Some more benchmarks: on Reddit.
Someone managed to run it on a cluster of Mac Minis lol https://blog.exolabs.net/day-2/