LLM101n is an educational repository that walks you through building and understanding large language models from first principles. It emphasizes intuition and hands-on implementation, guiding you from tokenization and embeddings through attention, transformer blocks, and sampling. The materials favor compact, readable code and incremental steps, so learners can verify each concept before moving on. You'll see how data pipelines, batching, masking, and positional encodings fit together to train a small GPT-style model end to end. Explanations are paired with runnable notebooks and scripts that encourage experimentation and modification. By the end, the goal is less a polished production system and more an internalized understanding of how LLM components interact to produce coherent text.
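
As a taste of how these components look in code, here is a minimal sketch of single-head causal self-attention with masking in PyTorch. It is illustrative only: the class and variable names are assumptions made for this example, not code taken from the LLM101n repo.

    # Illustrative sketch, not code from LLM101n: single-head causal
    # self-attention, showing how a lower-triangular mask keeps each
    # position from attending to future tokens.
    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttention(nn.Module):
        def __init__(self, d_model, max_len=256):
            super().__init__()
            self.qkv = nn.Linear(d_model, 3 * d_model)  # queries, keys, values in one projection
            self.proj = nn.Linear(d_model, d_model)     # output projection
            # position i may attend only to positions <= i
            self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)).bool())

        def forward(self, x):                           # x: (batch, seq, d_model)
            B, T, C = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            att = (q @ k.transpose(-2, -1)) / math.sqrt(C)        # (B, T, T) attention scores
            att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
            att = F.softmax(att, dim=-1)
            return self.proj(att @ v)

    x = torch.randn(2, 8, 32)                           # toy batch of embeddings
    print(CausalSelfAttention(32)(x).shape)             # torch.Size([2, 8, 32])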

Features

  • Step-by-step build of a GPT-style transformer from scratch
  • Clear coverage of tokenization, embeddings, attention, and MLP blocks
  • Runnable code and exercises for experiential learning
  • Demonstrations of batching, masking, and positional encodings
  • Training and sampling loops you can inspect and modify (see the sampling sketch after this list)
  • Emphasis on readability and conceptual understanding over framework magic
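
To make the training-and-sampling bullet concrete, here is a hedged sketch of an autoregressive sampling loop with temperature. The model interface assumed here (token ids in, next-token logits out) is an illustration, not the repo's actual API.

    # Illustrative sketch, not LLM101n's sampling code: generate tokens one
    # at a time from any model mapping ids (batch, seq) to logits
    # (batch, seq, vocab).
    import torch

    @torch.no_grad()
    def sample(model, idx, max_new_tokens, temperature=1.0):
        for _ in range(max_new_tokens):
            logits = model(idx)[:, -1, :] / temperature        # logits for the last position
            probs = torch.softmax(logits, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)  # draw one token per sequence
            idx = torch.cat([idx, next_id], dim=1)             # append and continue
        return idx

Lower temperatures sharpen the distribution toward greedy decoding, while higher temperatures increase diversity; this is the kind of knob the materials encourage you to experiment with.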

Categories

Education


Additional Project Details

Registered

2025-10-15