DEV Community
#transformers
Posts
TurboQuant: How a Simple Spin Saves Gigabytes of GPU Memory
Bharath Kadaluri · Apr 8 · 6 min read
#turboquant #attention #transformers #llm
Attention Residuals: How Kimi Is Rethinking Transformer Depth
Guatu · Apr 7 · 3 min read
#ai #transformers #llmarchitecture #attention
RBF Attention Reveals Dot‑Product's Hidden Norm Bias
Simon Paxton · Apr 2 · 8 min read · 1 comment
#transformers #attentionmechanisms #airesearch #aihardware
Why a Perfect-Memory AI Agent Without Persona Drift is Architecturally Impossible
Tom Lee · Mar 20 · 4 min read
#ai #agents #memory #transformers
Anonymous User Claims Proof of d^2 Complexity for Attention Mechanisms, Challenging Transformer Optimization
Valeria Solovyova · Mar 5 · 10 min read
#transformers #attention #optimization #complexity
Advancing Tiny Transformers: Achieving 100% Accuracy in 10-Digit Addition with Sub-100 Parameter Models Using Digit Tokenization
Valeria Solovyova · Mar 1 · 16 min read
#ai #transformers #efficiency #tokenization
Standard Transformer Attention vs. Attention-Residuals: A Practical Comparison
Alan West · Mar 21 · 5 min read
#transformers #deeplearning #attentionmechanism #pytorch
Attention Is All You Need — Full Paper Breakdown
seah-js · Mar 7 · 4 min read · 1 reaction · 1 comment
#ai #transformers #deeplearning #machinelearning
Transformers: Revolutionizing Natural Language Processing!
Mariano Gobea Alcoba · Feb 25 · 2 min read · 2 reactions
#transformers #nlp #attentionmechanism #huggingface
👀 Attention Explained Like You're 5
Sreekar Reddy · Jan 14 · 1 min read
#eli5 #ai #transformers #tutorial
What are Transformers, Why do they Dominate the AI World?
Yuvaraj · Feb 15 · 5 min read · 4 reactions
#ai #transformers #machinelearning