
Transformer Architecture

The neural network architecture that powers modern language models.

Full Definition

The Transformer is a neural network architecture introduced in the 2017 paper "Attention Is All You Need" that has become the foundation of modern language models. Its key innovation is the self-attention mechanism, which allows the model to weigh the importance of different parts of the input when processing each position.
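To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The function and variable names are illustrative, not taken from the paper or any library; real implementations add multiple heads, masking, and learned projections per layer.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q = x @ w_q  # queries: what each position is looking for
    k = x @ w_k  # keys: what each position offers
    v = x @ w_v  # values: the content that gets mixed together
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # similarity of every position with every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v  # each output is a weighted sum of all value vectors

# Toy usage: 4 tokens, model width 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextualized vector per input position
```

The `weights` matrix is exactly the "importance weighting" described above: row i tells you how much position i draws from every other position when building its output.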

Transformers replaced earlier recurrent architectures (RNNs, LSTMs) because they can process entire sequences in parallel, enabling much more efficient training on large datasets. The architecture consists of encoder and decoder stacks, though many LLMs use decoder-only variants (like GPT) or encoder-only variants (like BERT) depending on their purpose.
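In practice, the main mechanical difference between these variants is the attention mask. The sketch below (illustrative only, not tied to any particular framework) builds the causal mask that decoder-only models such as GPT use so that each position can only attend to earlier tokens, whereas encoder-only models such as BERT let every position see the whole sequence.

```python
import numpy as np

seq_len = 5

# Encoder-style (e.g. BERT): every position may attend to every other position.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Decoder-style (e.g. GPT): position i may attend only to positions <= i.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(causal_mask.astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
# Disallowed scores are typically set to -inf before the softmax,
# so their attention weights become zero and no information leaks from the future.
```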

Understanding transformer basics helps AI engineers make better decisions about model selection, understand performance characteristics, and troubleshoot issues. Key concepts include attention heads, layer normalization, positional encodings, and the quadratic scaling of attention with sequence length (which limits context windows).
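The quadratic scaling is easy to see with a back-of-the-envelope calculation: attention builds a score matrix with one entry per pair of positions, so doubling the context length roughly quadruples the cost. The numbers below are illustrative (assuming 2-byte fp16 scores for a single attention head), not measurements from any specific model.

```python
# Attention materializes a (seq_len x seq_len) score matrix per head per layer,
# so memory and compute grow with the square of the sequence length.
bytes_per_score = 2  # assumption: fp16 scores, one head, one layer

for seq_len in (1_024, 4_096, 16_384, 131_072):
    scores = seq_len * seq_len  # entries in one head's score matrix
    print(f"{seq_len:>7} tokens -> {scores * bytes_per_score / 1e9:.2f} GB per head")

# Approximate output:
#    1024 tokens -> 0.00 GB per head
#    4096 tokens -> 0.03 GB per head
#   16384 tokens -> 0.54 GB per head
#  131072 tokens -> 34.36 GB per head
```

This is why long context windows are expensive and why techniques that approximate or restructure attention are an active area of work.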
