LLM Fundamentals

Tokens

The basic units of text that language models process, typically word fragments.

Full Definition

Tokens are the fundamental units that language models use to process and generate text. Rather than working with individual characters or complete words, most LLMs use subword tokenization, which breaks text into meaningful chunks that balance vocabulary size with representation efficiency.

Common tokenization schemes include Byte Pair Encoding (BPE) and SentencePiece. As a rough rule of thumb, one token corresponds to about 4 characters or 0.75 words of English text, though the ratio varies by language and content type; code, for example, often tokenizes less efficiently than prose.
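To make the idea concrete, here is a minimal sketch of the BPE training loop: start from individual characters and repeatedly merge the most frequent adjacent pair. Production tokenizers (such as the byte-level BPE used by GPT-family models) work on bytes and train on large corpora; the toy corpus and function name below are illustrative assumptions, not any library's API.

```python
import re
from collections import Counter

def learn_bpe(corpus: str, num_merges: int):
    """Learn BPE merge rules from a tiny whitespace-split corpus (sketch)."""
    # Represent each word as space-separated symbols, with frequencies.
    words = Counter(" ".join(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        # Merge the best pair wherever it occurs as two whole symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        words = Counter({pattern.sub("".join(best), w): f
                         for w, f in words.items()})
        merges.append(best)
    return merges

# Frequent substrings get merged first, so "low" becomes one symbol.
print(learn_bpe("low low low lower lowest", 3))
```

After a few merges, common words collapse into single tokens while rare words remain split into reusable subword pieces, which is exactly the vocabulary-size-versus-efficiency trade-off described above.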

Understanding tokens is essential for AI engineers because: (1) API pricing is typically quoted per token, (2) context windows are measured in tokens, (3) generation speed is measured in tokens per second, and (4) token boundaries can affect model behavior. Libraries like tiktoken count tokens exactly for specific models, enabling cost estimation and context-window management in production applications.
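A back-of-the-envelope cost estimate can be built directly from the 4-characters-per-token rule of thumb. This sketch uses only that heuristic; the function names and the price parameter are assumptions for illustration (for exact counts you would use the model's own tokenizer, e.g. tiktoken for OpenAI models, and your provider's current pricing).

```python
def estimate_tokens(text: str) -> int:
    # Heuristic only: roughly 4 characters per token for English prose.
    # Exact counts require the target model's tokenizer.
    return max(1, round(len(text) / 4))

def estimate_cost_usd(text: str, usd_per_million_tokens: float) -> float:
    # usd_per_million_tokens is a placeholder rate, not a real price.
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens

prompt = "Summarize the following report in three bullet points."
print(estimate_tokens(prompt), "tokens (approx.)")
```

Estimates like this are useful for budgeting and for checking whether a prompt will fit a context window before making an API call; for billing-accurate numbers, count with the real tokenizer.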

