LLM Fundamentals

Inference

The process of running a trained AI model to generate predictions or outputs.

Full Definition

Inference is the process of running a trained AI model to generate predictions, completions, or other outputs from input data. When you call an LLM API, you're performing inference. This contrasts with training, which is the process of creating or updating the model's parameters.
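For example, a single API call is one inference request. The sketch below assumes the OpenAI Python SDK and an illustrative model name; any hosted LLM API follows the same pattern of sending input tokens and receiving generated tokens.

```python
# Minimal sketch: performing inference through a hosted LLM API.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set;
# the model name is illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize what inference means."}],
)

# The returned message is the output of a single inference request.
print(response.choices[0].message.content)
```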

For language models, inference involves tokenizing the input text, processing the tokens through the model's layers, sampling the next token from the resulting probability distribution, and repeating this loop until a stop condition is reached, such as an end-of-sequence token or a length limit. Inference cost depends on model size, input and output length, and the computational resources available.
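The loop below is a simplified sketch of those steps, assuming the Hugging Face transformers and torch packages and using a small example model; real inference engines add optimizations such as KV caching and batching.

```python
# Simplified autoregressive inference loop: tokenize, run the model,
# sample the next token, repeat until end-of-sequence or a length limit.
# "gpt2" is used only as a small illustrative model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Inference is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(30):  # length limit
        logits = model(input_ids).logits[:, -1, :]            # scores for the next token
        probs = torch.softmax(logits / 0.8, dim=-1)           # temperature-scaled distribution
        next_token = torch.multinomial(probs, num_samples=1)  # sample one token
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:       # stop at end-of-sequence
            break

print(tokenizer.decode(input_ids[0]))
```

Note that this sketch re-runs the full sequence at every step; production systems cache the attention keys and values (KV cache) so each new token only requires incremental computation.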

Understanding inference matters for AI engineers because API costs are billed on the tokens processed during inference, latency is determined by inference speed, self-hosted models require dedicated inference infrastructure, and optimization work often focuses on making inference faster or cheaper. Techniques such as quantization, batching, and KV caching all target inference efficiency.
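As a concrete illustration of token-based billing, the helper below estimates the cost of a single request; the per-token prices are hypothetical placeholders, not any provider's actual rates.

```python
# Hypothetical cost estimate for one inference request.
# The prices below are illustrative placeholders, not real provider rates.
PRICE_PER_INPUT_TOKEN = 0.50 / 1_000_000   # $0.50 per million input tokens (assumed)
PRICE_PER_OUTPUT_TOKEN = 1.50 / 1_000_000  # $1.50 per million output tokens (assumed)

def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one inference call."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# Example: a 2,000-token prompt producing a 500-token answer.
print(f"${estimate_request_cost(2_000, 500):.6f}")  # -> $0.001750
```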
