The process of running a trained AI model to generate predictions or outputs.
Inference is the process of running a trained AI model to generate predictions, completions, or other outputs from input data. When you call an LLM API, you're performing inference. This contrasts with training, which is the process of creating or updating the model's parameters.
For language models, inference involves tokenizing the input text, processing it through the model's layers, sampling from the resulting probability distribution to produce the next token, and repeating until a stop condition is reached. Inference costs depend on model size, input/output length, and computational resources.
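A minimal sketch of that loop, assuming the Hugging Face Transformers library and the small "gpt2" model (any causal LM would work the same way):

```python
# Autoregressive inference: tokenize, run the model's layers,
# sample the next token, repeat until an end-of-sequence token appears.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):  # generate up to 20 new tokens
        logits = model(input_ids).logits[:, -1, :]             # scores for the next token
        probs = torch.softmax(logits / 0.8, dim=-1)             # temperature-scaled distribution
        next_token = torch.multinomial(probs, num_samples=1)    # sample one token
        input_ids = torch.cat([input_ids, next_token], dim=-1)  # append and continue
        if next_token.item() == tokenizer.eos_token_id:         # stop condition
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```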
Understanding inference is important for AI engineers because: API costs are based on inference (tokens processed), latency depends on inference speed, self-hosted models require inference infrastructure, and optimization often focuses on making inference faster or cheaper. Factors like quantization, batching, and caching all relate to inference efficiency.
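Because API pricing is typically per token, a back-of-the-envelope cost estimate is straightforward. The per-token prices below are hypothetical placeholders, not any provider's actual pricing:

```python
# Rough inference cost estimate from token counts (prices are assumed, not real).
PRICE_PER_1K_INPUT = 0.0005   # USD per 1,000 input tokens (assumption)
PRICE_PER_1K_OUTPUT = 0.0015  # USD per 1,000 output tokens (assumption)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single API call."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Example: a 2,000-token prompt that produces a 500-token completion.
print(f"${estimate_cost(2000, 500):.4f}")  # -> $0.0018
```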