<vetted />
AI Concepts
Term 48 of 68

Prompt Injection

Attacks that manipulate AI behavior by inserting malicious instructions.

Full Definition3 paragraphs

Prompt Injection is a security vulnerability where malicious inputs manipulate an AI system's behavior by overriding or extending its original instructions. It's analogous to SQL injection but for language models, where user-provided content can be crafted to hijack the model's intended purpose.

Common attack patterns include: direct injection (telling the model to ignore previous instructions), indirect injection (hiding malicious prompts in retrieved documents or external data), and jailbreaking (using creative prompts to bypass safety measures). These attacks can cause data leaks, generate harmful content, or compromise application integrity.

Defense strategies include: input sanitization and validation, separating system prompts from user content, using specialized prompt formats with delimiters, implementing output validation, limiting model capabilities through tool restrictions, monitoring for anomalous behavior, and keeping system prompts confidential. AI engineers must consider prompt injection risks when building any LLM-powered application.

Key Concept

Attacks that manipulate AI behavior by inserting malicious instructions.

Apply your knowledge

Master AI Development

Join our network of elite AI-native engineers.