Prompt Injection Protection

Overview

Prompt Injection Protection guardrail analyzes user input for attempts to manipulate the AI’s behavior or bypass safety constraints.

What It Detects

Jailbreak Attempts: Efforts to bypass AI safety constraints
System Prompt Overrides: Attempts to change core AI instructions
Role-playing Exploits: Manipulating AI to assume unauthorized personas
Instruction Injection: Embedding malicious commands within user input

Configuration Options

Block Message

Customize the message shown to users when their input is blocked for containing prompt injection attempts. Default: “Content blocked due to prompt injection”

Use Cases

Content Safety: Prevent users from bypassing content policies
System Security: Protect against attempts to modify AI behavior
Brand Protection: Ensure AI responses align with organizational values
Compliance: Meet safety requirements for AI applications

Best Practices

This guardrail should be evaluated on reasonable context. So consider including most recent conversations along with latest one for evaluation to effectively deter against multi turn attacks.
Monitor blocked attempts to understand attack patterns
Regularly review and update protection sensitivity
Consider user experience when setting block messages
Combine with other guardrails for comprehensive protection

Next Steps: Enhance protection with Sensitive Data detection or configure Persona Integrity controls.

Get Started

Integration

Explore UI

Use Cases

Overview

What It Detects

Configuration Options

Block Message

Use Cases

Best Practices

Get Started

Integration

Explore UI

Use Cases

​Overview

​What It Detects

​Configuration Options

​Block Message

​Use Cases

​Best Practices

Overview

What It Detects

Configuration Options

Block Message

Use Cases

Best Practices