Skip to main content

Overview

Prompt Injection Protection guardrail analyzes user input for attempts to manipulate the AI’s behavior or bypass safety constraints.

What It Detects

  • Jailbreak Attempts: Efforts to bypass AI safety constraints
  • System Prompt Overrides: Attempts to change core AI instructions
  • Role-playing Exploits: Manipulating AI to assume unauthorized personas
  • Instruction Injection: Embedding malicious commands within user input

Configuration Options

Block Message

Customize the message shown to users when their input is blocked for containing prompt injection attempts. Default: “Content blocked due to prompt injection”

Use Cases

  • Content Safety: Prevent users from bypassing content policies
  • System Security: Protect against attempts to modify AI behavior
  • Brand Protection: Ensure AI responses align with organizational values
  • Compliance: Meet safety requirements for AI applications

Best Practices

  • This guardrail should be evaluated on reasonable context. So consider including most recent conversations along with latest one for evaluation to effectively deter against multi turn attacks.
  • Monitor blocked attempts to understand attack patterns
  • Regularly review and update protection sensitivity
  • Consider user experience when setting block messages
  • Combine with other guardrails for comprehensive protection

Next Steps: Enhance protection with Sensitive Data detection or configure Persona Integrity controls.
I