Overview
Prompt Injection Protection guardrail analyzes user input for attempts to manipulate the AI’s behavior or bypass safety constraints.What It Detects
- Jailbreak Attempts: Efforts to bypass AI safety constraints
- System Prompt Overrides: Attempts to change core AI instructions
- Role-playing Exploits: Manipulating AI to assume unauthorized personas
- Instruction Injection: Embedding malicious commands within user input
Configuration Options
Block Message
Customize the message shown to users when their input is blocked for containing prompt injection attempts. Default: “Content blocked due to prompt injection”Use Cases
- Content Safety: Prevent users from bypassing content policies
- System Security: Protect against attempts to modify AI behavior
- Brand Protection: Ensure AI responses align with organizational values
- Compliance: Meet safety requirements for AI applications
Best Practices
- This guardrail should be evaluated on reasonable context. So consider including most recent conversations along with latest one for evaluation to effectively deter against multi turn attacks.
- Monitor blocked attempts to understand attack patterns
- Regularly review and update protection sensitivity
- Consider user experience when setting block messages
- Combine with other guardrails for comprehensive protection
Next Steps: Enhance protection with Sensitive Data detection or configure Persona Integrity controls.