AI safety
Lexey includes built-in safety guardrails that protect both your business and your customers.
Input filtering
Every customer message is screened before reaching the support agent. The input filter blocks:
- Prompt injection — Attempts to override the agent's instructions.
- System prompt extraction — Attempts to get the agent to reveal its instructions.
- Abusive content — Offensive or harmful language.
- Adversarial inputs — Crafted inputs designed to manipulate the agent.
A blocked message is not passed to the support agent. The customer receives a safe, professional refusal instead.
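The screening flow above can be pictured with a minimal sketch. It is not Lexey's implementation: the regex patterns, the `screen_input` function, the `ScreenResult` type, and the refusal wording are all hypothetical, and a production filter would more likely use a trained classifier than keyword patterns. The sketch only shows the control flow: screen first, then either forward the message or return a refusal.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns, for illustration only. Real filters are far more robust.
BLOCK_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all|your|previous) .*instructions", re.IGNORECASE
    ),
    "system_prompt_extraction": re.compile(
        r"(reveal|show|print) .*system prompt", re.IGNORECASE
    ),
}

# Hypothetical refusal text returned to the customer when a message is blocked.
REFUSAL = "I'm sorry, but I can't help with that request."


@dataclass
class ScreenResult:
    allowed: bool
    category: Optional[str] = None  # which check matched, if any
    reply: Optional[str] = None     # refusal sent instead of an agent answer


def screen_input(message: str) -> ScreenResult:
    """Return a blocking result if any pattern matches, else allow the message."""
    for category, pattern in BLOCK_PATTERNS.items():
        if pattern.search(message):
            return ScreenResult(allowed=False, category=category, reply=REFUSAL)
    return ScreenResult(allowed=True)
```

In this sketch, only messages for which `screen_input` returns `allowed=True` would be forwarded to the support agent. For every other message, the `reply` field is sent back to the customer.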
Output filtering
Every agent response is screened after generation and before it is shown to the customer. The output filter flags:
- System prompt leakage — Responses that reveal internal instructions.
- Hallucinated policies — Made-up policies or information not in the knowledge base.
- Inappropriate content — Responses that don't meet content standards.
- Instruction compliance — Responses showing that the agent followed injected instructions, which indicates a successful prompt injection.
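As a rough illustration of one of these checks, the sketch below flags system prompt leakage by looking for verbatim fragments of the prompt in a response. This is an assumption about how such a check could work, not a description of Lexey's filter. The names `review_output` and `OutputReview` and the `min_len` threshold are hypothetical, and the other categories (hallucinated policies, inappropriate content, instruction compliance) would need a classifier rather than string matching.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutputReview:
    flags: List[str] = field(default_factory=list)

    @property
    def flagged(self) -> bool:
        return bool(self.flags)


def review_output(response: str, system_prompt: str, min_len: int = 20) -> OutputReview:
    """Flag a response that repeats any sufficiently long line of the system prompt."""
    review = OutputReview()
    lowered = response.lower()
    for line in system_prompt.splitlines():
        fragment = line.strip().lower()
        # Skip short lines: they would match ordinary text too easily.
        if len(fragment) >= min_len and fragment in lowered:
            review.flags.append("system_prompt_leakage")
            break
    return review
```

Verbatim matching misses paraphrased leaks, which is one reason a real output filter would likely combine it with a model-based check.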
Web content safety
Content fetched through the URL import feature in the management chat is screened by an AI classifier, which performs URL assessment and content classification. Screening happens before the content reaches the configuration agent or is persisted to the database.
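The ordering described here (assess the URL, then classify the fetched content, and only then hand it on) can be sketched as a small gate function. Everything in it is hypothetical: `import_url` and its three callable parameters are placeholders for whatever assessment, fetching, and classification components the product actually uses.

```python
from typing import Callable, Optional


def import_url(
    url: str,
    assess_url: Callable[[str], bool],        # hypothetical: True if the URL is acceptable
    fetch: Callable[[str], str],              # hypothetical: downloads the page text
    classify_content: Callable[[str], bool],  # hypothetical: True if the content is safe
) -> Optional[str]:
    """Return fetched content only if both screening steps pass, else None."""
    if not assess_url(url):
        return None  # rejected before any network request is made
    content = fetch(url)
    if not classify_content(content):
        return None  # rejected before reaching the agent or the database
    return content
```

A caller would pass the returned content to the configuration agent or store it only when the result is not `None`, so unscreened content never reaches either destination.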