AI safety

Lexey includes built-in safety guardrails that protect both your business and your customers.

Input filtering

Every customer message is screened before reaching the support agent. The input filter blocks:

Prompt injection — Attempts to override the agent's instructions.
System prompt extraction — Attempts to get the agent to reveal its instructions.
Abusive content — Offensive or harmful language.
Adversarial inputs — Crafted inputs designed to manipulate the agent.

Blocked messages receive a safe, professional refusal.

Output filtering

Every assistant response is screened after generation. The output filter flags:

System prompt leakage — Responses that reveal internal instructions.
Hallucinated policies — Made-up policies or information not in the knowledge base.
Inappropriate content — Responses that don't meet content standards.
Instruction compliance — Responses that indicate a successful prompt injection.

Web content safety

Content fetched via the management chat's URL import feature is screened by an AI classifier (URL assessment + content classification) before reaching the configuration agent or being persisted to the database.