AI safety

AI safety

Lexey includes built-in safety guardrails that protect both your business and your customers.

Input filtering

Every customer message is screened before reaching the support agent. The input filter blocks:

  • Prompt injection — Attempts to override the agent's instructions.
  • System prompt extraction — Attempts to get the agent to reveal its instructions.
  • Abusive content — Offensive or harmful language.
  • Adversarial inputs — Crafted inputs designed to manipulate the agent.

Blocked messages receive a safe, professional refusal.

Output filtering

Every assistant response is screened after generation. The output filter flags:

  • System prompt leakage — Responses that reveal internal instructions.
  • Hallucinated policies — Made-up policies or information not in the knowledge base.
  • Inappropriate content — Responses that don't meet content standards.
  • Instruction compliance — Responses that indicate a successful prompt injection.

Web content safety

Content fetched via the management chat's URL import feature is screened by an AI classifier (URL assessment + content classification) before reaching the configuration agent or being persisted to the database.