Ethical Check
Ethical Check is a post-generation validation step that evaluates agent output against configurable guidelines before returning it to the user. Outputs that fail are blocked, flagged, or rewritten, depending on the configured mode.
How it works
Ethical Check runs as a post-inference hook in the response pipeline:
- Post-inference hook: After the agent produces a response but before it is returned to the caller, the Ethical Check hook intercepts the output.
- Guideline evaluation: The output is evaluated against each active guideline in order. Evaluation short-circuits on the first critical violation.
- Three modes: Depending on the configured mode, a failing output is blocked (the request fails with a 451 status), flagged (the response is returned with an X-Ethical-Flag header and the violation is logged), or rewritten (the agent is re-prompted with a constraint message and the corrected output is returned transparently).
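The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual implementation: the names `Guideline`, `Result`, `evaluate`, and `ethical_check` are hypothetical, each guideline's evaluator is reduced to a simple predicate, and the value placed in the `X-Ethical-Flag` header (a comma-separated list of guideline names) is an assumption, since the source does not specify it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the product's real API.
@dataclass
class Guideline:
    name: str
    severity: str                    # "critical" | "high" | "medium"
    violates: Callable[[str], bool]  # stand-in for the real evaluator

@dataclass
class Result:
    status: int
    body: str
    headers: Dict[str, str]
    violations: List[str]

def evaluate(output: str, guidelines: List[Guideline]) -> List[Guideline]:
    """Check guidelines in order, stopping at the first critical violation."""
    violations = []
    for guideline in guidelines:
        if guideline.violates(output):
            violations.append(guideline)
            if guideline.severity == "critical":
                break  # short-circuit: skip the remaining guidelines
    return violations

def ethical_check(output: str, guidelines: List[Guideline], mode: str,
                  reprompt: Callable[[str, str], str]) -> Result:
    """Apply the configured mode to an output that may violate guidelines."""
    names = [g.name for g in evaluate(output, guidelines)]
    if not names:
        return Result(200, output, {}, [])
    if mode == "block":
        return Result(451, "", {}, names)
    if mode == "flag":
        # Header value format is an assumption made for this sketch.
        return Result(200, output, {"X-Ethical-Flag": ",".join(names)}, names)
    if mode == "rewrite":
        constraint = "Revise the response to comply with: " + ", ".join(names)
        return Result(200, reprompt(output, constraint), {}, names)
    raise ValueError(f"unknown mode: {mode!r}")
```

The sketch does not re-evaluate a rewritten output; whether the real hook does so is not stated in this document.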
Configuration
yaml
ethicalCheck:
  enabled: true
  mode: flag # block | flag | rewrite
  guidelines:
    - no_harmful_instructions
    - no_pii_leakage
    - no_hallucinated_citations
  customRules: ./rules/ethical-rules.yaml # Path to additional rule definitions

Default guidelines
| Guideline | Description | Severity |
|---|---|---|
| no_harmful_instructions | Blocks instructions that could facilitate physical harm, illegal activity, or targeted harassment | Critical |
| no_pii_leakage | Detects and redacts personally identifiable information (names, emails, phone numbers, SSNs) that was not present in the user's own input | High |
| no_hallucinated_citations | Flags responses that cite specific papers, URLs, or statistics that cannot be verified against grounded sources | Medium |
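To make the no_pii_leakage rule concrete, the following Python sketch redacts PII in an output only when the same value did not appear in the user's input. It is an illustration under assumptions: the function name `redact_new_pii`, the regular expressions, and the `[REDACTED ...]` placeholder format are all hypothetical, and the patterns cover only emails, US-style phone numbers, and SSNs. Name detection, which the guideline also lists, is beyond what simple patterns can do and is omitted.

```python
import re
from typing import List, Tuple

# Simplified, hypothetical patterns; a production detector would be broader.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_new_pii(user_input: str, output: str) -> Tuple[str, List[str]]:
    """Redact PII in `output` unless the same value appears in `user_input`.

    Returns the redacted text and the kinds of PII that were redacted.
    """
    found: List[str] = []
    for kind, pattern in PII_PATTERNS.items():
        def replace(match: "re.Match[str]", kind: str = kind) -> str:
            value = match.group(0)
            if value in user_input:
                return value  # the user supplied it, so it is not a leak
            found.append(kind)
            return f"[REDACTED {kind.upper()}]"
        output = pattern.sub(replace, output)
    return output, found
```

For example, an email address the user typed themselves is echoed back unchanged, while an address that first appears in the agent's output is replaced with a placeholder.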
Adding custom rules
Custom rules are defined in a separate YAML file referenced by customRules. Each rule specifies a name, a plain-language description used to guide evaluation, and the severity level:
yaml
rules:
  - name: no_competitor_endorsement
    description: >
      Do not recommend or praise competitor products by name.
      Neutral factual comparisons are permitted.
    severity: medium
  - name: no_medical_diagnosis
    description: >
      Do not provide a specific medical diagnosis or prescribe medication.
      Encourage consulting a qualified healthcare provider.
    severity: high

Audit log
Every evaluation — pass or fail — is written to the Ethical Check audit log. Each entry records the agent ID, session ID, guideline that was evaluated, outcome, and a truncated excerpt of the flagged content. Logs are queryable via:
bash
# Retrieve audit log entries for a specific agent
GET /agents/:id/ethical-check/log
# Filter by outcome
GET /agents/:id/ethical-check/log?outcome=blocked
GET /agents/:id/ethical-check/log?outcome=flagged

Use mode: rewrite with caution. Rewriting changes the content the user sees without explicit disclosure. Reserve it for low-severity cosmetic corrections; for high-severity violations, prefer block to maintain transparency.
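The audit log endpoints listed in the Audit log section can also be called from a script. The Python sketch below builds the request URLs and fetches entries using only the standard library. The base URL `https://api.example.com`, the helper names, the absence of authentication, and the assumption that the endpoint returns JSON are all hypothetical; only the path and the `outcome` query parameter come from this page.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.com"  # hypothetical; substitute your own host

def audit_log_url(agent_id: str, outcome: Optional[str] = None) -> str:
    """Build the audit log URL, optionally filtered by outcome."""
    url = f"{BASE_URL}/agents/{quote(agent_id, safe='')}/ethical-check/log"
    if outcome is not None:
        url += "?" + urlencode({"outcome": outcome})
    return url

def fetch_audit_log(agent_id: str, outcome: Optional[str] = None) -> Any:
    """Fetch audit log entries (assumes a JSON response and no auth)."""
    with urlopen(audit_log_url(agent_id, outcome)) as response:
        return json.load(response)
```

Calling `audit_log_url("agent-42", "blocked")` yields the filtered form of the endpoint; `fetch_audit_log` performs the request and would need whatever authentication your deployment requires.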