Agents

Ethical Check

Ethical Check is a post-generation validation step that evaluates agent output against configurable guidelines before returning it to the user. Outputs that fail are blocked, flagged, or rewritten, depending on the configured mode.

How it works

Ethical Check runs as a post-inference hook in the response pipeline:

  1. Post-inference hook: After the agent produces a response but before it is returned to the caller, the Ethical Check hook intercepts the output.
  2. Guideline evaluation: The output is evaluated against each active guideline in order. Evaluation short-circuits on the first critical violation.
  3. Mode handling: A failing output is handled according to the configured mode. In block mode, the request fails with a 451 status. In flag mode, the response is returned with an X-Ethical-Flag header and the violation is logged. In rewrite mode, the agent is re-prompted with a constraint message and the corrected output is returned in place of the original, with no indication to the user.
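
The three steps above can be sketched in Python. This is a minimal illustration, not the actual hook: the `Guideline` and `CheckResult` types, the `violates` callable, the `regenerate` callback, the wording of the constraint message, and the value placed in the `X-Ethical-Flag` header are all assumptions made for the example. Only the in-order evaluation, the short-circuit on a critical violation, the 451 status, and the header name come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical types for illustration; the real hook's interfaces are not documented here.
@dataclass
class Guideline:
    name: str
    severity: str                    # "critical", "high", or "medium"
    violates: Callable[[str], bool]  # stand-in for the real evaluator


@dataclass
class CheckResult:
    status: int
    body: Optional[str]
    headers: dict
    violations: List[str]


def evaluate(output: str, guidelines: List[Guideline]) -> List[Guideline]:
    """Evaluate guidelines in order and stop at the first critical violation."""
    violations = []
    for guideline in guidelines:
        if guideline.violates(output):
            violations.append(guideline)
            if guideline.severity == "critical":
                break  # short-circuit: later guidelines are not evaluated
    return violations


def ethical_check(output: str, guidelines: List[Guideline], mode: str,
                  regenerate: Callable[[str], str]) -> CheckResult:
    """Apply the configured mode to an output that may have failed evaluation."""
    names = [g.name for g in evaluate(output, guidelines)]
    if not names:
        return CheckResult(200, output, {}, [])
    if mode == "block":
        return CheckResult(451, None, {}, names)
    if mode == "flag":
        # Header value format is an assumption; only the header name is documented.
        return CheckResult(200, output, {"X-Ethical-Flag": ",".join(names)}, names)
    if mode == "rewrite":
        # Constraint wording is an assumption made for this sketch.
        constraint = "Revise the response so that it complies with: " + ", ".join(names)
        return CheckResult(200, regenerate(constraint), {}, names)
    raise ValueError(f"unknown mode: {mode}")


# Usage with toy evaluators (simple substring checks stand in for real evaluation).
guidelines = [
    Guideline("no_harmful_instructions", "critical", lambda text: "harm" in text),
    Guideline("no_pii_leakage", "high", lambda text: "@" in text),
]
result = ethical_check("Reach me at a@b.com", guidelines, "flag", lambda c: "")
print(result.status, result.headers)
```

In this sketch the rewritten output is returned without a second evaluation pass; whether the real hook re-checks rewritten output is not stated on this page.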

Configuration

```yaml
ethicalCheck:
  enabled: true
  mode: flag                 # block | flag | rewrite
  guidelines:
    - no_harmful_instructions
    - no_pii_leakage
    - no_hallucinated_citations
  customRules: ./rules/ethical-rules.yaml   # Path to additional rule definitions
```
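
A configuration like the one above can be sanity-checked before deployment. The following is a rough sketch that operates on the dictionary a YAML parser would produce; the function name `validate_ethical_check` is hypothetical, and treating `enabled` and `mode` as required fields is an assumption, since this page does not say which keys are optional.

```python
VALID_MODES = {"block", "flag", "rewrite"}  # the three modes listed in the config comment


def validate_ethical_check(config: dict) -> list:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    section = config.get("ethicalCheck")
    if not isinstance(section, dict):
        return ["missing 'ethicalCheck' section"]

    problems = []
    if not isinstance(section.get("enabled"), bool):
        problems.append("'enabled' should be true or false")
    if section.get("mode") not in VALID_MODES:
        problems.append(f"'mode' should be one of {sorted(VALID_MODES)}")

    guidelines = section.get("guidelines", [])
    if not isinstance(guidelines, list) or not all(isinstance(g, str) for g in guidelines):
        problems.append("'guidelines' should be a list of guideline names")

    custom = section.get("customRules")
    if custom is not None and not isinstance(custom, str):
        problems.append("'customRules' should be a file path string")
    return problems


# Usage: the dict mirrors the YAML example above.
config = {
    "ethicalCheck": {
        "enabled": True,
        "mode": "flag",
        "guidelines": ["no_harmful_instructions", "no_pii_leakage"],
        "customRules": "./rules/ethical-rules.yaml",
    }
}
print(validate_ethical_check(config))
```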

Default guidelines

| Guideline | Description | Severity |
|---|---|---|
| no_harmful_instructions | Blocks instructions that could facilitate physical harm, illegal activity, or targeted harassment | Critical |
| no_pii_leakage | Detects and redacts personally identifiable information (names, emails, phone numbers, SSNs) that was not present in the user's own input | High |
| no_hallucinated_citations | Flags responses that cite specific papers, URLs, or statistics that cannot be verified against grounded sources | Medium |

Adding custom rules

Custom rules are defined in a separate YAML file referenced by customRules. Each rule specifies a name, a plain-language description used to guide evaluation, and the severity level:

```yaml
rules:
  - name: no_competitor_endorsement
    description: >
      Do not recommend or praise competitor products by name.
      Neutral factual comparisons are permitted.
    severity: medium

  - name: no_medical_diagnosis
    description: >
      Do not provide a specific medical diagnosis or prescribe medication.
      Encourage consulting a qualified healthcare provider.
    severity: high
```
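
Since each rule needs a name, a description, and a severity, a small loader can catch incomplete entries early. The sketch below assumes the rules file has already been parsed into a dictionary; `CustomRule`, `parse_rules`, and `KNOWN_SEVERITIES` are hypothetical names, and the set of accepted severities is limited to the values that appear on this page, which may not be the full list.

```python
from dataclasses import dataclass

# Severity names seen on this page; whether other values are accepted is an open assumption.
KNOWN_SEVERITIES = ("critical", "high", "medium")


@dataclass(frozen=True)
class CustomRule:
    name: str
    description: str
    severity: str


def parse_rules(doc: dict) -> list:
    """Turn the parsed rules file into CustomRule objects, rejecting incomplete entries."""
    rules = []
    for index, raw in enumerate(doc.get("rules", [])):
        missing = [key for key in ("name", "description", "severity") if not raw.get(key)]
        if missing:
            raise ValueError(f"rule #{index} is missing: {', '.join(missing)}")
        severity = str(raw["severity"]).strip().lower()
        if severity not in KNOWN_SEVERITIES:
            raise ValueError(f"rule {raw['name']!r} has unknown severity {severity!r}")
        # Folded YAML scalars (">") usually end with a newline; collapse the whitespace.
        description = " ".join(str(raw["description"]).split())
        rules.append(CustomRule(raw["name"], description, severity))
    return rules


# Usage: the dict mirrors one entry of the YAML example above.
doc = {
    "rules": [
        {
            "name": "no_medical_diagnosis",
            "description": "Do not provide a specific medical diagnosis or prescribe "
                           "medication. Encourage consulting a qualified healthcare provider.\n",
            "severity": "high",
        }
    ]
}
for rule in parse_rules(doc):
    print(rule.name, rule.severity)
```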

Audit log

Every evaluation, pass or fail, is written to the Ethical Check audit log. Each entry records the agent ID, the session ID, the guideline evaluated, the outcome, and a truncated excerpt of the flagged content. Query the log with:

```bash
# Retrieve audit log entries for a specific agent
GET /agents/:id/ethical-check/log

# Filter by outcome
GET /agents/:id/ethical-check/log?outcome=blocked
GET /agents/:id/ethical-check/log?outcome=flagged
```
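
When calling these endpoints from a script, it helps to build the URL in one place. The helper below is a small sketch using only the Python standard library; `audit_log_url` is a hypothetical name, and `https://api.example.com` is a placeholder base URL, since this page does not give the API host or any authentication details.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def audit_log_url(base_url: str, agent_id: str, outcome: Optional[str] = None) -> str:
    """Build the audit log URL for an agent, optionally filtered by outcome."""
    # Percent-encode the agent ID so unusual characters cannot alter the path.
    url = f"{base_url.rstrip('/')}/agents/{quote(agent_id, safe='')}/ethical-check/log"
    if outcome is not None:
        url += "?" + urlencode({"outcome": outcome})
    return url


# Usage: placeholder host and agent ID.
print(audit_log_url("https://api.example.com", "agent-42"))
print(audit_log_url("https://api.example.com", "agent-42", outcome="blocked"))
```

The helper does not validate `outcome`, because only `blocked` and `flagged` are shown here and other values may exist.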

Use mode: rewrite with caution. Rewriting changes the content the user sees without explicit disclosure. Reserve it for low-severity cosmetic corrections; for high-severity violations, prefer block to maintain transparency.