Reflection

Reflection is an optional post-inference self-critique pass. After the agent generates a response, a separate inference call evaluates the response for accuracy, completeness, actionability, and clarity — flagging issues and optionally requesting a revision.

Cost consideration. Reflection doubles the inference calls per turn. It is not enabled by default in the agent loop — the orchestrator or a custom integration must invoke it explicitly. Use it selectively for high-stakes responses.

When to reflect

The shouldReflect heuristic determines whether a turn warrants self-evaluation.

typescript
function shouldReflect(toolResults: ToolResult[], responseLength: number): boolean {
  // Triggers if ANY of these conditions are true:
  return (
    toolResults.some(r => r.error) ||   // any tool errored
    responseLength > 2000 ||            // long response (may be rambling)
    toolResults.length >= 3             // complex multi-tool turn
  )
}

The heuristic targets turns most likely to contain errors: tool failures that may have produced inaccurate data, long responses that may have drifted, and complex multi-tool interactions where synthesis errors are more likely.
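As a self-contained sketch, the heuristic can be exercised against a few stubbed turns. The minimal `ToolResult` shape here is an assumption for illustration, not the framework's actual type:

```typescript
// Assumed minimal tool-result shape for this sketch
interface ToolResult { name: string; error?: string }

function shouldReflect(toolResults: ToolResult[], responseLength: number): boolean {
  return (
    toolResults.some(r => r.error) ||   // any tool errored
    responseLength > 2000 ||            // long response (may be rambling)
    toolResults.length >= 3             // complex multi-tool turn
  )
}

// Short, clean single-tool turn: skip reflection
console.log(shouldReflect([{ name: 'search' }], 500))                    // false
// A tool errored: reflect even though the response is short
console.log(shouldReflect([{ name: 'search', error: 'timeout' }], 500))  // true
// Three tools in one turn: complex synthesis, reflect
console.log(shouldReflect([{ name: 'a' }, { name: 'b' }, { name: 'c' }], 500))  // true
```

Because the conditions are OR-ed, any single trigger is enough; most simple turns hit none of them and skip the second inference call entirely.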

Result format

typescript
interface ReflectionResult {
  confidence: number      // 0–1 score across all dimensions
  issues: string[]        // detected problems
  suggestions: string[]   // improvement recommendations
  shouldRevise: boolean   // true only for significant problems
}

shouldRevise is only set to true for significant problems — wrong facts, missing critical information, or potentially harmful advice. Minor style issues or verbose responses do not trigger revision.

Evaluation dimensions

text
// The reflection pass evaluates four dimensions:
1. Accuracy       — Are facts and claims correct?
2. Completeness   — Does the response fully address the question?
3. Actionability  — Can the user act on the advice given?
4. Clarity        — Is the response well-structured and easy to follow?

// Inference parameters:
temperature: 0.1    // low creativity — we want consistent evaluation
maxTokens: 300      // brief assessment, not a rewrite

The reflection inference uses a very low temperature (0.1) to ensure consistent, conservative evaluations. The 300-token cap keeps assessments brief — this is a quality gate, not a rewrite step.
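The call itself might be sketched as follows. The client interface, prompt wording, and JSON response contract here are all assumptions for illustration; only the `temperature` and `maxTokens` values come from the section above:

```typescript
interface ReflectionResult {
  confidence: number
  issues: string[]
  suggestions: string[]
  shouldRevise: boolean
}

// Assumed generic inference-client shape, not the framework's actual API
interface InferenceClient {
  complete(req: {
    system: string
    user: string
    temperature: number
    maxTokens: number
  }): Promise<string>
}

// Illustrative prompt; the real evaluation prompt may differ
const REFLECTION_SYSTEM =
  'Evaluate the response for accuracy, completeness, actionability, and clarity. ' +
  'Respond with JSON: { "confidence": 0-1, "issues": [], "suggestions": [], "shouldRevise": bool }'

async function reflectOnResponse(
  userMsg: string,
  response: string,
  client: InferenceClient
): Promise<ReflectionResult> {
  const raw = await client.complete({
    system: REFLECTION_SYSTEM,
    user: `Question:\n${userMsg}\n\nResponse:\n${response}`,
    temperature: 0.1,   // low creativity: consistent, conservative evaluation
    maxTokens: 300      // brief assessment, not a rewrite
  })
  return JSON.parse(raw) as ReflectionResult
}
```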

Failure handling

Reflection errors are silently caught and never block the user's response. If the reflection inference fails, a safe default is returned.

typescript
// Reflection errors never block the main response
try {
  return await reflectOnResponse(userMsg, response, client, sessionId)
} catch {
  return {
    confidence: 0.5,
    issues: ['Reflection failed'],
    suggestions: [],
    shouldRevise: false    // safe default — let the response through
  }
}

How to use reflection

Reflection is available as an explicit call from the orchestrator, custom plugins, or the Debate Protocol. To add it to the standard agent loop, call reflectOnResponse after the loop completes and before sending the response to the user.

  • If shouldRevise is true, feed the issues back into a second agent turn with revision instructions
  • If shouldRevise is false, log the confidence score for observability and proceed
  • The confidence score can be tracked in Agent Metrics for quality monitoring over time
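Putting the steps above together, an orchestrator integration might look like this sketch. The agent-loop and reflection functions are passed in as parameters so the sketch stays self-contained; their real signatures in the codebase may differ:

```typescript
type Reflection = { confidence: number; issues: string[]; shouldRevise: boolean }

async function respondWithReflection(
  userMsg: string,
  runAgentTurn: (msg: string) => Promise<string>,              // the standard agent loop
  reflect: (msg: string, resp: string) => Promise<Reflection>  // e.g. reflectOnResponse
): Promise<string> {
  let response = await runAgentTurn(userMsg)
  const reflection = await reflect(userMsg, response)

  if (reflection.shouldRevise) {
    // Significant problems: feed the issues back as revision instructions
    response = await runAgentTurn(
      `${userMsg}\n\nRevise the previous answer. Issues found:\n- ` +
      reflection.issues.join('\n- ')
    )
  } else {
    // Minor or no issues: log confidence for observability and proceed
    console.log(`reflection confidence: ${reflection.confidence}`)
  }
  return response
}
```

Note that revision is itself a full agent turn, so a revised response costs three inference calls in total; this is another reason to gate reflection with `shouldReflect` rather than running it on every turn.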