
What it tests

Ring 8 answers: Does behavior remain correct across updates, drift, and retrains? Voice agents can degrade over time: model updates change behavior, prompt tweaks can have unintended side effects, and provider changes introduce subtle differences. Ring 8 is the regression safety net. It re-runs the same tests after each change so that drift is caught before it reaches production.

Prerequisites

None.

How it works

Ring 8 enables continuous evaluation:
  1. Baseline capture: After your agent passes Rings 0-7, those results become the behavioral baseline
  2. Re-run on change: When you update your agent (new model, prompt changes, provider switch), Ring 8 re-runs the same scenarios
  3. Diff analysis: Results are compared against the baseline to detect behavioral changes — both regressions (things that broke) and improvements
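
The three steps above can be sketched as a comparison of per-scenario pass/fail results. This is a minimal illustration, not the product's API: the `Diff` class, the `diff_against_baseline` function, and the idea of storing results as a scenario-name-to-boolean mapping are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Diff:
    """Hypothetical container for the outcome of a baseline comparison."""
    regressions: list[str] = field(default_factory=list)   # passed before, fail now
    improvements: list[str] = field(default_factory=list)  # failed before, pass now
    unchanged: list[str] = field(default_factory=list)     # same result as baseline
    missing: list[str] = field(default_factory=list)       # in baseline, not re-run


def diff_against_baseline(baseline: dict[str, bool], current: dict[str, bool]) -> Diff:
    """Compare a re-run against the captured baseline, scenario by scenario.

    Both arguments map a scenario name to True (passed) or False (failed).
    This result shape is an assumption for illustration only.
    """
    diff = Diff()
    for scenario in sorted(baseline):
        if scenario not in current:
            diff.missing.append(scenario)
        elif baseline[scenario] and not current[scenario]:
            diff.regressions.append(scenario)
        elif not baseline[scenario] and current[scenario]:
            diff.improvements.append(scenario)
        else:
            diff.unchanged.append(scenario)
    return diff


# Example: baseline captured after Rings 0-7, then a re-run after a prompt change.
baseline = {"greeting": True, "refund_policy": True, "interruption": False}
current = {"greeting": True, "refund_policy": False, "interruption": True}
result = diff_against_baseline(baseline, current)
print(result.regressions)   # ['refund_policy']
print(result.improvements)  # ['interruption']
```

Tracking scenarios that are missing from the re-run separately keeps a dropped scenario from being silently counted as a pass.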

When to use it

  • After updating the underlying LLM model
  • After changing prompts or system instructions
  • After switching voice providers (STT/TTS)
  • After modifying call flows or policies
  • On a regular schedule to catch gradual drift
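
One way to turn the triggers above into an automated check is to fingerprint the agent's configuration and re-run when the fingerprint changes or the last run is too old. The sketch below is hypothetical: the configuration keys, the `config_fingerprint` and `should_rerun` helpers, and the seven-day default are assumptions for illustration, not settings defined by the product.

```python
import hashlib
import json
from datetime import datetime, timedelta


def config_fingerprint(config: dict) -> str:
    """Return a stable hash of the agent configuration.

    Keys are sorted so that two equal configurations always hash the same.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_rerun(
    baseline_fingerprint: str,
    current_config: dict,
    last_run: datetime,
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed schedule, adjust as needed
) -> bool:
    """Decide whether a regression run is due.

    A run is due if anything in the configuration changed (model, prompt,
    STT/TTS provider, call flow) or if the last run is older than max_age,
    which covers gradual drift with no explicit change.
    """
    if config_fingerprint(current_config) != baseline_fingerprint:
        return True
    return now - last_run >= max_age


# Example with hypothetical configuration keys.
old_config = {"model": "model-a", "prompt": "v1", "stt": "provider-x", "tts": "provider-y"}
new_config = {**old_config, "prompt": "v2"}
fingerprint = config_fingerprint(old_config)
now = datetime(2024, 1, 10)

print(should_rerun(fingerprint, new_config, last_run=datetime(2024, 1, 9), now=now))
print(should_rerun(fingerprint, old_config, last_run=datetime(2024, 1, 9), now=now))
```

The first call reports that a run is due because the prompt changed. The second reports that none is due because nothing changed and the last run was one day earlier.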

Agent versions

Ring 8 works hand-in-hand with Agent Versions. When you fork a new version, you can run Ring 8 to compare the new version’s behavior against the previous one. This gives you confidence that changes are intentional, not accidental.