Ring 8 -- Stays Good Over Time

What it tests

Ring 8 answers: Does behavior remain correct across updates, drift, and retrains? Voice agents degrade over time: model updates change behavior, prompt tweaks have unintended side effects, and provider changes introduce subtle differences. Ring 8 is the regression safety net. It re-runs the same scenarios after a change and compares the results against a baseline, so drift is caught before it reaches production.

Prerequisites

None.

How it works

Ring 8 enables continuous evaluation:

Baseline capture: After your agent passes Rings 0-7, those results become the behavioral baseline
Re-run on change: When you update your agent (new model, prompt changes, provider switch), Ring 8 re-runs the same scenarios
Diff analysis: Results are compared against the baseline to detect behavioral changes - both regressions (things that broke) and improvements
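
To make the diff step concrete, here is a minimal sketch of comparing a re-run against a baseline. It assumes results are reduced to a pass/fail flag per scenario; the data shape, the scenario names, and the `diff_results` helper are hypothetical and only illustrate the idea, not how Ring 8 stores or compares results internally.

```python
from dataclasses import dataclass, field


@dataclass
class DiffReport:
    """Hypothetical summary of how a re-run differs from the baseline."""
    regressions: list = field(default_factory=list)   # passed before, fail now
    improvements: list = field(default_factory=list)  # failed before, pass now
    unchanged: list = field(default_factory=list)     # same outcome as baseline
    missing: list = field(default_factory=list)       # in baseline, not re-run


def diff_results(baseline: dict, current: dict) -> DiffReport:
    """Compare per-scenario pass/fail outcomes against the baseline."""
    report = DiffReport()
    for scenario_id in sorted(baseline):
        if scenario_id not in current:
            report.missing.append(scenario_id)
            continue
        was_passing = baseline[scenario_id]
        is_passing = current[scenario_id]
        if was_passing and not is_passing:
            report.regressions.append(scenario_id)
        elif not was_passing and is_passing:
            report.improvements.append(scenario_id)
        else:
            report.unchanged.append(scenario_id)
    return report


# Illustrative data only: scenario IDs mapped to pass (True) / fail (False).
baseline = {"greeting": True, "refund-policy": True, "interruptions": False}
current = {"greeting": True, "refund-policy": False, "interruptions": True}

report = diff_results(baseline, current)
print(report.regressions)   # ['refund-policy']
print(report.improvements)  # ['interruptions']
```

Reporting improvements alongside regressions matters because an unexpected improvement is still an unplanned behavioral change and is worth reviewing.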

When to use it

After updating the underlying LLM model
After changing prompts or system instructions
After switching voice providers (STT/TTS)
After modifying call flows or policies
On a regular schedule to catch gradual drift
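
One way to automate these triggers is to fingerprint the agent configuration and re-run whenever the fingerprint changes or the last run is too old. The sketch below is an assumption-laden illustration: the config keys, the `should_rerun` helper, and the seven-day default are all hypothetical and not part of the product.

```python
import hashlib
import json
from datetime import datetime, timedelta


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_rerun(baseline_fp: str, current_config: dict,
                 last_run: datetime, now: datetime,
                 max_age: timedelta = timedelta(days=7)) -> bool:
    """Re-run if the config changed or the last run is older than max_age."""
    if config_fingerprint(current_config) != baseline_fp:
        return True  # model, prompt, provider, or policy changed
    return now - last_run >= max_age  # scheduled check for gradual drift


# Hypothetical config covering the change types listed above.
base_config = {
    "model": "model-a",
    "system_prompt": "You are a support agent.",
    "stt_provider": "stt-x",
    "tts_provider": "tts-y",
}
baseline_fp = config_fingerprint(base_config)
last_run = datetime(2024, 1, 1)

changed_config = dict(base_config, model="model-b")
print(should_rerun(baseline_fp, base_config, last_run, datetime(2024, 1, 3)))     # False
print(should_rerun(baseline_fp, changed_config, last_run, datetime(2024, 1, 3)))  # True
print(should_rerun(baseline_fp, base_config, last_run, datetime(2024, 1, 9)))     # True
```

The scheduled branch is there because drift can come from outside the config, for example a provider updating a model behind a stable name.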

Agent versions

Ring 8 works hand-in-hand with Agent Versions. When you fork a new version, you can run Ring 8 to compare the new version’s behavior against the previous one. This gives you confidence that changes are intentional, not accidental.
