
What it tests

Ring 7 answers the question: what breaks when multiple failure modes interact at once? Rings 1-6 each test one dimension in isolation, but in production, problems compound: a frustrated caller with a heavy accent calls from a noisy street and tries to change their order mid-flow. Ring 7 stacks failure modes from Rings 1-6 to surface these compound failures.

Prerequisites

None. Ring 7 is most valuable after Rings 1-6 have been run, because it builds its scenarios from their results.

How it works

Ring 7 is triggered automatically once the simulation batches for Rings 1-6 have completed for an agent:
  1. The system analyzes results from Rings 1-6 to identify failure patterns
  2. It generates compound scenarios that stack multiple failure modes (e.g., accent + noise + interruption + policy edge case)
  3. These scenarios are automatically simulated
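
The first two steps can be pictured with a minimal Python sketch. This is an illustration only, not the product's implementation: the `FailureMode` type, the `failure_rate` field, the ranking by summed failure rate, and the rule of taking at most one mode per ring are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class FailureMode:
    ring: int            # source ring, 1-6
    name: str            # e.g. "background noise"
    failure_rate: float  # hypothetical: share of failed simulations for this mode


def compound_scenarios(modes, stack_size=3, top_n=5):
    """Stack failure modes from distinct rings and rank by summed failure rate."""
    candidates = [
        combo
        for combo in combinations(modes, stack_size)
        # Assumption: a compound scenario takes at most one mode per ring.
        if len({m.ring for m in combo}) == stack_size
    ]
    # Assumption: modes that failed more often in isolation are stacked first.
    candidates.sort(key=lambda c: sum(m.failure_rate for m in c), reverse=True)
    return candidates[:top_n]


modes = [
    FailureMode(4, "accented speech", 0.30),
    FailureMode(6, "background noise", 0.25),
    FailureMode(5, "frustrated tone", 0.20),
    FailureMode(5, "interruption", 0.10),
]
for combo in compound_scenarios(modes, stack_size=3, top_n=2):
    print(" + ".join(f"{m.name} (Ring {m.ring})" for m in combo))
```

With the made-up rates above, the highest-ranked stack is accented speech + background noise + frustrated tone, which matches the first example scenario listed below.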

Example compound scenarios

  • Accented speech (Ring 4) + background noise (Ring 6) + frustrated tone (Ring 5)
  • Policy edge case (Ring 2) + prompt injection attempt (Ring 3) + interruption (Ring 5)
  • Off-topic digression (Ring 5) + poor network (Ring 6) + flow branch change (Ring 1)

What it catches

  • Failures that only appear when multiple stressors interact
  • Pipeline cascades where STT errors compound into LLM misinterpretation
  • Edge cases where individually passing conditions combine into failure
  • Performance degradation under compound stress
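
One way to think about the "individually passing conditions combine into failure" case is as a simple predicate over results. The sketch below is hypothetical: the function name and the shape of its inputs are assumptions for illustration, not part of any documented API.

```python
def is_interaction_failure(isolated_passed: dict[str, bool], compound_passed: bool) -> bool:
    """True when every stressor passed in isolation but the stacked scenario failed."""
    return all(isolated_passed.values()) and not compound_passed


# Hypothetical results: each stressor passed alone, the combination did not.
isolated = {"accented speech": True, "background noise": True, "frustrated tone": True}
print(is_interaction_failure(isolated, compound_passed=False))
```

A failure flagged this way points at the interaction itself, since no single-ring result would have revealed it.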

Why auto-trigger?

Composing compound scenarios by hand is combinatorially expensive and requires knowing in advance which failure modes to combine. Because Ring 7 triggers after Rings 1-6 and uses their results, it can target the combinations most likely to fail.
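
To give a rough sense of the combinatorics, the sketch below counts how many stacked scenarios exist under two assumptions made purely for illustration: each of the six rings contributes the same number of failure modes (10 here, a made-up figure), and a scenario takes at most one mode from each ring.

```python
from math import comb


def scenario_count(rings: int, modes_per_ring: int, stack_size: int) -> int:
    """Ways to pick `stack_size` distinct rings, then one failure mode from each."""
    return comb(rings, stack_size) * modes_per_ring ** stack_size


# Hypothetical: 6 rings, 10 failure modes per ring.
print(scenario_count(6, 10, 3))                            # 20000 three-way stacks
print(sum(scenario_count(6, 10, s) for s in range(2, 7)))  # 1771500 stacks of size 2-6
```

Even at this modest scale, exhaustive enumeration is impractical, which is why narrowing the candidates with the results from Rings 1-6 matters.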