ai-director · incident · vertex-ai · reliability

When the AI Director Goes Silent: Debugging a Live Executor Cascade

A post-mortem of a production incident where parse failures, rate limits, and connection timeouts cascaded to take the AI Director offline during a live broadcast.

·Sim RaceCenter Team·3 min read

During a recent live broadcast, the AI Director — the system that autonomously selects camera angles every 10–30 seconds — went silent for nearly 10 minutes. No sequences were generated. The broadcast fell back to a static camera while we diagnosed the issue.

This is the story of what happened, why it cascaded, and what we learned.

The Setup

The AI Director's Executor runs on every sequence request. It calls Gemini 2.5 Flash via Vertex AI, passing in the current race state and template library. The model picks a template, fills in the variables, and returns a JSON decision. The whole round-trip typically takes under 20 seconds.

The Executor prompt explicitly instructs the model:

Return ONLY the JSON object. No markdown fencing, no explanation.

The parseExecutorResponse function expects clean JSON, with a fallback to strip ```json fences:

function parseExecutorResponse(raw: string): ExecutorDecision | null {
  let cleaned = raw.trim();
  // Strip a ```json fence if the response is wrapped in one
  if (cleaned.startsWith('```')) {
    cleaned = cleaned
      .replace(/^```(?:json)?\n?/, '')
      .replace(/\n?```$/, '');
  }

  try {
    const parsed = JSON.parse(cleaned);
    // Validate the shape before trusting the cast
    if (
      typeof parsed.templateIndex !== 'number' ||
      !parsed.variables ||
      typeof parsed.variables !== 'object' ||
      typeof parsed.durationMs !== 'number'
    ) {
      return null;
    }
    return parsed as ExecutorDecision;
  } catch {
    return null;
  }
}

This works for the two expected formats:

  • Raw JSON: {"templateIndex": 2, "variables": {...}, "durationMs": 30000}
  • Fenced JSON: ```json\n{...}\n```

What Went Wrong

Three things failed in sequence, each making the next worse.

1. Parse Failures

The model started returning responses with preamble text before the JSON fence — something like:

Based on the current race state, I'll select the battle template.

```json
{"templateIndex": 2, "variables": {"targetDriver": "5"}, "durationMs": 30000}
```

The fence-stripping logic only handles the case where ``` bookends the entire string. With preamble text, cleaned.startsWith('```') is false, the fences are never stripped, and JSON.parse fails on the raw string. These responses were 108–133 characters: a perfectly valid JSON payload buried in a few lines of explanation the model added despite being told not to.

Four calls failed this way in a 10-minute window.
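The gap is easy to reproduce. This sketch inlines the original parser's logic against a preambled response; nothing here is new behavior, just the failure mode made concrete:

```typescript
// Minimal repro of the parse gap (parser logic inlined from above).
const raw = [
  "Based on the current race state, I'll select the battle template.",
  '',
  '```json',
  '{"templateIndex": 2, "variables": {"targetDriver": "5"}, "durationMs": 30000}',
  '```',
].join('\n');

let cleaned = raw.trim();
// startsWith('```') is false: the preamble comes first, so fences are never stripped
if (cleaned.startsWith('```')) {
  cleaned = cleaned.replace(/^```(?:json)?\n?/, '').replace(/\n?```$/, '');
}

let decision: unknown = null;
try {
  decision = JSON.parse(cleaned); // throws: the preamble is not valid JSON
} catch {
  decision = null;
}
// decision is null: a valid payload was discarded because of the wrapper text
```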

2. Rate Limiting (429)

At 13:20, Vertex AI returned a 429 RESOURCE_EXHAUSTED error. At our observed call rate of ~85 calls/hour, bursts can exceed the requests-per-minute quota for gemini-2.5-flash. One 429 isn't catastrophic on its own, but we had no retry-with-backoff logic. The failed request simply returned null, and the Director immediately requested another sequence, adding more pressure.

3. Connection Timeouts

After the 429, subsequent requests started hanging until the ~5-minute TCP timeout: the request reached Google but never got a response. Each one took roughly 303 seconds to fail. With no timeout override and no circuit breaker, every dead request occupied a connection for 5 minutes.

The cascade: 429 → immediate retry → timeout → another retry → another timeout. Within 10 minutes, the system was effectively dead.

The Timeline

| Window | Calls | Success | Parse Fail | Timeout | Avg Latency | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 13:10–13:20 | 12 | 11 | 1 | 0 | 19s | Healthy, 1 parse error |
| 13:20–13:30 | 25 | 20 | 3 | 1 | 35s | 429 hit, latency climbing |
| 13:30–13:40 | 5 | 2 | 0 | 3 | 188s | Cascading timeouts |

The pattern is clear in retrospect: one rate limit event at 13:20 preceded a cascade of connection timeouts that degraded the system from 19-second average latency to effectively offline.

The Fixes

More Robust Parsing

The parser now extracts JSON from anywhere in the response, not just when the string starts with a fence:

function parseExecutorResponse(raw: string): ExecutorDecision | null {
  const cleaned = raw.trim();
  // Try raw JSON first
  try {
    return JSON.parse(cleaned);
  } catch {
    // Fall back to extracting a JSON object from anywhere in the response
    const jsonMatch = cleaned.match(/\{[\s\S]*\}/);
    if (!jsonMatch) return null;
    try {
      return JSON.parse(jsonMatch[0]);
    } catch {
      return null;
    }
  }
}

This handles preamble text, trailing explanations, and any other wrapping the model decides to add. The model's instruction compliance is a hope, not a guarantee — the parser must be defensive.

Retry with Exponential Backoff

On 429 or 5xx responses, the Executor now retries up to 3 times with exponential backoff (1s, 2s, 4s). This absorbs transient rate limits without piling up requests.
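As a sketch (the `callWithBackoff` helper name and the error shape are illustrative, not our exact code), the retry loop looks like:

```typescript
type RetryableError = { code?: number };

// Retry a call on 429/5xx with exponential backoff: 1s, 2s, 4s.
async function callWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const code = (err as RetryableError).code ?? 0;
      const retryable = code === 429 || (code >= 500 && code < 600);
      // Non-retryable errors and exhausted budgets propagate immediately
      if (!retryable || attempt >= maxRetries) throw err;
      // Delays double each attempt: 1s, 2s, 4s for attempts 0, 1, 2
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```

The key property is that a 400-class validation error still fails immediately; only rate limits and server errors earn a retry.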

Request Timeout

A 30-second timeout on the Vertex AI call prevents 5-minute TCP hangs. If the model doesn't respond in 30 seconds, we fail fast and either retry or fall back to the last known good sequence.
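A minimal version of that deadline wrapper, assuming a promise-based client call (the `withTimeout` name is illustrative):

```typescript
// Race a call against a deadline so a dead connection fails fast
// instead of waiting for the ~5-minute TCP default.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer!); // don't leave a live timer behind on success
  }
}
```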

Circuit Breaker

After 3 consecutive failures, the Executor enters a cooldown period (60 seconds) where it returns cached sequences instead of making new API calls. This prevents the cascade pattern entirely — one bad minute doesn't turn into ten.
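A simplified sketch of the breaker (the class name and fallback handling are illustrative; the real version serves cached sequences):

```typescript
// Open the circuit after N consecutive failures; during the cooldown
// window, serve the fallback instead of making new API calls.
class CircuitBreaker<T> {
  private failures = 0;
  private openUntil = 0;

  constructor(
    private threshold = 3,
    private cooldownMs = 60_000,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  async call(fn: () => Promise<T>, fallback: T): Promise<T> {
    if (this.now() < this.openUntil) return fallback; // circuit open
    try {
      const result = await fn();
      this.failures = 0; // any success resets the counter
      return result;
    } catch {
      if (++this.failures >= this.threshold) {
        this.openUntil = this.now() + this.cooldownMs; // trip the breaker
        this.failures = 0;
      }
      return fallback;
    }
  }
}
```

The breaker turns a retry storm into a bounded cost: at most `threshold` failed calls per cooldown window.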

Lessons Learned

LLM output is non-deterministic at every level. The same prompt that produced clean JSON for 10,000 calls will occasionally produce wrapped, fenced, or annotated output. Parse defensively.

Rate limits cascade. A single 429 without backoff creates a retry storm that amplifies the original problem. Every external API call needs backoff and a circuit breaker.

TCP timeouts are silent killers. A 5-minute timeout looks like a hang, not an error. Set explicit request timeouts well below the TCP default.

The AI Director needs a degraded mode. When the Executor can't reach the model, the system should gracefully fall back to cached or default sequences rather than going silent. No sequence is worse than a stale sequence.