AI / LLM · Voice AI · Node.js · WebSockets · Real-time

SettleVox: AI Voice Legal Intake Agent

Real-time AI voice agent conducting automated legal intake calls — built on WebSockets, Cartesia STT/TTS, Groq LLM, and an 8-phase finite state machine pipeline.


Figure: Dashboard Overview, Call Analytics & Metrics

SettleVox is a fully automated voice AI agent designed to replace human intake coordinators at personal injury law firms. When a prospective client calls, SettleVox answers, conducts a structured legal intake interview in natural speech, qualifies or disqualifies the lead in real time, and persists a structured JSON record — all before a human ever picks up the phone.

The Problem

Personal injury law firms receive hundreds of intake calls per week. A significant percentage are from unqualified leads — callers with no third-party liability, no documented injuries, or who refuse consent. Routing these through human coordinators wastes thousands of hours annually.

The challenge was to build a voice AI agent that:

Sounds natural — No robotic TTS delays or uncanny valley responses
Handles barge-ins — Callers interrupt constantly; the agent must stop mid-sentence and adapt
Extracts structured data — Names, incident dates, fault assessments must be parsed, not just logged as raw audio
Enforces legal compliance — TCPA consent must be obtained before recording; no workarounds

Architecture Overview

Figure: System Architecture. Telephony: Twilio Voice inbound and outbound streams. SettleVox Core Engine: WSS Streamer (twilio-stream.ts), Audio Buffer (audio-buffer.ts), and the Agent Orchestrator (FSM, intake-fsm.ts) with intent-classifier, prompt-builder, and turn-manager modules (TypeScript · Drizzle ORM · Event Emitter). AI Providers: Cartesia STT (Ink-Whisper / Ink-2), Groq API (Llama 3.3 70B), Cartesia TTS (Sonic, WebSocket). Also shown: Neon Postgres (database and session store) and the React CRM (call monitoring and logs). Connection labels: μ-law audio stream, frames, buffered raw audio, transcript, prompt, JSON stream, text reply, PCM stream, μ-law playback, state persist, REST / WS.

Key Technical Decisions

The Real-Time Pipeline — No Off-the-Shelf Framework

Most voice AI platforms (e.g. Vapi, Bland) abstract away the pipeline at the cost of control. SettleVox instead builds every layer from scratch, using WebSockets, Node.js's EventEmitter, and direct API integrations:

```typescript
// Orchestrator: the beating heart of the pipeline
export class Orchestrator {
  private stt: SpeechToText;   // Cartesia Ink-2 via WebSocket
  private llm: LLMEngine;      // Groq Llama 3.3 70B, streaming
  private tts: TextToSpeech;   // Cartesia Sonic-3.5 via WebSocket
  private fsm: IntakeFSM;      // 8-phase deterministic state machine

  // Audio from Twilio is decoded upstream, then forwarded here to Cartesia STT
  public receiveAudio(audioBuffer: Buffer) {
    this.stt.sendAudio(audioBuffer);
  }
}
```
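Twilio Media Streams deliver caller audio as base64-encoded 8 kHz μ-law (G.711) frames in the `media.payload` field of each `media` event. The write-up does not show the decode step, so the following is a minimal sketch, assuming each frame is expanded to 16-bit linear PCM before it reaches STT. `mulawToLinear` and `decodeTwilioPayload` are hypothetical names, and whether SettleVox converts to PCM or forwards μ-law as-is is not stated.

```typescript
// Standard G.711 μ-law expansion: one 8-bit code word -> one 16-bit linear sample.
function mulawToLinear(code: number): number {
  const u = ~code & 0xff;                    // μ-law bytes are transmitted bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 (132) is the encoding bias added before compression
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Hypothetical helper: decode one Twilio media payload (base64 μ-law) into PCM samples.
function decodeTwilioPayload(payloadB64: string): Int16Array {
  const bytes = Uint8Array.from(atob(payloadB64), (c) => c.charCodeAt(0));
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) pcm[i] = mulawToLinear(bytes[i]);
  return pcm;
}
```

The code word `0xFF` decodes to silence (0), and `0x00` / `0x80` decode to the extremes of the G.711 range (-32124 / 32124).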

Three separate persistent WebSocket connections run in parallel for the entire call duration — Cartesia STT, Cartesia TTS, and Twilio Media Streams — all orchestrated through a single EventEmitter graph.
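The event graph itself is not shown in the write-up. As a minimal sketch, assuming each stage talks to the others only through events on one shared `EventEmitter`, the wiring could look like the following. The event names (`audio`, `transcript`, `reply`, `speech`) and `buildPipeline` are hypothetical, and synchronous stubs stand in for the Twilio, Cartesia STT, Groq, and Cartesia TTS clients.

```typescript
import { EventEmitter } from "node:events";

// Each stage listens for the previous stage's event and emits its own.
function buildPipeline(bus: EventEmitter, log: string[]): void {
  // STT stub: treats incoming "audio" as already-transcribed text
  bus.on("audio", (frame: string) => bus.emit("transcript", frame));
  // LLM stub: turns the caller's transcript into a reply
  bus.on("transcript", (text: string) => bus.emit("reply", `You said: ${text}`));
  // TTS stub: passes the reply on as "speech" for playback
  bus.on("reply", (text: string) => bus.emit("speech", text));
  // Twilio outbound stub: records what would be played to the caller
  bus.on("speech", (audio: string) => log.push(audio));
}

const bus = new EventEmitter();
const played: string[] = [];
buildPipeline(bus, played);
bus.emit("audio", "I was rear-ended");
// played is now ["You said: I was rear-ended"]
```

Keeping every hop on one bus means any stage can be swapped (for example, a stub in tests) without touching the others; in production the stubs would be asynchronous WebSocket clients.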

Finite State Machine (IntakeFSM) — 8 Deterministic Phases

The agent's conversational flow is governed by a typed FSM — not a giant prompt. This is what makes SettleVox reliable instead of probabilistic:

```typescript
type PhaseName =
  | 'GREETING' | 'INCIDENT_DETAILS' | 'INJURY_ASSESSMENT' | 'LIABILITY'
  | 'INSURANCE' | 'QUALIFICATION' | 'DISQUALIFICATION' | 'WRAP_UP';

interface Phase {
  requiredFields: string[];   // fields that must be extracted before the phase can advance
  next: PhaseName | null;     // null marks a terminal phase
}

export const INTAKE_PHASES: Record<PhaseName, Phase> = {
  GREETING:          { requiredFields: ['consent_given'],              next: 'INCIDENT_DETAILS' },
  INCIDENT_DETAILS:  { requiredFields: ['incident_description', 'incident_date', 'incident_location'], next: 'INJURY_ASSESSMENT' },
  INJURY_ASSESSMENT: { requiredFields: ['injuries_described', 'treatment_status'],  next: 'LIABILITY' },
  LIABILITY:         { requiredFields: ['other_party_involved', 'fault_assessment'], next: 'INSURANCE' },
  INSURANCE:         { requiredFields: ['insurance_info_available'],   next: 'QUALIFICATION' },
  QUALIFICATION:     { requiredFields: ['is_qualified'],               next: 'WRAP_UP' },
  DISQUALIFICATION:  { requiredFields: ['qualification_reason'],       next: null },
  WRAP_UP:           { requiredFields: ['caller_email'],               next: null },
};
```

The FSM fast-tracks automatically: if a caller provides their incident, injuries, and liability status all in one sentence, the LLM extracts all fields simultaneously and the FSM jumps directly to QUALIFICATION, skipping redundant phases.
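The fast-forward can be sketched as a loop over the phase table, assuming a phase counts as complete once every one of its `requiredFields` has a non-null value. `fastForward` is a hypothetical helper, not the project's code. Under this rule the jump stops at the first phase with a missing field, so reaching QUALIFICATION in one turn also requires the insurance field.

```typescript
type PhaseName =
  | "GREETING" | "INCIDENT_DETAILS" | "INJURY_ASSESSMENT" | "LIABILITY"
  | "INSURANCE" | "QUALIFICATION" | "DISQUALIFICATION" | "WRAP_UP";

interface Phase { requiredFields: string[]; next: PhaseName | null; }

// Same transitions as the INTAKE_PHASES table
const PHASES: Record<PhaseName, Phase> = {
  GREETING:          { requiredFields: ["consent_given"], next: "INCIDENT_DETAILS" },
  INCIDENT_DETAILS:  { requiredFields: ["incident_description", "incident_date", "incident_location"], next: "INJURY_ASSESSMENT" },
  INJURY_ASSESSMENT: { requiredFields: ["injuries_described", "treatment_status"], next: "LIABILITY" },
  LIABILITY:         { requiredFields: ["other_party_involved", "fault_assessment"], next: "INSURANCE" },
  INSURANCE:         { requiredFields: ["insurance_info_available"], next: "QUALIFICATION" },
  QUALIFICATION:     { requiredFields: ["is_qualified"], next: "WRAP_UP" },
  DISQUALIFICATION:  { requiredFields: ["qualification_reason"], next: null },
  WRAP_UP:           { requiredFields: ["caller_email"], next: null },
};

// Hypothetical helper: keep advancing while the current phase's required fields are all filled.
function fastForward(current: PhaseName, data: Record<string, unknown>): PhaseName {
  const isFilled = (field: string): boolean => data[field] !== undefined && data[field] !== null;
  let phase = current;
  for (;;) {
    const { requiredFields, next } = PHASES[phase];
    if (next === null || !requiredFields.every(isFilled)) return phase;
    phase = next;   // every field already present: skip this phase
  }
}
```

Because the loop only reads the table, adding or reordering phases needs no change to the fast-forward logic.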

The `<FSM_STATE>` Delimiter Protocol — Speech + Extraction in One Stream

Rather than making two separate LLM calls (one for speech, one for JSON extraction), SettleVox uses a custom single-stream protocol. The LLM produces a single response in two parts separated by a delimiter:

```text
I'm so sorry to hear about your accident. Just to confirm — 
you mentioned this happened on June 15th near downtown Chicago? 
<FSM_STATE>{"incident_date":"2026-06-15","incident_location":"downtown Chicago"}
```

The Orchestrator's streaming parser emits speech tokens to Cartesia TTS in real time, then parses the JSON block post-stream to update the FSM — all in a single API round-trip. This cuts latency by ~400ms per turn.
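The streaming parser itself is not shown. A minimal sketch, assuming the model emits the delimiter once and nothing but JSON after it: speech is forwarded as it arrives, except for a short tail that might be the start of a delimiter split across chunks, and the JSON block is parsed when the stream ends. `FsmStateParser`, `push`, and `end` are hypothetical names.

```typescript
const DELIMITER = "<FSM_STATE>";

// Length of the longest suffix of `text` that is a proper prefix of the delimiter.
function longestPartialDelimiter(text: string): number {
  for (let n = Math.min(DELIMITER.length - 1, text.length); n > 0; n--) {
    if (DELIMITER.startsWith(text.slice(-n))) return n;
  }
  return 0;
}

class FsmStateParser {
  private pending = "";    // held-back text that could begin the delimiter
  private jsonPart = "";   // everything received after the delimiter
  private inJson = false;

  constructor(private readonly onSpeech: (text: string) => void) {}

  push(chunk: string): void {
    if (this.inJson) { this.jsonPart += chunk; return; }
    const text = this.pending + chunk;
    const idx = text.indexOf(DELIMITER);
    if (idx !== -1) {
      if (idx > 0) this.onSpeech(text.slice(0, idx));   // speech before the tag
      this.jsonPart = text.slice(idx + DELIMITER.length);
      this.pending = "";
      this.inJson = true;
      return;
    }
    const keep = longestPartialDelimiter(text);
    const safe = text.slice(0, text.length - keep);
    if (safe) this.onSpeech(safe);                      // safe to send to TTS now
    this.pending = text.slice(text.length - keep);
  }

  // Stream finished: flush held-back speech, or parse the JSON block if one arrived.
  end(): Record<string, unknown> | null {
    if (!this.inJson) {
      if (this.pending) this.onSpeech(this.pending);
      this.pending = "";
      return null;
    }
    try {
      return JSON.parse(this.jsonPart.trim()) as Record<string, unknown>;
    } catch {
      return null;   // malformed extraction: keep the FSM unchanged
    }
  }
}
```

Holding back at most `DELIMITER.length - 1` characters is what keeps a fragment such as `<FSM` from being spoken aloud when the tag straddles two chunks.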

Barge-In Detection & Interruption

Callers interrupt agents constantly, so SettleVox treats any sufficiently long interim transcript as a barge-in and tears down the active turn:

```typescript
private handleInterimTranscript(text: string) {
  // Require 2+ words or 5+ chars to prevent echo-cancellation false positives
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/);
  if (words.length >= 2 || trimmed.length >= 5) {
    this.handleBargeIn();
  }
}

private handleBargeIn() {
  this.llm.interrupt();          // Abort the active Groq stream
  this.tts.interrupt();          // Cancel the active Cartesia TTS context
  this.ttsBuffer = '';           // Discard buffered sentences
  this.twilioWs.send(JSON.stringify({
    event: 'clear', streamSid: this.streamSid  // Flush Twilio audio buffer
  }));
}
```

All three cancellations (LLM generation, TTS audio, and Twilio playback) are issued together within a single event-loop tick, so playback stops as soon as the barge-in is detected.
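How `llm.interrupt()` aborts the Groq stream is not shown. One common pattern is a fresh `AbortController` per turn; the sketch below assumes that pattern, with a fake token generator standing in for the Groq client. `InterruptibleLLM` and `fakeTokenStream` are hypothetical. Because `interrupt()` is synchronous, it can run in the same tick as the TTS cancel and the Twilio `clear` message.

```typescript
// Hypothetical stand-in for a streaming LLM client: yields tokens until aborted.
async function* fakeTokenStream(tokens: string[], signal: AbortSignal): AsyncGenerator<string> {
  for (const t of tokens) {
    if (signal.aborted) return;                           // stop producing once the turn is cancelled
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    yield t;
  }
}

class InterruptibleLLM {
  private controller: AbortController | null = null;

  // Streams one turn; resolves with the tokens delivered before any interrupt.
  async generate(tokens: string[], onToken: (t: string) => void): Promise<string[]> {
    this.controller = new AbortController();
    const { signal } = this.controller;
    const delivered: string[] = [];
    for await (const t of fakeTokenStream(tokens, signal)) {
      if (signal.aborted) break;                          // drop tokens that land after the barge-in
      delivered.push(t);
      onToken(t);
    }
    return delivered;
  }

  // Synchronous: safe to call alongside the other cancellations in one handler.
  interrupt(): void {
    this.controller?.abort();
  }
}
```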

Multi-Key Rotation — Surviving Rate Limits Under Load

At peak hours, a single API key can exhaust Groq's free-tier rate limits within minutes, so SettleVox rotates through a pool of up to 50 keys:

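```typescript
const GROQ_KEYS: string[] = [];
const GROQ_BLACKLISTED = new Set<number>();   // indices of keys that must never be retried
let activeGroqIndex = 0;
// Load up to 50 Groq keys from env: GROQ_KEY_1, GROQ_KEY_2, ... GROQ_KEY_50
for (let i = 1; i <= 50; i++) {
  const key = process.env[`GROQ_KEY_${i}`];
  if (key && !GROQ_KEYS.includes(key)) GROQ_KEYS.push(key);   // skip unset and duplicate keys
}
// Permanently blacklist keys blocked at org level (model_permission_blocked_org),
// then advance to the next key that is not blacklisted
function rotateGroqKey(blacklistCurrent = false): void {
  if (blacklistCurrent) GROQ_BLACKLISTED.add(activeGroqIndex);
  for (let step = 1; step <= GROQ_KEYS.length; step++) {
    const candidate = (activeGroqIndex + step) % GROQ_KEYS.length;
    if (!GROQ_BLACKLISTED.has(candidate)) { activeGroqIndex = candidate; return; }
  }
  throw new Error('All Groq keys are blacklisted');
}
```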

Rate-limit errors trigger a key rotation with a 500ms backoff, while keys blocked by an org-level model restriction are blacklisted permanently so they are never retried. This keeps concurrent calls running through rate-limit spikes without a paid API tier.
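The retry policy can be sketched as a wrapper around each request. This assumes a hypothetical error shape with an HTTP `status` and a `code` string; the `model_permission_blocked_org` code comes from the snippet's comment, but where the real Groq client exposes it is an assumption, as are the names `withKeyRotation` and `ApiError`.

```typescript
// Hypothetical error shape; the real client's error object may differ.
interface ApiError { status?: number; code?: string; }

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Try each key at most once, starting from `start`:
// 429 -> back off, then rotate; org-level block -> blacklist, then rotate; anything else -> rethrow.
async function withKeyRotation<T>(
  keys: string[],
  blacklist: Set<number>,
  start: number,
  request: (key: string) => Promise<T>,
  backoffMs = 500,
): Promise<T> {
  let index = start;
  for (let attempt = 0; attempt < keys.length; attempt++) {
    if (blacklist.has(index)) { index = (index + 1) % keys.length; continue; }
    try {
      return await request(keys[index]);
    } catch (err) {
      const e = (err ?? {}) as ApiError;
      if (e.code === "model_permission_blocked_org") {
        blacklist.add(index);                  // retrying this key can never succeed
      } else if (e.status === 429) {
        await sleep(backoffMs);                // brief pause before moving on
      } else {
        throw err;                             // not a key problem: surface it
      }
      index = (index + 1) % keys.length;
    }
  }
  throw new Error("No usable Groq key available");
}
```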

Sentence-Level TTS Chunking — Sub-200ms First Audio

Instead of waiting for the full LLM response before speaking, SettleVox pipes each complete sentence (detected by terminal punctuation) to Cartesia as soon as it is available:

```typescript
private handleLLMChunk(chunk: string) {
  this.ttsBuffer += chunk;
  // Split on sentence-terminal punctuation: "." "?" "!" followed by whitespace
  let match = this.ttsBuffer.match(/[.?!]\s/);
  while (match && match.index !== undefined) {
    const sentence = this.ttsBuffer.substring(0, match.index + match[0].length);
    this.ttsBuffer = this.ttsBuffer.substring(match.index + match[0].length);
    // continue: true keeps the same TTS context for prosody continuity
    this.tts.sendText(sentence, this.currentContextId, true);
    match = this.ttsBuffer.match(/[.?!]\s/);
  }
}
```

Cartesia's continue: true flag keeps all sentence chunks in the same audio context, preserving prosody and natural rhythm across sentence boundaries.
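One detail the snippet leaves out is the end of the stream: the regex only fires on punctuation followed by whitespace, so the final sentence (or any unterminated tail) stays in the buffer. A minimal sketch, assuming a hypothetical `send(text, contextId, cont)` sink shaped like `tts.sendText`, and assuming the context is closed by a last message with the continuation flag set to `false` (Cartesia's WebSocket documentation is the authority on exact close semantics):

```typescript
// Hypothetical sink mirroring tts.sendText(text, contextId, continueFlag).
type SendText = (text: string, contextId: string, cont: boolean) => void;

class SentenceChunker {
  private buffer = "";
  constructor(private readonly send: SendText, private readonly contextId: string) {}

  // Emit every complete sentence: terminal punctuation followed by whitespace.
  push(chunk: string): void {
    this.buffer += chunk;
    const re = /[.?!]\s/;                 // non-global, so exec always scans from the start
    let m = re.exec(this.buffer);
    while (m !== null) {
      const end = m.index + m[0].length;
      this.send(this.buffer.slice(0, end), this.contextId, true);
      this.buffer = this.buffer.slice(end);
      m = re.exec(this.buffer);
    }
  }

  // LLM stream finished: send whatever is left and mark the context as complete.
  end(): void {
    this.send(this.buffer, this.contextId, false);
    this.buffer = "";
  }
}
```

Splitting on `[.?!]\s` is a heuristic: abbreviations such as "Dr. Smith" will also split, which costs a little prosody but never drops text.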

TCPA Compliance — Consent-First Architecture

SettleVox enforces TCPA compliance at the FSM level, not the prompt level. If a caller refuses recording consent, the FSM immediately transitions to DISQUALIFICATION and the entire intake pipeline halts — the agent cannot accidentally continue:

```typescript
if (extractedData.consent_given === false || extractedData.consent_given === 'REFUSED') {
  this.extractedData.qualification_reason = 'consent_refused';
  this.currentPhase = 'DISQUALIFICATION';  // Hard stop: no override possible
}
```
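A consent-first guard can be sketched as a single transition function that every phase change must pass through. This is a hypothetical reducer, not the project's code: it assumes consent arrives as `true`, `false`, or the `REFUSED`/`UNKNOWN` sentinels, treats DISQUALIFICATION as terminal, and additionally refuses to leave GREETING until consent is explicitly `true`.

```typescript
type Consent = boolean | "REFUSED" | "UNKNOWN" | undefined;

interface IntakeState {
  phase: string;
  consentGiven: boolean;
  qualificationReason?: string;
}

// Hypothetical guarded transition: the only way the phase can change.
function transition(state: IntakeState, consent: Consent, proposed: string): IntakeState {
  // Terminal: once disqualified, no later turn can move the call forward.
  if (state.phase === "DISQUALIFICATION") return state;
  // Explicit refusal halts the intake immediately.
  if (consent === false || consent === "REFUSED") {
    return { ...state, phase: "DISQUALIFICATION", qualificationReason: "consent_refused" };
  }
  const consentGiven = state.consentGiven || consent === true;
  // Without explicit consent, the call cannot leave GREETING.
  if (!consentGiven) return { ...state, phase: "GREETING" };
  return { ...state, consentGiven, phase: proposed };
}
```

Routing every transition through one function means the LLM can propose a next phase, but it cannot bypass the consent check.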

Feature Set

| Feature | Implementation |
|---|---|
| Live Voice Calls | Twilio → WebSocket → Orchestrator pipeline |
| Speech-to-Text | Cartesia Ink-2 AutoFinalize (real-time turn detection) |
| LLM Brain | Groq Llama 3.3 70B with streaming + key rotation |
| Text-to-Speech | Cartesia Sonic-3.5 with sentence-level chunking |
| Barge-In | Atomic interruption of LLM + TTS + Twilio buffer |
| 8-Phase FSM | Deterministic intake flow with fast-forward logic |
| TCPA Guardrails | FSM-level consent enforcement, not prompt-level |
| Zod Schema | Typed extraction with REFUSED/UNKNOWN fallbacks |
| AI Summary | Post-call sentiment analysis + 1-sentence summary |
| SSE Dashboard | Real-time call status push to React CRM frontend |
| Multi-Key Pool | Up to 50 Groq keys with permanent blacklist logic |
| Click-to-Talk | Demo portal triggers outbound call via Twilio REST |

Demo Scenarios

The SettleVox live demo covers six scenarios, from the golden path to edge cases, that verify the robustness of the FSM and pipeline:

| # | Scenario | What It Tests |
|---|---|---|
| 1 | Golden Path | Full auto accident intake: qualification + wrap-up |
| 2 | Unqualified Lead | No third-party liability → fast-track disqualification |
| 3 | Barge-In | Interrupt agent mid-sentence → instant stream abort |
| 4 | Over-Sharer | All data in one breath → FSM auto-advances phases |
| 5 | Contact Refusal | Refuses name/email → Zod logs as REFUSED, continues |
| 6 | TCPA Refusal | Refuses recording consent → immediate pipeline halt |

What I Learned

Building SettleVox revealed that the hard part of voice AI isn't the LLM — it's the real-time audio plumbing. Managing three simultaneous WebSocket connections with sub-50ms event propagation, handling TCP backpressure on 8kHz µ-law streams, and atomically cancelling in-flight API calls during barge-ins requires careful EventEmitter architecture.

The `<FSM_STATE>` delimiter protocol was the key insight: by encoding both speech and structured data in a single LLM stream, I eliminated an entire API round-trip per conversational turn, making the agent feel genuinely fast and natural rather than laggy.
