
How a call works (end to end)

Frontier provides real-time AI coaching for sales calls. A chain of services turns the live conversation into transcripts, detections, and answers, and delivers that guidance to the sales representative during the call. This page walks through the flow end to end to orient engineers before they dive into the codebase.

sequenceDiagram
autonumber
actor Rep as Rep (Desktop + HUD)
participant SDK as Recall Desktop SDK
participant Dash as Dashboard
participant CS as Call Server (per-call agents)
participant DG as Deepgram
participant KB as Knowledge base
participant LLM as LLM providers
Dash->>Dash: Calendar sync finds the upcoming call
SDK-->>Rep: meeting detected
Rep->>Dash: start recording
Dash-->>Rep: callId + upload token + call-server URL
Rep->>CS: open per-call WebSocket (JWT, org-scoped)
Rep->>Rep: flip window to the HUD overlay
SDK->>CS: capture audio (mic + system loopback, two feeds)
CS->>DG: stream audio
DG-->>CS: transcripts (partial + final)
CS->>CS: orchestrator fans out detectors (script / FAQ / question)
CS-->>Rep: script items shown and marked complete
Note over CS: prospect asks a question
CS->>KB: warm the answer cache (retrieve)
Rep->>CS: rep chooses to answer
CS->>LLM: Call Answer Agent generates a grounded answer
LLM-->>CS: streamed answer
CS-->>Rep: answer shown in the HUD
Rep->>CS: follow-up question (Chat Thread Agent threads it)
Note over CS,Rep: an FAQ match is surfaced as a card directly (no LLM generation)
| Component | Role |
| --- | --- |
| Desktop app | The Electron client on the rep’s Mac. Two windows: the Launcher (Main Window) for prep + call summaries, and the HUD overlay for live calls; it captures call audio and relays events. See Desktop app. |
| HUD | The Heads-Up Display, a real-time coaching overlay shown by the Desktop app during a call. |
| Dashboard | The Next.js web application that handles call provisioning, Recall.ai webhooks, and hosts Inngest functions for asynchronous workflows. |
| Inngest | A durable async layer within the Dashboard, managing scheduled tasks (cron jobs) and processing webhooks without blocking hot paths. |
| Call Server | A set of Cloudflare Workers and Durable Objects that form the edge compute unit for live calls, hosting the core AI logic and orchestration. |
| CallAgent Durable Object | The primary per-call orchestrator within the Call Server, managing call state, script progression, and coordinating other agents. |
| TranscriptStreamAgent Durable Object | Handles direct audio transcription, streaming from the HUD to Deepgram or Cloudflare Workers AI and forwarding events to the CallAgent. |
| TranscriptOrchestration worker | A companion Cloudflare Worker that processes raw transcript events, persists final transcripts to Cloudflare D1, groups words by speaker, and dispatches detection tasks. |
| Companion service Workers | A set of dedicated Cloudflare Workers (e.g., script-completion, faq-detection, question-detection, logging) bound to the Call Server for specialized real-time processing. |
| CallChatAgent Durable Object | A per-chat-thread agent responsible for knowledge chat, progressive quick answers, and follow-up threading. |
| QuickAnswerAgent Durable Object | A per-call agent for fast, inline HUD quick answers, leveraging a warm knowledge cache. |
| CallAnswerAgent Durable Object | A per-call agent providing streaming, context-aware answers to detected questions. |
| OrgAnswerAgent Durable Object | An organization-scoped agent for general knowledge retrieval, providing streaming answers. |
| Recall.ai | An external service (via the Recall Desktop SDK) for meeting capture, providing call audio and events. (The legacy cloud bot integration is deprecated.) |
| Deepgram | An external speech-to-text provider used for high-fidelity, word-level-timed transcription. |
| Cloudflare D1 | A Cloudflare D1 (SQLite database) used by the Call Server for persisting final transcripts and other call-specific data. |
| Cloudflare Vectorize | A Cloudflare Vectorize index used for embedding storage and vector similarity search. |
| Cloudflare AI Search | A Cloudflare AI Search instance used for knowledge retrieval. |
| Cloudflare R2 | A Cloudflare R2 bucket (AI_SEARCH_KNOWLEDGE_BUCKET) used to back knowledge retrieval. |
| Cloudflare KV | A Cloudflare KV namespace (SCRIPT_CACHE) used for speculative knowledge base caching. |
| Supabase | A managed Postgres database primarily used for configuration, background job state, and application data, with real-time subscription features. |
| LLM providers | External Large Language Model APIs (Anthropic, Google, OpenAI, TogetherAI) used via the Vercel AI SDK for question detection and answer generation. |
| Cloudflare Workers AI | Cloudflare’s platform for running AI models at the edge, used for embeddings and as a transcription fallback for the direct audio path in production. |

The journey begins with calendar synchronization. An hourly Inngest cron job in the Dashboard scans connected calendars for calls scheduled up to 30 days ahead, so the system knows about upcoming meetings before they start.

When a sales call starts, the Desktop app on the rep’s machine detects the meeting via the Recall Desktop SDK. Once the rep accepts, the Desktop app posts to a Dashboard API route to provision the call and receives a unique callId, an uploadToken, and the callServerUrl. The Desktop app then creates and shows the HUD, a transparent on-screen overlay, which opens a WebSocket connection to the per-call CallAgent Durable Object instance on the Call Server. The connection is authenticated with a JWT passed in the query string, and the Call Server enforces that the organization ID is the correct one for the call.
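The handshake above can be sketched as follows. This is a minimal illustration, not the actual client code: the three response fields come from the text, while the `/agents/call-agent/` path, the `token` query parameter, and the `orgId` claim name are assumptions made for the example.

```typescript
// Provisioning response, using the three fields named in the text.
interface ProvisionedCall {
  callId: string;
  uploadToken: string;
  callServerUrl: string;
}

// Hypothetical JWT claims; the claim name "orgId" is an assumption.
interface CallTokenClaims {
  orgId: string;
}

// Build the per-call WebSocket URL. The path and the "token" query parameter
// are hypothetical and chosen only for illustration.
function buildCallSocketUrl(call: ProvisionedCall, jwt: string): string {
  // Swap http(s) for ws(s) and drop trailing slashes from the base URL.
  const base = call.callServerUrl.replace(/^http/, "ws").replace(/\/+$/, "");
  const path = `/agents/call-agent/${encodeURIComponent(call.callId)}`;
  return `${base}${path}?token=${encodeURIComponent(jwt)}`;
}

// Server-side sketch: accept the socket only if the token's organization
// matches the organization that owns the call.
function isAuthorizedForCall(claims: CallTokenClaims, callOrgId: string): boolean {
  return claims.orgId === callOrgId;
}
```

Passing the token in the query string is the usual workaround when a WebSocket client cannot set custom headers during the upgrade request.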

The primary method for capturing call audio today is the Recall Desktop SDK running within the Desktop app. It relays call audio and events to the Call Server.

Frontier is actively migrating to a direct audio capture path for improved separation of speaker feeds. Under this new path, gated by a USE_DIRECT_AUDIO flag, the Desktop app directly taps into the system audio (via CoreAudio loopback on macOS) and the rep’s microphone. This dual-feed audio is then streamed as 16kHz mono PCM over a voice-protocol WebSocket to a TranscriptStreamAgent Durable Object instance within the Call Server. The TranscriptStreamAgent then connects directly to Deepgram for high-fidelity, word-level-timed transcription. In production, the direct path may also leverage Cloudflare Workers AI for transcription, while Deepgram is used in development environments.
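To make the audio format concrete, here is a rough sketch of preparing one feed for streaming. Only the 16kHz mono PCM format and the two feeds come from the text; the 16-bit sample depth, the 20 ms frame size, and all type and function names are assumptions for illustration.

```typescript
const SAMPLE_RATE_HZ = 16000; // from the text: 16kHz mono PCM
const FRAME_MS = 20; // assumed frame length; not stated in the source
const SAMPLES_PER_FRAME = (SAMPLE_RATE_HZ * FRAME_MS) / 1000; // 320 samples

type Feed = "mic" | "system"; // rep microphone and system loopback

interface AudioFrame {
  feed: Feed;
  pcm: Int16Array;
}

// Convert float samples in [-1, 1] to signed 16-bit PCM (assumed bit depth),
// clamping anything outside that range.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    out[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return out;
}

// Cut PCM into fixed-size frames tagged with their feed. Samples that do not
// fill a whole frame are returned so the caller can prepend them next time.
function toFrames(feed: Feed, pcm: Int16Array): { frames: AudioFrame[]; remainder: Int16Array } {
  const frames: AudioFrame[] = [];
  let offset = 0;
  while (offset + SAMPLES_PER_FRAME <= pcm.length) {
    frames.push({ feed, pcm: pcm.subarray(offset, offset + SAMPLES_PER_FRAME) });
    offset += SAMPLES_PER_FRAME;
  }
  return { frames, remainder: pcm.subarray(offset) };
}
```

Tagging every frame with its feed is what lets the rep and prospect audio stay separated downstream.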

For historical context, a deprecated path involved a Recall.ai cloud bot joining the meeting and relaying events via a WebSocketRelayAgent Durable Object. This path is no longer in use.

Regardless of the audio source, transcript events (both partial and final) are forwarded to the orchestrating CallAgent Durable Object.

The CallAgent Durable Object, the lean orchestrator, takes charge of the call’s state. It delegates the heavy lifting of transcript processing to the TranscriptOrchestration worker, a companion Cloudflare Worker. This worker persists final transcripts to Cloudflare D1, groups words by speaker, and applies throttling to partial transcripts (currently at a 5-second interval) to manage the inference load.
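The throttling step might look like the sketch below. The 5-second interval comes from the text; the class name, the explicit clock argument, and the choice to let final transcripts bypass the throttle are assumptions.

```typescript
const PARTIAL_THROTTLE_MS = 5000; // from the text: 5-second interval

// Decides whether a transcript event should trigger inference.
class PartialTranscriptThrottle {
  private lastRunAtMs = -Infinity;
  private readonly intervalMs: number;

  constructor(intervalMs: number = PARTIAL_THROTTLE_MS) {
    this.intervalMs = intervalMs;
  }

  // nowMs is passed in explicitly so the logic is easy to test.
  shouldProcess(isFinal: boolean, nowMs: number): boolean {
    // Assumption: final transcripts are never throttled.
    if (isFinal) return true;
    if (nowMs - this.lastRunAtMs < this.intervalMs) return false;
    this.lastRunAtMs = nowMs;
    return true;
  }
}
```

With this shape, a burst of partials within one interval costs a single inference call, while finals still flow through immediately.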

An UtteranceTracker within the TranscriptOrchestration worker continuously segments the transcribed words into utterances based on speaker changes, silence gaps, and a maximum duration. This segmentation is crucial for accurate detection.
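As an illustration of the three split conditions, the following sketch segments word-level transcript output into utterances. It is not the actual UtteranceTracker: the type names, the function name, and both threshold values are hypothetical.

```typescript
interface Word {
  text: string;
  speaker: string;
  startMs: number;
  endMs: number;
}

interface Utterance {
  speaker: string;
  text: string;
  startMs: number;
  endMs: number;
}

const MAX_SILENCE_GAP_MS = 1500; // hypothetical threshold
const MAX_UTTERANCE_MS = 15000; // hypothetical threshold

// Start a new utterance on a speaker change, a long silence gap, or when the
// current utterance would exceed the maximum duration.
function segmentUtterances(words: Word[]): Utterance[] {
  const utterances: Utterance[] = [];
  let current: Utterance | null = null;
  for (const word of words) {
    const shouldSplit =
      current !== null &&
      (word.speaker !== current.speaker ||
        word.startMs - current.endMs > MAX_SILENCE_GAP_MS ||
        word.endMs - current.startMs > MAX_UTTERANCE_MS);
    if (current === null || shouldSplit) {
      if (current !== null) utterances.push(current);
      current = { speaker: word.speaker, text: word.text, startMs: word.startMs, endMs: word.endMs };
    } else {
      current.text += " " + word.text;
      current.endMs = word.endMs;
    }
  }
  if (current !== null) utterances.push(current);
  return utterances;
}
```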

The TranscriptOrchestration worker then fans out detection tasks to other specialized companion Workers:

  • The script-completion worker (using a ScriptCompletionDetector) analyzes transcripts to determine when items in the rep’s predefined script have been covered, updating the CallAgent’s script stack.
  • The question-detection worker uses LLM providers (such as TogetherAI, Google’s Gemini, or Workers AI Llama) to identify when a prospect asks a question.
  • The faq-detection worker identifies when a prospect’s statement matches a known FAQ.

These detection workers communicate their findings back to the CallAgent Durable Object, which then broadcasts the relevant signals (detected questions, script progress) to the connected HUD clients via its Agents-SDK WebSocket connections.

When a prospect asks a question, the question-detection worker detects it using LLMs. Detection immediately triggers a speculative pre-warming of the answer cache: relevant knowledge is retrieved from Cloudflare AI Search or Supermemory (a stop-gap knowledge base) and stored in Cloudflare KV (SCRIPT_CACHE) under rewritten, context-resolved queries. This pre-warming happens before the rep even decides to answer, aiming for near-instant retrieval later.
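A minimal sketch of the pre-warm step is shown below, with an in-memory stand-in for the KV namespace so it runs anywhere. The `KvLike` interface, the key format, and every function name are assumptions; only the idea of caching retrieval results under a context-resolved query comes from the text.

```typescript
// Minimal async key-value interface, standing in for the SCRIPT_CACHE namespace.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// In-memory implementation used only for this example.
class InMemoryKv implements KvLike {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

// Hypothetical key format: one entry per call and normalized resolved query.
function cacheKey(callId: string, resolvedQuery: string): string {
  const normalized = resolvedQuery.trim().toLowerCase().replace(/\s+/g, " ");
  return `kb:${callId}:${normalized}`;
}

type Retriever = (query: string) => Promise<string[]>;

// Retrieve passages for a detected question and cache them, unless an entry
// for the same resolved query is already warm.
async function prewarm(kv: KvLike, callId: string, resolvedQuery: string, retrieve: Retriever): Promise<void> {
  const key = cacheKey(callId, resolvedQuery);
  if ((await kv.get(key)) !== null) return;
  const passages = await retrieve(resolvedQuery);
  await kv.put(key, JSON.stringify(passages));
}

// Read path used later, when the rep decides to answer.
async function readWarm(kv: KvLike, callId: string, resolvedQuery: string): Promise<string[] | null> {
  const raw = await kv.get(cacheKey(callId, resolvedQuery));
  return raw === null ? null : (JSON.parse(raw) as string[]);
}
```

Keying on the rewritten query rather than the raw transcript text means a question like "how much is it?" can hit the same entry once its referent has been resolved.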

A live question is distinct from an FAQ detection:

  • Live question detection is LLM-based, aimed at extracting specific questions from the conversation.
  • FAQ detection is embedding and vector-similarity based (with a SIMILARITY_THRESHOLD of 0.8), searching an organization’s existing FAQ store.
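The FAQ side of that distinction can be illustrated with plain cosine similarity. The threshold value comes from the text; the in-memory FAQ list, the type and function names, and treating the threshold as inclusive are assumptions (the real lookup goes through a vector index rather than a linear scan).

```typescript
const SIMILARITY_THRESHOLD = 0.8; // from the text

interface FaqEntry {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / Math.sqrt(normA * normB);
}

// Return the most similar FAQ, or null if nothing reaches the threshold.
function bestFaqMatch(query: number[], faqs: FaqEntry[]): { faq: FaqEntry; score: number } | null {
  let best: { faq: FaqEntry; score: number } | null = null;
  for (const faq of faqs) {
    const score = cosineSimilarity(query, faq.embedding);
    if (score >= SIMILARITY_THRESHOLD && (best === null || score > best.score)) {
      best = { faq, score };
    }
  }
  return best;
}
```

Because no LLM call is involved, a match can be surfaced as a card as soon as the similarity search returns.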

When the rep decides to provide an answer, either through a quick answer prompt in the HUD or a more extensive chat, the QuickAnswerAgent Durable Object or CallChatAgent Durable Object handles the request. This initiates a progressive answer flow:

  1. Fast Retrieval: The system first checks the warm cache (SCRIPT_CACHE) for near-instant retrieval. If the entry is missing, it falls back to a fast retrieval from Cloudflare AI Search.
  2. Classification: The answer is classified (e.g., direct FAQ match, a fast TLDR summary, or requiring a deeper “bridge” answer).
  3. Generation: A two-phase LLM generation process kicks in. Phase 1 provides a concise, low-latency answer (e.g., using LiquidAI/LFM2-24B-A2B or gpt-4o-mini), often suppressing extensive reasoning to prioritize speed. Phase 2 then refines this into a more comprehensive answer using a capable model like gemini-2.5-flash.
  4. Delivery: The answer is streamed token-by-token directly to the HUD via a streaming RPC callable, rather than broadcasting it through the CallAgent’s state, which is an optimization to avoid repeatedly sending large state objects.
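The generation and delivery steps can be sketched with async iterables standing in for the model streams. All names here are hypothetical. The sketch also assumes that both phases start from the same question and context, that the deep phase is buffered until the fast phase finishes, and that a deep-phase failure simply leaves the fast answer in place; the real flow may differ on each point.

```typescript
type Generate = (question: string, context: string[]) => AsyncIterable<string>;

interface AnswerEvent {
  phase: "fast" | "deep";
  token: string;
}

// Drain a token stream into an array.
async function collect(stream: AsyncIterable<string>): Promise<string[]> {
  const tokens: string[] = [];
  for await (const token of stream) tokens.push(token);
  return tokens;
}

// Stream the fast answer token by token while the deep answer is generated
// in the background, then emit the deep answer.
async function streamTwoPhaseAnswer(
  question: string,
  context: string[],
  fast: Generate,
  deep: Generate,
  emit: (event: AnswerEvent) => void,
): Promise<void> {
  // Kick off the deep phase first so its latency overlaps with the fast phase.
  // Assumption: if it fails, the fast answer stands alone.
  const deepPending = collect(deep(question, context)).catch((): string[] => []);
  for await (const token of fast(question, context)) {
    emit({ phase: "fast", token });
  }
  for (const token of await deepPending) {
    emit({ phase: "deep", token });
  }
}
```

Here `emit` plays the role of the streaming RPC callback: tokens go straight to the caller instead of being written into shared call state.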

Answering duties are split:

  • The CallAnswerAgent Durable Object provides call-scoped, streaming answers, making use of call context (like the transcript and warm cache).
  • The OrgAnswerAgent Durable Object provides organization-scoped, streaming answers, drawing from general knowledge.
  • The CallChatAgent Durable Object handles per-thread knowledge chat, validating thread, call, and organization context in Cloudflare D1 and interacting with the CallAgent to get call context. Follow-up threading is managed through RPC helpers that append messages server-side.

Frontier’s architecture is designed for extensibility, particularly in its detection capabilities. The TranscriptOrchestration worker acts as an orchestrator for various detection services. It uses decideDetectionFilters to determine which detection workers are relevant based on speaker and mode, and then buildDetectionWorkerPromises to dispatch tasks to them.

New detectors are integrated by binding a new Cloudflare Worker service and adding its promise to this dispatch mechanism. Because every detector is invoked through the uniform WorkerCommunicationUtils.callWorker interface, new detection logic can be added without extensive changes to the core orchestration.
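The dispatch pattern can be sketched as below. The function names `decideDetectionFilters` and `buildDetectionWorkerPromises` come from the text, but their signatures and bodies here are assumptions, as are the mode values, the speaker-to-detector rules, and `dispatchDetections`. The `DetectorCall` type stands in for a call through the uniform worker interface.

```typescript
type Speaker = "rep" | "prospect";
type Mode = "live" | "review"; // hypothetical mode values
type DetectorName = "script-completion" | "question-detection" | "faq-detection";

// Stand-in for invoking one bound detection worker.
type DetectorCall = (utterance: string) => Promise<unknown>;

// Assumed rules: script coverage is judged on rep speech, while questions and
// FAQ matches are looked for in prospect speech, and only during live calls.
function decideDetectionFilters(speaker: Speaker, mode: Mode): DetectorName[] {
  if (mode !== "live") return [];
  return speaker === "rep" ? ["script-completion"] : ["question-detection", "faq-detection"];
}

// One promise per selected detector; each call starts immediately.
function buildDetectionWorkerPromises(
  names: DetectorName[],
  utterance: string,
  workers: Record<DetectorName, DetectorCall>,
): Promise<unknown>[] {
  return names.map((name) => workers[name](utterance));
}

// Wait for every detector without letting one failure cancel the others.
async function dispatchDetections(
  speaker: Speaker,
  mode: Mode,
  utterance: string,
  workers: Record<DetectorName, DetectorCall>,
): Promise<PromiseSettledResult<unknown>[]> {
  const names = decideDetectionFilters(speaker, mode);
  return Promise.allSettled(buildDetectionWorkerPromises(names, utterance, workers));
}
```

Under this shape, adding a detector means extending `DetectorName`, supplying one more entry in `workers`, and updating the filter rules; the dispatch code itself does not change.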

Frontier continuously invests in improving real-time performance and expanding capabilities.

Running Conversation Summarization: Currently, conversation summarization is a post-call process, not a live feature.

Latency-Cutting Mechanisms: To achieve real-time responsiveness, several optimizations are in place and planned:

  • Two-Phase Answers: Rapid initial answers followed by more detailed, deeper generation, with the deeper phase hiding its latency behind the first.
  • Reasoning Suppression: For fast answers, LLMs are prompted to suppress extensive reasoning steps (e.g., thinkingBudget:0 for Gemini, reasoningEffort:'minimal' for OpenAI) to reduce Time-To-First-Token (TTFT).
  • Partial-Transcript Throttling: Inference on partial transcripts is throttled (currently 5 seconds) to prevent over-processing.
  • LFM2 Continuous Detection Throttling: An optional continuous question detection path using LFM2 is throttled at 1 second intervals.
  • Warm KB Cache and Speculative Prewarm/Prefetch: The knowledge base is speculatively pre-warmed for detected questions, storing context-resolved queries in Cloudflare KV so that answers can be retrieved with near-zero latency when the rep decides to answer.
  • AutoRAG Cold-Start Warmup: Best-effort warmups of Cloudflare AI Search are initiated when a chat agent connects to mitigate cold-start latencies.

Knowledge Base Evolution: The knowledge base is in active evaluation, with Cloudflare AI Search and Supermemory (an interim bridge) currently in use. Experimentation with Graph RAG services (e.g., Amazon Bedrock Knowledge Bases) is planned to enhance accuracy and completeness.