How it works

Frontier provides real-time, AI-driven coaching to sales representatives during live calls, giving them immediate, actionable guidance while the conversation is still under way. It does this through a pipeline of steps: capturing the call audio, transcribing it to text, segmenting the transcript into individual utterances, generating coaching signals from those utterances, and pushing the signals to the rep’s screen. The result turns spoken words into guidance the rep can act on in the moment.
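As a rough mental model only, the five stages can be pictured as a chain of Python generators, each consuming the previous stage’s output as it arrives rather than waiting for the whole call. Every function name below is hypothetical and each stage is a trivial stand-in (the “transcriber” just decodes text bytes and the “detector” only flags questions); none of it reflects Frontier’s actual services.

```python
from typing import Iterable, Iterator

# Hypothetical stand-ins for each pipeline stage; each is lazy, so data
# flows through the chain one item at a time.

def capture(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Stand-in for audio capture: yields raw chunks as they arrive."""
    yield from chunks

def transcribe(audio: Iterable[bytes]) -> Iterator[str]:
    """Stand-in for STT: here each 'audio' chunk is already UTF-8 text."""
    for chunk in audio:
        yield chunk.decode("utf-8")

def segment(words: Iterable[str]) -> Iterator[str]:
    """Groups words into utterances, closing one at sentence-ending punctuation."""
    buf: list[str] = []
    for word in words:
        buf.append(word)
        if word.endswith((".", "?", "!")):
            yield " ".join(buf)
            buf = []
    if buf:  # flush a trailing, unterminated utterance
        yield " ".join(buf)

def detect(utterances: Iterable[str]) -> Iterator[dict]:
    """Toy signal generator: flags utterances that end in a question mark."""
    for utterance in utterances:
        if utterance.endswith("?"):
            yield {"type": "question", "text": utterance}

def deliver(signals: Iterable[dict]) -> list[dict]:
    """Stand-in for pushing signals to the HUD; collects them instead."""
    return list(signals)

audio = [w.encode() for w in "Thanks for joining. What does it cost?".split()]
print(deliver(detect(segment(transcribe(capture(audio))))))
# → [{'type': 'question', 'text': 'What does it cost?'}]
```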

Capturing the Conversation

The first step in Frontier’s real-time coaching loop is securely capturing the live call audio. Today this is done mainly through the Recall Desktop SDK, which runs inside the Frontier desktop application on the rep’s macOS machine. Because capture happens locally on the rep’s system, it yields high-quality audio without requiring a separate cloud bot to join the meeting.

Frontier is also migrating to direct capture of the microphone and speaker output. Because the microphone carries the rep’s voice and the speaker output carries the customer’s, this method will keep the two audio feeds completely separate, enabling more precise, per-speaker analysis. Whichever capture method is used, the audio is then streamed to Deepgram for speech-to-text processing.
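With separate microphone and speaker feeds, one way to preserve that separation through transcription is to send the two feeds as two channels of a single stream, so every transcript can be attributed to a channel instead of being inferred by diarization. Deepgram’s live API, for example, can transcribe channels independently when a stream is opened with `multichannel=true` and `channels=2`. The sketch below assumes both feeds are signed 16-bit PCM at the same sample rate; the function name and the channel mapping (0 = rep, 1 = customer) are hypothetical, not Frontier’s implementation.

```python
import sys
from array import array

def interleave_stereo(rep: array, customer: array) -> bytes:
    """Interleave two mono 16-bit PCM feeds into one 2-channel buffer.

    Hypothetical mapping: channel 0 = rep microphone,
    channel 1 = customer audio taken from the speaker output.
    """
    if rep.typecode != "h" or customer.typecode != "h":
        raise TypeError("expected signed 16-bit samples (typecode 'h')")
    frames = min(len(rep), len(customer))  # drop any unmatched trailing samples
    out = array("h", [0]) * (2 * frames)   # one slot per channel per frame
    out[0::2] = rep[:frames]       # even slots: channel 0 (rep)
    out[1::2] = customer[:frames]  # odd slots: channel 1 (customer)
    if sys.byteorder == "big":
        out.byteswap()  # linear16 PCM is little-endian on the wire
    return out.tobytes()
```

Interleaving frame by frame keeps the two feeds time-aligned, so a pause on one channel never shifts the other.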

Transcribing and Segmenting Speech

Deepgram, a specialized speech-to-text (STT) provider, transcribes the incoming audio stream in real time. Frontier’s utterance management service then processes the resulting continuous stream of transcript text.
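The hand-off from Deepgram to the utterance service can be sketched as two small helpers: one that builds the live-streaming WebSocket URL and one that reduces each transcript message to the fields a segmenter needs. The query parameters (`encoding`, `sample_rate`, `channels`, `multichannel`, `interim_results`, `punctuate`) and the `Results` message fields read here (`type`, `channel_index`, `start`, `duration`, `is_final`, `channel.alternatives`) follow Deepgram’s live-streaming API as generally documented, but the chosen values, the helper names `listen_url` and `parse_result`, and the output dictionary are illustrative assumptions; authentication and the socket itself are omitted, and the current Deepgram reference should be checked before relying on any field.

```python
import json
from typing import Optional
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"

def listen_url(sample_rate: int = 16000, channels: int = 2) -> str:
    """Build a live-streaming URL; the parameter values are illustrative."""
    params = {
        "encoding": "linear16",     # raw 16-bit PCM, no container
        "sample_rate": sample_rate,
        "channels": channels,
        "multichannel": "true",     # transcribe each channel separately
        "interim_results": "true",  # also stream partial transcripts
        "punctuate": "true",
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"

def parse_result(message: str) -> Optional[dict]:
    """Reduce a Results message to the fields the segmenter needs."""
    data = json.loads(message)
    if data.get("type") != "Results":
        return None  # ignore Metadata, UtteranceEnd, and other message types
    best = data["channel"]["alternatives"][0]  # top transcription hypothesis
    return {
        "channel": data["channel_index"][0],  # [this channel, total channels]
        "text": best["transcript"],
        "start": data["start"],
        "end": data["start"] + data["duration"],
        "is_final": data["is_final"],  # False for interim (partial) results
    }
```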

An “utterance” is a distinct segment of speech from a single speaker, often a complete thought or phrase. The utterance management service splits the ongoing conversation into these units, closing a segment at a natural pause, at a change of speaker, or when a maximum segment duration is reached. This segmentation matters because it lets the AI analyze individual contributions to the conversation rather than an undifferentiated stream of words.
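A minimal sketch of such a segmenter, assuming it receives word-level results with speaker labels and timestamps (which STT services such as Deepgram can provide): it closes the current utterance on a speaker change, on a pause longer than `max_pause`, or when the next word would push the utterance past `max_duration`. The class names and both thresholds (0.8 s and 15 s) are illustrative assumptions, not Frontier’s actual values.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Word:
    text: str
    speaker: int
    start: float  # seconds from call start
    end: float

@dataclass
class Utterance:
    speaker: int
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)

class UtteranceSegmenter:
    """Groups transcribed words into utterances (hypothetical thresholds)."""

    def __init__(self, max_pause: float = 0.8, max_duration: float = 15.0):
        self.max_pause = max_pause        # silence that ends an utterance
        self.max_duration = max_duration  # hard cap on utterance length
        self.current: Optional[Utterance] = None

    def push(self, word: Word) -> List[Utterance]:
        """Add one word; return any utterances that closed as a result."""
        closed: List[Utterance] = []
        cur = self.current
        if cur is not None:
            last = cur.words[-1]
            if (word.speaker != cur.speaker                          # speaker change
                    or word.start - last.end > self.max_pause        # long pause
                    or word.end - cur.words[0].start > self.max_duration):  # too long
                closed.append(cur)
                cur = None
        if cur is None:
            cur = Utterance(speaker=word.speaker)
        cur.words.append(word)
        self.current = cur
        return closed

    def flush(self) -> List[Utterance]:
        """Close the in-progress utterance, e.g. when the call ends."""
        closed = [self.current] if self.current is not None else []
        self.current = None
        return closed
```

Returning closed utterances from `push` lets the caller start inference the moment a boundary is detected instead of polling.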

Real-time Coaching Signals

Once an utterance has been transcribed and segmented, Frontier runs per-utterance inference to identify coaching opportunities, or “signals.” To keep this efficient and focused, the system gates when a full analysis runs instead of performing deep inference on every word: complete utterances get the full analysis, while partial transcripts may receive only lighter tasks such as keyword detection. Although the system supports speaker-based detection, it currently processes all speech regardless of speaker to ensure comprehensive coverage.
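The gating described above can be sketched as a small router: partial transcripts get only a cheap regex keyword scan, while final utterances are handed to a caller-supplied inference function. The keyword list, the `route_transcript` name, and the signal dictionaries are hypothetical. Speaker identity is deliberately not an input, matching the current behavior of processing all speech.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical trigger words for the lightweight partial-transcript pass.
KEYWORDS = ("price", "pricing", "contract", "competitor")

def route_transcript(text: str, is_final: bool,
                     full_inference: Callable[[str], List[dict]],
                     keywords: Sequence[str] = KEYWORDS) -> List[dict]:
    """Decide how much work a transcript deserves (illustrative rules only)."""
    if not is_final:
        # Partial transcript: whole-word, case-insensitive keyword scan only.
        found = [k for k in keywords
                 if re.search(rf"\b{re.escape(k)}\b", text, re.IGNORECASE)]
        return [{"type": "keyword", "keyword": k} for k in found]
    if not text.strip():
        return []  # skip empty finals rather than paying for inference
    return full_inference(text)  # complete utterance: run the full analysis
```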

The system performs various detections concurrently:

Script Completion: Identifies when the rep has completed a step in their sales script or conversation guide.
FAQ Detection: Recognizes common customer questions, typically using embeddings and vector search rather than direct large language model (LLM) calls.
Quick Answers: When a question is detected, Frontier’s fast-answer service retrieves a relevant response from a knowledge base. This knowledge-base layer is still under active evaluation and is deliberately designed to support multiple backends. It currently uses Cloudflare AI Search and Supermemory (a stop-gap solution), and there are plans to experiment with Graph RAG services (such as Amazon Bedrock Knowledge Bases and Anthropic) to improve accuracy and completeness.
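Because these detections are independent of one another, a natural shape is to launch them together with `asyncio.gather` and keep whichever ones fire. The sketch below assumes the utterance has already been embedded (the embedding model is out of scope), replaces the vector search service with a brute-force cosine-similarity scan over a small in-memory index, and uses a naive cue-phrase match for script completion. The 0.8 threshold, the function names, and the signal dictionaries are all hypothetical. These toy detectors never await anything, so here they actually run one after another; the concurrency pays off once each detector awaits real I/O such as an embedding or vector-search call.

```python
import asyncio
import math
from typing import Dict, List, Optional, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

async def detect_faq(vec: Sequence[float], faq_index: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[dict]:
    """Nearest FAQ by cosine similarity; a brute-force scan stands in for vector search."""
    best, score = None, threshold
    for question, qvec in faq_index.items():
        s = cosine(vec, qvec)
        if s >= score:
            best, score = question, s
    return {"type": "faq", "question": best, "score": score} if best else None

async def detect_script_step(text: str, steps: List[str]) -> Optional[dict]:
    """Toy check: a script step counts as done if its cue phrase appears."""
    lowered = text.lower()
    done = [s for s in steps if s.lower() in lowered]
    return {"type": "script", "completed": done} if done else None

async def run_detectors(text: str, vec: Sequence[float],
                        faq_index: Dict[str, Sequence[float]],
                        steps: List[str]) -> List[dict]:
    """Start all detectors together; keep only the signals that fired."""
    results = await asyncio.gather(
        detect_faq(vec, faq_index),
        detect_script_step(text, steps),
    )
    return [r for r in results if r is not None]
```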

To generate answers, Frontier uses a two-phase LLM flow: typically a fast, concise model produces an immediate response, and a deeper, more comprehensive model follows up when needed. Models come from multiple providers, including Google, Anthropic, OpenAI, and TogetherAI, so requests can be routed and optimized at runtime.
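The two-phase flow and provider routing can be sketched with plain asyncio: each “model” is any async function from prompt to answer (a real provider SDK call would sit behind it), each phase tries an ordered list of models and falls back on error or timeout, and the deep phase runs only when the fast answer is missing or judged too thin. The function names, the prompt layout, the `needs_depth` predicate, and the 2 s and 15 s timeouts are hypothetical choices, not Frontier’s configuration.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional, Sequence

# Any async prompt -> answer function; a provider SDK call would sit behind it.
Model = Callable[[str], Awaitable[str]]

async def first_available(models: Sequence[Model], prompt: str,
                          timeout: float) -> Optional[str]:
    """Try models in routing order, falling back on error or timeout."""
    for model in models:
        try:
            return await asyncio.wait_for(model(prompt), timeout)
        except Exception:  # includes asyncio.TimeoutError
            continue  # route to the next provider in the list
    return None

async def two_phase_answer(question: str, context: str,
                           fast: Sequence[Model], deep: Sequence[Model],
                           needs_depth: Callable[[str], bool],
                           ) -> AsyncIterator[dict]:
    """Yield a quick answer first, then a fuller one only when needed."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    quick = await first_available(fast, prompt, timeout=2.0)
    if quick is not None:
        yield {"phase": "fast", "text": quick}
    # Escalate when the fast phase failed or its answer looks too thin.
    if quick is None or needs_depth(quick):
        full = await first_available(deep, prompt, timeout=15.0)
        if full is not None:
            yield {"phase": "deep", "text": full}
```

A caller would forward each yielded dictionary to the HUD as soon as it arrives, so the rep sees the fast answer without waiting for the deep one.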

Delivering Insights to the HUD

A coaching signal is useful only if it reaches the rep while the moment is still live. Each signal is therefore sent straight back to the rep’s computer and displayed on the Heads-Up Display (HUD), the overlay that shows real-time guidance.

The primary call coordination service, built on Cloudflare Durable Objects, manages the live communication for each call. It broadcasts signals reliably to every connected HUD client over WebSockets, which are persistent, two-way communication channels. On the way from captured audio to a visible coaching signal, data passes through several network hops across Cloudflare services, which are designed to keep that delivery efficient and resilient.
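The fan-out a per-call coordinator performs can be sketched independently of Cloudflare’s runtime. The Python class below only models the broadcast logic and does not use the Durable Objects API: one `CallRoom` per call tracks its connected HUD sockets, serializes each signal once, sends it to every client, and drops any socket whose send fails. `CallRoom`, the `Socket` protocol, and the `callId` field are hypothetical names.

```python
import json
from typing import Protocol, Set

class Socket(Protocol):
    """Anything with a send(str) method, e.g. a WebSocket wrapper."""
    def send(self, message: str) -> None: ...

class CallRoom:
    """Per-call broadcast hub, loosely modelled on one coordinator per call."""

    def __init__(self, call_id: str):
        self.call_id = call_id
        self.clients: Set[Socket] = set()

    def connect(self, ws: Socket) -> None:
        self.clients.add(ws)

    def disconnect(self, ws: Socket) -> None:
        self.clients.discard(ws)

    def broadcast(self, signal: dict) -> int:
        """Send a signal to every connected HUD; return how many sends succeeded."""
        message = json.dumps({"callId": self.call_id, **signal})  # serialize once
        delivered = 0
        for ws in list(self.clients):  # copy, since failed sockets are removed
            try:
                ws.send(message)
                delivered += 1
            except Exception:
                self.clients.discard(ws)  # treat a failed send as a dead connection
        return delivered
```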