AI & safety

Frontier uses AI to provide real-time coaching during sales calls, with an emphasis on accuracy, relevance, and responsible data handling. The system combines orchestrated LLM (Large Language Model) calls for dynamic responses, a knowledge retrieval system for grounding, and a clear separation between real-time, LLM-driven actions and post-call analysis. Its safety posture rests on multi-provider resilience, transparent evaluation methods, and explicit identification of gaps in data retention and human-in-the-loop validation.

Frontier’s core real-time coaching capability leverages AI models to provide timely guidance to sales representatives during live calls. This includes generating quick answers and detecting key moments.

LLMs are primarily employed for generating real-time answers and coaching signals presented in the HUD (heads-up display). For example, a detected customer question can trigger an AI-generated answer. The system employs a two-phase LLM flow for answer generation:

  1. Phase 1 (Fast Answer): Aims for rapid responses (targeting sub-second latency) using models optimized for speed, such as LiquidAI/LFM2-24B-A2B (configurable via QA_FAST_ANSWER_MODEL) with constrained output (up to 500 tokens) and minimal reasoning to reduce Time-to-First-Token (TTFT).
  2. Phase 2 (Deeper Answer): Provides more comprehensive information using models like gemini-2.5-flash (configurable via QA_DEEP_ANSWER_MODEL), allowing for longer outputs (up to 900 tokens) and more in-depth processing.
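
As a rough illustration of how the two phases might be configured, the sketch below resolves a model and an output cap per phase. Only the environment variable names, the example model IDs, and the token caps come from the description above; the type names, the resolvePhaseConfig helper, and the choice to treat the example models as fallbacks when no override is set are assumptions made for illustration.

```typescript
// Hypothetical types and helper; not taken from the Frontier codebase.
type Phase = "fast" | "deep";

interface PhaseConfig {
  model: string;
  maxOutputTokens: number;
}

interface AnswerEnv {
  QA_FAST_ANSWER_MODEL?: string;
  QA_DEEP_ANSWER_MODEL?: string;
}

// Assumption: the models named in the text act as fallbacks when no override is configured.
const PHASE_DEFAULTS: Record<Phase, PhaseConfig> = {
  fast: { model: "LiquidAI/LFM2-24B-A2B", maxOutputTokens: 500 },
  deep: { model: "gemini-2.5-flash", maxOutputTokens: 900 },
};

function resolvePhaseConfig(phase: Phase, env: AnswerEnv): PhaseConfig {
  const override =
    phase === "fast" ? env.QA_FAST_ANSWER_MODEL : env.QA_DEEP_ANSWER_MODEL;
  const base = PHASE_DEFAULTS[phase];
  // Ignore missing or blank overrides so a misconfigured variable cannot select an empty model ID.
  const model = override !== undefined && override.trim() !== "" ? override : base.model;
  return { model, maxOutputTokens: base.maxOutputTokens };
}
```

Keeping the token cap tied to the phase rather than to the model means that swapping a model through the environment does not silently change the output budget.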

Not every spoken utterance triggers an LLM call. Partial transcripts are subject to a 5-second throttle (using a KV-backed timestamp) to prevent excessive inference requests. Final transcripts bypass this throttle so that they are always processed promptly. Core detection tasks such as FAQ matching and script completion rely on embeddings and vector similarity and do not directly invoke LLMs.
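
The throttle rule can be sketched as follows. This is a minimal illustration, not the production logic: the TimestampStore interface, the InMemoryStore class, the key format, and the shouldRunInference helper are all hypothetical, and a real KV binding would be asynchronous. The sketch also assumes that final transcripts neither read nor update the throttle timestamp, which the description above does not specify.

```typescript
const PARTIAL_THROTTLE_MS = 5000; // 5-second throttle for partial transcripts

// Hypothetical stand-in for a key-value store holding the last-inference timestamp.
interface TimestampStore {
  get(key: string): number | undefined;
  set(key: string, value: number): void;
}

class InMemoryStore implements TimestampStore {
  private readonly data = new Map<string, number>();
  get(key: string): number | undefined {
    return this.data.get(key);
  }
  set(key: string, value: number): void {
    this.data.set(key, value);
  }
}

function shouldRunInference(
  callId: string,
  isFinal: boolean,
  nowMs: number,
  store: TimestampStore,
): boolean {
  // Final transcripts bypass the throttle entirely (assumed not to touch the timestamp).
  if (isFinal) return true;

  const key = `throttle:${callId}`; // hypothetical key format
  const last = store.get(key);
  if (last !== undefined && nowMs - last < PARTIAL_THROTTLE_MS) {
    return false; // a partial was processed too recently
  }
  store.set(key, nowMs);
  return true;
}
```

Passing the current time in as an argument keeps the rule deterministic, which makes it straightforward to test without real timers.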

Frontier utilizes a multi-provider strategy for its LLM calls to enhance resilience and allow for performance tuning. The system dynamically routes requests to various providers via the Vercel AI SDK, including:

  • Google Generative AI (e.g., Gemini models)
  • Anthropic (e.g., Claude models)
  • OpenAI (e.g., GPT models)
  • Together.ai
  • Cloudflare Workers AI (e.g., @cf/meta/llama models)
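
One way a multi-provider setup can support resilience is by computing an ordered list of providers to try. The sketch below is an assumption about how such routing could look, not a description of Frontier's router: the provider identifiers, the fallbackChain helper, and the notion of an "unavailable" set are hypothetical, and no Vercel AI SDK calls are shown.

```typescript
// Hypothetical identifiers for the providers listed above.
type Provider = "google" | "anthropic" | "openai" | "together" | "workers-ai";

const ALL_PROVIDERS: readonly Provider[] = [
  "google",
  "anthropic",
  "openai",
  "together",
  "workers-ai",
];

// Returns the providers to try, in order: the preferred one first, then the
// rest in their listed order, skipping any currently marked unavailable.
function fallbackChain(
  preferred: Provider,
  unavailable: ReadonlySet<Provider>,
): Provider[] {
  const ordered: Provider[] = [preferred];
  for (const p of ALL_PROVIDERS) {
    if (p !== preferred) ordered.push(p);
  }
  return ordered.filter((p) => !unavailable.has(p));
}
```

A caller would walk the returned list and stop at the first provider that responds; an empty list signals that the request should degrade rather than retry.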

AI coaching responses are grounded in an organization’s specific knowledge base to ensure accuracy and relevance.

The knowledge base (KB) has multiple backends, which are under active evaluation and intentionally selectable at runtime:

  • Cloudflare AI Search (backed by Cloudflare R2 for source documents).
  • Supermemory (as a deliberate stop-gap solution).
  • A legacy Pinecone path.

Frontier is also planning to experiment with Graph RAG (Retrieval Augmented Generation) services for the KB layer, but this is not yet built.

The QuickAnswerAgent, a per-call Durable Object, retrieves information from these knowledge backends, using a warm cache where possible. Tenant isolation for knowledge documents is enforced:

  • Cloudflare AI Search: Queries include a hard org_id metadata filter, supplemented by a defense-in-depth post-filter that drops and logs any results with a mismatched org_id. R2 object keys are also org-prefixed (e.g., org/sites/example.com/page-1.html).
  • Supermemory: Isolates data by building container tags from the environment and orgId. Retrieval is further filtered by source_type and source_id metadata.
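
The defense-in-depth post-filter described for Cloudflare AI Search can be sketched as below. The SearchResult shape and the enforceOrgIsolation helper are hypothetical; only the behavior (drop and log any result whose org_id does not match) comes from the text. Treating a result with no org_id as a mismatch is an additional assumption.

```typescript
// Hypothetical result shape; real search results will carry more fields.
interface SearchResult {
  id: string;
  text: string;
  metadata: { org_id?: string };
}

// Keeps only results belonging to the requesting organization and reports
// every dropped result through the supplied logger callback.
function enforceOrgIsolation(
  results: SearchResult[],
  orgId: string,
  logMismatch: (dropped: SearchResult) => void,
): SearchResult[] {
  const kept: SearchResult[] = [];
  for (const result of results) {
    if (result.metadata.org_id === orgId) {
      kept.push(result);
    } else {
      // Assumption: a missing org_id is treated the same as a mismatched one.
      logMismatch(result);
    }
  }
  return kept;
}
```

Because the query already carries a hard org_id filter, this check should normally drop nothing; a logged mismatch is therefore a signal worth investigating rather than routine noise.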

Beyond real-time coaching, LLMs are utilized for post-call analysis to generate structured summaries and insights. Schemas for post-call summaries, such as CoachingNotes, UnansweredQuestions, CallSummary, and FrontierScore, are defined separately from the real-time signal schemas. These structured outputs aid in post-call review and agent performance analysis.

Frontier implements several controls and evaluation mechanisms to ensure the safety and reliability of its AI-powered features.

  • Multi-provider strategy: Mitigates dependency on a single LLM vendor.
  • Timeouts: Live question detection includes an AbortController with a configurable timeout (default 10 seconds). Upon timeout or error, the system gracefully degrades, returning no detection and logging the failure, ensuring the call continues uninterrupted.
  • Graceful degradation: The two-phase quick-answer flow is designed to always present an initial card (an FAQ match, a transcript-grounded answer, or, when no relevant context is available, a safe “Bridge” response) and then proceed with the full knowledge stream.
  • Offline quality evaluation: An offline CLI tool (eval-quick-answer.ts) tests the quick-answer path for quality and latency. It heuristically scores answers based on expected substrings, forbidden keywords (to detect wrong-FAQ hits or hallucinations), and correct bridging for out-of-domain questions.
  • Model probing: A standalone, development-only Cloudflare Worker (@frontierx/model-probe) benchmarks model latency and validates prompt responses, scoring against heuristic validators for intent, keywords, and quick-answer quality.
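
The heuristic scoring used in offline evaluation can be illustrated with a small sketch. This is not the logic of eval-quick-answer.ts: the EvalCase and EvalResult shapes and the scoreAnswer helper are hypothetical, and case-insensitive substring matching is an assumption. It only shows how expected substrings, forbidden keywords, and a bridge expectation could combine into a pass/fail result.

```typescript
// Hypothetical test-case shape for one evaluated question.
interface EvalCase {
  question: string;
  expectedSubstrings: string[]; // all must appear in the answer
  forbiddenKeywords: string[];  // none may appear (wrong-FAQ hits, hallucinations)
  expectBridge: boolean;        // true for out-of-domain questions
}

interface EvalResult {
  passed: boolean;
  reasons: string[]; // one entry per failed check
}

function scoreAnswer(testCase: EvalCase, answer: string, wasBridge: boolean): EvalResult {
  const text = answer.toLowerCase(); // assumption: matching is case-insensitive
  const reasons: string[] = [];

  if (testCase.expectBridge !== wasBridge) {
    reasons.push(testCase.expectBridge ? "expected a bridge response" : "unexpected bridge response");
  }
  for (const expected of testCase.expectedSubstrings) {
    if (text.indexOf(expected.toLowerCase()) === -1) {
      reasons.push(`missing expected substring: ${expected}`);
    }
  }
  for (const forbidden of testCase.forbiddenKeywords) {
    if (text.indexOf(forbidden.toLowerCase()) !== -1) {
      reasons.push(`contains forbidden keyword: ${forbidden}`);
    }
  }
  return { passed: reasons.length === 0, reasons };
}
```

Returning the list of failed checks rather than a bare boolean makes it easier to tell a wrong-FAQ hit from an incomplete answer when reviewing a failing run.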

LLMs may process PII (Personally Identifiable Information) drawn from several data sources, including:

  • Meeting transcripts (stored in Cloudflare D1 as transcript_words.text).
  • Participant names and emails (in Supabase Postgres call_participants).
  • User names, emails, and images (in Supabase Postgres users).
  • Meeting titles, URLs, and rich async transcripts with speaker names (in Supabase Postgres calls.async_transcript_data).
  • Knowledge document contents (in Cloudflare R2 and Supabase Storage), including any PII within indexed organizational knowledge.

Known gaps in data retention and observability include:

  • Structural Transcription Drop: Only final transcripts are persistently stored in D1. If Deepgram sends partial transcripts but never a final one for a segment, those words are permanently lost without automatic recovery. Tools exist to detect these gaps, but not to recover the dropped words.
  • Observability Gaps: Some reliability features (e.g., enhanced retry logic, active SLO tracking, comprehensive alerting) exist only as proposals or plans within the call-server-hardening epic and are not yet fully implemented.
  • Logging Worker SPOF: Durable Object logs are routed to Axiom via a dedicated logging worker. If the authentication token for this worker is missing or expired, DO logs are silently dropped without error, making the worker a Single Point of Failure for critical observability data.