Data & storage

Frontier’s data layer is designed to support real-time AI coaching at the edge. Performance-critical operations run on Cloudflare’s global network, while a set of managed services provides durability and flexibility. This split lets the system process high-volume, transient call data efficiently alongside structured application and knowledge base content.

Supabase (a managed Postgres database) serves as the system of record for application and user configuration. It stores static and semi-static data related to organizations, users, call metadata, sales scripts, FAQs, and AI responses. Its realtime subscription feature is used for live updates. Calls to Supabase are comparatively infrequent, because heavy real-time processing occurs on the Cloudflare network.

  • Organizational and User Data: Includes organizational settings, user accounts (names, emails, profile images synced from Clerk), and access controls.
  • Call Metadata: High-level details about calls, such as status, participants (names and emails), titles, and meeting URLs. This also includes rich async transcripts with speaker names.
  • Sales Content: Managed sales scripts, frequently asked questions (FAQs) with ideal answers, and structured organizational facts.
  • AI Responses: Stores generated AI answers and responses for post-call review and analysis.
  • Knowledge Documents: Metadata about uploaded knowledge documents.
  • Calendar Data: Information for scheduling and managing calls.

Tenant isolation in Supabase is enforced primarily through Postgres Row Level Security (RLS).

  • RLS policies restrict data access (CRUD operations) to authenticated users whose org_id (extracted from their Clerk-issued JWT) matches the org_id associated with the data.
  • Frontier’s RLS policies accept both a legacy JWT claim format (auth.jwt() -> 'o' ->> 'id') and a newer auth.jwt() ->> 'org_id' format, because the claim migration is still in progress.
  • For tables that do not directly carry an org_id column (e.g., call_status_changes, call_desktop_recordings), RLS policies achieve isolation by joining with a parent table (e.g., calls) which does have an org_id, ensuring all related data remains tenant-isolated.
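
The dual-claim lookup described above can be mirrored in application code. The following is a minimal sketch, not Frontier's implementation: the JwtClaims type, both function names, and the choice to check the newer claim first are assumptions made for illustration.

```typescript
// Hypothetical claim shape covering the two formats the RLS policies accept.
type JwtClaims = {
  org_id?: string; // newer format: auth.jwt() ->> 'org_id'
  o?: { id?: string }; // legacy format: auth.jwt() -> 'o' ->> 'id'
};

// Returns the tenant ID from either claim format, or null if neither is set.
// Checking the newer claim first is an assumption, not documented behavior.
function resolveOrgId(claims: JwtClaims): string | null {
  return claims.org_id ?? claims.o?.id ?? null;
}

// A row is visible only when the caller's org matches the row's org_id.
function canAccessRow(claims: JwtClaims, rowOrgId: string): boolean {
  const orgId = resolveOrgId(claims);
  return orgId !== null && orgId === rowOrgId;
}
```

A token carrying neither claim resolves to null, so canAccessRow denies access by default rather than matching rows by accident.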

Cloudflare D1, a serverless SQLite database, acts as the high-throughput, low-latency store for live call content. It is bound to the call-agent Cloudflare Worker.

  • Live Call Content:
    • The calls table stores core call metadata (ID, org_id, creation timestamp, status).
    • The transcript_words table holds individual spoken words (speaker_id, text, start_timestamp). This is where raw call transcript PII resides.
    • The questions table stores detected questions from the call, also with org_id.
  • Call-related chat threads are also stored here.

D1 does not natively support Row Level Security (RLS). Therefore, tenant isolation for D1 data is enforced at the application query layer.

  • The transcript_words table is designed without an org_id column; its isolation relies entirely on its call_id foreign key referencing the calls table, which does carry an org_id. When a calls record is deleted, its associated transcript_words and questions are automatically removed via ON DELETE CASCADE.
  • Both calls and questions tables directly include an org_id for explicit scoping.
  • All D1 databases are provisioned per environment (e.g., call-agent-db-prd, call-agent-db-dev, call-agent-db-demo, call-agent-db-stg) for physical separation, though the rc (release candidate) environment intentionally shares staging’s D1 resources.
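
Because transcript_words has no org_id column, a tenant-safe read has to go through calls. The sketch below shows one way such a query might be built at the application layer. The function name, the ScopedQuery type, and the calls.id column name are assumptions; only the table names, call_id, org_id, and the transcript_words columns come from the description above.

```typescript
// A SQL string plus its positional parameters, ready for a prepared statement.
type ScopedQuery = { sql: string; params: string[] };

// Builds a read of transcript words that is always scoped to one organization.
// Isolation comes from the join to calls, the only table here that carries org_id.
function transcriptWordsForCall(orgId: string, callId: string): ScopedQuery {
  if (!orgId || !callId) {
    throw new Error("orgId and callId are both required");
  }
  return {
    sql: [
      "SELECT w.speaker_id, w.text, w.start_timestamp",
      "FROM transcript_words AS w",
      "JOIN calls AS c ON c.id = w.call_id", // calls.id is an assumed column name
      "WHERE w.call_id = ? AND c.org_id = ?",
      "ORDER BY w.start_timestamp",
    ].join("\n"),
    params: [callId, orgId],
  };
}
```

In a Worker, the result could be passed to D1's prepared-statement API, for example env.DB.prepare(q.sql).bind(...q.params).all(), where DB is a hypothetical binding name. Binding parameters rather than interpolating them keeps the tenant check out of reach of injected input.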

Cloudflare Vectorize is a vector index for embeddings and retrieval. It is bound to the call-agent Worker.

  • Embeddings: Stores vector embeddings used for semantic search and retrieval.

Vectorize is provisioned with metadata indexes for filterable tenant and call fields, including callId, type, orgId, scriptItemId, and faqId. This enables tenant isolation through metadata filtering during queries.
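
As an illustration of metadata-filtered isolation, the sketch below builds query options that always include the caller's orgId. The helper name, both types, and the default topK are assumptions; the options object is shaped to resemble what Vectorize's query(vector, options) accepts, and the field names are the indexed metadata fields listed above.

```typescript
// Optional narrowing fields, taken from the indexed metadata fields.
type TenantFilterExtras = {
  callId?: string;
  type?: string;
  scriptItemId?: string;
  faqId?: string;
};

type VectorQueryOptions = {
  topK: number;
  returnMetadata: "all";
  filter: Record<string, string>;
};

// Builds query options whose filter always pins orgId to the caller's tenant.
function tenantQueryOptions(
  orgId: string,
  extras: TenantFilterExtras = {},
  topK = 5,
): VectorQueryOptions {
  if (!orgId) throw new Error("orgId is required for every query");
  const filter: Record<string, string> = {};
  for (const [field, value] of Object.entries(extras)) {
    if (typeof value === "string") filter[field] = value; // skip unset fields
  }
  filter.orgId = orgId; // assigned last so extras can never override the tenant
  return { topK, returnMetadata: "all", filter };
}
```

Routing every query through one helper like this means a forgotten tenant filter becomes a thrown error instead of a silent cross-tenant read.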

Cloudflare AI Search (AutoRAG), with its backing Cloudflare R2 object storage, is the primary knowledge retrieval backend for Frontier’s knowledge base.

  • Indexed Knowledge: Cloudflare AI Search indexes documents and facts from various sources to enable efficient retrieval for AI coaching.
  • Original Documents: Cloudflare R2 (ai-search-knowledge-prd bucket) stores the original source documents that are indexed by AI Search. These objects are keyed with org-prefixed paths, such as org/sites/example.com/page-1.html.
  • AI Search: Tenant isolation is enforced with a hard org_id metadata-equality filter applied to every query. As a defense-in-depth measure, a post-filter also verifies results, logging and dropping any data whose org_id metadata mismatches the querying organization.
  • R2 Storage: R2 objects are stored with org-prefixed keys (${orgId}/${prefix}/${filename}) and metadata including org_id, source_type, and source_id. This provides both namespace separation and metadata-based isolation for the source files.
  • AI Search instances and R2 buckets are physically separated per environment (e.g., frontierx-knowledge-search-prd-v3, ai-search-knowledge-prd).
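
The defense-in-depth post-filter described for AI Search can be sketched as follows. The SearchHit shape, the function name, and the logger signature are hypothetical; the behavior (log and drop any result whose org_id metadata differs from the querying organization) follows the description above.

```typescript
// Hypothetical shape of one retrieval result.
type SearchHit = { text: string; metadata: { org_id?: string } };

type Logger = (message: string, context: Record<string, unknown>) => void;

// Keeps only hits whose org_id metadata equals the querying org.
// Anything else, including hits with no org_id at all, is logged and dropped.
function dropCrossTenantHits(
  hits: SearchHit[],
  orgId: string,
  log: Logger = console.warn,
): SearchHit[] {
  return hits.filter((hit) => {
    if (hit.metadata.org_id === orgId) return true;
    log("dropping search hit with mismatched org_id", {
      expected: orgId,
      actual: hit.metadata.org_id ?? null,
    });
    return false;
  });
}
```

With a correct query-time filter this function should never drop anything, so a logged mismatch is a useful signal that the primary filter or the indexed metadata is wrong.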

Cloudflare KV serves as a short-lived cache for various system configurations and transient data. It is bound as SCRIPT_CACHE to the call-agent Worker.

  • Caching: Stores results from knowledge and Supermemory retrieval operations with a 300-second TTL.
  • Configuration: Holds the global knowledge-backend configuration key (cfg:knowledge_backend).
  • Derived Content: Contains derived call content for short periods, not acting as a source of record.

Because KV holds only derived, short-lived data, its tenant isolation is implicit: it depends on the short lifespan of entries and on the application logic that writes and reads them. KV namespaces are also separated per environment.
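
A cache-aside read with the 300-second TTL might look like the sketch below. KvLike is a minimal stand-in for the parts of the Workers KV interface used here (get, and put with expirationTtl). The function name and the key format are assumptions; in particular, including orgId in the key is an illustrative choice, not something the description above states.

```typescript
// Minimal subset of the KV interface this sketch relies on.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const RETRIEVAL_TTL_SECONDS = 300;

// Returns a cached retrieval result if present; otherwise loads it, caches it
// for 300 seconds, and returns it.
async function cachedRetrieval<T>(
  kv: KvLike,
  orgId: string,
  query: string,
  loadFresh: () => Promise<T>,
): Promise<T> {
  // Hypothetical key format. Long queries would need hashing to fit key limits.
  const key = `retrieval:${orgId}:${query}`;
  const cached = await kv.get(key);
  if (cached !== null) return JSON.parse(cached) as T;

  const fresh = await loadFresh();
  await kv.put(key, JSON.stringify(fresh), { expirationTtl: RETRIEVAL_TTL_SECONDS });
  return fresh;
}
```

Scoping the key by organization makes isolation explicit in the key itself rather than leaving it to the caller, at the cost of a slightly lower hit rate for content that would otherwise be shared.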

Per-call Durable Object Storage (Agent Storage)

Each per-call Durable Object (DO) instance uses its own dedicated SQLite-backed storage to maintain call-specific state.

  • Call State: This storage holds the ephemeral, real-time state of the single ongoing call managed by that Durable Object instance. All Durable Object migrations use the new_sqlite_classes storage backend.

The isolation is inherent to the Durable Objects architecture: each DO instance is unique per call (keyed by callId) or per chat thread (threadId), providing a natural boundary for state. The OrgAnswerAgent and CallAnswerAgent Durable Objects enforce tenant binding at connection time by deriving the orgId from the JWT and rejecting cross-org reconnects as a defense-in-depth mechanism.
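
The connection-time tenant binding can be illustrated with a small state holder. This is a sketch under assumptions: the class and method names are hypothetical, the state is kept in memory, and a real Durable Object would more likely persist the bound orgId in its storage so the binding survives eviction.

```typescript
// Remembers which org first connected and rejects any other org afterwards.
class TenantBinding {
  private boundOrgId: string | null = null;

  // orgIdFromJwt is expected to come from a verified JWT, never from client input.
  accept(orgIdFromJwt: string): boolean {
    if (!orgIdFromJwt) return false; // no tenant, no connection
    if (this.boundOrgId === null) {
      this.boundOrgId = orgIdFromJwt; // first connection binds the instance
      return true;
    }
    return this.boundOrgId === orgIdFromJwt; // reconnects must match
  }
}
```

Even though a callId or threadId already scopes the instance, this check covers the case where an identifier leaks or is guessed by a user in another organization.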

Frontier’s knowledge base uses a multi-backend architecture in which the backend is selected at runtime, reflecting ongoing evaluation of, and investment in, retrieval accuracy and completeness.

  • Cloudflare AI Search is the primary backend, as detailed above.
  • Supermemory serves as an alternate knowledge backend, used as a deliberate stop-gap. Its tenant isolation is implemented via a containerTag built from the environment and the orgId.
  • A legacy Pinecone path also exists as an alternate knowledge retrieval backend. Pinecone uses a metadata filter { org_id: orgId } for tenant isolation during queries, without relying on per-tenant namespaces. Pinecone vectors are mirrored and tracked in the Postgres pinecone_records table, which includes org_id for traceability.
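
The sketch below illustrates how runtime backend selection and per-backend tenant scoping might fit together. The backend identifier strings, the fallback to AI Search for unknown values, and the containerTag format are all assumptions; only the Pinecone filter shape { org_id: orgId } and the fact that the containerTag is built from the environment and the orgId come from the description above.

```typescript
// Hypothetical identifiers for a configuration value such as cfg:knowledge_backend.
type KnowledgeBackend = "ai-search" | "supermemory" | "pinecone";

type TenantScope =
  | { backend: "ai-search"; filter: { org_id: string } }
  | { backend: "supermemory"; containerTag: string }
  | { backend: "pinecone"; filter: { org_id: string } };

// Falls back to the primary backend when the value is missing or unknown.
function parseBackend(raw: string | null): KnowledgeBackend {
  return raw === "supermemory" || raw === "pinecone" ? raw : "ai-search";
}

// Produces the tenant-scoping parameters each backend needs for a query.
function tenantScope(backend: KnowledgeBackend, env: string, orgId: string): TenantScope {
  switch (backend) {
    case "supermemory":
      // Assumed tag format; the real one is only known to combine env and orgId.
      return { backend, containerTag: `${env}-${orgId}` };
    case "pinecone":
      return { backend, filter: { org_id: orgId } };
    case "ai-search":
      return { backend, filter: { org_id: orgId } };
  }
}
```

Returning a discriminated union makes the compiler require a scoping rule for every backend, so adding a backend without defining its tenant boundary becomes a type error.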

Knowledge documents’ original uploaded files are stored in Supabase Storage at an org-scoped path (knowledge/{orgId}/{fileId}/original{ext}).
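
As a small illustration of the org-scoped path convention, the helpers below build the storage path and check that an arbitrary path belongs to a given organization. Both function names are hypothetical, and normalizing an extension that lacks a leading dot is an added convenience rather than documented behavior.

```typescript
// Builds knowledge/{orgId}/{fileId}/original{ext} for an uploaded document.
function knowledgeOriginalPath(orgId: string, fileId: string, ext: string): string {
  if (!orgId || !fileId) throw new Error("orgId and fileId are required");
  // Accept "pdf" or ".pdf"; an empty extension is left empty.
  const dotExt = ext === "" || ext.startsWith(".") ? ext : `.${ext}`;
  return `knowledge/${orgId}/${fileId}/original${dotExt}`;
}

// True only when the path sits under the given org's prefix. The trailing
// slash prevents "org_a" from matching "org_ab".
function pathBelongsToOrg(path: string, orgId: string): boolean {
  return orgId !== "" && path.startsWith(`knowledge/${orgId}/`);
}
```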

Because Frontier processes live calls, Personally Identifiable Information (PII) flows through several of the stores described above. The main categories and where they are stored:

  • Transcripts: Raw, word-level call transcripts, including speaker_id and text, are stored in Cloudflare D1.
  • Participant Data: Participant names and emails are stored in Supabase Postgres.
  • User Data: User names, emails, and profile images (synced from Clerk) are stored in Supabase Postgres.
  • Meeting Details: Meeting titles, URLs, and rich async transcripts (with speaker names) are stored in Supabase Postgres.
  • Knowledge Content: Contents of uploaded knowledge documents are stored in Cloudflare R2 and Supabase Storage (org-scoped paths), and indexed into the vector backends (AI Search, Supermemory, Pinecone).

The following third-party services are integral to the data path:

  • Authentication (Clerk): Provides identity and authentication, issuing JWTs that drive RLS policies in Supabase and tenant isolation across Cloudflare services.
  • Speech-to-Text (Deepgram): For the live (direct-audio) transcription path, the call-server connects directly to Deepgram’s WebSocket API (nova-3 model) for word-level-timed transcription.
  • Large Language Models (LLM Providers): Frontier’s call-server routes requests to multiple LLM providers via the Vercel AI SDK, including Google Generative AI (Gemini), Anthropic (Claude), OpenAI (gpt-* models), Together.ai, and Cloudflare Workers AI (@cf/meta/llama models).
  • Call Capture (Recall Desktop SDK): The live path for audio capture is via the Recall Desktop SDK running in the Electron desktop app. Data related to this capture is stored in Supabase. The legacy Recall cloud bot integration and its associated data structures remain in the schema but are deprecated and unused for the live path.
  • Background Jobs (Inngest): Handles durable async and scheduled processing (e.g., webhook processing, knowledge ingestion, cron jobs) within the Dashboard application.
  • Secrets Management (Doppler): Manages environment variables and secrets, which are injected into Cloudflare Workers at deploy time.
  • Observability (Sentry & Axiom): Sentry captures errors, enriching events with org_id, user_id, and call_id tags. Axiom collects structured logs, with datasets named frontier-call-server-{env}.