Data & storage
Frontier’s data layer supports real-time AI coaching at the edge: Cloudflare’s global network handles performance-critical operations, and a set of managed services provides durability and flexibility. This split allows high-volume, transient call data to be processed efficiently alongside structured application and knowledge base content.
Supabase (Postgres)
Supabase (a managed Postgres database) serves as the system of record for application and user configuration. It stores static and semi-static data related to organizations, users, call metadata, sales scripts, FAQs, and AI responses. Its realtime subscription feature is utilized for live updates. Calls to Supabase are comparatively infrequent, as heavy real-time processing occurs on the Cloudflare network.
Data Stored
- Organizational and User Data: Includes organizational settings, user accounts (names, emails, profile images synced from Clerk), and access controls.
- Call Metadata: High-level details about calls, such as status, participants (names and emails), titles, and meeting URLs. This also includes rich async transcripts with speaker names.
- Sales Content: Managed sales scripts, frequently asked questions (FAQs) with ideal answers, and structured organizational facts.
- AI Responses: Stores generated AI answers and responses for post-call review and analysis.
- Knowledge Documents: Metadata about uploaded knowledge documents.
- Calendar Data: Information for scheduling and managing calls.
Tenant Isolation
Tenant isolation in Supabase is enforced primarily through Postgres Row Level Security (RLS).
- RLS policies restrict data access (CRUD operations) to authenticated users whose org_id (extracted from their Clerk-issued JWT) matches the org_id associated with the data.
- Frontier’s RLS policies are designed to accommodate both a legacy JWT claim format (auth.jwt() -> 'o' ->> 'id') and a newer auth.jwt() ->> 'org_id' format, representing a mid-migration state.
- For tables that do not directly carry an org_id column (e.g., call_status_changes, call_desktop_recordings), RLS policies achieve isolation by joining with a parent table (e.g., calls) which does have an org_id, ensuring all related data remains tenant-isolated.
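The dual claim format can be illustrated outside the database. The following is a minimal TypeScript sketch, not Frontier’s actual policy code: the ClerkClaims shape and the function names are hypothetical, and it assumes the newer org_id claim takes precedence when both are present.

```typescript
// Hypothetical claim shape; only the two access paths come from the text above.
interface ClerkClaims {
  org_id?: string;     // newer format: auth.jwt() ->> 'org_id'
  o?: { id?: string }; // legacy format: auth.jwt() -> 'o' ->> 'id'
}

// Resolve the tenant ID, preferring the newer claim and falling back to the legacy one.
// The precedence order is an assumption made for this sketch.
function resolveOrgId(claims: ClerkClaims): string | null {
  return claims.org_id ?? claims.o?.id ?? null;
}

// Mirrors the shape of an RLS predicate: a row is visible only when
// its org_id equals the org ID carried in the caller's JWT.
function canAccessRow(claims: ClerkClaims, rowOrgId: string): boolean {
  const orgId = resolveOrgId(claims);
  return orgId !== null && orgId === rowOrgId;
}
```

A token in either format resolves to the same tenant check, which is what lets both formats coexist during the migration.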
Cloudflare D1 (SQLite database)
Cloudflare D1, a serverless SQLite database, acts as the high-throughput, low-latency store for live call content. It is bound to the call-agent Cloudflare Worker.
Data Stored
- Live Call Content:
  - The calls table stores core call metadata (ID, org_id, creation timestamp, status).
  - The transcript_words table holds individual spoken words (speaker_id, text, start_timestamp). This is where raw call transcript PII resides.
  - The questions table stores detected questions from the call, also with org_id.
- Call-related chat threads are also stored here.
Tenant Isolation
D1 does not natively support Row Level Security (RLS). Therefore, tenant isolation for D1 data is enforced at the application query layer.
- The transcript_words table is designed without an org_id column; its isolation relies entirely on its call_id foreign key referencing the calls table, which does carry an org_id. When a calls record is deleted, its associated transcript_words and questions are automatically removed via ON DELETE CASCADE.
- Both calls and questions tables directly include an org_id for explicit scoping.
- All D1 databases are provisioned per environment (e.g., call-agent-db-prd, call-agent-db-dev, call-agent-db-demo, call-agent-db-stg) for physical separation, though the rc (release candidate) environment intentionally shares staging’s D1 resources.
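Query-layer scoping for transcript_words can be sketched as a small query builder. This is an illustrative TypeScript example under stated assumptions: the function name is hypothetical, and the primary-key column name calls.id is assumed (the text above only says the table stores an ID). The join through calls is what supplies the org_id check that transcript_words itself cannot.

```typescript
interface ScopedQuery {
  sql: string;
  params: string[];
}

// Build a transcript query that can only return words for a call owned by orgId.
// Assumption: calls.id is the primary key referenced by transcript_words.call_id.
function transcriptWordsForCall(orgId: string, callId: string): ScopedQuery {
  if (!orgId || !callId) {
    throw new Error("orgId and callId are required");
  }
  const sql = [
    "SELECT w.speaker_id, w.text, w.start_timestamp",
    "FROM transcript_words AS w",
    "JOIN calls AS c ON c.id = w.call_id",
    "WHERE c.id = ?1 AND c.org_id = ?2",
    "ORDER BY w.start_timestamp",
  ].join("\n");
  // Parameter order matches the numbered placeholders ?1 and ?2.
  return { sql, params: [callId, orgId] };
}
```

With a D1 binding in a Worker, the result would typically be passed along as `db.prepare(query.sql).bind(...query.params)`, where `db` stands for whatever binding name the Worker uses. Refusing to build a query without an orgId keeps an unscoped read from being issued by accident.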
Cloudflare Vectorize
Cloudflare Vectorize is a vector index for embeddings and retrieval. It is bound to the call-agent Worker.
Data Stored
- Vector embeddings for various data, designed to support semantic search and retrieval.
Tenant Isolation
Vectorize is provisioned with metadata indexes for filterable tenant and call fields, including callId, type, orgId, scriptItemId, and faqId. This enables tenant isolation through metadata filtering during queries.
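Metadata filtering of this kind can be sketched as a helper that always includes the tenant field. The example below is a hedged TypeScript illustration: the VectorScope type and the function name are hypothetical, and the returned object follows the general shape of Vectorize query options (a topK count plus an equality filter on indexed metadata).

```typescript
// Subset of the indexed metadata fields named above.
interface VectorScope {
  orgId: string;
  callId?: string;
  type?: string;
}

interface VectorQueryOptions {
  topK: number;
  filter: Record<string, string>;
}

// Build query options whose filter always carries orgId, plus any narrower fields.
function buildVectorQueryOptions(scope: VectorScope, topK = 5): VectorQueryOptions {
  if (!scope.orgId) {
    throw new Error("orgId is required for every vector query");
  }
  const filter: Record<string, string> = { orgId: scope.orgId };
  if (scope.callId) filter.callId = scope.callId;
  if (scope.type) filter.type = scope.type;
  return { topK, filter };
}
```

Making orgId a required field of the scope type means a caller cannot construct a query without naming a tenant.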
Cloudflare AI Search and R2
Cloudflare AI Search (AutoRAG), with its backing Cloudflare R2 object storage, is the primary knowledge retrieval backend for Frontier’s knowledge base.
Data Stored
- Indexed Knowledge: Cloudflare AI Search indexes documents and facts from various sources to enable efficient retrieval for AI coaching.
- Original Documents: Cloudflare R2 (ai-search-knowledge-prd bucket) stores the original source documents that are indexed by AI Search. These objects are keyed with org-prefixed paths, such as org/sites/example.com/page-1.html.
Tenant Isolation
- AI Search: Tenant isolation is enforced with a hard org_id metadata-equality filter applied to every query. As a defense-in-depth measure, a post-filter also verifies results, logging and dropping any data whose org_id metadata mismatches the querying organization.
- R2 Storage: R2 objects are stored with org-prefixed keys (${orgId}/${prefix}/${filename}) and metadata including org_id, source_type, and source_id. This provides both namespace separation and metadata-based isolation for the source files.
- AI Search instances and R2 buckets are physically separated per environment (e.g., frontierx-knowledge-search-prd-v3, ai-search-knowledge-prd).
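The post-filter and the key layout can both be sketched in a few lines. This TypeScript example is illustrative only: the SearchResult shape and both function names are hypothetical, and the log message wording is invented for the sketch. Only the org_id equality check and the ${orgId}/${prefix}/${filename} layout come from the description above.

```typescript
// Hypothetical result shape; real AI Search results carry more fields.
interface SearchResult {
  id: string;
  metadata: { org_id?: string };
}

// Build an R2 object key in the ${orgId}/${prefix}/${filename} layout.
function buildObjectKey(orgId: string, prefix: string, filename: string): string {
  return `${orgId}/${prefix}/${filename}`;
}

// Defense-in-depth post-filter: keep only results tagged with the querying org,
// and log anything that slipped past the query-time filter before dropping it.
function dropForeignResults(
  results: SearchResult[],
  orgId: string,
  log: (message: string) => void = console.warn,
): SearchResult[] {
  return results.filter((result) => {
    const matches = result.metadata.org_id === orgId;
    if (!matches) log(`dropping result ${result.id}: org_id mismatch`);
    return matches;
  });
}
```

Results with no org_id metadata at all are also dropped, since an undefined value never equals the querying org.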
Cloudflare KV (Key-Value Store)
Cloudflare KV serves as a short-lived cache for various system configurations and transient data. It is bound as SCRIPT_CACHE to the call-agent Worker.
Data Stored
- Caching: Stores results from knowledge and Supermemory retrieval operations with a 300-second TTL.
- Configuration: Holds the global knowledge-backend configuration key (cfg:knowledge_backend).
- Derived Content: Contains derived call content for short periods, not acting as a source of record.
Tenant Isolation
Because KV only caches derived and transient data, its tenant isolation is implicit: it relies on the data’s short lifespan and on the application logic that populates and reads each entry. KV namespaces are also separated per environment.
Per-call Durable Object Storage (Agent Storage)
Each per-call Durable Object (DO) instance utilizes its own dedicated SQLite-backed storage for maintaining call-specific state.
Data Stored
- Call State: This storage holds the ephemeral, real-time state pertinent to a single ongoing call being managed by that specific Durable Object instance. All Durable Object migrations utilize the new_sqlite_classes storage backend.
Tenant Isolation
The isolation is inherent to the Durable Objects architecture: each DO instance is unique per call (keyed by callId) or per chat thread (threadId), providing a natural boundary for state. The OrgAnswerAgent and CallAnswerAgent Durable Objects enforce tenant binding at connection time by deriving the orgId from the JWT and rejecting cross-org reconnects as a defense-in-depth mechanism.
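The connection-time binding can be sketched as a small state holder. This TypeScript example is a simplified stand-in, not the OrgAnswerAgent or CallAnswerAgent code: the TenantBinding class is hypothetical, it keeps the bound org in memory rather than in Durable Object storage, and it assumes the orgId has already been derived from a verified JWT.

```typescript
// Tracks which org an agent instance is bound to.
// In-memory stand-in for state a Durable Object would persist.
class TenantBinding {
  private boundOrgId: string | null = null;

  // Bind on the first connection; afterwards accept only the same org.
  // Returns true when the connection may proceed.
  connect(orgIdFromJwt: string): boolean {
    if (!orgIdFromJwt) return false; // no tenant identity, no connection
    if (this.boundOrgId === null) {
      this.boundOrgId = orgIdFromJwt;
      return true;
    }
    return this.boundOrgId === orgIdFromJwt;
  }
}
```

Even if a caller guesses another tenant’s callId, a reconnect carrying a different orgId is rejected by the instance that already holds the binding.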
Knowledge Base Architecture
Frontier’s knowledge base is designed with a multi-backend, runtime-selectable architecture, reflecting an ongoing evaluation and investment in accuracy and completeness.
- Cloudflare AI Search is the primary backend, as detailed above.
- Supermemory serves as an alternate knowledge backend, used as a deliberate stop-gap. Its tenant isolation is implemented via a containerTag built from the environment and the orgId.
- A legacy Pinecone path also exists as an alternate knowledge retrieval backend. Pinecone uses a metadata filter { org_id: orgId } for tenant isolation during queries, without relying on per-tenant namespaces. Pinecone vectors are mirrored and tracked in the Postgres pinecone_records table, which includes org_id for traceability.
Knowledge documents’ original uploaded files are stored in Supabase Storage at an org-scoped path (knowledge/{orgId}/{fileId}/original{ext}).
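Runtime backend selection and the per-backend tenant scoping can be sketched together. This TypeScript example rests on several assumptions that are not confirmed by the text: the configuration values ("ai-search", "supermemory", "pinecone"), the fallback to the primary backend, and the `${env}:${orgId}` containerTag layout are all hypothetical. Only the { org_id: orgId } Pinecone filter is taken directly from the description above.

```typescript
// Hypothetical identifiers for the three backends described above.
type KnowledgeBackend = "ai-search" | "supermemory" | "pinecone";

// Map a stored configuration value to a backend.
// Assumption: unknown or missing values fall back to the primary backend.
function selectBackend(configValue: string | null): KnowledgeBackend {
  if (configValue === "supermemory" || configValue === "pinecone") {
    return configValue;
  }
  return "ai-search";
}

// Hypothetical containerTag layout combining environment and org.
function supermemoryContainerTag(env: string, orgId: string): string {
  return `${env}:${orgId}`;
}

// Pinecone metadata filter scoped to a single org.
function pineconeFilter(orgId: string): { org_id: string } {
  return { org_id: orgId };
}
```

Whichever backend is selected, the orgId is part of the scoping value handed to it, so switching backends at runtime does not change the tenant boundary.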
Key PII Locations and Data Path Providers
Understanding the flow and storage of Personally Identifiable Information (PII) is crucial for a real-time coaching system.
- Transcripts: Raw, word-level call transcripts, including speaker_id and text, are stored in Cloudflare D1.
- Participant Data: Participant names and emails are stored in Supabase Postgres.
- User Data: User names, emails, and profile images (synced from Clerk) are stored in Supabase Postgres.
- Meeting Details: Meeting titles, URLs, and rich async transcripts (with speaker names) are stored in Supabase Postgres.
- Knowledge Content: Contents of uploaded knowledge documents are stored in Cloudflare R2 and Supabase Storage (org-scoped paths), and indexed into the vector backends (AI Search, Supermemory, Pinecone).
The following third-party services are integral to the data path:
- Authentication (Clerk): Provides identity and authentication, issuing JWTs that drive RLS policies in Supabase and tenant isolation across Cloudflare services.
- Speech-to-Text (Deepgram): For the live (direct-audio) transcription path, the call-server connects directly to Deepgram’s WebSocket API (nova-3 model) for word-level-timed transcription.
- Large Language Models (LLM Providers): Frontier’s call-server routes requests to multiple LLM providers via the Vercel AI SDK, including Google Generative AI (Gemini), Anthropic (Claude), OpenAI (gpt-* models), Together.ai, and Cloudflare Workers AI (@cf/meta/llama models).
- Call Capture (Recall Desktop SDK): The live path for audio capture is via the Recall Desktop SDK running in the Electron desktop app. Data related to this capture is stored in Supabase. The legacy Recall cloud bot integration and its associated data structures remain in the schema but are deprecated and unused for the live path.
- Background Jobs (Inngest): Handles durable async and scheduled processing (e.g., webhook processing, knowledge ingestion, cron jobs) within the Dashboard application.
- Secrets Management (Doppler): Manages environment variables and secrets, which are injected into Cloudflare Workers at deploy time.
- Observability (Sentry & Axiom): Sentry captures errors, enriching events with org_id, user_id, and call_id tags. Axiom collects structured logs, with datasets named frontier-call-server-{env}.