Data

Frontier is built with a commitment to responsible data handling, ensuring that sales call coaching is effective while respecting privacy and maintaining strong security boundaries. This page details what data Frontier collects, where it is stored, and how tenant isolation is achieved across its services.

What We Collect

Frontier collects various types of data to provide real-time coaching and post-call insights. This includes identifying information about users and sales calls, the content of the calls themselves, and derived insights.

Data Category	Examples	PII?
User & Organization Metadata	Organization records (name, email, image URL), user profiles (first/last name, email, image URL), call metadata (title, meeting URL, status), call participant details (name, email, platform identifiers), calendar data (event title, meeting URL, attendee list).	Yes (names, emails, image URLs, meeting details, attendee lists).
Real-time Call Content	Word-level transcripts (spoken words, speaker ID, start timestamps), detected questions, AI-generated answers and responses, script progress.	Yes (raw transcripts, detected questions, AI answers may contain PII).
Knowledge Base Content	FAQs (question, ideal answer), call scripts, structured organization facts, uploaded knowledge documents (original files, filenames, content, MIME types).	Yes (any PII contained within uploaded documents, FAQs, scripts, or facts).
Derived Data & Caches	Short-lived retrieval caches holding derived call content or knowledge query results.	Potentially (if derived from PII-containing sources).
Observability Data	Error events, structured logs.	Yes (events are enriched with `org_id`, `user_id`, `call_id` tags).

Where It Lives

Frontier utilizes a distributed architecture with specialized data stores for different data types, ensuring scalability and performance for real-time operations.

Data Type	Primary Store (Provider)	Other Stores/Processors	Region/Residency
User/Org/Account Configuration, Call Metadata, Participants, FAQs, Scripts, Org Facts, AI Answers, Calendars	Supabase Postgres (Postgres)	Clerk (identity provider for authentication).	:::caution[GAP — founder supplies] Supabase project region.
Live Call Transcripts (word-level), Detected Questions, Call Metadata (subset)	Cloudflare D1 (SQLite database)		:::caution[GAP — founder supplies] Cloudflare D1 region.
Knowledge Base Source Documents	Cloudflare R2 (object storage) & Supabase Storage		:::caution[GAP — founder supplies] Cloudflare R2 and Supabase Storage regions.
Vector Embeddings / Knowledge Indexes	Cloudflare AI Search (AutoRAG), Supermemory	Pinecone (vector database, legacy path). Cloudflare Vectorize binding exists but is not used in the live retrieval path.	:::caution[GAP — founder supplies] Cloudflare AI Search, Supermemory, Pinecone regions.
Short-lived Caches	Cloudflare KV (Key-Value store)		:::caution[GAP — founder supplies] Cloudflare KV region.
LLM Inference (real-time + post-call)	External LLM providers (Anthropic, Google, OpenAI, Together.ai), Cloudflare Workers AI		:::caution[GAP — founder supplies] LLM provider regions.
Speech-to-Text (live direct-audio transcription)	Deepgram (speech-to-text provider)		:::caution[GAP — founder supplies] Deepgram region.
Error Reporting	Sentry (error reporting)		:::caution[GAP — founder supplies] Sentry region.
Structured Logging	Axiom (structured logging)		:::caution[GAP — founder supplies] Axiom region.
Background Jobs	Inngest (background job orchestration)		:::caution[GAP — founder supplies] Inngest region.
Secrets/Configuration	Doppler (secrets management)		:::caution[GAP — founder supplies] Doppler region.
Call Audio (Desktop Recordings)	Recall Desktop SDK for capture	:::caution[GAP — founder supplies] Specific storage location and retention of raw audio blobs are founder-supplied gaps.	:::caution[GAP — founder supplies] Recall Desktop SDK audio storage region.

Raw meeting transcripts and word-level data, which include PII, are primarily stored in Cloudflare D1. Participant names and emails are stored in Supabase Postgres. Knowledge document contents can reside in Cloudflare R2 or Supabase Storage.

Tenant Isolation

Frontier isolates tenants logically: each organization’s data is tagged with, or linked to, an org_id, and access is restricted to the matching organization. Enforcement happens at the database level where the store supports it (Postgres RLS) and through application-level query scoping or metadata filters elsewhere.

Supabase Postgres: Tenant isolation is enforced using Postgres Row-Level Security (RLS) policies. These policies authorize access to data based on the org_id claim extracted from the JSON Web Token (JWT) issued by Clerk, the identity provider. All sensitive tables have RLS enabled, restricting CRUD operations to data matching the user’s org_id. Tables without an org_id column enforce isolation by joining to a parent calls row and checking its org_id.
Cloudflare D1 (SQLite database): D1 does not natively support Row-Level Security, so isolation is enforced at the application query layer. The calls and questions tables in D1 carry an org_id directly. transcript_words lacks its own org_id column, so queries scope it by joining on call_id to the calls table and checking that row’s org_id.
Unlike Supabase Postgres, Cloudflare D1 relies solely on application-level query scoping for isolation. A bug in application code could therefore lead to cross-tenant data access within D1, whereas Postgres RLS would block such an attempt at the database level.
Cloudflare AI Search: Tenant isolation for indexed knowledge content in Cloudflare AI Search is implemented via a hard org_id metadata-equality filter applied to every query. As a defense-in-depth measure, a post-filter mechanism logs and drops any results whose org_id metadata does not match the querying organization. Source documents for AI Search live in Cloudflare R2 object storage, where object keys are also org-prefixed (e.g., org/sites/example.com/page-1.html).
Supermemory: For the Supermemory knowledge backend (currently an interim solution), tenant isolation is achieved using a container tag built from the environment and the orgId. Retrieval queries are filtered by source_type metadata and, for document drills, by source_id.
Pinecone (legacy): The legacy Pinecone vector database also uses a metadata-filter model for tenant isolation, applying an { org_id: orgId } filter to queries.
Cloudflare KV (Key-Value store): Used as a short-lived cache for knowledge retrieval results (300-second TTL) and for global configuration. It holds derived call content only and is not the source of record.
Durable Objects (OrgAnswerAgent, CallAnswerAgent): These warm per-tenant answer agents bind to the org_id from the JWT provided at connection time. Any attempt to reconnect to an already bound Durable Object from a different organization is explicitly rejected with an “Unauthorized org” error.
Multi-backend Knowledge Base: Frontier’s knowledge base is designed to be multi-backend and is currently in active evaluation, supporting Cloudflare AI Search, Supermemory, and a legacy Pinecone path. The active backend is resolved at runtime based on per-request overrides, KV configurations, or environment defaults. This means an organization’s knowledge data may reside in multiple vector/storage backends simultaneously.
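
Three of the isolation mechanisms described in this section (application-level query scoping in D1, the post-filter on search results, and org binding on warm agents) can be sketched as below. This is a minimal illustration, not Frontier's actual code: the SearchHit shape, the function and class names, and the calls.id column are assumptions introduced here.

```typescript
// Hypothetical shapes for illustration; not Frontier's actual types.
interface SearchHit {
  id: string;
  metadata: { org_id?: string };
}

interface ScopedQuery {
  sql: string;
  params: string[];
}

// D1 has no row-level security, so the tenant check lives in the query itself.
// transcript_words carries no org_id, so it is joined to calls, which does.
// The calls.id column name is an assumption.
function scopedTranscriptQuery(callId: string, orgId: string): ScopedQuery {
  return {
    sql:
      "SELECT w.* FROM transcript_words AS w " +
      "JOIN calls AS c ON c.id = w.call_id " +
      "WHERE w.call_id = ? AND c.org_id = ?",
    params: [callId, orgId],
  };
}

// Defense-in-depth post-filter: keep hits whose org_id metadata matches the
// querying organization and return the rest separately so they can be logged.
// Hits with no org_id metadata are treated as mismatches and dropped.
function postFilterByOrg(
  hits: SearchHit[],
  orgId: string,
): { kept: SearchHit[]; dropped: SearchHit[] } {
  const kept: SearchHit[] = [];
  const dropped: SearchHit[] = [];
  for (const hit of hits) {
    (hit.metadata.org_id === orgId ? kept : dropped).push(hit);
  }
  return { kept, dropped };
}

// Org binding for a warm agent: the first connection binds the org, and any
// later connection from a different org is rejected.
class OrgBinding {
  private boundOrgId: string | null = null;

  connect(orgIdFromJwt: string): void {
    if (this.boundOrgId === null) {
      this.boundOrgId = orgIdFromJwt;
    } else if (this.boundOrgId !== orgIdFromJwt) {
      throw new Error("Unauthorized org");
    }
  }
}
```

In a Worker, the sql and params from scopedTranscriptQuery would typically be passed to a D1 binding through prepare(sql).bind(...params). Passing orgId as a required argument rather than an optional one makes it harder to issue an unscoped query by accident, which matters because D1 has no database-level backstop.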

Data Retention, Residency, and Deletion

Frontier maintains data for the duration necessary to provide its services and meet business requirements.

Limited retention periods were found in code for specific components: Cloudflare KV retrieval caches expire after 300 seconds, and detected question events in Cloudflare D1 are cleaned up after 14 days.
The system supports cascading deletes (e.g., deleting a calls row in D1 cascades to its transcript_words and questions).
Retention schedule (founder-supplied gap): Specific data retention periods and the full deletion lifecycle for all primary data stores and types, including Cloudflare D1 transcripts and Supabase call/participant data.
Data residency (founder-supplied gap): The specific regions or jurisdictions for each data store (e.g., Supabase, Cloudflare D1/R2/Vectorize/KV, Deepgram, LLM providers).
DSAR process (founder-supplied gap): Detailed processes for data deletion and export in response to Data Subject Access Requests (DSARs).
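
The two retention periods that this section reports as found in code (a 300-second KV cache TTL and a 14-day cleanup of detected question events) can be expressed as a small sketch. The constant and function names, the questions table as the cleanup target, and the created_at column are assumptions made for illustration; only the two durations come from this page.

```typescript
// Durations stated on this page.
const KV_CACHE_TTL_SECONDS = 300;
const QUESTION_RETENTION_DAYS = 14;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Detected question events older than this instant are eligible for cleanup.
function questionCleanupCutoff(now: Date): Date {
  return new Date(now.getTime() - QUESTION_RETENTION_DAYS * MS_PER_DAY);
}

// Options object in the shape Cloudflare KV's put() accepts for a TTL.
function cachePutOptions(): { expirationTtl: number } {
  return { expirationTtl: KV_CACHE_TTL_SECONDS };
}

// Hypothetical cleanup statement; table and column names are assumptions.
function questionCleanupQuery(now: Date): { sql: string; params: string[] } {
  return {
    sql: "DELETE FROM questions WHERE created_at < ?",
    params: [questionCleanupCutoff(now).toISOString()],
  };
}
```

Keeping the durations in named constants gives a single place to update once the full retention schedule is defined.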

Data Processing Agreements

Frontier engages with various sub-processors to deliver its services. Ensuring appropriate contractual agreements, including Data Processing Agreements (DPAs), is crucial.

Sub-processors: Key sub-processors on the data path include Clerk (identity provider); Deepgram (speech-to-text); Anthropic, Google, OpenAI, and Together.ai (LLM providers); Sentry (error reporting); Axiom (structured logging); and Inngest (background jobs). Recall Desktop SDK is used for audio capture.
LLM Provider Commitments (founder-supplied gap): Confirmation of zero-data-retention or no-training commitments with all LLM providers (Anthropic, Google, OpenAI, Together.ai) and Deepgram.
DPA Posture (founder-supplied gap): The current status of Data Processing Agreements (DPAs) with all sub-processors, and an authoritative, published list of sub-processors along with the transfer basis for data.
Supply-Chain Hygiene: Frontier uses Dependabot to automatically track and update dependencies for the Bun ecosystem on a weekly schedule, contributing to overall supply-chain security.
Encryption and Key Management (founder-supplied gap): Specifics regarding encryption at rest and in transit beyond provider defaults, and details on key management practices.