Knowledge base

The Frontier knowledge base supports real-time coaching by delivering relevant information during a sales call. It ingests company-specific data, such as product catalogs, FAQs, and public websites, and processes it into an indexed format that AI agents can retrieve quickly to answer questions or guide a sales representative. The system is designed for flexibility, so that multiple underlying retrieval technologies can be evaluated.

Knowledge Base Onboarding

Company knowledge enters the system during organizational onboarding, which is orchestrated by an Inngest function named organization-onboard-start. This process creates a structured company profile, known as the onboarding manifest, which is validated against ManifestSchema. The manifest captures identity, offerings, target buyers, pricing, and declared knowledge sources. An LLM agent drives its creation using tools such as extract_facts (server-side fact extraction from URLs) and firecrawl_scrape (web scraping). For web content that is not directly discoverable, research_external falls back to external services such as Perplexity.
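
As a rough illustration of what validating a draft manifest could look like, the sketch below checks the five areas listed above. The field names, types, and the validateManifest helper are all hypothetical; the real ManifestSchema is not shown in this document and may differ substantially.

```typescript
// Hypothetical manifest shape; the real ManifestSchema fields are not shown here.
interface KnowledgeSource { kind: "website" | "document" | "faq"; ref: string }
interface OnboardingManifest {
  identity: { name: string };
  offerings: string[];
  targetBuyers: string[];
  pricing: string;
  knowledgeSources: KnowledgeSource[];
}

type ValidationResult =
  | { ok: true; manifest: OnboardingManifest }
  | { ok: false; errors: string[] };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Collects every top-level problem instead of stopping at the first one.
// Individual knowledgeSources entries are not inspected in this sketch.
function validateManifest(input: unknown): ValidationResult {
  if (!isRecord(input)) return { ok: false, errors: ["manifest must be an object"] };
  const errors: string[] = [];
  const identity = input.identity;
  const name = isRecord(identity) ? identity.name : undefined;
  if (typeof name !== "string" || name.trim() === "") errors.push("identity.name is required");
  for (const field of ["offerings", "targetBuyers"]) {
    if (!isStringArray(input[field])) errors.push(`${field} must be an array of strings`);
  }
  if (typeof input.pricing !== "string") errors.push("pricing must be a string");
  if (!Array.isArray(input.knowledgeSources)) errors.push("knowledgeSources must be an array");
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, manifest: input as unknown as OnboardingManifest };
}
```

Returning the full list of errors, rather than throwing on the first, would let an LLM agent repair a draft in a single pass.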

The lifecycle state of this manifest, including draft_manifest and approved_manifest snapshots, is managed in the org_manifest_status table within Supabase (Postgres), the primary database for application and configuration data. From the approved_manifest, Company Facts are deterministically derived and stored as the durable source of truth in the org_facts table, also in Supabase. The manifest-facts-sync Inngest function reconciles these derived facts with the retrieval indexes through a dual write: a blocking write to Cloudflare AI Search (an AutoRAG service), which acts as the primary index, and a best-effort write to Supermemory, an interim bridge for knowledge retrieval.
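
The blocking-primary, best-effort-secondary pattern described above can be sketched as follows. The IndexWriter interface, the Fact shape, and the dualWrite helper are hypothetical names introduced for illustration; they are not the actual manifest-facts-sync code or any vendor client API.

```typescript
// Hypothetical types; the real client APIs are not shown in this document.
interface Fact { id: string; text: string }
interface IndexWriter {
  name: string;
  upsert(facts: Fact[]): Promise<void>;
}

interface DualWriteResult {
  primaryOk: true;
  secondaryOk: boolean;
  secondaryError?: string;
}

async function dualWrite(
  facts: Fact[],
  primary: IndexWriter,
  secondary: IndexWriter,
): Promise<DualWriteResult> {
  // Blocking: a failure here propagates, so the calling job can fail and retry.
  await primary.upsert(facts);
  try {
    // Best effort: the secondary write is attempted, but its failure is only recorded.
    await secondary.upsert(facts);
    return { primaryOk: true, secondaryOk: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { primaryOk: true, secondaryOk: false, secondaryError: message };
  }
}
```

Under this sketch, an outage in the secondary backend never blocks a sync, while a primary failure always surfaces to the caller.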

Source Data Ingestion

The knowledge base ingests three types of source data:

Websites

Website content is ingested through a layered model. The process starts by inserting or updating a crawled_sites row in Supabase and then triggering an Inngest website/crawl.start event. The crawled_sites table tracks domain-level information, while crawled_pages stores details for individual pages, including their URL, title, and R2 (object storage) key. Raw crawled page content is stored in Cloudflare R2 under an {orgId}/sites/{domain}/{filename} key, together with a SHA-256 content_hash that allows incremental re-crawls to detect changes efficiently.
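
A minimal sketch of the key layout and hash-based change detection, assuming Node's built-in crypto module. The CrawledPage fields and the helper names are hypothetical, and a Worker runtime would more likely use the Web Crypto API for hashing.

```typescript
import { createHash } from "node:crypto";

// Builds the object key in the {orgId}/sites/{domain}/{filename} layout.
function r2Key(orgId: string, domain: string, filename: string): string {
  return `${orgId}/sites/${domain}/${filename}`;
}

// Hex-encoded SHA-256 digest of the page body.
function sha256Hex(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Hypothetical subset of a crawled_pages row.
interface CrawledPage {
  url: string;
  r2Key: string;
  contentHash: string;
}

// A page needs re-uploading when it is new or its content hash has changed.
function needsReupload(previous: CrawledPage | undefined, freshBody: string): boolean {
  return previous === undefined || previous.contentHash !== sha256Hex(freshBody);
}
```

Comparing hashes lets a re-crawl skip unchanged pages without fetching the stored copy back from object storage.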

Documents

Customers can upload knowledge documents, which are stored in an org-scoped path within Supabase Storage (knowledge/{orgId}/{fileId}/original{ext}). These documents are indexed by the knowledge-sync-to-ai-search Inngest function, which uploads them to Cloudflare AI Search (skipping files larger than 4 MB) and makes a best-effort attempt to index them in Supermemory as well, using a 24-hour signed URL. A separate Inngest function, reindex-unindexed-documents, provides a self-healing mechanism that picks up any documents left stuck without an index entry.
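
The path layout and size gate above can be sketched as a small planning function. The constant names and the IndexPlan shape are hypothetical, and the sketch assumes "4 MB" means 4 x 1024 x 1024 bytes; the real function may use a different threshold definition. Whether oversized files are still sent to Supermemory is not stated here, so the sketch leaves that decision to the caller.

```typescript
// Assumption: "4 MB" is interpreted as 4 MiB.
const AI_SEARCH_MAX_BYTES = 4 * 1024 * 1024;
// 24-hour lifetime for the signed URL handed to Supermemory.
const SIGNED_URL_TTL_SECONDS = 24 * 60 * 60;

// Builds the org-scoped storage path; ext includes the leading dot, e.g. ".pdf".
function storagePath(orgId: string, fileId: string, ext: string): string {
  return `knowledge/${orgId}/${fileId}/original${ext}`;
}

// Hypothetical plan object describing what the sync step should do.
interface IndexPlan {
  path: string;
  uploadToAiSearch: boolean;
  signedUrlTtlSeconds: number;
}

function planDocumentIndexing(
  orgId: string,
  fileId: string,
  ext: string,
  sizeBytes: number,
): IndexPlan {
  return {
    path: storagePath(orgId, fileId, ext),
    // Files strictly larger than the limit are skipped for AI Search.
    uploadToAiSearch: sizeBytes <= AI_SEARCH_MAX_BYTES,
    signedUrlTtlSeconds: SIGNED_URL_TTL_SECONDS,
  };
}
```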

FAQs

Frequently Asked Questions (FAQs) are managed in the faqs table within Supabase, which captures the question, the ideal answer, relevant tags, and a when_relevant call-time relevance gate. Like the other data types, FAQs are indexed in parallel across multiple retrieval backends (Cloudflare AI Search, Supermemory, and a legacy Pinecone integration) via dedicated Inngest functions (e.g., faq-sync-to-ai-search).
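
One way an FAQ row might be flattened into an indexable record is sketched below. The column names other than when_relevant, the text layout, and the "faq" source_type value are assumptions made for illustration; the metadata keys follow the org_id, source_type, and source_id attributes used for AI Search uploads.

```typescript
// Hypothetical row shape based on the fields described above.
interface FaqRow {
  id: string;
  org_id: string;
  question: string;
  ideal_answer: string;
  tags: string[];
  when_relevant: string | null;
}

interface IndexRecord {
  text: string;
  metadata: { org_id: string; source_type: "faq"; source_id: string };
}

// Renders the FAQ as plain text and attaches tenant and source metadata.
function faqToIndexRecord(faq: FaqRow): IndexRecord {
  const lines = [`Q: ${faq.question}`, `A: ${faq.ideal_answer}`];
  if (faq.tags.length > 0) lines.push(`Tags: ${faq.tags.join(", ")}`);
  if (faq.when_relevant) lines.push(`Relevant when: ${faq.when_relevant}`);
  return {
    text: lines.join("\n"),
    metadata: { org_id: faq.org_id, source_type: "faq", source_id: faq.id },
  };
}
```

Producing one backend-neutral record per FAQ would let each sync function send the same content to its own backend.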

Runtime Knowledge Retrieval

During a live sales call, knowledge retrieval occurs through a single, consistent interface within the Call Server (a set of Cloudflare Workers hosting Durable Objects). This retrieval path, exposed by knowledgeRetrieval.ts, dispatches requests to one of three underlying backends: Cloudflare AI Search, Supermemory, or Pinecone (a legacy vector database).
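
The single-interface dispatch can be sketched as below. The backend identifiers, the Snippet shape, and the function names are hypothetical; they are not the actual exports of knowledgeRetrieval.ts.

```typescript
// Hypothetical backend identifiers.
type Backend = "ai_search" | "supermemory" | "pinecone";
const BACKENDS: readonly Backend[] = ["ai_search", "supermemory", "pinecone"];

interface Snippet { text: string; sourceType: string; score: number }
type Retriever = (orgId: string, query: string) => Promise<Snippet[]>;

// Validates a configured value (e.g. from an environment variable) up front.
function parseBackend(value: string): Backend {
  const match = BACKENDS.find((b) => b === value);
  if (match === undefined) throw new Error(`Unknown knowledge backend: ${value}`);
  return match;
}

// Callers see one retrieve() signature regardless of which backend serves the request.
function createKnowledgeRetrieval(retrievers: Record<Backend, Retriever>) {
  return function retrieve(backend: Backend, orgId: string, query: string): Promise<Snippet[]> {
    return retrievers[backend](orgId, query);
  };
}
```

Because Record<Backend, Retriever> requires an entry for every backend, adding a fourth backend to the union would fail type-checking until a retriever is supplied for it.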

Supermemory, the interim bridge, isolates each organization’s data using an environment-namespaced container tag ({env}:{org_id}). Its retrieval strategy employs a weighted four-lane fan-out that prioritizes curated sources such as FAQs and Company Facts over general website content. Cloudflare AI Search, in contrast, relies on its native re-ranking capabilities and aggregates Company Facts and uploaded documents into a single ‘document’ source type.
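
A weighted fan-out of this kind could merge lane results as in the sketch below. The four lane names and the weight values are assumptions chosen only to show curated sources outranking website content; the actual lanes and weights are not given in this document.

```typescript
// Hypothetical lanes and weights; only the relative ordering reflects the text above.
type Lane = "faq" | "company_fact" | "document" | "website";
const LANE_WEIGHTS: Record<Lane, number> = {
  faq: 1.0,
  company_fact: 0.9,
  document: 0.7,
  website: 0.5,
};

// Builds the environment-namespaced container tag, e.g. "prd:org_123".
function containerTag(env: string, orgId: string): string {
  return `${env}:${orgId}`;
}

interface Hit { lane: Lane; text: string; score: number }

// Scales each hit by its lane weight, then keeps the top results overall.
function mergeLanes(laneResults: Hit[][], limit: number): Hit[] {
  return laneResults
    .reduce<Hit[]>((all, lane) => all.concat(lane), [])
    .map((hit) => ({ ...hit, score: hit.score * LANE_WEIGHTS[hit.lane] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With these example weights, an FAQ hit scored 0.6 outranks a website hit scored 0.9, since the website score is halved before ranking.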

Tenant Isolation for Knowledge Data

Tenant isolation keeps each organization’s data strictly separated and accessible only to authorized users within that organization. It is enforced at multiple layers:

Cloudflare AI Search: Isolation relies on metadata-equality filters, where every query includes { org_id } as a primary security boundary. This aligns with write-side metadata, where documents are uploaded with explicit org_id, source_type, and source_id attributes.
Pinecone: Like AI Search, Pinecone applies metadata filters ({ org_id }) at query time and does not rely on physical per-tenant namespaces.
Supermemory: Tenant isolation is achieved via a unique containerTag for each organization, effectively segmenting data at the backend level.
Supabase Storage: Uploaded knowledge files are stored in org-scoped paths (knowledge/{orgId}/{fileId}/original{ext}), preventing cross-tenant access at the object storage layer.
Durable Objects: The OrgAnswerAgent and CallAnswerAgent, which provide warm, per-tenant knowledge retrieval, enforce defense-in-depth by deriving the orgId from the caller’s JWT during connection and rejecting any attempts to reconnect from a different organization to an already-bound Durable Object instance.
Per-Environment Separation: All Cloudflare resources, including Cloudflare D1 (SQLite database), Cloudflare AI Search instances, and Cloudflare R2 buckets, are entirely separate per environment (e.g., -prd, -dev, -stg).
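
Two of the layers above, the query-side org_id filter and the Durable Object binding check, can be sketched as follows. The orgFilter helper and the OrgBinding class are hypothetical illustrations; they are not the OrgAnswerAgent or CallAnswerAgent code, and they assume the orgId has already been extracted from a verified JWT.

```typescript
// Every knowledge query carries an org_id equality filter; an empty id is rejected.
function orgFilter(orgId: string): { org_id: string } {
  if (orgId.trim() === "") throw new Error("orgId is required for every knowledge query");
  return { org_id: orgId };
}

// Hypothetical stand-in for the per-instance state of a tenant-bound Durable Object.
class OrgBinding {
  private boundOrgId: string | null = null;

  // orgIdFromJwt must come from a verified token, never from client-supplied parameters.
  connect(orgIdFromJwt: string): void {
    if (this.boundOrgId === null) {
      // The first connection binds the instance to its organization.
      this.boundOrgId = orgIdFromJwt;
      return;
    }
    if (this.boundOrgId !== orgIdFromJwt) {
      throw new Error("instance is already bound to a different organization");
    }
  }
}
```

Reconnecting with the same organization is accepted, while a connection from any other organization is rejected before any retrieval can run.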