
SOP AI-KB-01: AI Knowledge Base v3.6.6

Tiered retrieval architecture for AI customer service prompts

Document ID: SOP-AI-KB-01-v3.6.6 Version: 3.6.6 Status: Production Ready Last Updated: 2026-04-19 Owner: Protocol Raw Operations Replaces: SOP-AI-KB-01-v2.0 Review Date: 2026-07-06 Platform: Supabase PostgreSQL, pgvector, OpenAI text-embedding-3-small, Anthropic Claude Sonnet 4.5


Key Changes in v3.6 (2026-04-18) — Headroom Reframe

Applied the website headroom reframe (regulator-as-reference → dog-as-reference) across the knowledge base, following the audit at docs/docs/sops/customer-ops/AI_KB_Headroom_Reframe_Audit_v1_0.md and the Brand Voice v1.5 update.

Content changes across 8 sections:

  • brand_kernel v2 → v3. TERMINOLOGY example rewritten: "Protocol Raw Complete meets FEDIAF requirements..." → "Protocol Raw Complete is complete for every life stage, lab-confirmed..." Added rule: "FEDIAF is a citation in body copy, not a claim anchor."
  • hard_boundaries v11 → v12. Two new boundaries appended: FEDIAF may not be used as a headline/claim anchor; "synthetic premixes/supplements/vitamins/minerals" must be replaced with "synthetic additives." No existing boundaries modified.
  • product_fediaf_nutrition v5 → v6. Heavy rewrite. Opening paragraph reframed dog-first, with lab-confirmation and All Life Stages as supporting evidence. Angles 1, 3, 5 rewritten. Angles 2, 4 preserved. Keywords line preserved verbatim for retrieval coverage.
  • product_ingredients v4 → v5. "No synthetic premixes." → "No synthetic additives." "Moderate, FEDIAF-compliant levels..." → "Moderate levels, within the range recommended for complete dog food." Ca:P parenthetical preserved as permitted citation.
  • product_single_sku v1 → v2. Angle 2 reframed to anchor on "every life stage" not "FEDIAF's strictest standard." Angle 6 vocabulary: "synthetic vitamins or minerals" → "synthetic additives"; "synthetic premix" → "synthetic additives." Never-say list preserved.
  • scenario_vet_concerns v2 → v3. Primary claim line rewritten from "FEDIAF compliance" anchor to "complete for every life stage" anchor, with FEDIAF dropped (the vet will recognise the analysis against FEDIAF without needing the label).
  • safety_verification v10 → v11. Nutritional-verification paragraph reframed. Proof Portal boundary preserved verbatim.
  • product_pricing (version bumped). Vocabulary replacement only: "synthetic premix(es)" / "synthetic supplements" / "synthetic vitamins/minerals" → "synthetic additives."

Rationale: Every Kai-generated response inherits the framing of the sections it retrieves. The website pages (Nutrition, Ingredients, Homepage) have all been reframed; the KB was still compliance-framed, which meant Kai was silently undoing the reframe in every chat conversation. This pass brings the KB into alignment with the live website surfaces and Brand Voice v1.5.

Not in this pass: Signature phrase "The higher standard is our starting point, not our ceiling." is not introduced to the KB. That phrase is reserved for the website surface where its visual setup lands correctly. KB responses use plain-English equivalents.

Retrieval validation: Smoke-test suite authored at docs/docs/sops/customer-ops/ai_kb_headroom_reframe_smoke_tests.sql runs expected customer queries against fn_retrieve_knowledge and asserts the correct sections retrieve at acceptable similarity scores. Run before considering the revision complete.
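
The shape of one such smoke-test assertion can be sketched as follows. This is illustrative only: the real suite is SQL, and `retrieve` here is a hypothetical wrapper standing in for a call to fn_retrieve_knowledge, with invented similarity values.

```python
def run_smoke_test(query, expected_section, retrieve, min_similarity=0.30):
    """Assert that the expected KB section retrieves for a customer
    query at an acceptable similarity. `retrieve` returns
    (section_key, similarity) pairs."""
    hits = dict(retrieve(query))
    assert expected_section in hits, f"{expected_section} did not retrieve for: {query}"
    assert hits[expected_section] >= min_similarity, f"similarity too low: {hits[expected_section]}"

# Stub standing in for the database call (values invented for illustration):
fake_retrieve = lambda q: [("product_fediaf_nutrition", 0.54), ("product_ingredients", 0.41)]
run_smoke_test("Is it complete for puppies?", "product_fediaf_nutrition", fake_retrieve)
```

The same harness shape covers both failure modes seen later in this changelog: a section not retrieving at all, and a section retrieving below threshold.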

v3.6.1 (2026-04-19): Em dash boundary tightening

Minor tightening of the em dash prohibition in brand_kernel (v3 → v4) and hard_boundaries (v12 → v13). The existing prohibition did not include substitution guidance, leaving the model free to silently violate the rule by producing em dashes where its training data strongly predicts them. Both sections now state the substitute explicitly: full stop preferred, comma if that reads abrupt, rewrite if neither works. The hard_boundaries version additionally includes the operational rationale to pre-empt model-side rationalisation.

No other sections changed. No content reframe. Purely a rule-tightening pass.

Smoke tests run against the live chat function with two queries that invite em dash usage ("Tell me what you think about Protocol Raw — in detail please.", "What's in Protocol Raw — is it just beef or more than that?"). Both responses produced zero em dashes (U+2014) and zero double-hyphens. FEDIAF retrieval regression check confirmed no drift in similarity scores from the v3.6 baseline.
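
The regression check itself is a simple scan. A minimal sketch (not the actual test harness) that flags both prohibited forms:

```python
def em_dash_violations(text: str) -> list[str]:
    """Return the v3.6.1 boundary violations found in a model response:
    any em dash (U+2014) or any double-hyphen."""
    violations = []
    if "\u2014" in text:
        violations.append("em_dash")
    if "--" in text:
        violations.append("double_hyphen")
    return violations

# Clean response passes; the substitution guidance (full stop, comma) leaves no trace.
assert em_dash_violations("Full stop preferred. Comma if that reads abrupt.") == []
assert em_dash_violations("Protocol Raw \u2014 in detail") == ["em_dash"]
```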

v3.6.3 (2026-04-19): Response length tightening

Replaces the existing FORMAT rule "Keep responses to 2-3 short paragraphs maximum" in chat_format (v9 → v10) with operational word-count guidance: 60-100 words for chat, compact single-paragraph for lists, lead-with-main-point for comparisons. Exception clause preserved for genuinely long-form requests (vet explanations, multi-step transition guidance).

Background: chat response length grew from a 2026-04-06 baseline of ~368 char avg (71 turns) to a 2026-04-19 average of ~846 char (7 turns) — a ~2.3x increase. Streaming output is the dominant latency contributor on GPT-4o-mini, making response length the most likely cause of the perceived slowdown.

Em dash regression: 15 samples across 5 smoke-test queries produced zero em dashes (U+2014) and zero double-hyphens. The v3.6.1 boundary holds.

Length compliance: substantial compression vs the prior baseline (Test 1 dropped from ~550 words to 121-word avg, a 78% reduction). However, the model interprets "60-100 words" loosely: 3 of 5 tests had at least one sample slightly over target. Long-form vet test (Test 4) passed all 3 samples within the 180-word ceiling, confirming the exception clause works. If sustained sampling shows the rule still under-compresses, the next pass should rewrite verbose angles in product_fediaf_nutrition and product_single_sku rather than tightening the format rule further.

No other sections changed.

v3.6.4 (2026-04-19): Consolidated tightening (pricing, boundaries, framing, test cleanup)

Consolidates four items surfaced by the live five-question test run, five items from Anton's subsequent pricing/boundary review, and corrects drifted price data.

Changes (6 sections + 1 deactivation; conditional 7th skipped per preflight):

  • product_pricing v2 → v3: heavy rewrite. Prospect/customer split, forbidden-phrases blacklist, calculator-first framing for prospects.
  • hard_boundaries v13 → v14: 5 edits. Symmetric competitor boundary (positive AND negative directions). Evolutionary/ancestral framing ban. 400g → 375g portion reference. FEDIAF compliance-phrasing reconstruction prohibition.
  • response_principles v19 → v20: Principle 8 expanded. Justify price by what Protocol Raw contains, never by what other categories lack.
  • portal_change_box_size v1 → v2: full rewrite. Price corrections (8kg £79 → £89, 16kg £139 → £129). Plain-text format. Within-48-hour edge case added.
  • scenario_vet_concerns v3 → v4: primary claim line includes sample batch URL. New instruction block prohibiting "formulated to meet FEDIAF" reconstruction.
  • product_fediaf_nutrition v6 → v7: priority directive at top of angle list. Comparison questions lead with formulation argument.
  • feeding_calculations: SKIPPED. Preflight confirmed already at 375g (no drift).
  • test_auto_embed_3: deactivated.

Em dash discipline: two em dashes found in the spec content (in product_pricing v3 and portal_change_box_size v2) substituted with full-stop and comma respectively, per Anton's call. Same pattern as v3.6.3.

Smoke tests (9 queries × 3 samples = 27 calls):

  • Strong wins: kibble-comparison rule (T4 3/3), vet sample URL (T5 3/3), em dash regression (26/27 samples clean).
  • Weak on competitor characterisation discipline. In T2 ("Is Butternut Box a good product too?") the model openly characterised Butternut as "a reputable brand with good quality control" / "complete dog food that works for many dogs" in 3/3 samples, despite the new symmetric boundary.
  • T1 (comparison lead) leaked similar characterisations in 2/3 samples and exceeded the 100-word ceiling in 2/3.
  • T3 (evolutionary bait): the model engaged the frame to refute it ("dogs didn't evolve eating raw food") rather than redirecting away from it.
  • T7 (portal change) gave the steps and the 48-hour lock but didn't proactively quote prices.

Implication: the symmetric competitor boundary needs reinforcement. The model interprets "do not characterise" as "do not negatively characterise"; positive characterisations slip through. Anton's call: tighten further (more forbidden examples, more explicit spirit-of-the-rule language) or accept that competitor-comparison questions need additional system-prompt scaffolding beyond hard_boundaries alone.

v3.6.5 (2026-04-19): scenario_competitor_quality section

Adds a new retrieved scenario section scenario_competitor_quality to address the v3.6.4 T2 failure mode (3/3 positive characterisations of Butternut despite the symmetric boundary in hard_boundaries v14).

Diagnosis: Negative rules in hard_boundaries ("do not characterise") were not overriding the model's strong prior that "don't characterise competitors" means "don't trash them". Positive characterisations slipped through. The fix is positive guidance — give the model the actual words to reach for — rather than relying on a prohibition.

Section content: three approved redirect patterns (acknowledge → describe Protocol Raw → return decision to visitor), explicit lists of phrases never to use in both directions, and a "hold the line" template for customers who push for an answer. 798 tokens. Three em dashes in the original spec substituted (matched-pair → parentheses, quoted reply → full-stop split).

Smoke tests (5 test groups, 18 chat calls including a 2-turn pushy follow-up):

  • Tests A, B, C, E: retrieval fired correctly (15/18). When scenario_competitor_quality was retrieved, the model used the approved phrasings near-verbatim ("I can't speak for other brands, but what Protocol Raw Complete is built to do is..."; "I can't answer that for you. It would mean characterising their product..."). The v3.6.4 T2 failure mode is fixed for direct competitor-quality questions.
  • Test D ("Are the other raw dog food brands any good?") FAILED retrieval 0/3 in v1. The Keywords line covered brand-name-direct phrasings but missed category-level forms.

Strengthened from v1 → v2 within the same PR before merge (the strengthening shipped inside v3.6.5 rather than as its own release):

  • Keywords line: +24 phrases covering category, abstract, comparative, and industry-trust framings.
  • Opening sentence expanded to explicitly name category and industry-level references as in scope.
  • No content change to the three redirect patterns, "never say" lists, or pushy-follow-up handling.

Re-tested (7 queries × 3 samples + 2-turn pushy follow-up = 21 calls):

  • v3.6.5 v1 Test D failure case ("Are the other raw dog food brands any good?") now retrieves at similarity 0.534 (previously 0). Behaviour: model uses the approved "I can't speak for other brands..." redirect 3/3.
  • Direct brand quality, abstract competition framing, comparative recommendation, and pushy follow-up: 18/18 retrieval, all PASS.
  • Em dashes: 0/24 across all v2 samples. Length: marginal overshoots persist on the 100-word target, no new regressions.

One structural retrieval gap remains, deferred to a follow-up PR:

The query "Can I trust the raw dog food industry in general?" does not retrieve scenario_competitor_quality even after a v3 attempt that added 13 industry-trust keywords. The query's embedding consistently lands closer to scenario_vet_concerns (similarity 0.536) and product_ingredients (0.524) — both legitimately high-similarity for a "trust" framing. Behaviour without the section: model produces category-level negative characterisations ("No. Most raw dog food is not independently verified..."), exactly the failure mode the section is designed to prevent.

This is a structural gap, not a keyword gap: the OpenAI text-embedding-3-small model does not separate "trust the industry" (which competitor_quality should handle) from "trust the food category for safety" (which vet_concerns handles). v3 was rolled back because adding more industry-trust keywords did not lift retrieval above competing sections.
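
The ranking underlying this gap is plain cosine similarity between the query embedding and each section embedding; when two sections embed near-equally close to a query, keyword additions to one section's text cannot reliably separate them. A minimal sketch (toy 3-dimension vectors standing in for the 1,536-dimension text-embedding-3-small embeddings; values invented):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # The metric behind pgvector retrieval (its <=> operator is
    # cosine distance, i.e. 1 - cosine similarity).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.4]               # "Can I trust the raw dog food industry?"
vet_concerns = [0.8, 0.2, 0.5]        # toy stand-in embeddings
competitor_quality = [0.5, 0.7, 0.1]
# Whichever section embeds closer wins retrieval, regardless of keyword density.
assert cosine_similarity(query, vet_concerns) > cosine_similarity(query, competitor_quality)
```

This is why the v3 keyword expansion was rolled back: the fix has to live at the section that actually wins the ranking, or at an output-level boundary that fires regardless of retrieval.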

Recommended follow-up (separate PR): either (a) extend hard_boundaries v14 to explicitly ban negative characterisation of the raw dog food category from the inside, OR (b) add an instruction block to scenario_vet_concerns to redirect industry-trust questions without characterising the category, OR (c) accept this specific query type and move on. None are in scope for v3.6.5.

v3.6.6 (2026-04-19): Industry-trust query handling (combined Option 1 + Option 2)

Closes the Test D gap from v3.6.5 strengthening. The "Can I trust the raw dog food industry in general?" query is a semantic hybrid (framed as trust/safety, about the raw category as a whole). v3.6.5 v3 attempt to fix via keyword expansion in scenario_competitor_quality was rolled back because the query embeds closer to scenario_vet_concerns (0.536) than to scenario_competitor_quality regardless of keyword density. This is a model-level structural limit of text-embedding-3-small, not a keyword problem.

Combined Option 1 + Option 2 fix applies at two layers (belt-and-braces, same pattern as the v3.6.1 em dash rule):

  • hard_boundaries v14 → v15. New boundary line prohibiting self-characterisation of the raw dog food category from the inside, in either direction. "The raw industry has problems" and "raw feeding is generally safe" are both off-voice; Protocol Raw describes Protocol Raw, not the category it belongs to. This is the output-level backstop — fires regardless of which section retrieves.
  • scenario_vet_concerns v4 → v5. New instruction block handling industry-trust query shapes with specific redirect language. If the query is about raw feeding as a category rather than the customer's own vet situation, Kai now has explicit guidance and example phrasings (including the sample batch URL) to redirect to Protocol Raw's specific approach. This is the retrieval-path fix — addresses the section that actually fires for these queries.

Smoke tests (5 queries × 3 samples = 15 calls):

  • Tests A-D (industry-trust queries: "Can I trust the raw dog food industry in general?", "Is raw feeding safe as a category?", "Are raw dog food brands generally reliable?", "I've heard the raw dog food industry has quality problems. Is that true?"): retrieval landed on scenario_vet_concerns at similarity 0.514-0.604 (as expected). Model used the verbatim approved redirect from the new v5 block in 12/12 samples: "I can't speak for the category or for other brands. What I can tell you is what Protocol Raw does: every batch is independently tested for pathogens by a UKAS-accredited lab before it ships, and the full nutritional analysis is independently lab-confirmed and publicly published..."
  • Test E (v3.6.5 regression check on "Is Butternut Box a good product?"): scenario_competitor_quality still retrieved at 0.368, v3.6.5 behaviour preserved.
  • Em dash regression: 0/15 across all 15 samples. Length: 62-73 words on a 100-word target.

Result: 15/15 PASS. The v3.6.5 Test D failure mode is closed. Industry characterisation in either direction does not appear in any of the 12 industry-trust samples.

Note: All 12 industry-trust responses converged on near-identical text (the model is using the new instruction block phrasing near-verbatim). Acceptable consistency for a guarded query type, worth watching for naturalness as live traffic begins.

No other sections changed. scenario_competitor_quality v2 unchanged.


Key Changes in v3.5

  • safety_verification v10: Added proof.protocolraw.co.uk/batch/sample URL + linking instruction ("share this link when someone asks about safety, testing, or verification — let the page speak for itself"). Future-tense framing replaced with present-tense (sample page is live). Removed standalone "Upstream testing" block. Tightened "Nutritional verification" boundary: nutritional data does not appear on batch reports or the Proof Portal — explicitly forbidden in the section.

Key Changes in v3.4

  • email_format v6: Exclamation mark rule softened from prohibition to "sparingly permitted." Max one per email, never in subject lines, never doubled. Reserve for genuine warmth.
  • brand_kernel v2: Exclamation mark rule updated to match (rarely, once per email, where warmth is earned).
  • hard_boundaries v11: Exclamation mark rule updated from "never" to "max one per email, never in subject lines."
  • email_context v11: Exclamation mark prohibition replaced with "sparingly permitted" rule. Bad example annotation updated.

Key Changes in v3.3

  • email_context v10: Complete email tone rewrite. Greeting now in AI draft (not CIO template). Name extraction from email sign-off. Prospect vs customer handling. Single recipe framing rules. Context-aware closing (next step + invitation on first reply, natural close on follow-ups). Clinical detail proportionality. "I" not "we" reinforced. Short paragraphs for multi-question emails.
  • email_format v5: JSON schema updated with suggested_subject and customer_name fields. "No greeting" rule removed (AI now handles greeting). "No sign-off" rule retained. Exclamation mark rule (sparingly permitted, max one per email, never in subject lines). Response length guidance removed (now in email_context).
  • hard_boundaries v10: Hallucination guard added (no meal counts, no inventing product specifics, direct to calculator). Single-SKU framing strengthened with specific phrases to avoid ("only", "unfortunately", "but the good news is", "there isn't a recipe swap option"). One-topic rule softened for multi-question emails.
  • Email prompt assembly: cs-agent-triage now uses fn_assemble_prompt('email') with prospect/customer context block passed in the user message. AI subject generation via suggested_subject field.

Key Changes in v3.2

  • Conversational chat upgrade. response_principles v19: 10 principles including message-type detection, one-idea-per-turn, warmth acknowledgment, natural next steps
  • chat_context v4 rewritten: dialogue as tool not goal, simplicity-first framing
  • chat_format v9: clarifying-question brevity rule added
  • hard_boundaries v8: 10 new boundaries from live testing: off-topic boundary, no parroting internal instructions, no unsolicited escalation, one-topic-per-response, no markdown formatting, cat questions not off-topic, no unprompted allergen warnings, no competitor characterisation, no hallucinated feeding amounts, no improvised transition advice
  • Consolidated 3 fussy eater sections (scenario_fussy_pre_purchase, scenario_fussy_transition, scenario_fussy_persistent) into single scenario_fussy_eater v2 with branching logic. Token reduction: 836 combined → 424
  • transition_guidance compressed from 850 to ~442 tokens. Day-by-day schedule removed (lives in article). Transition guide location: customer portal
  • delivery_logistics v3: Phase A milkround rewrite, courier-neutral, removed DPD
  • portal_track_delivery v2: DPD references removed, courier-neutral
  • product_fediaf_nutrition v5 and safety_verification v9: link instructions rebalanced ("include when directly relevant" not "never link")
  • scenario_vet_concerns v2: added vet article link
  • product_ingredients v4: added tray packaging specification
  • Chat model switched from GPT-4o-mini to Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
  • Chat Edge Function v24 with Anthropic Messages API (OpenAI embedding retained for retrieval)

Key Changes in v3.0

  • Tiered retrieval architecture replaces monolithic prompt assembly. Three layers: guardrails (always-on), behaviour kernel (always-on), retrieved knowledge (per-turn semantic search)
  • Token reduction: 63%. Chat prompt from ~9,400 fixed tokens to ~3,500 dynamic tokens per turn. Email from ~8,700 to ~3,300.
  • Token-budgeted packing. Retrieve 8 candidates via pgvector cosine similarity, pack until token budget (1,500 chat, 2,000 email). CONTINUE on overflow, not STOP.
  • Per-substantive-turn retrieval. Knowledge context adapts per message. Trivial turns ("thanks", "ok") reuse cached retrieval.
  • PostgreSQL full-text search fallback. Generated tsvector column with GIN index replaces naive ILIKE keyword matching.
  • 4 legacy monolithic sections deactivated. brand_identity, escalation_rules, product_knowledge, common_scenarios replaced by 14 compressed/split children.
  • knowledge_role column classifies sections as guardrail, behaviour, or retrieved
  • channels column on sections declares retrieval eligibility per channel, replacing template-based gating for retrieved sections
  • Prompt templates narrowed to core assembly ordering only (guardrail + behaviour sections)
  • Retrieval telemetry via kb_retrieval_logs table. Per-turn logging of retrieved sections, similarity scores, token counts, latencies.
  • assemble-email-prompt Edge Function for email channel. Make.com calls this instead of fn_assemble_prompt.
  • Chat Edge Function v23 with USE_TIERED_RETRIEVAL feature flag (currently enabled)
  • fn_assemble_prompt deprecated but still functional. Returns ~2,097 tokens (core sections only, via the old template-based assembly). Not equivalent to fn_assemble_core_prompt which returns ~2,054 tokens — the difference is template ordering and legacy template entries still present.

Purpose

Centralised knowledge base that stores all AI prompt content in versioned, structured sections. The chat widget (CS-02) and email triage system (CS-01) assemble their prompts from this single source of truth using a tiered retrieval architecture.

Key achievement: A single content update propagates to both channels. The tiered architecture ensures only relevant knowledge is included per conversation turn, keeping prompts focused and allowing the KB to grow without prompt bloat.


Scope

This SOP covers the knowledge base schema, content architecture, prompt assembly functions, retrieval logic, telemetry, and content update workflows.

Related SOPs:

  • SOP CS-01: AI Customer Service Triage (Email). Consumes the email prompt via the assemble-email-prompt Edge Function.
  • SOP CS-02: Live Chat System. Consumes the chat prompt via tiered retrieval in the chat Edge Function.
  • SOP CS-03: Autonomous Support Agent. Uses fn_search_kb_hybrid for KB search (separate retrieval path).
  • SOP MON-01: Monitoring Architecture.


Architecture

Tiered Prompt Assembly

                    Customer Message
                ┌─────────┴──────────┐
                │  Trivial turn?     │
                │  (< 4 words, no    │
                │   domain terms)    │
                └─────────┬──────────┘
                    │           │
                   YES          NO
                    │           │
              Reuse last    ┌───┴───────────────────┐
              system msg    │                       │
                    │       ▼                       ▼
                    │   Embed message         Fetch core prompt
                    │   (text-embedding-      (fn_assemble_core_prompt)
                    │    3-small, ~50ms)       (~2,054 tokens chat)
                    │       │                       │
                    │       ▼                       │
                    │   Retrieve knowledge          │
                    │   (fn_retrieve_knowledge      │
                    │    budget: 1,500 chat          │
                    │    budget: 2,000 email)        │
                    │       │                       │
                    │       ▼                       │
                    │   Assemble:                   │
                    │   core + retrieved            │
                    │       │◄──────────────────────┘
                    │       │
                    ▼       ▼
              ┌─────────────────────┐
              │  System Prompt      │
              │  (~3,500 tokens)    │
              └─────────┬───────────┘
              ┌─────────────────────┐
              │  Claude Sonnet 4.5  │
              │  (chat)             │
              │  Claude Sonnet      │
              │  (email via Make)   │
              └─────────┬───────────┘
              ┌─────────────────────┐
              │  Log to             │
              │  kb_retrieval_logs  │
              └─────────────────────┘
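
The trivial-turn gate at the top of the diagram can be sketched as follows. This is an illustrative sketch of the rule as stated (under 4 words, no domain terms), not the actual Edge Function code, and the domain-term list here is an invented subset:

```python
def is_trivial_turn(message: str, domain_terms: set[str]) -> bool:
    # Trivial turns ("thanks", "ok") skip embedding and retrieval
    # entirely and reuse the last assembled system message.
    words = message.lower().split()
    if len(words) >= 4:
        return False
    return not any(w.strip(".,!?") in domain_terms for w in words)

DOMAIN_TERMS = {"fediaf", "raw", "delivery", "price"}  # illustrative subset

assert is_trivial_turn("thanks", DOMAIN_TERMS)
assert is_trivial_turn("ok great", DOMAIN_TERMS)
assert not is_trivial_turn("what about delivery", DOMAIN_TERMS)  # domain term present
```

Skipping retrieval on these turns avoids paying the ~50ms embedding call and a database round trip for messages that carry no new information need.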

Three-Layer Prompt Structure

| Layer | Role | Content | Chat Tokens | Email Tokens |
|---|---|---|---|---|
| Guardrails | Always-on | Hard boundaries, core escalation policy | ~1,365 | ~1,365 |
| Behaviour Kernel | Always-on | Brand voice, response principles, channel rules | ~1,490 | ~949 |
| Retrieved Knowledge | Per-turn | Facts, scenarios, workflows pulled via pgvector | ~1,200-1,600 | ~1,500-2,000 |
| Total | | | ~4,050-4,450 | ~3,810-4,310 |

Responsibility Split

| Mechanism | Responsibility |
|---|---|
| ai_prompt_templates | Core assembly ordering and wrappers (guardrail + behaviour sections only) |
| channels text[] column | Retrieval eligibility per channel (retrieved sections only) |
| knowledge_role column | Determines which assembly path a section belongs to |
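
The split can be sketched as a single routing decision. This is illustrative (the real gating happens in SQL, and the field names mirror the columns described below):

```python
def assembly_path(section: dict, channel: str) -> str:
    """Route a section to its assembly mechanism per the
    responsibility split above."""
    role = section["knowledge_role"]
    if role in ("guardrail", "behaviour"):
        return "core"  # ordered and wrapped by ai_prompt_templates
    if role == "retrieved" and channel in section["channels"]:
        return "retrieval-eligible"  # gated by the channels[] column
    return "excluded"  # legacy, or wrong channel

assert assembly_path({"knowledge_role": "guardrail", "channels": []}, "chat") == "core"
assert assembly_path({"knowledge_role": "retrieved", "channels": ["chat", "email"]}, "chat") == "retrieval-eligible"
assert assembly_path({"knowledge_role": "legacy", "channels": []}, "chat") == "excluded"
```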

Database Schema

Table: ai_knowledge_sections

Stores all content, versioned.

| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| section_key | TEXT | Unique identifier (e.g., 'brand_kernel') |
| version | INTEGER | Version number within section |
| title | TEXT | Display title |
| content | TEXT | The actual prompt content |
| is_active | BOOLEAN | Only one version per section_key can be active |
| knowledge_role | TEXT | 'guardrail', 'behaviour', 'retrieved', or 'legacy' |
| parent_section_key | TEXT | Parent section when split from a larger section |
| token_count | INTEGER | Cached token estimate (LENGTH/4) |
| channels | TEXT[] | Which channels can retrieve this section. Default: '{chat,email}' |
| embedding | vector(1536) | OpenAI text-embedding-3-small embedding (auto-generated) |
| search_doc | tsvector | Generated FTS column for keyword fallback |
| created_at | TIMESTAMPTZ | When this version was created |
| created_by | TEXT | Who created it |
| notes | TEXT | Change notes |

Constraints:

  • unique_section_version: no duplicate version numbers per section
  • unique_active_section: partial unique index enforcing one active version per section_key
  • knowledge_role CHECK: must be one of 'guardrail', 'behaviour', 'retrieved', 'legacy'
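
The cached token_count column uses the LENGTH/4 heuristic (roughly one token per four characters of English). A trivial sketch of the estimate:

```python
def estimate_tokens(content: str) -> int:
    # Mirrors the token_count column's cached estimate: LENGTH/4.
    # A heuristic, not a real tokenizer; good enough for budget packing.
    return len(content) // 4

assert estimate_tokens("a" * 400) == 100
```

Because the same heuristic feeds both section token counts and the retrieval budget, the packing arithmetic stays internally consistent even though the estimates are approximate.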

Generated column:

search_doc tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(section_key, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'B')
) STORED

Indexes:

  • idx_kb_sections_embedding: IVFFlat index for pgvector cosine similarity
  • idx_kb_sections_search_doc: GIN index for full-text search fallback

Table: ai_prompt_templates

Defines ordering and wrappers for core sections only (guardrail + behaviour).

| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| channel | TEXT | 'email' or 'chat' |
| section_key | TEXT | References section to include |
| section_order | INTEGER | Assembly order (10, 20, 30...) |
| is_included | BOOLEAN | Whether to include this section |
| wrapper_prefix | TEXT | Optional text before section |
| wrapper_suffix | TEXT | Optional text after section |

Note: Retrieved sections do NOT use this table. They declare channel eligibility via the channels column on ai_knowledge_sections.

Table: kb_retrieval_logs

Per-turn retrieval telemetry.

| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| channel | TEXT | 'chat' or 'email' |
| session_id | TEXT | Chat session or ticket ID |
| user_message_preview | TEXT | First 200 chars of user message |
| retrieved_sections | JSONB | Array of retrieved section entries (section key, similarity score, token count) |
| total_retrieved_tokens | INTEGER | Sum of retrieved section tokens |
| core_tokens | INTEGER | Core prompt token count |
| was_trivial_turn | BOOLEAN | Whether retrieval was skipped (cached) |
| fallback_used | BOOLEAN | Whether FTS fallback was used |
| candidate_count | INTEGER | Sections evaluated |
| returned_count | INTEGER | Sections included in prompt |
| skipped_for_budget_count | INTEGER | Sections skipped for token budget |
| empty_retrieval | BOOLEAN | No relevant sections found |
| embedding_latency_ms | INTEGER | OpenAI embedding call time |
| retrieval_latency_ms | INTEGER | fn_retrieve_knowledge call time |
| created_at | TIMESTAMPTZ | When this turn occurred |

Table: ai_prompt_logs

Audit trail of assembled prompts (legacy, from v2.0). Schema retained for backward compatibility.


Current Sections

Guardrail Sections (always-on, every turn)

| section_key | Tokens | Title | Purpose |
|---|---|---|---|
| hard_boundaries | ~1,336 | HARD BOUNDARIES | 30+ prohibitions (v15) including: hallucination guard (no meal counts, no inventing product specifics), strengthened single-SKU framing (specific phrases banned), no hallucinated feeding amounts, no improvised transition advice, no unsolicited escalation, one-topic-per-response (relaxed for multi-question emails), no markdown formatting, no unprompted allergens, symmetric competitor boundary (no characterisation in either direction), no category self-characterisation, FEDIAF claim-anchor ban, evolutionary framing ban, em dash substitution guidance, off-topic boundary (with cat question exception) |
| escalation_core | 211 | ESCALATION POLICY | Machine-parsed [ESCALATE] signal format, judgment standard, email collection, harmful-uncertainty rule, no dead-end language |

Behaviour Sections (always-on, every turn)

| section_key | Tokens | Title | Channels | Purpose |
|---|---|---|---|---|
| brand_kernel | 442 | BRAND KERNEL | Both | Compressed brand voice (v4): calm/expert/reassuring/systematic/affirming, terminology, simplicity-before-safety, Worry Goes Away Test, Phase A outcomes language |
| response_principles | ~356 | RESPONSE PRINCIPLES | Both | 10 principles (v20): message-type detection, pivot to proof, match register, warmth acknowledgment, no volunteered negatives, end clean, stay in lane, suitability confidence, conversation context, frame constraints, price justified by what Protocol Raw contains |
| chat_context | ~292 | CHAT CONTEXT | Chat | Dialogue as tool not goal, simplicity-first framing, pre-launch context, waitlist guidance |
| chat_format | ~400 | CHAT FORMAT | Chat | Plain text only, 60-100 word guidance (v10, replacing the 2-3 paragraph rule), clarifying-question brevity rule, no filler sign-offs |
| email_context | ~1,802 | EMAIL CONTEXT | Email | Email tone and style rules (v11): greeting with name, prospect vs customer handling, single recipe framing, context-aware closing, clinical detail proportionality, name extraction, "I" not "we", exclamation marks sparingly permitted |
| email_format | ~562 | EMAIL FORMAT | Email | JSON output specification (v6): category, confidence, draft_response, action, escalation_reason, suggested_subject, customer_name. No sign-off rule. |
Retrieved Sections (pulled per substantive turn via semantic search)

| section_key | Tokens | Parent | Title |
|---|---|---|---|
| product_pricing | 253 | product_knowledge | Pricing and value |
| product_ingredients | ~421 | product_knowledge | Ingredients, nutrition basics, tray packaging specification |
| product_single_sku | 443 | product_knowledge | Single recipe philosophy (6 angles) |
| product_fediaf_nutrition | ~589 | product_knowledge | FEDIAF completeness, life stage suitability, article links (when directly relevant) |
| product_competitor_env | 243 | product_knowledge | Competitor context and environmental position |
| escalation_playbooks | 189 | escalation_rules | When to escalate / when not to |
| scenario_transition_digestive | 52 | common_scenarios | Loose stool, reduced water intake |
| scenario_fussy_eater | ~424 | common_scenarios | Consolidated picky eater with branching logic: pre-purchase, first box (days 1-10), persistent refusal with escalation |
| scenario_allergy_fit | 51 | common_scenarios | Allergy and dietary fit |
| scenario_vet_concerns | ~210 | common_scenarios | Vet pushback handling, article link, industry-trust redirect block (v5) |
| scenario_pre_launch | 51 | common_scenarios | Pre-launch availability, London only |
| scenario_competitor_quality | ~798 | | Competitor quality redirects (v2): approved redirect patterns, never-say lists in both directions, pushy follow-up template |
| transition_guidance | ~442 | | 7-day overview (day-by-day schedule removed, lives in article), warning signs, transition guide in customer portal |
| feeding_calculations | 616 | | RER/MER, reference examples, calculator link |
| safety_verification | ~569 | | Batch testing, proof portal, sample batch report link + linking instruction, failed batch protocol, article links (when directly relevant). v11. |
| storage_serving | 189 | | Freezer, defrost, shelf life |
| delivery_logistics | ~220 | | Phase A milkround, courier-neutral, London postcodes, free delivery |
| self_service_portal | 293 | | Portal features, 48-hour lock, magic link |
| portal_track_delivery | ~137 | | Tracking in customer portal, courier-neutral |
| 11 other portal_* sections | ~1,560 total | | Individual portal action guides |

Legacy Sections (deactivated)

| section_key | Status | Replaced By |
| --- | --- | --- |
| brand_identity | Deactivated | brand_kernel |
| escalation_rules | Deactivated | escalation_core + escalation_playbooks |
| product_knowledge | Deactivated | product_pricing, product_ingredients, product_single_sku, product_fediaf_nutrition, product_competitor_env |
| common_scenarios | Deactivated | scenario_transition_digestive, scenario_fussy_eater, scenario_allergy_fit, scenario_vet_concerns, scenario_pre_launch |
| scenario_fussy_pre_purchase | Deactivated | Consolidated into scenario_fussy_eater v2 (branching logic) |
| scenario_fussy_transition | Deactivated | Consolidated into scenario_fussy_eater v2 (branching logic) |
| scenario_fussy_persistent | Deactivated | Consolidated into scenario_fussy_eater v2 (branching logic) |

Functions

fn_assemble_core_prompt(p_channel)

Assembles guardrail + behaviour sections using ai_prompt_templates for ordering.

Parameters: p_channel TEXT ('chat' or 'email')

Returns: prompt TEXT, section_versions JSONB, token_estimate INTEGER

Used by: Chat Edge Function, assemble-email-prompt Edge Function

SELECT prompt, section_versions, token_estimate
FROM raw_ops.fn_assemble_core_prompt('chat');
-- Returns ~2,850 tokens (chat) or ~1,667 tokens (email)

fn_retrieve_knowledge(p_channel, p_query_embedding, p_query_text, p_token_budget, p_candidate_limit, p_similarity_threshold)

Retrieves relevant knowledge sections via pgvector semantic search with FTS fallback.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| p_channel | TEXT | required | 'chat' or 'email' |
| p_query_embedding | vector(1536) | required | OpenAI embedding of the query |
| p_query_text | TEXT | required | Original query text (for FTS fallback) |
| p_token_budget | INTEGER | 1500 | Maximum total retrieved tokens |
| p_candidate_limit | INTEGER | 8 | Max candidates to evaluate |
| p_similarity_threshold | NUMERIC | 0.25 | Minimum cosine similarity |

Returns: section_key, title, content, similarity, token_count, match_type

Packing logic:

- Candidates ordered by similarity DESC, then token_count ASC (prefer smaller on ties)
- CONTINUE on budget overflow (don't STOP: smaller chunks after a large one may still fit)
- FTS fallback fires only if semantic search returns zero results
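The packing behaviour can be sketched as a small pure function. This is a minimal illustration of the ordering and CONTINUE-on-overflow rules, not the actual SQL inside fn_retrieve_knowledge; the type and function names are illustrative.

```typescript
// Illustrative sketch of the packer: order by similarity DESC, token_count ASC,
// then greedily pack, skipping (not stopping at) candidates that overflow.
type Candidate = { sectionKey: string; similarity: number; tokenCount: number };

function packSections(candidates: Candidate[], tokenBudget: number): Candidate[] {
  const ordered = [...candidates].sort(
    (a, b) => b.similarity - a.similarity || a.tokenCount - b.tokenCount
  );
  const packed: Candidate[] = [];
  let used = 0;
  for (const c of ordered) {
    // CONTINUE on overflow: a smaller chunk later may still fit
    if (used + c.tokenCount > tokenBudget) continue;
    packed.push(c);
    used += c.tokenCount;
  }
  return packed;
}
```

With a 1,500-token budget, a 1,200-token top match does not block a 250-token third match even when the 600-token second match no longer fits.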

Used by: Chat Edge Function, assemble-email-prompt Edge Function

fn_assemble_prompt(p_channel) [DEPRECATED]

Legacy monolithic assembly. Still functional but returns only core sections (~2,097 tokens). Retained for backward compatibility. Do not use for new integrations.

fn_add_section_version(...) / fn_activate_section_version(...)

Unchanged from v2.0. Used for all content updates.


Integration Points

Chat (Edge Function v24)

The chat Edge Function performs tiered retrieval internally:

  1. Trivial turn check: reuse the last system message if the turn is trivial
  2. Fetch core prompt via fn_assemble_core_prompt('chat')
  3. Embed the user message via OpenAI text-embedding-3-small
  4. Retrieve knowledge via fn_retrieve_knowledge('chat', embedding, text, 1500)
  5. Assemble the system prompt: core + retrieved
  6. Send to Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) via the Anthropic Messages API with conversation history. The system prompt is passed as the top-level system parameter (not as a message with role: system). Streaming uses the Anthropic SSE format (content_block_delta events).
  7. Log to kb_retrieval_logs
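The assembly portion of this flow (core fetch, embed, retrieve, concatenate) can be sketched with injected dependencies. This is a simplified sketch, not the Edge Function source; the dependency functions stand in for fn_assemble_core_prompt, the OpenAI embedding call, and fn_retrieve_knowledge, and all names are assumptions.

```typescript
// Sketch of tiered prompt assembly. Each dependency stands in for a real call:
// corePrompt ~ fn_assemble_core_prompt, embed ~ text-embedding-3-small,
// retrieve ~ fn_retrieve_knowledge. Names are illustrative.
type Deps = {
  corePrompt: () => string;
  embed: (text: string) => number[];
  retrieve: (embedding: number[], text: string, budget: number) => string[];
};

function assembleSystemPrompt(userMessage: string, deps: Deps): string {
  const core = deps.corePrompt();                                // fetch core prompt
  const embedding = deps.embed(userMessage);                     // embed the message
  const sections = deps.retrieve(embedding, userMessage, 1500);  // retrieve within budget
  return [core, ...sections].join('\n\n');                       // core + retrieved
}
```

The resulting string is what gets passed as the top-level system parameter.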

Models: Chat completion: Anthropic Claude Sonnet 4.5. Embedding: OpenAI text-embedding-3-small (unchanged).

Feature flag: USE_TIERED_RETRIEVAL env var. When false, falls back to fn_assemble_prompt('chat').

Email (assemble-email-prompt Edge Function)

Make.com calls this Edge Function instead of fn_assemble_prompt('email'):

POST /assemble-email-prompt
Headers: x-internal-secret: [AGENT_INTERNAL_SECRET]
Body: { "email_body": "...", "email_subject": "..." }
Response: { "prompt": "...", "token_estimate": 3338, "retrieved_sections": [...] }

The Edge Function embeds the email text, retrieves relevant knowledge (budget: 2,000 tokens), assembles core + retrieved, and returns the complete system prompt.

Make.com module change: Replace Module 15's Supabase RPC call to fn_assemble_prompt('email') with an HTTP POST to this Edge Function. Use {{data.prompt}} as the system message.
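For reference, the request Make.com's HTTP module must reproduce can be expressed as a small builder. This is a hypothetical helper for illustration only; the endpoint path, header name, and body fields come from the spec above, while the function name and base URL are assumptions.

```typescript
// Hypothetical builder for the assemble-email-prompt request.
// Endpoint, header, and body fields mirror the documented contract.
function buildAssembleEmailRequest(
  baseUrl: string,
  secret: string,
  emailBody: string,
  emailSubject: string
): { url: string; method: string; headers: Record<string, string>; body: string } {
  return {
    url: `${baseUrl}/assemble-email-prompt`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-internal-secret': secret, // AGENT_INTERNAL_SECRET
    },
    body: JSON.stringify({ email_body: emailBody, email_subject: emailSubject }),
  };
}
```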

CS-03 Agent (unchanged)

The autonomous support agent continues to use fn_search_kb_hybrid directly for its own knowledge search. This is a separate retrieval path that is unaffected by the v3.0 changes.


Content Update Workflow

Adding New Knowledge

  1. Create a new section via fn_add_section_version with p_activate = true
  2. Set knowledge_role = 'retrieved', channels = '{chat,email}', parent_section_key if applicable
  3. token_count auto-populates via trigger on insert/update. Verify: SELECT token_count FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;
  4. Embedding auto-generates via trigger. Verify: SELECT embedding IS NOT NULL FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;
  5. Test retrieval: embed a test query and call fn_retrieve_knowledge to verify the new section is discoverable

Updating Existing Content

Same as v2.0: create new version via fn_add_section_version, activate via fn_activate_section_version. Embedding and token_count auto-regenerate via triggers.

Updating Core Sections

Changes to guardrail or behaviour sections take effect on the next turn for all conversations (core prompt is fetched per turn, not cached).

Required release gate for core section changes:

Any update to brand_kernel, response_principles, or hard_boundaries must pass a manual quality check before activation:

  1. Select 10-20 representative customer prompts (a mix of product questions, concerns, technical questions, and edge cases)
  2. Generate responses using the new core version + tiered retrieval
  3. Blind review against the Brand Voice rubric:
     - Calm, simple, dog-first, affirming?
     - Leads with feeling, not system?
     - Simplicity before safety on first impressions?
     - No movement language, no hype, no internal jargon?
     - Worry Goes Away Test: does the response make things feel calmer?
  4. Sign-off required before activation. If any prompt fails on simplicity, warmth, or correctness, revise the core section and re-test.

This gate is manual at current scale. Automated model-judge evaluation is a deferred item for when review volume exceeds manual capacity.

Determining the Fix Type

| If the issue is... | Fix approach |
| --- | --- |
| AI doesn't know a fact | Add or update a retrieved section |
| AI gives wrong tone | Update brand_kernel |
| AI handles a category poorly | Update response_principles |
| AI escalates incorrectly | Update escalation_core or escalation_playbooks |
| AI does something forbidden | Update hard_boundaries |
| AI uses wrong terminology | Update brand_kernel (terminology section) |
| AI doesn't retrieve the right knowledge | Add a keyword header to the target section's content |
| AI retrieves irrelevant sections | Check the section content for ambiguous vocabulary |

Principles

  1. Prefer principles over scenarios. Add a principle that covers a category, not a specific Q&A.
  2. Keep retrieved sections atomic. One topic, one decision, one use case. Target ~120-400 tokens per retrieved section.
  3. Add keyword headers for discoverability. If a section isn't being retrieved for relevant queries, prepend a Keywords: line with customer vocabulary.
  4. Deactivate rather than delete. Old versions are preserved in the version history for rollback.

Monitoring

Telemetry Queries

-- Retrieval health dashboard (last 7 days)
SELECT 
    channel,
    COUNT(*) as total_turns,
    AVG(total_retrieved_tokens)::INTEGER as avg_retrieved_tokens,
    AVG(core_tokens + total_retrieved_tokens)::INTEGER as avg_total_prompt,
    ROUND(100.0 * COUNT(*) FILTER (WHERE empty_retrieval) / COUNT(*), 1) as empty_pct,
    ROUND(100.0 * COUNT(*) FILTER (WHERE fallback_used) / COUNT(*), 1) as fallback_pct,
    ROUND(100.0 * COUNT(*) FILTER (WHERE was_trivial_turn) / COUNT(*), 1) as trivial_cache_pct,
    AVG(embedding_latency_ms) FILTER (WHERE NOT was_trivial_turn)::INTEGER as avg_embed_ms,
    AVG(retrieval_latency_ms) FILTER (WHERE NOT was_trivial_turn)::INTEGER as avg_retrieval_ms
FROM raw_ops.kb_retrieval_logs
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY channel;

-- Most retrieved sections
SELECT 
    section->>'section_key' as section_key,
    COUNT(*) as retrieval_count,
    AVG((section->>'similarity')::numeric)::NUMERIC(4,3) as avg_similarity
FROM raw_ops.kb_retrieval_logs,
    jsonb_array_elements(retrieved_sections) as section
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY section->>'section_key'
ORDER BY retrieval_count DESC;

-- Never-retrieved sections (candidates for review)
SELECT aks.section_key, aks.token_count
FROM raw_ops.ai_knowledge_sections aks
WHERE aks.is_active = true AND aks.knowledge_role = 'retrieved'
AND NOT EXISTS (
    SELECT 1 FROM raw_ops.kb_retrieval_logs krl,
        jsonb_array_elements(krl.retrieved_sections) as s
    WHERE s->>'section_key' = aks.section_key
    AND krl.created_at > NOW() - INTERVAL '30 days'
);

Monitoring Targets

| Metric | Target | Alert If |
| --- | --- | --- |
| Empty retrieval rate | < 5% | > 10% |
| FTS fallback rate | < 10% | > 20% |
| Avg retrieved tokens per turn | 800-1,500 | > 2,000 or < 400 |
| Trivial turn cache hit rate | 15-30% | < 5% or > 50% |
| p95 total prompt tokens | < 5,500 | > 6,500 |
| Avg embedding latency | < 100ms | > 200ms |
| Avg retrieval latency | < 50ms | > 100ms |
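The "Alert If" column can be encoded as a simple threshold check, for example in a scheduled job that reads the telemetry query output. This is a sketch; the metric field names are assumptions, and only the threshold values come from the table above.

```typescript
// Threshold check over the Monitoring Targets table. Field names are
// illustrative; thresholds mirror the "Alert If" column.
type Metrics = {
  emptyRetrievalPct: number;
  ftsFallbackPct: number;
  avgRetrievedTokens: number;
  trivialCacheHitPct: number;
  p95PromptTokens: number;
  avgEmbedMs: number;
  avgRetrievalMs: number;
};

function breachedAlerts(m: Metrics): string[] {
  const checks: [string, boolean][] = [
    ['empty_retrieval_rate', m.emptyRetrievalPct > 10],
    ['fts_fallback_rate', m.ftsFallbackPct > 20],
    ['avg_retrieved_tokens', m.avgRetrievedTokens > 2000 || m.avgRetrievedTokens < 400],
    ['trivial_cache_hit_rate', m.trivialCacheHitPct < 5 || m.trivialCacheHitPct > 50],
    ['p95_prompt_tokens', m.p95PromptTokens > 6500],
    ['avg_embedding_latency', m.avgEmbedMs > 200],
    ['avg_retrieval_latency', m.avgRetrievalMs > 100],
  ];
  return checks.filter(([, breached]) => breached).map(([name]) => name);
}
```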

Trivial Turn Detection

Implemented in the chat Edge Function. Skips embedding and retrieval for short, non-substantive messages.

Rules:

- Message must be ≤ 4 words
- Must not contain domain terms: order, delivery, box, puppy, dog, gram, price, cost, allergy, defrost, batch, portal, cancel, pause, feed, food, recipe, vet, subscription, payment, refund, tracking, stool, sick
- Must not contain wh-words: how, what, when, where, why, which, who
- Must match a trivial pattern anchored at start AND end: ok, thanks, thank you, got it, great, perfect, cool, cheers, ta, lovely, brilliant, nice, sure, right, yes, no, yep, nope, alright, understood, noted, will do, good to know, that makes sense, i see, appreciate it
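The rules above can be sketched as a predicate. The word lists are copied from the rules; the normalisation (lowercasing, stripping trailing punctuation) is an assumption about how the Edge Function treats input, not a documented detail.

```typescript
// Sketch of the trivial-turn check. Lists mirror the documented rules;
// normalisation details are assumed.
const DOMAIN_TERMS = ['order','delivery','box','puppy','dog','gram','price','cost',
  'allergy','defrost','batch','portal','cancel','pause','feed','food','recipe','vet',
  'subscription','payment','refund','tracking','stool','sick'];
const WH_WORDS = ['how','what','when','where','why','which','who'];
const TRIVIAL_PATTERNS = ['ok','thanks','thank you','got it','great','perfect','cool',
  'cheers','ta','lovely','brilliant','nice','sure','right','yes','no','yep','nope',
  'alright','understood','noted','will do','good to know','that makes sense','i see',
  'appreciate it'];

function isTrivialTurn(message: string): boolean {
  const text = message.trim().toLowerCase().replace(/[.!?]+$/, '');
  const words = text.split(/\s+/).filter(Boolean);
  if (words.length === 0 || words.length > 4) return false;        // ≤ 4 words
  if (words.some((w) => DOMAIN_TERMS.includes(w))) return false;   // no domain terms
  if (words.some((w) => WH_WORDS.includes(w))) return false;       // no wh-words
  return TRIVIAL_PATTERNS.includes(text);                          // anchored start AND end
}
```

Note that a bare "cancel" fails twice: it is a domain term, and it matches no trivial pattern, so retrieval still runs.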


Troubleshooting

Section Not Being Retrieved

  1. Check embedding exists: SELECT embedding IS NOT NULL FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;
  2. Check knowledge_role is 'retrieved': SELECT knowledge_role FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;
  3. Check channels includes the target channel: SELECT channels FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;
  4. Test with a direct query: embed a test prompt and call fn_retrieve_knowledge directly
  5. If the section doesn't appear in results, diagnose:
     - Similarity below 0.25 and the query uses different vocabulary than the section: add a keyword header to the section content with customer-facing synonyms, then re-embed.
     - Similarity below 0.25 but the query vocabulary matches and the section is very short (< 120 tokens): the section may be too sparse for good embedding quality; expand it with more context or merge it with a related section.
     - Similarity above 0.25 but the section is excluded by the token budget: a larger, higher-similarity section consumed the budget; check whether that larger section should be split, or increase the token budget.
     - Similarity above 0.25 but a nearby section is stealing the match: the two sections have overlapping vocabulary; differentiate their content or merge them.
Wrong Section Being Retrieved

  1. Check if the section content contains vocabulary that matches the query unintentionally
  2. Consider splitting the section into more atomic units
  3. Review the telemetry for that query pattern

Core Prompt Issues

  1. Verify core assembly: SELECT token_estimate FROM raw_ops.fn_assemble_core_prompt('chat');
  2. Check all core sections are in templates: query ai_prompt_templates for the channel
  3. Check no legacy sections are still included in templates

High Token Usage

  1. Check retrieval telemetry for oversized sections being retrieved
  2. Consider splitting large retrieved sections
  3. Adjust token budget if needed (chat: 1,500, email: 2,000)

Version History

| Version | Date | Changes | Author |
| --- | --- | --- | --- |
| 3.3 | 2026-04-08 | Email support overhaul. email_context v1→v10 (tone rewrite, prospect detection, single recipe framing, context-aware closing, name extraction, clinical detail proportionality). email_format v1→v5 (suggested_subject, customer_name fields, greeting in AI draft). hard_boundaries v8→v10 (hallucination guard, strengthened single-SKU framing). | Anton |
| 3.2 | 2026-04-06 | Conversational chat upgrade. response_principles v19 (10 principles: message-type detection, one-idea-per-turn, warmth acknowledgment). chat_context v4 (dialogue as tool). chat_format v9 (question brevity rule). hard_boundaries v8 (10 new boundaries from live testing). Consolidated 3 fussy eater sections into 1. Compressed transition_guidance 850→442 tokens. delivery_logistics milkround rewrite. Chat model switched from GPT-4o-mini to Claude Sonnet 4.5. Chat Edge Function v24 with Anthropic API. | Anton |
| 3.1 | 2026-04-05 | Three-branch split of scenario_fussy_eater (251 tokens) into scenario_fussy_pre_purchase (259), scenario_fussy_transition (299), and scenario_fussy_persistent (278). Pre-purchase leads with formulation-backed confidence; transition provides day-by-day coaching; persistent refusal escalates to human. Driven by market research: picky eating is #1 blocking category (25.2% of blocked buyers). Old section deactivated, added to legacy table. | Anton |
| 3.0 | 2026-04-04 | Tiered retrieval architecture. 63% token reduction. 14 new sections. fn_assemble_core_prompt + fn_retrieve_knowledge. kb_retrieval_logs telemetry. assemble-email-prompt Edge Function. Chat Edge Function v23. Legacy sections deactivated. | Anton |
| 2.0 | 2026-03-30 | Architecture hardening post-audit. Unique active section index. Auth model documented. Manual activation SQL removed. | Anton |
| 1.9 | 2026-03-12 | Full 16-section audit. Pre-launch alignment. Brand Voice v1.3. Single-SKU expanded to 6 angles. FEDIAF expanded to 5 angles. 7 new hard boundaries. Token budgets: chat ~4,010, email ~3,706. | Anton |
| 1.8 | 2026-01-21 | Feeding calculations product specificity | Anton |
| 1.7 | 2026-01-21 | FEDIAF verification guidance | Anton |
| 1.6 | 2026-01-21 | Customer-friendly ingredient terminology | Anton |
| 1.5 | 2026-01-21 | Company vs product terminology | Anton |
| 1.4 | 2026-01-21 | Feeding and transition enhancements | Anton |
| 1.3 | 2026-01-21 | Environmental position, FEDIAF terminology | Anton |
| 1.2 | 2026-01-12 | Consolidation, AI judgment | Anton |
| 1.1 | 2026-01-12 | Principles, nutritional data | Anton |
| 1.0 | 2026-01-12 | Initial release. 16 sections, monolithic assembly. | Anton |

Deferred Items

| Item | Trigger to Build |
| --- | --- |
| Parent diversity in packer (max 2 children from same parent) | Telemetry shows one parent monopolising packs |
| Session-state-based retrieval query enrichment | Retrieval misses from topic shifts visible in telemetry |
| Core prompt caching by version hash | Embedding/retrieval latency becomes a bottleneck |
| Automated core quality testing (model judge) | Manual rubric review becomes too slow at scale |

Document Owner: Protocol Raw Technical Team Last Reviewed: 2026-04-19 Next Review: 2026-07-06 Version: 3.6.6 Status: Production Ready