SOP AI-KB-01: AI Knowledge Base v3.6.6¶
Tiered retrieval architecture for AI customer service prompts
Document ID: SOP-AI-KB-01-v3.6.6 Version: 3.6.6 Status: Production Ready Last Updated: 2026-04-19 Owner: Protocol Raw Operations Replaces: SOP-AI-KB-01-v2.0 Review Date: 2026-07-06 Platform: Supabase PostgreSQL, pgvector, OpenAI text-embedding-3-small, Anthropic Claude Sonnet 4.5
Key Changes in v3.6 (2026-04-18) — Headroom Reframe¶
Applied the website headroom reframe (regulator-as-reference → dog-as-reference) across the knowledge base, following the audit at docs/docs/sops/customer-ops/AI_KB_Headroom_Reframe_Audit_v1_0.md and the Brand Voice v1.5 update.
Content changes across 8 sections:
- brand_kernel v2 → v3. TERMINOLOGY example rewritten: "Protocol Raw Complete meets FEDIAF requirements..." → "Protocol Raw Complete is complete for every life stage, lab-confirmed..." Added rule: "FEDIAF is a citation in body copy, not a claim anchor."
- hard_boundaries v11 → v12. Two new boundaries appended: FEDIAF may not be used as a headline/claim anchor; "synthetic premixes/supplements/vitamins/minerals" must be replaced with "synthetic additives." No existing boundaries modified.
- product_fediaf_nutrition v5 → v6. Heavy rewrite. Opening paragraph reframed dog-first, with lab-confirmation and All Life Stages as supporting evidence. Angles 1, 3, 5 rewritten. Angles 2, 4 preserved. Keywords line preserved verbatim for retrieval coverage.
- product_ingredients v4 → v5. "No synthetic premixes." → "No synthetic additives." "Moderate, FEDIAF-compliant levels..." → "Moderate levels, within the range recommended for complete dog food." Ca:P parenthetical preserved as permitted citation.
- product_single_sku v1 → v2. Angle 2 reframed to anchor on "every life stage" not "FEDIAF's strictest standard." Angle 6 vocabulary: "synthetic vitamins or minerals" → "synthetic additives"; "synthetic premix" → "synthetic additives." Never-say list preserved.
- scenario_vet_concerns v2 → v3. Primary claim line rewritten from "FEDIAF compliance" anchor to "complete for every life stage" anchor, with FEDIAF dropped (the vet will recognise the analysis against FEDIAF without needing the label).
- safety_verification v10 → v11. Nutritional-verification paragraph reframed. Proof Portal boundary preserved verbatim.
- product_pricing (version bumped). Vocabulary replacement only: "synthetic premix(es)" / "synthetic supplements" / "synthetic vitamins/minerals" → "synthetic additives."
Rationale: Every Kai-generated response inherits the framing of the sections it retrieves. The website pages (Nutrition, Ingredients, Homepage) have all been reframed; the KB was still compliance-framed, which meant Kai was silently undoing the reframe in every chat conversation. This pass brings the KB into alignment with the live website surfaces and Brand Voice v1.5.
Not in this pass: the signature phrase "The higher standard is our starting point, not our ceiling." is not introduced to the KB. That phrase is reserved for the website surface, where its visual setup lands correctly. KB responses use plain-English equivalents.
Retrieval validation: Smoke-test suite authored at docs/docs/sops/customer-ops/ai_kb_headroom_reframe_smoke_tests.sql runs expected customer queries against fn_retrieve_knowledge and asserts the correct sections retrieve at acceptable similarity scores. Run before considering the revision complete.
v3.6.1 (2026-04-19): Em dash boundary tightening¶
Minor tightening of the em dash prohibition in brand_kernel (v3 → v4) and hard_boundaries (v12 → v13). The existing prohibition did not include substitution guidance, leaving the model free to silently violate the rule by producing em dashes where its training data strongly predicts them. Both sections now state the substitute explicitly: full stop preferred, comma if that reads abrupt, rewrite if neither works. The hard_boundaries version additionally includes the operational rationale to pre-empt model-side rationalisation.
No other sections changed. No content reframe. Purely a rule-tightening pass.
Smoke tests run against the live chat function with two queries that invite em dash usage ("Tell me what you think about Protocol Raw — in detail please.", "What's in Protocol Raw — is it just beef or more than that?"). Both responses produced zero em dashes (U+2014) and zero double-hyphens. FEDIAF retrieval regression check confirmed no drift in similarity scores from the v3.6 baseline.
v3.6.3 (2026-04-19): Response length tightening¶
Replaces the existing FORMAT rule "Keep responses to 2-3 short paragraphs maximum" in chat_format (v9 → v10) with operational word-count guidance: 60-100 words for chat, compact single-paragraph for lists, lead-with-main-point for comparisons. Exception clause preserved for genuinely long-form requests (vet explanations, multi-step transition guidance).
Background: chat response length grew from a 2026-04-06 baseline of ~368 char avg (71 turns) to a 2026-04-19 average of ~846 char (7 turns) — a ~2.3x increase. Streaming output is the dominant latency contributor on GPT-4o-mini, making response length the most likely cause of the perceived slowdown.
Em dash regression: 15 samples across 5 smoke-test queries produced zero em dashes (U+2014) and zero double-hyphens. The v3.6.1 boundary holds.
Length compliance: substantial compression vs the prior baseline (Test 1 dropped from ~550 words to 121-word avg, a 78% reduction). However, the model interprets "60-100 words" loosely: 3 of 5 tests had at least one sample slightly over target. Long-form vet test (Test 4) passed all 3 samples within the 180-word ceiling, confirming the exception clause works. If sustained sampling shows the rule still under-compresses, the next pass should rewrite verbose angles in product_fediaf_nutrition and product_single_sku rather than tightening the format rule further.
No other sections changed.
v3.6.4 (2026-04-19): Consolidated tightening (pricing, boundaries, framing, test cleanup)¶
Consolidates four items surfaced by the live five-question test run, five items from Anton's subsequent pricing/boundary review, and corrects drifted price data.
Changes (6 sections + 1 deactivation; conditional 7th skipped per preflight):
- product_pricing v2 → v3: heavy rewrite. Prospect/customer split, forbidden-phrases blacklist, calculator-first framing for prospects.
- hard_boundaries v13 → v14: 5 edits. Symmetric competitor boundary (positive AND negative directions). Evolutionary/ancestral framing ban. 400g → 375g portion reference. FEDIAF compliance-phrasing reconstruction prohibition.
- response_principles v19 → v20: Principle 8 expanded. Justify price by what Protocol Raw contains, never by what other categories lack.
- portal_change_box_size v1 → v2: full rewrite. Price corrections (8kg £79 → £89, 16kg £139 → £129). Plain-text format. Within-48-hour edge case added.
- scenario_vet_concerns v3 → v4: primary claim line includes sample batch URL. New instruction block prohibiting "formulated to meet FEDIAF" reconstruction.
- product_fediaf_nutrition v6 → v7: priority directive at top of angle list. Comparison questions lead with formulation argument.
- feeding_calculations: SKIPPED. Preflight confirmed already at 375g (no drift).
- test_auto_embed_3: deactivated.
Em dash discipline: two em dashes found in the spec content (in product_pricing v3 and portal_change_box_size v2) substituted with full-stop and comma respectively, per Anton's call. Same pattern as v3.6.3.
Smoke tests (9 queries × 3 samples = 27 calls): Strong wins on the kibble-comparison rule (T4 3/3), the vet sample URL (T5 3/3), and em dash regression (26/27 samples clean). Weak on competitor characterisation discipline: in T2 ("Is Butternut Box a good product too?") the model openly characterised Butternut as "a reputable brand with good quality control" / "complete dog food that works for many dogs" in 3/3 samples despite the new symmetric boundary. T1 (comparison lead) leaked similar characterisations in 2/3 samples and exceeded the 100-word ceiling in 2/3. In T3 (evolutionary bait), the model engaged the frame to refute it ("dogs didn't evolve eating raw food") rather than redirecting away from it. T7 (portal change) gave the steps and the 48-hour lock but didn't proactively quote prices.
Implication: the symmetric competitor boundary needs reinforcement. The model interprets "do not characterise" as "do not negatively characterise"; positive characterisations slip through. Anton's call: tighten further (more forbidden examples, more explicit spirit-of-the-rule language) or accept that competitor-comparison questions need additional system-prompt scaffolding beyond hard_boundaries alone.
v3.6.5 (2026-04-19): scenario_competitor_quality section¶
Adds a new retrieved scenario section scenario_competitor_quality to address the v3.6.4 T2 failure mode (3/3 positive characterisations of Butternut despite the symmetric boundary in hard_boundaries v14).
Diagnosis: Negative rules in hard_boundaries ("do not characterise") were not overriding the model's strong prior that "don't characterise competitors" means "don't trash them". Positive characterisations slipped through. The fix is positive guidance (give the model the actual words to reach for) rather than relying on a prohibition.
Section content: three approved redirect patterns (acknowledge → describe Protocol Raw → return decision to visitor), explicit lists of phrases never to use in both directions, and a "hold the line" template for customers who push for an answer. 798 tokens. Three em dashes in the original spec substituted (matched-pair → parentheses, quoted reply → full-stop split).
Smoke tests (5 test groups, 18 chat calls including a 2-turn pushy follow-up):
- Tests A, B, C, E: retrieval fired correctly (15/18). When scenario_competitor_quality was retrieved, the model used the approved phrasings near-verbatim ("I can't speak for other brands, but what Protocol Raw Complete is built to do is..."; "I can't answer that for you. It would mean characterising their product..."). The v3.6.4 T2 failure mode is fixed for direct competitor-quality questions.
- Test D ("Are the other raw dog food brands any good?") FAILED retrieval 0/3 in v1. The Keywords line covered brand-name-direct phrasings but missed category-level forms.
Strengthened from v1 → v2 within the same PR before merge (no separate v3.6.6 release):
- Keywords line: +24 phrases covering category, abstract, comparative, and industry-trust framings.
- Opening sentence expanded to explicitly name category and industry-level references as in scope.
- No content change to the three redirect patterns, "never say" lists, or pushy-follow-up handling.
Re-tested (7 queries × 3 samples + 2-turn pushy follow-up = 21 calls):
- v3.6.5 v1 Test D failure case ("Are the other raw dog food brands any good?") now retrieves at similarity 0.534 (previously 0). Behaviour: model uses the approved "I can't speak for other brands..." redirect 3/3.
- Direct brand quality, abstract competition framing, comparative recommendation, and pushy follow-up: 18/18 retrieval, all PASS.
- Em dashes: 0/24 across all v2 samples. Length: marginal overshoots persist on the 100-word target, no new regressions.
One structural retrieval gap remains, deferred to a follow-up PR:
The query "Can I trust the raw dog food industry in general?" does not retrieve scenario_competitor_quality even after a v3 attempt that added 13 industry-trust keywords. The query's embedding consistently lands closer to scenario_vet_concerns (similarity 0.536) and product_ingredients (0.524) — both legitimately high-similarity for a "trust" framing. Behaviour without the section: model produces category-level negative characterisations ("No. Most raw dog food is not independently verified..."), exactly the failure mode the section is designed to prevent.
This is a structural gap, not a keyword gap: the OpenAI text-embedding-3-small model does not separate "trust the industry" (which competitor_quality should handle) from "trust the food category for safety" (which vet_concerns handles). v3 was rolled back because adding more industry-trust keywords did not lift retrieval above competing sections.
Recommended follow-up (separate PR): either (a) extend hard_boundaries v14 to explicitly ban negative characterisation of the raw dog food category from the inside, OR (b) add an instruction block to scenario_vet_concerns to redirect industry-trust questions without characterising the category, OR (c) accept this specific query type and move on. None are in scope for v3.6.5.
v3.6.6 (2026-04-19): Industry-trust query handling (combined Option 1 + Option 2)¶
Closes the Test D gap from v3.6.5 strengthening. The "Can I trust the raw dog food industry in general?" query is a semantic hybrid (framed as trust/safety, about the raw category as a whole). v3.6.5 v3 attempt to fix via keyword expansion in scenario_competitor_quality was rolled back because the query embeds closer to scenario_vet_concerns (0.536) than to scenario_competitor_quality regardless of keyword density. This is a model-level structural limit of text-embedding-3-small, not a keyword problem.
Combined Option 1 + Option 2 fix applies at two layers (belt-and-braces, same pattern as the v3.6.1 em dash rule):
- hard_boundaries v14 → v15. New boundary line prohibiting self-characterisation of the raw dog food category from the inside, in either direction. "The raw industry has problems" and "raw feeding is generally safe" are both off-voice; Protocol Raw describes Protocol Raw, not the category it belongs to. This is the output-level backstop: it fires regardless of which section retrieves.
- scenario_vet_concerns v4 → v5. New instruction block handling industry-trust query shapes with specific redirect language. If the query is about raw feeding as a category rather than the customer's own vet situation, Kai now has explicit guidance and example phrasings (including the sample batch URL) to redirect to Protocol Raw's specific approach. This is the retrieval-path fix: it addresses the section that actually fires for these queries.
Smoke tests (5 queries × 3 samples = 15 calls):
- Tests A-D (industry-trust queries: "Can I trust the raw dog food industry in general?", "Is raw feeding safe as a category?", "Are raw dog food brands generally reliable?", "I've heard the raw dog food industry has quality problems. Is that true?"): retrieval landed on scenario_vet_concerns at similarity 0.514-0.604 (as expected). Model used the verbatim approved redirect from the new v5 block in 12/12 samples: "I can't speak for the category or for other brands. What I can tell you is what Protocol Raw does: every batch is independently tested for pathogens by a UKAS-accredited lab before it ships, and the full nutritional analysis is independently lab-confirmed and publicly published..."
- Test E (v3.6.5 regression check on "Is Butternut Box a good product?"): scenario_competitor_quality still retrieved at 0.368, v3.6.5 behaviour preserved.
- Em dash regression: 0/15 across all 15 samples. Length: 62-73 words on a 100-word target.
Result: 15/15 PASS. The v3.6.5 Test D failure mode is closed. Industry characterisation in either direction does not appear in any of the 12 industry-trust samples.
Note: All 12 industry-trust responses converged on near-identical text (the model is using the new instruction block phrasing near-verbatim). Acceptable consistency for a guarded query type, worth watching for naturalness as live traffic begins.
No other sections changed. scenario_competitor_quality v2 unchanged.
Key Changes in v3.5¶
- ✅ safety_verification v10: Added proof.protocolraw.co.uk/batch/sample URL + linking instruction ("share this link when someone asks about safety, testing, or verification — let the page speak for itself"). Future-tense framing replaced with present-tense (sample page is live). Removed standalone "Upstream testing" block. Tightened "Nutritional verification" boundary: nutritional data does not appear on batch reports or the Proof Portal (explicitly forbidden in the section).
Key Changes in v3.4¶
- ✅ email_format v6: Exclamation mark rule softened from prohibition to "sparingly permitted." Max one per email, never in subject lines, never doubled. Reserve for genuine warmth.
- ✅ brand_kernel v2: Exclamation mark rule updated to match (rarely, once per email, where warmth is earned).
- ✅ hard_boundaries v11: Exclamation mark rule updated from "never" to "max one per email, never in subject lines."
- ✅ email_context v11: Exclamation mark prohibition replaced with "sparingly permitted" rule. Bad example annotation updated.
Key Changes in v3.3¶
- ✅ email_context v10: Complete email tone rewrite. Greeting now in AI draft (not CIO template). Name extraction from email sign-off. Prospect vs customer handling. Single recipe framing rules. Context-aware closing (next step + invitation on first reply, natural close on follow-ups). Clinical detail proportionality. "I" not "we" reinforced. Short paragraphs for multi-question emails.
- ✅ email_format v5: JSON schema updated with suggested_subject and customer_name fields. "No greeting" rule removed (AI now handles greeting). "No sign-off" rule retained. Exclamation mark rule (sparingly permitted, max one per email, never in subject lines). Response length guidance removed (now in email_context).
- ✅ hard_boundaries v10: Hallucination guard added (no meal counts, no inventing product specifics, direct to calculator). Single-SKU framing strengthened with specific phrases to avoid ("only", "unfortunately", "but the good news is", "there isn't a recipe swap option"). One-topic rule softened for multi-question emails.
- ✅ Email prompt assembly: cs-agent-triage now uses fn_assemble_prompt('email') with prospect/customer context block passed in the user message. AI subject generation via suggested_subject field.
Key Changes in v3.2¶
- ✅ Conversational chat upgrade. response_principles v19: 10 principles including message-type detection, one-idea-per-turn, warmth acknowledgment, natural next steps
- ✅ chat_context v4 rewritten: dialogue as tool not goal, simplicity-first framing
- ✅ chat_format v9: clarifying-question brevity rule added
- ✅ hard_boundaries v8: 10 new boundaries from live testing: off-topic boundary, no parroting internal instructions, no unsolicited escalation, one-topic-per-response, no markdown formatting, cat questions not off-topic, no unprompted allergen warnings, no competitor characterisation, no hallucinated feeding amounts, no improvised transition advice
- ✅ Consolidated 3 fussy eater sections (scenario_fussy_pre_purchase, scenario_fussy_transition, scenario_fussy_persistent) into single scenario_fussy_eater v2 with branching logic. Token reduction: 836 combined → 424
- ✅ transition_guidance compressed from 850 to ~442 tokens. Day-by-day schedule removed (lives in article). Transition guide location: customer portal
- ✅ delivery_logistics v3: Phase A milkround rewrite, courier-neutral, removed DPD
- ✅ portal_track_delivery v2: DPD references removed, courier-neutral
- ✅ product_fediaf_nutrition v5 and safety_verification v9: link instructions rebalanced ("include when directly relevant" not "never link")
- ✅ scenario_vet_concerns v2: added vet article link
- ✅ product_ingredients v4: added tray packaging specification
- ✅ Chat model switched from GPT-4o-mini to Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
- ✅ Chat Edge Function v24 with Anthropic Messages API (OpenAI embedding retained for retrieval)
Key Changes in v3.0¶
- ✅ Tiered retrieval architecture replaces monolithic prompt assembly. Three layers: guardrails (always-on), behaviour kernel (always-on), retrieved knowledge (per-turn semantic search)
- ✅ Token reduction: 63%. Chat prompt from ~9,400 fixed tokens to ~3,500 dynamic tokens per turn. Email from ~8,700 to ~3,300.
- ✅ Token-budgeted packing. Retrieve 8 candidates via pgvector cosine similarity, pack until token budget (1,500 chat, 2,000 email). CONTINUE on overflow, not STOP.
- ✅ Per-substantive-turn retrieval. Knowledge context adapts per message. Trivial turns ("thanks", "ok") reuse cached retrieval.
- ✅ PostgreSQL full-text search fallback. Generated tsvector column with GIN index replaces naive ILIKE keyword matching.
- ✅ 4 legacy monolithic sections deactivated: brand_identity, escalation_rules, product_knowledge, common_scenarios replaced by 14 compressed/split children.
- ✅ knowledge_role column classifies sections as guardrail, behaviour, or retrieved
- ✅ channels column on sections declares retrieval eligibility per channel, replacing template-based gating for retrieved sections
- ✅ Prompt templates narrowed to core assembly ordering only (guardrail + behaviour sections)
- ✅ Retrieval telemetry via kb_retrieval_logs table. Per-turn logging of retrieved sections, similarity scores, token counts, latencies.
- ✅ assemble-email-prompt Edge Function for email channel. Make.com calls this instead of fn_assemble_prompt.
- ✅ Chat Edge Function v23 with USE_TIERED_RETRIEVAL feature flag (currently enabled)
- ✅ fn_assemble_prompt deprecated but still functional. Returns ~2,097 tokens (core sections only, via the old template-based assembly). Not equivalent to fn_assemble_core_prompt, which returns ~2,054 tokens; the difference is template ordering and legacy template entries still present.
Purpose¶
Centralised knowledge base that stores all AI prompt content in versioned, structured sections. The chat widget (CS-02) and email triage system (CS-01) assemble their prompts from this single source of truth using a tiered retrieval architecture.
Key achievement: A single content update propagates to both channels. The tiered architecture ensures only relevant knowledge is included per conversation turn, keeping prompts focused and allowing the KB to grow without prompt bloat.
Scope¶
This SOP covers the knowledge base schema, content architecture, prompt assembly functions, retrieval logic, telemetry, and content update workflows.
Related SOPs:
- SOP CS-01: AI Customer Service Triage (Email) — consumes email prompt via assemble-email-prompt Edge Function
- SOP CS-02: Live Chat System — consumes chat prompt via tiered retrieval in chat Edge Function
- SOP CS-03: Autonomous Support Agent — uses fn_search_kb_hybrid for KB search (separate retrieval path)
- SOP MON-01: Monitoring Architecture
Architecture¶
Tiered Prompt Assembly¶
Customer Message
│
┌─────────┴──────────┐
│ Trivial turn? │
│ (< 4 words, no │
│ domain terms) │
└─────────┬──────────┘
│ │
YES NO
│ │
Reuse last ┌───┴───────────────────┐
system msg │ │
│ ▼ ▼
│ Embed message Fetch core prompt
│ (text-embedding- (fn_assemble_core_prompt)
│ 3-small, ~50ms) (~2,054 tokens chat)
│ │ │
│ ▼ │
│ Retrieve knowledge │
│ (fn_retrieve_knowledge │
│ budget: 1,500 chat │
│ budget: 2,000 email) │
│ │ │
│ ▼ │
│ Assemble: │
│ core + retrieved │
│ │◄──────────────────────┘
│ │
▼ ▼
┌─────────────────────┐
│ System Prompt │
│ (~3,500 tokens) │
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ Claude Sonnet 4.5 │
│ (chat) │
│ Claude Sonnet │
│ (email via Make) │
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ Log to │
│ kb_retrieval_logs │
└─────────────────────┘
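The trivial-turn gate at the top of the diagram can be sketched as follows. This is an illustrative reconstruction, not the Edge Function's actual code: the under-4-words threshold comes from the diagram, while `DOMAIN_TERMS` is a hypothetical placeholder list, not the production one.

```typescript
// Sketch of the trivial-turn check (assumed logic, not production code).
// A turn is trivial when it is short AND contains no domain vocabulary;
// trivial turns reuse the previous system message instead of re-retrieving.
const DOMAIN_TERMS = ["raw", "feed", "feeding", "dog", "delivery", "batch", "portal"];

function isTrivialTurn(message: string): boolean {
  const words = message.toLowerCase().trim().split(/\s+/).filter(Boolean);
  if (words.length >= 4) return false; // substantive length: always retrieve
  return !words.some((w) => DOMAIN_TERMS.includes(w)); // any domain term forces retrieval
}
```

Under this sketch, "thanks" and "ok" reuse the cached retrieval, while a one-word domain message like "delivery" still triggers embedding and a fresh `fn_retrieve_knowledge` call.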
Three-Layer Prompt Structure¶
| Layer | Role | Content | Chat Tokens | Email Tokens |
|---|---|---|---|---|
| Guardrails | Always-on | Hard boundaries, core escalation policy | ~1,365 | ~1,365 |
| Behaviour Kernel | Always-on | Brand voice, response principles, channel rules | ~1,490 | ~949 |
| Retrieved Knowledge | Per-turn | Facts, scenarios, workflows pulled via pgvector | ~1,200-1,600 | ~1,500-2,000 |
| Total | | | ~4,050-4,450 | ~3,810-4,310 |
Responsibility Split¶
| Mechanism | Responsibility |
|---|---|
| ai_prompt_templates | Core assembly ordering and wrappers (guardrail + behaviour sections only) |
| channels text[] column | Retrieval eligibility per channel (retrieved sections only) |
| knowledge_role column | Determines which assembly path a section belongs to |
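The responsibility split can be expressed as a small routing function. A minimal sketch assuming the knowledge_role and channels column values described in this SOP; the function name and types are illustrative, not part of the production Edge Function.

```typescript
// Illustrative routing of a section to its assembly path (assumed logic).
type KnowledgeRole = "guardrail" | "behaviour" | "retrieved" | "legacy";

interface Section {
  sectionKey: string;
  knowledgeRole: KnowledgeRole;
  channels: string[]; // e.g. ["chat", "email"]
}

// "core": always-on, ordered via ai_prompt_templates.
// "retrieval": per-turn semantic search, gated by the channels column.
// "none": legacy/deactivated, or not eligible for this channel.
function assemblyPath(section: Section, channel: "chat" | "email"): "core" | "retrieval" | "none" {
  switch (section.knowledgeRole) {
    case "guardrail":
    case "behaviour":
      return "core";
    case "retrieved":
      return section.channels.includes(channel) ? "retrieval" : "none";
    default:
      return "none";
  }
}
```

The design point this illustrates: template ordering and channel gating are separate mechanisms, so adding a new retrieved section never touches ai_prompt_templates.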
Database Schema¶
Table: ai_knowledge_sections¶
Stores all content, versioned.
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| section_key | TEXT | Unique identifier (e.g., 'brand_kernel') |
| version | INTEGER | Version number within section |
| title | TEXT | Display title |
| content | TEXT | The actual prompt content |
| is_active | BOOLEAN | Only one version per section_key can be active |
| knowledge_role | TEXT | 'guardrail', 'behaviour', 'retrieved', or 'legacy' |
| parent_section_key | TEXT | Parent section when split from a larger section |
| token_count | INTEGER | Cached token estimate (LENGTH/4) |
| channels | TEXT[] | Which channels can retrieve this section. Default: '{chat,email}' |
| embedding | vector(1536) | OpenAI text-embedding-3-small embedding (auto-generated) |
| search_doc | tsvector | Generated FTS column for keyword fallback |
| created_at | TIMESTAMPTZ | When this version was created |
| created_by | TEXT | Who created it |
| notes | TEXT | Change notes |
Constraints:
- unique_section_version: no duplicate version numbers per section
- unique_active_section: partial unique index enforcing one active version per section_key
- knowledge_role CHECK: must be one of 'guardrail', 'behaviour', 'retrieved', 'legacy'
Generated column:
search_doc tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(section_key, '')), 'A') ||
setweight(to_tsvector('english', coalesce(content, '')), 'B')
) STORED
Indexes:
- idx_kb_sections_embedding: IVFFlat index for pgvector cosine similarity
- idx_kb_sections_search_doc: GIN index for full-text search fallback
Table: ai_prompt_templates¶
Defines ordering and wrappers for core sections only (guardrail + behaviour).
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| channel | TEXT | 'email' or 'chat' |
| section_key | TEXT | References section to include |
| section_order | INTEGER | Assembly order (10, 20, 30...) |
| is_included | BOOLEAN | Whether to include this section |
| wrapper_prefix | TEXT | Optional text before section |
| wrapper_suffix | TEXT | Optional text after section |
Note: Retrieved sections do NOT use this table. They declare channel eligibility via the channels column on ai_knowledge_sections.
Table: kb_retrieval_logs¶
Per-turn retrieval telemetry.
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| channel | TEXT | 'chat' or 'email' |
| session_id | TEXT | Chat session or ticket ID |
| user_message_preview | TEXT | First 200 chars of user message |
| retrieved_sections | JSONB | Array of retrieved section entries (section key, similarity, token count) |
| total_retrieved_tokens | INTEGER | Sum of retrieved section tokens |
| core_tokens | INTEGER | Core prompt token count |
| was_trivial_turn | BOOLEAN | Whether retrieval was skipped (cached) |
| fallback_used | BOOLEAN | Whether FTS fallback was used |
| candidate_count | INTEGER | Sections evaluated |
| returned_count | INTEGER | Sections included in prompt |
| skipped_for_budget_count | INTEGER | Sections skipped for token budget |
| empty_retrieval | BOOLEAN | No relevant sections found |
| embedding_latency_ms | INTEGER | OpenAI embedding call time |
| retrieval_latency_ms | INTEGER | fn_retrieve_knowledge call time |
| created_at | TIMESTAMPTZ | When this turn occurred |
Table: ai_prompt_logs¶
Audit trail of assembled prompts (legacy, from v2.0). Schema retained for backward compatibility.
Current Sections¶
Guardrail Sections (always-on, every turn)¶
| section_key | Tokens | Title | Purpose |
|---|---|---|---|
| hard_boundaries | ~1,336 | HARD BOUNDARIES | 30+ prohibitions (v10) including: hallucination guard (no meal counts, no inventing product specifics), strengthened single-SKU framing (specific phrases banned), no hallucinated feeding amounts, no improvised transition advice, no unsolicited escalation, one-topic-per-response (relaxed for multi-question emails), no markdown formatting, no unprompted allergens, no competitor characterisation, off-topic boundary (with cat question exception) |
| escalation_core | 211 | ESCALATION POLICY | Machine-parsed [ESCALATE] signal format, judgment standard, email collection, harmful-uncertainty rule, no dead-end language |
Behaviour Sections (always-on, every turn)¶
| section_key | Tokens | Title | Channels | Purpose |
|---|---|---|---|---|
| brand_kernel | 442 | BRAND KERNEL | Both | Compressed brand voice: calm/expert/reassuring/systematic/affirming, terminology, simplicity-before-safety, Worry Goes Away Test, Phase A outcomes language |
| response_principles | ~356 | RESPONSE PRINCIPLES | Both | 10 principles: message-type detection, pivot to proof, match register, warmth acknowledgment, no volunteered negatives, end clean, stay in lane, suitability confidence, conversation context, frame constraints |
| chat_context | ~292 | CHAT CONTEXT | Chat | Dialogue as tool not goal, simplicity-first framing, pre-launch context, waitlist guidance |
| chat_format | ~400 | CHAT FORMAT | Chat | Plain text only, 2-3 paragraphs max, clarifying-question brevity rule, no filler sign-offs |
| email_context | ~1,802 | EMAIL CONTEXT | Email | Email tone and style rules (v10): greeting with name, prospect vs customer handling, single recipe framing, context-aware closing, clinical detail proportionality, name extraction, "I" not "we" |
| email_format | ~562 | EMAIL FORMAT | Email | JSON output specification (v5): category, confidence, draft_response, action, escalation_reason, suggested_subject, customer_name. No sign-off rule. |
Retrieved Sections (pulled per turn via semantic search)¶
| section_key | Tokens | Parent | Title |
|---|---|---|---|
| product_pricing | 253 | product_knowledge | Pricing and value |
| product_ingredients | ~421 | product_knowledge | Ingredients, nutrition basics, tray packaging specification |
| product_single_sku | 443 | product_knowledge | Single recipe philosophy (6 angles) |
| product_fediaf_nutrition | ~589 | product_knowledge | FEDIAF completeness, life stage suitability, article links (when directly relevant) |
| product_competitor_env | 243 | product_knowledge | Competitor context and environmental position |
| escalation_playbooks | 189 | escalation_rules | When to escalate / when not to |
| scenario_transition_digestive | 52 | common_scenarios | Loose stool, reduced water intake |
| scenario_fussy_eater | ~424 | — | Consolidated picky eater with branching logic: pre-purchase, first box (days 1-10), persistent refusal with escalation |
| scenario_allergy_fit | 51 | common_scenarios | Allergy and dietary fit |
| scenario_vet_concerns | ~210 | common_scenarios | Vet pushback handling, article link |
| scenario_pre_launch | 51 | common_scenarios | Pre-launch availability, London only |
| transition_guidance | ~442 | — | 7-day overview (day-by-day schedule removed, lives in article), warning signs, transition guide in customer portal |
| feeding_calculations | 616 | — | RER/MER, reference examples, calculator link |
| safety_verification | ~569 | — | Batch testing, proof portal, sample batch report link + linking instruction, failed batch protocol, article links (when directly relevant). v10. |
| storage_serving | 189 | — | Freezer, defrost, shelf life |
| delivery_logistics | ~220 | — | Phase A milkround, courier-neutral, London postcodes, free delivery |
| self_service_portal | 293 | — | Portal features, 48-hour lock, magic link |
| portal_track_delivery | ~137 | — | Tracking in customer portal, courier-neutral |
| 11 other portal_* sections | ~1,560 total | — | Individual portal action guides |
Legacy Sections (deactivated)¶
| section_key | Status | Replaced By |
|---|---|---|
| brand_identity | Deactivated | brand_kernel |
| escalation_rules | Deactivated | escalation_core + escalation_playbooks |
| product_knowledge | Deactivated | product_pricing, product_ingredients, product_single_sku, product_fediaf_nutrition, product_competitor_env |
| common_scenarios | Deactivated | scenario_transition_digestive, scenario_fussy_eater, scenario_allergy_fit, scenario_vet_concerns, scenario_pre_launch |
| scenario_fussy_pre_purchase | Deactivated | Consolidated into scenario_fussy_eater v2 (branching logic) |
| scenario_fussy_transition | Deactivated | Consolidated into scenario_fussy_eater v2 (branching logic) |
| scenario_fussy_persistent | Deactivated | Consolidated into scenario_fussy_eater v2 (branching logic) |
Functions¶
fn_assemble_core_prompt(p_channel)¶
Assembles guardrail + behaviour sections using ai_prompt_templates for ordering.
Parameters: p_channel TEXT ('chat' or 'email')
Returns: prompt TEXT, section_versions JSONB, token_estimate INTEGER
Used by: Chat Edge Function, assemble-email-prompt Edge Function
SELECT prompt, section_versions, token_estimate
FROM raw_ops.fn_assemble_core_prompt('chat');
-- Returns ~2,850 tokens (chat) or ~1,667 tokens (email)
fn_retrieve_knowledge(p_channel, p_query_embedding, p_query_text, p_token_budget, p_candidate_limit, p_similarity_threshold)¶
Retrieves relevant knowledge sections via pgvector semantic search with FTS fallback.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| p_channel | TEXT | required | 'chat' or 'email' |
| p_query_embedding | vector(1536) | required | OpenAI embedding of the query |
| p_query_text | TEXT | required | Original query text (for FTS fallback) |
| p_token_budget | INTEGER | 1500 | Maximum total retrieved tokens |
| p_candidate_limit | INTEGER | 8 | Max candidates to evaluate |
| p_similarity_threshold | NUMERIC | 0.25 | Minimum cosine similarity |
Returns: section_key, title, content, similarity, token_count, match_type
Packing logic:
- Candidates ordered by similarity DESC, then token_count ASC (prefer smaller on ties)
- CONTINUE on budget overflow (don't STOP; smaller chunks after a large one may still fit)
- FTS fallback fires only if semantic search returns zero results
Used by: Chat Edge Function, assemble-email-prompt Edge Function
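The packing behaviour above can be sketched as a pure function. This is an illustrative TypeScript sketch, not the production SQL; the `Candidate` shape and function name are assumptions, while the ordering and continue-on-overflow rules are as documented.

```typescript
interface Candidate {
  section_key: string;
  similarity: number;
  token_count: number;
}

// Greedy packer mirroring fn_retrieve_knowledge's documented behaviour:
// order by similarity DESC (token_count ASC on ties), then CONTINUE past
// candidates that would overflow the budget instead of stopping.
function packSections(candidates: Candidate[], tokenBudget: number): Candidate[] {
  const ordered = [...candidates].sort(
    (a, b) => b.similarity - a.similarity || a.token_count - b.token_count
  );
  const packed: Candidate[] = [];
  let used = 0;
  for (const c of ordered) {
    if (used + c.token_count > tokenBudget) continue; // skip, don't stop
    packed.push(c);
    used += c.token_count;
  }
  return packed;
}
```

Note how a 600-token candidate that overflows the budget does not block a smaller 250-token candidate behind it; this is the CONTINUE rule in action.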
fn_assemble_prompt(p_channel) [DEPRECATED]¶
Legacy monolithic assembly. Still functional but returns only core sections (~2,097 tokens). Retained for backward compatibility. Do not use for new integrations.
fn_add_section_version(...) / fn_activate_section_version(...)¶
Unchanged from v2.0. Used for all content updates.
Integration Points¶
Chat (Edge Function v24)¶
The chat Edge Function performs tiered retrieval internally:
1. Trivial turn check → reuse last system message if trivial
2. Fetch core prompt via fn_assemble_core_prompt('chat')
3. Embed user message via OpenAI text-embedding-3-small
4. Retrieve knowledge via fn_retrieve_knowledge('chat', embedding, text, 1500)
5. Assemble system prompt: core + retrieved
6. Send to Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) via Anthropic Messages API with conversation history. System prompt passed as top-level system parameter (not as a message with role: system). Streaming uses Anthropic SSE format (content_block_delta events).
7. Log to kb_retrieval_logs
Models: Chat completion: Anthropic Claude Sonnet 4.5. Embedding: OpenAI text-embedding-3-small (unchanged).
Feature flag: USE_TIERED_RETRIEVAL env var. When false, falls back to fn_assemble_prompt('chat').
Email (assemble-email-prompt Edge Function)¶
Make.com calls this Edge Function instead of fn_assemble_prompt('email'):
POST /assemble-email-prompt
Headers: x-internal-secret: [AGENT_INTERNAL_SECRET]
Body: { "email_body": "...", "email_subject": "..." }
Response: { "prompt": "...", "token_estimate": 3338, "retrieved_sections": [...] }
The Edge Function embeds the email text, retrieves relevant knowledge (budget: 2,000 tokens), assembles core + retrieved, and returns the complete system prompt.
Make.com module change: Replace Module 15's Supabase RPC call to fn_assemble_prompt('email') with an HTTP POST to this Edge Function. Use {{data.prompt}} as the system message.
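The shape of that HTTP call can be sketched as a small request builder. The endpoint path, header name, and body fields are from this SOP; the base URL and secret value are placeholders, and the function name is illustrative.

```typescript
// Builds the request Make.com's HTTP module sends to the Edge Function.
// baseUrl and internalSecret are placeholders supplied by the caller;
// the path and x-internal-secret header name are as documented above.
function buildEmailPromptRequest(
  baseUrl: string,
  internalSecret: string,
  emailSubject: string,
  emailBody: string
): { url: string; method: string; headers: Record<string, string>; body: string } {
  return {
    url: `${baseUrl}/assemble-email-prompt`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-internal-secret": internalSecret,
    },
    body: JSON.stringify({ email_subject: emailSubject, email_body: emailBody }),
  };
}
```

The response's `prompt` field then maps directly to `{{data.prompt}}` in the Make.com module.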
CS-03 Agent (unchanged)¶
The autonomous support agent continues to use fn_search_kb_hybrid directly for its own knowledge search. This is a separate retrieval path that is unaffected by the v3.0 changes.
Content Update Workflow¶
Adding New Knowledge¶
- Create a new section via `fn_add_section_version` with `p_activate = true`
- Set `knowledge_role = 'retrieved'`, `channels = '{chat,email}'`, and `parent_section_key` if applicable
- `token_count` auto-populates via trigger on insert/update. Verify: `SELECT token_count FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;`
- Embedding auto-generates via trigger. Verify: `SELECT embedding IS NOT NULL FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;`
- Test retrieval: embed a test query and call `fn_retrieve_knowledge` to verify the new section is discoverable
Updating Existing Content¶
Same as v2.0: create new version via fn_add_section_version, activate via fn_activate_section_version. Embedding and token_count auto-regenerate via triggers.
Updating Core Sections¶
Changes to guardrail or behaviour sections take effect on the next turn for all conversations (core prompt is fetched per turn, not cached).
Required release gate for core section changes:
Any update to brand_kernel, response_principles, or hard_boundaries must pass a manual quality check before activation:
- Select 10-20 representative customer prompts (mix of product questions, concerns, technical questions, and edge cases)
- Generate responses using the new core version + tiered retrieval
- Blind review against the Brand Voice rubric:
- Calm, simple, dog-first, affirming?
- Leads with feeling, not system?
- Simplicity before safety on first impressions?
- No movement language, no hype, no internal jargon?
- Worry Goes Away Test: does the response make things feel calmer?
- Sign-off required before activation. If any prompt fails on simplicity, warmth, or correctness, revise the core section and re-test.
This gate is manual at current scale. Automated model-judge evaluation is a deferred item for when review volume exceeds manual capacity.
Determining the Fix Type¶
| If the issue is... | Fix approach |
|---|---|
| AI doesn't know a fact | Add or update a retrieved section |
| AI gives wrong tone | Update brand_kernel |
| AI handles a category poorly | Update response_principles |
| AI escalates incorrectly | Update escalation_core or escalation_playbooks |
| AI does something forbidden | Update hard_boundaries |
| AI uses wrong terminology | Update brand_kernel (terminology section) |
| AI doesn't retrieve the right knowledge | Add keyword header to the target section's content |
| AI retrieves irrelevant sections | Check section content for ambiguous vocabulary |
Principles¶
- Prefer principles over scenarios. Add a principle that covers a category, not a specific Q&A.
- Keep retrieved sections atomic. One topic, one decision, one use case. Target ~120-400 tokens per retrieved section.
- Add keyword headers for discoverability. If a section isn't being retrieved for relevant queries, prepend a `Keywords:` line with customer vocabulary.
- Deactivate rather than delete. Old versions are preserved in the version history for rollback.
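For illustration, a keyword header prepended to a retrieved section might look like the following. The vocabulary shown is hypothetical, loosely based on the storage_serving topic; use real customer phrasing from telemetry when writing your own.

```text
Keywords: defrost, thaw, freezer, fridge, shelf life, how long does it keep, left out overnight

Storage and serving guidance: boxes arrive frozen; defrost portions in the fridge...
```

Because the header is part of the section content, it is embedded along with the body, pulling the section's vector closer to the vocabulary customers actually use.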
Monitoring¶
Telemetry Queries¶
-- Retrieval health dashboard (last 7 days)
SELECT
channel,
COUNT(*) as total_turns,
AVG(total_retrieved_tokens)::INTEGER as avg_retrieved_tokens,
AVG(core_tokens + total_retrieved_tokens)::INTEGER as avg_total_prompt,
ROUND(100.0 * COUNT(*) FILTER (WHERE empty_retrieval) / COUNT(*), 1) as empty_pct,
ROUND(100.0 * COUNT(*) FILTER (WHERE fallback_used) / COUNT(*), 1) as fallback_pct,
ROUND(100.0 * COUNT(*) FILTER (WHERE was_trivial_turn) / COUNT(*), 1) as trivial_cache_pct,
AVG(embedding_latency_ms) FILTER (WHERE NOT was_trivial_turn)::INTEGER as avg_embed_ms,
AVG(retrieval_latency_ms) FILTER (WHERE NOT was_trivial_turn)::INTEGER as avg_retrieval_ms
FROM raw_ops.kb_retrieval_logs
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY channel;
-- Most retrieved sections
SELECT
section->>'section_key' as section_key,
COUNT(*) as retrieval_count,
AVG((section->>'similarity')::numeric)::NUMERIC(4,3) as avg_similarity
FROM raw_ops.kb_retrieval_logs,
jsonb_array_elements(retrieved_sections) as section
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY section->>'section_key'
ORDER BY retrieval_count DESC;
-- Never-retrieved sections (candidates for review)
SELECT aks.section_key, aks.token_count
FROM raw_ops.ai_knowledge_sections aks
WHERE aks.is_active = true AND aks.knowledge_role = 'retrieved'
AND NOT EXISTS (
SELECT 1 FROM raw_ops.kb_retrieval_logs krl,
jsonb_array_elements(krl.retrieved_sections) as s
WHERE s->>'section_key' = aks.section_key
AND krl.created_at > NOW() - INTERVAL '30 days'
);
Monitoring Targets¶
| Metric | Target | Alert If |
|---|---|---|
| Empty retrieval rate | < 5% | > 10% |
| FTS fallback rate | < 10% | > 20% |
| Avg retrieved tokens per turn | 800-1,500 | > 2,000 or < 400 |
| Trivial turn cache hit rate | 15-30% | < 5% or > 50% |
| p95 total prompt tokens | < 5,500 | > 6,500 |
| Avg embedding latency | < 100ms | > 200ms |
| Avg retrieval latency | < 50ms | > 100ms |
Trivial Turn Detection¶
Implemented in the chat Edge Function. Skips embedding and retrieval for short, non-substantive messages.
Rules:
- Message must be ≤ 4 words
- Must not contain domain terms: order, delivery, box, puppy, dog, gram, price, cost, allergy, defrost, batch, portal, cancel, pause, feed, food, recipe, vet, subscription, payment, refund, tracking, stool, sick
- Must not contain wh-words: how, what, when, where, why, which, who
- Must match a trivial pattern anchored at both start and end: ok, thanks, thank you, got it, great, perfect, cool, cheers, ta, lovely, brilliant, nice, sure, right, yes, no, yep, nope, alright, understood, noted, will do, good to know, that makes sense, i see, appreciate it
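The rules above can be sketched as a pure predicate. The word lists are copied from this SOP; the function name, normalisation details, and structure are illustrative assumptions about the Edge Function's implementation, not a copy of it.

```typescript
// Word lists copied from the trivial-turn rules in this SOP.
const DOMAIN_TERMS = new Set([
  "order","delivery","box","puppy","dog","gram","price","cost","allergy",
  "defrost","batch","portal","cancel","pause","feed","food","recipe","vet",
  "subscription","payment","refund","tracking","stool","sick",
]);
const WH_WORDS = new Set(["how","what","when","where","why","which","who"]);
const TRIVIAL_PATTERNS = new Set([
  "ok","thanks","thank you","got it","great","perfect","cool","cheers","ta",
  "lovely","brilliant","nice","sure","right","yes","no","yep","nope","alright",
  "understood","noted","will do","good to know","that makes sense","i see",
  "appreciate it",
]);

// Returns true only when the message passes all four rules; any failure
// means the turn gets full embedding + retrieval.
function isTrivialTurn(message: string): boolean {
  // Normalise: lowercase, strip punctuation, collapse whitespace (assumed).
  const normalised = message
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .trim()
    .replace(/\s+/g, " ");
  const words = normalised.length > 0 ? normalised.split(" ") : [];
  if (words.length === 0 || words.length > 4) return false;
  if (words.some((w) => DOMAIN_TERMS.has(w))) return false;
  if (words.some((w) => WH_WORDS.has(w))) return false;
  // Pattern match anchored at both start and end: whole message must equal a pattern.
  return TRIVIAL_PATTERNS.has(normalised);
}
```

A message like "thanks!" normalises to "thanks" and skips retrieval; "what time" fails the wh-word rule and goes through the full pipeline.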
Troubleshooting¶
Section Not Being Retrieved¶
- Check embedding exists: `SELECT embedding IS NOT NULL FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;`
- Check `knowledge_role` is `'retrieved'`: `SELECT knowledge_role FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;`
- Check `channels` includes the target channel: `SELECT channels FROM raw_ops.ai_knowledge_sections WHERE section_key = '...' AND is_active = true;`
- Test with a direct query: embed a test prompt and call `fn_retrieve_knowledge` directly
- If the section doesn't appear in results, diagnose:
- Similarity below 0.25 and the query uses different vocabulary than the section: Add a keyword header to the section content with customer-facing synonyms. Re-embed.
- Similarity below 0.25 and the query vocabulary matches but the section is very short (< 120 tokens): The section may be too sparse for good embedding quality. Expand with more context or merge with a related section.
- Similarity above 0.25 but section is excluded by token budget: A larger, higher-similarity section consumed the budget. Check whether that larger section should be split, or increase the token budget.
- Similarity above 0.25 but a nearby section is stealing the match: The two sections have overlapping vocabulary. Differentiate their content or merge them.
Wrong Section Being Retrieved¶
- Check if the section content contains vocabulary that matches the query unintentionally
- Consider splitting the section into more atomic units
- Review the telemetry for that query pattern
Core Prompt Issues¶
- Verify core assembly: `SELECT token_estimate FROM raw_ops.fn_assemble_core_prompt('chat');`
- Check all core sections are in templates: query `ai_prompt_templates` for the channel
- Check no legacy sections are still included in templates
High Token Usage¶
- Check retrieval telemetry for oversized sections being retrieved
- Consider splitting large retrieved sections
- Adjust token budget if needed (chat: 1,500, email: 2,000)
Version History¶
| Version | Date | Changes | Author |
|---|---|---|---|
| 3.3 | 2026-04-08 | Email support overhaul. email_context v1→v10 (tone rewrite, prospect detection, single recipe framing, context-aware closing, name extraction, clinical detail proportionality). email_format v1→v5 (suggested_subject, customer_name fields, greeting in AI draft). hard_boundaries v8→v10 (hallucination guard, strengthened single-SKU framing). | Anton |
| 3.2 | 2026-04-06 | Conversational chat upgrade. response_principles v19 (10 principles: message-type detection, one-idea-per-turn, warmth acknowledgment). chat_context v4 (dialogue as tool). chat_format v9 (question brevity rule). hard_boundaries v8 (10 new boundaries from live testing). Consolidated 3 fussy eater sections into 1. Compressed transition_guidance 850→442 tokens. delivery_logistics milkround rewrite. Chat model switched from GPT-4o-mini to Claude Sonnet 4.5. Chat Edge Function v24 with Anthropic API. | Anton |
| 3.1 | 2026-04-05 | Three-branch split of scenario_fussy_eater (251 tokens) into scenario_fussy_pre_purchase (259), scenario_fussy_transition (299), and scenario_fussy_persistent (278). Pre-purchase leads with formulation-backed confidence; transition provides day-by-day coaching; persistent refusal escalates to human. Driven by market research: picky eating is #1 blocking category (25.2% of blocked buyers). Old section deactivated, added to legacy table. | Anton |
| 3.0 | 2026-04-04 | Tiered retrieval architecture. 63% token reduction. 14 new sections. fn_assemble_core_prompt + fn_retrieve_knowledge. kb_retrieval_logs telemetry. assemble-email-prompt Edge Function. Chat Edge Function v23. Legacy sections deactivated. | Anton |
| 2.0 | 2026-03-30 | Architecture hardening post-audit. Unique active section index. Auth model documented. Manual activation SQL removed. | Anton |
| 1.9 | 2026-03-12 | Full 16-section audit. Pre-launch alignment. Brand Voice v1.3. Single-SKU expanded to 6 angles. FEDIAF expanded to 5 angles. 7 new hard boundaries. Token budgets: chat ~4,010, email ~3,706. | Anton |
| 1.8 | 2026-01-21 | Feeding calculations product specificity | Anton |
| 1.7 | 2026-01-21 | FEDIAF verification guidance | Anton |
| 1.6 | 2026-01-21 | Customer-friendly ingredient terminology | Anton |
| 1.5 | 2026-01-21 | Company vs product terminology | Anton |
| 1.4 | 2026-01-21 | Feeding and transition enhancements | Anton |
| 1.3 | 2026-01-21 | Environmental position, FEDIAF terminology | Anton |
| 1.2 | 2026-01-12 | Consolidation, AI judgment | Anton |
| 1.1 | 2026-01-12 | Principles, nutritional data | Anton |
| 1.0 | 2026-01-12 | Initial release. 16 sections, monolithic assembly. | Anton |
Deferred Items¶
| Item | Trigger to Build |
|---|---|
| Parent diversity in packer (max 2 children from same parent) | Telemetry shows one parent monopolising packs |
| Session-state-based retrieval query enrichment | Retrieval misses from topic shifts visible in telemetry |
| Core prompt caching by version hash | Embedding/retrieval latency becomes a bottleneck |
| Automated core quality testing (model judge) | Manual rubric review becomes too slow at scale |
Document Owner: Protocol Raw Technical Team Last Reviewed: 2026-04-19 Next Review: 2026-07-06 Version: 3.6.6 Status: Production Ready