
SOP CS-03: Autonomous Support Agent (v1.6)

Status: 🟡 Shadow Mode Validation
Last Updated: 2026-03-19
Owner: Protocol Raw Operations
Platform: Supabase Edge Functions + Claude API + Cloudflare Workers
Prerequisites: SOP CS-01 v2.1, SOP CS-02 v1.4, SOP AI-KB-01 v1.2


Executive Summary

What This Is

An autonomous AI agent that receives customer support tickets, investigates context using Protocol Raw's database and external APIs, determines appropriate resolution, and executes actions—with human oversight only for high-stakes decisions.

Why It Matters

Metric | Current State (AI-Assist) | Target State (Agentic)
Human involvement | 100% of tickets | <20% of tickets
Response time | 15-60 min (human review) | <2 min (autonomous)
Tickets per hour capacity | 15-20 (human limited) | Unlimited
First hire threshold | ~500 customers | ~5,000 customers
Cost per ticket | ~£3-5 (human time) | ~£0.05-0.10 (API costs)

The Strategic Case

This agent embodies the "AI-native" thesis that underpins Protocol Raw's 10-19× capital efficiency claim. By handling 80%+ of routine support autonomously, we can:

  1. Validate faster — Phase A customers get instant responses, improving NPS and retention
  2. Defer hiring — Operations Lead focuses on high-value work, not ticket clearing
  3. Build moat — Accumulated decision data trains better judgment over time
  4. Scale confidently — 10k customers requires no more human support capacity than 500

v1.6 Changes

  • Status reclassified — Document reclassified from "Production Ready - Shadow Mode" to "Shadow Mode Validation." Status must now be earned by passing the validation gate, not declared at document creation
  • Null outcome fix — Agent execution guarantees non-null outcome on every run via try/catch/finally with ensureOutcomeWritten fallback. CHECK constraint on database prevents null outcomes on completed executions
  • Timeout-to-escalation — Agent loop checks elapsed time at each iteration. If >80% of timeout budget consumed, escalates to human rather than risking silent failure
  • Outcome monitoring — New view v_agent_outcome_health tracks execution health daily
  • Least-privilege role — support_agent PostgreSQL role with explicit grants; SET ROLE at execution start (CS03-004)
  • Partial-failure recovery — All agent writes via SECURITY DEFINER RPCs with failure recovery to agent_execution_failures table. Customer.io retry logic with 5s backoff (CS03-005)
  • Graduation criteria defined — Explicit thresholds for exiting shadow mode with v_agent_graduation_readiness view (CS03-006)
  • KB auto-embedding — Database trigger on ai_knowledge_sections automatically generates embeddings via pg_net when sections are created or content is updated. Manual backfill_all retained for recovery only (CS03-009)
  • Cost and SLO monitoring — Daily views for API cost per ticket, execution health, and email delivery rate. SLO targets defined for autonomous mode (CS03-010)
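The null-outcome guarantee and the timeout-to-escalation check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the deployed code: `runAgentLoop`, `step`, `writeOutcome`, the injectable `now` clock, and the reason strings are hypothetical names. Only the 80% budget threshold, the try/catch/finally shape, and the idea of an ensureOutcomeWritten fallback come from this SOP.

```typescript
type Outcome = 'resolved' | 'escalated';

interface LoopOptions {
  timeoutMs: number;     // total budget, e.g. 30000 as in CONFIG
  maxIterations: number; // e.g. 10 as in CONFIG
  now?: () => number;    // injectable clock for testing (assumption)
}

// True once more than 80% of the timeout budget has been consumed.
function budgetExceeded(startMs: number, nowMs: number, timeoutMs: number): boolean {
  return nowMs - startMs > 0.8 * timeoutMs;
}

// Hypothetical loop: `step` returns an outcome when the agent finishes, or null to continue.
async function runAgentLoop(
  step: (iteration: number) => Promise<Outcome | null>,
  writeOutcome: (outcome: Outcome, reason: string) => Promise<void>,
  opts: LoopOptions,
): Promise<Outcome> {
  const now = opts.now ?? Date.now;
  const start = now();
  let written = false;

  const record = async (outcome: Outcome, reason: string): Promise<Outcome> => {
    await writeOutcome(outcome, reason);
    written = true;
    return outcome;
  };

  try {
    for (let i = 0; i < opts.maxIterations; i++) {
      // Checked at every iteration, before spending more of the budget.
      if (budgetExceeded(start, now(), opts.timeoutMs)) {
        return await record('escalated', 'timeout_budget_exceeded');
      }
      const result = await step(i);
      if (result !== null) return await record(result, 'agent_decision');
    }
    return await record('escalated', 'max_iterations_reached');
  } catch {
    return await record('escalated', 'unhandled_error');
  } finally {
    // Fallback in the spirit of ensureOutcomeWritten: never finish with no outcome recorded.
    if (!written) await writeOutcome('escalated', 'ensure_outcome_written');
  }
}
```

With `timeoutMs: 30000`, a run whose steps each take about 9 seconds escalates at the start of the fourth iteration (27 s elapsed, above the 24 s threshold) instead of risking a silent timeout.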

v1.5 Changes

  • Email Threading — Outbound emails include In-Reply-To and References headers for proper inbox threading
  • [Support] Prefix — All outbound subject lines automatically prefixed with [Support] to distinguish from marketing
  • Conversation History — Messages logged to ticket_messages table for full thread tracking
  • Unified Email Function — send-support-email Edge Function used by both CS-03 and Ops Portal

v1.4 Changes

  • Semantic KB Search — pgvector embeddings with OpenAI text-embedding-3-small
  • Hybrid Retrieval — Semantic first, keyword fallback if similarity < 0.3
  • Search Logging — Query, top match, similarity score, match type logged
  • Slack Alerting — Support queue monitor every 15 min via native Supabase stack

v1.3 Changes

  • Support Personas — Sophie, Tom, Lucy assigned deterministically per customer
  • Operating Hours — 8am-9pm London time (timezone-aware)
  • Response Delay — 3-12 minute random delay (anti-bot humanization)
  • Email-Before-Resolve Gate — Code-enforced: cannot resolve without sending email first

v1.2 Changes (Production Hardening)

  • Preflight Policy Gate — Code-enforced escalation rules run BEFORE Claude (zero-token escalation)
  • Idempotency — dedupe_key prevents duplicate processing from retries
  • Auth Hardening — X-Internal-Secret header required (throws on boot if missing)
  • CORS Hardening — Only allowed origins receive CORS headers
  • Confidence Enforcement — Agent must provide confidence score; <70% blocks resolution
  • Human-Readable Reason Codes — Machine codes (e.g., health_vomiting) for Metabase analytics
  • Timeline Whitelist Narrowed — Only tracking-cited timelines exempt; vet advice NEVER whitelisted
  • Shadow Mode Default — All tickets escalate for human review during validation

Email Threading (v1.5)

Purpose

Customer replies now thread correctly in their inbox, appearing as a single conversation rather than separate emails.

How It Works

When an inbound email arrives:

  1. The Cloudflare email-ingest Worker parses the MIME Message-ID header
  2. The value is stored in support_tickets.email_message_id via cs-agent-triage

When sending a reply:

  1. The Edge Function fetches the original email_message_id from the ticket
  2. It adds In-Reply-To and References headers to the outbound email
  3. Customer.io sends the email with these headers

Database Schema

-- Added to support_tickets
ALTER TABLE raw_ops.support_tickets 
ADD COLUMN email_message_id TEXT,
ADD COLUMN email_references TEXT;

Threading Headers

CIO accepts a plain key-value object (not the JSON string format their OpenAPI spec claims). Tested and confirmed 2026-04-08.

{
  "headers": {
    "In-Reply-To": "<original-message-id@gmail.com>",
    "References": "<original-message-id@gmail.com>"
  }
}
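As a rough sketch of how the headers object above might be assembled from the ticket columns: `buildThreadingHeaders` and `normaliseMessageId` are hypothetical helpers, and the use of email_references to carry earlier IDs in the thread is an assumption. The SOP's own example only shows a single Message-ID in both headers.

```typescript
interface ThreadingHeaders {
  'In-Reply-To': string;
  References: string;
}

// Message-IDs are written inside angle brackets in email headers; add them if missing.
function normaliseMessageId(id: string): string {
  const trimmed = id.trim();
  return trimmed.startsWith('<') && trimmed.endsWith('>') ? trimmed : `<${trimmed}>`;
}

// Returns the plain key-value headers object, or undefined when the ticket has
// no stored Message-ID (the email is then sent without threading).
function buildThreadingHeaders(
  emailMessageId: string | null,
  emailReferences: string | null = null, // assumption: space-separated earlier Message-IDs
): ThreadingHeaders | undefined {
  if (!emailMessageId || emailMessageId.trim() === '') return undefined;
  const messageId = normaliseMessageId(emailMessageId);
  const prior = (emailReferences ?? '').trim();
  // References = earlier IDs in the thread, followed by the message being replied to.
  const references = !prior
    ? messageId
    : prior.includes(messageId)
      ? prior
      : `${prior} ${messageId}`;
  return { 'In-Reply-To': messageId, References: references };
}
```

Returning `undefined` rather than empty strings keeps the payload free of blank headers, which matches the "adds threading headers if email_message_id exists" behaviour described for the tool.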

Edge Function Response

The send-support-email function now returns:

{
  "success": true,
  "delivery_id": "abc123",
  "persona": "Sophie",
  "threading_enabled": true,
  "sent_immediately": true
}

Subject Line Prefix (v1.5)

Purpose

Distinguish support emails from marketing communications in the customer's inbox.

Implementation

All outbound support emails automatically receive [Support] prefix. The Re: prefix is only added when there are existing outbound messages on the ticket (not on first reply). Placeholder subjects ("(no subject)", "(pending)", "(none)", "(empty)") are normalised to category-based fallbacks.

// Re: prefix - only on follow-up replies (prior outbound messages exist)
const { count: priorReplies } = await supabase
  .from('ticket_messages')
  .select('id', { count: 'exact', head: true })
  .eq('ticket_id', ticket_id)
  .eq('direction', 'outbound');

// `clean` is the normalised subject (placeholders already replaced);
// count can be null, so treat that as zero prior replies
const baseSubject = (priorReplies ?? 0) > 0 ? `Re: ${clean}` : clean;

// [Support] prefix - applied to all outbound, never duplicated
const emailSubject = baseSubject.startsWith('[Support]')
  ? baseSubject
  : `[Support] ${baseSubject}`;

Examples

Scenario | Final Subject
Customer subject: "Where is my order?" (first reply) | [Support] Where is my order?
Same ticket, second reply | [Support] Re: Where is my order?
No subject, AI generates "Loose stools during transition" | [Support] Loose stools during transition
Already prefixed | [Support] Already prefixed (no duplication)
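The subject rules and the examples above can be reproduced with a small pure function. This is a sketch, not the code in send-support-email: `buildSupportSubject` and its `categoryFallback` parameter are hypothetical, and stripping a leading "Re:" from the incoming subject (to avoid "Re: Re:") and treating an empty subject as a placeholder are assumptions not stated in this SOP.

```typescript
const PLACEHOLDER_SUBJECTS = new Set(['', '(no subject)', '(pending)', '(none)', '(empty)']);

function buildSupportSubject(
  rawSubject: string | null,
  priorOutboundReplies: number,
  categoryFallback: string, // hypothetical: the category-based fallback text
): string {
  const trimmed = (rawSubject ?? '').trim();
  // Placeholder subjects are normalised to the category-based fallback.
  let clean = PLACEHOLDER_SUBJECTS.has(trimmed.toLowerCase()) ? categoryFallback : trimmed;
  // Assumption: drop an existing [Support] prefix and leading Re: so neither is duplicated.
  clean = clean.replace(/^\[Support\]\s*/i, '').replace(/^(re:\s*)+/i, '');
  // Re: is only added when earlier outbound messages exist on the ticket.
  const base = priorOutboundReplies > 0 ? `Re: ${clean}` : clean;
  return `[Support] ${base}`;
}
```

Keeping this logic free of database calls means the four scenarios in the table can be checked directly in a unit test, with the prior-reply count passed in from the ticket_messages query.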

Conversation History (v1.5)

Purpose

Track the full back-and-forth conversation between customer and support, enabling:

  • Context for agents handling follow-up replies
  • Historical view in Ops Portal
  • Audit trail of all communications

Database Schema

CREATE TABLE raw_ops.ticket_messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  ticket_id UUID NOT NULL REFERENCES raw_ops.support_tickets(id) ON DELETE CASCADE,
  direction TEXT NOT NULL CHECK (direction IN ('inbound', 'outbound')),
  sender TEXT NOT NULL,  -- customer email or agent persona
  subject TEXT,
  body TEXT NOT NULL,
  email_message_id TEXT,  -- For threading
  delivery_id TEXT,       -- Customer.io delivery ID for outbound
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_ticket_messages_ticket_id ON raw_ops.ticket_messages(ticket_id);

Message Flow

Inbound (customer email):

  • Trigger on support_tickets INSERT creates the first message automatically
  • Direction: inbound, Sender: customer email

Outbound (our reply):

  • Created by the send-support-email Edge Function
  • Direction: outbound, Sender: persona name (Sophie/Tom/Lucy)

Retrieval Function

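CREATE FUNCTION public.get_ticket_thread(p_ticket_id UUID)
RETURNS TABLE (
  message_id UUID,
  direction TEXT,
  sender TEXT,
  subject TEXT,
  body TEXT,
  created_at TIMESTAMPTZ
)
LANGUAGE sql STABLE AS $$
  -- Body is an illustrative sketch (oldest message first); the deployed definition may differ
  SELECT m.id, m.direction, m.sender, m.subject, m.body, m.created_at
  FROM raw_ops.ticket_messages m
  WHERE m.ticket_id = p_ticket_id
  ORDER BY m.created_at;
$$;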

Support Personas (v1.3)

Overview

The agent uses named personas to humanize automated responses and create brand consistency.

Persona | Email Sign-off
Sophie | Sophie
Tom | Tom
Lucy | Lucy

Assignment Logic

In send-support-email, persona is assigned randomly. The CS-03 autonomous agent assigns deterministically per customer (same customer always gets same persona).

// send-support-email (random)
const persona = PERSONAS[Math.floor(Math.random() * PERSONAS.length)];

// CS-03 autonomous agent (deterministic)
function getPersonaForCustomer(customerId: string): string {
  const hash = customerId.split('').reduce((sum, char) => sum + char.charCodeAt(0), 0);
  return PERSONAS[hash % PERSONAS.length];
}

Sign-off Handling

The AI draft must NOT include any sign-off. The email_context and email_format KB sections instruct the AI not to add any closing, name, or farewell. A belt-and-braces regex in cs-agent-triage strips trailing sign-off patterns (persona names, "Best regards", "[Name]", etc.) from the draft before storing on the ticket — examining only the last few lines to avoid stripping body content.

send-support-email appends the persona sign-off ("{persona}\nProtocol Raw") to the message body before sending to Customer.io. The sign-off is part of the message_data.body field. The CIO Support Layout does not inject any greeting or sign-off — it only provides card chrome (accent line, wordmark, footer).

Legacy: Previous versions used [Name] as a placeholder in the AI draft. This is no longer used. send-support-email still replaces [Name] with the persona for backward compatibility.
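The belt-and-braces stripping step in cs-agent-triage could look something like the sketch below. The pattern list and the `stripTrailingSignOff` name are illustrative assumptions, not the deployed regex. The point it shows is the technique described above: only the last few lines are examined, and a line is removed only if the whole line is a sign-off.

```typescript
const PERSONA_NAMES = ['Sophie', 'Tom', 'Lucy'];

// A line counts as a sign-off only if the entire line matches (illustrative pattern list).
const SIGN_OFF_LINE = new RegExp(
  `^\\s*(?:(?:(?:best|kind|warm)\\s+)?regards|best wishes|many thanks|thanks|cheers|\\[name\\]|protocol raw|${PERSONA_NAMES.join('|')})[\\s,.!]*$`,
  'i',
);

function stripTrailingSignOff(draft: string, maxLines = 4): string {
  const lines = draft.replace(/\s+$/, '').split('\n');
  let removed = 0;
  // Walk upwards from the end, dropping blank lines and sign-off lines only.
  while (lines.length > 0 && removed < maxLines) {
    const last = lines[lines.length - 1];
    if (last.trim() === '' || SIGN_OFF_LINE.test(last)) {
      lines.pop();
      removed++;
    } else {
      break; // first real body line reached: stop, leaving the body untouched
    }
  }
  return lines.join('\n');
}
```

Because the pattern is anchored to the whole line, a body sentence such as "Tom will love the beef box." is left alone even though it contains a persona name.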


Operating Hours & Response Timing (v1.3)

Operating Hours

Setting | Value
Start | 8:00 AM London time
End | 9:00 PM London time
Days | Every day (7 days/week)
Timezone | Europe/London (handles BST/GMT automatically)

Response Delay

To avoid the "instant bot" tell, responses are delayed:

Scenario | Delay
Within operating hours | 3-12 minutes (random)
Outside operating hours | Queued to 8am + 3-12 min delay

Implementation

function getLondonHour(): number {
  const formatter = new Intl.DateTimeFormat('en-GB', {
    timeZone: 'Europe/London',
    hour: 'numeric',
    hour12: false,
  });
  return parseInt(formatter.format(new Date()), 10);
}

function isWithinOperatingHours(): boolean {
  const hour = getLondonHour();
  return hour >= 8 && hour < 21; // 8am to 9pm
}

function getResponseDelayMs(): number {
  const minDelay = 3 * 60 * 1000;  // 3 minutes
  const maxDelay = 12 * 60 * 1000; // 12 minutes
  return Math.floor(Math.random() * (maxDelay - minDelay + 1)) + minDelay;
}
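The functions above give the London hour and the random delay, but not the "queued to 8am" step. One way that calculation might be done is sketched here. `msUntilOpen` and `computeSendDelayMs` are hypothetical names, the London minute is an extra input the functions above do not provide, seconds are ignored, and the one-hour shift on BST/GMT changeover nights is not handled.

```typescript
const OPEN_HOUR = 8;   // 8am London
const CLOSE_HOUR = 21; // 9pm London

// Milliseconds until operating hours next open, given the current London wall-clock time.
// Returns 0 when already within hours.
function msUntilOpen(londonHour: number, londonMinute: number): number {
  if (londonHour >= OPEN_HOUR && londonHour < CLOSE_HOUR) return 0;
  const hoursAhead =
    londonHour >= CLOSE_HOUR
      ? 24 - londonHour + OPEN_HOUR // evening: wait through midnight to 8am
      : OPEN_HOUR - londonHour;     // early morning: wait until 8am today
  return (hoursAhead * 60 - londonMinute) * 60 * 1000;
}

// Total delay = wait until opening (if closed) + the random 3-12 minute delay.
function computeSendDelayMs(londonHour: number, londonMinute: number, randomDelayMs: number): number {
  return msUntilOpen(londonHour, londonMinute) + randomDelayMs;
}
```

For example, a ticket handled at 22:15 London time waits 9 h 45 min until 08:00, then the random delay is added on top, which is consistent with the scheduled_for value of 08:07 shown in the delayed-send output later in this SOP.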

Email-Before-Resolve Gate (v1.3)

Purpose

Code-enforced rule: Agent cannot resolve a ticket without first sending a response email to the customer.

This prevents "silent resolutions" where the ticket is marked resolved but the customer never received a response.

Implementation

State tracking:

interface AgentState {
  // ... other fields
  emailSent: boolean;  // Set true when send_customer_email succeeds
}

Gate in executeResolveTicket:

async function executeResolveTicket(input, state): Promise<ToolResult> {
  // HARD GATE: Cannot resolve without sending email first
  if (!state.emailSent) {
    return {
      success: false,
      error: 'Cannot resolve ticket without sending a response to the customer first. Use send_customer_email, then resolve.',
    };
  }

  // ... rest of resolution logic
}

Behavior

Scenario | Result
Agent tries to resolve without emailing | Tool returns error, agent must email first
Agent emails then resolves | Resolution proceeds normally
Agent escalates without emailing | Allowed (escalation doesn't require email)

Architecture Overview

Component Responsibilities

Component | Responsibility | Technology
Trigger Layer | Receive emails, chat escalations | Cloudflare email-ingest Worker + chat Edge Function
Preflight Policy Gate | Code-enforced escalation rules (v1.2) | Edge Function
Orchestrator | Coordinate agent execution, enforce guardrails | Supabase Edge Function
Agent Brain | Reason about tickets, select tools, generate responses | Claude API (claude-sonnet-4-20250514)
Safety Filter | Validate responses before sending | Edge Function
Persona & Timing | Assign persona, calculate send time (v1.3) | Edge Function
Email Sender | Send emails with threading headers (v1.5) | send-support-email Edge Function
Tool Executor | Perform database queries, API calls | Edge Function + RPC
Audit Logger | Record every decision and action | Supabase tables
Escalation Router | Send complex cases to Slack for human review | Slack API

Request Flow (v1.5)

┌─────────────────────────────────────────────────────────────────────────────┐
│                           AGENT REQUEST FLOW v1.5                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. INGRESS                                                                 │
│     ├─ Auth Check (X-Internal-Secret) ──→ 401 if invalid                   │
│     └─ Idempotency Check (dedupe_key) ──→ Return prior result if exists    │
│                                                                             │
│  2. PREFLIGHT (Code-Enforced)                                              │
│     ├─ query_customer_context ──→ Get flags + customer_id                  │
│     └─ Policy Gate Check:                                                  │
│        ├─ Health/Safety keywords? ──→ ESCALATE (zero Claude tokens)        │
│        ├─ Financial/Legal language? ──→ ESCALATE                           │
│        ├─ Negative sentiment? ──→ ESCALATE                                 │
│        ├─ Attachments present? ──→ ESCALATE                                │
│        ├─ is_repeat_contacter? ──→ ESCALATE                                │
│        ├─ has_pending_replacement? ──→ ESCALATE                            │
│        └─ has_recent_credit + compensation request? ──→ ESCALATE           │
│                                                                             │
│  3. CLAUDE AGENT LOOP (only if policy gate passes)                         │
│     ├─ Investigate (tools)                                                 │
│     ├─ Search knowledge base (semantic + keyword)                          │
│     ├─ Draft response (no sign-off)                                        │
│     └─ Safety filter check                                                 │
│                                                                             │
│  4. SEND EMAIL (v1.5)                                                      │
│     ├─ Fetch email_message_id from ticket                                  │
│     ├─ Add [Support] prefix to subject                                     │
│     ├─ Apply persona sign-off                                              │
│     ├─ Add threading headers (In-Reply-To, References)                     │
│     ├─ Send via Customer.io                                                │
│     ├─ Log to ticket_messages table ──→ state.emailSent = true             │
│     └─ Log to ticket_notes for audit                                       │
│                                                                             │
│  5. RESOLUTION GATE (v1.3)                                                 │
│     └─ state.emailSent? ──→ No = Block resolution, return error            │
│                                                                             │
│  6. CONFIDENCE ENFORCEMENT                                                 │
│     ├─ Confidence provided? ──→ No = ESCALATE                              │
│     └─ Confidence >= 70%? ──→ No = ESCALATE                                │
│                                                                             │
│  7. SHADOW MODE (default)                                                  │
│     └─ All resolutions ──→ ESCALATE for human review                       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
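Step 1 of the flow (ingress) can be sketched as a pure function. Everything named here is hypothetical: `checkIngress`, `secretsMatch`, and the `Map` standing in for a lookup on the dedupe_key column. The behaviour it illustrates is from the diagram: reject with 401 on a bad X-Internal-Secret, return the prior result when the dedupe_key has been seen, otherwise proceed to preflight.

```typescript
interface IngressRequest {
  secretHeader: string | null; // value of the X-Internal-Secret header
  dedupeKey: string;
}

type IngressResult<T> =
  | { kind: 'unauthorized'; status: 401 }
  | { kind: 'duplicate'; prior: T }
  | { kind: 'proceed' };

// Compares every character rather than returning at the first mismatch.
function secretsMatch(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

function checkIngress<T>(
  req: IngressRequest,
  expectedSecret: string,
  priorResults: Map<string, T>, // stand-in for stored results keyed by dedupe_key
): IngressResult<T> {
  // Mirrors the "throws on boot if missing" rule: never run with an empty secret.
  if (!expectedSecret) throw new Error('AGENT_INTERNAL_SECRET is not set');
  if (req.secretHeader === null || !secretsMatch(req.secretHeader, expectedSecret)) {
    return { kind: 'unauthorized', status: 401 };
  }
  const prior = priorResults.get(req.dedupeKey);
  if (prior !== undefined) return { kind: 'duplicate', prior };
  return { kind: 'proceed' };
}
```

Doing the auth check before the idempotency lookup means an unauthenticated caller cannot probe which dedupe keys already exist.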

Preflight Policy Gate (v1.2)

Purpose

Code-enforced escalation rules that run before the Claude API is called. This ensures:

  1. Deterministic safety — Critical rules in code, not prompt suggestions
  2. Cost efficiency — Zero API tokens for obvious escalations
  3. Auditability — Every policy trigger logged with machine-readable code
  4. Speed — Policy escalations complete in <100ms

Policy Rules

Category | Pattern | Code | Severity
Health/Safety | sick, ill, unwell, poorly | health_unwell | critical
Health/Safety | vomit, vomiting, threw up | health_vomiting | critical
Health/Safety | diarrhea, loose stool | health_digestive | critical
Health/Safety | vet, veterinarian, animal hospital | health_vet_mention | critical
Health/Safety | foreign object, plastic, metal, bone fragment | quality_foreign_object | critical
Health/Safety | temperature, cold chain, warm, thawed | quality_cold_chain | critical
Health/Safety | allergic, allergy, reaction, swelling | health_allergy | critical
Health/Safety | lethargic, not eating, refusing food | health_appetite | critical
Health/Safety | blood, bleeding | health_blood | critical
Health/Safety | emergency, urgent, rushed | health_emergency | critical
Financial/Legal | refund, money back, reimburse | financial_refund | high
Financial/Legal | compensation, compensate | financial_compensation | high
Financial/Legal | solicitor, lawyer, legal action, trading standards | legal_threat | high
Financial/Legal | small claims, court | legal_court | high
Financial/Legal | chargeback, dispute the charge | financial_chargeback | high
Sentiment | disgusting, disgraceful, unacceptable | sentiment_negative | high
Sentiment | worst experience, never again, cancel everything | sentiment_churn_risk | high
Sentiment | social media, twitter, facebook, review | sentiment_social_threat | high
Sentiment | tell everyone, warn others, public | sentiment_public_threat | high
Sentiment | furious, livid, fuming, outraged | sentiment_anger | high
Special | passed away, died, rainbow bridge, euthanasia | special_bereavement | medium
Special | wholesale, bulk order, retailer, breeder, B2B | special_b2b | medium
Special | speak to human, real person, manager | special_human_request | medium
Context Flags | 3+ tickets in 7 days | context_repeat_contacter | medium
Context Flags | Pending replacement exists | context_pending_replacement | medium
Context Flags | Recent credit + compensation request | context_recent_credit_request | medium
Attachments | Any attachment present | attachment_present | high
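A keyword-based gate over the rules table can be sketched as below, using a small subset of the rules. This is a minimal illustration, not the deployed matcher: `checkPolicyGate`, the `RULES` array shape, and the use of word-boundary matching are assumptions. Word boundaries are chosen here because a plain substring test would fire `health_unwell` on "will" or "still".

```typescript
type Severity = 'critical' | 'high' | 'medium';

interface PolicyRule {
  code: string;
  severity: Severity;
  patterns: string[];
}

// A small subset of the rules table, for illustration only.
const RULES: PolicyRule[] = [
  { code: 'health_unwell', severity: 'critical', patterns: ['sick', 'ill', 'unwell', 'poorly'] },
  { code: 'health_vomiting', severity: 'critical', patterns: ['vomit', 'vomiting', 'threw up'] },
  { code: 'financial_refund', severity: 'high', patterns: ['refund', 'money back', 'reimburse'] },
  { code: 'special_human_request', severity: 'medium', patterns: ['speak to human', 'real person', 'manager'] },
];

// Escape regex metacharacters so patterns are matched literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

interface PolicyHit {
  reasonCode: string;
  severity: Severity;
  matched: string;
}

// Returns the first matching rule (rules are listed in priority order),
// or null when the ticket may proceed to the Claude agent loop.
function checkPolicyGate(text: string, rules: PolicyRule[] = RULES): PolicyHit | null {
  for (const rule of rules) {
    for (const pattern of rule.patterns) {
      const re = new RegExp(`\\b${escapeRegExp(pattern)}\\b`, 'i');
      if (re.test(text)) {
        return { reasonCode: rule.code, severity: rule.severity, matched: pattern };
      }
    }
  }
  return null;
}
```

Because no model call is involved, a hit here costs zero Claude tokens, and the returned `reasonCode` is what would be logged for the Metabase breakdowns.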

Reason Codes for Metabase

All policy triggers are logged with machine-readable reasonCode enabling:

  • Escalation cause breakdown charts
  • Pattern trend analysis over time
  • Identification of new patterns requiring rules

Tool Definitions

Overview

Category | Tools | Autonomous?
Read | query_customer_context, check_order_status, get_courier_tracking, search_knowledge_base | ✅ Always
Respond | send_customer_email, resolve_ticket | ✅ Within guidelines + safety filter + email gate
Act (Low Risk) | skip_next_delivery | ✅ Always
Act (Medium Risk) | apply_store_credit | ✅ Up to £20
Act (High Risk) | trigger_replacement_order | ⚠️ Requires confirmation
Escalate | escalate_to_human | ✅ Always (throttled in early phase)
Audit | add_internal_note | ✅ Always

Tool: send_customer_email (v1.5)

Purpose: Send response email to customer with persona, timing, and threading applied.

Implementation: Calls send-support-email Edge Function (shared with Ops Portal).

Input:

{
  "ticket_id": "uuid",
  "subject": "Re: Customer's subject",
  "body": "Response text with no sign-off (send-support-email appends the persona sign-off)"
}

Output (v1.5):

{
  "success": true,
  "data": {
    "delivery_id": "RIabDAUAAZvbWwCC18...",
    "persona": "Sophie",
    "threading_enabled": true,
    "sent_immediately": true
  }
}

With delayed send:

{
  "success": true,
  "data": {
    "delivery_id": "RIabDAUAAZvbWwCC18...",
    "persona": "Sophie",
    "threading_enabled": true,
    "sent_immediately": false,
    "scheduled_for": "2026-01-20T08:07:32.000Z",
    "delay_minutes": 7,
    "within_operating_hours": false
  }
}

Side Effects:

  • Sets state.emailSent = true (enables resolution)
  • Logs to ticket_messages table (direction: outbound)
  • Logs to ticket_notes with persona email (e.g., sophie@protocolraw.co.uk)
  • Adds [Support] prefix to subject
  • Adds threading headers if email_message_id exists on ticket
  • Appends the persona sign-off; any legacy [Name] placeholder is replaced with the assigned persona name

Tool: resolve_ticket (v1.3)

Purpose: Mark ticket as resolved with category and summary.

Pre-conditions (code-enforced):

  1. state.emailSent must be true (email gate)
  2. confidence must be ≥70%

Input:

{
  "ticket_id": "uuid",
  "resolution_category": "subscription|delivery|quality|feeding|other",
  "resolution_type": "informational|action_taken|goodwill_gesture|clarification_only",
  "resolution_summary": "Brief description",
  "confidence": 85
}
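The two pre-conditions could be checked together along these lines. This is a sketch under assumptions: `checkResolvePreconditions`, the `GateDecision` shape, and the reason strings are hypothetical. The behaviour follows this SOP: a missing email returns a tool error the agent can recover from, while a missing or sub-70 confidence escalates.

```typescript
type GateDecision =
  | { allowed: true }
  | { allowed: false; action: 'retry_after_email' | 'escalate'; reason: string };

const MIN_CONFIDENCE = 70;

function checkResolvePreconditions(emailSent: boolean, confidence: unknown): GateDecision {
  // Email gate: the tool returns an error so the agent can send the email and try again.
  if (!emailSent) {
    return { allowed: false, action: 'retry_after_email', reason: 'email_not_sent' };
  }
  // Confidence gate: missing, non-numeric, or out-of-range values are treated as absent.
  if (
    typeof confidence !== 'number' ||
    !Number.isFinite(confidence) ||
    confidence < 0 ||
    confidence > 100
  ) {
    return { allowed: false, action: 'escalate', reason: 'confidence_missing' };
  }
  if (confidence < MIN_CONFIDENCE) {
    return { allowed: false, action: 'escalate', reason: 'confidence_below_threshold' };
  }
  return { allowed: true };
}
```

Typing `confidence` as `unknown` reflects that the value comes from model output and should be validated rather than trusted to be a number.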


Knowledge Base Search (v1.4)

Semantic Search with pgvector

The agent uses OpenAI embeddings + pgvector for semantic similarity search, with keyword fallback.

Model: text-embedding-3-small (1536 dimensions, $0.02/1M tokens)

Search Flow

Query → OpenAI Embedding → pgvector cosine similarity → Results
                          ↓ If similarity < 0.3
                          Keyword fallback search

Database Schema

-- Embedding column on KB sections
ALTER TABLE raw_ops.ai_knowledge_sections 
ADD COLUMN embedding vector(1536);

-- IVFFlat index for fast similarity search
CREATE INDEX idx_kb_sections_embedding 
ON raw_ops.ai_knowledge_sections 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 10);

Hybrid Search RPC

-- fn_search_kb_hybrid(p_query_embedding, p_query_text, p_match_threshold, p_limit)
-- Returns: section_key, title, content, similarity, match_type
Parameter | Default | Description
p_query_embedding | required | 1536-dim vector from OpenAI
p_query_text | required | Original query for keyword fallback
p_match_threshold | 0.3 | Minimum cosine similarity
p_limit | 5 | Max results
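The selection logic behind the hybrid search (semantic first, keyword fallback when nothing reaches the threshold) can be expressed in TypeScript as below. This is only a rendering of the decision rule for illustration: the real logic lives in the fn_search_kb_hybrid RPC, and `selectHybridResults`, `ScoredSection`, and the callback-based fallback are hypothetical.

```typescript
interface ScoredSection {
  section_key: string;
  title: string;
  content: string;
  similarity: number; // cosine similarity, higher is closer
}

interface KbResult extends ScoredSection {
  match_type: 'semantic' | 'keyword';
}

interface HybridOutput {
  results: KbResult[];
  search_type: 'semantic' | 'keyword';
}

function selectHybridResults(
  semanticRows: ScoredSection[],
  keywordFallback: () => ScoredSection[], // only invoked when semantic search comes up empty
  matchThreshold = 0.3,
  limit = 5,
): HybridOutput {
  const semantic = semanticRows
    .filter((row) => row.similarity >= matchThreshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
  if (semantic.length > 0) {
    return {
      results: semantic.map((row) => ({ ...row, match_type: 'semantic' as const })),
      search_type: 'semantic',
    };
  }
  const keyword = keywordFallback().slice(0, limit);
  return {
    results: keyword.map((row) => ({ ...row, match_type: 'keyword' as const })),
    search_type: 'keyword',
  };
}
```

Passing the fallback as a callback keeps the keyword query lazy, so it only runs when no semantic match clears the threshold.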

Search Output

{
  "success": true,
  "data": {
    "results": [
      {
        "section_key": "portal_pause_subscription",
        "title": "Pausing Your Subscription",
        "content": "...",
        "similarity": 0.543,
        "match_type": "semantic"
      }
    ],
    "query": "pause subscription holiday",
    "search_type": "semantic"
  }
}

Logging

Every search logs to Edge Function console:

KB search: query="pause subscription", top_match="portal_pause_subscription", similarity=0.543, match_type=semantic

Embedding Generation

Edge Function: embedding-generator

Action | Input | Description
backfill_all | (none) | Embed all sections without embeddings
embed_one | section_id | Embed single section

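As of v1.6, a database trigger on ai_knowledge_sections generates embeddings automatically via pg_net when sections are created or their content is updated (CS03-009). backfill_all is retained for recovery only.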


Support Queue Monitoring (v1.4)

Overview

Native Supabase monitoring alerts when tickets need human review.

Components

Component | Purpose
v_support_needs_review | View showing unreviewed escalations
fn_check_support_health_v2() | Support health monitor (queue, triage quality, intake)
monitor-cs-01-support-health | pg_cron job every 15 min
ops-alerter | Edge Function sending to Slack

Alert Severity

Queue Age | Severity | Channel
< 30 min | info | #ops-alerts
30-60 min | warning | #ops-alerts
> 60 min | critical | #ops-urgent
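The severity table maps directly onto a small routing function. The sketch below is illustrative: `routeQueueAlert` is a hypothetical name, and the real mapping may live in SQL inside fn_check_support_health_v2() rather than in the ops-alerter Edge Function.

```typescript
type AlertSeverity = 'info' | 'warning' | 'critical';

interface AlertRoute {
  severity: AlertSeverity;
  channel: '#ops-alerts' | '#ops-urgent';
}

// Maps the age of the oldest unreviewed ticket (in minutes) to severity and Slack channel.
function routeQueueAlert(oldestTicketAgeMin: number): AlertRoute {
  if (oldestTicketAgeMin > 60) return { severity: 'critical', channel: '#ops-urgent' };
  if (oldestTicketAgeMin >= 30) return { severity: 'warning', channel: '#ops-alerts' };
  return { severity: 'info', channel: '#ops-alerts' };
}
```

Note the boundaries: exactly 30 and exactly 60 minutes both fall in the warning band, following the "30-60 min" row.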

Slack Alert Format

📬 Support Queue Needs Review
1 ticket(s) awaiting human review
Queue Size: 1
Oldest: 45 min
Action: https://ops.protocolraw.co.uk → Support

Configuration

Edge Function Config

const CONFIG: AgentConfig = {
  model: 'claude-sonnet-4-20250514',
  maxIterations: 10,
  maxTokens: 4096,
  timeoutMs: 30000,
  creditLimitPence: 2000,
  escalationThrottleEnabled: true,
  escalationThrottleMaxPerHour: 20,
  shadowMode: true,        // v1.2: Default ON
  requireAuth: true,       // v1.2: Auth required
  allowedOrigins: [        // v1.2: CORS whitelist
    'https://protocolraw.co.uk',
    'https://www.protocolraw.co.uk',
    'https://ops.protocolraw.co.uk',
    'https://hook.eu1.make.com',
  ],
};

// v1.3: Personas
const PERSONAS = ['Sophie', 'Tom', 'Lucy'];

// v1.3: Operating hours (London time)
// 8am (8) to 9pm (21)

Environment Variables

Variable | Required | Description
SUPABASE_URL | Yes | Supabase project URL
SUPABASE_SERVICE_ROLE_KEY | Yes | Service role key
ANTHROPIC_API_KEY | Yes | Claude API key
OPENAI_API_KEY | Yes | OpenAI API key (for embeddings)
CUSTOMERIO_API_KEY | Yes | Customer.io transactional API key
AGENT_INTERNAL_SECRET | Yes (if requireAuth) | Internal auth secret

Metabase Dashboards

Persona Distribution

SELECT 
  SPLIT_PART(created_by, '@', 1) as persona,
  COUNT(*) as emails_sent
FROM raw_ops.ticket_notes
WHERE note_type = 'email_sent'
  AND created_at > NOW() - INTERVAL '30 days'
GROUP BY 1
ORDER BY 2 DESC;

Response Timing Analysis

SELECT 
  tool_output->'data'->>'within_operating_hours' as within_hours,
  AVG((tool_output->'data'->>'delay_minutes')::int) as avg_delay_min,
  COUNT(*) as count
FROM raw_ops.agent_tool_calls
WHERE tool_name = 'send_customer_email'
  AND success = true
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1;

Threading Adoption (v1.5)

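-- ticket_messages as defined in this SOP has no threading_enabled column, so this
-- reads the flag from the send_customer_email tool output (agent-sent emails only).
SELECT 
  DATE(created_at) as date,
  COUNT(*) FILTER (WHERE (tool_output->'data'->>'threading_enabled')::boolean IS TRUE) as threaded,
  COUNT(*) FILTER (WHERE (tool_output->'data'->>'threading_enabled')::boolean IS NOT TRUE) as not_threaded
FROM raw_ops.agent_tool_calls
WHERE tool_name = 'send_customer_email'
  AND success = true
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1 DESC;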

Conversation Thread Depth

SELECT 
  t.id as ticket_id,
  t.subject,
  COUNT(*) as message_count,
  COUNT(*) FILTER (WHERE m.direction = 'inbound') as customer_messages,
  COUNT(*) FILTER (WHERE m.direction = 'outbound') as our_replies
FROM raw_ops.support_tickets t
JOIN raw_ops.ticket_messages m ON m.ticket_id = t.id
WHERE t.created_at > NOW() - INTERVAL '30 days'
GROUP BY 1, 2
HAVING COUNT(*) > 1
ORDER BY message_count DESC;

Policy Gate Escalation Breakdown

SELECT 
  tool_output->'data'->>'reasonCode' as reason_code,
  tool_output->'data'->>'severity' as severity,
  COUNT(*) as count
FROM raw_ops.agent_tool_calls
WHERE tool_name = 'policy_gate'
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY count DESC;

Confidence Calibration

WITH calibration AS (
  SELECT 
    CASE 
      WHEN confidence_score >= 90 THEN '90-100%'
      WHEN confidence_score >= 80 THEN '80-89%'
      WHEN confidence_score >= 70 THEN '70-79%'
      ELSE '<70%'
    END as bucket,
    human_agreed
  FROM raw_ops.agent_decisions
  WHERE human_reviewed_at IS NOT NULL
    AND created_at > NOW() - INTERVAL '30 days'
)
SELECT 
  bucket,
  COUNT(*) as total,
  SUM(CASE WHEN human_agreed THEN 1 ELSE 0 END) as agreed,
  ROUND(100.0 * SUM(CASE WHEN human_agreed THEN 1 ELSE 0 END) / COUNT(*), 1) as accuracy_pct
FROM calibration
GROUP BY bucket
ORDER BY bucket DESC;

Agent Performance Summary

SELECT
  DATE(created_at) as date,
  COUNT(*) as total_executions,
  SUM(CASE WHEN outcome = 'resolved' THEN 1 ELSE 0 END) as resolved,
  SUM(CASE WHEN outcome = 'escalated' THEN 1 ELSE 0 END) as escalated,
  SUM(CASE WHEN 'policy_gate' = ANY(tools_used) THEN 1 ELSE 0 END) as policy_escalations,
  AVG(duration_ms) as avg_duration_ms,
  SUM(total_tokens) as total_tokens
FROM raw_ops.agent_executions
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;

API Cost per Ticket (v1.6)

SELECT * FROM raw_ops.v_agent_cost_daily;

Execution Health (v1.6)

SELECT * FROM raw_ops.v_agent_execution_health;

Email Delivery Rate (v1.6)

SELECT * FROM raw_ops.v_agent_email_delivery;

Graduation Readiness (v1.6)

SELECT * FROM raw_ops.v_agent_graduation_readiness;

Deployment Checklist

First Deployment

  • [ ] Run database migrations (tables, indexes, dedupe_key column)
  • [ ] Add email_message_id and email_references columns to support_tickets
  • [ ] Create ticket_messages table with trigger
  • [ ] Deploy RPC functions (fn_agent_get_customer_context, fn_agent_get_order_status, get_ticket_thread)
  • [ ] Set environment variables in Supabase (including CUSTOMERIO_API_KEY)
  • [ ] Deploy send-support-email Edge Function
  • [ ] Deploy agent Edge Function with requireAuth: false for testing
  • [ ] Test policy gate with health ticket
  • [ ] Test clean ticket through Claude
  • [ ] Verify persona assignment is deterministic
  • [ ] Verify timing shows correct London hours
  • [ ] Verify email threading works (In-Reply-To headers)
  • [ ] Set AGENT_INTERNAL_SECRET (the function throws on boot if auth is required and the secret is missing)
  • [ ] Enable requireAuth: true
  • [ ] Verify email-ingest Worker is deployed and passing Message-ID to cs-agent-triage

Shadow Mode Validation Gate

All items below must be completed before this SOP can be reclassified to "Production Ready — Shadow Mode." Until then, the document status remains "Shadow Mode Validation."

  • [ ] Shadow mode enabled and processing tickets
  • [ ] 100+ test tickets processed with zero null outcomes
  • [ ] Confidence calibration reviewed (70% threshold validated against human agreement rate)
  • [ ] No safety filter false negatives in test set
  • [ ] Persona consistency verified (same customer = same persona)
  • [ ] Operating hours verified (emails queued outside 8am-9pm)
  • [ ] Email threading verified (replies appear in same thread)
  • [ ] [Support] prefix appearing on all outbound emails
  • [ ] Conversation history populating in ticket_messages
  • [ ] Metabase dashboards configured and returning data
  • [ ] Slack escalation channel (#ops-urgent) ready and tested
  • [ ] Customer.io integration tested with scheduled sends
  • [x] Least-privilege database role deployed (v1.6 — CS03-004)
  • [x] Partial-failure recovery model implemented (v1.6 — CS03-005)
  • [x] Timeout-to-escalation path implemented (v1.6 — CS03-003)
  • [x] Null outcome fix deployed (v1.6 — CS03-001)

Success Metrics

Metric | Shadow Mode | Full Autonomy Target
Tickets processed | Track | Track
Policy gate escalations | Track | <30%
Claude escalations | 100% (shadow) | <20%
Correct decisions | Track | >95%
Response time (policy gate) | <100ms | <100ms
Response time (Claude) | <30s | <30s
Safety filter triggers | Track | <5%
Email gate blocks | Track | <1% (prompt compliance)
Threading enabled | Track (v1.5) | >95%
Multi-message threads | Track (v1.5) | Track
Customer.io delivery success | Track (v1.6) | >99%
Execution completion (non-null outcome) | >99% (v1.6) | >99.5%
Avg API cost per ticket | Track (v1.6) | <£0.15

Shadow Mode Graduation Criteria (v1.6)

Purpose

Shadow mode exists to validate that the agent makes correct decisions before granting it autonomy. Graduation is data-driven with explicit thresholds, not date-driven.

Prerequisites (all must be true)

  1. Shadow Mode Validation Gate (Deployment Checklist) fully completed
  2. All v1.6 changes implemented and stable for 14+ days

Graduation Thresholds

All thresholds must be met simultaneously over the most recent 200 human-reviewed decisions:

Metric | Threshold | Source
Sample size | ≥ 200 reviewed decisions | agent_decisions
Human agreement rate | ≥ 90% | agent_decisions.human_agreed
Draft acceptance (sent as-is) | ≥ 70% | agent_decisions.human_agreed
Outcome health | ≥ 99% resolved or escalated | agent_executions.outcome
Unrecovered failures | 0 | agent_execution_failures
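The thresholds can be read as five independent gates that must all pass. The sketch below shows that evaluation in TypeScript for illustration only: the authoritative check is the v_agent_graduation_readiness view, and `evaluateGraduation`, the `GraduationMetrics` field names, and the gate keys are hypothetical.

```typescript
interface GraduationMetrics {
  reviewedDecisions: number;   // human-reviewed decisions in the window
  humanAgreed: number;         // of those, where the human agreed
  draftsSentAsIs: number;      // of those, where the draft was sent unchanged
  executions: number;          // agent executions in the window
  resolvedOrEscalated: number; // executions with outcome resolved or escalated
  unrecoveredFailures: number; // unrecovered rows in agent_execution_failures
}

function evaluateGraduation(m: GraduationMetrics) {
  // Guard against division by zero: an empty sample fails the ratio gates.
  const ratio = (num: number, den: number): number => (den > 0 ? num / den : 0);
  const gates = {
    sample_size: m.reviewedDecisions >= 200,
    human_agreement: ratio(m.humanAgreed, m.reviewedDecisions) >= 0.9,
    draft_acceptance: ratio(m.draftsSentAsIs, m.reviewedDecisions) >= 0.7,
    outcome_health: ratio(m.resolvedOrEscalated, m.executions) >= 0.99,
    unrecovered_failures: m.unrecoveredFailures === 0,
  };
  const ready = Object.values(gates).every(Boolean);
  return { graduation_status: ready ? 'READY' : 'NOT READY', gates };
}
```

For example, 185 agreements out of 200 reviewed decisions (92.5%) passes the agreement gate, while 170 out of 200 (85%) fails it and yields NOT READY even if every other gate passes.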

Blocker Conditions (any one blocks graduation)

  • Any unrecovered agent_execution_failures
  • Confidence calibration not reviewed in the last 30 days
  • Metabase dashboards not returning data for all success metrics

Graduation Query

SELECT * FROM raw_ops.v_agent_graduation_readiness;

Returns a single row with graduation_status = 'READY' or 'NOT READY' and individual gate pass/fail status.

Graduation Process

  1. Run graduation query. All gates must show PASS
  2. Anton reviews the full dashboard and confirms
  3. SOP status updated to "Production Ready — Shadow Mode"
  4. Enable selective autonomy: auto-send for confidence ≥ 90% AND no policy gate AND category not in mandatory_review list
  5. Full autonomy criteria defined after selective autonomy data

Version History

Version | Date | Changes
1.0 | 2026-01-17 | Initial specification
1.1 | 2026-01-17 | Safety filter, resolution types, escalation throttle, vet advice enforcement
1.2 | 2026-01-19 | Production hardening: Preflight policy gate, idempotency, auth hardening, CORS hardening, confidence enforcement, human-readable reason codes, shadow mode default
1.3 | 2026-01-19 | Humanization & gates: Support personas (Sophie/Tom/Lucy deterministic per customer), operating hours (8am-9pm London), response delay (3-12 min), email-before-resolve gate
1.4 | 2026-01-19 | Semantic search & monitoring: pgvector embeddings with OpenAI, hybrid retrieval (semantic + keyword fallback), support queue Slack alerting via native Supabase stack
1.5 | 2026-01-20 | Email threading & history: In-Reply-To/References headers for inbox threading, [Support] subject prefix, ticket_messages table for conversation history, unified send-support-email Edge Function
1.6 | 2026-03-19 | Governance & resilience: Status reclassified to Shadow Mode Validation. Null outcome fix with ensureOutcomeWritten and CHECK constraint. Timeout-to-escalation at 80% budget. All agent writes via SECURITY DEFINER RPCs with failure recovery to agent_execution_failures table. Customer.io retry logic. Least-privilege support_agent role with explicit grants. Shadow-mode graduation criteria with v_agent_graduation_readiness view. KB auto-embedding trigger via pg_net. Cost and execution health monitoring views. Known limitations reduced from 8 to 3.

Known Limitations (v1.6)

Limitation | Status | Plan
Attachments always escalate | Accepted (conservative) | v1.7: Smart attachment handling
Threading requires email_message_id | email-ingest Worker extracts from MIME | Documented in CS-01 v2.3
Old tickets lack threading | Backfill not planned | New tickets only

File Locations

File | Location
Autonomous Agent | supabase/functions/autonomous-support-agent/index.ts
Email Sender | supabase/functions/send-support-email/index.ts
Embedding Generator | supabase/functions/embedding-generator/index.ts
Database Migrations | supabase/migrations/
Email Ingestion Worker | email-ingest/src/index.ts (Cloudflare Worker)

Document Owner: Protocol Raw Operations
Version: 1.6
Status: 🟡 Shadow Mode Validation

End of SOP CS-03 v1.6