SOP CS-03: Autonomous Support Agent (v1.6)¶

Status: 🟡 Shadow Mode Validation
Last Updated: 2026-03-19
Owner: Protocol Raw Operations
Platform: Supabase Edge Functions + Claude API + Cloudflare Workers
Prerequisites: SOP CS-01 v2.1, SOP CS-02 v1.4, SOP AI-KB-01 v1.2

Executive Summary¶

What This Is¶

An autonomous AI agent that receives customer support tickets, investigates context using Protocol Raw's database and external APIs, determines appropriate resolution, and executes actions—with human oversight only for high-stakes decisions.

Why It Matters¶

Metric	Current State (AI-Assist)	Target State (Agentic)
Human involvement	100% of tickets	<20% of tickets
Response time	15-60 min (human review)	<2 min (autonomous)
Tickets per hour capacity	15-20 (human limited)	Unlimited
First hire threshold	~500 customers	~5,000 customers
Cost per ticket	~£3-5 (human time)	~£0.05-0.10 (API costs)

The Strategic Case¶

This agent embodies the "AI-native" thesis that underpins Protocol Raw's 10-19× capital efficiency claim. By handling 80%+ of routine support autonomously, we can:

Validate faster — Phase A customers get instant responses, improving NPS and retention
Defer hiring — Operations Lead focuses on high-value work, not ticket clearing
Build moat — Accumulated decision data trains better judgment over time
Scale confidently — 10k customers requires no more human support capacity than 500

v1.6 Changes¶

Status reclassified — Document reclassified from "Production Ready - Shadow Mode" to "Shadow Mode Validation." Status must now be earned by passing the validation gate, not declared at document creation
Null outcome fix — Agent execution guarantees non-null outcome on every run via try/catch/finally with ensureOutcomeWritten fallback. CHECK constraint on database prevents null outcomes on completed executions
Timeout-to-escalation — Agent loop checks elapsed time at each iteration. If >80% of timeout budget consumed, escalates to human rather than risking silent failure
Outcome monitoring — New view v_agent_outcome_health tracks execution health daily
Least-privilege role — support_agent PostgreSQL role with explicit grants; SET ROLE at execution start (CS03-004)
Partial-failure recovery — All agent writes via SECURITY DEFINER RPCs with failure recovery to agent_execution_failures table. Customer.io retry logic with 5s backoff (CS03-005)
Graduation criteria defined — Explicit thresholds for exiting shadow mode with v_agent_graduation_readiness view (CS03-006)
KB auto-embedding — Database trigger on ai_knowledge_sections automatically generates embeddings via pg_net when sections are created or content is updated. Manual backfill_all retained for recovery only (CS03-009)
Cost and SLO monitoring — Daily views for API cost per ticket, execution health, and email delivery rate. SLO targets defined for autonomous mode (CS03-010)
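
The null outcome fix and the timeout-to-escalation check can be pictured together in one loop. The following is a minimal sketch under stated assumptions, not the deployed agent: `runAgentLoop`, the `step` callback, the injected clock, and the reason strings are hypothetical names introduced for illustration; only the 80% budget rule and the try/catch/finally fallback come from the change notes above.

```typescript
type Outcome = 'resolved' | 'escalated';
type StepResult = { done: true; outcome: Outcome } | { done: false };

interface LoopConfig {
  maxIterations: number;
  timeoutMs: number;
}

// Hypothetical loop: checks the time budget before every iteration and
// always writes an outcome in `finally`, whatever happened in the body.
async function runAgentLoop(
  step: (iteration: number) => Promise<StepResult>,
  config: LoopConfig,
  writeOutcome: (outcome: Outcome, reason: string) => Promise<void>,
  now: () => number = Date.now,
): Promise<Outcome> {
  const startedAt = now();
  let outcome: Outcome | null = null;
  let reason = 'completed';
  try {
    for (let i = 0; i < config.maxIterations; i++) {
      // More than 80% of the budget used: hand over to a human instead of risking a silent failure
      if (now() - startedAt > 0.8 * config.timeoutMs) {
        outcome = 'escalated';
        reason = 'timeout_budget_exceeded'; // assumed reason string
        break;
      }
      const result = await step(i);
      if (result.done) {
        outcome = result.outcome;
        break;
      }
    }
    if (outcome === null) {
      outcome = 'escalated';
      reason = 'max_iterations_reached'; // assumed reason string
    }
  } catch {
    outcome = 'escalated';
    reason = 'execution_error'; // assumed reason string
  } finally {
    // Plays the role of the ensureOutcomeWritten fallback: never leave the outcome null
    await writeOutcome(outcome ?? 'escalated', reason);
  }
  return outcome ?? 'escalated';
}
```

Injecting the clock keeps the budget check testable without real delays. The database CHECK constraint remains the backstop if the final write itself fails.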

v1.5 Changes¶

Email Threading — Outbound emails include In-Reply-To and References headers for proper inbox threading
[Support] Prefix — All outbound subject lines automatically prefixed with [Support] to distinguish from marketing
Conversation History — Messages logged to ticket_messages table for full thread tracking
Unified Email Function — send-support-email Edge Function used by both CS-03 and Ops Portal

v1.4 Changes¶

Semantic KB Search — pgvector embeddings with OpenAI text-embedding-3-small
Hybrid Retrieval — Semantic first, keyword fallback if similarity < 0.3
Search Logging — Query, top match, similarity score, match type logged
Slack Alerting — Support queue monitor every 15 min via native Supabase stack

v1.3 Changes¶

Support Personas — Sophie, Tom, Lucy assigned deterministically per customer
Operating Hours — 8am-9pm London time (timezone-aware)
Response Delay — 3-12 minute random delay (anti-bot humanization)
Email-Before-Resolve Gate — Code-enforced: cannot resolve without sending email first

v1.2 Changes (Production Hardening)¶

Preflight Policy Gate — Code-enforced escalation rules run BEFORE Claude (zero-token escalation)
Idempotency — dedupe_key prevents duplicate processing from retries
Auth Hardening — X-Internal-Secret header required (throws on boot if missing)
CORS Hardening — Only allowed origins receive CORS headers
Confidence Enforcement — Agent must provide confidence score; <70% blocks resolution
Human-Readable Reason Codes — Machine codes (e.g., health_vomiting) for Metabase analytics
Timeline Whitelist Narrowed — Only tracking-cited timelines exempt; vet advice NEVER whitelisted
Shadow Mode Default — All tickets escalate for human review during validation

Email Threading (v1.5)¶

Purpose¶

Customer replies now thread correctly in their inbox, appearing as a single conversation rather than separate emails.

How It Works¶

When an inbound email arrives:
1. Cloudflare email-ingest Worker parses the MIME Message-ID header
2. This is stored in support_tickets.email_message_id via cs-agent-triage

When sending a reply:
1. Edge Function fetches the original email_message_id from the ticket
2. Adds In-Reply-To and References headers to the outbound email
3. Customer.io sends with these headers

Database Schema¶

-- Added to support_tickets
ALTER TABLE raw_ops.support_tickets 
ADD COLUMN email_message_id TEXT,
ADD COLUMN email_references TEXT;

Threading Headers¶

CIO accepts a plain key-value object (not the JSON string format their OpenAPI spec claims). Tested and confirmed 2026-04-08.

{
  "headers": {
    "In-Reply-To": "<original-message-id@gmail.com>",
    "References": "<original-message-id@gmail.com>"
  }
}
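
As an illustration of how the headers object above could be assembled from the two ticket columns, here is a small sketch. `buildThreadingHeaders` is a hypothetical helper, not the send-support-email implementation; the rule that References carries the parent's reference chain followed by the parent's Message-ID follows the usual RFC 5322 convention.

```typescript
interface ThreadingHeaders {
  'In-Reply-To': string;
  References: string;
}

// Hypothetical helper: returns null when the ticket has no stored Message-ID,
// in which case the email would simply be sent unthreaded.
function buildThreadingHeaders(
  emailMessageId: string | null,
  emailReferences: string | null = null,
): ThreadingHeaders | null {
  if (!emailMessageId || emailMessageId.trim() === '') return null;
  const id = emailMessageId.trim();
  // Existing reference chain (space-separated Message-IDs), oldest first
  const chain = (emailReferences ?? '').split(/\s+/).filter(Boolean);
  if (!chain.includes(id)) chain.push(id);
  return { 'In-Reply-To': id, References: chain.join(' ') };
}
```

For a first reply with no stored references, both headers carry the same Message-ID, which matches the example object above.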

Edge Function Response¶

The send-support-email function now returns:

{
  "success": true,
  "delivery_id": "abc123",
  "persona": "Sophie",
  "threading_enabled": true,
  "sent_immediately": true
}

Subject Line Prefix (v1.5)¶

Purpose¶

Distinguish support emails from marketing communications in the customer's inbox.

Implementation¶

All outbound support emails automatically receive a [Support] prefix. The Re: prefix is added only when the ticket already has outbound messages (not on the first reply). Placeholder subjects ("(no subject)", "(pending)", "(none)", "(empty)") are normalised to category-based fallbacks.

// Re: prefix - only on follow-up replies (ticket already has outbound messages)
const { count: priorReplies } = await supabase
  .from('ticket_messages')
  .select('id', { count: 'exact', head: true })
  .eq('ticket_id', ticket_id)
  .eq('direction', 'outbound');

// `clean` is the normalised subject (placeholders already replaced by the fallback)
// count can be null, so default it to 0 before comparing
const baseSubject = (priorReplies ?? 0) > 0 ? `Re: ${clean}` : clean;

// [Support] prefix - applied to all outbound, never duplicated
const emailSubject = baseSubject.startsWith('[Support]')
  ? baseSubject
  : `[Support] ${baseSubject}`;

Examples¶

Scenario	Final Subject
Customer subject: "Where is my order?" (first reply)	[Support] Where is my order?
Same ticket, second reply	[Support] Re: Where is my order?
No subject, AI generates "Loose stools during transition"	[Support] Loose stools during transition
Already prefixed	[Support] Already prefixed (no duplication)
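
The rows above can be reproduced with a small pure function. This is a sketch only: `buildSupportSubject`, its parameters, and the treatment of an empty subject as a placeholder are assumptions made for illustration; the placeholder list and prefix rules come from the Implementation section.

```typescript
const PLACEHOLDER_SUBJECTS = new Set(['(no subject)', '(pending)', '(none)', '(empty)']);

// Hypothetical helper combining placeholder normalisation, Re: and [Support] handling.
function buildSupportSubject(
  rawSubject: string | null,
  priorOutboundCount: number,
  categoryFallback: string,
): string {
  const trimmed = (rawSubject ?? '').trim();
  // Placeholder (or, by assumption, empty) subjects fall back to a category-based subject
  const clean =
    trimmed === '' || PLACEHOLDER_SUBJECTS.has(trimmed.toLowerCase()) ? categoryFallback : trimmed;
  // Re: only when at least one outbound message already exists on the ticket
  const baseSubject = priorOutboundCount > 0 ? `Re: ${clean}` : clean;
  // [Support] is added once and never duplicated
  return baseSubject.startsWith('[Support]') ? baseSubject : `[Support] ${baseSubject}`;
}
```

Keeping this logic free of database calls means the table above can double as a unit-test fixture.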

Conversation History (v1.5)¶

Purpose¶

Track the full back-and-forth conversation between customer and support, enabling:
- Context for agents handling follow-up replies
- Historical view in Ops Portal
- Audit trail of all communications

Database Schema¶

CREATE TABLE raw_ops.ticket_messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  ticket_id UUID NOT NULL REFERENCES raw_ops.support_tickets(id) ON DELETE CASCADE,
  direction TEXT NOT NULL CHECK (direction IN ('inbound', 'outbound')),
  sender TEXT NOT NULL,  -- customer email or agent persona
  subject TEXT,
  body TEXT NOT NULL,
  email_message_id TEXT,  -- For threading
  delivery_id TEXT,       -- Customer.io delivery ID for outbound
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_ticket_messages_ticket_id ON raw_ops.ticket_messages(ticket_id);

Message Flow¶

Inbound (customer email):
- Trigger on support_tickets INSERT creates first message automatically
- Direction: inbound, Sender: customer email

Outbound (our reply):
- Created by send-support-email Edge Function
- Direction: outbound, Sender: persona name (Sophie/Tom/Lucy)

Retrieval Function¶

CREATE FUNCTION public.get_ticket_thread(p_ticket_id UUID)
RETURNS TABLE (
  message_id UUID,
  direction TEXT,
  sender TEXT,
  subject TEXT,
  body TEXT,
  created_at TIMESTAMPTZ
);

Support Personas (v1.3)¶

Overview¶

The agent uses named personas to humanize automated responses and create brand consistency.

Persona	Email Sign-off
Sophie	`Sophie`
Tom	`Tom`
Lucy	`Lucy`

Assignment Logic¶

In send-support-email, persona is assigned randomly. The CS-03 autonomous agent assigns deterministically per customer (same customer always gets same persona).

// send-support-email (random)
const persona = PERSONAS[Math.floor(Math.random() * PERSONAS.length)];

// CS-03 autonomous agent (deterministic)
function getPersonaForCustomer(customerId: string): string {
  const hash = customerId.split('').reduce((sum, char) => sum + char.charCodeAt(0), 0);
  return PERSONAS[hash % PERSONAS.length];
}

Sign-off Handling¶

The AI draft must NOT include any sign-off. The email_context and email_format KB sections instruct the AI not to add any closing, name, or farewell. A belt-and-braces regex in cs-agent-triage strips trailing sign-off patterns (persona names, "Best regards", "[Name]", etc.) from the draft before storing on the ticket — examining only the last few lines to avoid stripping body content.

send-support-email appends the persona sign-off ("{persona}\nProtocol Raw") to the message body before sending to Customer.io. The sign-off is part of the message_data.body field. The CIO Support Layout does not inject any greeting or sign-off — it only provides card chrome (accent line, wordmark, footer).

Legacy: Previous versions used [Name] as a placeholder in the AI draft. This is no longer used. send-support-email still replaces [Name] with the persona for backward compatibility.
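
To make the strip-then-append behaviour concrete, here is a minimal sketch. The actual regex in cs-agent-triage is not shown in this SOP, so the phrase list, the four-line window, the blank line before the sign-off, and the function names `stripTrailingSignOff` and `appendSignOff` are all assumptions.

```typescript
const PERSONAS = ['Sophie', 'Tom', 'Lucy'];

// Assumed phrase list; a line counts as a sign-off only if it contains nothing else
const SIGN_OFF_WORDS = [...PERSONAS, '\\[Name\\]', 'Best regards', 'Kind regards', 'Many thanks', 'Thanks', 'Protocol Raw'];
const SIGN_OFF_LINE = new RegExp(`^\\s*(?:${SIGN_OFF_WORDS.join('|')})[\\s,.!]*$`, 'i');

// Removes sign-off lines from the end of the draft, looking at no more than
// `maxLines` trailing lines so that body content is left alone.
function stripTrailingSignOff(draft: string, maxLines = 4): string {
  const lines = draft.trimEnd().split('\n');
  let removed = 0;
  while (lines.length > 0 && removed < maxLines) {
    const last = lines[lines.length - 1];
    if (last.trim() !== '' && !SIGN_OFF_LINE.test(last)) break;
    lines.pop();
    removed++;
  }
  return lines.join('\n').trimEnd();
}

// Appends the persona sign-off in the "{persona}\nProtocol Raw" shape described above
function appendSignOff(body: string, persona: string): string {
  return `${body}\n\n${persona}\nProtocol Raw`;
}
```

Because the pattern is anchored to whole lines, a sentence that merely mentions a persona name is not stripped.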

Operating Hours & Response Timing (v1.3)¶

Operating Hours¶

Setting	Value
Start	8:00 AM London time
End	9:00 PM London time
Days	Every day (7 days/week)
Timezone	Europe/London (handles BST/GMT automatically)

Response Delay¶

To avoid the "instant bot" tell, responses are delayed:

Scenario	Delay
Within operating hours	3-12 minutes (random)
Outside operating hours	Queued to 8am + 3-12 min delay

Implementation¶

function getLondonHour(): number {
  const formatter = new Intl.DateTimeFormat('en-GB', {
    timeZone: 'Europe/London',
    hour: 'numeric',
    hour12: false,
  });
  return parseInt(formatter.format(new Date()), 10);
}

function isWithinOperatingHours(): boolean {
  const hour = getLondonHour();
  return hour >= 8 && hour < 21; // 8am to 9pm
}

function getResponseDelayMs(): number {
  const minDelay = 3 * 60 * 1000;  // 3 minutes
  const maxDelay = 12 * 60 * 1000; // 12 minutes
  return Math.floor(Math.random() * (maxDelay - minDelay + 1)) + minDelay;
}
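
The functions above cover the in-hours case. For the out-of-hours case ("queued to 8am + 3-12 min delay"), one possible way to find the next opening instant is sketched below. `nextOpenInstant` and `scheduleSend` are hypothetical helpers; stepping minute by minute and asking `Intl` for the London hour is a deliberate simplification that stays correct across BST/GMT changes, which happen overnight while the queue is closed.

```typescript
const londonClock = new Intl.DateTimeFormat('en-GB', {
  timeZone: 'Europe/London',
  hour: '2-digit',
  hourCycle: 'h23',
});

// London wall-clock hour (0-23) at a given instant
function londonHourAt(instant: Date): number {
  const part = londonClock.formatToParts(instant).find((p) => p.type === 'hour');
  return Number(part?.value ?? NaN);
}

function isOpenAt(instant: Date): boolean {
  const hour = londonHourAt(instant);
  return hour >= 8 && hour < 21; // 8am to 9pm London time
}

// Returns `now` when already open, otherwise the first whole minute at which
// the London hour falls inside operating hours (i.e. 08:00 London time).
function nextOpenInstant(now: Date): Date {
  if (isOpenAt(now)) return now;
  const MINUTE = 60_000;
  let candidate = Math.ceil(now.getTime() / MINUTE) * MINUTE;
  for (let i = 0; i < 25 * 60; i++, candidate += MINUTE) {
    if (isOpenAt(new Date(candidate))) return new Date(candidate);
  }
  throw new Error('No opening time found within 25 hours');
}

// Scheduled send time = next opening instant + the random response delay
function scheduleSend(now: Date, delayMs: number): Date {
  return new Date(nextOpenInstant(now).getTime() + delayMs);
}
```

A caller would pass `getResponseDelayMs()` as `delayMs`; keeping the delay as a parameter makes the schedule deterministic in tests.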

Email-Before-Resolve Gate (v1.3)¶

Purpose¶

Code-enforced rule: Agent cannot resolve a ticket without first sending a response email to the customer.

This prevents "silent resolutions" where the ticket is marked resolved but the customer never received a response.

Implementation¶

State tracking:

interface AgentState {
  // ... other fields
  emailSent: boolean;  // Set true when send_customer_email succeeds
}

Gate in executeResolveTicket:

async function executeResolveTicket(input, state): Promise<ToolResult> {
  // HARD GATE: Cannot resolve without sending email first
  if (!state.emailSent) {
    return {
      success: false,
      error: 'Cannot resolve ticket without sending a response to the customer first. Use send_customer_email, then resolve.',
    };
  }

  // ... rest of resolution logic
}

Behavior¶

Scenario	Result
Agent tries to resolve without emailing	Tool returns error, agent must email first
Agent emails then resolves	Resolution proceeds normally
Agent escalates without emailing	Allowed (escalation doesn't require email)

Architecture Overview¶

Component Responsibilities¶

Component	Responsibility	Technology
Trigger Layer	Receive emails, chat escalations	Cloudflare email-ingest Worker + chat Edge Function
Preflight Policy Gate	Code-enforced escalation rules (v1.2)	Edge Function
Orchestrator	Coordinate agent execution, enforce guardrails	Supabase Edge Function
Agent Brain	Reason about tickets, select tools, generate responses	Claude API (claude-sonnet-4-20250514)
Safety Filter	Validate responses before sending	Edge Function
Persona & Timing	Assign persona, calculate send time (v1.3)	Edge Function
Email Sender	Send emails with threading headers (v1.5)	`send-support-email` Edge Function
Tool Executor	Perform database queries, API calls	Edge Function + RPC
Audit Logger	Record every decision and action	Supabase tables
Escalation Router	Send complex cases to Slack for human review	Slack API

Request Flow (v1.5)¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                           AGENT REQUEST FLOW v1.5                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. INGRESS                                                                 │
│     ├─ Auth Check (X-Internal-Secret) ──→ 401 if invalid                   │
│     └─ Idempotency Check (dedupe_key) ──→ Return prior result if exists    │
│                                                                             │
│  2. PREFLIGHT (Code-Enforced)                                              │
│     ├─ query_customer_context ──→ Get flags + customer_id                  │
│     └─ Policy Gate Check:                                                  │
│        ├─ Health/Safety keywords? ──→ ESCALATE (zero Claude tokens)        │
│        ├─ Financial/Legal language? ──→ ESCALATE                           │
│        ├─ Negative sentiment? ──→ ESCALATE                                 │
│        ├─ Attachments present? ──→ ESCALATE                                │
│        ├─ is_repeat_contacter? ──→ ESCALATE                                │
│        ├─ has_pending_replacement? ──→ ESCALATE                            │
│        └─ has_recent_credit + compensation request? ──→ ESCALATE           │
│                                                                             │
│  3. CLAUDE AGENT LOOP (only if policy gate passes)                         │
│     ├─ Investigate (tools)                                                 │
│     ├─ Search knowledge base (semantic + keyword)                          │
│     ├─ Draft response (no sign-off)                                        │
│     └─ Safety filter check                                                 │
│                                                                             │
│  4. SEND EMAIL (v1.5)                                                      │
│     ├─ Fetch email_message_id from ticket                                  │
│     ├─ Add [Support] prefix to subject                                     │
│     ├─ Apply persona sign-off                                              │
│     ├─ Add threading headers (In-Reply-To, References)                     │
│     ├─ Send via Customer.io                                                │
│     ├─ Log to ticket_messages table ──→ state.emailSent = true             │
│     └─ Log to ticket_notes for audit                                       │
│                                                                             │
│  5. RESOLUTION GATE (v1.3)                                                 │
│     └─ state.emailSent? ──→ No = Block resolution, return error            │
│                                                                             │
│  6. CONFIDENCE ENFORCEMENT                                                 │
│     ├─ Confidence provided? ──→ No = ESCALATE                              │
│     └─ Confidence >= 70%? ──→ No = ESCALATE                                │
│                                                                             │
│  7. SHADOW MODE (default)                                                  │
│     └─ All resolutions ──→ ESCALATE for human review                       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
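
Step 1 of the flow (ingress) can be sketched as follows. This is an illustration under assumptions, not the Edge Function code: `handleIngress` is a hypothetical name, the in-memory `Map` stands in for the dedupe_key lookup in the database, and a production check would normally compare secrets in constant time.

```typescript
interface IngressRequest {
  internalSecret: string | null; // value of the X-Internal-Secret header
  dedupeKey: string;
}

type IngressResult<T> =
  | { status: 401 }
  | { status: 200; replayed: boolean; result: T };

// Hypothetical ingress guard: auth first, then idempotency, then the real work.
function handleIngress<T>(
  req: IngressRequest,
  expectedSecret: string,
  priorResults: Map<string, T>,
  runAgent: () => T,
): IngressResult<T> {
  // Mirrors "throws on boot if missing": refuse to run without a configured secret
  if (!expectedSecret) throw new Error('AGENT_INTERNAL_SECRET is not set');
  if (req.internalSecret !== expectedSecret) return { status: 401 };

  // A retry with the same dedupe_key gets the earlier result instead of a second run
  if (priorResults.has(req.dedupeKey)) {
    return { status: 200, replayed: true, result: priorResults.get(req.dedupeKey) as T };
  }
  const result = runAgent();
  priorResults.set(req.dedupeKey, result);
  return { status: 200, replayed: false, result };
}
```

Checking the dedupe key before any tool call is what keeps webhook retries from sending a second email.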

Preflight Policy Gate (v1.2)¶

Purpose¶

Code-enforced escalation rules that run before the Claude API is called. This ensures:

Deterministic safety — Critical rules in code, not prompt suggestions
Cost efficiency — Zero API tokens for obvious escalations
Auditability — Every policy trigger logged with machine-readable code
Speed — Policy escalations complete in <100ms

Policy Rules¶

Category	Pattern	Code	Severity
Health/Safety	sick, ill, unwell, poorly	`health_unwell`	critical
	vomit, vomiting, threw up	`health_vomiting`	critical
	diarrhea, loose stool	`health_digestive`	critical
	vet, veterinarian, animal hospital	`health_vet_mention`	critical
	foreign object, plastic, metal, bone fragment	`quality_foreign_object`	critical
	temperature, cold chain, warm, thawed	`quality_cold_chain`	critical
	allergic, allergy, reaction, swelling	`health_allergy`	critical
	lethargic, not eating, refusing food	`health_appetite`	critical
	blood, bleeding	`health_blood`	critical
	emergency, urgent, rushed	`health_emergency`	critical
Financial/Legal	refund, money back, reimburse	`financial_refund`	high
	compensation, compensate	`financial_compensation`	high
	solicitor, lawyer, legal action, trading standards	`legal_threat`	high
	small claims, court	`legal_court`	high
	chargeback, dispute the charge	`financial_chargeback`	high
Sentiment	disgusting, disgraceful, unacceptable	`sentiment_negative`	high
	worst experience, never again, cancel everything	`sentiment_churn_risk`	high
	social media, twitter, facebook, review	`sentiment_social_threat`	high
	tell everyone, warn others, public	`sentiment_public_threat`	high
	furious, livid, fuming, outraged	`sentiment_anger`	high
Special	passed away, died, rainbow bridge, euthanasia	`special_bereavement`	medium
	wholesale, bulk order, retailer, breeder, B2B	`special_b2b`	medium
	speak to human, real person, manager	`special_human_request`	medium
Context Flags	3+ tickets in 7 days	`context_repeat_contacter`	medium
	Pending replacement exists	`context_pending_replacement`	medium
	Recent credit + compensation request	`context_recent_credit_request`	medium
Attachments	Any attachment present	`attachment_present`	high
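
A minimal sketch of how rules like these could be evaluated in code, assuming word-boundary regexes and first-match-wins ordering. Only a few rows of the table are included; `runPolicyGate`, the exact regexes, and the matching strategy are illustrative assumptions, not the deployed gate.

```typescript
type Severity = 'critical' | 'high' | 'medium';

interface PolicyRule {
  pattern: RegExp;
  code: string;
  severity: Severity;
}

interface PolicyHit {
  reasonCode: string;
  severity: Severity;
}

// Subset of the table above, in table order. \b keeps "ill" from matching "will".
const RULES: PolicyRule[] = [
  { pattern: /\b(sick|ill|unwell|poorly)\b/i, code: 'health_unwell', severity: 'critical' },
  { pattern: /\b(vomit(s|ing|ed)?|threw up)\b/i, code: 'health_vomiting', severity: 'critical' },
  { pattern: /\b(refund(s|ed)?|money back|reimburse)\b/i, code: 'financial_refund', severity: 'high' },
  { pattern: /\b(speak to (a )?human|real person|manager)\b/i, code: 'special_human_request', severity: 'medium' },
];

// Returns the first matching rule, or null when the ticket may proceed to Claude.
function runPolicyGate(text: string, hasAttachments: boolean): PolicyHit | null {
  for (const rule of RULES) {
    if (rule.pattern.test(text)) {
      return { reasonCode: rule.code, severity: rule.severity };
    }
  }
  if (hasAttachments) return { reasonCode: 'attachment_present', severity: 'high' };
  return null;
}
```

Because the gate is a pure function over the ticket text and flags, it can run before any API call, which is what makes the zero-token escalation possible.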

Reason Codes for Metabase¶

All policy triggers are logged with machine-readable reasonCode enabling:

Escalation cause breakdown charts
Pattern trend analysis over time
Identification of new patterns requiring rules

Tool Definitions¶

Overview¶

Category	Tools	Autonomous?
Read	query_customer_context, check_order_status, get_courier_tracking, search_knowledge_base	✅ Always
Respond	send_customer_email, resolve_ticket	✅ Within guidelines + safety filter + email gate
Act (Low Risk)	skip_next_delivery	✅ Always
Act (Medium Risk)	apply_store_credit	✅ Up to £20
Act (High Risk)	trigger_replacement_order	⚠️ Requires confirmation
Escalate	escalate_to_human	✅ Always (throttled in early phase)
Audit	add_internal_note	✅ Always

Tool: send_customer_email (v1.5)¶

Purpose: Send response email to customer with persona, timing, and threading applied.

Implementation: Calls send-support-email Edge Function (shared with Ops Portal).

Input:

{
  "ticket_id": "uuid",
  "subject": "Re: Customer's subject",
  "body": "Response text with no sign-off (the persona sign-off is appended by send-support-email)"
}

Output (v1.5):

{
  "success": true,
  "data": {
    "delivery_id": "RIabDAUAAZvbWwCC18...",
    "persona": "Sophie",
    "threading_enabled": true,
    "sent_immediately": true
  }
}

With delayed send:

{
  "success": true,
  "data": {
    "delivery_id": "RIabDAUAAZvbWwCC18...",
    "persona": "Sophie",
    "threading_enabled": true,
    "sent_immediately": false,
    "scheduled_for": "2026-01-20T08:07:32.000Z",
    "delay_minutes": 7,
    "within_operating_hours": false
  }
}

Side Effects:
- Sets state.emailSent = true (enables resolution)
- Logs to ticket_messages table (direction: outbound)
- Logs to ticket_notes with persona email (e.g., sophie@protocolraw.co.uk)
- Adds [Support] prefix to subject
- Adds threading headers if email_message_id exists on ticket
- Appends the persona sign-off; any legacy [Name] placeholder is replaced with the assigned persona name (backward compatibility only)

Tool: resolve_ticket (v1.3)¶

Purpose: Mark ticket as resolved with category and summary.

Pre-conditions (code-enforced):
1. state.emailSent must be true (email gate)
2. confidence must be ≥70%

Input:

{
  "ticket_id": "uuid",
  "resolution_category": "subscription|delivery|quality|feeding|other",
  "resolution_type": "informational|action_taken|goodwill_gesture|clarification_only",
  "resolution_summary": "Brief description",
  "confidence": 85
}
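
The two pre-conditions can be checked together before any resolution logic runs. The sketch below is illustrative: `checkResolvePreconditions` and the `action` field are hypothetical, while the behaviour (missing or low confidence escalates, a missing email blocks with an error the agent can act on) follows the request flow described earlier.

```typescript
interface AgentState {
  emailSent: boolean;
}

interface ResolveInput {
  ticket_id: string;
  confidence?: number; // 0-100
}

type GateResult =
  | { ok: true }
  | { ok: false; action: 'send_email_first' | 'escalate'; error: string };

// Hypothetical combined gate for resolve_ticket
function checkResolvePreconditions(
  input: ResolveInput,
  state: AgentState,
  minConfidence = 70,
): GateResult {
  if (!state.emailSent) {
    return {
      ok: false,
      action: 'send_email_first',
      error: 'Cannot resolve ticket without sending a response to the customer first.',
    };
  }
  if (typeof input.confidence !== 'number' || Number.isNaN(input.confidence)) {
    return { ok: false, action: 'escalate', error: 'Confidence score missing.' };
  }
  if (input.confidence < minConfidence) {
    return { ok: false, action: 'escalate', error: `Confidence ${input.confidence}% is below ${minConfidence}%.` };
  }
  return { ok: true };
}
```

Returning a structured result rather than throwing lets the agent loop distinguish "email first, then retry" from "hand over to a human".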

Knowledge Base Search (v1.4)¶

Semantic Search with pgvector¶

The agent uses OpenAI embeddings + pgvector for semantic similarity search, with keyword fallback.

Model: text-embedding-3-small (1536 dimensions, $0.02/1M tokens)

Search Flow¶

Query → OpenAI Embedding → pgvector cosine similarity → Results
                                    ↓
                          If similarity < 0.3
                                    ↓
                          Keyword fallback search

Database Schema¶

-- Embedding column on KB sections
ALTER TABLE raw_ops.ai_knowledge_sections 
ADD COLUMN embedding vector(1536);

-- IVFFlat index for fast similarity search
CREATE INDEX idx_kb_sections_embedding 
ON raw_ops.ai_knowledge_sections 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 10);

Hybrid Search RPC¶

-- fn_search_kb_hybrid(p_query_embedding, p_query_text, p_match_threshold, p_limit)
-- Returns: section_key, title, content, similarity, match_type

Parameter	Default	Description
`p_query_embedding`	required	1536-dim vector from OpenAI
`p_query_text`	required	Original query for keyword fallback
`p_match_threshold`	0.3	Minimum cosine similarity
`p_limit`	5	Max results
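
The fallback decision itself is simple enough to sketch outside the database. The following is an assumption-laden illustration, not the body of fn_search_kb_hybrid: `hybridSearch` is a hypothetical function, the keyword search is injected as a callback, and keyword matches are given a similarity of 0 purely as a placeholder.

```typescript
interface SemanticMatch {
  section_key: string;
  similarity: number; // cosine similarity from pgvector
}

interface KbMatch extends SemanticMatch {
  match_type: 'semantic' | 'keyword';
}

// Semantic results at or above the threshold win; otherwise fall back to keywords.
function hybridSearch(
  semantic: SemanticMatch[],
  keywordFallback: () => string[],
  matchThreshold = 0.3,
  limit = 5,
): KbMatch[] {
  const strong = semantic
    .filter((m) => m.similarity >= matchThreshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit)
    .map((m) => ({ ...m, match_type: 'semantic' as const }));
  if (strong.length > 0) return strong;

  // Nothing cleared the threshold: run the keyword search lazily
  return keywordFallback()
    .slice(0, limit)
    .map((section_key) => ({ section_key, similarity: 0, match_type: 'keyword' as const }));
}
```

Passing the keyword search as a callback means it only runs when the semantic pass comes back empty.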

Search Output¶

{
  "success": true,
  "data": {
    "results": [
      {
        "section_key": "portal_pause_subscription",
        "title": "Pausing Your Subscription",
        "content": "...",
        "similarity": 0.543,
        "match_type": "semantic"
      }
    ],
    "query": "pause subscription holiday",
    "search_type": "semantic"
  }
}

Logging¶

Every search logs to Edge Function console:

KB search: query="pause subscription", top_match="portal_pause_subscription", similarity=0.543, match_type=semantic

Embedding Generation¶

Edge Function: embedding-generator

Action	Input	Description
`backfill_all`	—	Embed all sections without embeddings
`embed_one`	`section_id`	Embed single section

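Since v1.6, a database trigger on ai_knowledge_sections generates embeddings automatically via pg_net when a section is created or its content is updated (CS03-009). backfill_all is retained for recovery only.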

Support Queue Monitoring (v1.4)¶

Overview¶

Native Supabase monitoring alerts when tickets need human review.

Components¶

Component	Purpose
`v_support_needs_review`	View showing unreviewed escalations
`fn_check_support_health_v2()`	Support health monitor (queue, triage quality, intake)
`monitor-cs-01-support-health`	pg_cron job every 15 min
`ops-alerter`	Edge Function sending to Slack

Alert Severity¶

Queue Age	Severity	Channel
< 30 min	info	#ops-alerts
30-60 min	warning	#ops-alerts
> 60 min	critical	#ops-urgent
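
The mapping above could be expressed as a small function. This is a sketch only: `classifyQueueAge` is a hypothetical name, and treating exactly 60 minutes as a warning is an assumption drawn from the "30-60 min" row.

```typescript
type AlertSeverity = 'info' | 'warning' | 'critical';

interface QueueAlert {
  severity: AlertSeverity;
  channel: '#ops-alerts' | '#ops-urgent';
}

// Maps the age of the oldest unreviewed ticket (in minutes) to severity and Slack channel
function classifyQueueAge(oldestMinutes: number): QueueAlert {
  if (oldestMinutes < 30) return { severity: 'info', channel: '#ops-alerts' };
  if (oldestMinutes <= 60) return { severity: 'warning', channel: '#ops-alerts' };
  return { severity: 'critical', channel: '#ops-urgent' };
}
```

For the 45-minute example in the alert format below, this yields a warning routed to #ops-alerts.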

Slack Alert Format¶

📬 Support Queue Needs Review
1 ticket(s) awaiting human review
Queue Size: 1
Oldest: 45 min
Action: https://ops.protocolraw.co.uk → Support

Configuration¶

Edge Function Config¶

const CONFIG: AgentConfig = {
  model: 'claude-sonnet-4-20250514',
  maxIterations: 10,
  maxTokens: 4096,
  timeoutMs: 30000,
  creditLimitPence: 2000,
  escalationThrottleEnabled: true,
  escalationThrottleMaxPerHour: 20,
  shadowMode: true,        // v1.2: Default ON
  requireAuth: true,       // v1.2: Auth required
  allowedOrigins: [        // v1.2: CORS whitelist
    'https://protocolraw.co.uk',
    'https://www.protocolraw.co.uk',
    'https://ops.protocolraw.co.uk',
    'https://hook.eu1.make.com',
  ],
};

// v1.3: Personas
const PERSONAS = ['Sophie', 'Tom', 'Lucy'];

// v1.3: Operating hours (London time)
// 8am (8) to 9pm (21)

Environment Variables¶

Variable	Required	Description
`SUPABASE_URL`	Yes	Supabase project URL
`SUPABASE_SERVICE_ROLE_KEY`	Yes	Service role key
`ANTHROPIC_API_KEY`	Yes	Claude API key
`OPENAI_API_KEY`	Yes	OpenAI API key (for embeddings)
`CUSTOMERIO_API_KEY`	Yes	Customer.io transactional API key
`AGENT_INTERNAL_SECRET`	Yes (if requireAuth)	Internal auth secret

Metabase Dashboards¶

Persona Distribution¶

SELECT 
  SPLIT_PART(created_by, '@', 1) as persona,
  COUNT(*) as emails_sent
FROM raw_ops.ticket_notes
WHERE note_type = 'email_sent'
  AND created_at > NOW() - INTERVAL '30 days'
GROUP BY 1
ORDER BY 2 DESC;

Response Timing Analysis¶

SELECT 
  tool_output->'data'->>'within_operating_hours' as within_hours,
  AVG((tool_output->'data'->>'delay_minutes')::int) as avg_delay_min,
  COUNT(*) as count
FROM raw_ops.agent_tool_calls
WHERE tool_name = 'send_customer_email'
  AND success = true
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1;

Threading Adoption (v1.5)¶

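-- ticket_messages (schema above) has no threading_enabled column, so threading is
-- derived from the ticket: headers are added only when email_message_id is stored
SELECT 
  DATE(m.created_at) as date,
  COUNT(*) FILTER (WHERE t.email_message_id IS NOT NULL) as threaded,
  COUNT(*) FILTER (WHERE t.email_message_id IS NULL) as not_threaded
FROM raw_ops.ticket_messages m
JOIN raw_ops.support_tickets t ON t.id = m.ticket_id
WHERE m.direction = 'outbound'
  AND m.created_at > NOW() - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1 DESC;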

Conversation Thread Depth¶

SELECT 
  t.id as ticket_id,
  t.subject,
  COUNT(*) as message_count,
  COUNT(*) FILTER (WHERE m.direction = 'inbound') as customer_messages,
  COUNT(*) FILTER (WHERE m.direction = 'outbound') as our_replies
FROM raw_ops.support_tickets t
JOIN raw_ops.ticket_messages m ON m.ticket_id = t.id
WHERE t.created_at > NOW() - INTERVAL '30 days'
GROUP BY 1, 2
HAVING COUNT(*) > 1
ORDER BY message_count DESC;

Policy Gate Escalation Breakdown¶

SELECT 
  tool_output->'data'->>'reasonCode' as reason_code,
  tool_output->'data'->>'severity' as severity,
  COUNT(*) as count
FROM raw_ops.agent_tool_calls
WHERE tool_name = 'policy_gate'
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY count DESC;

Confidence Calibration¶

WITH calibration AS (
  SELECT 
    CASE 
      WHEN confidence_score >= 90 THEN '90-100%'
      WHEN confidence_score >= 80 THEN '80-89%'
      WHEN confidence_score >= 70 THEN '70-79%'
      ELSE '<70%'
    END as bucket,
    human_agreed
  FROM raw_ops.agent_decisions
  WHERE human_reviewed_at IS NOT NULL
    AND created_at > NOW() - INTERVAL '30 days'
)
SELECT 
  bucket,
  COUNT(*) as total,
  SUM(CASE WHEN human_agreed THEN 1 ELSE 0 END) as agreed,
  ROUND(100.0 * SUM(CASE WHEN human_agreed THEN 1 ELSE 0 END) / COUNT(*), 1) as accuracy_pct
FROM calibration
GROUP BY bucket
ORDER BY bucket DESC;

Agent Performance Summary¶

SELECT
  DATE(created_at) as date,
  COUNT(*) as total_executions,
  SUM(CASE WHEN outcome = 'resolved' THEN 1 ELSE 0 END) as resolved,
  SUM(CASE WHEN outcome = 'escalated' THEN 1 ELSE 0 END) as escalated,
  SUM(CASE WHEN 'policy_gate' = ANY(tools_used) THEN 1 ELSE 0 END) as policy_escalations,
  AVG(duration_ms) as avg_duration_ms,
  SUM(total_tokens) as total_tokens
FROM raw_ops.agent_executions
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;

API Cost per Ticket (v1.6)¶

SELECT * FROM raw_ops.v_agent_cost_daily;

Execution Health (v1.6)¶

SELECT * FROM raw_ops.v_agent_execution_health;

Email Delivery Rate (v1.6)¶

SELECT * FROM raw_ops.v_agent_email_delivery;

Graduation Readiness (v1.6)¶

SELECT * FROM raw_ops.v_agent_graduation_readiness;

Deployment Checklist¶

First Deployment¶

[ ] Run database migrations (tables, indexes, dedupe_key column)
[ ] Add email_message_id and email_references columns to support_tickets
[ ] Create ticket_messages table with trigger
[ ] Deploy RPC functions (fn_agent_get_customer_context, fn_agent_get_order_status, get_ticket_thread)
[ ] Set environment variables in Supabase (including CUSTOMERIO_API_KEY)
[ ] Deploy send-support-email Edge Function
[ ] Deploy agent Edge Function with requireAuth: false for testing
[ ] Test policy gate with health ticket
[ ] Test clean ticket through Claude
[ ] Verify persona assignment is deterministic
[ ] Verify timing shows correct London hours
[ ] Verify email threading works (In-Reply-To headers)
[ ] Enable requireAuth: true
[ ] Set AGENT_INTERNAL_SECRET
[ ] Verify email-ingest Worker is deployed and passing Message-ID to cs-agent-triage

Shadow Mode Validation Gate¶

All items below must be completed before this SOP can be reclassified to "Production Ready — Shadow Mode." Until then, the document status remains "Shadow Mode Validation."

[ ] Shadow mode enabled and processing tickets
[ ] 100+ test tickets processed with zero null outcomes
[ ] Confidence calibration reviewed (70% threshold validated against human agreement rate)
[ ] No safety filter false negatives in test set
[ ] Persona consistency verified (same customer = same persona)
[ ] Operating hours verified (emails queued outside 8am-9pm)
[ ] Email threading verified (replies appear in same thread)
[ ] [Support] prefix appearing on all outbound emails
[ ] Conversation history populating in ticket_messages
[ ] Metabase dashboards configured and returning data
[ ] Slack escalation channel (#ops-urgent) ready and tested
[ ] Customer.io integration tested with scheduled sends
[ ] Least-privilege database role deployed (v1.6 — CS03-004) ✅
[ ] Partial-failure recovery model implemented (v1.6 — CS03-005) ✅
[ ] Timeout-to-escalation path implemented (v1.6 — CS03-003) ✅
[ ] Null outcome fix deployed (v1.6 — CS03-001) ✅

Success Metrics¶

Metric	Shadow Mode	Full Autonomy Target
Tickets processed	Track	Track
Policy gate escalations	Track	<30%
Claude escalations	100% (shadow)	<20%
Correct decisions	Track	>95%
Response time (policy gate)	<100ms	<100ms
Response time (Claude)	<30s	<30s
Safety filter triggers	Track	<5%
Email gate blocks	Track	<1% (prompt compliance)
Threading enabled	Track (v1.5)	>95%
Multi-message threads	Track (v1.5)	Track
Customer.io delivery success	Track (v1.6)	>99%
Execution completion (non-null outcome)	>99% (v1.6)	>99.5%
Avg API cost per ticket	Track (v1.6)	<£0.15

Shadow Mode Graduation Criteria (v1.6)¶

Purpose¶

Shadow mode exists to validate that the agent makes correct decisions before granting it autonomy. Graduation is data-driven with explicit thresholds, not date-driven.

Prerequisites (all must be true)¶

Shadow Mode Validation Gate (Deployment Checklist) fully completed
All v1.6 changes implemented and stable for 14+ days

Graduation Thresholds¶

All thresholds must be met simultaneously over the most recent 200 human-reviewed decisions:

Metric	Threshold	Source
Sample size	≥ 200 reviewed decisions	agent_decisions
Human agreement rate	≥ 90%	agent_decisions.human_agreed
Draft acceptance (sent as-is)	≥ 70%	agent_decisions.human_agreed
Outcome health	≥ 99% resolved or escalated	agent_executions.outcome
Unrecovered failures	0	agent_execution_failures

Blocker Conditions (any one blocks graduation)¶

Any unrecovered agent_execution_failures
Confidence calibration not reviewed in the last 30 days
Metabase dashboards not returning data for all success metrics

Graduation Query¶

SELECT * FROM raw_ops.v_agent_graduation_readiness;

Returns a single row with graduation_status = 'READY' or 'NOT READY' and individual gate pass/fail status.
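
For readers who want to reason about the gates without querying the view, the thresholds can be sketched in code. The definition of v_agent_graduation_readiness is not shown in this SOP, so `evaluateGraduation`, the input field names, and the gate names are assumptions; only the threshold values come from the table above.

```typescript
interface GraduationMetrics {
  reviewedDecisions: number;   // human-reviewed decisions in the window
  humanAgreed: number;         // of those, decisions the reviewer agreed with
  sentAsIs: number;            // of those, drafts sent without edits
  executions: number;          // total agent executions in the window
  resolvedOrEscalated: number; // executions with a healthy outcome
  unrecoveredFailures: number; // rows still open in agent_execution_failures
}

type Gate = 'PASS' | 'FAIL';

// Integer comparison avoids floating-point edge cases at the exact threshold
const meets = (numerator: number, denominator: number, pct: number): boolean =>
  denominator > 0 && numerator * 100 >= pct * denominator;

function evaluateGraduation(m: GraduationMetrics): {
  graduation_status: 'READY' | 'NOT READY';
  gates: Record<string, Gate>;
} {
  const checks: Record<string, boolean> = {
    sample_size: m.reviewedDecisions >= 200,
    human_agreement: meets(m.humanAgreed, m.reviewedDecisions, 90),
    draft_acceptance: meets(m.sentAsIs, m.reviewedDecisions, 70),
    outcome_health: meets(m.resolvedOrEscalated, m.executions, 99),
    unrecovered_failures: m.unrecoveredFailures === 0,
  };
  const gates: Record<string, Gate> = {};
  for (const [name, passed] of Object.entries(checks)) gates[name] = passed ? 'PASS' : 'FAIL';
  const ready = Object.values(checks).every(Boolean);
  return { graduation_status: ready ? 'READY' : 'NOT READY', gates };
}
```

The blocker conditions that need human judgement (calibration review, dashboard coverage) are deliberately left out of this sketch.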

Graduation Process¶

Run graduation query. All gates must show PASS
Anton reviews the full dashboard and confirms
SOP status updated to "Production Ready — Shadow Mode"
Enable selective autonomy: auto-send for confidence ≥ 90% AND no policy gate AND category not in mandatory_review list
Full autonomy criteria defined after selective autonomy data
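
The selective autonomy rule in the process above reduces to a single predicate. In this sketch `canAutoSend` is a hypothetical name and the contents of the mandatory_review list are not defined in this SOP, so the list is passed in by the caller.

```typescript
// True only when all three selective-autonomy conditions hold
function canAutoSend(
  confidence: number,
  policyGateTriggered: boolean,
  category: string,
  mandatoryReview: ReadonlySet<string>,
): boolean {
  return confidence >= 90 && !policyGateTriggered && !mandatoryReview.has(category);
}
```

Anything that fails the predicate would continue to escalate for human review, as in shadow mode.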

Version History¶

Version	Date	Changes
1.0	2026-01-17	Initial specification
1.1	2026-01-17	Safety filter, resolution types, escalation throttle, vet advice enforcement
1.2	2026-01-19	Production hardening: Preflight policy gate, idempotency, auth hardening, CORS hardening, confidence enforcement, human-readable reason codes, shadow mode default
1.3	2026-01-19	Humanization & gates: Support personas (Sophie/Tom/Lucy deterministic per customer), operating hours (8am-9pm London), response delay (3-12 min), email-before-resolve gate
1.4	2026-01-19	Semantic search & monitoring: pgvector embeddings with OpenAI, hybrid retrieval (semantic + keyword fallback), support queue Slack alerting via native Supabase stack
1.5	2026-01-20	Email threading & history: In-Reply-To/References headers for inbox threading, [Support] subject prefix, ticket_messages table for conversation history, unified send-support-email Edge Function
1.6	2026-03-19	Governance & resilience: Status reclassified to Shadow Mode Validation. Null outcome fix with ensureOutcomeWritten and CHECK constraint. Timeout-to-escalation at 80% budget. All agent writes via SECURITY DEFINER RPCs with failure recovery to agent_execution_failures table. Customer.io retry logic. Least-privilege support_agent role with explicit grants. Shadow-mode graduation criteria with v_agent_graduation_readiness view. KB auto-embedding trigger via pg_net. Cost and execution health monitoring views. Known limitations reduced from 8 to 3.

Known Limitations (v1.6)¶

Limitation	Status	Plan
Attachments always escalate	Accepted (conservative)	v1.7: Smart attachment handling
Threading requires email_message_id	email-ingest Worker extracts from MIME	Documented in CS-01 v2.3
Old tickets lack threading	Backfill not planned	New tickets only

File Locations¶

File	Location
Autonomous Agent	`supabase/functions/autonomous-support-agent/index.ts`
Email Sender	`supabase/functions/send-support-email/index.ts`
Embedding Generator	`supabase/functions/embedding-generator/index.ts`
Database Migrations	`supabase/migrations/`
Email Ingestion Worker	`email-ingest/src/index.ts` (Cloudflare Worker)

Document Owner: Protocol Raw Operations
Version: 1.6
Status: 🟡 Shadow Mode Validation

End of SOP CS-03 v1.6