SOP CS-03: Autonomous Support Agent (v1.6)¶
Status: 🟡 Shadow Mode Validation
Last Updated: 2026-03-19
Owner: Protocol Raw Operations
Platform: Supabase Edge Functions + Claude API + Cloudflare Workers
Prerequisites: SOP CS-01 v2.1, SOP CS-02 v1.4, SOP AI-KB-01 v1.2
Executive Summary¶
What This Is¶
An autonomous AI agent that receives customer support tickets, investigates context using Protocol Raw's database and external APIs, determines appropriate resolution, and executes actions—with human oversight only for high-stakes decisions.
Why It Matters¶
| Metric | Current State (AI-Assist) | Target State (Agentic) |
|---|---|---|
| Human involvement | 100% of tickets | <20% of tickets |
| Response time | 15-60 min (human review) | <2 min (autonomous) |
| Tickets per hour capacity | 15-20 (human limited) | Bounded by API rate limits, not headcount |
| First hire threshold | ~500 customers | ~5,000 customers |
| Cost per ticket | ~£3-5 (human time) | ~£0.05-0.10 (API costs) |
The Strategic Case¶
This agent embodies the "AI-native" thesis that underpins Protocol Raw's 10-19× capital efficiency claim. By handling 80%+ of routine support autonomously, we can:
- Validate faster — Phase A customers get instant responses, improving NPS and retention
- Defer hiring — Operations Lead focuses on high-value work, not ticket clearing
- Build moat — Accumulated decision data trains better judgment over time
- Scale confidently — Supporting 10k customers requires no more human support capacity than supporting 500
v1.6 Changes¶
- Status reclassified — Document reclassified from "Production Ready - Shadow Mode" to "Shadow Mode Validation." Status must now be earned by passing the validation gate, not declared at document creation
- Null outcome fix — Agent execution guarantees non-null outcome on every run via try/catch/finally with ensureOutcomeWritten fallback. CHECK constraint on database prevents null outcomes on completed executions
- Timeout-to-escalation — Agent loop checks elapsed time at each iteration. If >80% of timeout budget consumed, escalates to human rather than risking silent failure
- Outcome monitoring — New view v_agent_outcome_health tracks execution health daily
- Least-privilege role — support_agent PostgreSQL role with explicit grants; SET ROLE at execution start (CS03-004)
- Partial-failure recovery — All agent writes via SECURITY DEFINER RPCs with failure recovery to agent_execution_failures table. Customer.io retry logic with 5s backoff (CS03-005)
- Graduation criteria defined — Explicit thresholds for exiting shadow mode with v_agent_graduation_readiness view (CS03-006)
- KB auto-embedding — Database trigger on ai_knowledge_sections automatically generates embeddings via pg_net when sections are created or content is updated. Manual backfill_all retained for recovery only (CS03-009)
- Cost and SLO monitoring — Daily views for API cost per ticket, execution health, and email delivery rate. SLO targets defined for autonomous mode (CS03-010)
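The null outcome fix and the timeout-to-escalation check above can be sketched together as follows. This is a minimal illustration, not the deployed agent code: runAgentLoop, the step and writeOutcome callbacks, and the reason strings are hypothetical names, and the finally block stands in for the ensureOutcomeWritten fallback.

```typescript
type Outcome = 'resolved' | 'escalated';

interface LoopOptions {
  timeoutMs: number;
  maxIterations: number;
  now?: () => number; // injectable clock (assumption, makes the sketch testable)
}

// `step` runs one agent iteration and returns an outcome once it has one.
// `writeOutcome` stands in for the database write of the execution outcome.
async function runAgentLoop(
  step: (iteration: number) => Promise<Outcome | null>,
  writeOutcome: (outcome: Outcome, reason: string) => Promise<void>,
  opts: LoopOptions,
): Promise<Outcome> {
  const now = opts.now ?? (() => Date.now());
  const startedAt = now();
  let outcome: Outcome | null = null;
  let reason = 'completed';
  try {
    for (let i = 0; i < opts.maxIterations && outcome === null; i++) {
      // Escalate once more than 80% of the timeout budget is consumed.
      if (now() - startedAt > 0.8 * opts.timeoutMs) {
        outcome = 'escalated';
        reason = 'timeout_budget_exceeded';
        break;
      }
      outcome = await step(i);
    }
  } catch (err) {
    reason = `agent_error: ${err instanceof Error ? err.message : String(err)}`;
  } finally {
    // Fallback: whatever happened above, a non-null outcome is written.
    if (outcome === null) {
      outcome = 'escalated';
      if (reason === 'completed') reason = 'no_outcome_produced';
    }
    await writeOutcome(outcome, reason);
  }
  return outcome;
}
```

With timeoutMs: 30000, the budget check in this sketch trips once more than 24 seconds have elapsed.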
v1.5 Changes¶
- Email Threading — Outbound emails include In-Reply-To and References headers for proper inbox threading
- [Support] Prefix — All outbound subject lines automatically prefixed with [Support] to distinguish from marketing
- Conversation History — Messages logged to ticket_messages table for full thread tracking
- Unified Email Function — send-support-email Edge Function used by both CS-03 and Ops Portal
v1.4 Changes¶
- Semantic KB Search — pgvector embeddings with OpenAI text-embedding-3-small
- Hybrid Retrieval — Semantic first, keyword fallback if similarity < 0.3
- Search Logging — Query, top match, similarity score, match type logged
- Slack Alerting — Support queue monitor every 15 min via native Supabase stack
v1.3 Changes¶
- Support Personas — Sophie, Tom, Lucy assigned deterministically per customer
- Operating Hours — 8am-9pm London time (timezone-aware)
- Response Delay — 3-12 minute random delay (anti-bot humanization)
- Email-Before-Resolve Gate — Code-enforced: cannot resolve without sending email first
v1.2 Changes (Production Hardening)¶
- Preflight Policy Gate — Code-enforced escalation rules run BEFORE Claude (zero-token escalation)
- Idempotency — dedupe_key prevents duplicate processing from retries
- Auth Hardening — X-Internal-Secret header required (throws on boot if missing)
- CORS Hardening — Only allowed origins receive CORS headers
- Confidence Enforcement — Agent must provide confidence score; <70% blocks resolution
- Human-Readable Reason Codes — Machine codes (e.g., health_vomiting) for Metabase analytics
- Timeline Whitelist Narrowed — Only tracking-cited timelines exempt; vet advice NEVER whitelisted
- Shadow Mode Default — All tickets escalate for human review during validation
Email Threading (v1.5)¶
Purpose¶
Customer replies now thread correctly in their inbox, appearing as a single conversation rather than separate emails.
How It Works¶
When an inbound email arrives:
1. Cloudflare email-ingest Worker parses the MIME Message-ID header
2. This is stored in support_tickets.email_message_id via cs-agent-triage
When sending a reply:
1. Edge Function fetches the original email_message_id from the ticket
2. Adds In-Reply-To and References headers to the outbound email
3. Customer.io sends with these headers
Database Schema¶
-- Added to support_tickets
ALTER TABLE raw_ops.support_tickets
ADD COLUMN email_message_id TEXT,
ADD COLUMN email_references TEXT;
Threading Headers¶
Customer.io (CIO) accepts the headers field as a plain key-value object, not the JSON string format its OpenAPI spec describes. Tested and confirmed 2026-04-08.
{
"headers": {
"In-Reply-To": "<original-message-id@gmail.com>",
"References": "<original-message-id@gmail.com>"
}
}
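As a minimal sketch, the headers object above could be assembled from the two ticket columns like this. The helper name buildThreadingHeaders and the rule of appending the parent Message-ID to any stored References chain are assumptions for illustration, not the deployed send-support-email code.

```typescript
// Hypothetical shape: the two threading columns on support_tickets.
interface TicketThreadingFields {
  email_message_id: string | null;
  email_references: string | null;
}

// Builds the plain key-value headers object that Customer.io accepts.
function buildThreadingHeaders(ticket: TicketThreadingFields): Record<string, string> {
  // Without a stored Message-ID there is nothing to thread against.
  if (!ticket.email_message_id) return {};

  // Assumption: References carries the earlier chain (if any) followed by
  // the parent Message-ID, separated by single spaces.
  const chain = (ticket.email_references ?? '').split(/\s+/).filter(Boolean);
  if (chain.indexOf(ticket.email_message_id) === -1) {
    chain.push(ticket.email_message_id);
  }

  return {
    'In-Reply-To': ticket.email_message_id,
    References: chain.join(' '),
  };
}
```

For a first reply with no stored references, both headers carry the same Message-ID, which matches the example above.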
Edge Function Response¶
The send-support-email function now returns:
{
"success": true,
"delivery_id": "abc123",
"persona": "Sophie",
"threading_enabled": true,
"sent_immediately": true
}
Subject Line Prefix (v1.5)¶
Purpose¶
Distinguish support emails from marketing communications in customer inbox.
Implementation¶
All outbound support emails automatically receive [Support] prefix. The Re: prefix is only added when there are existing outbound messages on the ticket (not on first reply). Placeholder subjects ("(no subject)", "(pending)", "(none)", "(empty)") are normalised to category-based fallbacks.
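// `clean` is the customer subject after placeholder normalisation (see above)
// Re: prefix - only on follow-up replies (at least one prior outbound message)
const { count: priorReplies } = await supabase
  .from('ticket_messages')
  .select('id', { count: 'exact', head: true })
  .eq('ticket_id', ticket_id)
  .eq('direction', 'outbound');
// count can be null if the query fails, so treat null as "no prior replies"
const baseSubject = (priorReplies ?? 0) > 0 ? `Re: ${clean}` : clean;
// [Support] prefix - applied to all outbound, after the Re: decision
const emailSubject = baseSubject.startsWith('[Support]')
  ? baseSubject
  : `[Support] ${baseSubject}`;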
Examples¶
| Scenario | Final Subject |
|---|---|
| Customer subject: "Where is my order?" (first reply) | [Support] Where is my order? |
| Same ticket, second reply | [Support] Re: Where is my order? |
| No subject, AI generates "Loose stools during transition" | [Support] Loose stools during transition |
| Already prefixed | [Support] Already prefixed (no duplication) |
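The rows above can be reproduced with a small pure function. This is a sketch under stated assumptions: buildSupportSubject is a hypothetical name, fallbackSubject stands in for the category-based fallback (whose wording is not specified here), and returning an already-prefixed subject unchanged is one reading of the "no duplication" row.

```typescript
// Placeholder subjects listed in the Implementation text.
const PLACEHOLDER_SUBJECTS = ['(no subject)', '(pending)', '(none)', '(empty)'];

function buildSupportSubject(
  rawSubject: string | null,
  priorOutboundCount: number,
  fallbackSubject: string,
): string {
  const trimmed = (rawSubject ?? '').trim();
  const isPlaceholder =
    trimmed === '' || PLACEHOLDER_SUBJECTS.indexOf(trimmed.toLowerCase()) !== -1;
  const clean = isPlaceholder ? fallbackSubject : trimmed;

  // Assumption: an already-prefixed subject is passed through untouched.
  if (clean.startsWith('[Support]')) return clean;

  // Re: is added only when the ticket already has outbound messages.
  const base = priorOutboundCount > 0 ? `Re: ${clean}` : clean;
  return `[Support] ${base}`;
}
```

Keeping the logic free of database calls means the prior-reply count can be fetched once and the subject rules checked in isolation.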
Conversation History (v1.5)¶
Purpose¶
Track the full back-and-forth conversation between customer and support, enabling:
- Context for agents handling follow-up replies
- Historical view in Ops Portal
- Audit trail of all communications
Database Schema¶
CREATE TABLE raw_ops.ticket_messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ticket_id UUID NOT NULL REFERENCES raw_ops.support_tickets(id) ON DELETE CASCADE,
direction TEXT NOT NULL CHECK (direction IN ('inbound', 'outbound')),
sender TEXT NOT NULL, -- customer email or agent persona
subject TEXT,
body TEXT NOT NULL,
email_message_id TEXT, -- For threading
delivery_id TEXT, -- Customer.io delivery ID for outbound
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_ticket_messages_ticket_id ON raw_ops.ticket_messages(ticket_id);
Message Flow¶
Inbound (customer email):
- Trigger on support_tickets INSERT creates first message automatically
- Direction: inbound, Sender: customer email
Outbound (our reply):
- Created by send-support-email Edge Function
- Direction: outbound, Sender: persona name (Sophie/Tom/Lucy)
Retrieval Function¶
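CREATE FUNCTION public.get_ticket_thread(p_ticket_id UUID)
RETURNS TABLE (
  message_id UUID,
  direction TEXT,
  sender TEXT,
  subject TEXT,
  body TEXT,
  created_at TIMESTAMPTZ
)
LANGUAGE sql
STABLE
AS $$
  -- Body inferred from the ticket_messages schema above; the deployed function may differ
  SELECT m.id, m.direction, m.sender, m.subject, m.body, m.created_at
  FROM raw_ops.ticket_messages m
  WHERE m.ticket_id = p_ticket_id
  ORDER BY m.created_at;
$$;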
Support Personas (v1.3)¶
Overview¶
The agent uses named personas to humanize automated responses and create brand consistency.
| Persona | Email Sign-off |
|---|---|
| Sophie | Sophie |
| Tom | Tom |
| Lucy | Lucy |
Assignment Logic¶
In send-support-email, persona is assigned randomly. The CS-03 autonomous agent assigns deterministically per customer (same customer always gets same persona).
// send-support-email (random)
const persona = PERSONAS[Math.floor(Math.random() * PERSONAS.length)];
// CS-03 autonomous agent (deterministic)
function getPersonaForCustomer(customerId: string): string {
const hash = customerId.split('').reduce((sum, char) => sum + char.charCodeAt(0), 0);
return PERSONAS[hash % PERSONAS.length];
}
Sign-off Handling¶
The AI draft must NOT include any sign-off. The email_context and email_format KB sections instruct the AI not to add any closing, name, or farewell. A belt-and-braces regex in cs-agent-triage strips trailing sign-off patterns (persona names, "Best regards", "[Name]", etc.) from the draft before storing on the ticket — examining only the last few lines to avoid stripping body content.
send-support-email appends the persona sign-off ("{persona}\nProtocol Raw") to the message body before sending to Customer.io. The sign-off is part of the message_data.body field. The CIO Support Layout does not inject any greeting or sign-off — it only provides card chrome (accent line, wordmark, footer).
Legacy: Previous versions used [Name] as a placeholder in the AI draft. This is no longer used. send-support-email still replaces [Name] with the persona for backward compatibility.
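The belt-and-braces stripping step might look like the sketch below. The function name, the list of closing phrases, and the four-line window are assumptions chosen for illustration; the actual regex in cs-agent-triage is not reproduced here.

```typescript
const PERSONAS = ['Sophie', 'Tom', 'Lucy'];

// Matches a line that consists only of a closing phrase, a persona name,
// or the legacy [Name] placeholder (illustrative pattern list).
const SIGN_OFF_LINE = new RegExp(
  `^\\s*(?:(?:best|kind|warm)\\s+regards|regards|thanks|thank you|cheers|\\[name\\]|protocol raw|${PERSONAS.join('|')})[\\s,.!]*$`,
  'i',
);

function stripTrailingSignOff(draft: string, maxLines = 4): string {
  const lines = draft.replace(/\s+$/, '').split('\n');
  let examined = 0;
  // Walk backwards from the end and stop at the first line of real content,
  // so persona names inside the body are never touched.
  while (lines.length > 0 && examined < maxLines) {
    const last = lines[lines.length - 1];
    const isBlank = last.trim() === '';
    if (!isBlank && !SIGN_OFF_LINE.test(last)) break;
    lines.pop();
    examined++;
  }
  return lines.join('\n').replace(/\s+$/, '');
}
```

Because each pattern must match a whole line, a body sentence such as "Tom will love the new recipe." is left alone even though it starts with a persona name.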
Operating Hours & Response Timing (v1.3)¶
Operating Hours¶
| Setting | Value |
|---|---|
| Start | 8:00 AM London time |
| End | 9:00 PM London time |
| Days | Every day (7 days/week) |
| Timezone | Europe/London (handles BST/GMT automatically) |
Response Delay¶
To avoid the "instant bot" tell, responses are delayed:
| Scenario | Delay |
|---|---|
| Within operating hours | 3-12 minutes (random) |
| Outside operating hours | Queued to 8am + 3-12 min delay |
Implementation¶
function getLondonHour(): number {
const formatter = new Intl.DateTimeFormat('en-GB', {
timeZone: 'Europe/London',
hour: 'numeric',
hour12: false,
});
return parseInt(formatter.format(new Date()), 10);
}
function isWithinOperatingHours(): boolean {
const hour = getLondonHour();
return hour >= 8 && hour < 21; // 8am to 9pm
}
function getResponseDelayMs(): number {
const minDelay = 3 * 60 * 1000; // 3 minutes
const maxDelay = 12 * 60 * 1000; // 12 minutes
return Math.floor(Math.random() * (maxDelay - minDelay + 1)) + minDelay;
}
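The "queued to 8am + delay" rule can be sketched by combining the helpers above into one function. computeSendTime and londonHourAt are hypothetical names, and the current time and delay are passed in as arguments (an assumption made so the result is deterministic); the deployed scheduling code may differ.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Same approach as getLondonHour, but for an arbitrary instant.
function londonHourAt(date: Date): number {
  const formatter = new Intl.DateTimeFormat('en-GB', {
    timeZone: 'Europe/London',
    hour: 'numeric',
    hour12: false,
  });
  return parseInt(formatter.format(date), 10);
}

function computeSendTime(now: Date, delayMs: number): Date {
  const hour = londonHourAt(now);
  if (hour >= 8 && hour < 21) {
    // Within operating hours: send after the random delay.
    return new Date(now.getTime() + delayMs);
  }
  // London's UTC offset is always a whole number of hours, so the next
  // top-of-hour in UTC is also a top-of-hour in London.
  let candidate = Math.ceil(now.getTime() / HOUR_MS) * HOUR_MS;
  while (londonHourAt(new Date(candidate)) !== 8) {
    candidate += HOUR_MS;
  }
  return new Date(candidate + delayMs);
}
```

Stepping hour by hour through Intl keeps the BST/GMT switch out of the arithmetic: in summer, 8am London falls at 07:00 UTC without any special handling.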
Email-Before-Resolve Gate (v1.3)¶
Purpose¶
Code-enforced rule: Agent cannot resolve a ticket without first sending a response email to the customer.
This prevents "silent resolutions" where the ticket is marked resolved but the customer never received a response.
Implementation¶
State tracking:
interface AgentState {
// ... other fields
emailSent: boolean; // Set true when send_customer_email succeeds
}
Gate in executeResolveTicket:
async function executeResolveTicket(input, state): Promise<ToolResult> {
// HARD GATE: Cannot resolve without sending email first
if (!state.emailSent) {
return {
success: false,
error: 'Cannot resolve ticket without sending a response to the customer first. Use send_customer_email, then resolve.',
};
}
// ... rest of resolution logic
}
Behavior¶
| Scenario | Result |
|---|---|
| Agent tries to resolve without emailing | Tool returns error, agent must email first |
| Agent emails then resolves | Resolution proceeds normally |
| Agent escalates without emailing | Allowed (escalation doesn't require email) |
Architecture Overview¶
Component Responsibilities¶
| Component | Responsibility | Technology |
|---|---|---|
| Trigger Layer | Receive emails, chat escalations | Cloudflare email-ingest Worker + chat Edge Function |
| Preflight Policy Gate | Code-enforced escalation rules (v1.2) | Edge Function |
| Orchestrator | Coordinate agent execution, enforce guardrails | Supabase Edge Function |
| Agent Brain | Reason about tickets, select tools, generate responses | Claude API (claude-sonnet-4-20250514) |
| Safety Filter | Validate responses before sending | Edge Function |
| Persona & Timing | Assign persona, calculate send time (v1.3) | Edge Function |
| Email Sender | Send emails with threading headers (v1.5) | send-support-email Edge Function |
| Tool Executor | Perform database queries, API calls | Edge Function + RPC |
| Audit Logger | Record every decision and action | Supabase tables |
| Escalation Router | Send complex cases to Slack for human review | Slack API |
Request Flow (v1.5)¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT REQUEST FLOW v1.5 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. INGRESS │
│ ├─ Auth Check (X-Internal-Secret) ──→ 401 if invalid │
│ └─ Idempotency Check (dedupe_key) ──→ Return prior result if exists │
│ │
│ 2. PREFLIGHT (Code-Enforced) │
│ ├─ query_customer_context ──→ Get flags + customer_id │
│ └─ Policy Gate Check: │
│ ├─ Health/Safety keywords? ──→ ESCALATE (zero Claude tokens) │
│ ├─ Financial/Legal language? ──→ ESCALATE │
│ ├─ Negative sentiment? ──→ ESCALATE │
│ ├─ Attachments present? ──→ ESCALATE │
│ ├─ is_repeat_contacter? ──→ ESCALATE │
│ ├─ has_pending_replacement? ──→ ESCALATE │
│ └─ has_recent_credit + compensation request? ──→ ESCALATE │
│ │
│ 3. CLAUDE AGENT LOOP (only if policy gate passes) │
│ ├─ Investigate (tools) │
│ ├─ Search knowledge base (semantic + keyword) │
│ ├─ Draft response (no sign-off) │
│ └─ Safety filter check │
│ │
│ 4. SEND EMAIL (v1.5) │
│ ├─ Fetch email_message_id from ticket │
│ ├─ Add [Support] prefix to subject │
│ ├─ Apply persona sign-off │
│ ├─ Add threading headers (In-Reply-To, References) │
│ ├─ Send via Customer.io │
│ ├─ Log to ticket_messages table ──→ state.emailSent = true │
│ └─ Log to ticket_notes for audit │
│ │
│ 5. RESOLUTION GATE (v1.3) │
│ └─ state.emailSent? ──→ No = Block resolution, return error │
│ │
│ 6. CONFIDENCE ENFORCEMENT │
│ ├─ Confidence provided? ──→ No = ESCALATE │
│ └─ Confidence >= 70%? ──→ No = ESCALATE │
│ │
│ 7. SHADOW MODE (default) │
│ └─ All resolutions ──→ ESCALATE for human review │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
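Steps 5 to 7 of the flow reduce to a short ordered check. The sketch below assumes a hypothetical decideResolution function and illustrative reason strings; it shows the ordering of the gates, not the deployed orchestrator.

```typescript
type Decision =
  | { action: 'resolve' }
  | { action: 'escalate'; reason: string }
  | { action: 'blocked'; reason: string };

interface ResolutionAttempt {
  emailSent: boolean;
  confidence?: number; // 0-100, as supplied by the agent
}

function decideResolution(
  attempt: ResolutionAttempt,
  shadowMode: boolean,
  minConfidence = 70,
): Decision {
  // Step 5: resolution gate. No email sent means the tool call is rejected.
  if (!attempt.emailSent) {
    return { action: 'blocked', reason: 'email_not_sent' };
  }
  // Step 6: confidence enforcement. A missing score is treated like a low one.
  if (attempt.confidence === undefined) {
    return { action: 'escalate', reason: 'confidence_missing' };
  }
  if (attempt.confidence < minConfidence) {
    return { action: 'escalate', reason: 'confidence_below_threshold' };
  }
  // Step 7: shadow mode sends every would-be resolution to human review.
  if (shadowMode) {
    return { action: 'escalate', reason: 'shadow_mode' };
  }
  return { action: 'resolve' };
}
```

With shadowMode set to true, as in the default config, no input to this sketch ever returns resolve.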
Preflight Policy Gate (v1.2)¶
Purpose¶
Code-enforced escalation rules that run before the Claude API is called. This ensures:
- Deterministic safety — Critical rules in code, not prompt suggestions
- Cost efficiency — Zero API tokens for obvious escalations
- Auditability — Every policy trigger logged with machine-readable code
- Speed — Policy escalations complete in <100ms
Policy Rules¶
| Category | Pattern | Code | Severity |
|---|---|---|---|
| Health/Safety | sick, ill, unwell, poorly | health_unwell | critical |
| Health/Safety | vomit, vomiting, threw up | health_vomiting | critical |
| Health/Safety | diarrhea, loose stool | health_digestive | critical |
| Health/Safety | vet, veterinarian, animal hospital | health_vet_mention | critical |
| Health/Safety | foreign object, plastic, metal, bone fragment | quality_foreign_object | critical |
| Health/Safety | temperature, cold chain, warm, thawed | quality_cold_chain | critical |
| Health/Safety | allergic, allergy, reaction, swelling | health_allergy | critical |
| Health/Safety | lethargic, not eating, refusing food | health_appetite | critical |
| Health/Safety | blood, bleeding | health_blood | critical |
| Health/Safety | emergency, urgent, rushed | health_emergency | critical |
| Financial/Legal | refund, money back, reimburse | financial_refund | high |
| Financial/Legal | compensation, compensate | financial_compensation | high |
| Financial/Legal | solicitor, lawyer, legal action, trading standards | legal_threat | high |
| Financial/Legal | small claims, court | legal_court | high |
| Financial/Legal | chargeback, dispute the charge | financial_chargeback | high |
| Sentiment | disgusting, disgraceful, unacceptable | sentiment_negative | high |
| Sentiment | worst experience, never again, cancel everything | sentiment_churn_risk | high |
| Sentiment | social media, twitter, facebook, review | sentiment_social_threat | high |
| Sentiment | tell everyone, warn others, public | sentiment_public_threat | high |
| Sentiment | furious, livid, fuming, outraged | sentiment_anger | high |
| Special | passed away, died, rainbow bridge, euthanasia | special_bereavement | medium |
| Special | wholesale, bulk order, retailer, breeder, B2B | special_b2b | medium |
| Special | speak to human, real person, manager | special_human_request | medium |
| Context Flags | 3+ tickets in 7 days | context_repeat_contacter | medium |
| Context Flags | Pending replacement exists | context_pending_replacement | medium |
| Context Flags | Recent credit + compensation request | context_recent_credit_request | medium |
| Attachments | Any attachment present | attachment_present | high |
Reason Codes for Metabase¶
All policy triggers are logged with machine-readable reasonCode enabling:
- Escalation cause breakdown charts
- Pattern trend analysis over time
- Identification of new patterns requiring rules
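A data-driven version of the gate could be sketched as below, using a small subset of the rules table. The function name, the has_attachments flag name, the whole-word case-insensitive matching, and the evaluation order are all assumptions for illustration; the production rule engine may match differently.

```typescript
type Severity = 'critical' | 'high' | 'medium';

interface PolicyRule {
  code: string;
  severity: Severity;
  patterns: string[];
}

// Illustrative subset of the Policy Rules table, not the full rule set.
const RULES: PolicyRule[] = [
  { code: 'health_vomiting', severity: 'critical', patterns: ['vomit', 'vomiting', 'threw up'] },
  { code: 'financial_refund', severity: 'high', patterns: ['refund', 'money back', 'reimburse'] },
  { code: 'special_human_request', severity: 'medium', patterns: ['speak to human', 'real person', 'manager'] },
];

interface ContextFlags {
  is_repeat_contacter: boolean;
  has_pending_replacement: boolean;
  has_attachments: boolean; // hypothetical flag name
}

interface PolicyHit {
  reasonCode: string;
  severity: Severity;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the first matching rule, or null when the ticket may go to Claude.
function checkPolicyGate(text: string, flags: ContextFlags): PolicyHit | null {
  for (const rule of RULES) {
    for (const pattern of rule.patterns) {
      // Assumption: whole-word, case-insensitive matching.
      const re = new RegExp(`\\b${escapeRegExp(pattern)}\\b`, 'i');
      if (re.test(text)) return { reasonCode: rule.code, severity: rule.severity };
    }
  }
  if (flags.has_attachments) return { reasonCode: 'attachment_present', severity: 'high' };
  if (flags.is_repeat_contacter) return { reasonCode: 'context_repeat_contacter', severity: 'medium' };
  if (flags.has_pending_replacement) return { reasonCode: 'context_pending_replacement', severity: 'medium' };
  return null;
}
```

Whole-word matching avoids false hits such as "ill" inside "will", at the cost of missing inflections that are not listed; a production gate would need to weigh that trade-off for safety-critical terms.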
Tool Definitions¶
Overview¶
| Category | Tools | Autonomous? |
|---|---|---|
| Read | query_customer_context, check_order_status, get_courier_tracking, search_knowledge_base | ✅ Always |
| Respond | send_customer_email, resolve_ticket | ✅ Within guidelines + safety filter + email gate |
| Act (Low Risk) | skip_next_delivery | ✅ Always |
| Act (Medium Risk) | apply_store_credit | ✅ Up to £20 |
| Act (High Risk) | trigger_replacement_order | ⚠️ Requires confirmation |
| Escalate | escalate_to_human | ✅ Always (throttled in early phase) |
| Audit | add_internal_note | ✅ Always |
Tool: send_customer_email (v1.5)¶
Purpose: Send response email to customer with persona, timing, and threading applied.
Implementation: Calls send-support-email Edge Function (shared with Ops Portal).
Input:
{
  "ticket_id": "uuid",
  "subject": "Re: Customer's subject",
  "body": "Response text without a sign-off (send-support-email appends the persona sign-off)"
}
Output (v1.5):
{
"success": true,
"data": {
"delivery_id": "RIabDAUAAZvbWwCC18...",
"persona": "Sophie",
"threading_enabled": true,
"sent_immediately": true
}
}
With delayed send:
{
"success": true,
"data": {
"delivery_id": "RIabDAUAAZvbWwCC18...",
"persona": "Sophie",
"threading_enabled": true,
"sent_immediately": false,
"scheduled_for": "2026-01-20T08:07:32.000Z",
"delay_minutes": 7,
"within_operating_hours": false
}
}
Side Effects:
- Sets state.emailSent = true (enables resolution)
- Logs to ticket_messages table (direction: outbound)
- Logs to ticket_notes with persona email (e.g., sophie@protocolraw.co.uk)
- Adds [Support] prefix to subject
- Adds threading headers if email_message_id exists on ticket
- Appends the persona sign-off; replaces any legacy [Name] placeholder with the assigned persona name
Tool: resolve_ticket (v1.3)¶
Purpose: Mark ticket as resolved with category and summary.
Pre-conditions (code-enforced):
1. state.emailSent must be true (email gate)
2. confidence must be ≥70%
Input:
{
"ticket_id": "uuid",
"resolution_category": "subscription|delivery|quality|feeding|other",
"resolution_type": "informational|action_taken|goodwill_gesture|clarification_only",
"resolution_summary": "Brief description",
"confidence": 85
}
Knowledge Base Search (v1.4)¶
Semantic Search with pgvector¶
The agent uses OpenAI embeddings + pgvector for semantic similarity search, with keyword fallback.
Model: text-embedding-3-small (1536 dimensions, $0.02/1M tokens)
Search Flow¶
Query → OpenAI Embedding → pgvector cosine similarity → Results
↓
If similarity < 0.3
↓
Keyword fallback search
Database Schema¶
-- Embedding column on KB sections
ALTER TABLE raw_ops.ai_knowledge_sections
ADD COLUMN embedding vector(1536);
-- IVFFlat index for fast similarity search
CREATE INDEX idx_kb_sections_embedding
ON raw_ops.ai_knowledge_sections
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 10);
Hybrid Search RPC¶
-- fn_search_kb_hybrid(p_query_embedding, p_query_text, p_match_threshold, p_limit)
-- Returns: section_key, title, content, similarity, match_type
| Parameter | Default | Description |
|---|---|---|
| p_query_embedding | required | 1536-dim vector from OpenAI |
| p_query_text | required | Original query for keyword fallback |
| p_match_threshold | 0.3 | Minimum cosine similarity |
| p_limit | 5 | Max results |
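The retrieval logic can be illustrated in memory, independent of pgvector. The sketch below is not fn_search_kb_hybrid: the function name, the naive term-matching fallback, and reporting similarity 0 for keyword matches are assumptions, and three-dimensional vectors would stand in for the 1536-dimension embeddings when trying it out.

```typescript
interface KbSection {
  section_key: string;
  title: string;
  content: string;
  embedding: number[];
}

interface KbMatch {
  section_key: string;
  similarity: number;
  match_type: 'semantic' | 'keyword';
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function searchKbHybrid(
  queryEmbedding: number[],
  queryText: string,
  sections: KbSection[],
  matchThreshold = 0.3,
  limit = 5,
): KbMatch[] {
  // Semantic pass: keep sections at or above the similarity threshold.
  const semantic = sections
    .map((s): KbMatch => ({
      section_key: s.section_key,
      similarity: cosineSimilarity(queryEmbedding, s.embedding),
      match_type: 'semantic',
    }))
    .filter((m) => m.similarity >= matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
  if (semantic.length > 0) return semantic;

  // Keyword fallback: naive case-insensitive term match on title and content.
  const terms = queryText.toLowerCase().split(/\s+/).filter(Boolean);
  return sections
    .filter((s) => {
      const haystack = `${s.title} ${s.content}`.toLowerCase();
      return terms.some((t) => haystack.indexOf(t) !== -1);
    })
    .slice(0, limit)
    .map((s): KbMatch => ({ section_key: s.section_key, similarity: 0, match_type: 'keyword' }));
}
```

The fallback only runs when no section clears the threshold, which mirrors the "semantic first, keyword fallback if similarity < 0.3" rule.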
Search Output¶
{
"success": true,
"data": {
"results": [
{
"section_key": "portal_pause_subscription",
"title": "Pausing Your Subscription",
"content": "...",
"similarity": 0.543,
"match_type": "semantic"
}
],
"query": "pause subscription holiday",
"search_type": "semantic"
}
}
Logging¶
Every search logs to Edge Function console:
KB search: query="pause subscription", top_match="portal_pause_subscription", similarity=0.543, match_type=semantic
Embedding Generation¶
Edge Function: embedding-generator
| Action | Input | Description |
|---|---|---|
| backfill_all | — | Embed all sections without embeddings |
| embed_one | section_id | Embed single section |
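As of v1.6, a database trigger on ai_knowledge_sections generates embeddings automatically via pg_net when sections are created or their content is updated, so backfill_all is retained for recovery only.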
Support Queue Monitoring (v1.4)¶
Overview¶
Native Supabase monitoring alerts when tickets need human review.
Components¶
| Component | Purpose |
|---|---|
| v_support_needs_review | View showing unreviewed escalations |
| fn_check_support_health_v2() | Support health monitor (queue, triage quality, intake) |
| monitor-cs-01-support-health | pg_cron job every 15 min |
| ops-alerter | Edge Function sending to Slack |
Alert Severity¶
| Queue Age | Severity | Channel |
|---|---|---|
| < 30 min | info | #ops-alerts |
| 30-60 min | warning | #ops-alerts |
| > 60 min | critical | #ops-urgent |
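The severity table maps directly to a small routing function. This is a sketch: routeQueueAlert is a hypothetical name, and treating exactly 60 minutes as warning follows the "30-60" and "> 60" boundaries as written.

```typescript
type AlertSeverity = 'info' | 'warning' | 'critical';

interface AlertRoute {
  severity: AlertSeverity;
  channel: string;
}

// Maps the age of the oldest unreviewed ticket (in minutes) to a Slack route.
function routeQueueAlert(oldestTicketAgeMin: number): AlertRoute {
  if (oldestTicketAgeMin > 60) {
    return { severity: 'critical', channel: '#ops-urgent' };
  }
  if (oldestTicketAgeMin >= 30) {
    return { severity: 'warning', channel: '#ops-alerts' };
  }
  return { severity: 'info', channel: '#ops-alerts' };
}
```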
Slack Alert Format¶
📬 Support Queue Needs Review
1 ticket(s) awaiting human review
Queue Size: 1
Oldest: 45 min
Action: https://ops.protocolraw.co.uk → Support
Configuration¶
Edge Function Config¶
const CONFIG: AgentConfig = {
model: 'claude-sonnet-4-20250514',
maxIterations: 10,
maxTokens: 4096,
timeoutMs: 30000,
creditLimitPence: 2000,
escalationThrottleEnabled: true,
escalationThrottleMaxPerHour: 20,
shadowMode: true, // v1.2: Default ON
requireAuth: true, // v1.2: Auth required
allowedOrigins: [ // v1.2: CORS whitelist
'https://protocolraw.co.uk',
'https://www.protocolraw.co.uk',
'https://ops.protocolraw.co.uk',
'https://hook.eu1.make.com',
],
};
// v1.3: Personas
const PERSONAS = ['Sophie', 'Tom', 'Lucy'];
// v1.3: Operating hours (London time)
// 8am (8) to 9pm (21)
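The requireAuth and allowedOrigins settings could be enforced roughly as follows. The helper names and the exact header values are assumptions for illustration; only the origin list and the AGENT_INTERNAL_SECRET variable name come from this document.

```typescript
const ALLOWED_ORIGINS = [
  'https://protocolraw.co.uk',
  'https://www.protocolraw.co.uk',
  'https://ops.protocolraw.co.uk',
  'https://hook.eu1.make.com',
];

// Only allowed origins receive CORS headers; everyone else gets none.
function corsHeadersFor(origin: string | null): Record<string, string> {
  if (!origin || ALLOWED_ORIGINS.indexOf(origin) === -1) return {};
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Headers': 'content-type, x-internal-secret', // assumed list
    Vary: 'Origin',
  };
}

// Fails fast at boot when the secret is missing, rather than at request time.
function requireInternalSecret(env: Record<string, string | undefined>): string {
  const secret = env['AGENT_INTERNAL_SECRET'];
  if (!secret) throw new Error('AGENT_INTERNAL_SECRET is not set');
  return secret;
}

// Compares the X-Internal-Secret request header against the configured secret.
function isAuthorised(headerValue: string | null, secret: string): boolean {
  return headerValue !== null && headerValue === secret;
}
```

Echoing the request origin instead of a wildcard keeps the response specific to the caller, and the Vary header tells caches that the response depends on Origin.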
Environment Variables¶
| Variable | Required | Description |
|---|---|---|
| SUPABASE_URL | Yes | Supabase project URL |
| SUPABASE_SERVICE_ROLE_KEY | Yes | Service role key |
| ANTHROPIC_API_KEY | Yes | Claude API key |
| OPENAI_API_KEY | Yes | OpenAI API key (for embeddings) |
| CUSTOMERIO_API_KEY | Yes | Customer.io transactional API key |
| AGENT_INTERNAL_SECRET | Yes (if requireAuth) | Internal auth secret |
Metabase Dashboards¶
Persona Distribution¶
SELECT
SPLIT_PART(created_by, '@', 1) as persona,
COUNT(*) as emails_sent
FROM raw_ops.ticket_notes
WHERE note_type = 'email_sent'
AND created_at > NOW() - INTERVAL '30 days'
GROUP BY 1
ORDER BY 2 DESC;
Response Timing Analysis¶
SELECT
tool_output->'data'->>'within_operating_hours' as within_hours,
AVG((tool_output->'data'->>'delay_minutes')::int) as avg_delay_min,
COUNT(*) as count
FROM raw_ops.agent_tool_calls
WHERE tool_name = 'send_customer_email'
AND success = true
AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1;
Threading Adoption (v1.5)¶
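-- Assumes ticket_messages has a threading_enabled column; it is not part of the
-- CREATE TABLE shown under Conversation History, so confirm it exists first.
SELECT
  DATE(created_at) as date,
  COUNT(*) FILTER (WHERE threading_enabled = true) as threaded,
  COUNT(*) FILTER (WHERE threading_enabled = false OR threading_enabled IS NULL) as not_threaded
FROM raw_ops.ticket_messages
WHERE direction = 'outbound'
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1 DESC;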
Conversation Thread Depth¶
SELECT
t.id as ticket_id,
t.subject,
COUNT(*) as message_count,
COUNT(*) FILTER (WHERE m.direction = 'inbound') as customer_messages,
COUNT(*) FILTER (WHERE m.direction = 'outbound') as our_replies
FROM raw_ops.support_tickets t
JOIN raw_ops.ticket_messages m ON m.ticket_id = t.id
WHERE t.created_at > NOW() - INTERVAL '30 days'
GROUP BY 1, 2
HAVING COUNT(*) > 1
ORDER BY message_count DESC;
Policy Gate Escalation Breakdown¶
SELECT
tool_output->'data'->>'reasonCode' as reason_code,
tool_output->'data'->>'severity' as severity,
COUNT(*) as count
FROM raw_ops.agent_tool_calls
WHERE tool_name = 'policy_gate'
AND created_at > NOW() - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY count DESC;
Confidence Calibration¶
WITH calibration AS (
SELECT
CASE
WHEN confidence_score >= 90 THEN '90-100%'
WHEN confidence_score >= 80 THEN '80-89%'
WHEN confidence_score >= 70 THEN '70-79%'
ELSE '<70%'
END as bucket,
human_agreed
FROM raw_ops.agent_decisions
WHERE human_reviewed_at IS NOT NULL
AND created_at > NOW() - INTERVAL '30 days'
)
SELECT
bucket,
COUNT(*) as total,
SUM(CASE WHEN human_agreed THEN 1 ELSE 0 END) as agreed,
ROUND(100.0 * SUM(CASE WHEN human_agreed THEN 1 ELSE 0 END) / COUNT(*), 1) as accuracy_pct
FROM calibration
GROUP BY bucket
ORDER BY bucket DESC;
Agent Performance Summary¶
SELECT
DATE(created_at) as date,
COUNT(*) as total_executions,
SUM(CASE WHEN outcome = 'resolved' THEN 1 ELSE 0 END) as resolved,
SUM(CASE WHEN outcome = 'escalated' THEN 1 ELSE 0 END) as escalated,
SUM(CASE WHEN 'policy_gate' = ANY(tools_used) THEN 1 ELSE 0 END) as policy_escalations,
AVG(duration_ms) as avg_duration_ms,
SUM(total_tokens) as total_tokens
FROM raw_ops.agent_executions
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;
API Cost per Ticket (v1.6)¶
Execution Health (v1.6)¶
Email Delivery Rate (v1.6)¶
Graduation Readiness (v1.6)¶
Deployment Checklist¶
First Deployment¶
- [ ] Run database migrations (tables, indexes, dedupe_key column)
- [ ] Add email_message_id and email_references columns to support_tickets
- [ ] Create ticket_messages table with trigger
- [ ] Deploy RPC functions (fn_agent_get_customer_context, fn_agent_get_order_status, get_ticket_thread)
- [ ] Set environment variables in Supabase (including CUSTOMERIO_API_KEY)
- [ ] Deploy send-support-email Edge Function
- [ ] Deploy agent Edge Function with requireAuth: false for testing
- [ ] Test policy gate with health ticket
- [ ] Test clean ticket through Claude
- [ ] Verify persona assignment is deterministic
- [ ] Verify timing shows correct London hours
- [ ] Verify email threading works (In-Reply-To headers)
- [ ] Enable requireAuth: true
- [ ] Set AGENT_INTERNAL_SECRET
- [ ] Verify email-ingest Worker is deployed and passing Message-ID to cs-agent-triage
Shadow Mode Validation Gate¶
All items below must be completed before this SOP can be reclassified to "Production Ready — Shadow Mode." Until then, the document status remains "Shadow Mode Validation."
- [ ] Shadow mode enabled and processing tickets
- [ ] 100+ test tickets processed with zero null outcomes
- [ ] Confidence calibration reviewed (70% threshold validated against human agreement rate)
- [ ] No safety filter false negatives in test set
- [ ] Persona consistency verified (same customer = same persona)
- [ ] Operating hours verified (emails queued outside 8am-9pm)
- [ ] Email threading verified (replies appear in same thread)
- [ ] [Support] prefix appearing on all outbound emails
- [ ] Conversation history populating in ticket_messages
- [ ] Metabase dashboards configured and returning data
- [ ] Slack escalation channel (#ops-urgent) ready and tested
- [ ] Customer.io integration tested with scheduled sends
- [ ] Least-privilege database role deployed (v1.6 — CS03-004) ✅
- [ ] Partial-failure recovery model implemented (v1.6 — CS03-005) ✅
- [ ] Timeout-to-escalation path implemented (v1.6 — CS03-003) ✅
- [ ] Null outcome fix deployed (v1.6 — CS03-001) ✅
Success Metrics¶
| Metric | Shadow Mode | Full Autonomy Target |
|---|---|---|
| Tickets processed | Track | Track |
| Policy gate escalations | Track | <30% |
| Claude escalations | 100% (shadow) | <20% |
| Correct decisions | Track | >95% |
| Response time (policy gate) | <100ms | <100ms |
| Response time (Claude) | <30s | <30s |
| Safety filter triggers | Track | <5% |
| Email gate blocks | Track | <1% (prompt compliance) |
| Threading enabled | Track (v1.5) | >95% |
| Multi-message threads | Track (v1.5) | Track |
| Customer.io delivery success | Track (v1.6) | >99% |
| Execution completion (non-null outcome) | >99% (v1.6) | >99.5% |
| Avg API cost per ticket | Track (v1.6) | <£0.15 |
Shadow Mode Graduation Criteria (v1.6)¶
Purpose¶
Shadow mode exists to validate that the agent makes correct decisions before granting it autonomy. Graduation is data-driven with explicit thresholds, not date-driven.
Prerequisites (all must be true)¶
- Shadow Mode Validation Gate (Deployment Checklist) fully completed
- All v1.6 changes implemented and stable for 14+ days
Graduation Thresholds¶
All thresholds must be met simultaneously over the most recent 200 human-reviewed decisions:
| Metric | Threshold | Source |
|---|---|---|
| Sample size | ≥ 200 reviewed decisions | agent_decisions |
| Human agreement rate | ≥ 90% | agent_decisions.human_agreed |
| Draft acceptance (sent as-is) | ≥ 70% | agent_decisions.human_agreed |
| Outcome health | ≥ 99% resolved or escalated | agent_executions.outcome |
| Unrecovered failures | 0 | agent_execution_failures |
Blocker Conditions (any one blocks graduation)¶
- Any unrecovered agent_execution_failures
- Confidence calibration not reviewed in the last 30 days
- Metabase dashboards not returning data for all success metrics
Graduation Query¶
Returns a single row with graduation_status = 'READY' or 'NOT READY' and individual gate pass/fail status.
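The gate logic behind that single row can be sketched in code. This is not the SQL of v_agent_graduation_readiness: the function, field, and gate names are hypothetical, and the counts are assumed to be pre-aggregated over the most recent 200 reviewed decisions.

```typescript
interface GraduationInputs {
  reviewedDecisions: number;
  humanAgreed: number;
  sentAsIs: number;
  executions: number;
  executionsResolvedOrEscalated: number;
  unrecoveredFailures: number;
}

interface GraduationReport {
  graduation_status: 'READY' | 'NOT READY';
  gates: Record<string, 'PASS' | 'FAIL'>;
}

// Integer comparison (n / d >= pct%) that avoids floating-point edge cases.
function atLeastPercent(n: number, d: number, pct: number): boolean {
  return d > 0 && n * 100 >= pct * d;
}

function evaluateGraduation(m: GraduationInputs): GraduationReport {
  const checks: Record<string, boolean> = {
    sample_size: m.reviewedDecisions >= 200,
    human_agreement: atLeastPercent(m.humanAgreed, m.reviewedDecisions, 90),
    draft_acceptance: atLeastPercent(m.sentAsIs, m.reviewedDecisions, 70),
    outcome_health: atLeastPercent(m.executionsResolvedOrEscalated, m.executions, 99),
    unrecovered_failures: m.unrecoveredFailures === 0,
  };

  const gates: Record<string, 'PASS' | 'FAIL'> = {};
  let allPass = true;
  for (const name of Object.keys(checks)) {
    gates[name] = checks[name] ? 'PASS' : 'FAIL';
    if (!checks[name]) allPass = false;
  }
  // Every gate must pass simultaneously for READY.
  return { graduation_status: allPass ? 'READY' : 'NOT READY', gates };
}
```

The blocker conditions (calibration review age, dashboard health) are not modelled here and would need their own gates.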
Graduation Process¶
- Run graduation query. All gates must show PASS
- Anton reviews the full dashboard and confirms
- SOP status updated to "Production Ready — Shadow Mode"
- Enable selective autonomy: auto-send for confidence ≥ 90% AND no policy gate AND category not in mandatory_review list
- Full autonomy criteria defined after selective autonomy data
Version History¶
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-01-17 | Initial specification |
| 1.1 | 2026-01-17 | Safety filter, resolution types, escalation throttle, vet advice enforcement |
| 1.2 | 2026-01-19 | Production hardening: Preflight policy gate, idempotency, auth hardening, CORS hardening, confidence enforcement, human-readable reason codes, shadow mode default |
| 1.3 | 2026-01-19 | Humanization & gates: Support personas (Sophie/Tom/Lucy deterministic per customer), operating hours (8am-9pm London), response delay (3-12 min), email-before-resolve gate |
| 1.4 | 2026-01-19 | Semantic search & monitoring: pgvector embeddings with OpenAI, hybrid retrieval (semantic + keyword fallback), support queue Slack alerting via native Supabase stack |
| 1.5 | 2026-01-20 | Email threading & history: In-Reply-To/References headers for inbox threading, [Support] subject prefix, ticket_messages table for conversation history, unified send-support-email Edge Function |
| 1.6 | 2026-03-19 | Governance & resilience: Status reclassified to Shadow Mode Validation. Null outcome fix with ensureOutcomeWritten and CHECK constraint. Timeout-to-escalation at 80% budget. All agent writes via SECURITY DEFINER RPCs with failure recovery to agent_execution_failures table. Customer.io retry logic. Least-privilege support_agent role with explicit grants. Shadow-mode graduation criteria with v_agent_graduation_readiness view. KB auto-embedding trigger via pg_net. Cost and execution health monitoring views. Known limitations reduced from 8 to 3. |
Known Limitations (v1.6)¶
| Limitation | Status | Plan |
|---|---|---|
| Attachments always escalate | Accepted (conservative) | v1.7: Smart attachment handling |
| Threading requires email_message_id | email-ingest Worker extracts from MIME | Documented in CS-01 v2.3 |
| Old tickets lack threading | Backfill not planned | New tickets only |
File Locations¶
| File | Location |
|---|---|
| Autonomous Agent | supabase/functions/autonomous-support-agent/index.ts |
| Email Sender | supabase/functions/send-support-email/index.ts |
| Embedding Generator | supabase/functions/embedding-generator/index.ts |
| Database Migrations | supabase/migrations/ |
| Email Ingestion Worker | email-ingest/src/index.ts (Cloudflare Worker) |
Document Owner: Protocol Raw Operations
Version: 1.6
Status: 🟡 Shadow Mode Validation
End of SOP CS-03 v1.6