SOP-OPS-01: Operations Self-Service Actions v1.2¶
Every operationally recoverable alert gets a safe portal action. Infrastructure and architecture alerts are explicitly classified as founder-only.
Document ID: SOP-OPS-01-v1.2
Version: 1.2
Status: ✅ Specification Ready (for current Ops Portal architecture)
Last Updated: 2026-04-12
Owner: Protocol Raw Operations
Replaces: SOP-OPS-01-v1.1
Review Date: 2026-07-12
Auth Note: Auth pattern follows established Ops Portal perimeter model (Cloudflare Access + browser-side service_role client). Actor email resolved and logged server-side for audit traceability. Fine-grained role gating deferred to Phase B when multiple operators are added.
Key Changes in v1.2¶
- ✅ Auth section reworded: honest about existing perimeter model, not claiming upgraded trust
- ✅ Dashboard contract completed: support tile added, SLO metric added to order pipeline
- ✅ Lab entry idempotency fixed: unique on batch_id regardless of entry method
- ✅ Lab entry hard-blocks PASS when pathogen detected (no longer a warning)
- ✅ Worker Controls allowlist aligned with spec table (webhook processor + lifecycle worker added)
- ✅ Proof job recovery panel restored (dropped in v1.1 by document drift)
- ✅ All 5 TBD compliance RPCs completed with insert functions
- ✅ "All RPC wrappers complete" claim is now true
- ✅ Status line reflects scope: "for current Ops Portal architecture"
Key Changes in v1.1¶
- ✅ Governing principle rewritten: honest about founder-only alerts
- ✅ Auth model: Cloudflare Access JWT → agent_id, no user-supplied actor fields
- ✅ Every action has an explicit execution path (what consumer picks up the state change)
- ✅ Manual lab entry redesigned: lab verdict first, idempotency guard, full micro inputs
- ✅ Bulk retry actions replaced with safe controls (preview, select, cap, confirm)
- ✅ Missing-event recovery added (replay order, run worker now)
- ✅ Dashboard function rewritten to match all tiles, cron health from registry
- ✅ Broken SQL fixed (webhook retry, outbox source/function mismatch)
- ✅ Rejected batch disposition form added
- ✅ UI language guidelines added (human-first, technical in expandable detail)
Purpose¶
The Operations Lead (joining Month 7-9) must be able to run day-to-day operations, resolve operationally recoverable alerts, and handle exceptions entirely from the Operations Portal.
Two classes of alert exist:
| Class | Examples | Resolution |
|---|---|---|
| Operator-recoverable | Failed webhook, stuck email, quarantined lab result, dead-lettered refund, inventory discrepancy, fraud-flagged referral | Portal action. No founder involvement. |
| Founder-only (infrastructure) | Canary system health failure, Edge Function crash/redeploy needed, database schema issue, Supabase Storage failure, repeated cron job failures after operator retry | Ops Lead escalates to founder with context from portal diagnostics. |
The Ops Lead should never need to access the Supabase dashboard, run SQL, use a terminal, or deploy code. The founder's involvement is reserved for architecture changes, code deployments, and infrastructure failures.
Scope¶
Applies to: Operations Portal (ops.protocolraw.co.uk), Portal v4.1+
New Modules:
| Module | Tab | Purpose |
|---|---|---|
| System Health Dashboard | System Health (new) | Morning check, all-system visibility |
| Exception Queue | System Health | Item-level actions for recoverable failures |
| Worker Controls | System Health | Run-now buttons for stalled background workers |
| Cron Job Status | System Health | Visibility into scheduled job health |
| Canned Diagnostics | System Health | Pre-built read-only views |
| Compliance Recording | Compliance (new) | Temperature audits, evidence packs, spot checks, sample logs, batch disposition |
Related SOPs: MON-01, LAB-01, INV-01, REF-01, ORD-01, ORD-02, ORD-03, SUB-00, CS-04, DLV-01, COLD-01, PACK-01, PROOF-00, PROOF-01
Part 1: Authentication Model¶
Existing Portal Pattern¶
The Ops Portal has used the same trust model since v2.0: Cloudflare Access (Google OAuth) as the perimeter gate, with browser-side Supabase service_role client for database calls. Every existing feature (batch creation, refund processing, support workstation, live chat) operates within this boundary.
SOP-OPS-01 inherits this pattern. It does not introduce a new auth architecture.
What this SOP improves over the existing portal pattern: Actor identity is now resolved server-side and logged on every mutation. Previously, portal actions either didn't log who performed them or used ad-hoc identifiers. The new modules ensure every action is attributable to a specific agent via the agents table.
What this does not do: This is not cryptographic caller verification. The browser sends the agent email extracted from the Cloudflare Access JWT, and the database resolves it. A caller who somehow bypasses Cloudflare Access could spoof the email. The compensating control is that Cloudflare Access is the sole entry point, and only the founder can add/remove authorised emails.
Phase B enhancement (multi-operator): When multiple operators are added, introduce role-based action gating (e.g., ops_lead can retry webhooks but not adjust inventory without ops_senior role). Not needed with a single operator.
Browser (ops.protocolraw.co.uk)
↓
Cloudflare Access (Google OAuth, validates email)
↓
Portal JavaScript (reads CF Access JWT, extracts email)
↓
Supabase RPC call (service_role key, agent email passed as function argument)
↓
Public RPC wrapper resolves email → agent_id via fn_resolve_ops_agent_v1
↓
Domain function receives trusted agent_id UUID
↓
All audit logging uses resolved agent_id
Agent Resolution Function¶
CREATE OR REPLACE FUNCTION raw_ops.fn_resolve_ops_agent_v1(p_email TEXT)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_agent RECORD;
BEGIN
SELECT id, display_name, role INTO v_agent
FROM agents
WHERE email = lower(trim(p_email)) AND active = true;
  IF NOT FOUND THEN
RAISE EXCEPTION 'Agent not found or inactive: %', p_email
USING ERRCODE = 'P0002';
END IF;
RETURN jsonb_build_object(
'agent_id', v_agent.id,
'display_name', v_agent.display_name,
'role', v_agent.role
);
END;
$$;
Portal-Side: Extracting Agent Email¶
The Ops Portal already loads behind Cloudflare Access. The CF Access JWT is available as a cookie (CF_Authorization). The portal JavaScript decodes the JWT to extract the authenticated email:
// In system-health.js module init
function getAgentEmail() {
  const cookie = document.cookie
    .split('; ')
    .find(c => c.startsWith('CF_Authorization='));
  if (!cookie) return null;
  const token = cookie.slice('CF_Authorization='.length);
  try {
    // JWT segments are base64url-encoded; normalise before atob(),
    // which only accepts standard base64
    const b64 = token.split('.')[1].replace(/-/g, '+').replace(/_/g, '/');
    const payload = JSON.parse(atob(b64));
    return payload.email ?? null;
  } catch { return null; }
}
RPC Wrapper Pattern (v1.2)¶
Every mutation RPC accepts the agent email, resolves it server-side, and uses the resolved agent_id for audit logging. The email comes from the Cloudflare Access JWT, not from user input.
-- Example: all v1.2 mutation RPCs follow this pattern
CREATE OR REPLACE FUNCTION public.ops_retry_lab_quarantine(p_failure_id UUID, p_agent_email TEXT)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_agent JSONB;
BEGIN
-- Resolve agent (throws if not found/inactive)
v_agent := raw_ops.fn_resolve_ops_agent_v1(p_agent_email);
-- Delegate to domain function with trusted agent_id
RETURN raw_ops.fn_retry_lab_quarantine_v1(p_failure_id, (v_agent->>'agent_id')::UUID);
END;
$$;
GRANT EXECUTE ON FUNCTION public.ops_retry_lab_quarantine TO service_role;
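On the portal side, the same pattern can be kept honest with a small helper that attaches the Cloudflare Access email to every mutation call. This is a sketch; `buildRpcArgs` is not named anywhere in this spec and is purely illustrative.

```javascript
// Hypothetical portal-side helper (name assumed, not part of the spec):
// ensures every mutation RPC call carries the CF Access email as p_agent_email.
function buildRpcArgs(agentEmail, args) {
  if (!agentEmail) {
    // No Cloudflare Access session — never fire a mutation RPC without an actor.
    throw new Error('No authenticated agent email');
  }
  return { ...args, p_agent_email: agentEmail };
}

// Usage with supabase-js (client assumed to exist as `supabase`):
// const { data, error } = await supabase.rpc(
//   'ops_retry_lab_quarantine',
//   buildRpcArgs(getAgentEmail(), { p_failure_id: failureId })
// );
```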
Key Constraints¶
- No function accepts p_entered_by TEXT or similar free-text actor fields. All actor identity is resolved from agent email via fn_resolve_ops_agent_v1.
- All domain functions accept p_agent_id UUID, not email or display name.
- The agents table is the source of truth. Daniel's record is created before he starts. Only the founder can add/remove agents.
- Read-only functions (dashboard, diagnostics, cron health) do not require agent resolution.
Part 2: System Health Dashboard¶
Purpose¶
Single screen the Ops Lead checks every morning. Shows the health of every automated system with green/amber/red indicators. Clicking any non-green tile opens the relevant Exception Queue panel.
Health Tiles¶
| # | Tile | Green | Amber | Red |
|---|---|---|---|---|
| 1 | Order Pipeline | 0 failed webhooks, SLO met | 1-5 failed in 24h | >5 failed OR SLO breach |
| 2 | Email Delivery | 0 stuck, <10 pending | 1-10 stuck OR 10-50 pending | >10 stuck OR >50 pending |
| 3 | Inventory | All SKUs above reorder | Any SKU in LOW band | Any SKU CRITICAL |
| 4 | Batch Testing | 0 batches >3 days in QA_HOLD | 1+ batch >3 days | 1+ batch >5 days |
| 5 | Lab Processing | 0 quarantined emails | 1-2 quarantined | 3+ quarantined |
| 6 | Refunds | 0 failed or stuck | 1-3 retrying | Any stuck >1h or dead-lettered |
| 7 | Support | 0 unassigned >1h | 1-3 waiting | >3 waiting >1h |
| 8 | Subscriptions | 0 failed Seal webhooks | 1-5 failed | >5 failed |
| 9 | Payments | No dunning stalls | Day 7+ without email | Day 10+ (imminent pause) |
| 10 | Deliveries | <2% exception rate (7d) | 2-5% exceptions | >5% exceptions |
| 11 | Referrals | 0 fraud-flagged | 1-3 flagged | >3 flagged |
| 12 | Proof Pages | Last check passed | Last check >1h ago | Last check failed |
| 13 | Background Jobs | All critical jobs healthy | 1+ stale (missed expected interval) | 1+ overdue (missed 2x expected interval), failed, missing, or never run |
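As an illustration, the Order Pipeline row reduces to a simple threshold function. The input field names here are assumed, not taken from the dashboard payload.

```javascript
// Tile 1 thresholds from the table above; input field names are assumed.
function orderPipelineStatus({ failed24h, sloBreach }) {
  if (failed24h > 5 || sloBreach) return 'red';  // >5 failed OR SLO breach
  if (failed24h >= 1) return 'amber';            // 1-5 failed in 24h
  return 'green';                                // 0 failed, SLO met
}
```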
Dashboard Data Function¶
CREATE OR REPLACE FUNCTION raw_ops.fn_get_system_health_dashboard_v1()
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_result JSONB;
BEGIN
SELECT jsonb_build_object(
'generated_at', now(),
'order_pipeline', (
SELECT jsonb_build_object(
'failed_24h', COUNT(*) FILTER (WHERE processing_status = 'failed'
AND received_at > now() - interval '24 hours'),
'pending', COUNT(*) FILTER (WHERE processing_status = 'pending'),
'slo_breach', EXISTS (
SELECT 1 FROM webhook_inbox
WHERE processing_status = 'pending'
AND received_at < now() - interval '5 minutes'
),
'oldest_pending_mins', COALESCE(
EXTRACT(EPOCH FROM (now() - MIN(received_at)
FILTER (WHERE processing_status = 'pending')))::int / 60, 0)
) FROM webhook_inbox
),
'email_delivery', (
SELECT jsonb_build_object(
'stuck', COUNT(*) FILTER (WHERE status = 'pending'
AND created_at < now() - interval '30 minutes'),
'pending_total', COUNT(*) FILTER (WHERE status = 'pending'),
'failed_today', COUNT(*) FILTER (WHERE status = 'failed'
AND created_at >= CURRENT_DATE)
) FROM email_outbox
),
'inventory', (
SELECT COALESCE(jsonb_agg(jsonb_build_object(
'sku', sku, 'total_units', total_units,
'level', CASE
WHEN total_units <= 20 THEN 'critical'
WHEN total_units <= 40 THEN 'low'
ELSE 'ok' END
)), '[]'::jsonb)
FROM (
SELECT p.sku, SUM(FLOOR(i.qty_available / p.kg))::int as total_units
FROM inventory i
JOIN batches b ON i.batch_id = b.id
JOIN products p ON b.product_id = p.id
WHERE b.status = 'RELEASED' AND i.qty_available > 0
GROUP BY p.sku
) inv
),
'batch_testing', (
SELECT jsonb_build_object(
'qa_hold_count', COUNT(*),
'oldest_qa_hold_days', COALESCE(EXTRACT(DAY FROM (now() - MIN(created_at)))::int, 0),
'over_3_days', COUNT(*) FILTER (WHERE created_at < now() - interval '3 days'),
'over_5_days', COUNT(*) FILTER (WHERE created_at < now() - interval '5 days')
) FROM batches WHERE status = 'QA_HOLD'
),
'lab_processing', (
SELECT jsonb_build_object(
'pending_review', COUNT(*) FILTER (WHERE status = 'pending_review'),
'oldest_hours', COALESCE(
EXTRACT(EPOCH FROM (now() - MIN(created_at)
FILTER (WHERE status = 'pending_review')))::int / 3600, 0)
) FROM lab_email_failures
),
'refunds', (
SELECT jsonb_build_object(
'dead_lettered', COUNT(*) FILTER (WHERE dead_lettered_at IS NOT NULL),
'failed_retrying', COUNT(*) FILTER (
WHERE status = 'failed' AND dead_lettered_at IS NULL),
'pending', COUNT(*) FILTER (WHERE status = 'pending'),
'oldest_pending_mins', COALESCE(
EXTRACT(EPOCH FROM (now() - MIN(created_at)
FILTER (WHERE status = 'pending')))::int / 60, 0)
) FROM refunds
),
'subscriptions', (
SELECT jsonb_build_object(
'failed_webhooks', COUNT(*) FILTER (WHERE processing_status = 'failed'),
'pending_webhooks', COUNT(*) FILTER (WHERE processing_status = 'pending')
) FROM seal_webhook_inbox
),
'payments', (
SELECT jsonb_build_object(
'dunning_active', COUNT(*),
'day7_plus', COUNT(*) FILTER (
WHERE dunning_started_at < now() - interval '7 days'),
'day10_plus', COUNT(*) FILTER (
WHERE dunning_started_at < now() - interval '10 days')
) FROM subscriptions
WHERE payment_status IN ('failing', 'failed') AND status = 'ACTIVE'
),
'deliveries', (
SELECT jsonb_build_object(
'total_7d', COUNT(*),
'exceptions_7d', COUNT(*) FILTER (
WHERE status IN ('EXCEPTION', 'FAILED', 'RETURNED')),
'exception_rate_pct', CASE WHEN COUNT(*) > 0
THEN ROUND(COUNT(*) FILTER (
WHERE status IN ('EXCEPTION', 'FAILED', 'RETURNED'))
* 100.0 / COUNT(*), 1)
ELSE 0 END
) FROM shipments
WHERE created_at > now() - interval '7 days'
),
'support', (
SELECT jsonb_build_object(
'unassigned_total', COUNT(*) FILTER (
WHERE assigned_to IS NULL AND status NOT IN ('resolved', 'closed')),
'unassigned_over_1h', COUNT(*) FILTER (
WHERE assigned_to IS NULL AND status NOT IN ('resolved', 'closed')
AND created_at < now() - interval '1 hour'),
'open_total', COUNT(*) FILTER (
WHERE status NOT IN ('resolved', 'closed'))
) FROM support_tickets
),
'referrals', (
SELECT jsonb_build_object(
'fraud_flagged', COUNT(*) FILTER (WHERE status = 'fraud_flagged'),
'oldest_flagged_days', COALESCE(
EXTRACT(DAY FROM (now() - MIN(created_at)
FILTER (WHERE status = 'fraud_flagged')))::int, 0)
) FROM referrals
),
'proof_pages', (
SELECT jsonb_build_object(
'last_check_at', MAX(run_at),
'last_status', (
SELECT status FROM monitoring_runs
WHERE check_name = 'proof_portal_health'
ORDER BY run_at DESC LIMIT 1),
'minutes_since_check', COALESCE(
EXTRACT(EPOCH FROM (now() - MAX(run_at)))::int / 60, 999)
) FROM monitoring_runs
WHERE check_name = 'proof_portal_health'
),
'background_jobs', raw_ops.fn_get_cron_health_summary_v1()
) INTO v_result;
RETURN v_result;
END;
$$;
Cron Health from Registry (not pattern matching)¶
v1.0 used LIKE '%* * * *%' on cron schedule strings. That is unreliable. v1.1 uses an explicit registry of critical jobs with expected maximum silence thresholds.
CREATE TABLE IF NOT EXISTS raw_ops.critical_cron_registry (
job_name TEXT PRIMARY KEY,
display_name TEXT NOT NULL,
max_silence_minutes INTEGER NOT NULL,
category TEXT NOT NULL DEFAULT 'operations',
created_at TIMESTAMPTZ DEFAULT now()
);
INSERT INTO raw_ops.critical_cron_registry (job_name, display_name, max_silence_minutes, category)
VALUES
('process-outbox-every-minute', 'Email sender', 5, 'email'),
('process-lifecycle-events', 'Lifecycle events', 5, 'email'),
('monitor-outbox-health', 'Outbox health check', 10, 'monitoring'),
('monitor-order-ingestion-slo', 'Order SLO check', 10, 'monitoring'),
('monitor-heartbeat', 'Monitoring heartbeat', 20, 'monitoring'),
('monitor-webhook-health', 'Webhook health check', 15, 'monitoring'),
('monitor-stock-levels', 'Stock level check', 20, 'inventory'),
('monitor-lab-ingestion', 'Lab ingestion check', 20, 'lab'),
('dunning-processor', 'Payment dunning', 30, 'subscriptions'),
('canary-system-health', 'Infrastructure canary', 15, 'infrastructure'),
('process-refund-retries', 'Refund retry processor', 5, 'refunds'),
('process-proof-jobs', 'Proof page generator', 5, 'proof')
ON CONFLICT (job_name) DO NOTHING;
CREATE OR REPLACE FUNCTION raw_ops.fn_get_cron_health_summary_v1()
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
BEGIN
RETURN (
SELECT
jsonb_build_object(
'total_critical', COUNT(*),
'healthy', COUNT(*) FILTER (WHERE health = 'healthy'),
'stale', COUNT(*) FILTER (WHERE health = 'stale'),
'critical', COUNT(*) FILTER (WHERE health IN ('failed', 'overdue', 'missing', 'never_run')),
'disabled', COUNT(*) FILTER (WHERE health = 'disabled'),
'jobs', jsonb_agg(job_detail ORDER BY health DESC, display_name)
)
FROM (
SELECT
r.job_name,
r.display_name,
r.max_silence_minutes,
r.category,
j.active,
lr.start_time as last_run,
lr.status as last_status,
CASE
WHEN j.jobid IS NULL THEN 'missing'
WHEN j.active = false THEN 'disabled'
WHEN lr.status = 'failed' THEN 'failed'
WHEN lr.start_time IS NULL THEN 'never_run'
WHEN lr.start_time < now() - (r.max_silence_minutes * 2 || ' minutes')::interval THEN 'overdue'
WHEN lr.start_time < now() - (r.max_silence_minutes || ' minutes')::interval THEN 'stale'
ELSE 'healthy'
END as health,
jsonb_build_object(
'job_name', r.job_name,
'display_name', r.display_name,
'category', r.category,
'active', COALESCE(j.active, false),
'last_run', lr.start_time,
'last_status', lr.status,
'last_error', CASE WHEN lr.status = 'failed'
THEN LEFT(lr.return_message, 200) ELSE NULL END,
'max_silence_minutes', r.max_silence_minutes,
'health', CASE
WHEN j.jobid IS NULL THEN 'missing'
WHEN j.active = false THEN 'disabled'
WHEN lr.status = 'failed' THEN 'failed'
WHEN lr.start_time IS NULL THEN 'never_run'
WHEN lr.start_time < now() - (r.max_silence_minutes * 2 || ' minutes')::interval THEN 'overdue'
WHEN lr.start_time < now() - (r.max_silence_minutes || ' minutes')::interval THEN 'stale'
ELSE 'healthy'
END
) as job_detail
FROM raw_ops.critical_cron_registry r
LEFT JOIN cron.job j ON j.jobname = r.job_name
LEFT JOIN LATERAL (
SELECT * FROM cron.job_run_details jrd
WHERE jrd.jobid = j.jobid
ORDER BY jrd.start_time DESC LIMIT 1
) lr ON true
) health_check
);
END;
$$;
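The health classification in the function above reduces to one rule per job. This JS restatement is for clarity only, not part of the portal code:

```javascript
// Registry staleness rule: stale past max_silence_minutes, overdue past 2x,
// with missing/disabled/failed/never_run checked first (mirrors the SQL CASE).
function cronHealth({ registered = true, active = true, lastStatus = null,
                      minutesSinceRun = null, maxSilence }) {
  if (!registered) return 'missing';
  if (!active) return 'disabled';
  if (lastStatus === 'failed') return 'failed';
  if (minutesSinceRun === null) return 'never_run';
  if (minutesSinceRun > maxSilence * 2) return 'overdue';
  if (minutesSinceRun > maxSilence) return 'stale';
  return 'healthy';
}
```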
UI Specification¶
Auto-refresh: Every 60 seconds. Manual refresh button.
Tile interaction: Clicking any amber/red tile scrolls to and expands the relevant Exception Queue panel.
Tile layout: 4 columns desktop, 2 tablet, 1 mobile. Sort: red first, amber second, green last.
Colour tokens: Green #2D5144 on #E8F5E9, Amber #F57F17 on #FFF8E1, Red #C62828 on #FFEBEE.
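The tile sort rule can be sketched as follows; the tile object shape is assumed, with name order as an arbitrary tie-break within a band:

```javascript
// Red first, amber second, green last (per the UI spec above).
const SEVERITY = { red: 0, amber: 1, green: 2 };
function sortTiles(tiles) {
  return [...tiles].sort(
    (a, b) => SEVERITY[a.status] - SEVERITY[b.status] || a.name.localeCompare(b.name)
  );
}
```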
Part 3: Exception Queue¶
Design Principles¶
- Every action has an explicit execution path. "What picks it up, when, and what happens if it doesn't" is documented for every state-flip.
- No bulk "retry all" buttons. Operator selects specific items or uses capped batches with confirmation.
- Human-first labels. UUIDs hidden by default. Error types shown as plain English. Technical detail in expandable panel.
- Confirmation dialogs on every mutation with downstream impact description.
3.1 Lab Quarantine Viewer¶
Source: raw_ops.lab_email_failures WHERE status = 'pending_review'
Columns (human labels):
| Display Label | Source |
|---|---|
| Received | created_at (formatted: "2 hours ago") |
| Email Subject | email_subject |
| Problem | Human label from error_type (see table below) |
| Batch (if known) | Extracted from email subject or batch_code field |
Error type → human label:
| error_type | Display | Guidance Text |
|---|---|---|
| batch_not_found | Batch not created yet | Create the batch in the Batches tab first, then retry. |
| batch_not_in_qa_hold | Batch already processed | This batch was already released or rejected. Check batch status. |
| ai_parsing_failed | Could not read the PDF | Open the PDF manually and enter results using the form below. |
| incomplete_parse | PDF partially readable | Some data was missing. Open the PDF and enter results manually. |
| no_pdf_attachment | No PDF in email | This email matched the lab filter but had no PDF attached. Dismiss if not a lab result. |
| storage_upload_failed | System error | Escalate to founder. This is an infrastructure issue. |
| db_write_failed | System error | Escalate to founder. This is an infrastructure issue. |
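The mapping above can live as a small lookup in the portal JS. The labels are copied from the table; the helper name and shape are assumptions, and RETRYABLE mirrors the "Retry" enablement rule described in this section.

```javascript
// Lookup mirroring the error-type table above (labels copied from the spec).
const ERROR_LABELS = {
  batch_not_found: 'Batch not created yet',
  batch_not_in_qa_hold: 'Batch already processed',
  ai_parsing_failed: 'Could not read the PDF',
  incomplete_parse: 'PDF partially readable',
  no_pdf_attachment: 'No PDF in email',
  storage_upload_failed: 'System error',
  db_write_failed: 'System error',
};
// Only these error types get an enabled "Retry" button.
const RETRYABLE = new Set(['batch_not_found', 'ai_parsing_failed', 'incomplete_parse']);
function describeFailure(errorType) {
  return {
    label: ERROR_LABELS[errorType] ?? errorType,
    canRetry: RETRYABLE.has(errorType),
  };
}
```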
Actions:
"View PDF" — Opens PDF from Supabase Storage. No mutation.
"Retry" — Re-queues for processing. Enabled only for batch_not_found, ai_parsing_failed, and incomplete_parse.
Execution path: Sets status to retry_queued. The lab-email-ingestion Edge Function's 30-minute fallback poll picks up retry_queued items and reprocesses them. If the Edge Function is not running, the lab_ingestion_health monitor will fire a warning within its 20-minute silence threshold (per the cron registry).
CREATE OR REPLACE FUNCTION raw_ops.fn_retry_lab_quarantine_v1(
p_failure_id UUID,
p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_failure RECORD;
BEGIN
SELECT * INTO v_failure
FROM lab_email_failures
WHERE id = p_failure_id AND status = 'pending_review';
IF v_failure IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Item not found or not pending review');
END IF;
IF v_failure.error_type IN ('storage_upload_failed', 'db_write_failed') THEN
RETURN jsonb_build_object('success', false,
'error', 'This is an infrastructure error. Please escalate to the founder.');
END IF;
-- Only allow retry for error types where retry has a reasonable chance of success
IF v_failure.error_type NOT IN ('batch_not_found', 'ai_parsing_failed', 'incomplete_parse') THEN
RETURN jsonb_build_object('success', false,
'error', 'This error type (' || v_failure.error_type || ') cannot be resolved by retry. '
|| 'Use "Enter Results Manually" or "Dismiss" instead.');
END IF;
UPDATE lab_email_failures
SET status = 'retry_queued', updated_at = now()
WHERE id = p_failure_id;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('LAB_QUARANTINE', p_failure_id, 'INFO', 'Quarantined lab email queued for retry',
jsonb_build_object('agent_id', p_agent_id, 'error_type', v_failure.error_type));
RETURN jsonb_build_object('success', true, 'failure_id', p_failure_id);
END;
$$;
"Enter Results Manually" — Opens manual lab result form. See Part 4.
"Dismiss" — For non-lab emails. Requires a reason.
CREATE OR REPLACE FUNCTION raw_ops.fn_dismiss_lab_quarantine_v1(
p_failure_id UUID,
p_reason TEXT,
p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
BEGIN
UPDATE lab_email_failures
SET status = 'dismissed', resolved_at = now(),
resolved_by = p_agent_id::text,
resolution_notes = p_reason
WHERE id = p_failure_id AND status = 'pending_review';
IF NOT FOUND THEN
RETURN jsonb_build_object('success', false, 'error', 'Item not found or not pending');
END IF;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('LAB_QUARANTINE', p_failure_id, 'INFO', 'Quarantined email dismissed',
jsonb_build_object('agent_id', p_agent_id, 'reason', p_reason));
RETURN jsonb_build_object('success', true);
END;
$$;
3.2 Dead-Lettered Refunds¶
Source: raw_ops.refunds WHERE dead_lettered_at IS NOT NULL
Columns: Customer (email), Order (order number), Amount, Type, Failure Reason, When
Expandable detail: Shopify error message, retry count, refund ID (for Shopify Admin lookup).
Actions:
"Send again" — Resets refund to pending for automatic retry.
Execution path: Sets status to pending, clears dead-letter fields. The on_refund_status_change database trigger fires, which calls process-refund Edge Function via pg_net within seconds. If the Edge Function is down, the refund stays in pending and the refund_health monitor fires within 15 minutes.
CREATE OR REPLACE FUNCTION raw_ops.fn_retry_dead_lettered_refund_v1(
p_refund_id UUID,
p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_refund RECORD;
BEGIN
SELECT * INTO v_refund FROM refunds WHERE id = p_refund_id;
IF v_refund IS NULL OR v_refund.dead_lettered_at IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Refund not found or not dead-lettered');
END IF;
UPDATE refunds
SET status = 'pending',
dead_lettered_at = NULL,
dead_letter_reason = NULL,
retry_count = 0,
next_retry_at = NULL,
shopify_error = NULL
WHERE id = p_refund_id;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('REFUND', p_refund_id, 'INFO', 'Dead-lettered refund retried via portal',
jsonb_build_object('agent_id', p_agent_id, 'amount_gbp', v_refund.amount_gbp));
RETURN jsonb_build_object('success', true, 'refund_id', p_refund_id);
END;
$$;
"Record manual refund" — For cases where the Ops Lead processed the refund directly in Shopify Admin.
Form: Shopify refund ID (from Shopify Admin), confirmation checkbox ("I have confirmed this refund was processed in Shopify").
CREATE OR REPLACE FUNCTION raw_ops.fn_resolve_refund_manually_v1(
p_refund_id UUID,
p_shopify_refund_id TEXT,
p_notes TEXT,
p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
BEGIN
UPDATE refunds
SET status = 'processed',
dead_lettered_at = NULL,
dead_letter_reason = NULL,
shopify_refund_id = p_shopify_refund_id,
internal_notes = COALESCE(internal_notes, '') ||
E'\n[Manual: ' || p_notes || ' | agent: ' || p_agent_id || ' | ' || now()::text || ']'
WHERE id = p_refund_id;
IF NOT FOUND THEN
RETURN jsonb_build_object('success', false, 'error', 'Refund not found');
END IF;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('REFUND', p_refund_id, 'INFO', 'Refund resolved manually via portal',
jsonb_build_object('agent_id', p_agent_id, 'shopify_refund_id', p_shopify_refund_id));
RETURN jsonb_build_object('success', true);
END;
$$;
3.3 Failed Webhooks (Safe Retry)¶
Replaces v1.0's "Retry All Failed" with safe operator controls.
Source: webhook_inbox WHERE processing_status = 'failed' and seal_webhook_inbox WHERE processing_status = 'failed'
Two sub-panels: Shopify Webhooks, Seal Webhooks. Same UI pattern for both.
Columns: Received (relative time), Type (webhook topic, human-labelled), Error (first 100 chars), Attempts
UI Controls:
- Summary bar: "12 failed webhooks. 8 are the same error: Connection timeout."
- Error grouping: Group by error message. Show count per group.
- Select: Checkbox per item. "Select all in this group" shortcut.
- "Retry selected" button: Max 25 per action. Confirmation: "Retry [N] failed webhooks? If the underlying issue is still present, these will fail again."
- "Retry next 10" button: Takes the 10 oldest failed items only.
CREATE OR REPLACE FUNCTION raw_ops.fn_retry_selected_webhooks_v1(
p_webhook_ids UUID[],
p_agent_id UUID,
p_source TEXT DEFAULT 'shopify' -- 'shopify' or 'seal'
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_count INTEGER;
v_capped_ids UUID[];
BEGIN
-- Validate source
IF p_source NOT IN ('shopify', 'seal') THEN
RETURN jsonb_build_object('success', false,
'error', 'Invalid source: ' || p_source || '. Must be shopify or seal.');
END IF;
-- Hard cap at 25
v_capped_ids := p_webhook_ids[1:25];
IF p_source = 'shopify' THEN
UPDATE webhook_inbox
SET processing_status = 'pending', retry_count = retry_count + 1
WHERE id = ANY(v_capped_ids) AND processing_status = 'failed' AND retry_count < 10;
ELSIF p_source = 'seal' THEN
UPDATE seal_webhook_inbox
SET processing_status = 'pending', retry_count = retry_count + 1
WHERE id = ANY(v_capped_ids) AND processing_status = 'failed' AND retry_count < 10;
END IF;
GET DIAGNOSTICS v_count = ROW_COUNT;
INSERT INTO ops_events (entity_type, kind, message, meta)
VALUES ('WEBHOOK', 'INFO', v_count || ' ' || p_source || ' webhooks queued for retry',
jsonb_build_object('agent_id', p_agent_id, 'count', v_count,
'source', p_source, 'requested', array_length(p_webhook_ids, 1)));
RETURN jsonb_build_object('success', true, 'retried', v_count,
'capped', array_length(p_webhook_ids, 1) > 25);
END;
$$;
Execution path: Setting processing_status = 'pending' causes the webhook handler (triggered by the existing webhook processing cron or real-time trigger) to reprocess them. The webhook_health monitor checks every 5 minutes.
3.4 Email Delivery Recovery¶
Source: email_outbox WHERE status = 'failed'
Note: v1.0 had a mismatch between source (stuck pending emails) and function (retried failed emails). v1.1 fixes this.
Panel shows two metrics:
- Failed emails (status = 'failed'): can be retried.
- Stuck emails (status = 'pending', age > 30 minutes): cannot be retried (they are already pending). Indicates the email worker itself is stalled. Guidance: "Check the Background Jobs tile. If the email sender job is stale or failed, use 'Run email sender now' in Worker Controls."
Action: "Retry failed emails" (max 25)
CREATE OR REPLACE FUNCTION raw_ops.fn_retry_failed_outbox_v1(
p_limit INTEGER DEFAULT 25,
p_agent_id UUID DEFAULT NULL
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_count INTEGER;
BEGIN
WITH retried AS (
UPDATE email_outbox
SET status = 'pending', updated_at = now(),
retry_count = COALESCE(retry_count, 0) + 1
WHERE id IN (
SELECT id FROM email_outbox
WHERE status = 'failed' AND COALESCE(retry_count, 0) < 5
ORDER BY created_at ASC
LIMIT p_limit
)
RETURNING id
)
SELECT COUNT(*) INTO v_count FROM retried;
INSERT INTO ops_events (entity_type, kind, message, meta)
VALUES ('EMAIL_OUTBOX', 'INFO', v_count || ' failed emails queued for retry',
jsonb_build_object('agent_id', p_agent_id, 'count', v_count));
RETURN jsonb_build_object('success', true, 'retried', v_count);
END;
$$;
Execution path: Emails set to pending are picked up by the process-outbox cron job (runs every minute). If that job is stalled, the outbox health monitor fires within 2 minutes.
3.5 Inventory Adjustment Form¶
Unchanged from v1.0 except: uses agent_id from auth, not free-text.
Calls existing raw_ops.execute_manual_inventory_adjustment() with agent_id resolved from JWT.
Confirmation dialog: "You are about to adjust batch [CODE] by [±X] kg. Current stock: [Y] kg. After adjustment: [Z] kg. Reason: [reason]. This cannot be undone. Continue?"
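The confirmation copy can be assembled from the form state. This is a sketch; the helper name and input shape are assumptions.

```javascript
// Builds the confirmation text above; values come from the adjustment form.
function adjustmentConfirmMessage({ code, delta, current, reason }) {
  const sign = delta >= 0 ? '+' : '';
  return `You are about to adjust batch ${code} by ${sign}${delta} kg. ` +
    `Current stock: ${current} kg. After adjustment: ${current + delta} kg. ` +
    `Reason: ${reason}. This cannot be undone. Continue?`;
}
```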
3.6 Referral Fraud Review Panel¶
Unchanged from v1.0 except: uses agent_id from auth.
Calls existing raw_ops.manual_override_referral() with agent_id.
3.7 Dunning Pipeline Viewer¶
Unchanged from v1.0 except: uses agent_id from auth, and Day 10+ rows show warning: "This subscription will auto-pause tomorrow. Consider calling the customer directly."
3.8 Stuck Proof Jobs¶
Source: raw_ops.proof_jobs WHERE state IN ('queued', 'failed') AND created_at < now() - interval '5 minutes'
When: Batch created but no QR code or proof page generated. The proof_system_health monitor fires.
Columns: Batch Code, Created, State, Attempts, Last Error
Action: "Retry" — Resets proof job for reprocessing.
Execution path: Sets state to queued and resets attempts. The process-proof-jobs Edge Function (invoked by pg_cron every minute) picks up queued jobs. If the proof worker itself is stalled, the Ops Lead should also use "Run proof generator now" in Worker Controls.
CREATE OR REPLACE FUNCTION raw_ops.fn_retry_stuck_proof_job_v1(
p_batch_id UUID,
p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_job_id UUID;
BEGIN
-- Reset existing stuck job
UPDATE proof_jobs
SET state = 'queued', attempts = 0, last_error = NULL
WHERE batch_id = p_batch_id AND state IN ('queued', 'failed')
RETURNING id INTO v_job_id;
IF v_job_id IS NULL THEN
-- Create new job if none exists
INSERT INTO proof_jobs (batch_id, job_key, state)
VALUES (p_batch_id, 'proof:batch:' || p_batch_id, 'queued')
RETURNING id INTO v_job_id;
END IF;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('PROOF_JOB', v_job_id, 'INFO', 'Proof job retried via portal',
jsonb_build_object('agent_id', p_agent_id, 'batch_id', p_batch_id));
RETURN jsonb_build_object('success', true, 'proof_job_id', v_job_id);
END;
$$;
Part 4: Manual Lab Result Entry (Redesigned)¶
Problem with v1.0¶
v1.0 captured a single Enterobacteriaceae value and declared PASS if < 5000. The actual lab certificate contains an official verdict and sub-sample data. A single-value check can produce a false PASS.
v1.2 Design: Lab Verdict First¶
The Ops Lead enters what the lab certificate says. The system records it. The lab's official verdict governs batch release, not a portal-side calculation.
Form Fields:
| Field | Type | Required | Notes |
|---|---|---|---|
| Batch | Dropdown (QA_HOLD batches only) | Yes | |
| Lab Name | Text (default: i2 Analytical) | Yes | |
| Certificate Date | Date picker | Yes | |
| Certificate Reference | Text | No | |
| Lab Verdict | Dropdown: PASS / FAIL | Yes | As stated on the certificate |
| Salmonella (25g) | Dropdown: Not Detected / Detected | Yes | |
| Listeria (25g) | Dropdown: Not Detected / Detected | Yes | |
| Enterobacteriaceae | Number (cfu/g, official value) | Yes | |
| Lab Certificate PDF | File upload | Yes | Stored in Supabase Storage. Required for both PASS and FAIL. |
| Linked Quarantine Item | Auto-populated if opened from quarantine | Auto | |
Safety Guards:
- Lab verdict is the governing field. However, a PASS verdict is hard-blocked if pathogens are detected. If the operator selects PASS but enters Salmonella = Detected or Listeria = Detected, the form refuses submission: "Cannot submit PASS with a detected pathogen. Check the certificate. If the lab certificate genuinely says PASS despite these values, escalate to the founder." This is a safety invariant, not a UX choice.
- One result per batch. Idempotency guard prevents double-submit and prevents manual entry when an automated result already exists:
-- One lab result per batch, period. Prevents both double-manual and manual-after-automated.
CREATE UNIQUE INDEX IF NOT EXISTS idx_lab_results_one_per_batch
ON raw_ops.lab_results (batch_id);
- Batch must be in QA_HOLD. Function rejects if batch is already released/rejected.
- PDF required for both PASS and FAIL. Failed-batch evidence is as important as pass evidence for compliance.
CREATE OR REPLACE FUNCTION raw_ops.fn_manual_lab_result_entry_v1(
p_batch_code TEXT,
p_lab_name TEXT,
p_reported_at DATE,
p_certificate_ref TEXT,
p_lab_verdict TEXT, -- 'PASS' or 'FAIL' (from certificate)
p_salmonella TEXT, -- 'absent' or 'detected'
p_listeria TEXT, -- 'absent' or 'detected'
p_enterobacteriaceae NUMERIC,
p_certificate_pdf_path TEXT,
p_agent_id UUID,
p_quarantine_id UUID DEFAULT NULL
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_batch RECORD;
v_lab_result_id UUID;
BEGIN
-- Validate batch exists and is in QA_HOLD
SELECT * INTO v_batch FROM batches WHERE batch_code = p_batch_code;
IF v_batch IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
END IF;
IF v_batch.status != 'QA_HOLD' THEN
RETURN jsonb_build_object('success', false,
'error', 'Batch is ' || v_batch.status || ', not QA_HOLD. Cannot enter results.');
END IF;
-- Hard-block: PASS with detected pathogen is not allowed
IF p_lab_verdict = 'PASS' AND (p_salmonella = 'detected' OR p_listeria = 'detected') THEN
RETURN jsonb_build_object('success', false,
'error', 'Cannot record PASS with a detected pathogen. '
|| 'Check the certificate. If the lab certificate genuinely says PASS despite these values, '
|| 'escalate to the founder for manual review.');
END IF;
-- Insert lab result. The on_lab_result_insert trigger handles batch release/rejection.
-- Lab verdict governs the outcome, not portal-side calculation.
INSERT INTO lab_results (
batch_id, lab_name, reported_at, certificate_reference,
salmonella_result, listeria_result, enterobacteriaceae_cfu_g,
outcome, entered_by, entry_method, certificate_pdf_path
-- source_email_id is left NULL for manual entries; idempotency is enforced
-- by the unique index on batch_id alone (see Safety Guards above).
) VALUES (
v_batch.id, p_lab_name, p_reported_at, p_certificate_ref,
p_salmonella, p_listeria, p_enterobacteriaceae,
p_lab_verdict, p_agent_id::text, 'manual_portal', p_certificate_pdf_path
)
RETURNING id INTO v_lab_result_id;
-- Mark quarantine item as resolved if linked
IF p_quarantine_id IS NOT NULL THEN
UPDATE lab_email_failures
SET status = 'resolved_manual', resolved_at = now(),
resolved_by = p_agent_id::text
WHERE id = p_quarantine_id;
END IF;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('LAB_RESULT', v_lab_result_id, 'INFO', 'Lab result entered manually via portal',
jsonb_build_object(
'agent_id', p_agent_id, 'batch_code', p_batch_code,
'lab_verdict', p_lab_verdict
));
RETURN jsonb_build_object(
'success', true,
'lab_result_id', v_lab_result_id,
'batch_code', p_batch_code,
'lab_verdict', p_lab_verdict,
'batch_new_status', CASE WHEN p_lab_verdict = 'PASS' THEN 'RELEASED' ELSE 'REJECTED' END
);
EXCEPTION
WHEN unique_violation THEN
RETURN jsonb_build_object('success', false,
'error', 'A lab result has already been entered for this batch. Check the Batches tab.');
END;
$$;
Execution path: The INSERT INTO lab_results fires the existing on_lab_result_insert trigger, which handles batch status change to RELEASED or REJECTED, order allocation (if PASS), proof portal update, and partner notification. This is the same path as automated lab processing. The manual form is just a different entry point into the same pipeline.
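For the Phase 1c functional test with mock data, a direct call against a seeded QA_HOLD batch might look like the following sketch; the batch code, certificate reference, storage path, and agent UUID are placeholders, not real values:

```sql
-- Illustrative smoke test: all literal values below are placeholders.
SELECT raw_ops.fn_manual_lab_result_entry_v1(
  'PR-2026-014',                 -- batch code (seeded QA_HOLD batch)
  'i2 Analytical',
  '2026-04-10',                  -- certificate date
  'CERT-8841',                   -- certificate reference
  'PASS',                        -- lab verdict as stated on the certificate
  'absent',                      -- Salmonella (25g)
  'absent',                      -- Listeria (25g)
  120,                           -- Enterobacteriaceae cfu/g
  'lab-certs/PR-2026-014.pdf',   -- Supabase Storage path
  '00000000-0000-0000-0000-000000000001'::uuid  -- agent_id
);
```

A second identical call should hit the unique index on batch_id and return the "already been entered" error from the exception handler, which is the cheapest way to verify the idempotency guard end to end.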
Part 5: Worker Controls¶
Purpose¶
When a background worker is stalled, the Ops Lead needs a "run it now" button, not just visibility that it's broken.
Available Controls¶
| Button Label | What It Does | When To Use |
|---|---|---|
| Run email sender now | Invokes process-outbox Edge Function | Emails stuck as pending, email worker cron stale |
| Run proof generator now | Invokes process-proof-jobs Edge Function | Proof jobs stuck as queued |
| Run webhook processor now | Invokes shopify-webhook reprocessor | Webhooks stuck as pending after retry |
| Run dunning processor now | Invokes process-dunning Edge Function | Dunning emails not sending on schedule |
| Run lifecycle processor now | Invokes send-lifecycle-events Edge Function | Lifecycle events not sending on schedule |
Implementation¶
Each button calls an Edge Function via the Ops Portal's authenticated Supabase client. The Edge Functions already exist; the portal just needs a way to invoke them manually.
CREATE OR REPLACE FUNCTION raw_ops.fn_trigger_worker_run_v1(
p_worker TEXT,
p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
DECLARE
v_allowed_workers TEXT[] := ARRAY[
'process-outbox', 'process-proof-jobs', 'process-dunning',
'shopify-webhook', 'send-lifecycle-events'
];
BEGIN
IF NOT (p_worker = ANY(v_allowed_workers)) THEN
RETURN jsonb_build_object('success', false,
'error', 'Worker not in allowed list: ' || p_worker);
END IF;
-- Use pg_net to invoke the Edge Function (same pattern as fn_invoke_monitor)
PERFORM raw_ops.fn_invoke_edge_function(p_worker, '{}'::jsonb);
INSERT INTO ops_events (entity_type, kind, message, meta)
VALUES ('WORKER', 'INFO', 'Worker manually triggered: ' || p_worker,
jsonb_build_object('agent_id', p_agent_id, 'worker', p_worker));
RETURN jsonb_build_object('success', true, 'worker', p_worker, 'invoked_at', now());
END;
$$;
Confirmation dialog: "This will run the [worker name] immediately. Use this when the background job appears stalled. Continue?"
Cooldown: Button disabled for 60 seconds after invocation to prevent double-firing.
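A quick way to exercise both branches of the allowlist check during testing (the agent UUID is a placeholder):

```sql
-- Allowed worker: fires the Edge Function via pg_net and logs to ops_events.
SELECT raw_ops.fn_trigger_worker_run_v1(
  'process-outbox',
  '00000000-0000-0000-0000-000000000001'::uuid  -- agent_id placeholder
);

-- A name outside the allowlist is refused before any pg_net call is made;
-- the function returns {"success": false, "error": "Worker not in allowed list: ..."}.
SELECT raw_ops.fn_trigger_worker_run_v1(
  'some-other-function',
  '00000000-0000-0000-0000-000000000001'::uuid
);
```

Note the refusal happens entirely inside Postgres, so a mistyped worker name costs nothing and leaves no ops_events entry.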
Part 6: Missing-Event Recovery¶
Problem¶
v1.0 handled failed rows. It did not handle events that never arrived.
6.1 Pull Missing Order from Shopify¶
When: A customer says their order went through but it's not in the system. Or the order_ingestion_slo monitor fires but webhook_inbox shows no failed webhooks (meaning the webhook never arrived).
Form: Single field: Shopify Order Number (e.g., #1042)
Implementation: Calls a new Edge Function that fetches the order from Shopify Admin API and replays it through the standard ingestion pipeline.
Portal → ops_replay_shopify_order RPC → pg_net → replay-shopify-order Edge Function
↓
Edge Function:
1. GET /admin/api/2024-01/orders/{order_id}.json (by order number)
2. Transform to webhook payload format
3. Call fn_process_shopify_order_v3() (same function used by live webhooks)
4. Return result
CREATE OR REPLACE FUNCTION raw_ops.fn_request_order_replay_v1(
p_order_number TEXT,
p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
BEGIN
-- Check if order already exists
IF EXISTS (SELECT 1 FROM orders WHERE order_number_label = p_order_number) THEN
RETURN jsonb_build_object('success', false,
'error', 'Order ' || p_order_number || ' already exists in the system.');
END IF;
-- Invoke the replay Edge Function via pg_net
PERFORM raw_ops.fn_invoke_edge_function('replay-shopify-order',
jsonb_build_object('order_number', p_order_number));
INSERT INTO ops_events (entity_type, kind, message, meta)
VALUES ('ORDER', 'INFO', 'Order replay requested: ' || p_order_number,
jsonb_build_object('agent_id', p_agent_id, 'order_number', p_order_number));
RETURN jsonb_build_object('success', true,
'message', 'Order replay requested. Check the order list in 30 seconds.');
END;
$$;
Note: The replay-shopify-order Edge Function is a new function that needs to be built. It is a thin wrapper: fetch from Shopify Admin API, transform, call the existing processing function.
Part 7: Compliance Recording¶
7.1-7.4: Unchanged from v1.0¶
Temperature Audit Form, Proof Portal Spot Check, HACCP Evidence Pack Logger, Lab Sample Submission Log. All unchanged except that each now uses agent_id from auth rather than free-text operator names.
7.5 Rejected Batch Disposition (New)¶
When: A batch fails lab testing (status = REJECTED). Someone must record what happened to the physical stock.
Form Fields:
| Field | Type | Required |
|---|---|---|
| Batch Code | Dropdown (REJECTED batches without disposition) | Yes |
| Disposition Action | Dropdown: Destroyed / Returned to Supplier / Held for Re-test | Yes |
| Quantity Disposed (kg) | Number | Yes |
| Disposal Method | Text (e.g., "Commercial waste contractor") | Yes if Destroyed |
| Witnessed By | Text | Yes if Destroyed |
| Date | Date | Yes |
| Notes | Text | No |
| Evidence (photos/docs) | File upload | No |
CREATE TABLE IF NOT EXISTS raw_ops.batch_dispositions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
batch_id UUID NOT NULL REFERENCES raw_ops.batches(id),
batch_code TEXT NOT NULL,
disposition_action TEXT NOT NULL
CHECK (disposition_action IN ('destroyed', 'returned_to_supplier', 'held_for_retest')),
quantity_kg NUMERIC NOT NULL,
disposal_method TEXT,
witnessed_by TEXT,
disposition_date DATE NOT NULL,
evidence_paths TEXT[],
notes TEXT,
recorded_by UUID NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_batch_disposition_one_per_batch
ON raw_ops.batch_dispositions(batch_id);
7.6 Compliance Insert Functions¶
All compliance forms use the same pattern: validate inputs, insert record, log to ops_events, return success.
-- Temperature Audit
CREATE OR REPLACE FUNCTION raw_ops.fn_record_temperature_audit_v1(
p_audit_date DATE, p_shipment_id UUID, p_logger_serial TEXT,
p_max_temp NUMERIC, p_min_temp NUMERIC, p_duration_above_minus5 INTEGER,
p_evidence_paths TEXT[], p_notes TEXT, p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql SECURITY DEFINER SET search_path = raw_ops, pg_temp AS $$
DECLARE v_pass BOOLEAN; v_id UUID;
BEGIN
v_pass := (p_max_temp <= -5.0) OR (p_max_temp <= 0.0 AND COALESCE(p_duration_above_minus5, 0) < 30);
INSERT INTO temperature_audit_records (audit_date, shipment_id, logger_serial,
max_temp_celsius, min_temp_celsius, duration_above_minus5_mins, pass,
evidence_paths, notes, recorded_by)
VALUES (p_audit_date, p_shipment_id, p_logger_serial, p_max_temp, p_min_temp,
p_duration_above_minus5, v_pass, p_evidence_paths, p_notes, p_agent_id::text)
RETURNING id INTO v_id;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('TEMP_AUDIT', v_id, 'INFO', 'Temperature audit recorded',
jsonb_build_object('agent_id', p_agent_id, 'pass', v_pass, 'shipment_id', p_shipment_id));
RETURN jsonb_build_object('success', true, 'id', v_id, 'pass', v_pass);
END; $$;
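The pass rule computed inside fn_record_temperature_audit_v1 (max temp at or below -5 °C, or at or below 0 °C with fewer than 30 minutes above -5 °C) can be exercised with a borderline case like the following; the shipment UUID, logger serial, and agent UUID are placeholders:

```sql
-- A brief warm excursion: max -3.2 °C with 12 minutes above -5 °C.
-- Under the rule above this still computes pass = true, because the
-- excursion stayed below 0 °C and lasted under 30 minutes.
SELECT raw_ops.fn_record_temperature_audit_v1(
  '2026-04-12',
  '00000000-0000-0000-0000-000000000002'::uuid,  -- shipment_id placeholder
  'TL-0421',                                      -- logger serial placeholder
  -3.2,                                           -- max temp (°C)
  -18.0,                                          -- min temp (°C)
  12,                                             -- minutes above -5 °C
  NULL,                                           -- evidence paths
  'Brief door-open spike during transfer',
  '00000000-0000-0000-0000-000000000001'::uuid    -- agent_id placeholder
);
```

The same call with 35 minutes above -5 °C should return pass = false, which is worth including in the Phase 2 mock-data test set.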
-- Proof Portal Spot Check
CREATE OR REPLACE FUNCTION raw_ops.fn_record_proof_spot_check_v1(
p_check_date DATE, p_batch_code TEXT, p_qr_ok BOOLEAN, p_page_ok BOOLEAN,
p_results_ok BOOLEAN, p_pdf_ok BOOLEAN, p_notes TEXT, p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql SECURITY DEFINER SET search_path = raw_ops, pg_temp AS $$
DECLARE v_id UUID;
BEGIN
INSERT INTO proof_portal_spot_checks (check_date, batch_code, qr_scan_ok,
page_loads_ok, results_display_ok, pdf_download_ok, notes, checked_by)
VALUES (p_check_date, p_batch_code, p_qr_ok, p_page_ok, p_results_ok, p_pdf_ok,
p_notes, p_agent_id::text)
RETURNING id INTO v_id;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('PROOF_SPOT_CHECK', v_id, 'INFO', 'Proof portal spot check recorded',
jsonb_build_object('agent_id', p_agent_id, 'batch_code', p_batch_code,
'all_pass', p_qr_ok AND p_page_ok AND p_results_ok AND p_pdf_ok));
RETURN jsonb_build_object('success', true, 'id', v_id);
END; $$;
-- HACCP Evidence Pack
CREATE OR REPLACE FUNCTION raw_ops.fn_record_evidence_pack_v1(
p_ship_date DATE, p_ship_day_type TEXT,
p_pick_list BOOLEAN, p_qc_record BOOLEAN, p_packout_sheet BOOLEAN,
p_photo_proofs BOOLEAN, p_logger_serials BOOLEAN, p_courier_manifest BOOLEAN,
p_file_paths TEXT[], p_notes TEXT, p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql SECURITY DEFINER SET search_path = raw_ops, pg_temp AS $$
DECLARE v_id UUID;
BEGIN
-- Normalize to lowercase to match CHECK constraint
INSERT INTO evidence_packs (ship_date, ship_day_type, pick_list_filed, qc_record_filed,
packout_sheet_filed, photo_proofs_filed, logger_serials_recorded, courier_manifest_filed,
file_paths, notes, recorded_by)
VALUES (p_ship_date, lower(trim(p_ship_day_type)), p_pick_list, p_qc_record, p_packout_sheet,
p_photo_proofs, p_logger_serials, p_courier_manifest, p_file_paths, p_notes, p_agent_id::text)
RETURNING id INTO v_id;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('EVIDENCE_PACK', v_id, 'INFO', 'Evidence pack recorded',
jsonb_build_object('agent_id', p_agent_id, 'ship_date', p_ship_date));
RETURN jsonb_build_object('success', true, 'id', v_id);
END; $$;
-- Lab Sample Submission
CREATE OR REPLACE FUNCTION raw_ops.fn_record_sample_submission_v1(
p_batch_code TEXT, p_sample_sent_date DATE, p_lab_name TEXT,
p_tracking_ref TEXT, p_trays_sampled INTEGER, p_notes TEXT, p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql SECURITY DEFINER SET search_path = raw_ops, pg_temp AS $$
DECLARE v_batch_id UUID; v_id UUID; v_batch_status TEXT;
BEGIN
SELECT id, status INTO v_batch_id, v_batch_status FROM batches WHERE batch_code = p_batch_code;
IF v_batch_id IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
END IF;
IF v_batch_status != 'QA_HOLD' THEN
RETURN jsonb_build_object('success', false,
'error', 'Batch is ' || v_batch_status || ', not in testing. Sample submission is only for batches awaiting lab results.');
END IF;
INSERT INTO lab_sample_submissions (batch_id, batch_code, sample_sent_date,
lab_name, tracking_reference, trays_sampled, notes, recorded_by)
VALUES (v_batch_id, p_batch_code, p_sample_sent_date, p_lab_name,
p_tracking_ref, p_trays_sampled, p_notes, p_agent_id::text)
RETURNING id INTO v_id;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('LAB_SAMPLE', v_id, 'INFO', 'Lab sample submission recorded',
jsonb_build_object('agent_id', p_agent_id, 'batch_code', p_batch_code));
RETURN jsonb_build_object('success', true, 'id', v_id);
END; $$;
-- Batch Disposition
CREATE OR REPLACE FUNCTION raw_ops.fn_record_batch_disposition_v1(
p_batch_code TEXT, p_disposition_action TEXT, p_quantity_kg NUMERIC,
p_disposal_method TEXT, p_witnessed_by TEXT, p_disposition_date DATE,
p_evidence_paths TEXT[], p_notes TEXT, p_agent_id UUID
)
RETURNS JSONB
LANGUAGE plpgsql SECURITY DEFINER SET search_path = raw_ops, pg_temp AS $$
DECLARE v_batch RECORD; v_id UUID;
BEGIN
SELECT * INTO v_batch FROM batches WHERE batch_code = p_batch_code;
IF v_batch IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
END IF;
IF v_batch.status != 'REJECTED' THEN
RETURN jsonb_build_object('success', false,
'error', 'Only rejected batches can have a disposition record');
END IF;
-- Enforce witness and disposal method for destroyed batches
IF p_disposition_action = 'destroyed' THEN
IF p_disposal_method IS NULL OR trim(p_disposal_method) = '' THEN
RETURN jsonb_build_object('success', false,
'error', 'Disposal method is required when batch is destroyed');
END IF;
IF p_witnessed_by IS NULL OR trim(p_witnessed_by) = '' THEN
RETURN jsonb_build_object('success', false,
'error', 'Witness name is required when batch is destroyed');
END IF;
END IF;
INSERT INTO batch_dispositions (batch_id, batch_code, disposition_action, quantity_kg,
disposal_method, witnessed_by, disposition_date, evidence_paths, notes, recorded_by)
VALUES (v_batch.id, p_batch_code, p_disposition_action, p_quantity_kg,
p_disposal_method, p_witnessed_by, p_disposition_date, p_evidence_paths, p_notes, p_agent_id)
RETURNING id INTO v_id;
INSERT INTO ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('BATCH_DISPOSITION', v_id, 'INFO', 'Batch disposition recorded',
jsonb_build_object('agent_id', p_agent_id, 'batch_code', p_batch_code,
'action', p_disposition_action, 'quantity_kg', p_quantity_kg));
RETURN jsonb_build_object('success', true, 'id', v_id);
END; $$;
Part 8: Canned Diagnostics¶
Unchanged from v1.0 structurally. 10 pre-built read-only views. All wrapped in fn_diagnostic_*_v1() functions returning JSONB arrays, rendered as sortable/filterable tables.
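The spec only fixes the shared shape (read-only, returns a JSONB array for table rendering), so as a sketch, one diagnostic function might look like the following; the function name, view body, and "stalled order" filter are illustrative, not part of the spec:

```sql
-- Hypothetical member of the fn_diagnostic_*_v1 family. Read-only,
-- returns a JSONB array ('[]' when empty) for the portal's table renderer.
CREATE OR REPLACE FUNCTION raw_ops.fn_diagnostic_stalled_orders_v1()
RETURNS JSONB
LANGUAGE sql
STABLE
SECURITY DEFINER
SET search_path = raw_ops, pg_temp
AS $$
  SELECT COALESCE(jsonb_agg(to_jsonb(d) ORDER BY d.created_at), '[]'::jsonb)
  FROM (
    SELECT order_number_label, status, created_at
    FROM orders
    WHERE status = 'pending'                       -- illustrative filter
      AND created_at < now() - interval '1 hour'
  ) d;
$$;
```

Returning '[]' rather than NULL for the empty case keeps the portal-side rendering branchless: every diagnostic is always an array.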
Part 9: UI Language Guidelines¶
Principle¶
The Ops Lead is a compliance-trained operator, not an engineer. Every label, message, and error should be written for someone who has never seen a database.
Rules¶
| Instead of | Write |
|---|---|
| UUID | (hide entirely, show order number or customer email) |
| dead-lettered | stuck (needs attention) |
| retry_count | attempts |
| webhook_topic | event type (with human label: "New Order", "Subscription Updated") |
| processing_status = 'failed' | failed |
| QA_HOLD | Testing in progress |
| RELEASED | Cleared for sale |
| REJECTED | Failed testing |
| DEPLETED | Out of stock |
| fn_retry_* | (never shown, behind button label) |
Error Messages to Operator¶
| System Error | Show to Operator |
|---|---|
| P0001: Batch not found | That batch code wasn't found. Check for typos. |
| unique_violation | This has already been done. Check the relevant tab. |
| P0002: Agent not found | Your account isn't set up yet. Ask the founder to add you. |
| Network error | Connection lost. Check your internet and try again. |
| 500 from Edge Function | Something went wrong on our end. Try once more. If it fails again, escalate to the founder. |
Expandable Technical Detail¶
Every exception item has a "Show details" toggle that reveals: IDs, raw error messages, timestamps, retry counts. This is for when the Ops Lead is on a call with the founder describing an issue. It is collapsed by default.
Part 10: Alert-to-Resolution Mapping¶
Complete mapping of every MON-01 alert to portal action.
| Monitor | Severity | Operator Action | Founder-Only? |
|---|---|---|---|
| order_ingestion_slo | warning/critical | Check Order Pipeline tile → Retry failed webhooks → Pull missing order | If persistent after retry |
| outbox_health | warning/critical | Check Email Delivery tile → Retry failed emails → Run email sender now | If worker repeatedly crashes |
| stock_levels | critical | Inventory Adjustment Form. Call co-packer (Jamie) for reorder | No |
| lab_sla | warning | Check Lab Processing tile → Lab Quarantine Viewer. Check Lab Sample Submission Log | No |
| working_capital | warning | View Batch Testing tile. Review with founder if structural | Yes (if systemic) |
| daily_snapshot | info | Review System Health Dashboard. No action unless red | No |
| allocation_health | warning | Check Inventory tile. Inventory Adjustment if needed | No |
| webhook_health | warning | Retry selected failed webhooks | No |
| webhook_daily_digest | info | Review only | No |
| monitoring_heartbeat | critical | Check Background Jobs. Escalate if heartbeat itself is down | Yes |
| lab_ingestion_health | warning/critical | Lab Quarantine Viewer | If storage_upload_failed |
| courier_watchdog_health | warning | Review in Support Workstation. Call DPD if systemic | No |
| 3pl_export_health | warning | Check Background Jobs. Manual export in Fulfillment tab | No |
| support_health | warning | Assign and resolve tickets in Support Workstation | No |
| dispatch_processing_health | warning | Check Background Jobs + Fulfillment tab | No |
| ad_spend_freshness | warning | Check Make.com dashboard (view-only). Informational | No |
| cost_tracking_health | warning | Check Make.com dashboard (view-only). Informational | No |
| proof_system_health | warning | Stuck Proof Jobs panel (3.8). Run proof generator now (Worker Controls) | If repeated failures |
| proof_portal_health | warning | Run Proof Portal Spot Check. Report to founder if page down | Yes (if site down) |
| lifecycle_events_health | warning | Check Background Jobs. Run lifecycle processor now (Worker Controls) | If repeatedly crashing |
| abandonment_health | warning | Check Background Jobs | No |
| health_data_completeness | info | Informational only | No |
| inventory_planning_health | warning | Review inventory diagnostics. Contact co-packer | No |
| referral_system_health | warning | Referral Fraud Review Panel | No |
| monitor_coverage | info | Informational. Escalate if new monitor needed | Yes |
| canary_system_health | critical | Escalate to founder immediately | Always |
| refund_health | warning | Dead-Lettered Refunds panel | No |
Part 11: Implementation Plan¶
Build Order¶
| Phase | What | Days | Gate |
|---|---|---|---|
| 0 | Auth model | 1 | Agent resolution function, RPC wrapper pattern, portal-side JWT extraction. Test with one sample action. ChatGPT review. |
| 1a | Dashboard data function + cron registry | 1 | All 13 tiles returning correct data. ChatGPT review of tile logic. |
| 1b | Dashboard UI | 1.5 | Tiles rendering, auto-refresh, click-to-expand working. |
| 1c | Lab Quarantine (most complex exception) | 2 | List, retry, dismiss, manual entry with idempotency guard. Full test with mock data. |
| 1d | Refund + Webhook + Outbox exception panels | 2 | Safe retry with caps, confirmation dialogs, execution path verified. |
| 1e | Remaining exceptions (inventory, referral, dunning) | 1.5 | Forms + actions working. |
| 1f | Worker Controls + Order Replay | 1.5 | Run-now buttons, replay Edge Function, cooldown. |
| 1g | Canned Diagnostics | 1 | 10 views rendering as sortable tables. |
| 2 | Compliance Recording Forms | 2 | All 5 forms (temp audit, spot check, evidence pack, sample log, batch disposition). |
| Total | | ~13.5 | |
Review Gates¶
Each phase ends with:
1. Claude Code implementation complete
2. Functional test with mock data
3. ChatGPT review of the implemented module (persona: Daniel)
4. Fix any gaps found
5. Move to next phase
Part 12: RPC Registry (Complete)¶
| Public RPC | Delegates To | Auth | Purpose |
|---|---|---|---|
| ops_get_system_health | fn_get_system_health_dashboard_v1 | Read-only | Dashboard data |
| ops_get_cron_health | fn_get_cron_health_summary_v1 | Read-only | Cron job status |
| ops_diagnostic_* (10 functions) | fn_diagnostic_*_v1 | Read-only | Canned diagnostics |
| ops_retry_lab_quarantine | fn_retry_lab_quarantine_v1 | Agent resolved | Retry quarantined lab email |
| ops_dismiss_lab_quarantine | fn_dismiss_lab_quarantine_v1 | Agent resolved | Dismiss non-lab email |
| ops_manual_lab_result | fn_manual_lab_result_entry_v1 | Agent resolved | Enter lab results manually |
| ops_retry_dead_lettered_refund | fn_retry_dead_lettered_refund_v1 | Agent resolved | Retry stuck refund |
| ops_resolve_refund_manually | fn_resolve_refund_manually_v1 | Agent resolved | Mark refund processed in Shopify |
| ops_retry_selected_webhooks | fn_retry_selected_webhooks_v1 | Agent resolved | Retry selected failed webhooks |
| ops_retry_failed_outbox | fn_retry_failed_outbox_v1 | Agent resolved | Retry failed emails |
| ops_mark_dunning_email_sent | fn_ops_mark_dunning_email_sent_v1 | Agent resolved | Mark dunning email sent |
| ops_retry_proof_job | fn_retry_stuck_proof_job_v1 | Agent resolved | Retry stuck proof job |
| ops_trigger_worker | fn_trigger_worker_run_v1 | Agent resolved | Run background worker now |
| ops_request_order_replay | fn_request_order_replay_v1 | Agent resolved | Pull missing order from Shopify |
| ops_execute_inventory_adjustment | execute_manual_inventory_adjustment | Agent resolved | Manual stock correction |
| ops_override_referral | manual_override_referral | Agent resolved | Approve/reject flagged referral |
| ops_record_temp_audit | fn_record_temperature_audit_v1 | Agent resolved | Temperature audit entry |
| ops_record_spot_check | fn_record_proof_spot_check_v1 | Agent resolved | Proof portal spot check |
| ops_record_evidence_pack | fn_record_evidence_pack_v1 | Agent resolved | HACCP evidence pack |
| ops_record_sample_submission | fn_record_sample_submission_v1 | Agent resolved | Lab sample log |
| ops_record_batch_disposition | fn_record_batch_disposition_v1 | Agent resolved | Rejected batch disposal |
Part 13: ChatGPT Review Prompt (Updated)¶
You are Daniel, joining Protocol Raw as Operations Lead at Month 7.
Background: Agent Onboarding Supervisor at RIA Money Transfers.
Strong process execution and compliance skills. Zero coding ability.
Never used a database, terminal, or IDE.
Your tools:
- Operations Portal (ops.protocolraw.co.uk)
- Slack (#ops-alerts, #ops-urgent, #daily-ops)
- Metabase dashboards (read-only)
- Customer.io dashboard (email campaign monitoring)
- Make.com dashboard (view-only)
- Google Drive (HACCP documentation)
- Phone/email (supplier, lab, 3PL coordination)
Physical operations responsibilities:
- 3PL/DPD coordination
- Lab sample logistics (composite samples to UKAS lab)
- HACCP documentation and FSA audit readiness
- Temperature logger audit programme (weekly, 10 shipments)
- Packaging material supply chain coordination
- Co-packer schedule alignment (with Jamie)
- Proof portal QC spot checks (monthly, 5 batches)
- Evidence pack filing per ship day
- Rejected batch disposal coordination and recording
This is v1.2, with targeted fixes after your review of v1.1 found:
- Auth transport was internally inconsistent (header vs argument)
- Server-side retry eligibility not enforced for lab quarantine
- Invalid webhook source fell through silently
- Cron health severity model did not match tile rule
- Batch disposition did not enforce witness/disposal for destroyed batches
- ship_day_type case mismatch between UI and schema
- Document drift (v1.1 labels remaining in a v1.2 document)
Review v1.2 and confirm whether v1.0 and v1.1 issues are resolved.
Then identify any remaining gaps:
1. Any scenario where you're stuck without calling the founder
(that isn't explicitly classified as founder-only)
2. Any action that assumes technical knowledge
3. Any alert without a portal resolution path
4. Any physical workflow without a recording form
5. Any information shown without agency to act on it
6. Any confusing terminology
7. Any compliance obligation without evidence capture
8. Any execution path that is still implicit or hopeful
9. Any auth bypass or governance hole
Be thorough. Be adversarial. Find every gap.
Version History¶
| Version | Date | Changes | Author |
|---|---|---|---|
| 1.0 | 2026-04-12 | Initial specification | Anton + Claude |
| 1.1 | 2026-04-12 | Complete rewrite: auth model, execution paths, lab entry safety, missing-event recovery, bulk retry safety, honest governing principle, batch disposition, UI language guidelines | Anton + Claude |
| 1.2 | 2026-04-12 | Targeted fixes from ChatGPT review: auth section reworded honestly, dashboard contract completed (support tile, SLO metric), lab idempotency scope fixed, PASS+pathogen hard-blocked, worker allowlist aligned, proof job panel restored, all compliance RPCs completed. Final cleanup: auth transport unified (RPC argument only), server-side retry eligibility enforced, invalid webhook source rejected, cron severity model aligned to tile rules (stale/overdue distinction), batch disposition witness/disposal enforced for destroyed, ship_day_type normalized, all v1.1 drift removed | Anton + Claude |
Document Owner: Protocol Raw Operations Technical Owner: Anton (Founder, Systems Architect) Next Review: 2026-07-12
End of SOP-OPS-01 v1.2