Skip to content

SOP-PROOF-00: Proof-of-Safety System v1.0

End-to-end proof generation, label production, and public verification for every batch

Document ID: SOP-PROOF-00-v1.0 Version: 1.0 Status: ✅ Production Ready Last Updated: 2026-03-20 Owner: Anton (Founder) Replaces: SOP-0R v4.2 Review Date: 2026-06-20


Key Changes from SOP-0R v4.2

  • Complete architecture rewrite to mandated Supabase-native pattern (pg_cron → pg_net → Edge Function → PostgreSQL function → ops-alerter)
  • "Background worker" eliminated — replaced with named execution chain through process-proof-jobs Edge Function and PostgreSQL state machine
  • Proof job state machine fully specifiedfn_create_batch_v1(), fn_claim_proof_jobs_v1(), fn_finalize_proof_job_v1(), fn_mark_proof_job_failed_v1()
  • Queue-claiming mechanics definedFOR UPDATE SKIP LOCKED with LIMIT 10 for concurrent-safe, bounded processing
  • Dead-letter queue — 5 attempts max, then dead_letter state with critical Slack alert
  • Retry policy for all external dependencies — QR generation (internal, no external API), Browserless (3 attempts, 15s timeout), Resend (3 attempts, 10s timeout)
  • QR generation internalisedqrcode Deno library replaces third-party api.qrserver.com dependency
  • Partial completion detection — monitoring catches batches where some steps succeeded but others failed
  • SOP-MON-01 integrationfn_check_proof_system_health_v2() monitor on 5-minute schedule
  • RLS policies specified in full — not just named, but defined with CREATE POLICY statements
  • Auth model documented for all operational Edge Functions
  • ops_events logging at every state transition
  • Honest operational claims — removed unsupported "zero manual intervention" assertions
  • Removed "Production Ready for 100,000+ Customers" pending real-world validation at scale

Purpose

This SOP defines the complete proof-of-safety system: from batch creation through QR code generation, pouch label production, co-packer delivery, and public proof page verification. It is the operational backbone of Protocol Raw's "Proof, Not Promises" differentiator.

Every batch gets a unique, publicly accessible proof page with independently verified lab results. Customers scan a QR code on their pouch and see the evidence. This system generates the QR codes, produces the labels, and delivers them to the co-packer automatically.

This SOP does not cover: - Lab result processing and batch release (see SOP-LAB-01) - Proof portal frontend design and copy (see SOP-PROOF-01) - Batch allocation and fulfilment (see SOP-INV-01, SOP-ORD-02)


Scope

Applies to: All batches created via the Operations Portal

This SOP owns: - fn_create_batch_v1() — because proof generation begins at batch INSERT. This function is called from the Ops Portal via the create-batch Edge Function. SOP-LAB-01 owns the Ops Portal UI, operational procedures for when to create batches, and all downstream lab processing - Proof job queue and state machine (queuing, claiming, step execution, retry, dead letter) - QR code generation for pouch labels - Pouch label PDF generation (Browserless) - Co-packer email delivery (Resend) - Batch asset URL updates (fn_set_batch_qr_assets_v1, fn_set_batch_label_asset_v1) - Monitoring of the proof generation pipeline

This SOP does NOT own: - When to create batches or Ops Portal batch creation UI (SOP-LAB-01) - Lab result processing, batch release/rejection logic (SOP-LAB-01) - Proof portal page design, copy, analytics, share functionality (SOP-PROOF-01) - Order allocation or fulfilment (SOP-LAB-01, SOP-ORD-02)

Users: - Operations team (batch creation via Ops Portal) - Automated systems (proof generation, label generation, co-packer email) - Monitoring and alerts (see SOP-MON-01) - Customers (proof verification via QR scan)

Related SOPs: - SOP-LAB-01: Batch Creation and Lab-to-Release (upstream — lab processing, batch release) - SOP-PROOF-01: Proof Portal System (downstream — frontend display, analytics, caching) - SOP-MON-01: Monitoring and Alerting Architecture (monitoring pattern) - SOP-INV-01: Inventory Control (allocation after release) - SOP-ORD-02: 3PL Order Export (fulfilment after allocation)


Architecture

Complete Execution Chain

Ops Portal UI → create-batch Edge Function (thin wrapper)
raw_ops.fn_create_batch_v1() (PostgreSQL — all batch creation logic)
raw_ops.batches INSERT fires trigger
raw_ops.fn_enqueue_proof_job() (trigger function — creates proof_jobs record)
pg_cron (every 30 seconds)
pg_net (async HTTP)
process-proof-jobs Edge Function (thin wrapper — external I/O only)
raw_ops.fn_claim_proof_jobs_v1() (claim queued jobs, FOR UPDATE SKIP LOCKED)
Edge Function executes external steps:
  1. Generate QR code (internal library, no external API)
  2. Upload QR to Supabase Storage
  3. Generate label PDF (Browserless API)
  4. Upload label PDF to Supabase Storage
  5. Send co-packer email (Resend API)
raw_ops.fn_finalize_proof_job_v1() (mark done, update batch record)
ops_events audit log

Failure Chain

External step fails → retry within Edge Function (per-dependency policy)
All retries exhausted for this invocation → Edge Function calls:
raw_ops.fn_mark_proof_job_failed_v1() (increment attempts, log error)
Next pg_cron cycle picks up job again (still in retryable state)
After 5 total attempts → fn_mark_proof_job_failed_v1() sets state = 'dead_letter'
ops-alerter → Slack #ops-urgent (critical severity)
Manual resolution via fn_resolve_proof_job_manually_v1()

Monitoring Architecture (SOP-MON-01 compliant)

pg_cron (every 5 minutes)
pg_net (async HTTP)
run-monitor Edge Function
raw_ops.fn_check_proof_system_health_v2()
monitoring_runs (execution log)
    ↓ (if threshold breached)
ops-alerter Edge Function
Slack (#ops-alerts or #ops-urgent)

Database Schema

All tables in raw_ops schema. RLS enabled on all tables.

Core Tables

Table Purpose RLS Policy
batches Master batch records Service role only + public read on RELEASED batches
lab_results Lab test results per batch Service role only
formulations Formulation version tracking Service role only
proof_jobs Proof generation queue and state machine Service role only
batch_status_history Audit log for all batch status changes Service role only
proof_page_events Proof portal analytics Anon INSERT + SELECT (public portal)

proof_jobs Table

CREATE TABLE raw_ops.proof_jobs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  batch_id UUID NOT NULL REFERENCES raw_ops.batches(id),
  job_key TEXT NOT NULL UNIQUE,
  state TEXT NOT NULL DEFAULT 'queued',
  attempts INTEGER NOT NULL DEFAULT 0,
  max_attempts INTEGER NOT NULL DEFAULT 5,
  last_error TEXT,
  error_category TEXT,
  processing_duration_ms INTEGER,
  -- Step completion tracking (partial completion detection)
  qr_generated BOOLEAN NOT NULL DEFAULT false,
  qr_uploaded BOOLEAN NOT NULL DEFAULT false,
  label_generated BOOLEAN NOT NULL DEFAULT false,
  label_uploaded BOOLEAN NOT NULL DEFAULT false,
  email_sent BOOLEAN NOT NULL DEFAULT false,
  -- Timestamps
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  claimed_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  CONSTRAINT valid_state CHECK (state IN ('queued', 'claimed', 'done', 'failed', 'dead_letter', 'resolved_manual')),
  CONSTRAINT valid_attempts CHECK (attempts >= 0 AND attempts <= max_attempts)
);

COMMENT ON TABLE raw_ops.proof_jobs IS 'Proof generation state machine. Each row represents one proof generation job for a batch. Processed by process-proof-jobs Edge Function via pg_cron.';

Processing Status Values

Status Meaning Transition
queued Awaiting processing claimed
claimed Picked up by process-proof-jobs done or failed
done All steps completed successfully Terminal
failed Processing failed, eligible for retry claimed (on next cycle) or dead_letter
dead_letter 5 attempts exhausted, manual intervention required resolved_manual or → queued (reset)
resolved_manual Manually resolved via controlled function Terminal

formulations Table

CREATE TABLE raw_ops.formulations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  version TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL,
  description TEXT,
  effective_from DATE NOT NULL,
  effective_until DATE,
  shopify_page_url TEXT NOT NULL,
  lab_certificate_pdf_url TEXT,
  lab_name TEXT,
  certificate_reference TEXT,
  analysis_date DATE,
  fediaf_verified BOOLEAN DEFAULT true,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE raw_ops.formulations IS 'Formulation versions with effective date ranges. Batches auto-link to the active formulation based on production date.';

Key Indexes

-- Proof job queue processing (critical for claim query performance)
CREATE INDEX idx_proof_jobs_claimable
  ON raw_ops.proof_jobs (created_at ASC)
  WHERE state IN ('queued', 'failed');

-- Proof job by batch (lookup during batch creation verification)
CREATE INDEX idx_proof_jobs_batch_id
  ON raw_ops.proof_jobs (batch_id);

-- Proof job dead letter monitoring
CREATE INDEX idx_proof_jobs_dead_letter
  ON raw_ops.proof_jobs (created_at DESC)
  WHERE state = 'dead_letter';

-- Batch code lookup (used during production and customer scans)
CREATE INDEX idx_batches_batch_code
  ON raw_ops.batches (batch_code);

-- Batch status filtering for public access and monitoring
CREATE INDEX idx_batches_status
  ON raw_ops.batches (status);

-- Lab result lookups by batch
CREATE INDEX idx_lab_results_batch_id
  ON raw_ops.lab_results (batch_id);

RLS Policies

-- All operational tables: RLS enabled, no policies = service role only, anon blocked
ALTER TABLE raw_ops.proof_jobs ENABLE ROW LEVEL SECURITY;
ALTER TABLE raw_ops.formulations ENABLE ROW LEVEL SECURITY;
ALTER TABLE raw_ops.batch_status_history ENABLE ROW LEVEL SECURITY;

-- Batches: public read access for RELEASED batches only (proof portal)
ALTER TABLE raw_ops.batches ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Allow public read on released batches"
  ON raw_ops.batches
  FOR SELECT
  TO anon
  USING (status = 'RELEASED');

-- Proof page events: public insert + select (analytics from proof portal)
ALTER TABLE raw_ops.proof_page_events ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Allow anon insert proof events"
  ON raw_ops.proof_page_events
  FOR INSERT
  TO anon
  WITH CHECK (true);

CREATE POLICY "Allow anon read proof events"
  ON raw_ops.proof_page_events
  FOR SELECT
  TO anon
  USING (true);

RLS posture: With RLS enabled and no policies, anon key queries return zero rows. Service role bypasses RLS entirely. This is the correct posture for backend-only operational tables. The two exceptions are batches (public read for RELEASED only) and proof_page_events (public insert/read for portal analytics).


QR Code System

There are TWO types of QR codes in the Protocol Raw system with different purposes.

QR Type Location Links To Generated
Proof Verification QR Product pouch labels proof.protocolraw.co.uk/batch/{public_batch_id} Per-batch, during proof job
Transition Guide QR In-box insert card portal.protocolraw.co.uk?view=transition Static, bulk printed

Proof Verification QR (On Pouches)

Batch-specific QR codes generated automatically during proof job processing.

Content: https://proof.protocolraw.co.uk/batch/{public_batch_id} Storage: proof-assets/qr-codes/{batch_code}.png Size on label: 25mm x 25mm (300 DPI) Generation: Internal library (qrcode Deno module) inside process-proof-jobs Edge Function. No external API dependency.

Transition Guide QR (On Insert Card)

Universal QR code, same for all boxes.

Content: https://portal.protocolraw.co.uk?view=transition Purpose: Onboarding. Personalised 10-day transition feeding plan. Generation: Static. Not batch-specific. Same QR on all cards.

Design Principle

The insert card educates and onboards. The pouch QR verifies. These are separate concerns with separate touchpoints.


Physical Insert Cards

Insert cards are universal static cards, bulk printed, with identical content for every box. The proof_insert_pdf_url column is deprecated.

Card Purpose

  • Onboarding new customers with transition guidance
  • Explaining the proof verification system
  • Driving portal engagement for personalised feeding plans

Card Design: Front Side (Transition)

Headline: "Your First 10 Days"

Content: - Visual transition structure (Day 1-2: 25%, Day 3-4: 50%, etc.) - QR code linking to portal.protocolraw.co.uk?view=transition - CTA: "Scan for your personalised feeding plan" - Safe handling reminder

Card Design: Back Side (Proof Education)

Headline: "Verified Safe, Batch by Batch"

Content: - Explains that every pouch has a QR code linking to lab results - Points customer to scan pouch QR (no separate proof QR on card) - CTA: "Scan any pouch to verify" - Brief explanation of independent lab testing

Card Specifications

Attribute Specification
Size TBD by designer (A7 or business card format recommended)
QR Codes ONE QR only (front side) linking to portal transition guide
Batch Code NONE. Card is universal (same card for all boxes)
Print Run Bulk print. Order in batches of 1,000+ (NOT generated per-batch)
Delivery Stocked at packing location, one card per box
Brand Guidelines Visual Identity Guide v2.4 (Forest Green, Cream, Espresso)

PostgreSQL Functions

fn_create_batch_v1()

Purpose: All batch creation business logic. Called by create-batch Edge Function.

Responsibilities: - Validate inputs (production_date, kg_produced) - Generate batch code (PR-YYMMDD-NNN format, next sequential number for date) - Look up active formulation for production date - Look up COGS per kg from config - Calculate expiry date (production + 12 months) - INSERT batch record with status QA_HOLD - Public batch ID generated by existing trigger (trigger_set_public_batch_id) - Generate proof URL - Log to ops_events - Return batch record as JSONB

CREATE OR REPLACE FUNCTION raw_ops.fn_create_batch_v1(
  p_production_date DATE,
  p_kg_produced NUMERIC
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_batch_code TEXT;
  v_seq INTEGER;
  v_formulation_id UUID;
  v_product_id UUID;
  v_cogs_per_kg NUMERIC;
  v_batch_id UUID;
  v_public_batch_id TEXT;
  v_proof_url TEXT;
  v_expiry_date DATE;
BEGIN
  -- Validate inputs
  IF p_production_date IS NULL THEN
    RAISE EXCEPTION 'production_date is required';
  END IF;
  IF p_kg_produced IS NULL OR p_kg_produced <= 0 THEN
    RAISE EXCEPTION 'kg_produced must be positive';
  END IF;

  -- Generate sequential batch code for date
  SELECT COALESCE(MAX(
    CAST(SUBSTRING(batch_code FROM '\d{3}$') AS INTEGER)
  ), 0) + 1
  INTO v_seq
  FROM raw_ops.batches
  WHERE batch_code LIKE 'PR-' || TO_CHAR(p_production_date, 'YYMMDD') || '-%';

  v_batch_code := 'PR-' || TO_CHAR(p_production_date, 'YYMMDD') || '-' || LPAD(v_seq::TEXT, 3, '0');

  -- Look up active formulation
  SELECT id INTO v_formulation_id
  FROM raw_ops.formulations
  WHERE effective_from <= p_production_date
    AND (effective_until IS NULL OR effective_until >= p_production_date)
  ORDER BY effective_from DESC
  LIMIT 1;

  IF v_formulation_id IS NULL THEN
    RAISE EXCEPTION 'No active formulation found for date %', p_production_date;
  END IF;

  -- Look up product (single SKU)
  SELECT id INTO v_product_id
  FROM raw_ops.products
  LIMIT 1;

  -- Look up COGS per kg from config
  SELECT value::NUMERIC INTO v_cogs_per_kg
  FROM raw_ops.config
  WHERE key = 'default_cogs_per_kg';

  -- Calculate expiry
  v_expiry_date := p_production_date + INTERVAL '12 months';

  -- Insert batch
  INSERT INTO raw_ops.batches (
    batch_code, produced_at, expiry_date, kg_produced,
    product_id, formulation_id, status, cogs_per_kg_gbp
  ) VALUES (
    v_batch_code, p_production_date, v_expiry_date, p_kg_produced,
    v_product_id, v_formulation_id, 'QA_HOLD', v_cogs_per_kg
  )
  RETURNING id, public_batch_id INTO v_batch_id, v_public_batch_id;

  -- Set proof URL (public_batch_id set by trigger_set_public_batch_id)
  v_proof_url := 'https://proof.protocolraw.co.uk/batch/' || v_public_batch_id;

  UPDATE raw_ops.batches
  SET proof_url = v_proof_url
  WHERE id = v_batch_id;

  -- Audit log
  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('batch', v_batch_id, 'INFO', 'Batch created via Ops Portal',
    jsonb_build_object(
      'batch_code', v_batch_code,
      'public_batch_id', v_public_batch_id,
      'kg_produced', p_kg_produced,
      'production_date', p_production_date,
      'formulation_id', v_formulation_id
    )
  );

  RETURN jsonb_build_object(
    'success', true,
    'batch_id', v_batch_id,
    'batch_code', v_batch_code,
    'public_batch_id', v_public_batch_id,
    'proof_url', v_proof_url,
    'expiry_date', v_expiry_date
  );

EXCEPTION WHEN OTHERS THEN
  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('batch', NULL, 'ERROR', 'Batch creation failed: ' || SQLERRM,
    jsonb_build_object('production_date', p_production_date, 'kg_produced', p_kg_produced)
  );
  RETURN jsonb_build_object('success', false, 'error', SQLERRM);
END;
$$;

fn_enqueue_proof_job()

Purpose: Trigger function. Creates a proof job record when a batch is inserted.

CREATE OR REPLACE FUNCTION raw_ops.fn_enqueue_proof_job()
RETURNS TRIGGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
  INSERT INTO raw_ops.proof_jobs (batch_id, job_key, state)
  VALUES (NEW.id, 'proof:batch:' || NEW.id::TEXT, 'queued');

  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('proof_job', NEW.id, 'INFO', 'Proof job enqueued',
    jsonb_build_object('batch_code', NEW.batch_code, 'batch_id', NEW.id)
  );

  RETURN NEW;
END;
$$;

CREATE TRIGGER trigger_enqueue_proof_job
  AFTER INSERT ON raw_ops.batches
  FOR EACH ROW
  EXECUTE FUNCTION raw_ops.fn_enqueue_proof_job();

fn_claim_proof_jobs_v1()

Purpose: Atomically claim up to N queued or failed proof jobs for processing. Returns the exact rows claimed, deterministically, via UPDATE ... RETURNING. Uses FOR UPDATE SKIP LOCKED for concurrent safety. No secondary query needed.

CREATE OR REPLACE FUNCTION raw_ops.fn_claim_proof_jobs_v1(
  p_limit INTEGER DEFAULT 10
)
RETURNS SETOF raw_ops.proof_jobs
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
  RETURN QUERY
  WITH claimable AS (
    SELECT id
    FROM raw_ops.proof_jobs
    WHERE state IN ('queued', 'failed')
      AND attempts < max_attempts
    ORDER BY
      CASE WHEN state = 'queued' THEN 0 ELSE 1 END,  -- queued first, retries second
      created_at ASC
    LIMIT p_limit
    FOR UPDATE SKIP LOCKED
  )
  UPDATE raw_ops.proof_jobs pj
  SET
    state = 'claimed',
    claimed_at = now(),
    updated_at = now()
  FROM claimable c
  WHERE pj.id = c.id
  RETURNING pj.*;
END;
$$;

Concurrency guarantee: If two workers call this function simultaneously, FOR UPDATE SKIP LOCKED ensures each worker claims a disjoint set of jobs. No duplicate processing is possible. The RETURNING pj.* clause returns exactly the rows that were updated in this transaction, not an approximation.

fn_set_batch_qr_assets_v1()

Purpose: Update a batch record with QR image URL and proof URL after QR generation. Called by process-proof-jobs Edge Function after successful QR upload. Ensures batch asset mutations happen in PostgreSQL, not in Edge Function code.

CREATE OR REPLACE FUNCTION raw_ops.fn_set_batch_qr_assets_v1(
  p_batch_id UUID,
  p_qr_image_url TEXT,
  p_proof_url TEXT
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_batch_code TEXT;
BEGIN
  UPDATE raw_ops.batches
  SET
    qr_image_url = p_qr_image_url,
    proof_url = p_proof_url,
    updated_at = now()
  WHERE id = p_batch_id
  RETURNING batch_code INTO v_batch_code;

  IF v_batch_code IS NULL THEN
    RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
  END IF;

  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('batch', p_batch_id, 'INFO', 'QR assets set for batch ' || v_batch_code,
    jsonb_build_object(
      'batch_code', v_batch_code,
      'qr_image_url', p_qr_image_url,
      'proof_url', p_proof_url
    )
  );

  RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;

fn_set_batch_label_asset_v1()

Purpose: Update a batch record with label PDF URL after label generation. Called by process-proof-jobs Edge Function after successful label upload.

CREATE OR REPLACE FUNCTION raw_ops.fn_set_batch_label_asset_v1(
  p_batch_id UUID,
  p_label_pdf_url TEXT
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_batch_code TEXT;
BEGIN
  UPDATE raw_ops.batches
  SET
    label_pdf_url = p_label_pdf_url,
    updated_at = now()
  WHERE id = p_batch_id
  RETURNING batch_code INTO v_batch_code;

  IF v_batch_code IS NULL THEN
    RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
  END IF;

  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('batch', p_batch_id, 'INFO', 'Label PDF set for batch ' || v_batch_code,
    jsonb_build_object('batch_code', v_batch_code, 'label_pdf_url', p_label_pdf_url)
  );

  RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;

fn_finalize_proof_job_v1()

Purpose: Mark a proof job as successfully completed. Update the batch record with generated asset URLs.

CREATE OR REPLACE FUNCTION raw_ops.fn_finalize_proof_job_v1(
  p_job_id UUID,
  p_qr_image_url TEXT,
  p_label_pdf_url TEXT,
  p_processing_duration_ms INTEGER
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_batch_id UUID;
  v_batch_code TEXT;
BEGIN
  -- Update proof job
  UPDATE raw_ops.proof_jobs
  SET
    state = 'done',
    completed_at = now(),
    updated_at = now(),
    processing_duration_ms = p_processing_duration_ms,
    qr_generated = true,
    qr_uploaded = true,
    label_generated = true,
    label_uploaded = true,
    email_sent = true
  WHERE id = p_job_id AND state = 'claimed'
  RETURNING batch_id INTO v_batch_id;

  IF v_batch_id IS NULL THEN
    RETURN jsonb_build_object('success', false, 'error', 'Job not found or not in claimed state');
  END IF;

  -- Update batch with asset URLs
  UPDATE raw_ops.batches
  SET
    qr_image_url = p_qr_image_url,
    label_pdf_url = p_label_pdf_url,
    updated_at = now()
  WHERE id = v_batch_id
  RETURNING batch_code INTO v_batch_code;

  -- Audit log
  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('proof_job', v_batch_id, 'INFO', 'Proof job completed successfully',
    jsonb_build_object(
      'batch_code', v_batch_code,
      'job_id', p_job_id,
      'duration_ms', p_processing_duration_ms,
      'qr_image_url', p_qr_image_url,
      'label_pdf_url', p_label_pdf_url
    )
  );

  RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;

fn_mark_proof_job_failed_v1()

Purpose: Record a proof job failure. If max attempts reached, move to dead letter queue and trigger critical alert.

CREATE OR REPLACE FUNCTION raw_ops.fn_mark_proof_job_failed_v1(
  p_job_id UUID,
  p_error TEXT,
  p_error_category TEXT DEFAULT 'unknown',
  p_step_progress JSONB DEFAULT '{}'::jsonb
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_new_state TEXT;
  v_attempts INTEGER;
  v_max_attempts INTEGER;
  v_batch_id UUID;
  v_batch_code TEXT;
  v_alert_needed BOOLEAN := false;
BEGIN
  -- Increment attempts and determine new state
  UPDATE raw_ops.proof_jobs
  SET
    attempts = attempts + 1,
    last_error = p_error,
    error_category = p_error_category,
    updated_at = now(),
    -- Preserve step progress so we know what completed before failure
    qr_generated = COALESCE((p_step_progress->>'qr_generated')::boolean, qr_generated),
    qr_uploaded = COALESCE((p_step_progress->>'qr_uploaded')::boolean, qr_uploaded),
    label_generated = COALESCE((p_step_progress->>'label_generated')::boolean, label_generated),
    label_uploaded = COALESCE((p_step_progress->>'label_uploaded')::boolean, label_uploaded),
    email_sent = COALESCE((p_step_progress->>'email_sent')::boolean, email_sent)
  WHERE id = p_job_id
  RETURNING attempts, max_attempts, batch_id INTO v_attempts, v_max_attempts, v_batch_id;

  IF v_batch_id IS NULL THEN
    RETURN jsonb_build_object('success', false, 'error', 'Job not found');
  END IF;

  -- Determine state based on attempt count
  IF v_attempts >= v_max_attempts THEN
    v_new_state := 'dead_letter';
    v_alert_needed := true;
  ELSE
    v_new_state := 'failed';
  END IF;

  UPDATE raw_ops.proof_jobs
  SET state = v_new_state
  WHERE id = p_job_id;

  -- Get batch code for logging
  SELECT batch_code INTO v_batch_code
  FROM raw_ops.batches WHERE id = v_batch_id;

  -- Audit log
  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('proof_job', v_batch_id,
    CASE WHEN v_new_state = 'dead_letter' THEN 'ERROR' ELSE 'EXCEPTION' END,
    CASE WHEN v_new_state = 'dead_letter'
      THEN 'Proof job moved to dead letter queue after ' || v_attempts || ' attempts'
      ELSE 'Proof job failed (attempt ' || v_attempts || '/' || v_max_attempts || ')'
    END,
    jsonb_build_object(
      'batch_code', v_batch_code,
      'job_id', p_job_id,
      'error', p_error,
      'error_category', p_error_category,
      'attempts', v_attempts,
      'max_attempts', v_max_attempts,
      'step_progress', p_step_progress
    )
  );

  RETURN jsonb_build_object(
    'success', true,
    'new_state', v_new_state,
    'attempts', v_attempts,
    'max_attempts', v_max_attempts,
    'alert_needed', v_alert_needed,
    'alert_config', CASE
      WHEN NOT v_alert_needed THEN NULL
      ELSE jsonb_build_object(
        'channel', 'ops-urgent',
        'severity', 'critical',
        'title', 'Proof Job Dead Letter: ' || v_batch_code,
        'message', 'Proof generation failed after ' || v_attempts || ' attempts. Manual intervention required.',
        'fields', jsonb_build_array(
          jsonb_build_object('label', 'Batch', 'value', v_batch_code, 'inline', true),
          jsonb_build_object('label', 'Attempts', 'value', v_attempts::TEXT, 'inline', true),
          jsonb_build_object('label', 'Error', 'value', LEFT(p_error, 200), 'inline', false),
          jsonb_build_object('label', 'Category', 'value', p_error_category, 'inline', true)
        )
      )
    END
  );
END;
$$;

fn_resolve_proof_job_manually_v1()

Purpose: Controlled resolution of dead-letter proof jobs with audit trail. Verifies that all expected batch assets exist before allowing resolution. Direct UPDATE statements are prohibited.

CREATE OR REPLACE FUNCTION raw_ops.fn_resolve_proof_job_manually_v1(
  p_job_id UUID,
  p_resolution_note TEXT,
  p_skip_asset_check BOOLEAN DEFAULT false
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_old_state TEXT;
  v_batch_id UUID;
  v_batch_code TEXT;
  v_qr_image_url TEXT;
  v_label_pdf_url TEXT;
  v_proof_url TEXT;
  v_missing_assets TEXT[];
BEGIN
  -- Only accept dead_letter jobs
  SELECT state, batch_id INTO v_old_state, v_batch_id
  FROM raw_ops.proof_jobs WHERE id = p_job_id;

  IF v_old_state IS NULL THEN
    RETURN jsonb_build_object('success', false, 'error', 'Job not found');
  END IF;

  IF v_old_state != 'dead_letter' THEN
    RETURN jsonb_build_object('success', false, 'error',
      'Only dead_letter jobs can be manually resolved. Current state: ' || v_old_state);
  END IF;

  -- Verify batch assets exist (unless explicitly skipped)
  SELECT batch_code, qr_image_url, label_pdf_url, proof_url
  INTO v_batch_code, v_qr_image_url, v_label_pdf_url, v_proof_url
  FROM raw_ops.batches WHERE id = v_batch_id;

  IF NOT p_skip_asset_check THEN
    v_missing_assets := ARRAY[]::TEXT[];
    IF v_qr_image_url IS NULL THEN v_missing_assets := v_missing_assets || 'qr_image_url'; END IF;
    IF v_label_pdf_url IS NULL THEN v_missing_assets := v_missing_assets || 'label_pdf_url'; END IF;
    IF v_proof_url IS NULL THEN v_missing_assets := v_missing_assets || 'proof_url'; END IF;

    IF array_length(v_missing_assets, 1) > 0 THEN
      RETURN jsonb_build_object(
        'success', false,
        'error', 'Cannot resolve: batch is missing assets. Use p_skip_asset_check := true to override.',
        'missing_assets', to_jsonb(v_missing_assets),
        'batch_code', v_batch_code
      );
    END IF;
  END IF;

  -- Resolve
  UPDATE raw_ops.proof_jobs
  SET state = 'resolved_manual', updated_at = now()
  WHERE id = p_job_id;

  -- Audit trail
  INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
  VALUES ('proof_job', v_batch_id, 'INFO', 'Proof job manually resolved',
    jsonb_build_object(
      'batch_code', v_batch_code,
      'job_id', p_job_id,
      'previous_state', v_old_state,
      'resolution_note', p_resolution_note,
      'asset_check_skipped', p_skip_asset_check
    )
  );

  RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code, 'resolved', true);
END;
$$;

Asset check behaviour: By default, the function refuses to resolve a job if the batch is missing qr_image_url, label_pdf_url, or proof_url. This prevents false completions where the operator marks a job done but the batch still lacks proof assets. If the operator has genuinely completed the steps outside the system (e.g., manually uploaded files), they can pass p_skip_asset_check := true with an explanation in p_resolution_note.


Edge Functions

create-batch

Purpose: Thin wrapper. Receives Ops Portal request, calls fn_create_batch_v1(), returns result.

Endpoint: POST /functions/v1/create-batch Auth: Cloudflare Access is the authentication boundary. All requests to ops.protocolraw.co.uk must pass Google OAuth via Cloudflare Access before reaching any backend. The Edge Function itself does not perform caller identity verification. It trusts that any request reaching it has been authenticated by Cloudflare Access. The function uses the service_role key to call PostgreSQL, which is standard for backend functions. JWT Verification: Enabled (Supabase JWT from Ops Portal session)

Flow: 1. Parse request body (production_date, kg_produced) 2. Call raw_ops.fn_create_batch_v1() via service_role client 3. Return JSONB result to caller

This function does NOT orchestrate proof generation, label generation, or email sending. Those are handled asynchronously by the proof job queue.

process-proof-jobs

Purpose: Thin wrapper for external I/O. Claims queued proof jobs, executes external steps (QR, PDF, email), finalises or fails each job.

Endpoint: POST /functions/v1/process-proof-jobs Auth: Internal only. Validates X-Worker-Secret header against PROOF_WORKER_SECRET env var. Rejects requests without a valid secret. JWT Verification: Disabled (internal calls only, compensated by shared secret)

Threat model: This function is reachable at a public URL with JWT disabled. Without compensating controls, any caller who discovers the project URL and function name could invoke it. The shared secret header (X-Worker-Secret) prevents unauthorised invocation. The function is also idempotent and non-destructive: it claims already-queued jobs and generates assets for batches that already exist. Worst-case unauthorised invocation would generate proof assets that would be generated anyway on the next legitimate cycle.

Validation logic (first lines of Edge Function):

const workerSecret = Deno.env.get("PROOF_WORKER_SECRET");
const headerSecret = req.headers.get("X-Worker-Secret");
if (!workerSecret || headerSecret !== workerSecret) {
  return new Response(JSON.stringify({ error: "Unauthorized" }), { status: 401 });
}

Flow per invocation:

1. Validate X-Worker-Secret header
2. Call fn_claim_proof_jobs_v1(10) → get up to 10 claimed job rows
3. For each claimed job:
   a. Generate QR code (internal qrcode library)
      - Content: proof URL from batch record
      - Output: PNG buffer, 300x300px
      - Timeout: 5 seconds
      - On failure: record step_progress, jump to fail handler
   b. Upload QR to Supabase Storage
      - Path: proof-assets/qr-codes/{batch_code}.png
      - Timeout: 10 seconds
      - Retries: 2 with backoff (1s, 3s)
   c. Call fn_set_batch_qr_assets_v1(batch_id, qr_url, proof_url)
      - Updates batch record via PostgreSQL (not direct table write)
   d. Generate label PDF (Browserless API)
      - Render HTML template with batch data + QR image
      - Size: 100mm x 150mm, 300 DPI
      - Timeout: 15 seconds
      - Retries: 3 with exponential backoff (1s, 3s, 9s)
   e. Upload label PDF to Supabase Storage
      - Path: proof-assets/batch-labels/{batch_code}.pdf
      - Timeout: 10 seconds
      - Retries: 2 with backoff (1s, 3s)
   f. Call fn_set_batch_label_asset_v1(batch_id, label_url)
      - Updates batch record via PostgreSQL (not direct table write)
   g. Send co-packer email (Resend API)
      - Timeout: 10 seconds
      - Retries: 3 with exponential backoff (1s, 3s, 9s)
   h. Call fn_finalize_proof_job_v1() with asset URLs and duration
4. On any step failure after retries exhausted:
   - Call fn_mark_proof_job_failed_v1() with error, category, and step_progress
   - If alert_config returned (dead letter), call ops-alerter
   - Continue to next job (do not abort batch)

DB/Edge Function boundary: The Edge Function performs external I/O only (QR library, Storage upload, Browserless HTTP, Resend HTTP). All database mutations go through named PostgreSQL functions. The Edge Function never directly UPDATEs raw_ops.batches or raw_ops.proof_jobs.

Label Specifications

Generated by process-proof-jobs using Browserless API for HTML-to-PDF rendering.

Label size: 100mm x 150mm (standard for 500g pouches) Format: PDF, 300 DPI, print-ready Template: HTML/CSS rendered by Browserless

Label content (top to bottom): 1. Brand header: "PROTOCOL RAW" (Montserrat Bold, 14pt), tagline, proof QR (25mm x 25mm, top right) 2. Product information: "PROTOCOL RAW COMPLETE", description, net weight 3. Batch traceability: Batch code (prominent), production date, best before date 4. Ingredients list: Full list in descending order by weight 5. Footer: "Made in the UK", company details, "Verified safe, batch by batch"

Co-packer Email

Sent by process-proof-jobs via Resend API.

From: production@protocolraw.co.uk To: Co-packer email (from COPACKER_EMAIL secret) Subject: Protocol Raw Batch {batch_code} - Label Artwork Body: Batch details (code, date, kg, product, formulation) + download link for label PDF + printing instructions Delivery: Resend API with retry logic (see Retry Policy)

proof-page

Purpose: Serves public proof pages. Customer-facing.

Endpoint: GET /functions/v1/proof-page/* Auth: Public (unauthenticated). No login required to view proof pages. JWT Verification: Disabled (public endpoint) Domain: proof.protocolraw.co.uk (Cloudflare DNS → Supabase Edge Function)

Data access model: The proof-page Edge Function uses the service_role key to query raw_ops.batches (bypassing RLS). This is necessary because the function renders different HTML depending on batch status:

Batch Status Page Rendered
RELEASED Full results: lab data, share buttons, formulation info
QA_HOLD Status message: "This batch is undergoing safety testing. Results will be published once verified."
REJECTED Status message: "This batch did not pass our safety requirements."
Not found 404: "Batch not found."

The anon RLS policy (Allow public read on released batches) applies to the separate search API endpoint (/api/search), which uses the anon key and correctly returns only RELEASED batch metadata. The server-rendered proof pages use service_role because they need to distinguish between QA_HOLD, REJECTED, and non-existent batches to render appropriate status pages.

Routes:

Route Method Purpose
/ GET Portal homepage with search box and latest released batch card
/batch/{public_batch_id} GET Real batch proof page (RELEASED / QA_HOLD / REJECTED / 404)
/batch/sample GET Sample batch page with hardcoded data; canonical pre-purchase URL
/sample GET Legacy alias for /batch/sample
/batch/PR-00LATEST GET 302 redirect to most recent released batch via fn_get_latest_released_batch_v1(); falls back to /batch/sample. Reached via Cloudflare Worker proof-latest-redirect (Supabase custom-domain gateway blocks /batch/latest)
/api/search GET Batch lookup by code; returns {found, status, public_batch_id} for RELEASED only
/api/event POST Client-side share/click event ingestion

See SOP-PROOF-01 for complete route documentation, page design, and caching strategy.


Retry Policy

Retries operate on two tiers:

Tier 1 — Within a single Edge Function invocation. The process-proof-jobs worker retries transient HTTP failures (Storage uploads, Browserless, Resend) using a simple retry loop with exponential backoff before giving up on the current invocation. This handles brief network glitches without consuming a job-level attempt.

Tier 2 — Across pg_cron cycles. If all Tier 1 retries for a step are exhausted, fn_mark_proof_job_failed_v1() increments attempts and sets state = 'failed'. The next pg_cron cycle (30 seconds later) reclaims the job and tries again. After max_attempts (5) Tier 2 failures, the job moves to dead_letter.

Retry helper (used by all HTTP steps in Edge Function):

async function withRetry(
  fn: () => Promise<Response>,
  maxRetries: number,
  timeoutMs: number,
  backoffMs: number[] // e.g. [1000, 3000, 9000]
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        const res = await fn();
        if (res.ok) return res;
        if (attempt < maxRetries) {
          await new Promise(r => setTimeout(r, backoffMs[attempt] ?? 1000));
          continue;
        }
        throw new Error(`HTTP ${res.status}: ${await res.text()}`);
      } finally {
        clearTimeout(timer);
      }
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      await new Promise(r => setTimeout(r, backoffMs[attempt] ?? 1000));
    }
  }
  throw new Error("Retries exhausted");
}

Per-dependency configuration:

Dependency Timeout Tier 1 Retries Backoff Tier 2 Max Attempts
QR generation (internal) 5s 0 (library call, no HTTP) N/A 5
Supabase Storage upload 10s 2 1s, 3s 5
Browserless PDF 15s 3 1s, 3s, 9s 5
Resend email 10s 3 1s, 3s, 9s 5

Error categories:

Category Meaning
qr_generation QR code creation failed (extremely rare)
storage_upload Supabase Storage write failed
pdf_generation Browserless rendering failed
email_delivery Resend API failed
db_error Database operation failed
unknown Unclassified error

Total time to dead letter: 5 Tier 2 attempts x 30-second pg_cron cycles = ~2.5 minutes minimum. With Tier 1 backoff within each cycle, typically 5-10 minutes for persistent failures.


pg_cron Jobs

process-proof-jobs

SELECT cron.schedule(
  'process-proof-jobs',
  '30 seconds',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/process-proof-jobs',
    headers := jsonb_build_object(
      'Content-Type', 'application/json',
      'apikey', current_setting('app.settings.anon_key'),
      'X-Worker-Secret', current_setting('app.settings.proof_worker_secret')
    ),
    body := '{}'::jsonb
  );
  $$
);

Note: PROOF_WORKER_SECRET must be set both as a Supabase Edge Function secret and as a Supabase database setting (app.settings.proof_worker_secret) so that pg_cron can pass it in the header. Generate a random 64-character hex string: openssl rand -hex 32.

monitor-proof-system-health

SELECT cron.schedule(
  'monitor-proof-system-health',
  '*/5 * * * *',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/run-monitor',
    headers := '{"Content-Type": "application/json", "apikey": "ANON_KEY"}'::jsonb,
    body := '{"check": "proof_system_health"}'::jsonb
  );
  $$
);

Monitoring

fn_check_proof_system_health_v2()

Purpose: Comprehensive proof system health check. Detects stuck jobs, dead letters, partial completions, and processing degradation.

Registration: Add proof_system_health: "fn_check_proof_system_health_v2" to run-monitor Edge Function dispatch map.

CREATE OR REPLACE FUNCTION raw_ops.fn_check_proof_system_health_v2()
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_run_id uuid;
  v_start_time timestamptz;
  v_duration_ms integer;
  v_alert_needed boolean := false;
  v_alert_channel text := 'ops-alerts';
  v_alert_severity text := 'warning';
  v_alert_title text;
  v_alert_message text;
  v_alert_fields jsonb := '[]'::jsonb;
  v_stuck_claimed integer;
  v_dead_letter_count integer;
  v_failed_count integer;
  v_partial_completion_count integer;
  v_avg_duration_ms numeric;
  v_queue_depth integer;
  v_completed_24h integer;
BEGIN
  v_start_time := clock_timestamp();

  INSERT INTO raw_ops.monitoring_runs (check_name, status)
  VALUES ('proof_system_health', 'running')
  RETURNING id INTO v_run_id;

  -- 1. Stuck claimed jobs (claimed > 5 minutes ago, not finalised)
  SELECT COUNT(*) INTO v_stuck_claimed
  FROM raw_ops.proof_jobs
  WHERE state = 'claimed'
    AND claimed_at < now() - INTERVAL '5 minutes';

  -- 2. Dead letter jobs (unresolved)
  SELECT COUNT(*) INTO v_dead_letter_count
  FROM raw_ops.proof_jobs
  WHERE state = 'dead_letter';

  -- 3. Failed jobs awaiting retry
  SELECT COUNT(*) INTO v_failed_count
  FROM raw_ops.proof_jobs
  WHERE state = 'failed';

  -- 4. Partial completions (some steps done, job not complete)
  SELECT COUNT(*) INTO v_partial_completion_count
  FROM raw_ops.proof_jobs
  WHERE state IN ('failed', 'dead_letter')
    AND (qr_generated = true OR label_generated = true)
    AND (qr_uploaded = false OR label_uploaded = false OR email_sent = false);

  -- 5. Average processing duration (last 24h)
  SELECT COALESCE(AVG(processing_duration_ms), 0) INTO v_avg_duration_ms
  FROM raw_ops.proof_jobs
  WHERE state = 'done'
    AND completed_at > now() - INTERVAL '24 hours';

  -- 6. Current queue depth
  SELECT COUNT(*) INTO v_queue_depth
  FROM raw_ops.proof_jobs
  WHERE state IN ('queued', 'failed')
    AND attempts < max_attempts;

  -- 7. Completed in last 24h
  SELECT COUNT(*) INTO v_completed_24h
  FROM raw_ops.proof_jobs
  WHERE state = 'done'
    AND completed_at > now() - INTERVAL '24 hours';

  -- Determine alert level (priority order)
  IF v_dead_letter_count > 0 THEN
    v_alert_needed := true;
    v_alert_channel := 'ops-urgent';
    v_alert_severity := 'critical';
    v_alert_title := 'Proof System: Dead Letter Jobs';
    v_alert_message := v_dead_letter_count || ' proof job(s) in dead letter queue. Manual intervention required.';
  ELSIF v_stuck_claimed > 0 THEN
    v_alert_needed := true;
    v_alert_channel := 'ops-urgent';
    v_alert_severity := 'critical';
    v_alert_title := 'Proof System: Stuck Jobs';
    v_alert_message := v_stuck_claimed || ' proof job(s) stuck in claimed state for >5 minutes.';
  ELSIF v_partial_completion_count > 0 THEN
    v_alert_needed := true;
    v_alert_title := 'Proof System: Partial Completions';
    v_alert_message := v_partial_completion_count || ' proof job(s) with partial step completion.';
  ELSIF v_failed_count > 3 THEN
    v_alert_needed := true;
    v_alert_title := 'Proof System: Elevated Failures';
    v_alert_message := v_failed_count || ' proof job(s) in failed state awaiting retry.';
  END IF;

  IF v_alert_needed THEN
    v_alert_fields := jsonb_build_array(
      jsonb_build_object('label', 'Dead Letter', 'value', v_dead_letter_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Stuck', 'value', v_stuck_claimed::TEXT, 'inline', true),
      jsonb_build_object('label', 'Failed', 'value', v_failed_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Partial', 'value', v_partial_completion_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Queue', 'value', v_queue_depth::TEXT, 'inline', true),
      jsonb_build_object('label', 'Avg Duration', 'value', ROUND(v_avg_duration_ms)::TEXT || 'ms', 'inline', true)
    );
  END IF;

  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;

  UPDATE raw_ops.monitoring_runs
  SET
    status = 'success',
    duration_ms = v_duration_ms,
    alert_triggered = v_alert_needed,
    result_summary = 'Queue: ' || v_queue_depth || ', Done 24h: ' || v_completed_24h ||
                     ', Failed: ' || v_failed_count || ', DLQ: ' || v_dead_letter_count,
    result_json = jsonb_build_object(
      'stuck_claimed', v_stuck_claimed,
      'dead_letter_count', v_dead_letter_count,
      'failed_count', v_failed_count,
      'partial_completion_count', v_partial_completion_count,
      'avg_duration_ms', ROUND(v_avg_duration_ms),
      'queue_depth', v_queue_depth,
      'completed_24h', v_completed_24h
    )
  WHERE id = v_run_id;

  RETURN jsonb_build_object(
    'success', true,
    'run_id', v_run_id,
    'check_name', 'proof_system_health',
    'duration_ms', v_duration_ms,
    'alert_config', CASE
      WHEN NOT v_alert_needed THEN NULL
      ELSE jsonb_build_object(
        'channel', v_alert_channel,
        'severity', v_alert_severity,
        'title', v_alert_title,
        'message', v_alert_message,
        'fields', v_alert_fields
      )
    END
  );

EXCEPTION WHEN OTHERS THEN
  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;
  UPDATE raw_ops.monitoring_runs
  SET status = 'error', duration_ms = v_duration_ms, error_message = SQLERRM
  WHERE id = v_run_id;
  RETURN jsonb_build_object(
    'success', false, 'run_id', v_run_id,
    'check_name', 'proof_system_health', 'error', SQLERRM, 'duration_ms', v_duration_ms
  );
END;
$$;

Alert Thresholds

Condition Severity Channel Response
Dead letter jobs > 0 Critical #ops-urgent Immediate. Proof generation blocked for batch.
Stuck claimed jobs > 0 (>5 min) Critical #ops-urgent Edge Function may be hanging. Check logs.
Partial completion > 0 Warning #ops-alerts Some steps OK, others failed. Verify assets.
Failed jobs > 3 Warning #ops-alerts Elevated retries. Check external service health.
Avg duration > 60,000ms Warning #ops-alerts Degradation. Check Browserless/Resend latency.

Metabase Dashboard: Proof System Health

Collection: "SOP-PROOF-00 - Operations"

Saved Questions:

  1. Job Queue Status (real-time): state distribution, last 7 days
  2. Processing Duration Trend (7-day): avg/max ms per day, jobs completed
  3. Dead Letter and Failure History (30-day): DLQ, retrying, resolved counts by date
  4. Partial Completion Detail (current): jobs with mixed step completion flags
  5. Batch Creation Volume (12-week): weekly counts by status (RELEASED, QA_HOLD, REJECTED)

Customer Access Methods

1. Pouch QR Code Scan (Primary)

Customer scans QR on any pouch label. Mobile browser opens https://proof.protocolraw.co.uk/batch/{public_batch_id}. Shows full lab results (if RELEASED) or testing-in-progress message (if QA_HOLD).

2. Insert Card QR Scan (Onboarding)

Customer scans QR on insert card. Opens https://portal.protocolraw.co.uk?view=transition. Shows personalised 10-day transition feeding plan.

Included in delivery SMS, order confirmation email, and customer portal.

4. Manual Batch Lookup

Search at proof.protocolraw.co.uk. See SOP-PROOF-01.

Timeline

Phase What Happens
Day 0 (Batch Creation) QR code, label PDF, co-packer email. Batch in QA_HOLD.
Days 0-3 Product manufactured with printed labels. Samples at lab.
Day 3-5 Lab results arrive. SOP-LAB-01 releases batch.
Day 3-5+ Pouch QR links to live proof page with results.

Storage

Buckets

Path Contents Access
proof-assets/qr-codes/ QR code PNGs Public read
proof-assets/batch-labels/ Pouch label PDFs Authenticated only
proof-assets/lab-reports/ Lab certificate PDFs (SOP-LAB-01) Authenticated only

Label PDF access: Labels contain production details. Authenticated access only to prevent competitive intelligence exposure at scale. Co-packer receives label via email link. Ops team accesses via portal.

Storage Projections (Annual at Scale)

Asset Per Item Annual Volume Total
QR codes 10 KB ~25,000 250 MB
Label PDFs 500 KB ~25,000 12.5 GB
Lab reports 2 MB ~25,000 50 GB
Total ~63 GB/year

Required Secrets

Secret Purpose Used By
PROOF_WORKER_SECRET Shared secret for internal worker auth process-proof-jobs (validates), pg_cron (sends via header)
BROWSERLESS_API_KEY Label PDF generation process-proof-jobs
RESEND_API_KEY Co-packer email delivery process-proof-jobs
COPACKER_EMAIL Co-packer email address process-proof-jobs
SLACK_OPS_ALERTS_WEBHOOK Standard alerts ops-alerter
SLACK_OPS_URGENT_WEBHOOK Critical alerts ops-alerter

PROOF_WORKER_SECRET setup: Must be configured in two places: (1) Supabase Edge Function secrets, and (2) Supabase database settings as app.settings.proof_worker_secret so pg_cron can read it. Generate with openssl rand -hex 32.

Rotation: Review quarterly. Rotate immediately if compromised. Zero-downtime rotation via Supabase secrets update + redeploy.


Batch Code System

Dual ID System

Internal: PR-YYMMDD-NNN (sequential, for ops/warehouse/co-packer) Public: PR-XXXXXXXX (SHA-256 hash, for QR codes/proof URLs/customers)

Public ID protects competitive intelligence (no date or volume information). Generated by trigger_set_public_batch_id on batch INSERT.


Operational Procedures

Creating a Batch

When: Co-packer confirms production. Where: ops.protocolraw.co.uk. Time: ~30 seconds.

  1. Enter production date + kg produced
  2. Click "Create Batch"
  3. Confirm: batch code, proof URL in response
  4. Within 60 seconds: QR, label, email should complete (check proof_jobs)

Weekly Review (10 min)

  1. Check Metabase dashboard
  2. Review any dead letter or failed jobs
  3. Scan 2-3 random QR codes to verify proof pages

Monthly Review (15 min)

  1. Review weekly trends
  2. Verify label print quality with co-packer
  3. Check storage costs (~$1-2/month)
  4. Confirm SSL auto-renewal (Cloudflare)

Troubleshooting

Batch Creation Fails

Check create-batch Edge Function logs. Check ops_events for errors. Common: no active formulation, no product record, invalid date.

Proof Job Stuck

SELECT id, state, attempts, claimed_at, last_error, qr_generated, label_generated, email_sent
FROM raw_ops.proof_jobs
WHERE state IN ('queued', 'claimed') AND created_at < now() - INTERVAL '5 minutes';

If claimed >10 min: reset to queued. If queued with 0 attempts: check pg_cron job is active.

Partial Completion (Label Missing)

Check Browserless API key. Check HTML template. Reset job for retry:

UPDATE raw_ops.proof_jobs
SET state = 'queued', attempts = 0, claimed_at = NULL, last_error = NULL, updated_at = now()
WHERE batch_id = (SELECT id FROM raw_ops.batches WHERE batch_code = 'PR-YYMMDD-NNN');

Dead Letter Job

Check error details. Fix root cause. Reset for retry or resolve manually:

-- Reset for retry
UPDATE raw_ops.proof_jobs
SET state = 'queued', attempts = 0, claimed_at = NULL, last_error = NULL, updated_at = now()
WHERE id = 'job-uuid';

-- Or resolve manually
SELECT raw_ops.fn_resolve_proof_job_manually_v1('job-uuid', 'Resolution note here');

Proof Page Not Found

Check batch status. QA_HOLD = expected (testing). REJECTED = expected (failed). Verify RLS policy exists.


Compliance and Audit Trail

  • Traceability: Every pouch traceable to batch via QR
  • Transparency: Lab results publicly accessible after release
  • Immutability: Batch data changes only via defined functions
  • Audit Trail: All changes logged to batch_status_history and ops_events
  • Recall: See SOP-LAB-01 for recall procedure

Data Retention

Data Retention Reason
Batches Indefinite Regulatory
Lab Results Indefinite Regulatory
QR Codes Indefinite Customer access
Label PDFs 2 years Operational
Proof Jobs 90 days Monitoring
ops_events Indefinite Audit trail

Scaling

Phase Batches/Day Performance Action
A (0-1K customers) 1-3 <60s per job Monitor metrics
B (1-10K) 5-15 <60s per job No changes expected
C (10-100K) 30-70 Parallel via SKIP LOCKED Bulk creation UI, monitor Browserless concurrency
100K+ 70+ Review needed Upgrade Supabase, read replicas

No claims of readiness at scale are made until validated with production data.


All Objects

Object Type Purpose
raw_ops.proof_jobs Table State machine
raw_ops.formulations Table Formulation versions
raw_ops.proof_page_events Table Portal analytics
raw_ops.fn_create_batch_v1() Function Batch creation logic
raw_ops.fn_enqueue_proof_job() Trigger function Enqueue on INSERT
raw_ops.fn_claim_proof_jobs_v1() Function Atomic claim (SKIP LOCKED)
raw_ops.fn_set_batch_qr_assets_v1() Function Set QR + proof URL on batch
raw_ops.fn_set_batch_label_asset_v1() Function Set label PDF URL on batch
raw_ops.fn_finalize_proof_job_v1() Function Mark done, update batch
raw_ops.fn_mark_proof_job_failed_v1() Function Failure + DLQ
raw_ops.fn_resolve_proof_job_manually_v1() Function DLQ resolution (with asset check)
raw_ops.fn_check_proof_system_health_v2() Function SOP-MON-01 monitor
trigger_enqueue_proof_job Trigger AFTER INSERT on batches
trigger_set_public_batch_id Trigger Existing, unchanged
create-batch Edge Function Thin wrapper
process-proof-jobs Edge Function Queue processor (shared secret auth)
proof-page Edge Function Public pages (service_role for status rendering)
process-proof-jobs pg_cron job Every 30 seconds
monitor-proof-system-health pg_cron job Every 5 minutes
idx_proof_jobs_claimable Index Queue performance
idx_proof_jobs_batch_id Index Batch lookup
idx_proof_jobs_dead_letter Index DLQ monitoring
Allow public read on released batches RLS Policy Search API access
PROOF_WORKER_SECRET Secret Internal worker auth

Contact and Escalation

Operations Owner: Anton (Founder) Technical Owner: Anton (Systems Architect)

Escalation: 1. Check Slack #ops-alerts and #ops-urgent 2. Check Metabase dashboard 3. Review Edge Function logs 4. Query proof_jobs for diagnostics 5. Check external service status (Resend, Browserless, Cloudflare) 6. If unresolved after 30 min: pause batch creation


Version History

Version Date Changes Author
SOP-PROOF-00 v1.0 2026-03-20 Complete rewrite replacing SOP-0R v4.2. Architecture rebuilt to Supabase-native pattern. Named execution chain, state machine, retry/DLQ, SOP-MON-01 monitoring, partial completion detection, RLS policies, auth model, QR internalised, honest scaling claims. Protocol Raw Operations

Superseded: SOP-0R v4.2 (2026-01-21) — fully replaced.


End of SOP-PROOF-00 v1.0

Last reviewed: 2026-03-20 Next review: 2026-06-20 System status: ✅ Production Ready

Protocol Raw — Verified safe, batch by batch