SOP-PROOF-00: Proof-of-Safety System v1.0¶
End-to-end proof generation, label production, and public verification for every batch
Document ID: SOP-PROOF-00-v1.0
Version: 1.0
Status: ✅ Production Ready
Last Updated: 2026-03-20
Owner: Anton (Founder)
Replaces: SOP-0R v4.2
Review Date: 2026-06-20
Key Changes from SOP-0R v4.2¶
- ✅ Complete architecture rewrite to mandated Supabase-native pattern (pg_cron → pg_net → Edge Function → PostgreSQL function → ops-alerter)
- ✅ "Background worker" eliminated — replaced with named execution chain through process-proof-jobs Edge Function and PostgreSQL state machine
- ✅ Proof job state machine fully specified — fn_create_batch_v1(), fn_claim_proof_jobs_v1(), fn_finalize_proof_job_v1(), fn_mark_proof_job_failed_v1()
- ✅ Queue-claiming mechanics defined — FOR UPDATE SKIP LOCKED with LIMIT 10 for concurrent-safe, bounded processing
- ✅ Dead-letter queue — 5 attempts max, then dead_letter state with critical Slack alert
- ✅ Retry policy for all external dependencies — QR generation (internal, no external API), Browserless (3 attempts, 15s timeout), Resend (3 attempts, 10s timeout)
- ✅ QR generation internalised — qrcode Deno library replaces third-party api.qrserver.com dependency
- ✅ Partial completion detection — monitoring catches batches where some steps succeeded but others failed
- ✅ SOP-MON-01 integration — fn_check_proof_system_health_v2() monitor on 5-minute schedule
- ✅ RLS policies specified in full — not just named, but defined with CREATE POLICY statements
- ✅ Auth model documented for all operational Edge Functions
- ✅ ops_events logging at every state transition
- ✅ Honest operational claims — removed unsupported "zero manual intervention" assertions
- ❌ Removed "Production Ready for 100,000+ Customers" pending real-world validation at scale
Purpose¶
This SOP defines the complete proof-of-safety system: from batch creation through QR code generation, pouch label production, co-packer delivery, and public proof page verification. It is the operational backbone of Protocol Raw's "Proof, Not Promises" differentiator.
Every batch gets a unique, publicly accessible proof page with independently verified lab results. Customers scan a QR code on their pouch and see the evidence. This system generates the QR codes, produces the labels, and delivers them to the co-packer automatically.
This SOP does not cover:
- Lab result processing and batch release (see SOP-LAB-01)
- Proof portal frontend design and copy (see SOP-PROOF-01)
- Batch allocation and fulfilment (see SOP-INV-01, SOP-ORD-02)
Scope¶
Applies to: All batches created via the Operations Portal
This SOP owns:
- fn_create_batch_v1() — because proof generation begins at batch INSERT. This function is called from the Ops Portal via the create-batch Edge Function. SOP-LAB-01 owns the Ops Portal UI, operational procedures for when to create batches, and all downstream lab processing
- Proof job queue and state machine (queuing, claiming, step execution, retry, dead letter)
- QR code generation for pouch labels
- Pouch label PDF generation (Browserless)
- Co-packer email delivery (Resend)
- Batch asset URL updates (fn_set_batch_qr_assets_v1, fn_set_batch_label_asset_v1)
- Monitoring of the proof generation pipeline
This SOP does NOT own:
- When to create batches or Ops Portal batch creation UI (SOP-LAB-01)
- Lab result processing, batch release/rejection logic (SOP-LAB-01)
- Proof portal page design, copy, analytics, share functionality (SOP-PROOF-01)
- Order allocation or fulfilment (SOP-INV-01, SOP-ORD-02)

Users:
- Operations team (batch creation via Ops Portal)
- Automated systems (proof generation, label generation, co-packer email)
- Monitoring and alerts (see SOP-MON-01)
- Customers (proof verification via QR scan)

Related SOPs:
- SOP-LAB-01: Batch Creation and Lab-to-Release (upstream — lab processing, batch release)
- SOP-PROOF-01: Proof Portal System (downstream — frontend display, analytics, caching)
- SOP-MON-01: Monitoring and Alerting Architecture (monitoring pattern)
- SOP-INV-01: Inventory Control (allocation after release)
- SOP-ORD-02: 3PL Order Export (fulfilment after allocation)
Architecture¶
Complete Execution Chain¶
Ops Portal UI → create-batch Edge Function (thin wrapper)
↓
raw_ops.fn_create_batch_v1() (PostgreSQL — all batch creation logic)
↓
raw_ops.batches INSERT fires trigger
↓
raw_ops.fn_enqueue_proof_job() (trigger function — creates proof_jobs record)
↓
pg_cron (every 30 seconds)
↓
pg_net (async HTTP)
↓
process-proof-jobs Edge Function (thin wrapper — external I/O only)
↓
raw_ops.fn_claim_proof_jobs_v1() (claim queued jobs, FOR UPDATE SKIP LOCKED)
↓
Edge Function executes external steps:
1. Generate QR code (internal library, no external API)
2. Upload QR to Supabase Storage
3. Generate label PDF (Browserless API)
4. Upload label PDF to Supabase Storage
5. Send co-packer email (Resend API)
↓
raw_ops.fn_finalize_proof_job_v1() (mark done, update batch record)
↓
ops_events audit log
Failure Chain¶
External step fails → retry within Edge Function (per-dependency policy)
↓
All retries exhausted for this invocation → Edge Function calls:
↓
raw_ops.fn_mark_proof_job_failed_v1() (increment attempts, log error)
↓
Next pg_cron cycle picks up job again (still in retryable state)
↓
After 5 total attempts → fn_mark_proof_job_failed_v1() sets state = 'dead_letter'
↓
ops-alerter → Slack #ops-urgent (critical severity)
↓
Manual resolution via fn_resolve_proof_job_manually_v1()
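The per-dependency retry policy in the failure chain can be sketched as a small policy table plus a backoff calculator. This is an illustrative sketch, not the deployed code: the names `RetryPolicy`, `POLICIES`, and `backoffDelaysMs` are assumptions; the constants mirror the retry policy stated in this SOP (Browserless and Resend: 3 retries with 1s/3s/9s backoff; Storage uploads: 2 retries with 1s/3s backoff).

```typescript
// Illustrative sketch of the per-dependency retry policy described in this SOP.
interface RetryPolicy {
  retries: number;     // retries after the initial attempt
  timeoutMs: number;   // per-attempt timeout
  baseDelayMs: number; // delay before first retry; each subsequent delay triples
}

const POLICIES: Record<string, RetryPolicy> = {
  browserless: { retries: 3, timeoutMs: 15_000, baseDelayMs: 1_000 }, // 1s, 3s, 9s
  resend:      { retries: 3, timeoutMs: 10_000, baseDelayMs: 1_000 }, // 1s, 3s, 9s
  storage:     { retries: 2, timeoutMs: 10_000, baseDelayMs: 1_000 }, // 1s, 3s
};

// Exponential backoff with factor 3, matching the SOP's stated delay sequence.
function backoffDelaysMs(p: RetryPolicy): number[] {
  return Array.from({ length: p.retries }, (_, i) => p.baseDelayMs * 3 ** i);
}
```

Note that these retries happen inside a single Edge Function invocation; exhausting them counts as one of the job's 5 overall attempts.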
Monitoring Architecture (SOP-MON-01 compliant)¶
pg_cron (every 5 minutes)
↓
pg_net (async HTTP)
↓
run-monitor Edge Function
↓
raw_ops.fn_check_proof_system_health_v2()
↓
monitoring_runs (execution log)
↓ (if threshold breached)
ops-alerter Edge Function
↓
Slack (#ops-alerts or #ops-urgent)
Database Schema¶
All tables in raw_ops schema. RLS enabled on all tables.
Core Tables¶
| Table | Purpose | RLS Policy |
|---|---|---|
| batches | Master batch records | Service role only + public read on RELEASED batches |
| lab_results | Lab test results per batch | Service role only |
| formulations | Formulation version tracking | Service role only |
| proof_jobs | Proof generation queue and state machine | Service role only |
| batch_status_history | Audit log for all batch status changes | Service role only |
| proof_page_events | Proof portal analytics | Anon INSERT + SELECT (public portal) |
proof_jobs Table¶
CREATE TABLE raw_ops.proof_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
batch_id UUID NOT NULL REFERENCES raw_ops.batches(id),
job_key TEXT NOT NULL UNIQUE,
state TEXT NOT NULL DEFAULT 'queued',
attempts INTEGER NOT NULL DEFAULT 0,
max_attempts INTEGER NOT NULL DEFAULT 5,
last_error TEXT,
error_category TEXT,
processing_duration_ms INTEGER,
-- Step completion tracking (partial completion detection)
qr_generated BOOLEAN NOT NULL DEFAULT false,
qr_uploaded BOOLEAN NOT NULL DEFAULT false,
label_generated BOOLEAN NOT NULL DEFAULT false,
label_uploaded BOOLEAN NOT NULL DEFAULT false,
email_sent BOOLEAN NOT NULL DEFAULT false,
-- Timestamps
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
claimed_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
CONSTRAINT valid_state CHECK (state IN ('queued', 'claimed', 'done', 'failed', 'dead_letter', 'resolved_manual')),
CONSTRAINT valid_attempts CHECK (attempts >= 0 AND attempts <= max_attempts)
);
COMMENT ON TABLE raw_ops.proof_jobs IS 'Proof generation state machine. Each row represents one proof generation job for a batch. Processed by process-proof-jobs Edge Function via pg_cron.';
Processing Status Values¶
| Status | Meaning | Transition |
|---|---|---|
| queued | Awaiting processing | → claimed |
| claimed | Picked up by process-proof-jobs | → done or failed |
| done | All steps completed successfully | Terminal |
| failed | Processing failed, eligible for retry | → claimed (on next cycle) or dead_letter |
| dead_letter | 5 attempts exhausted, manual intervention required | → resolved_manual or → queued (reset) |
| resolved_manual | Manually resolved via controlled function | Terminal |
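The transition column above can be read as an adjacency map. The sketch below is illustrative (the `ALLOWED` map and `isValidTransition` helper are not part of the deployed system, where transitions are enforced inside the named PostgreSQL functions); it includes the claimed → dead_letter edge taken by fn_mark_proof_job_failed_v1() when a claimed job fails on its final attempt.

```typescript
// Illustrative encoding of the proof_jobs state machine from the table above.
type JobState =
  | "queued" | "claimed" | "done" | "failed" | "dead_letter" | "resolved_manual";

const ALLOWED: Record<JobState, JobState[]> = {
  queued: ["claimed"],
  claimed: ["done", "failed", "dead_letter"], // dead_letter when the final attempt fails
  done: [],                                   // terminal
  failed: ["claimed", "dead_letter"],
  dead_letter: ["resolved_manual", "queued"], // manual resolution or reset
  resolved_manual: [],                        // terminal
};

function isValidTransition(from: JobState, to: JobState): boolean {
  return ALLOWED[from].includes(to);
}
```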
formulations Table¶
CREATE TABLE raw_ops.formulations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
version TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
description TEXT,
effective_from DATE NOT NULL,
effective_until DATE,
shopify_page_url TEXT NOT NULL,
lab_certificate_pdf_url TEXT,
lab_name TEXT,
certificate_reference TEXT,
analysis_date DATE,
fediaf_verified BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE raw_ops.formulations IS 'Formulation versions with effective date ranges. Batches auto-link to the active formulation based on production date.';
Key Indexes¶
-- Proof job queue processing (critical for claim query performance)
CREATE INDEX idx_proof_jobs_claimable
ON raw_ops.proof_jobs (created_at ASC)
WHERE state IN ('queued', 'failed');
-- Proof job by batch (lookup during batch creation verification)
CREATE INDEX idx_proof_jobs_batch_id
ON raw_ops.proof_jobs (batch_id);
-- Proof job dead letter monitoring
CREATE INDEX idx_proof_jobs_dead_letter
ON raw_ops.proof_jobs (created_at DESC)
WHERE state = 'dead_letter';
-- Batch code lookup (used during production and customer scans)
CREATE INDEX idx_batches_batch_code
ON raw_ops.batches (batch_code);
-- Batch status filtering for public access and monitoring
CREATE INDEX idx_batches_status
ON raw_ops.batches (status);
-- Lab result lookups by batch
CREATE INDEX idx_lab_results_batch_id
ON raw_ops.lab_results (batch_id);
RLS Policies¶
-- All operational tables: RLS enabled, no policies = service role only, anon blocked
ALTER TABLE raw_ops.proof_jobs ENABLE ROW LEVEL SECURITY;
ALTER TABLE raw_ops.formulations ENABLE ROW LEVEL SECURITY;
ALTER TABLE raw_ops.batch_status_history ENABLE ROW LEVEL SECURITY;
-- Batches: public read access for RELEASED batches only (proof portal)
ALTER TABLE raw_ops.batches ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Allow public read on released batches"
ON raw_ops.batches
FOR SELECT
TO anon
USING (status = 'RELEASED');
-- Proof page events: public insert + select (analytics from proof portal)
ALTER TABLE raw_ops.proof_page_events ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Allow anon insert proof events"
ON raw_ops.proof_page_events
FOR INSERT
TO anon
WITH CHECK (true);
CREATE POLICY "Allow anon read proof events"
ON raw_ops.proof_page_events
FOR SELECT
TO anon
USING (true);
RLS posture: With RLS enabled and no policies, anon key queries return zero rows. Service role bypasses RLS entirely. This is the correct posture for backend-only operational tables. The two exceptions are batches (public read for RELEASED only) and proof_page_events (public insert/read for portal analytics).
QR Code System¶
The Protocol Raw system uses two distinct types of QR codes, each with a different purpose.
| QR Type | Location | Links To | Generated |
|---|---|---|---|
| Proof Verification QR | Product pouch labels | proof.protocolraw.co.uk/batch/{public_batch_id} | Per-batch, during proof job |
| Transition Guide QR | In-box insert card | portal.protocolraw.co.uk?view=transition | Static, bulk printed |
Proof Verification QR (On Pouches)¶
Batch-specific QR codes generated automatically during proof job processing.
Content: https://proof.protocolraw.co.uk/batch/{public_batch_id}
Storage: proof-assets/qr-codes/{batch_code}.png
Size on label: 25mm x 25mm (300 DPI)
Generation: Internal library (qrcode Deno module) inside process-proof-jobs Edge Function. No external API dependency.
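The URL and storage-path conventions above can be sketched as a small helper. `buildProofAssets` is illustrative only (the SOP does not define such a function); the URL format and storage path mirror the conventions stated in this section.

```typescript
// Illustrative helper encoding the proof QR conventions from this section.
function buildProofAssets(publicBatchId: string, batchCode: string) {
  return {
    // Content encoded in the pouch QR
    proofUrl: `https://proof.protocolraw.co.uk/batch/${publicBatchId}`,
    // Supabase Storage object path for the generated PNG
    qrStoragePath: `proof-assets/qr-codes/${batchCode}.png`,
  };
}
```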
Transition Guide QR (On Insert Card)¶
Universal QR code, same for all boxes.
Content: https://portal.protocolraw.co.uk?view=transition
Purpose: Onboarding. Personalised 10-day transition feeding plan.
Generation: Static. Not batch-specific. Same QR on all cards.
Design Principle¶
The insert card educates and onboards. The pouch QR verifies. These are separate concerns with separate touchpoints.
Physical Insert Cards¶
Insert cards are universal static cards, bulk printed, with identical content for every box. The proof_insert_pdf_url column is deprecated.
Card Purpose¶
- Onboarding new customers with transition guidance
- Explaining the proof verification system
- Driving portal engagement for personalised feeding plans
Card Design: Front Side (Transition)¶
Headline: "Your First 10 Days"
Content:
- Visual transition structure (Day 1-2: 25%, Day 3-4: 50%, etc.)
- QR code linking to portal.protocolraw.co.uk?view=transition
- CTA: "Scan for your personalised feeding plan"
- Safe handling reminder
Card Design: Back Side (Proof Education)¶
Headline: "Verified Safe, Batch by Batch"
Content:
- Explains that every pouch has a QR code linking to lab results
- Points customer to scan pouch QR (no separate proof QR on card)
- CTA: "Scan any pouch to verify"
- Brief explanation of independent lab testing
Card Specifications¶
| Attribute | Specification |
|---|---|
| Size | TBD by designer (A7 or business card format recommended) |
| QR Codes | ONE QR only (front side) linking to portal transition guide |
| Batch Code | NONE. Card is universal (same card for all boxes) |
| Print Run | Bulk print. Order in batches of 1,000+ (NOT generated per-batch) |
| Delivery | Stocked at packing location, one card per box |
| Brand Guidelines | Visual Identity Guide v2.4 (Forest Green, Cream, Espresso) |
PostgreSQL Functions¶
fn_create_batch_v1()¶
Purpose: All batch creation business logic. Called by create-batch Edge Function.
Responsibilities:
- Validate inputs (production_date, kg_produced)
- Generate batch code (PR-YYMMDD-NNN format, next sequential number for date)
- Look up active formulation for production date
- Look up COGS per kg from config
- Calculate expiry date (production + 12 months)
- INSERT batch record with status QA_HOLD
- Public batch ID generated by existing trigger (trigger_set_public_batch_id)
- Generate proof URL
- Log to ops_events
- Return batch record as JSONB
CREATE OR REPLACE FUNCTION raw_ops.fn_create_batch_v1(
p_production_date DATE,
p_kg_produced NUMERIC
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_code TEXT;
v_seq INTEGER;
v_formulation_id UUID;
v_product_id UUID;
v_cogs_per_kg NUMERIC;
v_batch_id UUID;
v_public_batch_id TEXT;
v_proof_url TEXT;
v_expiry_date DATE;
BEGIN
-- Validate inputs
IF p_production_date IS NULL THEN
RAISE EXCEPTION 'production_date is required';
END IF;
IF p_kg_produced IS NULL OR p_kg_produced <= 0 THEN
RAISE EXCEPTION 'kg_produced must be positive';
END IF;
-- Generate sequential batch code for date
SELECT COALESCE(MAX(
CAST(SUBSTRING(batch_code FROM '\d{3}$') AS INTEGER)
), 0) + 1
INTO v_seq
FROM raw_ops.batches
WHERE batch_code LIKE 'PR-' || TO_CHAR(p_production_date, 'YYMMDD') || '-%';
v_batch_code := 'PR-' || TO_CHAR(p_production_date, 'YYMMDD') || '-' || LPAD(v_seq::TEXT, 3, '0');
-- Look up active formulation
SELECT id INTO v_formulation_id
FROM raw_ops.formulations
WHERE effective_from <= p_production_date
AND (effective_until IS NULL OR effective_until >= p_production_date)
ORDER BY effective_from DESC
LIMIT 1;
IF v_formulation_id IS NULL THEN
RAISE EXCEPTION 'No active formulation found for date %', p_production_date;
END IF;
-- Look up product (single SKU)
SELECT id INTO v_product_id
FROM raw_ops.products
LIMIT 1;
-- Look up COGS per kg from config
SELECT value::NUMERIC INTO v_cogs_per_kg
FROM raw_ops.config
WHERE key = 'default_cogs_per_kg';
-- Calculate expiry
v_expiry_date := p_production_date + INTERVAL '12 months';
-- Insert batch
INSERT INTO raw_ops.batches (
batch_code, produced_at, expiry_date, kg_produced,
product_id, formulation_id, status, cogs_per_kg_gbp
) VALUES (
v_batch_code, p_production_date, v_expiry_date, p_kg_produced,
v_product_id, v_formulation_id, 'QA_HOLD', v_cogs_per_kg
)
RETURNING id, public_batch_id INTO v_batch_id, v_public_batch_id;
-- Set proof URL (public_batch_id set by trigger_set_public_batch_id)
v_proof_url := 'https://proof.protocolraw.co.uk/batch/' || v_public_batch_id;
UPDATE raw_ops.batches
SET proof_url = v_proof_url
WHERE id = v_batch_id;
-- Audit log
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', v_batch_id, 'INFO', 'Batch created via Ops Portal',
jsonb_build_object(
'batch_code', v_batch_code,
'public_batch_id', v_public_batch_id,
'kg_produced', p_kg_produced,
'production_date', p_production_date,
'formulation_id', v_formulation_id
)
);
RETURN jsonb_build_object(
'success', true,
'batch_id', v_batch_id,
'batch_code', v_batch_code,
'public_batch_id', v_public_batch_id,
'proof_url', v_proof_url,
'expiry_date', v_expiry_date
);
EXCEPTION WHEN OTHERS THEN
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', NULL, 'ERROR', 'Batch creation failed: ' || SQLERRM,
jsonb_build_object('production_date', p_production_date, 'kg_produced', p_kg_produced)
);
RETURN jsonb_build_object('success', false, 'error', SQLERRM);
END;
$$;
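The PR-YYMMDD-NNN batch code convention implemented in the SQL above can be sketched in TypeScript for reference. `formatBatchCode` is illustrative only; the authoritative implementation is the SQL in fn_create_batch_v1().

```typescript
// Format a batch code as PR-YYMMDD-NNN, mirroring fn_create_batch_v1().
// Illustrative helper; not part of the deployed system.
function formatBatchCode(productionDate: Date, seq: number): string {
  const yy = String(productionDate.getUTCFullYear() % 100).padStart(2, "0");
  const mm = String(productionDate.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(productionDate.getUTCDate()).padStart(2, "0");
  return `PR-${yy}${mm}${dd}-${String(seq).padStart(3, "0")}`;
}
```

For example, the first batch produced on 2026-03-20 gets the code PR-260320-001.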
fn_enqueue_proof_job()¶
Purpose: Trigger function. Creates a proof job record when a batch is inserted.
CREATE OR REPLACE FUNCTION raw_ops.fn_enqueue_proof_job()
RETURNS TRIGGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
INSERT INTO raw_ops.proof_jobs (batch_id, job_key, state)
VALUES (NEW.id, 'proof:batch:' || NEW.id::TEXT, 'queued');
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', NEW.id, 'INFO', 'Proof job enqueued',
jsonb_build_object('batch_code', NEW.batch_code, 'batch_id', NEW.id)
);
RETURN NEW;
END;
$$;
CREATE TRIGGER trigger_enqueue_proof_job
AFTER INSERT ON raw_ops.batches
FOR EACH ROW
EXECUTE FUNCTION raw_ops.fn_enqueue_proof_job();
fn_claim_proof_jobs_v1()¶
Purpose: Atomically claim up to N queued or failed proof jobs for processing. Returns the exact rows claimed, deterministically, via UPDATE ... RETURNING. Uses FOR UPDATE SKIP LOCKED for concurrent safety. No secondary query needed.
CREATE OR REPLACE FUNCTION raw_ops.fn_claim_proof_jobs_v1(
p_limit INTEGER DEFAULT 10
)
RETURNS SETOF raw_ops.proof_jobs
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
RETURN QUERY
WITH claimable AS (
SELECT id
FROM raw_ops.proof_jobs
WHERE state IN ('queued', 'failed')
AND attempts < max_attempts
ORDER BY
CASE WHEN state = 'queued' THEN 0 ELSE 1 END, -- queued first, retries second
created_at ASC
LIMIT p_limit
FOR UPDATE SKIP LOCKED
)
UPDATE raw_ops.proof_jobs pj
SET
state = 'claimed',
claimed_at = now(),
updated_at = now()
FROM claimable c
WHERE pj.id = c.id
RETURNING pj.*;
END;
$$;
Concurrency guarantee: If two workers call this function simultaneously, FOR UPDATE SKIP LOCKED ensures each worker claims a disjoint set of jobs. No duplicate processing is possible. The RETURNING pj.* clause returns exactly the rows that were updated in this transaction, not an approximation.
fn_set_batch_qr_assets_v1()¶
Purpose: Update a batch record with QR image URL and proof URL after QR generation. Called by process-proof-jobs Edge Function after successful QR upload. Ensures batch asset mutations happen in PostgreSQL, not in Edge Function code.
CREATE OR REPLACE FUNCTION raw_ops.fn_set_batch_qr_assets_v1(
p_batch_id UUID,
p_qr_image_url TEXT,
p_proof_url TEXT
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_code TEXT;
BEGIN
UPDATE raw_ops.batches
SET
qr_image_url = p_qr_image_url,
proof_url = p_proof_url,
updated_at = now()
WHERE id = p_batch_id
RETURNING batch_code INTO v_batch_code;
IF v_batch_code IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
END IF;
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', p_batch_id, 'INFO', 'QR assets set for batch ' || v_batch_code,
jsonb_build_object(
'batch_code', v_batch_code,
'qr_image_url', p_qr_image_url,
'proof_url', p_proof_url
)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;
fn_set_batch_label_asset_v1()¶
Purpose: Update a batch record with label PDF URL after label generation. Called by process-proof-jobs Edge Function after successful label upload.
CREATE OR REPLACE FUNCTION raw_ops.fn_set_batch_label_asset_v1(
p_batch_id UUID,
p_label_pdf_url TEXT
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_code TEXT;
BEGIN
UPDATE raw_ops.batches
SET
label_pdf_url = p_label_pdf_url,
updated_at = now()
WHERE id = p_batch_id
RETURNING batch_code INTO v_batch_code;
IF v_batch_code IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
END IF;
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', p_batch_id, 'INFO', 'Label PDF set for batch ' || v_batch_code,
jsonb_build_object('batch_code', v_batch_code, 'label_pdf_url', p_label_pdf_url)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;
fn_finalize_proof_job_v1()¶
Purpose: Mark a proof job as successfully completed. Update the batch record with generated asset URLs.
CREATE OR REPLACE FUNCTION raw_ops.fn_finalize_proof_job_v1(
p_job_id UUID,
p_qr_image_url TEXT,
p_label_pdf_url TEXT,
p_processing_duration_ms INTEGER
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_id UUID;
v_batch_code TEXT;
BEGIN
-- Update proof job
UPDATE raw_ops.proof_jobs
SET
state = 'done',
completed_at = now(),
updated_at = now(),
processing_duration_ms = p_processing_duration_ms,
qr_generated = true,
qr_uploaded = true,
label_generated = true,
label_uploaded = true,
email_sent = true
WHERE id = p_job_id AND state = 'claimed'
RETURNING batch_id INTO v_batch_id;
IF v_batch_id IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Job not found or not in claimed state');
END IF;
-- Update batch with asset URLs
UPDATE raw_ops.batches
SET
qr_image_url = p_qr_image_url,
label_pdf_url = p_label_pdf_url,
updated_at = now()
WHERE id = v_batch_id
RETURNING batch_code INTO v_batch_code;
-- Audit log
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', v_batch_id, 'INFO', 'Proof job completed successfully',
jsonb_build_object(
'batch_code', v_batch_code,
'job_id', p_job_id,
'duration_ms', p_processing_duration_ms,
'qr_image_url', p_qr_image_url,
'label_pdf_url', p_label_pdf_url
)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;
fn_mark_proof_job_failed_v1()¶
Purpose: Record a proof job failure. If max attempts reached, move to dead letter queue and trigger critical alert.
CREATE OR REPLACE FUNCTION raw_ops.fn_mark_proof_job_failed_v1(
p_job_id UUID,
p_error TEXT,
p_error_category TEXT DEFAULT 'unknown',
p_step_progress JSONB DEFAULT '{}'::jsonb
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_new_state TEXT;
v_attempts INTEGER;
v_max_attempts INTEGER;
v_batch_id UUID;
v_batch_code TEXT;
v_alert_needed BOOLEAN := false;
BEGIN
-- Increment attempts and determine new state
UPDATE raw_ops.proof_jobs
SET
attempts = attempts + 1,
last_error = p_error,
error_category = p_error_category,
updated_at = now(),
-- Preserve step progress so we know what completed before failure
qr_generated = COALESCE((p_step_progress->>'qr_generated')::boolean, qr_generated),
qr_uploaded = COALESCE((p_step_progress->>'qr_uploaded')::boolean, qr_uploaded),
label_generated = COALESCE((p_step_progress->>'label_generated')::boolean, label_generated),
label_uploaded = COALESCE((p_step_progress->>'label_uploaded')::boolean, label_uploaded),
email_sent = COALESCE((p_step_progress->>'email_sent')::boolean, email_sent)
WHERE id = p_job_id
RETURNING attempts, max_attempts, batch_id INTO v_attempts, v_max_attempts, v_batch_id;
IF v_batch_id IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Job not found');
END IF;
-- Determine state based on attempt count
IF v_attempts >= v_max_attempts THEN
v_new_state := 'dead_letter';
v_alert_needed := true;
ELSE
v_new_state := 'failed';
END IF;
UPDATE raw_ops.proof_jobs
SET state = v_new_state
WHERE id = p_job_id;
-- Get batch code for logging
SELECT batch_code INTO v_batch_code
FROM raw_ops.batches WHERE id = v_batch_id;
-- Audit log
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', v_batch_id,
CASE WHEN v_new_state = 'dead_letter' THEN 'ERROR' ELSE 'EXCEPTION' END,
CASE WHEN v_new_state = 'dead_letter'
THEN 'Proof job moved to dead letter queue after ' || v_attempts || ' attempts'
ELSE 'Proof job failed (attempt ' || v_attempts || '/' || v_max_attempts || ')'
END,
jsonb_build_object(
'batch_code', v_batch_code,
'job_id', p_job_id,
'error', p_error,
'error_category', p_error_category,
'attempts', v_attempts,
'max_attempts', v_max_attempts,
'step_progress', p_step_progress
)
);
RETURN jsonb_build_object(
'success', true,
'new_state', v_new_state,
'attempts', v_attempts,
'max_attempts', v_max_attempts,
'alert_needed', v_alert_needed,
'alert_config', CASE
WHEN NOT v_alert_needed THEN NULL
ELSE jsonb_build_object(
'channel', 'ops-urgent',
'severity', 'critical',
'title', 'Proof Job Dead Letter: ' || v_batch_code,
'message', 'Proof generation failed after ' || v_attempts || ' attempts. Manual intervention required.',
'fields', jsonb_build_array(
jsonb_build_object('label', 'Batch', 'value', v_batch_code, 'inline', true),
jsonb_build_object('label', 'Attempts', 'value', v_attempts::TEXT, 'inline', true),
jsonb_build_object('label', 'Error', 'value', LEFT(p_error, 200), 'inline', false),
jsonb_build_object('label', 'Category', 'value', p_error_category, 'inline', true)
)
)
END
);
END;
$$;
fn_resolve_proof_job_manually_v1()¶
Purpose: Controlled resolution of dead-letter proof jobs with audit trail. Verifies that all expected batch assets exist before allowing resolution. Direct UPDATE statements are prohibited.
CREATE OR REPLACE FUNCTION raw_ops.fn_resolve_proof_job_manually_v1(
p_job_id UUID,
p_resolution_note TEXT,
p_skip_asset_check BOOLEAN DEFAULT false
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_old_state TEXT;
v_batch_id UUID;
v_batch_code TEXT;
v_qr_image_url TEXT;
v_label_pdf_url TEXT;
v_proof_url TEXT;
v_missing_assets TEXT[];
BEGIN
-- Only accept dead_letter jobs
SELECT state, batch_id INTO v_old_state, v_batch_id
FROM raw_ops.proof_jobs WHERE id = p_job_id;
IF v_old_state IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Job not found');
END IF;
IF v_old_state != 'dead_letter' THEN
RETURN jsonb_build_object('success', false, 'error',
'Only dead_letter jobs can be manually resolved. Current state: ' || v_old_state);
END IF;
-- Verify batch assets exist (unless explicitly skipped)
SELECT batch_code, qr_image_url, label_pdf_url, proof_url
INTO v_batch_code, v_qr_image_url, v_label_pdf_url, v_proof_url
FROM raw_ops.batches WHERE id = v_batch_id;
IF NOT p_skip_asset_check THEN
v_missing_assets := ARRAY[]::TEXT[];
IF v_qr_image_url IS NULL THEN v_missing_assets := v_missing_assets || 'qr_image_url'; END IF;
IF v_label_pdf_url IS NULL THEN v_missing_assets := v_missing_assets || 'label_pdf_url'; END IF;
IF v_proof_url IS NULL THEN v_missing_assets := v_missing_assets || 'proof_url'; END IF;
IF array_length(v_missing_assets, 1) > 0 THEN
RETURN jsonb_build_object(
'success', false,
'error', 'Cannot resolve: batch is missing assets. Use p_skip_asset_check := true to override.',
'missing_assets', to_jsonb(v_missing_assets),
'batch_code', v_batch_code
);
END IF;
END IF;
-- Resolve
UPDATE raw_ops.proof_jobs
SET state = 'resolved_manual', updated_at = now()
WHERE id = p_job_id;
-- Audit trail
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', v_batch_id, 'INFO', 'Proof job manually resolved',
jsonb_build_object(
'batch_code', v_batch_code,
'job_id', p_job_id,
'previous_state', v_old_state,
'resolution_note', p_resolution_note,
'asset_check_skipped', p_skip_asset_check
)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code, 'resolved', true);
END;
$$;
Asset check behaviour: By default, the function refuses to resolve a job if the batch is missing qr_image_url, label_pdf_url, or proof_url. This prevents false completions where the operator marks a job done but the batch still lacks proof assets. If the operator has genuinely completed the steps outside the system (e.g., manually uploaded files), they can pass p_skip_asset_check := true with an explanation in p_resolution_note.
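The asset check can be expressed as a pure function over the batch record. `missingAssets` is an illustrative mirror of the SQL logic above, not a deployed helper.

```typescript
// Illustrative mirror of the asset check in fn_resolve_proof_job_manually_v1():
// a dead-letter job may only be resolved when all three proof assets exist.
interface BatchAssets {
  qr_image_url: string | null;
  label_pdf_url: string | null;
  proof_url: string | null;
}

function missingAssets(batch: BatchAssets): string[] {
  return (["qr_image_url", "label_pdf_url", "proof_url"] as const)
    .filter((key) => batch[key] === null);
}
```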
Edge Functions¶
create-batch¶
Purpose: Thin wrapper. Receives Ops Portal request, calls fn_create_batch_v1(), returns result.
Endpoint: POST /functions/v1/create-batch
Auth: Cloudflare Access is the authentication boundary. All requests to ops.protocolraw.co.uk must pass Google OAuth via Cloudflare Access before reaching any backend. The Edge Function itself does not perform caller identity verification. It trusts that any request reaching it has been authenticated by Cloudflare Access. The function uses the service_role key to call PostgreSQL, which is standard for backend functions.
JWT Verification: Enabled (Supabase JWT from Ops Portal session)
Flow:
1. Parse request body (production_date, kg_produced)
2. Call raw_ops.fn_create_batch_v1() via service_role client
3. Return JSONB result to caller
This function does NOT orchestrate proof generation, label generation, or email sending. Those are handled asynchronously by the proof job queue.
process-proof-jobs¶
Purpose: Thin wrapper for external I/O. Claims queued proof jobs, executes external steps (QR, PDF, email), finalises or fails each job.
Endpoint: POST /functions/v1/process-proof-jobs
Auth: Internal only. Validates X-Worker-Secret header against PROOF_WORKER_SECRET env var. Rejects requests without a valid secret.
JWT Verification: Disabled (internal calls only, compensated by shared secret)
Threat model: This function is reachable at a public URL with JWT disabled. Without compensating controls, any caller who discovers the project URL and function name could invoke it. The shared secret header (X-Worker-Secret) prevents unauthorised invocation. The function is also idempotent and non-destructive: it claims already-queued jobs and generates assets for batches that already exist. Worst-case unauthorised invocation would generate proof assets that would be generated anyway on the next legitimate cycle.
Validation logic (first lines of Edge Function):
const workerSecret = Deno.env.get("PROOF_WORKER_SECRET");
const headerSecret = req.headers.get("X-Worker-Secret");
if (!workerSecret || headerSecret !== workerSecret) {
return new Response(JSON.stringify({ error: "Unauthorized" }), { status: 401 });
}
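A possible hardening, not currently specified in this SOP: the strict `!==` comparison above returns as soon as the first differing character is found, which in principle leaks timing information about the secret. A constant-time comparison avoids this; `timingSafeEqualStr` below is an illustrative sketch, not part of the deployed function.

```typescript
// Constant-time string comparison: XOR every character pair so runtime does
// not depend on where the first mismatch occurs. Illustrative hardening only.
function timingSafeEqualStr(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```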
Flow per invocation:
1. Validate X-Worker-Secret header
2. Call fn_claim_proof_jobs_v1(10) → get up to 10 claimed job rows
3. For each claimed job:
a. Generate QR code (internal qrcode library)
- Content: proof URL from batch record
- Output: PNG buffer, 300x300px
- Timeout: 5 seconds
- On failure: record step_progress, jump to fail handler
b. Upload QR to Supabase Storage
- Path: proof-assets/qr-codes/{batch_code}.png
- Timeout: 10 seconds
- Retries: 2 with backoff (1s, 3s)
c. Call fn_set_batch_qr_assets_v1(batch_id, qr_url, proof_url)
- Updates batch record via PostgreSQL (not direct table write)
d. Generate label PDF (Browserless API)
- Render HTML template with batch data + QR image
- Size: 100mm x 150mm, 300 DPI
- Timeout: 15 seconds
- Retries: 3 with exponential backoff (1s, 3s, 9s)
e. Upload label PDF to Supabase Storage
- Path: proof-assets/batch-labels/{batch_code}.pdf
- Timeout: 10 seconds
- Retries: 2 with backoff (1s, 3s)
f. Call fn_set_batch_label_asset_v1(batch_id, label_url)
- Updates batch record via PostgreSQL (not direct table write)
g. Send co-packer email (Resend API)
- Timeout: 10 seconds
- Retries: 3 with exponential backoff (1s, 3s, 9s)
h. Call fn_finalize_proof_job_v1() with asset URLs and duration
4. On any step failure after retries exhausted:
- Call fn_mark_proof_job_failed_v1() with error, category, and step_progress
- If alert_config returned (dead letter), call ops-alerter
- Continue to next job (do not abort batch)
DB/Edge Function boundary: The Edge Function performs external I/O only (QR library, Storage upload, Browserless HTTP, Resend HTTP). All database mutations go through named PostgreSQL functions. The Edge Function never directly UPDATEs raw_ops.batches or raw_ops.proof_jobs.
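The per-job loop above can be sketched as follows. This is a minimal sketch: the `fn_*` names match the PostgreSQL functions defined in this SOP, but the helper names (`generateQrPng`, `upload`, `renderLabelPdf`, `sendCopackerEmail`) are illustrative, and error categorisation, `step_progress` tracking, and Tier 1 retries are omitted for brevity. External I/O is injected so the control flow (named-RPC boundary, continue-on-failure) is visible:

```typescript
// Sketch of the per-job loop in process-proof-jobs. All database writes go
// through named RPCs; the Edge Function itself only performs external I/O.
export type Job = { id: string; batch_id: string; batch_code: string; proof_url: string };

export interface Deps {
  rpc(fn: string, args: Record<string, unknown>): Promise<unknown>; // PostgreSQL function call
  generateQrPng(url: string): Promise<Uint8Array>;                  // internal qrcode library
  upload(path: string, bytes: Uint8Array): Promise<string>;         // Storage upload, returns URL
  renderLabelPdf(job: Job, qrUrl: string): Promise<Uint8Array>;     // Browserless
  sendCopackerEmail(job: Job, labelUrl: string): Promise<void>;     // Resend
}

export async function processJobs(jobs: Job[], deps: Deps): Promise<{ done: string[]; failed: string[] }> {
  const done: string[] = [];
  const failed: string[] = [];
  for (const job of jobs) {
    const started = Date.now();
    try {
      const qr = await deps.generateQrPng(job.proof_url);
      const qrUrl = await deps.upload(`proof-assets/qr-codes/${job.batch_code}.png`, qr);
      await deps.rpc("fn_set_batch_qr_assets_v1", { batch_id: job.batch_id, qr_url: qrUrl, proof_url: job.proof_url });
      const pdf = await deps.renderLabelPdf(job, qrUrl);
      const labelUrl = await deps.upload(`proof-assets/batch-labels/${job.batch_code}.pdf`, pdf);
      await deps.rpc("fn_set_batch_label_asset_v1", { batch_id: job.batch_id, label_url: labelUrl });
      await deps.sendCopackerEmail(job, labelUrl);
      await deps.rpc("fn_finalize_proof_job_v1", { job_id: job.id, duration_ms: Date.now() - started });
      done.push(job.id);
    } catch (err) {
      // Record the failure via RPC, then continue with the next job (do not abort the batch).
      await deps.rpc("fn_mark_proof_job_failed_v1", { job_id: job.id, error: String(err) });
      failed.push(job.id);
    }
  }
  return { done, failed };
}
```

Dependency injection here is a sketching convenience; the deployed function calls the Supabase client and external HTTP APIs directly.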
Label Specifications¶
Generated by process-proof-jobs using Browserless API for HTML-to-PDF rendering.
Label size: 100mm x 150mm (standard for 500g pouches)
Format: PDF, 300 DPI, print-ready
Template: HTML/CSS rendered by Browserless

Label content (top to bottom):
1. Brand header: "PROTOCOL RAW" (Montserrat Bold, 14pt), tagline, proof QR (25mm x 25mm, top right)
2. Product information: "PROTOCOL RAW COMPLETE", description, net weight
3. Batch traceability: batch code (prominent), production date, best before date
4. Ingredients list: full list in descending order by weight
5. Footer: "Made in the UK", company details, "Verified safe, batch by batch"
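The five content zones map onto a simple HTML skeleton for the Browserless render. This is illustrative only: the real template, fonts, CSS, and copy live with the `process-proof-jobs` function, and the field names here are assumptions.

```typescript
// Illustrative label template: one HTML section per content zone, sized for
// 100mm x 150mm via @page. Copy is abbreviated relative to the real label.
export function labelHtml(b: {
  batch_code: string;
  production_date: string;
  best_before: string;
  qr_url: string;
  ingredients: string[];
}): string {
  return `<!doctype html>
<html><head><style>
  @page { size: 100mm 150mm; margin: 0; }
  body { font-family: Montserrat, sans-serif; }
</style></head><body>
  <header>PROTOCOL RAW <img src="${b.qr_url}" style="width:25mm;height:25mm;float:right"></header>
  <section>PROTOCOL RAW COMPLETE &mdash; 500g</section>
  <section><strong>${b.batch_code}</strong> &middot; Produced ${b.production_date} &middot; Best before ${b.best_before}</section>
  <section>Ingredients: ${b.ingredients.join(", ")}</section>
  <footer>Made in the UK &mdash; Verified safe, batch by batch</footer>
</body></html>`;
}
```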
Co-packer Email¶
Sent by process-proof-jobs via Resend API.
From: production@protocolraw.co.uk
To: Co-packer email (from COPACKER_EMAIL secret)
Subject: Protocol Raw Batch {batch_code} - Label Artwork
Body: Batch details (code, date, kg, product, formulation) + download link for label PDF + printing instructions
Delivery: Resend API with retry logic (see Retry Policy)
proof-page¶
Purpose: Serves public proof pages. Customer-facing.
Endpoint: GET /functions/v1/proof-page/*
Auth: Public (unauthenticated). No login required to view proof pages.
JWT Verification: Disabled (public endpoint)
Domain: proof.protocolraw.co.uk (Cloudflare DNS → Supabase Edge Function)
Data access model: The proof-page Edge Function uses the service_role key to query raw_ops.batches (bypassing RLS). This is necessary because the function renders different HTML depending on batch status:
| Batch Status | Page Rendered |
|---|---|
| RELEASED | Full results: lab data, share buttons, formulation info |
| QA_HOLD | Status message: "This batch is undergoing safety testing. Results will be published once verified." |
| REJECTED | Status message: "This batch did not pass our safety requirements." |
| Not found | 404: "Batch not found." |
The anon RLS policy (Allow public read on released batches) applies to the separate search API endpoint (/api/search), which uses the anon key and correctly returns only RELEASED batch metadata. The server-rendered proof pages use service_role because they need to distinguish between QA_HOLD, REJECTED, and non-existent batches to render appropriate status pages.
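The status-dependent rendering reduces to a small dispatch. A minimal sketch (the real function renders full HTML templates; the short body strings here stand in for those templates):

```typescript
// Status → page dispatch for proof-page. QA_HOLD and REJECTED render status
// pages with HTTP 200; only a non-existent batch returns 404.
type Batch = { status: "RELEASED" | "QA_HOLD" | "REJECTED" } | null;

export function renderStatus(batch: Batch): { httpStatus: number; body: string } {
  if (batch === null) return { httpStatus: 404, body: "Batch not found." };
  if (batch.status === "RELEASED") return { httpStatus: 200, body: "full-results" }; // lab data, share buttons, formulation
  if (batch.status === "QA_HOLD")
    return { httpStatus: 200, body: "This batch is undergoing safety testing. Results will be published once verified." };
  return { httpStatus: 200, body: "This batch did not pass our safety requirements." };
}
```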
Routes:
| Route | Method | Purpose |
|---|---|---|
| / | GET | Portal homepage with search box and latest released batch card |
| /batch/{public_batch_id} | GET | Real batch proof page (RELEASED / QA_HOLD / REJECTED / 404) |
| /batch/sample | GET | Sample batch page with hardcoded data; canonical pre-purchase URL |
| /sample | GET | Legacy alias for /batch/sample |
| /batch/PR-00LATEST | GET | 302 redirect to most recent released batch via fn_get_latest_released_batch_v1(); falls back to /batch/sample. Reached via Cloudflare Worker proof-latest-redirect (Supabase custom-domain gateway blocks /batch/latest) |
| /api/search | GET | Batch lookup by code; returns {found, status, public_batch_id} for RELEASED only |
| /api/event | POST | Client-side share/click event ingestion |
See SOP-PROOF-01 for complete route documentation, page design, and caching strategy.
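The `/api/search` contract is worth pinning down, since it is the one route that deliberately hides non-released batches. A sketch under stated assumptions: the `lookup` function stands in for the anon-key (RLS-filtered) query, and the query normalisation shown is illustrative.

```typescript
// /api/search response shape: {found, status, public_batch_id} for RELEASED
// batches only. Anything else — missing, QA_HOLD, REJECTED — is reported
// identically as not found, so the search API leaks no batch existence.
type BatchRow = { status: string; public_batch_id: string };

export function searchResponse(
  query: string,
  lookup: (code: string) => BatchRow | null // RLS-filtered lookup, illustrative
): { found: boolean; status?: string; public_batch_id?: string } {
  const code = query.trim().toUpperCase();
  const row = lookup(code);
  if (!row || row.status !== "RELEASED") return { found: false };
  return { found: true, status: row.status, public_batch_id: row.public_batch_id };
}
```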
Retry Policy¶
Retries operate on two tiers:
Tier 1 — Within a single Edge Function invocation. The process-proof-jobs worker retries transient HTTP failures (Storage uploads, Browserless, Resend) using a simple retry loop with exponential backoff before giving up on the current invocation. This handles brief network glitches without consuming a job-level attempt.
Tier 2 — Across pg_cron cycles. If all Tier 1 retries for a step are exhausted, fn_mark_proof_job_failed_v1() increments attempts and sets state = 'failed'. The next pg_cron cycle (30 seconds later) reclaims the job and tries again. After max_attempts (5) Tier 2 failures, the job moves to dead_letter.
Retry helper (used by all HTTP steps in Edge Function):
```typescript
async function withRetry(
  fn: (signal: AbortSignal) => Promise<Response>,
  maxRetries: number,
  timeoutMs: number,
  backoffMs: number[] // e.g. [1000, 3000, 9000]
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // The caller must pass the signal to fetch(); otherwise the abort has no effect.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fn(controller.signal);
      if (res.ok) return res;
      if (attempt >= maxRetries) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
    } catch (err) {
      if (attempt >= maxRetries) throw err;
    } finally {
      clearTimeout(timer);
    }
    await new Promise((r) => setTimeout(r, backoffMs[attempt] ?? 1000));
  }
  throw new Error("Retries exhausted");
}
```
Per-dependency configuration:
| Dependency | Timeout | Tier 1 Retries | Backoff | Tier 2 Max Attempts |
|---|---|---|---|---|
| QR generation (internal) | 5s | 0 (library call, no HTTP) | N/A | 5 |
| Supabase Storage upload | 10s | 2 | 1s, 3s | 5 |
| Browserless PDF | 15s | 3 | 1s, 3s, 9s | 5 |
| Resend email | 10s | 3 | 1s, 3s, 9s | 5 |
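Wiring the table into one configuration object also lets us bound the worst-case Tier 1 wall-clock time per step (every try times out, plus the backoff sleeps between tries). The config object names below are illustrative:

```typescript
// Per-dependency Tier 1 retry configuration from the table above.
export const RETRY_CONFIG = {
  storageUpload: { timeoutMs: 10_000, maxRetries: 2, backoffMs: [1_000, 3_000] },
  browserlessPdf: { timeoutMs: 15_000, maxRetries: 3, backoffMs: [1_000, 3_000, 9_000] },
  resendEmail: { timeoutMs: 10_000, maxRetries: 3, backoffMs: [1_000, 3_000, 9_000] },
} as const;

// Worst-case Tier 1 wall-clock time for one step: (retries + 1) timed-out
// tries, plus the backoff sleeps between tries.
export function worstCaseMs(cfg: { timeoutMs: number; maxRetries: number; backoffMs: readonly number[] }): number {
  const sleeps = cfg.backoffMs.slice(0, cfg.maxRetries).reduce((a, b) => a + b, 0);
  return (cfg.maxRetries + 1) * cfg.timeoutMs + sleeps;
}
```

For Browserless this gives 4 × 15s + 13s = 73s worst case within a single invocation, which is why persistently failing jobs take several minutes to reach the dead letter queue (see "Total time to dead letter" below).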
Error categories:
| Category | Meaning |
|---|---|
| qr_generation | QR code creation failed (extremely rare) |
| storage_upload | Supabase Storage write failed |
| pdf_generation | Browserless rendering failed |
| email_delivery | Resend API failed |
| db_error | Database operation failed |
| unknown | Unclassified error |
Total time to dead letter: 5 Tier 2 attempts x 30-second pg_cron cycles = ~2.5 minutes minimum. With Tier 1 backoff within each cycle, typically 5-10 minutes for persistent failures.
pg_cron Jobs¶
process-proof-jobs¶
```sql
SELECT cron.schedule(
  'process-proof-jobs',
  '30 seconds',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/process-proof-jobs',
    headers := jsonb_build_object(
      'Content-Type', 'application/json',
      'apikey', current_setting('app.settings.anon_key'),
      'X-Worker-Secret', current_setting('app.settings.proof_worker_secret')
    ),
    body := '{}'::jsonb
  );
  $$
);
```
Note: PROOF_WORKER_SECRET must be set both as a Supabase Edge Function secret and as a Supabase database setting (app.settings.proof_worker_secret) so that pg_cron can pass it in the header. Generate a random 64-character hex string: openssl rand -hex 32.
monitor-proof-system-health¶
```sql
SELECT cron.schedule(
  'monitor-proof-system-health',
  '*/5 * * * *',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/run-monitor',
    headers := jsonb_build_object(
      'Content-Type', 'application/json',
      'apikey', current_setting('app.settings.anon_key')
    ),
    body := '{"check": "proof_system_health"}'::jsonb
  );
  $$
);
```
Monitoring¶
fn_check_proof_system_health_v2()¶
Purpose: Comprehensive proof system health check. Detects stuck jobs, dead letters, partial completions, and processing degradation.
Registration: Add proof_system_health: "fn_check_proof_system_health_v2" to run-monitor Edge Function dispatch map.
```sql
CREATE OR REPLACE FUNCTION raw_ops.fn_check_proof_system_health_v2()
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_run_id uuid;
  v_start_time timestamptz;
  v_duration_ms integer;
  v_alert_needed boolean := false;
  v_alert_channel text := 'ops-alerts';
  v_alert_severity text := 'warning';
  v_alert_title text;
  v_alert_message text;
  v_alert_fields jsonb := '[]'::jsonb;
  v_stuck_claimed integer;
  v_dead_letter_count integer;
  v_failed_count integer;
  v_partial_completion_count integer;
  v_avg_duration_ms numeric;
  v_queue_depth integer;
  v_completed_24h integer;
BEGIN
  v_start_time := clock_timestamp();

  INSERT INTO raw_ops.monitoring_runs (check_name, status)
  VALUES ('proof_system_health', 'running')
  RETURNING id INTO v_run_id;

  -- 1. Stuck claimed jobs (claimed > 5 minutes ago, not finalised)
  SELECT COUNT(*) INTO v_stuck_claimed
  FROM raw_ops.proof_jobs
  WHERE state = 'claimed'
    AND claimed_at < now() - INTERVAL '5 minutes';

  -- 2. Dead letter jobs (unresolved)
  SELECT COUNT(*) INTO v_dead_letter_count
  FROM raw_ops.proof_jobs
  WHERE state = 'dead_letter';

  -- 3. Failed jobs awaiting retry
  SELECT COUNT(*) INTO v_failed_count
  FROM raw_ops.proof_jobs
  WHERE state = 'failed';

  -- 4. Partial completions (some steps done, job not complete)
  SELECT COUNT(*) INTO v_partial_completion_count
  FROM raw_ops.proof_jobs
  WHERE state IN ('failed', 'dead_letter')
    AND (qr_generated = true OR label_generated = true)
    AND (qr_uploaded = false OR label_uploaded = false OR email_sent = false);

  -- 5. Average processing duration (last 24h)
  SELECT COALESCE(AVG(processing_duration_ms), 0) INTO v_avg_duration_ms
  FROM raw_ops.proof_jobs
  WHERE state = 'done'
    AND completed_at > now() - INTERVAL '24 hours';

  -- 6. Current queue depth
  SELECT COUNT(*) INTO v_queue_depth
  FROM raw_ops.proof_jobs
  WHERE state IN ('queued', 'failed')
    AND attempts < max_attempts;

  -- 7. Completed in last 24h
  SELECT COUNT(*) INTO v_completed_24h
  FROM raw_ops.proof_jobs
  WHERE state = 'done'
    AND completed_at > now() - INTERVAL '24 hours';

  -- Determine alert level (priority order)
  IF v_dead_letter_count > 0 THEN
    v_alert_needed := true;
    v_alert_channel := 'ops-urgent';
    v_alert_severity := 'critical';
    v_alert_title := 'Proof System: Dead Letter Jobs';
    v_alert_message := v_dead_letter_count || ' proof job(s) in dead letter queue. Manual intervention required.';
  ELSIF v_stuck_claimed > 0 THEN
    v_alert_needed := true;
    v_alert_channel := 'ops-urgent';
    v_alert_severity := 'critical';
    v_alert_title := 'Proof System: Stuck Jobs';
    v_alert_message := v_stuck_claimed || ' proof job(s) stuck in claimed state for >5 minutes.';
  ELSIF v_partial_completion_count > 0 THEN
    v_alert_needed := true;
    v_alert_title := 'Proof System: Partial Completions';
    v_alert_message := v_partial_completion_count || ' proof job(s) with partial step completion.';
  ELSIF v_failed_count > 3 THEN
    v_alert_needed := true;
    v_alert_title := 'Proof System: Elevated Failures';
    v_alert_message := v_failed_count || ' proof job(s) in failed state awaiting retry.';
  ELSIF v_avg_duration_ms > 60000 THEN
    -- Matches the "Avg duration > 60,000ms" row in the Alert Thresholds table
    v_alert_needed := true;
    v_alert_title := 'Proof System: Processing Degradation';
    v_alert_message := 'Average processing duration ' || ROUND(v_avg_duration_ms) || 'ms over last 24h exceeds 60,000ms threshold.';
  END IF;

  IF v_alert_needed THEN
    v_alert_fields := jsonb_build_array(
      jsonb_build_object('label', 'Dead Letter', 'value', v_dead_letter_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Stuck', 'value', v_stuck_claimed::TEXT, 'inline', true),
      jsonb_build_object('label', 'Failed', 'value', v_failed_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Partial', 'value', v_partial_completion_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Queue', 'value', v_queue_depth::TEXT, 'inline', true),
      jsonb_build_object('label', 'Avg Duration', 'value', ROUND(v_avg_duration_ms)::TEXT || 'ms', 'inline', true)
    );
  END IF;

  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;

  UPDATE raw_ops.monitoring_runs
  SET
    status = 'success',
    duration_ms = v_duration_ms,
    alert_triggered = v_alert_needed,
    result_summary = 'Queue: ' || v_queue_depth || ', Done 24h: ' || v_completed_24h ||
                     ', Failed: ' || v_failed_count || ', DLQ: ' || v_dead_letter_count,
    result_json = jsonb_build_object(
      'stuck_claimed', v_stuck_claimed,
      'dead_letter_count', v_dead_letter_count,
      'failed_count', v_failed_count,
      'partial_completion_count', v_partial_completion_count,
      'avg_duration_ms', ROUND(v_avg_duration_ms),
      'queue_depth', v_queue_depth,
      'completed_24h', v_completed_24h
    )
  WHERE id = v_run_id;

  RETURN jsonb_build_object(
    'success', true,
    'run_id', v_run_id,
    'check_name', 'proof_system_health',
    'duration_ms', v_duration_ms,
    'alert_config', CASE
      WHEN NOT v_alert_needed THEN NULL
      ELSE jsonb_build_object(
        'channel', v_alert_channel,
        'severity', v_alert_severity,
        'title', v_alert_title,
        'message', v_alert_message,
        'fields', v_alert_fields
      )
    END
  );

EXCEPTION WHEN OTHERS THEN
  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;
  UPDATE raw_ops.monitoring_runs
  SET status = 'error', duration_ms = v_duration_ms, error_message = SQLERRM
  WHERE id = v_run_id;
  RETURN jsonb_build_object(
    'success', false, 'run_id', v_run_id,
    'check_name', 'proof_system_health', 'error', SQLERRM, 'duration_ms', v_duration_ms
  );
END;
$$;
```
Alert Thresholds¶
| Condition | Severity | Channel | Response |
|---|---|---|---|
| Dead letter jobs > 0 | Critical | #ops-urgent | Immediate. Proof generation blocked for batch. |
| Stuck claimed jobs > 0 (>5 min) | Critical | #ops-urgent | Edge Function may be hanging. Check logs. |
| Partial completion > 0 | Warning | #ops-alerts | Some steps OK, others failed. Verify assets. |
| Failed jobs > 3 | Warning | #ops-alerts | Elevated retries. Check external service health. |
| Avg duration > 60,000ms | Warning | #ops-alerts | Degradation. Check Browserless/Resend latency. |
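When the monitor returns an `alert_config`, ops-alerter forwards it to Slack. A sketch of that mapping, under stated assumptions: the `alert_config` shape matches what `fn_check_proof_system_health_v2()` returns, but the Block Kit layout and icon choices here are illustrative, not the deployed ops-alerter payload.

```typescript
// Map the monitor's alert_config to a Slack webhook payload (Block Kit).
type AlertField = { label: string; value: string; inline: boolean };
type AlertConfig = { channel: string; severity: string; title: string; message: string; fields: AlertField[] };

export function toSlackPayload(cfg: AlertConfig): { text: string; blocks: unknown[] } {
  const icon = cfg.severity === "critical" ? "🚨" : "⚠️";
  return {
    // Fallback text for notifications; blocks carry the structured layout.
    text: `${icon} ${cfg.title}: ${cfg.message}`,
    blocks: [
      { type: "header", text: { type: "plain_text", text: `${icon} ${cfg.title}` } },
      { type: "section", text: { type: "mrkdwn", text: cfg.message } },
      { type: "section", fields: cfg.fields.map((f) => ({ type: "mrkdwn", text: `*${f.label}*\n${f.value}` })) },
    ],
  };
}
```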
Metabase Dashboard: Proof System Health¶
Collection: "SOP-PROOF-00 - Operations"
Saved Questions:
- Job Queue Status (real-time): state distribution, last 7 days
- Processing Duration Trend (7-day): avg/max ms per day, jobs completed
- Dead Letter and Failure History (30-day): DLQ, retrying, resolved counts by date
- Partial Completion Detail (current): jobs with mixed step completion flags
- Batch Creation Volume (12-week): weekly counts by status (RELEASED, QA_HOLD, REJECTED)
Customer Access Methods¶
1. Pouch QR Code Scan (Primary)¶
Customer scans QR on any pouch label. Mobile browser opens https://proof.protocolraw.co.uk/batch/{public_batch_id}. Shows full lab results (if RELEASED) or testing-in-progress message (if QA_HOLD).
2. Insert Card QR Scan (Onboarding)¶
Customer scans QR on insert card. Opens https://portal.protocolraw.co.uk?view=transition. Shows personalised 10-day transition feeding plan.
3. Direct Link¶
Included in delivery SMS, order confirmation email, and customer portal.
4. Manual Batch Lookup¶
Search at proof.protocolraw.co.uk. See SOP-PROOF-01.
Timeline¶
| Phase | What Happens |
|---|---|
| Day 0 (Batch Creation) | QR code, label PDF, co-packer email. Batch in QA_HOLD. |
| Days 0-3 | Product manufactured with printed labels. Samples at lab. |
| Day 3-5 | Lab results arrive. SOP-LAB-01 releases batch. |
| Day 3-5+ | Pouch QR links to live proof page with results. |
Storage¶
Buckets¶
| Path | Contents | Access |
|---|---|---|
| proof-assets/qr-codes/ | QR code PNGs | Public read |
| proof-assets/batch-labels/ | Pouch label PDFs | Authenticated only |
| proof-assets/lab-reports/ | Lab certificate PDFs (SOP-LAB-01) | Authenticated only |
Label PDF access: Labels contain production details. Authenticated access only to prevent competitive intelligence exposure at scale. Co-packer receives label via email link. Ops team accesses via portal.
Storage Projections (Annual at Scale)¶
| Asset | Per Item | Annual Volume | Total |
|---|---|---|---|
| QR codes | 10 KB | ~25,000 | 250 MB |
| Label PDFs | 500 KB | ~25,000 | 12.5 GB |
| Lab reports | 2 MB | ~25,000 | 50 GB |
| Total | | | ~63 GB/year |
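The total can be sanity-checked with decimal units (matching the table's MB/GB figures):

```typescript
// Annual storage projection arithmetic, decimal units as in the table above.
const KB = 1e3, MB = 1e6;
export const annualBytes =
  25_000 * 10 * KB +  // QR codes    → 0.25 GB
  25_000 * 500 * KB + // label PDFs  → 12.5 GB
  25_000 * 2 * MB;    // lab reports → 50 GB
export const annualGB = annualBytes / 1e9; // 62.75, i.e. "~63 GB/year"
```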
Required Secrets¶
| Secret | Purpose | Used By |
|---|---|---|
| PROOF_WORKER_SECRET | Shared secret for internal worker auth | process-proof-jobs (validates), pg_cron (sends via header) |
| BROWSERLESS_API_KEY | Label PDF generation | process-proof-jobs |
| RESEND_API_KEY | Co-packer email delivery | process-proof-jobs |
| COPACKER_EMAIL | Co-packer email address | process-proof-jobs |
| SLACK_OPS_ALERTS_WEBHOOK | Standard alerts | ops-alerter |
| SLACK_OPS_URGENT_WEBHOOK | Critical alerts | ops-alerter |
PROOF_WORKER_SECRET setup: Must be configured in two places: (1) Supabase Edge Function secrets, and (2) Supabase database settings as app.settings.proof_worker_secret so pg_cron can read it. Generate with openssl rand -hex 32.
Rotation: Review quarterly. Rotate immediately if compromised. Zero-downtime rotation via Supabase secrets update + redeploy.
Batch Code System¶
Dual ID System¶
Internal: PR-YYMMDD-NNN (sequential, for ops/warehouse/co-packer)
Public: PR-XXXXXXXX (SHA-256 hash, for QR codes/proof URLs/customers)
Public ID protects competitive intelligence (no date or volume information). Generated by trigger_set_public_batch_id on batch INSERT.
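The authoritative derivation lives in the `trigger_set_public_batch_id` database trigger; the exact hash input, salt, and truncation there may differ from this sketch. The TypeScript below only makes the `PR-XXXXXXXX` shape concrete: a SHA-256 digest, truncated, with no date or volume information recoverable.

```typescript
import { createHash } from "node:crypto";

// Illustrative public-ID derivation: hash the internal code with a secret salt
// (hypothetical input format), keep the first 8 hex characters, uppercase.
export function publicBatchId(internalCode: string, salt: string): string {
  const digest = createHash("sha256").update(`${internalCode}:${salt}`).digest("hex");
  return `PR-${digest.slice(0, 8).toUpperCase()}`;
}
```

Because the digest depends on a salt, a customer holding `PR-XXXXXXXX` cannot work backwards to the internal `PR-YYMMDD-NNN` code, which is the competitive-intelligence property the dual-ID system exists for.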
Operational Procedures¶
Creating a Batch¶
When: Co-packer confirms production. Where: ops.protocolraw.co.uk. Time: ~30 seconds.
- Enter production date + kg produced
- Click "Create Batch"
- Confirm: batch code, proof URL in response
- Within 60 seconds: QR, label, email should complete (check proof_jobs)
Weekly Review (10 min)¶
- Check Metabase dashboard
- Review any dead letter or failed jobs
- Scan 2-3 random QR codes to verify proof pages
Monthly Review (15 min)¶
- Review weekly trends
- Verify label print quality with co-packer
- Check storage costs (~$1-2/month)
- Confirm SSL auto-renewal (Cloudflare)
Troubleshooting¶
Batch Creation Fails¶
Check create-batch Edge Function logs. Check ops_events for errors. Common: no active formulation, no product record, invalid date.
Proof Job Stuck¶
```sql
SELECT id, state, attempts, claimed_at, last_error, qr_generated, label_generated, email_sent
FROM raw_ops.proof_jobs
WHERE state IN ('queued', 'claimed') AND created_at < now() - INTERVAL '5 minutes';
```
If claimed >10 min: reset to queued. If queued with 0 attempts: check pg_cron job is active.
Partial Completion (Label Missing)¶
Check Browserless API key. Check HTML template. Reset job for retry:
```sql
UPDATE raw_ops.proof_jobs
SET state = 'queued', attempts = 0, claimed_at = NULL, last_error = NULL, updated_at = now()
WHERE batch_id = (SELECT id FROM raw_ops.batches WHERE batch_code = 'PR-YYMMDD-NNN');
```
Dead Letter Job¶
Check error details. Fix root cause. Reset for retry or resolve manually:
```sql
-- Reset for retry
UPDATE raw_ops.proof_jobs
SET state = 'queued', attempts = 0, claimed_at = NULL, last_error = NULL, updated_at = now()
WHERE id = 'job-uuid';

-- Or resolve manually
SELECT raw_ops.fn_resolve_proof_job_manually_v1('job-uuid', 'Resolution note here');
```
Proof Page Not Found¶
Check batch status. QA_HOLD = expected (testing). REJECTED = expected (failed). Verify RLS policy exists.
Compliance and Audit Trail¶
- Traceability: Every pouch traceable to batch via QR
- Transparency: Lab results publicly accessible after release
- Immutability: Batch data changes only via defined functions
- Audit Trail: All changes logged to batch_status_history and ops_events
- Recall: See SOP-LAB-01 for recall procedure
Data Retention¶
| Data | Retention | Reason |
|---|---|---|
| Batches | Indefinite | Regulatory |
| Lab Results | Indefinite | Regulatory |
| QR Codes | Indefinite | Customer access |
| Label PDFs | 2 years | Operational |
| Proof Jobs | 90 days | Monitoring |
| ops_events | Indefinite | Audit trail |
Scaling¶
| Phase | Batches/Day | Performance | Action |
|---|---|---|---|
| A (0-1K customers) | 1-3 | <60s per job | Monitor metrics |
| B (1-10K) | 5-15 | <60s per job | No changes expected |
| C (10-100K) | 30-70 | Parallel via SKIP LOCKED | Bulk creation UI, monitor Browserless concurrency |
| 100K+ | 70+ | Review needed | Upgrade Supabase, read replicas |
No claims of readiness at scale are made until validated with production data.
All Objects¶
| Object | Type | Purpose |
|---|---|---|
| raw_ops.proof_jobs | Table | State machine |
| raw_ops.formulations | Table | Formulation versions |
| raw_ops.proof_page_events | Table | Portal analytics |
| raw_ops.fn_create_batch_v1() | Function | Batch creation logic |
| raw_ops.fn_enqueue_proof_job() | Trigger function | Enqueue on INSERT |
| raw_ops.fn_claim_proof_jobs_v1() | Function | Atomic claim (SKIP LOCKED) |
| raw_ops.fn_set_batch_qr_assets_v1() | Function | Set QR + proof URL on batch |
| raw_ops.fn_set_batch_label_asset_v1() | Function | Set label PDF URL on batch |
| raw_ops.fn_finalize_proof_job_v1() | Function | Mark done, update batch |
| raw_ops.fn_mark_proof_job_failed_v1() | Function | Failure + DLQ |
| raw_ops.fn_resolve_proof_job_manually_v1() | Function | DLQ resolution (with asset check) |
| raw_ops.fn_check_proof_system_health_v2() | Function | SOP-MON-01 monitor |
| trigger_enqueue_proof_job | Trigger | AFTER INSERT on batches |
| trigger_set_public_batch_id | Trigger | Existing, unchanged |
| create-batch | Edge Function | Thin wrapper |
| process-proof-jobs | Edge Function | Queue processor (shared secret auth) |
| proof-page | Edge Function | Public pages (service_role for status rendering) |
| process-proof-jobs | pg_cron job | Every 30 seconds |
| monitor-proof-system-health | pg_cron job | Every 5 minutes |
| idx_proof_jobs_claimable | Index | Queue performance |
| idx_proof_jobs_batch_id | Index | Batch lookup |
| idx_proof_jobs_dead_letter | Index | DLQ monitoring |
| Allow public read on released batches | RLS Policy | Search API access |
| PROOF_WORKER_SECRET | Secret | Internal worker auth |
Contact and Escalation¶
Operations Owner: Anton (Founder)
Technical Owner: Anton (Systems Architect)

Escalation:
1. Check Slack #ops-alerts and #ops-urgent
2. Check Metabase dashboard
3. Review Edge Function logs
4. Query proof_jobs for diagnostics
5. Check external service status (Resend, Browserless, Cloudflare)
6. If unresolved after 30 min: pause batch creation
Version History¶
| Version | Date | Changes | Author |
|---|---|---|---|
| SOP-PROOF-00 v1.0 | 2026-03-20 | Complete rewrite replacing SOP-0R v4.2. Architecture rebuilt to Supabase-native pattern. Named execution chain, state machine, retry/DLQ, SOP-MON-01 monitoring, partial completion detection, RLS policies, auth model, QR internalised, honest scaling claims. | Protocol Raw Operations |
Superseded: SOP-0R v4.2 (2026-01-21) — fully replaced.
End of SOP-PROOF-00 v1.0
Last reviewed: 2026-03-20 Next review: 2026-06-20 System status: ✅ Production Ready
Protocol Raw — Verified safe, batch by batch