SOP-PROOF-00: Proof-of-Safety System v1.0¶
End-to-end proof generation, label production, and public verification for every batch
Document ID: SOP-PROOF-00-v1.0
Version: 1.0
Status: ✅ Production Ready
Last Updated: 2026-03-20
Owner: Anton (Founder)
Replaces: SOP-0R v4.2
Review Date: 2026-06-20
Key Changes from SOP-0R v4.2¶
- ✅ Complete architecture rewrite to mandated Supabase-native pattern (pg_cron → pg_net → Edge Function → PostgreSQL function → ops-alerter)
- ✅ "Background worker" eliminated — replaced with named execution chain through process-proof-jobs Edge Function and PostgreSQL state machine
- ✅ Proof job state machine fully specified — fn_create_batch_v1(), fn_claim_proof_jobs_v1(), fn_finalize_proof_job_v1(), fn_mark_proof_job_failed_v1()
- ✅ Queue-claiming mechanics defined — FOR UPDATE SKIP LOCKED with LIMIT 10 for concurrent-safe, bounded processing
- ✅ Dead-letter queue — 5 attempts max, then dead_letter state with critical Slack alert
- ✅ Retry policy for all external dependencies — QR generation (internal, no external API), Browserless (3 attempts, 15s timeout), Resend (3 attempts, 10s timeout)
- ✅ QR generation internalised — qrcode Deno library replaces third-party api.qrserver.com dependency
- ✅ Partial completion detection — monitoring catches batches where some steps succeeded but others failed
- ✅ SOP-MON-01 integration — fn_check_proof_system_health_v2() monitor on 5-minute schedule
- ✅ RLS policies specified in full — not just named, but defined with CREATE POLICY statements
- ✅ Auth model documented for all operational Edge Functions
- ✅ ops_events logging at every state transition
- ✅ Honest operational claims — removed unsupported "zero manual intervention" assertions
- ❌ Removed "Production Ready for 100,000+ Customers" pending real-world validation at scale
Purpose¶
This SOP defines the complete proof-of-safety system: from batch creation through QR code generation, pouch label production, co-packer delivery, and public proof page verification. It is the operational backbone of Protocol Raw's "Proof, Not Promises" differentiator.
Every batch gets a unique, publicly accessible proof page with independently verified lab results. Customers scan a QR code on their pouch and see the evidence. This system generates the QR codes, produces the labels, and delivers them to the co-packer automatically.
This SOP does not cover:
- Lab result processing and batch release (see SOP-LAB-01)
- Proof portal frontend design and copy (see SOP-PROOF-01)
- Batch allocation and fulfilment (see SOP-INV-01, SOP-ORD-02)
Scope¶
Applies to: All batches created via the Operations Portal
This SOP owns:
- fn_create_batch_v1() — because proof generation begins at batch INSERT. This function is called from the Ops Portal via the create-batch Edge Function. SOP-LAB-01 owns the Ops Portal UI, operational procedures for when to create batches, and all downstream lab processing
- Proof job queue and state machine (queuing, claiming, step execution, retry, dead letter)
- QR code generation for pouch labels
- Pouch label PDF generation (Browserless)
- Co-packer email delivery (Resend)
- Batch asset URL updates (fn_set_batch_qr_assets_v1, fn_set_batch_label_asset_v1)
- Monitoring of the proof generation pipeline
This SOP does NOT own:
- When to create batches or Ops Portal batch creation UI (SOP-LAB-01)
- Lab result processing, batch release/rejection logic (SOP-LAB-01)
- Proof portal page design, copy, analytics, share functionality (SOP-PROOF-01)
- Order allocation or fulfilment (SOP-INV-01, SOP-ORD-02)

Users:
- Operations team (batch creation via Ops Portal)
- Automated systems (proof generation, label generation, co-packer email)
- Monitoring and alerts (see SOP-MON-01)
- Customers (proof verification via QR scan)

Related SOPs:
- SOP-LAB-01: Batch Creation and Lab-to-Release (upstream — lab processing, batch release)
- SOP-PROOF-01: Proof Portal System (downstream — frontend display, analytics, caching)
- SOP-MON-01: Monitoring and Alerting Architecture (monitoring pattern)
- SOP-INV-01: Inventory Control (allocation after release)
- SOP-ORD-02: 3PL Order Export (fulfilment after allocation)
Architecture¶
Complete Execution Chain¶
Ops Portal UI → create-batch Edge Function (thin wrapper)
↓
raw_ops.fn_create_batch_v1() (PostgreSQL — all batch creation logic)
↓
raw_ops.batches INSERT fires trigger
↓
raw_ops.fn_enqueue_proof_job() (trigger function — creates proof_jobs record)
↓
pg_cron (every 30 seconds)
↓
pg_net (async HTTP)
↓
process-proof-jobs Edge Function (thin wrapper — external I/O only)
↓
raw_ops.fn_claim_proof_jobs_v1() (claim queued jobs, FOR UPDATE SKIP LOCKED)
↓
Edge Function executes external steps:
1. Generate QR code (internal library, no external API)
2. Upload QR to Supabase Storage
3. Generate label PDF (Browserless API)
4. Upload label PDF to Supabase Storage
5. Send co-packer email (Resend API)
↓
raw_ops.fn_finalize_proof_job_v1() (mark done, update batch record)
↓
ops_events audit log
Failure Chain¶
External step fails → retry within Edge Function (per-dependency policy)
↓
All retries exhausted for this invocation → Edge Function calls:
↓
raw_ops.fn_mark_proof_job_failed_v1() (increment attempts, log error)
↓
Next pg_cron cycle picks up job again (still in retryable state)
↓
After 5 total attempts → fn_mark_proof_job_failed_v1() sets state = 'dead_letter'
↓
ops-alerter → Slack #ops-urgent (critical severity)
↓
Manual resolution via fn_resolve_proof_job_manually_v1()
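The per-dependency retry policy in the failure chain can be sketched as a small policy table plus a backoff calculator. This is an illustrative sketch, not the deployed code: the names `RetryPolicy`, `POLICIES`, and `backoffDelaysMs` are assumptions; the constants mirror the retry policy stated in this SOP (Browserless and Resend: 3 retries with 1s/3s/9s backoff; Storage uploads: 2 retries with 1s/3s backoff).

```typescript
// Illustrative sketch of the per-dependency retry policy described in this SOP.
interface RetryPolicy {
  retries: number;     // retries after the initial attempt
  timeoutMs: number;   // per-attempt timeout
  baseDelayMs: number; // delay before first retry; each subsequent delay triples
}

const POLICIES: Record<string, RetryPolicy> = {
  browserless: { retries: 3, timeoutMs: 15_000, baseDelayMs: 1_000 }, // 1s, 3s, 9s
  resend:      { retries: 3, timeoutMs: 10_000, baseDelayMs: 1_000 }, // 1s, 3s, 9s
  storage:     { retries: 2, timeoutMs: 10_000, baseDelayMs: 1_000 }, // 1s, 3s
};

// Exponential backoff with factor 3, matching the SOP's stated delay sequence.
function backoffDelaysMs(p: RetryPolicy): number[] {
  return Array.from({ length: p.retries }, (_, i) => p.baseDelayMs * 3 ** i);
}
```

Note that these retries happen inside a single Edge Function invocation; exhausting them counts as one of the job's 5 overall attempts.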
Monitoring Architecture (SOP-MON-01 compliant)¶
pg_cron (every 5 minutes)
↓
pg_net (async HTTP)
↓
run-monitor Edge Function
↓
raw_ops.fn_check_proof_system_health_v2()
↓
monitoring_runs (execution log)
↓ (if threshold breached)
ops-alerter Edge Function
↓
Slack (#ops-alerts or #ops-urgent)
Database Schema¶
All tables in raw_ops schema. RLS enabled on all tables.
Core Tables¶
| Table | Purpose | RLS Policy |
|---|---|---|
| batches | Master batch records | Service role only + public read on RELEASED batches |
| lab_results | Lab test results per batch | Service role only |
| formulations | Formulation version tracking | Service role only |
| proof_jobs | Proof generation queue and state machine | Service role only |
| batch_status_history | Audit log for all batch status changes | Service role only |
| proof_page_events | Proof portal analytics | Anon INSERT + SELECT (public portal) |
proof_jobs Table¶
CREATE TABLE raw_ops.proof_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
batch_id UUID NOT NULL REFERENCES raw_ops.batches(id),
job_key TEXT NOT NULL UNIQUE,
state TEXT NOT NULL DEFAULT 'queued',
attempts INTEGER NOT NULL DEFAULT 0,
max_attempts INTEGER NOT NULL DEFAULT 5,
last_error TEXT,
error_category TEXT,
processing_duration_ms INTEGER,
-- Step completion tracking (partial completion detection)
qr_generated BOOLEAN NOT NULL DEFAULT false,
qr_uploaded BOOLEAN NOT NULL DEFAULT false,
label_generated BOOLEAN NOT NULL DEFAULT false,
label_uploaded BOOLEAN NOT NULL DEFAULT false,
email_sent BOOLEAN NOT NULL DEFAULT false,
-- Timestamps
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
claimed_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
CONSTRAINT valid_state CHECK (state IN ('queued', 'claimed', 'done', 'failed', 'dead_letter', 'resolved_manual')),
CONSTRAINT valid_attempts CHECK (attempts >= 0 AND attempts <= max_attempts)
);
COMMENT ON TABLE raw_ops.proof_jobs IS 'Proof generation state machine. Each row represents one proof generation job for a batch. Processed by process-proof-jobs Edge Function via pg_cron.';
Processing Status Values¶
| Status | Meaning | Transition |
|---|---|---|
| queued | Awaiting processing | → claimed |
| claimed | Picked up by process-proof-jobs | → done or failed |
| done | All steps completed successfully | Terminal |
| failed | Processing failed, eligible for retry | → claimed (on next cycle) or dead_letter |
| dead_letter | 5 attempts exhausted, manual intervention required | → resolved_manual or → queued (reset) |
| resolved_manual | Manually resolved via controlled function | Terminal |
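The transition column above can be read as an adjacency map. The sketch below is illustrative (the `ALLOWED` map and `isValidTransition` helper are not part of the deployed system, where transitions are enforced inside the named PostgreSQL functions); it includes the claimed → dead_letter edge taken by fn_mark_proof_job_failed_v1() when a claimed job fails on its final attempt.

```typescript
// Illustrative encoding of the proof_jobs state machine from the table above.
type JobState =
  | "queued" | "claimed" | "done" | "failed" | "dead_letter" | "resolved_manual";

const ALLOWED: Record<JobState, JobState[]> = {
  queued: ["claimed"],
  claimed: ["done", "failed", "dead_letter"], // dead_letter when the final attempt fails
  done: [],                                   // terminal
  failed: ["claimed", "dead_letter"],
  dead_letter: ["resolved_manual", "queued"], // manual resolution or reset
  resolved_manual: [],                        // terminal
};

function isValidTransition(from: JobState, to: JobState): boolean {
  return ALLOWED[from].includes(to);
}
```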
formulations Table¶
CREATE TABLE raw_ops.formulations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
version TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
description TEXT,
effective_from DATE NOT NULL,
effective_until DATE,
shopify_page_url TEXT NOT NULL,
lab_certificate_pdf_url TEXT,
lab_name TEXT,
certificate_reference TEXT,
analysis_date DATE,
fediaf_verified BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE raw_ops.formulations IS 'Formulation versions with effective date ranges. Batches auto-link to the active formulation based on production date.';
Key Indexes¶
-- Proof job queue processing (critical for claim query performance)
CREATE INDEX idx_proof_jobs_claimable
ON raw_ops.proof_jobs (created_at ASC)
WHERE state IN ('queued', 'failed');
-- Proof job by batch (lookup during batch creation verification)
CREATE INDEX idx_proof_jobs_batch_id
ON raw_ops.proof_jobs (batch_id);
-- Proof job dead letter monitoring
CREATE INDEX idx_proof_jobs_dead_letter
ON raw_ops.proof_jobs (created_at DESC)
WHERE state = 'dead_letter';
-- Batch code lookup (used during production and customer scans)
CREATE INDEX idx_batches_batch_code
ON raw_ops.batches (batch_code);
-- Batch status filtering for public access and monitoring
CREATE INDEX idx_batches_status
ON raw_ops.batches (status);
-- Lab result lookups by batch
CREATE INDEX idx_lab_results_batch_id
ON raw_ops.lab_results (batch_id);
RLS Policies¶
-- All operational tables: RLS enabled, no policies = service role only, anon blocked
ALTER TABLE raw_ops.proof_jobs ENABLE ROW LEVEL SECURITY;
ALTER TABLE raw_ops.formulations ENABLE ROW LEVEL SECURITY;
ALTER TABLE raw_ops.batch_status_history ENABLE ROW LEVEL SECURITY;
-- Batches: public read access for RELEASED batches only (proof portal)
ALTER TABLE raw_ops.batches ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Allow public read on released batches"
ON raw_ops.batches
FOR SELECT
TO anon
USING (status = 'RELEASED');
-- Proof page events: public insert + select (analytics from proof portal)
ALTER TABLE raw_ops.proof_page_events ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Allow anon insert proof events"
ON raw_ops.proof_page_events
FOR INSERT
TO anon
WITH CHECK (true);
CREATE POLICY "Allow anon read proof events"
ON raw_ops.proof_page_events
FOR SELECT
TO anon
USING (true);
RLS posture: With RLS enabled and no policies, anon key queries return zero rows. Service role bypasses RLS entirely. This is the correct posture for backend-only operational tables. The two exceptions are batches (public read for RELEASED only) and proof_page_events (public insert/read for portal analytics).
QR Code System¶
The Protocol Raw system uses two distinct types of QR codes, each with a different purpose.
| QR Type | Location | Links To | Generated |
|---|---|---|---|
| Proof Verification QR | Product pouch labels | proof.protocolraw.co.uk/batch/{public_batch_id} | Per-batch, during proof job |
| Transition Guide QR | In-box insert card | portal.protocolraw.co.uk?view=transition | Static, bulk printed |
Proof Verification QR (On Pouches)¶
Batch-specific QR codes generated automatically during proof job processing.
Content: https://proof.protocolraw.co.uk/batch/{public_batch_id}
Storage: proof-assets/qr-codes/{batch_code}.png
Size on label: 25mm x 25mm (300 DPI)
Generation: Internal library (qrcode Deno module) inside process-proof-jobs Edge Function. No external API dependency.
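The URL and storage-path conventions above can be sketched as a small helper. `buildProofAssets` is illustrative only (the SOP does not define such a function); the URL format and storage path mirror the conventions stated in this section.

```typescript
// Illustrative helper encoding the proof QR conventions from this section.
function buildProofAssets(publicBatchId: string, batchCode: string) {
  return {
    // Content encoded in the pouch QR
    proofUrl: `https://proof.protocolraw.co.uk/batch/${publicBatchId}`,
    // Supabase Storage object path for the generated PNG
    qrStoragePath: `proof-assets/qr-codes/${batchCode}.png`,
  };
}
```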
Transition Guide QR (On Insert Card)¶
Universal QR code, same for all boxes.
Content: https://portal.protocolraw.co.uk?view=transition
Purpose: Onboarding. Personalised 10-day transition feeding plan.
Generation: Static. Not batch-specific. Same QR on all cards.
Design Principle¶
The insert card educates and onboards. The pouch QR verifies. These are separate concerns with separate touchpoints.
Physical Insert Cards¶
Insert cards are universal static cards, bulk printed, with identical content for every box. The proof_insert_pdf_url column is deprecated.
Card Purpose¶
- Onboarding new customers with transition guidance
- Explaining the proof verification system
- Driving portal engagement for personalised feeding plans
Card Design: Front Side (Transition)¶
Headline: "Your First 10 Days"
Content:
- Visual transition structure (Day 1-2: 25%, Day 3-4: 50%, etc.)
- QR code linking to portal.protocolraw.co.uk?view=transition
- CTA: "Scan for your personalised feeding plan"
- Safe handling reminder
Card Design: Back Side (Proof Education)¶
Headline: "Verified Safe, Batch by Batch"
Content:
- Explains that every pouch has a QR code linking to lab results
- Points customer to scan pouch QR (no separate proof QR on card)
- CTA: "Scan any pouch to verify"
- Brief explanation of independent lab testing
Card Specifications¶
| Attribute | Specification |
|---|---|
| Size | TBD by designer (A7 or business card format recommended) |
| QR Codes | ONE QR only (front side) linking to portal transition guide |
| Batch Code | NONE. Card is universal (same card for all boxes) |
| Print Run | Bulk print. Order in batches of 1,000+ (NOT generated per-batch) |
| Delivery | Stocked at packing location, one card per box |
| Brand Guidelines | Visual Identity Guide v2.4 (Forest Green, Cream, Espresso) |
PostgreSQL Functions¶
fn_create_batch_v1()¶
Purpose: All batch creation business logic. Called by create-batch Edge Function.
Responsibilities:
- Validate inputs (production_date, kg_produced)
- Generate batch code (PR-YYMMDD-NNN format, next sequential number for date)
- Look up active formulation for production date
- Look up COGS per kg from config
- Calculate expiry date (production + 12 months)
- INSERT batch record with status QA_HOLD
- Public batch ID generated by existing trigger (trigger_set_public_batch_id)
- Generate proof URL
- Log to ops_events
- Return batch record as JSONB
CREATE OR REPLACE FUNCTION raw_ops.fn_create_batch_v1(
p_production_date DATE,
p_kg_produced NUMERIC
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_code TEXT;
v_seq INTEGER;
v_formulation_id UUID;
v_product_id UUID;
v_cogs_per_kg NUMERIC;
v_batch_id UUID;
v_public_batch_id TEXT;
v_proof_url TEXT;
v_expiry_date DATE;
BEGIN
-- Validate inputs
IF p_production_date IS NULL THEN
RAISE EXCEPTION 'production_date is required';
END IF;
IF p_kg_produced IS NULL OR p_kg_produced <= 0 THEN
RAISE EXCEPTION 'kg_produced must be positive';
END IF;
-- Generate sequential batch code for date
SELECT COALESCE(MAX(
CAST(SUBSTRING(batch_code FROM '\d{3}$') AS INTEGER)
), 0) + 1
INTO v_seq
FROM raw_ops.batches
WHERE batch_code LIKE 'PR-' || TO_CHAR(p_production_date, 'YYMMDD') || '-%';
v_batch_code := 'PR-' || TO_CHAR(p_production_date, 'YYMMDD') || '-' || LPAD(v_seq::TEXT, 3, '0');
-- Look up active formulation
SELECT id INTO v_formulation_id
FROM raw_ops.formulations
WHERE effective_from <= p_production_date
AND (effective_until IS NULL OR effective_until >= p_production_date)
ORDER BY effective_from DESC
LIMIT 1;
IF v_formulation_id IS NULL THEN
RAISE EXCEPTION 'No active formulation found for date %', p_production_date;
END IF;
-- Look up product (single SKU)
SELECT id INTO v_product_id
FROM raw_ops.products
LIMIT 1;
-- Look up COGS per kg from config
SELECT value::NUMERIC INTO v_cogs_per_kg
FROM raw_ops.config
WHERE key = 'default_cogs_per_kg';
-- Calculate expiry
v_expiry_date := p_production_date + INTERVAL '12 months';
-- Insert batch
INSERT INTO raw_ops.batches (
batch_code, produced_at, expiry_date, kg_produced,
product_id, formulation_id, status, cogs_per_kg_gbp
) VALUES (
v_batch_code, p_production_date, v_expiry_date, p_kg_produced,
v_product_id, v_formulation_id, 'QA_HOLD', v_cogs_per_kg
)
RETURNING id, public_batch_id INTO v_batch_id, v_public_batch_id;
-- Set proof URL (public_batch_id set by trigger_set_public_batch_id)
v_proof_url := 'https://proof.protocolraw.co.uk/batch/' || v_public_batch_id;
UPDATE raw_ops.batches
SET proof_url = v_proof_url
WHERE id = v_batch_id;
-- Audit log
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', v_batch_id, 'INFO', 'Batch created via Ops Portal',
jsonb_build_object(
'batch_code', v_batch_code,
'public_batch_id', v_public_batch_id,
'kg_produced', p_kg_produced,
'production_date', p_production_date,
'formulation_id', v_formulation_id
)
);
RETURN jsonb_build_object(
'success', true,
'batch_id', v_batch_id,
'batch_code', v_batch_code,
'public_batch_id', v_public_batch_id,
'proof_url', v_proof_url,
'expiry_date', v_expiry_date
);
EXCEPTION WHEN OTHERS THEN
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', NULL, 'ERROR', 'Batch creation failed: ' || SQLERRM,
jsonb_build_object('production_date', p_production_date, 'kg_produced', p_kg_produced)
);
RETURN jsonb_build_object('success', false, 'error', SQLERRM);
END;
$$;
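The PR-YYMMDD-NNN batch code convention implemented in the SQL above can be sketched in TypeScript for reference. `formatBatchCode` is illustrative only; the authoritative implementation is the SQL in fn_create_batch_v1().

```typescript
// Format a batch code as PR-YYMMDD-NNN, mirroring fn_create_batch_v1().
// Illustrative helper; not part of the deployed system.
function formatBatchCode(productionDate: Date, seq: number): string {
  const yy = String(productionDate.getUTCFullYear() % 100).padStart(2, "0");
  const mm = String(productionDate.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(productionDate.getUTCDate()).padStart(2, "0");
  return `PR-${yy}${mm}${dd}-${String(seq).padStart(3, "0")}`;
}
```

For example, the first batch produced on 2026-03-20 gets the code PR-260320-001.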
fn_enqueue_proof_job()¶
Purpose: Trigger function. Creates a proof job record when a batch is inserted.
CREATE OR REPLACE FUNCTION raw_ops.fn_enqueue_proof_job()
RETURNS TRIGGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
INSERT INTO raw_ops.proof_jobs (batch_id, job_key, state)
VALUES (NEW.id, 'proof:batch:' || NEW.id::TEXT, 'queued');
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', NEW.id, 'INFO', 'Proof job enqueued',
jsonb_build_object('batch_code', NEW.batch_code, 'batch_id', NEW.id)
);
RETURN NEW;
END;
$$;
CREATE TRIGGER trigger_enqueue_proof_job
AFTER INSERT ON raw_ops.batches
FOR EACH ROW
EXECUTE FUNCTION raw_ops.fn_enqueue_proof_job();
fn_claim_proof_jobs_v1()¶
Purpose: Atomically claim up to N queued or failed proof jobs for processing. Returns the exact rows claimed, deterministically, via UPDATE ... RETURNING. Uses FOR UPDATE SKIP LOCKED for concurrent safety. No secondary query needed.
CREATE OR REPLACE FUNCTION raw_ops.fn_claim_proof_jobs_v1(
p_limit INTEGER DEFAULT 10
)
RETURNS SETOF raw_ops.proof_jobs
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
RETURN QUERY
WITH claimable AS (
SELECT id
FROM raw_ops.proof_jobs
WHERE state IN ('queued', 'failed')
AND attempts < max_attempts
ORDER BY
CASE WHEN state = 'queued' THEN 0 ELSE 1 END, -- queued first, retries second
created_at ASC
LIMIT p_limit
FOR UPDATE SKIP LOCKED
)
UPDATE raw_ops.proof_jobs pj
SET
state = 'claimed',
claimed_at = now(),
updated_at = now()
FROM claimable c
WHERE pj.id = c.id
RETURNING pj.*;
END;
$$;
Concurrency guarantee: If two workers call this function simultaneously, FOR UPDATE SKIP LOCKED ensures each worker claims a disjoint set of jobs. No duplicate processing is possible. The RETURNING pj.* clause returns exactly the rows that were updated in this transaction, not an approximation.
fn_set_batch_qr_assets_v1()¶
Purpose: Update a batch record with QR image URL and proof URL after QR generation. Called by process-proof-jobs Edge Function after successful QR upload. Ensures batch asset mutations happen in PostgreSQL, not in Edge Function code.
CREATE OR REPLACE FUNCTION raw_ops.fn_set_batch_qr_assets_v1(
p_batch_id UUID,
p_qr_image_url TEXT,
p_proof_url TEXT
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_code TEXT;
BEGIN
UPDATE raw_ops.batches
SET
qr_image_url = p_qr_image_url,
proof_url = p_proof_url,
updated_at = now()
WHERE id = p_batch_id
RETURNING batch_code INTO v_batch_code;
IF v_batch_code IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
END IF;
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', p_batch_id, 'INFO', 'QR assets set for batch ' || v_batch_code,
jsonb_build_object(
'batch_code', v_batch_code,
'qr_image_url', p_qr_image_url,
'proof_url', p_proof_url
)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;
fn_set_batch_label_asset_v1()¶
Purpose: Update a batch record with label PDF URL after label generation. Called by process-proof-jobs Edge Function after successful label upload.
CREATE OR REPLACE FUNCTION raw_ops.fn_set_batch_label_asset_v1(
p_batch_id UUID,
p_label_pdf_url TEXT
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_code TEXT;
BEGIN
UPDATE raw_ops.batches
SET
label_pdf_url = p_label_pdf_url,
updated_at = now()
WHERE id = p_batch_id
RETURNING batch_code INTO v_batch_code;
IF v_batch_code IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Batch not found');
END IF;
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('batch', p_batch_id, 'INFO', 'Label PDF set for batch ' || v_batch_code,
jsonb_build_object('batch_code', v_batch_code, 'label_pdf_url', p_label_pdf_url)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;
fn_finalize_proof_job_v1()¶
Purpose: Mark a proof job as successfully completed. Update the batch record with generated asset URLs.
CREATE OR REPLACE FUNCTION raw_ops.fn_finalize_proof_job_v1(
p_job_id UUID,
p_qr_image_url TEXT,
p_label_pdf_url TEXT,
p_processing_duration_ms INTEGER
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_batch_id UUID;
v_batch_code TEXT;
BEGIN
-- Update proof job
UPDATE raw_ops.proof_jobs
SET
state = 'done',
completed_at = now(),
updated_at = now(),
processing_duration_ms = p_processing_duration_ms,
qr_generated = true,
qr_uploaded = true,
label_generated = true,
label_uploaded = true,
email_sent = true
WHERE id = p_job_id AND state = 'claimed'
RETURNING batch_id INTO v_batch_id;
IF v_batch_id IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Job not found or not in claimed state');
END IF;
-- Update batch with asset URLs
UPDATE raw_ops.batches
SET
qr_image_url = p_qr_image_url,
label_pdf_url = p_label_pdf_url,
updated_at = now()
WHERE id = v_batch_id
RETURNING batch_code INTO v_batch_code;
-- Audit log
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', v_batch_id, 'INFO', 'Proof job completed successfully',
jsonb_build_object(
'batch_code', v_batch_code,
'job_id', p_job_id,
'duration_ms', p_processing_duration_ms,
'qr_image_url', p_qr_image_url,
'label_pdf_url', p_label_pdf_url
)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code);
END;
$$;
fn_mark_proof_job_failed_v1()¶
Purpose: Record a proof job failure. If max attempts reached, move to dead letter queue and trigger critical alert.
CREATE OR REPLACE FUNCTION raw_ops.fn_mark_proof_job_failed_v1(
p_job_id UUID,
p_error TEXT,
p_error_category TEXT DEFAULT 'unknown',
p_step_progress JSONB DEFAULT '{}'::jsonb
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_new_state TEXT;
v_attempts INTEGER;
v_max_attempts INTEGER;
v_batch_id UUID;
v_batch_code TEXT;
v_alert_needed BOOLEAN := false;
BEGIN
-- Increment attempts and determine new state
UPDATE raw_ops.proof_jobs
SET
attempts = attempts + 1,
last_error = p_error,
error_category = p_error_category,
updated_at = now(),
-- Preserve step progress so we know what completed before failure
qr_generated = COALESCE((p_step_progress->>'qr_generated')::boolean, qr_generated),
qr_uploaded = COALESCE((p_step_progress->>'qr_uploaded')::boolean, qr_uploaded),
label_generated = COALESCE((p_step_progress->>'label_generated')::boolean, label_generated),
label_uploaded = COALESCE((p_step_progress->>'label_uploaded')::boolean, label_uploaded),
email_sent = COALESCE((p_step_progress->>'email_sent')::boolean, email_sent)
WHERE id = p_job_id
RETURNING attempts, max_attempts, batch_id INTO v_attempts, v_max_attempts, v_batch_id;
IF v_batch_id IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Job not found');
END IF;
-- Determine state based on attempt count
IF v_attempts >= v_max_attempts THEN
v_new_state := 'dead_letter';
v_alert_needed := true;
ELSE
v_new_state := 'failed';
END IF;
UPDATE raw_ops.proof_jobs
SET state = v_new_state
WHERE id = p_job_id;
-- Get batch code for logging
SELECT batch_code INTO v_batch_code
FROM raw_ops.batches WHERE id = v_batch_id;
-- Audit log
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', v_batch_id,
CASE WHEN v_new_state = 'dead_letter' THEN 'ERROR' ELSE 'EXCEPTION' END,
CASE WHEN v_new_state = 'dead_letter'
THEN 'Proof job moved to dead letter queue after ' || v_attempts || ' attempts'
ELSE 'Proof job failed (attempt ' || v_attempts || '/' || v_max_attempts || ')'
END,
jsonb_build_object(
'batch_code', v_batch_code,
'job_id', p_job_id,
'error', p_error,
'error_category', p_error_category,
'attempts', v_attempts,
'max_attempts', v_max_attempts,
'step_progress', p_step_progress
)
);
RETURN jsonb_build_object(
'success', true,
'new_state', v_new_state,
'attempts', v_attempts,
'max_attempts', v_max_attempts,
'alert_needed', v_alert_needed,
'alert_config', CASE
WHEN NOT v_alert_needed THEN NULL
ELSE jsonb_build_object(
'channel', 'ops-urgent',
'severity', 'critical',
'title', 'Proof Job Dead Letter: ' || v_batch_code,
'message', 'Proof generation failed after ' || v_attempts || ' attempts. Manual intervention required.',
'fields', jsonb_build_array(
jsonb_build_object('label', 'Batch', 'value', v_batch_code, 'inline', true),
jsonb_build_object('label', 'Attempts', 'value', v_attempts::TEXT, 'inline', true),
jsonb_build_object('label', 'Error', 'value', LEFT(p_error, 200), 'inline', false),
jsonb_build_object('label', 'Category', 'value', p_error_category, 'inline', true)
)
)
END
);
END;
$$;
fn_resolve_proof_job_manually_v1()¶
Purpose: Controlled resolution of dead-letter proof jobs with audit trail. Verifies that all expected batch assets exist before allowing resolution. Direct UPDATE statements are prohibited.
CREATE OR REPLACE FUNCTION raw_ops.fn_resolve_proof_job_manually_v1(
p_job_id UUID,
p_resolution_note TEXT,
p_skip_asset_check BOOLEAN DEFAULT false
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_old_state TEXT;
v_batch_id UUID;
v_batch_code TEXT;
v_qr_image_url TEXT;
v_label_pdf_url TEXT;
v_proof_url TEXT;
v_missing_assets TEXT[];
BEGIN
-- Only accept dead_letter jobs
SELECT state, batch_id INTO v_old_state, v_batch_id
FROM raw_ops.proof_jobs WHERE id = p_job_id;
IF v_old_state IS NULL THEN
RETURN jsonb_build_object('success', false, 'error', 'Job not found');
END IF;
IF v_old_state != 'dead_letter' THEN
RETURN jsonb_build_object('success', false, 'error',
'Only dead_letter jobs can be manually resolved. Current state: ' || v_old_state);
END IF;
-- Verify batch assets exist (unless explicitly skipped)
SELECT batch_code, qr_image_url, label_pdf_url, proof_url
INTO v_batch_code, v_qr_image_url, v_label_pdf_url, v_proof_url
FROM raw_ops.batches WHERE id = v_batch_id;
IF NOT p_skip_asset_check THEN
v_missing_assets := ARRAY[]::TEXT[];
IF v_qr_image_url IS NULL THEN v_missing_assets := v_missing_assets || 'qr_image_url'; END IF;
IF v_label_pdf_url IS NULL THEN v_missing_assets := v_missing_assets || 'label_pdf_url'; END IF;
IF v_proof_url IS NULL THEN v_missing_assets := v_missing_assets || 'proof_url'; END IF;
IF array_length(v_missing_assets, 1) > 0 THEN
RETURN jsonb_build_object(
'success', false,
'error', 'Cannot resolve: batch is missing assets. Use p_skip_asset_check := true to override.',
'missing_assets', to_jsonb(v_missing_assets),
'batch_code', v_batch_code
);
END IF;
END IF;
-- Resolve
UPDATE raw_ops.proof_jobs
SET state = 'resolved_manual', updated_at = now()
WHERE id = p_job_id;
-- Audit trail
INSERT INTO raw_ops.ops_events (entity_type, entity_id, kind, message, meta)
VALUES ('proof_job', v_batch_id, 'INFO', 'Proof job manually resolved',
jsonb_build_object(
'batch_code', v_batch_code,
'job_id', p_job_id,
'previous_state', v_old_state,
'resolution_note', p_resolution_note,
'asset_check_skipped', p_skip_asset_check
)
);
RETURN jsonb_build_object('success', true, 'batch_code', v_batch_code, 'resolved', true);
END;
$$;
Asset check behaviour: By default, the function refuses to resolve a job if the batch is missing qr_image_url, label_pdf_url, or proof_url. This prevents false completions where the operator marks a job done but the batch still lacks proof assets. If the operator has genuinely completed the steps outside the system (e.g., manually uploaded files), they can pass p_skip_asset_check := true with an explanation in p_resolution_note.
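The asset check can be expressed as a pure function over the batch record. `missingAssets` is an illustrative mirror of the SQL logic above, not a deployed helper.

```typescript
// Illustrative mirror of the asset check in fn_resolve_proof_job_manually_v1():
// a dead-letter job may only be resolved when all three proof assets exist.
interface BatchAssets {
  qr_image_url: string | null;
  label_pdf_url: string | null;
  proof_url: string | null;
}

function missingAssets(batch: BatchAssets): string[] {
  return (["qr_image_url", "label_pdf_url", "proof_url"] as const)
    .filter((key) => batch[key] === null);
}
```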
Edge Functions¶
create-batch¶
Purpose: Thin wrapper. Receives Ops Portal request, calls fn_create_batch_v1(), returns result.
Endpoint: POST /functions/v1/create-batch
Auth: Cloudflare Access is the authentication boundary. All requests to ops.protocolraw.co.uk must pass Google OAuth via Cloudflare Access before reaching any backend. The Edge Function itself does not perform caller identity verification. It trusts that any request reaching it has been authenticated by Cloudflare Access. The function uses the service_role key to call PostgreSQL, which is standard for backend functions.
JWT Verification: Enabled (Supabase JWT from Ops Portal session)
Flow:
1. Parse request body (production_date, kg_produced)
2. Call raw_ops.fn_create_batch_v1() via service_role client
3. Return JSONB result to caller
This function does NOT orchestrate proof generation, label generation, or email sending. Those are handled asynchronously by the proof job queue.
process-proof-jobs¶
Purpose: Thin wrapper for external I/O. Claims queued proof jobs, executes external steps (QR, PDF, email), finalises or fails each job.
Endpoint: POST /functions/v1/process-proof-jobs
Auth: Internal only. Validates X-Worker-Secret header against PROOF_WORKER_SECRET env var. Rejects requests without a valid secret.
JWT Verification: Disabled (internal calls only, compensated by shared secret)
Threat model: This function is reachable at a public URL with JWT disabled. Without compensating controls, any caller who discovers the project URL and function name could invoke it. The shared secret header (X-Worker-Secret) prevents unauthorised invocation. The function is also idempotent and non-destructive: it claims already-queued jobs and generates assets for batches that already exist. Worst-case unauthorised invocation would generate proof assets that would be generated anyway on the next legitimate cycle.
Validation logic (first lines of Edge Function):
const workerSecret = Deno.env.get("PROOF_WORKER_SECRET");
const headerSecret = req.headers.get("X-Worker-Secret");
if (!workerSecret || headerSecret !== workerSecret) {
return new Response(JSON.stringify({ error: "Unauthorized" }), { status: 401 });
}
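A possible hardening, not currently specified in this SOP: the strict `!==` comparison above returns as soon as the first differing character is found, which in principle leaks timing information about the secret. A constant-time comparison avoids this; `timingSafeEqualStr` below is an illustrative sketch, not part of the deployed function.

```typescript
// Constant-time string comparison: XOR every character pair so runtime does
// not depend on where the first mismatch occurs. Illustrative hardening only.
function timingSafeEqualStr(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```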
Flow per invocation:
1. Validate X-Worker-Secret header
2. Call fn_claim_proof_jobs_v1(10) → get up to 10 claimed job rows
3. For each claimed job:
a. Generate QR code (internal qrcode library)
- Content: proof URL from batch record
- Output: PNG buffer, 300x300px
- Timeout: 5 seconds
- On failure: record step_progress, jump to fail handler
b. Upload QR to Supabase Storage
- Path: proof-assets/qr-codes/{batch_code}.png
- Timeout: 10 seconds
- Retries: 2 with backoff (1s, 3s)
c. Call fn_set_batch_qr_assets_v1(batch_id, qr_url, proof_url)
- Updates batch record via PostgreSQL (not direct table write)
d. Generate label PDF (Browserless API)
- Render HTML template with batch data + QR image
- Size: 100mm x 150mm, 300 DPI
- Timeout: 15 seconds
- Retries: 3 with exponential backoff (1s, 3s, 9s)
e. Upload label PDF to Supabase Storage
- Path: proof-assets/batch-labels/{batch_code}.pdf
- Timeout: 10 seconds
- Retries: 2 with backoff (1s, 3s)
f. Call fn_set_batch_label_asset_v1(batch_id, label_url)
- Updates batch record via PostgreSQL (not direct table write)
g. Send co-packer email (Resend API)
- Timeout: 10 seconds
- Retries: 3 with exponential backoff (1s, 3s, 9s)
h. Call fn_finalize_proof_job_v1() with asset URLs and duration
4. On any step failure after retries exhausted:
- Call fn_mark_proof_job_failed_v1() with error, category, and step_progress
- If alert_config returned (dead letter), call ops-alerter
- Continue to next job (do not abort batch)
DB/Edge Function boundary: The Edge Function performs external I/O only (QR library, Storage upload, Browserless HTTP, Resend HTTP). All database mutations go through named PostgreSQL functions. The Edge Function never directly UPDATEs raw_ops.batches or raw_ops.proof_jobs.
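The per-job loop above can be sketched as follows. This is a minimal sketch: the `fn_*` names match the PostgreSQL functions defined in this SOP, but the helper names (`generateQrPng`, `upload`, `renderLabelPdf`, `sendCopackerEmail`) are illustrative, and error categorisation, `step_progress` tracking, and Tier 1 retries are omitted for brevity. External I/O is injected so the control flow (named-RPC boundary, continue-on-failure) is visible:

```typescript
// Sketch of the per-job loop in process-proof-jobs. All database writes go
// through named RPCs; the Edge Function itself only performs external I/O.
export type Job = { id: string; batch_id: string; batch_code: string; proof_url: string };

export interface Deps {
  rpc(fn: string, args: Record<string, unknown>): Promise<unknown>; // PostgreSQL function call
  generateQrPng(url: string): Promise<Uint8Array>;                  // internal qrcode library
  upload(path: string, bytes: Uint8Array): Promise<string>;         // Storage upload, returns URL
  renderLabelPdf(job: Job, qrUrl: string): Promise<Uint8Array>;     // Browserless
  sendCopackerEmail(job: Job, labelUrl: string): Promise<void>;     // Resend
}

export async function processJobs(jobs: Job[], deps: Deps): Promise<{ done: string[]; failed: string[] }> {
  const done: string[] = [];
  const failed: string[] = [];
  for (const job of jobs) {
    const started = Date.now();
    try {
      const qr = await deps.generateQrPng(job.proof_url);
      const qrUrl = await deps.upload(`proof-assets/qr-codes/${job.batch_code}.png`, qr);
      await deps.rpc("fn_set_batch_qr_assets_v1", { batch_id: job.batch_id, qr_url: qrUrl, proof_url: job.proof_url });
      const pdf = await deps.renderLabelPdf(job, qrUrl);
      const labelUrl = await deps.upload(`proof-assets/batch-labels/${job.batch_code}.pdf`, pdf);
      await deps.rpc("fn_set_batch_label_asset_v1", { batch_id: job.batch_id, label_url: labelUrl });
      await deps.sendCopackerEmail(job, labelUrl);
      await deps.rpc("fn_finalize_proof_job_v1", { job_id: job.id, duration_ms: Date.now() - started });
      done.push(job.id);
    } catch (err) {
      // Record the failure via RPC, then continue with the next job (do not abort the batch).
      await deps.rpc("fn_mark_proof_job_failed_v1", { job_id: job.id, error: String(err) });
      failed.push(job.id);
    }
  }
  return { done, failed };
}
```

Dependency injection here is a sketching convenience; the deployed function calls the Supabase client and external HTTP APIs directly.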
Label Specifications¶
Generated by process-proof-jobs using Browserless API for HTML-to-PDF rendering.
Label size: 100mm x 150mm (standard for 500g pouches)
Format: PDF, 300 DPI, print-ready
Template: HTML/CSS rendered by Browserless

Label content (top to bottom):
1. Brand header: "PROTOCOL RAW" (Montserrat Bold, 14pt), tagline, proof QR (25mm x 25mm, top right)
2. Product information: "PROTOCOL RAW COMPLETE", description, net weight
3. Batch traceability: batch code (prominent), production date, best before date
4. Ingredients list: full list in descending order by weight
5. Footer: "Made in the UK", company details, "Verified safe, batch by batch"
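The five content zones map onto a simple HTML skeleton for the Browserless render. This is illustrative only: the real template, fonts, CSS, and copy live with the `process-proof-jobs` function, and the field names here are assumptions.

```typescript
// Illustrative label template: one HTML section per content zone, sized for
// 100mm x 150mm via @page. Copy is abbreviated relative to the real label.
export function labelHtml(b: {
  batch_code: string;
  production_date: string;
  best_before: string;
  qr_url: string;
  ingredients: string[];
}): string {
  return `<!doctype html>
<html><head><style>
  @page { size: 100mm 150mm; margin: 0; }
  body { font-family: Montserrat, sans-serif; }
</style></head><body>
  <header>PROTOCOL RAW <img src="${b.qr_url}" style="width:25mm;height:25mm;float:right"></header>
  <section>PROTOCOL RAW COMPLETE &mdash; 500g</section>
  <section><strong>${b.batch_code}</strong> &middot; Produced ${b.production_date} &middot; Best before ${b.best_before}</section>
  <section>Ingredients: ${b.ingredients.join(", ")}</section>
  <footer>Made in the UK &mdash; Verified safe, batch by batch</footer>
</body></html>`;
}
```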
Co-packer Email¶
Sent by process-proof-jobs via Resend API.
From: production@protocolraw.co.uk
To: Co-packer email (from COPACKER_EMAIL secret)
Subject: Protocol Raw Batch {batch_code} - Label Artwork
Body: Batch details (code, date, kg, product, formulation) + download link for label PDF + printing instructions
Delivery: Resend API with retry logic (see Retry Policy)
proof-page¶
Purpose: Serves public proof pages. Customer-facing.
Endpoint: GET /functions/v1/proof-page/*
Auth: Public (unauthenticated). No login required to view proof pages.
JWT Verification: Disabled (public endpoint)
Domain: proof.protocolraw.co.uk (Cloudflare DNS → Supabase Edge Function)
Data access model: The proof-page Edge Function uses the service_role key to query raw_ops.batches (bypassing RLS). This is necessary because the function renders different HTML depending on batch status:
| Batch Status | Page Rendered |
|---|---|
| RELEASED | Full results: lab data, share buttons, formulation info |
| QA_HOLD | Status message: "This batch is undergoing safety testing. Results will be published once verified." |
| REJECTED | Status message: "This batch did not pass our safety requirements." |
| Not found | 404: "Batch not found." |
The anon RLS policy (Allow public read on released batches) applies to the separate search API endpoint (/api/search), which uses the anon key and correctly returns only RELEASED batch metadata. The server-rendered proof pages use service_role because they need to distinguish between QA_HOLD, REJECTED, and non-existent batches to render appropriate status pages.
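The status-dependent rendering reduces to a small dispatch. A minimal sketch (the real function renders full HTML templates; the short body strings here stand in for those templates):

```typescript
// Status → page dispatch for proof-page. QA_HOLD and REJECTED render status
// pages with HTTP 200; only a non-existent batch returns 404.
type Batch = { status: "RELEASED" | "QA_HOLD" | "REJECTED" } | null;

export function renderStatus(batch: Batch): { httpStatus: number; body: string } {
  if (batch === null) return { httpStatus: 404, body: "Batch not found." };
  if (batch.status === "RELEASED") return { httpStatus: 200, body: "full-results" }; // lab data, share buttons, formulation
  if (batch.status === "QA_HOLD")
    return { httpStatus: 200, body: "This batch is undergoing safety testing. Results will be published once verified." };
  return { httpStatus: 200, body: "This batch did not pass our safety requirements." };
}
```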
Routes:
| Route | Method | Purpose |
|---|---|---|
| / | GET | Portal homepage with search box and latest released batch card |
| /batch/{public_batch_id} | GET | Real batch proof page (RELEASED / QA_HOLD / REJECTED / 404) |
| /batch/sample | GET | Sample batch page with hardcoded data; canonical pre-purchase URL |
| /sample | GET | Legacy alias for /batch/sample |
| /batch/PR-00LATEST | GET | 302 redirect to most recent released batch via fn_get_latest_released_batch_v1(); falls back to /batch/sample. Reached via Cloudflare Worker proof-latest-redirect (Supabase custom-domain gateway blocks /batch/latest) |
| /api/search | GET | Batch lookup by code; returns {found, status, public_batch_id} for RELEASED only |
| /api/event | POST | Client-side share/click event ingestion |
See SOP-PROOF-01 for complete route documentation, page design, and caching strategy.
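The `/api/search` contract is worth pinning down, since it is the one route that deliberately hides non-released batches. A sketch under stated assumptions: the `lookup` function stands in for the anon-key (RLS-filtered) query, and the query normalisation shown is illustrative.

```typescript
// /api/search response shape: {found, status, public_batch_id} for RELEASED
// batches only. Anything else — missing, QA_HOLD, REJECTED — is reported
// identically as not found, so the search API leaks no batch existence.
type BatchRow = { status: string; public_batch_id: string };

export function searchResponse(
  query: string,
  lookup: (code: string) => BatchRow | null // RLS-filtered lookup, illustrative
): { found: boolean; status?: string; public_batch_id?: string } {
  const code = query.trim().toUpperCase();
  const row = lookup(code);
  if (!row || row.status !== "RELEASED") return { found: false };
  return { found: true, status: row.status, public_batch_id: row.public_batch_id };
}
```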
Retry Policy¶
Retries operate on two tiers:
Tier 1 — Within a single Edge Function invocation. The process-proof-jobs worker retries transient HTTP failures (Storage uploads, Browserless, Resend) using a simple retry loop with exponential backoff before giving up on the current invocation. This handles brief network glitches without consuming a job-level attempt.
Tier 2 — Across pg_cron cycles. If all Tier 1 retries for a step are exhausted, fn_mark_proof_job_failed_v1() increments attempts and sets state = 'failed'. The next pg_cron cycle (30 seconds later) reclaims the job and tries again. After max_attempts (5) Tier 2 failures, the job moves to dead_letter.
Retry helper (used by all HTTP steps in Edge Function):
```typescript
async function withRetry(
  fn: (signal: AbortSignal) => Promise<Response>,
  maxRetries: number,
  timeoutMs: number,
  backoffMs: number[] // e.g. [1000, 3000, 9000]
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // The caller must pass the signal to fetch(); otherwise the abort has no effect.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fn(controller.signal);
      if (res.ok) return res;
      if (attempt >= maxRetries) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
    } catch (err) {
      if (attempt >= maxRetries) throw err;
    } finally {
      clearTimeout(timer);
    }
    await new Promise((r) => setTimeout(r, backoffMs[attempt] ?? 1000));
  }
  throw new Error("Retries exhausted");
}
```
Per-dependency configuration:
| Dependency | Timeout | Tier 1 Retries | Backoff | Tier 2 Max Attempts |
|---|---|---|---|---|
| QR generation (internal) | 5s | 0 (library call, no HTTP) | N/A | 5 |
| Supabase Storage upload | 10s | 2 | 1s, 3s | 5 |
| Browserless PDF | 15s | 3 | 1s, 3s, 9s | 5 |
| Resend email | 10s | 3 | 1s, 3s, 9s | 5 |
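Wiring the table into one configuration object also lets us bound the worst-case Tier 1 wall-clock time per step (every try times out, plus the backoff sleeps between tries). The config object names below are illustrative:

```typescript
// Per-dependency Tier 1 retry configuration from the table above.
export const RETRY_CONFIG = {
  storageUpload: { timeoutMs: 10_000, maxRetries: 2, backoffMs: [1_000, 3_000] },
  browserlessPdf: { timeoutMs: 15_000, maxRetries: 3, backoffMs: [1_000, 3_000, 9_000] },
  resendEmail: { timeoutMs: 10_000, maxRetries: 3, backoffMs: [1_000, 3_000, 9_000] },
} as const;

// Worst-case Tier 1 wall-clock time for one step: (retries + 1) timed-out
// tries, plus the backoff sleeps between tries.
export function worstCaseMs(cfg: { timeoutMs: number; maxRetries: number; backoffMs: readonly number[] }): number {
  const sleeps = cfg.backoffMs.slice(0, cfg.maxRetries).reduce((a, b) => a + b, 0);
  return (cfg.maxRetries + 1) * cfg.timeoutMs + sleeps;
}
```

For Browserless this gives 4 × 15s + 13s = 73s worst case within a single invocation, which is why persistently failing jobs take several minutes to reach the dead letter queue (see "Total time to dead letter" below).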
Error categories:
| Category | Meaning |
|---|---|
| qr_generation | QR code creation failed (extremely rare) |
| storage_upload | Supabase Storage write failed |
| pdf_generation | Browserless rendering failed |
| email_delivery | Resend API failed |
| db_error | Database operation failed |
| unknown | Unclassified error |
Total time to dead letter: 5 Tier 2 attempts x 30-second pg_cron cycles = ~2.5 minutes minimum. With Tier 1 backoff within each cycle, typically 5-10 minutes for persistent failures.
pg_cron Jobs¶
process-proof-jobs¶
```sql
SELECT cron.schedule(
  'process-proof-jobs',
  '30 seconds',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/process-proof-jobs',
    headers := jsonb_build_object(
      'Content-Type', 'application/json',
      'apikey', current_setting('app.settings.anon_key'),
      'X-Worker-Secret', current_setting('app.settings.proof_worker_secret')
    ),
    body := '{}'::jsonb
  );
  $$
);
```
Note: PROOF_WORKER_SECRET must be set both as a Supabase Edge Function secret and as a Supabase database setting (app.settings.proof_worker_secret) so that pg_cron can pass it in the header. Generate a random 64-character hex string: openssl rand -hex 32.
monitor-proof-system-health¶
```sql
SELECT cron.schedule(
  'monitor-proof-system-health',
  '*/5 * * * *',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/run-monitor',
    headers := jsonb_build_object(
      'Content-Type', 'application/json',
      'apikey', current_setting('app.settings.anon_key')
    ),
    body := '{"check": "proof_system_health"}'::jsonb
  );
  $$
);
```
Monitoring¶
fn_check_proof_system_health_v2()¶
Purpose: Comprehensive proof system health check. Detects stuck jobs, dead letters, partial completions, and processing degradation.
Registration: Add proof_system_health: "fn_check_proof_system_health_v2" to run-monitor Edge Function dispatch map.
```sql
CREATE OR REPLACE FUNCTION raw_ops.fn_check_proof_system_health_v2()
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_run_id uuid;
  v_start_time timestamptz;
  v_duration_ms integer;
  v_alert_needed boolean := false;
  v_alert_channel text := 'ops-alerts';
  v_alert_severity text := 'warning';
  v_alert_title text;
  v_alert_message text;
  v_alert_fields jsonb := '[]'::jsonb;
  v_stuck_claimed integer;
  v_dead_letter_count integer;
  v_failed_count integer;
  v_partial_completion_count integer;
  v_avg_duration_ms numeric;
  v_queue_depth integer;
  v_completed_24h integer;
BEGIN
  v_start_time := clock_timestamp();

  INSERT INTO raw_ops.monitoring_runs (check_name, status)
  VALUES ('proof_system_health', 'running')
  RETURNING id INTO v_run_id;

  -- 1. Stuck claimed jobs (claimed > 5 minutes ago, not finalised)
  SELECT COUNT(*) INTO v_stuck_claimed
  FROM raw_ops.proof_jobs
  WHERE state = 'claimed'
    AND claimed_at < now() - INTERVAL '5 minutes';

  -- 2. Dead letter jobs (unresolved)
  SELECT COUNT(*) INTO v_dead_letter_count
  FROM raw_ops.proof_jobs
  WHERE state = 'dead_letter';

  -- 3. Failed jobs awaiting retry
  SELECT COUNT(*) INTO v_failed_count
  FROM raw_ops.proof_jobs
  WHERE state = 'failed';

  -- 4. Partial completions (some steps done, job not complete)
  SELECT COUNT(*) INTO v_partial_completion_count
  FROM raw_ops.proof_jobs
  WHERE state IN ('failed', 'dead_letter')
    AND (qr_generated = true OR label_generated = true)
    AND (qr_uploaded = false OR label_uploaded = false OR email_sent = false);

  -- 5. Average processing duration (last 24h)
  SELECT COALESCE(AVG(processing_duration_ms), 0) INTO v_avg_duration_ms
  FROM raw_ops.proof_jobs
  WHERE state = 'done'
    AND completed_at > now() - INTERVAL '24 hours';

  -- 6. Current queue depth
  SELECT COUNT(*) INTO v_queue_depth
  FROM raw_ops.proof_jobs
  WHERE state IN ('queued', 'failed')
    AND attempts < max_attempts;

  -- 7. Completed in last 24h
  SELECT COUNT(*) INTO v_completed_24h
  FROM raw_ops.proof_jobs
  WHERE state = 'done'
    AND completed_at > now() - INTERVAL '24 hours';

  -- Determine alert level (priority order)
  IF v_dead_letter_count > 0 THEN
    v_alert_needed := true;
    v_alert_channel := 'ops-urgent';
    v_alert_severity := 'critical';
    v_alert_title := 'Proof System: Dead Letter Jobs';
    v_alert_message := v_dead_letter_count || ' proof job(s) in dead letter queue. Manual intervention required.';
  ELSIF v_stuck_claimed > 0 THEN
    v_alert_needed := true;
    v_alert_channel := 'ops-urgent';
    v_alert_severity := 'critical';
    v_alert_title := 'Proof System: Stuck Jobs';
    v_alert_message := v_stuck_claimed || ' proof job(s) stuck in claimed state for >5 minutes.';
  ELSIF v_partial_completion_count > 0 THEN
    v_alert_needed := true;
    v_alert_title := 'Proof System: Partial Completions';
    v_alert_message := v_partial_completion_count || ' proof job(s) with partial step completion.';
  ELSIF v_failed_count > 3 THEN
    v_alert_needed := true;
    v_alert_title := 'Proof System: Elevated Failures';
    v_alert_message := v_failed_count || ' proof job(s) in failed state awaiting retry.';
  ELSIF v_avg_duration_ms > 60000 THEN
    -- Matches the "Avg duration > 60,000ms" row in the Alert Thresholds table
    v_alert_needed := true;
    v_alert_title := 'Proof System: Processing Degradation';
    v_alert_message := 'Average processing duration ' || ROUND(v_avg_duration_ms) || 'ms over last 24h exceeds 60,000ms threshold.';
  END IF;

  IF v_alert_needed THEN
    v_alert_fields := jsonb_build_array(
      jsonb_build_object('label', 'Dead Letter', 'value', v_dead_letter_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Stuck', 'value', v_stuck_claimed::TEXT, 'inline', true),
      jsonb_build_object('label', 'Failed', 'value', v_failed_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Partial', 'value', v_partial_completion_count::TEXT, 'inline', true),
      jsonb_build_object('label', 'Queue', 'value', v_queue_depth::TEXT, 'inline', true),
      jsonb_build_object('label', 'Avg Duration', 'value', ROUND(v_avg_duration_ms)::TEXT || 'ms', 'inline', true)
    );
  END IF;

  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;

  UPDATE raw_ops.monitoring_runs
  SET
    status = 'success',
    duration_ms = v_duration_ms,
    alert_triggered = v_alert_needed,
    result_summary = 'Queue: ' || v_queue_depth || ', Done 24h: ' || v_completed_24h ||
                     ', Failed: ' || v_failed_count || ', DLQ: ' || v_dead_letter_count,
    result_json = jsonb_build_object(
      'stuck_claimed', v_stuck_claimed,
      'dead_letter_count', v_dead_letter_count,
      'failed_count', v_failed_count,
      'partial_completion_count', v_partial_completion_count,
      'avg_duration_ms', ROUND(v_avg_duration_ms),
      'queue_depth', v_queue_depth,
      'completed_24h', v_completed_24h
    )
  WHERE id = v_run_id;

  RETURN jsonb_build_object(
    'success', true,
    'run_id', v_run_id,
    'check_name', 'proof_system_health',
    'duration_ms', v_duration_ms,
    'alert_config', CASE
      WHEN NOT v_alert_needed THEN NULL
      ELSE jsonb_build_object(
        'channel', v_alert_channel,
        'severity', v_alert_severity,
        'title', v_alert_title,
        'message', v_alert_message,
        'fields', v_alert_fields
      )
    END
  );

EXCEPTION WHEN OTHERS THEN
  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;
  UPDATE raw_ops.monitoring_runs
  SET status = 'error', duration_ms = v_duration_ms, error_message = SQLERRM
  WHERE id = v_run_id;
  RETURN jsonb_build_object(
    'success', false, 'run_id', v_run_id,
    'check_name', 'proof_system_health', 'error', SQLERRM, 'duration_ms', v_duration_ms
  );
END;
$$;
```
Alert Thresholds¶
| Condition | Severity | Channel | Response |
|---|---|---|---|
| Dead letter jobs > 0 | Critical | #ops-urgent | Immediate. Proof generation blocked for batch. |
| Stuck claimed jobs > 0 (>5 min) | Critical | #ops-urgent | Edge Function may be hanging. Check logs. |
| Partial completion > 0 | Warning | #ops-alerts | Some steps OK, others failed. Verify assets. |
| Failed jobs > 3 | Warning | #ops-alerts | Elevated retries. Check external service health. |
| Avg duration > 60,000ms | Warning | #ops-alerts | Degradation. Check Browserless/Resend latency. |
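When the monitor returns an `alert_config`, ops-alerter forwards it to Slack. A sketch of that mapping, under stated assumptions: the `alert_config` shape matches what `fn_check_proof_system_health_v2()` returns, but the Block Kit layout and icon choices here are illustrative, not the deployed ops-alerter payload.

```typescript
// Map the monitor's alert_config to a Slack webhook payload (Block Kit).
type AlertField = { label: string; value: string; inline: boolean };
type AlertConfig = { channel: string; severity: string; title: string; message: string; fields: AlertField[] };

export function toSlackPayload(cfg: AlertConfig): { text: string; blocks: unknown[] } {
  const icon = cfg.severity === "critical" ? "🚨" : "⚠️";
  return {
    // Fallback text for notifications; blocks carry the structured layout.
    text: `${icon} ${cfg.title}: ${cfg.message}`,
    blocks: [
      { type: "header", text: { type: "plain_text", text: `${icon} ${cfg.title}` } },
      { type: "section", text: { type: "mrkdwn", text: cfg.message } },
      { type: "section", fields: cfg.fields.map((f) => ({ type: "mrkdwn", text: `*${f.label}*\n${f.value}` })) },
    ],
  };
}
```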
Metabase Dashboard: Proof System Health¶
Collection: "SOP-PROOF-00 - Operations"
Saved Questions:
- Job Queue Status (real-time): state distribution, last 7 days
- Processing Duration Trend (7-day): avg/max ms per day, jobs completed
- Dead Letter and Failure History (30-day): DLQ, retrying, resolved counts by date
- Partial Completion Detail (current): jobs with mixed step completion flags
- Batch Creation Volume (12-week): weekly counts by status (RELEASED, QA_HOLD, REJECTED)
Customer Access Methods¶
1. Pouch QR Code Scan (Primary)¶
Customer scans QR on any pouch label. Mobile browser opens https://proof.protocolraw.co.uk/batch/{public_batch_id}. Shows full lab results (if RELEASED) or testing-in-progress message (if QA_HOLD).
2. Insert Card QR Scan (Onboarding)¶
Customer scans QR on insert card. Opens https://portal.protocolraw.co.uk?view=transition. Shows personalised 10-day transition feeding plan.
3. Direct Link¶
Included in delivery SMS, order confirmation email, and customer portal.
4. Manual Batch Lookup¶
Search at proof.protocolraw.co.uk. See SOP-PROOF-01.
Timeline¶
| Phase | What Happens |
|---|---|
| Day 0 (Batch Creation) | QR code, label PDF, co-packer email. Batch in QA_HOLD. |
| Days 0-3 | Product manufactured with printed labels. Samples at lab. |
| Day 3-5 | Lab results arrive. SOP-LAB-01 releases batch. |
| Day 3-5+ | Pouch QR links to live proof page with results. |
Storage¶
Buckets¶
| Path | Contents | Access |
|---|---|---|
| proof-assets/qr-codes/ | QR code PNGs | Public read |
| proof-assets/batch-labels/ | Pouch label PDFs | Authenticated only |
| proof-assets/lab-reports/ | Lab certificate PDFs (SOP-LAB-01) | Authenticated only |
Label PDF access: Labels contain production details. Authenticated access only to prevent competitive intelligence exposure at scale. Co-packer receives label via email link. Ops team accesses via portal.
Storage Projections (Annual at Scale)¶
| Asset | Per Item | Annual Volume | Total |
|---|---|---|---|
| QR codes | 10 KB | ~25,000 | 250 MB |
| Label PDFs | 500 KB | ~25,000 | 12.5 GB |
| Lab reports | 2 MB | ~25,000 | 50 GB |
| Total | | | ~63 GB/year |
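The total can be sanity-checked with decimal units (matching the table's MB/GB figures):

```typescript
// Annual storage projection arithmetic, decimal units as in the table above.
const KB = 1e3, MB = 1e6;
export const annualBytes =
  25_000 * 10 * KB +  // QR codes    → 0.25 GB
  25_000 * 500 * KB + // label PDFs  → 12.5 GB
  25_000 * 2 * MB;    // lab reports → 50 GB
export const annualGB = annualBytes / 1e9; // 62.75, i.e. "~63 GB/year"
```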
Required Secrets¶
| Secret | Purpose | Used By |
|---|---|---|
| PROOF_WORKER_SECRET | Shared secret for internal worker auth | process-proof-jobs (validates), pg_cron (sends via header) |
| BROWSERLESS_API_KEY | Label PDF generation | process-proof-jobs |
| RESEND_API_KEY | Co-packer email delivery | process-proof-jobs |
| COPACKER_EMAIL | Co-packer email address | process-proof-jobs |
| SLACK_OPS_ALERTS_WEBHOOK | Standard alerts | ops-alerter |
| SLACK_OPS_URGENT_WEBHOOK | Critical alerts | ops-alerter |
PROOF_WORKER_SECRET setup: Must be configured in two places: (1) Supabase Edge Function secrets, and (2) Supabase database settings as app.settings.proof_worker_secret so pg_cron can read it. Generate with openssl rand -hex 32.
Rotation: Review quarterly. Rotate immediately if compromised. Zero-downtime rotation via Supabase secrets update + redeploy.
Batch Code System¶
Dual ID System¶
Internal: PR-YYMMDD-NNN (sequential, for ops/warehouse/co-packer)
Public: PR-XXXXXXXX (SHA-256 hash, for QR codes/proof URLs/customers)
Public ID protects competitive intelligence (no date or volume information). Generated by trigger_set_public_batch_id on batch INSERT.
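The authoritative derivation lives in the `trigger_set_public_batch_id` database trigger; the exact hash input, salt, and truncation there may differ from this sketch. The TypeScript below only makes the `PR-XXXXXXXX` shape concrete: a SHA-256 digest, truncated, with no date or volume information recoverable.

```typescript
import { createHash } from "node:crypto";

// Illustrative public-ID derivation: hash the internal code with a secret salt
// (hypothetical input format), keep the first 8 hex characters, uppercase.
export function publicBatchId(internalCode: string, salt: string): string {
  const digest = createHash("sha256").update(`${internalCode}:${salt}`).digest("hex");
  return `PR-${digest.slice(0, 8).toUpperCase()}`;
}
```

Because the digest depends on a salt, a customer holding `PR-XXXXXXXX` cannot work backwards to the internal `PR-YYMMDD-NNN` code, which is the competitive-intelligence property the dual-ID system exists for.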
Operational Procedures¶
Creating a Batch¶
When: Co-packer confirms production. Where: ops.protocolraw.co.uk. Time: ~30 seconds.
- Enter production date + kg produced
- Click "Create Batch"
- Confirm: batch code, proof URL in response
- Within 60 seconds: QR, label, email should complete (check proof_jobs)
Weekly Review (10 min)¶
- Check Metabase dashboard
- Review any dead letter or failed jobs
- Scan 2-3 random QR codes to verify proof pages
Monthly Review (15 min)¶
- Review weekly trends
- Verify label print quality with co-packer
- Check storage costs (~$1-2/month)
- Confirm SSL auto-renewal (Cloudflare)
Troubleshooting¶
Batch Creation Fails¶
Check create-batch Edge Function logs. Check ops_events for errors. Common: no active formulation, no product record, invalid date.
Proof Job Stuck¶
```sql
SELECT id, state, attempts, claimed_at, last_error, qr_generated, label_generated, email_sent
FROM raw_ops.proof_jobs
WHERE state IN ('queued', 'claimed') AND created_at < now() - INTERVAL '5 minutes';
```
If claimed >10 min: reset to queued. If queued with 0 attempts: check pg_cron job is active.
Partial Completion (Label Missing)¶
Check Browserless API key. Check HTML template. Reset job for retry:
```sql
UPDATE raw_ops.proof_jobs
SET state = 'queued', attempts = 0, claimed_at = NULL, last_error = NULL, updated_at = now()
WHERE batch_id = (SELECT id FROM raw_ops.batches WHERE batch_code = 'PR-YYMMDD-NNN');
```
Dead Letter Job¶
Check error details. Fix root cause. Reset for retry or resolve manually:
```sql
-- Reset for retry
UPDATE raw_ops.proof_jobs
SET state = 'queued', attempts = 0, claimed_at = NULL, last_error = NULL, updated_at = now()
WHERE id = 'job-uuid';

-- Or resolve manually
SELECT raw_ops.fn_resolve_proof_job_manually_v1('job-uuid', 'Resolution note here');
```
Proof Page Not Found¶
Check batch status. QA_HOLD = expected (testing). REJECTED = expected (failed). Verify RLS policy exists.
Compliance and Audit Trail¶
- Traceability: Every pouch traceable to batch via QR
- Transparency: Lab results publicly accessible after release
- Immutability: Batch data changes only via defined functions
- Audit Trail: All changes logged to batch_status_history and ops_events
- Recall: See SOP-LAB-01 for recall procedure
Data Retention¶
| Data | Retention | Reason |
|---|---|---|
| Batches | Indefinite | Regulatory |
| Lab Results | Indefinite | Regulatory |
| QR Codes | Indefinite | Customer access |
| Label PDFs | 2 years | Operational |
| Proof Jobs | 90 days | Monitoring |
| ops_events | Indefinite | Audit trail |
Scaling¶
| Phase | Batches/Day | Performance | Action |
|---|---|---|---|
| A (0-1K customers) | 1-3 | <60s per job | Monitor metrics |
| B (1-10K) | 5-15 | <60s per job | No changes expected |
| C (10-100K) | 30-70 | Parallel via SKIP LOCKED | Bulk creation UI, monitor Browserless concurrency |
| 100K+ | 70+ | Review needed | Upgrade Supabase, read replicas |
No claims of readiness at scale are made until validated with production data.
All Objects¶
| Object | Type | Purpose |
|---|---|---|
| raw_ops.proof_jobs | Table | State machine |
| raw_ops.formulations | Table | Formulation versions |
| raw_ops.proof_page_events | Table | Portal analytics |
| raw_ops.fn_create_batch_v1() | Function | Batch creation logic |
| raw_ops.fn_enqueue_proof_job() | Trigger function | Enqueue on INSERT |
| raw_ops.fn_claim_proof_jobs_v1() | Function | Atomic claim (SKIP LOCKED) |
| raw_ops.fn_set_batch_qr_assets_v1() | Function | Set QR + proof URL on batch |
| raw_ops.fn_set_batch_label_asset_v1() | Function | Set label PDF URL on batch |
| raw_ops.fn_finalize_proof_job_v1() | Function | Mark done, update batch |
| raw_ops.fn_mark_proof_job_failed_v1() | Function | Failure + DLQ |
| raw_ops.fn_resolve_proof_job_manually_v1() | Function | DLQ resolution (with asset check) |
| raw_ops.fn_check_proof_system_health_v2() | Function | SOP-MON-01 monitor |
| trigger_enqueue_proof_job | Trigger | AFTER INSERT on batches |
| trigger_set_public_batch_id | Trigger | Existing, unchanged |
| create-batch | Edge Function | Thin wrapper |
| process-proof-jobs | Edge Function | Queue processor (shared secret auth) |
| proof-page | Edge Function | Public pages (service_role for status rendering) |
| process-proof-jobs | pg_cron job | Every 30 seconds |
| monitor-proof-system-health | pg_cron job | Every 5 minutes |
| idx_proof_jobs_claimable | Index | Queue performance |
| idx_proof_jobs_batch_id | Index | Batch lookup |
| idx_proof_jobs_dead_letter | Index | DLQ monitoring |
| Allow public read on released batches | RLS Policy | Search API access |
| PROOF_WORKER_SECRET | Secret | Internal worker auth |
Contact and Escalation¶
Operations Owner: Anton (Founder)
Technical Owner: Anton (Systems Architect)

Escalation:
1. Check Slack #ops-alerts and #ops-urgent
2. Check Metabase dashboard
3. Review Edge Function logs
4. Query proof_jobs for diagnostics
5. Check external service status (Resend, Browserless, Cloudflare)
6. If unresolved after 30 min: pause batch creation
Version History¶
| Version | Date | Changes | Author |
|---|---|---|---|
| SOP-PROOF-00 v1.0 | 2026-03-20 | Complete rewrite replacing SOP-0R v4.2. Architecture rebuilt to Supabase-native pattern. Named execution chain, state machine, retry/DLQ, SOP-MON-01 monitoring, partial completion detection, RLS policies, auth model, QR internalised, honest scaling claims. | Protocol Raw Operations |
Superseded: SOP-0R v4.2 (2026-01-21) — fully replaced.
End of SOP-PROOF-00 v1.0
Last reviewed: 2026-03-20 Next review: 2026-06-20 System status: ✅ Production Ready
Protocol Raw — Verified safe, batch by batch