Skip to content

SOP-MON-03: Allocation Health Monitor v1.0

Detects PAID orders stuck without inventory allocation

Document ID: SOP-MON-03-v1.0
Version: 1.0
Status: 🔵 Ready to Deploy
Created: 2026-02-17
Owner: Protocol Raw Operations
Review Date: 2026-05-17


Purpose

Automatically detects PAID orders where one or more order_items have not been allocated to a RELEASED batch, and alerts the operations team via Slack.

The Gap This Closes: The Order Ingestion SLO monitor (SOP-MON-02) catches orders that fail to arrive in Supabase from Shopify. But an order can ingest successfully and then sit unallocated indefinitely if no RELEASED inventory is available. Without this monitor, the customer has paid but nothing ships — and nobody is alerted.

Trigger: pg_cron scheduled job (every 15 minutes)
Processing Time: <100ms
Human Intervention: Only when unallocated orders detected
Automation Coverage: 100%


Scope

Applies to: All PAID orders in raw_ops.orders with incomplete allocations

Monitoring Architecture: Follows SOP-MON-01 pattern exactly:

pg_cron (*/15 * * * *)
    ↓
pg_net (async HTTP)
    ↓
Edge Function: run-monitor
    ↓
PostgreSQL: raw_ops.fn_check_allocation_health_v2()
    ↓
monitoring_runs (logging)
    ↓
Edge Function: ops-alerter [if unallocated orders found]
    ↓
Slack: #ops-alerts or #ops-urgent

Related SOPs: - SOP-MON-01: Monitoring & Alerting Architecture (pattern) - SOP-MON-02: Order Ingestion SLO Monitor (upstream — order must exist first) - SOP-01: Batch Creation & Lab-to-Release (allocation fires on RELEASED) - SOP-INV-01: Inventory Control (allocation logic, stock levels) - SOP-0X: 3PL Order Export (downstream — only exports allocated orders)


Alert Logic

What Triggers an Alert

An order is flagged when ALL of the following are true: 1. orders.status = 'PAID' (payment confirmed) 2. At least one order_item has no matching record in allocations 3. The order has been PAID for longer than the grace period (30 minutes)

The 30-minute grace period accounts for: - Normal allocation trigger timing (fires on batch RELEASED) - Edge cases where a batch releases moments after an order arrives - Avoids false alarms during normal batch release → allocation cascade

Severity Levels

Age Since Payment Severity Channel Meaning
30 min – 4 hours WARN #ops-alerts Likely waiting for next batch release
4 – 24 hours HIGH #ops-urgent Stock gap — may need to expedite batch
> 24 hours PAGE #ops-urgent Customer paid >1 day ago, no allocation

Alert Fields

Field Description
Orders Affected Count of unallocated orders
Oldest Age of oldest unallocated order (hours)
Total kg Needed Sum of kg required across all unallocated items
Available Stock (kg) Total RELEASED inventory available
Severity WARN / HIGH / PAGE
Order IDs Shopify order IDs (first 10)

Database Components

1. Core Check Function

CREATE OR REPLACE FUNCTION raw_ops.fn_check_allocation_health_v2()
RETURNS jsonb
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_run_id uuid;
  v_start_time timestamptz;
  v_duration_ms integer;
  v_unallocated_count integer;
  v_oldest_hours numeric;
  v_total_kg_needed numeric;
  v_available_stock_kg numeric;
  v_max_severity text;
  v_order_ids text[];
  v_alert_channel text;
  v_alert_severity text;
BEGIN
  v_start_time := clock_timestamp();

  -- Create monitoring run record
  INSERT INTO raw_ops.monitoring_runs (check_name, status)
  VALUES ('allocation_health', 'running')
  RETURNING id INTO v_run_id;

  -- Find PAID orders with incomplete allocations older than 30 min grace period
  WITH unallocated_orders AS (
    SELECT 
      o.id AS order_id,
      o.shopify_order_id,
      o.created_at AS paid_at,
      EXTRACT(EPOCH FROM (NOW() - o.created_at)) / 3600 AS age_hours,
      SUM(oi.qty * p.kg) AS kg_needed,
      CASE
        WHEN EXTRACT(EPOCH FROM (NOW() - o.created_at)) / 3600 > 24 THEN 'PAGE'
        WHEN EXTRACT(EPOCH FROM (NOW() - o.created_at)) / 3600 > 4 THEN 'HIGH'
        ELSE 'WARN'
      END AS severity
    FROM raw_ops.orders o
    JOIN raw_ops.order_items oi ON o.id = oi.order_id
    JOIN raw_ops.products p ON oi.product_id = p.id
    LEFT JOIN raw_ops.allocations a ON oi.id = a.order_item_id
    WHERE o.status = 'PAID'
      AND a.id IS NULL
      AND o.created_at < NOW() - INTERVAL '30 minutes'
    GROUP BY o.id, o.shopify_order_id, o.created_at
  ),
  available_stock AS (
    SELECT COALESCE(SUM(b.qty_on_hand - b.qty_reserved), 0) AS total_available_kg
    FROM raw_ops.batches b
    WHERE b.status = 'RELEASED'
      AND (b.qty_on_hand - b.qty_reserved) > 0
  )
  SELECT 
    COUNT(*)::integer,
    ROUND(MAX(uo.age_hours)::numeric, 1),
    ROUND(COALESCE(SUM(uo.kg_needed), 0)::numeric, 1),
    ast.total_available_kg,
    MAX(uo.severity),
    ARRAY_AGG(uo.shopify_order_id::text ORDER BY uo.age_hours DESC)
  INTO 
    v_unallocated_count,
    v_oldest_hours,
    v_total_kg_needed,
    v_available_stock_kg,
    v_max_severity,
    v_order_ids
  FROM unallocated_orders uo
  CROSS JOIN available_stock ast;

  -- Handle NULL (no unallocated orders)
  IF v_unallocated_count IS NULL OR v_unallocated_count = 0 THEN
    v_unallocated_count := 0;
    v_oldest_hours := 0;
    v_total_kg_needed := 0;
    v_max_severity := NULL;
    v_order_ids := ARRAY[]::text[];

    SELECT COALESCE(SUM(b.qty_on_hand - b.qty_reserved), 0)
    INTO v_available_stock_kg
    FROM raw_ops.batches b
    WHERE b.status = 'RELEASED'
      AND (b.qty_on_hand - b.qty_reserved) > 0;
  END IF;

  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;

  -- Determine alert channel
  IF v_max_severity IN ('PAGE', 'HIGH') THEN
    v_alert_channel := 'ops-urgent';
    v_alert_severity := 'critical';
  ELSIF v_max_severity = 'WARN' THEN
    v_alert_channel := 'ops-alerts';
    v_alert_severity := 'warning';
  END IF;

  -- Update monitoring run
  UPDATE raw_ops.monitoring_runs
  SET 
    status = 'success',
    duration_ms = v_duration_ms,
    result_summary = CASE 
      WHEN v_unallocated_count = 0 THEN 'All PAID orders allocated'
      ELSE v_unallocated_count || ' orders unallocated (oldest: ' || v_oldest_hours || 'h)'
    END,
    result_json = jsonb_build_object(
      'unallocated_count', v_unallocated_count,
      'oldest_hours', v_oldest_hours,
      'kg_needed', v_total_kg_needed,
      'available_stock_kg', v_available_stock_kg,
      'max_severity', v_max_severity,
      'order_ids', v_order_ids
    )
  WHERE id = v_run_id;

  -- Return result
  RETURN jsonb_build_object(
    'success', true,
    'run_id', v_run_id,
    'check_name', 'allocation_health',
    'unallocated_count', v_unallocated_count,
    'duration_ms', v_duration_ms,
    'alert_config', CASE 
      WHEN v_unallocated_count = 0 THEN NULL
      ELSE jsonb_build_object(
        'channel', v_alert_channel,
        'severity', v_alert_severity,
        'title', '📦 Unallocated Orders Detected',
        'message', v_unallocated_count || ' PAID order(s) have no inventory allocation',
        'fields', jsonb_build_array(
          jsonb_build_object('label', 'Orders Affected', 'value', v_unallocated_count::text, 'inline', true),
          jsonb_build_object('label', 'Oldest', 'value', v_oldest_hours || ' hours', 'inline', true),
          jsonb_build_object('label', 'Severity', 'value', v_max_severity, 'inline', true),
          jsonb_build_object('label', 'kg Needed', 'value', v_total_kg_needed || ' kg', 'inline', true),
          jsonb_build_object('label', 'Available Stock', 'value', v_available_stock_kg || ' kg', 'inline', true),
          jsonb_build_object('label', 'Order IDs', 'value', array_to_string(v_order_ids[1:10], ', '), 'inline', false)
        ),
        'source_check', 'allocation_health'
      )
    END
  );

EXCEPTION WHEN OTHERS THEN
  v_duration_ms := EXTRACT(EPOCH FROM (clock_timestamp() - v_start_time)) * 1000;

  UPDATE raw_ops.monitoring_runs
  SET status = 'error', duration_ms = v_duration_ms, error_message = SQLERRM
  WHERE id = v_run_id;

  RETURN jsonb_build_object(
    'success', false,
    'run_id', v_run_id,
    'check_name', 'allocation_health',
    'error', SQLERRM,
    'duration_ms', v_duration_ms
  );
END;
$$;

2. Required Indexes

-- Fast lookup: PAID orders without allocations
-- This is the critical query path for the monitor
CREATE INDEX IF NOT EXISTS idx_orders_paid_unallocated
ON raw_ops.orders(created_at DESC)
WHERE status = 'PAID';

-- Fast join: order_items to allocations (LEFT JOIN performance)
CREATE INDEX IF NOT EXISTS idx_allocations_order_item_id
ON raw_ops.allocations(order_item_id);

-- Fast lookup: available RELEASED inventory
CREATE INDEX IF NOT EXISTS idx_batches_released_available
ON raw_ops.batches(status)
WHERE status = 'RELEASED';

3. Verification View (Optional — for Metabase/Manual Checks)

CREATE OR REPLACE VIEW raw_ops.v_unallocated_orders AS
SELECT 
  o.id AS order_id,
  o.shopify_order_id,
  o.order_number,
  c.email AS customer_email,
  c.first_name,
  o.created_at AS paid_at,
  ROUND(EXTRACT(EPOCH FROM (NOW() - o.created_at)) / 3600, 1) AS hours_waiting,
  SUM(oi.qty * p.kg) AS kg_needed,
  p.sku,
  CASE
    WHEN EXTRACT(EPOCH FROM (NOW() - o.created_at)) / 3600 > 24 THEN 'PAGE'
    WHEN EXTRACT(EPOCH FROM (NOW() - o.created_at)) / 3600 > 4 THEN 'HIGH'
    ELSE 'WARN'
  END AS severity
FROM raw_ops.orders o
JOIN raw_ops.order_items oi ON o.id = oi.order_id
JOIN raw_ops.products p ON oi.product_id = p.id
JOIN raw_ops.customers c ON o.customer_id = c.id
LEFT JOIN raw_ops.allocations a ON oi.id = a.order_item_id
WHERE o.status = 'PAID'
  AND a.id IS NULL
  AND o.created_at < NOW() - INTERVAL '30 minutes'
GROUP BY o.id, o.shopify_order_id, o.order_number, c.email, c.first_name, o.created_at, p.sku
ORDER BY o.created_at ASC;

GRANT SELECT ON raw_ops.v_unallocated_orders TO analytics_user;

Edge Function Update

run-monitor Dispatch Map

Add allocation_health to the MONITOR_FUNCTIONS map in run-monitor/index.ts:

const MONITOR_FUNCTIONS: Record<string, string> = {
  // Existing monitors
  order_ingestion_slo: "fn_check_order_ingestion_slo_v2",
  outbox_health: "fn_check_outbox_health_v2",
  stock_levels: "fn_check_stock_levels_v2",
  lab_sla: "fn_check_lab_sla_v2",
  daily_snapshot: "fn_daily_snapshot_v2",
  working_capital: "fn_check_working_capital_v2",
  refund_health: "fn_check_refund_health_v2",
  subscription_health: "fn_check_subscription_health_v2",
  subscription_daily: "fn_check_subscription_daily_v2",

  // NEW: Allocation health (SOP-MON-03)
  allocation_health: "fn_check_allocation_health_v2",
};

Deploy:

supabase functions deploy run-monitor


pg_cron Configuration

Schedule Job

SELECT cron.schedule(
  'monitor-allocation-health',
  '*/15 * * * *',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/run-monitor',
    headers := '{"Content-Type": "application/json", "apikey": "YOUR_ANON_KEY"}'::jsonb,
    body := '{"check": "allocation_health"}'::jsonb
  );
  $$
);

Verify Job

SELECT jobid, jobname, schedule, active
FROM cron.job 
WHERE jobname = 'monitor-allocation-health';

Testing

Test 1: No Unallocated Orders (Healthy State)

-- Should return unallocated_count = 0
SELECT raw_ops.fn_check_allocation_health_v2();

Expected: alert_config: null (no alert fired)

Test 2: Simulate Unallocated Order

-- Create a test order older than 30 min grace period
INSERT INTO raw_ops.orders (customer_id, total_inc_vat, status, created_at)
VALUES (
  (SELECT id FROM raw_ops.customers LIMIT 1),
  89.00,
  'PAID',
  NOW() - INTERVAL '2 hours'
)
RETURNING id;

-- Add an order_item with NO allocation
INSERT INTO raw_ops.order_items (order_id, product_id, qty)
VALUES (
  '<order_id_from_above>',
  (SELECT id FROM raw_ops.products WHERE sku = 'STARTER-8KG' LIMIT 1),
  1
);

-- Run monitor — should detect the unallocated order
SELECT raw_ops.fn_check_allocation_health_v2();

-- Clean up test data
DELETE FROM raw_ops.order_items WHERE order_id = '<order_id_from_above>';
DELETE FROM raw_ops.orders WHERE id = '<order_id_from_above>';

Expected: alert_config populated with channel: ops-alerts, severity: warning

Test 3: Verify Monitoring Run Logged

SELECT * FROM raw_ops.monitoring_runs
WHERE check_name = 'allocation_health'
ORDER BY run_at DESC LIMIT 5;

Test 4: Edge Function End-to-End

curl -X POST 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/run-monitor' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_ANON_KEY' \
  -d '{"check": "allocation_health"}'

Alert Response Runbook

WARN (30 min – 4 hours)

Likely Cause: No RELEASED inventory available. Normal if a batch is in QA_HOLD.

Actions: 1. Check if batch is pending lab results: SELECT * FROM raw_ops.batches WHERE status = 'QA_HOLD' ORDER BY created_at DESC; 2. Check lab SLA: Is the batch within expected turnaround? 3. If lab results are overdue → chase lab (see SOP-01) 4. If no batch in pipeline → contact co-packer for expedited production

HIGH (4 – 24 hours)

Likely Cause: Stock gap — no batch available or lab results significantly delayed.

Actions: 1. All WARN actions above 2. Check demand forecast: SELECT * FROM raw_ops.v_unallocated_orders; 3. Quantify the gap: How much kg is needed vs what's in QA_HOLD? 4. Contact co-packer for production ETA 5. Consider communicating delay to affected customers if >12 hours

PAGE (> 24 hours)

Likely Cause: Serious supply chain issue — customer has been waiting >1 day.

Actions: 1. All HIGH actions above 2. Immediate co-packer escalation (phone, not email) 3. Prepare customer communication: proactive email explaining delay 4. If batch in QA_HOLD with results overdue: escalate lab directly 5. Consider manual allocation if partial stock available (ship what you can) 6. Log incident in ops_events with ALLOCATION_ESCALATION type


Deployment Checklist

  • [ ] Run SQL: Create fn_check_allocation_health_v2() function
  • [ ] Run SQL: Create indexes (3 indexes)
  • [ ] Run SQL: Create v_unallocated_orders view
  • [ ] Update run-monitor Edge Function with allocation_health mapping
  • [ ] Deploy: supabase functions deploy run-monitor
  • [ ] Run SQL: Schedule pg_cron job
  • [ ] Test: Manual function call (Test 1)
  • [ ] Test: Simulated unallocated order (Test 2)
  • [ ] Test: Edge Function end-to-end (Test 4)
  • [ ] Verify: monitoring_runs logging (Test 3)
  • [ ] Verify: Slack alert appears in correct channel

Monitor Summary

Property Value
Check Name allocation_health
Function raw_ops.fn_check_allocation_health_v2()
Schedule Every 15 minutes (*/15 * * * *)
Grace Period 30 minutes (avoids false alarms during batch release)
WARN 30 min – 4 hours → #ops-alerts
HIGH 4 – 24 hours → #ops-urgent
PAGE > 24 hours → #ops-urgent
Logging raw_ops.monitoring_runs (check_name = 'allocation_health')
View raw_ops.v_unallocated_orders (for Metabase / manual checks)

Version History

Version Date Author Changes
1.0 2026-02-17 Anton + Claude Initial release — closes Gap 2 from system review

Document Owner: Protocol Raw Operations
Classification: Monitoring Infrastructure
Review Cycle: Quarterly
Next Review: 2026-05-17


Protocol Raw — Verified safe, batch by batch.