SOP-ORD-01: Order Ingestion v5.1¶

Automated webhook capture, processing, retry, and monitoring for Shopify order data.

Document ID: SOP-ORD-01-v5.1 Version: 5.1 Status: ✅ Production Ready Last Updated: 2026-03-21 Owner: Protocol Raw Operations Replaces: SOP-00-v4.0 (renamed from SOP-00 to SOP-ORD-01) Review Date: 2026-06-18

Key Changes in v5.1¶

✅ Order Ingestion SLO monitor absorbed from retired SOP-MON-02 (SLO definition, thresholds, runbook, indexes)
✅ pg_cron examples updated to fn_invoke_monitor pattern (SOP-MON-01 v1.1)
✅ MON-01 reference updated to v1.1

Key Changes in v5.0¶

✅ Monitoring rebuilt to SOP-MON-01 architecture (pg_cron → pg_net → run-monitor → fn_check → ops-alerter → Slack)
✅ All 4 Make.com polling scenarios deactivated, replaced by native Supabase monitors
✅ DLQ manual resolution replaced with controlled fn_resolve_webhook_manually_v1() function and resolved_manual status
✅ Retry worker updated with LIMIT 100 and FOR UPDATE SKIP LOCKED for bounded, concurrent-safe processing
✅ RLS enabled on all 6 core tables (service_role bypasses, anon blocked)
✅ All credentials removed from documentation
❌ Removed "production ready for 100k+" claims pending real-world validation at scale

Purpose¶

Capture 100% of Shopify order webhooks, process them into structured operational data, automatically retry failures, and monitor system health with zero manual intervention.

Scope¶

This SOP covers the complete order ingestion pipeline from Shopify webhook to confirmed database record. It does not cover downstream processing (allocation, fulfilment, dispatch), which are handled by SOP-LAB-01 onwards.

Related SOPs: - SOP-LAB-01: Batch Creation and Lab-to-Release - SOP-DLV-01: Courier Watchdog - SOP-ORD-02: 3PL Order Export - SOP-ORD-03: Dispatch Ingestion - SOP-MON-01: Monitoring and Alerting Architecture (master pattern)

Architecture¶

Shopify (order/create webhook)
    ↓
shopify-webhook-handler (Edge Function — HMAC validation, thin wrapper)
    ↓
raw_ops.process_order_webhook_txn() (PostgreSQL — all business logic)
    ↓
webhook_inbox + customers + orders + order_items (database tables)
    ↓ (on failure)
retry_failed_webhooks() (pg_cron → pg_net → Edge Function → DB function)
    ↓ (after 5 failures)
Dead Letter Queue (processing_status = 'dlq')
    ↓ (manual intervention)
fn_resolve_webhook_manually_v1() (controlled resolution with audit trail)

Monitoring Architecture (SOP-MON-01 compliant)¶

pg_cron (every 5 min / daily 09:00)
    ↓
pg_net (async HTTP)
    ↓
run-monitor (Edge Function — generic router)
    ↓
fn_check_webhook_health_v2() / fn_check_webhook_daily_digest_v2()
    ↓
monitoring_runs (execution log)
    ↓ (if threshold breached)
ops-alerter (Edge Function)
    ↓
Slack (#ops-alerts / #ops-urgent / #daily-ops)

Database Schema¶

All tables in raw_ops schema. RLS enabled on all tables. Service role bypasses RLS. Anon key has no access.

Core Tables¶

Table	Purpose	Key Indexes
`webhook_inbox`	All incoming webhooks with processing status	`idx_webhook_inbox_retry_queue` (partial), `idx_webhook_inbox_status`, `idx_webhook_inbox_shopify_order` (partial)
`customers`	Customer records upserted from webhook data	`email`, `shopify_customer_id`
`orders`	Order headers, one per Shopify order	`shopify_order_id`, `ordered_at`, `export_state`
`order_items`	Line items per order (many-to-one with orders)	`idx_order_items_order_id`, `idx_order_items_product` (partial)
`ops_events`	Operational audit log for all mutations	`entity_type`, `entity_id`
`monitoring_runs`	Execution log for all scheduled checks	`check_name`, `run_at`

RLS Posture¶

Table	RLS Enabled	Policies	Access Model
`webhook_inbox`	✅ Yes	None (no anon access)	Service role only
`customers`	✅ Yes	None	Service role only
`orders`	✅ Yes	None	Service role only
`order_items`	✅ Yes	None	Service role only
`ops_events`	✅ Yes	None	Service role only
`monitoring_runs`	✅ Yes	None	Service role only

No policies are defined. With RLS enabled and no policies, anon key queries return zero rows. Service role bypasses RLS entirely. This is the correct posture for backend-only operational tables that are never queried from the client.

Processing Status Values¶

Status	Meaning	Transition
`pending`	Received, awaiting processing	→ `success` or `failed`
`success`	Processed successfully	Terminal
`failed`	Processing failed, eligible for retry	→ `success`, `failed` (re-attempt), or `dlq`
`retrying`	Currently being retried	→ `success` or `failed`
`dlq`	5 attempts exhausted, manual intervention required	→ `resolved_manual` only
`resolved_manual`	Manually resolved via controlled function	Terminal

Key Fields: webhook_inbox¶

Field	Purpose
`processing_status`	Current state (see above)
`processing_attempts`	Retry count (max 5)
`processing_duration_ms`	Performance tracking
`last_error`	Error message for debugging
`error_category`	Classification of error type
`shopify_order_id`	Link to Shopify order
`received_at`	Timestamp for retry ordering

Performance Indexes¶

Critical for scale. Do not remove.

-- Retry worker optimisation (partial index)
CREATE INDEX idx_webhook_inbox_retry_queue
ON raw_ops.webhook_inbox (received_at DESC)
WHERE processing_status IN ('pending', 'failed', 'retrying');

-- Status monitoring
CREATE INDEX idx_webhook_inbox_status
ON raw_ops.webhook_inbox (processing_status, created_at DESC);

-- Shopify order lookup
CREATE INDEX idx_webhook_inbox_shopify_order
ON raw_ops.webhook_inbox (shopify_order_id)
WHERE shopify_order_id IS NOT NULL;

-- Order items join optimisation
CREATE INDEX idx_order_items_order_id
ON raw_ops.order_items (order_id);

-- Product analysis
CREATE INDEX idx_order_items_product
ON raw_ops.order_items (product_id)
WHERE product_id IS NOT NULL;

Performance impact at scale: - Retry worker query: ~10ms at 1M rows (vs ~30s without indexes) - Order detail queries: sub-10ms - Monitoring view queries: sub-100ms

Indexes are maintained automatically by PostgreSQL. Monitor sizes via pg_stat_user_indexes.

Edge Function: shopify-webhook-handler¶

Endpoint: https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/shopify-webhook-handler

Purpose: Receive Shopify order webhooks, validate HMAC signature, delegate processing to PostgreSQL.

Processing Flow¶

Validate Shopify HMAC signature using SHOPIFY_WEBHOOK_SECRET
Store raw webhook in webhook_inbox (status: pending)
Call raw_ops.process_order_webhook_txn() which:
Upserts customer data
Creates order record
Creates order items
Mark webhook as success or failed
Always return HTTP 200 to Shopify (prevents Shopify's own retry loop)

Error Handling¶

All errors caught and logged to webhook_inbox.last_error
Errors classified via error_category
Failed webhooks picked up by retry worker automatically

Environment Variables¶

Set in Supabase Dashboard → Edge Functions → Secrets: - SHOPIFY_WEBHOOK_SECRET — Webhook validation key from Shopify

Deployment¶

supabase functions deploy shopify-webhook-handler

Performance Targets¶

Metric	Target
P95 processing time	< 100ms
Cold start	< 100ms
Availability	Scales horizontally via Supabase

Retry Worker¶

Function: raw_ops.retry_failed_webhooks() Schedule: Every 5 minutes via pg_cron Invocation: pg_cron → pg_net → retry-failed-webhooks Edge Function → DB function

Logic¶

Select failed webhooks: LIMIT 100, FOR UPDATE SKIP LOCKED (bounded, concurrent-safe)
Retry with exponential backoff:
Attempt 1: Immediate
Attempt 2: 1 minute wait
Attempt 3: 5 minutes wait
Attempt 4: 15 minutes wait
Attempt 5: 30 minutes wait
After 5 failures: Move to DLQ (processing_status = 'dlq')
Total time to DLQ: 51 minutes maximum

Retry Query¶

SELECT * FROM raw_ops.webhook_inbox
WHERE processing_status IN ('pending', 'failed', 'retrying')
ORDER BY received_at DESC
LIMIT 100
FOR UPDATE SKIP LOCKED;

Uses idx_webhook_inbox_retry_queue partial index. Query time ~10ms at 1M rows.

Deployment¶

supabase functions deploy retry-failed-webhooks

pg_cron Job¶

SELECT cron.schedule(
  'retry-failed-webhooks',
  '*/5 * * * *',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/retry-failed-webhooks',
    headers := '{"Content-Type": "application/json", "apikey": "ANON_KEY"}'::jsonb,
    body := '{}'::jsonb
  );
  $$
);

Dead Letter Queue (DLQ)¶

When Webhooks Enter DLQ¶

After 5 failed processing attempts, webhooks are moved to processing_status = 'dlq'. Every DLQ entry triggers a critical alert to #ops-urgent via the webhook health monitor.

Resolution Process¶

DLQ webhooks must be resolved using the controlled resolution function. Direct UPDATE statements are prohibited.

-- Resolve a DLQ webhook with audit trail
SELECT raw_ops.fn_resolve_webhook_manually_v1(
  p_webhook_id := 'uuid-here',
  p_resolution_note := 'Duplicate order, original processed successfully as order #1234'
);

Function behaviour: - Only accepts webhooks with processing_status = 'dlq' (rejects all other statuses) - Sets status to resolved_manual (not success, preserving data integrity) - Logs resolution to ops_events with note and previous status - Returns JSON confirmation or error

Common DLQ Scenarios¶

Scenario	Resolution
Malformed Shopify data	Fix data upstream, resolve with note
Duplicate order ID	Confirm original processed, resolve as duplicate
Database constraint violation	Fix constraint or data, reprocess if needed
Edge function bug	Deploy fix, manually reprocess webhook payload

Monitoring¶

Monitor 1: Order Ingestion SLO¶

Function: raw_ops.fn_check_order_ingestion_slo_v2() Schedule: Every 2 minutes Registered in: run-monitor Edge Function as order_ingestion_slo

SLO: Every order.created webhook logged to raw_ops.ops_events must have a corresponding row in raw_ops.orders within 5 minutes of receipt.

Lookback window: 20 minutes

Thresholds:

Condition	Severity	Channel
Order age 5-15 minutes without ingestion	`warning`	#ops-alerts
Order age >15 minutes without ingestion	`critical`	#ops-urgent

Alert fields: - Orders Affected (count) - Oldest (minutes) - Severity - Order IDs (Shopify IDs)

Source logic: Compares ops_events (webhook received, kind = 'INFO', entity_type = 'order') against orders table (ingested) within 20-minute lookback window.

Required indexes:

CREATE INDEX IF NOT EXISTS idx_ops_events_order_info
ON raw_ops.ops_events(kind, created_at DESC)
WHERE kind = 'INFO' AND entity_type = 'order';

CREATE INDEX IF NOT EXISTS idx_orders_shopify_order_id
ON raw_ops.orders(shopify_order_id);

pg_cron job:

SELECT cron.schedule(
  'monitor-order-ingestion-slo',
  '*/2 * * * *',
  $$ SELECT raw_ops.fn_invoke_monitor('order_ingestion_slo'); $$
);

Monitor 2: Webhook Health Check¶

Function: raw_ops.fn_check_webhook_health_v2() Schedule: Every 5 minutes Registered in: run-monitor Edge Function as webhook_health

Checks and thresholds:

Check	Threshold	Severity	Channel
Retry queue depth	> 100 pending webhooks	Warning	#ops-alerts
Oldest pending webhook	> 30 minutes old	Warning	#ops-alerts
New DLQ entries	Any in last 10 minutes	Critical	#ops-urgent
P95 processing latency	> 5,000ms	Warning	#ops-alerts

pg_cron job:

SELECT cron.schedule(
  'monitor-webhook-health',
  '*/5 * * * *',
  $$ SELECT raw_ops.fn_invoke_monitor('webhook_health'); $$
);

Monitor 3: Daily Webhook Digest¶

Function: raw_ops.fn_check_webhook_daily_digest_v2() Schedule: Daily at 09:00 UTC Channel: #daily-ops (always fires, not threshold-gated)

Reports: - 24-hour success, failure, and DLQ counts - Success rate percentage - Average, P95, and max processing latency

pg_cron job:

SELECT cron.schedule(
  'monitor-webhook-daily-digest',
  '0 9 * * *',
  $$ SELECT raw_ops.fn_invoke_monitor('webhook_daily_digest'); $$
);

Monitoring Views¶

These SQL views remain available for ad-hoc investigation and Metabase dashboards.

View	Purpose
`raw_ops.webhook_health`	24-hour processing metrics (counts, latency percentiles, success rate)
`raw_ops.retry_queue_depth`	Current retry queue status (pending count, oldest age, breakdown)
`raw_ops.webhook_processing_by_hour`	Hourly stats for last 7 days (volume, success rate, latency trends)
`raw_ops.error_summary`	Error patterns by category (types, frequency, sample messages)
`raw_ops.alert_retry_queue_backup`	Returns rows only if queue depth > 100 or oldest > 30 min
`raw_ops.alert_slow_processing`	Returns rows only if P95 > 5,000ms or max > 10,000ms

Metabase Dashboard: Webhook Operations¶

Status: ⏳ NOT YET BUILT. Dashboard specification defined, implementation pending. Priority: Build after lifecycle emails, before seed deck. This dashboard provides investor-grade operational credibility (99%+ ingestion success rate, sub-100ms latency, zero DLQ incidents).

Required questions: 1. Success rate trend (7-day rolling) 2. Queue depth over time 3. DLQ count by day 4. P95 latency trend 5. Error category breakdown (pie chart) 6. Retry resolution rate (retried → success vs retried → DLQ) 7. Oldest pending webhook age 8. Daily volume trend

Health Indicators¶

State	Success Rate	Queue Depth	P95 Latency	DLQ/Day
🟢 Healthy	> 99%	0-10	< 1s	0
🟡 Warning	95-99%	10-50	1-3s	1-2
🔴 Critical	< 95%	> 50	> 3s	3+

Shopify Configuration¶

Location: Shopify Admin → Settings → Notifications → Webhooks

Setting	Value
Event	Order creation
Format	JSON
URL	`https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/shopify-webhook-handler`
API Version	2024-10 (or latest stable)

Webhook secret stored in: - Supabase: Edge Functions → Secrets → SHOPIFY_WEBHOOK_SECRET - Password manager (backup)

Database Functions¶

Core Functions¶

Function	Called By	Purpose
`raw_ops.process_order_webhook_txn(webhook_data JSONB)`	shopify-webhook-handler Edge Function	Process webhook into customer, order, and order_items within a transaction
`raw_ops.retry_failed_webhooks()`	retry-failed-webhooks Edge Function (via pg_cron)	Retry failed webhooks with exponential backoff, bounded batch of 100, `FOR UPDATE SKIP LOCKED`
`raw_ops.fn_resolve_webhook_manually_v1(p_webhook_id UUID, p_resolution_note TEXT)`	Operator via SQL console	Resolve DLQ webhooks with audit trail. Only accepts `dlq` status. Sets to `resolved_manual`
`raw_ops.fn_check_webhook_health_v2()`	run-monitor Edge Function (via pg_cron)	Check queue depth, DLQ entries, P95 latency. Returns alert_config if thresholds breached
`raw_ops.fn_check_webhook_daily_digest_v2()`	run-monitor Edge Function (via pg_cron)	Generate 24-hour digest with counts, rates, and latency metrics

Supporting Functions¶

SELECT routine_name, routine_type
FROM information_schema.routines
WHERE routine_schema = 'raw_ops'
AND routine_name LIKE '%webhook%'
ORDER BY routine_name;

Alert Response Procedures¶

Retry Queue Backup (#ops-alerts, warning)¶

Trigger: Pending webhooks > 100 or oldest pending > 30 minutes.

Investigation: 1. Check retry worker cron is running: Supabase Dashboard → Database → Cron Jobs → verify retry-failed-webhooks shows recent executions 2. Check database logs: Supabase Dashboard → Logs → Database → look for retry_failed_webhooks() errors 3. Check queue manually: SELECT * FROM raw_ops.retry_queue_depth;

Common causes: Retry worker cron failed (restart it), database connection pool exhausted (check pg_stat_activity), systematic webhook processing error (check raw_ops.error_summary).

Resolution: If cron stuck, manually trigger retry worker. If systematic error, fix root cause then trigger retry. If database overload, investigate slow queries.

DLQ Critical (#ops-urgent, critical)¶

Trigger: Any new DLQ entry in the last 10 minutes.

Investigation: 1. Query the specific webhook: SELECT * FROM raw_ops.webhook_inbox WHERE processing_status = 'dlq' ORDER BY last_error_at DESC LIMIT 5; 2. Review last_error, error_category, and raw webhook payload

Common causes: Malformed Shopify data, database constraint violation, edge function timeout.

Resolution: Fix underlying issue. Resolve via controlled function:

SELECT raw_ops.fn_resolve_webhook_manually_v1(
  'webhook-uuid-here',
  'Description of what was wrong and how it was resolved'
);

Order Ingestion SLO Breach (#ops-alerts warning / #ops-urgent critical)¶

Trigger: Orders received via webhook but not ingested within 5 minutes (warning) or 15 minutes (critical).

Investigation: 1. Check Edge Function logs for shopify-webhook-handler in Supabase Dashboard 2. Check webhook inbox for stuck or failed webhooks:

   SELECT id, shopify_order_id, processing_status, attempts, created_at
   FROM raw_ops.webhook_inbox
   WHERE processing_status IN ('failed', 'dlq', 'pending')
   ORDER BY created_at DESC
   LIMIT 20;

3. Check Shopify webhook delivery status (Settings → Notifications → Webhooks) 4. Check retry worker cron:

   SELECT * FROM cron.job_run_details
   WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'retry-failed-webhooks')
   ORDER BY start_time DESC LIMIT 5;

5. Check current violations: SELECT * FROM raw_ops.check_order_ingestion_slo();

Common causes: Edge Function crash or deployment issue, Shopify HMAC secret rotation, database connection pool exhaustion, webhook stuck in retry loop (check attempts column, DLQ after 5 failures).

Resolution: If webhooks are in DLQ status, resolve using raw_ops.fn_resolve_webhook_manually_v1() per the DLQ resolution procedure above.

Slow Processing (#ops-alerts, warning)¶

Trigger: P95 processing latency > 5,000ms.

Investigation: 1. Check edge function logs: Supabase Dashboard → Edge Functions → Logs 2. Check database performance: SELECT * FROM pg_stat_activity WHERE state = 'active'; 3. Verify indexes: SELECT * FROM pg_stat_user_indexes WHERE schemaname = 'raw_ops';

Common causes: Table bloat (run VACUUM ANALYZE), indexes not being used, connection pool exhaustion, unusually large orders.

Resolution: If table bloat, run VACUUM ANALYZE raw_ops.webhook_inbox;. If indexes unused, run ANALYZE. If persistent, scale Supabase instance.

Data Archival Strategy¶

When to start: Month 3+ or when webhook_inbox exceeds 100k rows.

Archive Successful Webhooks (monthly)¶

-- Create archive table (one time)
CREATE TABLE IF NOT EXISTS raw_ops.webhook_inbox_archive (
  LIKE raw_ops.webhook_inbox INCLUDING ALL
);

-- Move processed data older than 90 days
INSERT INTO raw_ops.webhook_inbox_archive
SELECT * FROM raw_ops.webhook_inbox
WHERE processing_status IN ('success', 'resolved_manual')
  AND created_at < NOW() - INTERVAL '90 days';

DELETE FROM raw_ops.webhook_inbox
WHERE processing_status IN ('success', 'resolved_manual')
  AND created_at < NOW() - INTERVAL '90 days';

VACUUM ANALYZE raw_ops.webhook_inbox;

Archive: success and resolved_manual (processing complete). Retain indefinitely: failed and dlq (debugging and analysis).

Expected table sizes at steady state: - Active table: < 500k rows - Archive growth: ~150k rows/month at 10k customers

Credentials and Access¶

All credentials stored in password manager. No credentials appear in this document.

System	Access
Supabase project	`znfjpllsiuyezqlneqzr` — Owner: Protocol Raw operations account
Shopify webhook secret	Supabase Edge Functions → Secrets → `SHOPIFY_WEBHOOK_SECRET`
Slack channels	#ops-alerts, #ops-urgent, #daily-ops — via ops-alerter Edge Function

Deactivated Make.com Scenarios (v4.0 legacy)¶

The following Make.com scenarios were replaced by native Supabase monitoring in v5.0. They remain in Make.com in deactivated state as fallback reference.

Webhook Failure Alert (was every 5 min → replaced by fn_check_webhook_health_v2)
DLQ Critical Alert (was every 5 min → replaced by fn_check_webhook_health_v2)
Daily Webhook Summary (was daily 09:00 → replaced by fn_check_webhook_daily_digest_v2)
Hourly Performance Check (was hourly → replaced by fn_check_webhook_health_v2)

Maintenance¶

Automated (no manual work)¶

Retry worker processes failures every 5 minutes
Webhook health monitor checks every 5 minutes
Daily digest fires at 09:00 UTC
All executions logged to monitoring_runs

Monthly¶

Review error categories and update handling logic if new patterns emerge
Check index usage: SELECT * FROM pg_stat_user_indexes WHERE schemaname = 'raw_ops';

Quarterly (Month 3+)¶

Run archival script (see Data Archival Strategy above)
Review and vacuum tables if needed

As Needed¶

Update Shopify API version (annually)
Redeploy Edge Functions after code changes
Scale Supabase instance at growth thresholds

Scaling Triggers¶

Threshold	Action
10,000 customers	Review monthly metrics, confirm health indicators are green
50,000 customers	Implement archival strategy, monitor index sizes
100,000 customers	Consider read replicas for analytics queries
500,000+ customers	Partition `webhook_inbox` by month
1,000,000+ customers	Separate analytics database

Outstanding Items¶

These items are identified gaps that do not block production operation but are required for full operational maturity.

Must Do (before seed deck)¶

Item	Detail	Effort	Owner
Metabase dashboard	Build the 8 questions defined in the Monitoring section. Provides investor-grade proof of operational reliability	2-3 hours (Claude Code)	Anton
~~pg_cron job rename~~	~~Rename `shopify-webhook-retry` → `ord-01-webhook-retry` and monitor jobs to align with new SOP taxonomy~~	✅ Done 2026-03-18	Anton
Deactivate legacy Make.com scenarios	Toggle OFF the 4 monitoring scenarios replaced in v5.0. Verify native monitors are healthy for 48h first	10 min (manual)	Anton
Rotate service role key	Key was exposed in SOP-00 v4.0. Treat as compromised. Regenerate, update Make.com connections, password manager	30 min (manual)	Anton

Should Do (Phase A)¶

Item	Detail	Effort
~~Monitoring heartbeat~~	~~Add a check that verifies monitoring runs completed in the last hour. Catches silent monitoring failures.~~ Deployed as `fn_check_monitoring_heartbeat_v2()` + `monitor-heartbeat` pg_cron job (every 15 min)	✅ Done 2026-03-18
Alert delivery retry	If ops-alerter fails to post to Slack, currently the `monitoring_runs` table logs `alert_triggered = true` but the message is lost. Add a retry mechanism or fallback channel (e.g. email via Customer.io)	1-2 hours

Parked (Phase B)¶

Item	Detail	Trigger
Role segmentation	Replace universal service_role with scoped roles (ingestion_role, monitoring_role, analytics_read_role). No security benefit while solo founder; becomes relevant with first technical hire or contractor	First technical hire
Operational review cadence	Define weekly review ritual, alert response SLAs, owner per metric. Currently data is available but not embedded in operational behaviour	Post-launch, once there is real data to review

Deployment Checklist¶

Database Objects¶

[x] webhook_inbox, customers, orders, order_items tables
[x] All 5 performance indexes deployed
[x] 6 monitoring views created
[x] process_order_webhook_txn() function
[x] retry_failed_webhooks() function (with LIMIT 100, FOR UPDATE SKIP LOCKED)
[x] fn_resolve_webhook_manually_v1() function
[x] fn_check_webhook_health_v2() monitoring function
[x] fn_check_webhook_daily_digest_v2() monitoring function
[x] RLS enabled on all 6 core tables

Edge Functions¶

[x] shopify-webhook-handler deployed
[x] retry-failed-webhooks deployed
[x] run-monitor updated with webhook_health and webhook_daily_digest checks
[x] ops-alerter available for Slack delivery

Scheduled Jobs¶

[x] retry-failed-webhooks pg_cron job (every 5 min)
[x] monitor-webhook-health pg_cron job (every 5 min)
[x] monitor-webhook-daily-digest pg_cron job (daily 09:00)

Integrations¶

[x] Shopify webhook configured (order creation → Edge Function)
[x] Slack channels: #ops-alerts, #ops-urgent, #daily-ops

Deactivated¶

[x] Make.com Scenario 1: Webhook Failure Alert (OFF)
[x] Make.com Scenario 2: DLQ Critical Alert (OFF)
[x] Make.com Scenario 3: Daily Webhook Summary (OFF)
[x] Make.com Scenario 4: Hourly Performance Check (OFF)

Version History¶

Version	Date	Changes	Author
5.1	2026-03-21	Absorbed Order Ingestion SLO monitor from retired SOP-MON-02 (SLO definition, thresholds, runbook, indexes). Updated pg_cron examples to fn_invoke_monitor pattern. Updated MON-01 reference to v1.1.	Anton / Claude
5.0	2026-03-18	Renamed from SOP-00 to SOP-ORD-01. Rebuilt monitoring to SOP-MON-01 pattern. Added fn_resolve_webhook_manually_v1 and resolved_manual status. Bounded retry worker (LIMIT 100, FOR UPDATE SKIP LOCKED). Enabled RLS on all tables. Removed all credentials. Deactivated Make.com monitoring scenarios.	Anton / Claude
4.0	2025-10-28	Added 5 performance indexes, 6 monitoring views, 4 Make.com alert scenarios, processing duration tracking.	Anton
3.0	2025-10-22	Retry worker, Slack alerts, daily stats view, Make.com integration.	Anton
2.0	2025-10	Error handling, DLQ, processing attempts tracking.	Anton
1.0	2025-10	Initial webhook capture and basic schema.	Anton

Escalation Path¶

Check Slack alerts and this documentation
Query monitoring views for diagnostics
Review Supabase Dashboard logs (Edge Functions, Database)
Check monitoring_runs for execution history: SELECT * FROM raw_ops.monitoring_runs WHERE check_name LIKE 'webhook%' ORDER BY run_at DESC LIMIT 20;
If unresolved: Escalate to Anton

End of SOP-ORD-01 v5.1