Skip to content

SOP-ORD-01: Order Ingestion v5.1

Automated webhook capture, processing, retry, and monitoring for Shopify order data.

Document ID: SOP-ORD-01-v5.1 Version: 5.1 Status: ✅ Production Ready Last Updated: 2026-03-21 Owner: Protocol Raw Operations Replaces: SOP-00-v4.0 (renamed from SOP-00 to SOP-ORD-01) Review Date: 2026-06-18


Key Changes in v5.1

  • ✅ Order Ingestion SLO monitor absorbed from retired SOP-MON-02 (SLO definition, thresholds, runbook, indexes)
  • ✅ pg_cron examples updated to fn_invoke_monitor pattern (SOP-MON-01 v1.1)
  • ✅ MON-01 reference updated to v1.1

Key Changes in v5.0

  • ✅ Monitoring rebuilt to SOP-MON-01 architecture (pg_cron → pg_net → run-monitor → fn_check → ops-alerter → Slack)
  • ✅ All 4 Make.com polling scenarios deactivated, replaced by native Supabase monitors
  • ✅ DLQ manual resolution replaced with controlled fn_resolve_webhook_manually_v1() function and resolved_manual status
  • ✅ Retry worker updated with LIMIT 100 and FOR UPDATE SKIP LOCKED for bounded, concurrent-safe processing
  • ✅ RLS enabled on all 6 core tables (service_role bypasses, anon blocked)
  • ✅ All credentials removed from documentation
  • ❌ Removed "production ready for 100k+" claims pending real-world validation at scale

Purpose

Capture 100% of Shopify order webhooks, process them into structured operational data, automatically retry failures, and monitor system health with zero manual intervention.

Scope

This SOP covers the complete order ingestion pipeline from Shopify webhook to confirmed database record. It does not cover downstream processing (allocation, fulfilment, dispatch), which are handled by SOP-LAB-01 onwards.

Related SOPs: - SOP-LAB-01: Batch Creation and Lab-to-Release - SOP-DLV-01: Courier Watchdog - SOP-ORD-02: 3PL Order Export - SOP-ORD-03: Dispatch Ingestion - SOP-MON-01: Monitoring and Alerting Architecture (master pattern)


Architecture

Shopify (order/create webhook)
shopify-webhook-handler (Edge Function — HMAC validation, thin wrapper)
raw_ops.process_order_webhook_txn() (PostgreSQL — all business logic)
webhook_inbox + customers + orders + order_items (database tables)
    ↓ (on failure)
retry_failed_webhooks() (pg_cron → pg_net → Edge Function → DB function)
    ↓ (after 5 failures)
Dead Letter Queue (processing_status = 'dlq')
    ↓ (manual intervention)
fn_resolve_webhook_manually_v1() (controlled resolution with audit trail)

Monitoring Architecture (SOP-MON-01 compliant)

pg_cron (every 5 min / daily 09:00)
pg_net (async HTTP)
run-monitor (Edge Function — generic router)
fn_check_webhook_health_v2() / fn_check_webhook_daily_digest_v2()
monitoring_runs (execution log)
    ↓ (if threshold breached)
ops-alerter (Edge Function)
Slack (#ops-alerts / #ops-urgent / #daily-ops)

Database Schema

All tables in raw_ops schema. RLS enabled on all tables. Service role bypasses RLS. Anon key has no access.

Core Tables

Table Purpose Key Indexes
webhook_inbox All incoming webhooks with processing status idx_webhook_inbox_retry_queue (partial), idx_webhook_inbox_status, idx_webhook_inbox_shopify_order (partial)
customers Customer records upserted from webhook data email, shopify_customer_id
orders Order headers, one per Shopify order shopify_order_id, ordered_at, export_state
order_items Line items per order (many-to-one with orders) idx_order_items_order_id, idx_order_items_product (partial)
ops_events Operational audit log for all mutations entity_type, entity_id
monitoring_runs Execution log for all scheduled checks check_name, run_at

RLS Posture

Table RLS Enabled Policies Access Model
webhook_inbox ✅ Yes None (no anon access) Service role only
customers ✅ Yes None Service role only
orders ✅ Yes None Service role only
order_items ✅ Yes None Service role only
ops_events ✅ Yes None Service role only
monitoring_runs ✅ Yes None Service role only

No policies are defined. With RLS enabled and no policies, anon key queries return zero rows. Service role bypasses RLS entirely. This is the correct posture for backend-only operational tables that are never queried from the client.

Processing Status Values

Status Meaning Transition
pending Received, awaiting processing success or failed
success Processed successfully Terminal
failed Processing failed, eligible for retry success, failed (re-attempt), or dlq
retrying Currently being retried success or failed
dlq 5 attempts exhausted, manual intervention required resolved_manual only
resolved_manual Manually resolved via controlled function Terminal

Key Fields: webhook_inbox

Field Purpose
processing_status Current state (see above)
processing_attempts Retry count (max 5)
processing_duration_ms Performance tracking
last_error Error message for debugging
error_category Classification of error type
shopify_order_id Link to Shopify order
received_at Timestamp for retry ordering

Performance Indexes

Critical for scale. Do not remove.

-- Retry worker optimisation (partial index)
CREATE INDEX idx_webhook_inbox_retry_queue
ON raw_ops.webhook_inbox (received_at DESC)
WHERE processing_status IN ('pending', 'failed', 'retrying');

-- Status monitoring
CREATE INDEX idx_webhook_inbox_status
ON raw_ops.webhook_inbox (processing_status, created_at DESC);

-- Shopify order lookup
CREATE INDEX idx_webhook_inbox_shopify_order
ON raw_ops.webhook_inbox (shopify_order_id)
WHERE shopify_order_id IS NOT NULL;

-- Order items join optimisation
CREATE INDEX idx_order_items_order_id
ON raw_ops.order_items (order_id);

-- Product analysis
CREATE INDEX idx_order_items_product
ON raw_ops.order_items (product_id)
WHERE product_id IS NOT NULL;

Performance impact at scale: - Retry worker query: ~10ms at 1M rows (vs ~30s without indexes) - Order detail queries: sub-10ms - Monitoring view queries: sub-100ms

Indexes are maintained automatically by PostgreSQL. Monitor sizes via pg_stat_user_indexes.


Edge Function: shopify-webhook-handler

Endpoint: https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/shopify-webhook-handler

Purpose: Receive Shopify order webhooks, validate HMAC signature, delegate processing to PostgreSQL.

Processing Flow

  1. Validate Shopify HMAC signature using SHOPIFY_WEBHOOK_SECRET
  2. Store raw webhook in webhook_inbox (status: pending)
  3. Call raw_ops.process_order_webhook_txn() which:
  4. Upserts customer data
  5. Creates order record
  6. Creates order items
  7. Mark webhook as success or failed
  8. Always return HTTP 200 to Shopify (prevents Shopify's own retry loop)

Error Handling

  • All errors caught and logged to webhook_inbox.last_error
  • Errors classified via error_category
  • Failed webhooks picked up by retry worker automatically

Environment Variables

Set in Supabase Dashboard → Edge Functions → Secrets: - SHOPIFY_WEBHOOK_SECRET — Webhook validation key from Shopify

Deployment

supabase functions deploy shopify-webhook-handler

Performance Targets

Metric Target
P95 processing time < 100ms
Cold start < 100ms
Availability Scales horizontally via Supabase

Retry Worker

Function: raw_ops.retry_failed_webhooks() Schedule: Every 5 minutes via pg_cron Invocation: pg_cron → pg_net → retry-failed-webhooks Edge Function → DB function

Logic

  1. Select failed webhooks: LIMIT 100, FOR UPDATE SKIP LOCKED (bounded, concurrent-safe)
  2. Retry with exponential backoff:
  3. Attempt 1: Immediate
  4. Attempt 2: 1 minute wait
  5. Attempt 3: 5 minutes wait
  6. Attempt 4: 15 minutes wait
  7. Attempt 5: 30 minutes wait
  8. After 5 failures: Move to DLQ (processing_status = 'dlq')
  9. Total time to DLQ: 51 minutes maximum

Retry Query

SELECT * FROM raw_ops.webhook_inbox
WHERE processing_status IN ('pending', 'failed', 'retrying')
ORDER BY received_at DESC
LIMIT 100
FOR UPDATE SKIP LOCKED;

Uses idx_webhook_inbox_retry_queue partial index. Query time ~10ms at 1M rows.

Deployment

supabase functions deploy retry-failed-webhooks

pg_cron Job

SELECT cron.schedule(
  'retry-failed-webhooks',
  '*/5 * * * *',
  $$
  SELECT net.http_post(
    url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/retry-failed-webhooks',
    headers := '{"Content-Type": "application/json", "apikey": "ANON_KEY"}'::jsonb,
    body := '{}'::jsonb
  );
  $$
);

Dead Letter Queue (DLQ)

When Webhooks Enter DLQ

After 5 failed processing attempts, webhooks are moved to processing_status = 'dlq'. Every DLQ entry triggers a critical alert to #ops-urgent via the webhook health monitor.

Resolution Process

DLQ webhooks must be resolved using the controlled resolution function. Direct UPDATE statements are prohibited.

-- Resolve a DLQ webhook with audit trail
SELECT raw_ops.fn_resolve_webhook_manually_v1(
  p_webhook_id := 'uuid-here',
  p_resolution_note := 'Duplicate order, original processed successfully as order #1234'
);

Function behaviour: - Only accepts webhooks with processing_status = 'dlq' (rejects all other statuses) - Sets status to resolved_manual (not success, preserving data integrity) - Logs resolution to ops_events with note and previous status - Returns JSON confirmation or error

Common DLQ Scenarios

Scenario Resolution
Malformed Shopify data Fix data upstream, resolve with note
Duplicate order ID Confirm original processed, resolve as duplicate
Database constraint violation Fix constraint or data, reprocess if needed
Edge function bug Deploy fix, manually reprocess webhook payload

Monitoring

Monitor 1: Order Ingestion SLO

Function: raw_ops.fn_check_order_ingestion_slo_v2() Schedule: Every 2 minutes Registered in: run-monitor Edge Function as order_ingestion_slo

SLO: Every order.created webhook logged to raw_ops.ops_events must have a corresponding row in raw_ops.orders within 5 minutes of receipt.

Lookback window: 20 minutes

Thresholds:

Condition Severity Channel
Order age 5-15 minutes without ingestion warning #ops-alerts
Order age >15 minutes without ingestion critical #ops-urgent

Alert fields: - Orders Affected (count) - Oldest (minutes) - Severity - Order IDs (Shopify IDs)

Source logic: Compares ops_events (webhook received, kind = 'INFO', entity_type = 'order') against orders table (ingested) within 20-minute lookback window.

Required indexes:

CREATE INDEX IF NOT EXISTS idx_ops_events_order_info
ON raw_ops.ops_events(kind, created_at DESC)
WHERE kind = 'INFO' AND entity_type = 'order';

CREATE INDEX IF NOT EXISTS idx_orders_shopify_order_id
ON raw_ops.orders(shopify_order_id);

pg_cron job:

SELECT cron.schedule(
  'monitor-order-ingestion-slo',
  '*/2 * * * *',
  $$ SELECT raw_ops.fn_invoke_monitor('order_ingestion_slo'); $$
);

Monitor 2: Webhook Health Check

Function: raw_ops.fn_check_webhook_health_v2() Schedule: Every 5 minutes Registered in: run-monitor Edge Function as webhook_health

Checks and thresholds:

Check Threshold Severity Channel
Retry queue depth > 100 pending webhooks Warning #ops-alerts
Oldest pending webhook > 30 minutes old Warning #ops-alerts
New DLQ entries Any in last 10 minutes Critical #ops-urgent
P95 processing latency > 5,000ms Warning #ops-alerts

pg_cron job:

SELECT cron.schedule(
  'monitor-webhook-health',
  '*/5 * * * *',
  $$ SELECT raw_ops.fn_invoke_monitor('webhook_health'); $$
);

Monitor 3: Daily Webhook Digest

Function: raw_ops.fn_check_webhook_daily_digest_v2() Schedule: Daily at 09:00 UTC Channel: #daily-ops (always fires, not threshold-gated)

Reports: - 24-hour success, failure, and DLQ counts - Success rate percentage - Average, P95, and max processing latency

pg_cron job:

SELECT cron.schedule(
  'monitor-webhook-daily-digest',
  '0 9 * * *',
  $$ SELECT raw_ops.fn_invoke_monitor('webhook_daily_digest'); $$
);

Monitoring Views

These SQL views remain available for ad-hoc investigation and Metabase dashboards.

View Purpose
raw_ops.webhook_health 24-hour processing metrics (counts, latency percentiles, success rate)
raw_ops.retry_queue_depth Current retry queue status (pending count, oldest age, breakdown)
raw_ops.webhook_processing_by_hour Hourly stats for last 7 days (volume, success rate, latency trends)
raw_ops.error_summary Error patterns by category (types, frequency, sample messages)
raw_ops.alert_retry_queue_backup Returns rows only if queue depth > 100 or oldest > 30 min
raw_ops.alert_slow_processing Returns rows only if P95 > 5,000ms or max > 10,000ms

Metabase Dashboard: Webhook Operations

Status: ⏳ NOT YET BUILT. Dashboard specification defined, implementation pending. Priority: Build after lifecycle emails, before seed deck. This dashboard provides investor-grade operational credibility (99%+ ingestion success rate, sub-100ms latency, zero DLQ incidents).

Required questions: 1. Success rate trend (7-day rolling) 2. Queue depth over time 3. DLQ count by day 4. P95 latency trend 5. Error category breakdown (pie chart) 6. Retry resolution rate (retried → success vs retried → DLQ) 7. Oldest pending webhook age 8. Daily volume trend

Health Indicators

State Success Rate Queue Depth P95 Latency DLQ/Day
🟢 Healthy > 99% 0-10 < 1s 0
🟡 Warning 95-99% 10-50 1-3s 1-2
🔴 Critical < 95% > 50 > 3s 3+

Shopify Configuration

Location: Shopify Admin → Settings → Notifications → Webhooks

Setting Value
Event Order creation
Format JSON
URL https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/shopify-webhook-handler
API Version 2024-10 (or latest stable)

Webhook secret stored in: - Supabase: Edge Functions → Secrets → SHOPIFY_WEBHOOK_SECRET - Password manager (backup)


Database Functions

Core Functions

Function Called By Purpose
raw_ops.process_order_webhook_txn(webhook_data JSONB) shopify-webhook-handler Edge Function Process webhook into customer, order, and order_items within a transaction
raw_ops.retry_failed_webhooks() retry-failed-webhooks Edge Function (via pg_cron) Retry failed webhooks with exponential backoff, bounded batch of 100, FOR UPDATE SKIP LOCKED
raw_ops.fn_resolve_webhook_manually_v1(p_webhook_id UUID, p_resolution_note TEXT) Operator via SQL console Resolve DLQ webhooks with audit trail. Only accepts dlq status. Sets to resolved_manual
raw_ops.fn_check_webhook_health_v2() run-monitor Edge Function (via pg_cron) Check queue depth, DLQ entries, P95 latency. Returns alert_config if thresholds breached
raw_ops.fn_check_webhook_daily_digest_v2() run-monitor Edge Function (via pg_cron) Generate 24-hour digest with counts, rates, and latency metrics

Supporting Functions

SELECT routine_name, routine_type
FROM information_schema.routines
WHERE routine_schema = 'raw_ops'
AND routine_name LIKE '%webhook%'
ORDER BY routine_name;

Alert Response Procedures

Retry Queue Backup (#ops-alerts, warning)

Trigger: Pending webhooks > 100 or oldest pending > 30 minutes.

Investigation: 1. Check retry worker cron is running: Supabase Dashboard → Database → Cron Jobs → verify retry-failed-webhooks shows recent executions 2. Check database logs: Supabase Dashboard → Logs → Database → look for retry_failed_webhooks() errors 3. Check queue manually: SELECT * FROM raw_ops.retry_queue_depth;

Common causes: Retry worker cron failed (restart it), database connection pool exhausted (check pg_stat_activity), systematic webhook processing error (check raw_ops.error_summary).

Resolution: If cron stuck, manually trigger retry worker. If systematic error, fix root cause then trigger retry. If database overload, investigate slow queries.

DLQ Critical (#ops-urgent, critical)

Trigger: Any new DLQ entry in the last 10 minutes.

Investigation: 1. Query the specific webhook: SELECT * FROM raw_ops.webhook_inbox WHERE processing_status = 'dlq' ORDER BY last_error_at DESC LIMIT 5; 2. Review last_error, error_category, and raw webhook payload

Common causes: Malformed Shopify data, database constraint violation, edge function timeout.

Resolution: Fix underlying issue. Resolve via controlled function:

SELECT raw_ops.fn_resolve_webhook_manually_v1(
  'webhook-uuid-here',
  'Description of what was wrong and how it was resolved'
);

Order Ingestion SLO Breach (#ops-alerts warning / #ops-urgent critical)

Trigger: Orders received via webhook but not ingested within 5 minutes (warning) or 15 minutes (critical).

Investigation: 1. Check Edge Function logs for shopify-webhook-handler in Supabase Dashboard 2. Check webhook inbox for stuck or failed webhooks:

   SELECT id, shopify_order_id, processing_status, attempts, created_at
   FROM raw_ops.webhook_inbox
   WHERE processing_status IN ('failed', 'dlq', 'pending')
   ORDER BY created_at DESC
   LIMIT 20;
3. Check Shopify webhook delivery status (Settings → Notifications → Webhooks) 4. Check retry worker cron:
   SELECT * FROM cron.job_run_details
   WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'retry-failed-webhooks')
   ORDER BY start_time DESC LIMIT 5;
5. Check current violations: SELECT * FROM raw_ops.check_order_ingestion_slo();

Common causes: Edge Function crash or deployment issue, Shopify HMAC secret rotation, database connection pool exhaustion, webhook stuck in retry loop (check attempts column, DLQ after 5 failures).

Resolution: If webhooks are in DLQ status, resolve using raw_ops.fn_resolve_webhook_manually_v1() per the DLQ resolution procedure above.

Slow Processing (#ops-alerts, warning)

Trigger: P95 processing latency > 5,000ms.

Investigation: 1. Check edge function logs: Supabase Dashboard → Edge Functions → Logs 2. Check database performance: SELECT * FROM pg_stat_activity WHERE state = 'active'; 3. Verify indexes: SELECT * FROM pg_stat_user_indexes WHERE schemaname = 'raw_ops';

Common causes: Table bloat (run VACUUM ANALYZE), indexes not being used, connection pool exhaustion, unusually large orders.

Resolution: If table bloat, run VACUUM ANALYZE raw_ops.webhook_inbox;. If indexes unused, run ANALYZE. If persistent, scale Supabase instance.


Data Archival Strategy

When to start: Month 3+ or when webhook_inbox exceeds 100k rows.

Archive Successful Webhooks (monthly)

-- Create archive table (one time)
CREATE TABLE IF NOT EXISTS raw_ops.webhook_inbox_archive (
  LIKE raw_ops.webhook_inbox INCLUDING ALL
);

-- Move processed data older than 90 days
INSERT INTO raw_ops.webhook_inbox_archive
SELECT * FROM raw_ops.webhook_inbox
WHERE processing_status IN ('success', 'resolved_manual')
  AND created_at < NOW() - INTERVAL '90 days';

DELETE FROM raw_ops.webhook_inbox
WHERE processing_status IN ('success', 'resolved_manual')
  AND created_at < NOW() - INTERVAL '90 days';

VACUUM ANALYZE raw_ops.webhook_inbox;

Archive: success and resolved_manual (processing complete). Retain indefinitely: failed and dlq (debugging and analysis).

Expected table sizes at steady state: - Active table: < 500k rows - Archive growth: ~150k rows/month at 10k customers


Credentials and Access

All credentials stored in password manager. No credentials appear in this document.

System Access
Supabase project znfjpllsiuyezqlneqzr — Owner: Protocol Raw operations account
Shopify webhook secret Supabase Edge Functions → Secrets → SHOPIFY_WEBHOOK_SECRET
Slack channels #ops-alerts, #ops-urgent, #daily-ops — via ops-alerter Edge Function

Deactivated Make.com Scenarios (v4.0 legacy)

The following Make.com scenarios were replaced by native Supabase monitoring in v5.0. They remain in Make.com in deactivated state as fallback reference.

  1. Webhook Failure Alert (was every 5 min → replaced by fn_check_webhook_health_v2)
  2. DLQ Critical Alert (was every 5 min → replaced by fn_check_webhook_health_v2)
  3. Daily Webhook Summary (was daily 09:00 → replaced by fn_check_webhook_daily_digest_v2)
  4. Hourly Performance Check (was hourly → replaced by fn_check_webhook_health_v2)

Maintenance

Automated (no manual work)

  • Retry worker processes failures every 5 minutes
  • Webhook health monitor checks every 5 minutes
  • Daily digest fires at 09:00 UTC
  • All executions logged to monitoring_runs

Monthly

  • Review error categories and update handling logic if new patterns emerge
  • Check index usage: SELECT * FROM pg_stat_user_indexes WHERE schemaname = 'raw_ops';

Quarterly (Month 3+)

  • Run archival script (see Data Archival Strategy above)
  • Review and vacuum tables if needed

As Needed

  • Update Shopify API version (annually)
  • Redeploy Edge Functions after code changes
  • Scale Supabase instance at growth thresholds

Scaling Triggers

Threshold Action
10,000 customers Review monthly metrics, confirm health indicators are green
50,000 customers Implement archival strategy, monitor index sizes
100,000 customers Consider read replicas for analytics queries
500,000+ customers Partition webhook_inbox by month
1,000,000+ customers Separate analytics database

Outstanding Items

These items are identified gaps that do not block production operation but are required for full operational maturity.

Must Do (before seed deck)

Item Detail Effort Owner
Metabase dashboard Build the 8 questions defined in the Monitoring section. Provides investor-grade proof of operational reliability 2-3 hours (Claude Code) Anton
~~pg_cron job rename~~ ~~Rename shopify-webhook-retryord-01-webhook-retry and monitor jobs to align with new SOP taxonomy~~ ✅ Done 2026-03-18 Anton
Deactivate legacy Make.com scenarios Toggle OFF the 4 monitoring scenarios replaced in v5.0. Verify native monitors are healthy for 48h first 10 min (manual) Anton
Rotate service role key Key was exposed in SOP-00 v4.0. Treat as compromised. Regenerate, update Make.com connections, password manager 30 min (manual) Anton

Should Do (Phase A)

Item Detail Effort
~~Monitoring heartbeat~~ ~~Add a check that verifies monitoring runs completed in the last hour. Catches silent monitoring failures.~~ Deployed as fn_check_monitoring_heartbeat_v2() + monitor-heartbeat pg_cron job (every 15 min) ✅ Done 2026-03-18
Alert delivery retry If ops-alerter fails to post to Slack, currently the monitoring_runs table logs alert_triggered = true but the message is lost. Add a retry mechanism or fallback channel (e.g. email via Customer.io) 1-2 hours

Parked (Phase B)

Item Detail Trigger
Role segmentation Replace universal service_role with scoped roles (ingestion_role, monitoring_role, analytics_read_role). No security benefit while solo founder; becomes relevant with first technical hire or contractor First technical hire
Operational review cadence Define weekly review ritual, alert response SLAs, owner per metric. Currently data is available but not embedded in operational behaviour Post-launch, once there is real data to review

Deployment Checklist

Database Objects

  • [x] webhook_inbox, customers, orders, order_items tables
  • [x] All 5 performance indexes deployed
  • [x] 6 monitoring views created
  • [x] process_order_webhook_txn() function
  • [x] retry_failed_webhooks() function (with LIMIT 100, FOR UPDATE SKIP LOCKED)
  • [x] fn_resolve_webhook_manually_v1() function
  • [x] fn_check_webhook_health_v2() monitoring function
  • [x] fn_check_webhook_daily_digest_v2() monitoring function
  • [x] RLS enabled on all 6 core tables

Edge Functions

  • [x] shopify-webhook-handler deployed
  • [x] retry-failed-webhooks deployed
  • [x] run-monitor updated with webhook_health and webhook_daily_digest checks
  • [x] ops-alerter available for Slack delivery

Scheduled Jobs

  • [x] retry-failed-webhooks pg_cron job (every 5 min)
  • [x] monitor-webhook-health pg_cron job (every 5 min)
  • [x] monitor-webhook-daily-digest pg_cron job (daily 09:00)

Integrations

  • [x] Shopify webhook configured (order creation → Edge Function)
  • [x] Slack channels: #ops-alerts, #ops-urgent, #daily-ops

Deactivated

  • [x] Make.com Scenario 1: Webhook Failure Alert (OFF)
  • [x] Make.com Scenario 2: DLQ Critical Alert (OFF)
  • [x] Make.com Scenario 3: Daily Webhook Summary (OFF)
  • [x] Make.com Scenario 4: Hourly Performance Check (OFF)

Version History

Version Date Changes Author
5.1 2026-03-21 Absorbed Order Ingestion SLO monitor from retired SOP-MON-02 (SLO definition, thresholds, runbook, indexes). Updated pg_cron examples to fn_invoke_monitor pattern. Updated MON-01 reference to v1.1. Anton / Claude
5.0 2026-03-18 Renamed from SOP-00 to SOP-ORD-01. Rebuilt monitoring to SOP-MON-01 pattern. Added fn_resolve_webhook_manually_v1 and resolved_manual status. Bounded retry worker (LIMIT 100, FOR UPDATE SKIP LOCKED). Enabled RLS on all tables. Removed all credentials. Deactivated Make.com monitoring scenarios. Anton / Claude
4.0 2025-10-28 Added 5 performance indexes, 6 monitoring views, 4 Make.com alert scenarios, processing duration tracking. Anton
3.0 2025-10-22 Retry worker, Slack alerts, daily stats view, Make.com integration. Anton
2.0 2025-10 Error handling, DLQ, processing attempts tracking. Anton
1.0 2025-10 Initial webhook capture and basic schema. Anton

Escalation Path

  1. Check Slack alerts and this documentation
  2. Query monitoring views for diagnostics
  3. Review Supabase Dashboard logs (Edge Functions, Database)
  4. Check monitoring_runs for execution history: SELECT * FROM raw_ops.monitoring_runs WHERE check_name LIKE 'webhook%' ORDER BY run_at DESC LIMIT 20;
  5. If unresolved: Escalate to Anton

End of SOP-ORD-01 v5.1