SOP-ORD-01: Order Ingestion v5.1¶
Automated webhook capture, processing, retry, and monitoring for Shopify order data.
Document ID: SOP-ORD-01-v5.1 Version: 5.1 Status: ✅ Production Ready Last Updated: 2026-03-21 Owner: Protocol Raw Operations Replaces: SOP-00-v4.0 (renamed from SOP-00 to SOP-ORD-01) Review Date: 2026-06-18
Key Changes in v5.1¶
- ✅ Order Ingestion SLO monitor absorbed from retired SOP-MON-02 (SLO definition, thresholds, runbook, indexes)
- ✅ pg_cron examples updated to fn_invoke_monitor pattern (SOP-MON-01 v1.1)
- ✅ MON-01 reference updated to v1.1
Key Changes in v5.0¶
- ✅ Monitoring rebuilt to SOP-MON-01 architecture (
pg_cron → pg_net → run-monitor → fn_check → ops-alerter → Slack) - ✅ All 4 Make.com polling scenarios deactivated, replaced by native Supabase monitors
- ✅ DLQ manual resolution replaced with controlled
fn_resolve_webhook_manually_v1()function andresolved_manualstatus - ✅ Retry worker updated with
LIMIT 100andFOR UPDATE SKIP LOCKEDfor bounded, concurrent-safe processing - ✅ RLS enabled on all 6 core tables (service_role bypasses, anon blocked)
- ✅ All credentials removed from documentation
- ❌ Removed "production ready for 100k+" claims pending real-world validation at scale
Purpose¶
Capture 100% of Shopify order webhooks, process them into structured operational data, automatically retry failures, and monitor system health with zero manual intervention.
Scope¶
This SOP covers the complete order ingestion pipeline from Shopify webhook to confirmed database record. It does not cover downstream processing (allocation, fulfilment, dispatch), which are handled by SOP-LAB-01 onwards.
Related SOPs: - SOP-LAB-01: Batch Creation and Lab-to-Release - SOP-DLV-01: Courier Watchdog - SOP-ORD-02: 3PL Order Export - SOP-ORD-03: Dispatch Ingestion - SOP-MON-01: Monitoring and Alerting Architecture (master pattern)
Architecture¶
Shopify (order/create webhook)
↓
shopify-webhook-handler (Edge Function — HMAC validation, thin wrapper)
↓
raw_ops.process_order_webhook_txn() (PostgreSQL — all business logic)
↓
webhook_inbox + customers + orders + order_items (database tables)
↓ (on failure)
retry_failed_webhooks() (pg_cron → pg_net → Edge Function → DB function)
↓ (after 5 failures)
Dead Letter Queue (processing_status = 'dlq')
↓ (manual intervention)
fn_resolve_webhook_manually_v1() (controlled resolution with audit trail)
Monitoring Architecture (SOP-MON-01 compliant)¶
pg_cron (every 5 min / daily 09:00)
↓
pg_net (async HTTP)
↓
run-monitor (Edge Function — generic router)
↓
fn_check_webhook_health_v2() / fn_check_webhook_daily_digest_v2()
↓
monitoring_runs (execution log)
↓ (if threshold breached)
ops-alerter (Edge Function)
↓
Slack (#ops-alerts / #ops-urgent / #daily-ops)
Database Schema¶
All tables in raw_ops schema. RLS enabled on all tables. Service role bypasses RLS. Anon key has no access.
Core Tables¶
| Table | Purpose | Key Indexes |
|---|---|---|
webhook_inbox |
All incoming webhooks with processing status | idx_webhook_inbox_retry_queue (partial), idx_webhook_inbox_status, idx_webhook_inbox_shopify_order (partial) |
customers |
Customer records upserted from webhook data | email, shopify_customer_id |
orders |
Order headers, one per Shopify order | shopify_order_id, ordered_at, export_state |
order_items |
Line items per order (many-to-one with orders) | idx_order_items_order_id, idx_order_items_product (partial) |
ops_events |
Operational audit log for all mutations | entity_type, entity_id |
monitoring_runs |
Execution log for all scheduled checks | check_name, run_at |
RLS Posture¶
| Table | RLS Enabled | Policies | Access Model |
|---|---|---|---|
webhook_inbox |
✅ Yes | None (no anon access) | Service role only |
customers |
✅ Yes | None | Service role only |
orders |
✅ Yes | None | Service role only |
order_items |
✅ Yes | None | Service role only |
ops_events |
✅ Yes | None | Service role only |
monitoring_runs |
✅ Yes | None | Service role only |
No policies are defined. With RLS enabled and no policies, anon key queries return zero rows. Service role bypasses RLS entirely. This is the correct posture for backend-only operational tables that are never queried from the client.
Processing Status Values¶
| Status | Meaning | Transition |
|---|---|---|
pending |
Received, awaiting processing | → success or failed |
success |
Processed successfully | Terminal |
failed |
Processing failed, eligible for retry | → success, failed (re-attempt), or dlq |
retrying |
Currently being retried | → success or failed |
dlq |
5 attempts exhausted, manual intervention required | → resolved_manual only |
resolved_manual |
Manually resolved via controlled function | Terminal |
Key Fields: webhook_inbox¶
| Field | Purpose |
|---|---|
processing_status |
Current state (see above) |
processing_attempts |
Retry count (max 5) |
processing_duration_ms |
Performance tracking |
last_error |
Error message for debugging |
error_category |
Classification of error type |
shopify_order_id |
Link to Shopify order |
received_at |
Timestamp for retry ordering |
Performance Indexes¶
Critical for scale. Do not remove.
-- Retry worker optimisation (partial index)
CREATE INDEX idx_webhook_inbox_retry_queue
ON raw_ops.webhook_inbox (received_at DESC)
WHERE processing_status IN ('pending', 'failed', 'retrying');
-- Status monitoring
CREATE INDEX idx_webhook_inbox_status
ON raw_ops.webhook_inbox (processing_status, created_at DESC);
-- Shopify order lookup
CREATE INDEX idx_webhook_inbox_shopify_order
ON raw_ops.webhook_inbox (shopify_order_id)
WHERE shopify_order_id IS NOT NULL;
-- Order items join optimisation
CREATE INDEX idx_order_items_order_id
ON raw_ops.order_items (order_id);
-- Product analysis
CREATE INDEX idx_order_items_product
ON raw_ops.order_items (product_id)
WHERE product_id IS NOT NULL;
Performance impact at scale: - Retry worker query: ~10ms at 1M rows (vs ~30s without indexes) - Order detail queries: sub-10ms - Monitoring view queries: sub-100ms
Indexes are maintained automatically by PostgreSQL. Monitor sizes via pg_stat_user_indexes.
Edge Function: shopify-webhook-handler¶
Endpoint: https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/shopify-webhook-handler
Purpose: Receive Shopify order webhooks, validate HMAC signature, delegate processing to PostgreSQL.
Processing Flow¶
- Validate Shopify HMAC signature using
SHOPIFY_WEBHOOK_SECRET - Store raw webhook in
webhook_inbox(status:pending) - Call
raw_ops.process_order_webhook_txn()which: - Upserts customer data
- Creates order record
- Creates order items
- Mark webhook as
successorfailed - Always return HTTP 200 to Shopify (prevents Shopify's own retry loop)
Error Handling¶
- All errors caught and logged to
webhook_inbox.last_error - Errors classified via
error_category - Failed webhooks picked up by retry worker automatically
Environment Variables¶
Set in Supabase Dashboard → Edge Functions → Secrets:
- SHOPIFY_WEBHOOK_SECRET — Webhook validation key from Shopify
Deployment¶
Performance Targets¶
| Metric | Target |
|---|---|
| P95 processing time | < 100ms |
| Cold start | < 100ms |
| Availability | Scales horizontally via Supabase |
Retry Worker¶
Function: raw_ops.retry_failed_webhooks()
Schedule: Every 5 minutes via pg_cron
Invocation: pg_cron → pg_net → retry-failed-webhooks Edge Function → DB function
Logic¶
- Select failed webhooks:
LIMIT 100,FOR UPDATE SKIP LOCKED(bounded, concurrent-safe) - Retry with exponential backoff:
- Attempt 1: Immediate
- Attempt 2: 1 minute wait
- Attempt 3: 5 minutes wait
- Attempt 4: 15 minutes wait
- Attempt 5: 30 minutes wait
- After 5 failures: Move to DLQ (
processing_status = 'dlq') - Total time to DLQ: 51 minutes maximum
Retry Query¶
SELECT * FROM raw_ops.webhook_inbox
WHERE processing_status IN ('pending', 'failed', 'retrying')
ORDER BY received_at DESC
LIMIT 100
FOR UPDATE SKIP LOCKED;
Uses idx_webhook_inbox_retry_queue partial index. Query time ~10ms at 1M rows.
Deployment¶
pg_cron Job¶
SELECT cron.schedule(
'retry-failed-webhooks',
'*/5 * * * *',
$$
SELECT net.http_post(
url := 'https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/retry-failed-webhooks',
headers := '{"Content-Type": "application/json", "apikey": "ANON_KEY"}'::jsonb,
body := '{}'::jsonb
);
$$
);
Dead Letter Queue (DLQ)¶
When Webhooks Enter DLQ¶
After 5 failed processing attempts, webhooks are moved to processing_status = 'dlq'. Every DLQ entry triggers a critical alert to #ops-urgent via the webhook health monitor.
Resolution Process¶
DLQ webhooks must be resolved using the controlled resolution function. Direct UPDATE statements are prohibited.
-- Resolve a DLQ webhook with audit trail
SELECT raw_ops.fn_resolve_webhook_manually_v1(
p_webhook_id := 'uuid-here',
p_resolution_note := 'Duplicate order, original processed successfully as order #1234'
);
Function behaviour:
- Only accepts webhooks with processing_status = 'dlq' (rejects all other statuses)
- Sets status to resolved_manual (not success, preserving data integrity)
- Logs resolution to ops_events with note and previous status
- Returns JSON confirmation or error
Common DLQ Scenarios¶
| Scenario | Resolution |
|---|---|
| Malformed Shopify data | Fix data upstream, resolve with note |
| Duplicate order ID | Confirm original processed, resolve as duplicate |
| Database constraint violation | Fix constraint or data, reprocess if needed |
| Edge function bug | Deploy fix, manually reprocess webhook payload |
Monitoring¶
Monitor 1: Order Ingestion SLO¶
Function: raw_ops.fn_check_order_ingestion_slo_v2()
Schedule: Every 2 minutes
Registered in: run-monitor Edge Function as order_ingestion_slo
SLO: Every order.created webhook logged to raw_ops.ops_events must have a corresponding row in raw_ops.orders within 5 minutes of receipt.
Lookback window: 20 minutes
Thresholds:
| Condition | Severity | Channel |
|---|---|---|
| Order age 5-15 minutes without ingestion | warning |
#ops-alerts |
| Order age >15 minutes without ingestion | critical |
#ops-urgent |
Alert fields: - Orders Affected (count) - Oldest (minutes) - Severity - Order IDs (Shopify IDs)
Source logic: Compares ops_events (webhook received, kind = 'INFO', entity_type = 'order') against orders table (ingested) within 20-minute lookback window.
Required indexes:
CREATE INDEX IF NOT EXISTS idx_ops_events_order_info
ON raw_ops.ops_events(kind, created_at DESC)
WHERE kind = 'INFO' AND entity_type = 'order';
CREATE INDEX IF NOT EXISTS idx_orders_shopify_order_id
ON raw_ops.orders(shopify_order_id);
pg_cron job:
SELECT cron.schedule(
'monitor-order-ingestion-slo',
'*/2 * * * *',
$$ SELECT raw_ops.fn_invoke_monitor('order_ingestion_slo'); $$
);
Monitor 2: Webhook Health Check¶
Function: raw_ops.fn_check_webhook_health_v2()
Schedule: Every 5 minutes
Registered in: run-monitor Edge Function as webhook_health
Checks and thresholds:
| Check | Threshold | Severity | Channel |
|---|---|---|---|
| Retry queue depth | > 100 pending webhooks | Warning | #ops-alerts |
| Oldest pending webhook | > 30 minutes old | Warning | #ops-alerts |
| New DLQ entries | Any in last 10 minutes | Critical | #ops-urgent |
| P95 processing latency | > 5,000ms | Warning | #ops-alerts |
pg_cron job:
SELECT cron.schedule(
'monitor-webhook-health',
'*/5 * * * *',
$$ SELECT raw_ops.fn_invoke_monitor('webhook_health'); $$
);
Monitor 3: Daily Webhook Digest¶
Function: raw_ops.fn_check_webhook_daily_digest_v2()
Schedule: Daily at 09:00 UTC
Channel: #daily-ops (always fires, not threshold-gated)
Reports: - 24-hour success, failure, and DLQ counts - Success rate percentage - Average, P95, and max processing latency
pg_cron job:
SELECT cron.schedule(
'monitor-webhook-daily-digest',
'0 9 * * *',
$$ SELECT raw_ops.fn_invoke_monitor('webhook_daily_digest'); $$
);
Monitoring Views¶
These SQL views remain available for ad-hoc investigation and Metabase dashboards.
| View | Purpose |
|---|---|
raw_ops.webhook_health |
24-hour processing metrics (counts, latency percentiles, success rate) |
raw_ops.retry_queue_depth |
Current retry queue status (pending count, oldest age, breakdown) |
raw_ops.webhook_processing_by_hour |
Hourly stats for last 7 days (volume, success rate, latency trends) |
raw_ops.error_summary |
Error patterns by category (types, frequency, sample messages) |
raw_ops.alert_retry_queue_backup |
Returns rows only if queue depth > 100 or oldest > 30 min |
raw_ops.alert_slow_processing |
Returns rows only if P95 > 5,000ms or max > 10,000ms |
Metabase Dashboard: Webhook Operations¶
Status: ⏳ NOT YET BUILT. Dashboard specification defined, implementation pending. Priority: Build after lifecycle emails, before seed deck. This dashboard provides investor-grade operational credibility (99%+ ingestion success rate, sub-100ms latency, zero DLQ incidents).
Required questions: 1. Success rate trend (7-day rolling) 2. Queue depth over time 3. DLQ count by day 4. P95 latency trend 5. Error category breakdown (pie chart) 6. Retry resolution rate (retried → success vs retried → DLQ) 7. Oldest pending webhook age 8. Daily volume trend
Health Indicators¶
| State | Success Rate | Queue Depth | P95 Latency | DLQ/Day |
|---|---|---|---|---|
| 🟢 Healthy | > 99% | 0-10 | < 1s | 0 |
| 🟡 Warning | 95-99% | 10-50 | 1-3s | 1-2 |
| 🔴 Critical | < 95% | > 50 | > 3s | 3+ |
Shopify Configuration¶
Location: Shopify Admin → Settings → Notifications → Webhooks
| Setting | Value |
|---|---|
| Event | Order creation |
| Format | JSON |
| URL | https://znfjpllsiuyezqlneqzr.supabase.co/functions/v1/shopify-webhook-handler |
| API Version | 2024-10 (or latest stable) |
Webhook secret stored in:
- Supabase: Edge Functions → Secrets → SHOPIFY_WEBHOOK_SECRET
- Password manager (backup)
Database Functions¶
Core Functions¶
| Function | Called By | Purpose |
|---|---|---|
raw_ops.process_order_webhook_txn(webhook_data JSONB) |
shopify-webhook-handler Edge Function | Process webhook into customer, order, and order_items within a transaction |
raw_ops.retry_failed_webhooks() |
retry-failed-webhooks Edge Function (via pg_cron) | Retry failed webhooks with exponential backoff, bounded batch of 100, FOR UPDATE SKIP LOCKED |
raw_ops.fn_resolve_webhook_manually_v1(p_webhook_id UUID, p_resolution_note TEXT) |
Operator via SQL console | Resolve DLQ webhooks with audit trail. Only accepts dlq status. Sets to resolved_manual |
raw_ops.fn_check_webhook_health_v2() |
run-monitor Edge Function (via pg_cron) | Check queue depth, DLQ entries, P95 latency. Returns alert_config if thresholds breached |
raw_ops.fn_check_webhook_daily_digest_v2() |
run-monitor Edge Function (via pg_cron) | Generate 24-hour digest with counts, rates, and latency metrics |
Supporting Functions¶
SELECT routine_name, routine_type
FROM information_schema.routines
WHERE routine_schema = 'raw_ops'
AND routine_name LIKE '%webhook%'
ORDER BY routine_name;
Alert Response Procedures¶
Retry Queue Backup (#ops-alerts, warning)¶
Trigger: Pending webhooks > 100 or oldest pending > 30 minutes.
Investigation:
1. Check retry worker cron is running: Supabase Dashboard → Database → Cron Jobs → verify retry-failed-webhooks shows recent executions
2. Check database logs: Supabase Dashboard → Logs → Database → look for retry_failed_webhooks() errors
3. Check queue manually: SELECT * FROM raw_ops.retry_queue_depth;
Common causes: Retry worker cron failed (restart it), database connection pool exhausted (check pg_stat_activity), systematic webhook processing error (check raw_ops.error_summary).
Resolution: If cron stuck, manually trigger retry worker. If systematic error, fix root cause then trigger retry. If database overload, investigate slow queries.
DLQ Critical (#ops-urgent, critical)¶
Trigger: Any new DLQ entry in the last 10 minutes.
Investigation:
1. Query the specific webhook: SELECT * FROM raw_ops.webhook_inbox WHERE processing_status = 'dlq' ORDER BY last_error_at DESC LIMIT 5;
2. Review last_error, error_category, and raw webhook payload
Common causes: Malformed Shopify data, database constraint violation, edge function timeout.
Resolution: Fix underlying issue. Resolve via controlled function:
SELECT raw_ops.fn_resolve_webhook_manually_v1(
'webhook-uuid-here',
'Description of what was wrong and how it was resolved'
);
Order Ingestion SLO Breach (#ops-alerts warning / #ops-urgent critical)¶
Trigger: Orders received via webhook but not ingested within 5 minutes (warning) or 15 minutes (critical).
Investigation:
1. Check Edge Function logs for shopify-webhook-handler in Supabase Dashboard
2. Check webhook inbox for stuck or failed webhooks:
SELECT id, shopify_order_id, processing_status, attempts, created_at
FROM raw_ops.webhook_inbox
WHERE processing_status IN ('failed', 'dlq', 'pending')
ORDER BY created_at DESC
LIMIT 20;
SELECT * FROM cron.job_run_details
WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'retry-failed-webhooks')
ORDER BY start_time DESC LIMIT 5;
SELECT * FROM raw_ops.check_order_ingestion_slo();
Common causes: Edge Function crash or deployment issue, Shopify HMAC secret rotation, database connection pool exhaustion, webhook stuck in retry loop (check attempts column, DLQ after 5 failures).
Resolution: If webhooks are in DLQ status, resolve using raw_ops.fn_resolve_webhook_manually_v1() per the DLQ resolution procedure above.
Slow Processing (#ops-alerts, warning)¶
Trigger: P95 processing latency > 5,000ms.
Investigation:
1. Check edge function logs: Supabase Dashboard → Edge Functions → Logs
2. Check database performance: SELECT * FROM pg_stat_activity WHERE state = 'active';
3. Verify indexes: SELECT * FROM pg_stat_user_indexes WHERE schemaname = 'raw_ops';
Common causes: Table bloat (run VACUUM ANALYZE), indexes not being used, connection pool exhaustion, unusually large orders.
Resolution: If table bloat, run VACUUM ANALYZE raw_ops.webhook_inbox;. If indexes unused, run ANALYZE. If persistent, scale Supabase instance.
Data Archival Strategy¶
When to start: Month 3+ or when webhook_inbox exceeds 100k rows.
Archive Successful Webhooks (monthly)¶
-- Create archive table (one time)
CREATE TABLE IF NOT EXISTS raw_ops.webhook_inbox_archive (
LIKE raw_ops.webhook_inbox INCLUDING ALL
);
-- Move processed data older than 90 days
INSERT INTO raw_ops.webhook_inbox_archive
SELECT * FROM raw_ops.webhook_inbox
WHERE processing_status IN ('success', 'resolved_manual')
AND created_at < NOW() - INTERVAL '90 days';
DELETE FROM raw_ops.webhook_inbox
WHERE processing_status IN ('success', 'resolved_manual')
AND created_at < NOW() - INTERVAL '90 days';
VACUUM ANALYZE raw_ops.webhook_inbox;
Archive: success and resolved_manual (processing complete).
Retain indefinitely: failed and dlq (debugging and analysis).
Expected table sizes at steady state: - Active table: < 500k rows - Archive growth: ~150k rows/month at 10k customers
Credentials and Access¶
All credentials stored in password manager. No credentials appear in this document.
| System | Access |
|---|---|
| Supabase project | znfjpllsiuyezqlneqzr — Owner: Protocol Raw operations account |
| Shopify webhook secret | Supabase Edge Functions → Secrets → SHOPIFY_WEBHOOK_SECRET |
| Slack channels | #ops-alerts, #ops-urgent, #daily-ops — via ops-alerter Edge Function |
Deactivated Make.com Scenarios (v4.0 legacy)¶
The following Make.com scenarios were replaced by native Supabase monitoring in v5.0. They remain in Make.com in deactivated state as fallback reference.
- Webhook Failure Alert (was every 5 min → replaced by
fn_check_webhook_health_v2) - DLQ Critical Alert (was every 5 min → replaced by
fn_check_webhook_health_v2) - Daily Webhook Summary (was daily 09:00 → replaced by
fn_check_webhook_daily_digest_v2) - Hourly Performance Check (was hourly → replaced by
fn_check_webhook_health_v2)
Maintenance¶
Automated (no manual work)¶
- Retry worker processes failures every 5 minutes
- Webhook health monitor checks every 5 minutes
- Daily digest fires at 09:00 UTC
- All executions logged to
monitoring_runs
Monthly¶
- Review error categories and update handling logic if new patterns emerge
- Check index usage:
SELECT * FROM pg_stat_user_indexes WHERE schemaname = 'raw_ops';
Quarterly (Month 3+)¶
- Run archival script (see Data Archival Strategy above)
- Review and vacuum tables if needed
As Needed¶
- Update Shopify API version (annually)
- Redeploy Edge Functions after code changes
- Scale Supabase instance at growth thresholds
Scaling Triggers¶
| Threshold | Action |
|---|---|
| 10,000 customers | Review monthly metrics, confirm health indicators are green |
| 50,000 customers | Implement archival strategy, monitor index sizes |
| 100,000 customers | Consider read replicas for analytics queries |
| 500,000+ customers | Partition webhook_inbox by month |
| 1,000,000+ customers | Separate analytics database |
Outstanding Items¶
These items are identified gaps that do not block production operation but are required for full operational maturity.
Must Do (before seed deck)¶
| Item | Detail | Effort | Owner |
|---|---|---|---|
| Metabase dashboard | Build the 8 questions defined in the Monitoring section. Provides investor-grade proof of operational reliability | 2-3 hours (Claude Code) | Anton |
| ~~pg_cron job rename~~ | ~~Rename shopify-webhook-retry → ord-01-webhook-retry and monitor jobs to align with new SOP taxonomy~~ |
✅ Done 2026-03-18 | Anton |
| Deactivate legacy Make.com scenarios | Toggle OFF the 4 monitoring scenarios replaced in v5.0. Verify native monitors are healthy for 48h first | 10 min (manual) | Anton |
| Rotate service role key | Key was exposed in SOP-00 v4.0. Treat as compromised. Regenerate, update Make.com connections, password manager | 30 min (manual) | Anton |
Should Do (Phase A)¶
| Item | Detail | Effort |
|---|---|---|
| ~~Monitoring heartbeat~~ | ~~Add a check that verifies monitoring runs completed in the last hour. Catches silent monitoring failures.~~ Deployed as fn_check_monitoring_heartbeat_v2() + monitor-heartbeat pg_cron job (every 15 min) |
✅ Done 2026-03-18 |
| Alert delivery retry | If ops-alerter fails to post to Slack, currently the monitoring_runs table logs alert_triggered = true but the message is lost. Add a retry mechanism or fallback channel (e.g. email via Customer.io) |
1-2 hours |
Parked (Phase B)¶
| Item | Detail | Trigger |
|---|---|---|
| Role segmentation | Replace universal service_role with scoped roles (ingestion_role, monitoring_role, analytics_read_role). No security benefit while solo founder; becomes relevant with first technical hire or contractor | First technical hire |
| Operational review cadence | Define weekly review ritual, alert response SLAs, owner per metric. Currently data is available but not embedded in operational behaviour | Post-launch, once there is real data to review |
Deployment Checklist¶
Database Objects¶
- [x]
webhook_inbox,customers,orders,order_itemstables - [x] All 5 performance indexes deployed
- [x] 6 monitoring views created
- [x]
process_order_webhook_txn()function - [x]
retry_failed_webhooks()function (with LIMIT 100, FOR UPDATE SKIP LOCKED) - [x]
fn_resolve_webhook_manually_v1()function - [x]
fn_check_webhook_health_v2()monitoring function - [x]
fn_check_webhook_daily_digest_v2()monitoring function - [x] RLS enabled on all 6 core tables
Edge Functions¶
- [x]
shopify-webhook-handlerdeployed - [x]
retry-failed-webhooksdeployed - [x]
run-monitorupdated with webhook_health and webhook_daily_digest checks - [x]
ops-alerteravailable for Slack delivery
Scheduled Jobs¶
- [x]
retry-failed-webhookspg_cron job (every 5 min) - [x]
monitor-webhook-healthpg_cron job (every 5 min) - [x]
monitor-webhook-daily-digestpg_cron job (daily 09:00)
Integrations¶
- [x] Shopify webhook configured (order creation → Edge Function)
- [x] Slack channels: #ops-alerts, #ops-urgent, #daily-ops
Deactivated¶
- [x] Make.com Scenario 1: Webhook Failure Alert (OFF)
- [x] Make.com Scenario 2: DLQ Critical Alert (OFF)
- [x] Make.com Scenario 3: Daily Webhook Summary (OFF)
- [x] Make.com Scenario 4: Hourly Performance Check (OFF)
Version History¶
| Version | Date | Changes | Author |
|---|---|---|---|
| 5.1 | 2026-03-21 | Absorbed Order Ingestion SLO monitor from retired SOP-MON-02 (SLO definition, thresholds, runbook, indexes). Updated pg_cron examples to fn_invoke_monitor pattern. Updated MON-01 reference to v1.1. | Anton / Claude |
| 5.0 | 2026-03-18 | Renamed from SOP-00 to SOP-ORD-01. Rebuilt monitoring to SOP-MON-01 pattern. Added fn_resolve_webhook_manually_v1 and resolved_manual status. Bounded retry worker (LIMIT 100, FOR UPDATE SKIP LOCKED). Enabled RLS on all tables. Removed all credentials. Deactivated Make.com monitoring scenarios. | Anton / Claude |
| 4.0 | 2025-10-28 | Added 5 performance indexes, 6 monitoring views, 4 Make.com alert scenarios, processing duration tracking. | Anton |
| 3.0 | 2025-10-22 | Retry worker, Slack alerts, daily stats view, Make.com integration. | Anton |
| 2.0 | 2025-10 | Error handling, DLQ, processing attempts tracking. | Anton |
| 1.0 | 2025-10 | Initial webhook capture and basic schema. | Anton |
Escalation Path¶
- Check Slack alerts and this documentation
- Query monitoring views for diagnostics
- Review Supabase Dashboard logs (Edge Functions, Database)
- Check
monitoring_runsfor execution history:SELECT * FROM raw_ops.monitoring_runs WHERE check_name LIKE 'webhook%' ORDER BY run_at DESC LIMIT 20; - If unresolved: Escalate to Anton
End of SOP-ORD-01 v5.1