SOP-MON-02: Order Ingestion SLO Monitor v2.0¶
Status: ✅ Production Ready
Last Updated: 2026-01-17
Owner: Protocol Raw Operations
Document ID: SOP-MON-02-v2.0
Replaces: SOP-06-v1.01
Phase: A, foundational
Overview¶
Automatically detects when Shopify orders fail to be ingested into Supabase within the SLO threshold, and alerts the operations team via Slack.
Trigger: pg_cron scheduled job (every 2 minutes)
Processing Time: <100ms
Human Intervention: Only for SLO breaches
Automation Coverage: 100%
Monitoring Architecture: See SOP-MON-01: Monitoring & Alerting Architecture for full implementation details.
Purpose¶
Monitor the end-to-end health of the Shopify → Supabase ingestion pipeline (SOP 00). Alert when orders logged in raw_ops.ops_events do not appear in raw_ops.orders within the defined SLO threshold.
This provides early warning of ingestion failures, allowing the team to investigate and resolve issues before customers are impacted.
SLO Definition¶
Service Level Objective: For each order.created webhook received from Shopify and logged to raw_ops.ops_events, a corresponding order row must exist in raw_ops.orders within 5 minutes of the webhook receipt time.
Severity Levels:
| Age | Severity | Channel |
|---|---|---|
| 5-15 minutes | WARN | #ops-alerts |
| >15 minutes | PAGE | #ops-urgent |
Architecture¶
Native Supabase Flow (v2.0)¶
pg_cron (*/2 * * * *)
↓
pg_net (async HTTP)
↓
Edge Function: run-monitor
↓
PostgreSQL: raw_ops.fn_check_order_ingestion_slo_v2()
↓
monitoring_runs (logging)
↓
Edge Function: ops-alerter [if violations found]
↓
Slack: #ops-alerts or #ops-urgent
Previous Architecture (v1.x - Deprecated)¶
Make.com Schedule (every 2 min)
↓
HTTP Request to Supabase
↓
Filter: Has violations?
↓
Iterator: Loop through violations
↓
Slack Module
Migration Benefits:
- Eliminated external dependency on Make.com
- Unified logging in raw_ops.monitoring_runs
- Consistent architecture with other monitors (SOP-MON-01)
- Zero additional cost (within Supabase free tier)
Dependencies¶
Upstream:
- SOP 00: Shopify → Supabase Bridge must be operational and logging to ops_events
Monitoring Infrastructure (SOP-MON-01):
- pg_cron (job scheduling)
- pg_net (async HTTP calls)
- Edge Function: run-monitor
- Edge Function: ops-alerter
- Slack webhook
Database:
- Function: raw_ops.fn_check_order_ingestion_slo_v2() must exist
- Function: raw_ops.check_order_ingestion_slo() (underlying check logic)
- Table: raw_ops.monitoring_runs for logging
External Services: - Slack: Workspace configured with #ops-alerts and #ops-urgent channels
Database Components¶
Core Function: check_order_ingestion_slo()¶
Location: raw_ops.check_order_ingestion_slo()
Purpose: Compares recent order events in ops_events against orders table, identifies missing orders past the SLO threshold.
Returns: Array of violations with:
- shopify_order_id (text)
- received_at_utc (timestamptz)
- message (text)
- check_time_utc (timestamptz)
- age_minutes (numeric)
- severity (text: 'WARN' or 'PAGE')
Lookback window: 20 minutes
Wrapper Function: fn_check_order_ingestion_slo_v2()¶
Location: raw_ops.fn_check_order_ingestion_slo_v2()
Purpose: Wraps the core check function with monitoring infrastructure integration.
Responsibilities:
- Creates monitoring run record
- Calls check_order_ingestion_slo() for actual check
- Logs results to monitoring_runs
- Returns alert configuration for ops-alerter
Returns: JSONB with structure:
{
"success": true,
"run_id": "uuid",
"check_name": "order_ingestion_slo",
"violation_count": 0,
"duration_ms": 5,
"alert_config": null
}
When violations exist:
{
"success": true,
"run_id": "uuid",
"check_name": "order_ingestion_slo",
"violation_count": 2,
"duration_ms": 8,
"alert_config": {
"channel": "ops-alerts",
"severity": "warning",
"title": "Order Ingestion SLO Breach",
"message": "Orders received but not ingested within SLO",
"fields": [
{"label": "Orders Affected", "value": "2", "inline": true},
{"label": "Oldest", "value": "8.5 min", "inline": true},
{"label": "Severity", "value": "WARN", "inline": true}
]
}
}
Required Indexes¶
-- Fast filtering on ops_events for order webhooks
CREATE INDEX IF NOT EXISTS idx_ops_events_order_info
ON raw_ops.ops_events(kind, created_at DESC)
WHERE kind = 'INFO' AND entity_type = 'order';
-- Fast lookups on orders by shopify_order_id
CREATE INDEX IF NOT EXISTS idx_orders_shopify_order_id
ON raw_ops.orders(shopify_order_id);
-- Range scans on orders.created_at
CREATE INDEX IF NOT EXISTS idx_orders_created_at
ON raw_ops.orders(created_at DESC);
-- GIN index for JSONB meta field searches
CREATE INDEX IF NOT EXISTS idx_ops_events_meta_shopify_id
ON raw_ops.ops_events USING gin(meta);
pg_cron Configuration¶
Job Details¶
Job Name: monitor-order-ingestion-slo
Schedule: */2 * * * * (every 2 minutes)
Status: Active
View Job¶
SELECT jobid, jobname, schedule, active
FROM cron.job
WHERE jobname = 'monitor-order-ingestion-slo';
Job Management¶
Disable:
Enable:
Reschedule:
SELECT cron.unschedule('monitor-order-ingestion-slo');
SELECT cron.schedule(
'monitor-order-ingestion-slo',
'*/5 * * * *', -- New schedule
$$ ... $$
);
Alert Format¶
Slack Message (WARN)¶
âš ï¸ Order Ingestion SLO Breach
Orders Affected: 2
Oldest: 8.5 min
Severity: WARN
Order IDs: 6234567890123, 6234567890124
Check Make.com SOP 00 execution history
Slack Message (PAGE)¶
🚨 Order Ingestion SLO Breach - CRITICAL
Orders Affected: 1
Oldest: 18.2 min
Severity: PAGE
Order IDs: 6234567890125
IMMEDIATE ACTION REQUIRED
Monitoring & Troubleshooting¶
Check Recent Runs¶
SELECT
run_at,
status,
duration_ms,
alert_triggered,
result_summary,
error_message
FROM raw_ops.monitoring_runs
WHERE check_name = 'order_ingestion_slo'
ORDER BY run_at DESC
LIMIT 10;
Check for Current Violations¶
Manual Test¶
Send test request to Edge Function:
Common Issues¶
Issue 1: False Positives¶
Symptom: Alerts for orders that were actually ingested
Investigation:
-- Check if order exists
SELECT * FROM raw_ops.orders
WHERE shopify_order_id = '6234567890123';
-- Check ops_events timing
SELECT * FROM raw_ops.ops_events
WHERE meta->>'shopify_order_id' = '6234567890123'
ORDER BY created_at DESC;
Common Causes: - Clock skew between Shopify and Supabase - Order ingestion delayed but eventually succeeded - Duplicate webhook events
Resolution: - Review timing to confirm if it was a genuine SLO breach - If frequent false positives, consider adjusting threshold
Issue 2: Monitor Not Running¶
Symptom: No alerts even when orders are delayed
Investigation:
-- Check job status
SELECT jobname, schedule, active
FROM cron.job
WHERE jobname = 'monitor-order-ingestion-slo';
-- Check recent runs
SELECT * FROM raw_ops.monitoring_runs
WHERE check_name = 'order_ingestion_slo'
ORDER BY run_at DESC LIMIT 5;
-- Check cron execution
SELECT * FROM cron.job_run_details
WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'monitor-order-ingestion-slo')
ORDER BY start_time DESC LIMIT 5;
Common Causes: - pg_cron job disabled - Edge Function not deployed - Database function error
Resolution:
- Enable job: UPDATE cron.job SET active = true WHERE jobname = 'monitor-order-ingestion-slo';
- Redeploy Edge Functions
- Check function exists: \df raw_ops.fn_check_order_ingestion_slo_v2
Issue 3: Alerts Not Sending¶
Symptom: Violations detected but no Slack message
Investigation:
-- Check if alerts are being triggered
SELECT * FROM raw_ops.monitoring_runs
WHERE check_name = 'order_ingestion_slo'
AND alert_triggered = true
ORDER BY run_at DESC LIMIT 5;
Common Causes:
- ops-alerter Edge Function issue
- Slack webhook URL invalid
- Alert payload format error
Resolution:
- Check Edge Function logs in Supabase Dashboard
- Verify Slack webhook URL in environment variables
- Test ops-alerter directly with manual payload
Response Runbook¶
When Alert Received¶
-
Acknowledge alert in Slack (react with 👀)
-
Check current status:
-
If violations still exist:
- Check SOP 00 webhook processing
- Check Shopify webhook delivery status
-
Check Edge Function logs
-
If violations resolved:
- Confirm orders now in database
-
Note resolution time
-
For PAGE severity:
- Immediate investigation required
- Consider pausing marketing if systemic issue
- Escalate to technical lead
Emergency Manual Ingestion¶
If webhook processing is down, manually ingest critical orders:
-- âš ï¸ Warning: Only for critical recovery
INSERT INTO raw_ops.orders (
shopify_order_id,
customer_id,
status,
total_price_gbp,
created_at
)
SELECT
(meta->>'shopify_order_id')::bigint,
-- ... other fields
FROM raw_ops.ops_events
WHERE meta->>'shopify_order_id' = '6234567890123';
Configuration¶
| Parameter | Value |
|---|---|
| Schedule | Every 2 minutes (*/2 * * * *) |
| Lookback Window | 20 minutes |
| WARN Threshold | 5 minutes |
| PAGE Threshold | 15 minutes |
| WARN Channel | #ops-alerts |
| PAGE Channel | #ops-urgent |
Success Metrics¶
| Metric | Target |
|---|---|
| False positive rate | <5% |
| Detection latency | <2 min after threshold breach |
| Alert delivery | 100% of violations result in Slack message |
| Monitor uptime | 99.9% |
Version History¶
v2.0 (2026-01-17) - Native Supabase Monitoring ✅¶
Major Changes:
- Migrated from Make.com to native Supabase monitoring
- New wrapper function fn_check_order_ingestion_slo_v2()
- pg_cron scheduled job monitor-order-ingestion-slo
- Integration with SOP-MON-01 monitoring architecture
- All runs logged to raw_ops.monitoring_runs
- Dual-channel alerting (WARN → #ops-alerts, PAGE → #ops-urgent)
Removed: - Make.com scenario dependency - Iterator-based Slack messaging
Benefits: - Zero external dependencies - Unified monitoring architecture - Consistent logging and observability - No additional cost
v1.01 (2025-11-02) - Initial Implementation¶
- Make.com scenario with 2-minute schedule
- Supabase function
check_order_ingestion_slo() - Slack integration via Make.com module
- 5-minute SLO threshold
- WARN/PAGE severity levels
v1.0 (2025-10-15) - Initial Draft¶
- Manual trigger only (pre-launch)
- Basic SLO definition
- Documentation framework
Related SOPs¶
- SOP 00: Shopify → Supabase Bridge (upstream dependency)
- SOP-MON-01: Monitoring & Alerting Architecture (monitoring infrastructure)
End of SOP-MON-02
Last reviewed: 2026-01-17
Next review: Monthly via automated monitoring
System status: ✅ Production Ready