Skip to content

SOP-MON-02: Order Ingestion SLO Monitor v2.0

Status: ✅ Production Ready
Last Updated: 2026-01-17
Owner: Protocol Raw Operations
Document ID: SOP-MON-02-v2.0
Replaces: SOP-06-v1.01
Phase: A, foundational


Overview

Automatically detects when Shopify orders fail to be ingested into Supabase within the SLO threshold, and alerts the operations team via Slack.

Trigger: pg_cron scheduled job (every 2 minutes)
Processing Time: <100ms
Human Intervention: Only for SLO breaches
Automation Coverage: 100%

Monitoring Architecture: See SOP-MON-01: Monitoring & Alerting Architecture for full implementation details.


Purpose

Monitor the end-to-end health of the Shopify → Supabase ingestion pipeline (SOP 00). Alert when orders logged in raw_ops.ops_events do not appear in raw_ops.orders within the defined SLO threshold.

This provides early warning of ingestion failures, allowing the team to investigate and resolve issues before customers are impacted.


SLO Definition

Service Level Objective: For each order.created webhook received from Shopify and logged to raw_ops.ops_events, a corresponding order row must exist in raw_ops.orders within 5 minutes of the webhook receipt time.

Severity Levels:

Age Severity Channel
5-15 minutes WARN #ops-alerts
>15 minutes PAGE #ops-urgent

Architecture

Native Supabase Flow (v2.0)

pg_cron (*/2 * * * *)
    ↓
pg_net (async HTTP)
    ↓
Edge Function: run-monitor
    ↓
PostgreSQL: raw_ops.fn_check_order_ingestion_slo_v2()
    ↓
monitoring_runs (logging)
    ↓
Edge Function: ops-alerter [if violations found]
    ↓
Slack: #ops-alerts or #ops-urgent

Previous Architecture (v1.x - Deprecated)

Make.com Schedule (every 2 min)
    ↓
HTTP Request to Supabase
    ↓
Filter: Has violations?
    ↓
Iterator: Loop through violations
    ↓
Slack Module

Migration Benefits: - Eliminated external dependency on Make.com - Unified logging in raw_ops.monitoring_runs - Consistent architecture with other monitors (SOP-MON-01) - Zero additional cost (within Supabase free tier)


Dependencies

Upstream: - SOP 00: Shopify → Supabase Bridge must be operational and logging to ops_events

Monitoring Infrastructure (SOP-MON-01): - pg_cron (job scheduling) - pg_net (async HTTP calls) - Edge Function: run-monitor - Edge Function: ops-alerter - Slack webhook

Database: - Function: raw_ops.fn_check_order_ingestion_slo_v2() must exist - Function: raw_ops.check_order_ingestion_slo() (underlying check logic) - Table: raw_ops.monitoring_runs for logging

External Services: - Slack: Workspace configured with #ops-alerts and #ops-urgent channels


Database Components

Core Function: check_order_ingestion_slo()

Location: raw_ops.check_order_ingestion_slo()
Purpose: Compares recent order events in ops_events against orders table, identifies missing orders past the SLO threshold.

Returns: Array of violations with: - shopify_order_id (text) - received_at_utc (timestamptz) - message (text) - check_time_utc (timestamptz) - age_minutes (numeric) - severity (text: 'WARN' or 'PAGE')

Lookback window: 20 minutes

Wrapper Function: fn_check_order_ingestion_slo_v2()

Location: raw_ops.fn_check_order_ingestion_slo_v2()
Purpose: Wraps the core check function with monitoring infrastructure integration.

Responsibilities: - Creates monitoring run record - Calls check_order_ingestion_slo() for actual check - Logs results to monitoring_runs - Returns alert configuration for ops-alerter

Returns: JSONB with structure:

{
  "success": true,
  "run_id": "uuid",
  "check_name": "order_ingestion_slo",
  "violation_count": 0,
  "duration_ms": 5,
  "alert_config": null
}

When violations exist:

{
  "success": true,
  "run_id": "uuid",
  "check_name": "order_ingestion_slo",
  "violation_count": 2,
  "duration_ms": 8,
  "alert_config": {
    "channel": "ops-alerts",
    "severity": "warning",
    "title": "Order Ingestion SLO Breach",
    "message": "Orders received but not ingested within SLO",
    "fields": [
      {"label": "Orders Affected", "value": "2", "inline": true},
      {"label": "Oldest", "value": "8.5 min", "inline": true},
      {"label": "Severity", "value": "WARN", "inline": true}
    ]
  }
}

Required Indexes

-- Fast filtering on ops_events for order webhooks
CREATE INDEX IF NOT EXISTS idx_ops_events_order_info
ON raw_ops.ops_events(kind, created_at DESC)
WHERE kind = 'INFO' AND entity_type = 'order';

-- Fast lookups on orders by shopify_order_id
CREATE INDEX IF NOT EXISTS idx_orders_shopify_order_id
ON raw_ops.orders(shopify_order_id);

-- Range scans on orders.created_at
CREATE INDEX IF NOT EXISTS idx_orders_created_at
ON raw_ops.orders(created_at DESC);

-- GIN index for JSONB meta field searches
CREATE INDEX IF NOT EXISTS idx_ops_events_meta_shopify_id
ON raw_ops.ops_events USING gin(meta);

pg_cron Configuration

Job Details

Job Name: monitor-order-ingestion-slo
Schedule: */2 * * * * (every 2 minutes)
Status: Active

View Job

SELECT jobid, jobname, schedule, active
FROM cron.job 
WHERE jobname = 'monitor-order-ingestion-slo';

Job Management

Disable:

UPDATE cron.job SET active = false 
WHERE jobname = 'monitor-order-ingestion-slo';

Enable:

UPDATE cron.job SET active = true 
WHERE jobname = 'monitor-order-ingestion-slo';

Reschedule:

SELECT cron.unschedule('monitor-order-ingestion-slo');
SELECT cron.schedule(
  'monitor-order-ingestion-slo',
  '*/5 * * * *',  -- New schedule
  $$ ... $$
);


Alert Format

Slack Message (WARN)

⚠️ Order Ingestion SLO Breach

Orders Affected: 2
Oldest: 8.5 min
Severity: WARN
Order IDs: 6234567890123, 6234567890124

Check Make.com SOP 00 execution history

Slack Message (PAGE)

🚨 Order Ingestion SLO Breach - CRITICAL

Orders Affected: 1
Oldest: 18.2 min
Severity: PAGE
Order IDs: 6234567890125

IMMEDIATE ACTION REQUIRED

Monitoring & Troubleshooting

Check Recent Runs

SELECT 
  run_at,
  status,
  duration_ms,
  alert_triggered,
  result_summary,
  error_message
FROM raw_ops.monitoring_runs
WHERE check_name = 'order_ingestion_slo'
ORDER BY run_at DESC
LIMIT 10;

Check for Current Violations

SELECT * FROM raw_ops.check_order_ingestion_slo();

Manual Test

Send test request to Edge Function:

{"check": "order_ingestion_slo"}

Common Issues

Issue 1: False Positives

Symptom: Alerts for orders that were actually ingested

Investigation:

-- Check if order exists
SELECT * FROM raw_ops.orders 
WHERE shopify_order_id = '6234567890123';

-- Check ops_events timing
SELECT * FROM raw_ops.ops_events
WHERE meta->>'shopify_order_id' = '6234567890123'
ORDER BY created_at DESC;

Common Causes: - Clock skew between Shopify and Supabase - Order ingestion delayed but eventually succeeded - Duplicate webhook events

Resolution: - Review timing to confirm if it was a genuine SLO breach - If frequent false positives, consider adjusting threshold

Issue 2: Monitor Not Running

Symptom: No alerts even when orders are delayed

Investigation:

-- Check job status
SELECT jobname, schedule, active 
FROM cron.job 
WHERE jobname = 'monitor-order-ingestion-slo';

-- Check recent runs
SELECT * FROM raw_ops.monitoring_runs
WHERE check_name = 'order_ingestion_slo'
ORDER BY run_at DESC LIMIT 5;

-- Check cron execution
SELECT * FROM cron.job_run_details
WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'monitor-order-ingestion-slo')
ORDER BY start_time DESC LIMIT 5;

Common Causes: - pg_cron job disabled - Edge Function not deployed - Database function error

Resolution: - Enable job: UPDATE cron.job SET active = true WHERE jobname = 'monitor-order-ingestion-slo'; - Redeploy Edge Functions - Check function exists: \df raw_ops.fn_check_order_ingestion_slo_v2

Issue 3: Alerts Not Sending

Symptom: Violations detected but no Slack message

Investigation:

-- Check if alerts are being triggered
SELECT * FROM raw_ops.monitoring_runs
WHERE check_name = 'order_ingestion_slo'
  AND alert_triggered = true
ORDER BY run_at DESC LIMIT 5;

Common Causes: - ops-alerter Edge Function issue - Slack webhook URL invalid - Alert payload format error

Resolution: - Check Edge Function logs in Supabase Dashboard - Verify Slack webhook URL in environment variables - Test ops-alerter directly with manual payload


Response Runbook

When Alert Received

  1. Acknowledge alert in Slack (react with 👀)

  2. Check current status:

    SELECT * FROM raw_ops.check_order_ingestion_slo();
    

  3. If violations still exist:

  4. Check SOP 00 webhook processing
  5. Check Shopify webhook delivery status
  6. Check Edge Function logs

  7. If violations resolved:

  8. Confirm orders now in database
  9. Note resolution time

  10. For PAGE severity:

  11. Immediate investigation required
  12. Consider pausing marketing if systemic issue
  13. Escalate to technical lead

Emergency Manual Ingestion

If webhook processing is down, manually ingest critical orders:

-- ⚠️ Warning: Only for critical recovery
INSERT INTO raw_ops.orders (
  shopify_order_id,
  customer_id,
  status,
  total_price_gbp,
  created_at
)
SELECT 
  (meta->>'shopify_order_id')::bigint,
  -- ... other fields
FROM raw_ops.ops_events
WHERE meta->>'shopify_order_id' = '6234567890123';

Configuration

Parameter Value
Schedule Every 2 minutes (*/2 * * * *)
Lookback Window 20 minutes
WARN Threshold 5 minutes
PAGE Threshold 15 minutes
WARN Channel #ops-alerts
PAGE Channel #ops-urgent

Success Metrics

Metric Target
False positive rate <5%
Detection latency <2 min after threshold breach
Alert delivery 100% of violations result in Slack message
Monitor uptime 99.9%

Version History

v2.0 (2026-01-17) - Native Supabase Monitoring ✅

Major Changes: - Migrated from Make.com to native Supabase monitoring - New wrapper function fn_check_order_ingestion_slo_v2() - pg_cron scheduled job monitor-order-ingestion-slo - Integration with SOP-MON-01 monitoring architecture - All runs logged to raw_ops.monitoring_runs - Dual-channel alerting (WARN → #ops-alerts, PAGE → #ops-urgent)

Removed: - Make.com scenario dependency - Iterator-based Slack messaging

Benefits: - Zero external dependencies - Unified monitoring architecture - Consistent logging and observability - No additional cost

v1.01 (2025-11-02) - Initial Implementation

  • Make.com scenario with 2-minute schedule
  • Supabase function check_order_ingestion_slo()
  • Slack integration via Make.com module
  • 5-minute SLO threshold
  • WARN/PAGE severity levels

v1.0 (2025-10-15) - Initial Draft

  • Manual trigger only (pre-launch)
  • Basic SLO definition
  • Documentation framework

  • SOP 00: Shopify → Supabase Bridge (upstream dependency)
  • SOP-MON-01: Monitoring & Alerting Architecture (monitoring infrastructure)

End of SOP-MON-02

Last reviewed: 2026-01-17
Next review: Monthly via automated monitoring
System status: ✅ Production Ready