Skip to content

SOP-DLV-01: Courier Watchdog v4.0

Proactive delivery exception detection, customer notification, and courier event traceability

Document ID: SOP-DLV-01-v4.0 Version: 4.0 Status: ✅ Production Ready Last Updated: 2026-03-19 Owner: Protocol Raw Operations Replaces: SOP-02-v3.0 Review Date: 2026-06-19


Key Changes in v4.0

  • ✅ Removed Make.com from the notification delivery path (Customer.io called directly via Edge Function)
  • ✅ Added explicit retry logic with exponential backoff (5 attempts: immediate, 5m, 15m, 1h, 6h)
  • ✅ Added notification state tracking columns on ops_events (notification_status, notification_attempts, notification_last_error, notification_next_retry_at)
  • ✅ Rebuilt monitoring to SOP-MON-01 pattern (fn_check_courier_watchdog_health_v2 via run-monitor)
  • ✅ Added idempotency index on courier_events (shipment_id, event_code, event_time)
  • ✅ Enabled RLS on courier_events, courier_event_mapping, and shipments
  • ✅ Converted SOP from PDF to markdown
  • ✅ Renamed from SOP-02 to SOP-DLV-01 (delivery operations namespace)
  • ❌ DPD inbound webhook still via Make.com (pending DPD UK API confirmation)

Purpose

The Courier Watchdog system proactively detects delivery exceptions reported by DPD, notifies customers with reassuring branded emails, and maintains full traceability of all courier events. It operates automatically with minimal manual intervention using real-time database triggers.

The system exists to protect the customer experience during delivery. When something goes wrong (missed delivery, weather delay, address issue, package damage), the customer should hear from Protocol Raw before they need to chase it themselves.


Scope

Applies to: All DPD courier exception events for Protocol Raw shipments

Components: - 4 database tables (courier_events, courier_event_mapping, shipments, ops_events) - 1 PostgreSQL processing function (process_courier_event) - 1 PostgreSQL trigger function (notify_courier_exception) - 1 Edge Function for notification delivery (send-courier-notification) - 1 PostgreSQL retry function (fn_retry_courier_notifications_v1) - 1 PostgreSQL monitoring function (fn_check_courier_watchdog_health_v2) - 1 Customer.io campaign with 5 email templates - 2 pg_cron jobs (retry every 5 min, health check every 15 min)

Related SOPs: - SOP-MON-01: Monitoring and Alerting Architecture (monitoring pattern) - SOP-ORD-01: Order Lifecycle (upstream, creates shipments) - SOP-PACK-01: Pick, Pack and Dispatch (upstream, dispatches shipments) - SOP-CS-00: Customer Operations System (delivery tracking in Customer Portal) - SOP-LC-01: Lifecycle Communications (dispatch/delivery confirmation emails)


System Architecture

Data Flow

DPD Exception Event
Make.com Webhook (inbound only)
raw_ops.process_courier_event()
courier_events (logged) + ops_events (CUSTOMER_EVENT created)
Database Trigger: trigger_notify_courier_exception
notify_courier_exception() → pg_net
Edge Function: send-courier-notification
Customer.io Track API (courier_exception event)
Campaign: "Courier Exception Handler" (multi-split by notification_template_id)
Branded email to customer

Processing time: Less than 10 seconds end-to-end from DPD event to customer inbox.

Component Responsibilities

Make.com (inbound only): Receives DPD webhook events and calls process_courier_event() via Supabase REST API. This is the only Make.com involvement. Make.com does not handle notification delivery or monitoring.

PostgreSQL (process_courier_event): Core processing function. Looks up the event code in courier_event_mapping, logs to courier_events, retrieves customer and shipment data, and creates a CUSTOMER_EVENT in ops_events with the full notification payload.

Database trigger (notify_courier_exception): Fires on INSERT to ops_events where kind = 'CUSTOMER_EVENT' and customer_event_type = 'courier_exception'. Calls the Edge Function via pg_net (async, non-blocking).

Edge Function (send-courier-notification): Delivers the notification to Customer.io Track API directly. Handles idempotency (safe to call multiple times for the same event), success/failure tracking, and exponential backoff scheduling for retries.

pg_cron retry processor: Runs every 5 minutes. Picks up events where notification_status = 'pending' and notification_next_retry_at <= now(), re-invokes the Edge Function. Maximum 5 attempts per event.

pg_cron health monitor: Runs every 15 minutes via run-monitor. Checks for stuck and permanently failed notifications, alerts via Slack following SOP-MON-01 pattern.


Event Code Mapping

Code Event Type Template ID Customer Message Priority
01 customer_not_home missed_delivery_retry We missed you today. Will retry tomorrow medium
02 premises_closed missed_delivery_retry Premises closed. Will retry tomorrow medium
03 address_issue address_problem Address issue. Please confirm details high
08 access_issue access_problem Driver unable to access. Will retry medium
09 weather_delay weather_delay Weather delay. Box is safe low
12 damaged_in_transit package_damage_review Package damage. Under review high

Source table: raw_ops.courier_event_mapping

SELECT event_code, event_type, notification_template_id, requires_customer_notification, priority
FROM raw_ops.courier_event_mapping
ORDER BY event_code;

Email Templates (Customer.io)

All templates live in the Customer.io campaign "Courier Exception Handler". The campaign uses a multi-split branch on the notification_template_id event attribute.

Template 1: Missed Delivery (Will Retry) - Template ID: missed_delivery_retry - Subject: Delivery update: We're monitoring your box - Trigger codes: 01, 02 - Tone: Reassuring, proactive - Key message: Packaging is over-engineered for 48-72 hour delays

Template 2: Access Issue - Template ID: access_problem - Subject: Delivery update: Access issue detected - Trigger code: 08 - Tone: Helpful, solution-focused - Key message: Will retry with updated instructions

Template 3: Weather Delay - Template ID: weather_delay - Subject: Delivery update: Weather delay in your area - Trigger code: 09 - Tone: Calm, reassuring - Key message: Safety first, packaging handles delays

Template 4: Address Issue (High Priority) - Template ID: address_problem - Subject: Action needed: Address issue detected - Trigger code: 03 - Tone: Urgent but helpful - Key message: Customer action required to resolve - SLA: 2-hour manual follow-up

Template 5: Package Damage (Automated) - Template ID: package_damage_review - Subject: Important: Issue with your Protocol Raw delivery - Trigger code: 12 - Tone: Clear, systematic - Key message: Team reviewing, will contact within 4 hours - SLA: 4-hour resolution


Notification Delivery

Edge Function: send-courier-notification

Location: supabase/functions/send-courier-notification/index.ts JWT Verification: Disabled (internal calls from pg_net) Secrets Required: CUSTOMERIO_SITE_ID, CUSTOMERIO_API_KEY

Behaviour:

  1. Receives { event_id } from the database trigger or retry processor
  2. Fetches the event from ops_events
  3. Idempotency check: If processed_at is already set or notification_status = 'sent', returns success without re-sending
  4. Skip check: If no email in the payload, marks as notification_status = 'skipped' and returns
  5. Sends courier_exception event to Customer.io Track API with the full payload
  6. On success: sets processed_at, notification_status = 'sent', increments notification_attempts
  7. On failure: increments notification_attempts, records error in notification_last_error, calculates next retry time with exponential backoff, sets notification_status = 'pending' (or 'failed' if max attempts exhausted)
  8. On max retries exhausted (5 attempts): sends alert to #ops-alerts via ops-alerter

Customer.io Payload

The Edge Function sends to the Customer.io Track API v1:

POST https://track.customer.io/api/v1/customers/{email}/events
Authorization: Basic {base64(site_id:api_key)}
{
  "name": "courier_exception",
  "data": {
    "notification_template_id": "missed_delivery_retry",
    "event_type": "customer_not_home",
    "event_description": "Customer not home - will retry tomorrow",
    "first_name": "Alice",
    "tracking_number": "DPD123456789GB",
    "tracking_url": "https://track.dpd.co.uk/..."
  }
}

Note: Tracking info is sourced from nested payload fields at payload.shipment.tracking_no and payload.shipment.tracking_url.

Retry Logic

Attempt Backoff Cumulative Wait
1 Immediate (trigger fires) 0
2 5 minutes 5 min
3 15 minutes 20 min
4 1 hour 1 hr 20 min
5 6 hours 7 hr 20 min

After 5 failed attempts, the event is marked notification_status = 'failed' and an alert is sent to Slack.

Notification Status Values

Status Meaning
NULL Not a notification event, or not yet attempted
pending Awaiting delivery or retry
sent Successfully delivered to Customer.io
failed Exhausted all retry attempts
skipped No notification needed (e.g. no email in payload)

Database Schema

Tables

raw_ops.courier_events — Complete log of all courier events

-- Key columns (inspect with \d raw_ops.courier_events for full schema)
-- shipment_id UUID REFERENCES raw_ops.shipments(id)
-- event_code TEXT
-- event_type TEXT
-- event_text TEXT
-- event_time TIMESTAMPTZ
-- requires_customer_notification BOOLEAN
-- raw_payload JSONB
-- created_at TIMESTAMPTZ DEFAULT now()

raw_ops.courier_event_mapping — Maps DPD event codes to notification templates

-- courier_name TEXT (e.g. 'DPD')
-- event_code TEXT
-- event_type TEXT
-- notification_template_id TEXT
-- requires_customer_notification BOOLEAN
-- priority TEXT

raw_ops.ops_events — Notification queue (courier events create CUSTOMER_EVENT rows)

Notification tracking columns (added in v4.0):

-- notification_attempts INTEGER DEFAULT 0
-- notification_last_error TEXT
-- notification_status TEXT DEFAULT NULL  (NULL, pending, sent, failed, skipped)
-- notification_next_retry_at TIMESTAMPTZ

Indexes

-- Shipment lookups during event processing
CREATE INDEX idx_courier_events_shipment_id
  ON raw_ops.courier_events(shipment_id);

-- Time-based queries for recent exceptions
CREATE INDEX idx_courier_events_created_at
  ON raw_ops.courier_events(created_at DESC);

-- Monitoring unprocessed events (partial index)
CREATE INDEX idx_ops_events_courier_unprocessed
  ON raw_ops.ops_events(created_at DESC)
  WHERE processed_at IS NULL
    AND kind = 'CUSTOMER_EVENT'
    AND customer_event_type = 'courier_exception';

-- Event code mapping lookups
CREATE INDEX idx_courier_event_mapping_code
  ON raw_ops.courier_event_mapping(courier_name, event_code);

-- Idempotency guard (v4.0)
CREATE UNIQUE INDEX idx_courier_events_idempotency
  ON raw_ops.courier_events(shipment_id, event_code, event_time);

Row Level Security

All tables have RLS enabled. No policies defined. Service role bypasses RLS. Anon role blocked.

Table RLS Enabled
courier_events
courier_event_mapping
shipments
ops_events

Core Processing Function

-- raw_ops.process_courier_event(
--   p_shipment_id UUID,
--   p_courier_name TEXT,
--   p_event_code TEXT,
--   p_event_text TEXT,
--   p_event_time TIMESTAMPTZ,
--   p_raw_payload JSONB
-- )
--
-- Called by Make.com inbound webhook. Performs:
-- 1. Looks up event code in courier_event_mapping
-- 2. Logs event to courier_events
-- 3. Retrieves customer and shipment data
-- 4. Creates CUSTOMER_EVENT in ops_events with full notification payload
-- 5. Returns notification instructions

Trigger Function

-- raw_ops.notify_courier_exception()
--
-- Trigger: AFTER INSERT ON raw_ops.ops_events
-- Condition: NEW.kind = 'CUSTOMER_EVENT' AND NEW.customer_event_type = 'courier_exception'
--
-- Action: Calls send-courier-notification Edge Function via pg_net
-- with { "event_id": NEW.id }

Retry Processor

-- raw_ops.fn_retry_courier_notifications_v1()
--
-- Schedule: Every 5 minutes via pg_cron
-- Picks up: ops_events where
--   kind = 'CUSTOMER_EVENT'
--   customer_event_type = 'courier_exception'
--   processed_at IS NULL
--   notification_status IS NULL OR 'pending'
--   notification_attempts < 5
--   notification_next_retry_at IS NULL OR <= now()
--   created_at > now() - interval '24 hours'
--
-- Processes up to 20 events per run (SKIP LOCKED to prevent double-firing)
-- Capacity: 240 retries/hour (sufficient for Phase A volumes)

Monitoring

Health Check: courier_watchdog_health

Function: raw_ops.fn_check_courier_watchdog_health_v2() Schedule: Every 15 minutes via run-monitor Pattern: SOP-MON-01 (pg_cron → pg_net → run-monitor → PostgreSQL function → monitoring_runs → ops-alerter)

Checks:

  1. Stuck events: Unprocessed CUSTOMER_EVENT courier exceptions older than 1 hour (within 24-hour lookback)
  2. Permanently failed notifications: Events with notification_status = 'failed' in the last 24 hours

Alert Thresholds:

Condition Severity Channel
3+ failed notifications in 24h critical #ops-urgent
1-2 failed notifications in 24h warning #ops-alerts
Events stuck for more than 60 minutes warning #ops-alerts
No issues No alert

Verify monitoring is running:

SELECT check_name, status, duration_ms, result_summary, run_at
FROM raw_ops.monitoring_runs
WHERE check_name = 'courier_watchdog_health'
ORDER BY run_at DESC
LIMIT 5;

pg_cron Jobs

Job Name Schedule Purpose
dlv-01-retry-notifications */5 * * * * Retry pending notifications with exponential backoff
monitor-dlv-01-watchdog-health */15 * * * * Health check via run-monitor

View jobs:

SELECT jobid, jobname, schedule, active
FROM cron.job
WHERE jobname LIKE 'dlv-01%' OR jobname LIKE 'monitor-dlv-01%'
ORDER BY jobname;

Operational Queries

Check Unprocessed Events

SELECT
  id,
  customer_event_type,
  message,
  notification_status,
  notification_attempts,
  notification_last_error,
  notification_next_retry_at,
  created_at,
  event_payload->>'email' AS email
FROM raw_ops.ops_events
WHERE processed_at IS NULL
  AND kind = 'CUSTOMER_EVENT'
  AND customer_event_type = 'courier_exception'
ORDER BY created_at DESC;

View Recent Courier Exceptions

SELECT
  ce.event_code,
  ce.event_type,
  ce.event_text,
  ce.event_time,
  s.tracking_no,
  c.email
FROM raw_ops.courier_events ce
JOIN raw_ops.shipments s ON ce.shipment_id = s.id
JOIN raw_ops.orders o ON s.order_id = o.id
JOIN raw_ops.customers c ON o.customer_id = c.id
WHERE ce.created_at > NOW() - INTERVAL '7 days'
ORDER BY ce.event_time DESC;

Exception Summary by Type

SELECT
  event_type,
  COUNT(*) AS total_events,
  COUNT(*) FILTER (WHERE requires_customer_notification) AS notified
FROM raw_ops.courier_events
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY event_type
ORDER BY total_events DESC;

Notification Delivery Success Rate

SELECT
  notification_status,
  COUNT(*) AS count,
  ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM raw_ops.ops_events
WHERE kind = 'CUSTOMER_EVENT'
  AND customer_event_type = 'courier_exception'
  AND created_at > NOW() - INTERVAL '30 days'
GROUP BY notification_status
ORDER BY count DESC;

Average Time-to-Notify

SELECT
  ROUND(AVG(EXTRACT(EPOCH FROM (processed_at - created_at))), 1) AS avg_seconds,
  ROUND(MAX(EXTRACT(EPOCH FROM (processed_at - created_at))), 1) AS max_seconds,
  COUNT(*) AS sample_size
FROM raw_ops.ops_events
WHERE kind = 'CUSTOMER_EVENT'
  AND customer_event_type = 'courier_exception'
  AND notification_status = 'sent'
  AND created_at > NOW() - INTERVAL '30 days';

Troubleshooting

Event Not Sending to Customer.io

Check 1: Is the event in ops_events?

SELECT id, notification_status, notification_attempts, notification_last_error, processed_at
FROM raw_ops.ops_events
WHERE customer_event_type = 'courier_exception'
ORDER BY created_at DESC LIMIT 5;

Check 2: Did the Edge Function fire?

Go to Supabase Dashboard → Edge Functions → send-courier-notification → Logs. Check for recent invocations and any errors.

Check 3: Is the retry processor running?

SELECT *
FROM cron.job_run_details
WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'dlv-01-retry-notifications')
ORDER BY start_time DESC LIMIT 5;

Check 4: Is the notification permanently failed?

If notification_status = 'failed' and notification_attempts = 5, the event has exhausted retries. Check notification_last_error for the root cause. Common causes: - Customer.io API credentials invalid or expired - Customer.io API rate limited - Invalid email address in payload - Customer.io campaign paused or deleted

Manual re-trigger (after fixing root cause):

UPDATE raw_ops.ops_events
SET
  notification_status = 'pending',
  notification_attempts = 0,
  notification_last_error = NULL,
  notification_next_retry_at = NULL
WHERE id = '[EVENT_ID]';

The retry processor will pick it up within 5 minutes.

Email Not Sending from Customer.io

Check 1: Campaign "Courier Exception Handler" is active in Customer.io Check 2: Event appears in the customer's activity log in Customer.io Check 3: Multi-split branch condition matches: Event attribute notification_template_id equals the expected value

Wrong Email Template Sent

SELECT * FROM raw_ops.courier_event_mapping
WHERE event_code = '[CODE]';

-- Fix:
UPDATE raw_ops.courier_event_mapping
SET notification_template_id = '[CORRECT_TEMPLATE]'
WHERE event_code = '[CODE]';

Testing

Create Test Event

-- Get a real shipment ID
SELECT id FROM raw_ops.shipments LIMIT 1;

-- Insert test event
INSERT INTO raw_ops.ops_events (
  kind, entity_type, entity_id, customer_event_type,
  message, event_payload, created_at, processed_at
) VALUES (
  'CUSTOMER_EVENT', 'shipment',
  '[SHIPMENT_ID]'::uuid,
  'courier_exception',
  'TEST: Direct notification path',
  jsonb_build_object(
    'email', 'test@example.com',
    'first_name', 'Test',
    'notification_template_id', 'missed_delivery_retry',
    'event_type', 'customer_not_home',
    'event_description', 'Testing direct Customer.io path',
    'shipment', jsonb_build_object(
      'tracking_no', 'TEST123456789GB',
      'tracking_url', 'https://track.dpd.co.uk/test'
    )
  ),
  NOW(), NULL
);

Verify

  1. processed_at should be set within 10 seconds
  2. notification_status should be 'sent'
  3. notification_attempts should be 1
  4. Customer.io shows the event in the test customer's activity (if using a real email)

Cleanup

DELETE FROM raw_ops.ops_events
WHERE event_payload->>'email' = 'test@example.com';

Manual Interventions

High Priority Events (approximately 2% of exceptions)

Address Issue (Code 03): - Customer receives automated email asking to confirm address - Action: Monitor replies, manually update DPD if needed - SLA: Respond within 2 hours - Volume at 100k customers: approximately 600/year (50/month)

Damaged Package (Code 12): - Customer receives automated "under review" email - Action: Manual assessment of damage, approve replacement/refund - SLA: Contact customer within 4 hours - Volume at 100k customers: approximately 300-500/year (25-42/month)

Customer Refused (Code 05): - No automatic email (rare edge case) - Action: Contact customer to understand reason - SLA: Reach out within 24 hours - Volume at 100k customers: fewer than 50/year


Success Metrics

Operational KPIs

Metric Target
Event processing time Less than 10 seconds
Notification success rate Greater than 99%
Manual intervention rate Less than 2%
Query performance Less than 10ms
Automated monitoring coverage 100%

Customer Experience KPIs

Metric Target Tracking
Email open rate Greater than 60% Customer.io dashboard
Customer reply rate Track for feedback Manual review
Delivery success after exception Greater than 95% DPD final delivery data

System Health KPIs

Metric Target Alert Threshold
Stuck events 0 Any event stuck more than 1 hour
Notification failures 0 1+ permanently failed in 24h
Edge Function errors Less than 0.1% Greater than 1% error rate
Customer.io delivery failures Less than 0.1% Greater than 1% failure rate

Cost and Capacity

Assumptions: - Exception rate: 2-3% of shipments (industry standard) - Average customer: 12 deliveries/year (1 box/month) - Manual intervention rate: 2% of exceptions (high-priority only)

Customers Shipments/Year Exceptions/Year Exceptions/Day Edge Function Calls Customer.io Emails
1,000 12,000 300-360 1 ~360 ~300
5,000 60,000 1,500-1,800 4-5 ~1,800 ~1,500
10,000 120,000 3,000-3,600 8-10 ~3,600 ~3,000
25,000 300,000 7,500-9,000 21-25 ~9,000 ~7,500
100,000 1,200,000 30,000-36,000 82-99 ~36,000 ~30,000

All volumes comfortably within Supabase Edge Function limits and Customer.io plan capacity. No architectural changes required at any scale.


Outstanding Items

DPD UK Direct API (Tier 2)

The inbound DPD event path currently runs through a Make.com webhook. This is the last Make.com dependency in the Courier Watchdog system.

Action: Confirm with DPD account manager (via Jamie) whether their UK API supports webhook callbacks (e.g. /status/events/subscribetoparcel). If yes, build a dpd-webhook-handler Edge Function and subscribe on shipment creation, removing Make.com from the inbound path entirely. If no, Make.com stays as the legitimate inbound bridge.

Metabase Dashboard

Build a Courier Watchdog dashboard with: - Exception patterns by type (30-day rolling) - Notification success rate - Average time-to-notify - Exception volume trends - Failed notification log

Endpoint Security (Phase B)

send-courier-notification currently runs with --no-verify-jwt. Acceptable for Phase A (internal calls only via pg_net). Harden with JWT verification or shared secret in Phase B.


Review Schedule

Weekly: Analyse exception patterns by event code, review customer responses to emails, check for new DPD event codes.

Monthly: Review email open/click rates (target greater than 60%), update templates based on customer feedback, optimise event code mapping if needed.


Version History

Version Date Changes Author
4.0 2026-03-19 Removed Make.com from notification path, added retry with exponential backoff, rebuilt monitoring to SOP-MON-01 pattern, added notification state tracking, idempotency index, RLS, renamed to SOP-DLV-01, converted from PDF to markdown Anton
3.0 2025-11-02 Switched from polling to real-time webhooks, added database indexes, added automated Slack monitoring, added damaged package template, 98% automation coverage Anton
2.0 2025-10-24 Added automatic event processing function, integrated Customer.io Event Bridge, created 4 branded email templates, 95% automation coverage Anton
1.0 2025-10-21 Initial pilot version, manual event logging, 50-60% automation coverage, basic notification system Anton