Zero-data-loss architecture with an R2 write-ahead log (WAL), 3-tier failover, DLQ auto-recovery, and self-healing infrastructure. Every event is persisted to at least 2 independent systems before acknowledgment.
Every event is written to Cloudflare R2 (11 nines of durability) before being sent to ClickHouse. Events are batched as gzipped NDJSON files with a time-based key structure. If ClickHouse or the Go service is down, events remain safely buffered in R2 and are replayed on recovery.
// WAL key format
wal/events/YYYY/MM/DD/HH/{ulid}.ndjson.gz

// Example
wal/events/2026/03/19/14/01JQABCDEF123456789.ndjson.gz

// Each file contains ~100-1000 events as newline-delimited JSON
// Compressed with gzip for ~5x size reduction
// Every event passes through this chain:
Event arrives
  → R2 WAL (persisted, 11 nines)
  → ClickHouse (NVMe RAID-1)
  → Supabase (dual-write, purchases)
  → DragonflyDB (identity cache)

// Even if Go crashes mid-flight, the event is already in R2
// R2 replay tool reconstructs any missing events
For disaster recovery, the R2 replay tool can reconstruct the entire ClickHouse database from WAL data. It reads the gzipped NDJSON files from R2, decompresses them, and re-ingests them into ClickHouse with deduplication.
// Replay events from a specific date range
replay --from 2026-03-18 --to 2026-03-19

// Replay all events (full disaster recovery)
replay --from 2026-01-01 --to 2026-03-19

// Deduplication via event_id ensures no double-counting
At 25M events/month, the WAL generates ~2-3 GB of compressed data per month. R2 pricing: $0.015/GB-month for storage plus $4.50 per million Class A (write) operations. Total: $2-3/month for a complete zero-data-loss guarantee.
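A back-of-envelope check of that figure, under stated assumptions: R2's published pricing ($0.015/GB-month storage, $4.50 per million Class A writes — verify against current pricing), ~500 events per gzipped file, and ~60 GB retained (2.5 GB/month over a 24-month retention window). `walMonthlyCost` is a made-up helper for this sketch; the real bill also includes Class B reads from recovery-loop listings, which is why batching events per file, rather than one write per event, is what keeps the total in the low single dollars.

```go
package main

import "fmt"

// Assumed prices; check R2's current pricing before relying on this.
const (
	storagePerGBMonth = 0.015 // $/GB-month
	classAPerMillion  = 4.50  // $/million write ops
)

// walMonthlyCost estimates the steady-state monthly bill from events/month,
// average events per WAL file, and total GB retained in the bucket.
func walMonthlyCost(eventsPerMonth, eventsPerFile int, storedGB float64) float64 {
	writes := float64(eventsPerMonth) / float64(eventsPerFile)
	writeCost := writes / 1_000_000 * classAPerMillion
	return writeCost + storedGB*storagePerGBMonth
}

func main() {
	// 25M events/month, ~500 events/file, ~60 GB retained.
	fmt.Printf("~$%.2f/month\n", walMonthlyCost(25_000_000, 500, 60))
}
```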
All 3 edge workers (Click Wrapper, Web Pixel, S2S Proxy) implement the same 3-tier failover pattern. Even if the primary Go ingest service is down, events are never lost.
// 3-Tier Failover — all edge workers
Tier 1: Hetzner Go Ingest (primary)
  ingest.relo.mx via Cloudflare Tunnel
  CPX41: 16GB RAM, 8 vCPU
  ClickHouse + DragonflyDB co-located
    ↓ health check fails (3x in 90s)
Tier 2: VPS Warm Standby (DigitalOcean)
  Always-on Go receiver on separate server
  Buffers events to disk as JSON files
  Bidirectional HA with Hetzner primary
    ↓ standby also unreachable
Tier 3: R2 Dead Letter Queue (DLQ)
  Events written as JSON to R2 object store
  11 nines durability, permanent storage
  Auto-replay loops recover events
Each edge worker follows the same 3-tier pattern with slightly different recovery intervals:
Click Wrapper (t.relo.mx/c/:code)
  Tier 1: Go Ingest (<50ms)
  Tier 2: KV fallback (24h TTL, ~30s recovery)
  Tier 3: R2 DLQ (auto-replay every 30s)

Web Pixel (p.relo.mx)
  Tier 1: Go Ingest (<100ms)
  Tier 2: VPS Standby
  Tier 3: R2 DLQ (auto-replay every 60s)

S2S Proxy (s2s.relo.mx)
  Tier 1: Go Ingest (<200ms)
  Tier 2: VPS Warm Standby
  Tier 3: R2 DLQ (auto-replay every 60s)
Each DLQ has its own recovery loop that periodically checks for pending events and replays them to the Go ingest service. Recovery is fully automatic, with no manual intervention.
// DLQ Recovery Loop Intervals
S2S DLQ:   every 60s — checks for failed S2S postbacks
Pixel DLQ: every 60s — checks for failed pixel events
Click DLQ: every 30s — checks for failed click events
R2 WAL:    every 60s — checks for unprocessed WAL files

// Recovery flow
DLQ check → list pending objects → POST to Go ingest
  → delete on success → retry on failure (next cycle)
Layer 1: R2 WAL — write-ahead log, 11 nines durability, gzipped NDJSON
Layer 2: ClickHouse — NVMe RAID-1, 24-month TTL, ZSTD 8.6x compression
  12 tables including 6 materialized views
Layer 3: Supabase — cloud PostgreSQL, dual-write for purchases
  4 aggregate tables synced every 10 min
Layer 4: VPS Standby — bidirectional HA with Hetzner primary
Layer 5: Daily Backups — incremental to R2, full weekly export

RPO: ~0 (real-time WAL) | RTO: <2 min (auto-failover)
ClickHouse materialized views auto-refresh every 5 minutes. The Go aggregator syncs them to 4 Supabase tables every 10 minutes with a 7-day lookback window. The admin dashboard reads exclusively from these aggregate tables, never from raw product_sales.
// 4 Aggregate Tables (ClickHouse → Supabase every 10 min)
1. partner_daily_aggregates — per partner/segment/day: units, orders, revenue, commission
2. geo_daily_aggregates — per city/region/day: units, orders, revenue
3. media_source_daily — per media source/day: units, orders, revenue
4. hourly_stats — per hour-of-day: units, orders (for heatmaps)

// Orders tab uses direct ClickHouse proxy (Go endpoint)
// All other dashboard tabs read from Supabase aggregates