ReplayState Operational Reliability

Operational Reliability Controls

This control set documents how ReplayState remains available during routine deploys and runtime interruptions. It focuses on completion continuity, stall recovery, and explicit verification evidence.

Active Controls (as of March 5, 2026 UTC)

Control Implementation Status
Service Recovery `blockenv-demo.service` is configured with `Restart=always` to recover from process exits. Active
Stall Detection `replaystate-watchdog.timer` runs every 60s and checks `/api/health`; failed checks trigger service restart. Active
Deployment Health Gate `scripts/deploy_replaystate.sh` verifies local health and public `https://replaystate.com` before declaring success. Active
Evidence Validation `scripts/institutional_smoke.sh` and evidence packet checks run deterministic proof workflows. Active
Source-Level Queue Semantics Retry backoff policy and timeout-state transitions in queue worker internals. Scheduled for source workstream

Verification Commands

systemctl status blockenv-demo.service --no-pager
systemctl list-timers replaystate-watchdog.timer --all
./scripts/deploy_replaystate.sh --restart --tries 20 --sleep 1
./scripts/institutional_smoke.sh

Scope