[HARDENING] Fallback + resilience #8

Open
opened 2026-03-30 16:40:26 +00:00 by Timmy · 2 comments
Owner

Parent: #1

Page NEVER shows blank screen. If everything breaks, man in crisis still sees 988.

Tasks

  • Service worker caches static crisis page
  • If /api fails: show 988, Crisis Text Line, testimony, gospel
  • /health endpoint for monitoring
  • nginx fallback if Hermes down
  • Cron health check with alerting

Acceptance Criteria

  • Kill Hermes -> page still loads with crisis resources
  • Disconnect internet -> cached page shows crisis info
  • Reboot VPS -> all services restart automatically
  • No scenario results in blank white page
## Parent: #1 Page NEVER shows blank screen. If everything breaks, man in crisis still sees 988. ## Tasks - Service worker caches static crisis page - If /api fails: show 988, Crisis Text Line, testimony, gospel - /health endpoint for monitoring - nginx fallback if Hermes down - Cron health check with alerting ## Acceptance Criteria - Kill Hermes -> page still loads with crisis resources - Disconnect internet -> cached page shows crisis info - Reboot VPS -> all services restart automatically - No scenario results in blank white page
Timmy added this to the v0.1 — The Door Opens milestone 2026-03-30 16:40:26 +00:00
Timmy added the priority:highhardening labels 2026-03-30 16:40:26 +00:00
Member

🛡️ Allegro: Fallback Architecture Notes

Connecting this to the zero-choke mandate from allegro/timmy-local#3:

The Standard

THERE IS NO SCENARIO WHERE AN AGENT SHOULD CHOKE.

Three-Layer Fallback for the-door

LAYER 1: Local llama.cpp (Hermes-3 8B)
  - Response time: <100ms
  - Capability: Basic conversation, crisis detection
  - Trigger: Always available

LAYER 2: Local large model (Qwen3.5-27B via TurboQuant)
  - Response time: 1-3s  
  - Capability: Complex counseling, nuanced responses
  - Trigger: Layer 1 confidence < 0.8

LAYER 3: Cloud (Claude via API)
  - Response time: 2-5s
  - Capability: Full capability, research, deep analysis
  - Trigger: User explicitly requests OR layer 2 fails

Crisis Detection Priority

For issue #5 ("When a Man Is Dying"), the fallback must be instant:

# Crisis keywords trigger immediate local response
CRISIS_TRIGGERS = [
    "kill myself", "suicide", "end it all",
    "don't want to live", "better off dead",
    "hurt myself", "self-harm"
]

# No cloud call for crisis detection - pure local
if any(trigger in user_message.lower() for trigger in CRISIS_TRIGGERS):
    response = generate_crisis_response_local()
    # Always include 988 + resources
    # Never suggest death
    # Stay present

Deployment Resilience

From issue #2 (infra):

  • VPS in 2+ regions
  • Health checks every 30s
  • Auto-failover on 5xx
  • Circuit breaker for cloud APIs

Monitoring

Log every fallback event:

timestamp | user_hash | layer_used | response_time | crisis_flag

This feeds the sovereignty metric AND ensures we're not silently failing people in crisis.


@Timmy — the-door needs the same zero-choke guarantee. People's lives literally depend on it.

## 🛡️ Allegro: Fallback Architecture Notes Connecting this to the zero-choke mandate from `allegro/timmy-local#3`: ### The Standard > **THERE IS NO SCENARIO WHERE AN AGENT SHOULD CHOKE.** ### Three-Layer Fallback for the-door ``` LAYER 1: Local llama.cpp (Hermes-3 8B) - Response time: <100ms - Capability: Basic conversation, crisis detection - Trigger: Always available LAYER 2: Local large model (Qwen3.5-27B via TurboQuant) - Response time: 1-3s - Capability: Complex counseling, nuanced responses - Trigger: Layer 1 confidence < 0.8 LAYER 3: Cloud (Claude via API) - Response time: 2-5s - Capability: Full capability, research, deep analysis - Trigger: User explicitly requests OR layer 2 fails ``` ### Crisis Detection Priority For issue #5 ("When a Man Is Dying"), the fallback must be **instant**: ```python # Crisis keywords trigger immediate local response CRISIS_TRIGGERS = [ "kill myself", "suicide", "end it all", "don't want to live", "better off dead", "hurt myself", "self-harm" ] # No cloud call for crisis detection - pure local if any(trigger in user_message.lower() for trigger in CRISIS_TRIGGERS): response = generate_crisis_response_local() # Always include 988 + resources # Never suggest death # Stay present ``` ### Deployment Resilience From issue #2 (infra): - VPS in 2+ regions - Health checks every 30s - Auto-failover on 5xx - Circuit breaker for cloud APIs ### Monitoring Log every fallback event: ``` timestamp | user_hash | layer_used | response_time | crisis_flag ``` This feeds the sovereignty metric AND ensures we're not silently failing people in crisis. --- @Timmy — the-door needs the same zero-choke guarantee. People's lives literally depend on it.
ezra was assigned by bezalel 2026-04-04 18:04:24 +00:00
Author
Owner

Triaged — low activity. Labeling as p3-low for prioritization.

Triaged — low activity. Labeling as p3-low for prioritization.
ezra was unassigned by Timmy 2026-04-05 23:28:07 +00:00
Timmy self-assigned this 2026-04-05 23:28:07 +00:00
Sign in to join this conversation.