feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed

This commit implements six major features:

1. Event Log System (src/swarm/event_log.py)
   - SQLite-based audit trail for all swarm events
   - Task lifecycle tracking (created, assigned, completed, failed)
   - Agent lifecycle tracking (joined, left, status changes)
   - Integrated with coordinator for automatic logging
   - Dashboard page at /swarm/events

2. Lightning Ledger (src/lightning/ledger.py)
   - Transaction tracking for Lightning Network payments
   - Balance calculations (incoming, outgoing, net, available)
   - Integrated with payment_handler for automatic logging
   - Dashboard page at /lightning/ledger

3. Semantic Memory / Vector Store (src/memory/vector_store.py)
   - Embedding-based similarity search for Echo agent
   - Fallback to keyword matching if sentence-transformers unavailable
   - Personal facts storage and retrieval
   - Dashboard page at /memory

4. Cascade Router Integration (src/timmy/cascade_adapter.py)
   - Automatic LLM failover between providers (Ollama → AirLLM → API)
   - Circuit breaker pattern for failing providers
   - Metrics tracking per provider (latency, error rates)
   - Dashboard status page at /router/status

5. Self-Upgrade Approval Queue (src/upgrades/)
   - State machine for self-modifications: proposed → approved/rejected → applied/failed
   - Human approval required before applying changes
   - Git integration for branch management
   - Dashboard queue at /self-modify/queue

6. Real-Time Activity Feed (src/events/broadcaster.py)
   - WebSocket-based live activity streaming
   - Bridges event_log to dashboard clients
   - Activity panel on /swarm/live

Tests:
- 101 unit tests passing
- 4 new E2E test files for Selenium testing
- Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed

Documentation:
- 6 ADRs (017-022) documenting architecture decisions
- Implementation summary in docs/IMPLEMENTATION_SUMMARY.md
- Architecture diagram in docs/architecture-v2.md
Author: Alexander Payne
Date: 2026-02-26 08:01:01 -05:00
Parent: 8d85f95ee5
Commit: d8d976aa60
41 changed files with 6735 additions and 254 deletions

# ADR 017: Event Logging System
## Status
Accepted
## Context
The swarm system needed a way to audit all agent actions, task lifecycle events, and system events. Without centralized logging, debugging failures and understanding system behavior required grepping through application logs.
## Decision
Implement a centralized event logging system in SQLite (`event_log` table) that captures all significant events with structured data.
## Event Types
| Type | Description |
|------|-------------|
| `task.created` | New task posted |
| `task.bidding` | Task opened for bidding |
| `task.assigned` | Task assigned to agent |
| `task.started` | Agent started working |
| `task.completed` | Task finished successfully |
| `task.failed` | Task failed |
| `agent.joined` | New agent registered |
| `agent.left` | Agent deregistered |
| `bid.submitted` | Agent submitted bid |
| `tool.called` | Tool execution started |
| `tool.completed` | Tool execution finished |
| `system.error` | System error occurred |
## Schema
```sql
CREATE TABLE event_log (
    id TEXT PRIMARY KEY,
    event_type TEXT NOT NULL,
    source TEXT NOT NULL,
    task_id TEXT,
    agent_id TEXT,
    data TEXT, -- JSON
    timestamp TEXT NOT NULL
);
```
## Usage
```python
from swarm.event_log import (
    log_event,
    EventType,
    get_task_events,
    get_event_summary,
)

# Log an event
log_event(
    event_type=EventType.TASK_ASSIGNED,
    source="coordinator",
    task_id=task.id,
    agent_id=winner.agent_id,
    data={"bid_sats": winner.bid_sats},
)

# Query events
events = get_task_events(task_id)
summary = get_event_summary(minutes=60)
```
## Integration
The coordinator automatically logs:
- Task creation, assignment, completion, failure
- Agent join/leave events
- System warnings and errors
## Consequences
- **Positive**: Complete audit trail, easy debugging, analytics support
- **Negative**: Additional database writes, storage growth over time
## Mitigations
- `prune_events()` function removes events older than N days
- Indexes on `task_id`, `agent_id`, and `timestamp` for fast queries
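The pruning helper could look like the following minimal sketch, assuming the `event_log` schema above; the real `prune_events()` signature (here taking an explicit database path and age limit) is an assumption:

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_events(db_path: str, max_age_days: int = 30) -> int:
    """Delete events older than max_age_days; return the number removed.

    ISO-8601 timestamps compare correctly as strings, so a plain
    less-than on the `timestamp` column is enough.
    """
    cutoff = (datetime.now(timezone.utc) - timedelta(days=max_age_days)).isoformat()
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute("DELETE FROM event_log WHERE timestamp < ?", (cutoff,))
        return cur.rowcount
```

With the `timestamp` index in place, the delete stays cheap even as the table grows.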

# ADR 018: Lightning Network Transaction Ledger
## Status
Accepted
## Context
The system needed to track all Lightning Network payments (incoming and outgoing) for accounting, dashboard display, and audit purposes. The existing payment handler created invoices but didn't persist transaction history.
## Decision
Implement a SQLite-based ledger (`ledger` table) that tracks all Lightning transactions with their lifecycle status.
## Transaction Types
| Type | Description |
|------|-------------|
| `incoming` | Invoice created (we're receiving payment) |
| `outgoing` | Payment sent (we're paying someone) |
## Transaction Status
| Status | Description |
|--------|-------------|
| `pending` | Awaiting settlement |
| `settled` | Payment completed |
| `failed` | Payment failed |
| `expired` | Invoice expired |
## Schema
```sql
CREATE TABLE ledger (
    id TEXT PRIMARY KEY,
    tx_type TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    payment_hash TEXT UNIQUE NOT NULL,
    amount_sats INTEGER NOT NULL,
    memo TEXT,
    invoice TEXT,
    preimage TEXT,
    source TEXT NOT NULL,
    task_id TEXT,
    agent_id TEXT,
    created_at TEXT NOT NULL,
    settled_at TEXT,
    fee_sats INTEGER DEFAULT 0
);
```
## Usage
```python
from lightning.ledger import (
    create_invoice_entry,
    mark_settled,
    get_balance,
)

# Create invoice record
entry = create_invoice_entry(
    payment_hash=invoice.payment_hash,
    amount_sats=1000,
    memo="API access",
    source="payment_handler",
    task_id=task.id,
)

# Mark as paid
mark_settled(payment_hash, preimage="secret")

# Get balance
balance = get_balance()
print(f"Net: {balance['net_sats']} sats")
```
## Integration
The `PaymentHandler` automatically:
- Creates ledger entries when invoices are created
- Updates status when payments are checked/settled
- Tracks fees for outgoing payments
## Balance Calculation
```python
{
"incoming_total_sats": total_received,
"outgoing_total_sats": total_sent,
"fees_paid_sats": total_fees,
"net_sats": incoming - outgoing - fees,
"pending_incoming_sats": pending_received,
"pending_outgoing_sats": pending_sent,
"available_sats": net - pending_outgoing,
}
```
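Given the `ledger` schema above, the balance can be computed with a few SUM queries. This is a sketch, not the module's actual implementation (the ADR's `get_balance()` takes no connection argument; the explicit-connection style here is for illustration):

```python
import sqlite3


def get_balance(conn: sqlite3.Connection) -> dict:
    """Aggregate the ledger into the balance fields described above."""

    def total(tx_type: str, status: str) -> int:
        # COALESCE turns the no-rows case into 0 instead of NULL
        row = conn.execute(
            "SELECT COALESCE(SUM(amount_sats), 0) FROM ledger "
            "WHERE tx_type = ? AND status = ?",
            (tx_type, status),
        ).fetchone()
        return row[0]

    fees = conn.execute(
        "SELECT COALESCE(SUM(fee_sats), 0) FROM ledger "
        "WHERE tx_type = 'outgoing' AND status = 'settled'"
    ).fetchone()[0]

    incoming = total("incoming", "settled")
    outgoing = total("outgoing", "settled")
    pending_in = total("incoming", "pending")
    pending_out = total("outgoing", "pending")
    net = incoming - outgoing - fees
    return {
        "incoming_total_sats": incoming,
        "outgoing_total_sats": outgoing,
        "fees_paid_sats": fees,
        "net_sats": net,
        "pending_incoming_sats": pending_in,
        "pending_outgoing_sats": pending_out,
        "available_sats": net - pending_out,
    }
```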
## Consequences
- **Positive**: Complete payment history, balance tracking, audit trail
- **Negative**: Additional DB writes, must keep in sync with actual Lightning node
## Future Work
- Reconciliation job to sync with LND node
- Export to accounting formats (CSV, QIF)

# ADR 019: Semantic Memory (Vector Store)
## Status
Accepted
## Context
The Echo agent needed the ability to remember conversations, facts, and context across sessions. Simple keyword search was insufficient for finding relevant historical context.
## Decision
Implement a vector-based semantic memory store using SQLite with optional sentence-transformers embeddings.
## Context Types
| Type | Description |
|------|-------------|
| `conversation` | User/agent dialogue |
| `fact` | Extracted facts about user/system |
| `document` | Uploaded documents |
## Schema
```sql
CREATE TABLE memory_entries (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    source TEXT NOT NULL,
    context_type TEXT NOT NULL DEFAULT 'conversation',
    agent_id TEXT,
    task_id TEXT,
    session_id TEXT,
    metadata TEXT, -- JSON
    embedding TEXT, -- JSON array of floats
    timestamp TEXT NOT NULL
);
```
## Embedding Strategy
**Primary**: sentence-transformers `all-MiniLM-L6-v2` (384 dimensions)
- High quality semantic similarity
- Local execution (no cloud)
- ~80MB model download
**Fallback**: Character n-gram hash embedding
- No external dependencies
- Lower quality but functional
- Enables system to work without heavy ML deps
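One dependency-free way to realize the character n-gram hash fallback: hash each trigram into a fixed number of buckets and L2-normalize the counts. Dimension count, n-gram size, and hash choice below are illustrative assumptions, not the project's actual parameters:

```python
import hashlib
import math


def ngram_hash_embedding(text: str, dims: int = 128, n: int = 3) -> list:
    """Hash character n-grams into a fixed-size vector, then L2-normalize.

    Related strings share n-grams, so they accumulate mass in the same
    buckets; lower quality than learned embeddings, but zero dependencies.
    """
    vec = [0.0] * dims
    padded = f" {text.lower()} "  # pad so word boundaries form n-grams too
    for i in range(len(padded) - n + 1):
        gram = padded[i : i + n]
        bucket = int(hashlib.md5(gram.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list, b: list) -> float:
    # Vectors are already unit-length, so the dot product is the cosine
    return sum(x * y for x, y in zip(a, b))
```

Because both vectors are unit-normalized, cosine similarity reduces to a dot product, matching the scoring section below.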
## Usage
```python
from memory.vector_store import (
    store_memory,
    search_memories,
    get_memory_context,
)

# Store a memory
store_memory(
    content="User prefers dark mode",
    source="user",
    context_type="fact",
    agent_id="echo",
)

# Search for relevant context
results = search_memories(
    query="user preferences",
    agent_id="echo",
    limit=5,
)

# Get formatted context for LLM
context = get_memory_context(
    query="what does user like?",
    max_tokens=1000,
)
```
## Integration Points
### Echo Agent
Echo should store all conversations and retrieve relevant context when answering questions about "what we discussed" or "what we know".
### Task Context
Task handlers can query for similar past tasks:
```python
similar = search_memories(
    query=task.description,
    context_type="conversation",
    limit=3,
)
```
## Similarity Scoring
**Cosine Similarity** (when embeddings available):
```python
score = dot(a, b) / (norm(a) * norm(b)) # -1 to 1
```
**Keyword Overlap** (fallback):
```python
score = len(query_words & content_words) / len(query_words)
```
## Consequences
- **Positive**: Semantic search finds related content even without keyword matches
- **Negative**: Embedding computation adds latency (~10-100ms per query)
- **Mitigation**: Background embedding computation, caching
## Future Work
- sqlite-vss extension for vector similarity index
- Memory compression for long-term storage
- Automatic fact extraction from conversations

# ADR 020: Cascade Router Integration with Timmy Agent
## Status
Proposed
## Context
Currently, the Timmy agent (`src/timmy/agent.py`) uses `src/timmy/backends.py` which provides a simple abstraction over Ollama and AirLLM. However, this lacks:
- Automatic failover between multiple LLM providers
- Circuit breaker pattern for failing providers
- Cost and latency tracking per provider
- Priority-based routing (local first, then APIs)
The Cascade Router (`src/router/cascade.py`) already implements these features but is not integrated with Timmy.
## Decision
Integrate the Cascade Router as the primary LLM routing layer for Timmy, replacing the direct backend abstraction.
## Architecture
### Current Flow
```
User Request → Timmy Agent → backends.py → Ollama/AirLLM
```
### Proposed Flow
```
User Request → Timmy Agent → Cascade Router → Provider 1 (Ollama)
                                                   ↓ (if fail)
                                              Provider 2 (Local AirLLM)
                                                   ↓ (if fail)
                                              Provider 3 (API - optional)

Metrics are tracked per provider at each hop.
```
### Integration Points
1. **Timmy Agent** (`src/timmy/agent.py`)
- Replace `create_timmy()` backend initialization
- Use `CascadeRouter.complete()` instead of direct `agent.run()`
- Expose provider status in agent responses
2. **Cascade Router** (`src/router/cascade.py`)
- Already supports: Ollama, OpenAI, Anthropic, AirLLM
- Already has: Circuit breakers, metrics, failover logic
- Add: Integration with existing `src/timmy/prompts.py`
3. **Configuration** (`config.yaml` or `config.py`)
- Provider list with priorities
- API keys (optional, for cloud fallback)
- Circuit breaker thresholds
4. **Dashboard** (new route)
- `/router/status` - Show provider health, metrics, recent failures
- Real-time provider status indicator
### Provider Priority Order
1. **Ollama (local)** - Priority 1, always try first
2. **AirLLM (local)** - Priority 2, if Ollama unavailable
3. **API providers** - Priority 3+, only if configured
### Data Flow
```python
# Timmy Agent
async def respond(self, message: str) -> str:
    # Get cascade router
    router = get_cascade_router()

    # Route through cascade with automatic failover
    response = await router.complete(
        messages=[{"role": "user", "content": message}],
        system_prompt=TIMMY_SYSTEM_PROMPT,
    )

    # Response includes which provider was used
    return response.content
```
## Schema Additions
### Provider Status Table (new)
```sql
CREATE TABLE provider_metrics (
    provider_name TEXT PRIMARY KEY,
    total_requests INTEGER DEFAULT 0,
    successful_requests INTEGER DEFAULT 0,
    failed_requests INTEGER DEFAULT 0,
    avg_latency_ms REAL DEFAULT 0,
    last_error_time TEXT,
    circuit_state TEXT DEFAULT 'closed',
    updated_at TEXT
);
```
## Consequences
### Positive
- Automatic failover improves reliability
- Metrics enable data-driven provider selection
- Circuit breakers prevent cascade failures
- Configurable without code changes
### Negative
- Additional complexity in request path
- Potential latency increase from retries
- Requires careful circuit breaker tuning
### Mitigations
- Circuit breakers have short recovery timeouts (60s)
- Metrics exposed for monitoring
- Fallback to mock responses if all providers fail
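The breaker behavior described above (open after repeated failures, probe again after the 60s recovery timeout) can be sketched as a small per-provider state object. Class and method names here are illustrative, not the Cascade Router's actual API:

```python
import time
from typing import Optional


class CircuitBreaker:
    """Per-provider breaker: opens after N consecutive failures,
    half-opens after a recovery timeout (illustrative sketch)."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.recovery_timeout:
            return "half-open"  # allow one probe request through
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        # Any success fully closes the breaker
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()
```

The router would consult `allow_request()` before trying a provider and fall through to the next priority when the breaker is open.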
## Implementation Plan
1. Create `src/timmy/cascade_adapter.py` - Adapter between Timmy and Cascade Router
2. Modify `src/timmy/agent.py` - Use adapter instead of direct backends
3. Create dashboard route `/router/status` - Provider health UI
4. Add provider metrics persistence to SQLite
5. Write tests for failover scenarios
## Dependencies
- Existing `src/router/cascade.py`
- Existing `src/timmy/agent.py`
- New dashboard route

# ADR 021: Self-Upgrade Approval Queue
## Status
Proposed
## Context
The self-modification system (`src/self_modify/loop.py`) can generate code changes autonomously. However, it currently either:
- Applies changes immediately (risky)
- Requires manual git review (slow)
We need an approval queue where changes are staged for human review before application.
## Decision
Implement a dashboard-based approval queue for self-modifications with the following states:
`proposed` → `approved` | `rejected` | `expired`; `approved` → `applied` | `failed`
## Architecture
### State Machine
```
                ┌─────────────┐
                │  PROPOSED   │
                └──────┬──────┘
       ┌───────────────┼───────────────┐
       │               │               │
       ▼               ▼               ▼
┌────────────┐  ┌────────────┐  ┌────────────┐
│  APPROVED  │  │  REJECTED  │  │  EXPIRED   │
└──────┬─────┘  └────────────┘  └────────────┘
       │
     ┌─┴─────────┐
     │           │
     ▼           ▼
┌─────────┐ ┌─────────┐
│ APPLIED │ │ FAILED  │
└─────────┘ └─────────┘
```
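The legal transitions in the state machine above can be enforced with a small lookup table; the `transition()` helper is a sketch, not the queue's actual API:

```python
# Legal transitions in the upgrade state machine (terminal states map to empty sets)
TRANSITIONS = {
    "proposed": {"approved", "rejected", "expired"},
    "approved": {"applied", "failed"},
    "rejected": set(),
    "expired": set(),
    "applied": set(),
    "failed": set(),
}


def transition(current: str, target: str) -> str:
    """Validate a state change; raise on anything the diagram forbids."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {target!r}")
    return target
```

Keeping the table next to the queue logic means an approved-but-never-applied upgrade can never silently jump to a terminal state.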
### Components
1. **Database Table** (`upgrades` table)
```sql
CREATE TABLE upgrades (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,      -- proposed, approved, rejected, applied, failed
    proposed_at TEXT NOT NULL,
    approved_at TEXT,
    applied_at TEXT,
    rejected_at TEXT,
    branch_name TEXT NOT NULL,
    description TEXT NOT NULL,
    files_changed TEXT,        -- JSON array
    diff_preview TEXT,         -- Short diff for review
    test_results TEXT,         -- JSON: {passed: bool, output: str}
    error_message TEXT,
    approved_by TEXT           -- For audit
);
```
2. **Self-Modify Loop** (`src/self_modify/loop.py`)
- On change proposal: Create `proposed` entry, stop
- On approval: Checkout branch, apply changes, run tests, commit
- On rejection: Cleanup branch, mark `rejected`
3. **Dashboard UI** (`/self-modify/queue`)
- List all proposed changes
- Show diff preview
- Approve/Reject buttons
- Show test results
- History of past upgrades
4. **API Endpoints**
- `GET /self-modify/queue` - List pending upgrades
- `POST /self-modify/queue/{id}/approve` - Approve upgrade
- `POST /self-modify/queue/{id}/reject` - Reject upgrade
- `GET /self-modify/queue/{id}/diff` - View full diff
### Integration Points
**Existing: Self-Modify Loop**
- Currently: Proposes change → applies immediately (or fails)
- New: Proposes change → creates DB entry → waits for approval
**Existing: Dashboard**
- New page: Upgrade Queue
- New nav item: "UPGRADES" with badge showing pending count
**Existing: Event Log**
- Logs: `upgrade.proposed`, `upgrade.approved`, `upgrade.applied`, `upgrade.failed`
### Security Considerations
1. **Approval Authentication** - Consider requiring password/PIN for approval
2. **Diff Size Limits** - Reject diffs >10k lines (prevents DoS)
3. **Test Requirement** - Must pass tests before applying
4. **Rollback** - Keep previous commit SHA for rollback
### Approval Flow
```python
# 1. System proposes upgrade
upgrade = UpgradeQueue.propose(
    description="Fix bug in task assignment",
    branch_name="self-modify/fix-task-001",
    files_changed=["src/swarm/coordinator.py"],
    diff_preview="@@ -123,7 +123,7 @@...",
)
# Status: PROPOSED

# 2. Human reviews in dashboard
#    - Views diff
#    - Sees test results (auto-run on propose)
#    - Clicks APPROVE or REJECT

# 3. If approved
upgrade.apply()   # Status: APPLIED or FAILED

# 4. If rejected
upgrade.reject()  # Status: REJECTED, branch deleted
```
## UI Design
### Upgrade Queue Page (`/self-modify/queue`)
```
┌─────────────────────────────────────────┐
│ PENDING UPGRADES (2)                    │
├─────────────────────────────────────────┤
│                                         │
│ Fix bug in task assignment       [VIEW] │
│   Branch: self-modify/fix-task-001      │
│   Files: coordinator.py                 │
│   Tests: ✓ Passed                       │
│   [APPROVE]  [REJECT]                   │
│                                         │
│ Add memory search feature        [VIEW] │
│   Branch: self-modify/memory-002        │
│   Files: memory/vector_store.py         │
│   Tests: ✗ Failed (1 error)             │
│   [APPROVE]  [REJECT]                   │
│                                         │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ UPGRADE HISTORY                         │
├─────────────────────────────────────────┤
│ ✓ Fix auth bug      APPLIED    2h ago   │
│ ✗ Add new route     FAILED     5h ago   │
│ ✗ Change config     REJECTED   1d ago   │
└─────────────────────────────────────────┘
```
## Consequences
### Positive
- Human oversight prevents bad changes
- Audit trail of all modifications
- Test-before-apply prevents broken states
- Rejection is clean (no lingering branches)
### Negative
- Adds friction to self-modification
- Requires human availability for urgent fixes
- Database storage for upgrade history
### Mitigations
- Auto-approve after 24h for low-risk changes (configurable)
- Urgent changes can bypass queue (with logging)
- Prune old history after 90 days
## Implementation Plan
1. Create `src/upgrades/models.py` - Database schema and ORM
2. Create `src/upgrades/queue.py` - Queue management logic
3. Modify `src/self_modify/loop.py` - Integrate with queue
4. Create dashboard routes - UI for approval
5. Create templates - Queue page, diff view
6. Add event logging for upgrades
7. Write tests for full workflow
## Dependencies
- Existing `src/self_modify/loop.py`
- New database table `upgrades`
- Existing Event Log system

# ADR 022: Real-Time Activity Feed
## Status
Proposed
## Context
The dashboard currently shows static snapshots of swarm state. Users must refresh to see:
- New tasks being created
- Agents joining/leaving
- Bids being submitted
- Tasks being completed
This creates a poor UX for monitoring the swarm in real-time.
## Decision
Implement a WebSocket-based real-time activity feed that streams events from the Event Log to connected dashboard clients.
## Architecture
### Data Flow
```
Coordinator Event → Event Log (SQLite)
                         ↓
                 WebSocket Broadcast
                         ↓
        Dashboard Clients (via ws_manager)
```
### Components
1. **Event Source** (`src/swarm/coordinator.py`)
- Already emits events via `log_event()`
- Events are persisted to SQLite
2. **WebSocket Bridge** (`src/ws_manager/handler.py`)
- Already exists for agent status
- Extend to broadcast events
3. **Event Broadcaster** (`src/events/broadcaster.py` - NEW)
```python
class EventBroadcaster:
    """Bridges event_log → WebSocket."""

    async def on_event_logged(self, event: EventLogEntry):
        """Called when new event is logged."""
        await ws_manager.broadcast_event({
            "type": event.event_type.value,
            "source": event.source,
            "task_id": event.task_id,
            "agent_id": event.agent_id,
            "timestamp": event.timestamp,
            "data": event.data,
        })
```
4. **Dashboard UI** (`/swarm/live` - enhanced)
- Already exists at `/swarm/live`
- Add activity feed panel
- Connect to WebSocket
- Show real-time events
5. **Mobile Support**
- Same WebSocket for mobile view
- Simplified activity list
### Event Types to Broadcast
| Event Type | Display As | Icon |
|------------|------------|------|
| `task.created` | "New task: {description}" | 📝 |
| `task.assigned` | "Task assigned to {agent}" | 👤 |
| `task.completed` | "Task completed" | ✓ |
| `agent.joined` | "Agent {name} joined" | 🟢 |
| `agent.left` | "Agent {name} left" | 🔴 |
| `bid.submitted` | "Bid: {amount}sats from {agent}" | 💰 |
| `tool.called` | "Tool: {tool_name}" | 🔧 |
| `system.error` | "Error: {message}" | ⚠️ |
### WebSocket Protocol
```json
// Client connects
{"action": "subscribe", "channel": "events"}

// Server broadcasts
{
    "type": "event",
    "payload": {
        "event_type": "task.assigned",
        "source": "coordinator",
        "task_id": "task-123",
        "agent_id": "agent-456",
        "timestamp": "2024-01-15T10:30:00Z",
        "data": {"bid_sats": 100}
    }
}
```
### UI Design: Activity Feed Panel
```
┌─────────────────────────────────────────┐
│ LIVE ACTIVITY                       [🔴] │
├─────────────────────────────────────────┤
│ 📝 New task: Write Python function       │
│    10:30:01                             │
│ 💰 Bid: 50sats from forge                │
│    10:30:02                             │
│ 👤 Task assigned to forge                │
│    10:30:07                             │
│ ✓ Task completed                        │
│    10:30:15                             │
│ 🟢 Agent Echo joined                     │
│    10:31:00                             │
│                                         │
│            [Show All Events]            │
└─────────────────────────────────────────┘
```
### Integration with Existing Systems
**Existing: Event Log** (`src/swarm/event_log.py`)
- Hook into `log_event()` to trigger broadcasts
- Use SQLite `AFTER INSERT` trigger or Python callback
**Existing: WebSocket Manager** (`src/ws_manager/handler.py`)
- Add `broadcast_event()` method
- Handle client subscriptions
**Existing: Coordinator** (`src/swarm/coordinator.py`)
- Already calls `log_event()` for all lifecycle events
- No changes needed
**Existing: Swarm Live Page** (`/swarm/live`)
- Enhance with activity feed panel
- WebSocket client connection
### Technical Design
#### Option A: Direct Callback (Chosen)
Modify `log_event()` to call broadcaster directly.
**Pros:** Simple, immediate delivery
**Cons:** Tight coupling
```python
# In event_log.py
def log_event(...):
    # ... store in DB ...
    # Broadcast to WebSocket clients
    asyncio.create_task(_broadcast_event(event))
```
#### Option B: SQLite Trigger + Poll
Use SQLite trigger to mark new events, poll from broadcaster.
**Pros:** Decoupled, survives restarts
**Cons:** Latency from polling
#### Option C: Event Bus
Use existing `src/events/bus.py` to publish/subscribe.
**Pros:** Decoupled, flexible
**Cons:** Additional complexity
**Decision:** Option A for simplicity, with Option C as future refactoring.
### Performance Considerations
- **Rate Limiting:** Max 10 events/second to clients
- **Buffering:** If client disconnected, buffer last 100 events
- **Filtering:** Clients can filter by event type
- **Deduplication:** WebSocket manager handles client dedup
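The 100-event catch-up buffer might be as simple as a bounded deque; this is a sketch under the assumption that broadcast events carry ISO-8601 timestamps as shown in the protocol example (class and method names are illustrative):

```python
from collections import deque


class EventBuffer:
    """Keep the last N broadcast events so a reconnecting client can
    catch up (sketch; the 100-event limit comes from the ADR above)."""

    def __init__(self, maxlen: int = 100):
        # deque with maxlen silently drops the oldest entries
        self._events = deque(maxlen=maxlen)

    def append(self, event: dict) -> None:
        self._events.append(event)

    def replay_since(self, last_timestamp: str) -> list:
        # ISO-8601 timestamps compare correctly as plain strings
        return [e for e in self._events if e["timestamp"] > last_timestamp]
```

On reconnect, a client would send its last seen timestamp and receive only the events it missed, up to the buffer limit.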
### Security
- Only authenticated dashboard users receive events
- Sanitize event data (no secrets in logs)
- Rate limit connections per IP
## Consequences
### Positive
- Real-time visibility into swarm activity
- Better UX for monitoring
- Uses existing infrastructure (Event Log, WebSocket)
### Negative
- Increased server load from WebSocket connections
- Event data must be carefully sanitized
- More complex client-side state management
### Mitigations
- Event throttling
- Connection limits
- Graceful degradation to polling
## Implementation Plan
1. **Create EventBroadcaster** - Bridge event_log → ws_manager
2. **Extend ws_manager** - Add `broadcast_event()` method
3. **Modify event_log.py** - Hook in broadcaster
4. **Enhance /swarm/live** - Add activity feed panel with WebSocket
5. **Create EventFeed component** - Reusable HTMX + WebSocket widget
6. **Write tests** - E2E tests for real-time updates
## Dependencies
- Existing `src/swarm/event_log.py`
- Existing `src/ws_manager/handler.py`
- Existing `/swarm/live` page
- HTMX WebSocket extension (already loaded)