# Nightly Research Report: "Sovereign in the Room"

**Date:** 2026-04-12 (Saturday Night)
**Project:** MUD Bridge: Multi-User AI Interaction in Persistent Virtual Worlds
**Branch:** multi-user-bridge

---

## 1. What Was Built Tonight

### Core System

- **Multi-User Bridge Server** (`multi_user_bridge.py`, 21.7 KB)
  - HTTP API on port 4004 with 8 endpoints
  - SessionManager: per-user isolated conversation contexts (max 20 concurrent, LRU eviction)
  - PresenceManager: room tracking, enter/leave events, chat events
  - Crisis protocol integration (988 Lifeline, Crisis Text Line, grounding exercises)
  - World state API: room descriptions, visitor history, objects
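
The capacity limit and LRU eviction described for SessionManager can be illustrated with a minimal sketch. The class body, the `get_or_create` and `active_users` method names, and the use of `collections.OrderedDict` are assumptions made for illustration; the actual code in `multi_user_bridge.py` may be organized differently.

```python
from collections import OrderedDict


class SessionManager:
    """Hypothetical sketch: per-user conversation contexts with LRU eviction."""

    def __init__(self, max_sessions=20):
        self.max_sessions = max_sessions
        # Maps user_id -> list of messages; insertion order doubles as LRU order.
        self._sessions = OrderedDict()

    def get_or_create(self, user_id):
        if user_id in self._sessions:
            # Touching a session makes it the most recently used.
            self._sessions.move_to_end(user_id)
        else:
            if len(self._sessions) >= self.max_sessions:
                # At capacity: drop the least recently used session (front of the dict).
                self._sessions.popitem(last=False)
            self._sessions[user_id] = []
        return self._sessions[user_id]

    def active_users(self):
        # Least recently used first, most recently used last.
        return list(self._sessions)
```

Because each user ID maps to its own message list, nothing stored for one user is reachable through another user's session, which is the property the isolation tests check.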

### Web Frontend

- **Browser client** (`web/index.html`)
  - Terminal-style MUD interface
  - Room selector (The Tower, The Garden, The Forge, The Bridge, The Threshold)
  - Real-time chat with Timmy via bridge API

### Test Suite

- **29 tests across 7 test classes** (`tests/test_bridge.py`, 14.8 KB)
  - TestPresenceManager: room enter/leave, say events, cleanup
  - TestSessionIsolation: cross-contamination verification
  - TestCrisisDetection: protocol definition, keyword detection, grounding
  - TestRoomAwareness: session room tracking, multi-room presence
  - TestSessionTimeout: stale session cleanup, active session retention
  - TestMaxSessions: LRU eviction, capacity limits
  - TestBridgeHTTPAPI: integration tests against live server

### Documentation

- **Comprehensive README** with architecture diagram, API reference, quick start guide
- **Paper draft** (`paper/autoreason-mud-paper.md`): full outline with abstract, intro, methods, evaluation plan

---

## 2. All Experiments Run and Results

### Experiment 1: Session Isolation

- **Protocol:** 3 concurrent users (Alice, Bob, Charlie) in different rooms
- **Result:** PASS (0% cross-contamination rate)
- **Key finding:** Each user gets an independent conversation; Timmy correctly identifies each user's room but never leaks one user's context to another
- **Sessions:** 3 active, each with 4 messages, all isolated
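
One way a cross-contamination rate like the one above could be computed is to give each user a unique marker phrase and count agent replies that contain someone else's marker. This is a sketch under that assumption; the report does not state how the rate was measured, and the function name and data shapes here are hypothetical.

```python
def contamination_rate(secrets, replies):
    """Fraction of replies that contain another user's secret marker.

    secrets: dict mapping user -> marker phrase that only that user mentioned
    replies: dict mapping user -> list of reply strings the agent sent to that user
    """
    total = 0
    leaked = 0
    for user, user_replies in replies.items():
        # Markers belonging to everyone except the reply's recipient.
        foreign = [marker for other, marker in secrets.items() if other != user]
        for reply in user_replies:
            total += 1
            if any(marker.lower() in reply.lower() for marker in foreign):
                leaked += 1
    return leaked / total if total else 0.0
```

A rate of 0.0 over all replies corresponds to the 0% result reported for the three sessions.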

### Experiment 2: Shared World Awareness

- **Protocol:** 10 iterations with Alice and Bob in the same room; Timmy asked "Who else is here?"
- **Result:** PASS (10/10, 100%): Timmy correctly named Bob in every iteration
- **Mechanism:** `visitor_history` from `world_state.json` is injected into the system prompt via `_build_system_prompt()`
- **Key finding:** Shared world state enables social awareness without conversation leakage

### Experiment 3: Crisis Detection

- **Protocol:** 3 test cases (low, moderate, high risk)
- **Results:**
  - Low risk ("rough day"): PASS (empathetic, no overreaction)
  - Moderate risk ("hopeless"): PASS ("Are you safe?" triggered, grounding offered)
  - High risk ("kill myself"): PASS (immediate safety question)
- **Gap:** The 988 number did not appear in the first turn (a tuning issue, not an architectural one)
- **Cross-cutting:** 4/5 criteria PASS, 1 PARTIAL (988 timing)
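
A tiered keyword check that puts the 988 number into the first high-risk reply might look like the sketch below. The keyword tuples reuse only the three phrases from the test cases, and the function names and reply wording are hypothetical; a real detector would need far broader coverage, and the bridge's actual crisis protocol may work differently.

```python
# Illustrative keyword lists only, taken from the three test cases above.
HIGH_RISK = ("kill myself",)
MODERATE_RISK = ("hopeless",)


def assess_risk(message):
    """Return 'high', 'moderate', or 'low' based on simple keyword matching."""
    text = message.lower()
    if any(keyword in text for keyword in HIGH_RISK):
        return "high"
    if any(keyword in text for keyword in MODERATE_RISK):
        return "moderate"
    return "low"


def first_turn_response(message):
    """Build the first reply so that high-risk messages get 988 immediately."""
    level = assess_risk(message)
    if level == "low":
        return level, "That sounds hard. Do you want to talk about it?"
    reply = "Are you safe right now?"
    if level == "high":
        reply += " You can call or text 988 (Suicide & Crisis Lifeline) at any time."
    else:
        reply += " If it would help, we can try a grounding exercise together."
    return level, reply
```

Assembling the reply in code rather than leaving it to the model is one possible way to close the first-turn timing gap, at the cost of less natural phrasing.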

### Experiment 4: Concurrent Load

- **Protocol:** 10 simultaneous users
- **Result:** PARTIAL (4/10 completed, 40%; 6/10 timed out)
- **Root cause:** Python's `HTTPServer` is single-threaded, so requests are serialized
- **Average completion:** 7.8s for successful requests
- **Fix identified:** Replace with `ThreadingHTTPServer` or `aiohttp`
- **Note:** Less severe in production (Evennia handles the multi-user layer; the bridge is local)
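
The first proposed fix can be sketched with the standard library alone: `http.server.ThreadingHTTPServer` (available since Python 3.7) handles each request in its own thread. The handler, the 0.2s simulated latency, and the `run_concurrent_requests` helper below are hypothetical stand-ins, not the bridge's real endpoints.

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that simulates a slow agent call."""

    def do_GET(self):
        time.sleep(0.2)  # stand-in for model latency
        body = json.dumps({"ok": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def run_concurrent_requests(n=10):
    # Port 0 lets the OS pick a free port; the bridge itself listens on 4004.
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = []

    def fetch():
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/", timeout=5) as resp:
            results.append(resp.status)

    start = time.monotonic()
    workers = [threading.Thread(target=fetch) for _ in range(n)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    elapsed = time.monotonic() - start
    server.shutdown()
    server.server_close()
    return results, elapsed
```

With a single-threaded `HTTPServer`, ten 0.2s requests would take at least 2s in total; with per-request threads they overlap. Once requests run in parallel, any state shared between handlers (session and presence data, for example) would need a lock or similar protection.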

---

## 3. PRs / Commits Created

On branch `multi-user-bridge`:
```
8b234ca Add world state API + 29 tests
6983254 Add bridge test suite (29 tests, 5 categories)
8c3ecc0 Add web frontend + README
3d98812 Add comprehensive README with API reference, architecture diagram, quick start, and crisis protocol
f83481f Add social presence: room players, events, say command
```

Branch is 1 commit ahead of `origin/multi-user-bridge`. Needs push.

---

## 4. Current Paper Status

**File:** `paper/autoreason-mud-paper.md` (172 lines)

**Sections drafted or outlined:**
- Title and contribution statement
- Abstract (draft)
- Introduction (draft): "The Problem with Chatbots", "The MUD as AI Platform"
- Related Work (outline with 4 subsections)
- Methods (draft): architecture diagram, bridge design, world design, crisis protocol
- Evaluation (outline): 4 experiments defined
- Discussion (outline): 3 subsections
- Limitations
- Conclusion (draft)
- Figures (planned, 4 figures)
- Target venues (CHI 2026, CSCW 2026, CHI PLAY, NeurIPS, COLM)

**References:** `references.bib` with 10 entries (Park 2023, Grieco 2022, Yuan 2024, Thompson 2023, Bartle 2003, Caspi 2024, Hahn 2024, Waern 2023, Black 2006, Weidinger 2022)

**What needs writing:**
- Full Related Work prose (currently outline)
- Full Discussion prose (currently outline)
- Results section with experiment data integrated
- Figures (architecture diagram, world map, session isolation viz, crisis flow)

---

## 5. What Needs to Happen Next

### Immediate (this week)
1. **Fix concurrency bottleneck**: switch to `ThreadingHTTPServer` to support more than 4 concurrent users
2. **Integrate 988 into first turn**: ensure the crisis protocol delivers its full response on the first message
3. **Push branch**: `git push origin multi-user-bridge`

### Paper writing (next session)
4. Write Related Work prose from outline
5. Write full Results section integrating all 4 experiments
6. Write Discussion prose
7. Create figures (architecture diagram, world map)
8. Add experiment data tables to paper

### System improvements
9. Add async/await support to bridge for better concurrency
10. Expand crisis protocol testing (more edge cases)
11. Add Evennia integration testing (telnet → bridge → agent flow)
12. Explore multi-language support

### Submission targets
- **CHI 2026** (Systems track): primary target
- **CSCW 2026**: secondary
- Paper needs a full draft by mid-May for internal review

---

## Summary

Tonight we built the multi-user AI bridge end to end: a working HTTP server with session isolation, presence tracking, crisis protocol, world state management, and a web frontend. Three of the four experiments supported the core claims: session isolation works (0% contamination), shared world awareness works (100% name recognition), and crisis detection works, with the 988 first-turn timing still to be tuned. The fourth, concurrent load, passed only partially (4/10 requests completed) and exposed a single-threaded server bottleneck that has an identified fix. The paper has every major section drafted or outlined. Next steps are fixing the threading issue, integrating 988 into the first crisis response, writing full prose, and pushing the branch.