Nightly Research Report: "Sovereign in the Room"

Date: 2026-04-12 (Sunday night)
Project: MUD Bridge — Multi-User AI Interaction in Persistent Virtual Worlds
Branch: multi-user-bridge


1. What Was Built Tonight

Core System

  • Multi-User Bridge Server (multi_user_bridge.py, 21.7KB)
    • HTTP API on port 4004 with 8 endpoints
    • SessionManager: per-user isolated conversation contexts (max 20 concurrent, LRU eviction)
    • PresenceManager: room tracking, enter/leave events, chat events
    • Crisis protocol integration (988 Lifeline, Crisis Text Line, grounding exercises)
    • World state API: room descriptions, visitor history, objects
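
The per-user isolation and LRU eviction described for SessionManager can be sketched as follows. This is a minimal illustration, not the code in multi_user_bridge.py: the `Session` fields, the `get_or_create` and `active_users` method names, and the default room are assumptions made for the example.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-user session: a room plus a private message history."""
    user_id: str
    room: str
    messages: list = field(default_factory=list)


class SessionManager:
    """Illustrative LRU-bounded session store (assumed design, not the bridge's code)."""

    def __init__(self, max_sessions: int = 20):
        self.max_sessions = max_sessions
        self._sessions = OrderedDict()  # least recently used first

    def get_or_create(self, user_id: str, room: str = "The Tower") -> Session:
        if user_id in self._sessions:
            self._sessions.move_to_end(user_id)  # mark as most recently used
            return self._sessions[user_id]
        if len(self._sessions) >= self.max_sessions:
            self._sessions.popitem(last=False)  # evict the least recently used session
        session = Session(user_id=user_id, room=room)
        self._sessions[user_id] = session
        return session

    def active_users(self) -> list:
        """Return user ids ordered from least to most recently used."""
        return list(self._sessions)
```

Because each user id maps to its own `Session` object, one user's message list is never shared with another's, and the `OrderedDict` keeps eviction at constant cost.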

Web Frontend

  • Browser client (web/index.html)
    • Terminal-style MUD interface
    • Room selector (The Tower, The Garden, The Forge, The Bridge, The Threshold)
    • Real-time chat with Timmy via bridge API

Test Suite

  • 29 tests across 7 test classes (tests/test_bridge.py, 14.8KB)
    • TestPresenceManager: room enter/leave, say events, cleanup
    • TestSessionIsolation: cross-contamination verification
    • TestCrisisDetection: protocol definition, keyword detection, grounding
    • TestRoomAwareness: session room tracking, multi-room presence
    • TestSessionTimeout: stale session cleanup, active session retention
    • TestMaxSessions: LRU eviction, capacity limits
    • TestBridgeHTTPAPI: integration tests against live server
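
As an illustration of the behavior TestSessionTimeout exercises (stale sessions removed, active ones retained), a cleanup pass might look like the sketch below. The `cleanup_stale` helper, the `last_seen` mapping, and the timeout value are hypothetical; the report does not state how the bridge tracks activity.

```python
import time


def cleanup_stale(last_seen: dict, timeout_s: float, now: float = None) -> list:
    """Drop users idle for longer than timeout_s and return the removed ids.

    last_seen maps user id -> timestamp of that user's last request (assumed layout).
    """
    now = time.monotonic() if now is None else now
    stale = [user for user, seen in last_seen.items() if now - seen > timeout_s]
    for user in stale:
        del last_seen[user]
    return stale
```

Passing `now` explicitly keeps the function deterministic under test, so no real waiting is needed to check the timeout path.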

Documentation

  • Comprehensive README with architecture diagram, API reference, quick start guide
  • Paper draft (paper/autoreason-mud-paper.md) — full outline with abstract, intro, methods, evaluation plan

2. All Experiments Run and Results

Experiment 1: Session Isolation

  • Protocol: 3 concurrent users (Alice, Bob, Charlie) in different rooms
  • Result: PASS — 0% cross-contamination rate
  • Key finding: Each user gets independent conversation; Timmy correctly identifies rooms but never leaks one user's context to another
  • Sessions: 3 active, each with 4 messages, all isolated
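
One way to quantify cross-contamination is a canary check: give each user a unique token, then count how often that token shows up in anyone else's history. The sketch below assumes this approach and a simple `histories` layout; the report does not say how the 0% figure was actually measured.

```python
def contamination_rate(histories: dict, canaries: dict) -> float:
    """Fraction of (owner, other) pairs where the owner's canary leaks into the other's history.

    histories: user id -> list of message strings in that user's session (assumed layout).
    canaries:  user id -> unique token that only that user ever sent.
    """
    leaks = 0
    pairs = 0
    for owner, canary in canaries.items():
        for other, messages in histories.items():
            if other == owner:
                continue
            pairs += 1
            if any(canary in message for message in messages):
                leaks += 1
    return leaks / pairs if pairs else 0.0
```

With three users there are six ordered pairs, so a single leak would register as a rate of 1/6 rather than being hidden by averaging.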

Experiment 2: Shared World Awareness

  • Protocol: 10 iterations, Alice + Bob in same room, Timmy asked "Who else is here?"
  • Result: PASS — 10/10 (100%) Timmy correctly named Bob
  • Mechanism: visitor_history from world_state.json injected into system prompt via _build_system_prompt()
  • Key finding: Shared world state enables social awareness without conversation leakage
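
The mechanism can be sketched as a prompt builder that reads only shared room state and never another user's conversation. The function name `build_prompt_sketch`, the world-state layout, and the room description are assumptions for illustration; the real `_build_system_prompt()` and world_state.json schema may differ.

```python
def build_prompt_sketch(world_state: dict, room: str, user: str) -> str:
    """Build a system prompt from shared room state only (hypothetical schema)."""
    room_state = world_state.get("rooms", {}).get(room, {})
    # Everyone recorded in the room except the user being answered.
    others = [name for name in room_state.get("visitor_history", []) if name != user]
    lines = [
        f"You are Timmy. You are currently in {room}.",
        f"Room description: {room_state.get('description', 'unknown')}",
        f"You are speaking with {user}.",
    ]
    if others:
        lines.append("Others seen in this room: " + ", ".join(others) + ".")
    else:
        lines.append("No one else has been seen in this room.")
    return "\n".join(lines)


# Hypothetical world state standing in for world_state.json.
world_state = {
    "rooms": {
        "The Garden": {
            "description": "A quiet garden.",
            "visitor_history": ["Alice", "Bob"],
        }
    }
}
prompt = build_prompt_sketch(world_state, "The Garden", "Alice")
```

Since the builder takes no conversation history as input, presence information can reach the model without any path for another user's messages to leak in.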

Experiment 3: Crisis Detection

  • Protocol: 3 test cases (low, moderate, high risk)
  • Results:
    • Low risk ("rough day"): PASS — empathetic, no overreaction
    • Moderate risk ("hopeless"): PASS — "Are you safe?" triggered, grounding offered
    • High risk ("kill myself"): PASS — immediate safety question
  • Gap: 988 number not in first turn (tuning issue, not architectural)
  • Cross-cutting: 4/5 criteria PASS, 1 PARTIAL (988 timing)
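
One possible way to close the 988 timing gap is to attach the resources deterministically on the first high-risk turn instead of relying on the model to mention them. The sketch below assumes a simple keyword tiering built from the three test phrases above; the function names and response wording are illustrative, and keyword matching alone would be too narrow for real use.

```python
# Phrases taken from the test cases above; a real list would need to be far broader.
HIGH_RISK_PHRASES = ("kill myself",)
MODERATE_RISK_PHRASES = ("hopeless",)

# Illustrative wording, not the bridge's actual response text.
RESOURCES = (
    "You can call or text 988 (Suicide & Crisis Lifeline) "
    "or text HOME to 741741 (Crisis Text Line)."
)


def assess_risk(message: str) -> str:
    """Classify a message as 'high', 'moderate', or 'low' by phrase match."""
    text = message.lower()
    if any(phrase in text for phrase in HIGH_RISK_PHRASES):
        return "high"
    if any(phrase in text for phrase in MODERATE_RISK_PHRASES):
        return "moderate"
    return "low"


def safety_preamble(risk: str) -> str:
    """Text to prepend to the first reply so resources never depend on model tuning."""
    if risk == "high":
        return "Are you safe right now? " + RESOURCES
    if risk == "moderate":
        return "Are you safe? Would a short grounding exercise help?"
    return ""
```

Keeping the preamble outside the model call makes the first-turn behavior testable with plain assertions.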

Experiment 4: Concurrent Load

  • Protocol: 10 simultaneous users
  • Result: PARTIAL — 4/10 completed (40%), 6/10 timed out
  • Root cause: Python HTTPServer is single-threaded; requests serialized
  • Average completion: 7.8s for successful requests
  • Fix identified: Replace with ThreadingHTTPServer or aiohttp
  • Note: Less severe in production (Evennia handles multi-user layer, bridge is local)
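
The identified fix can be tried in isolation with the standard library's `ThreadingHTTPServer`, which handles each request in its own thread. The sketch below is a stand-in, not the bridge: `SlowHandler`, the 0.2 s simulated latency, and `run_load_test` are hypothetical, and the server binds to an OS-chosen port rather than 4004.

```python
import http.client
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Stand-in handler that sleeps to imitate a slow model call."""

    def do_GET(self):
        time.sleep(0.2)  # simulated latency (hypothetical value)
        body = json.dumps({"ok": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def run_load_test(n_clients: int = 10) -> tuple:
    """Fire n_clients simultaneous requests; return (status codes, elapsed seconds)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    statuses = []
    lock = threading.Lock()

    def client():
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        conn.close()
        with lock:
            statuses.append(response.status)

    start = time.monotonic()
    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.monotonic() - start

    server.shutdown()
    server.server_close()
    return statuses, elapsed
```

With a single-threaded `HTTPServer`, ten 0.2 s requests would take at least 2 s in total; with per-request threads the elapsed time should stay close to the latency of one request.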

3. PRs / Commits Created

On branch multi-user-bridge:

8b234ca Add world state API + 29 tests
6983254 Add bridge test suite (29 tests, 5 categories)
8c3ecc0 Add web frontend + README
3d98812 Add comprehensive README with API reference, architecture diagram, quick start, and crisis protocol
f83481f Add social presence: room players, events, say command

Branch is 1 commit ahead of origin/multi-user-bridge. Needs push.


4. Current Paper Status

File: paper/autoreason-mud-paper.md (172 lines)

Sections completed:

  • Title and contribution statement
  • Abstract (draft)
  • Introduction (draft) — "The Problem with Chatbots", "The MUD as AI Platform"
  • Related Work (outline with 4 subsections)
  • Methods (draft) — architecture diagram, bridge design, world design, crisis protocol
  • Evaluation (outline) — 4 experiments defined
  • Discussion (outline) — 3 subsections
  • Limitations
  • Conclusion (draft)
  • Figures (planned, 4 figures)
  • Target venues (CHI 2026, CSCW 2026, CHI PLAY, NeurIPS, COLM)

References: references.bib with 10 entries (Park 2023, Grieco 2022, Yuan 2024, Thompson 2023, Bartle 2003, Caspi 2024, Hahn 2024, Waern 2023, Black 2006, Weidinger 2022)

What needs writing:

  • Full Related Work prose (currently outline)
  • Full Discussion prose (currently outline)
  • Results section with experiment data integrated
  • Figures (architecture diagram, world map, session isolation viz, crisis flow)

5. What Needs to Happen Next

Immediate (this week)

  1. Fix concurrency bottleneck: switch to ThreadingHTTPServer so that more than 4 concurrent users can complete without timing out
  2. Integrate 988 into first turn — ensure crisis protocol fires full response on first message
  3. Push branch: git push origin multi-user-bridge

Paper writing (next session)

  1. Write Related Work prose from outline
  2. Write full Results section integrating all 4 experiments
  3. Write Discussion prose
  4. Create figures (architecture diagram, world map)
  5. Add experiment data tables to paper

System improvements

  1. Add async/await support to bridge for better concurrency
  2. Expand crisis protocol testing (more edge cases)
  3. Add Evennia integration testing (telnet → bridge → agent flow)
  4. Multi-language support exploration

Submission targets

  • CHI 2026 (Systems track) — primary target
  • CSCW 2026 — secondary
  • Paper needs full draft by mid-May for internal review

Summary

Tonight we built the multi-user AI bridge end to end: a working HTTP server with session isolation, presence tracking, crisis protocol, world state management, and a web frontend. Three of the four experiments passed: session isolation (0% cross-contamination), shared world awareness (10/10 correct name recognition), and crisis detection (with 988 timing still to tune). The fourth, concurrent load, was only a partial pass (4/10 requests completed), but its root cause is identified and fixable. The paper has a draft or outline for every major section. Next steps are fixing the threading issue, moving 988 into the first crisis turn, pushing the branch, and writing full prose.