2026-03-23 17:57:32 +00:00
1 changed files with 353 additions and 0 deletions
--- a/docs/research/bannerlord-feudal-hierarchy-design.md
+++ b/docs/research/bannerlord-feudal-hierarchy-design.md
@@ -0,0 +1,353 @@
+# Bannerlord Feudal Multi-Agent Hierarchy Design
+
+**Issue:** #1099
+**Parent Epic:** #1091 (Project Bannerlord)
+**Date:** 2026-03-23
+**Status:** Draft
+
+---
+
+## Overview
+
+This document specifies the multi-agent hierarchy for Timmy's Bannerlord campaign.
+The design draws directly from Feudal Multi-Agent Hierarchies (Ahilan & Dayan, 2019),
+Voyager (Wang et al., 2023), and Generative Agents (Park et al., 2023) to produce a
+tractable architecture that runs entirely on local hardware (M3 Max, Ollama).
+
+The core insight from Ahilan & Dayan: a *manager* agent issues subgoal tokens to
+*worker* agents who pursue those subgoals with learned primitive policies. Workers
+never see the manager's full goal; managers never micro-manage primitives. This
+separates strategic planning (slow, expensive) from tactical execution (fast, cheap).
+
+---
+
+## 1. King-Level Timmy — Subgoal Vocabulary
+
+Timmy is the King agent. He operates on the **campaign map** timescale (days to weeks
+of in-game time). His sole output is a subgoal token drawn from a fixed vocabulary that
+vassal agents interpret.
+
+### Subgoal Token Schema
+
+```python
+from pydantic import BaseModel
+
+
+class KingSubgoal(BaseModel):
+    token: str                    # One of the vocabulary entries below
+    target: str | None = None     # Named target (settlement, lord, faction)
+    quantity: int | None = None   # For RECRUIT, TRADE
+    priority: float = 1.0         # 0.0–2.0, scales vassal reward
+    deadline_days: int | None = None  # Campaign-map days to complete
+    context: str | None = None    # Free-text hint (not parsed by workers)
+```
+
+### Vocabulary (v1)
+
+| Token | Meaning | Primary Agent |
+|---|---|---|
+| `EXPAND_TERRITORY` | Take or secure a fief | War Vassal |
+| `RAID_ECONOMY` | Raid enemy villages for denars | War Vassal |
+| `FORTIFY` | Upgrade or repair a settlement | Economy Vassal |
+| `RECRUIT` | Fill party to capacity | Logistics Companion |
+| `TRADE` | Execute profitable trade route | Caravan Companion |
+| `ALLY` | Pursue a non-aggression or alliance deal | Diplomacy Vassal |
+| `SPY` | Gain information on target faction | Scout Companion |
+| `HEAL` | Rest party until wounds recovered | Logistics Companion |
+| `CONSOLIDATE` | Hold territory, no expansion | Economy Vassal |
+| `TRAIN` | Level troops by auto-resolving bandit fights | War Vassal |
+
+King updates the active subgoal at most once per **campaign tick** (configurable,
+default 1 in-game day). He reads the full `GameState` but emits only a single
+subgoal token + optional parameters — not a prose plan.
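+
+A minimal sketch of how a King reply could be checked against the fixed vocabulary
+before it reaches the queue. It uses only the standard library; `VOCABULARY` and
+`parse_subgoal` are illustrative names rather than existing Timmy code, and the
+priority bounds are taken from the schema comment above.
+
+```python
+import json
+
+# The ten v1 tokens from the vocabulary table.
+VOCABULARY = frozenset({
+    "EXPAND_TERRITORY", "RAID_ECONOMY", "FORTIFY", "RECRUIT", "TRADE",
+    "ALLY", "SPY", "HEAL", "CONSOLIDATE", "TRAIN",
+})
+
+
+def parse_subgoal(raw: str) -> dict:
+    """Parse the King's JSON reply and reject anything outside the vocabulary."""
+    data = json.loads(raw)
+    token = data.get("token")
+    if token not in VOCABULARY:
+        raise ValueError(f"unknown subgoal token: {token!r}")
+    priority = float(data.get("priority", 1.0))
+    if not 0.0 <= priority <= 2.0:
+        raise ValueError(f"priority {priority} is outside 0.0-2.0")
+    return {**data, "priority": priority}
+
+
+subgoal = parse_subgoal('{"token": "RECRUIT", "quantity": 20, "priority": 1.5}')
+```
+
+Rejecting unknown tokens at this boundary keeps a malformed LLM reply from ever
+reaching a vassal.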
+
+### King Decision Loop
+
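+```python
+# Sketch only: gabs, king_llm, emit_subgoal, and campaign_tick are placeholders
+# for components described elsewhere in this document.
+async def king_loop() -> None:
+    while campaign_running:
+        state = gabs.get_state()          # Full kingdom + map snapshot
+        subgoal = king_llm.decide(state)  # Qwen3:32b, temp=0.1, JSON mode
+        emit_subgoal(subgoal)             # Written to subgoal_queue
+        await campaign_tick()             # Wait for next tick (default: 1 game-day)
+```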
+
+King uses **Qwen3:32b**, the largest of the three local models in this design, for
+strategic reasoning. Subgoal generation is batch, not streaming, with a latency
+budget of 5–15 seconds per tick.
+
+---
+
+## 2. Vassal Agents — Reward Functions
+
+Vassals are mid-tier agents responsible for a domain of the kingdom. Each vassal
+has a defined reward function. Vassals run on **Qwen3:14b** (balanced capability
+vs. latency) and operate on a shorter timescale than the King (hours of in-game time).
+
+### 2a. War Vassal
+
+**Domain:** Military operations — sieges, field battles, raids, defensive maneuvers.
+
+**Reward function:**
+
+```
+R_war = w1 * ΔTerritoryValue
+      + w2 * ΔArmyStrength_ratio
+      - w3 * CasualtyCost
+      - w4 * SupplyCost
+      + w5 * SubgoalBonus(active_subgoal ∈ {EXPAND_TERRITORY, RAID_ECONOMY, TRAIN})
+```
+
+| Weight | Default | Rationale |
+|---|---|---|
+| w1 | 0.40 | Territory is the primary long-term asset |
+| w2 | 0.25 | Army ratio relative to nearest rival |
+| w3 | 0.20 | Casualties are expensive to replace |
+| w4 | 0.10 | Supply burn limits campaign duration |
+| w5 | 0.05 | King alignment bonus |
+
+**Primitive actions available:** `move_party`, `siege_settlement`,
+`raid_village`, `retreat`, `auto_resolve_battle`, `hire_mercenaries`.
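+
+As a worked example, the weighted sum above can be evaluated with the default
+weights. This sketch assumes each term is already normalized to a comparable scale
+and treats `SubgoalBonus` as a 0/1 indicator; both are assumptions, and the function
+and parameter names are hypothetical.
+
+```python
+# Default weights w1..w5 from the table above.
+WAR_WEIGHTS = {
+    "territory": 0.40,
+    "army_ratio": 0.25,
+    "casualties": 0.20,
+    "supply": 0.10,
+    "subgoal": 0.05,
+}
+WAR_SUBGOALS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}
+
+
+def war_reward(d_territory, d_army_ratio, casualty_cost, supply_cost,
+               active_subgoal, w=WAR_WEIGHTS):
+    """Evaluate R_war; costs enter with a negative sign."""
+    bonus = 1.0 if active_subgoal in WAR_SUBGOALS else 0.0  # assumed 0/1 bonus
+    return (w["territory"] * d_territory
+            + w["army_ratio"] * d_army_ratio
+            - w["casualties"] * casualty_cost
+            - w["supply"] * supply_cost
+            + w["subgoal"] * bonus)
+
+
+aligned = war_reward(1.0, 0.5, 0.2, 0.1, "EXPAND_TERRITORY")
+unaligned = war_reward(1.0, 0.5, 0.2, 0.1, "FORTIFY")
+```
+
+With these inputs the aligned case evaluates to 0.40 + 0.125 - 0.04 - 0.01 + 0.05 =
+0.525, and the unaligned case loses only the 0.05 alignment bonus.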
+
+### 2b. Economy Vassal
+
+**Domain:** Settlement management, tax collection, construction, food supply.
+
+**Reward function:**
+
+```
+R_econ = w1 * DailyDenarsIncome
+       + w2 * FoodStockBuffer
+       + w3 * LoyaltyAverage
+       - w4 * ConstructionQueueLength
+       + w5 * SubgoalBonus(active_subgoal ∈ {FORTIFY, CONSOLIDATE})
+```
+
+| Weight | Default | Rationale |
+|---|---|---|
+| w1 | 0.35 | Income is the fuel for everything |
+| w2 | 0.25 | Starvation causes immediate loyalty crash |
+| w3 | 0.20 | Low loyalty triggers revolt |
+| w4 | 0.15 | Projects waiting in the queue are opportunity cost |
+| w5 | 0.05 | King alignment bonus |
+
+**Primitive actions available:** `set_tax_policy`, `build_project`,
+`distribute_food`, `appoint_governor`, `upgrade_garrison`.
+
+### 2c. Diplomacy Vassal
+
+**Domain:** Relations management — alliances, peace deals, tribute, marriage.
+
+**Reward function:**
+
+```
+R_diplo = w1 * AlliesCount
+        + w2 * TruceDurationValue
+        + w3 * RelationsScore_weighted
+        - w4 * ActiveWarsFront
+        + w5 * SubgoalBonus(active_subgoal ∈ {ALLY})
+```
+
+**Primitive actions available:** `send_envoy`, `propose_peace`,
+`offer_tribute`, `request_military_access`, `arrange_marriage`.
+
+---
+
+## 3. Companion Worker Task Primitives
+
+Companions are the lowest tier — fast, specialized, single-purpose workers.
+They run on **Qwen3:8b** (or smaller) for sub-2-second response times.
+Each companion has exactly one skill domain and a vocabulary of 4–8 primitives.
+
+### 3a. Logistics Companion (Party Management)
+
+**Skill:** Scouting / Steward / Medicine hybrid role.
+
+| Primitive | Effect | Trigger |
+|---|---|---|
+| `recruit_troop(type, qty)` | Buy troops at nearest town | RECRUIT subgoal |
+| `buy_supplies(qty)` | Purchase food for march | Party food < 3 days |
+| `rest_party(days)` | Idle in friendly town | Wound % > 30% or HEAL subgoal |
+| `sell_prisoners(loc)` | Convert prisoners to denars | Prisoners > capacity |
+| `upgrade_troops()` | Spend XP on troop upgrades | After battle or TRAIN |
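+
+The trigger column can be read as a small rule table. A minimal sketch under that
+reading, where `PartyState` and `logistics_triggers` are hypothetical names and the
+thresholds are copied from the table:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class PartyState:
+    food_days: float          # days of food left at current consumption
+    wounded_fraction: float   # 0.0-1.0 share of the party that is wounded
+    prisoners: int
+    prisoner_capacity: int
+
+
+def logistics_triggers(party, active_subgoal, after_battle=False):
+    """Return the primitives whose trigger condition currently holds."""
+    fired = []
+    if active_subgoal == "RECRUIT":
+        fired.append("recruit_troop")
+    if party.food_days < 3:
+        fired.append("buy_supplies")
+    if party.wounded_fraction > 0.30 or active_subgoal == "HEAL":
+        fired.append("rest_party")
+    if party.prisoners > party.prisoner_capacity:
+        fired.append("sell_prisoners")
+    if after_battle or active_subgoal == "TRAIN":
+        fired.append("upgrade_troops")
+    return fired
+
+
+party = PartyState(food_days=2.0, wounded_fraction=0.35,
+                   prisoners=4, prisoner_capacity=10)
+fired = logistics_triggers(party, "RECRUIT")
+```
+
+Evaluating triggers as plain rules keeps the small model out of the loop for
+decisions that need no reasoning.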
+
+### 3b. Caravan Companion (Trade)
+
+**Skill:** Trade / Charm.
+
+| Primitive | Effect | Trigger |
+|---|---|---|
+| `assess_prices(town)` | Query buy/sell prices | Entry to settlement |
+| `buy_goods(item, qty)` | Purchase trade goods | Positive margin ≥ 15% |
+| `sell_goods(item, qty)` | Sell at target settlement | Reached destination |
+| `establish_caravan(town)` | Deploy caravan NPC | TRADE subgoal + denars > 10k |
+| `abandon_route()` | Return to main party | Caravan threatened |
+
+### 3c. Scout Companion (Intelligence)
+
+**Skill:** Scouting / Roguery.
+
+| Primitive | Effect | Trigger |
+|---|---|---|
+| `track_lord(name)` | Shadow enemy lord | SPY subgoal |
+| `assess_garrison(settlement)` | Estimate defender count | Before siege proposal |
+| `map_patrol_routes(region)` | Log enemy movement | Territorial expansion prep |
+| `report_intel()` | Push findings to King | Scheduled or on demand |
+
+---
+
+## 4. Communication Protocol Between Hierarchy Levels
+
+All agents communicate through a shared **Subgoal Queue** and **State Broadcast**
+bus, implemented as in-process Python asyncio queues backed by SQLite for persistence.
+
+### Message Types
+
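+```python
+from datetime import datetime
+from typing import Any, Literal
+
+from pydantic import BaseModel
+
+
+# KingSubgoal is the schema defined in section 1.
+class SubgoalMessage(BaseModel):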
+    """King → Vassal direction"""
+    msg_type: Literal["subgoal"] = "subgoal"
+    from_agent: Literal["king"]
+    to_agent: str                    # "war_vassal", "economy_vassal", etc.
+    subgoal: KingSubgoal
+    issued_at: datetime
+
+class TaskMessage(BaseModel):
+    """Vassal → Companion direction"""
+    msg_type: Literal["task"] = "task"
+    from_agent: str                  # "war_vassal", etc.
+    to_agent: str                    # "logistics_companion", etc.
+    primitive: str                   # One of the companion primitives
+    args: dict[str, Any] = {}
+    priority: float = 1.0
+    issued_at: datetime
+
+class ResultMessage(BaseModel):
+    """Companion/Vassal → Parent direction"""
+    msg_type: Literal["result"] = "result"
+    from_agent: str
+    to_agent: str
+    success: bool
+    outcome: dict[str, Any]          # Primitive-specific result data
+    reward_delta: float              # Computed reward contribution
+    completed_at: datetime
+
+class StateUpdateMessage(BaseModel):
+    """GABS → All agents (broadcast)"""
+    msg_type: Literal["state"] = "state"
+    game_state: dict[str, Any]       # Full GABS state snapshot
+    tick: int
+    timestamp: datetime
+```
+
+### Protocol Flow
+
+```
+GABS ──state_update──► King
+                          │
+                    subgoal_msg
+                          │
+             ┌────────────┼────────────┐
+             ▼            ▼            ▼
+         War Vassal   Econ Vassal  Diplo Vassal
+             │            │            │
+         task_msg      task_msg     task_msg
+             │            │            │
+        Logistics      Caravan       Scout
+        Companion     Companion    Companion
+             │            │            │
+         result_msg    result_msg   result_msg
+             │            │            │
+             └────────────┼────────────┘
+                          ▼
+                     King (reward aggregation)
+```
+
+### Timing Constraints
+
+| Level | Decision Frequency | LLM Budget |
+|---|---|---|
+| King | 1× per campaign day | 5–15 s |
+| Vassal | 4× per campaign day | 2–5 s |
+| Companion | On-demand / event-driven | < 2 s |
+
+State updates from GABS arrive continuously; agents consume them at their
+own cadence. No agent blocks another's queue.
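+
+One way to get this non-blocking behaviour is to give every agent its own one-slot
+queue and let a newer snapshot replace an unread one. The sketch below assumes that
+"latest snapshot wins" policy, which this document does not mandate, and leaves out
+SQLite persistence; `StateBus` is a hypothetical name.
+
+```python
+import asyncio
+
+
+class StateBus:
+    """Fan-out bus: each subscriber owns a private one-slot queue."""
+
+    def __init__(self):
+        self._queues = {}
+
+    def subscribe(self, agent):
+        queue = asyncio.Queue(maxsize=1)
+        self._queues[agent] = queue
+        return queue
+
+    def publish(self, state):
+        # Never waits: a slow reader only loses its own stale snapshot.
+        for queue in self._queues.values():
+            if queue.full():
+                queue.get_nowait()
+            queue.put_nowait(state)
+
+
+async def demo():
+    bus = StateBus()
+    king_q = bus.subscribe("king")
+    companion_q = bus.subscribe("logistics_companion")
+    companion_seen = []
+    for tick in range(3):
+        bus.publish({"tick": tick})
+        companion_seen.append((await companion_q.get())["tick"])  # fast reader
+    king_state = await king_q.get()  # slow reader: consumes once at the end
+    return companion_seen, king_state["tick"]
+
+
+companion_seen, king_tick = asyncio.run(demo())
+```
+
+Here the companion observes every tick while the King, reading once, sees only the
+most recent snapshot.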
+
+### Conflict Resolution
+
+If two vassals propose conflicting actions (e.g., War Vassal wants to siege while
+Economy Vassal wants to fortify), King arbitrates by comparing the `priority` of the
+subgoal each proposal serves. The proposal tied to the higher-priority subgoal wins
+the contested resource.
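+
+A minimal sketch of that arbitration rule. `ResourceClaim` and `arbitrate` are
+hypothetical names, and breaking ties in favour of the earlier claim is an
+assumption, since the text does not say how ties are resolved.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class ResourceClaim:
+    vassal: str
+    resource: str        # e.g. "treasury" or a named garrison
+    subgoal_token: str
+    priority: float      # priority of the subgoal this claim serves
+
+
+def arbitrate(claims):
+    """Return {resource: winning claim}; the highest priority wins."""
+    winners = {}
+    for claim in claims:
+        current = winners.get(claim.resource)
+        # Strict comparison: on a tie the earlier claim keeps the resource.
+        if current is None or claim.priority > current.priority:
+            winners[claim.resource] = claim
+    return winners
+
+
+winners = arbitrate([
+    ResourceClaim("economy_vassal", "treasury", "FORTIFY", 1.0),
+    ResourceClaim("war_vassal", "treasury", "EXPAND_TERRITORY", 1.5),
+])
+```
+
+In this example the War Vassal wins the treasury because its subgoal carries the
+higher priority.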
+
+---
+
+## 5. Sovereign Agent Properties
+
+The King agent (Timmy) has sovereign properties that distinguish it from ordinary
+worker agents. These map directly to Timmy's existing identity architecture.
+
+### 5a. Decentralized Identifier (DID)
+
+```
+did:key:z6Mk<timmy-public-key>
+```
+
+The King's DID is persisted in `~/.timmy/identity.json` (existing SOUL.md pattern).
+All messages signed by the King carry this DID in a `signed_by` field. Because a
+`did:key` identifier encodes the public key itself, vassals and companions can verify
+an instruction's signature without a registry lookup. This becomes relevant once the
+hierarchy is distributed across machines.
+
+### 5b. Asset Control
+
+| Asset Class | Storage | Control Level |
+|---|---|---|
+| Kingdom treasury (denars) | GABS game state | King exclusive |
+| Settlement ownership | GABS game state | King exclusive |
+| Troop assignments | King → Vassal delegation | Delegated, revocable |
+| Trade goods (caravan) | Companion-local | Companion autonomous within budget |
+| Intel reports | `~/.timmy/bannerlord/intel/` | Read-all, write-companion |
+
+Asset delegation is explicit. Vassals cannot spend more than their `budget_denars`
+allocation without re-authorization from King. Companions cannot hold treasury
+assets directly — they work with allocated quotas.
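+
+The delegation rule can be sketched as a small in-memory ledger. This is
+illustrative only: the class and method names are hypothetical, and the real
+ledger is described as SQLite-backed.
+
+```python
+class BudgetExceeded(Exception):
+    """Raised when a vassal tries to spend beyond its allocation."""
+
+
+class DelegatedBudget:
+    def __init__(self):
+        self._remaining = {}
+
+    def authorize(self, vassal, budget_denars):
+        """King grants (or tops up) a vassal's allocation."""
+        self._remaining[vassal] = self._remaining.get(vassal, 0) + budget_denars
+
+    def spend(self, vassal, amount):
+        """Deduct a spend, refusing anything above the remaining allocation."""
+        remaining = self._remaining.get(vassal, 0)
+        if amount > remaining:
+            raise BudgetExceeded(f"{vassal} needs re-authorization for {amount}")
+        self._remaining[vassal] = remaining - amount
+        return self._remaining[vassal]
+
+
+ledger = DelegatedBudget()
+ledger.authorize("war_vassal", 5000)
+left = ledger.spend("war_vassal", 3500)
+```
+
+A second spend of 2,000 denars would raise `BudgetExceeded` until the King calls
+`authorize` again.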
+
+### 5c. Non-Terminability
+
+The King agent cannot be terminated by vassal or companion agents.
+Termination authority is reserved for:
+1. The human operator (Ctrl+C or `timmy stop`)
+2. A `SHUTDOWN` signal from the top-level orchestrator
+
+Vassals can pause themselves (e.g., awaiting GABS state) but cannot signal the King
+to stop. This prevents a misbehaving military vassal from ending the campaign.
+
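+Implementation: King runs as the top-level task of the asyncio event loop and owns
+the `asyncio.TaskGroup` in which vassal and companion tasks are created. asyncio has
+no cancel-scope object (that is a Trio/AnyIO concept); a child is cancelled through
+its `Task` handle, and only the King's task holds those handles. Because a
+`TaskGroup` cancels its owner when a child raises an unhandled exception, each child
+must catch and report its own errors.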
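+
+A minimal sketch of this task layout, assuming Python 3.11+ for
+`asyncio.TaskGroup`. The `guarded` wrapper and the two toy vassals are hypothetical;
+the wrapper exists because an unhandled exception in a child would otherwise cancel
+the task that owns the group.
+
+```python
+import asyncio
+
+
+async def guarded(name, coro, log):
+    """Run a child and contain its failures so they never reach the King."""
+    try:
+        await coro
+    except Exception as exc:  # CancelledError is a BaseException, so it passes
+        log.append(f"{name} crashed: {exc}")
+
+
+async def faulty_vassal():
+    await asyncio.sleep(0)
+    raise RuntimeError("bad siege plan")
+
+
+async def idle_vassal():
+    while True:
+        await asyncio.sleep(0.01)
+
+
+async def king(log):
+    async with asyncio.TaskGroup() as tg:
+        # Only this coroutine ever sees the child Task handles.
+        children = [
+            tg.create_task(guarded("war_vassal", faulty_vassal(), log)),
+            tg.create_task(guarded("economy_vassal", idle_vassal(), log)),
+        ]
+        await asyncio.sleep(0.05)  # stand-in for the campaign
+        for child in children:
+            child.cancel()
+    log.append("king finished")
+
+
+log = []
+asyncio.run(king(log))
+```
+
+The crashing vassal is logged and contained, and the King still exits on its own
+terms after cancelling the remaining children.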
+
+---
+
+## Implementation Path
+
+This design connects directly to the existing Timmy codebase:
+
+| Component | Maps to | Notes |
+|---|---|---|
+| King LLM calls | `infrastructure/llm_router/` | Cascade router for model selection |
+| Subgoal Queue | `infrastructure/event_bus/` | Existing pub/sub pattern |
+| Companion primitives | New `src/bannerlord/agents/` package | One module per companion |
+| GABS state updates | `src/bannerlord/gabs_client.py` | TCP JSON-RPC, port 4825 |
+| Asset ledger | `src/bannerlord/ledger.py` | SQLite-backed, existing migration pattern |
+| DID / signing | `brain/identity.py` | Extends existing SOUL.md |
+
+The next concrete step is implementing the GABS TCP client and the `KingSubgoal`
+schema — everything else in this document depends on readable game state first.
+
+---
+
+## References
+
+- Ahilan, S. & Dayan, P. (2019). Feudal Multi-Agent Hierarchies for Cooperative
+  Reinforcement Learning. https://arxiv.org/abs/1901.08492
+- Rood, S. (2022). Scaling Reinforcement Learning through Feudal Hierarchy (NPS thesis).
+- Wang, G. et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language
+  Models. https://arxiv.org/abs/2305.16291
+- Park, J.S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior.
+  https://arxiv.org/abs/2304.03442
+- Silveira, T. (2022). CiF-Bannerlord: Social AI Integration in Bannerlord.