From c4edd337a23213922f628079b275bfa28d2a8437 Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Mon, 23 Mar 2026 13:56:35 -0400
Subject: [PATCH] docs: add feudal multi-agent hierarchy design for Bannerlord (#1099)

Architecture document specifying:
- King-level subgoal vocabulary (10 tokens with typed schema)
- Vassal reward functions for War, Economy, and Diplomacy domains
- Companion worker task primitives (Logistics, Caravan, Scout)
- Asyncio-based inter-level communication protocol
- Sovereign agent properties (DID, asset control, non-terminability)

Refs #1099

Co-Authored-By: Claude Sonnet 4.6
---
 .../bannerlord-feudal-hierarchy-design.md | 353 ++++++++++++++++++
 1 file changed, 353 insertions(+)
 create mode 100644 docs/research/bannerlord-feudal-hierarchy-design.md

diff --git a/docs/research/bannerlord-feudal-hierarchy-design.md b/docs/research/bannerlord-feudal-hierarchy-design.md
new file mode 100644
index 00000000..54c1042f
--- /dev/null
+++ b/docs/research/bannerlord-feudal-hierarchy-design.md
@@ -0,0 +1,353 @@
+# Bannerlord Feudal Multi-Agent Hierarchy Design
+
+**Issue:** #1099
+**Parent Epic:** #1091 (Project Bannerlord)
+**Date:** 2026-03-23
+**Status:** Draft
+
+---
+
+## Overview
+
+This document specifies the multi-agent hierarchy for Timmy's Bannerlord campaign.
+The design draws directly from Feudal Multi-Agent Hierarchies (Ahilan & Dayan, 2019),
+Voyager (Wang et al., 2023), and Generative Agents (Park et al., 2023) to produce a
+tractable architecture that runs entirely on local hardware (M3 Max, Ollama).
+
+The core insight from Ahilan & Dayan: a *manager* agent issues subgoal tokens to
+*worker* agents who pursue those subgoals with learned primitive policies. Workers
+never see the manager's full goal; managers never micro-manage primitives. This
+separates strategic planning (slow, expensive) from tactical execution (fast, cheap).
+
+---
+
+## 1. King-Level Timmy — Subgoal Vocabulary
+
+Timmy is the King agent. He operates on the **campaign map** timescale (days to weeks
+of in-game time). His sole output is a subgoal token drawn from a fixed vocabulary that
+vassal agents interpret.
+
+### Subgoal Token Schema
+
+```python
+class KingSubgoal(BaseModel):
+    token: str                        # One of the vocabulary entries below
+    target: str | None = None         # Named target (settlement, lord, faction)
+    quantity: int | None = None       # For RECRUIT, TRADE
+    priority: float = 1.0             # 0.0–2.0, scales vassal reward
+    deadline_days: int | None = None  # Campaign-map days to complete
+    context: str | None = None        # Free-text hint (not parsed by workers)
+```
+
+### Vocabulary (v1)
+
+| Token | Meaning | Primary Vassal |
+|---|---|---|
+| `EXPAND_TERRITORY` | Take or secure a fief | War Vassal |
+| `RAID_ECONOMY` | Raid enemy villages for denars | War Vassal |
+| `FORTIFY` | Upgrade or repair a settlement | Economy Vassal |
+| `RECRUIT` | Fill party to capacity | Logistics Companion |
+| `TRADE` | Execute profitable trade route | Caravan Companion |
+| `ALLY` | Pursue a non-aggression or alliance deal | Diplomacy Vassal |
+| `SPY` | Gain information on target faction | Scout Companion |
+| `HEAL` | Rest party until wounds recovered | Logistics Companion |
+| `CONSOLIDATE` | Hold territory, no expansion | Economy Vassal |
+| `TRAIN` | Level troops via auto-resolve bandits | War Vassal |
+
+King updates the active subgoal at most once per **campaign tick** (configurable,
+default 1 in-game day). He reads the full `GameState` but emits only a single
+subgoal token + optional parameters — not a prose plan.
+
+### King Decision Loop
+
+```
+while campaign_running:
+    state = gabs.get_state()          # Full kingdom + map snapshot
+    subgoal = king_llm.decide(state)  # Qwen3:32b, temp=0.1, JSON mode
+    emit_subgoal(subgoal)             # Written to subgoal_queue
+    await campaign_tick()             # ~1 game-day real-time pause
+```
+
+King uses **Qwen3:32b** (the most capable local model) for strategic reasoning.
+Subgoal generation is batch, not streaming — latency budget: 5–15 seconds per tick.
+
+---
+
+## 2. Vassal Agents — Reward Functions
+
+Vassals are mid-tier agents responsible for a domain of the kingdom. Each vassal
+has a defined reward function. Vassals run on **Qwen3:14b** (balanced capability
+vs. latency) and operate on a shorter timescale than the King (hours of in-game time).
+
+### 2a. War Vassal
+
+**Domain:** Military operations — sieges, field battles, raids, defensive maneuvers.
+
+**Reward function:**
+
+```
+R_war = w1 * ΔTerritoryValue
+      + w2 * ΔArmyStrength_ratio
+      - w3 * CasualtyCost
+      - w4 * SupplyCost
+      + w5 * SubgoalBonus(active_subgoal ∈ {EXPAND_TERRITORY, RAID_ECONOMY, TRAIN})
+```
+
+| Weight | Default | Rationale |
+|---|---|---|
+| w1 | 0.40 | Territory is the primary long-term asset |
+| w2 | 0.25 | Army ratio relative to nearest rival |
+| w3 | 0.20 | Casualties are expensive to replace |
+| w4 | 0.10 | Supply burn limits campaign duration |
+| w5 | 0.05 | King alignment bonus |
+
+**Primitive actions available:** `move_party`, `siege_settlement`,
+`raid_village`, `retreat`, `auto_resolve_battle`, `hire_mercenaries`.
+
+### 2b. Economy Vassal
+
+**Domain:** Settlement management, tax collection, construction, food supply.
+
+**Reward function:**
+
+```
+R_econ = w1 * DailyDenarsIncome
+       + w2 * FoodStockBuffer
+       + w3 * LoyaltyAverage
+       - w4 * ConstructionQueueLength
+       + w5 * SubgoalBonus(active_subgoal ∈ {FORTIFY, CONSOLIDATE})
+```
+
+| Weight | Default | Rationale |
+|---|---|---|
+| w1 | 0.35 | Income is the fuel for everything |
+| w2 | 0.25 | Starvation causes immediate loyalty crash |
+| w3 | 0.20 | Low loyalty triggers revolt |
+| w4 | 0.15 | Idle construction is opportunity cost |
+| w5 | 0.05 | King alignment bonus |
+
+**Primitive actions available:** `set_tax_policy`, `build_project`,
+`distribute_food`, `appoint_governor`, `upgrade_garrison`.
+
+### 2c. Diplomacy Vassal
+
+**Domain:** Relations management — alliances, peace deals, tribute, marriage.
+
+**Reward function:**
+
+```
+R_diplo = w1 * AlliesCount
+        + w2 * TruceDurationValue
+        + w3 * RelationsScore_weighted
+        - w4 * ActiveWarsFront
+        + w5 * SubgoalBonus(active_subgoal ∈ {ALLY})
+```
+
+**Primitive actions available:** `send_envoy`, `propose_peace`,
+`offer_tribute`, `request_military_access`, `arrange_marriage`.
+
+---
+
+## 3. Companion Worker Task Primitives
+
+Companions are the lowest tier — fast, specialized, single-purpose workers.
+They run on **Qwen3:8b** (or smaller) for sub-2-second response times.
+Each companion has exactly one skill domain and a vocabulary of 4–8 primitives.
+
+### 3a. Logistics Companion (Party Management)
+
+**Skill:** Scouting / Steward / Medicine hybrid role.
+
+| Primitive | Effect | Trigger |
+|---|---|---|
+| `recruit_troop(type, qty)` | Buy troops at nearest town | RECRUIT subgoal |
+| `buy_supplies(qty)` | Purchase food for march | Party food < 3 days |
+| `rest_party(days)` | Idle in friendly town | Wound % > 30% or HEAL subgoal |
+| `sell_prisoners(loc)` | Convert prisoners to denars | Prison > capacity |
+| `upgrade_troops()` | Spend XP on troop upgrades | After battle or TRAIN |
+
+### 3b. Caravan Companion (Trade)
+
+**Skill:** Trade / Charm.
+
+| Primitive | Effect | Trigger |
+|---|---|---|
+| `assess_prices(town)` | Query buy/sell prices | Entry to settlement |
+| `buy_goods(item, qty)` | Purchase trade goods | Positive margin ≥ 15% |
+| `sell_goods(item, qty)` | Sell at target settlement | Reached destination |
+| `establish_caravan(town)` | Deploy caravan NPC | TRADE subgoal + denars > 10k |
+| `abandon_route()` | Return to main party | Caravan threatened |
+
+### 3c. Scout Companion (Intelligence)
+
+**Skill:** Scouting / Roguery.
+
+| Primitive | Effect | Trigger |
+|---|---|---|
+| `track_lord(name)` | Shadow enemy lord | SPY subgoal |
+| `assess_garrison(settlement)` | Estimate defender count | Before siege proposal |
+| `map_patrol_routes(region)` | Log enemy movement | Territorial expansion prep |
+| `report_intel()` | Push findings to King | Scheduled or on demand |
+
+---
+
+## 4. Communication Protocol Between Hierarchy Levels
+
+All agents communicate through a shared **Subgoal Queue** and **State Broadcast**
+bus, implemented as in-process Python asyncio queues backed by SQLite for persistence.
+
+### Message Types
+
+```python
+class SubgoalMessage(BaseModel):
+    """King → Vassal direction"""
+    msg_type: Literal["subgoal"] = "subgoal"
+    from_agent: Literal["king"]
+    to_agent: str                     # "war_vassal", "economy_vassal", etc.
+    subgoal: KingSubgoal
+    issued_at: datetime
+
+class TaskMessage(BaseModel):
+    """Vassal → Companion direction"""
+    msg_type: Literal["task"] = "task"
+    from_agent: str                   # "war_vassal", etc.
+    to_agent: str                     # "logistics_companion", etc.
+    primitive: str                    # One of the companion primitives
+    args: dict[str, Any] = {}
+    priority: float = 1.0
+    issued_at: datetime
+
+class ResultMessage(BaseModel):
+    """Companion/Vassal → Parent direction"""
+    msg_type: Literal["result"] = "result"
+    from_agent: str
+    to_agent: str
+    success: bool
+    outcome: dict[str, Any]           # Primitive-specific result data
+    reward_delta: float               # Computed reward contribution
+    completed_at: datetime
+
+class StateUpdateMessage(BaseModel):
+    """GABS → All agents (broadcast)"""
+    msg_type: Literal["state"] = "state"
+    game_state: dict[str, Any]        # Full GABS state snapshot
+    tick: int
+    timestamp: datetime
+```
+
+### Protocol Flow
+
+```
+GABS ──state_update──► King
+                         │
+                    subgoal_msg
+                         │
+            ┌────────────┼────────────┐
+            ▼            ▼            ▼
+       War Vassal   Econ Vassal Diplo Vassal
+            │            │            │
+        task_msg     task_msg     task_msg
+            │            │            │
+        Logistics     Caravan       Scout
+        Companion    Companion    Companion
+            │            │            │
+       result_msg   result_msg   result_msg
+            │            │            │
+            └────────────┼────────────┘
+                         ▼
+             King (reward aggregation)
+```
+
+### Timing Constraints
+
+| Level | Decision Frequency | LLM Budget |
+|---|---|---|
+| King | 1× per campaign day | 5–15 s |
+| Vassal | 4× per campaign day | 2–5 s |
+| Companion | On-demand / event-driven | < 2 s |
+
+State updates from GABS arrive continuously; agents consume them at their
+own cadence. No agent blocks another's queue.
+
+### Conflict Resolution
+
+If two vassals propose conflicting actions (e.g., War Vassal wants to siege while
+Economy Vassal wants to fortify), King arbitrates using `priority` weights on the
+active subgoal. The highest-priority active subgoal wins resource contention.
+
+---
+
+## 5. Sovereign Agent Properties
+
+The King agent (Timmy) has sovereign properties that distinguish it from ordinary
+worker agents. These map directly to Timmy's existing identity architecture.
+
+### 5a. Decentralized Identifier (DID)
+
+```
+did:key:z6Mk
+```
+
+The King's DID is persisted in `~/.timmy/identity.json` (existing SOUL.md pattern).
+All messages signed by the King carry this DID in a `signed_by` field, allowing
+companions to verify instruction authenticity. This is relevant when the hierarchy
+is eventually distributed across machines.
+
+### 5b. Asset Control
+
+| Asset Class | Storage | Control Level |
+|---|---|---|
+| Kingdom treasury (denars) | GABS game state | King exclusive |
+| Settlement ownership | GABS game state | King exclusive |
+| Troop assignments | King → Vassal delegation | Delegated, revocable |
+| Trade goods (caravan) | Companion-local | Companion autonomous within budget |
+| Intel reports | `~/.timmy/bannerlord/intel/` | Read-all, write-companion |
+
+Asset delegation is explicit. Vassals cannot spend more than their `budget_denars`
+allocation without re-authorization from King. Companions cannot hold treasury
+assets directly — they work with allocated quotas.
+
+### 5c. Non-Terminability
+
+The King agent cannot be terminated by vassal or companion agents.
+Termination authority is reserved for:
+1. The human operator (Ctrl+C or `timmy stop`)
+2. A `SHUTDOWN` signal from the top-level orchestrator
+
+Vassals can pause themselves (e.g., awaiting GABS state) but cannot signal the King
+to stop. This prevents a misbehaving military vassal from ending the campaign.
+
+Implementation: King runs in the main asyncio event loop. Vassals and companions
+run in `asyncio.TaskGroup` subgroups. Only the King's task holds references to
+the TaskGroup and its child tasks, so only the King can cancel them.
+
+---
+
+## Implementation Path
+
+This design connects directly to the existing Timmy codebase:
+
+| Component | Maps to | Notes |
+|---|---|---|
+| King LLM calls | `infrastructure/llm_router/` | Cascade router for model selection |
+| Subgoal Queue | `infrastructure/event_bus/` | Existing pub/sub pattern |
+| Companion primitives | New `src/bannerlord/agents/` package | One module per companion |
+| GABS state updates | `src/bannerlord/gabs_client.py` | TCP JSON-RPC, port 4825 |
+| Asset ledger | `src/bannerlord/ledger.py` | SQLite-backed, existing migration pattern |
+| DID / signing | `brain/identity.py` | Extends existing SOUL.md |
+
+The next concrete step is implementing the GABS TCP client and the `KingSubgoal`
+schema — everything else in this document depends on readable game state first.
+
+---
+
+## References
+
+- Ahilan, S. & Dayan, P. (2019). Feudal Multi-Agent Hierarchies for Cooperative
+  Reinforcement Learning. https://arxiv.org/abs/1901.08492
+- Rood, S. (2022). Scaling Reinforcement Learning through Feudal Hierarchy (NPS thesis).
+- Wang, G. et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language
+  Models. https://arxiv.org/abs/2305.16291
+- Park, J.S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior.
+  https://arxiv.org/abs/2304.03442
+- Silveira, T. (2022). CiF-Bannerlord: Social AI Integration in Bannerlord.
--
2.43.0