Timmy-time-dashboard/docs/research/bannerlord-feudal-hierarchy-design.md

# Bannerlord Feudal Multi-Agent Hierarchy Design

**Issue:** #1099
**Parent Epic:** #1091 (Project Bannerlord)
**Date:** 2026-03-23
**Status:** Draft

---

## Overview

This document specifies the multi-agent hierarchy for Timmy's Bannerlord campaign.
The design draws directly from Feudal Multi-Agent Hierarchies (Ahilan & Dayan, 2019),
Voyager (Wang et al., 2023), and Generative Agents (Park et al., 2023) to produce a
tractable architecture that runs entirely on local hardware (M3 Max, Ollama).

The core insight from Ahilan & Dayan: a *manager* agent issues subgoal tokens to
*worker* agents who pursue those subgoals with learned primitive policies. Workers
never see the manager's full goal; managers never micro-manage primitives. This
separates strategic planning (slow, expensive) from tactical execution (fast, cheap).

---

## 1. King-Level Timmy — Subgoal Vocabulary

Timmy is the King agent. He operates on the **campaign map** timescale (days to weeks
of in-game time). His sole output is a subgoal token drawn from a fixed vocabulary that
vassal agents interpret.

### Subgoal Token Schema

```python
class KingSubgoal(BaseModel):
    token: str                    # One of the vocabulary entries below
    target: str | None = None     # Named target (settlement, lord, faction)
    quantity: int | None = None   # For RECRUIT, TRADE
    priority: float = 1.0         # 0.0–2.0, scales vassal reward
    deadline_days: int | None = None  # Campaign-map days to complete
    context: str | None = None    # Free-text hint (not parsed by workers)
```

### Vocabulary (v1)

| Token | Meaning | Primary Vassal |
|---|---|---|
| `EXPAND_TERRITORY` | Take or secure a fief | War Vassal |
| `RAID_ECONOMY` | Raid enemy villages for denars | War Vassal |
| `FORTIFY` | Upgrade or repair a settlement | Economy Vassal |
| `RECRUIT` | Fill party to capacity | Logistics Companion |
| `TRADE` | Execute profitable trade route | Caravan Companion |
| `ALLY` | Pursue a non-aggression or alliance deal | Diplomacy Vassal |
| `SPY` | Gain information on target faction | Scout Companion |
| `HEAL` | Rest party until wounds recovered | Logistics Companion |
| `CONSOLIDATE` | Hold territory, no expansion | Economy Vassal |
| `TRAIN` | Level troops via auto-resolve bandits | War Vassal |

King updates the active subgoal at most once per **campaign tick** (configurable,
default 1 in-game day). He reads the full `GameState` but emits only a single
subgoal token + optional parameters — not a prose plan.

### King Decision Loop

```
while campaign_running:
    state = gabs.get_state()          # Full kingdom + map snapshot
    subgoal = king_llm.decide(state)  # Qwen3:32b, temp=0.1, JSON mode
    emit_subgoal(subgoal)             # Written to subgoal_queue
    await campaign_tick()             # ~1 game-day real-time pause
```

King uses **Qwen3:32b** (the most capable local model) for strategic reasoning.
Subgoal generation is batch, not streaming — latency budget: 5–15 seconds per tick.

---

## 2. Vassal Agents — Reward Functions

Vassals are mid-tier agents responsible for a domain of the kingdom. Each vassal
has a defined reward function. Vassals run on **Qwen3:14b** (balanced capability
vs. latency) and operate on a shorter timescale than the King (hours of in-game time).

### 2a. War Vassal

**Domain:** Military operations — sieges, field battles, raids, defensive maneuvers.

**Reward function:**

```
R_war = w1 * ΔTerritoryValue
      + w2 * ΔArmyStrength_ratio
      - w3 * CasualtyCost
      - w4 * SupplyCost
      + w5 * SubgoalBonus(active_subgoal ∈ {EXPAND_TERRITORY, RAID_ECONOMY, TRAIN})
```

| Weight | Default | Rationale |
|---|---|---|
| w1 | 0.40 | Territory is the primary long-term asset |
| w2 | 0.25 | Army ratio relative to nearest rival |
| w3 | 0.20 | Casualties are expensive to replace |
| w4 | 0.10 | Supply burn limits campaign duration |
| w5 | 0.05 | King alignment bonus |

**Primitive actions available:** `move_party`, `siege_settlement`,
`raid_village`, `retreat`, `auto_resolve_battle`, `hire_mercenaries`.

### 2b. Economy Vassal

**Domain:** Settlement management, tax collection, construction, food supply.

**Reward function:**

```
R_econ = w1 * DailyDenarsIncome
       + w2 * FoodStockBuffer
       + w3 * LoyaltyAverage
       - w4 * ConstructionQueueLength
       + w5 * SubgoalBonus(active_subgoal ∈ {FORTIFY, CONSOLIDATE})
```

| Weight | Default | Rationale |
|---|---|---|
| w1 | 0.35 | Income is the fuel for everything |
| w2 | 0.25 | Starvation causes immediate loyalty crash |
| w3 | 0.20 | Low loyalty triggers revolt |
| w4 | 0.15 | Idle construction is opportunity cost |
| w5 | 0.05 | King alignment bonus |

**Primitive actions available:** `set_tax_policy`, `build_project`,
`distribute_food`, `appoint_governor`, `upgrade_garrison`.

### 2c. Diplomacy Vassal

**Domain:** Relations management — alliances, peace deals, tribute, marriage.

**Reward function:**

```
R_diplo = w1 * AlliesCount
        + w2 * TruceDurationValue
        + w3 * RelationsScore_weighted
        - w4 * ActiveWarsFront
        + w5 * SubgoalBonus(active_subgoal ∈ {ALLY})
```

**Primitive actions available:** `send_envoy`, `propose_peace`,
`offer_tribute`, `request_military_access`, `arrange_marriage`.

---

## 3. Companion Worker Task Primitives

Companions are the lowest tier — fast, specialized, single-purpose workers.
They run on **Qwen3:8b** (or smaller) for sub-2-second response times.
Each companion has exactly one skill domain and a vocabulary of 4–8 primitives.

### 3a. Logistics Companion (Party Management)

**Skill:** Scouting / Steward / Medicine hybrid role.

| Primitive | Effect | Trigger |
|---|---|---|
| `recruit_troop(type, qty)` | Buy troops at nearest town | RECRUIT subgoal |
| `buy_supplies(qty)` | Purchase food for march | Party food < 3 days |
| `rest_party(days)` | Idle in friendly town | Wound % > 30% or HEAL subgoal |
| `sell_prisoners(loc)` | Convert prisoners to denars | Prison > capacity |
| `upgrade_troops()` | Spend XP on troop upgrades | After battle or TRAIN |

### 3b. Caravan Companion (Trade)

**Skill:** Trade / Charm.

| Primitive | Effect | Trigger |
|---|---|---|
| `assess_prices(town)` | Query buy/sell prices | Entry to settlement |
| `buy_goods(item, qty)` | Purchase trade goods | Positive margin ≥ 15% |
| `sell_goods(item, qty)` | Sell at target settlement | Reached destination |
| `establish_caravan(town)` | Deploy caravan NPC | TRADE subgoal + denars > 10k |
| `abandon_route()` | Return to main party | Caravan threatened |

### 3c. Scout Companion (Intelligence)

**Skill:** Scouting / Roguery.

| Primitive | Effect | Trigger |
|---|---|---|
| `track_lord(name)` | Shadow enemy lord | SPY subgoal |
| `assess_garrison(settlement)` | Estimate defender count | Before siege proposal |
| `map_patrol_routes(region)` | Log enemy movement | Territorial expansion prep |
| `report_intel()` | Push findings to King | Scheduled or on demand |

---

## 4. Communication Protocol Between Hierarchy Levels

All agents communicate through a shared **Subgoal Queue** and **State Broadcast**
bus, implemented as in-process Python asyncio queues backed by SQLite for persistence.

### Message Types

```python
class SubgoalMessage(BaseModel):
    """King → Vassal direction"""
    msg_type: Literal["subgoal"] = "subgoal"
    from_agent: Literal["king"]
    to_agent: str                    # "war_vassal", "economy_vassal", etc.
    subgoal: KingSubgoal
    issued_at: datetime

class TaskMessage(BaseModel):
    """Vassal → Companion direction"""
    msg_type: Literal["task"] = "task"
    from_agent: str                  # "war_vassal", etc.
    to_agent: str                    # "logistics_companion", etc.
    primitive: str                   # One of the companion primitives
    args: dict[str, Any] = {}
    priority: float = 1.0
    issued_at: datetime

class ResultMessage(BaseModel):
    """Companion/Vassal → Parent direction"""
    msg_type: Literal["result"] = "result"
    from_agent: str
    to_agent: str
    success: bool
    outcome: dict[str, Any]          # Primitive-specific result data
    reward_delta: float              # Computed reward contribution
    completed_at: datetime

class StateUpdateMessage(BaseModel):
    """GABS → All agents (broadcast)"""
    msg_type: Literal["state"] = "state"
    game_state: dict[str, Any]       # Full GABS state snapshot
    tick: int
    timestamp: datetime
```

### Protocol Flow

```
GABS ──state_update──► King
                          │
                    subgoal_msg
                          │
             ┌────────────┼────────────┐
             ▼            ▼            ▼
         War Vassal   Econ Vassal  Diplo Vassal
             │            │            │
         task_msg      task_msg     task_msg
             │            │            │
        Logistics      Caravan       Scout
        Companion     Companion    Companion
             │            │            │
         result_msg    result_msg   result_msg
             │            │            │
             └────────────┼────────────┘
                          ▼
                     King (reward aggregation)
```

### Timing Constraints

| Level | Decision Frequency | LLM Budget |
|---|---|---|
| King | 1× per campaign day | 5–15 s |
| Vassal | 4× per campaign day | 2–5 s |
| Companion | On-demand / event-driven | < 2 s |

State updates from GABS arrive continuously; agents consume them at their
own cadence. No agent blocks another's queue.

### Conflict Resolution

If two vassals propose conflicting actions (e.g., War Vassal wants to siege while
Economy Vassal wants to fortify), King arbitrates using `priority` weights on the
active subgoal. The highest-priority active subgoal wins resource contention.

---

## 5. Sovereign Agent Properties

The King agent (Timmy) has sovereign properties that distinguish it from ordinary
worker agents. These map directly to Timmy's existing identity architecture.

### 5a. Decentralized Identifier (DID)

```
did:key:z6Mk<timmy-public-key>
```

The King's DID is persisted in `~/.timmy/identity.json` (existing SOUL.md pattern).
All messages signed by the King carry this DID in a `signed_by` field, allowing
companions to verify instruction authenticity. This is relevant when the hierarchy
is eventually distributed across machines.

### 5b. Asset Control

| Asset Class | Storage | Control Level |
|---|---|---|
| Kingdom treasury (denars) | GABS game state | King exclusive |
| Settlement ownership | GABS game state | King exclusive |
| Troop assignments | King → Vassal delegation | Delegated, revocable |
| Trade goods (caravan) | Companion-local | Companion autonomous within budget |
| Intel reports | `~/.timmy/bannerlord/intel/` | Read-all, write-companion |

Asset delegation is explicit. Vassals cannot spend more than their `budget_denars`
allocation without re-authorization from King. Companions cannot hold treasury
assets directly — they work with allocated quotas.

### 5c. Non-Terminability

The King agent cannot be terminated by vassal or companion agents.
Termination authority is reserved for:
1. The human operator (Ctrl+C or `timmy stop`)
2. A `SHUTDOWN` signal from the top-level orchestrator

Vassals can pause themselves (e.g., awaiting GABS state) but cannot signal the King
to stop. This prevents a misbehaving military vassal from ending the campaign.

Implementation: King runs in the main asyncio event loop. Vassals and companions
run in `asyncio.TaskGroup` subgroups. Only the King's task holds a reference to
the TaskGroup cancel scope.

---

## Implementation Path

This design connects directly to the existing Timmy codebase:

| Component | Maps to | Notes |
|---|---|---|
| King LLM calls | `infrastructure/llm_router/` | Cascade router for model selection |
| Subgoal Queue | `infrastructure/event_bus/` | Existing pub/sub pattern |
| Companion primitives | New `src/bannerlord/agents/` package | One module per companion |
| GABS state updates | `src/bannerlord/gabs_client.py` | TCP JSON-RPC, port 4825 |
| Asset ledger | `src/bannerlord/ledger.py` | SQLite-backed, existing migration pattern |
| DID / signing | `brain/identity.py` | Extends existing SOUL.md |

The next concrete step is implementing the GABS TCP client and the `KingSubgoal`
schema — everything else in this document depends on readable game state first.

---

## References

- Ahilan, S. & Dayan, P. (2019). Feudal Multi-Agent Hierarchies for Cooperative
  Reinforcement Learning. https://arxiv.org/abs/1901.08492
- Rood, S. (2022). Scaling Reinforcement Learning through Feudal Hierarchy (NPS thesis).
- Wang, G. et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language
  Models. https://arxiv.org/abs/2305.16291
- Park, J.S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior.
  https://arxiv.org/abs/2304.03442
- Silveira, T. (2022). CiF-Bannerlord: Social AI Integration in Bannerlord.