Bannerlord Feudal Multi-Agent Hierarchy Design

Issue: #1099
Parent Epic: #1091 (Project Bannerlord)
Date: 2026-03-23
Status: Draft


Overview

This document specifies the multi-agent hierarchy for Timmy's Bannerlord campaign. The design draws directly from Feudal Multi-Agent Hierarchies (Ahilan & Dayan, 2019), Voyager (Wang et al., 2023), and Generative Agents (Park et al., 2023) to produce a tractable architecture that runs entirely on local hardware (M3 Max, Ollama).

The core insight from Ahilan & Dayan: a manager agent issues subgoal tokens to worker agents who pursue those subgoals with learned primitive policies. Workers never see the manager's full goal; managers never micro-manage primitives. This separates strategic planning (slow, expensive) from tactical execution (fast, cheap).


1. King-Level Timmy — Subgoal Vocabulary

Timmy is the King agent. He operates on the campaign map timescale (days to weeks of in-game time). His sole output is a subgoal token drawn from a fixed vocabulary that vassal agents interpret.

Subgoal Token Schema

from pydantic import BaseModel


class KingSubgoal(BaseModel):
    token: str                    # One of the vocabulary entries below
    target: str | None = None     # Named target (settlement, lord, faction)
    quantity: int | None = None   # For RECRUIT, TRADE
    priority: float = 1.0         # 0.0–2.0, scales vassal reward
    deadline_days: int | None = None  # Campaign-map days to complete
    context: str | None = None    # Free-text hint (not parsed by workers)

Vocabulary (v1)

| Token | Meaning | Primary Vassal |
|---|---|---|
| EXPAND_TERRITORY | Take or secure a fief | War Vassal |
| RAID_ECONOMY | Raid enemy villages for denars | War Vassal |
| FORTIFY | Upgrade or repair a settlement | Economy Vassal |
| RECRUIT | Fill party to capacity | Logistics Companion |
| TRADE | Execute profitable trade route | Caravan Companion |
| ALLY | Pursue a non-aggression or alliance deal | Diplomacy Vassal |
| SPY | Gain information on target faction | Scout Companion |
| HEAL | Rest party until wounds recovered | Logistics Companion |
| CONSOLIDATE | Hold territory, no expansion | Economy Vassal |
| TRAIN | Level troops via auto-resolve bandits | War Vassal |

The King updates the active subgoal at most once per campaign tick (configurable, default 1 in-game day). He reads the full GameState but emits only a single subgoal token plus optional parameters, not a prose plan.
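To make the vocabulary concrete, the token-to-agent mapping in the table above can be sketched as a lookup that rejects anything outside the fixed vocabulary. This is a minimal illustration, not the project's router: `PRIMARY_AGENT` and `route_subgoal` are hypothetical names, and the agent identifiers are assumed to follow the `"war_vassal"` naming style used in the message schemas.

```python
# Token -> primary agent, transcribed from the vocabulary table (v1).
# Agent identifiers are assumed snake_case names in the style of "war_vassal".
PRIMARY_AGENT = {
    "EXPAND_TERRITORY": "war_vassal",
    "RAID_ECONOMY": "war_vassal",
    "FORTIFY": "economy_vassal",
    "RECRUIT": "logistics_companion",
    "TRADE": "caravan_companion",
    "ALLY": "diplomacy_vassal",
    "SPY": "scout_companion",
    "HEAL": "logistics_companion",
    "CONSOLIDATE": "economy_vassal",
    "TRAIN": "war_vassal",
}


def route_subgoal(token: str) -> str:
    """Return the primary agent for a token, rejecting unknown tokens."""
    try:
        return PRIMARY_AGENT[token]
    except KeyError:
        raise ValueError(f"unknown subgoal token: {token!r}") from None


print(route_subgoal("FORTIFY"))  # economy_vassal
```

Rejecting unknown tokens at this boundary keeps a malformed LLM output from reaching any vassal.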

King Decision Loop

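# Pseudocode: gabs, king_llm, emit_subgoal and campaign_tick are defined elsewhere.
async def king_loop():
    while campaign_running:
        state = gabs.get_state()          # Full kingdom + map snapshot
        subgoal = king_llm.decide(state)  # Qwen3:32b, temp=0.1, JSON mode
        emit_subgoal(subgoal)             # Written to subgoal_queue
        await campaign_tick()             # Pause until the next campaign tick (~1 in-game day)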

The King uses Qwen3:32b (the most capable local model) for strategic reasoning. Subgoal generation is batch, not streaming, with a latency budget of 5–15 seconds per tick.
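One way to enforce the per-tick latency budget is to wrap the model call in `asyncio.wait_for`. The sketch below assumes the decision call is a coroutine and that the King keeps the previous subgoal when the budget is exceeded; that fallback policy and the name `decide_with_budget` are assumptions, not part of the design above.

```python
import asyncio


async def decide_with_budget(decide, state, previous, budget_s=15.0):
    """Await decide(state); keep the previous subgoal if the budget is exceeded."""
    try:
        return await asyncio.wait_for(decide(state), timeout=budget_s)
    except asyncio.TimeoutError:
        return previous  # assumed policy: hold the last subgoal on timeout


async def _demo():
    async def slow_llm(state):  # stand-in for a model call that overruns
        await asyncio.sleep(0.2)
        return {"token": "EXPAND_TERRITORY"}

    async def fast_llm(state):  # stand-in for a model call within budget
        return {"token": "TRADE"}

    held = {"token": "CONSOLIDATE"}
    timed_out = await decide_with_budget(slow_llm, {}, held, budget_s=0.05)
    on_time = await decide_with_budget(fast_llm, {}, held, budget_s=0.05)
    return timed_out, on_time


timed_out, on_time = asyncio.run(_demo())
print(timed_out["token"], on_time["token"])  # CONSOLIDATE TRADE
```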


2. Vassal Agents — Reward Functions

Vassals are mid-tier agents responsible for a domain of the kingdom. Each vassal has a defined reward function. Vassals run on Qwen3:14b (balanced capability vs. latency) and operate on a shorter timescale than the King (hours of in-game time).

2a. War Vassal

Domain: Military operations — sieges, field battles, raids, defensive maneuvers.

Reward function:

R_war = w1 * ΔTerritoryValue
      + w2 * ΔArmyStrength_ratio
      - w3 * CasualtyCost
      - w4 * SupplyCost
      + w5 * SubgoalBonus(active_subgoal ∈ {EXPAND_TERRITORY, RAID_ECONOMY, TRAIN})

| Weight | Default | Rationale |
|---|---|---|
| w1 | 0.40 | Territory is the primary long-term asset |
| w2 | 0.25 | Army ratio relative to nearest rival |
| w3 | 0.20 | Casualties are expensive to replace |
| w4 | 0.10 | Supply burn limits campaign duration |
| w5 | 0.05 | King alignment bonus |

Primitive actions available: move_party, siege_settlement, raid_village, retreat, auto_resolve_battle, hire_mercenaries.
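As a worked example, the War Vassal reward can be evaluated with the default weights. The sketch assumes every input is already normalised to a comparable scale (the design does not fix units) and treats `SubgoalBonus` as a 0/1 indicator of whether the active subgoal is one of the War Vassal's tokens; `war_reward` is a hypothetical helper name.

```python
WAR_WEIGHTS = {"w1": 0.40, "w2": 0.25, "w3": 0.20, "w4": 0.10, "w5": 0.05}
WAR_SUBGOALS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}


def war_reward(d_territory, d_army_ratio, casualty_cost, supply_cost,
               active_subgoal, w=WAR_WEIGHTS):
    """Evaluate R_war; inputs are assumed to be pre-normalised floats."""
    # Assumption: SubgoalBonus is 1 when the King's subgoal is a war token, else 0.
    bonus = 1.0 if active_subgoal in WAR_SUBGOALS else 0.0
    return (w["w1"] * d_territory
            + w["w2"] * d_army_ratio
            - w["w3"] * casualty_cost
            - w["w4"] * supply_cost
            + w["w5"] * bonus)


r = war_reward(1.0, 0.2, 0.5, 0.3, "TRAIN")
print(round(r, 4))  # 0.37
```

With these inputs the terms are 0.40 + 0.05 - 0.10 - 0.03 + 0.05, so the alignment bonus shifts the result only slightly, as the 0.05 weight intends.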

2b. Economy Vassal

Domain: Settlement management, tax collection, construction, food supply.

Reward function:

R_econ = w1 * DailyDenarsIncome
       + w2 * FoodStockBuffer
       + w3 * LoyaltyAverage
       - w4 * ConstructionQueueLength
       + w5 * SubgoalBonus(active_subgoal ∈ {FORTIFY, CONSOLIDATE})

| Weight | Default | Rationale |
|---|---|---|
| w1 | 0.35 | Income is the fuel for everything |
| w2 | 0.25 | Starvation causes immediate loyalty crash |
| w3 | 0.20 | Low loyalty triggers revolt |
| w4 | 0.15 | Idle construction is opportunity cost |
| w5 | 0.05 | King alignment bonus |

Primitive actions available: set_tax_policy, build_project, distribute_food, appoint_governor, upgrade_garrison.

2c. Diplomacy Vassal

Domain: Relations management — alliances, peace deals, tribute, marriage.

Reward function:

R_diplo = w1 * AlliesCount
        + w2 * TruceDurationValue
        + w3 * RelationsScore_weighted
        - w4 * ActiveWarsFront
        + w5 * SubgoalBonus(active_subgoal ∈ {ALLY})

Primitive actions available: send_envoy, propose_peace, offer_tribute, request_military_access, arrange_marriage.


3. Companion Worker Task Primitives

Companions are the lowest tier: fast, specialized, single-purpose workers. They run on Qwen3:8b (or smaller) for sub-2-second response times. Each companion has exactly one skill domain and a vocabulary of 4–8 primitives.

3a. Logistics Companion (Party Management)

Skill: Scouting / Steward / Medicine hybrid role.

| Primitive | Effect | Trigger |
|---|---|---|
| recruit_troop(type, qty) | Buy troops at nearest town | RECRUIT subgoal |
| buy_supplies(qty) | Purchase food for march | Party food < 3 days |
| rest_party(days) | Idle in friendly town | Wound % > 30% or HEAL subgoal |
| sell_prisoners(loc) | Convert prisoners to denars | Prison > capacity |
| upgrade_troops() | Spend XP on troop upgrades | After battle or TRAIN |
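The trigger column above amounts to a small rule set. A minimal sketch of evaluating it is shown below; the `PartyState` fields and the `triggered_primitives` function are hypothetical, and the thresholds are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartyState:
    # Hypothetical snapshot fields; the real GABS state layout may differ.
    food_days: float
    wound_pct: float
    prisoners: int
    prison_capacity: int
    just_fought: bool = False


def triggered_primitives(party: PartyState, active_subgoal: Optional[str]) -> list:
    """Return the Logistics primitives whose trigger condition currently holds."""
    fired = []
    if active_subgoal == "RECRUIT":
        fired.append("recruit_troop")
    if party.food_days < 3:
        fired.append("buy_supplies")
    if party.wound_pct > 30 or active_subgoal == "HEAL":
        fired.append("rest_party")
    if party.prisoners > party.prison_capacity:
        fired.append("sell_prisoners")
    if party.just_fought or active_subgoal == "TRAIN":
        fired.append("upgrade_troops")
    return fired


party = PartyState(food_days=2, wound_pct=45, prisoners=3, prison_capacity=10)
print(triggered_primitives(party, None))  # ['buy_supplies', 'rest_party']
```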

3b. Caravan Companion (Trade)

Skill: Trade / Charm.

| Primitive | Effect | Trigger |
|---|---|---|
| assess_prices(town) | Query buy/sell prices | Entry to settlement |
| buy_goods(item, qty) | Purchase trade goods | Positive margin ≥ 15% |
| sell_goods(item, qty) | Sell at target settlement | Reached destination |
| establish_caravan(town) | Deploy caravan NPC | TRADE subgoal + denars > 10k |
| abandon_route() | Return to main party | Caravan threatened |
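The two numeric triggers in this table can be written as simple predicates. The sketch assumes margin is measured as profit divided by purchase price, which the table does not specify; the function names are hypothetical.

```python
def trade_margin(buy_price: float, sell_price: float) -> float:
    """Profit as a fraction of the purchase price (assumed definition of margin)."""
    if buy_price <= 0:
        raise ValueError("buy_price must be positive")
    return (sell_price - buy_price) / buy_price


def should_buy(buy_price: float, sell_price: float, min_margin: float = 0.15) -> bool:
    """buy_goods trigger: positive margin of at least 15%."""
    return trade_margin(buy_price, sell_price) >= min_margin


def can_establish_caravan(active_subgoal: str, denars: int) -> bool:
    """establish_caravan trigger: TRADE subgoal and more than 10k denars."""
    return active_subgoal == "TRADE" and denars > 10_000


print(should_buy(100, 115), should_buy(100, 110))  # True False
```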

3c. Scout Companion (Intelligence)

Skill: Scouting / Roguery.

| Primitive | Effect | Trigger |
|---|---|---|
| track_lord(name) | Shadow enemy lord | SPY subgoal |
| assess_garrison(settlement) | Estimate defender count | Before siege proposal |
| map_patrol_routes(region) | Log enemy movement | Territorial expansion prep |
| report_intel() | Push findings to King | Scheduled or on demand |

4. Communication Protocol Between Hierarchy Levels

All agents communicate through a shared Subgoal Queue and State Broadcast bus, implemented as in-process Python asyncio queues backed by SQLite for persistence.
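A minimal sketch of an asyncio queue backed by SQLite follows. It assumes a write-ahead approach, where each message is logged to the database before it is enqueued; the `PersistentQueue` class and the `messages` table are hypothetical, and the blocking `sqlite3` call inside the coroutine is a simplification that a real implementation would likely move off the event loop.

```python
import asyncio
import json
import sqlite3


class PersistentQueue:
    """asyncio.Queue that logs each message to SQLite before enqueueing it."""

    def __init__(self, db: sqlite3.Connection):
        self._db = db
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self._queue = asyncio.Queue()

    async def put(self, message: dict) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO messages (body) VALUES (?)", (json.dumps(message),)
            )
        await self._queue.put(message)

    async def get(self) -> dict:
        return await self._queue.get()


async def _demo():
    db = sqlite3.connect(":memory:")
    queue = PersistentQueue(db)  # created inside the running event loop
    await queue.put({"msg_type": "subgoal", "to_agent": "war_vassal"})
    received = await queue.get()
    stored = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    return received, stored


received, stored = asyncio.run(_demo())
print(received["to_agent"], stored)  # war_vassal 1
```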

Message Types

from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel


class SubgoalMessage(BaseModel):
    """King → Vassal direction"""
    msg_type: Literal["subgoal"] = "subgoal"
    from_agent: Literal["king"]
    to_agent: str                    # "war_vassal", "economy_vassal", etc.
    subgoal: KingSubgoal
    issued_at: datetime

class TaskMessage(BaseModel):
    """Vassal → Companion direction"""
    msg_type: Literal["task"] = "task"
    from_agent: str                  # "war_vassal", etc.
    to_agent: str                    # "logistics_companion", etc.
    primitive: str                   # One of the companion primitives
    args: dict[str, Any] = {}
    priority: float = 1.0
    issued_at: datetime

class ResultMessage(BaseModel):
    """Companion/Vassal → Parent direction"""
    msg_type: Literal["result"] = "result"
    from_agent: str
    to_agent: str
    success: bool
    outcome: dict[str, Any]          # Primitive-specific result data
    reward_delta: float              # Computed reward contribution
    completed_at: datetime

class StateUpdateMessage(BaseModel):
    """GABS → All agents (broadcast)"""
    msg_type: Literal["state"] = "state"
    game_state: dict[str, Any]       # Full GABS state snapshot
    tick: int
    timestamp: datetime

Protocol Flow

GABS ──state_update──► King
                          │
                    subgoal_msg
                          │
             ┌────────────┼────────────┐
             ▼            ▼            ▼
         War Vassal   Econ Vassal  Diplo Vassal
             │            │            │
         task_msg      task_msg     task_msg
             │            │            │
        Logistics      Caravan       Scout
        Companion     Companion    Companion
             │            │            │
         result_msg    result_msg   result_msg
             │            │            │
             └────────────┼────────────┘
                          ▼
                     King (reward aggregation)

Timing Constraints

| Level | Decision Frequency | LLM Budget |
|---|---|---|
| King | 1× per campaign day | 5–15 s |
| Vassal | 4× per campaign day | 2–5 s |
| Companion | On-demand / event-driven | < 2 s |

State updates from GABS arrive continuously; agents consume them at their own cadence. No agent blocks another's queue.

Conflict Resolution

If two vassals propose conflicting actions (e.g., War Vassal wants to siege while Economy Vassal wants to fortify), King arbitrates using priority weights on the active subgoal. The highest-priority active subgoal wins resource contention.
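The arbitration rule can be sketched as selecting the proposal whose backing subgoal has the highest priority. The `Proposal` type and `arbitrate` function are hypothetical, and breaking ties in favour of the earliest proposal is an assumption the design does not state.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    agent: str
    action: str
    subgoal_priority: float  # priority of the subgoal the action serves


def arbitrate(proposals):
    """Pick the proposal backed by the highest-priority subgoal.

    max() returns the first maximal item, so ties go to the earliest proposal.
    """
    if not proposals:
        return None
    return max(proposals, key=lambda p: p.subgoal_priority)


winner = arbitrate([
    Proposal("war_vassal", "siege_settlement", 1.5),
    Proposal("economy_vassal", "upgrade_garrison", 1.0),
])
print(winner.agent)  # war_vassal
```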


5. Sovereign Agent Properties

The King agent (Timmy) has sovereign properties that distinguish it from ordinary worker agents. These map directly to Timmy's existing identity architecture.

5a. Decentralized Identifier (DID)

did:key:z6Mk<timmy-public-key>

The King's DID is persisted in ~/.timmy/identity.json (existing SOUL.md pattern). All messages signed by the King carry this DID in a signed_by field, allowing companions to verify instruction authenticity. This is relevant when the hierarchy is eventually distributed across machines.

5b. Asset Control

| Asset Class | Storage | Control Level |
|---|---|---|
| Kingdom treasury (denars) | GABS game state | King exclusive |
| Settlement ownership | GABS game state | King exclusive |
| Troop assignments | King → Vassal delegation | Delegated, revocable |
| Trade goods (caravan) | Companion-local | Companion autonomous within budget |
| Intel reports | ~/.timmy/bannerlord/intel/ | Read-all, write-companion |

Asset delegation is explicit. Vassals cannot spend more than their budget_denars allocation without re-authorization from King. Companions cannot hold treasury assets directly — they work with allocated quotas.
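The budget rule can be sketched as a small ledger object that refuses spending beyond the allocation until the King raises it. Only `budget_denars` comes from the text above; `VassalBudget`, `BudgetExceeded`, `spend`, and `reauthorize` are hypothetical names.

```python
class BudgetExceeded(Exception):
    """Raised when a spend would exceed the vassal's current allocation."""


class VassalBudget:
    def __init__(self, budget_denars: int):
        self.budget_denars = budget_denars
        self.spent = 0

    def spend(self, amount: int) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.spent + amount > self.budget_denars:
            raise BudgetExceeded(
                f"{self.spent + amount} exceeds allocation of {self.budget_denars}"
            )
        self.spent += amount

    def reauthorize(self, extra_denars: int) -> None:
        """Called on the King's behalf to raise the allocation."""
        self.budget_denars += extra_denars


budget = VassalBudget(budget_denars=5000)
budget.spend(4000)
try:
    budget.spend(2000)  # would reach 6000 > 5000
except BudgetExceeded:
    budget.reauthorize(2000)  # King grants more, allocation becomes 7000
    budget.spend(2000)
print(budget.spent, budget.budget_denars)  # 6000 7000
```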

5c. Non-Terminability

The King agent cannot be terminated by vassal or companion agents. Termination authority is reserved for:

  1. The human operator (Ctrl+C or timmy stop)
  2. A SHUTDOWN signal from the top-level orchestrator

Vassals can pause themselves (e.g., awaiting GABS state) but cannot signal the King to stop. This prevents a misbehaving military vassal from ending the campaign.

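Implementation: King runs as the top-level task in the asyncio event loop. Vassals and companions run as child tasks inside an asyncio.TaskGroup that the King's task owns. asyncio has no separate cancel-scope object (that is an anyio/trio concept); the intent is that only the King's task holds references to the TaskGroup and its child tasks, so only it can cancel them.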


Implementation Path

This design connects directly to the existing Timmy codebase:

| Component | Maps to | Notes |
|---|---|---|
| King LLM calls | infrastructure/llm_router/ | Cascade router for model selection |
| Subgoal Queue | infrastructure/event_bus/ | Existing pub/sub pattern |
| Companion primitives | New src/bannerlord/agents/ package | One module per companion |
| GABS state updates | src/bannerlord/gabs_client.py | TCP JSON-RPC, port 4825 |
| Asset ledger | src/bannerlord/ledger.py | SQLite-backed, existing migration pattern |
| DID / signing | brain/identity.py | Extends existing SOUL.md |

The next concrete step is implementing the GABS TCP client and the KingSubgoal schema — everything else in this document depends on readable game state first.
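As a starting point for the GABS client, the message envelope can be sketched without any networking. The code assumes GABS speaks JSON-RPC 2.0 with newline-delimited framing and exposes a `get_state` method; none of these details are confirmed by this document, so treat them as placeholders to check against the GABS documentation.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # monotonically increasing request ids


def build_request(method: str, params: Optional[dict] = None) -> bytes:
    """Encode a JSON-RPC 2.0 request as one newline-terminated line (assumed framing)."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params or {},
    }
    return json.dumps(payload).encode("utf-8") + b"\n"


def parse_response(line: bytes) -> dict:
    """Return the result of a JSON-RPC 2.0 response, raising on an error object."""
    msg = json.loads(line)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return msg["result"]


request = build_request("get_state")  # "get_state" is a hypothetical method name
reply = b'{"jsonrpc": "2.0", "id": 1, "result": {"tick": 42}}\n'
print(parse_response(reply))  # {'tick': 42}
```

Keeping encoding and decoding as pure functions lets them be unit-tested before the TCP connection to port 4825 exists.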


References

  • Ahilan, S. & Dayan, P. (2019). Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning. https://arxiv.org/abs/1901.08492
  • Rood, S. (2022). Scaling Reinforcement Learning through Feudal Hierarchy (NPS thesis).
  • Wang, G. et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. https://arxiv.org/abs/2305.16291
  • Park, J.S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. https://arxiv.org/abs/2304.03442
  • Silveira, T. (2022). CiF-Bannerlord: Social AI Integration in Bannerlord.