forked from Rockachopa/Timmy-time-dashboard

This repository has been archived on 2026-03-24. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Claude (Opus 4.6) 62e7cfeffb [claude] Feudal multi-agent hierarchy design for Bannerlord (#1099) (#1108)

2026-03-23 17:57:32 +00:00

Bannerlord Feudal Multi-Agent Hierarchy Design

Issue: #1099
Parent Epic: #1091 (Project Bannerlord)
Date: 2026-03-23
Status: Draft

Overview

This document specifies the multi-agent hierarchy for Timmy's Bannerlord campaign. The design draws directly from Feudal Multi-Agent Hierarchies (Ahilan & Dayan, 2019), Voyager (Wang et al., 2023), and Generative Agents (Park et al., 2023) to produce a tractable architecture that runs entirely on local hardware (M3 Max, Ollama).

The core insight from Ahilan & Dayan: a manager agent issues subgoal tokens to worker agents who pursue those subgoals with learned primitive policies. Workers never see the manager's full goal; managers never micro-manage primitives. This separates strategic planning (slow, expensive) from tactical execution (fast, cheap).

1. King-Level Timmy — Subgoal Vocabulary

Timmy is the King agent. He operates on the campaign map timescale (days to weeks of in-game time). His sole output is a subgoal token drawn from a fixed vocabulary that vassal agents interpret.

Subgoal Token Schema

from pydantic import BaseModel, Field

class KingSubgoal(BaseModel):
    token: str                    # One of the vocabulary entries below
    target: str | None = None     # Named target (settlement, lord, faction)
    quantity: int | None = None   # For RECRUIT, TRADE
    # 0.0–2.0, scales vassal reward; out-of-range values fail validation
    priority: float = Field(default=1.0, ge=0.0, le=2.0)
    deadline_days: int | None = None  # Campaign-map days to complete
    context: str | None = None    # Free-text hint (not parsed by workers)

Vocabulary (v1)

| Token | Meaning | Primary Vassal |
| --- | --- | --- |
| `EXPAND_TERRITORY` | Take or secure a fief | War Vassal |
| `RAID_ECONOMY` | Raid enemy villages for denars | War Vassal |
| `FORTIFY` | Upgrade or repair a settlement | Economy Vassal |
| `RECRUIT` | Fill party to capacity | Logistics Companion |
| `TRADE` | Execute profitable trade route | Caravan Companion |
| `ALLY` | Pursue a non-aggression or alliance deal | Diplomacy Vassal |
| `SPY` | Gain information on target faction | Scout Companion |
| `HEAL` | Rest party until wounds recovered | Logistics Companion |
| `CONSOLIDATE` | Hold territory, no expansion | Economy Vassal |
| `TRAIN` | Level troops via auto-resolve bandits | War Vassal |

The King updates the active subgoal at most once per campaign tick (configurable; default: 1 in-game day). He reads the full GameState but emits only a single subgoal token plus optional parameters, not a prose plan.

King Decision Loop

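# gabs, king_llm, emit_subgoal and campaign_tick are provided elsewhere.
async def king_loop() -> None:
    while campaign_running:
        state = gabs.get_state()          # Full kingdom + map snapshot
        subgoal = king_llm.decide(state)  # Qwen3:32b, temp=0.1, JSON mode
        emit_subgoal(subgoal)             # Written to subgoal_queue
        await campaign_tick()             # Wait for the next tick (default: 1 in-game day)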

The King uses Qwen3:32b, the largest of the three local models in this design, for strategic reasoning. Subgoal generation is batch, not streaming, with a latency budget of 5–15 seconds per tick.

2. Vassal Agents — Reward Functions

Vassals are mid-tier agents responsible for a domain of the kingdom. Each vassal has a defined reward function. Vassals run on Qwen3:14b (balanced capability vs. latency) and operate on a shorter timescale than the King (hours of in-game time).

2a. War Vassal

Domain: Military operations — sieges, field battles, raids, defensive maneuvers.

Reward function:

R_war = w1 * ΔTerritoryValue
      + w2 * ΔArmyStrength_ratio
      - w3 * CasualtyCost
      - w4 * SupplyCost
      + w5 * SubgoalBonus(active_subgoal ∈ {EXPAND_TERRITORY, RAID_ECONOMY, TRAIN})

| Weight | Default | Rationale |
| --- | --- | --- |
| w1 | 0.40 | Territory is the primary long-term asset |
| w2 | 0.25 | Army ratio relative to nearest rival |
| w3 | 0.20 | Casualties are expensive to replace |
| w4 | 0.10 | Supply burn limits campaign duration |
| w5 | 0.05 | King alignment bonus |
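To make the weighting concrete, here is a small worked sketch of R_war with the default weights. It assumes every input has already been normalised to a comparable scale and treats SubgoalBonus as a 0/1 indicator. Neither choice is specified above, and `war_reward` is a hypothetical helper.

```python
# Default weights from the War Vassal table.
WAR_WEIGHTS = {"w1": 0.40, "w2": 0.25, "w3": 0.20, "w4": 0.10, "w5": 0.05}

# Subgoals that earn the War Vassal its alignment bonus.
WAR_SUBGOALS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}


def war_reward(
    d_territory_value: float,
    d_army_strength_ratio: float,
    casualty_cost: float,
    supply_cost: float,
    active_subgoal: str,
    weights: dict[str, float] = WAR_WEIGHTS,
) -> float:
    """Evaluate R_war for one decision step.

    Assumption: SubgoalBonus is 1.0 when the active subgoal is in the
    War Vassal's set and 0.0 otherwise.
    """
    subgoal_bonus = 1.0 if active_subgoal in WAR_SUBGOALS else 0.0
    return (
        weights["w1"] * d_territory_value
        + weights["w2"] * d_army_strength_ratio
        - weights["w3"] * casualty_cost
        - weights["w4"] * supply_cost
        + weights["w5"] * subgoal_bonus
    )


# Example: a successful siege with moderate losses under EXPAND_TERRITORY.
aligned = war_reward(1.0, 0.2, 0.5, 0.3, "EXPAND_TERRITORY")
# The same outcome while the King's subgoal is HEAL earns no bonus.
unaligned = war_reward(1.0, 0.2, 0.5, 0.3, "HEAL")
```

With these inputs the aligned reward is roughly 0.37 and the unaligned one roughly 0.32, so at the default w5 the King's subgoal nudges the vassal without overriding its own domain signal.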

Primitive actions available: move_party, siege_settlement, raid_village, retreat, auto_resolve_battle, hire_mercenaries.

2b. Economy Vassal

Domain: Settlement management, tax collection, construction, food supply.

Reward function:

R_econ = w1 * DailyDenarsIncome
       + w2 * FoodStockBuffer
       + w3 * LoyaltyAverage
       - w4 * ConstructionQueueLength
       + w5 * SubgoalBonus(active_subgoal ∈ {FORTIFY, CONSOLIDATE})

| Weight | Default | Rationale |
| --- | --- | --- |
| w1 | 0.35 | Income is the fuel for everything |
| w2 | 0.25 | Starvation causes immediate loyalty crash |
| w3 | 0.20 | Low loyalty triggers revolt |
| w4 | 0.15 | Idle construction is opportunity cost |
| w5 | 0.05 | King alignment bonus |

Primitive actions available: set_tax_policy, build_project, distribute_food, appoint_governor, upgrade_garrison.

2c. Diplomacy Vassal

Domain: Relations management — alliances, peace deals, tribute, marriage.

Reward function:

R_diplo = w1 * AlliesCount
        + w2 * TruceDurationValue
        + w3 * RelationsScore_weighted
        - w4 * ActiveWarsFront
        + w5 * SubgoalBonus(active_subgoal ∈ {ALLY})

Primitive actions available: send_envoy, propose_peace, offer_tribute, request_military_access, arrange_marriage.

3. Companion Worker Task Primitives

Companions are the lowest tier — fast, specialized, single-purpose workers. They run on Qwen3:8b (or smaller) for sub-2-second response times. Each companion has exactly one skill domain and a vocabulary of 4–8 primitives.

3a. Logistics Companion (Party Management)

Skill: Scouting / Steward / Medicine hybrid role.

| Primitive | Effect | Trigger |
| --- | --- | --- |
| `recruit_troop(type, qty)` | Buy troops at nearest town | RECRUIT subgoal |
| `buy_supplies(qty)` | Purchase food for march | Party food < 3 days |
| `rest_party(days)` | Idle in friendly town | Wound % > 30% or HEAL subgoal |
| `sell_prisoners(loc)` | Convert prisoners to denars | Prison > capacity |
| `upgrade_troops()` | Spend XP on troop upgrades | After battle or TRAIN |
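The trigger column can be read as a set of independent predicates over party state and the active subgoal. The sketch below encodes them directly. `PartyStatus` and its field names are hypothetical, since the GABS state schema is not defined in this document.

```python
from dataclasses import dataclass


@dataclass
class PartyStatus:
    """Hypothetical slice of game state relevant to the Logistics Companion."""
    food_days: float          # Days of food remaining
    wounded_fraction: float   # 0.0-1.0 share of the party that is wounded
    prisoners: int
    prisoner_capacity: int
    just_fought: bool = False


def due_primitives(status: PartyStatus, active_subgoal: str) -> list[str]:
    """Return the names of primitives whose trigger condition currently holds."""
    due: list[str] = []
    if active_subgoal == "RECRUIT":
        due.append("recruit_troop")
    if status.food_days < 3:
        due.append("buy_supplies")
    if status.wounded_fraction > 0.30 or active_subgoal == "HEAL":
        due.append("rest_party")
    if status.prisoners > status.prisoner_capacity:
        due.append("sell_prisoners")
    if status.just_fought or active_subgoal == "TRAIN":
        due.append("upgrade_troops")
    return due
```

Because the checks are plain comparisons, many ticks could in principle be resolved without an LLM call at all, which would help the sub-2-second budget. Whether to do so is a design choice this document leaves open.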

3b. Caravan Companion (Trade)

Skill: Trade / Charm.

| Primitive | Effect | Trigger |
| --- | --- | --- |
| `assess_prices(town)` | Query buy/sell prices | Entry to settlement |
| `buy_goods(item, qty)` | Purchase trade goods | Positive margin ≥ 15% |
| `sell_goods(item, qty)` | Sell at target settlement | Reached destination |
| `establish_caravan(town)` | Deploy caravan NPC | TRADE subgoal + denars > 10k |
| `abandon_route()` | Return to main party | Caravan threatened |
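The two numeric triggers in this table can be written down as a short sketch. The table does not say how margin is measured, so the version below assumes it is the price difference relative to the purchase price and ignores travel cost. The function names are illustrative.

```python
def trade_margin(buy_price: float, sell_price: float) -> float:
    """Margin relative to the purchase price (assumed definition)."""
    if buy_price <= 0:
        raise ValueError("buy_price must be positive")
    return (sell_price - buy_price) / buy_price


def should_buy(buy_price: float, sell_price: float, min_margin: float = 0.15) -> bool:
    """buy_goods trigger: expected margin of at least 15%."""
    return trade_margin(buy_price, sell_price) >= min_margin


def can_establish_caravan(active_subgoal: str, denars: int) -> bool:
    """establish_caravan trigger: TRADE subgoal and more than 10k denars."""
    return active_subgoal == "TRADE" and denars > 10_000
```

For instance, goods bought at 100 and sold at 120 clear the threshold, while a resale price of 110 does not.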

3c. Scout Companion (Intelligence)

Skill: Scouting / Roguery.

| Primitive | Effect | Trigger |
| --- | --- | --- |
| `track_lord(name)` | Shadow enemy lord | SPY subgoal |
| `assess_garrison(settlement)` | Estimate defender count | Before siege proposal |
| `map_patrol_routes(region)` | Log enemy movement | Territorial expansion prep |
| `report_intel()` | Push findings to King | Scheduled or on demand |

4. Communication Protocol Between Hierarchy Levels

All agents communicate through a shared Subgoal Queue and State Broadcast bus, implemented as in-process Python asyncio queues backed by SQLite for persistence.
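One way to read "asyncio queues backed by SQLite" is a thin wrapper that writes each message to a table before enqueueing it and replays undelivered rows on startup. The sketch below is an assumption about that shape, not the existing event bus. `PersistentQueue` and the `messages` table are hypothetical names, and the blocking `sqlite3` calls would need to move off the event loop in a real implementation.

```python
import asyncio
import json
import sqlite3


class PersistentQueue:
    """asyncio.Queue whose contents survive a restart (illustrative only)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._queue: asyncio.Queue = asyncio.Queue()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "delivered INTEGER NOT NULL DEFAULT 0)"
        )
        conn.commit()
        # Replay anything that was written but never consumed.
        rows = conn.execute(
            "SELECT id, payload FROM messages WHERE delivered = 0 ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            self._queue.put_nowait((row_id, json.loads(payload)))

    async def put(self, message: dict) -> None:
        # Persist first so a crash after this point cannot lose the message.
        cur = self._conn.execute(
            "INSERT INTO messages (payload) VALUES (?)", (json.dumps(message),)
        )
        self._conn.commit()
        await self._queue.put((cur.lastrowid, message))

    async def get(self) -> dict:
        row_id, message = await self._queue.get()
        self._conn.execute("UPDATE messages SET delivered = 1 WHERE id = ?", (row_id,))
        self._conn.commit()
        return message


async def demo() -> tuple[dict, dict]:
    conn = sqlite3.connect(":memory:")
    queue = PersistentQueue(conn)
    await queue.put({"msg_type": "subgoal", "token": "FORTIFY"})
    await queue.put({"msg_type": "subgoal", "token": "RECRUIT"})
    first = await queue.get()
    # Simulate a restart: a new queue over the same database sees only
    # the message that was never delivered.
    restarted = PersistentQueue(conn)
    second = await restarted.get()
    return first, second
```

Marking a row delivered at `get` time gives at-most-once delivery. Acknowledging only after the consumer finishes would trade that for at-least-once, which may suit primitives that are safe to retry.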

Message Types

from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel, Field

class SubgoalMessage(BaseModel):
    """King → Vassal direction"""
    msg_type: Literal["subgoal"] = "subgoal"
    from_agent: Literal["king"]
    to_agent: str                    # "war_vassal", "economy_vassal", etc.
    subgoal: KingSubgoal
    issued_at: datetime

class TaskMessage(BaseModel):
    """Vassal → Companion direction"""
    msg_type: Literal["task"] = "task"
    from_agent: str                  # "war_vassal", etc.
    to_agent: str                    # "logistics_companion", etc.
    primitive: str                   # One of the companion primitives
    args: dict[str, Any] = Field(default_factory=dict)  # Fresh dict per message
    priority: float = 1.0
    issued_at: datetime

class ResultMessage(BaseModel):
    """Companion/Vassal → Parent direction"""
    msg_type: Literal["result"] = "result"
    from_agent: str
    to_agent: str
    success: bool
    outcome: dict[str, Any]          # Primitive-specific result data
    reward_delta: float              # Computed reward contribution
    completed_at: datetime

class StateUpdateMessage(BaseModel):
    """GABS → All agents (broadcast)"""
    msg_type: Literal["state"] = "state"
    game_state: dict[str, Any]       # Full GABS state snapshot
    tick: int
    timestamp: datetime

Protocol Flow

GABS ──state_update──► King
                          │
                    subgoal_msg
                          │
             ┌────────────┼────────────┐
             ▼            ▼            ▼
         War Vassal   Econ Vassal  Diplo Vassal
             │            │            │
         task_msg      task_msg     task_msg
             │            │            │
        Logistics      Caravan       Scout
        Companion     Companion    Companion
             │            │            │
         result_msg    result_msg   result_msg
             │            │            │
             └────────────┼────────────┘
                          ▼
                     King (reward aggregation)

Timing Constraints

| Level | Decision Frequency | LLM Budget |
| --- | --- | --- |
| King | 1× per campaign day | 5–15 s |
| Vassal | 4× per campaign day | 2–5 s |
| Companion | On-demand / event-driven | < 2 s |

State updates from GABS arrive continuously; agents consume them at their own cadence. No agent blocks another's queue.

Conflict Resolution

If two vassals propose conflicting actions (e.g., the War Vassal wants to siege while the Economy Vassal wants to fortify), the King arbitrates using the priority weights of the active subgoals. The highest-priority active subgoal wins the resource contention.
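A minimal sketch of this arbitration rule, assuming each proposal carries the priority of the subgoal it serves. `Proposal` and `arbitrate` are hypothetical names, and breaking ties in favour of the earlier proposal is an arbitrary choice made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical record of an action a vassal wants resources for."""
    vassal: str
    action: str
    subgoal_token: str
    subgoal_priority: float  # Mirrors KingSubgoal.priority (0.0-2.0)


def arbitrate(proposals: list[Proposal]) -> Proposal:
    """Pick the proposal tied to the highest-priority subgoal.

    max() returns the first maximal element, so ties go to the
    proposal that appears earliest in the list.
    """
    if not proposals:
        raise ValueError("nothing to arbitrate")
    return max(proposals, key=lambda p: p.subgoal_priority)


winner = arbitrate([
    Proposal("economy_vassal", "build_project", "FORTIFY", 1.0),
    Proposal("war_vassal", "siege_settlement", "EXPAND_TERRITORY", 1.5),
])
```

Here the siege wins because EXPAND_TERRITORY was issued with the higher priority.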

5. Sovereign Agent Properties

The King agent (Timmy) has sovereign properties that distinguish it from ordinary worker agents. These map directly to Timmy's existing identity architecture.

5a. Decentralized Identifier (DID)

did:key:z6Mk<timmy-public-key>

The King's DID is persisted in ~/.timmy/identity.json (existing SOUL.md pattern). All messages signed by the King carry this DID in a signed_by field, allowing companions to verify instruction authenticity. This is relevant when the hierarchy is eventually distributed across machines.

5b. Asset Control

| Asset Class | Storage | Control Level |
| --- | --- | --- |
| Kingdom treasury (denars) | GABS game state | King exclusive |
| Settlement ownership | GABS game state | King exclusive |
| Troop assignments | King → Vassal delegation | Delegated, revocable |
| Trade goods (caravan) | Companion-local | Companion autonomous within budget |
| Intel reports | `~/.timmy/bannerlord/intel/` | Read-all, write-companion |

Asset delegation is explicit. Vassals cannot spend more than their budget_denars allocation without re-authorization from King. Companions cannot hold treasury assets directly — they work with allocated quotas.
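The budget rule can be illustrated with a small in-memory guard. This is only a sketch under assumptions: `VassalBudget`, `BudgetExceeded`, and `reauthorize` are hypothetical names, and the real ledger is described as SQLite-backed rather than held in memory.

```python
class BudgetExceeded(Exception):
    """Raised when a vassal tries to spend past its allocation."""


class VassalBudget:
    """Tracks spending against a budget_denars allocation (illustrative)."""

    def __init__(self, budget_denars: int) -> None:
        self.budget_denars = budget_denars
        self.spent = 0

    @property
    def remaining(self) -> int:
        return self.budget_denars - self.spent

    def spend(self, amount: int) -> None:
        """Record a spend, refusing anything that would exceed the allocation."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if amount > self.remaining:
            raise BudgetExceeded(
                f"requested {amount}, only {self.remaining} denars remaining"
            )
        self.spent += amount

    def reauthorize(self, extra_denars: int) -> None:
        """Called on the King's behalf to raise the allocation."""
        if extra_denars < 0:
            raise ValueError("extra_denars must be non-negative")
        self.budget_denars += extra_denars
```

A vassal holding 1,000 denars that has spent 800 is refused a 500-denar purchase until the King raises the allocation.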

5c. Non-Terminability

The King agent cannot be terminated by vassal or companion agents. Termination authority is reserved for:

- The human operator (Ctrl+C or timmy stop)
- A SHUTDOWN signal from the top-level orchestrator

Vassals can pause themselves (e.g., awaiting GABS state) but cannot signal the King to stop. This prevents a misbehaving military vassal from ending the campaign.

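Implementation: the King runs in the main asyncio event loop. Vassals and companions run as child tasks inside an asyncio.TaskGroup that only the King's task opens. asyncio.TaskGroup has no separate cancel-scope object (that term comes from Trio and AnyIO); a group is stopped by cancelling its child tasks or the task that owns its async with block, so only the King's task holds the handles needed to cancel the hierarchy.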

Implementation Path

This design connects directly to the existing Timmy codebase:

| Component | Maps to | Notes |
| --- | --- | --- |
| King LLM calls | `infrastructure/llm_router/` | Cascade router for model selection |
| Subgoal Queue | `infrastructure/event_bus/` | Existing pub/sub pattern |
| Companion primitives | New `src/bannerlord/agents/` package | One module per companion |
| GABS state updates | `src/bannerlord/gabs_client.py` | TCP JSON-RPC, port 4825 |
| Asset ledger | `src/bannerlord/ledger.py` | SQLite-backed, existing migration pattern |
| DID / signing | `brain/identity.py` | Extends existing SOUL.md |

The next concrete step is implementing the GABS TCP client and the KingSubgoal schema — everything else in this document depends on readable game state first.

References

Ahilan, S. & Dayan, P. (2019). Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning. https://arxiv.org/abs/1901.08492
Rood, S. (2022). Scaling Reinforcement Learning through Feudal Hierarchy (NPS thesis).
Wang, G. et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. https://arxiv.org/abs/2305.16291
Park, J.S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. https://arxiv.org/abs/2304.03442
Silveira, T. (2022). CiF-Bannerlord: Social AI Integration in Bannerlord.
