feat: add custom weights, model registry, per-agent models, and reward scoring

Inspired by OpenClaw-RL's multi-model orchestration, this adds four features for custom model management:

1. Custom model registry (infrastructure/models/registry.py): a SQLite-backed registry for GGUF, safetensors, HF checkpoint, and Ollama models, with role-based lookups (general, reward, teacher, judge).
2. Per-agent model assignment: each swarm persona can use its own model instead of sharing the global default. The model is resolved in priority order: registry assignment, then persona default, then global default.
3. Runtime model management API (/api/v1/models): REST endpoints to register, list, assign, enable/disable, and remove custom models without a restart. Includes a dashboard page at /models.
4. Reward model scoring (PRM-style): majority-vote quality evaluation of agent outputs using a configurable reward model. Scores persist in SQLite and feed into the swarm learner.

New config settings: custom_weights_dir, reward_model_enabled, reward_model_name, reward_model_votes.

54 new tests cover registry CRUD, API endpoints, agent assignments, role lookups, and reward scoring.

https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC
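The registry (item 1) and the per-agent resolution order (item 2) can be sketched with the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not the code in infrastructure/models/registry.py: the table layout, the ModelRegistry class, and its register/set_enabled/assign/get_by_role/resolve_model methods are hypothetical names chosen for the example.

```python
import sqlite3
from typing import Optional

# Hypothetical schema for illustration; the real registry.py may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    name    TEXT PRIMARY KEY,
    format  TEXT NOT NULL,              -- gguf | safetensors | hf_checkpoint | ollama
    path    TEXT NOT NULL,
    role    TEXT NOT NULL DEFAULT 'general',
    enabled INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS agent_assignments (
    agent_id   TEXT PRIMARY KEY,
    model_name TEXT NOT NULL REFERENCES models(name)
);
"""

VALID_ROLES = {"general", "reward", "teacher", "judge"}


class ModelRegistry:
    """Minimal SQLite-backed model registry sketch (hypothetical API)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.executescript(SCHEMA)

    def register(self, name: str, fmt: str, path: str, role: str = "general") -> None:
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role: {role}")
        with self.conn:  # the connection context manager commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO models (name, format, path, role) VALUES (?, ?, ?, ?)",
                (name, fmt, path, role),
            )

    def set_enabled(self, name: str, enabled: bool) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE models SET enabled = ? WHERE name = ?", (int(enabled), name)
            )

    def assign(self, agent_id: str, model_name: str) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO agent_assignments (agent_id, model_name) VALUES (?, ?)",
                (agent_id, model_name),
            )

    def get_by_role(self, role: str) -> list[str]:
        # Role-based lookup: only enabled models are returned.
        rows = self.conn.execute(
            "SELECT name FROM models WHERE role = ? AND enabled = 1 ORDER BY name",
            (role,),
        )
        return [row[0] for row in rows]

    def resolve_model(
        self, agent_id: str, persona_default: Optional[str], global_default: str
    ) -> str:
        # Priority: registry assignment > persona default > global default.
        # An assignment pointing at a disabled model is skipped.
        row = self.conn.execute(
            "SELECT a.model_name FROM agent_assignments a "
            "JOIN models m ON m.name = a.model_name "
            "WHERE a.agent_id = ? AND m.enabled = 1",
            (agent_id,),
        ).fetchone()
        if row is not None:
            return row[0]
        return persona_default or global_default
```

For example, after `reg.assign("persona-forge", "my-finetuned-llama")`, calling `reg.resolve_model("persona-forge", None, "global-default")` returns the assigned model, while an unassigned agent falls through to its persona default or the global default. Skipping disabled models during resolution is a design assumption of this sketch, chosen so that disabling a model via the API takes effect without deleting its assignments.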
2026-02-27 01:08:03 +00:00
parent e4d5ec5ed4
commit 211c54bc8c
13 changed files with 1603 additions and 1 deletion
--- a/config/providers.yaml
+++ b/config/providers.yaml
@@ -68,6 +68,37 @@ providers:
      - name: claude-3-sonnet-20240229
        context_window: 200000

+# ── Custom Models ──────────────────────────────────────────────────────
+# Register custom model weights for per-agent assignment.
+# Supports GGUF (Ollama), safetensors, and HuggingFace checkpoint dirs.
+# Models can also be registered at runtime via the /api/v1/models API.
+#
+# Roles: general (default inference), reward (PRM scoring),
+#        teacher (distillation), judge (output evaluation)
+custom_models: []
+  # Example entries:
+  # - name: my-finetuned-llama
+  #   format: gguf
+  #   path: /path/to/model.gguf
+  #   role: general
+  #   context_window: 8192
+  #   description: "Fine-tuned Llama for code tasks"
+  #
+  # - name: reward-model
+  #   format: ollama
+  #   path: deepseek-r1:1.5b
+  #   role: reward
+  #   context_window: 32000
+  #   description: "Process reward model for scoring outputs"
+
+# ── Agent Model Assignments ─────────────────────────────────────────────
+# Map persona agent IDs to specific models.
+# Agents without an assignment fall back to the persona default, then the global default (ollama_model).
+agent_model_assignments: {}
+  # Example:
+  # persona-forge: my-finetuned-llama
+  # persona-echo: deepseek-r1:1.5b
+
 # Cost tracking (optional, for budget monitoring)
 cost_tracking:
  enabled: true
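The `reward` role in the config above, together with the reward_model_votes setting from the commit message, implies a majority-vote scoring loop. The sketch below assumes the reward model is asked the same question several times and replies with a GOOD/BAD label; the `judge` callable, `score_output`, `RewardScore`, and the default of 3 votes are all hypothetical, and persistence to SQLite is omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical: one call to the reward model (e.g. the "reward-model" entry
# above) returning a label such as "GOOD" or "BAD" for (task, output).
Judge = Callable[[str, str], str]


@dataclass
class RewardScore:
    score: float   # fraction of GOOD votes, in [0, 1]
    positive: int  # number of GOOD votes
    total: int     # number of votes cast
    verdict: str   # strict-majority label


def score_output(task: str, output: str, judge: Judge, votes: int = 3) -> RewardScore:
    """Query the reward model `votes` times and take the majority verdict."""
    if votes < 1:
        raise ValueError("votes must be >= 1")
    labels = []
    for _ in range(votes):
        label = judge(task, output).strip().upper()
        # Treat any unparseable reply as a BAD vote.
        labels.append(label if label in ("GOOD", "BAD") else "BAD")
    positive = Counter(labels)["GOOD"]
    # Strict majority: a tie (possible with an even vote count) counts as BAD.
    verdict = "GOOD" if positive * 2 > votes else "BAD"
    return RewardScore(score=positive / votes, positive=positive, total=votes, verdict=verdict)
```

Injecting the judge as a callable keeps the voting logic testable without a running model server; with replies "GOOD", "bad", "Good" and three votes, the result has two positive votes and a GOOD verdict. Using an odd vote count avoids ties altogether.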