feat: add custom weights, model registry, per-agent models, and reward scoring
Inspired by OpenClaw-RL's multi-model orchestration, this adds four features for custom model management:

1. Custom model registry (infrastructure/models/registry.py) — SQLite-backed registry for GGUF, safetensors, HF checkpoint, and Ollama models with role-based lookups (general, reward, teacher, judge).
2. Per-agent model assignment — each swarm persona can use a different model instead of sharing the global default. Resolution order: registry assignment > persona default > global default.
3. Runtime model management API (/api/v1/models) — REST endpoints to register, list, assign, enable/disable, and remove custom models without a restart. Includes a dashboard page at /models.
4. Reward model scoring (PRM-style) — majority-vote quality evaluation of agent outputs using a configurable reward model. Scores persist in SQLite and feed into the swarm learner.

New config settings: custom_weights_dir, reward_model_enabled, reward_model_name, reward_model_votes.

54 new tests covering registry CRUD, API endpoints, agent assignments, role lookups, and reward scoring.

https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC
@@ -68,6 +68,37 @@ providers:
     - name: claude-3-sonnet-20240229
       context_window: 200000
 
+# ── Custom Models ──────────────────────────────────────────────────────
+# Register custom model weights for per-agent assignment.
+# Supports GGUF (Ollama), safetensors, and HuggingFace checkpoint dirs.
+# Models can also be registered at runtime via the /api/v1/models API.
+#
+# Roles: general (default inference), reward (PRM scoring),
+#        teacher (distillation), judge (output evaluation)
+custom_models: []
+# Example entries:
+#   - name: my-finetuned-llama
+#     format: gguf
+#     path: /path/to/model.gguf
+#     role: general
+#     context_window: 8192
+#     description: "Fine-tuned Llama for code tasks"
+#
+#   - name: reward-model
+#     format: ollama
+#     path: deepseek-r1:1.5b
+#     role: reward
+#     context_window: 32000
+#     description: "Process reward model for scoring outputs"
+
+# ── Agent Model Assignments ─────────────────────────────────────────────
+# Map persona agent IDs to specific models.
+# Agents without an assignment use the global default (ollama_model).
+agent_model_assignments: {}
+# Example:
+#   persona-forge: my-finetuned-llama
+#   persona-echo: deepseek-r1:1.5b
+
 # Cost tracking (optional, for budget monitoring)
 cost_tracking:
   enabled: true
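The PRM-style scoring in item 4 boils down to sampling the reward model several times (the `reward_model_votes` setting) and taking the majority verdict. Below is a hedged sketch of that voting step only; `majority_vote_score` and the injected `score_fn` are illustrative names, and a real implementation would call the configured reward model (`reward_model_name`) and persist the result to SQLite.

```python
from collections import Counter
from typing import Callable

def majority_vote_score(
    output: str,
    score_fn: Callable[[str], int],  # one reward-model call returning a discrete verdict
    votes: int = 3,                  # maps to the reward_model_votes setting
) -> int:
    """Score `output` `votes` times and return the majority verdict.

    Ties break toward the higher verdict; with an odd vote count
    (the usual choice) binary verdicts can never tie.
    """
    tally = Counter(score_fn(output) for _ in range(votes))
    return max(tally, key=lambda v: (tally[v], v))

# Usage with a fake scorer that returns canned verdicts 1, 0, 1:
fake_verdicts = iter([1, 0, 1])
print(majority_vote_score("some agent output", lambda _: next(fake_verdicts), votes=3))  # 1
```

Injecting `score_fn` keeps the voting logic testable without a live model, which matches how per-feature tests for reward scoring could be written.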