feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.
Key changes:
- New agent/models_dev.py: Fetches and caches the models.dev registry
(3800+ models across 100+ providers with per-provider context windows).
In-memory cache (1hr TTL) + disk cache for cold starts.
- Rewritten get_model_context_length() resolution chain:
0. Config override (model.context_length)
1. Custom providers per-model context_length
2. Persistent disk cache
3. Endpoint /models (local servers)
4. Anthropic /v1/models API (max_input_tokens, API-key only)
5. OpenRouter live API (existing, unchanged)
6. Nous suffix-match via OpenRouter (dot/dash normalization)
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. 128K fallback (was 2M)
- Provider-aware context: same model now correctly resolves to different
context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
128K on GitHub Copilot). Provider name flows through ContextCompressor.
- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
models.dev replaces the per-model hardcoding.
- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.
- hermes model: prompts for context_length when configuring custom
endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
per-model config.
- custom_providers schema extended with optional models dict for
per-model context_length (backward compatible).
- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
normalization. Handles all 15 current Nous models.
- Anthropic direct: queries /v1/models for max_input_tokens. Only works
with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
to models.dev for OAuth users.
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md
Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
"""Models.dev registry integration for provider-aware context length detection.

Fetches model metadata from https://models.dev/api.json — a community-maintained
database of 3800+ models across 100+ providers, including per-provider context
windows, pricing, and capabilities.

Data is cached in memory (1hr TTL) and on disk (~/.hermes/models_dev_cache.json)
to avoid cold-start network latency.
"""

import json
import logging
import os
import time
from pathlib import Path
from typing import Any, Dict, Optional

import requests

from utils import atomic_json_write

logger = logging.getLogger(__name__)

MODELS_DEV_URL = "https://models.dev/api.json"
_MODELS_DEV_CACHE_TTL = 3600  # 1 hour in-memory

# In-memory cache
_models_dev_cache: Dict[str, Any] = {}
_models_dev_cache_time: float = 0

# Provider ID mapping: Hermes provider names → models.dev provider IDs
PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
    "openrouter": "openrouter",
    "anthropic": "anthropic",
    "zai": "zai",
    "kimi-coding": "kimi-for-coding",
    "minimax": "minimax",
    "minimax-cn": "minimax-cn",
    "deepseek": "deepseek",
    "alibaba": "alibaba",
    "copilot": "github-copilot",
    "ai-gateway": "vercel",
    "opencode-zen": "opencode",
    "opencode-go": "opencode-go",
    "kilocode": "kilo",
}


def _get_cache_path() -> Path:
    """Return path to disk cache file."""
    env_val = os.environ.get("HERMES_HOME", "")
    hermes_home = Path(env_val) if env_val else Path.home() / ".hermes"
    return hermes_home / "models_dev_cache.json"


def _load_disk_cache() -> Dict[str, Any]:
    """Load models.dev data from disk cache."""
    try:
        cache_path = _get_cache_path()
        if cache_path.exists():
            with open(cache_path, encoding="utf-8") as f:
                return json.load(f)
    except Exception as e:
        logger.debug("Failed to load models.dev disk cache: %s", e)
    return {}


def _save_disk_cache(data: Dict[str, Any]) -> None:
    """Save models.dev data to disk cache atomically."""
    try:
        cache_path = _get_cache_path()
        atomic_json_write(cache_path, data, indent=None, separators=(",", ":"))
    except Exception as e:
        logger.debug("Failed to save models.dev disk cache: %s", e)


def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
    """Fetch models.dev registry. In-memory cache (1hr) + disk fallback.

    Returns the full registry dict keyed by provider ID, or empty dict on failure.
    """
    global _models_dev_cache, _models_dev_cache_time

    # Check in-memory cache
    if (
        not force_refresh
        and _models_dev_cache
        and (time.time() - _models_dev_cache_time) < _MODELS_DEV_CACHE_TTL
    ):
        return _models_dev_cache

    # Try network fetch
    try:
        response = requests.get(MODELS_DEV_URL, timeout=15)
        response.raise_for_status()
        data = response.json()
        if isinstance(data, dict) and len(data) > 0:
            _models_dev_cache = data
            _models_dev_cache_time = time.time()
            _save_disk_cache(data)
            logger.debug(
                "Fetched models.dev registry: %d providers, %d total models",
                len(data),
                sum(len(p.get("models", {})) for p in data.values() if isinstance(p, dict)),
            )
            return data
    except Exception as e:
        logger.debug("Failed to fetch models.dev: %s", e)

    # Fall back to disk cache — use a short TTL (5 min) so we retry
    # the network fetch soon instead of serving stale data for a full hour.
    if not _models_dev_cache:
        _models_dev_cache = _load_disk_cache()
        if _models_dev_cache:
            _models_dev_cache_time = time.time() - _MODELS_DEV_CACHE_TTL + 300
            logger.debug("Loaded models.dev from disk cache (%d providers)", len(_models_dev_cache))

    return _models_dev_cache


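# The registry dict returned by fetch_models_dev() has this nested shape
# (illustrative sketch: field names match what _extract_context reads below,
# values are made up, and real entries carry additional fields such as pricing):
#
#   {
#       "anthropic": {
#           "models": {
#               "claude-example": {"limit": {"context": 200000, "output": 8192}},
#           },
#       },
#   }

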
def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
    """Look up context_length for a provider+model combo in models.dev.

    Returns the context window in tokens, or None if not found.
    Handles case-insensitive matching and filters out context=0 entries.
    """
    mdev_provider_id = PROVIDER_TO_MODELS_DEV.get(provider)
    if not mdev_provider_id:
        return None

    data = fetch_models_dev()
    provider_data = data.get(mdev_provider_id)
    if not isinstance(provider_data, dict):
        return None

    models = provider_data.get("models", {})
    if not isinstance(models, dict):
        return None

    # Exact match
    entry = models.get(model)
    if entry:
        ctx = _extract_context(entry)
        if ctx:
            return ctx

    # Case-insensitive match
    model_lower = model.lower()
    for mid, mdata in models.items():
        if mid.lower() == model_lower:
            ctx = _extract_context(mdata)
            if ctx:
                return ctx

    return None


def _extract_context(entry: Dict[str, Any]) -> Optional[int]:
    """Extract context_length from a models.dev model entry.

    Returns None for invalid/zero values (some audio/image models have context=0).
    """
    if not isinstance(entry, dict):
        return None
    limit = entry.get("limit")
    if not isinstance(limit, dict):
        return None
    ctx = limit.get("context")
    if isinstance(ctx, (int, float)) and ctx > 0:
        return int(ctx)
    return None
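

# Minimal usage sketch, illustrative only: the sample entries and values below
# are made up, and _demo_extract mirrors _extract_context's limit.context walk
# so the sketch can run without network access or the real registry.
def _demo_extract(entry):
    # Same walk as _extract_context: entry -> "limit" dict -> positive "context".
    limit = entry.get("limit") if isinstance(entry, dict) else None
    ctx = limit.get("context") if isinstance(limit, dict) else None
    return int(ctx) if isinstance(ctx, (int, float)) and ctx > 0 else None


if __name__ == "__main__":
    sample_models = {
        "claude-example": {"limit": {"context": 200000, "output": 8192}},
        "voice-example": {"limit": {"context": 0}},  # context=0 is treated as unknown
    }
    assert _demo_extract(sample_models["claude-example"]) == 200000
    assert _demo_extract(sample_models["voice-example"]) is None
    print("models_dev sample checks passed")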