hermes-agent

Files

Peppi Littera ec5fdb8b92 feat: query local servers for actual context window size

Custom endpoints (LM Studio, Ollama, vLLM, llama.cpp) silently fall
back to 2M tokens when /v1/models doesn't include context_length.

Adds _query_local_context_length() which queries server-specific APIs:
- LM Studio: /api/v1/models (max_context_length + loaded instances)
- Ollama: /api/show (model_info + num_ctx parameters)
- llama.cpp: /props (n_ctx from default_generation_settings)
- vLLM: /v1/models/{model} (max_model_len)

Prefers the loaded instance's context over the model maximum (e.g., 122K loaded vs 1M max).
Results are cached via save_context_length() to avoid repeated queries.
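
A minimal sketch of how the per-server extraction described above might look, operating on already-decoded JSON so that no network access is needed. This is not the code from the commit. The helper names are hypothetical, and only the field names max_context_length, model_info, num_ctx, n_ctx, default_generation_settings and max_model_len come from the message. The nesting of LM Studio's loaded instances, the ".context_length" key suffix in Ollama's model_info, and the choice to prefer num_ctx over model_info are assumptions.

```python
from typing import Any, Optional


def _as_positive_int(value: Any) -> Optional[int]:
    """Return value as a positive int, or None if it cannot be one."""
    if isinstance(value, bool):
        return None
    try:
        number = int(value)
    except (TypeError, ValueError):
        return None
    return number if number > 0 else None


def context_from_lmstudio(model_entry: dict) -> Optional[int]:
    # Assumed layout: "loaded_instances" is a list of objects carrying
    # {"config": {"context_length": int}}. A loaded instance wins over the max.
    for instance in model_entry.get("loaded_instances") or []:
        config = instance.get("config") or {}
        loaded = _as_positive_int(config.get("context_length"))
        if loaded is not None:
            return loaded
    return _as_positive_int(model_entry.get("max_context_length"))


def context_from_ollama(show_payload: dict) -> Optional[int]:
    # Assumed: "parameters" is newline-separated text such as "num_ctx 8192".
    # Preferring it over model_info is this sketch's choice.
    for line in (show_payload.get("parameters") or "").splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "num_ctx":
            configured = _as_positive_int(parts[1])
            if configured is not None:
                return configured
    # Assumed: model_info keys are architecture-prefixed, e.g. "llama.context_length".
    for key, value in (show_payload.get("model_info") or {}).items():
        if key.endswith(".context_length"):
            trained = _as_positive_int(value)
            if trained is not None:
                return trained
    return None


def context_from_llamacpp(props_payload: dict) -> Optional[int]:
    settings = props_payload.get("default_generation_settings") or {}
    return _as_positive_int(settings.get("n_ctx"))


def context_from_vllm(model_payload: dict) -> Optional[int]:
    return _as_positive_int(model_payload.get("max_model_len"))
```

Each helper returns None when the field is missing or malformed, so a caller can fall through to its existing default rather than trusting a bad value.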

Also fixes detect_local_server_type() misidentifying LM Studio as
Ollama (LM Studio returns 200 for /api/tags with an error body).
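
The misidentification above suggests that a 200 status alone is not enough to recognise Ollama. A hedged sketch of a stricter check follows. The function name is hypothetical, the exact shape of LM Studio's error body is not given in the message, and the test for a top-level "models" list reflects Ollama's /api/tags response rather than the commit's actual logic.

```python
import json


def looks_like_ollama_tags(status_code: int, body: str) -> bool:
    """Return True only if an /api/tags response really looks like Ollama's."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, so not an Ollama tag listing.
        return False
    # Assumed: an error body is a JSON object with an "error" key,
    # or at least one that lacks a "models" list.
    if not isinstance(payload, dict) or "error" in payload:
        return False
    return isinstance(payload.get("models"), list)
```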

2026-03-19 21:32:04 +01:00

__init__.py

Refactor Terminal and AIAgent cleanup

2026-02-21 22:31:43 -08:00

anthropic_adapter.py

fix(anthropic): tool_choice 'none' still allowed tool calls

2026-03-17 04:02:49 -07:00

auxiliary_client.py

fix: respect config.yaml model.base_url for Anthropic provider (#1948) (#1998)

2026-03-18 16:51:24 -07:00

context_compressor.py

fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051)