Commit Graph

466 Commits

Author SHA1 Message Date
Alexander Whitestone
4214082fb6 feat: A2A auth — mutual TLS between fleet agents
All checks were successful
Lint / lint (pull_request) Successful in 8s
Implements mTLS for securing agent-to-agent communication in the Hermes
fleet. Fixes #806.

Changes:
- scripts/gen_fleet_ca.sh: generate a self-signed Fleet CA (4096-bit RSA,
  10-year validity) that signs all agent certificates
- scripts/gen_agent_cert.sh: generate per-agent certs (Timmy, Allegro,
  Ezra) signed by the fleet CA with SAN entries and clientAuth/serverAuth
  extended key usage
- agent/mtls.py: new module providing:
  - build_server_ssl_context() — TLS_SERVER context with CERT_REQUIRED,
    enforces client cert against Fleet CA
  - build_client_ssl_context() — TLS_CLIENT context for outbound A2A calls
  - MTLSMiddleware — ASGI middleware that rejects unauthenticated requests
    to A2A routes (/.well-known/agent-card*, /api/agent-card, /a2a/) with
    HTTP 403 when mTLS is enabled
  - is_mtls_configured() — checks HERMES_MTLS_CERT/KEY/CA env vars
- hermes_cli/web_server.py: wire MTLSMiddleware into the FastAPI app;
  pass SSL context to uvicorn when HERMES_MTLS_* env vars are set so
  the server runs TLS with mandatory client cert verification
- ansible/roles/hermes_mtls/: Ansible role to distribute Fleet CA cert,
  agent cert, and agent key to fleet nodes; writes an env file with
  HERMES_MTLS_* vars and restarts the hermes-gateway service
- ansible/fleet_mtls.yml: fleet-wide playbook referencing the role for
  Timmy, Allegro, and Ezra nodes
- tests/test_mtls.py: 15 tests covering is_mtls_configured, SSL context
  creation with real cryptography-generated certs, and MTLSMiddleware
  (unauthorized agent rejected → 403, authorized agent accepted → 200)

mTLS is opt-in: set HERMES_MTLS_CERT, HERMES_MTLS_KEY, and HERMES_MTLS_CA
to enable. When unset, the server behaves exactly as before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:04:00 -04:00
Alexander Whitestone
ac28444bf2 feat: add A2AMTLSServer routing API, A2AMTLSClient, and expand tests to 20 (#806)
All checks were successful
Lint / lint (pull_request) Successful in 9s
Builds on the existing A2AServer / build_*_ssl_context foundation:

- agent/a2a_mtls.py:
  - Add A2AMTLSServer: routing-based HTTPS server with add_route() and
    context-manager (__enter__/__exit__) lifecycle support
  - Add A2AMTLSClient: fleet-cert-presenting HTTP client with .get() / .post()
  - Widen imports (json, Callable, Dict, urlopen)

- tests/agent/test_a2a_mtls.py:
  - Fix datetime.utcnow() deprecation — use datetime.now(timezone.utc)
  - Add TestA2AMTLSServerAndClient (9 tests): routing GET/POST, 404,
    context-manager stop, rogue-cert rejection, A2AMTLSClient, concurrency
  - Total: 11 → 20 passing tests

Refs #806
2026-04-21 15:21:10 -04:00
Alexander Whitestone
91faf6f956 feat: A2A auth — mutual TLS between fleet agents
All checks were successful
Lint / lint (pull_request) Successful in 10s
Implements mutual TLS for secure agent-to-agent communication (#806).

- scripts/gen_fleet_ca.sh: generate fleet CA (4096-bit RSA, 10-year)
- scripts/gen_agent_cert.sh: per-agent cert signed by fleet CA (timmy, allegro, ezra)
- agent/a2a_mtls.py: A2AServer requiring client cert verification (CERT_REQUIRED),
  build_server_ssl_context / build_client_ssl_context helpers, server_from_env()
- ansible/roles/fleet_mtls_certs/: distribute CA + per-agent certs to fleet nodes,
  write /etc/hermes/a2a.env, notify hermes-a2a service on change
- ansible/fleet_mtls.yml + ansible/inventory/fleet.ini.example: playbook + example inventory
- tests/agent/test_a2a_mtls.py: 11 tests — authorized agent accepted (200/202),
  self-signed cert rejected, no-cert rejected, lifecycle, env-var wiring

Fixes #806

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 13:28:28 -04:00
46668505bc Merge pull request 'feat: tool fixation detection — break repetitive loops (#886)' (#914) from fix/886 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:35:08 +00:00
cac0c8224e Merge pull request 'fix: circuit breaker for error cascading (2.33x amplification)' (#927) from fix/885-circuit-breaker into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:35:04 +00:00
690d100afc Merge pull request 'feat: Poka-yoke token budget — progressive context overflow guard (#925)' (#943) from burn/925-1776770102 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been skipped
Nix / nix (ubuntu-latest) (push) Failing after 5s
Tests / e2e (push) Successful in 5m8s
Tests / test (push) Failing after 30m13s
Nix / nix (macos-latest) (push) Has been cancelled
2026-04-21 15:29:02 +00:00
bc55f40505 Merge pull request 'feat: time-aware model routing for cron jobs (#889)' (#909) from fix/889 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:43 +00:00
2adc72335e Merge pull request 'fix: profile session isolation — tag and filter by profile' (#907) from fix/891-profile-isolation into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:39 +00:00
719bb537c0 Merge pull request 'feat: provider preflight validation before session start (#924)' (#932) from fix/924 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:02 +00:00
27d2f2ca0e Merge pull request 'feat: Prevent context window overflow via proactive token counting (#838)' (#905) from fix/838-1776402240 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / e2e (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-04-21 15:22:31 +00:00
8ac26f54a5 feat: token budget with progressive poka-yoke thresholds (#925) 2026-04-21 11:40:39 +00:00
bdd0f2709b feat: provider preflight validation before session start (#924)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 47s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 52s
Tests / test (pull_request) Failing after 30m48s
Tests / e2e (pull_request) Successful in 2m9s
2026-04-21 04:48:57 +00:00
ccaa1cb021 feat: circuit breaker for error cascading
Closes #885

2.33x error cascade factor detected. After 3 consecutive errors,
circuit opens and agent must take corrective action.

Recovery pattern: terminal is the safety net (2300 recoveries).
2026-04-21 00:28:14 +00:00
Alexander Whitestone
ab968e910c feat: tool fixation detection — break repetitive loops (#886)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 37s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 43s
Tests / e2e (pull_request) Successful in 1m57s
Tests / test (pull_request) Failing after 18m57s
Marathon sessions show tool fixation: agent latches onto one tool
and calls it repeatedly. Observed streaks of 8-25 identical calls.

New agent/tool_fixation_detector.py:
- ToolFixationDetector: tracks consecutive tool calls
- record(tool_name): returns nudge prompt when threshold reached
- Default threshold: 5 consecutive calls (configurable via
  TOOL_FIXATION_THRESHOLD env var)
- Nudge prompt explains the fixation and suggests alternatives:
  1. Read error carefully
  2. Try different tool
  3. Ask user for clarification
  4. Check if task is complete
- get_streak_info(): current streak state
- format_report(): human-readable fixation events
- Singleton via get_fixation_detector()

Config:
- TOOL_FIXATION_THRESHOLD (default: 5)
- TOOL_FIXATION_WINDOW (default: 10)

Tests: tests/test_tool_fixation_detector.py (9 tests)

Closes #886
2026-04-17 01:57:37 -04:00
Alexander Whitestone
0b72884750 feat: time-aware model routing for cron jobs (#889)
Some checks failed
Tests / test (pull_request) Failing after 25m4s
Tests / e2e (pull_request) Successful in 3m19s
Contributor Attribution Check / check-attribution (pull_request) Failing after 14s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 14s
Error rate peaks at 18:00 (9.4%) during evening cron batches vs 4.0%
at 09:00 during interactive work. Route cron tasks to stronger models
during off-hours when user is not present to correct errors.

New agent/time_aware_routing.py:
- resolve_time_aware_model(): routes based on hour, error rate, task type
- Interactive sessions: always use base model (user corrects errors)
- Cron during business hours: use base model (low error rate)
- Cron during off-hours with high error rate (>6%): upgrade to strong model
- get_hour_error_rate(): error rates by hour from empirical audit
- is_off_hours(): 18:00-05:59 = off-hours
- RoutingDecision: model, provider, reason, hour, error_rate
- get_routing_report(): 24h forecast of routing decisions

Config via env vars:
- CRON_STRONG_MODEL (default: xiaomi/mimo-v2-pro)
- CRON_CHEAP_MODEL (default: qwen2.5:7b)
- CRON_ERROR_THRESHOLD (default: 6.0%)

Tests: tests/test_time_aware_routing.py (9 tests)

Closes #889
2026-04-17 01:15:09 -04:00
b5ba272efe feat: profile session isolation
Closes #891

Tags sessions with originating profile and provides filtered
access so profiles cannot see each other's data.
2026-04-17 05:13:01 +00:00
e3436e36c3 feat: Add context budget tracker for overflow prevention (#838) 2026-04-17 05:06:08 +00:00
dbabe0e6ae feat: 988 Suicide & Crisis Lifeline integration (#673)
agent/crisis_resources.py provides all 988 Lifeline contact
methods: phone (988), text (HOME to 988), chat, Spanish line.
Also Crisis Text Line (741741) and 911.

Closes #673
2026-04-17 05:04:48 +00:00
Hermes Merge Bot
dff451081d Merge PR #856 2026-04-16 02:05:42 -04:00
Hermes Merge Bot
05086e58ea Merge PR #871 2026-04-16 02:00:55 -04:00
Alexander Whitestone
e63cdaf16f feat: self-modifying agent that improves its own prompts (#813)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been cancelled
Contributor Attribution Check / check-attribution (pull_request) Has been cancelled
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Tests / e2e (pull_request) Has been cancelled
Resolves #813. Agent analyzes session transcripts for failure
patterns and generates prompt patches to prevent future failures.

agent/self_modify.py (PromptLearner class):
- analyze_session(): detects 5 failure types from transcripts:
  retry_loop, timeout, hallucination, context_loss, tool_failure
- generate_patches(): converts patterns to prompt patches with
  confidence scoring (frequency-based)
- apply_patches(): appends learned rules to system prompt with
  backup and rollback support
- learn_from_session(): full cycle analyze → patch → apply

Failures → patterns → patches → improved prompts → fewer failures.

Safety: patches only ADD rules (append-only), never remove.
Rollback:  restores from timestamped backup.
2026-04-16 01:23:48 -04:00
a474eb8459 fix: add agent/agent_card.py for agent card discovery 2026-04-16 03:45:01 +00:00
Alexander Whitestone
13ef670c05 feat: session compaction with fact extraction (#748)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 29s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 33s
Tests / e2e (pull_request) Successful in 3m26s
Tests / test (pull_request) Failing after 1h28m50s
Before compressing conversation context, extract durable facts
(user preferences, corrections, project details) and save to
fact store so they survive compression.

New agent/session_compactor.py:
- extract_facts_from_messages(): scans user messages for
  preferences, corrections, project/infra facts using regex
- 3 pattern categories: user_pref (5 patterns), correction
  (3 patterns), project (4 patterns)
- ExtractedFact: category, entity, content, confidence, source_turn
- save_facts_to_store(): saves to fact store (callback or auto-detect)
- extract_and_save_facts(): one-call extraction + persistence
- Deduplication by category+content
- Skips tool results, short messages, system messages
- format_facts_summary(): human-readable summary

Tests: tests/test_session_compactor.py (9 tests)

Closes #748
2026-04-15 22:41:54 -04:00
435d790201 feat: add agent/privacy_filter.py from PR #397 2026-04-16 01:35:14 +00:00
dfe23f66b1 feat: add ToolOrchestrator with circuit breaker 2026-04-15 15:49:00 +00:00
Teknium
722331a57d fix: replace hardcoded ~/.hermes with display_hermes_home() in agent-facing text (#10285)
Tool schema descriptions and tool return values contained hardcoded
~/.hermes paths that the model sees and uses. When HERMES_HOME is set
to a custom path (Docker containers, profiles), the agent would still
reference ~/.hermes — looking at the wrong directory.

Fixes 6 locations across 5 files:
- tools/tts_tool.py: output_path schema description
- tools/cronjob_tools.py: script path schema description
- tools/skill_manager_tool.py: skill_manage schema description
- tools/skills_tool.py: two tool return messages
- agent/skill_commands.py: skill config injection text

All now use display_hermes_home() which resolves to the actual
HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X
for profiles, ~/.hermes for default).

Reported by: Sandeep Narahari (PrithviDevs)
2026-04-15 04:57:55 -07:00
Teknium
772cfb6c4e fix: stale agent timeout, uv venv detection, empty response after tools, compression model fallback (#9051, #8620, #9400) (#10093)
Four independent fixes:

1. Reset activity timestamp on cached agent reuse (#9051)
   When the gateway reuses a cached AIAgent for a new turn, the
   _last_activity_ts from the previous turn (possibly hours ago)
   carried over. The inactivity timeout handler immediately saw
   the agent as idle for hours and killed it.

   Fix: reset _last_activity_ts, _last_activity_desc, and
   _api_call_count when retrieving an agent from the cache.

2. Detect uv-managed virtual environments (#8620 sub-issue 1)
   The systemd unit generator fell back to sys.executable (uv's
   standalone Python) when running under 'uv run', because
   sys.prefix == sys.base_prefix. The generated ExecStart pointed
   to a Python binary without site-packages.

   Fix: check VIRTUAL_ENV env var before falling back to
   sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
   doesn't reflect the venv.

3. Nudge model to continue after empty post-tool response (#9400)
   Weaker models sometimes return empty after tool calls. The agent
   silently abandoned the remaining work.

   Fix: append assistant('(empty)') + user nudge message and retry
   once. Resets after each successful tool round.

4. Compression model fallback on permanent errors (#8620 sub-issue 4)
   When the default summary model (gemini-3-flash) returns 503
   'model_not_found' on custom proxies, the compressor entered a
   600s cooldown, leaving context growing unbounded.

   Fix: detect permanent model-not-found errors (503, 404,
   'model_not_found', 'no available channel') and fall back to
   using the main model for compression instead of entering
   cooldown. One-time fallback with immediate retry.

Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass
2026-04-14 22:38:17 -07:00
kshitijk4poor
9855190f23 feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.

- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown

Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)

Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:25 -07:00
Julien Talbot
3b50821555 feat(xai): add xAI/Grok to provider prefix stripping
Add 'xai', 'x-ai', 'x.ai', 'grok' to _PROVIDER_PREFIXES so that
colon-prefixed model names (e.g. xai:grok-4.20) are stripped correctly
for context length lookups.

Cherry-picked from PR #9184 by @Julientalbot.
2026-04-14 16:43:42 -07:00
Teknium
6448e1da23 feat(zai): add GLM-5V-Turbo support for coding plan (#9907)
- Add glm-5v-turbo to OpenRouter, Nous, and native Z.AI model lists
- Add glm-5v context length entry (200K tokens) to model metadata
- Update Z.AI endpoint probe to try multiple candidate models per
  endpoint (glm-5.1, glm-5v-turbo, glm-4.7) — fixes detection for
  newer coding plan accounts that lack older models
- Add zai to _PROVIDER_VISION_MODELS so auxiliary vision tasks
  (vision_analyze, browser screenshots) route through 5v

Fixes #9888
2026-04-14 16:26:01 -07:00
Teknium
a37a095980 fix: detect qwen-oauth provider via CLI tokens in /model picker
Seed qwen-oauth credentials from resolve_qwen_runtime_credentials() in
_seed_from_singletons(). Users who authenticate via 'qwen auth qwen-oauth'
store tokens in ~/.qwen/oauth_creds.json which the runtime resolver reads
but the credential pool couldn't detect — same gap pattern as copilot.

Uses refresh_if_expiring=False to avoid network calls during discovery.
2026-04-14 11:16:26 -07:00
Marvae
0bd3f521ae fix: detect copilot provider via gh auth token in /model picker
Seed copilot credentials from resolve_copilot_token() in the credential
pool's _seed_from_singletons(), alongside the existing anthropic and
openai-codex seeding logic. This makes copilot appear in the /model
provider picker when the user authenticates solely through gh auth token.

Cherry-picked from PR #9767 by Marvae.
2026-04-14 11:16:26 -07:00
N0nb0at
b21b3bfd68 feat(plugins): namespaced skill registration for plugin skill bundles
Add ctx.register_skill() API so plugins can ship SKILL.md files under
a 'plugin:skill' namespace, preventing name collisions with built-in
Hermes skills. skill_view() detects the ':' separator and routes to
the plugin registry while bare names continue through the existing
flat-tree scan unchanged.

Key additions:
- agent/skill_utils: parse_qualified_name(), is_valid_namespace()
- hermes_cli/plugins: PluginContext.register_skill(), PluginManager
  skill registry (find/list/remove)
- tools/skills_tool: qualified name dispatch in skill_view(),
  _serve_plugin_skill() with full guards (disabled, platform,
  injection scan), bundle context banner with sibling listing,
  stale registry self-heal
- Hoisted _INJECTION_PATTERNS to module level (dedup)
- Updated skill_view schema description

Based on PR #9334 by N0nb0at. Lean P1 salvage — omits autogen shim
(P2) for a simpler first merge.

Closes #8422
2026-04-14 10:42:58 -07:00
walli
884cd920d4 feat(gateway): unify QQBot branding, add PLATFORM_HINTS, fix streaming, restore missing setup functions
- Rename platform from 'qq' to 'qqbot' across all integration points
  (Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py)
- Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown)
- Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ
  (prevents duplicate messages from non-editable partial + final sends)
- Add _send_qqbot() standalone send function for cron/send_message tool
- Add interactive _setup_qq() wizard in hermes_cli/setup.py
- Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback
  functions that were lost during the original merge
2026-04-14 00:11:49 -07:00
Kenny Xie
cdd44817f2 fix(anthropic): send fast mode speed via extra_body 2026-04-13 22:32:39 -07:00
Teknium
943c01536f feat: add openrouter/elephant-alpha to curated model lists (#9378)
* Add hermes debug share instructions to all issue templates

- bug_report.yml: Add required Debug Report section with hermes debug share
  and /debug instructions, make OS/Python/Hermes version optional (covered
  by debug report), demote old logs field to optional supplementary
- setup_help.yml: Replace hermes doctor reference with hermes debug share,
  add Debug Report section with fallback chain (debug share -> --local -> doctor)
- feature_request.yml: Add optional Debug Report section for environment context

All templates now guide users to run hermes debug share (or /debug in chat)
and paste the resulting paste.rs links, giving maintainers system info,
config, and recent logs in one step.

* feat: add openrouter/elephant-alpha to curated model lists

- Add to OPENROUTER_MODELS (free, positioned above GPT models)
- Add to _PROVIDER_MODELS["nous"] mirror list
- Add 256K context window fallback in model_metadata.py
2026-04-13 21:16:14 -07:00
Teknium
d15efc9c1b fix: correct GPT-5 family context lengths in fallback defaults (#9309)
The generic 'gpt-5' fallback was set to 128,000 — which is the max
OUTPUT tokens, not the context window. GPT-5 base and most variants
(codex, mini) have 400,000 context. This caused /model to report
128k for models like gpt-5.3-codex when models.dev was unavailable.

Added specific entries for GPT-5 variants with different context sizes:
- gpt-5.4, gpt-5.4-pro: 1,050,000 (1.05M)
- gpt-5.4-mini, gpt-5.4-nano: 400,000
- gpt-5.3-codex-spark: 128,000 (reduced)
- gpt-5.1-chat: 128,000 (chat variant)
- gpt-5 (catch-all): 400,000

Sources: https://developers.openai.com/api/docs/models
2026-04-13 19:22:23 -07:00
Teknium
f324222b79 fix: add vLLM/local server error patterns + MCP initial connection retry (#9281)
Port two improvements inspired by Kilo-Org/kilocode analysis:

1. Error classifier: add context overflow patterns for vLLM, Ollama,
   and llama.cpp/llama-server. These local inference servers return
   different error formats than cloud providers (e.g., 'exceeds the
   max_model_len', 'context length exceeded', 'slot context'). Without
   these patterns, context overflow errors from local servers are
   misclassified as format errors, causing infinite retries instead
   of triggering compression.

2. MCP initial connection retry: previously, if the very first
   connection attempt to an MCP server failed (e.g., transient DNS
   blip at startup), the server was permanently marked as failed with
   no retry. Post-connect reconnection had 5 retries with exponential
   backoff, but initial connection had zero. Now initial connections
   retry up to 3 times with backoff before giving up, matching the
   resilience of post-connect reconnection.
   (Inspired by Kilo Code's MCP server disappearing fix in v1.3.3)

Tests: 6 new error classifier tests, 4 new MCP retry tests, 1
updated existing test. All 276 affected tests pass.
2026-04-13 18:46:14 -07:00
arthurbr11
0a4cf5b3e1 feat(providers): add Arcee AI as direct API provider
Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with
Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini.

Standard OpenAI-compatible provider checklist: auth.py, config.py,
models.py, main.py, providers.py, doctor.py, model_normalize.py,
model_metadata.py, setup.py, trajectory_compressor.py.

Based on PR #9274 by arthurbr11, simplified to a standard direct
provider without dual-endpoint OpenRouter routing.
2026-04-13 18:40:06 -07:00
Teknium
8d023e43ed refactor: remove dead code — 1,784 lines across 77 files (#9180)
Deep scan with vulture, pyflakes, and manual cross-referencing identified:
- 41 dead functions/methods (zero callers in production)
- 7 production-dead functions (only test callers, tests deleted)
- 5 dead constants/variables
- ~35 unused imports across agent/, hermes_cli/, tools/, gateway/

Categories of dead code removed:
- Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection,
  rebuild_lookups, clear_session_context, get_logs_dir, clear_session
- Unused API surface: search_models_dev, get_pricing, skills_categories,
  get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list
- Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob
- Stale debug helpers: get_debug_session_info copies in 4 tool files
  (centralized version in debug_helpers.py already exists)
- Dead gateway methods: send_emote, send_notice (matrix), send_reaction
  (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history
  (matrix), _start_typing_indicator (signal), parse_feishu_post_content
- Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION,
  FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR
- Unused UI code: _interactive_provider_selection,
  _interactive_model_selection (superseded by prompt_toolkit picker)

Test suite verified: 609 tests covering affected files all pass.
Tests for removed functions deleted. Tests using removed utilities
(clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.
2026-04-13 16:32:04 -07:00
Teknium
b27eaaa4db fix: improve ACP type check and restore comment accuracy
- Use isinstance() with try/except import for CopilotACPClient check
  in _to_async_client instead of fragile __class__.__name__ string check
- Restore accurate comment: GPT-5.x models *require* (not 'often require')
  the Responses API on OpenAI/OpenRouter; ACP is the exception, not a
  softening of the requirement
- Add inline comment explaining the ACP exclusion rationale
2026-04-13 16:17:43 -07:00
helix4u
8680f61f8b fix(copilot-acp): keep acp runtime off responses path 2026-04-13 16:17:43 -07:00
Teknium
0e60a9dc25 fix: add kimi-coding-cn to remaining provider touchpoints
Follow-up for salvaged PR #7637. Adds kimi-coding-cn to:
- model_normalize.py (prefix strip)
- providers.py (models.dev mapping)
- runtime_provider.py (credential resolution)
- setup.py (model list + setup label)
- doctor.py (health check)
- trajectory_compressor.py (URL detection)
- models_dev.py (registry mapping)
- integrations/providers.md (docs)
2026-04-13 11:20:37 -07:00
hcshen0111
2b3aa36242 feat(providers): add kimi-coding-cn provider for mainland China users
Cherry-picked from PR #7637 by hcshen0111.
Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var
and api.moonshot.cn/v1 endpoint for China-region Moonshot users.
2026-04-13 11:20:37 -07:00
墨綠BG
c449cd1af5 fix(config): restore custom providers after v11→v12 migration
The v11→v12 migration converts custom_providers (list) into providers
(dict), then deletes the list. But all runtime resolvers read from
custom_providers — after migration, named custom endpoints silently stop
resolving and fallback chains fail with AuthError.

Add get_compatible_custom_providers() that reads from both config schemas
(legacy custom_providers list + v12+ providers dict), normalizes entries,
deduplicates, and returns a unified list. Update ALL consumers:

- hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env
- hermes_cli/auth_commands.py: credential pool provider names
- hermes_cli/main.py: model picker + _model_flow_named_custom()
- agent/auxiliary_client.py: key_env + custom_entry model fallback
- agent/credential_pool.py: _iter_custom_providers()
- cli.py + gateway/run.py: /model switch custom_providers passthrough
- run_agent.py + gateway/run.py: per-model context_length lookup

Also: use config.pop() instead of del for safer migration, fix stale
_config_version assertions in tests, add pool mock to codex test.

Co-authored-by: 墨綠BG <s5460703@gmail.com>
Closes #8776, salvaged from PR #8814
2026-04-13 10:50:52 -07:00
luyao618
8ec1608642 fix(agent): propagate api_mode to vision provider resolution
resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.

Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.

Fixes #8857

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 05:02:54 -07:00
Teknium
e3ffe5b75f fix: remove legacy compression.summary_* config and env var fallbacks (#8992)
Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.

What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
  DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
  from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
  functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
  auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths

Compression model/provider is now configured exclusively via:
  auxiliary.compression.provider / auxiliary.compression.model

Closes #8923
2026-04-13 04:59:26 -07:00
Richard Li
82901695ff feat(wecom): add platform hint for native media sending 2026-04-13 04:46:04 -07:00
ismell0992-afk
3e99964789 fix(agent): prefer Ollama Modelfile num_ctx over GGUF training max
_query_local_context_length was checking model_info.context_length
(the GGUF training max) before num_ctx (the Modelfile runtime override),
inverse to query_ollama_num_ctx. The two helpers therefore disagreed on
the same model:

  hermes-brain:qwen3-14b-ctx32k     # Modelfile: num_ctx 32768
  underlying qwen3:14b GGUF         # qwen3.context_length: 40960

query_ollama_num_ctx correctly returned 32768 (the value Ollama will
actually allocate KV cache for). _query_local_context_length returned
40960, which let ContextCompressor grow conversations past 32768 before
triggering compression — at which point Ollama silently truncated the
prefix, corrupting context.

Swap the order so num_ctx is checked first, matching query_ollama_num_ctx.
Adds a parametrized test that seeds both values and asserts num_ctx wins.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 04:24:07 -07:00
Teknium
400fe9b2a1 fix: add <thought> stripping to auxiliary_client + tests
auxiliary_client.py had its own regex mirroring _strip_think_blocks
but was missing the <thought> variant. Also adds test coverage for
<thought> paired and orphaned tags.
2026-04-12 12:44:49 -07:00