feat: Atlas Inference Engine provider integration (#674)

Atlas is a Rust+CUDA inference engine: 3x faster than vLLM, with a 2.5GB
image vs 20+GB. It exposes an OpenAI-compatible API at localhost:8888/v1.

New agent/atlas_provider.py (sketch after this list):
- AtlasProvider class with health_check(), list_models(),
  benchmark_inference(), get_provider_config()
- ATLAS_SUPPORTED_MODELS list (8 models as of alpha-2.8)
- get_atlas_config_hint() for config.yaml setup
- get_atlas_docker_command() for quick deployment
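
Rough sketch of the module surface (illustrative only: the import, the
model-list contents, and the method bodies are assumptions, not the
committed code; benchmark_inference() and the helpers are elided):

  import requests

  ATLAS_SUPPORTED_MODELS: list[str] = []  # 8 entries as of alpha-2.8, elided here

  class AtlasProvider:
      """Thin client for an Atlas server's OpenAI-compatible API."""

      def __init__(self, base_url: str = "http://localhost:8888/v1") -> None:
          self.base_url = base_url.rstrip("/")

      def health_check(self) -> bool:
          # Assumption: a 200 from /models means the engine is up.
          try:
              return requests.get(f"{self.base_url}/models", timeout=5).ok
          except requests.RequestException:
              return False

      def list_models(self) -> list[str]:
          # OpenAI-style /v1/models payload: {"data": [{"id": ...}, ...]}
          data = requests.get(f"{self.base_url}/models", timeout=5).json()
          return [m["id"] for m in data.get("data", [])]

      def get_provider_config(self) -> dict:
          # Mirrors the Config block below.
          return {"provider": "atlas", "base_url": self.base_url}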

Integration:
- 'atlas' added as provider alias in hermes_cli/auth.py
  (routes to 'custom' like ollama/vllm/lmstudio; see sketch below)
- Atlas documented in cli-config.yaml.example with
  provider config and docker quick-start
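
The alias wiring amounts to one new table entry, roughly like this
(PROVIDER_ALIASES is an assumed name for whatever mapping auth.py uses):

  # hermes_cli/auth.py (illustrative)
  PROVIDER_ALIASES = {
      "ollama": "custom",
      "vllm": "custom",
      "lmstudio": "custom",
      "atlas": "custom",  # new: reuses the generic OpenAI-compatible path
  }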

Config:
  provider: atlas
  base_url: http://localhost:8888/v1

Docker:
  docker run -d --gpus all --ipc=host -p 8888:8888 \
    avarok/atlas-gb10:alpha-2.8 serve <model> --speculative
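
Smoke test once the container is up (plain OpenAI-compatible HTTP via
requests; substitute a real model name for <model>):

  import requests

  resp = requests.post(
      "http://localhost:8888/v1/chat/completions",
      json={
          "model": "<model>",
          "messages": [{"role": "user", "content": "ping"}],
      },
      timeout=30,
  )
  print(resp.json()["choices"][0]["message"]["content"])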

Closes #674

cli-config.yaml.example:

@@ -43,6 +43,13 @@ model:
 # Set OLLAMA_API_KEY in .env — automatically picked up when base_url
 # points to ollama.com.
 #
+# Atlas Inference Engine (Rust+CUDA, 3x faster than vLLM):
+# provider: "atlas"
+# base_url: "http://localhost:8888/v1"
+# Start with: docker run -d --gpus all --ipc=host -p 8888:8888
+#   avarok/atlas-gb10:alpha-2.8 serve <model> --speculative
+# See: agent/atlas_provider.py for full config.
+#
 # Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
 provider: "auto"