feat: Atlas Inference Engine provider integration (#674)

Atlas is a Rust+CUDA inference engine: 3x faster than vLLM, with a 2.5GB
image vs 20+GB. It exposes an OpenAI-compatible API at localhost:8888/v1.

New agent/atlas_provider.py (sketch after this list):
- AtlasProvider class with health_check(), list_models(),
  benchmark_inference(), get_provider_config()
- ATLAS_SUPPORTED_MODELS list (8 models as of alpha-2.8)
- get_atlas_config_hint() for config.yaml setup
- get_atlas_docker_command() for quick deployment
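
Rough sketch of the module surface (illustrative only: the import, the
model-list contents, and the method bodies are assumptions, not the
committed code; benchmark_inference() and the helpers are elided):

  import requests

  ATLAS_SUPPORTED_MODELS: list[str] = []  # 8 entries as of alpha-2.8, elided here

  class AtlasProvider:
      """Thin client for an Atlas server's OpenAI-compatible API."""

      def __init__(self, base_url: str = "http://localhost:8888/v1") -> None:
          self.base_url = base_url.rstrip("/")

      def health_check(self) -> bool:
          # Assumption: a 200 from /models means the engine is up.
          try:
              return requests.get(f"{self.base_url}/models", timeout=5).ok
          except requests.RequestException:
              return False

      def list_models(self) -> list[str]:
          # OpenAI-style /v1/models payload: {"data": [{"id": ...}, ...]}
          data = requests.get(f"{self.base_url}/models", timeout=5).json()
          return [m["id"] for m in data.get("data", [])]

      def get_provider_config(self) -> dict:
          # Mirrors the Config block below.
          return {"provider": "atlas", "base_url": self.base_url}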

Integration:
- 'atlas' added as provider alias in hermes_cli/auth.py
  (routes to 'custom' like ollama/vllm/lmstudio; see sketch below)
- Atlas documented in cli-config.yaml.example with
  provider config and docker quick-start
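
The alias wiring amounts to one new table entry, roughly like this
(PROVIDER_ALIASES is an assumed name for whatever mapping auth.py uses):

  # hermes_cli/auth.py (illustrative)
  PROVIDER_ALIASES = {
      "ollama": "custom",
      "vllm": "custom",
      "lmstudio": "custom",
      "atlas": "custom",  # new: reuses the generic OpenAI-compatible path
  }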

Config:
  provider: atlas
  base_url: http://localhost:8888/v1

Docker:
  docker run -d --gpus all --ipc=host -p 8888:8888 \
    avarok/atlas-gb10:alpha-2.8 serve <model> --speculative
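
Smoke test once the container is up (plain OpenAI-compatible HTTP via
requests; substitute a real model name for <model>):

  import requests

  resp = requests.post(
      "http://localhost:8888/v1/chat/completions",
      json={
          "model": "<model>",
          "messages": [{"role": "user", "content": "ping"}],
      },
      timeout=30,
  )
  print(resp.json()["choices"][0]["message"]["content"])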

Closes #674

cli-config.yaml.example:

@@ -43,6 +43,13 @@ model:
 # Set OLLAMA_API_KEY in .env — automatically picked up when base_url
 # points to ollama.com.
 #
+# Atlas Inference Engine (Rust+CUDA, 3x faster than vLLM):
+# provider: "atlas"
+# base_url: "http://localhost:8888/v1"
+# Start with: docker run -d --gpus all --ipc=host -p 8888:8888
+#   avarok/atlas-gb10:alpha-2.8 serve <model> --speculative
+# See: agent/atlas_provider.py for full config.
+#
 # Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
 provider: "auto"