feat: Atlas Inference Engine provider integration (#674)
Some checks failed
Tests / test (pull_request) Failing after 34m45s
Contributor Attribution Check / check-attribution (pull_request) Failing after 34s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 5s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 39s
Tests / e2e (pull_request) Successful in 2m42s
Nix / nix (macos-latest) (pull_request) Has been cancelled
Atlas is a Rust+CUDA inference engine, 3x faster than vLLM, with a 2.5GB image vs 20+GB. It exposes an OpenAI-compatible API at localhost:8888/v1.

New agent/atlas_provider.py:
- AtlasProvider class with health_check(), list_models(), benchmark_inference(), get_provider_config()
- ATLAS_SUPPORTED_MODELS list (8 models as of alpha-2.8)
- get_atlas_config_hint() for config.yaml setup
- get_atlas_docker_command() for quick deployment

Integration:
- 'atlas' added as a provider alias in hermes_cli/auth.py (routes to 'custom', like ollama/vllm/lmstudio)
- Atlas documented in cli-config.yaml.example with provider config and a docker quick-start

Config:
  provider: atlas
  base_url: http://localhost:8888/v1

Docker:
  docker run -d --gpus all --ipc=host -p 8888:8888 avarok/atlas-gb10:alpha-2.8 serve <model> --speculative

Closes #674
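Since Atlas speaks the OpenAI-compatible API, AtlasProvider's health_check() can be as simple as probing the standard /models endpoint. A minimal sketch of that idea, using only the stdlib — the endpoint path follows the OpenAI convention the commit describes, and the function name mirrors (but is not) the actual implementation in agent/atlas_provider.py:

```python
# Probe an Atlas server's OpenAI-compatible /models endpoint.
# A healthy server returns JSON like {"object": "list", "data": [...]}.
import json
import urllib.error
import urllib.request

ATLAS_BASE_URL = "http://localhost:8888/v1"

def atlas_health_check(base_url: str = ATLAS_BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if an Atlas server answers GET {base_url}/models with a model list."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
            # OpenAI-style list responses carry the models under "data".
            return isinstance(payload.get("data"), list)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or non-JSON body: treat as unhealthy.
        return False

if __name__ == "__main__":
    print("atlas up:", atlas_health_check())
```

The same probe works against any OpenAI-compatible backend (ollama, vllm, lmstudio), which is consistent with 'atlas' routing to the shared 'custom' provider path in hermes_cli/auth.py.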
@@ -43,6 +43,13 @@ model:
 # Set OLLAMA_API_KEY in .env — automatically picked up when base_url
 # points to ollama.com.
 #
+# Atlas Inference Engine (Rust+CUDA, 3x faster than vLLM):
+#   provider: "atlas"
+#   base_url: "http://localhost:8888/v1"
+#   Start with: docker run -d --gpus all --ipc=host -p 8888:8888
+#     avarok/atlas-gb10:alpha-2.8 serve <model> --speculative
+#   See: agent/atlas_provider.py for full config.
+#
 # Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
 provider: "auto"