feat: add OLLAMA_NUM_CTX config to cap context window (#83)

- Add ollama_num_ctx setting (default 4096) to config.py
- Pass num_ctx option to Ollama in agent.py and agents/base.py
- Add OLLAMA_NUM_CTX to .env.example with usage docs
- Add context_window note in providers.yaml
- Fix mock_settings in test_agent.py for new attribute
- qwen3:30b with 4096 ctx uses ~19GB vs 45GB default
2026-03-14 18:54:43 -04:00
parent b01c1cb582
commit 9c59b386d8
6 changed files with 21 additions and 2 deletions
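
Illustrative sketch (not part of this commit): the message describes reading OLLAMA_NUM_CTX (default 4096) and passing it to Ollama as the num_ctx option, with 0 meaning "use the model default". The config.py and agent.py changes are not shown here, so the helper names `read_num_ctx` and `build_ollama_options` below are hypothetical and only approximate how such wiring might look. The one external fact relied on is that Ollama requests accept an `options` mapping with a `num_ctx` key.

```python
import os


def read_num_ctx(env=None, default=4096):
    """Read OLLAMA_NUM_CTX from a mapping (os.environ by default).

    Hypothetical helper: falls back to `default` when the variable
    is unset or blank.
    """
    env = os.environ if env is None else env
    raw = env.get("OLLAMA_NUM_CTX", "").strip()
    return int(raw) if raw else default


def build_ollama_options(num_ctx):
    """Build the `options` dict for an Ollama request.

    Assumption taken from the .env.example comment: 0 means "model
    default", so num_ctx is omitted and Ollama picks its own value.
    """
    if num_ctx < 0:
        raise ValueError("num_ctx must be >= 0")
    return {"num_ctx": num_ctx} if num_ctx > 0 else {}


if __name__ == "__main__":
    # Print the options that would be sent for the current environment.
    print(build_ollama_options(read_num_ctx()))
```

Omitting the key for 0, rather than sending `num_ctx: 0`, is one way to leave the choice to the model; the actual code in agent.py may handle this differently.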
--- a/.env.example
+++ b/.env.example
@@ -17,6 +17,11 @@
 # LLM model to use via Ollama (default: qwen3.5:latest)
 # OLLAMA_MODEL=qwen3.5:latest

+# Ollama context window size (default: 4096 tokens)
+# Set higher for more context, lower to save RAM. 0 = model default.
+# qwen3:30b + 4096 ctx ≈ 19GB VRAM; default ctx ≈ 45GB.
+# OLLAMA_NUM_CTX=4096
+
 # Enable FastAPI interactive docs at /docs and /redoc (default: false)
 # DEBUG=true