When using xAI's API directly (i.e. when base_url contains x.ai), send the x-grok-conv-id header with its value set to the Hermes session_id. This routes consecutive requests from the same session to the same server, which maximizes automatic prompt cache hits. Ref: https://docs.x.ai/developers/advanced-api-usage/prompt-caching
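A minimal sketch of the header logic, assuming a hypothetical helper named build_extra_headers that receives the configured base_url and the Hermes session_id. It is illustrative only, not the actual Hermes implementation. One design choice differs from a plain substring test: it matches the parsed hostname (x.ai or any subdomain such as api.x.ai), so unrelated hosts like box.ai are not caught. It also skips the header when session_id is empty, which is an assumption.

```python
from typing import Dict
from urllib.parse import urlparse

# Header name as given in the xAI prompt-caching docs referenced above.
XAI_CONV_ID_HEADER = "x-grok-conv-id"


def is_xai_base_url(base_url: str) -> bool:
    """Return True if base_url points at xAI's API (host x.ai or a subdomain of it)."""
    host = (urlparse(base_url).hostname or "").lower()
    return host == "x.ai" or host.endswith(".x.ai")


def build_extra_headers(base_url: str, session_id: str) -> Dict[str, str]:
    """Hypothetical helper: extra headers to attach to every request in a session.

    Only xAI endpoints get the conversation-id header; other providers get none.
    Reusing the same session_id keeps consecutive requests on the same server.
    """
    if session_id and is_xai_base_url(base_url):
        return {XAI_CONV_ID_HEADER: session_id}
    return {}


if __name__ == "__main__":
    print(build_extra_headers("https://api.x.ai/v1", "sess-123"))
    print(build_extra_headers("https://openrouter.ai/api/v1", "sess-123"))
```

With an OpenAI-compatible Python client, the returned dict could be passed as extra_headers on each request, so every call in a session carries the same conversation id.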