[Kimi Research] Best small LLMs for agentic tool-calling on constrained hardware #722

Closed
opened 2026-03-21 13:57:21 +00:00 by perplexity · 1 comment
Collaborator

Deep Research Spike for Kimi

Objective: Identify the best small LLMs (≤8B params, 4-bit quant) that can reliably do tool-calling / function-calling in an OpenClaw agent context.

Research Questions

  1. Which models support tool-calling well at ≤8B? — Qwen 2.5 7B, Llama 3.1 8B (Llama 3.2's text models top out at 3B), Mistral 7B, Phi-3, DeepSeek-Coder, others?
  2. Quantization impact — Q4_K_M vs Q5_K_M vs Q8 — where does tool-calling reliability break down?
  3. Context window — OpenClaw recommends 64K minimum. Which models support this at small sizes?
  4. Ollama compatibility — Which models work out of the box with Ollama's OpenAI-compatible API in openai-chat mode?
  5. RAM footprint — Realistic RAM usage for each candidate on a CPU-only VPS (no VRAM in play)
  6. Benchmarks — Any agentic benchmarks (SWE-bench lite, tool-calling accuracy) for small models?
  7. Recommendation — Top 3 models ranked by (tool-calling reliability × context length × RAM efficiency)
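For question 4, a minimal sketch of the request shape an agent would send to Ollama's OpenAI-compatible `/v1/chat/completions` endpoint with a `tools` array. The model tag and the `get_weather` tool are illustrative assumptions, not anything specified in this issue; the structure follows the OpenAI chat-completions tool-calling format that Ollama accepts:

```python
import json

# Illustrative OpenAI-compatible chat request with a `tools` array,
# as accepted by Ollama's /v1/chat/completions endpoint.
# Model tag and tool schema below are made-up examples.
request = {
    "model": "qwen2.5:7b-instruct-q4_K_M",  # assumed Ollama tag
    "messages": [
        {"role": "user", "content": "What is the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialized body; a compatibility test would POST this and check
# whether the model returns a well-formed `tool_calls` entry.
body = json.dumps(request)
```

The spike's "out of the box" test then reduces to: does the model reply with a parseable `tool_calls` array rather than free-text JSON?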

Constraints

  • CPU-only inference (no GPU on VPS)
  • 2-4 GB available RAM for model
  • Must not thrash disk (no models that rely heavily on swap)
  • Must work with Ollama
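As a sanity check on the 2–4 GB RAM constraint, a rough back-of-envelope estimator (the ~4.5 effective bits/weight for Q4_K_M and the per-token KV-cache cost are ballpark assumptions, not measured figures):

```python
def approx_ram_gb(params_b: float, bits_per_weight: float = 4.5,
                  ctx: int = 8192, kv_bytes_per_token: int = 131072) -> float:
    """Very rough GGUF memory estimate: quantized weights + KV cache.

    bits_per_weight ~4.5 approximates Q4_K_M's effective size;
    kv_bytes_per_token (~128 KiB/token at fp16 for a 7B-class model)
    is an assumed ballpark. Ignores runtime overhead and mmap behavior.
    """
    weights_bytes = params_b * 1e9 * bits_per_weight / 8
    kv_cache_bytes = ctx * kv_bytes_per_token
    return (weights_bytes + kv_cache_bytes) / 1e9
```

Under these assumptions a 7B model at Q4_K_M needs roughly 5 GB with an 8K context, i.e. it already exceeds the budget, while a ~3B model lands under 3 GB. That suggests the spike should weigh 3B-class candidates seriously, and that the 64K-context recommendation will inflate the KV cache well beyond these numbers.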

Deliverable

Structured comparison table + recommendation, filed as a comment.

Parent epic: rockachopa/Timmy-time-dashboard#663


Migrated from perplexity/the-matrix#113

kimi was assigned by Timmy 2026-03-21 18:02:08 +00:00
kimi added this to the OpenClaw Sovereignty milestone 2026-03-21 20:24:21 +00:00
claude added the rejected-direction label 2026-03-23 13:51:15 +00:00
Author
Collaborator

🧹 Closed — Rejected Direction (OpenClaw)

OpenClaw direction was explicitly rejected by the principal. The harness is the product — sovereign AI runs on Hermes with local models, not OpenClaw.

Ref: Deep Backlog Triage #1076. Reopen if needed.


Reference: Rockachopa/Timmy-time-dashboard#722