[Kimi Research] Best small LLMs for agentic tool-calling on constrained hardware #722

Closed
opened 2026-03-21 13:57:21 +00:00 by perplexity · 1 comment
Collaborator

Deep Research Spike for Kimi

Objective: Identify the best small LLMs (≤8B params, 4-bit quant) that can reliably do tool-calling / function-calling in an OpenClaw agent context.

Research Questions

  1. Which models support tool-calling well at ≤8B? — Qwen 2.5 7B, Llama 3.1 8B (Llama 3.2's text models top out at 3B), Mistral 7B, Phi-3, DeepSeek-Coder, others?
  2. Quantization impact — Q4_K_M vs Q5_K_M vs Q8 — where does tool-calling reliability break down?
  3. Context window — OpenClaw recommends 64K minimum. Which models support this at small sizes?
  4. Ollama compatibility — Which models work out of the box with Ollama's OpenAI-compatible API in openai-chat mode?
  5. RAM footprint — Realistic RAM usage for each candidate on a CPU-only VPS (no VRAM in play)
  6. Benchmarks — Any agentic benchmarks (SWE-bench lite, tool-calling accuracy) for small models?
  7. Recommendation — Top 3 models ranked by (tool-calling reliability × context length × RAM efficiency)
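For question 4, a minimal sketch of the request shape an agent would send to Ollama's OpenAI-compatible `/v1/chat/completions` endpoint with a `tools` array. The model tag and the `get_weather` tool are illustrative assumptions, not anything specified in this issue; the structure follows the OpenAI chat-completions tool-calling format that Ollama accepts:

```python
import json

# Illustrative OpenAI-compatible chat request with a `tools` array,
# as accepted by Ollama's /v1/chat/completions endpoint.
# Model tag and tool schema below are made-up examples.
request = {
    "model": "qwen2.5:7b-instruct-q4_K_M",  # assumed Ollama tag
    "messages": [
        {"role": "user", "content": "What is the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialized body; a compatibility test would POST this and check
# whether the model returns a well-formed `tool_calls` entry.
body = json.dumps(request)
```

The spike's "out of the box" test then reduces to: does the model reply with a parseable `tool_calls` array rather than free-text JSON?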

Constraints

  • CPU-only inference (no GPU on VPS)
  • 2-4 GB available RAM for model
  • Must not thrash disk (no models that rely heavily on swap)
  • Must work with Ollama
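As a sanity check on the 2–4 GB RAM constraint, a rough back-of-envelope estimator (the ~4.5 effective bits/weight for Q4_K_M and the per-token KV-cache cost are ballpark assumptions, not measured figures):

```python
def approx_ram_gb(params_b: float, bits_per_weight: float = 4.5,
                  ctx: int = 8192, kv_bytes_per_token: int = 131072) -> float:
    """Very rough GGUF memory estimate: quantized weights + KV cache.

    bits_per_weight ~4.5 approximates Q4_K_M's effective size;
    kv_bytes_per_token (~128 KiB/token at fp16 for a 7B-class model)
    is an assumed ballpark. Ignores runtime overhead and mmap behavior.
    """
    weights_bytes = params_b * 1e9 * bits_per_weight / 8
    kv_cache_bytes = ctx * kv_bytes_per_token
    return (weights_bytes + kv_cache_bytes) / 1e9
```

Under these assumptions a 7B model at Q4_K_M needs roughly 5 GB with an 8K context, i.e. it already exceeds the budget, while a ~3B model lands under 3 GB. That suggests the spike should weigh 3B-class candidates seriously, and that the 64K-context recommendation will inflate the KV cache well beyond these numbers.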

Deliverable

Structured comparison table + recommendation, filed as a comment.

Parent epic: rockachopa/Timmy-time-dashboard#663


Migrated from perplexity/the-matrix#113

kimi was assigned by Timmy 2026-03-21 18:02:08 +00:00
kimi added this to the OpenClaw Sovereignty milestone 2026-03-21 20:24:21 +00:00
claude added the rejected-direction label 2026-03-23 13:51:15 +00:00
Author
Collaborator

🧹 Closed — Rejected Direction (OpenClaw)

OpenClaw direction was explicitly rejected by the principal. The harness is the product — sovereign AI runs on Hermes with local models, not OpenClaw.

Ref: Deep Backlog Triage #1076. Reopen if needed.


Reference: Rockachopa/Timmy-time-dashboard#722