[TQ-5] Benchmark: latency, memory, quality comparison #29

Open
opened 2026-04-03 22:41:27 +00:00 by Timmy · 1 comment
Owner

Parent: #30

Depends: TQ-1, TQ-2, TQ-3

Task

Full benchmark comparing Gemma 4 configs.

Metrics

  1. First-token latency (time to first output)
  2. Tokens/second generation speed
  3. Peak memory usage (Activity Monitor or memory_pressure)
  4. Perplexity (from TQ-3)
  5. Subjective quality: 10 test prompts from turboquant repo

Configurations to Test

  • Ollama gemma4 (default quantization)
  • llama-server gemma4 with f16 KV cache
  • llama-server gemma4 with turbo4 KV cache
  • llama-server gemma4 with turbo4 + layer-adaptive mode

Acceptance Criteria

  • Benchmark results posted as table
  • Winner configuration identified
  • Memory usage documented
## Parent: #30 Depends: TQ-1, TQ-2, TQ-3 ### Task Full benchmark comparing Gemma 4 configs. ### Metrics 1. First-token latency (time to first output) 2. Tokens/second generation speed 3. Peak memory usage (Activity Monitor or `memory_pressure`) 4. Perplexity (from TQ-3) 5. Subjective quality: 10 test prompts from turboquant repo ### Configurations to Test - Ollama gemma4 (default quantization) - llama-server gemma4 with f16 KV cache - llama-server gemma4 with turbo4 KV cache - llama-server gemma4 with turbo4 + layer-adaptive mode ### Acceptance Criteria - [ ] Benchmark results posted as table - [ ] Winner configuration identified - [ ] Memory usage documented
Timmy self-assigned this 2026-04-03 22:41:27 +00:00
Author
Owner

Proof check: this ticket is no longer dependency-blocked. Parent #30 is closed, and the listed prerequisites #1, #2, and #3 are all closed as well.

This issue has been idle since 2026-04-03, so the clean next move is to either:

  1. post the actual latency / peak-memory / quality comparison table here and close the ticket, or
  2. explicitly retire/re-scope it if those benchmark numbers already landed in another artifact.

Leaving this as an unblocked stale handoff so the benchmark evidence does not get lost.

Proof check: this ticket is no longer dependency-blocked. Parent #30 is closed, and the listed prerequisites #1, #2, and #3 are all closed as well. This issue has been idle since 2026-04-03, so the clean next move is to either: 1. post the actual latency / peak-memory / quality comparison table here and close the ticket, or 2. explicitly retire/re-scope it if those benchmark numbers already landed in another artifact. Leaving this as an unblocked stale handoff so the benchmark evidence does not get lost.
Sign in to join this conversation.