docs: update 27B test finding — prompt overload, not model limit

Correction from #653: 27B includes tests when prompt is concise. 'Include type hints and one unit test.' → tests included. 'Include type hints, docstring, and one unit test.' → tests omitted. Issue is prompt overload, not model limitation. Closes #653
docs: Big Brain 27B test omission workaround
2026-04-13 22:32:18 -04:00 · 2026-04-13 22:28:28 -04:00
1 changed files with 53 additions and 0 deletions
--- a/docs/big-brain-27b-test-omission.md
+++ b/docs/big-brain-27b-test-omission.md
@@ -0,0 +1,53 @@
+# Big Brain 27B — Test Omission Pattern
+
+## Finding (2026-04-14)
+
+The 27B model (gemma4) consistently omits unit tests when asked to include them
+in the same prompt as implementation code. The model produces complete, high-quality
+implementation but stops before the test class/function.
+
+**Affected models:** 1B, 7B, 27B (27B is the most notable because its implementation quality is the best)
+
+**Root cause (initial hypothesis, revised in the Update below):** Models treat tests as optional even when explicitly required in the prompt.
+
+## Workaround
+
+Split the prompt into two phases:
+
+### Phase 1: Implementation
+```
+Write a webhook parser with @dataclass, verify_signature(), parse_webhook().
+Include type hints and docstrings.
+```
+
+### Phase 2: Tests (separate prompt)
+```
+Write a unit test for the webhook parser above. Cover:
+- Valid signature verification
+- Invalid signature rejection
+- Malformed payload handling
+```
+
+## Prompt Engineering Notes
+
+- Avoid packing "implement X", several extras, and "include unit test" into one long prompt (see Update below)
+- The model excels at implementation when focused
+- Test generation works better as a follow-up on the existing code
+- For critical code, always verify test presence manually
+
+## Impact
+
+Low — workaround is simple (split prompt). No data loss or corruption risk.
+
+## Source
+
+Benchmark runs documented in timmy-home #576.
+
+## Update (2026-04-14)
+
+**Correction:** 27B DOES include tests when the prompt is concise.
+- "Include type hints and one unit test." → tests included
+- "Include type hints, docstring, and one unit test." → tests omitted
+
+The issue is **prompt overload**, not a model limitation. Keep test
+requirements short and focused. See #653.