Compare commits
2 Commits
fix/669
...
fix/650-bi
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
a8777e0d80 | ||
|
|
5f50ac4801 |
53
docs/big-brain-27b-test-omission.md
Normal file
53
docs/big-brain-27b-test-omission.md
Normal file
@@ -0,0 +1,53 @@
|
||||
# Big Brain 27B — Test Omission Pattern
|
||||
|
||||
## Finding (2026-04-14)
|
||||
|
||||
The 27B model (gemma4) consistently omits unit tests when asked to include them
|
||||
in the same prompt as implementation code. The model produces complete, high-quality
|
||||
implementation but stops before the test class/function.
|
||||
|
||||
**Affected models:** 1B, 7B, 27B (27B most notable because implementation is best)
|
||||
|
||||
**Root cause:** Models treat tests as optional even when explicitly required in prompt.
|
||||
|
||||
## Workaround
|
||||
|
||||
Split the prompt into two phases:
|
||||
|
||||
### Phase 1: Implementation
|
||||
```
|
||||
Write a webhook parser with @dataclass, verify_signature(), parse_webhook().
|
||||
Include type hints and docstrings.
|
||||
```
|
||||
|
||||
### Phase 2: Tests (separate prompt)
|
||||
```
|
||||
Write a unit test for the webhook parser above. Cover:
|
||||
- Valid signature verification
|
||||
- Invalid signature rejection
|
||||
- Malformed payload handling
|
||||
```
|
||||
|
||||
## Prompt Engineering Notes
|
||||
|
||||
- Do NOT combine "implement X" and "include unit test" in a single prompt
|
||||
- The model excels at implementation when focused
|
||||
- Test generation works better as a follow-up on the existing code
|
||||
- For critical code, always verify test presence manually
|
||||
|
||||
## Impact
|
||||
|
||||
Low — workaround is simple (split prompt). No data loss or corruption risk.
|
||||
|
||||
## Source
|
||||
|
||||
Benchmark runs documented in timmy-home #576.
|
||||
|
||||
## Update (2026-04-14)
|
||||
|
||||
**Correction:** 27B DOES include tests when the prompt is concise.
|
||||
- "Include type hints and one unit test." → tests included
|
||||
- "Include type hints, docstring, and one unit test." → tests omitted
|
||||
|
||||
The issue is **prompt overload**, not model limitation. Use short, focused
|
||||
test requirements. See #653.
|
||||
Reference in New Issue
Block a user