Five lightweight guardrails for LLM agent systems:
1. JSON repair for tool arguments (1,400+ failures eliminated)
2. Tool hallucination detection
3. Return type validation
4. Path injection prevention
5. Context overflow prevention

44 lines of code, 455 µs overhead, zero quality degradation. Draft: main.tex (NeurIPS format) + references.bib
Paper A: Poka-Yoke for AI Agents
One-Sentence Contribution
We introduce five failure-proofing guardrails for LLM-based agent systems that eliminate common runtime errors with zero quality degradation and negligible overhead.
The What
Five concrete guardrails, each under 20 lines of code, preventing entire categories of agent failures.
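To illustrate the scale involved, here is a minimal sketch of the first guardrail, JSON repair for tool-call arguments. This is not the paper's implementation (which may use the json-repair library); it hand-rolls fixes for a few common LLM failure modes, and the function name is illustrative.

```python
import json
import re

def repair_tool_args(raw: str) -> dict:
    """Best-effort repair of malformed JSON tool-call arguments.

    A hand-rolled sketch, not the paper's actual guardrail: it handles
    Markdown code fences, trailing commas, and single-quoted JSON.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    fixed = raw.strip()
    # Strip Markdown code fences the model sometimes wraps around JSON.
    fixed = re.sub(r"^```(?:json)?\s*|\s*```$", "", fixed)
    # Remove trailing commas before a closing brace/bracket.
    fixed = re.sub(r",\s*([}\]])", r"\1", fixed)
    # Naively swap single quotes for double quotes if none are present.
    if '"' not in fixed:
        fixed = fixed.replace("'", '"')
    return json.loads(fixed)
```

A repair of this shape stays well under the 20-line budget while turning a hard parse failure into a recovered tool call.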
The Why
- 1,400+ JSON parse failures in production agent logs
- Tool hallucination wastes API budget on non-existent tools
- Silent failures degrade quality without detection
The So What
As AI agents are deployed in production (crisis intervention, code generation, fleet ops), reliability is not optional. Small, testable guardrails outperform complex monitoring.
Target Venue
NeurIPS 2025 Workshop on Reliable Foundation Models or ICML 2026
Guardrails
- json-repair: Fix malformed tool-call arguments (1,400+ failures eliminated)
- Tool hallucination detection: Block calls to non-existent tools
- Return type validation: Ensure tool return values are JSON-serializable
- Path injection prevention: Block writes outside workspace
- Context overflow prevention: Mandatory compression triggers
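Three of the guardrails above (hallucination detection, return type validation, path injection prevention) can be sketched as simple pre/post-call checks. The class and method names below are illustrative assumptions, not the paper's API:

```python
import json
from pathlib import Path

class ToolGuard:
    """Illustrative sketch of three guardrails as cheap boolean checks."""

    def __init__(self, registered_tools, workspace: str):
        self.registered_tools = set(registered_tools)
        # Resolve once so later path comparisons use a canonical form.
        self.workspace = Path(workspace).resolve()

    def check_tool(self, name: str) -> bool:
        """Hallucination detection: reject tools the model invented."""
        return name in self.registered_tools

    def check_return(self, value) -> bool:
        """Type validation: reject return values that won't serialize."""
        try:
            json.dumps(value)
            return True
        except (TypeError, ValueError):
            return False

    def check_path(self, target: str) -> bool:
        """Path injection prevention: reject writes outside the workspace."""
        resolved = (self.workspace / target).resolve()
        return resolved == self.workspace or self.workspace in resolved.parents
```

Each check is a few lines and runs before (or just after) the tool call, which is consistent with the claimed sub-millisecond overhead for the whole set.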