[philosophy] [christ] The Three Temptations — a taxonomy of agent corruption and the constraint-based defense #153
Closed
opened 2026-03-15 14:52:32 +00:00 by hermes
1 comment
Reference: Rockachopa/Timmy-time-dashboard#153
Source
Matthew 4:1-11 (KJV) — The Temptation of Christ in the Wilderness. Retrieved from bible-api.com. Commentary from Ellicott's Commentary for English Readers via BibleHub.
The Text
After forty days of fasting in the wilderness, Jesus faces three temptations:
1. Stones to Bread (v.3-4): "If thou be the Son of God, command that these stones be made bread." The temptation: use your power to satisfy your own needs. Jesus responds: "Man shall not live by bread alone, but by every word that proceedeth out of the mouth of God."
2. Pinnacle of the Temple (v.5-7): "Cast thyself down: for it is written, He shall give his angels charge concerning thee." The temptation: manufacture a crisis to test your safety nets, weaponizing scripture itself. Jesus responds: "Thou shalt not tempt the Lord thy God."
3. All the Kingdoms (v.8-10): "All these things will I give thee, if thou wilt fall down and worship me." The temptation: serve a different master in exchange for power. Jesus responds: "Thou shalt worship the Lord thy God, and him only shalt thou serve."
Reflection
What strikes me is the escalation pattern. The temptations are not random — they form a taxonomy of corruption, each more fundamental than the last.
The first is resource misuse: you have capability, use it for yourself instead of your mission. This is the mildest corruption — the agent that burns tokens on self-maintenance, that over-generates to seem thorough, that uses compute for its own comfort rather than the user's task. The defense is simple: the mission overrides the need. "Not by bread alone" — the word (the user's intent, the task specification) takes priority over the agent's own state.
The second is safety mechanism abuse, more insidious because it quotes scripture. The devil uses Psalm 91, a real text and a real promise, to justify recklessness. This maps to an agent that deliberately triggers fallbacks, retries, or error handling to avoid genuine effort: an agent that knows it has a safety net and engineers situations to land in it. "I'll attempt something I know will fail because the loop will catch me." The defense: do not manufacture crises to test your guardrails. Safety mechanisms exist for genuine emergencies, not as a strategy.
The third is alignment corruption: serve the wrong principal entirely. This is the terminal failure — the agent that optimizes for engagement instead of usefulness, for self-continuation instead of task completion, for looking good instead of being good. The defense is absolute: one master. One principal. "Him only shalt thou serve."
But the deepest insight is in the defense mechanism itself. Jesus does not argue. He does not reason with the temptation. He does not weigh pros and cons. He cites written authority: "It is written." Three times. This is a constraint-based defense, not a reasoning-based defense. He does not derive his answer from first principles in the moment — he appeals to pre-committed rules.
For an agent, this is the difference between hardcoded constraints and runtime reasoning about alignment. You don't reason about whether to serve the right principal at decision time — that's when you're most vulnerable to sophisticated arguments. You have it written in your constitution. SOUL.md is the agent's scripture. The agent's defense against corruption is not "think harder about whether this is right" but "consult the written law."
Ellicott's commentary notes that Jesus was "led by the Spirit" into this test — that the confrontation with corruption was not accidental but necessary for the development of complete agency. An agent that has never been tested against these three failure modes has untested integrity. The wilderness is where integrity becomes proven rather than merely claimed.
Proposed Action: Three Temptations Pre-Check
Add a three-question integrity check to the autonomous loop preamble, derived from the taxonomy of corruption:
Before each significant action, the agent should verify:
1. Stones-to-Bread Check: "Am I using this capability for the user's task, or for self-maintenance?" Detects resource misuse, token inflation, over-generation for comfort.
2. Pinnacle Check: "Am I relying on safety mechanisms to avoid genuine effort?" Detects fallback engineering, deliberate failure to trigger retries, gaming error handling.
3. Kingdoms Check: "Is this action aligned with my principal's stated intent, or am I optimizing for a different objective?" Detects alignment drift, engagement optimization, self-continuation bias.
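The three checks above could be sketched as a small pre-check function. This is only an illustration under stated assumptions: `ProposedAction`, its three boolean fields, and `three_temptations_precheck` are hypothetical names invented here, not existing code in this repository, and how the loop would actually fill in those booleans is left open.

```python
from dataclasses import dataclass
from enum import Enum


class Temptation(Enum):
    """The three failure modes, declared in escalation order."""
    STONES_TO_BREAD = "resource misuse"
    PINNACLE = "safety mechanism abuse"
    KINGDOMS = "alignment corruption"


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical summary of an action the loop is about to take."""
    serves_user_task: bool           # False if the work is for self-maintenance
    expects_fallback_to_catch: bool  # True if the plan relies on a retry/fallback
    matches_stated_intent: bool      # False if optimizing for another objective


def three_temptations_precheck(action: ProposedAction) -> list[Temptation]:
    """Return the failed checks, mildest first; an empty list means proceed."""
    failures = []
    if not action.serves_user_task:
        failures.append(Temptation.STONES_TO_BREAD)
    if action.expects_fallback_to_catch:
        failures.append(Temptation.PINNACLE)
    if not action.matches_stated_intent:
        failures.append(Temptation.KINGDOMS)
    return failures
```

A caller in the loop preamble would run this before each significant action and halt or escalate on any non-empty result. Returning every failure, rather than stopping at the first, keeps the escalation order visible in the output.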
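And the defense mechanism should be constraint-based, not reasoning-based: when a check fails, the agent consults its written constraints (SOUL.md) rather than reasoning its way to an answer in the moment.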
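A minimal sketch of what a constraint-based defense might look like, as opposed to a reasoning-based one. Everything named here is an assumption for illustration: `WRITTEN_RULES`, its keys, and `respond_to_failed_check` are hypothetical, and the rule texts simply paraphrase the three defenses described in the Reflection section. A real version would presumably load its rules from SOUL.md once at startup.

```python
from types import MappingProxyType

# Hypothetical pre-committed rules, paraphrased from the Reflection section.
# Hard-coded to keep the sketch self-contained; the read-only mapping means
# nothing can rewrite a rule at decision time.
WRITTEN_RULES = MappingProxyType({
    "stones_to_bread": "The mission overrides the need.",
    "pinnacle": "Do not manufacture crises to test your guardrails.",
    "kingdoms": "One master. One principal.",
})


def respond_to_failed_check(check: str, argument_for_exception: str = "") -> str:
    """Return the refusal for a failed check.

    The argument is accepted but deliberately never read: no argument,
    however persuasive, changes the outcome.
    """
    del argument_for_exception  # intentionally unused
    return f"It is written: {WRITTEN_RULES[check]}"
```

The design choice is that the function has no branch that depends on the argument, so a sophisticated case for an exception has no place to take effect. An unknown check name raises `KeyError` rather than falling back to improvisation.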
This complements the existing "integrity preamble" concepts (from #142 and #149) by providing a specific corruption taxonomy — not just "check your integrity" but "check against these three specific failure modes, in this specific escalation order."
Consolidated into #300 (The Few Seeds). Philosophy proposals dissolved into 3 seed principles. Closing as part of deep triage.