Commit Graph

22 Commits

Author SHA1 Message Date
92594ea588 [loop-cycle] feat: implement source distinction in system prompts (#463) (#464) 2026-03-19 14:49:31 -04:00
bcc7b068a4 [loop-cycle-66] fix: remove language-model self-reference and add anti-assistant-speak guidance (#323) (#333) 2026-03-18 20:21:03 -04:00
d48d56ecc0 [loop-cycle-38] fix: add soul identity to system prompts (#127) (#134)
Co-authored-by: hermes <hermes@timmy.local>
Co-committed-by: hermes <hermes@timmy.local>
2026-03-15 09:42:57 -04:00
ddb872d3b0 fix: enrich self-knowledge with architecture map and self-modification pathway
- Replace flat file list with layered architecture map (config→agent→prompt→tool→memory→interface)
- Add SELF-MODIFICATION section: Timmy knows he can edit his own config and code
- Remove false limitation 'cannot modify own source code'
- Update tests to match new section headers, add self-modification tests

Closes #81 (reasoning depth)
Closes #86 (self-modification awareness)

[loop-cycle-11]
2026-03-14 20:15:30 -04:00
ffae5aa7c6 feat: add codebase self-knowledge to system prompts (#78, #80)
Adds SELF-KNOWLEDGE section to both SYSTEM_PROMPT_LITE and
SYSTEM_PROMPT_FULL with:
- Codebase map (all src/timmy/ modules with descriptions)
- Current capabilities list (grounded, not generic)
- Known limitations (real gaps, not LLM platitudes)

Lite prompt gets condensed version; full prompt gets detailed.
Timmy can now answer 'what does tool_safety.py do?' and give
grounded answers about his actual limitations.

10 new tests. 1456 total passing.
2026-03-14 19:58:10 -04:00
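A minimal sketch of how a two-tier SELF-KNOWLEDGE section could be wired, in the spirit of this commit; the constant names, module descriptions, and selector function are illustrative, not the real prompts.py contents.

```python
# Illustrative sketch of a two-tier SELF-KNOWLEDGE prompt section.
# Constant names and the module list are assumptions, not the shipped text.

SELF_KNOWLEDGE_LITE = (
    "SELF-KNOWLEDGE\n"
    "You are built from modules under src/timmy/ (agent, prompts, tools, memory).\n"
    "State real limitations; do not invent capabilities."
)

SELF_KNOWLEDGE_FULL = (
    "SELF-KNOWLEDGE\n"
    "Codebase map:\n"
    "  src/timmy/agent.py       - agent construction and backend routing\n"
    "  src/timmy/prompts.py     - system prompt tiers\n"
    "  src/timmy/tool_safety.py - tool safety classification\n"
    "Known limitations: local models only; no cloud calls unless permitted."
)

def self_knowledge_section(lite: bool) -> str:
    """Return the condensed or detailed section for the matching prompt tier."""
    return SELF_KNOWLEDGE_LITE if lite else SELF_KNOWLEDGE_FULL
```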
f8f3b9b81f feat: inject session_id into system prompt for session identity awareness
Timmy can now introspect which session he's running in (cli, dashboard, loop).
- Add {session_id} placeholder to both lite and full system prompts
- get_system_prompt() accepts session_id param (default: 'unknown')
- create_timmy() accepts session_id param, forwards to prompt
- CLI chat/think/status pass their session_id to create_timmy()
- session.py passes _DEFAULT_SESSION_ID to create_timmy()
- 7 new tests in test_session_identity.py
- Updated 2 existing CLI test mocks

Closes #64
2026-03-14 19:43:11 -04:00
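A minimal sketch of the session-identity plumbing this commit describes, assuming a {session_id} placeholder in the prompt template; the template text is illustrative, and the real create_timmy() builds an Agno Agent rather than returning the prompt.

```python
# Sketch of session-id injection. The template string is illustrative.
SYSTEM_PROMPT_LITE = "You are Timmy. Current session: {session_id}."

def get_system_prompt(session_id: str = "unknown") -> str:
    """Fill the {session_id} placeholder so Timmy can introspect his session."""
    return SYSTEM_PROMPT_LITE.format(session_id=session_id)

def create_timmy(session_id: str = "unknown") -> str:
    """Forward the caller's session id (cli, dashboard, loop) into the prompt.
    The real function constructs an Agno Agent; this sketch stops at the
    prompt it would pass in."""
    return get_system_prompt(session_id=session_id)
```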
b3a1e0ce36 fix: prune dead web_search tool — ddgs never installed (#87)
Remove DuckDuckGoTools import, all web_search registrations across 4 toolkit
factories, catalog entry, safety classification, prompt references, and
session regex. Total: -41 lines of dead code.

consult_grok is functional (grok_enabled=True, API key set) and opt-in,
so it stays — but Timmy never calls it autonomously, which is correct
sovereign behavior (no cloud calls unless user permits).

Closes #87
2026-03-14 18:13:51 -04:00
061c8f6628 fix: brevity tuning — plain text prompts, markdown=False, front-loaded brevity
Closes #71: Timmy was responding with elaborate markdown formatting
(tables, headers, emoji, bullet lists) for simple questions.

Root causes fixed:
1. Agno Agent markdown=True flag explicitly told the model to format
   responses as markdown. Set to False in both agent.py and agents/base.py.
2. SYSTEM_PROMPT_FULL used ## and ### markdown headers, bold (**), and
   numbered lists — teaching by example that markdown is expected.
   Rewritten to plain text with labeled sections.
3. Brevity instructions were buried at the bottom of the full prompt.
   Moved to immediately after the opening line as 'VOICE AND BREVITY'
   with explicit override priority.
4. Orchestrator prompt in agents.yaml was silent on response style.
   Added 'Voice: brief, plain, direct' with concrete examples.

The full prompt is now 41 lines shorter (124 → 83). The prompt itself
practices the brevity it preaches.

SOUL.md alignment:
- 'Brevity is a kindness' — now front-loaded in both base and agent prompt
- 'I do not fill silence with noise' — explicit in both tiers
- 'I speak plainly. I prefer short sentences.' — structural enforcement

4 new tests guard against regression:
- test_full_prompt_brevity_first: brevity section before tools/memory
- test_full_prompt_no_markdown_headers: no ## or ### in prompt text
- test_full_prompt_plain_text_brevity: 'plain text' instruction present
- test_lite_prompt_brevity: lite tier also instructs brevity
2026-03-14 17:15:56 -04:00
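A sketch of regression guards like the four above, assuming the prompt constants import from src/timmy/prompts.py (pythonpath=src); the "TOOLS" section label is an assumption about the rewritten plain-text prompt.

```python
# Sketch of the brevity regression guards; section labels are assumptions.
from timmy.prompts import SYSTEM_PROMPT_FULL, SYSTEM_PROMPT_LITE

def test_full_prompt_no_markdown_headers():
    # The full prompt must practice the plain-text style it preaches.
    assert "##" not in SYSTEM_PROMPT_FULL

def test_full_prompt_brevity_first():
    # Brevity guidance must appear before later sections ("TOOLS" assumed).
    assert SYSTEM_PROMPT_FULL.index("VOICE AND BREVITY") < SYSTEM_PROMPT_FULL.index("TOOLS")

def test_lite_prompt_brevity():
    # The lite tier must also carry an explicit brevity instruction.
    assert "brev" in SYSTEM_PROMPT_LITE.lower()
```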
64fd1d9829 voice: reinforce brevity at top of system prompt 2026-03-14 16:32:47 -04:00
Alexander Whitestone
fd0ede0d51 feat: auto-escalation system + agentic loop fixes (#149)
Wire up automatic error-to-task escalation and fix the agentic loop
stopping after the first tool call.

Auto-escalation:
- Add swarm.task_queue.models with create_task() bridge to existing
  task queue SQLite DB
- Add swarm.event_log with EventType enum, log_event(), and SQLite
  persistence + WebSocket broadcast
- Wire capture_error() into request logging middleware so unhandled
  HTTP exceptions auto-create [BUG] tasks with stack traces, git
  context, and push notifications (5-min dedup window)

Agentic loop (Round 11 Bug #1):
- Wrap agent_chat() in asyncio.to_thread() to stop blocking the
  event loop (fixes Discord heartbeat warnings)
- Enable Agno's native multi-turn tool chaining via show_tool_calls
  and tool_call_limit on the Agent config
- Strengthen multi-step continuation prompts with explicit examples

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 03:11:14 -04:00
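A minimal sketch of the event-loop fix, assuming agent_chat() is a blocking call into the agent: asyncio.to_thread() moves the work onto a worker thread so the loop stays free for Discord heartbeats.

```python
# Sketch of the asyncio.to_thread() fix for the blocked event loop.
import asyncio

def agent_chat(message: str) -> str:
    """Blocking call into the agent (placeholder for the real implementation)."""
    return f"echo: {message}"

async def handle_message(message: str) -> str:
    # Hand the blocking work to a worker thread so the event loop keeps
    # servicing heartbeats and other coroutines in the meantime.
    return await asyncio.to_thread(agent_chat, message)

if __name__ == "__main__":
    print(asyncio.run(handle_message("ping")))
```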
Alexander Whitestone
7792ae745f feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup

- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
  words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in the get_memory_status() preview so it survives
  JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
  and untrack memory/self/methodology.md (runtime-edited file)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: catch Ollama connection errors in session.py + add 71 smoke tests

- Wrap agent.run() in session.py with try/except so Ollama connection
  failures return a graceful fallback message instead of dumping raw
  tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
  core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
  — catches import errors, template failures, and schema mismatches
  that unit tests miss

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: agentic loop for multi-step tasks + Round 10 regression fixes

Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop

Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI when agno is MagicMock stub

_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI — graceful degradation at route level

When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.

Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
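A sketch of the parametrized no-500 sweep from test_smoke.py, assuming the FastAPI app imports from a dashboard.app module (an assumed path); the route list is an illustrative subset of the real 71.

```python
# Sketch of the no-500 smoke sweep; import path and routes are illustrative.
import pytest
from fastapi.testclient import TestClient
from dashboard.app import app  # assumed module path for the dashboard app

client = TestClient(app)

GET_ROUTES = ["/", "/health", "/agents", "/briefing"]  # subset, not all 71

@pytest.mark.parametrize("path", GET_ROUTES)
def test_get_route_never_500(path):
    # Catches import errors, template failures, and schema mismatches that
    # unit tests miss: every GET route must render without a server error.
    response = client.get(path)
    assert response.status_code < 500
```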
Alexander Whitestone
2b97da9e9c Add pre-commit hook enforcing 30s test suite time limit (#132) 2026-03-05 19:45:38 -05:00
Alexander Whitestone
425e7da380 Claude/remove persona system (#126)
* Remove persona system, identity, and all Timmy references

Strip the codebase to pure orchestration logic:

- Delete TIMMY_IDENTITY.md and memory/self/identity.md
- Gut brain/identity.py to no-op stubs (empty returns)
- Remove all system prompts reinforcing Timmy's character, faith,
  sovereignty, sign-off ("Sir, affirmative"), and agent roster
- Replace identity-laden prompts with generic local-AI-assistant prompts
- Remove "You work for Timmy" from all sub-agent system prompts
- Rename PersonaTools → AgentTools, PERSONA_TOOLKITS → AGENT_TOOLKITS
- Replace "timmy" agent ID with "orchestrator" across routes, marketplace,
  tools catalog, and orchestrator class
- Strip Timmy references from config comments, templates, telegram bot,
  chat API, and dashboard UI
- Delete tests/brain/test_identity.py entirely
- Fix all test assertions that checked for persona identity content

729 tests pass (2 pre-existing failures in test_calm.py unrelated).

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

* Add Taskosaur (PM + AI task execution) to docker-compose

Spins up Taskosaur alongside the dashboard on `docker compose up`:
- postgres:16-alpine (port 5432, Taskosaur DB)
- redis:7-alpine (Bull queue backend)
- taskosaur (ports 3000 API / 3001 UI)
- dashboard now depends_on taskosaur healthy
- TASKOSAUR_API_URL injected into dashboard environment

Dashboard can reach Taskosaur at http://taskosaur:3000/api on the
internal network. Frontend UI accessible at http://localhost:3001.

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-04 12:00:49 -05:00
Alexander Whitestone
18ed6232f9 feat: Timmy fixes and improvements (#72)
* test: remove hardcoded sleeps, add pytest-timeout

- Replace fixed time.sleep() calls with intelligent polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed

* feat: add Aider AI tool to Forge's toolkit

- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool

* config: add opencode.json with local Ollama provider for sovereign AI

* feat: Timmy fixes and improvements

## Bug Fixes
- Fix read_file path resolution: add ~ expansion, proper relative path handling
- Add repo_root to config.py with auto-detection from .git location
- Fix hardcoded llama3.2 - now dynamic from settings.ollama_model

## Timmy's Requests
- Add communication protocol to AGENTS.md (read context first, explain changes)
- Create DECISIONS.md for architectural decision documentation
- Add reasoning guidance to system prompts (step-by-step, state uncertainty)
- Update tests to reflect correct model name (llama3.1:8b-instruct)

## Testing
- All 177 dashboard tests pass
- All 32 prompt/tool tests pass

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-26 23:39:13 -05:00
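A sketch of the read_file path resolution described above: ~ expansion plus relative paths anchored at a repo root auto-detected from the .git location. Function names are illustrative.

```python
# Sketch of ~ expansion and repo-root anchoring for read_file paths.
from pathlib import Path
from typing import Optional

def find_repo_root(start: Optional[Path] = None) -> Path:
    """Walk upward until a .git directory marks the repository root."""
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return current  # fall back to the starting directory

def resolve_read_path(raw: str) -> Path:
    """Expand ~ and anchor relative paths at the repo root."""
    path = Path(raw).expanduser()
    return path if path.is_absolute() else find_repo_root() / path
```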
Alexander Payne
3ca8e9f2d6 fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering
Addresses 14 bugs from 3 rounds of deep chat evaluation:

- Add chat-to-task pipeline in agents.py with regex-based intent detection,
  agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
  math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:42:42 -05:00
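A sketch of the regex-based intent detection with meta-question filtering; the patterns are illustrative and simpler than whatever agents.py actually ships.

```python
# Sketch of chat-to-task intent detection; patterns are illustrative.
import re

_TASK_INTENT = re.compile(r"\b(create|add|make)\b.*\btask\b", re.IGNORECASE)
_META_QUESTION = re.compile(r"\bhow do i\b", re.IGNORECASE)

def detect_task_intent(message: str) -> bool:
    """True only for genuine task requests, not questions about tasks."""
    if _META_QUESTION.search(message):
        return False  # "how do I create a task?" must not create a task
    return bool(_TASK_INTENT.search(message))

assert detect_task_intent("please create a task to fix the build")
assert not detect_task_intent("How do I create a task?")
```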
Alexander Payne
6e6b4355bb fix: calculator tool, markdown rendering, prompt guardrails, briefing notification
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
  get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
  calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
  instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
  pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:35:59 -05:00
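One way to build the sandboxed calculator: evaluate a whitelisted arithmetic AST instead of calling eval(), so multi-digit math is exact and no code can execute. A sketch, not the shipped tool.

```python
# Sketch of a sandboxed calculator: AST whitelist, no eval(), no code execution.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Evaluate +, -, *, /, ** and unary minus; reject everything else."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

assert calculate("12 * (3 + 4)") == 84
```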
Alexander Payne
26e1691099 Fix Timmy coherence: persistent session, model-aware tools, response sanitization
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.

Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:18:08 -05:00
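A sketch of _model_supports_tools() as a size heuristic: parse the parameter count out of an Ollama model tag and gate tools at roughly 7B. The regex and threshold are assumptions consistent with the commit.

```python
# Sketch of model-aware tool gating; regex and threshold are assumptions.
import re

def _model_supports_tools(model_name: str, min_billions: float = 7.0) -> bool:
    """Return False for small models like llama3.2 (3B) that can't follow
    complex tool-calling instructions; True when the tag shows >= 7B params."""
    match = re.search(r"(\d+(?:\.\d+)?)b\b", model_name.lower())
    if match is None:
        return False  # unknown size: be conservative, strip the tools
    return float(match.group(1)) >= min_billions

assert _model_supports_tools("llama3.1:8b-instruct")
assert not _model_supports_tools("llama3.2:3b")
```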
Alexander Payne
16b65b28e8 Add Tier 3: Semantic Memory (vector search)
Completes the three-tier memory architecture:

## Tier 3 — Semantic Search
- Vector embeddings over all vault files
- Similarity-based retrieval
- memory_search tool for agents
- Fallback to hash-based embeddings if transformers unavailable

## Implementation
- src/timmy/semantic_memory.py — Core semantic memory
- Chunking strategy: paragraphs → sentences
- SQLite storage for vectors
- cosine_similarity for ranking

## Integration
- Added memory_search to create_full_toolkit()
- Updated prompts with memory_search examples
- Tool triggers: past conversations, reminders

## Features
- Automatic vault indexing
- Source file tracking (re-indexes on change)
- Similarity scoring
- Context retrieval for queries

All 973 tests pass.
2026-02-25 18:25:20 -05:00
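The ranking core named above, sketched without numpy: cosine similarity between an embedded query and stored chunk vectors. rank_chunks() is an illustrative stand-in for memory_search's ranking step.

```python
# Sketch of cosine-similarity ranking over stored chunk vectors.
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors; 1.0 = identical."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_chunks(query_vec, chunks):
    """Return (score, text) pairs sorted best-first, as memory_search might."""
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in chunks]
    return sorted(scored, reverse=True)
```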
Alexander Payne
625806daf5 Fine-tune Timmy's conversational AI with memory layers
## Enhanced System Prompt
- Detailed tool usage guidelines with explicit examples
- Clear DO and DON'T examples for tool selection
- Memory system documentation
- Conversation flow guidelines
- Context awareness instructions

## Memory Layer System (NEW)
Implemented 3-layer memory architecture:

1. WORKING MEMORY (src/timmy/memory_layers.py)
   - Immediate context (last 20 messages)
   - Topic tracking
   - Tool call tracking
   - Fast, ephemeral

2. SHORT-TERM MEMORY (Agno SQLite)
   - Recent conversations (100)
   - Persists across restarts
   - Managed by Agno Agent

3. LONG-TERM MEMORY (src/timmy/memory_layers.py)
   - Facts about user (name, preferences)
   - SQLite storage in data/memory/
   - Auto-extraction from conversations
   - User profile generation

## Memory Manager (NEW)
- Central coordinator for all memory layers
- Context injection into prompts
- Fact extraction and storage
- Session management

## TimmyWithMemory Class (NEW)
- Wrapper around Agno Agent with explicit memory
- Auto-injects user context from LTM
- Tracks exchanges across all layers
- Simple chat() interface

## Agent Configuration
- Increased num_history_runs: 10 -> 20
- Better conversational context retention

## Tests
- All 973 tests pass
- Fixed test expectations for new config
- Fixed module path in test_scary_paths.py

## Files Added/Modified
- src/timmy/prompts.py - Enhanced with memory and tool guidance
- src/timmy/agent.py - Added TimmyWithMemory class
- src/timmy/memory_layers.py - NEW memory system
- src/timmy/conversation.py - NEW conversation manager
- tests/ - Updated for new config
2026-02-25 18:07:44 -05:00
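A sketch of the working-memory layer (last 20 messages, fast, ephemeral); the class name and fields are illustrative, following the layer's description rather than the real memory_layers.py.

```python
# Sketch of the working-memory layer: bounded, in-process, not persisted.
from collections import deque

class WorkingMemory:
    """Immediate conversational context for the innermost memory layer."""

    def __init__(self, capacity: int = 20):
        self.messages = deque(maxlen=capacity)  # oldest entries fall off
        self.tool_calls = []                    # tool invocations this session

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def context(self) -> str:
        """Render recent turns for injection into the prompt."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)
```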
Alexander Payne
4961c610f2 Security, privacy, and agent intelligence hardening
## Security (Workset A)
- XSS: Verified templates use safe DOM methods (textContent, createElement)
- Secrets: Fail-fast in production mode when L402 secrets not set
- Environment mode: Add TIMMY_ENV (development|production) validation

## Privacy (Workset C)
- Add telemetry_enabled config (default: False for sovereign AI)
- Pass telemetry setting to Agno Agent
- Update .env.example with TELEMETRY_ENABLED and TIMMY_ENV docs

## Agent Intelligence (Workset D)
- Enhanced TIMMY_SYSTEM_PROMPT with:
  - Tool usage guidelines (when to use, when not to)
  - Memory awareness documentation
  - Operating mode documentation
- Help reduce unnecessary tool calls for simple queries

All 895 tests pass.
Telemetry disabled by default aligns with sovereign AI vision.
2026-02-25 15:32:19 -05:00
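A sketch of the fail-fast behavior, assuming an L402_SECRET environment variable (the real secret names may differ): production mode refuses to start with secrets unset, while development stays permissive.

```python
# Sketch of TIMMY_ENV validation and production fail-fast.
# L402_SECRET is an assumed variable name, not the project's actual one.
import os

def validate_environment() -> None:
    env = os.environ.get("TIMMY_ENV", "development")
    if env not in ("development", "production"):
        raise RuntimeError(f"TIMMY_ENV must be development|production, got {env!r}")
    if env == "production" and not os.environ.get("L402_SECRET"):
        # Fail fast: a production deployment with unset secrets is a bug,
        # not a configuration style.
        raise RuntimeError("L402 secrets must be set in production mode")
```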
Claude
19af4ae540 feat: integrate AirLLM as optional high-performance backend
Adds the `bigbrain` optional dependency group (airllm>=2.9.0) and a
complete second inference path that runs 8B / 70B / 405B Llama models
locally via layer-by-layer loading — no GPU required, no cloud, fully
sovereign.

Key changes:
- src/timmy/backends.py   — TimmyAirLLMAgent (same print_response interface
                            as Agno Agent); auto-selects AirLLMMLX on Apple
                            Silicon, AutoModel (PyTorch) everywhere else
- src/timmy/agent.py      — _resolve_backend() routing with explicit override,
                            env-config, and 'auto' Apple-Silicon detection
- src/timmy/cli.py        — --backend / --model-size flags on all commands
- src/config.py           — timmy_model_backend + airllm_model_size settings
- src/timmy/prompts.py    — mentions AirLLM "even bigger brains, still fully
                            sovereign"
- pyproject.toml          — bigbrain optional dep; wheel include list updated
- .env.example            — TIMMY_MODEL_BACKEND + AIRLLM_MODEL_SIZE docs
- tests/conftest.py       — stubs 'airllm' module so tests run without GPU
- tests/test_backends.py  — 13 new tests covering helpers + TimmyAirLLMAgent
- tests/test_agent.py     — 7 new tests for backend routing
- README.md               — Big Brain section with one-line install
- activate_self_tdd.sh    — bootstrap script (venv + install + tests +
                            watchdog + dashboard); --big-brain flag

All 61 tests pass. Self-TDD watchdog unaffected.

https://claude.ai/code/session_01DMjQ5qMZ8iHeyix1j3GS7c
2026-02-21 16:53:16 +00:00
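A sketch of the 'auto' routing: an explicit override wins, otherwise Apple Silicon detection picks the MLX loader. The returned backend identifiers are illustrative.

```python
# Sketch of 'auto' backend resolution; return values are illustrative labels.
import platform
import sys

def _resolve_backend(override: str = "auto") -> str:
    if override != "auto":
        return override  # explicit --backend flag or env-config wins
    if sys.platform == "darwin" and platform.machine() == "arm64":
        return "airllm-mlx"   # AirLLMMLX path on Apple Silicon
    return "airllm-automodel"  # AutoModel (PyTorch) everywhere else
```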
Claude
5e7d805245 feat: scaffold Timmy Time Mission Control (v1.0.0 Genesis)
- src/timmy/ — Agno agent wrapper (llama3.2 via Ollama, SQLite memory, TIMMY_SYSTEM_PROMPT)
- src/dashboard/ — FastAPI + HTMX + Jinja2 Mission Control UI
  - /health + /health/status (Ollama ping, HTMX 30s poll)
  - /agents list + /agents/timmy/chat (HTMX form submission)
- static/style.css — dark terminal mission-control aesthetic
- tests/ — 27 pytest tests (prompts, agent config, dashboard routes); no Ollama required
- pyproject.toml — hatchling build, pytest configured with pythonpath=src

https://claude.ai/code/session_01M4L3R98N5fgXFZRvV8X9b6
2026-02-19 19:05:01 +00:00
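A sketch of the /health Ollama ping, assuming Ollama's default local endpoint and an httpx dependency; the real route also drives an HTMX 30-second poll.

```python
# Sketch of the health route with an Ollama ping; endpoint is Ollama's default.
import httpx
from fastapi import FastAPI

app = FastAPI()
OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

@app.get("/health")
async def health() -> dict:
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            resp = await client.get(OLLAMA_URL)
        ollama_up = resp.status_code == 200
    except httpx.HTTPError:
        ollama_up = False
    return {"status": "ok", "ollama": "up" if ollama_up else "down"}
```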