feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
"""Unit tests for the agentic loop module.
|
|
|
|
|
|
2026-03-18 20:54:02 -04:00
|
|
|
Tests cover data structures, plan parsing, planning, execution,
|
|
|
|
|
max_steps enforcement, failure adaptation, double-failure,
|
|
|
|
|
progress callbacks, broadcast helper, summary logic, and
|
|
|
|
|
response cleaning.
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
"""
|
|
|
|
|
|
2026-03-08 12:50:44 -04:00
|
|
|
from unittest.mock import AsyncMock, MagicMock, patch
|
|
|
|
|
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
import pytest
|
|
|
|
|
|
2026-03-18 20:54:02 -04:00
|
|
|
from timmy.agentic_loop import (
|
|
|
|
|
AgenticResult,
|
|
|
|
|
AgenticStep,
|
|
|
|
|
_broadcast_progress,
|
|
|
|
|
_parse_steps,
|
|
|
|
|
run_agentic_loop,
|
|
|
|
|
)
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Helpers
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
2026-03-08 12:50:44 -04:00
|
|
|
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
def _mock_run(content: str):
|
|
|
|
|
"""Create a mock return value for agent.run()."""
|
|
|
|
|
m = MagicMock()
|
|
|
|
|
m.content = content
|
|
|
|
|
return m
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# _parse_steps
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
2026-03-08 12:50:44 -04:00
|
|
|
|
2026-03-18 20:54:02 -04:00
|
|
|
class TestDataStructures:
|
|
|
|
|
def test_agentic_step_fields(self):
|
|
|
|
|
step = AgenticStep(
|
|
|
|
|
step_num=1, description="Do X", result="Done", status="completed", duration_ms=42
|
|
|
|
|
)
|
|
|
|
|
assert step.step_num == 1
|
|
|
|
|
assert step.status == "completed"
|
|
|
|
|
assert step.duration_ms == 42
|
|
|
|
|
|
|
|
|
|
def test_agentic_result_defaults(self):
|
|
|
|
|
r = AgenticResult(task_id="abc", task="test", summary="ok")
|
|
|
|
|
assert r.steps == []
|
|
|
|
|
assert r.status == "completed"
|
|
|
|
|
assert r.total_duration_ms == 0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# _parse_steps
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
class TestParseSteps:
|
|
|
|
|
def test_numbered_with_dot(self):
|
|
|
|
|
text = "1. Search for data\n2. Write to file\n3. Verify"
|
|
|
|
|
assert _parse_steps(text) == ["Search for data", "Write to file", "Verify"]
|
|
|
|
|
|
|
|
|
|
def test_numbered_with_paren(self):
|
|
|
|
|
text = "1) Read config\n2) Update value\n3) Restart"
|
|
|
|
|
assert _parse_steps(text) == ["Read config", "Update value", "Restart"]
|
|
|
|
|
|
|
|
|
|
def test_fallback_plain_lines(self):
|
|
|
|
|
text = "Search the web\nWrite results\nDone"
|
|
|
|
|
assert _parse_steps(text) == ["Search the web", "Write results", "Done"]
|
|
|
|
|
|
|
|
|
|
def test_empty_returns_empty(self):
|
|
|
|
|
assert _parse_steps("") == []
|
|
|
|
|
|
2026-03-18 20:54:02 -04:00
|
|
|
def test_whitespace_only_returns_empty(self):
|
|
|
|
|
assert _parse_steps(" \n \n ") == []
|
|
|
|
|
|
|
|
|
|
def test_leading_whitespace_in_numbered(self):
|
|
|
|
|
text = " 1. First\n 2. Second"
|
|
|
|
|
assert _parse_steps(text) == ["First", "Second"]
|
|
|
|
|
|
|
|
|
|
def test_mixed_numbered_and_plain(self):
|
|
|
|
|
"""When numbered lines are present, only those are returned."""
|
|
|
|
|
text = "Here is the plan:\n1. Step one\n2. Step two\nGood luck!"
|
|
|
|
|
result = _parse_steps(text)
|
|
|
|
|
assert result == ["Step one", "Step two"]
|
|
|
|
|
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# run_agentic_loop
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
2026-03-08 12:50:44 -04:00
|
|
|
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_planning_phase_produces_steps():
|
|
|
|
|
"""Planning prompt returns numbered step list."""
|
|
|
|
|
mock_agent = MagicMock()
|
2026-03-08 12:50:44 -04:00
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. Search AI news\n2. Write to file\n3. Verify"),
|
|
|
|
|
_mock_run("Found 5 articles about AI."),
|
|
|
|
|
_mock_run("Wrote summary to /tmp/ai_news.md"),
|
|
|
|
|
_mock_run("File verified, 15 lines."),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
result = await run_agentic_loop("Search AI news and write summary")
|
|
|
|
|
|
|
|
|
|
assert result.status == "completed"
|
|
|
|
|
assert len(result.steps) == 3
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_loop_executes_all_steps():
|
|
|
|
|
"""Loop calls agent.run() for plan + each step + summary."""
|
|
|
|
|
mock_agent = MagicMock()
|
2026-03-08 12:50:44 -04:00
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. Do A\n2. Do B"),
|
|
|
|
|
_mock_run("A done"),
|
|
|
|
|
_mock_run("B done"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
result = await run_agentic_loop("Do A and B")
|
|
|
|
|
|
2026-03-14 20:54:33 -04:00
|
|
|
# plan + 2 steps = 3 calls
|
|
|
|
|
assert mock_agent.run.call_count == 3
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
assert len(result.steps) == 2
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_loop_respects_max_steps():
|
|
|
|
|
"""Loop stops at max_steps and returns status='partial'."""
|
|
|
|
|
mock_agent = MagicMock()
|
2026-03-08 12:50:44 -04:00
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. A\n2. B\n3. C\n4. D\n5. E"),
|
|
|
|
|
_mock_run("A done"),
|
|
|
|
|
_mock_run("B done"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
result = await run_agentic_loop("Do 5 things", max_steps=2)
|
|
|
|
|
|
|
|
|
|
assert len(result.steps) == 2
|
|
|
|
|
assert result.status == "partial"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_failure_triggers_adaptation():
|
|
|
|
|
"""Failed step feeds error back to model, step marked as adapted."""
|
|
|
|
|
mock_agent = MagicMock()
|
2026-03-08 12:50:44 -04:00
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. Read config\n2. Update setting\n3. Verify"),
|
|
|
|
|
_mock_run("Config: timeout=30"),
|
|
|
|
|
Exception("Permission denied"),
|
|
|
|
|
_mock_run("Adapted: wrote to ~/config.yaml instead"),
|
|
|
|
|
_mock_run("Verified: timeout=60"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
result = await run_agentic_loop("Update config timeout to 60")
|
|
|
|
|
|
|
|
|
|
assert result.status == "completed"
|
|
|
|
|
assert any(s.status == "adapted" for s in result.steps)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_progress_callback_fires():
|
|
|
|
|
"""on_progress called for each step completion."""
|
|
|
|
|
events = []
|
|
|
|
|
|
|
|
|
|
async def on_progress(desc, step, total):
|
|
|
|
|
events.append((step, total))
|
|
|
|
|
|
|
|
|
|
mock_agent = MagicMock()
|
2026-03-08 12:50:44 -04:00
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. Do A\n2. Do B"),
|
|
|
|
|
_mock_run("A done"),
|
|
|
|
|
_mock_run("B done"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
await run_agentic_loop("Do A and B", on_progress=on_progress)
|
|
|
|
|
|
|
|
|
|
assert len(events) == 2
|
|
|
|
|
assert events[0] == (1, 2)
|
|
|
|
|
assert events[1] == (2, 2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_result_contains_step_metadata():
|
|
|
|
|
"""AgenticResult.steps has status and duration per step."""
|
|
|
|
|
mock_agent = MagicMock()
|
2026-03-08 12:50:44 -04:00
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. Search\n2. Write"),
|
|
|
|
|
_mock_run("Found results"),
|
|
|
|
|
_mock_run("Written to file"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
result = await run_agentic_loop("Search and write")
|
|
|
|
|
|
|
|
|
|
for step in result.steps:
|
|
|
|
|
assert step.status in ("completed", "failed", "adapted")
|
|
|
|
|
assert step.duration_ms >= 0
|
|
|
|
|
assert step.description
|
|
|
|
|
assert step.result
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_config_default_used():
|
|
|
|
|
"""When max_steps=0, uses settings.max_agent_steps."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
# Return more steps than default config allows (10)
|
|
|
|
|
steps_text = "\n".join(f"{i}. Step {i}" for i in range(1, 15))
|
|
|
|
|
side_effects = [_mock_run(steps_text)]
|
2026-03-14 20:54:33 -04:00
|
|
|
# 10 step results
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
for i in range(1, 11):
|
|
|
|
|
side_effects.append(_mock_run(f"Step {i} done"))
|
|
|
|
|
|
|
|
|
|
mock_agent.run = MagicMock(side_effect=side_effects)
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
result = await run_agentic_loop("Do 14 things", max_steps=0)
|
|
|
|
|
|
|
|
|
|
# Should be capped at 10 (config default)
|
|
|
|
|
assert len(result.steps) == 10
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_planning_failure_returns_failed():
|
|
|
|
|
"""If the planning phase fails, result.status is 'failed'."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(side_effect=Exception("Model offline"))
|
|
|
|
|
|
ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
2026-03-08 12:50:44 -04:00
|
|
|
):
|
feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
|
|
|
result = await run_agentic_loop("Do something")
|
|
|
|
|
|
|
|
|
|
assert result.status == "failed"
|
|
|
|
|
assert "Planning failed" in result.summary
|
2026-03-18 20:54:02 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_empty_plan_returns_failed():
|
|
|
|
|
"""Planning that produces no steps results in 'failed'."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(return_value=_mock_run(""))
|
|
|
|
|
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
|
|
|
|
):
|
|
|
|
|
result = await run_agentic_loop("Do nothing")
|
|
|
|
|
|
|
|
|
|
assert result.status == "failed"
|
|
|
|
|
assert "no steps" in result.summary.lower()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_double_failure_marks_step_failed():
|
|
|
|
|
"""When both execution and adaptation fail, step status is 'failed'."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. Do something"),
|
|
|
|
|
Exception("Step failed"),
|
|
|
|
|
Exception("Adaptation also failed"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
|
|
|
|
):
|
|
|
|
|
result = await run_agentic_loop("Try and fail", max_steps=1)
|
|
|
|
|
|
|
|
|
|
assert len(result.steps) == 1
|
|
|
|
|
assert result.steps[0].status == "failed"
|
|
|
|
|
assert "Failed" in result.steps[0].result
|
|
|
|
|
assert result.status == "partial"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_broadcast_progress_ignores_ws_errors():
|
|
|
|
|
"""_broadcast_progress swallows import/connection errors."""
|
|
|
|
|
with patch(
|
|
|
|
|
"timmy.agentic_loop.ws_manager",
|
|
|
|
|
create=True,
|
|
|
|
|
side_effect=ImportError("no ws"),
|
|
|
|
|
):
|
|
|
|
|
# Should not raise
|
|
|
|
|
await _broadcast_progress("test.event", {"key": "value"})
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_broadcast_progress_sends_to_ws():
|
|
|
|
|
"""_broadcast_progress calls ws_manager.broadcast."""
|
|
|
|
|
mock_ws = AsyncMock()
|
|
|
|
|
with patch("infrastructure.ws_manager.handler.ws_manager", mock_ws):
|
|
|
|
|
await _broadcast_progress("agentic.plan_ready", {"task_id": "abc"})
|
|
|
|
|
mock_ws.broadcast.assert_awaited_once_with("agentic.plan_ready", {"task_id": "abc"})
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_summary_counts_step_statuses():
|
|
|
|
|
"""Summary string includes completed, adapted, and failed counts."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. A\n2. B\n3. C"),
|
|
|
|
|
_mock_run("A done"),
|
|
|
|
|
Exception("B broke"),
|
|
|
|
|
_mock_run("B adapted"),
|
|
|
|
|
Exception("C broke"),
|
|
|
|
|
Exception("C adapt broke too"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
|
|
|
|
):
|
|
|
|
|
result = await run_agentic_loop("A B C", max_steps=3)
|
|
|
|
|
|
|
|
|
|
assert "1 adapted" in result.summary
|
|
|
|
|
assert "1 failed" in result.summary
|
|
|
|
|
assert result.status == "partial"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_task_id_is_set():
|
|
|
|
|
"""Result has a non-empty task_id."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(side_effect=[_mock_run("1. X"), _mock_run("done")])
|
|
|
|
|
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
|
|
|
|
):
|
|
|
|
|
result = await run_agentic_loop("One step")
|
|
|
|
|
|
|
|
|
|
assert result.task_id
|
|
|
|
|
assert len(result.task_id) == 8
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_total_duration_is_set():
|
|
|
|
|
"""Result.total_duration_ms is a positive integer."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(side_effect=[_mock_run("1. X"), _mock_run("done")])
|
|
|
|
|
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
|
|
|
|
):
|
|
|
|
|
result = await run_agentic_loop("Quick task")
|
|
|
|
|
|
|
|
|
|
assert result.total_duration_ms >= 0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_agent_run_without_content_attr():
|
|
|
|
|
"""When agent.run() returns an object without .content, str() is used."""
|
|
|
|
|
|
|
|
|
|
class PlanResult:
|
|
|
|
|
def __str__(self):
|
|
|
|
|
return "1. Only step"
|
|
|
|
|
|
|
|
|
|
class StepResult:
|
|
|
|
|
def __str__(self):
|
|
|
|
|
return "Step result"
|
|
|
|
|
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(side_effect=[PlanResult(), StepResult()])
|
|
|
|
|
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
|
|
|
|
):
|
|
|
|
|
result = await run_agentic_loop("Fallback test", max_steps=1)
|
|
|
|
|
|
|
|
|
|
assert len(result.steps) == 1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_adapted_step_calls_on_progress():
|
|
|
|
|
"""on_progress is called even for adapted steps."""
|
|
|
|
|
events = []
|
|
|
|
|
|
|
|
|
|
async def on_progress(desc, step, total):
|
|
|
|
|
events.append((desc, step))
|
|
|
|
|
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(
|
|
|
|
|
side_effect=[
|
|
|
|
|
_mock_run("1. Risky step"),
|
|
|
|
|
Exception("boom"),
|
|
|
|
|
_mock_run("Adapted result"),
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", new_callable=AsyncMock),
|
|
|
|
|
):
|
|
|
|
|
await run_agentic_loop("Adapt test", max_steps=1, on_progress=on_progress)
|
|
|
|
|
|
|
|
|
|
assert len(events) == 1
|
|
|
|
|
assert "[Adapted]" in events[0][0]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
|
|
|
async def test_broadcast_called_for_each_phase():
|
|
|
|
|
"""_broadcast_progress is called for plan_ready, step_complete, and task_complete."""
|
|
|
|
|
mock_agent = MagicMock()
|
|
|
|
|
mock_agent.run = MagicMock(side_effect=[_mock_run("1. Do it"), _mock_run("Done")])
|
|
|
|
|
|
|
|
|
|
broadcast = AsyncMock()
|
|
|
|
|
with (
|
|
|
|
|
patch("timmy.agentic_loop._get_loop_agent", return_value=mock_agent),
|
|
|
|
|
patch("timmy.agentic_loop._broadcast_progress", broadcast),
|
|
|
|
|
):
|
|
|
|
|
await run_agentic_loop("One step task", max_steps=1)
|
|
|
|
|
|
|
|
|
|
event_names = [call.args[0] for call in broadcast.call_args_list]
|
|
|
|
|
assert "agentic.plan_ready" in event_names
|
|
|
|
|
assert "agentic.step_complete" in event_names
|
|
|
|
|
assert "agentic.task_complete" in event_names
|