test: add 157 functional tests covering 8 low-coverage modules
Analyze test coverage (75.3% → 85.4%) and add functional test suites
for the major gaps identified:
- test_agent_core.py: Full coverage for agent_core/interface.py (0→100%)
and agent_core/ollama_adapter.py (0→100%) — data classes, factories,
abstract enforcement, perceive/reason/act/recall workflow, effect logging
- test_docker_runner.py: Full coverage for swarm/docker_runner.py (0→100%)
— container spawn/stop/list lifecycle with mocked subprocess
- test_timmy_tools.py: Tool usage tracking, persona toolkit mapping,
catalog generation, graceful degradation without Agno
- test_routes_tools.py: /tools page, API stats endpoint, and WebSocket
/swarm/live connect/disconnect/send lifecycle (41→82%)
- test_voice_tts_functional.py: VoiceTTS init, speak, volume clamping,
voice listing, graceful degradation (41→94%)
- test_watchdog_functional.py: _run_tests, watch loop state transitions,
regression detection, KeyboardInterrupt (47→97%)
- test_lnd_backend.py: LND init from params/env, grpc stub enforcement,
method-level BackendNotAvailableError, settle returns False (25→61%)
- test_swarm_routes_functional.py: Agent spawn/stop, task CRUD, auction,
insights, UI partials, error paths (63→92%)
https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM
2026-02-24 23:36:50 +00:00

"""Functional tests for timmy.tools — tool tracking, persona toolkits, catalog.

Covers tool usage statistics, persona-to-toolkit mapping, catalog generation,
and graceful degradation when Agno is unavailable.
"""

import pytest

from timmy.tools import (
    _TOOL_USAGE,
    PERSONA_TOOLKITS,
    _track_tool_usage,
    get_all_available_tools,
    get_tool_stats,
    get_tools_for_persona,
)


@pytest.fixture(autouse=True)
def clear_usage():
    """Clear tool usage tracking between tests."""
    _TOOL_USAGE.clear()
    yield
    _TOOL_USAGE.clear()


# ── Tool usage tracking ──────────────────────────────────────────────────────


class TestToolTracking:
    def test_track_creates_agent_entry(self):
        _track_tool_usage("agent-1", "calculator", success=True)
        assert "agent-1" in _TOOL_USAGE
        assert len(_TOOL_USAGE["agent-1"]) == 1

    def test_track_records_metadata(self):
        _track_tool_usage("agent-1", "shell", success=False)
        entry = _TOOL_USAGE["agent-1"][0]
        assert entry["tool"] == "shell"
        assert entry["success"] is False
        assert "timestamp" in entry

    def test_track_multiple_calls(self):
        _track_tool_usage("a1", "search")
        _track_tool_usage("a1", "read")
        _track_tool_usage("a1", "search")
        assert len(_TOOL_USAGE["a1"]) == 3

    def test_track_multiple_agents(self):
        _track_tool_usage("a1", "search")
        _track_tool_usage("a2", "shell")
        assert len(_TOOL_USAGE) == 2


class TestGetToolStats:
    def test_stats_for_specific_agent(self):
        _track_tool_usage("a1", "search")
        _track_tool_usage("a1", "read")
        _track_tool_usage("a1", "search")
        stats = get_tool_stats("a1")
        assert stats["agent_id"] == "a1"
        assert stats["total_calls"] == 3
        assert set(stats["tools_used"]) == {"search", "read"}
        assert len(stats["recent_calls"]) == 3

    def test_stats_for_unknown_agent(self):
        stats = get_tool_stats("nonexistent")
        assert stats["total_calls"] == 0
        assert stats["tools_used"] == []
        assert stats["recent_calls"] == []

    def test_stats_recent_capped_at_10(self):
        for i in range(15):
            _track_tool_usage("a1", f"tool_{i}")
        stats = get_tool_stats("a1")
        assert len(stats["recent_calls"]) == 10

    def test_stats_all_agents(self):
        _track_tool_usage("a1", "search")
        _track_tool_usage("a2", "shell")
        _track_tool_usage("a2", "read")
        stats = get_tool_stats()
        assert "a1" in stats
        assert "a2" in stats
        assert stats["a1"]["total_calls"] == 1
        assert stats["a2"]["total_calls"] == 2

    def test_stats_empty(self):
        stats = get_tool_stats()
        assert stats == {}


# ── Persona toolkit mapping ──────────────────────────────────────────────────


class TestPersonaToolkits:
    def test_all_expected_personas_present(self):
        expected = {
            "echo",
            "mace",
            "helm",
            "seer",
            "forge",
            "quill",
            "lab",
            "pixel",
            "lyra",
            "reel",
        }
        assert set(PERSONA_TOOLKITS.keys()) == expected

    def test_get_tools_for_known_persona_returns_toolkit(self):
        """Known personas should return a Toolkit with registered tools."""
        result = get_tools_for_persona("echo")
        assert result is not None

    def test_get_tools_for_unknown_persona(self):
        result = get_tools_for_persona("nonexistent")
        assert result is None

    def test_creative_personas_return_toolkit(self):
        """Creative personas (pixel, lyra, reel) return toolkits."""
        for persona_id in ("pixel", "lyra", "reel"):
            result = get_tools_for_persona(persona_id)
            assert result is not None


# ── Tool catalog ─────────────────────────────────────────────────────────────


class TestToolCatalog:
    def test_catalog_contains_base_tools(self):
        catalog = get_all_available_tools()
        base_tools = {
            "shell",
            "python",
            "read_file",
            "write_file",
            "list_files",
        }
        for tool_id in base_tools:
            assert tool_id in catalog, f"Missing base tool: {tool_id}"
        # web_search removed — dead code, ddgs never installed (#87)
        assert "web_search" not in catalog

    def test_catalog_tool_structure(self):
        catalog = get_all_available_tools()
        for tool_id, info in catalog.items():
            assert "name" in info, f"{tool_id} missing 'name'"
            assert "description" in info, f"{tool_id} missing 'description'"
            assert "available_in" in info, f"{tool_id} missing 'available_in'"
            assert isinstance(info["available_in"], list)

    def test_catalog_orchestrator_has_all_base_tools(self):
        catalog = get_all_available_tools()
        base_tools = {
            "shell",
            "python",
            "read_file",
            "write_file",
            "list_files",
        }
        for tool_id in base_tools:
            assert "orchestrator" in catalog[tool_id]["available_in"], (
                f"Orchestrator missing tool: {tool_id}"
            )

    def test_catalog_echo_research_tools(self):
        catalog = get_all_available_tools()
        assert "echo" in catalog["read_file"]["available_in"]
        # Echo should NOT have shell
        assert "echo" not in catalog["shell"]["available_in"]

    def test_catalog_forge_code_tools(self):
        catalog = get_all_available_tools()
        assert "forge" in catalog["shell"]["available_in"]
        assert "forge" in catalog["python"]["available_in"]
        assert "forge" in catalog["write_file"]["available_in"]

    def test_catalog_forge_has_aider(self):
        """Verify Aider AI tool is available in Forge's toolkit."""
        catalog = get_all_available_tools()
        assert "aider" in catalog
        assert "forge" in catalog["aider"]["available_in"]
        assert "orchestrator" in catalog["aider"]["available_in"]


class TestAiderTool:
    """Test the Aider AI coding assistant tool."""

    def test_aider_in_tool_catalog(self):
        """Verify Aider appears in the tool catalog."""
        catalog = get_all_available_tools()
        assert "aider" in catalog
        assert "forge" in catalog["aider"]["available_in"]


class TestFullToolkitConfirmationWarning:
    """Regression tests for issue #79 — confirmation tool WARNING spam."""

    def test_create_full_toolkit_no_confirmation_warning(self, caplog):
        """create_full_toolkit should not emit 'Requires confirmation tool(s)' warnings.

        Agno's Toolkit.__init__ validates requires_confirmation_tools against the
        initial (empty) tool list. We set the attribute *after* construction to
        avoid the spurious warning while keeping per-tool confirmation checks.
        """
        import logging

        from timmy.tools import create_full_toolkit

        with caplog.at_level(logging.WARNING):
            create_full_toolkit()

        warning_msgs = [
            r.message for r in caplog.records if "Requires confirmation tool" in r.message
        ]
        assert warning_msgs == [], f"Unexpected confirmation warnings: {warning_msgs}"

    def test_dangerous_tools_listed_for_confirmation(self):
        """After the fix, the toolkit still carries the full DANGEROUS_TOOLS list
        so Agno can gate execution at runtime."""
        from timmy.tool_safety import DANGEROUS_TOOLS
        from timmy.tools import create_full_toolkit

        toolkit = create_full_toolkit()
        if toolkit is None:
            pytest.skip("Agno tools not available")

        assert set(toolkit.requires_confirmation_tools) == set(DANGEROUS_TOOLS)