Timmy-time-dashboard

Author	SHA1	Message	Date
Claude (Opus 4.6)	1e1689f931	[claude] Qwen3 two-model routing via task complexity classifier (#1065) v2 (#1233) Co-authored-by: Claude (Opus 4.6) <claude@hermes.local> Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>	2026-03-23 22:58:21 +00:00
Claude (Opus 4.6)	cd1bc2bf6b	[claude] Add agent emotional state simulation (#1013) (#1144) Co-authored-by: Claude (Opus 4.6) <claude@hermes.local> Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>	2026-03-23 18:36:52 +00:00
Claude (Opus 4.6)	a29e615f76	[claude] Load fine-tuned Timmy model into Hermes harness (#1104) (#1122)	2026-03-23 18:21:32 +00:00
Google Gemini	e8b3d59041	[gemini] feat: Add Claude API fallback tier to cascade.py (#980) (#1119) Co-authored-by: Google Gemini <gemini@hermes.local> Co-committed-by: Google Gemini <gemini@hermes.local>	2026-03-23 18:21:18 +00:00
Claude (Opus 4.6)	19dbdec314	[claude] Add Hermes 4 14B Modelfile, providers config, and smoke test (#1101) (#1110)	2026-03-23 17:59:45 +00:00
Claude (Opus 4.6)	f2a277f7b5	[claude] Add vllm-mlx as high-performance local inference backend (#1069) (#1089) Co-authored-by: Claude (Opus 4.6) <claude@hermes.local> Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>	2026-03-23 15:34:13 +00:00
Claude (Opus 4.6)	7fdd532260	[claude] Configure Dolphin 3.0 8B as creative writing fallback (#1068) (#1088)	2026-03-23 15:25:06 +00:00
Claude (Opus 4.6)	1697e55cdb	[claude] Add content moderation pipeline (Llama Guard + game-context prompts) (#1056) (#1059)	2026-03-23 02:14:42 +00:00
Kimi Agent	a95cf806c8	[kimi] Implement token quest system for agents (#713) (#789)	2026-03-21 20:45:35 +00:00
Kimi Agent	5f4580f98d	[kimi] Add matrix config loader utility (#680) (#742)	2026-03-21 15:05:06 +00:00
Kimi Agent	9d4ac8e7cc	[kimi] Add /api/matrix/config endpoint for world configuration (#674) (#736)	2026-03-21 14:25:19 +00:00
Timmy Time	15eb7c3b45	[loop-cycle-538] refactor: remove dead airllm provider from cascade router (#459) (#481)	2026-03-19 15:44:10 -04:00
hermes	96c7e6deae	[loop-cycle-52] fix: remove all qwen3.5 references (#182) (#190)	2026-03-15 12:34:21 -04:00
hermes	3e7a35b3df	Merge pull request '[loop-cycle-12] feat: Kimi delegation tool for coding tasks (#67)' (#112) from fix/kimi-delegation-67 into main	2026-03-14 20:31:08 -04:00
Kimi Agent	453c9a0694	feat: add delegate_to_kimi() tool for coding delegation (#67) Timmy can now delegate coding tasks to Kimi CLI (262K context). Includes timeout handling, workdir validation, output truncation. Sovereign division of labor — Timmy plans, Kimi codes.	2026-03-14 20:29:03 -04:00
Kimi Agent	2fb104528f	feat: add run_self_tests() tool for self-verification (#65) Timmy can now run his own test suite via the run_self_tests() tool. Supports 'fast' (unit only), 'full', or specific path scopes. Returns structured results with pass/fail counts. Sovereign self-verification — a fundamental capability.	2026-03-14 20:28:24 -04:00
hermes	4b553fa0ed	Merge pull request 'fix: word-boundary routing + debug route command (#31)' (#102) from fix/routing-patterns into main	2026-03-14 19:24:16 -04:00
Kimi Agent	67497133fd	fix: word-boundary routing + debug route command (#31) - Replace substring matching with word-boundary regex in route_request() - "fix the bug" now correctly routes to coder - Multi-word patterns match if all words appear (any order) - Add "timmy route" CLI command for debugging routing - Add route_request_with_match() for pattern visibility - Expand routing keywords in agents.yaml - 22 new routing tests, all passing	2026-03-14 19:21:30 -04:00
Kimi Agent	9c59b386d8	feat: add OLLAMA_NUM_CTX config to cap context window (#83) - Add ollama_num_ctx setting (default 4096) to config.py - Pass num_ctx option to Ollama in agent.py and agents/base.py - Add OLLAMA_NUM_CTX to .env.example with usage docs - Add context_window note in providers.yaml - Fix mock_settings in test_agent.py for new attribute - qwen3:30b with 4096 ctx uses ~19GB vs 45GB default	2026-03-14 18:54:43 -04:00
Kimi Agent	d28e2f4a7e	[loop-cycle-1] feat: tool allowlist for autonomous operation (#69) Add config/allowlist.yaml — YAML-driven gate that auto-approves bounded tool calls when no human is present. When Timmy runs with --autonomous or stdin is not a terminal, tool calls are checked against allowlist: matched → auto-approved, else → rejected. Changes: - config/allowlist.yaml: shell prefixes, deny patterns, path rules - tool_safety.py: is_allowlisted() checks tools against YAML rules - cli.py: --autonomous flag, _is_interactive() detection - 44 new allowlist tests, 8 updated CLI tests Closes #69	2026-03-14 17:39:48 -04:00
Trip T	0e89caa830	test: update delegation tests for YAML-driven agent IDs Old hardcoded IDs (seer, forge, echo, helm, quill) replaced with YAML-defined IDs (orchestrator, researcher, coder, writer, memory, experimenter). Added test that old names are explicitly rejected.	2026-03-14 08:40:24 -04:00
Trip T	f6a6c0f62e	feat: upgrade to qwen3.5, self-hosted Gitea CI, optimize Docker image Model upgrade: - qwen2.5:14b → qwen3.5:latest across config, tools, and docs - Added qwen3.5 to multimodal model registry Self-hosted Gitea CI: - .gitea/workflows/tests.yml: lint + test jobs via act_runner - Unified Dockerfile: pre-baked deps from poetry.lock for fast CI - sitepackages=true in tox for ~2s dep resolution (was ~40s) - OLLAMA_URL set to dead port in CI to prevent real LLM calls Test isolation fixes: - Smoke test fixture mocks create_timmy (was hitting real Ollama) - WebSocket sends initial_state before joining broadcast pool (race fix) - Tests use settings.ollama_model/url instead of hardcoded values - skip_ci marker for Ollama-dependent tests, excluded in CI tox envs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 18:36:42 -04:00
Alexander Whitestone	36fc10097f	Claude/angry cerf (#173) * feat: set qwen3.5:latest as default model - Make qwen3.5:latest the primary default model for faster inference - Move llama3.1:8b-instruct to fallback chain - Update text fallback chain to prioritize qwen3.5:latest Retains full backward compatibility via cascade fallback. * test: remove ~55 brittle, duplicate, and useless tests Audit of all 100 test files identified tests that provided no real regression protection. Removed: - 4 files deleted entirely: test_setup_script (always skipped), test_csrf_bypass (tautological assertions), test_input_validation (accepts 200-500 status codes), test_security_regression (fragile source-pattern checks redundant with rendering tests) - Duplicate test classes (TestToolTracking, TestCalculatorExtended) - Mock-only tests that just verify mock wiring, not behavior - Structurally broken tests (TestCreateToolFunctions patches after import) - Empty/pass-body tests and meaningless assertions (len > 20) - Flaky subprocess tests (aider tool calling real binary) All 1328 remaining tests pass. Net: -699 lines, zero coverage loss. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent test pollution from autoresearch_enabled mutation test_autoresearch_perplexity.py was setting settings.autoresearch_enabled = True but never restoring it in the finally block — polluting subsequent tests. When pytest-randomly ordered it before test_experiments_page_shows_disabled_when_off, the victim test saw enabled=True and failed to find "Disabled" in the page. Fix both sides: - Restore autoresearch_enabled in the finally block (root cause) - Mock settings explicitly in the victim test (defense in depth) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 16:55:27 -04:00
Alexander Payne	72a58f1f49	feat: Multi-modal support with automatic model fallback - Add MultiModalManager with capability detection for vision/audio/tools - Define fallback chains: vision (llama3.2:3b -> llava:7b -> moondream) tools (llama3.1:8b-instruct -> qwen2.5:7b) - Update CascadeRouter to detect content type and select appropriate models - Add model pulling with automatic fallback in agent creation - Update providers.yaml with multi-modal model configurations - Update OllamaAdapter to use model resolution with vision support Tests: All 96 infrastructure tests pass	2026-02-26 22:29:44 -05:00
Claude	211c54bc8c	feat: add custom weights, model registry, per-agent models, and reward scoring Inspired by OpenClaw-RL's multi-model orchestration, this adds four features for custom model management: 1. Custom model registry (infrastructure/models/registry.py) — SQLite-backed registry for GGUF, safetensors, HF checkpoint, and Ollama models with role-based lookups (general, reward, teacher, judge). 2. Per-agent model assignment — each swarm persona can use a different model instead of sharing the global default. Resolved via registry assignment > persona default > global default. 3. Runtime model management API (/api/v1/models) — REST endpoints to register, list, assign, enable/disable, and remove custom models without restart. Includes a dashboard page at /models. 4. Reward model scoring (PRM-style) — majority-vote quality evaluation of agent outputs using a configurable reward model. Scores persist in SQLite and feed into the swarm learner. New config settings: custom_weights_dir, reward_model_enabled, reward_model_name, reward_model_votes. 54 new tests covering registry CRUD, API endpoints, agent assignments, role lookups, and reward scoring. https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC	2026-02-27 01:27:53 +00:00
Alexander Payne	c658ca829c	Phase 3: Cascade LLM Router with automatic failover - YAML-based provider configuration (config/providers.yaml) - Priority-ordered provider routing - Circuit breaker pattern for failing providers - Health check and availability monitoring - Metrics tracking (latency, errors, success rates) - Support for Ollama, OpenAI, Anthropic, AirLLM providers - Automatic failover on rate limits or errors - REST API endpoints for monitoring and control - 41 comprehensive tests API Endpoints: - POST /api/v1/router/complete - Chat completion with failover - GET /api/v1/router/status - Provider health status - GET /api/v1/router/metrics - Detailed metrics - GET /api/v1/router/providers - List all providers - POST /api/v1/router/providers/{name}/control - Enable/disable/reset - POST /api/v1/router/health-check - Run health checks - GET /api/v1/router/config - View configuration	2026-02-25 19:43:43 -05:00
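
Commit c658ca829c lists priority-ordered provider routing, a circuit breaker for failing providers, and automatic failover. The sketch below illustrates how those three ideas can fit together; it is not the repository's code. The names `CircuitBreaker`, `Provider`, and `complete_with_failover`, the thresholds, and the "lower priority number is tried first" rule are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cool-down."""
    failure_threshold: int = 3
    reset_after: float = 30.0  # seconds; illustrative default
    failures: int = 0
    opened_at: Optional[float] = None

    def allows_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: allow a trial request once the cool-down has elapsed.
        return now - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


@dataclass
class Provider:
    name: str
    priority: int  # assumption: lower number is tried first
    call: Callable[[str], str]
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def complete_with_failover(
    providers: List[Provider],
    prompt: str,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Try providers in priority order, skipping any whose circuit is open."""
    errors = []
    for provider in sorted(providers, key=lambda p: p.priority):
        if not provider.breaker.allows_request(clock()):
            errors.append(f"{provider.name}: circuit open")
            continue
        try:
            result = provider.call(prompt)
        except Exception as exc:  # any provider error triggers failover
            provider.breaker.record_failure(clock())
            errors.append(f"{provider.name}: {exc}")
            continue
        provider.breaker.record_success()
        return provider.name, result
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Passing the clock as a parameter keeps the cool-down logic testable without sleeping; a real router would presumably also track the latency and error metrics the commit mentions.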

26 Commits
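
Commit 67497133fd replaces substring matching with word-boundary regex matching, where a multi-word pattern matches if all of its words appear in any order. A minimal sketch of that idea follows. The routing table `ROUTES`, the helper names `pattern_matches` and `match_route`, and the return shape are hypothetical; the actual `route_request()` and `agents.yaml` keywords are not shown in this history.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical routing table; the real keywords live in agents.yaml.
ROUTES: Dict[str, List[str]] = {
    "coder": ["fix", "bug", "refactor code"],
    "writer": ["draft", "write essay"],
}


def pattern_matches(pattern: str, text: str) -> bool:
    """True if every word of `pattern` occurs in `text` as a whole word, in any order."""
    return all(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in pattern.split()
    )


def match_route(
    text: str, routes: Dict[str, List[str]] = ROUTES
) -> Tuple[Optional[str], Optional[str]]:
    """Return (agent, matched_pattern) for the first match, or (None, None)."""
    for agent, patterns in routes.items():
        for pattern in patterns:
            if pattern_matches(pattern, text):
                return agent, pattern
    return None, None
```

With plain substring matching, "prefix the debugger" would hit both "fix" and "bug"; the `\b` anchors prevent that, while "fix the bug" still routes to the coder agent.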