"""Functional tests for CLI entry points via Typer's CliRunner.
|
|
|
|
Each test invokes the real CLI command. Ollama is not running, so
|
|
commands that need inference will fail gracefully — and that's a valid
|
|
user scenario we want to verify.
|
|
"""
|
|
|
|
|
|
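# NOTE: `timmy_runner` and `serve_runner` are assumed to be conftest.py
# fixtures yielding a (CliRunner, typer.Typer) pair. A minimal sketch of the
# assumed shape (the fixture name matches this file; the `chat` stand-in
# command is illustrative, not the real Timmy app):
#
#     import pytest
#     import typer
#     from typer.testing import CliRunner
#
#     _app = typer.Typer()
#
#     @_app.command()
#     def chat(message: str) -> None:
#         typer.echo(f"chat: {message}")
#
#     @pytest.fixture
#     def timmy_runner():
#         return CliRunner(), _app
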
# ── timmy CLI ─────────────────────────────────────────────────────────────────
class TestTimmyCLI:
    """Tests the `timmy` command (chat, think, status)."""

    def test_chat_requires_message(self, timmy_runner):
        runner, app = timmy_runner
        result = runner.invoke(app, ["chat"])
        assert result.exit_code != 0
        assert "Missing argument" in result.output or "Usage" in result.output

    def test_think_requires_topic(self, timmy_runner):
        runner, app = timmy_runner
        result = runner.invoke(app, ["think"])
        assert result.exit_code != 0
        assert "Missing argument" in result.output or "Usage" in result.output

    def test_help_text(self, timmy_runner):
        runner, app = timmy_runner
        result = runner.invoke(app, ["--help"])
        assert result.exit_code == 0
        assert "Timmy" in result.output or "sovereign" in result.output.lower()

# ── timmy-serve CLI ───────────────────────────────────────────────────────────
class TestTimmyServeCLI:
    """Tests the `timmy-serve` command (start, status)."""

    def test_start_dry_run(self, serve_runner):
        runner, app = serve_runner
        result = runner.invoke(app, ["start", "--dry-run"])
        assert result.exit_code == 0
        assert "Starting Timmy Serve" in result.output

    def test_start_dry_run_custom_port(self, serve_runner):
        runner, app = serve_runner
        result = runner.invoke(app, ["start", "--dry-run", "--port", "9999"])
        assert result.exit_code == 0
        assert "9999" in result.output

    def test_status(self, serve_runner):
        runner, app = serve_runner
        result = runner.invoke(app, ["status"])
        assert result.exit_code == 0
        assert "Timmy Serve" in result.output

    def test_help_text(self, serve_runner):
        runner, app = serve_runner
        result = runner.invoke(app, ["--help"])
        assert result.exit_code == 0
        assert "Serve" in result.output