feat: Self-Coding Foundation (Phase 1)

Implements the foundational infrastructure for Timmy's self-modification capability:

## New Services

1. **GitSafety** (src/self_coding/git_safety.py)
   - Atomic git operations with rollback capability
   - Snapshot/restore for safe experimentation
   - Feature branch management (timmy/self-edit/{timestamp})
   - Merge to main only after tests pass
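
The snapshot/rollback pattern above can be sketched without git at all. This is a hypothetical analogy, not the `GitSafety` API: the same control flow (snapshot, mutate, validate, keep or restore) applied to a plain directory with copies standing in for commits. All names (`with_rollback`, `mutate`, `validate`) are illustrative.

```python
# Hypothetical sketch, NOT the GitSafety API: snapshot/rollback via copies.
import shutil
import tempfile
from pathlib import Path

def with_rollback(workdir: Path, mutate, validate) -> bool:
    """Apply `mutate` to workdir; keep the result only if `validate` passes."""
    backup = Path(tempfile.mkdtemp()) / "snapshot"
    shutil.copytree(workdir, backup)   # snapshot before changes
    mutate(workdir)                    # ... make changes ...
    if validate(workdir):              # "tests pass" -> keep changes
        shutil.rmtree(backup.parent)
        return True
    shutil.rmtree(workdir)             # "tests fail" -> restore snapshot
    shutil.copytree(backup, workdir)
    shutil.rmtree(backup.parent)
    return False

repo = Path(tempfile.mkdtemp()) / "repo"
repo.mkdir()
(repo / "mod.py").write_text("x = 1\n")
# A "bad" edit fails validation and is rolled back, leaving the file untouched:
ok = with_rollback(
    repo,
    lambda d: (d / "mod.py").write_text("x = oops\n"),
    lambda d: "oops" not in (d / "mod.py").read_text(),
)
```

GitSafety does the same thing with commits instead of directory copies, which makes the snapshot cheap and the restore atomic.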

2. **CodebaseIndexer** (src/self_coding/codebase_indexer.py)
   - AST-based parsing of Python source files
   - Extracts classes, functions, imports, docstrings
   - Builds dependency graph for blast radius analysis
   - SQLite storage with hash-based incremental indexing
   - get_summary() for LLM context (<4000 tokens)
   - get_relevant_files() for task-based file discovery
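
The hash-based incremental indexing reduces to a simple mechanism: store a SHA-256 of each file's content, and skip any file whose current hash matches the stored one. A minimal sketch (names are illustrative, not the indexer's API):

```python
import hashlib

def content_hash(text: str) -> str:
    """SHA-256 of file content, matching the indexer's change-detection key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

index: dict[str, str] = {}  # file_path -> hash at last index time

def needs_reindex(path: str, text: str) -> bool:
    h = content_hash(text)
    if index.get(path) == h:
        return False        # unchanged -> skip
    index[path] = h         # new or edited -> reindex and remember hash
    return True

first = needs_reindex("src/app.py", "print('hi')")   # new file
second = needs_reindex("src/app.py", "print('hi')")  # unchanged
third = needs_reindex("src/app.py", "print('bye')")  # edited
```

In the real service the hash lives in the SQLite `content_hash` column rather than an in-memory dict, but the skip/reindex decision is the same.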

3. **ModificationJournal** (src/self_coding/modification_journal.py)
   - Persistent log of all self-modification attempts
   - Tracks outcomes: success, failure, rollback
   - find_similar() for learning from past attempts
   - Success rate metrics and recent failure tracking
   - Supports vector embeddings (Phase 2)
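
The success-rate metric is straightforward to picture. A toy in-memory journal (hypothetical names; the real `ModificationJournal` persists attempts to SQLite):

```python
from dataclasses import dataclass, field

@dataclass
class MiniJournal:
    # Outcome strings mirror the tracked outcomes: success, failure, rollback.
    outcomes: list[str] = field(default_factory=list)

    def record(self, outcome: str) -> None:
        self.outcomes.append(outcome)

    def success_rate(self) -> float:
        """Fraction of recorded attempts that succeeded (0.0 when empty)."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count("success") / len(self.outcomes)

j = MiniJournal()
for o in ("success", "failure", "success", "rollback"):
    j.record(o)
rate = j.success_rate()
```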

4. **ReflectionService** (src/self_coding/reflection.py)
   - LLM-powered analysis of modification attempts
   - Generates lessons learned from successes and failures
   - Fallback templates when LLM unavailable
   - Supports context from similar past attempts

## Test Coverage

- 104 new tests across 7 test files
- 95% code coverage on self_coding module
- Green path tests: full workflow integration
- Red path tests: errors, rollbacks, edge cases
- Safety constraint tests: test coverage requirements, protected files

## Usage

    from self_coding import GitSafety, CodebaseIndexer, ModificationJournal

    git = GitSafety(repo_path="/path/to/repo")
    indexer = CodebaseIndexer(repo_path="/path/to/repo")
    journal = ModificationJournal()

Phase 2 will build the Self-Edit MCP Tool that orchestrates these services.
---

Author: Alexander Payne
Date: 2026-02-26 11:08:05 -05:00
Parent: 6c6b6f8a54
Commit: 18bc64b36d
12 changed files with 4535 additions and 0 deletions

### src/self_coding/__init__.py

@@ -0,0 +1,50 @@
"""Self-Coding Layer — Timmy's ability to modify its own source code safely.
This module provides the foundational infrastructure for self-modification:
- GitSafety: Atomic git operations with rollback capability
- CodebaseIndexer: Live mental model of the codebase
- ModificationJournal: Persistent log of modification attempts
- ReflectionService: Generate lessons learned from attempts
Usage:
from self_coding import GitSafety, CodebaseIndexer, ModificationJournal
from self_coding import ModificationAttempt, Outcome, Snapshot
# Initialize services
git = GitSafety(repo_path="/path/to/repo")
indexer = CodebaseIndexer(repo_path="/path/to/repo")
journal = ModificationJournal()
# Use in self-modification workflow
snapshot = await git.snapshot()
# ... make changes ...
if tests_pass:
await git.commit("Changes", ["file.py"])
else:
await git.rollback(snapshot)
"""
from self_coding.git_safety import GitSafety, Snapshot
from self_coding.codebase_indexer import CodebaseIndexer, ModuleInfo, FunctionInfo, ClassInfo
from self_coding.modification_journal import (
ModificationJournal,
ModificationAttempt,
Outcome,
)
from self_coding.reflection import ReflectionService
__all__ = [
# Core services
"GitSafety",
"CodebaseIndexer",
"ModificationJournal",
"ReflectionService",
# Data classes
"Snapshot",
"ModuleInfo",
"FunctionInfo",
"ClassInfo",
"ModificationAttempt",
"Outcome",
]

### src/self_coding/codebase_indexer.py

@@ -0,0 +1,772 @@
"""Codebase Indexer — Live mental model of Timmy's own codebase.
Parses Python files using AST to extract classes, functions, imports, and
docstrings. Builds a dependency graph and provides semantic search for
relevant files.
"""
from __future__ import annotations
import ast
import hashlib
import json
import logging
import sqlite3
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional
logger = logging.getLogger(__name__)
# Default database location
DEFAULT_DB_PATH = Path("data/self_coding.db")
@dataclass
class FunctionInfo:
"""Information about a function."""
name: str
args: list[str]
returns: Optional[str] = None
docstring: Optional[str] = None
line_number: int = 0
is_async: bool = False
is_method: bool = False
@dataclass
class ClassInfo:
"""Information about a class."""
name: str
methods: list[FunctionInfo] = field(default_factory=list)
docstring: Optional[str] = None
line_number: int = 0
bases: list[str] = field(default_factory=list)
@dataclass
class ModuleInfo:
"""Information about a Python module."""
file_path: str
module_name: str
classes: list[ClassInfo] = field(default_factory=list)
functions: list[FunctionInfo] = field(default_factory=list)
imports: list[str] = field(default_factory=list)
docstring: Optional[str] = None
test_coverage: Optional[str] = None
class CodebaseIndexer:
    """Indexes Python codebase for self-modification workflows.

    Parses all Python files using AST to extract:
    - Module names and structure
    - Class definitions with methods
    - Function signatures with args and return types
    - Import relationships
    - Test coverage mapping

    Stores everything in SQLite for fast querying.

    Usage:
        indexer = CodebaseIndexer(repo_path="/path/to/repo")

        # Full reindex
        await indexer.index_all()

        # Incremental update
        await indexer.index_changed()

        # Get LLM context summary
        summary = await indexer.get_summary()

        # Find relevant files for a task
        files = await indexer.get_relevant_files("Add error handling to health endpoint")

        # Get dependency chain
        deps = await indexer.get_dependency_chain("src/timmy/agent.py")
    """

    def __init__(
        self,
        repo_path: Optional[str | Path] = None,
        db_path: Optional[str | Path] = None,
        src_dirs: Optional[list[str]] = None,
    ) -> None:
        """Initialize CodebaseIndexer.

        Args:
            repo_path: Root of repository to index. Defaults to current directory.
            db_path: SQLite database path. Defaults to data/self_coding.db
            src_dirs: Source directories to index. Defaults to ["src", "tests"]
        """
        self.repo_path = Path(repo_path).resolve() if repo_path else Path.cwd()
        self.db_path = Path(db_path) if db_path else DEFAULT_DB_PATH
        self.src_dirs = src_dirs or ["src", "tests"]
        self._ensure_schema()
        logger.info("CodebaseIndexer initialized for %s", self.repo_path)

    def _get_conn(self) -> sqlite3.Connection:
        """Get database connection with schema ensured."""
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = sqlite3.connect(str(self.db_path))
        conn.row_factory = sqlite3.Row
        return conn

    def _ensure_schema(self) -> None:
        """Create database tables if they don't exist."""
        with self._get_conn() as conn:
            # Main codebase index table
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS codebase_index (
                    file_path TEXT PRIMARY KEY,
                    module_name TEXT NOT NULL,
                    classes JSON,
                    functions JSON,
                    imports JSON,
                    test_coverage TEXT,
                    last_indexed TIMESTAMP NOT NULL,
                    content_hash TEXT NOT NULL,
                    docstring TEXT,
                    embedding BLOB
                )
                """
            )
            # Dependency graph table
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS dependency_graph (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    source_file TEXT NOT NULL,
                    target_file TEXT NOT NULL,
                    import_type TEXT NOT NULL,
                    UNIQUE(source_file, target_file)
                )
                """
            )
            # Create indexes
            conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_module_name ON codebase_index(module_name)"
            )
            conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_test_coverage ON codebase_index(test_coverage)"
            )
            conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_deps_source ON dependency_graph(source_file)"
            )
            conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_deps_target ON dependency_graph(target_file)"
            )
            conn.commit()

    def _compute_hash(self, content: str) -> str:
        """Compute SHA-256 hash of file content."""
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def _find_python_files(self) -> list[Path]:
        """Find all Python files in source directories."""
        files = []
        for src_dir in self.src_dirs:
            src_path = self.repo_path / src_dir
            if src_path.exists():
                files.extend(src_path.rglob("*.py"))
        return sorted(files)
    def _find_test_file(self, source_file: Path) -> Optional[str]:
        """Find corresponding test file for a source file.

        Uses conventions:
        - src/x/y.py -> tests/test_x_y.py
        - src/x/y.py -> tests/x/test_y.py
        - src/x/y.py -> tests/test_y.py
        """
        rel_path = source_file.relative_to(self.repo_path)
        # Only look for tests for files in src/
        if not str(rel_path).startswith("src/"):
            return None
        # Try various test file naming conventions
        possible_tests = [
            # tests/test_module.py
            self.repo_path / "tests" / f"test_{source_file.stem}.py",
            # tests/test_path_module.py (flat)
            self.repo_path / "tests" / f"test_{'_'.join(rel_path.with_suffix('').parts[1:])}.py",
        ]
        # Try mirroring src structure in tests (tests/x/test_y.py)
        try:
            src_relative = rel_path.relative_to("src")
            possible_tests.append(
                self.repo_path / "tests" / src_relative.parent / f"test_{source_file.stem}.py"
            )
        except ValueError:
            pass
        for test_path in possible_tests:
            if test_path.exists():
                return str(test_path.relative_to(self.repo_path))
        return None

    def _parse_function(
        self, node: ast.FunctionDef | ast.AsyncFunctionDef, is_method: bool = False
    ) -> FunctionInfo:
        """Parse a function definition node."""
        args = []
        # Handle different Python versions' AST structures
        func_args = node.args
        # Positional args
        for arg in func_args.args:
            arg_str = arg.arg
            if arg.annotation:
                arg_str += f": {ast.unparse(arg.annotation)}"
            args.append(arg_str)
        # Keyword-only args
        for arg in func_args.kwonlyargs:
            arg_str = arg.arg
            if arg.annotation:
                arg_str += f": {ast.unparse(arg.annotation)}"
            args.append(arg_str)
        # Return type
        returns = None
        if node.returns:
            returns = ast.unparse(node.returns)
        # Docstring
        docstring = ast.get_docstring(node)
        return FunctionInfo(
            name=node.name,
            args=args,
            returns=returns,
            docstring=docstring,
            line_number=node.lineno,
            is_async=isinstance(node, ast.AsyncFunctionDef),
            is_method=is_method,
        )

    def _parse_class(self, node: ast.ClassDef) -> ClassInfo:
        """Parse a class definition node."""
        methods = []
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                methods.append(self._parse_function(item, is_method=True))
        # Get bases
        bases = [ast.unparse(base) for base in node.bases]
        return ClassInfo(
            name=node.name,
            methods=methods,
            docstring=ast.get_docstring(node),
            line_number=node.lineno,
            bases=bases,
        )
    def _parse_module(self, file_path: Path) -> Optional[ModuleInfo]:
        """Parse a Python module file.

        Args:
            file_path: Path to Python file

        Returns:
            ModuleInfo or None if parsing fails
        """
        try:
            content = file_path.read_text(encoding="utf-8")
            tree = ast.parse(content)
            # Compute module name from file path
            rel_path = file_path.relative_to(self.repo_path)
            module_name = str(rel_path.with_suffix("")).replace("/", ".")
            classes = []
            functions = []
            imports = []
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    for alias in node.names:
                        imports.append(alias.name)
                elif isinstance(node, ast.ImportFrom):
                    module = node.module or ""
                    for alias in node.names:
                        imports.append(f"{module}.{alias.name}")
            # Get top-level definitions (not in classes)
            for node in tree.body:
                if isinstance(node, ast.ClassDef):
                    classes.append(self._parse_class(node))
                elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    functions.append(self._parse_function(node))
            # Get module docstring
            docstring = ast.get_docstring(tree)
            # Find test coverage
            test_coverage = self._find_test_file(file_path)
            return ModuleInfo(
                file_path=str(rel_path),
                module_name=module_name,
                classes=classes,
                functions=functions,
                imports=imports,
                docstring=docstring,
                test_coverage=test_coverage,
            )
        except SyntaxError as e:
            logger.warning("Syntax error in %s: %s", file_path, e)
            return None
        except Exception as e:
            logger.error("Failed to parse %s: %s", file_path, e)
            return None

    def _store_module(self, conn: sqlite3.Connection, module: ModuleInfo, content_hash: str) -> None:
        """Store module info in database."""
        conn.execute(
            """
            INSERT OR REPLACE INTO codebase_index
                (file_path, module_name, classes, functions, imports, test_coverage,
                 last_indexed, content_hash, docstring)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
            """,
            (
                module.file_path,
                module.module_name,
                json.dumps([asdict(c) for c in module.classes]),
                json.dumps([asdict(f) for f in module.functions]),
                json.dumps(module.imports),
                module.test_coverage,
                datetime.now(timezone.utc).isoformat(),
                content_hash,
                module.docstring,
            ),
        )
    def _build_dependency_graph(self, conn: sqlite3.Connection) -> None:
        """Build and store dependency graph from imports."""
        # Clear existing graph
        conn.execute("DELETE FROM dependency_graph")
        # Get all modules
        rows = conn.execute(
            "SELECT file_path, module_name, imports FROM codebase_index"
        ).fetchall()
        # Map module names to file paths
        module_to_file = {row["module_name"]: row["file_path"] for row in rows}
        # Also map without src/ prefix for package imports like myproject.utils
        module_to_file_alt = {}
        for row in rows:
            module_name = row["module_name"]
            if module_name.startswith("src."):
                alt_name = module_name[4:]  # Remove "src." prefix
                module_to_file_alt[alt_name] = row["file_path"]
        # Build dependencies
        for row in rows:
            source_file = row["file_path"]
            imports = json.loads(row["imports"])
            for imp in imports:
                # Try to resolve import to a file.
                # Handle both "module.name" and "module.name.Class" forms.
                # First try exact match
                if imp in module_to_file:
                    conn.execute(
                        """
                        INSERT OR IGNORE INTO dependency_graph
                            (source_file, target_file, import_type)
                        VALUES (?, ?, ?)
                        """,
                        (source_file, module_to_file[imp], "import"),
                    )
                    continue
                # Try alternative name (without src/ prefix)
                if imp in module_to_file_alt:
                    conn.execute(
                        """
                        INSERT OR IGNORE INTO dependency_graph
                            (source_file, target_file, import_type)
                        VALUES (?, ?, ?)
                        """,
                        (source_file, module_to_file_alt[imp], "import"),
                    )
                    continue
                # Try prefix match (import myproject.utils.Helper -> myproject.utils)
                imp_parts = imp.split(".")
                for i in range(len(imp_parts), 0, -1):
                    prefix = ".".join(imp_parts[:i])
                    # Try original module name
                    if prefix in module_to_file:
                        conn.execute(
                            """
                            INSERT OR IGNORE INTO dependency_graph
                                (source_file, target_file, import_type)
                            VALUES (?, ?, ?)
                            """,
                            (source_file, module_to_file[prefix], "import"),
                        )
                        break
                    # Try alternative name (without src/ prefix)
                    if prefix in module_to_file_alt:
                        conn.execute(
                            """
                            INSERT OR IGNORE INTO dependency_graph
                                (source_file, target_file, import_type)
                            VALUES (?, ?, ?)
                            """,
                            (source_file, module_to_file_alt[prefix], "import"),
                        )
                        break
        conn.commit()
    async def index_all(self) -> dict[str, int]:
        """Perform full reindex of all Python files.

        Returns:
            Dict with stats: {"indexed": int, "failed": int, "skipped": int}
        """
        logger.info("Starting full codebase index")
        files = self._find_python_files()
        stats = {"indexed": 0, "failed": 0, "skipped": 0}
        with self._get_conn() as conn:
            for file_path in files:
                try:
                    content = file_path.read_text(encoding="utf-8")
                    content_hash = self._compute_hash(content)
                    # Check if file needs reindexing
                    existing = conn.execute(
                        "SELECT content_hash FROM codebase_index WHERE file_path = ?",
                        (str(file_path.relative_to(self.repo_path)),),
                    ).fetchone()
                    if existing and existing["content_hash"] == content_hash:
                        stats["skipped"] += 1
                        continue
                    module = self._parse_module(file_path)
                    if module:
                        self._store_module(conn, module, content_hash)
                        stats["indexed"] += 1
                    else:
                        stats["failed"] += 1
                except Exception as e:
                    logger.error("Failed to index %s: %s", file_path, e)
                    stats["failed"] += 1
            # Build dependency graph
            self._build_dependency_graph(conn)
            conn.commit()
        logger.info(
            "Indexing complete: %(indexed)d indexed, %(failed)d failed, %(skipped)d skipped",
            stats,
        )
        return stats

    async def index_changed(self) -> dict[str, int]:
        """Perform incremental index of only changed files.

        Compares content hashes to detect changes.

        Returns:
            Dict with stats: {"indexed": int, "failed": int, "skipped": int}
        """
        logger.info("Starting incremental codebase index")
        files = self._find_python_files()
        stats = {"indexed": 0, "failed": 0, "skipped": 0}
        with self._get_conn() as conn:
            for file_path in files:
                try:
                    rel_path = str(file_path.relative_to(self.repo_path))
                    content = file_path.read_text(encoding="utf-8")
                    content_hash = self._compute_hash(content)
                    # Check if changed
                    existing = conn.execute(
                        "SELECT content_hash FROM codebase_index WHERE file_path = ?",
                        (rel_path,),
                    ).fetchone()
                    if existing and existing["content_hash"] == content_hash:
                        stats["skipped"] += 1
                        continue
                    module = self._parse_module(file_path)
                    if module:
                        self._store_module(conn, module, content_hash)
                        stats["indexed"] += 1
                    else:
                        stats["failed"] += 1
                except Exception as e:
                    logger.error("Failed to index %s: %s", file_path, e)
                    stats["failed"] += 1
            # Rebuild dependency graph (some imports may have changed)
            self._build_dependency_graph(conn)
            conn.commit()
        logger.info(
            "Incremental indexing complete: %(indexed)d indexed, %(failed)d failed, %(skipped)d skipped",
            stats,
        )
        return stats
    async def get_summary(self, max_tokens: int = 4000) -> str:
        """Generate compressed codebase summary for LLM context.

        Lists modules, their purposes, key classes/functions, and test coverage.
        Keeps output under max_tokens (approximate).

        Args:
            max_tokens: Maximum approximate tokens for summary

        Returns:
            Summary string suitable for LLM context
        """
        with self._get_conn() as conn:
            rows = conn.execute(
                """
                SELECT file_path, module_name, classes, functions, test_coverage, docstring
                FROM codebase_index
                ORDER BY module_name
                """
            ).fetchall()
            lines = ["# Codebase Summary\n"]
            lines.append(f"Total modules: {len(rows)}\n")
            lines.append("---\n")
            for row in rows:
                module_name = row["module_name"]
                file_path = row["file_path"]
                docstring = row["docstring"]
                test_coverage = row["test_coverage"]
                lines.append(f"\n## {module_name}")
                lines.append(f"File: `{file_path}`")
                if test_coverage:
                    lines.append(f"Tests: `{test_coverage}`")
                else:
                    lines.append("Tests: None")
                if docstring:
                    # Take first line of docstring
                    first_line = docstring.split("\n")[0][:100]
                    lines.append(f"Purpose: {first_line}")
                # Classes
                classes = json.loads(row["classes"])
                if classes:
                    lines.append("Classes:")
                    for cls in classes[:5]:  # Limit to 5 classes
                        methods = [m["name"] for m in cls["methods"][:3]]
                        method_str = ", ".join(methods) + ("..." if len(cls["methods"]) > 3 else "")
                        lines.append(f"  - {cls['name']}({method_str})")
                    if len(classes) > 5:
                        lines.append(f"  ... and {len(classes) - 5} more")
                # Functions
                functions = json.loads(row["functions"])
                if functions:
                    func_names = [f["name"] for f in functions[:5]]
                    func_str = ", ".join(func_names)
                    if len(functions) > 5:
                        func_str += f"... and {len(functions) - 5} more"
                    lines.append(f"Functions: {func_str}")
                lines.append("")
            summary = "\n".join(lines)
            # Rough token estimation (1 token ≈ 4 characters)
            if len(summary) > max_tokens * 4:
                # Truncate with note
                summary = summary[:max_tokens * 4]
                summary += "\n\n[Summary truncated due to length]"
            return summary
    async def get_relevant_files(self, task_description: str, limit: int = 5) -> list[str]:
        """Find files relevant to a task description.

        Uses keyword matching and import relationships. In Phase 2,
        this will use semantic search with vector embeddings.

        Args:
            task_description: Natural language description of the task
            limit: Maximum number of files to return

        Returns:
            List of file paths sorted by relevance
        """
        # Simple keyword extraction for now
        keywords = set(task_description.lower().split())
        # Remove common words
        keywords -= {"the", "a", "an", "to", "in", "on", "at", "for", "with", "and", "or", "of", "is", "are"}
        with self._get_conn() as conn:
            rows = conn.execute(
                """
                SELECT file_path, module_name, classes, functions, docstring, test_coverage
                FROM codebase_index
                """
            ).fetchall()
            scored_files = []
            for row in rows:
                score = 0
                file_path = row["file_path"].lower()
                module_name = row["module_name"].lower()
                docstring = (row["docstring"] or "").lower()
                classes = json.loads(row["classes"])
                functions = json.loads(row["functions"])
                # Score based on keyword matches
                for keyword in keywords:
                    if keyword in file_path:
                        score += 3
                    if keyword in module_name:
                        score += 2
                    if keyword in docstring:
                        score += 2
                    # Check class/function names
                    for cls in classes:
                        if keyword in cls["name"].lower():
                            score += 2
                        for method in cls["methods"]:
                            if keyword in method["name"].lower():
                                score += 1
                    for func in functions:
                        if keyword in func["name"].lower():
                            score += 1
                # Boost files with test coverage (only if already matched)
                if score > 0 and row["test_coverage"]:
                    score += 1
                if score > 0:
                    scored_files.append((score, row["file_path"]))
            # Sort by score descending, return top N
            scored_files.sort(reverse=True, key=lambda x: x[0])
            return [f[1] for f in scored_files[:limit]]
    async def get_dependency_chain(self, file_path: str) -> list[str]:
        """Get all files that import the given file.

        Useful for understanding blast radius of changes.

        Args:
            file_path: Path to file (relative to repo root)

        Returns:
            List of file paths that import this file
        """
        with self._get_conn() as conn:
            rows = conn.execute(
                """
                SELECT source_file FROM dependency_graph
                WHERE target_file = ?
                """,
                (file_path,),
            ).fetchall()
            return [row["source_file"] for row in rows]

    async def has_test_coverage(self, file_path: str) -> bool:
        """Check if a file has corresponding test coverage.

        Args:
            file_path: Path to file (relative to repo root)

        Returns:
            True if test file exists, False otherwise
        """
        with self._get_conn() as conn:
            row = conn.execute(
                "SELECT test_coverage FROM codebase_index WHERE file_path = ?",
                (file_path,),
            ).fetchone()
            return row is not None and row["test_coverage"] is not None

    async def get_module_info(self, file_path: str) -> Optional[ModuleInfo]:
        """Get detailed info for a specific module.

        Args:
            file_path: Path to file (relative to repo root)

        Returns:
            ModuleInfo or None if not indexed
        """
        with self._get_conn() as conn:
            row = conn.execute(
                """
                SELECT file_path, module_name, classes, functions, imports,
                       test_coverage, docstring
                FROM codebase_index
                WHERE file_path = ?
                """,
                (file_path,),
            ).fetchone()
            if not row:
                return None
            # Parse classes - convert dict methods to FunctionInfo objects
            classes_data = json.loads(row["classes"])
            classes = []
            for cls_data in classes_data:
                methods = [FunctionInfo(**m) for m in cls_data.get("methods", [])]
                cls_info = ClassInfo(
                    name=cls_data["name"],
                    methods=methods,
                    docstring=cls_data.get("docstring"),
                    line_number=cls_data.get("line_number", 0),
                    bases=cls_data.get("bases", []),
                )
                classes.append(cls_info)
            # Parse functions
            functions_data = json.loads(row["functions"])
            functions = [FunctionInfo(**f) for f in functions_data]
            return ModuleInfo(
                file_path=row["file_path"],
                module_name=row["module_name"],
                classes=classes,
                functions=functions,
                imports=json.loads(row["imports"]),
                docstring=row["docstring"],
                test_coverage=row["test_coverage"],
            )

### src/self_coding/git_safety.py

@@ -0,0 +1,505 @@
"""Git Safety Layer — Atomic git operations with rollback capability.
All self-modifications happen on feature branches. Only merge to main after
full test suite passes. Snapshots enable rollback on failure.
"""
from __future__ import annotations
import asyncio
import hashlib
import logging
import subprocess
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
@dataclass(frozen=True)
class Snapshot:
"""Immutable snapshot of repository state before modification.
Attributes:
commit_hash: Git commit hash at snapshot time
branch: Current branch name
timestamp: When snapshot was taken
test_status: Whether tests were passing at snapshot time
test_output: Pytest output from test run
clean: Whether working directory was clean
"""
commit_hash: str
branch: str
timestamp: datetime
test_status: bool
test_output: str
clean: bool
class GitSafetyError(Exception):
"""Base exception for git safety operations."""
pass
class GitNotRepositoryError(GitSafetyError):
"""Raised when operation is attempted outside a git repository."""
pass
class GitDirtyWorkingDirectoryError(GitSafetyError):
"""Raised when working directory is not clean and clean_required=True."""
pass
class GitOperationError(GitSafetyError):
"""Raised when a git operation fails."""
pass
class GitSafety:
    """Safe git operations for self-modification workflows.

    All operations are atomic and support rollback. Self-modifications happen
    on feature branches named 'timmy/self-edit/{timestamp}'. Only merged to
    main after tests pass.

    Usage:
        safety = GitSafety(repo_path="/path/to/repo")

        # Take snapshot before changes
        snapshot = await safety.snapshot()

        # Create feature branch
        branch = await safety.create_branch(f"timmy/self-edit/{timestamp}")

        # Make changes, commit them
        await safety.commit("Add error handling", ["src/file.py"])

        # Run tests, merge if pass
        if tests_pass:
            await safety.merge_to_main(branch)
        else:
            await safety.rollback(snapshot)
    """

    def __init__(
        self,
        repo_path: Optional[str | Path] = None,
        main_branch: str = "main",
        test_command: str = "python -m pytest --tb=short -q",
    ) -> None:
        """Initialize GitSafety with repository path.

        Args:
            repo_path: Path to git repository. Defaults to current working directory.
            main_branch: Name of main branch (main, master, etc.)
            test_command: Command to run tests for snapshot validation
        """
        self.repo_path = Path(repo_path).resolve() if repo_path else Path.cwd()
        self.main_branch = main_branch
        self.test_command = test_command
        self._verify_git_repo()
        logger.info("GitSafety initialized for %s", self.repo_path)

    def _verify_git_repo(self) -> None:
        """Verify that repo_path is a git repository."""
        git_dir = self.repo_path / ".git"
        if not git_dir.exists():
            raise GitNotRepositoryError(
                f"{self.repo_path} is not a git repository"
            )

    async def _run_git(
        self,
        *args: str,
        check: bool = True,
        capture_output: bool = True,
        timeout: float = 30.0,
    ) -> subprocess.CompletedProcess:
        """Run a git command asynchronously.

        Args:
            *args: Git command arguments
            check: Whether to raise on non-zero exit
            capture_output: Whether to capture stdout/stderr
            timeout: Maximum time to wait for command

        Returns:
            CompletedProcess with returncode, stdout, stderr

        Raises:
            GitOperationError: If git command fails and check=True
        """
        cmd = ["git", *args]
        logger.debug("Running: %s", " ".join(cmd))
        try:
            proc = await asyncio.create_subprocess_exec(
                *cmd,
                cwd=self.repo_path,
                stdout=asyncio.subprocess.PIPE if capture_output else None,
                stderr=asyncio.subprocess.PIPE if capture_output else None,
            )
            stdout, stderr = await asyncio.wait_for(
                proc.communicate(),
                timeout=timeout,
            )
            result = subprocess.CompletedProcess(
                args=cmd,
                returncode=proc.returncode or 0,
                stdout=stdout.decode() if stdout else "",
                stderr=stderr.decode() if stderr else "",
            )
            if check and result.returncode != 0:
                raise GitOperationError(
                    f"Git command failed: {' '.join(args)}\n"
                    f"stdout: {result.stdout}\nstderr: {result.stderr}"
                )
            return result
        except asyncio.TimeoutError as e:
            proc.kill()
            raise GitOperationError(
                f"Git command timed out after {timeout}s: {' '.join(args)}"
            ) from e
    async def _run_shell(
        self,
        command: str,
        timeout: float = 120.0,
    ) -> subprocess.CompletedProcess:
        """Run a shell command asynchronously.

        Args:
            command: Shell command to run
            timeout: Maximum time to wait

        Returns:
            CompletedProcess with returncode, stdout, stderr
        """
        logger.debug("Running shell: %s", command)
        proc = await asyncio.create_subprocess_shell(
            command,
            cwd=self.repo_path,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await asyncio.wait_for(
            proc.communicate(),
            timeout=timeout,
        )
        return subprocess.CompletedProcess(
            args=command,
            returncode=proc.returncode or 0,
            stdout=stdout.decode(),
            stderr=stderr.decode(),
        )

    async def is_clean(self) -> bool:
        """Check if working directory is clean (no uncommitted changes).

        Returns:
            True if clean, False if there are uncommitted changes
        """
        result = await self._run_git("status", "--porcelain", check=False)
        return result.stdout.strip() == ""

    async def get_current_branch(self) -> str:
        """Get current git branch name.

        Returns:
            Current branch name
        """
        result = await self._run_git("branch", "--show-current")
        return result.stdout.strip()

    async def get_current_commit(self) -> str:
        """Get current commit hash.

        Returns:
            Full commit hash
        """
        result = await self._run_git("rev-parse", "HEAD")
        return result.stdout.strip()

    async def _run_tests(self) -> tuple[bool, str]:
        """Run test suite and return results.

        Returns:
            Tuple of (all_passed, test_output)
        """
        logger.info("Running tests: %s", self.test_command)
        result = await self._run_shell(self.test_command, timeout=300.0)
        passed = result.returncode == 0
        output = result.stdout + "\n" + result.stderr
        if passed:
            logger.info("Tests passed")
        else:
            logger.warning("Tests failed with returncode %d", result.returncode)
        return passed, output
    async def snapshot(self, run_tests: bool = True) -> Snapshot:
        """Take a snapshot of current repository state.

        Captures commit hash, branch, test status. Used for rollback if
        modifications fail.

        Args:
            run_tests: Whether to run tests as part of snapshot

        Returns:
            Snapshot object with current state

        Raises:
            GitOperationError: If git operations fail
        """
        logger.info("Taking snapshot of repository state")
        commit_hash = await self.get_current_commit()
        branch = await self.get_current_branch()
        clean = await self.is_clean()
        timestamp = datetime.now(timezone.utc)
        test_status = False
        test_output = ""
        if run_tests:
            test_status, test_output = await self._run_tests()
        else:
            test_status = True  # Assume OK if not running tests
            test_output = "Tests skipped"
        snapshot = Snapshot(
            commit_hash=commit_hash,
            branch=branch,
            timestamp=timestamp,
            test_status=test_status,
            test_output=test_output,
            clean=clean,
        )
        logger.info(
            "Snapshot taken: %s@%s (clean=%s, tests=%s)",
            branch,
            commit_hash[:8],
            clean,
            test_status,
        )
        return snapshot

    async def create_branch(self, name: str, base: Optional[str] = None) -> str:
        """Create and checkout a new feature branch.

        Args:
            name: Branch name (e.g., 'timmy/self-edit/20260226-143022')
            base: Base branch to create from (defaults to main_branch)

        Returns:
            Name of created branch

        Raises:
            GitOperationError: If branch creation fails
        """
        base = base or self.main_branch
        # Ensure we're on base branch and it's up to date
        await self._run_git("checkout", base)
        await self._run_git("pull", "origin", base, check=False)  # May fail if no remote
        # Create and checkout new branch
        await self._run_git("checkout", "-b", name)
        logger.info("Created branch %s from %s", name, base)
        return name

    async def commit(
        self,
        message: str,
        files: Optional[list[str | Path]] = None,
        allow_empty: bool = False,
    ) -> str:
        """Commit changes to current branch.

        Args:
            message: Commit message
            files: Specific files to commit (None = all changes)
            allow_empty: Whether to allow empty commits

        Returns:
            Commit hash of new commit

        Raises:
            GitOperationError: If commit fails
        """
        # Add files
        if files:
            for file_path in files:
                full_path = self.repo_path / file_path
                if not full_path.exists():
                    logger.warning("File does not exist: %s", file_path)
                await self._run_git("add", str(file_path))
        else:
            await self._run_git("add", "-A")
        # Check if there's anything to commit
        if not allow_empty:
            diff_result = await self._run_git(
                "diff", "--cached", "--quiet", check=False
            )
            if diff_result.returncode == 0:
                logger.warning("No changes to commit")
                return await self.get_current_commit()
        # Commit
        commit_args = ["commit", "-m", message]
        if allow_empty:
            commit_args.append("--allow-empty")
        await self._run_git(*commit_args)
        commit_hash = await self.get_current_commit()
        logger.info("Committed %s: %s", commit_hash[:8], message)
        return commit_hash
    async def get_diff(self, from_hash: str, to_hash: Optional[str] = None) -> str:
        """Get diff between commits.

        Args:
            from_hash: Starting commit hash (or Snapshot object hash)
            to_hash: Ending commit hash (None = current)

        Returns:
            Git diff as string
        """
        args = ["diff", from_hash]
        if to_hash:
            args.append(to_hash)
        result = await self._run_git(*args)
        return result.stdout

    async def rollback(self, snapshot: Snapshot | str) -> str:
        """Rollback to a previous snapshot.

        Hard resets to the snapshot commit and deletes any uncommitted changes.
        Use with caution — this is destructive.

        Args:
            snapshot: Snapshot object or commit hash to rollback to

        Returns:
            Commit hash after rollback

        Raises:
            GitOperationError: If rollback fails
        """
        if isinstance(snapshot, Snapshot):
            target_hash = snapshot.commit_hash
            target_branch = snapshot.branch
        else:
            target_hash = snapshot
            target_branch = None
        logger.warning("Rolling back to %s", target_hash[:8])
        # Reset to target commit
        await self._run_git("reset", "--hard", target_hash)
        # Clean any untracked files
        await self._run_git("clean", "-fd")
        # If we know the original branch, switch back to it
        if target_branch:
            branch_exists = await self._run_git(
                "branch", "--list", target_branch, check=False
            )
            if branch_exists.stdout.strip():
                await self._run_git("checkout", target_branch)
                logger.info("Switched back to branch %s", target_branch)
        current = await self.get_current_commit()
        logger.info("Rolled back to %s", current[:8])
        return current

    async def merge_to_main(
        self,
        branch: str,
        require_tests: bool = True,
    ) -> str:
        """Merge a feature branch into main after tests pass.

        Args:
            branch: Feature branch to merge
            require_tests: Whether to require tests to pass before merging

        Returns:
            Merge commit hash

        Raises:
            GitOperationError: If merge fails or tests don't pass
        """
        logger.info("Preparing to merge %s into %s", branch, self.main_branch)
        # Checkout the feature branch and run tests
        await self._run_git("checkout", branch)
        if require_tests:
            passed, output = await self._run_tests()
            if not passed:
                raise GitOperationError(
                    f"Cannot merge {branch}: tests failed\n{output}"
                )
        # Checkout main and merge
        await self._run_git("checkout", self.main_branch)
        await self._run_git("merge", "--no-ff", "-m", f"Merge {branch}", branch)
        # Optionally delete the feature branch
        await self._run_git("branch", "-d", branch, check=False)
        merge_hash = await self.get_current_commit()
        logger.info("Merged %s into %s: %s", branch, self.main_branch, merge_hash[:8])
        return merge_hash

    async def get_modified_files(self, since_hash: Optional[str] = None) -> list[str]:
        """Get list of files modified since a commit.

        Args:
            since_hash: Commit to compare against (None = uncommitted changes)
Returns:
List of modified file paths
"""
if since_hash:
result = await self._run_git(
"diff", "--name-only", since_hash, "HEAD"
)
else:
result = await self._run_git(
"diff", "--name-only", "HEAD"
)
files = [f.strip() for f in result.stdout.split("\n") if f.strip()]
return files
async def stage_file(self, file_path: str | Path) -> None:
"""Stage a single file for commit.
Args:
file_path: Path to file relative to repo root
"""
await self._run_git("add", str(file_path))
logger.debug("Staged %s", file_path)
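The two pure pieces of the GitSafety flow can be sketched standalone: the `timmy/self-edit/{timestamp}` branch-naming convention from the commit message, and the `git diff --name-only` parsing used by `get_modified_files()`. Both function names and the exact timestamp format are illustrative assumptions, not GitSafety API:

```python
from datetime import datetime, timezone

def feature_branch_name(prefix: str = "timmy/self-edit") -> str:
    # Branch names follow timmy/self-edit/{timestamp}; the strftime
    # format here is an assumption for illustration.
    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{prefix}/{ts}"

def parse_name_only(stdout: str) -> list[str]:
    # Mirrors get_modified_files(): `git diff --name-only` prints one
    # path per line; strip whitespace and drop blank lines.
    return [f.strip() for f in stdout.split("\n") if f.strip()]

print(feature_branch_name())
print(parse_name_only("src/app.py\n\ntests/test_app.py\n"))
```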

src/self_coding/modification_journal.py

@@ -0,0 +1,425 @@
"""Modification Journal — Persistent log of self-modification attempts.
Tracks successes and failures so Timmy can learn from experience.
Supports semantic search for similar past attempts.
"""
from __future__ import annotations
import json
import logging
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Default database location
DEFAULT_DB_PATH = Path("data/self_coding.db")
class Outcome(str, Enum):
"""Possible outcomes of a modification attempt."""
SUCCESS = "success"
FAILURE = "failure"
ROLLBACK = "rollback"
@dataclass
class ModificationAttempt:
"""A single self-modification attempt.
Attributes:
id: Unique identifier (auto-generated by database)
timestamp: When the attempt was made
task_description: What Timmy was trying to do
approach: Strategy/approach planned
files_modified: List of file paths that were modified
diff: The actual git diff of changes
test_results: Pytest output
outcome: success, failure, or rollback
failure_analysis: LLM-generated analysis of why it failed
reflection: LLM-generated lessons learned
retry_count: Number of retry attempts
embedding: Vector embedding of task_description (for semantic search)
"""
task_description: str
approach: str = ""
files_modified: list[str] = field(default_factory=list)
diff: str = ""
test_results: str = ""
outcome: Outcome = Outcome.FAILURE
failure_analysis: str = ""
reflection: str = ""
retry_count: int = 0
id: Optional[int] = None
timestamp: Optional[datetime] = None
embedding: Optional[bytes] = None
class ModificationJournal:
"""Persistent log of self-modification attempts.
Before any self-modification, Timmy should query the journal for
similar past attempts and include relevant ones in the LLM context.
Usage:
journal = ModificationJournal()
# Log an attempt
attempt = ModificationAttempt(
task_description="Add error handling",
files_modified=["src/app.py"],
outcome=Outcome.SUCCESS,
)
await journal.log_attempt(attempt)
# Find similar past attempts
similar = await journal.find_similar("Add error handling to endpoints")
# Get success metrics
metrics = await journal.get_success_rate()
"""
def __init__(
self,
db_path: Optional[str | Path] = None,
) -> None:
"""Initialize ModificationJournal.
Args:
db_path: SQLite database path. Defaults to data/self_coding.db
"""
self.db_path = Path(db_path) if db_path else DEFAULT_DB_PATH
self._ensure_schema()
logger.info("ModificationJournal initialized at %s", self.db_path)
def _get_conn(self) -> sqlite3.Connection:
"""Get database connection with schema ensured."""
self.db_path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(str(self.db_path))
conn.row_factory = sqlite3.Row
return conn
def _ensure_schema(self) -> None:
"""Create database tables if they don't exist."""
with self._get_conn() as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS modification_journal (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
task_description TEXT NOT NULL,
approach TEXT,
files_modified JSON,
diff TEXT,
test_results TEXT,
outcome TEXT CHECK(outcome IN ('success', 'failure', 'rollback')),
failure_analysis TEXT,
reflection TEXT,
retry_count INTEGER DEFAULT 0,
embedding BLOB
)
"""
)
# Create indexes for common queries
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_journal_outcome ON modification_journal(outcome)"
)
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_journal_timestamp ON modification_journal(timestamp)"
)
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_journal_task ON modification_journal(task_description)"
)
conn.commit()
async def log_attempt(self, attempt: ModificationAttempt) -> int:
"""Log a modification attempt to the journal.
Args:
attempt: The modification attempt to log
Returns:
ID of the logged entry
"""
with self._get_conn() as conn:
cursor = conn.execute(
"""
INSERT INTO modification_journal
(task_description, approach, files_modified, diff, test_results,
outcome, failure_analysis, reflection, retry_count, embedding)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
attempt.task_description,
attempt.approach,
json.dumps(attempt.files_modified),
attempt.diff,
attempt.test_results,
attempt.outcome.value,
attempt.failure_analysis,
attempt.reflection,
attempt.retry_count,
attempt.embedding,
),
)
conn.commit()
attempt_id = cursor.lastrowid
logger.info(
"Logged modification attempt %d: %s (%s)",
attempt_id,
attempt.task_description[:50],
attempt.outcome.value,
)
return attempt_id
async def find_similar(
self,
task_description: str,
limit: int = 5,
include_outcomes: Optional[list[Outcome]] = None,
) -> list[ModificationAttempt]:
"""Find similar past modification attempts.
Uses keyword matching for now; Phase 2 will switch to vector
embeddings for semantic search.
Args:
task_description: Task to find similar attempts for
limit: Maximum number of results
include_outcomes: Filter by outcomes (None = all)
Returns:
List of similar modification attempts
"""
# Extract keywords from task description
keywords = set(task_description.lower().split())
keywords -= {"the", "a", "an", "to", "in", "on", "at", "for", "with", "and", "or", "of", "is", "are"}
with self._get_conn() as conn:
# Build query
if include_outcomes:
outcome_filter = "AND outcome IN ({})".format(
",".join("?" * len(include_outcomes))
)
outcome_values = [o.value for o in include_outcomes]
else:
outcome_filter = ""
outcome_values = []
rows = conn.execute(
f"""
SELECT id, timestamp, task_description, approach, files_modified,
diff, test_results, outcome, failure_analysis, reflection,
retry_count
FROM modification_journal
WHERE 1=1 {outcome_filter}
ORDER BY timestamp DESC
LIMIT ?
""",
outcome_values + [limit * 3], # Get more for scoring
).fetchall()
# Score by keyword match
scored = []
for row in rows:
score = 0
task = row["task_description"].lower()
approach = (row["approach"] or "").lower()
for kw in keywords:
if kw in task:
score += 3
if kw in approach:
score += 1
# Boost recent attempts (only if already matched)
if score > 0:
timestamp = datetime.fromisoformat(row["timestamp"])
if timestamp.tzinfo is None:
timestamp = timestamp.replace(tzinfo=timezone.utc)
age_days = (datetime.now(timezone.utc) - timestamp).days
if age_days < 7:
score += 2
elif age_days < 30:
score += 1
if score > 0:
scored.append((score, row))
# Sort by score, take top N
scored.sort(reverse=True, key=lambda x: x[0])
top_rows = scored[:limit]
# Convert to ModificationAttempt objects
return [self._row_to_attempt(row) for _, row in top_rows]
async def get_success_rate(self) -> dict[str, float]:
"""Get success rate metrics.
Returns:
Dict with the overall success rate and per-outcome counts:
{
"overall": float, # 0.0 to 1.0
"success": int, # count
"failure": int, # count
"rollback": int, # count
"total": int, # total attempts
}
"""
with self._get_conn() as conn:
rows = conn.execute(
"""
SELECT outcome, COUNT(*) as count
FROM modification_journal
GROUP BY outcome
"""
).fetchall()
counts = {row["outcome"]: row["count"] for row in rows}
success = counts.get("success", 0)
failure = counts.get("failure", 0)
rollback = counts.get("rollback", 0)
total = success + failure + rollback
overall = success / total if total > 0 else 0.0
return {
"overall": overall,
"success": success,
"failure": failure,
"rollback": rollback,
"total": total,
}
async def get_recent_failures(self, limit: int = 10) -> list[ModificationAttempt]:
"""Get recent failed attempts with their analyses.
Args:
limit: Maximum number of failures to return
Returns:
List of failed modification attempts
"""
with self._get_conn() as conn:
rows = conn.execute(
"""
SELECT id, timestamp, task_description, approach, files_modified,
diff, test_results, outcome, failure_analysis, reflection,
retry_count
FROM modification_journal
WHERE outcome IN ('failure', 'rollback')
ORDER BY timestamp DESC
LIMIT ?
""",
(limit,),
).fetchall()
return [self._row_to_attempt(row) for row in rows]
async def get_by_id(self, attempt_id: int) -> Optional[ModificationAttempt]:
"""Get a specific modification attempt by ID.
Args:
attempt_id: ID of the attempt
Returns:
ModificationAttempt or None if not found
"""
with self._get_conn() as conn:
row = conn.execute(
"""
SELECT id, timestamp, task_description, approach, files_modified,
diff, test_results, outcome, failure_analysis, reflection,
retry_count
FROM modification_journal
WHERE id = ?
""",
(attempt_id,),
).fetchone()
if not row:
return None
return self._row_to_attempt(row)
async def update_reflection(self, attempt_id: int, reflection: str) -> bool:
"""Update the reflection for a modification attempt.
Args:
attempt_id: ID of the attempt
reflection: New reflection text
Returns:
True if updated, False if not found
"""
with self._get_conn() as conn:
cursor = conn.execute(
"""
UPDATE modification_journal
SET reflection = ?
WHERE id = ?
""",
(reflection, attempt_id),
)
conn.commit()
if cursor.rowcount > 0:
logger.info("Updated reflection for attempt %d", attempt_id)
return True
return False
async def get_attempts_for_file(
self,
file_path: str,
limit: int = 10,
) -> list[ModificationAttempt]:
"""Get all attempts that modified a specific file.
Args:
file_path: Path to file (relative to repo root)
limit: Maximum number of attempts
Returns:
List of modification attempts affecting this file
"""
with self._get_conn() as conn:
# Match the JSON-quoted path exactly, or fall back to substring match
rows = conn.execute(
"""
SELECT id, timestamp, task_description, approach, files_modified,
diff, test_results, outcome, failure_analysis, reflection,
retry_count
FROM modification_journal
WHERE files_modified LIKE ? OR files_modified LIKE ?
ORDER BY timestamp DESC
LIMIT ?
""",
(f'%"{file_path}"%', f'%{file_path}%', limit),
).fetchall()
return [self._row_to_attempt(row) for row in rows]
def _row_to_attempt(self, row: sqlite3.Row) -> ModificationAttempt:
"""Convert a database row to ModificationAttempt."""
return ModificationAttempt(
id=row["id"],
timestamp=datetime.fromisoformat(row["timestamp"]),
task_description=row["task_description"],
approach=row["approach"] or "",
files_modified=json.loads(row["files_modified"] or "[]"),
diff=row["diff"] or "",
test_results=row["test_results"] or "",
outcome=Outcome(row["outcome"]),
failure_analysis=row["failure_analysis"] or "",
reflection=row["reflection"] or "",
retry_count=row["retry_count"] or 0,
)
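The keyword scoring inside `find_similar()` can be exercised on its own. A minimal sketch of just the scoring step (the function name is illustrative; recency boosting is omitted):

```python
# Stopwords dropped from the query, as in find_similar()
STOPWORDS = {"the", "a", "an", "to", "in", "on", "at", "for",
             "with", "and", "or", "of", "is", "are"}

def keyword_score(query: str, task: str, approach: str = "") -> int:
    # Mirrors find_similar(): +3 per query keyword found in the stored
    # task description, +1 per keyword found in the stored approach.
    keywords = set(query.lower().split()) - STOPWORDS
    task_l, approach_l = task.lower(), approach.lower()
    score = 0
    for kw in keywords:
        if kw in task_l:
            score += 3
        if kw in approach_l:
            score += 1
    return score

# "to" is a stopword; the other four keywords each score against the task
print(keyword_score("Add error handling to endpoints", "Add error handling"))
```

A row only enters the candidate list when its score is positive, so unrelated attempts never surface merely for being recent.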

src/self_coding/reflection.py

@@ -0,0 +1,259 @@
"""Reflection Service — Generate lessons learned from modification attempts.
After every self-modification (success or failure), the Reflection Service
prompts an LLM to analyze the attempt and extract actionable insights.
"""
from __future__ import annotations
import logging
from typing import Optional
from self_coding.modification_journal import ModificationAttempt, Outcome
logger = logging.getLogger(__name__)
REFLECTION_SYSTEM_PROMPT = """You are a software engineering mentor analyzing a self-modification attempt.
Your goal is to provide constructive, specific feedback that helps improve future attempts.
Focus on patterns and principles rather than one-off issues.
Be concise but insightful. Maximum 300 words."""
REFLECTION_PROMPT_TEMPLATE = """A software agent just attempted to modify its own source code.
Task: {task_description}
Approach: {approach}
Files modified: {files_modified}
Outcome: {outcome}
Test results: {test_results}
{failure_section}
Reflect on this attempt:
1. What went well? (Be specific about techniques or strategies)
2. What could be improved? (Focus on process, not just the code)
3. What would you do differently next time?
4. What general lesson can be extracted for future similar tasks?
Provide your reflection in a structured format:
**What went well:**
[Your analysis]
**What could be improved:**
[Your analysis]
**Next time:**
[Specific actionable change]
**General lesson:**
[Extracted principle for similar tasks]"""
class ReflectionService:
"""Generates reflections on self-modification attempts.
Uses an LLM to analyze attempts and extract lessons learned.
Stores reflections in the Modification Journal for future reference.
Usage:
from self_coding.reflection import ReflectionService
from timmy.cascade_adapter import TimmyCascadeAdapter
adapter = TimmyCascadeAdapter()
reflection_service = ReflectionService(llm_adapter=adapter)
# After a modification attempt
reflection_text = await reflection_service.reflect_on_attempt(attempt)
# Store in journal
await journal.update_reflection(attempt_id, reflection_text)
"""
def __init__(
self,
llm_adapter: Optional[object] = None,
model_preference: str = "fast", # "fast" or "quality"
) -> None:
"""Initialize ReflectionService.
Args:
llm_adapter: LLM adapter (e.g., TimmyCascadeAdapter)
model_preference: "fast" for quick reflections, "quality" for deeper analysis
"""
self.llm_adapter = llm_adapter
self.model_preference = model_preference
logger.info("ReflectionService initialized")
async def reflect_on_attempt(self, attempt: ModificationAttempt) -> str:
"""Generate a reflection on a modification attempt.
Args:
attempt: The modification attempt to reflect on
Returns:
Reflection text (structured markdown)
"""
# Build the prompt
failure_section = ""
if attempt.outcome == Outcome.FAILURE and attempt.failure_analysis:
failure_section = f"\nFailure analysis: {attempt.failure_analysis}"
prompt = REFLECTION_PROMPT_TEMPLATE.format(
task_description=attempt.task_description,
approach=attempt.approach or "(No approach documented)",
files_modified=", ".join(attempt.files_modified) if attempt.files_modified else "(No files modified)",
outcome=attempt.outcome.value.upper(),
test_results=attempt.test_results[:500] if attempt.test_results else "(No test results)",
failure_section=failure_section,
)
# Call LLM if available
if self.llm_adapter:
try:
response = await self.llm_adapter.chat(
message=prompt,
context=REFLECTION_SYSTEM_PROMPT,
)
reflection = response.content.strip()
logger.info("Generated reflection for attempt (via %s)",
response.provider_used)
return reflection
except Exception as e:
logger.error("LLM reflection failed: %s", e)
return self._generate_fallback_reflection(attempt)
else:
# No LLM available, use fallback
return self._generate_fallback_reflection(attempt)
def _generate_fallback_reflection(self, attempt: ModificationAttempt) -> str:
"""Generate a basic reflection without LLM.
Used when no LLM adapter is available or LLM call fails.
Args:
attempt: The modification attempt
Returns:
Basic reflection text
"""
if attempt.outcome == Outcome.SUCCESS:
return f"""**What went well:**
Successfully completed: {attempt.task_description}
Files modified: {', '.join(attempt.files_modified) if attempt.files_modified else 'N/A'}
**What could be improved:**
Document the approach taken for future reference.
**Next time:**
Use the same pattern for similar tasks.
**General lesson:**
Modifications to {', '.join(attempt.files_modified) if attempt.files_modified else 'these files'} should include proper test coverage."""
elif attempt.outcome == Outcome.FAILURE:
return f"""**What went well:**
Attempted: {attempt.task_description}
**What could be improved:**
The modification failed after {attempt.retry_count} retries.
{attempt.failure_analysis if attempt.failure_analysis else 'Failure reason not documented.'}
**Next time:**
Consider breaking the task into smaller steps.
Validate approach with simpler test case first.
**General lesson:**
Changes affecting {', '.join(attempt.files_modified) if attempt.files_modified else 'multiple files'} require careful dependency analysis."""
else: # ROLLBACK
return f"""**What went well:**
Recognized failure and rolled back to maintain stability.
**What could be improved:**
Early detection of issues before full implementation.
**Next time:**
Run tests more frequently during development.
Use smaller incremental commits.
**General lesson:**
Rollback is preferable to shipping broken code."""
async def reflect_with_context(
self,
attempt: ModificationAttempt,
similar_attempts: list[ModificationAttempt],
) -> str:
"""Generate reflection with context from similar past attempts.
Includes relevant past reflections to build cumulative learning.
Args:
attempt: The current modification attempt
similar_attempts: Similar past attempts (with reflections)
Returns:
Reflection text incorporating past learnings
"""
# Build context from similar attempts
context_parts = []
for past in similar_attempts[:3]: # Top 3 similar
if past.reflection:
context_parts.append(
f"Past similar task ({past.outcome.value}):\n"
f"Task: {past.task_description}\n"
f"Lesson: {past.reflection[:200]}..."
)
context = "\n\n".join(context_parts)
# Build enhanced prompt
failure_section = ""
if attempt.outcome == Outcome.FAILURE and attempt.failure_analysis:
failure_section = f"\nFailure analysis: {attempt.failure_analysis}"
enhanced_prompt = f"""A software agent just attempted to modify its own source code.
Task: {attempt.task_description}
Approach: {attempt.approach or "(No approach documented)"}
Files modified: {', '.join(attempt.files_modified) if attempt.files_modified else "(No files modified)"}
Outcome: {attempt.outcome.value.upper()}
Test results: {attempt.test_results[:500] if attempt.test_results else "(No test results)"}
{failure_section}
---
Relevant past attempts:
{context if context else "(No similar past attempts)"}
---
Given this history, reflect on the current attempt:
1. What went well?
2. What could be improved?
3. How does this compare to past similar attempts?
4. What pattern or principle should guide future similar tasks?
Provide your reflection in a structured format:
**What went well:**
**What could be improved:**
**Comparison to past attempts:**
**Guiding principle:**"""
if self.llm_adapter:
try:
response = await self.llm_adapter.chat(
message=enhanced_prompt,
context=REFLECTION_SYSTEM_PROMPT,
)
return response.content.strip()
except Exception as e:
logger.error("LLM reflection with context failed: %s", e)
return await self.reflect_on_attempt(attempt)
else:
return await self.reflect_on_attempt(attempt)
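The LLM-or-fallback branching used by both reflection methods can be sketched in isolation. `StubAdapter` and `reflect` are hypothetical stand-ins (not the TimmyCascadeAdapter API), trimmed to show only the try/except fallback path:

```python
import asyncio

class StubAdapter:
    """Hypothetical stand-in for an LLM adapter; always fails."""
    async def chat(self, message: str, context: str):
        raise RuntimeError("LLM unavailable")  # force the fallback path

async def reflect(adapter, prompt: str, fallback: str) -> str:
    # Mirrors reflect_on_attempt(): prefer the LLM, fall back to a
    # template when the adapter is missing or the call raises.
    if adapter is None:
        return fallback
    try:
        response = await adapter.chat(message=prompt, context="mentor")
        return response.content.strip()
    except Exception:
        return fallback

out = asyncio.run(reflect(StubAdapter(), "analyze attempt",
                          "**General lesson:** rollback beats broken code"))
print(out)
```

Because the fallback is returned rather than raised, a dead LLM degrades reflections to templates instead of aborting the modification workflow.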