Compare commits


2 Commits

10d7cd7d0c test(#752): Add tests for error classification
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 44s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 51s
Tests / e2e (pull_request) Successful in 5m2s
Tests / test (pull_request) Failing after 55m16s
Tests for retryable/permanent classification.
Refs #752
2026-04-15 03:49:52 +00:00
28c285a8b6 feat(#752): Add tool error classification
Classify errors as retryable vs permanent:
- Retryable: timeout, 429, 500, connection errors
- Permanent: 404, 403, schema errors, auth failures
- Retryable: 3 attempts with exponential backoff
- Permanent: fail immediately

Resolves #752
2026-04-15 03:49:31 +00:00
5 changed files with 288 additions and 606 deletions


@@ -1,134 +0,0 @@
# Anthropic Cybersecurity Skills Integration
Import and use the Anthropic Cybersecurity Skills library (754 skills, 26 domains, 5 frameworks) with Hermes Agent.
## Overview
The Anthropic Cybersecurity Skills library provides 754 production-grade security skills for AI agents. Each skill follows the agentskills.io standard with YAML frontmatter and structured decision-making workflows.
## Source
- **Repository:** https://github.com/mukul975/Anthropic-Cybersecurity-Skills
- **License:** Apache 2.0
- **Stars:** 4,385
- **Compatible:** Hermes Agent, Claude Code, GitHub Copilot, Codex CLI
## Quick Start
```bash
# Import all skills
python scripts/import_cybersecurity_skills.py
# Import by domain
python scripts/import_cybersecurity_skills.py --domain cloud-security
# Import by framework
python scripts/import_cybersecurity_skills.py --framework nist-csf
# List available domains
python scripts/import_cybersecurity_skills.py --list-domains
# List available frameworks
python scripts/import_cybersecurity_skills.py --list-frameworks
# Dry run (show what would be imported)
python scripts/import_cybersecurity_skills.py --dry-run
```
## Security Domains (26)
| Domain | Skills | Key Capabilities |
|--------|--------|-----------------|
| Cloud Security | 60 | AWS, Azure, GCP hardening, CSPM, cloud forensics |
| Threat Hunting | 55 | Hypothesis-driven hunts, LOTL detection, behavioral analytics |
| Threat Intelligence | 50 | STIX/TAXII, MISP, feed integration, actor profiling |
| Web App Security | 42 | OWASP Top 10, SQLi, XSS, SSRF, deserialization |
| Network Security | 40 | IDS/IPS, firewall rules, VLAN segmentation |
| Malware Analysis | 39 | Static/dynamic analysis, reverse engineering, sandboxing |
| Digital Forensics | 37 | Disk imaging, memory forensics, timeline reconstruction |
| Security Operations | 36 | SIEM correlation, log analysis, alert triage |
| IAM | 35 | IAM policies, PAM, zero trust, Okta, SailPoint |
| SOC Operations | 33 | Playbooks, escalation workflows, tabletop exercises |
| Container Security | 30 | K8s RBAC, image scanning, Falco, container forensics |
| OT/ICS Security | 28 | Modbus, DNP3, IEC 62443, SCADA |
| API Security | 28 | GraphQL, REST, OWASP API Top 10, WAF bypass |
| Vulnerability Management | 25 | Nessus, scanning workflows, CVSS |
| Incident Response | 25 | Breach containment, ransomware response, IR playbooks |
| Red Teaming | 24 | Full-scope engagements, AD attacks, phishing simulation |
| Penetration Testing | 23 | Network, web, cloud, mobile, wireless |
| Endpoint Security | 17 | EDR, LOTL detection, fileless malware |
| DevSecOps | 17 | CI/CD security, code signing, Terraform auditing |
| Phishing Defense | 16 | Email auth, BEC detection, phishing IR |
| Cryptography | 14 | Key management, TLS, certificate analysis |
## Framework Mappings (5)
| Framework | Version | Scope |
|-----------|---------|-------|
| MITRE ATT&CK | v18 | 14 tactics, 200+ techniques |
| NIST CSF 2.0 | 2.0 | 6 functions, 22 categories |
| MITRE ATLAS | v5.4 | 16 tactics, 84 techniques |
| MITRE D3FEND | v1.3 | 7 categories, 267 techniques |
| NIST AI RMF | 1.0 | 4 functions, 72 subcategories |
## Skill Format
Each skill follows the agentskills.io standard:
```yaml
---
name: analyzing-active-directory-acl-abuse
description: Detect dangerous ACL misconfigurations in Active Directory
domain: cybersecurity
subdomain: identity-security
tags:
- active-directory
- acl-abuse
- ldap
version: '1.0'
author: mahipal
license: Apache-2.0
nist_csf:
- PR.AA-01
- PR.AA-05
- PR.AA-06
---
```
## Use Cases for Hermes
1. **Fleet security** — Agents can audit their own infrastructure
2. **Incident response** — Structured IR playbooks for security events
3. **Threat hunting** — Hypothesis-driven hunts across fleet logs
4. **Compliance** — Framework-mapped skills for audit preparation
5. **Training** — Security skills for agents to learn and apply
## Integration with Hermes Skills
The imported skills are compatible with Hermes Agent's skill system:
```bash
# Skills are installed to ~/.hermes/skills/cybersecurity/
# Each skill has a SKILL.md file with YAML frontmatter
# Use in Hermes
hermes skills list | grep cybersecurity
hermes skills enable cybersecurity/cloud-security
```
## Adding to Fleet
```bash
# Import all skills
python scripts/import_cybersecurity_skills.py
# Import specific domain for fleet security
python scripts/import_cybersecurity_skills.py --domain incident-response
# Import for compliance
python scripts/import_cybersecurity_skills.py --framework nist-csf
```
## Index
After import, an index is generated at `~/.hermes/skills/cybersecurity/index.json` listing all installed skills with their metadata.
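The index can be inspected with nothing but the standard library. A minimal sketch, assuming the field layout produced by `generate_index()` in the import script; the two skill entries below are illustrative placeholders, not real repository contents:

```python
import json

# Sample data in the shape generate_index() writes to index.json (assumed fields).
sample = {
    "source": "Anthropic Cybersecurity Skills Library",
    "license": "Apache-2.0",
    "skills": [
        {"name": "analyzing-active-directory-acl-abuse", "domain": "identity-security"},
        {"name": "hunting-lateral-movement", "domain": "threat-hunting"},  # hypothetical
    ],
}
index = json.loads(json.dumps(sample))  # round-trip as if read from index.json

# Group installed skills by domain for a quick summary.
by_domain = {}
for skill in index["skills"]:
    by_domain.setdefault(skill.get("domain", ""), []).append(skill["name"])
for domain, names in sorted(by_domain.items()):
    print(f"{domain}: {len(names)} skills")
```

In practice you would replace `sample` with `json.loads(Path("~/.hermes/skills/cybersecurity/index.json").expanduser().read_text())`.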


@@ -1,227 +0,0 @@
#!/usr/bin/env python3
"""
import-cybersecurity-skills.py — Import Anthropic Cybersecurity Skills into Hermes.

Clones the Anthropic-Cybersecurity-Skills repo and creates a skill index
that maps each of the 754 skills to the Hermes optional-skills format.

Usage:
    python3 scripts/import-cybersecurity-skills.py --clone            # Clone repo
    python3 scripts/import-cybersecurity-skills.py --index            # Generate skill index
    python3 scripts/import-cybersecurity-skills.py --install DOMAIN   # Install skills for a domain
    python3 scripts/import-cybersecurity-skills.py --list             # List all domains
    python3 scripts/import-cybersecurity-skills.py --status           # Import status
"""
import argparse
import json
import os
import subprocess
import sys
import yaml
from pathlib import Path
from collections import defaultdict
REPO_URL = "https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git"
SKILLS_DIR = Path.home() / ".hermes" / "cybersecurity-skills"
INDEX_PATH = SKILLS_DIR / "skill-index.json"
OPTIONAL_SKILLS_DIR = Path.home() / ".hermes" / "optional-skills" / "cybersecurity"
# Domain → hermes category mapping
DOMAIN_CATEGORIES = {
    "cloud-security": "security",
    "threat-hunting": "security",
    "threat-intelligence": "security",
    "web-app-security": "security",
    "network-security": "security",
    "malware-analysis": "security",
    "digital-forensics": "security",
    "security-operations": "security",
    "identity-access-management": "security",
    "soc-operations": "security",
    "container-security": "security",
    "ot-ics-security": "security",
    "api-security": "security",
    "vulnerability-management": "security",
    "incident-response": "security",
    "red-teaming": "security",
    "penetration-testing": "security",
    "endpoint-security": "security",
    "devsecops": "devops",
    "phishing-defense": "security",
    "cryptography": "security",
}

def cmd_clone():
    """Clone the cybersecurity skills repository."""
    if SKILLS_DIR.exists():
        print(f"Updating existing clone at {SKILLS_DIR}")
        subprocess.run(["git", "-C", str(SKILLS_DIR), "pull"], capture_output=True)
    else:
        SKILLS_DIR.parent.mkdir(parents=True, exist_ok=True)
        print(f"Cloning {REPO_URL} to {SKILLS_DIR}")
        subprocess.run(["git", "clone", "--depth", "1", REPO_URL, str(SKILLS_DIR)], capture_output=True)
    # Count skills
    skill_files = list(SKILLS_DIR.rglob("*.md"))
    print(f"Found {len(skill_files)} skill files")

def cmd_index():
    """Generate a skill index from the cloned repo."""
    if not SKILLS_DIR.exists():
        print("Run --clone first", file=sys.stderr)
        sys.exit(1)
    skills = []
    domains = defaultdict(list)
    for md_file in SKILLS_DIR.rglob("*.md"):
        if md_file.name in ("README.md", "LICENSE.md", "DESCRIPTION.md"):
            continue
        try:
            content = md_file.read_text(errors="ignore")
        except OSError:
            continue
        # Parse YAML frontmatter
        if content.startswith("---"):
            parts = content.split("---", 2)
            if len(parts) >= 3:
                try:
                    frontmatter = yaml.safe_load(parts[1]) or {}
                except yaml.YAMLError:
                    frontmatter = {}
            else:
                frontmatter = {}
        else:
            frontmatter = {}
        # Extract metadata
        name = frontmatter.get("name", md_file.stem)
        description = frontmatter.get("description", "")
        domain = frontmatter.get("domain", frontmatter.get("subdomain", "general"))
        tags = frontmatter.get("tags", [])
        frameworks = frontmatter.get("nist_csf", []) + frontmatter.get("mitre_attack", [])
        skill = {
            "name": name,
            "file": str(md_file.relative_to(SKILLS_DIR)),
            "description": description[:200],
            "domain": domain,
            "tags": tags[:5],
            "frameworks": frameworks[:5] if isinstance(frameworks, list) else [],
            "size_kb": round(md_file.stat().st_size / 1024, 1),
        }
        skills.append(skill)
        domains[domain].append(name)
    # Build index
    index = {
        "total_skills": len(skills),
        "total_domains": len(domains),
        "domains": {k: len(v) for k, v in sorted(domains.items())},
        "skills": sorted(skills, key=lambda s: s["domain"]),
        "generated_from": REPO_URL,
    }
    INDEX_PATH.write_text(json.dumps(index, indent=2))
    print(f"Indexed {len(skills)} skills across {len(domains)} domains")
    print(f"Written to {INDEX_PATH}")
    # Print domain summary (count the names, not print the list itself)
    print("\nDomains:")
    for domain, names in sorted(domains.items(), key=lambda x: -len(x[1])):
        print(f"  {domain}: {len(names)} skills")

def cmd_list():
    """List all security domains."""
    if not INDEX_PATH.exists():
        print("Run --index first", file=sys.stderr)
        sys.exit(1)
    index = json.loads(INDEX_PATH.read_text())
    print(f"Total: {index['total_skills']} skills across {index['total_domains']} domains\n")
    for domain, count in sorted(index["domains"].items(), key=lambda x: -x[1]):
        print(f"  {domain:<35} {count:>4} skills")

def cmd_install(domain: str = None):
    """Install skills for a domain into optional-skills."""
    if not INDEX_PATH.exists():
        print("Run --index first", file=sys.stderr)
        sys.exit(1)
    index = json.loads(INDEX_PATH.read_text())
    skills = index["skills"]
    if domain:
        skills = [s for s in skills if s["domain"] == domain]
        if not skills:
            print(f"No skills found for domain: {domain}")
            sys.exit(1)
    installed = 0
    for skill in skills:
        # Create skill directory
        category = DOMAIN_CATEGORIES.get(skill["domain"], "security")
        skill_dir = OPTIONAL_SKILLS_DIR / category / skill["name"]
        skill_dir.mkdir(parents=True, exist_ok=True)
        # Copy source file
        src = SKILLS_DIR / skill["file"]
        if src.exists():
            dst = skill_dir / "SKILL.md"
            dst.write_text(src.read_text(errors="ignore"))
            installed += 1
    print(f"Installed {installed} skills to {OPTIONAL_SKILLS_DIR}")

def cmd_status():
    """Show import status."""
    print(f"Clone dir: {SKILLS_DIR}")
    print(f"  Exists: {SKILLS_DIR.exists()}")
    print(f"Index: {INDEX_PATH}")
    print(f"  Exists: {INDEX_PATH.exists()}")
    if INDEX_PATH.exists():
        index = json.loads(INDEX_PATH.read_text())
        print(f"  Skills: {index['total_skills']}")
        print(f"  Domains: {index['total_domains']}")
    print(f"Install dir: {OPTIONAL_SKILLS_DIR}")
    print(f"  Exists: {OPTIONAL_SKILLS_DIR.exists()}")
    if OPTIONAL_SKILLS_DIR.exists():
        installed = len(list(OPTIONAL_SKILLS_DIR.rglob("SKILL.md")))
        print(f"  Installed skills: {installed}")

def main():
    parser = argparse.ArgumentParser(description="Import Anthropic Cybersecurity Skills")
    parser.add_argument("--clone", action="store_true", help="Clone the skills repo")
    parser.add_argument("--index", action="store_true", help="Generate skill index")
    parser.add_argument("--list", action="store_true", help="List all domains")
    parser.add_argument("--install", metavar="DOMAIN", nargs="?", const="all", help="Install skills for domain")
    parser.add_argument("--status", action="store_true", help="Import status")
    args = parser.parse_args()
    if args.clone:
        cmd_clone()
    elif args.index:
        cmd_index()
    elif args.list:
        cmd_list()
    elif args.install is not None:
        cmd_install(None if args.install == "all" else args.install)
    elif args.status:
        cmd_status()
    else:
        parser.print_help()


if __name__ == "__main__":
    main()


@@ -1,245 +0,0 @@
#!/usr/bin/env python3
"""
import_cybersecurity_skills.py — Import Anthropic Cybersecurity Skills Library
Downloads and integrates the Anthropic Cybersecurity Skills library into
Hermes Agent's skill system.
Source: https://github.com/mukul975/Anthropic-Cybersecurity-Skills
License: Apache 2.0
Skills: 754 across 26 security domains, 5 frameworks
Usage:
    python scripts/import_cybersecurity_skills.py
    python scripts/import_cybersecurity_skills.py --domain cloud-security
    python scripts/import_cybersecurity_skills.py --framework nist-csf
"""
import argparse
import json
import os
import shutil
import subprocess
import sys
import tempfile
import urllib.request
from pathlib import Path
from typing import List, Dict, Any
# Configuration
REPO_URL = "https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git"
SKILLS_DIR = Path.home() / ".hermes" / "skills" / "cybersecurity"
CACHE_DIR = Path.home() / ".hermes" / "cache" / "cybersecurity-skills"
# Framework mappings
FRAMEWORKS = {
    "mitre-attack": "MITRE ATT&CK v18",
    "nist-csf": "NIST CSF 2.0",
    "mitre-atlas": "MITRE ATLAS v5.4",
    "mitre-d3fend": "MITRE D3FEND v1.3",
    "nist-ai-rmf": "NIST AI RMF 1.0",
}
# Security domains
DOMAINS = [
    "cloud-security", "threat-hunting", "threat-intelligence",
    "web-app-security", "network-security", "malware-analysis",
    "digital-forensics", "security-operations", "iam",
    "soc-operations", "container-security", "ot-ics-security",
    "api-security", "vulnerability-management", "incident-response",
    "red-teaming", "penetration-testing", "endpoint-security",
    "devsecops", "phishing-defense", "cryptography",
]

def clone_repo(target_dir: Path) -> bool:
    """Clone the cybersecurity skills repository."""
    print(f"Cloning {REPO_URL}...")
    try:
        subprocess.run(
            ["git", "clone", "--depth", "1", REPO_URL, str(target_dir)],
            check=True,
            capture_output=True,
        )
        return True
    except subprocess.CalledProcessError as e:
        print(f"Error cloning repository: {e}", file=sys.stderr)
        return False

def parse_skill_file(skill_path: Path) -> Dict[str, Any]:
    """Parse a skill YAML/Markdown file."""
    content = skill_path.read_text(encoding="utf-8")
    # Extract YAML frontmatter
    if content.startswith("---"):
        parts = content.split("---", 2)
        if len(parts) >= 3:
            import yaml
            try:
                metadata = yaml.safe_load(parts[1])
                metadata["content"] = parts[2].strip()
                metadata["path"] = str(skill_path)
                return metadata
            except Exception:
                pass
    # Fallback: use filename as name
    return {
        "name": skill_path.stem,
        "description": content[:200],
        "content": content,
        "path": str(skill_path),
    }

def find_skills(repo_dir: Path, domain: str = None, framework: str = None) -> List[Path]:
    """Find skill files in the repository."""
    skills = []
    seen = set()  # repo_dir is itself a search root, so guard against duplicate matches
    # Look for skills in common locations
    search_dirs = [
        repo_dir / "skills",
        repo_dir / "cybersecurity",
        repo_dir,
    ]
    for search_dir in search_dirs:
        if not search_dir.exists():
            continue
        for path in search_dir.rglob("*.md"):
            if path in seen:
                continue
            seen.add(path)
            # Skip README files
            if path.name.upper() == "README.MD":
                continue
            # Filter by domain if specified
            if domain:
                if domain.lower() not in str(path).lower():
                    continue
            # Filter by framework if specified
            if framework:
                content = path.read_text(encoding="utf-8", errors="ignore").lower()
                if framework.lower() not in content:
                    continue
            skills.append(path)
    return skills

def install_skills(skills: List[Path], target_dir: Path) -> int:
    """Install skills to Hermes skill directory."""
    target_dir.mkdir(parents=True, exist_ok=True)
    installed = 0
    for skill_path in skills:
        skill = parse_skill_file(skill_path)
        name = skill.get("name", skill_path.stem)
        # Create skill directory
        skill_dir = target_dir / name
        skill_dir.mkdir(exist_ok=True)
        # Copy skill file
        dest = skill_dir / "SKILL.md"
        shutil.copy2(skill_path, dest)
        installed += 1
    return installed

def generate_index(skills_dir: Path) -> Dict[str, Any]:
    """Generate an index of installed skills."""
    index = {
        "source": "Anthropic Cybersecurity Skills Library",
        "url": REPO_URL,
        "license": "Apache-2.0",
        "skills": [],
    }
    for skill_dir in skills_dir.iterdir():
        if not skill_dir.is_dir():
            continue
        skill_file = skill_dir / "SKILL.md"
        if not skill_file.exists():
            continue
        skill = parse_skill_file(skill_file)
        index["skills"].append({
            "name": skill.get("name", skill_dir.name),
            "description": skill.get("description", "")[:200],
            "domain": skill.get("domain", ""),
            "frameworks": skill.get("frameworks", []),
        })
    return index

def main():
    parser = argparse.ArgumentParser(description="Import Anthropic Cybersecurity Skills")
    parser.add_argument("--domain", "-d", help="Filter by security domain")
    parser.add_argument("--framework", "-f", help="Filter by framework (e.g., nist-csf)")
    parser.add_argument("--list-domains", action="store_true", help="List available domains")
    parser.add_argument("--list-frameworks", action="store_true", help="List available frameworks")
    parser.add_argument("--output", "-o", help="Output directory for skills")
    parser.add_argument("--dry-run", action="store_true", help="Show what would be imported")
    args = parser.parse_args()
    # List domains
    if args.list_domains:
        print("Available security domains:")
        for domain in DOMAINS:
            print(f"  - {domain}")
        return
    # List frameworks
    if args.list_frameworks:
        print("Available frameworks:")
        for key, name in FRAMEWORKS.items():
            print(f"  - {key}: {name}")
        return
    # Set output directory
    output_dir = Path(args.output) if args.output else SKILLS_DIR
    # Clone repository
    with tempfile.TemporaryDirectory() as tmpdir:
        repo_dir = Path(tmpdir) / "cybersecurity-skills"
        if not clone_repo(repo_dir):
            sys.exit(1)
        # Find skills
        print(f"Searching for skills (domain={args.domain}, framework={args.framework})...")
        skills = find_skills(repo_dir, args.domain, args.framework)
        print(f"Found {len(skills)} skills")
        if args.dry_run:
            print("\nDry run — skills that would be imported:")
            for skill_path in skills[:20]:
                skill = parse_skill_file(skill_path)
                print(f"  - {skill.get('name', skill_path.stem)}: {skill.get('description', '')[:60]}...")
            if len(skills) > 20:
                print(f"  ... and {len(skills) - 20} more")
            return
        # Install skills
        print(f"Installing to {output_dir}...")
        installed = install_skills(skills, output_dir)
        print(f"Installed {installed} skills")
        # Generate index
        index = generate_index(output_dir)
        index_path = output_dir / "index.json"
        with open(index_path, "w") as f:
            json.dump(index, f, indent=2)
        print(f"Index saved to {index_path}")


if __name__ == "__main__":
    main()


@@ -0,0 +1,55 @@
"""
Tests for error classification (#752).
"""
import pytest
from tools.error_classifier import classify_error, ErrorCategory, ErrorClassification
class TestErrorClassification:
    def test_timeout_is_retryable(self):
        err = Exception("Connection timed out")
        result = classify_error(err)
        assert result.category == ErrorCategory.RETRYABLE
        assert result.should_retry is True

    def test_429_is_retryable(self):
        err = Exception("Rate limit exceeded")
        result = classify_error(err, response_code=429)
        assert result.category == ErrorCategory.RETRYABLE
        assert result.should_retry is True

    def test_404_is_permanent(self):
        err = Exception("Not found")
        result = classify_error(err, response_code=404)
        assert result.category == ErrorCategory.PERMANENT
        assert result.should_retry is False

    def test_403_is_permanent(self):
        err = Exception("Forbidden")
        result = classify_error(err, response_code=403)
        assert result.category == ErrorCategory.PERMANENT
        assert result.should_retry is False

    def test_500_is_retryable(self):
        err = Exception("Internal server error")
        result = classify_error(err, response_code=500)
        assert result.category == ErrorCategory.RETRYABLE
        assert result.should_retry is True

    def test_schema_error_is_permanent(self):
        err = Exception("Schema validation failed")
        result = classify_error(err)
        assert result.category == ErrorCategory.PERMANENT
        assert result.should_retry is False

    def test_unknown_is_retryable_with_caution(self):
        err = Exception("Some unknown error")
        result = classify_error(err)
        assert result.category == ErrorCategory.UNKNOWN
        assert result.should_retry is True
        assert result.max_retries == 1


if __name__ == "__main__":
    pytest.main([__file__])

tools/error_classifier.py (233 lines, new file)

@@ -0,0 +1,233 @@
"""
Tool Error Classification — Retryable vs Permanent.
Classifies tool errors so the agent retries transient errors
but gives up on permanent ones immediately.
"""
import logging
import re
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Dict, Any
logger = logging.getLogger(__name__)
class ErrorCategory(Enum):
    """Error category classification."""
    RETRYABLE = "retryable"
    PERMANENT = "permanent"
    UNKNOWN = "unknown"

@dataclass
class ErrorClassification:
    """Result of error classification."""
    category: ErrorCategory
    reason: str
    should_retry: bool
    max_retries: int
    backoff_seconds: float
    error_code: Optional[int] = None
    error_type: Optional[str] = None

# Retryable error patterns: (regex, reason, max_retries, backoff_seconds)
_RETRYABLE_PATTERNS = [
    # HTTP status codes
    (r"\b429\b", "rate limit", 3, 5.0),
    (r"\b500\b", "server error", 3, 2.0),
    (r"\b502\b", "bad gateway", 3, 2.0),
    (r"\b503\b", "service unavailable", 3, 5.0),
    (r"\b504\b", "gateway timeout", 3, 5.0),
    # Timeout patterns
    (r"timeout", "timeout", 3, 2.0),
    (r"timed out", "timeout", 3, 2.0),
    (r"TimeoutExpired", "timeout", 3, 2.0),
    # Connection errors
    (r"connection refused", "connection refused", 2, 5.0),
    (r"connection reset", "connection reset", 2, 2.0),
    (r"network unreachable", "network unreachable", 2, 10.0),
    (r"DNS", "DNS error", 2, 5.0),
    # Transient errors
    (r"temporary", "temporary error", 2, 2.0),
    (r"transient", "transient error", 2, 2.0),
    (r"retry", "retryable", 2, 2.0),
]

# Permanent error patterns: (regex, label, reason)
_PERMANENT_PATTERNS = [
    # HTTP status codes
    (r"\b400\b", "bad request", "Invalid request parameters"),
    (r"\b401\b", "unauthorized", "Authentication failed"),
    (r"\b403\b", "forbidden", "Access denied"),
    (r"\b404\b", "not found", "Resource not found"),
    (r"\b405\b", "method not allowed", "HTTP method not supported"),
    (r"\b409\b", "conflict", "Resource conflict"),
    (r"\b422\b", "unprocessable", "Validation error"),
    # Schema/validation errors
    (r"schema", "schema error", "Invalid data schema"),
    (r"validation", "validation error", "Input validation failed"),
    (r"invalid.*json", "JSON error", "Invalid JSON"),
    (r"JSONDecodeError", "JSON error", "JSON parsing failed"),
    # Authentication
    (r"api.?key", "API key error", "Invalid or missing API key"),
    (r"token.*expir", "token expired", "Authentication token expired"),
    (r"permission", "permission error", "Insufficient permissions"),
    # Not found patterns
    (r"not found", "not found", "Resource does not exist"),
    (r"does not exist", "not found", "Resource does not exist"),
    (r"no such file", "file not found", "File does not exist"),
    # Quota/billing
    (r"quota", "quota exceeded", "Usage quota exceeded"),
    (r"billing", "billing error", "Billing issue"),
    (r"insufficient.*funds", "billing error", "Insufficient funds"),
]

def classify_error(error: Exception, response_code: Optional[int] = None) -> ErrorClassification:
    """
    Classify an error as retryable or permanent.

    Args:
        error: The exception that occurred
        response_code: HTTP response code if available

    Returns:
        ErrorClassification with retry guidance
    """
    error_str = str(error).lower()
    error_type = type(error).__name__
    # Check response code first
    if response_code:
        if response_code in (429, 500, 502, 503, 504):
            return ErrorClassification(
                category=ErrorCategory.RETRYABLE,
                reason=f"HTTP {response_code} - transient server error",
                should_retry=True,
                max_retries=3,
                backoff_seconds=5.0 if response_code == 429 else 2.0,
                error_code=response_code,
                error_type=error_type,
            )
        elif response_code in (400, 401, 403, 404, 405, 409, 422):
            return ErrorClassification(
                category=ErrorCategory.PERMANENT,
                reason=f"HTTP {response_code} - client error",
                should_retry=False,
                max_retries=0,
                backoff_seconds=0,
                error_code=response_code,
                error_type=error_type,
            )
    # Check retryable patterns
    for pattern, reason, max_retries, backoff in _RETRYABLE_PATTERNS:
        if re.search(pattern, error_str, re.IGNORECASE):
            return ErrorClassification(
                category=ErrorCategory.RETRYABLE,
                reason=reason,
                should_retry=True,
                max_retries=max_retries,
                backoff_seconds=backoff,
                error_type=error_type,
            )
    # Check permanent patterns (label is a short tag, not an HTTP code)
    for pattern, label, reason in _PERMANENT_PATTERNS:
        if re.search(pattern, error_str, re.IGNORECASE):
            return ErrorClassification(
                category=ErrorCategory.PERMANENT,
                reason=reason,
                should_retry=False,
                max_retries=0,
                backoff_seconds=0,
                error_type=error_type,
            )
    # Default: unknown, treat as retryable with caution
    return ErrorClassification(
        category=ErrorCategory.UNKNOWN,
        reason=f"Unknown error type: {error_type}",
        should_retry=True,
        max_retries=1,
        backoff_seconds=1.0,
        error_type=error_type,
    )

def execute_with_retry(
    func,
    *args,
    max_retries: int = 3,
    backoff_base: float = 1.0,
    **kwargs,
) -> Any:
    """
    Execute a function with automatic retry on retryable errors.

    Args:
        func: Function to execute
        *args: Function arguments
        max_retries: Maximum retry attempts
        backoff_base: Base backoff time in seconds
        **kwargs: Function keyword arguments

    Returns:
        Function result

    Raises:
        Exception: If permanent error or max retries exceeded
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            last_error = e
            # Classify the error
            classification = classify_error(e)
            logger.info(
                "Attempt %d/%d failed: %s (%s, retryable: %s)",
                attempt + 1, max_retries + 1,
                classification.reason,
                classification.category.value,
                classification.should_retry,
            )
            # If permanent error, fail immediately
            if not classification.should_retry:
                logger.error("Permanent error: %s", classification.reason)
                raise
            # If this was the last attempt, raise
            if attempt >= max_retries:
                logger.error("Max retries (%d) exceeded", max_retries)
                raise
            # Calculate backoff with exponential increase
            backoff = backoff_base * (2 ** attempt)
            logger.info("Retrying in %.1fs...", backoff)
            time.sleep(backoff)
    # Should not reach here, but just in case
    raise last_error

def format_error_report(classification: ErrorClassification) -> str:
    """Format error classification as a report string."""
    icon = "🔄" if classification.should_retry else ""
    return f"{icon} {classification.category.value}: {classification.reason}"