Pipeline Infrastructure
Shared orchestrator for all batch pipelines. When resuming paused jobs, checkpoints are loaded from the checkpoint table, and pending work is claimed atomically so parallel runs do not process the same job twice.
Components
orchestrator.py
Shared orchestrator providing:
- Job Queue: SQLite-backed with priority support
- Worker Pool: Configurable parallelism (default 10)
- Token Budget: Per-job tracking and limits
- Checkpointing: Resume from any point after restart
- Rate Limiting: Provider-aware request throttling
- Retry Logic: Exponential backoff with configurable retries
- Reporting: Generate summary reports
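As a rough illustration of the retry behavior listed above, here is a minimal sketch of exponential backoff with jitter. The helper name, parameters, and the injectable `sleep` hook are assumptions for illustration, not the orchestrator's actual API.

```python
import random
import time


def retry_with_backoff(fn, max_retries=3, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call fn, retrying failures with exponential backoff plus jitter.

    Hypothetical sketch of the retry policy: delay doubles each attempt,
    capped at max_delay, with up to 10% random jitter added.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper testable without real waits.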
Usage
Python API
from pipelines.orchestrator import PipelineOrchestrator, JobPriority
# Create orchestrator
orchestrator = PipelineOrchestrator(max_workers=10)
# Register pipeline handler
def my_handler(job):
    # Process job.task
    return {"result": "done"}

orchestrator.register_handler("my_pipeline", my_handler)

# Submit jobs
job_id = orchestrator.submit_job(
    pipeline="my_pipeline",
    task={"action": "process", "data": "..."},
    priority=JobPriority.HIGH,
    token_budget=100000,
)
# Run orchestrator
orchestrator.run()
CLI
# Submit a job
python -m pipelines.orchestrator submit my_pipeline --task '{"action": "process"}'
# Run orchestrator
python -m pipelines.orchestrator run --workers 10 --max-jobs 100
# Check job status
python -m pipelines.orchestrator status <job_id>
# Resume paused job
python -m pipelines.orchestrator resume <job_id>
# Show stats
python -m pipelines.orchestrator stats
# Generate report
python -m pipelines.orchestrator report
Database
Jobs are stored in ~/.hermes/pipelines/orchestrator.db:
- jobs - Job queue and state
- checkpoints - Resume points
- reports - Generated reports
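The atomic claim over the jobs table can be sketched with SQLite's `BEGIN IMMEDIATE`, which takes the write lock up front so concurrent workers serialize on the claim. The function name and the `(id, priority, status)` schema below are assumptions for illustration, not the actual schema.

```python
import sqlite3


def claim_next_job(conn):
    """Atomically claim the highest-priority pending job, or return None.

    Hypothetical sketch: assumes a jobs table with (id, priority, status)
    and a connection opened with isolation_level=None (autocommit mode).
    """
    # Take the write lock immediately so two workers cannot both read
    # the same pending row before either marks it running.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'pending' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?",
                     (row[0],))
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A worker that loses the lock race simply sees the row's status flipped to `running` and claims the next pending job instead.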
Configuration
Rate Limits
orchestrator.configure_rate_limit("Nous", rpm=60, tpm=1000000)
orchestrator.configure_rate_limit("Anthropic", rpm=50, tpm=800000)
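One way the per-provider rpm/tpm limits could be enforced is a sliding-window limiter over the last 60 seconds. The class below is a hypothetical sketch, not the orchestrator's internals; the injectable `clock` parameter is an assumption that makes it testable.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter for requests-per-minute and tokens-per-minute.

    Hypothetical sketch of provider-aware throttling.
    """

    def __init__(self, rpm, tpm, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs within the window

    def _prune(self, now):
        # Drop events older than 60 seconds.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Record the request and return True iff both limits allow it."""
        now = self.clock()
        self._prune(now)
        if len(self.events) >= self.rpm:
            return False  # request-per-minute limit reached
        if sum(t for _, t in self.events) + tokens > self.tpm:
            return False  # token-per-minute limit would be exceeded
        self.events.append((now, tokens))
        return True
```

A caller that receives `False` would wait and retry rather than send the request.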
Token Budgets
Default: 1M tokens per job. Override per-job:
orchestrator.submit_job("pipeline", task, token_budget=500000)
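Per-job budget tracking amounts to counting tokens against a limit and stopping the job before it overruns. A minimal sketch, with a hypothetical class name and error type not taken from the orchestrator:

```python
class TokenBudget:
    """Track a job's token usage against its limit (hypothetical sketch)."""

    def __init__(self, limit=1_000_000):  # default mirrors 1M tokens per job
        self.limit = limit
        self.used = 0

    def charge(self, tokens):
        """Record usage; refuse the charge if it would exceed the budget."""
        if self.used + tokens > self.limit:
            raise RuntimeError(
                f"token budget exceeded: {self.used + tokens} > {self.limit}"
            )
        self.used += tokens

    @property
    def remaining(self):
        return self.limit - self.used
```

Checking before incrementing means a rejected charge leaves `used` unchanged, so the job can be paused and resumed with an accurate count.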
Pipelines
All pipelines share this orchestrator:
- batch-runner - Run prompts across datasets
- data-gen - Generate training data
- eval-runner - Run evaluations
- trajectory-compress - Compress trajectories
- web-research - Research tasks