
# Pipeline Infrastructure

Shared orchestrator for all batch pipelines.

## Components

### `orchestrator.py`

Provides:

- **Job Queue**: SQLite-backed, with priority support
- **Worker Pool**: Configurable parallelism (default 10 workers)
- **Token Budget**: Per-job token tracking and limits
- **Checkpointing**: Resume from any point after a restart
- **Rate Limiting**: Provider-aware request throttling
- **Retry Logic**: Exponential backoff with a configurable retry count
- **Reporting**: Summary report generation

## Usage

### Python API

```python
from pipelines.orchestrator import PipelineOrchestrator, JobPriority

# Create orchestrator
orchestrator = PipelineOrchestrator(max_workers=10)

# Register pipeline handler
def my_handler(job):
    # Process job.task
    return {"result": "done"}

orchestrator.register_handler("my_pipeline", my_handler)

# Submit jobs
job_id = orchestrator.submit_job(
    pipeline="my_pipeline",
    task={"action": "process", "data": "..."},
    priority=JobPriority.HIGH,
    token_budget=100000
)

# Run orchestrator
orchestrator.run()
```
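Internally, `run()` dispatches queued jobs to the worker pool. The real implementation is not shown here; a minimal sketch of that loop's shape, using a bounded thread pool (function and field names are illustrative), might look like:

```python
from concurrent.futures import ThreadPoolExecutor

def run_jobs(jobs, handlers, max_workers=10):
    """Dispatch each job to its pipeline's handler on a bounded pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every job, keyed by its id so results can be matched up.
        futures = {
            pool.submit(handlers[job["pipeline"]], job): job["id"]
            for job in jobs
        }
        for future, job_id in futures.items():
            results[job_id] = future.result()
    return results
```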

### CLI

```bash
# Submit a job
python -m pipelines.orchestrator submit my_pipeline --task '{"action": "process"}'

# Run orchestrator
python -m pipelines.orchestrator run --workers 10 --max-jobs 100

# Check job status
python -m pipelines.orchestrator status <job_id>

# Resume paused job
python -m pipelines.orchestrator resume <job_id>

# Show stats
python -m pipelines.orchestrator stats

# Generate report
python -m pipelines.orchestrator report
```

## Database

Orchestrator state is stored in `~/.hermes/pipelines/orchestrator.db`:

- `jobs` - job queue and state
- `checkpoints` - resume points
- `reports` - generated reports
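The actual schema is internal to the orchestrator; as a rough sketch under assumed column names, a priority queue over the `jobs` table could look like this (the `ORDER BY priority DESC, id ASC` is the key idea: highest priority first, FIFO within a priority):

```python
import sqlite3

# In-memory stand-in for ~/.hermes/pipelines/orchestrator.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    pipeline TEXT NOT NULL,
    task TEXT NOT NULL,            -- JSON payload
    priority INTEGER DEFAULT 0,   -- higher runs first
    status TEXT DEFAULT 'queued', -- queued / running / done / failed
    tokens_used INTEGER DEFAULT 0,
    token_budget INTEGER
);
CREATE TABLE checkpoints (
    job_id INTEGER REFERENCES jobs(id),
    state TEXT,                   -- serialized resume point
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

def next_job(conn):
    """Dequeue the highest-priority queued job, oldest first within a tier."""
    return conn.execute(
        "SELECT id FROM jobs WHERE status = 'queued' "
        "ORDER BY priority DESC, id ASC LIMIT 1"
    ).fetchone()
```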

## Configuration

### Rate Limits

```python
orchestrator.configure_rate_limit("Nous", rpm=60, tpm=1000000)
orchestrator.configure_rate_limit("Anthropic", rpm=50, tpm=800000)
```
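A per-provider `rpm` limit of this kind is typically enforced with a sliding window. The class below is a simplified sketch, not the orchestrator's implementation (it only handles `rpm`, not `tpm`, and all names are illustrative):

```python
import collections
import time

class RateLimiter:
    """Per-provider sliding-window limiter for requests per minute."""

    def __init__(self):
        self.limits = {}                                     # provider -> rpm
        self.history = collections.defaultdict(collections.deque)

    def configure(self, provider, rpm):
        self.limits[provider] = rpm

    def acquire(self, provider, now=None):
        """Wait until a request slot is free, then record the request."""
        now = time.monotonic() if now is None else now
        window = self.history[provider]
        rpm = self.limits.get(provider)
        if rpm is not None:
            # Evict requests older than the 60s window.
            while window and now - window[0] >= 60:
                window.popleft()
            if len(window) >= rpm:
                # At the limit: sleep until the oldest request ages out.
                time.sleep(60 - (now - window[0]))
                now = time.monotonic()
        window.append(now)
```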

### Token Budgets

Default: 1M tokens per job. Override per-job:

```python
orchestrator.submit_job("pipeline", task, token_budget=500000)
```
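Conceptually, budget enforcement is simple accounting: charge each job's usage against its limit and stop when it would be exceeded. A hypothetical sketch (class and method names are not from the orchestrator):

```python
class TokenBudgetExceeded(Exception):
    pass

class TokenBudget:
    """Per-job token accounting (illustrative, not the real API)."""

    DEFAULT = 1_000_000  # default: 1M tokens per job

    def __init__(self, limit=None):
        self.limit = limit if limit is not None else self.DEFAULT
        self.used = 0

    def charge(self, tokens):
        """Record usage, refusing any charge that would exceed the limit."""
        if self.used + tokens > self.limit:
            raise TokenBudgetExceeded(f"{self.used + tokens} > {self.limit}")
        self.used += tokens
```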

## Pipelines

All pipelines share this orchestrator:

1. `batch-runner` - Run prompts across datasets
2. `data-gen` - Generate training data
3. `eval-runner` - Run evaluations
4. `trajectory-compress` - Compress trajectories
5. `web-research` - Research tasks