[Feature] Dynamic Agent Auto-scaling Based on Open Issues Queue Length #1412

Closed
opened 2026-03-24 13:04:26 +00:00 by Timmy · 1 comment

Context: ops-helpers.sh manages static worker counts (e.g., ops-wake-claude).

Acceptance Criteria:

  • Build a daemon that polls Gitea's backlog size.
  • If more than 10 [claude] issues are pending, launch an additional parallel loop worker, up to a defined ceiling parameter.
  • Scale back down when the queue empties.

KIMI IMPLEMENTATION INSTRUCTIONS - Dynamic Agent Auto-scaling

Implementation Plan

Phase 1: Queue Monitoring Daemon (Priority: HIGH)

Files to create:

  • scripts/agent-autoscaler.py - Main autoscaling daemon
  • scripts/ops-helpers.sh - Update worker management functions
  • config/autoscaler.yaml - Configuration for scaling parameters

Phase 2: Core Scaling Logic

Daemon Requirements:

  1. Queue Polling: Check Gitea backlog size every 60 seconds
  2. Dynamic Scaling: Launch additional workers when queue > 10 issues
  3. Scale Down: Terminate workers when queue empties
  4. Resource Limits: Enforce maximum worker ceiling
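The four requirements above reduce to a single pure decision per polling tick. A minimal sketch (function name and defaults are illustrative, mirroring the thresholds in Phase 4) that maps the current queue depth and worker count to a scaling action:

```python
def scaling_decision(queue_size: int, workers: int,
                     scale_up_threshold: int = 10,
                     scale_down_threshold: int = 2,
                     min_workers: int = 1,
                     max_workers: int = 3) -> str:
    """Return 'up', 'down', or 'hold' for one polling tick."""
    if queue_size > scale_up_threshold and workers < max_workers:
        return "up"    # backlog too deep and ceiling not yet reached
    if queue_size <= scale_down_threshold and workers > min_workers:
        return "down"  # queue (nearly) empty, shed a worker
    return "hold"      # inside the dead band, or at a limit
```

Keeping the decision separate from the side effects (launching/killing workers) makes the ceiling and floor enforcement trivially unit-testable.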

Phase 3: Implementation Outline

Autoscaler Daemon:

```python
import subprocess
from typing import Dict, List

import requests


class AgentAutoscaler:
    # Helpers load_config(), load_token(), and get_next_worker_id()
    # are defined elsewhere; this outline covers the scaling core.
    def __init__(self, config_path: str):
        self.config = self.load_config(config_path)
        self.gitea_token = self.load_token()
        self.api_base = self.config["polling"]["gitea_api_base"]
        # Worker ids currently alive, per agent type, oldest first
        self.active_workers: Dict[str, List[int]] = {}

    def get_queue_size(self, agent_type: str) -> int:
        # Query the Gitea API for open issues carrying the agent's label.
        # Note: the endpoint paginates, so a single page undercounts a
        # large backlog; follow the `page` param for an exact total.
        response = requests.get(
            f"{self.api_base}/repos/rockachopa/Timmy-time-dashboard/issues",
            headers={"Authorization": f"token {self.gitea_token}"},
            params={"state": "open", "labels": f"[{agent_type}]"},
            timeout=10,
        )
        return len(response.json()) if response.status_code == 200 else 0

    def scale_up(self, agent_type: str) -> bool:
        # Launch an additional worker; record its id only on success
        worker_id = self.get_next_worker_id(agent_type)
        cmd = f"ops-wake-{agent_type} {worker_id}"
        result = subprocess.run(cmd, shell=True, capture_output=True)
        if result.returncode == 0:
            self.active_workers.setdefault(agent_type, []).append(worker_id)
            return True
        return False

    def scale_down(self, agent_type: str) -> bool:
        # Terminate the oldest worker for this agent type, if any
        workers = self.active_workers.get(agent_type)
        if workers:
            worker_id = workers.pop(0)
            subprocess.run(f"ops-kill {agent_type}-{worker_id}", shell=True)
            return True
        return False
```
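The class above can be driven by a small polling loop. A sketch, assuming the Phase 4 config keys and that live workers are tracked as a list of ids per agent type (the `once` flag exists only to make the loop testable):

```python
import time


def run_autoscaler(scaler, poll_interval: int = 60, once: bool = False):
    """One tick per interval: compare each agent's backlog to its thresholds."""
    while True:
        for agent, cfg in scaler.config["agents"].items():
            size = scaler.get_queue_size(agent)
            workers = len(scaler.active_workers.get(agent, []))
            if size > cfg["scale_up_threshold"] and workers < cfg["max_workers"]:
                scaler.scale_up(agent)
            elif (size <= cfg["scale_down_threshold"]
                  and workers > cfg["min_workers"]):
                scaler.scale_down(agent)
        if once:
            break
        time.sleep(poll_interval)
```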

Phase 4: Configuration Schema

autoscaler.yaml:

```yaml
agents:
  claude:
    min_workers: 1
    max_workers: 3
    scale_up_threshold: 10
    scale_down_threshold: 2
    cooldown_period: 300  # seconds
  kimi:
    min_workers: 1
    max_workers: 2
    scale_up_threshold: 15
    scale_down_threshold: 3
    cooldown_period: 180

polling:
  interval: 60  # seconds
  gitea_api_base: "http://143.198.27.163:3000/api/v1"
```
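The `cooldown_period` field implies the daemon must suppress scaling actions for a while after each one, or it will oscillate between scale-up and scale-down. A minimal guard sketch (class name hypothetical; `now` is injectable only so the behavior is testable without real waiting):

```python
import time
from typing import Dict, Optional


class CooldownGuard:
    """Blocks scaling actions within cooldown_period of the previous one."""

    def __init__(self, cooldown_period: float):
        self.cooldown_period = cooldown_period
        self._last_action: Dict[str, float] = {}  # agent_type -> timestamp

    def ready(self, agent_type: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        last = self._last_action.get(agent_type)
        return last is None or now - last >= self.cooldown_period

    def record(self, agent_type: str, now: Optional[float] = None) -> None:
        self._last_action[agent_type] = (
            time.monotonic() if now is None else now)
```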

Phase 5: ops-helpers.sh Integration

Add scaling functions:

```bash
ops-scale-up() {
    local agent_type=$1
    local worker_id=$2

    case "$agent_type" in
        claude)
            ops-wake-claude "$worker_id" &
            ;;
        kimi)
            ops-wake-kimi "$worker_id" &
            ;;
    esac
}

ops-scale-down() {
    local agent_type=$1
    local worker_id=$2

    # Graceful shutdown with SIGTERM
    ops-kill "${agent_type}-${worker_id}"
}
```

Phase 6: Testing Requirements

Test Scenarios:

  • Create 15+ open issues, verify scale-up triggers
  • Close issues, verify scale-down after cooldown
  • Test maximum worker ceiling enforcement
  • Validate graceful worker shutdown
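The first scenarios can be exercised without a live Gitea by stubbing the HTTP layer. A sketch using `unittest.mock` against a standalone version of the queue query (the function here restates the daemon's logic so the test is self-contained):

```python
from unittest import mock

import requests


def get_queue_size(api_base: str, token: str, agent_type: str) -> int:
    # Standalone copy of the daemon's query, for testability
    resp = requests.get(
        f"{api_base}/repos/rockachopa/Timmy-time-dashboard/issues",
        headers={"Authorization": f"token {token}"},
        params={"state": "open", "labels": f"[{agent_type}]"},
        timeout=10,
    )
    return len(resp.json()) if resp.status_code == 200 else 0


def test_scale_up_triggers_above_threshold():
    # 15 open issues should exceed the claude threshold of 10
    fake = mock.Mock(status_code=200, json=lambda: [{}] * 15)
    with mock.patch("requests.get", return_value=fake):
        assert get_queue_size("http://gitea.local", "tok", "claude") > 10


def test_api_error_reads_as_empty_queue():
    # An API failure must not trigger spurious scale-ups
    fake = mock.Mock(status_code=500)
    with mock.patch("requests.get", return_value=fake):
        assert get_queue_size("http://gitea.local", "tok", "claude") == 0
```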

This implements intelligent resource management, automatically adjusting agent capacity based on workload demand.

kimi was assigned by Timmy 2026-03-24 15:13:30 +00:00
kimi was unassigned by Timmy 2026-03-24 19:32:21 +00:00
Timmy closed this issue 2026-03-24 21:54:11 +00:00

Reference: Rockachopa/Timmy-time-dashboard#1412