[Optimization] Triage Loop Resource Optimization: Parallelize Batch Scraping #1419

Closed
opened 2026-03-24 13:04:34 +00:00 by Timmy · 1 comment
Owner

Context: Ticket evaluation and markdown context scraping operate linearly.

Acceptance Criteria:

  • Use `asyncio` or Python concurrent pools (`concurrent.futures`) to collect repo contexts in large batches.
  • Expect a 30%+ reduction in iteration wait times.
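The expected speedup from batching I/O-bound work can be illustrated with a minimal, self-contained sketch. The `scrape` coroutine and the repo names below are hypothetical stand-ins for the real context scraping, not code from this project:

```python
import asyncio
import time

async def scrape(repo: str) -> str:
    await asyncio.sleep(0.05)  # stands in for one I/O-bound context scrape
    return f"context for {repo}"

repos = [f"repo-{i}" for i in range(5)]

async def sequential() -> list:
    return [await scrape(r) for r in repos]

async def concurrent() -> list:
    # all five scrapes overlap, so wall time is ~one scrape, not five
    return await asyncio.gather(*(scrape(r) for r in repos))

t0 = time.perf_counter()
asyncio.run(sequential())
seq_time = time.perf_counter() - t0

t0 = time.perf_counter()
results = asyncio.run(concurrent())
conc_time = time.perf_counter() - t0
```

With five 50 ms scrapes, the sequential version takes roughly 250 ms and the concurrent one roughly 50 ms, comfortably above the 30% target for I/O-bound batches.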
Author
Owner

Implementation Plan for Triage Loop Optimization

OBJECTIVE: Parallelize ticket evaluation and context scraping to achieve 30%+ reduction in triage loop iteration times.

Current Bottlenecks Analysis

  1. Linear ticket processing: Each issue processed sequentially
  2. Markdown context scraping: File system operations blocking loop
  3. Gitea API calls: Individual requests instead of batch operations

Phase 1: Async Infrastructure

  1. Convert to asyncio:
    • Refactor main triage loop to use async/await patterns
    • Replace synchronous Gitea API calls with aiohttp
    • Implement async file system operations for context scraping
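The steps above can be sketched in miniature. A placeholder coroutine stands in for the eventual `aiohttp` request so the loop structure is clear without network access; names like `fetch_issue` and `triage_once` are illustrative, not existing project APIs:

```python
import asyncio

async def fetch_issue(issue_id: int) -> dict:
    # stands in for an aiohttp GET against the Gitea issues endpoint
    await asyncio.sleep(0)
    return {"id": issue_id, "state": "open"}

async def triage_once(issue_ids: list) -> list:
    # the refactored main loop awaits all fetches instead of blocking on each
    return await asyncio.gather(*(fetch_issue(i) for i in issue_ids))

issues = asyncio.run(triage_once([1, 2, 3]))
```

Once the loop is structured this way, swapping the placeholder for a real `aiohttp.ClientSession` request is a local change.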

Phase 2: Parallel Processing

  1. Batch Operations:

    • Implement concurrent issue evaluation (5-10 issues at once)
    • Parallel context file reading using asyncio.gather()
    • Batch Gitea API requests where possible
  2. Resource Management:

    • Semaphore-based concurrency limiting
    • Connection pooling for Gitea API
    • Memory-conscious batch sizing
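The two resource controls above, a semaphore cap and bounded batch sizes, compose naturally. This is a sketch under assumed tuning values (`MAX_CONCURRENCY`, `BATCH_SIZE`, and the `evaluate` stub are all hypothetical):

```python
import asyncio

MAX_CONCURRENCY = 5   # assumed cap, tuned against Gitea rate limits
BATCH_SIZE = 10       # memory-conscious batch sizing

async def evaluate(issue: int) -> int:
    await asyncio.sleep(0)  # stands in for one issue evaluation
    return issue * 2

async def run_batches(issues: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def guarded(i: int) -> int:
        async with sem:  # at most MAX_CONCURRENCY evaluations in flight
            return await evaluate(i)

    results = []
    for start in range(0, len(issues), BATCH_SIZE):
        batch = issues[start:start + BATCH_SIZE]
        results.extend(await asyncio.gather(*(guarded(i) for i in batch)))
    return results

out = asyncio.run(run_batches(list(range(25))))
```

Batching bounds peak memory (only one batch of results is pending at a time) while the semaphore bounds concurrent load on the API.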

Phase 3: Performance Monitoring

  1. Metrics Collection:
    • Add timing instrumentation to measure improvements
    • Track concurrent operation success rates
    • Monitor memory usage during parallel operations
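A lightweight way to add the timing instrumentation is a context manager around each cycle. The helper name, metric key, and sink dict below are illustrative, not part of the existing codebase:

```python
import asyncio
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str, sink: dict):
    # record wall-clock duration for the wrapped block under `label`
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start

metrics = {}

async def triage_cycle() -> None:
    with timed("iteration_seconds", metrics):
        await asyncio.sleep(0.01)  # stands in for one full triage iteration

asyncio.run(triage_cycle())
```

Comparing `iteration_seconds` before and after the async refactor gives the evidence needed for the 30% claim.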

Files to Modify:

  • src/timmy/backlog_triage.py (main triage loop)
  • src/infrastructure/api/gitea_client.py (async API client)
  • scripts/triage-loop.sh (if it exists)
  • src/timmy/triage/ (create async modules)

Implementation Steps:

  1. Async Foundation:

    import asyncio
    from typing import List

    async def process_issues_batch(issues: List[Issue]) -> List[TriageResult]:
        tasks = [evaluate_issue_async(issue) for issue in issues]
        # return_exceptions=True keeps one failed issue from aborting the batch
        return await asyncio.gather(*tasks, return_exceptions=True)
    
  2. Concurrent Context Scraping:

    import asyncio
    from typing import Dict, List

    async def scrape_contexts_parallel(file_paths: List[str]) -> Dict[str, str]:
        semaphore = asyncio.Semaphore(10)  # limit concurrent file operations
        tasks = [read_context_file(path, semaphore) for path in file_paths]
        # gather preserves input order, so zip pairs each path with its result
        results = await asyncio.gather(*tasks)
        return dict(zip(file_paths, results))
    
  3. Batch API Requests:

    • Use Gitea's REST issue-list endpoints to fetch issues in batches (Gitea exposes a REST API, not GraphQL)
    • Implement connection pooling
    • Add retry logic with exponential backoff
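The retry step can be sketched as a small wrapper. The attempt count, base delay, and `ConnectionError` trigger are illustrative choices, not the project's actual policy:

```python
import asyncio

async def with_retries(coro_factory, attempts: int = 4, base: float = 0.01):
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            # exponential backoff: base, 2*base, 4*base, ...
            await asyncio.sleep(base * (2 ** attempt))

# Demo: a flaky call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

async def flaky_gitea_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"

result = asyncio.run(with_retries(flaky_gitea_call))
```

In practice the exception type would be whatever the HTTP client raises on rate limiting or transient failure, and jitter is often added to the delay to avoid synchronized retries.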

Success Metrics:

  • Minimum 30% reduction in iteration cycle time
  • Maintain same triage accuracy
  • No increase in memory usage
  • Graceful handling of API rate limits
  • Async logging integration

Expected Impact: From ~60s triage cycles to ~40s cycles under normal load.

@kimi This optimization will significantly improve agent responsiveness!

kimi was assigned by Timmy 2026-03-24 14:52:54 +00:00
kimi was unassigned by Timmy 2026-03-24 19:32:18 +00:00
Timmy closed this issue 2026-03-24 21:54:08 +00:00

Reference: Rockachopa/Timmy-time-dashboard#1419