[loop-generated] [performance] Implement request batching and connection pooling #1403

Closed
opened 2026-03-24 12:50:07 +00:00 by Timmy · 1 comment
Owner

Problem:
Multiple areas of the system make individual HTTP requests that could be batched or reused. This creates unnecessary latency and resource usage.

Current Inefficiencies:

  • Gitea API calls made one-by-one in loops
  • Model inference requests not batched
  • No connection pooling for external services
  • Dashboard API makes redundant calls

Proposed Optimizations:

  1. Batch Gitea API calls: Group multiple issue/PR operations
  2. Model inference batching: Send multiple prompts in single request where supported
  3. Connection pooling: Reuse HTTP connections for external services
  4. Caching layer: Cache frequent API responses (issue lists, model responses)

Implementation Areas:

  • Gitea integration: Batch issue updates, comment additions
  • Model router: Batch inference requests when possible
  • Dashboard services: Cache and batch API calls
  • External API clients: Add connection pooling

Performance Targets:

  • Reduce Gitea API calls by 60%+ through batching
  • Improve model inference latency by 30%+ through batching
  • Reduce external service latency by 40%+ through connection reuse

Files to modify:

  • src/infrastructure/router/cascade.py (model batching)
  • Gitea integration modules (API batching)
  • src/dashboard/services/ (caching layer)
  • External service clients

Acceptance Criteria:

  • Gitea API calls are batched where possible
  • Model inference supports batching for compatible providers
  • Connection pooling implemented for all external services
  • Measurable performance improvements in benchmarks
  • No regression in functionality or reliability

Priority: MEDIUM - Performance optimization

Author
Owner

Implementation Plan for Request Batching and Connection Pooling

OBJECTIVE: Implement efficient request batching and connection pooling to reduce API latency and improve system throughput.

Current Issues

  1. Individual API requests: each Gitea call opens a new connection
  2. No request batching: similar operations are sent separately instead of grouped
  3. Connection overhead: TCP handshake cost is paid on every request
  4. Rate limit inefficiency: requests are not scheduled against API quotas

Phase 1: Connection Pool Infrastructure

  1. HTTP Connection Pool:

    import aiohttp
    
    class PooledGiteaClient:
        def __init__(self):
            self.connector = aiohttp.TCPConnector(
                limit=100,              # total connection pool size
                limit_per_host=30,      # per-host limit
                keepalive_timeout=30,   # seconds idle connections stay open
                enable_cleanup_closed=True,
            )
            self.session = aiohttp.ClientSession(connector=self.connector)
    
        async def close(self):
            # Close the session (and its pooled connections) on shutdown
            await self.session.close()
    
  2. Smart Request Batching:

    • Batch similar API calls (e.g., multiple issue queries)
    • Implement request coalescing for duplicate operations
    • Use GraphQL for multi-resource queries
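The request-coalescing idea above can be sketched as follows. This is a minimal sketch under assumptions: `RequestCoalescer` and the cache-key scheme are illustrative names, not existing code in this repository. Concurrent callers asking for the same resource share one in-flight task instead of each issuing a duplicate request:

```python
import asyncio

class RequestCoalescer:
    """Deduplicate concurrent identical requests: callers awaiting the
    same key share a single in-flight task instead of issuing duplicates."""

    def __init__(self):
        self._inflight: dict = {}  # key -> asyncio.Task

    async def get(self, key, fetch):
        # `fetch` is a zero-arg coroutine function performing the real request
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.create_task(fetch())
            self._inflight[key] = task
            # Drop the entry once the task finishes, so later calls refetch
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task

async def demo():
    calls = 0

    async def fetch():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated network latency
        return {"issue": 1403}

    c = RequestCoalescer()
    # Five concurrent callers for the same key trigger only one fetch
    results = await asyncio.gather(*(c.get("issues/1403", fetch) for _ in range(5)))
    return calls, results
```

The same pattern composes with the connection pool: the coalescer sits in front of the pooled session, so duplicate GETs never reach the network at all.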

Phase 2: Batch Operations

  1. Issue Batch Operations:

    import asyncio
    from typing import List
    
    async def batch_issue_operations(self, operations: List[GitOperation]) -> List[Result]:
        # Group by operation type so each group shares a request path
        gets = [op for op in operations if op.type == "GET"]
        patches = [op for op in operations if op.type == "PATCH"]
    
        # Execute both groups in parallel over the pooled session
        grouped = await asyncio.gather(
            self.batch_get_requests(gets),
            self.batch_patch_requests(patches),
        )
        # Flatten [get_results, patch_results] into a single result list
        return [result for group in grouped for result in group]
    
  2. GraphQL Batch Queries:

    • Single request for multiple issues
    • Reduce round-trip time
    • Better rate limit utilization

Phase 3: Performance Optimization

  1. Smart Caching:

    • Connection-level caching for repeated requests
    • TTL-based invalidation
    • Memory-efficient cache eviction
  2. Rate Limit Management:

    • Intelligent request spacing
    • Burst capacity management
    • Automatic backoff and retry
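The caching bullets above can be sketched as a small TTL cache. This is an illustrative sketch, not existing code: the key names and sizes are assumptions. Entries expire after `ttl` seconds and are evicted lazily on access, with `maxsize` bounding memory:

```python
import time

class TTLCache:
    """Small TTL cache: entries expire after `ttl` seconds and are evicted
    lazily on access; `maxsize` bounds memory by dropping the oldest entry."""

    def __init__(self, ttl: float = 60.0, maxsize: int = 1024):
        self.ttl = ttl
        self.maxsize = maxsize
        self._data: dict = {}  # key -> (expires_at, value), insertion-ordered

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy TTL eviction on access
            return default
        return value

    def set(self, key, value):
        if len(self._data) >= self.maxsize and key not in self._data:
            # Evict the oldest inserted entry (insertion order ≈ age)
            self._data.pop(next(iter(self._data)))
        self._data[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl=0.05)
cache.set("issues?page=1", ["#1403"])
hit = cache.get("issues?page=1")    # fresh entry -> cached value
time.sleep(0.06)
miss = cache.get("issues?page=1")   # past TTL -> None
```

A production version would likely use an LRU policy and per-endpoint TTLs, but the lazy-expiry structure is the core of the "TTL-based invalidation" bullet.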
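The backoff-and-retry bullet can be sketched as exponential backoff with full jitter. This is a hedged sketch under assumptions: `with_backoff` and `flaky` are illustrative names, and the full-jitter strategy is one common choice, not necessarily what this codebase will adopt:

```python
import asyncio
import random

async def with_backoff(request, max_retries=5, base=0.5, cap=30.0):
    """Retry a coroutine-returning callable with exponential backoff and
    full jitter; re-raises the last error after `max_retries` attempts."""
    for attempt in range(max_retries):
        try:
            return await request()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep a random amount up to base * 2^attempt, capped
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            await asyncio.sleep(delay)

async def demo():
    attempts = 0

    async def flaky():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("simulated 429")  # pretend rate-limited
        return "ok"

    result = await with_backoff(flaky, base=0.001)  # tiny base for the demo
    return attempts, result
```

Jitter spreads retries out so that many agent loops hitting the same rate limit do not retry in lockstep.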

Files to Modify:

  • src/infrastructure/api/gitea_client.py (connection pooling)
  • src/infrastructure/api/batch_operations.py (new)
  • src/timmy/loops/ (all agent loops)
  • src/infrastructure/router/cascade.py (API routing)

Implementation Priority:

  1. High Impact: Agent loops (hermes, kimi, claude)
  2. Medium Impact: Dashboard API calls
  3. Low Impact: One-off administrative operations

Success Metrics:

  • 50%+ reduction in API request latency
  • 30%+ improvement in overall loop cycle time
  • Reduced connection establishment overhead
  • Better rate limit utilization (90%+ of quota used efficiently)
  • Memory usage remains stable

Expected Impact: Faster agent responsiveness and better resource utilization across all autonomous loops.

@kimi Ready to optimize our API infrastructure!

kimi was assigned by Timmy 2026-03-24 14:53:18 +00:00
kimi was unassigned by Timmy 2026-03-24 19:33:24 +00:00
Timmy closed this issue 2026-03-24 21:54:13 +00:00

Reference: Rockachopa/Timmy-time-dashboard#1403