# TurboQuant Upstream Watch

**Issue:** #15 - [P4] Upstream llama.cpp / Ollama TurboQuant watch
**Purpose:** Monitor upstream llama.cpp and Ollama for TurboQuant/PolarQuant/QJL support

## Overview

This system watches upstream repositories for the point at which TurboQuant (or a similar KV cache compression technique) lands in an official release. When that happens, we can evaluate whether to migrate from our fork to the official implementation.

## Components

### 1. `scripts/upstream_watch.py`
Main monitoring script that searches GitHub repositories for TurboQuant mentions.

**Usage:**
```bash
# Scan last 30 days (default)
python scripts/upstream_watch.py

# Scan last 60 days
python scripts/upstream_watch.py --days 60

# JSON output
python scripts/upstream_watch.py --format json

# Save to file
python scripts/upstream_watch.py --output report.md

# With GitHub token (for higher rate limits)
python scripts/upstream_watch.py --github-token "$GITHUB_TOKEN"
```

**Features:**
- Searches llama.cpp, Ollama, and ggml repositories
- Checks issues, PRs, and release notes
- Looks for TurboQuant/PolarQuant/QJL keywords
- Generates text or JSON reports
- Compares fork status with upstream
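As an illustration of the search step, here is a minimal sketch of how a query for one repository and one keyword could be built against GitHub's issue/PR search endpoint. The function name `build_search_url` and its parameters are hypothetical and are not taken from `scripts/upstream_watch.py`; only the `repo:` and `updated:` search qualifiers and the `/search/issues` endpoint are standard GitHub features.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_search_url(repo, keyword, days=30, today=None):
    """Build a GitHub issue/PR search URL for one repo and one keyword.

    Hypothetical helper: the real script may build its queries differently.
    """
    today = today or date.today()
    since = today - timedelta(days=days)
    # Quoting the keyword makes multi-word phrases match as a phrase.
    query = f'repo:{repo} "{keyword}" updated:>={since.isoformat()}'
    # urlencode percent-encodes the query so it is safe to put in a URL.
    return "https://api.github.com/search/issues?" + urlencode(
        {"q": query, "per_page": 50}
    )


if __name__ == "__main__":
    print(build_search_url("ollama/ollama", "kv cache compression", days=60))
```

Passing `today` explicitly keeps the function deterministic, which makes the date window easy to test.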

### 2. `.github/workflows/upstream-watch.yml`
GitHub Actions workflow that runs weekly to monitor upstream.

**Schedule:** Every Monday at 9:00 AM UTC
**Manual Trigger:** Can be run manually with a custom `days` parameter

**What it does:**
1. Runs the monitoring script
2. Generates JSON and text reports
3. Uploads reports as artifacts
4. Creates an issue if findings are detected
5. Commits reports to the repository (optional)
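The "create an issue only if findings are detected" step could look roughly like the sketch below. The finding keys (`repo`, `title`, `url`) and the function `render_issue` are assumptions made for illustration; the actual JSON report schema is not specified in this document.

```python
def render_issue(findings, days):
    """Return (title, body) for a tracking issue, or None if nothing was found.

    `findings` is assumed to be a list of dicts with hypothetical keys
    'repo', 'title', and 'url'.
    """
    if not findings:
        # No findings: the workflow should skip issue creation entirely.
        return None
    title = f"Upstream watch: {len(findings)} finding(s) in the last {days} days"
    lines = ["The weekly scan matched the following upstream items:", ""]
    for item in findings:
        lines.append(f"- [{item['repo']}] {item['title']} ({item['url']})")
    return title, "\n".join(lines)


if __name__ == "__main__":
    sample = [{"repo": "ollama/ollama", "title": "KV quant", "url": "https://example.com/1"}]
    print(render_issue(sample, 30))
```

Returning `None` for an empty report lets the calling step decide cheaply whether to open an issue at all.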

### 3. Documentation
This file and related documentation.

## Keywords Monitored

The system searches for these keywords in upstream repositories:

- `turborot` (common misspelling/search term)
- `turborotquant`
- `polarquant`
- `qjl`
- `kv cache compression`
- `kv cache quantization`
- `quantized kv`
- `kv quant`
- `cache compression`
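One way to apply this list to an issue title or release note is a case-insensitive match with a leading word boundary, so that a short keyword such as `qjl` does not match inside an unrelated word. This is a sketch only; `match_keywords` is a hypothetical name, and the real script may match differently.

```python
import re

# Keyword list copied from this document.
KEYWORDS = [
    "turborot",
    "turborotquant",
    "polarquant",
    "qjl",
    "kv cache compression",
    "kv cache quantization",
    "quantized kv",
    "kv quant",
    "cache compression",
]


def match_keywords(text):
    """Return the keywords found in `text`, in list order, ignoring case."""
    lowered = text.lower()
    hits = []
    for keyword in KEYWORDS:
        # Leading \b only: "kv quant" should still match "kv quantization".
        if re.search(r"\b" + re.escape(keyword), lowered):
            hits.append(keyword)
    return hits


if __name__ == "__main__":
    print(match_keywords("Add QJL-based KV cache quantization"))
```

Because only the start of each keyword is anchored, overlapping keywords such as `turborot` and `turborotquant` can both be reported for the same text.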

## Repositories Monitored

1. **llama.cpp** (`ggerganov/llama.cpp`, now hosted at `ggml-org/llama.cpp`)
   - C/C++ inference engine for LLaMA and other models
   - Where TurboQuant would likely land first

2. **Ollama** (`ollama/ollama`)
   - Go wrapper around llama.cpp
   - Release notes may mention TurboQuant support

3. **ggml** (`ggml-org/ggml`)
   - Tensor library used by llama.cpp
   - Low-level KV cache compression implementations

## Current Status

**Fork:** TheTom/llama-cpp-turboquant
**Status:** Active, maintained
**Upstream Status:** No TurboQuant support found in upstream yet

## When Upstream Lands

When TurboQuant is detected in upstream, follow this evaluation process:

### 1. **Detection**
- The monitoring system will detect mentions in issues, PRs, or releases
- An issue will be created automatically

### 2. **Evaluation**
Compare upstream implementation with our fork:

**Performance:**
- Benchmark compression ratio
- Measure inference speed
- Test memory usage

**Features:**
- What quantization methods are supported?
- What hardware backends are available?
- What model architectures are supported?

**Compatibility:**
- Does it work with our models?
- Does it integrate with our toolchain?
- Are there breaking changes?
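For the compression-ratio benchmark under Performance, a back-of-the-envelope estimate of KV cache size can serve as a sanity check on measured numbers. The sketch below uses the standard size formula (keys plus values, per layer, per KV head, per token). The model shape and the 4-bit width in the example are hypothetical, and real quantization schemes usually carry extra overhead (scales, metadata) that this estimate ignores.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bits_per_value):
    """Estimate KV cache size in bytes, ignoring per-block metadata."""
    # Factor 2 accounts for storing both keys and values.
    values = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return values * bits_per_value / 8


def compression_ratio(baseline_bytes, compressed_bytes):
    """Ratio > 1 means the compressed cache is smaller than the baseline."""
    return baseline_bytes / compressed_bytes


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, 8192 tokens.
    fp16 = kv_cache_bytes(32, 8, 128, 8192, bits_per_value=16)
    q4 = kv_cache_bytes(32, 8, 128, 8192, bits_per_value=4)
    print(f"FP16: {fp16 / 2**20:.0f} MiB, 4-bit: {q4 / 2**20:.0f} MiB")
    print(f"ratio: {compression_ratio(fp16, q4):.1f}x")
```

Comparing this theoretical ratio against measured memory usage for both the fork and upstream shows how much overhead each implementation adds.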

### 3. **Decision**
Based on evaluation:

**If upstream is better:**
- Plan migration from fork to upstream
- Update dependencies
- Test thoroughly
- Document migration process

**If our fork is better:**
- Continue using fork
- Consider contributing improvements upstream
- Document why we're keeping the fork

**If they're equivalent:**
- Consider migrating for the maintenance benefits: tracking upstream releases is less work than maintaining a fork

## Rate Limits

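The GitHub REST API enforces rate limits:
- **Unauthenticated:** 60 requests/hour
- **Authenticated:** 5,000 requests/hour

Search endpoints have separate, lower limits (10 requests/minute unauthenticated, 30 requests/minute authenticated). The script makes multiple API calls per repository, so pass a GitHub token to get the higher limits.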
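When a limit is hit, GitHub reports the remaining quota and the reset time in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. A minimal sketch of using them to decide how long to wait follows. The helper name `seconds_until_reset` is hypothetical, and it assumes the headers arrive as a plain dict with exactly this capitalization; HTTP header names are case-insensitive, so a real client may need to normalize them first.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next API request.

    `headers` is assumed to be a dict of response headers from the GitHub API.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        # Quota left in the current window: no need to wait.
        return 0.0
    # X-RateLimit-Reset is a Unix timestamp (seconds, UTC).
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
    print(seconds_until_reset(example, now=1700000000))
```

The script could sleep for the returned duration before retrying instead of failing the whole scan.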

## Troubleshooting

### No findings detected
- Check if keywords are correct
- Verify repositories are being scanned
- Check GitHub API rate limits
- Try increasing the `--days` parameter

### GitHub Action failing
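- Check that the workflow passes `GITHUB_TOKEN` to the script (GitHub Actions provides this token automatically; it does not need to be created as a secret)
- Verify workflow permissions (creating issues requires `issues: write`; committing reports requires `contents: write`)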
- Check for syntax errors in workflow file

### Script errors
- Ensure Python 3.7+ is installed
- Check internet connectivity
- Verify GitHub API is accessible

## Future Enhancements

1. **Email/Slack notifications** when findings are detected
2. **More repositories** to monitor (e.g., huggingface/transformers)
3. **Automated benchmarking** when upstream lands
4. **Dashboard** for tracking upstream status over time

## Related Issues

- **Issue #1:** Main TurboQuant implementation (parent issue of #15)
- **Issue #15:** This monitoring system

## Acceptance Criteria

From issue #15:
- [x] Monitoring cadence established (weekly via GitHub Action)
- [x] Upstream landing detection and reporting when it happens

## Files

```
scripts/upstream_watch.py           # Main monitoring script
.github/workflows/upstream-watch.yml # GitHub Action workflow
docs/upstream-watch.md              # This documentation
```

## License

Part of the Timmy Foundation TurboQuant project.