# TurboQuant Upstream Watch

**Issue:** #15 - [P4] Upstream llama.cpp / Ollama TurboQuant watch
**Purpose:** Monitor upstream llama.cpp and Ollama for TurboQuant/PolarQuant/QJL support

## Overview

This system watches upstream repositories for the arrival of TurboQuant (or similar KV cache compression techniques) in official releases. When that happens, we can evaluate whether to migrate from our fork to the official implementation.

## Components

### 1. `scripts/upstream_watch.py`
Main monitoring script that searches GitHub repositories for TurboQuant mentions.

**Usage:**
```bash
# Scan last 30 days (default)
python scripts/upstream_watch.py

# Scan last 60 days
python scripts/upstream_watch.py --days 60

# JSON output
python scripts/upstream_watch.py --format json

# Save to file
python scripts/upstream_watch.py --output report.md

# With GitHub token (for higher rate limits); quoted so an empty or unusual value is passed safely
python scripts/upstream_watch.py --github-token "$GITHUB_TOKEN"
```

**Features:**
- Searches llama.cpp, Ollama, and ggml repositories
- Checks issues, PRs, and release notes
- Looks for TurboQuant/PolarQuant/QJL keywords
- Generates text or JSON reports
- Compares fork status with upstream

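As an illustration of the search step, the sketch below builds a GitHub issue/PR search URL for one repository and one keyword. It is not taken from `scripts/upstream_watch.py`: the function names `build_query` and `build_search_url`, the use of the `updated:` qualifier, and the page size are assumptions made for the example. The `/search/issues` endpoint and the `repo:` and `updated:` qualifiers are part of GitHub's public search API.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/issues"


def build_query(repo, keyword, since):
    """Build a search query for one repo and keyword (hypothetical helper)."""
    # Quote the keyword so multi-word phrases are matched as a phrase.
    return f'"{keyword}" repo:{repo} updated:>={since}'


def build_search_url(repo, keyword, days, now=None):
    """Return the full search URL covering the last `days` days."""
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(days=days)).strftime("%Y-%m-%d")
    params = {"q": build_query(repo, keyword, since), "per_page": 50}
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("ggerganov/llama.cpp", "qjl", days=30))
```

Keeping query construction separate from the HTTP call makes the date window and keyword quoting easy to check without network access.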
### 2. `.github/workflows/upstream-watch.yml`
GitHub Action that runs weekly to monitor upstream.

**Schedule:** Every Monday at 9:00 AM UTC
**Manual Trigger:** Can be run manually with custom days parameter

**What it does:**
1. Runs the monitoring script
2. Generates JSON and text reports
3. Uploads reports as artifacts
4. Creates an issue if findings are detected
5. Commits reports to repository (optional)

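Step 4 can be pictured as a small decision over the JSON report. The sketch below is only an illustration: the report schema (a top-level `findings` list whose entries carry `repo`, `title`, and `url`) and the function name `summarize_findings` are assumptions, not the actual output format of the script or the logic in the workflow.

```python
import json


def summarize_findings(report_json):
    """Return (should_open_issue, title, body) for a JSON report string.

    Assumes a hypothetical schema: {"findings": [{"repo", "title", "url"}, ...]}.
    """
    report = json.loads(report_json)
    findings = report.get("findings", [])
    if not findings:
        # Nothing detected: the workflow would skip issue creation.
        return False, "", ""
    title = f"Upstream watch: {len(findings)} finding(s) detected"
    lines = [f"- [{f['repo']}] {f['title']} ({f['url']})" for f in findings]
    return True, title, "\n".join(lines)


if __name__ == "__main__":
    sample = json.dumps({"findings": [
        {"repo": "ollama/ollama", "title": "Example finding",
         "url": "https://example.invalid/1"},
    ]})
    print(summarize_findings(sample))
```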
### 3. Documentation
This file and related documentation.

## Keywords Monitored

The system searches for these keywords in upstream repositories:

- `turborot` (common misspelling/search term)
- `turborotquant`
- `polarquant`
- `qjl`
- `kv cache compression`
- `kv cache quantization`
- `quantized kv`
- `kv quant`
- `cache compression`

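One way to apply the keyword list to an issue title or release note is a case-insensitive match that starts at a word boundary. The matching rules in the sketch below (a leading word boundary and flexible whitespace inside phrases) are assumptions chosen for the example; the real script may simply use substring matching. `find_keywords` is a hypothetical name.

```python
import re

# Keywords copied from the list above.
KEYWORDS = [
    "turborot", "turborotquant", "polarquant", "qjl",
    "kv cache compression", "kv cache quantization",
    "quantized kv", "kv quant", "cache compression",
]


def _compile(keyword):
    # Match at a word start, ignore case, and allow any whitespace
    # run between the words of a multi-word phrase.
    words = [re.escape(w) for w in keyword.split()]
    return re.compile(r"\b" + r"\s+".join(words), re.IGNORECASE)


PATTERNS = [(kw, _compile(kw)) for kw in KEYWORDS]


def find_keywords(text):
    """Return the monitored keywords found in `text`, in list order."""
    return [kw for kw, pattern in PATTERNS if pattern.search(text)]


if __name__ == "__main__":
    print(find_keywords("Add QJL-based KV cache  compression"))
```

Anchoring at a word start keeps a short token such as `qjl` from matching inside unrelated identifiers, while still letting `kv quant` match "KV quantization".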
## Repositories Monitored

1. **llama.cpp** (`ggerganov/llama.cpp`)
   - C/C++ inference engine for LLaMA-family models
   - Where TurboQuant would likely land first

2. **Ollama** (`ollama/ollama`)
   - Go wrapper around llama.cpp
   - Release notes may mention TurboQuant support

3. **ggml** (`ggml-org/ggml`)
   - Tensor library used by llama.cpp
   - Low-level KV cache compression implementations

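The repository list maps naturally onto a small table of records. The sketch below shows one possible representation and how a releases URL could be derived from it; the `WatchedRepo` class and the `WATCHED` name are hypothetical, while `/repos/{owner}/{repo}/releases` is the standard GitHub REST endpoint for listing releases.

```python
from dataclasses import dataclass

API_ROOT = "https://api.github.com"


@dataclass(frozen=True)
class WatchedRepo:
    slug: str  # "owner/name", as written in this document
    role: str  # why the repository is monitored

    @property
    def releases_url(self):
        # REST endpoint that lists releases, including release notes.
        return f"{API_ROOT}/repos/{self.slug}/releases"


WATCHED = [
    WatchedRepo("ggerganov/llama.cpp", "where TurboQuant would likely land first"),
    WatchedRepo("ollama/ollama", "release notes may mention TurboQuant support"),
    WatchedRepo("ggml-org/ggml", "low-level KV cache compression implementations"),
]

if __name__ == "__main__":
    for repo in WATCHED:
        print(repo.slug, "->", repo.releases_url)
```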
## Current Status

**Fork:** TheTom/llama-cpp-turboquant
**Status:** Active, maintained
**Upstream Status:** No TurboQuant support found in upstream yet

## When Upstream Lands

When TurboQuant is detected in upstream, follow this evaluation process:

### 1. **Detection**
- The monitoring system will detect mentions in issues, PRs, or releases
- An issue will be created automatically

### 2. **Evaluation**
Compare the upstream implementation with our fork:

**Performance:**
- Benchmark compression ratio
- Measure inference speed
- Test memory usage

**Features:**
- What quantization methods are supported?
- What hardware backends are available?
- What model architectures are supported?

**Compatibility:**
- Does it work with our models?
- Does it integrate with our toolchain?
- Are there breaking changes?

### 3. **Decision**
Based on the evaluation:

**If upstream is better:**
- Plan migration from fork to upstream
- Update dependencies
- Test thoroughly
- Document migration process

**If our fork is better:**
- Continue using fork
- Consider contributing improvements upstream
- Document why we're keeping the fork

**If they're equivalent:**
- Consider migrating for maintenance benefits
- Migrating means less work tracking upstream

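The three outcomes above can be made concrete with a simple rule over the performance metrics from the evaluation step. Everything in the sketch below is an assumption for illustration: the `Benchmark` fields, the 5% tolerance, and the rule that "better" means no metric is meaningfully worse and at least one is meaningfully better. It ignores the feature and compatibility questions, which still need human judgment.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    compression_ratio: float  # higher is better
    tokens_per_second: float  # higher is better
    peak_memory_mb: float     # lower is better


def recommend(fork, upstream, tolerance=0.05):
    """Return 'migrate', 'keep-fork', 'consider-migrating', or 'mixed'."""
    # Relative advantage of upstream per metric; positive means upstream wins.
    deltas = [
        (upstream.compression_ratio - fork.compression_ratio) / fork.compression_ratio,
        (upstream.tokens_per_second - fork.tokens_per_second) / fork.tokens_per_second,
        (fork.peak_memory_mb - upstream.peak_memory_mb) / fork.peak_memory_mb,
    ]
    if all(d >= -tolerance for d in deltas) and any(d > tolerance for d in deltas):
        return "migrate"
    if all(d <= tolerance for d in deltas) and any(d < -tolerance for d in deltas):
        return "keep-fork"
    if all(abs(d) <= tolerance for d in deltas):
        # Equivalent within tolerance: migrating reduces maintenance work.
        return "consider-migrating"
    return "mixed"  # upstream wins some metrics and loses others


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the rule.
    fork = Benchmark(4.0, 100.0, 1000.0)
    print(recommend(fork, Benchmark(4.0, 120.0, 1000.0)))
```

A `mixed` result signals that the metrics disagree and the decision should not be automated.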
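## Rate Limits

The GitHub REST API enforces rate limits:
- **Unauthenticated:** 60 requests/hour
- **Authenticated:** 5,000 requests/hour

If the script relies on the search endpoints, note that those have a separate, lower limit (10 requests/minute unauthenticated, 30 requests/minute authenticated).

The script makes multiple API calls per repository, so pass a GitHub token to avoid exhausting the unauthenticated quota.
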
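A client can stay within these limits by reading the rate-limit headers GitHub returns with each response. The sketch below is a minimal illustration and not code from the script: `auth_headers` and `seconds_until_reset` are hypothetical helpers, and the headers are assumed to arrive in a plain dict with the capitalization shown. `X-RateLimit-Remaining` and `X-RateLimit-Reset` (a Unix timestamp) are documented GitHub response headers.

```python
import time


def auth_headers(token=None):
    """Build request headers, adding the token only when one is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, in seconds."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))  # Unix epoch seconds
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}
    print(seconds_until_reset(exhausted, now=1000))
```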
## Troubleshooting

### No findings detected
- Check if keywords are correct
- Verify repositories are being scanned
- Check GitHub API rate limits
- Try increasing the `--days` parameter

### GitHub Action failing
- Check that the `GITHUB_TOKEN` secret is available to the workflow
- Verify workflow permissions
- Check for syntax errors in workflow file

### Script errors
- Ensure Python 3.7+ is installed
- Check internet connectivity
- Verify GitHub API is accessible

## Future Enhancements

1. **Email/Slack notifications** when findings are detected
2. **More repositories** to monitor (e.g., huggingface/transformers)
3. **Automated benchmarking** when upstream lands
4. **Dashboard** for tracking upstream status over time

## Related Issues

- **Issue #1:** Main TurboQuant implementation
- **Issue #15:** This monitoring system
- **Parent Issue:** #1 (mentioned in #15)

## Acceptance Criteria

From issue #15:
- [x] Monitoring cadence established (weekly via GitHub Action)
- [x] Upstream landing detection and reporting when it happens

## Files

```
scripts/upstream_watch.py              # Main monitoring script
.github/workflows/upstream-watch.yml   # GitHub Action workflow
docs/upstream-watch.md                 # This documentation
```

## License

Part of the Timmy Foundation TurboQuant project.