- Add scripts/upstream_watch.py for monitoring upstream repositories
- Add .github/workflows/upstream-watch.yml for weekly automated monitoring
- Add docs/upstream-watch.md for documentation
- Add scripts/run_upstream_watch.sh for easy execution
- Add scripts/test_upstream_watch.py for testing

Addresses issue #15: [P4] Upstream llama.cpp / Ollama TurboQuant watch

Features:
1. Monitor llama.cpp, Ollama, and ggml repositories
2. Search for TurboQuant/PolarQuant/QJL keywords
3. Check issues, PRs, and release notes
4. Generate text and JSON reports
5. Weekly GitHub Action for continuous monitoring
6. Automated issue creation when findings detected

Usage:
- Run monitor: python3 scripts/upstream_watch.py --days 30
- JSON output: python3 scripts/upstream_watch.py --format json
- Weekly monitoring: GitHub Action runs every Monday at 9:00 AM UTC

When upstream lands:
1. Detection: Monitor will detect mentions
2. Evaluation: Compare upstream vs fork
3. Decision: Migrate if upstream is better

Closes #15
TurboQuant Upstream Watch
Issue: #15 - [P4] Upstream llama.cpp / Ollama TurboQuant watch
Purpose: Monitor upstream llama.cpp and Ollama for TurboQuant/PolarQuant/QJL support
Overview
This system watches upstream repositories for signs that TurboQuant (or a similar KV cache compression technique) has landed in an official release. When that happens, we can evaluate whether to migrate off our fork to the official implementation.
Components
1. scripts/upstream_watch.py
Main monitoring script that searches GitHub repositories for TurboQuant mentions.
Usage:
```bash
# Scan last 30 days (default)
python scripts/upstream_watch.py

# Scan last 60 days
python scripts/upstream_watch.py --days 60

# JSON output
python scripts/upstream_watch.py --format json

# Save to file
python scripts/upstream_watch.py --output report.md

# With GitHub token (for higher rate limits)
python scripts/upstream_watch.py --github-token "$GITHUB_TOKEN"
```
Features:
- Searches llama.cpp, Ollama, and ggml repositories
- Checks issues, PRs, and release notes
- Looks for TurboQuant/PolarQuant/QJL keywords
- Generates text or JSON reports
- Compares fork status with upstream
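The script's internals are not reproduced here. As a rough sketch of how a keyword search over issues and PRs could work, the snippet below builds a URL for GitHub's `/search/issues` endpoint (which covers both issues and pull requests), restricted to one repository and a look-back window, and then executes it. The helper names `build_search_url` and `run_search` and the exact query shape are assumptions, not necessarily what `upstream_watch.py` does; GitHub's required search qualifiers can change, so check its search documentation.

```python
from __future__ import annotations

import json
import urllib.request
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/issues"


def build_search_url(repo: str, keyword: str, days: int, today: date) -> str:
    """Hypothetical helper: search URL for issues/PRs in `repo` that mention
    `keyword` and were updated within the last `days` days."""
    since = today - timedelta(days=days)
    # Quote multi-word keywords so GitHub searches for the whole phrase.
    term = f'"{keyword}"' if " " in keyword else keyword
    query = f"repo:{repo} {term} updated:>={since.isoformat()}"
    params = urlencode({"q": query, "per_page": 100})
    return f"{SEARCH_URL}?{params}"


def run_search(url: str, token: str | None = None) -> list[dict]:
    """Hypothetical helper: execute the search; a token raises the rate limit."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["items"]
```

For example, `build_search_url("ggerganov/llama.cpp", "polarquant", 30, date.today())` produces a query like `repo:ggerganov/llama.cpp polarquant updated:>=YYYY-MM-DD`; looping that over every repository and keyword gives the raw matches a report would summarize.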
2. .github/workflows/upstream-watch.yml
GitHub Action that runs weekly to monitor upstream.
Schedule: Every Monday at 9:00 AM UTC
Manual Trigger: Can also be run manually with a custom number of days
What it does:
- Runs the monitoring script
- Generates JSON and text reports
- Uploads reports as artifacts
- Creates an issue if findings are detected
- Commits reports to the repository (optional)
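For the issue-creation step, one possible shape is to turn the findings into a JSON body for GitHub's `POST /repos/{owner}/{repo}/issues` endpoint, which accepts `title`, `body`, and `labels`. This is a sketch only: the finding fields (`repo`, `title`, `url`), the `upstream-watch` label, and the `build_issue_payload` name are hypothetical, and the actual workflow may use a different action or format.

```python
from __future__ import annotations

import json


def build_issue_payload(findings: list[dict], days: int) -> dict | None:
    """Hypothetical helper: JSON body for POST /repos/{owner}/{repo}/issues,
    or None when there is nothing to report (so no issue is opened)."""
    if not findings:
        return None
    lines = [f"Upstream watch found {len(findings)} match(es) in the last {days} days:", ""]
    for item in findings:
        # One markdown bullet per finding, linking to the upstream item.
        lines.append(f"- [{item['repo']}] [{item['title']}]({item['url']})")
    return {
        "title": f"Upstream TurboQuant watch: {len(findings)} finding(s)",
        "body": "\n".join(lines),
        "labels": ["upstream-watch"],  # hypothetical label name
    }


payload = build_issue_payload(
    [{"repo": "ollama/ollama", "title": "Example release", "url": "https://example.com/r"}],
    days=7,
)
print(json.dumps(payload, indent=2))
```

Returning `None` for an empty run keeps the "only when findings are detected" rule in one place, so the workflow step can simply skip the API call.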
3. Documentation
This file and related documentation.
Keywords Monitored
The system searches for these keywords in upstream repositories:
- turborot (common misspelling/search term)
- turborotquant
- polarquant
- qjl
- kv cache compression
- kv cache quantization
- quantized kv
- kv quant
- cache compression
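Matching could be as simple as a case-insensitive substring check over an item's title and body. The sketch below assumes that approach (the script may normalize text differently); `KEYWORDS` mirrors the list above and `matched_keywords` is a hypothetical name.

```python
from __future__ import annotations

KEYWORDS = [
    "turborot", "turborotquant", "polarquant", "qjl",
    "kv cache compression", "kv cache quantization",
    "quantized kv", "kv quant", "cache compression",
]


def matched_keywords(title: str, body: str | None) -> list[str]:
    """Return every keyword that appears, case-insensitively, in title or body."""
    text = f"{title}\n{body or ''}".lower()
    return [kw for kw in KEYWORDS if kw in text]


print(matched_keywords("Add KV cache compression via PolarQuant", None))
# ['polarquant', 'kv cache compression', 'cache compression']
```

Plain substring matching reports overlapping phrases (here both `kv cache compression` and `cache compression`) and can match short terms such as `qjl` inside unrelated words; a word-boundary regex (`re.search(rf"\b{re.escape(kw)}\b", text)`) is a stricter alternative.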
Repositories Monitored
- llama.cpp (ggerganov/llama.cpp)
  - Main C++ implementation of LLaMA
  - Where TurboQuant would likely land first
- Ollama (ollama/ollama)
  - Go wrapper around llama.cpp
  - Release notes may mention TurboQuant support
- ggml (ggml-org/ggml)
  - Tensor library used by llama.cpp
  - Low-level KV cache compression implementations
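Release notes (the main signal for Ollama) are available from `GET /repos/{owner}/{repo}/releases`, whose entries include `published_at`, `name`, `body`, and `html_url`. A sketch of filtering already-parsed entries to a look-back window and a keyword list follows; `recent_release_hits` is a hypothetical name and the sample data is made up.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def recent_release_hits(releases: list[dict], keywords: list[str], days: int,
                        now: datetime) -> list[str]:
    """Return html_url of releases published within `days` of `now`
    whose name or body mentions any keyword (case-insensitive)."""
    cutoff = now - timedelta(days=days)
    hits = []
    for rel in releases:
        published = rel.get("published_at")
        if not published:  # draft releases have no publish date
            continue
        # GitHub timestamps look like "2025-03-20T12:00:00Z".
        when = datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        text = f"{rel.get('name') or ''}\n{rel.get('body') or ''}".lower()
        if when >= cutoff and any(kw in text for kw in keywords):
            hits.append(rel["html_url"])
    return hits


# Made-up sample entries, not real release data.
now = datetime(2025, 3, 31, tzinfo=timezone.utc)
sample = [
    {"published_at": "2025-03-20T12:00:00Z", "name": "release-a",
     "body": "Adds KV cache quantization options", "html_url": "https://example.com/a"},
    {"published_at": "2024-12-01T12:00:00Z", "name": "release-b",
     "body": "KV cache quantization", "html_url": "https://example.com/b"},
]
print(recent_release_hits(sample, ["kv cache quantization"], 30, now))
# ['https://example.com/a']
```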
Current Status
Fork: TheTom/llama-cpp-turboquant
Status: Active, maintained
Upstream Status: No TurboQuant support found in upstream yet
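To put a number on how far the fork has drifted, GitHub's compare endpoint (`GET /repos/{owner}/{repo}/compare/{base}...{head}`) reports `ahead_by` and `behind_by`, and `head` can name a branch in another repository of the same fork network using the `USERNAME:branch` form. The sketch below assumes the fork is in llama.cpp's fork network and that both sides use a `master` branch; both are guesses to verify before use, and the helper names are hypothetical.

```python
from __future__ import annotations

COMPARE_URL = "https://api.github.com/repos/{repo}/compare/{base}...{head}"


def compare_url(upstream: str, base_branch: str, fork_owner: str, fork_branch: str) -> str:
    """Build a cross-fork compare URL with head given as OWNER:branch."""
    return COMPARE_URL.format(repo=upstream, base=base_branch,
                              head=f"{fork_owner}:{fork_branch}")


def summarize(compare_json: dict) -> str:
    """Summarize a parsed compare response via its ahead_by/behind_by fields."""
    return (f"fork is {compare_json['ahead_by']} commit(s) ahead and "
            f"{compare_json['behind_by']} commit(s) behind upstream")


# Branch names are assumptions; the numbers below are illustrative, not real data.
print(compare_url("ggerganov/llama.cpp", "master", "TheTom", "master"))
print(summarize({"ahead_by": 12, "behind_by": 340, "status": "diverged"}))
```

A growing `behind_by` is a concrete maintenance cost of staying on the fork, which feeds directly into the migration decision below.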
When Upstream Lands
When TurboQuant is detected in upstream, follow this evaluation process:
1. Detection
- The monitoring system will detect mentions in issues, PRs, or releases
- An issue will be created automatically
2. Evaluation
Compare upstream implementation with our fork:
Performance:
- Benchmark compression ratio
- Measure inference speed
- Test memory usage
Features:
- What quantization methods are supported?
- What hardware backends are available?
- What model architectures are supported?
Compatibility:
- Does it work with our models?
- Does it integrate with our toolchain?
- Are there breaking changes?
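When benchmarking compression ratio, it helps to know the expected number from first principles. A KV cache stores one key and one value vector per layer, KV head, and token, so its size is 2 × n_layers × n_kv_heads × head_dim × n_ctx × bits_per_value / 8 bytes. If `bits_per_value` is the effective storage per element (including any per-block scales a format adds), the ratio versus an FP16 cache is 16 / bits_per_value. The model shape below is illustrative and the 4-bit width is a hypothetical parameter, not a measured TurboQuant figure.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bits_per_value: float) -> float:
    """Size of the K and V caches: 2 tensors x layers x kv_heads x head_dim x tokens."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * bits_per_value / 8


# Illustrative shape: 32 layers, 8 KV heads, head_dim 128, 32k context.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)
fp16 = kv_cache_bytes(**shape, bits_per_value=16)
quant = kv_cache_bytes(**shape, bits_per_value=4)  # hypothetical effective bit width

print(f"fp16: {fp16 / 2**30:.2f} GiB, 4-bit: {quant / 2**30:.2f} GiB, ratio {fp16 / quant:.1f}x")
# fp16: 4.00 GiB, 4-bit: 1.00 GiB, ratio 4.0x
```

A measured ratio noticeably below 16 / bits_per_value usually points at metadata overhead or at parts of the cache left unquantized, which is worth comparing between upstream and the fork.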
3. Decision
Based on evaluation:
If upstream is better:
- Plan migration from fork to upstream
- Update dependencies
- Test thoroughly
- Document migration process
If our fork is better:
- Continue using fork
- Consider contributing improvements upstream
- Document why we're keeping the fork
If they're equivalent:
- Consider migrating for maintenance benefits
- Tracking upstream directly is less work than maintaining a fork
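Once benchmark numbers exist, the performance part of this decision could be made mechanical. The sketch compares metrics within a relative tolerance band; the metric names, the 5% tolerance, and the higher-is-better convention are assumptions to adapt, not an agreed policy. Features and compatibility remain judgment calls outside this function.

```python
from __future__ import annotations


def decide(fork: dict[str, float], upstream: dict[str, float],
           tolerance: float = 0.05) -> str:
    """Return 'migrate', 'keep fork', or 'equivalent'.

    All metrics are assumed positive and higher-is-better; a metric counts
    as a win only if one side beats the other by more than `tolerance`
    (relative to the fork's value). Otherwise it is a tie.
    """
    upstream_wins = fork_wins = 0
    for name, fork_value in fork.items():
        rel = (upstream[name] - fork_value) / fork_value
        if rel > tolerance:
            upstream_wins += 1
        elif rel < -tolerance:
            fork_wins += 1
    if upstream_wins > fork_wins:
        return "migrate"
    if fork_wins > upstream_wins:
        return "keep fork"
    return "equivalent"  # consider migrating for maintenance benefits


# Hypothetical results; memory is passed as 1/GiB so that higher is better.
fork = {"compression_ratio": 4.0, "tokens_per_sec": 50.0, "inv_mem_gib": 1 / 6.0}
upstream = {"compression_ratio": 4.6, "tokens_per_sec": 49.0, "inv_mem_gib": 1 / 5.8}
print(decide(fork, upstream))  # migrate
```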
Rate Limits
GitHub API has rate limits:
- Unauthenticated: 60 requests/hour
- Authenticated: 5,000 requests/hour
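The script makes several API calls per repository, so pass a GitHub token (`--github-token`) to get the authenticated limit. If the script uses GitHub's search API, note that search has its own per-minute limit (10 requests/minute unauthenticated, 30 requests/minute authenticated for issue and PR search).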
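To check the remaining budget before a run, GitHub exposes `GET /rate_limit`, which does not itself count against the REST limit. Its JSON has a `resources` object with `core` and `search` entries, each carrying `limit`, `remaining`, and `reset` (a Unix timestamp). A minimal standard-library sketch, with hypothetical helper names:

```python
from __future__ import annotations

import json
import urllib.request
from datetime import datetime, timezone


def fetch_rate_limit(token: str | None = None) -> dict:
    """Fetch GitHub's /rate_limit status (authenticated if a token is given)."""
    req = urllib.request.Request("https://api.github.com/rate_limit",
                                 headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def describe(status: dict, bucket: str = "core") -> str:
    """Summarize one bucket ('core' or 'search') of a /rate_limit response."""
    info = status["resources"][bucket]
    reset = datetime.fromtimestamp(info["reset"], tz=timezone.utc)
    return f"{bucket}: {info['remaining']}/{info['limit']} left, resets {reset:%H:%M} UTC"
```

Usage: `print(describe(fetch_rate_limit(os.environ.get("GITHUB_TOKEN")), "search"))` after `import os`.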
Troubleshooting
No findings detected
- Check if keywords are correct
- Verify repositories are being scanned
- Check GitHub API rate limits
- Try increasing the `--days` parameter
GitHub Action failing
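- Check that `GITHUB_TOKEN` is passed to the script (Actions provides it automatically as `secrets.GITHUB_TOKEN`)
- Verify workflow permissions (creating issues requires `issues: write`; committing reports requires `contents: write`)
- Check for syntax errors in the workflow file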
Script errors
- Ensure Python 3.7+ is installed
- Check internet connectivity
- Verify GitHub API is accessible
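These three checks can be automated as a quick preflight. The sketch uses only the standard library; `preflight` and its helpers are hypothetical names, and treating any HTTP response (even an error status) as proof that the API is reachable is a deliberate simplification.

```python
from __future__ import annotations

import sys
import urllib.error
import urllib.request


def python_ok(minimum: tuple = (3, 7)) -> bool:
    """True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def api_reachable(url: str = "https://api.github.com", timeout: float = 10) -> bool:
    """True if the server answers at all; DNS/connection failures give False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # server responded, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # no route, DNS failure, refused, or timed out


def preflight() -> list[str]:
    """Return a list of human-readable problems (empty means all good)."""
    problems = []
    if not python_ok():
        problems.append("Python 3.7+ is required")
    if not api_reachable():
        problems.append("api.github.com is not reachable (check connectivity/proxy)")
    return problems
```

`HTTPError` is a subclass of `URLError`, so it must be caught first for the "server answered" case to be distinguished from a network failure.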
Future Enhancements
- Email/Slack notifications when findings are detected
- More repositories to monitor (e.g., huggingface/transformers)
- Automated benchmarking when upstream lands
- Dashboard for tracking upstream status over time
Related Issues
- Issue #1: Main TurboQuant implementation (parent issue of #15)
- Issue #15: This monitoring system
Acceptance Criteria
From issue #15:
- Monitoring cadence established (weekly via GitHub Action)
- Upstream landing detection and reporting when it happens
Files
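```
scripts/upstream_watch.py              # Main monitoring script
scripts/run_upstream_watch.sh          # Convenience wrapper for running the monitor
scripts/test_upstream_watch.py         # Tests for the monitoring script
.github/workflows/upstream-watch.yml   # GitHub Action workflow
docs/upstream-watch.md                 # This documentation
```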
License
Part of the Timmy Foundation TurboQuant project.