turboquant/docs/upstream-watch.md
Alexander Whitestone 3172415da1
feat: implement TurboQuant upstream watch monitoring system
- Add scripts/upstream_watch.py for monitoring upstream repositories
- Add .github/workflows/upstream-watch.yml for weekly automated monitoring
- Add docs/upstream-watch.md for documentation
- Add scripts/run_upstream_watch.sh for easy execution
- Add scripts/test_upstream_watch.py for testing

Addresses issue #15: [P4] Upstream llama.cpp / Ollama TurboQuant watch

Features:
1. Monitor llama.cpp, Ollama, and ggml repositories
2. Search for TurboQuant/PolarQuant/QJL keywords
3. Check issues, PRs, and release notes
4. Generate text and JSON reports
5. Weekly GitHub Action for continuous monitoring
6. Automated issue creation when findings detected

Usage:
- Run monitor: python3 scripts/upstream_watch.py --days 30
- JSON output: python3 scripts/upstream_watch.py --format json
- Weekly monitoring: GitHub Action runs every Monday at 9:00 AM UTC

When upstream lands:
1. Detection: Monitor will detect mentions
2. Evaluation: Compare upstream vs fork
3. Decision: Migrate if upstream is better

Closes #15
2026-04-14 22:40:18 -04:00


TurboQuant Upstream Watch

Issue: #15 - [P4] Upstream llama.cpp / Ollama TurboQuant watch
Purpose: Monitor upstream llama.cpp and Ollama for TurboQuant/PolarQuant/QJL support

Overview

This system watches upstream repositories for the arrival of TurboQuant (or a similar KV cache compression technique) in an official release. When that happens, we can evaluate whether to migrate from our fork to the official implementation.

Components

1. scripts/upstream_watch.py

Main monitoring script that searches GitHub repositories for TurboQuant mentions.

Usage:

# Scan last 30 days (default)
python scripts/upstream_watch.py

# Scan last 60 days
python scripts/upstream_watch.py --days 60

# JSON output
python scripts/upstream_watch.py --format json

# Save to file
python scripts/upstream_watch.py --output report.md

# With GitHub token (for higher rate limits)
python scripts/upstream_watch.py --github-token "$GITHUB_TOKEN"

Features:

  • Searches llama.cpp, Ollama, and ggml repositories
  • Checks issues, PRs, and release notes
  • Looks for TurboQuant/PolarQuant/QJL keywords
  • Generates text or JSON reports
  • Compares fork status with upstream
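
As an illustration of the search step, here is a minimal sketch of how one query per repository and keyword could be built against GitHub's issue and PR search endpoint. The function name `build_search_url`, the `REPOS` list, and the `per_page` value are assumptions made for this example; the actual script may structure its requests differently.

```python
from datetime import date, timedelta
from typing import Optional
from urllib.parse import urlencode

# Hypothetical repository list, copied from "Repositories Monitored".
REPOS = ["ggerganov/llama.cpp", "ollama/ollama", "ggml-org/ggml"]


def build_search_url(repo: str, keyword: str, days: int = 30,
                     today: Optional[date] = None) -> str:
    """Return a GitHub issue/PR search URL for one repo and one keyword."""
    today = today or date.today()
    since = today - timedelta(days=days)
    # Quote the keyword so multi-word phrases are searched as a phrase,
    # and restrict results to items updated inside the scan window.
    query = f'"{keyword}" repo:{repo} updated:>={since.isoformat()}'
    params = urlencode({"q": query, "per_page": 50})
    return f"https://api.github.com/search/issues?{params}"


if __name__ == "__main__":
    for repo in REPOS:
        print(build_search_url(repo, "polarquant"))
```

Keeping URL construction separate from the network call makes the query logic testable without touching the API.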

2. .github/workflows/upstream-watch.yml

GitHub Action that runs weekly to monitor upstream.

Schedule: Every Monday at 09:00 UTC
Manual Trigger: Can be run manually with a custom days parameter

What it does:

  1. Runs the monitoring script
  2. Generates JSON and text reports
  3. Uploads reports as artifacts
  4. Creates an issue if findings are detected
  5. Commits reports to repository (optional)
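
The reporting and issue-creation steps above can be sketched as follows. This is only an illustration: the finding shape, the function names, the issue title, and the `upstream-watch` label are hypothetical, and the workflow may build its reports differently. The returned payload matches the fields accepted by GitHub's "create an issue" REST endpoint (`title`, `body`, `labels`).

```python
import json
from typing import Dict, List, Optional

# Hypothetical finding shape: {"repo", "kind", "title", "url"}.
Finding = Dict[str, str]


def render_json(findings: List[Finding]) -> str:
    """Machine-readable report."""
    return json.dumps({"count": len(findings), "findings": findings}, indent=2)


def render_text(findings: List[Finding]) -> str:
    """Human-readable report, one line per finding."""
    if not findings:
        return "No upstream TurboQuant activity found."
    lines = [f"{len(findings)} upstream finding(s):"]
    for item in findings:
        lines.append(
            f"- [{item['repo']}] {item['kind']}: {item['title']} ({item['url']})"
        )
    return "\n".join(lines)


def issue_payload(findings: List[Finding]) -> Optional[dict]:
    """JSON body for POST /repos/{owner}/{repo}/issues, or None if nothing to report."""
    if not findings:
        return None
    return {
        "title": f"Upstream watch: {len(findings)} finding(s)",
        "body": render_text(findings),
        "labels": ["upstream-watch"],  # hypothetical label name
    }
```

Returning `None` for an empty result keeps the "create an issue only if findings are detected" rule in one place.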

3. Documentation

This file and related documentation.

Keywords Monitored

The system searches for these keywords in upstream repositories:

  • turborot (common misspelling/search term)
  • turborotquant
  • polarquant
  • qjl
  • kv cache compression
  • kv cache quantization
  • quantized kv
  • kv quant
  • cache compression
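
A minimal sketch of how titles and bodies might be matched against this keyword list. The matching rules here (case-insensitive, a word boundary at the start, and any run of whitespace, hyphens, or underscores between words) are assumptions chosen for the example, not necessarily what the script does.

```python
import re
from typing import List

KEYWORDS = [
    "turborot", "turborotquant", "polarquant", "qjl",
    "kv cache compression", "kv cache quantization",
    "quantized kv", "kv quant", "cache compression",
]


def _pattern(keyword: str):
    """Compile a keyword into a tolerant, case-insensitive regex."""
    words = [re.escape(word) for word in keyword.split()]
    # Leading \b avoids matching in the middle of a longer word;
    # the separator class lets "kv cache" also match "KV-cache".
    return re.compile(r"\b" + r"[\s_-]+".join(words), re.IGNORECASE)


PATTERNS = [(keyword, _pattern(keyword)) for keyword in KEYWORDS]


def matched_keywords(text: str) -> List[str]:
    """Return every monitored keyword that occurs in the text."""
    return [keyword for keyword, pattern in PATTERNS if pattern.search(text)]


if __name__ == "__main__":
    print(matched_keywords("Add PolarQuant + QJL support for KV-cache compression"))
```

Broad phrases such as "cache compression" will also match unrelated work, so hits are best treated as candidates for review.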

Repositories Monitored

  1. llama.cpp (ggerganov/llama.cpp, which now redirects to ggml-org/llama.cpp)

    • Main C/C++ LLM inference engine, originally written for LLaMA
    • Where TurboQuant would likely land first
  2. Ollama (ollama/ollama)

    • Go wrapper around llama.cpp
    • Release notes may mention TurboQuant support
  3. ggml (ggml-org/ggml)

    • Tensor library used by llama.cpp
    • Low-level KV cache compression implementations

Current Status

Fork: TheTom/llama-cpp-turboquant
Status: Active, maintained
Upstream Status: No TurboQuant support found in upstream yet

When Upstream Lands

When TurboQuant is detected in upstream, follow this evaluation process:

1. Detection

  • The monitoring system will detect mentions in issues, PRs, or releases
  • An issue will be created automatically

2. Evaluation

Compare upstream implementation with our fork:

Performance:

  • Benchmark compression ratio
  • Measure inference speed
  • Test memory usage

Features:

  • What quantization methods are supported?
  • What hardware backends are available?
  • What model architectures are supported?

Compatibility:

  • Does it work with our models?
  • Does it integrate with our toolchain?
  • Are there breaking changes?

3. Decision

Based on evaluation:

If upstream is better:

  • Plan migration from fork to upstream
  • Update dependencies
  • Test thoroughly
  • Document migration process

If our fork is better:

  • Continue using fork
  • Consider contributing improvements upstream
  • Document why we're keeping the fork

If they're equivalent:

  • Consider migrating for the maintenance benefits
  • Tracking upstream directly is less work than maintaining a fork
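
The three outcomes above can be sketched as a small comparison helper. The `Benchmark` fields, the 5% tolerance, and the extra "mixed" outcome are all assumptions for illustration; the feature and compatibility questions from the evaluation step still need human judgment.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    compression_ratio: float   # higher is better
    tokens_per_second: float   # higher is better
    peak_memory_mb: float      # lower is better


def recommend(fork: Benchmark, upstream: Benchmark, tolerance: float = 0.05) -> str:
    """Compare upstream against the fork; positive deltas favor upstream."""
    deltas = [
        (upstream.compression_ratio - fork.compression_ratio) / fork.compression_ratio,
        (upstream.tokens_per_second - fork.tokens_per_second) / fork.tokens_per_second,
        (fork.peak_memory_mb - upstream.peak_memory_mb) / fork.peak_memory_mb,
    ]
    wins = sum(delta > tolerance for delta in deltas)
    losses = sum(delta < -tolerance for delta in deltas)
    if wins and not losses:
        return "migrate to upstream"
    if losses and not wins:
        return "keep fork"
    if not wins and not losses:
        return "equivalent: consider migrating"
    return "mixed: review manually"
```

Differences inside the tolerance band count as a tie, so measurement noise does not flip the recommendation.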

Rate Limits

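The GitHub REST API enforces rate limits:

  • Unauthenticated: 60 requests/hour
  • Authenticated: 5,000 requests/hour

The script makes several API calls per repository, so an unauthenticated run can exhaust its quota quickly; pass a GitHub token to get the higher limit. GitHub's search endpoints are limited separately, per minute rather than per hour.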
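
One way to stay within these limits is to read the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that GitHub returns with each API response. The helper names below are hypothetical, and this is a sketch rather than the script's actual handling.

```python
import time
from typing import Dict, Mapping, Optional


def auth_headers(token: Optional[str]) -> Dict[str, str]:
    """Request headers; adds Authorization only when a token is supplied."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0 if quota remains).

    Expects the header names exactly as written below; a plain dict lookup
    is case-sensitive, unlike HTTP itself.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

A caller could sleep for the returned number of seconds before retrying instead of failing the whole run.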

Troubleshooting

No findings detected

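  • This is the expected result until upstream adds TurboQuant support (see Current Status)
  • To rule out a miss, check that the keywords are correct
  • Verify that all three repositories are being scanned
  • Check whether the GitHub API rate limit was hit
  • Try increasing the --days parameter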

GitHub Action failing

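  • GITHUB_TOKEN is provided automatically by GitHub Actions; check that the workflow passes it to the script
  • Verify workflow permissions (creating issues requires issues: write; committing reports requires contents: write)
  • Check for syntax errors in the workflow file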

Script errors

  • Ensure Python 3.7+ is installed
  • Check internet connectivity
  • Verify GitHub API is accessible

Future Enhancements

  1. Email/Slack notifications when findings are detected
  2. More repositories to monitor (e.g., huggingface/transformers)
  3. Automated benchmarking when upstream lands
  4. Dashboard for tracking upstream status over time

Related Issues

  • Issue #1: Main TurboQuant implementation
  • Issue #15: This monitoring system
  • Parent Issue: #1 (mentioned in #15)

Acceptance Criteria

From issue #15:

  • Monitoring cadence established (weekly via GitHub Action)
  • Upstream landings are detected and reported when they happen

Files

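scripts/upstream_watch.py            # Main monitoring script
scripts/run_upstream_watch.sh        # Convenience wrapper for running the monitor
scripts/test_upstream_watch.py       # Tests for the monitoring script
.github/workflows/upstream-watch.yml # GitHub Action workflow
docs/upstream-watch.md               # This documentation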

License

Part of the Timmy Foundation TurboQuant project.