Alexander Whitestone 3172415da1

feat: implement TurboQuant upstream watch monitoring system

- Add scripts/upstream_watch.py for monitoring upstream repositories
- Add .github/workflows/upstream-watch.yml for weekly automated monitoring
- Add docs/upstream-watch.md for documentation
- Add scripts/run_upstream_watch.sh for easy execution
- Add scripts/test_upstream_watch.py for testing

Addresses issue #15: [P4] Upstream llama.cpp / Ollama TurboQuant watch

Features:
1. Monitor llama.cpp, Ollama, and ggml repositories
2. Search for TurboQuant/PolarQuant/QJL keywords
3. Check issues, PRs, and release notes
4. Generate text and JSON reports
5. Weekly GitHub Action for continuous monitoring
6. Automated issue creation when findings detected

Usage:
- Run monitor: python3 scripts/upstream_watch.py --days 30
- JSON output: python3 scripts/upstream_watch.py --format json
- Weekly monitoring: GitHub Action runs every Monday at 9:00 AM UTC

When upstream lands:
1. Detection: Monitor will detect mentions
2. Evaluation: Compare upstream vs fork
3. Decision: Migrate if upstream is better

Closes #15

2026-04-14 22:40:18 -04:00

TurboQuant Upstream Watch

Issue: #15 - [P4] Upstream llama.cpp / Ollama TurboQuant watch
Purpose: Monitor upstream llama.cpp and Ollama for TurboQuant/PolarQuant/QJL support

Overview

This system watches upstream repositories for the point at which TurboQuant (or a similar KV cache compression technique) lands in an official release. When that happens, we can evaluate whether to migrate from our fork to the official implementation.

Components

1. `scripts/upstream_watch.py`

Main monitoring script that searches GitHub repositories for TurboQuant mentions.

Usage:

# Scan last 30 days (default)
python scripts/upstream_watch.py

# Scan last 60 days
python scripts/upstream_watch.py --days 60

# JSON output
python scripts/upstream_watch.py --format json

# Save to file
python scripts/upstream_watch.py --output report.md

# With GitHub token (for higher rate limits)
python scripts/upstream_watch.py --github-token "$GITHUB_TOKEN"
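The flags used above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the function name `build_parser`, the help strings, and the `text` choice for `--format` are assumptions; only the flag names and the 30-day default come from the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Watch upstream repositories for TurboQuant mentions"
    )
    parser.add_argument("--days", type=int, default=30,
                        help="How many days back to scan")
    # "text" as the default format name is an assumption.
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="Report format")
    parser.add_argument("--output", default=None,
                        help="Write the report to this file instead of stdout")
    parser.add_argument("--github-token", default=None,
                        help="Token used for higher API rate limits")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--days", "60", "--format", "json"])
print(args.days, args.format, args.output)
```

Note that `argparse` exposes `--github-token` as the attribute `args.github_token`.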

Features:

Searches llama.cpp, Ollama, and ggml repositories
Checks issues, PRs, and release notes
Looks for TurboQuant/PolarQuant/QJL keywords
Generates text or JSON reports
Compares fork status with upstream
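To illustrate the text and JSON report formats, here is a minimal sketch of a renderer. The function name `render_report` and the shape of a finding (a dict with `repo`, `title`, and `url` keys) are assumptions made for the example; the real script may structure its reports differently.

```python
import json
from typing import Dict, List


def render_report(findings: List[Dict[str, str]], fmt: str = "text") -> str:
    """Render findings as JSON or as a short plain-text summary."""
    if fmt == "json":
        # Include a count so consumers can check for findings cheaply.
        return json.dumps({"count": len(findings), "findings": findings}, indent=2)
    if not findings:
        return "No TurboQuant mentions found upstream."
    lines = [f"{len(findings)} upstream mention(s) found:"]
    for item in findings:
        lines.append(f"- [{item['repo']}] {item['title']} ({item['url']})")
    return "\n".join(lines)


# Hypothetical finding, for illustration only.
sample = [{"repo": "ollama/ollama", "title": "KV quant question",
           "url": "https://example.invalid/1"}]
print(render_report(sample))
print(render_report(sample, fmt="json"))
```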

2. `.github/workflows/upstream-watch.yml`

GitHub Action that runs weekly to monitor upstream.

Schedule: Every Monday at 9:00 AM UTC
Manual Trigger: Can be run manually with a custom days parameter

What it does:

Runs the monitoring script
Generates JSON and text reports
Uploads reports as artifacts
Creates an issue if findings are detected
Commits reports to the repository (optional)
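The "create an issue if findings are detected" step can be sketched as a function that returns a request body only when there is something to report. The name `build_issue_payload`, the `upstream-watch` label, and the finding keys are hypothetical; the `title`, `body`, and `labels` fields match what GitHub's REST endpoint for creating issues (`POST /repos/{owner}/{repo}/issues`) accepts.

```python
from typing import Dict, List, Optional


def build_issue_payload(findings: List[Dict[str, str]],
                        days: int) -> Optional[Dict[str, object]]:
    """Return an issue payload, or None when nothing was found."""
    if not findings:
        return None  # No findings: the workflow should skip issue creation.
    body_lines = [
        f"The upstream watch found {len(findings)} mention(s) in the last {days} days:",
        "",
    ]
    for item in findings:
        body_lines.append(f"- [{item['title']}]({item['url']}) in `{item['repo']}`")
    return {
        "title": f"Upstream watch: {len(findings)} TurboQuant-related mention(s) detected",
        "body": "\n".join(body_lines),
        "labels": ["upstream-watch"],  # Hypothetical label name.
    }
```

Returning `None` for an empty result keeps the "no findings, no issue" rule in one place instead of spreading it across the workflow.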

3. Documentation

This file and related documentation.

Keywords Monitored

The system searches for these keywords in upstream repositories:

turborot (common misspelling/search term)
turborotquant
polarquant
qjl
kv cache compression
kv cache quantization
quantized kv
kv quant
cache compression
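A short keyword such as `qjl` can match by accident inside longer tokens, so a matcher may want word boundaries and case-insensitive comparison. The sketch below shows one way to do that; the names `KEYWORDS`, `PATTERNS`, and `matched_keywords` are assumptions, and the actual script may match differently (for example, by relying on GitHub's search alone).

```python
import re
from typing import List

# Keyword list copied from the section above.
KEYWORDS = [
    "turborot",
    "turborotquant",
    "polarquant",
    "qjl",
    "kv cache compression",
    "kv cache quantization",
    "quantized kv",
    "kv quant",
    "cache compression",
]


def _compile(keyword: str) -> "re.Pattern[str]":
    # Allow any run of whitespace between the words of a multi-word keyword.
    parts = [re.escape(part) for part in keyword.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


PATTERNS = {kw: _compile(kw) for kw in KEYWORDS}


def matched_keywords(text: str) -> List[str]:
    """Return the keywords found in text, in the order of KEYWORDS."""
    return [kw for kw, pattern in PATTERNS.items() if pattern.search(text)]


print(matched_keywords("Add PolarQuant + QJL support"))
```

Overlapping keywords are reported separately: a text containing "kv cache compression" also matches "cache compression".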

Repositories Monitored

llama.cpp (ggerganov/llama.cpp, which now redirects to ggml-org/llama.cpp)
- C/C++ inference engine for LLaMA-family models
- Where TurboQuant would likely land first
Ollama (ollama/ollama)
- Go wrapper around llama.cpp
- Release notes may mention TurboQuant support
ggml (ggml-org/ggml)
- Tensor library used by llama.cpp
- Low-level KV cache compression implementations
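One way to query these repositories is GitHub's issue search endpoint, which covers both issues and pull requests. The sketch below only builds the request URL; `REPOS`, `search_url`, and the choice of the `updated:` qualifier are assumptions, not the script's confirmed behavior. Release notes are not covered by this endpoint and would need a separate call to the releases API.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Repository slugs as listed above.
REPOS = ["ggerganov/llama.cpp", "ollama/ollama", "ggml-org/ggml"]


def search_url(repo: str, keyword: str, days: int, today: date,
               kind: str = "issue") -> str:
    """Build a search URL for one repo/keyword pair.

    kind is "issue" or "pr", matching GitHub's is:issue / is:pr qualifiers.
    """
    since = (today - timedelta(days=days)).isoformat()
    query = f'"{keyword}" repo:{repo} is:{kind} updated:>={since}'
    return "https://api.github.com/search/issues?" + urlencode(
        {"q": query, "per_page": 50}
    )


for repo in REPOS:
    print(search_url(repo, "polarquant", days=30, today=date(2026, 4, 14)))
```

Passing `today` explicitly keeps the function deterministic, which makes it easy to test.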

Current Status

Fork: TheTom/llama-cpp-turboquant
Status: Active, maintained
Upstream Status: No TurboQuant support found in upstream yet

When Upstream Lands

When TurboQuant is detected in upstream, follow this evaluation process:

1. Detection

The monitoring system will detect mentions in issues, PRs, or releases
An issue will be created automatically

2. Evaluation

Compare upstream implementation with our fork:

Performance:

Benchmark compression ratio
Measure inference speed
Test memory usage

Features:

What quantization methods are supported?
What hardware backends are available?
What model architectures are supported?

Compatibility:

Does it work with our models?
Does it integrate with our toolchain?
Are there breaking changes?

3. Decision

Based on evaluation:

If upstream is better:

Plan migration from fork to upstream
Update dependencies
Test thoroughly
Document migration process

If our fork is better:

Continue using fork
Consider contributing improvements upstream
Document why we're keeping the fork

If they're equivalent:

Consider migrating for the maintenance benefits
Using upstream directly means less work tracking upstream changes
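The evaluation and decision steps can be summarized as a small comparison routine. Everything in this sketch is hypothetical: the `EvalResult` fields, the 5% tolerance, and the simple vote across metrics are illustrative choices, not project policy, and a real decision would also weigh the feature and compatibility questions listed earlier.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    compression_ratio: float  # Uncompressed KV bytes / compressed KV bytes; higher is better.
    tokens_per_second: float  # Inference speed; higher is better.
    peak_memory_gib: float    # Memory usage; lower is better.
    compatible: bool          # Works with our models and toolchain.


def _vote(upstream: float, fork: float, higher_is_better: bool,
          tolerance: float) -> int:
    """Return +1 if upstream wins, -1 if the fork wins, 0 if within tolerance."""
    relative = (upstream - fork) / fork  # Assumes fork measurements are positive.
    if abs(relative) <= tolerance:
        return 0
    sign = 1 if relative > 0 else -1
    return sign if higher_is_better else -sign


def recommend(fork: EvalResult, upstream: EvalResult,
              tolerance: float = 0.05) -> str:
    if not upstream.compatible:
        return "keep-fork"
    score = (
        _vote(upstream.compression_ratio, fork.compression_ratio, True, tolerance)
        + _vote(upstream.tokens_per_second, fork.tokens_per_second, True, tolerance)
        + _vote(upstream.peak_memory_gib, fork.peak_memory_gib, False, tolerance)
    )
    if score > 0:
        return "migrate"
    if score < 0:
        return "keep-fork"
    # Equivalent results: migrating still reduces maintenance work.
    return "migrate-for-maintenance"
```

The three return values correspond to the three outcomes above: upstream is better, the fork is better, or they are equivalent.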

Rate Limits

GitHub API has rate limits:

Unauthenticated: 60 requests/hour
Authenticated: 5,000 requests/hour

The script makes several API calls per repository, so pass a GitHub token (--github-token) to avoid exhausting the unauthenticated limit.
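A rough budget shows why a token helps. Assuming one request per repository, keyword, and item kind (issues and PRs), which is an assumption about how the script is structured, 3 repositories × 9 keywords × 2 kinds comes to 54 requests, close to the 60 per hour allowed without authentication. The sketch below does that arithmetic and reads GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers to work out how long to wait; the function names are hypothetical.

```python
import time
from typing import Mapping, Optional


def estimated_requests(repos: int, keywords: int, kinds: int = 2) -> int:
    """Estimate API calls, assuming one call per repo/keyword/kind combination."""
    return repos * keywords * kinds


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return 0 if requests remain, else the seconds until the limit resets.

    Expects header names exactly as written here; a plain dict is
    case-sensitive even though HTTP header names are not.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


print(estimated_requests(repos=3, keywords=9))
```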

Troubleshooting

No findings detected

Check if keywords are correct
Verify repositories are being scanned
Check GitHub API rate limits
Try increasing the --days parameter

GitHub Action failing

Check that GITHUB_TOKEN is available to the job
Verify workflow permissions (creating issues and committing reports both require write access)
Check for syntax errors in workflow file

Script errors

Ensure Python 3.7+ is installed
Check internet connectivity
Verify GitHub API is accessible

Future Enhancements

Email/Slack notifications when findings are detected
More repositories to monitor (e.g., huggingface/transformers)
Automated benchmarking when upstream lands
Dashboard for tracking upstream status over time

Related Issues

Issue #1: Main TurboQuant implementation (parent issue, mentioned in #15)
Issue #15: This monitoring system

Acceptance Criteria

From issue #15:

Monitoring cadence is established (weekly via GitHub Action)
Upstream landing is detected and reported when it happens

Files

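scripts/upstream_watch.py            # Main monitoring script
scripts/run_upstream_watch.sh        # Convenience wrapper for running the monitor
scripts/test_upstream_watch.py       # Tests for the monitoring script
.github/workflows/upstream-watch.yml # GitHub Action workflow
docs/upstream-watch.md               # This documentation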

License

Part of the Timmy Foundation TurboQuant project.
