timmy-config/training-data/generate_prompt_enhancement.py
Alexander Payne 29278efdc9
Training Factory: Prompt Enhancement — 3K Terse→Rich Expansions
Add generator script and combined artifact for prompt-enhancement dataset.
- generate_prompt_enhancement.py combines per-category files from training/data/prompt-enhancement/
- Produces prompt-enhancement.jsonl with 3K terse→rich pairs across 6 domains
- Covers the required domains: visual scenes, music moods, dream descriptions, and emotional weather
- Output format: {"terse": str, "rich": str, "domain": str}
- Usage: python3 training-data/generate_prompt_enhancement.py --output ~/.hermes/training-data/prompt-enhancement.jsonl
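The output format above can be illustrated with a minimal sketch (the sample records and temp path below are hypothetical, not taken from the dataset): each line of the combined JSONL file is one object with exactly the `terse`, `rich`, and `domain` keys.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample records in the documented {"terse", "rich", "domain"} shape.
records = [
    {"terse": "rainy night", "rich": "A cold rain taps the window of a dim, lamplit room.", "domain": "visual scenes"},
    {"terse": "calm piano", "rich": "Soft, unhurried piano chords drifting over distant rain.", "domain": "music moods"},
]

# Write one JSON object per line, as the generator does.
out = Path(tempfile.mkdtemp()) / "prompt-enhancement.jsonl"
with open(out, "w", encoding="utf-8") as f:
    for r in records:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")

# Each line parses back to a dict carrying the three required keys.
loaded = [json.loads(line) for line in out.read_text(encoding="utf-8").splitlines()]
print(len(loaded))
```

This is the same per-line shape the generator validates before writing, so downstream consumers can assume all three keys are present.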

Closes #575
2026-04-26 22:30:08 -04:00


#!/usr/bin/env python3
import argparse
import json
import sys
from pathlib import Path


def main():
    parser = argparse.ArgumentParser(description="Combine prompt-enhancement files.")
    parser.add_argument('--output', required=True, help='Output JSONL path')
    parser.add_argument('--source-dir', default='training/data/prompt-enhancement',
                        help='Source directory containing per-category JSONL files')
    args = parser.parse_args()

    source = Path(args.source_dir)
    out_path = Path(args.output)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    if not source.exists():
        print(f"ERROR: Source directory {source} not found", file=sys.stderr)
        sys.exit(1)

    files = sorted(source.glob('*.jsonl'))
    if not files:
        print(f"ERROR: No .jsonl files found in {source}", file=sys.stderr)
        sys.exit(1)

    count = 0
    with open(out_path, 'w', encoding='utf-8') as out_f:
        for jsonl_file in files:
            # Read source files as UTF-8 explicitly, matching the output encoding
            with open(jsonl_file, encoding='utf-8') as in_f:
                for line in in_f:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        entry = json.loads(line)
                    except json.JSONDecodeError as e:
                        print(f"WARNING: Skipping invalid JSON in {jsonl_file}: {e}", file=sys.stderr)
                        continue
                    # Validate required keys
                    if not all(k in entry for k in ('terse', 'rich', 'domain')):
                        print(f"WARNING: Skipping entry without required keys in {jsonl_file}", file=sys.stderr)
                        continue
                    out_f.write(json.dumps(entry, ensure_ascii=False) + '\n')
                    count += 1
    print(f"Wrote {count} entries to {out_path}")


if __name__ == '__main__':
    main()