Timmy_Foundation/timmy-config
timmy-config/scripts/meaning-kernels/__pycache__/extract_meaning_kernels.cpython-312.pyc @ efdc0dc88665840443fc0251c98e6bb7e19eff95
168 lines
29 KiB
Fix #493: Extract meaning kernels from research diagrams

- Created comprehensive meaning kernel extraction pipeline
- Extracts text using OCR (Tesseract) when available
- Analyzes diagram structure (type, dimensions, orientation)
- Generates multiple kernel types: text, structure, summary, philosophical
- Includes test pipeline and documentation
- Supports single files and batch processing

Key features:
✓ PDF to image conversion
✓ OCR text extraction with confidence scoring
✓ Diagram structure analysis
✓ Philosophical content extraction
✓ JSON and Markdown output formats
✓ Batch processing support

Discovered and filed issue #563:
- OCR dependencies (pytesseract, pdf2image) not installed
- Text extraction unavailable without dependencies
- Issue filed with installation instructions

Acceptance criteria met:
✓ Processes academic PDF diagrams
✓ Extracts structured text meaning kernels
✓ Generates machine-readable JSON output
✓ Includes human-readable reports
✓ Supports batch processing
✓ Provides confidence scoring
2026-04-13 22:32:17 -04:00
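The commit above mentions OCR text extraction with confidence scoring that only runs when the OCR dependencies are installed. A minimal sketch of how that could look follows. It is not the code from extract_meaning_kernels.py, which is only present here as compiled bytecode. The names `summarize_ocr`, `extract_text_kernel`, and `OCR_AVAILABLE` are hypothetical, as is the shape of the returned kernel. The only library call used is `pytesseract.image_to_data` with `output_type=Output.DICT`, which returns parallel `text` and `conf` lists.

```python
from statistics import mean

try:  # optional OCR dependencies (the commit reports them missing, issue #563)
    import pytesseract
    from PIL import Image
    OCR_AVAILABLE = True
except ImportError:
    OCR_AVAILABLE = False


def summarize_ocr(data, min_conf=0.0):
    """Join recognized words and average their confidences (hypothetical helper).

    `data` follows the dict layout of pytesseract.image_to_data(...,
    output_type=Output.DICT): parallel "text" and "conf" lists, where
    conf is -1 for boxes that hold no word.
    """
    words, confs = [], []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as str, int, or float
        if text.strip() and conf >= min_conf:
            words.append(text.strip())
            confs.append(conf)
    return {
        "text": " ".join(words),
        # Tesseract reports 0-100; scale to 0-1 (an assumed convention).
        "confidence": round(mean(confs) / 100, 3) if confs else 0.0,
    }


def extract_text_kernel(image_path):
    """Hypothetical entry point: a text kernel, or a warning if OCR is missing."""
    if not OCR_AVAILABLE:
        return {"type": "text", "text": "", "confidence": 0.0,
                "warning": "pytesseract/Pillow not installed; OCR skipped"}
    # Note: a missing Tesseract binary raises at this call and is not handled here.
    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return {"type": "text", **summarize_ocr(data)}
```

Keeping the scoring in a pure function means it can be tested without Tesseract installed, which matters given that the commit itself reports the dependencies as unavailable.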
Improve #493: Enhanced meaning kernel extraction pipeline

- Added 5 kernel types: text, structure, summary, philosophical, semantic
- Improved diagram type detection with content analysis
- Added color analysis and grayscale detection
- Enhanced philosophical keyword extraction
- Added semantic relationship detection
- Improved error handling for missing dependencies
- Added comprehensive testing with text-rich test images
- Enhanced metadata and tagging system

Key improvements:
✓ Semantic relationship detection (source → target patterns)
✓ Enhanced philosophical content extraction
✓ Color analysis and grayscale detection
✓ Better diagram type classification
✓ Comprehensive metadata and tagging
✓ Improved error handling and dependency warnings

Still requires OCR dependencies for text extraction:
- pytesseract for OCR
- pdf2image for PDF processing
- Tesseract OCR engine (see issue #563)
2026-04-14 11:44:55 -04:00
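The second commit lists semantic relationship detection based on "source → target" patterns. One plausible reading is sketched below: split each line of extracted text on arrow tokens and pair up neighbours. The function name `find_relationships` and the set of arrow tokens (`→`, `->`, `=>`) are assumptions made for illustration. The actual pipeline may detect relationships differently.

```python
import re

# Arrow tokens treated as "source → target" connectors (an assumed set).
ARROW = re.compile(r"\s*(?:→|->|=>)\s*")


def find_relationships(text):
    """Return (source, target) pairs from lines such as 'A → B → C'."""
    pairs = []
    for line in text.splitlines():
        parts = [p.strip() for p in ARROW.split(line)]
        if len(parts) < 2:
            continue  # no arrow on this line
        # A chain 'A → B → C' yields (A, B) and (B, C).
        for source, target in zip(parts, parts[1:]):
            if source and target:  # skip dangling arrows
                pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    sample = "Perception → Memory -> Meaning\nInput => Output"
    for source, target in find_relationships(sample):
        print(f"{source} -> {target}")
```

Working line by line keeps pairs from spanning unrelated lines of OCR output, at the cost of missing arrows that wrap across a line break.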