Improve #493: Enhanced meaning kernel extraction pipeline

- Added 5 kernel types: text, structure, summary, philosophical, semantic
- Improved diagram type detection with content analysis
- Added color analysis and grayscale detection
- Enhanced philosophical keyword extraction
- Added semantic relationship detection
- Improved error handling for missing dependencies
- Added comprehensive testing with text-rich test images
- Enhanced metadata and tagging system
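
For illustration only, the "source → target" relationship detection could look roughly like the sketch below. The function name `extract_relations` and the set of arrow tokens are assumptions, not taken from the script in this commit.

```python
import re
from typing import List, Tuple

# Assumed arrow tokens; the real pipeline may recognize a different set.
ARROW = re.compile(r"\s*(?:→|->|=>)\s*")


def extract_relations(text: str) -> List[Tuple[str, str]]:
    """Return (source, target) pairs found on each line of OCR'd text."""
    relations: List[Tuple[str, str]] = []
    for line in text.splitlines():
        # Split the line on arrows and drop empty fragments.
        parts = [p.strip() for p in ARROW.split(line) if p.strip()]
        # A chain "A → B → C" yields (A, B) and (B, C).
        relations.extend(zip(parts, parts[1:]))
    return relations
```

Lines without an arrow contribute no pairs, so plain prose passes through without producing relations.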

Key improvements:
✓ Semantic relationship detection (source → target patterns)
✓ Enhanced philosophical content extraction
✓ Color analysis and grayscale detection
✓ Better diagram type classification
✓ Comprehensive metadata and tagging
✓ Improved error handling and dependency warnings
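
A minimal sketch of the grayscale check, assuming pixels are already available as (R, G, B) tuples. The helper `is_grayscale` and its `tolerance` parameter are hypothetical and may differ from what the script does.

```python
from typing import Iterable, Tuple


def is_grayscale(pixels: Iterable[Tuple[int, int, int]], tolerance: int = 0) -> bool:
    """Treat an image as grayscale when every pixel's channels agree.

    `tolerance` allows a small per-pixel channel spread, which can help
    with compression noise. An empty pixel sequence counts as grayscale.
    """
    return all(max(p) - min(p) <= tolerance for p in pixels)
```

In practice the pixel tuples would come from whatever image library the pipeline uses. Sampling a subset of pixels is a common shortcut for large images.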

Still requires OCR dependencies for text extraction:
- pytesseract for OCR
- pdf2image for PDF processing
- Tesseract OCR engine (see issue #563)
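
One way to surface these requirements as warnings rather than import errors is sketched below. The helper `find_missing` is hypothetical. It relies only on the standard library calls `importlib.util.find_spec` and `shutil.which`, and assumes the Tesseract binary is named `tesseract` on the PATH.

```python
import importlib.util
import shutil
from typing import Iterable, List


def find_missing(modules: Iterable[str], binaries: Iterable[str]) -> List[str]:
    """Return names of Python modules and executables that cannot be found."""
    missing: List[str] = []
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    for binary in binaries:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(binary) is None:
            missing.append(binary)
    return missing


if __name__ == "__main__":
    for name in find_missing(["pytesseract", "pdf2image"], ["tesseract"]):
        print(f"warning: OCR dependency not found: {name}")
```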

Alexander Whitestone

2026-04-14 11:44:55 -04:00

parent 5e09b49de8

commit efdc0dc886

3 changed files with 277 additions and 48 deletions

BIN scripts/meaning-kernels/__pycache__/extract_meaning_kernels.cpython-312.pyc

Binary file not shown.