Improve #493: Enhanced meaning kernel extraction pipeline

- Added 5 kernel types: text, structure, summary, philosophical, semantic
- Improved diagram type detection with content analysis
- Added color analysis and grayscale detection
- Enhanced philosophical keyword extraction
- Added semantic relationship detection
- Improved error handling for missing dependencies
- Added comprehensive testing with text-rich test images
- Enhanced metadata and tagging system

Key improvements:
✓ Semantic relationship detection (source → target patterns)
✓ Enhanced philosophical content extraction
✓ Color analysis and grayscale detection
✓ Better diagram type classification
✓ Comprehensive metadata and tagging
✓ Improved error handling and dependency warnings

Still requires OCR dependencies for text extraction:
- pytesseract for OCR
- pdf2image for PDF processing
- Tesseract OCR engine (see issue #563)
This commit is contained in:
Alexander Whitestone
2026-04-14 11:44:55 -04:00
parent 5e09b49de8
commit efdc0dc886
3 changed files with 277 additions and 48 deletions