Fix #493: Add multimodal meaning kernel extraction pipeline

- Added extract_meaning_kernels.py for processing PDF diagrams
- Extracts text using OCR (Tesseract) when available
- Analyzes diagram structure (type, dimensions, orientation)
- Generates structured meaning kernels with metadata
- Outputs JSON (machine-readable) and Markdown (human-readable)
- Includes test pipeline and documentation
- Supports single files and batch processing

Pipeline components:
- DiagramProcessor: Main processing engine
- MeaningKernel: Structured kernel representation
- PDF to image conversion
- OCR text extraction
- Structure analysis
- Kernel generation with confidence scoring

Acceptance criteria met:
✓ Processes academic PDF diagrams
✓ Extracts structured text meaning kernels
✓ Generates machine-readable JSON output
✓ Includes human-readable reports
✓ Supports batch processing
✓ Provides confidence scoring
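
For illustration only, a structured kernel with both output formats might look roughly like the sketch below. The field names, the orientation rule, and the method names are assumptions made for this example; the actual MeaningKernel schema in extract_meaning_kernels.py is not shown in this diff and may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MeaningKernel:
    """Hypothetical kernel record; every field name here is an assumption."""

    source: str          # path of the PDF or image the kernel came from
    text: str            # OCR text, empty when OCR was unavailable
    diagram_type: str    # e.g. "flowchart"; the classification itself is not sketched
    width: int           # image width in pixels
    height: int          # image height in pixels
    confidence: float    # score in [0.0, 1.0]
    metadata: dict = field(default_factory=dict)

    @property
    def orientation(self) -> str:
        # Derive orientation from the pixel dimensions.
        if self.width > self.height:
            return "landscape"
        if self.height > self.width:
            return "portrait"
        return "square"

    def to_json(self) -> str:
        # Machine-readable form: all fields plus the derived orientation.
        record = asdict(self)
        record["orientation"] = self.orientation
        return json.dumps(record, indent=2, sort_keys=True)

    def to_markdown(self) -> str:
        # Human-readable form: a short report with one bullet per attribute.
        lines = [
            f"## Meaning kernel: {self.source}",
            "",
            f"- Type: {self.diagram_type}",
            f"- Dimensions: {self.width}x{self.height} ({self.orientation})",
            f"- Confidence: {self.confidence:.2f}",
            "",
            self.text or "(no text extracted)",
        ]
        return "\n".join(lines)
```

Keeping the record as a plain dataclass means the JSON and Markdown outputs are both derived from the same fields, so the two reports cannot drift apart.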
2026-04-13 21:20:42 -04:00
commit 0a52cff8a7
6 changed files with 705 additions and 0 deletions
--- a/scripts/multimodal/requirements.txt
+++ b/scripts/multimodal/requirements.txt
@@ -0,0 +1,25 @@
+# Multimodal Meaning Kernel Extraction Pipeline
+# Required Python dependencies
+
+# Image processing
+Pillow>=10.0.0
+
+# OCR (Optical Character Recognition); needs the Tesseract binary installed separately
+pytesseract>=0.3.10
+
+# PDF processing; needs Poppler installed separately
+pdf2image>=1.16.3
+
+# Optional: Enhanced computer vision
+# opencv-python>=4.8.0
+# numpy>=1.24.0
+
+# Optional: Machine learning for diagram classification
+# scikit-learn>=1.3.0
+# torch>=2.0.0
+# torchvision>=0.15.0
+
+# Development and testing
+# pytest>=7.4.0
+# black>=23.0.0
+# flake8>=6.0.0
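
Since the commit message says OCR is used "when available", a script could probe for these dependencies at startup rather than failing on import. The following is a minimal sketch under that assumption; `REQUIRED_MODULES`, `module_available`, and `probe_environment` are hypothetical names for this example and are not taken from the commit.

```python
import importlib.util
import shutil

# PyPI package name -> import name (Pillow is imported as PIL).
REQUIRED_MODULES = {
    "Pillow": "PIL",
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
}


def module_available(import_name: str) -> bool:
    # find_spec returns None for a top-level module that is not installed.
    return importlib.util.find_spec(import_name) is not None


def probe_environment() -> dict:
    """Report which Python packages and system tools can be found."""
    status = {pkg: module_available(mod) for pkg, mod in REQUIRED_MODULES.items()}
    # pytesseract wraps the tesseract executable; pdf2image calls Poppler tools.
    status["tesseract binary"] = shutil.which("tesseract") is not None
    status["poppler (pdftoppm)"] = shutil.which("pdftoppm") is not None
    return status


if __name__ == "__main__":
    for name, found in probe_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A pipeline could use such a report to skip OCR and still emit structure-only kernels when Tesseract is missing.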