Fix #493: Add multimodal meaning kernel extraction pipeline
- Added extract_meaning_kernels.py for processing PDF diagrams
- Extracts text using OCR (Tesseract) when available
- Analyzes diagram structure (type, dimensions, orientation)
- Generates structured meaning kernels with metadata
- Outputs JSON (machine-readable) and Markdown (human-readable)
- Includes test pipeline and documentation
- Supports single files and batch processing

Pipeline components:
- DiagramProcessor: Main processing engine
- MeaningKernel: Structured kernel representation
- PDF to image conversion
- OCR text extraction
- Structure analysis
- Kernel generation with confidence scoring

Acceptance criteria met:
✓ Processes academic PDF diagrams
✓ Extracts structured text meaning kernels
✓ Generates machine-readable JSON output
✓ Includes human-readable reports
✓ Supports batch processing
✓ Provides confidence scoring
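For reviewers who want a concrete picture of the kernel structure, here is a minimal sketch of what a MeaningKernel with JSON and Markdown output might look like. Only the class name comes from the commit message. The field names, the `classify_orientation` helper, and the Markdown layout are assumptions made for illustration and may not match extract_meaning_kernels.py.

```python
import json
from dataclasses import asdict, dataclass, field


def classify_orientation(width: int, height: int) -> str:
    """Label a diagram by comparing its pixel width and height (hypothetical helper)."""
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"


@dataclass
class MeaningKernel:
    # Every field name below is an assumption; the commit does not list them.
    source: str          # e.g. the PDF path and page the diagram came from
    text: str            # text recovered by OCR, possibly empty
    diagram_type: str    # e.g. "flowchart"
    width: int           # image width in pixels
    height: int          # image height in pixels
    confidence: float    # score in [0, 1]; the scoring rule is not specified
    metadata: dict = field(default_factory=dict)

    @property
    def orientation(self) -> str:
        return classify_orientation(self.width, self.height)

    def to_json(self) -> str:
        """Machine-readable form: all fields plus the derived orientation."""
        payload = asdict(self)
        payload["orientation"] = self.orientation
        return json.dumps(payload, indent=2, sort_keys=True)

    def to_markdown(self) -> str:
        """Human-readable report for a single kernel."""
        return "\n".join([
            f"## Meaning kernel: {self.source}",
            f"- Type: {self.diagram_type}",
            f"- Size: {self.width}x{self.height} ({self.orientation})",
            f"- Confidence: {self.confidence:.2f}",
            "",
            self.text,
        ])
```

Batch processing would then amount to building one such object per diagram and writing the JSON and Markdown forms side by side.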
25
scripts/multimodal/requirements.txt
Normal file
@@ -0,0 +1,25 @@
# Multimodal Meaning Kernel Extraction Pipeline
# Required Python dependencies

# Image processing
Pillow>=10.0.0

# OCR (Optical Character Recognition)
pytesseract>=0.3.10

# PDF processing
pdf2image>=1.16.3

# Optional: Enhanced computer vision
# opencv-python>=4.8.0
# numpy>=1.24.0

# Optional: Machine learning for diagram classification
# scikit-learn>=1.3.0
# torch>=2.0.0
# torchvision>=0.15.0

# Development and testing
# pytest>=7.4.0
# black>=23.0.0
# flake8>=6.0.0