Multimodal-alignment study using text-derived views (semantic embeddings, lexical cues, affective/psycholinguistic proxies) to predict affect on CMU-MOSEI and qualitatively project to counseling dialogues.
- Fusion (semantic + proxies + TF-IDF) outperforms the best unimodal baseline on Macro-F1 (+1.3 points, p=0.0076) and improves calibration (Brier 0.176 vs 0.179) on the test split; a fusion sketch follows this list.
- Variance across folds drops (Std Macro-F1 0.0045 fused vs 0.0063 semantic), indicating more stable alignment.
- Counseling projection shows intuitive polarity spread, aiding interpretability despite unlabeled data.
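
For orientation, here is a minimal sketch of the early-fusion setup, assuming simple concatenation of the three views and a logistic-regression classifier; these are hypothetical choices, the actual pipeline lives in `notebooks/run_multimodal_alignment.py` and is documented in `REPORT.md`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def fuse_and_score(train_views, y_train, test_views, y_test):
    """Concatenate per-view features (semantic, proxies, TF-IDF) and report Macro-F1.

    Dense arrays are assumed; convert sparse TF-IDF with .toarray() first.
    """
    X_train = np.hstack(train_views)  # shape: (n_train, d_semantic + d_proxy + d_tfidf)
    X_test = np.hstack(test_views)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test), average="macro")
```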
- Ensure a virtual env: `uv venv && source .venv/bin/activate`.
- Dependencies are tracked in `pyproject.toml` (installed via `uv add ...`).
- Run experiments: `source .venv/bin/activate && python notebooks/run_multimodal_alignment.py`.
- Outputs land in `results/` (metrics JSON, plots, counseling projections).
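
A minimal sketch for inspecting the saved metrics afterwards; the file name comes from the repository layout below, but the exact JSON schema is defined by the experiment script, so treat the keys as illustrative.

```python
import json
from pathlib import Path

# Load and pretty-print the test metrics (key names depend on the script).
with Path("results/test_metrics.json").open() as f:
    metrics = json.load(f)
print(json.dumps(metrics, indent=2))
```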
- `planning.md` – research plan and methodology.
- `notebooks/run_multimodal_alignment.py` – end-to-end experiment script.
- `results/` – metrics (`cv_metrics_raw.json`, `test_metrics.json`, etc.) and plots.
- `datasets/` – local MOSEI text and counseling data (excluded from git).
- `REPORT.md` – full report with analysis and conclusions.
- Seed fixed at 42; CPU execution ~2 minutes.
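
A typical seeding pattern consistent with the fixed seed above; the script's own setup may differ, and `torch` seeding only applies if it is installed.

```python
import random
import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
try:
    import torch  # only relevant if the embedding model runs on torch
    torch.manual_seed(SEED)
except ImportError:
    pass
```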
- Uses `sentence-transformers/all-MiniLM-L6-v2` for semantic embeddings; see `REPORT.md` for full details and limitations.
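
A minimal sketch of computing the semantic view with `sentence-transformers`; the example utterance and shape check are illustrative, not taken from the experiment script.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = ["I felt a lot better after the session."]  # illustrative utterance
embeddings = model.encode(texts, convert_to_numpy=True)
print(embeddings.shape)  # (1, 384) for this model
```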