Official implementation of:
Lim, H., Park, S., Li, Q., Li, X., & Kim, J. (2026). What makes a review helpful? A multimodal prediction model in e-commerce. Electronic Commerce Research and Applications, 76, 101586. [Paper](https://doi.org/10.1016/j.elerap.2026.101586)
This repository provides the official implementation of MCHPM (Multimodal Cue-based Helpfulness Prediction Model), a theory-driven deep learning framework for review helpfulness prediction in e-commerce. MCHPM is grounded in the Elaboration Likelihood Model and reflects how consumers evaluate online reviews through central and peripheral information-processing routes.
Existing MRHP (Multimodal Review Helpfulness Prediction) models primarily focus on deep semantic representations from text and images while overlooking shallow cues such as readability and image quality. To address this limitation, MCHPM systematically integrates central cues extracted via BERT and VGG-16 with peripheral cues computed from textual and visual surface features.
A co-attention mechanism models the interdependencies between central and peripheral cues within each modality, and a Gated Multimodal Unit dynamically adjusts the relative importance of text and image representations during prediction. Experiments on large-scale Amazon datasets demonstrate that MCHPM consistently outperforms strong unimodal and multimodal baselines, achieving average improvements of 3.864% in MAE, 4.061% in MSE, 2.172% in RMSE, and 6.349% in MAPE. These results validate the effectiveness of theory-driven multimodal cue integration for review helpfulness prediction.
- python >= 3.9
- torch == 2.3.1
- torchvision == 0.18.1
- tensorflow == 2.15.0
- transformers == 4.28.1
- tokenizers == 0.13.3
- sentencepiece == 0.2.0
- huggingface-hub == 0.23.4
- nltk == 3.9.2
- textblob == 0.19.0
- textstat == 0.7.11
- numpy == 1.26.4
- pandas == 2.2.1
- pyarrow == 12.0.1
- scikit-learn == 1.4.2
- opencv-python
- Pillow == 10.3.0
- tqdm == 4.66.4
- PyYAML == 6.0.1
Below is the project structure for quick reference.
├── data/ # Dataset directory
│ ├── raw/ # Original (unprocessed) datasets
│ └── processed/ # Preprocessed data for training and evaluation
│
├── model/ # MCHPM architecture and training pipeline
│ └── proposed.py # End-to-end MCHPM implementation
│
├── src/ # Core source code
│ ├── data.py # Data preprocessing and dataset loader
│ ├── bert.py # Text central cue extraction using BERT
│ ├── vgg16.py # Image central cue extraction using VGG-16
│ ├── peripheral_features.py # Peripheral cue extraction pipeline for text and images
│ ├── image_manager.py # Image downloading and path management utilities
│ ├── config.yaml # Model and training configuration file
│ ├── path.py # Path and directory management utilities
│ └── utils.py # Helper functions (metrics and logging)
│
├── main.py # Entry point for model training and evaluation
│
├── requirements.txt # Python package dependencies
│
├── README.md # Project documentation
│
└── .gitignore # Git ignore configuration

MCHPM (Multimodal Cue-based Helpfulness Prediction Model) is a theory-driven review helpfulness prediction framework designed to reflect consumers’ dual-route information processing mechanism. Grounded in the Elaboration Likelihood Model, MCHPM explicitly models both central cues (deep semantic and visual representations) and peripheral cues (surface-level textual and image-quality features) within a unified multimodal architecture.
The model consists of three main modules:
- Multi-Cue Extraction Module: Extracts central and peripheral cues from review text and images.
- Cue-Integration Module: Models the interdependencies between central and peripheral cues within each modality.
- Multimodal Fusion Module: Dynamically fuses textual and visual representations to predict review helpfulness.
In the Multi-Cue Extraction module, textual central features are obtained from BERT, while visual central features are extracted from VGG-16. Peripheral cues, including sentiment, subjectivity, readability, extremity, brightness, contrast, saturation, and edge intensity, are computed using Python-based feature extraction. These cues represent shallow attributes that influence consumers’ evaluation processes.
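A condensed sketch of the cue extractors is shown below. Function names, thresholds, and the extremity definition are illustrative placeholders; the actual implementations live in src/bert.py, src/vgg16.py, and src/peripheral_features.py.

```python
# Illustrative multi-cue extraction sketch; the repository's real
# implementations are in src/bert.py, src/vgg16.py, and src/peripheral_features.py.
import cv2
import textstat
import torch
from PIL import Image
from textblob import TextBlob
from torchvision import models, transforms
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
vgg.classifier = vgg.classifier[:5]  # truncate after fc2 to obtain 4096-d features
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def central_cues(review: str, image_path: str):
    """Deep semantic (BERT [CLS]) and deep visual (VGG-16 fc2) representations."""
    tokens = tokenizer(review, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        text_vec = bert(**tokens).last_hidden_state[:, 0]            # (1, 768)
        img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        image_vec = vgg(img)                                         # (1, 4096)
    return text_vec, image_vec

def peripheral_cues(review: str, image_path: str):
    """Surface-level cues. 'extremity' here is a placeholder (|polarity|);
    the repository may derive it from the star rating instead."""
    blob = TextBlob(review)
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    hsv = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2HSV)
    return {
        "sentiment": blob.sentiment.polarity,                        # [-1, 1]
        "subjectivity": blob.sentiment.subjectivity,                 # [0, 1]
        "readability": textstat.flesch_reading_ease(review),
        "extremity": abs(blob.sentiment.polarity),
        "brightness": float(hsv[..., 2].mean()),
        "saturation": float(hsv[..., 1].mean()),
        "contrast": float(gray.std()),
        "edge_intensity": float(cv2.Canny(gray, 100, 200).mean()),
    }
```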
In the Cue-Integration module, a co-attention mechanism captures the interactions between the central and peripheral cue representations within each modality. This mechanism enables the model to learn how one cue type informs and refines the representations of the other. Feed-forward layers and residual connections further stabilize and enhance feature learning.
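A minimal PyTorch sketch of such a co-attention block, assuming both cue streams have been projected to a shared dimension (all hyperparameters are placeholders, not the paper's settings):

```python
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    """Illustrative co-attention between the central and peripheral cues of
    one modality; dimensions and head counts are placeholders."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.c2p = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.p2c = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn_c = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.ffn_p = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm_c1, self.norm_c2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm_p1, self.norm_p2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, central, peripheral):
        # Each cue stream queries the other (cross-attention), then passes
        # through a feed-forward layer; both steps use residual connections.
        c_att, _ = self.c2p(central, peripheral, peripheral)
        p_att, _ = self.p2c(peripheral, central, central)
        central = self.norm_c1(central + c_att)
        peripheral = self.norm_p1(peripheral + p_att)
        central = self.norm_c2(central + self.ffn_c(central))
        peripheral = self.norm_p2(peripheral + self.ffn_p(peripheral))
        return central, peripheral
```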
In the Multimodal Fusion module, a Gated Multimodal Unit (GMU) dynamically adjusts the relative importance of the text and image modalities. The fused representation is then passed to a multilayer perceptron for the final helpfulness score prediction.
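A minimal sketch of GMU-style gated fusion followed by an MLP head. The gate formulation follows the standard GMU (Arevalo et al., 2017); dimensions and the head architecture are illustrative:

```python
import torch
import torch.nn as nn

class GMUFusion(nn.Module):
    """Illustrative Gated Multimodal Unit: a sigmoid gate z learns, per
    example, how much weight to give the text versus image representation."""
    def __init__(self, text_dim: int, image_dim: int, hidden: int = 256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.image_proj = nn.Linear(image_dim, hidden)
        self.gate = nn.Linear(text_dim + image_dim, hidden)
        self.head = nn.Sequential(  # MLP for the final helpfulness score
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text, image):
        h_t = torch.tanh(self.text_proj(text))
        h_i = torch.tanh(self.image_proj(image))
        z = torch.sigmoid(self.gate(torch.cat([text, image], dim=-1)))
        fused = z * h_t + (1 - z) * h_i  # gated convex combination
        return self.head(fused).squeeze(-1)
```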
Create a virtual environment (Python ≥ 3.9 recommended) and install the required dependencies:
python3.9 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Alternatively, create a conda environment:

conda create -n mchpm python=3.9
conda activate mchpm
pip install -r requirements.txt

Place your dataset under data/raw/ and ensure that its format matches the preprocessing pipeline defined in src/data.py.
Preprocessed data will be stored under data/processed/ after feature extraction.
Edit src/config.yaml to configure training, data paths, and model hyperparameters before running the experiment.
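For example, the configuration can be loaded with PyYAML. The keys shown here are hypothetical; consult src/config.yaml for the actual schema:

```python
import yaml

with open("src/config.yaml") as f:
    config = yaml.safe_load(f)

# Hypothetical keys for illustration only; the real schema is
# defined in src/config.yaml.
batch_size = config.get("batch_size", 32)
learning_rate = config.get("learning_rate", 1e-4)
```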
Run the training and evaluation script:
python main.py

MCHPM was evaluated on two large-scale Amazon review datasets: Cell Phones & Accessories and Electronics. The results demonstrate that MCHPM consistently outperforms strong unimodal and multimodal baselines across all evaluation metrics, achieving average improvements of 3.864% in MAE, 4.061% in MSE, 2.172% in RMSE, and 6.349% in MAPE compared with the strongest benchmark model.
Cell Phones & Accessories:

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| LSTM | 0.647 | 0.821 | 0.849 | 56.702 |
| TNN | 0.643 | 0.714 | 0.845 | 56.650 |
| DMAF | 0.625 | 0.691 | 0.836 | 53.139 |
| CS-IMD | 0.615 | 0.681 | 0.825 | 52.392 |
| MCHPM (Proposed) | 0.625 | 0.695 | 0.837 | 53.116 |

Electronics:

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| LSTM | 0.711 | 0.896 | 0.946 | 57.678 |
| TNN | 0.722 | 0.904 | 0.851 | 59.556 |
| DMAF | 0.697 | 0.880 | 0.939 | 55.198 |
| CS-IMD | 0.687 | 0.831 | 0.912 | 56.032 |
| MCHPM (Proposed) | 0.695 | 0.840 | 0.916 | 57.488 |
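For reference, the four reported metrics can be computed as in the minimal NumPy sketch below (MAPE assumes no zero-valued targets):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and MAPE (in percent) for helpfulness scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    rmse = np.sqrt(mse)
    mape = 100.0 * np.abs(err / y_true).mean()  # undefined for zero targets
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}
```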
If you use this repository in your research, please cite:
@article{LIM2026101586,
title = {What makes a review helpful? A multimodal prediction model in e-commerce},
author = {Heena Lim and Seonu Park and Qinglong Li and Xinzhe Li and Jaekyeong Kim},
journal = {Electronic Commerce Research and Applications},
volume = {76},
pages = {101586},
year = {2026},
doi = {10.1016/j.elerap.2026.101586}
}

For research inquiries or collaborations, please contact:
Seonu Park
Ph.D. Student, Department of Big Data Analytics
Kyung Hee University
Email: sunu0087@khu.ac.kr
Qinglong Li
Assistant Professor, Division of Computer Engineering
Hansung University
Email: leecy@hansung.ac.kr
Last updated: March 2026
