PRISM-AILAB/MCHPM

MCHPM

Official implementation of:

Lim, H., Park, S., Li, Q., Li, X., & Kim, J. (2026). What makes a review helpful? A multimodal prediction model in e-commerce. Electronic Commerce Research and Applications, 76, 101586.

Overview

This repository provides the official implementation of MCHPM (Multimodal Cue-based Helpfulness Prediction Model), a theory-driven deep learning framework for review helpfulness prediction in e-commerce. MCHPM is grounded in the Elaboration Likelihood Model and reflects how consumers evaluate online reviews through central and peripheral information-processing routes.

Existing MRHP (Multimodal Review Helpfulness Prediction) models primarily focus on deep semantic representations from text and images while overlooking shallow cues such as readability and image quality. To address this limitation, MCHPM systematically integrates central cues extracted via BERT and VGG-16 with peripheral cues computed from textual and visual surface features.

A co-attention mechanism models the interdependencies between central and peripheral cues within each modality, and a Gated Multimodal Unit dynamically adjusts the relative importance of text and image representations during prediction. Experiments on large-scale Amazon datasets demonstrate that MCHPM consistently outperforms strong unimodal and multimodal baselines, achieving average improvements of 3.864% in MAE, 4.061% in MSE, 2.172% in RMSE, and 6.349% in MAPE. These results validate the effectiveness of theory-driven multimodal cue integration for review helpfulness prediction.

Requirements

  • python >= 3.9
  • torch == 2.3.1
  • torchvision == 0.18.1
  • tensorflow == 2.15.0
  • transformers == 4.28.1
  • tokenizers == 0.13.3
  • sentencepiece == 0.2.0
  • huggingface-hub == 0.23.4
  • nltk == 3.9.2
  • textblob == 0.19.0
  • textstat == 0.7.11
  • numpy == 1.26.4
  • pandas == 2.2.1
  • pyarrow == 12.0.1
  • scikit-learn == 1.4.2
  • opencv-python
  • Pillow == 10.3.0
  • tqdm == 4.66.4
  • PyYAML == 6.0.1

Repository Structure

Below is the project structure for quick reference.

├── data/                        # Dataset directory
│   ├── raw/                     # Original (unprocessed) datasets
│   └── processed/               # Preprocessed data for training and evaluation
│
├── model/                       # MCHPM architecture and training pipeline
│   └── proposed.py              # End-to-end MCHPM implementation
│
├── src/                         # Core source code
│   ├── data.py                  # Data preprocessing and dataset loader
│   ├── bert.py                  # Text central cue extraction using BERT
│   ├── vgg16.py                 # Image central cue extraction using VGG-16
│   ├── peripheral_features.py   # Peripheral cue extraction pipeline for text and images
│   ├── image_manager.py         # Image downloading and path management utilities
│   ├── config.yaml              # Model and training configuration file
│   ├── path.py                  # Path and directory management utilities
│   └── utils.py                 # Helper functions (metrics and logging)
│
├── main.py                      # Entry point for model training and evaluation
│
├── requirements.txt             # Python package dependencies
│
├── README.md                    # Project documentation
│
└── .gitignore                   # Git ignore configuration

Model Description

MCHPM (Multimodal Cue-based Helpfulness Prediction Model) is a theory-driven review helpfulness prediction framework designed to reflect consumers’ dual-route information processing mechanism. Grounded in the Elaboration Likelihood Model, MCHPM explicitly models both central cues (deep semantic and visual representations) and peripheral cues (surface-level textual and image-quality features) within a unified multimodal architecture.

The model consists of three main modules:

  • Multi-Cue Extraction Module: Extracts central and peripheral cues from review text and images.
  • Cue-Integration Module: Models the interdependencies between central and peripheral cues within each modality.
  • Multimodal Fusion Module: Dynamically fuses textual and visual representations to predict review helpfulness.

In the Multi-Cue Extraction module, textual central features are obtained from BERT, while visual central features are extracted from VGG-16. Peripheral cues, including sentiment, subjectivity, readability, extremity, brightness, contrast, saturation, and edge intensity, are computed using Python-based feature extraction. These cues represent shallow attributes that influence consumers’ evaluation processes.
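To make the peripheral cues concrete, the sketch below computes a few of them with the Python standard library. The actual pipeline in `src/peripheral_features.py` uses textblob, textstat, and OpenCV; the helper names and the crude readability proxy here are illustrative assumptions, not the repository's API.

```python
import statistics

def text_peripheral_cues(review: str, rating: float) -> dict:
    """Illustrative text-side peripheral cues (hypothetical helper)."""
    words = review.split()
    sentences = [s for s in review.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    return {
        # crude readability proxy: longer sentences -> harder to read
        "readability": 1.0 / (1.0 + avg_sentence_len / 10.0),
        # rating extremity: distance from the midpoint (3 on a 1-5 star scale)
        "extremity": abs(rating - 3.0) / 2.0,
        "length": len(words),
    }

def image_peripheral_cues(rgb_pixels: list) -> dict:
    """Illustrative image-side peripheral cues from (r, g, b) tuples in [0, 255]."""
    lum = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]
    sat = [0.0 if max(p) == 0 else (max(p) - min(p)) / max(p) for p in rgb_pixels]
    return {
        "brightness": statistics.mean(lum) / 255.0,   # mean luminance
        "contrast": statistics.pstdev(lum) / 255.0,   # luminance spread
        "saturation": statistics.mean(sat),           # mean HSV-style saturation
    }
```

For example, a 5-star review has maximal extremity (1.0) under this definition, and an all-white image has brightness 1.0 with zero contrast.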

In the Cue-Integration module, a co-attention mechanism captures the interactions between central and peripheral cue representations within each modality. This mechanism enables the model to learn how one cue type informs and refines the representation of the other. Feed-forward layers and residual connections further stabilize and enhance feature learning.
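The co-attention step can be sketched as scaled dot-product cross-attention between the central-cue and peripheral-cue matrices of one modality, followed by the residual connections mentioned above. This is a minimal NumPy sketch (NumPy is already in requirements.txt), not the exact layer in `model/proposed.py`.

```python
import numpy as np

def softmax(a, axis=-1):
    # numerically stable softmax
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(X, P):
    """Cross-attention in both directions between cue sets of one modality.
    X: (n, d) central-cue vectors; P: (m, d) peripheral-cue vectors."""
    d = X.shape[1]
    A = X @ P.T / np.sqrt(d)          # (n, m) affinity between cue sets
    X_att = softmax(A, axis=1) @ P    # central cues attended by peripheral cues
    P_att = softmax(A.T, axis=1) @ X  # peripheral cues attended by central cues
    # residual connections keep the original cue information
    return X + X_att, P + P_att

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))   # e.g. 4 central-cue vectors of dimension 8
P = rng.normal(size=(3, 8))   # e.g. 3 peripheral-cue vectors of dimension 8
X_out, P_out = co_attention(X, P)
```

Each output keeps its input's shape, so the refined representations can be stacked with further feed-forward layers.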

In the Multimodal Fusion module, a GMU (Gated Multimodal Unit) mechanism dynamically adjusts the relative importance of text and image modalities. The fused representation is then passed to a multilayer perceptron for final helpfulness score prediction.
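The gating idea follows the standard GMU formulation: a sigmoid gate z, computed from both modalities, mixes per dimension between a text projection and an image projection. The NumPy sketch below shows that formulation with randomly initialized weights; it is a simplified illustration, not the repository's exact layer.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gmu_fuse(x_text, x_img, W_t, W_v, W_z):
    """Gated Multimodal Unit sketch: the gate z decides, per output
    dimension, how much the fused vector draws from text versus image."""
    h_t = np.tanh(W_t @ x_text)                       # text projection
    h_v = np.tanh(W_v @ x_img)                        # image projection
    z = sigmoid(W_z @ np.concatenate([x_text, x_img]))  # modality gate in (0, 1)
    return z * h_t + (1.0 - z) * h_v

rng = np.random.default_rng(1)
d_t, d_v, d_h = 6, 5, 4                # text dim, image dim, fused dim
x_text = rng.normal(size=d_t)
x_img = rng.normal(size=d_v)
W_t = rng.normal(size=(d_h, d_t))
W_v = rng.normal(size=(d_h, d_v))
W_z = rng.normal(size=(d_h, d_t + d_v))
h = gmu_fuse(x_text, x_img, W_t, W_v, W_z)   # fused (d_h,) representation
```

Because both projections pass through tanh and the gate is a convex combination, every fused component stays in [-1, 1], which keeps the input to the downstream MLP well scaled.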

MCHPM Architecture

How to Run

Environment Setup

Create a virtual environment (Python ≥ 3.9 recommended) and install the required dependencies:

Option A: Using venv

python3.9 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Option B: Using Conda

conda create -n mchpm python=3.9
conda activate mchpm
pip install -r requirements.txt

Data Preparation

Place your dataset under data/raw/ and ensure that its format matches the preprocessing pipeline defined in src/data.py.

Preprocessed data will be stored under data/processed/ after feature extraction.

Configuration

Edit src/config.yaml to configure training, data paths, and model hyperparameters before running the experiment.
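A hypothetical illustration of the kinds of settings such a file holds is shown below. The key names are assumptions for illustration only; consult the actual `src/config.yaml` for the real schema.

```yaml
# Hypothetical example -- key names are illustrative, not the actual schema.
data:
  raw_dir: data/raw
  processed_dir: data/processed
model:
  text_dim: 768        # BERT hidden size
  image_dim: 4096      # VGG-16 fc-layer feature size
training:
  batch_size: 64
  learning_rate: 0.0001
  epochs: 30
  seed: 42
```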

Train and Evaluate the Model

Run the training and evaluation script:

python main.py

Experimental Results

MCHPM was evaluated on two large-scale Amazon review datasets: Cell Phones & Accessories and Electronics. The results demonstrate that MCHPM consistently outperforms strong unimodal and multimodal baselines across all evaluation metrics, achieving average improvements of 3.864% in MAE, 4.061% in MSE, 2.172% in RMSE, and 6.349% in MAPE compared with the strongest benchmark model.

Cell Phones & Accessories

| Model            | MAE   | MSE   | RMSE  | MAPE   |
|------------------|-------|-------|-------|--------|
| LSTM             | 0.647 | 0.821 | 0.849 | 56.702 |
| TNN              | 0.643 | 0.714 | 0.845 | 56.650 |
| DMAF             | 0.625 | 0.691 | 0.836 | 53.139 |
| CS-IMD           | 0.615 | 0.681 | 0.825 | 52.392 |
| MCHPM (Proposed) | 0.625 | 0.695 | 0.837 | 53.116 |

Electronics

| Model            | MAE   | MSE   | RMSE  | MAPE   |
|------------------|-------|-------|-------|--------|
| LSTM             | 0.711 | 0.896 | 0.946 | 57.678 |
| TNN              | 0.722 | 0.904 | 0.851 | 59.556 |
| DMAF             | 0.697 | 0.880 | 0.939 | 55.198 |
| CS-IMD           | 0.687 | 0.831 | 0.912 | 56.032 |
| MCHPM (Proposed) | 0.695 | 0.840 | 0.916 | 57.488 |
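For reference, the four reported metrics follow their standard definitions. The sketch below computes them with NumPy; it illustrates the formulas, not the repository's `utils.py`.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Standard error metrics for helpfulness-score regression."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                     # mean absolute error
    mse = np.mean(err ** 2)                        # mean squared error
    rmse = np.sqrt(mse)                            # root mean squared error
    mape = np.mean(np.abs(err / y_true)) * 100.0   # percent error; assumes y_true != 0
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}
```

For example, `regression_metrics([1, 2, 4], [1, 2, 2])` gives MAE = 2/3, MSE = 4/3, and MAPE ≈ 16.67%.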

Citation

If you use this repository in your research, please cite:

@article{LIM2026101586,
  title = {What makes a review helpful? A multimodal prediction model in e-commerce},
  author = {Heena Lim and Seonu Park and Qinglong Li and Xinzhe Li and Jaekyeong Kim},
  journal = {Electronic Commerce Research and Applications},
  volume = {76},
  pages = {101586},
  year = {2026},
  doi = {10.1016/j.elerap.2026.101586}  
}

Contact

For research inquiries or collaborations, please contact:

Seonu Park
Ph.D. Student, Department of Big Data Analytics
Kyung Hee University
Email: sunu0087@khu.ac.kr

Qinglong Li
Assistant Professor, Division of Computer Engineering
Hansung University
Email: leecy@hansung.ac.kr

Last updated: March 2026
