
Online Learning for Intent Classification with User Feedback

This repository implements a two-stage framework for intent classification inspired by the paper User Feedback-based Online Learning for Intent Classification (ICMI 2023). The original approach has been extended with several minor modifications:

  • Text Encoding: A lightweight SentenceTransformer model (paraphrase-MiniLM-L6-v2) is employed for encoding user utterances instead of the RoBERTa encoder used in the paper.
  • Offline Phase: The NeuraLCB algorithm is utilized for pretraining the policy from a small labeled dataset. Code from offline_neural_bandits is incorporated to implement this phase.
  • Online Fine-Tuning: A single-step REINFORCE update is used to refine the pretrained policy based solely on user feedback (a reward of +1 is assigned for correct predictions and -1 for incorrect ones). Gradient clipping and a sliding window strategy are integrated to reduce the effect of distributional shift.
  • Dataset: The CLINC50 dataset (a subset of CLINC150) is employed, following the setup described in the paper. The dataset can be obtained from this repository.

Note: The entire process—including both offline pretraining and online fine-tuning—is executed sequentially through a single run of main.py.

Overview

Intent classification aims to infer the goal behind a user query. Traditional supervised approaches require large, fully annotated datasets and may struggle to adapt to new user intents or distributional shifts. The present approach addresses these challenges by employing a two-stage framework:

Offline Pretraining

A contextual bandit model (ExactNeuraLCBV2) is trained using the NeuraLCB algorithm to build an initial policy from limited labeled data. In this phase, each context is paired with all possible actions, and rewards are assigned (1 for the correct action, 0 otherwise).

Online Fine-Tuning

The pretrained policy is refined using a REINFORCE-based update that utilizes user feedback. A sliding window over recent samples is used to mitigate the impact of distributional shift.

Efficient Text Encoding

User utterances are transformed into 384-dimensional embeddings by the SentenceTransformer model; these embeddings serve as input features for the bandit algorithms.

Files & Visualizations

The following files are included in the repository to provide insights into the execution and performance of the model:

  • output.txt: This file contains the execution log of the pipeline, detailing data loading, training steps, and online fine-tuning results. It provides insights into model performance, accuracy improvements, and recommended intent classifications.

  • Figure_1.png: A visualization of Offline Training Loss vs. Training Steps, showcasing the reduction in loss over multiple training iterations. This graph demonstrates how the model's loss decreases as it learns from the initial dataset, improving its predictions.

  • Figure_2.png: A visualization of Online Fine-Tuning: Average Reward & Accuracy vs. Update Steps, demonstrating improvements in model accuracy and reward over time as it learns from user feedback. The graph shows how online reinforcement learning refines the intent classification model dynamically.

These visualizations illustrate the effectiveness of both offline pretraining and online fine-tuning in improving intent classification accuracy.

Dependencies

The project requires the following dependencies:

  • Python 3.7+
  • JAX
  • Optax
  • NumPy
  • Pandas
  • Haiku
  • scikit-learn
  • Sentence-Transformers
  • absl-py
  • easydict

The dependencies can be installed with:

pip install -r requirements.txt

Installation

To install the project, the repository should be cloned and a virtual environment set up:

git clone https://github.com/asyau/online-learning-for-intent.git
cd online-learning-for-intent
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Dataset

The CLINC50 dataset (a subset of CLINC150) is required and can be downloaded from this repository. The data_full.json file must be placed in the project root directory.

Running the Code

The entire pipeline is executed through a single script. For example, the following command runs both the offline and online phases sequentially:

python main.py --num_contexts=500

Execution Details

A typical execution produces logs detailing the following:

  • Data loading and encoding of utterances with the SentenceTransformer.
  • Offline training of the ExactNeuraLCBV2 model (with a decrease in loss over 50 training steps).
  • Recommended actions obtained from the offline phase.
  • Online fine-tuning logs indicating improvements in average reward and accuracy.
  • Final recommended intents after online fine-tuning.

Implementation Details

Offline Phase

Data Preparation

An offline dataset is constructed by pairing each context with every possible action (see the sketch after the list below). Rewards are assigned as:

  • 1 when the predicted action matches the true intent.
  • 0 otherwise.
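
A minimal sketch of this construction, assuming 384-dimensional embeddings and integer intent labels (the function name build_offline_dataset and the array layout are illustrative, not taken from the repository):

import numpy as np

def build_offline_dataset(embeddings, labels, num_intents):
    """Pair every context with every action; reward is 1 iff the action matches the label.

    embeddings: (N, 384) sentence embeddings
    labels:     (N,) integer intent labels
    """
    n = embeddings.shape[0]
    contexts = np.repeat(embeddings, num_intents, axis=0)        # (N * A, 384)
    actions = np.tile(np.arange(num_intents), n)                 # (N * A,)
    rewards = (actions == np.repeat(labels, num_intents)).astype(np.float32)
    return contexts, actions, rewards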

NeuraLCB Pretraining

The ExactNeuraLCBV2 model, provided by the NeuraLCB framework (the offline_neural_bandits repository), is trained using a deep neural network with a pessimistic lower confidence bound strategy. This phase produces an initial policy that is robust despite the limited size of the labeled dataset.
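
As a conceptual illustration of the pessimistic selection rule (not the repository's actual API), NeuraLCB-style methods score each action by its predicted reward minus a confidence width computed from the network's gradient features; all names below are hypothetical:

import numpy as np

def lcb_select(pred_rewards, grad_feats, lambda_inv, beta):
    """Choose the action maximizing the pessimistic lower confidence bound.

    pred_rewards: (A,) network reward estimates f(x, a; theta)
    grad_feats:   (A, p) per-action gradient features g_a = grad_theta f(x, a; theta)
    lambda_inv:   (p, p) inverse of the regularized gradient covariance matrix
    beta:         confidence width multiplier
    """
    # Uncertainty of each action: sqrt(g_a^T Lambda^{-1} g_a)
    widths = np.sqrt(np.einsum("ap,pq,aq->a", grad_feats, lambda_inv, grad_feats))
    return int(np.argmax(pred_rewards - beta * widths))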

Online Phase

REINFORCE Updates

In the online phase:

  • Batches of contexts are sampled, and actions are predicted stochastically.
  • A reward of +1 is assigned for a correct prediction and -1 for an incorrect one.
  • Samples are stored in a buffer, and a sliding window is used to form the training set.
  • The loss function is defined as the negative product of the reward and the log probability of the selected action.
  • Gradient descent with gradient clipping is used to minimize the loss (see the sketch after this list).
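
A minimal sketch of this update under the choices above, using a linear policy head for brevity (the repository uses a deeper network; policy_apply, the learning rate, and the clipping norm are illustrative assumptions):

import jax
import jax.numpy as jnp
import optax

def policy_apply(params, contexts):
    # Linear policy head for illustration only: maps (B, 384) contexts to intent logits.
    return contexts @ params["w"] + params["b"]

def reinforce_loss(params, contexts, actions, rewards):
    log_probs = jax.nn.log_softmax(policy_apply(params, contexts))        # (B, A)
    chosen = jnp.take_along_axis(log_probs, actions[:, None], axis=1)[:, 0]
    return -jnp.mean(rewards * chosen)                                    # -E[r * log pi(a|x)]

num_intents = 50
params = {"w": jnp.zeros((384, num_intents)), "b": jnp.zeros(num_intents)}
optimizer = optax.chain(optax.clip_by_global_norm(1.0), optax.sgd(1e-3))  # clipping + SGD
opt_state = optimizer.init(params)

@jax.jit
def update(params, opt_state, contexts, actions, rewards):
    grads = jax.grad(reinforce_loss)(params, contexts, actions, rewards)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state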

Sliding Window Strategy

Only the most recent samples are utilized for updating the policy, reducing the influence of outdated predictions and minimizing distributional shift.
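
One common way to implement such a window is a fixed-length deque, where appending a new feedback sample automatically evicts the oldest one (the window size below is an assumption, not a value from the repository):

from collections import deque

WINDOW_SIZE = 256                      # assumed value; tune to the feedback rate
buffer = deque(maxlen=WINDOW_SIZE)     # oldest samples fall off automatically

def record(context, action, reward):
    buffer.append((context, action, reward))

def training_window():
    # Unzip the buffered (context, action, reward) triples into parallel lists.
    contexts, actions, rewards = zip(*buffer)
    return list(contexts), list(actions), list(rewards)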


Text Encoding

The paraphrase-MiniLM-L6-v2 SentenceTransformer model is used to convert utterances into 384-dimensional embeddings. These embeddings serve as input features for both the offline and online phases.
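
Encoding with the sentence-transformers library is a one-liner; the snippet below shows the expected embedding shape:

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-MiniLM-L6-v2")
utterances = ["what is my account balance", "book a flight to denver"]
embeddings = encoder.encode(utterances)   # numpy array of shape (2, 384)
print(embeddings.shape)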


References

If you use this code, please cite the following papers:

@inproceedings{gonc2023user,
  title = {User Feedback-based Online Learning for Intent Classification},
  author = {Gönç, Kaan and Sağlam, Baturay and Dalmaz, Onat and Çukur, Tolga and Kozat, Süleyman S. and Dibeklioğlu, Hamdi},
  booktitle = {International Conference on Multimodal Interaction (ICMI 2023)},
  year = {2023},
  url = {https://doi.org/10.1145/3577190.3614137}
}
@inproceedings{nguyen-tang2022offline,
  title = {Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization},
  author = {Nguyen-Tang, Thanh and Gupta, Sunil and Nguyen, A. Tuan and Venkatesh, Svetha},
  booktitle = {International Conference on Learning Representations},
  year = {2022},
  url = {https://openreview.net/forum?id=sPIFuucA3F}
}
