This repository implements a two-stage framework for intent classification inspired by the paper User Feedback-based Online Learning for Intent Classification (ICMI 2023). The original approach has been extended with several minor modifications:
- Text Encoding: A lightweight SentenceTransformer model (paraphrase-MiniLM-L6-v2) is employed for encoding user utterances instead of the RoBERTa encoder used in the paper.
- Offline Phase: The NeuraLCB algorithm is utilized for pretraining the policy from a small labeled dataset. Code from offline_neural_bandits is incorporated to implement this phase.
- Online Fine-Tuning: A single-step REINFORCE update is used to refine the pretrained policy based solely on user feedback (a reward of +1 is assigned for correct predictions and -1 for incorrect ones). Gradient clipping and a sliding window strategy are integrated to reduce the effect of distributional shift.
- Dataset: The CLINC50 dataset (a subset of CLINC150) is employed, following the setup described in the paper. The dataset can be obtained from this repository.
Note: The entire process, including both offline pretraining and online fine-tuning, is executed sequentially through a single run of main.py.
Intent classification aims to infer the goal behind a user query. Traditional supervised approaches require large, fully annotated datasets and may struggle to adapt to new user intents or distributional shifts. The present approach addresses these challenges by employing a two-stage framework:
A contextual bandit model (ExactNeuraLCBV2) is trained using the NeuraLCB algorithm to build an initial policy from limited labeled data. In this phase, each context is paired with all possible actions and rewards are assigned (1 for correct actions and 0 for incorrect ones).
The pretrained policy is refined using a REINFORCE-based update that utilizes user feedback. A sliding window over recent samples is used to mitigate the impact of distributional shift.
User utterances are transformed into 384-dimensional embeddings by the SentenceTransformer model, which serve as input features for the bandit algorithms.
The following files are included in the repository to provide insights into the execution and performance of the model:
- output.txt: The execution log of the pipeline, detailing data loading, training steps, and online fine-tuning results. It provides insight into model performance, accuracy improvements, and recommended intent classifications.
- Figure_1.png: A visualization of Offline Training Loss vs. Training Steps, showing how the model's loss decreases over successive training iterations as it learns from the initial dataset.
- Figure_2.png: A visualization of Online Fine-Tuning: Average Reward & Accuracy vs. Update Steps, showing how online reinforcement learning dynamically refines the model as average reward and accuracy improve with user feedback.
These visualizations illustrate the effectiveness of both offline pretraining and online fine-tuning in improving intent classification accuracy.
The project requires the following dependencies:
- Python 3.7+
- JAX
- Optax
- NumPy
- Pandas
- Haiku
- scikit-learn
- Sentence-Transformers
- absl-py
- easydict
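A requirements.txt covering this list might look like the following (unpinned versions; note that Haiku is published on PyPI as dm-haiku):

```
jax
optax
numpy
pandas
dm-haiku
scikit-learn
sentence-transformers
absl-py
easydict
```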
Installation of the dependencies can be performed with:
pip install -r requirements.txt

To install the project, the repository should be cloned and a virtual environment set up:
git clone https://github.com/yourusername/intent-online-learning.git
cd intent-online-learning
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

The CLINC50 dataset (a subset of CLINC150) is required and can be downloaded from this repository. The data_full.json file must be placed in the project root directory.
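For reference, a minimal sketch of loading data_full.json, assuming the standard CLINC150 layout in which each split maps to a list of [utterance, intent] pairs:

```python
import json

# Load the CLINC data; each split ("train", "val", "test", ...) is
# assumed to be a list of [utterance, intent] pairs.
with open("data_full.json", "r") as f:
    data = json.load(f)

train_pairs = data["train"]
utterances = [u for u, _ in train_pairs]
intents = [i for _, i in train_pairs]
print(f"Loaded {len(utterances)} training utterances, "
      f"{len(set(intents))} distinct intents")
```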
The entire pipeline is executed through a single script. For example, the following command runs both the offline and online phases sequentially:
python main.py --num_contexts=500

A typical execution produces logs detailing the following:
- Data loading and encoding of utterances with the SentenceTransformer.
- Offline training of the ExactNeuraLCBV2 model (with a decrease in loss over 50 training steps).
- Recommended actions obtained from the offline phase.
- Online fine-tuning logs indicating improvements in average reward and accuracy.
- Final recommended intents after online fine-tuning.
An offline dataset is constructed by pairing each context with every possible action (a sketch follows this list). Rewards are assigned as:
- 1 when the paired action matches the true intent.
- 0 otherwise.
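As an illustration, the pairing and reward assignment might be implemented as follows. This is a NumPy sketch; build_offline_dataset and its arguments are hypothetical names, with embeddings and labels assumed to come from the encoding step and the intent vocabulary:

```python
import numpy as np

def build_offline_dataset(embeddings, labels, num_actions):
    """Pair every context with every action; reward 1 iff the action
    equals the true intent, else 0."""
    contexts, actions, rewards = [], [], []
    for x, y in zip(embeddings, labels):
        for a in range(num_actions):
            contexts.append(x)
            actions.append(a)
            rewards.append(1.0 if a == y else 0.0)
    return (np.stack(contexts),
            np.array(actions, dtype=np.int32),
            np.array(rewards, dtype=np.float32))
```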
The ExactNeuraLCBV2 model, provided by the NeuraLCB framework (offline_neural_bandits GitHub), is trained using a deep neural network with a pessimistic lower confidence bound strategy. This phase produces an initial policy that is robust despite the limited size of the labeled dataset.
In the online phase:
- Batches of contexts are sampled, and actions are predicted stochastically.
- A reward of +1 is assigned for a correct prediction and -1 for an incorrect one.
- Samples are stored in a buffer, and a sliding window over the most recent samples is used to form the training set.
- The loss function is defined as the negative product of the reward and the log probability of the selected action.
- Gradient descent with gradient clipping is used to minimize the loss.
Only the most recent samples are utilized for updating the policy, reducing the influence of outdated predictions and minimizing distributional shift.
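A minimal sketch of this update step is shown below, assuming a JAX/Optax setup. The names WINDOW and buffer, and the linear policy_logits stand-in, are illustrative assumptions rather than the repository's exact API:

```python
import jax
import jax.numpy as jnp
import optax
from collections import deque

# Sliding-window buffer of (context, action, reward) tuples; only the
# WINDOW most recent samples are kept. WINDOW is an assumed hyperparameter.
WINDOW = 256
buffer = deque(maxlen=WINDOW)

def policy_logits(params, contexts):
    # Stand-in linear policy head; the pretrained network from the
    # offline phase would be used here instead.
    return contexts @ params["w"] + params["b"]

def reinforce_loss(params, contexts, actions, rewards):
    log_probs = jax.nn.log_softmax(policy_logits(params, contexts))
    chosen = jnp.take_along_axis(log_probs, actions[:, None], axis=1)[:, 0]
    # Negative product of reward and log-probability of the selected action.
    return -jnp.mean(rewards * chosen)

# Gradient clipping chained with a plain SGD step.
optimizer = optax.chain(optax.clip_by_global_norm(1.0), optax.sgd(1e-3))
# opt_state = optimizer.init(params)  # done once before the online loop

def update(params, opt_state):
    contexts, actions, rewards = (jnp.array(x) for x in zip(*buffer))
    grads = jax.grad(reinforce_loss)(params, contexts, actions, rewards)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state
```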
The paraphrase-MiniLM-L6-v2 SentenceTransformer model is used to convert utterances into 384-dimensional embeddings. These embeddings serve as input features for both the offline and online phases.
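Encoding is a single call to the sentence-transformers library; for example:

```python
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-MiniLM-L6-v2")
embeddings = encoder.encode(["what is my bank balance",
                             "play some jazz music"])
print(embeddings.shape)  # (2, 384): one 384-dimensional vector per utterance
```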
When citing this work, the following papers should be referenced:
@inproceedings{gonc2023user,
title = {User Feedback-based Online Learning for Intent Classification},
author = {Gönç, Kaan and Sağlam, Baturay and Dalmaz, Onat and Çukur, Tolga and Kozat, Süleyman S. and Dibeklioğlu, Hamdi},
booktitle = {International Conference on Multimodal Interaction (ICMI 2023)},
year = {2023},
url = {https://doi.org/10.1145/3577190.3614137}
}

@inproceedings{nguyen-tang2022offline,
title = {Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization},
author = {Nguyen-Tang, Thanh and Gupta, Sunil and Nguyen, A. Tuan and Venkatesh, Svetha},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://openreview.net/forum?id=sPIFuucA3F}
}