Toolkit for the paper "Explainable Subjective Stance Classification with SetFit in Political Discourse". The project leverages the SetFit few-shot learning framework, Sentence Transformers architecture, and traditional linguistic ML to enhance explainability in stance classification.

pacoreyes/stance_classification

Explainable Subjective Stance Classification with SetFit in Political Discourse

This repository contains the code and resources for the paper "Explainable Subjective Stance Classification with SetFit in Political Discourse". The project leverages the SetFit few-shot learning framework, Sentence Transformers architecture, and traditional linguistic approaches to enhance explainability in stance classification for political discourse.

Project Overview

Stance classification in NLP is a crucial tool for understanding political discourse and the attitudes underlying political statements. This research addresses the scarcity of annotated datasets in political science by proposing a practical sentence-level dataset for binary subjective stance classification (support or oppose) and applying the SetFit few-shot learning framework to it.

The project focuses on identifying linguistic markers that predict subjective stance toward explicitly identified political targets or policy issues, using:

  • SetFit few-shot learning framework
  • Sentence Transformers architecture
  • Traditional linguistic approaches for explainability
  • SHAP (SHapley Additive exPlanations) analysis

Academic Context

This paper is part of the doctoral research of Juan-Francisco Reyes at the Institute of Computer Science, Brandenburgische Technische Universität Cottbus-Senftenberg.

Repository Structure

Data Processing and Dataset Creation

  • paper_b_1_dataset_extract_sentences.py: Extracts sentences from source texts
  • paper_b_2_dataset_build_pools.py: Builds data pools for annotation
  • paper_b_3_dataset_build_unlabeled.py: Creates unlabeled dataset
  • paper_b_4_dataset_build_dataset_gsheets.py: Builds dataset from Google Sheets
  • paper_b_4_dataset_filter_unlabeled.py: Filters unlabeled data
  • paper_b_5_dataset_preprocess.py: Preprocesses dataset
  • paper_b_6_dataset_1a_split.py: Splits dataset into train/validation/test sets
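
The split step can be illustrated with a short standard-library sketch. It is not the code in paper_b_6_dataset_1a_split.py or utils2.py; the helper name stratified_split, the "label" field, the "support"/"oppose" values and the 80/10/10 ratios are assumptions chosen for illustration. Stratifying by label keeps the support/oppose proportions similar across the three sets, which matters when the dataset is small.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", ratios=(0.8, 0.1, 0.1), seed=42):
    """Split rows into train/validation/test lists while preserving label proportions.

    Hypothetical helper for illustration; the field name and ratios are assumptions.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values that sum to 1")
    rng = random.Random(seed)  # local RNG so the split is reproducible

    # Group rows by label so that each label is split separately.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, validation, test = [], [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train.extend(group[:n_train])
        validation.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # the remainder goes to test
    return train, validation, test


rows = [{"text": f"sentence {i}", "label": "support" if i % 2 else "oppose"} for i in range(20)]
train, validation, test = stratified_split(rows)
print(len(train), len(validation), len(test))
```

With 20 balanced rows, each label contributes 8/1/1 rows, so the call yields sets of 16, 2 and 2 rows. The fixed seed makes the split reproducible across runs.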

Feature Extraction and Analysis

  • paper_b_7_dataset_1a_feature_extraction.py: Extracts linguistic features
  • paper_b_7_dataset_1a_feature_extraction_token_level.py: Token-level feature extraction
  • paper_b_8_dataset_1a_feature_aggregation_binary.py: Aggregates features for binary classification
  • paper_b_9_frames_chi2.py: Chi-square analysis for frames
  • paper_b_10_logistic_regression.py: Logistic regression baseline model
  • paper_b_11_shap_analysis.py: SHAP analysis for model explainability
  • paper_b_12_shap_aggregate_and_rank.py: Aggregates and ranks SHAP values
  • paper_b_13_dataset_1a_data_analysis_binary.py: Binary data analysis
  • paper_b_14_dataset_1a_data_processing.py: Data processing for analysis
  • paper_b_15_dataset_1a_feature_aggregation_1.py: Feature aggregation
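
One way to test whether a semantic frame is associated with stance, in the spirit of the chi-square analysis in paper_b_9_frames_chi2.py, is a Pearson chi-square test on a 2x2 table of frame presence against support/oppose labels. The sketch below assumes that table layout and uses made-up counts; it is not the script's implementation. It relies only on the standard library, using the fact that for one degree of freedom the chi-square survival function equals erfc(sqrt(x / 2)).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Assumed layout (rows: frame present / absent; columns: support / oppose):
        [[a, b],
         [c, d]]
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a non-zero total")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi2_2x2(30, 10, 20, 40)  # hypothetical counts
print(f"chi2={chi2:.2f}, p={p:.2e}")
```

For these counts the statistic is 50/3 (about 16.67) with p < 0.001. scipy.stats.chi2_contingency returns the same statistic when called with correction=False; by default it applies Yates' continuity correction to 2x2 tables.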

Model Training and Inference

  • paper_b_16_rb_frameBERT_sentence_analizer.py: FrameBERT sentence analyzer
  • paper_b_17_rb_frameBERT_visualizer.py: FrameBERT visualizer
  • paper_b_18_lime_analysis.ipynb: LIME analysis for model explainability
  • paper_b_19_dl_chat_gpt_inference.py: ChatGPT inference for comparison
  • paper_b_20_dl_setfit_train.py: Trains the SetFit model
  • paper_b_21_dl_setfit_hop.py: Hyperparameter optimization for SetFit
  • paper_b_22_dl_setfit_inference.py: Performs inference using the trained SetFit model
  • paper_b_23_dl_setfit_test.py: Tests the SetFit model performance
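
Evaluating a binary stance classifier comes down to a confusion matrix and the metrics derived from it. The following standard-library sketch is an assumption about the kind of numbers a test script reports, not the contents of paper_b_23_dl_setfit_test.py; the function name binary_metrics and the choice of "support" as the positive class are hypothetical. In practice, scikit-learn's classification_report or precision_recall_fscore_support compute the same values.

```python
def binary_metrics(y_true, y_pred, positive="support"):
    """Return accuracy, precision, recall, F1 and confusion counts for binary labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts with `positive` as the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
    }


y_true = ["support", "support", "support", "oppose", "oppose"]
y_pred = ["support", "support", "oppose", "oppose", "support"]
print(binary_metrics(y_true, y_pred))
```

Here there are 2 true positives, 1 false negative, 1 false positive and 1 true negative, giving accuracy 0.6 and precision, recall and F1 of 2/3.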

Library Files (lib/)

The lib folder contains utility modules and specialized components for linguistic analysis and stance classification. Note that some of these files are used directly by the root Python files, while others serve as supporting modules or resources for other library files:

Utility Functions

  • utils.py: General utility functions for file operations (JSON, JSONL, TXT) and Google Sheet interactions
  • utils2.py: Dataset manipulation utilities including deduplication, anonymization, and stratified splitting
  • utils_db.py: Database utility functions
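
JSON Lines storage and deduplication, two of the jobs listed for utils.py and utils2.py, can be sketched as follows. The function names to_jsonl, from_jsonl and deduplicate are hypothetical and do not reflect the repository's actual API. The assumption that two sentences are duplicates when they differ only in case and whitespace is likewise illustrative; anonymization is not shown.

```python
import json


def to_jsonl(records):
    """Serialize dicts as JSON Lines text: one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def from_jsonl(text):
    """Parse JSON Lines text, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def deduplicate(records, key="text"):
    """Keep the first record for each text, ignoring case and repeated whitespace."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = " ".join(record[key].lower().split())
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


records = [
    {"text": "We must act now."},
    {"text": "we  must act NOW."},  # duplicate after normalization
    {"text": "I oppose this bill."},
]
unique = deduplicate(records)
print(to_jsonl(unique), end="")
```

To persist the result, the string can be written with pathlib's Path.write_text(..., encoding="utf-8"); ensure_ascii=False keeps non-ASCII characters readable in the file.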

Text Processing

  • text_utils.py: Comprehensive text preprocessing functions for cleaning and normalizing text
  • linguistic_utils.py: Utilities for linguistic analysis, such as checks on sentence structure
  • count_tokens.py: Functions for token counting and analysis
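
A minimal sketch of the kind of cleaning text_utils.py performs is shown below. The specific steps (Unicode NFKC normalization, straightening curly quotes, dropping bracketed transcript annotations such as "[applause]", collapsing whitespace) and the name normalize_text are assumptions for illustration, not the module's actual behavior.

```python
import re
import unicodedata

# Curly quotes have no NFKC decomposition, so they are mapped explicitly.
_QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def normalize_text(text):
    """Normalize Unicode, straighten quotes, drop [bracketed] notes, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space
    for curly, straight in _QUOTES.items():
        text = text.replace(curly, straight)
    text = re.sub(r"\[[^\]]*\]", " ", text)  # assumed transcript annotations
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(normalize_text("We\u2019ll  fight [applause]\n for it."))
```

The call prints "We'll fight for it.". Keeping cleaning in one function makes it easy to apply the same normalization at training and inference time.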

Stance Lexicons

  • stance_markers_adj.py: Adjective-based stance markers for identifying stance expressions
  • stance_markers_adv.py: Adverb-based stance markers for identifying stance expressions
  • stance_markers_verb.py: Verb-based stance markers for identifying stance expressions
  • stance_markers_modals.py: Modal verb-based stance markers for expressing certainty and possibility
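
Splitting the lexicons by part of speech means a word only counts as a stance marker in the intended grammatical role. The sketch below illustrates that idea; the miniature word lists, their labels and the function match_stance_markers are hypothetical stand-ins for the real lists in lib/stance_markers_*.py. It assumes input as (lemma, POS) pairs in universal POS tags, which is what spaCy exposes as token.lemma_ and token.pos_ (spaCy tags English modals as AUX).

```python
# Hypothetical miniature lexicons keyed by universal POS tag; words and labels
# are illustrative only, not the project's actual marker lists.
STANCE_MARKERS = {
    "ADJ": {"good": "positive_affect", "terrible": "negative_affect"},
    "ADV": {"definitely": "certainty", "really": "emphatics", "perhaps": "hedges"},
    "VERB": {"support": "pro_polarity", "oppose": "con_polarity", "doubt": "doubt"},
    "AUX": {"must": "certainty", "might": "possibility"},
}


def match_stance_markers(tagged_tokens, lexicons=STANCE_MARKERS):
    """Return (lemma, pos, label) for tokens whose lemma is listed under their POS tag."""
    matches = []
    for lemma, pos in tagged_tokens:
        label = lexicons.get(pos, {}).get(lemma.lower())
        if label is not None:
            matches.append((lemma, pos, label))
    return matches


# (lemma, POS) pairs as a tagger would produce them.
sentence = [("we", "PRON"), ("must", "AUX"), ("oppose", "VERB"), ("this", "DET"),
            ("terrible", "ADJ"), ("bill", "NOUN")]
print(match_stance_markers(sentence))
```

Because lookups are keyed by POS, the noun "support" (as in "the support of voters") is not counted, while the verb "support" is.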

Semantic Analysis

  • frames.py: Semantic frame definitions for understanding conceptual structures in text
  • semantic_frames.py: Functions for processing and analyzing semantic frames
  • issues_matcher.py: Custom named entity recognition for identifying political issues
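
Custom recognition of political issues is often a dictionary of multi-word phrases matched against the token stream, preferring the longest match. The standard-library sketch below shows that approach; the phrases, the issue labels and the function find_issues are hypothetical and not taken from issues_matcher.py. In a spaCy pipeline, the built-in EntityRuler or PhraseMatcher components serve the same purpose.

```python
import re

# Hypothetical issue phrases and labels for illustration.
ISSUE_PATTERNS = {
    "health care": "HEALTHCARE",
    "health care reform": "HEALTHCARE",
    "affordable care act": "HEALTHCARE",
    "minimum wage": "ECONOMY",
    "climate change": "ENVIRONMENT",
}


def find_issues(text, patterns=ISSUE_PATTERNS):
    """Greedy longest-match search for issue phrases over lowercase tokens.

    Returns (phrase, label, start_token, end_token) tuples; end is exclusive.
    """
    tokens = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    split_patterns = {tuple(p.split()): label for p, label in patterns.items()}
    max_len = max(len(p) for p in split_patterns)

    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "health care reform" beats "health care".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in split_patterns:
                matches.append((" ".join(candidate), split_patterns[candidate], i, i + n))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return matches


print(find_issues("We will repeal the Affordable Care Act and raise the minimum wage."))
```

Returning token offsets makes it possible to check whether a stance marker and an issue mention fall within the same sentence or clause.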

Visualization

  • visualizations.py: Functions for creating visualizations of analysis results including confusion matrices, dependency trees, and feature distributions

Key Features

The study leverages several approaches to enhance explainability:

  1. Corpus Linguistics: Analysis of language patterns in political discourse
  2. Tailored Lexicons: Custom lexicons for political language analysis
  3. Lexicogrammatical Rules: Rules based on linguistic structures
  4. SHAP Analysis: Quantifies the influence of linguistic features on model decisions
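
A common way to turn per-sentence SHAP values into a global feature ranking is to average their absolute values per feature. The sketch below assumes that is roughly what paper_b_12_shap_aggregate_and_rank.py does, which the source does not confirm; the function name rank_features_by_shap and the SHAP numbers are hypothetical. Averaging absolute values measures influence regardless of whether a feature pushes toward support or oppose.

```python
def rank_features_by_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across explained samples.

    shap_rows: one list of SHAP values per sample, aligned with feature_names.
    Returns (feature, mean_abs_shap) pairs sorted from most to least influential.
    """
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("each row must have one value per feature")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = {name: total / len(shap_rows) for name, total in zip(feature_names, totals)}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


features = ["pro_polarity", "con_polarity", "hedges"]
shap_rows = [[0.4, -0.1, 0.05], [-0.2, 0.3, 0.0]]  # hypothetical values
print(rank_features_by_shap(shap_rows, features))
```

Here pro_polarity ranks first (mean |SHAP| of 0.3), followed by con_polarity (0.2) and hedges (0.025).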

The project identifies eight distinct linguistic features for stance classification:

  • Positive affect
  • Negative affect
  • Pro polarity
  • Con polarity
  • Certainty
  • Emphatics
  • Doubt
  • Hedges
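
For models such as the logistic regression baseline, each sentence's matched features must become a fixed-order vector. The sketch below assumes a binary presence encoding over the eight features listed above (counts would be an equally plausible choice). The snake_case feature identifiers and the function to_binary_vector are hypothetical and not the repository's names.

```python
# The eight features above, in a fixed order (identifiers are assumptions).
FEATURES = (
    "positive_affect", "negative_affect", "pro_polarity", "con_polarity",
    "certainty", "emphatics", "doubt", "hedges",
)


def to_binary_vector(matched_features, features=FEATURES):
    """Map a sentence's matched feature labels to a 0/1 vector in a fixed order."""
    present = set(matched_features)
    unknown = present - set(features)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [int(feature in present) for feature in features]


# A sentence with two con-polarity markers, one negative-affect and one certainty marker.
print(to_binary_vector(["con_polarity", "negative_affect", "certainty", "con_polarity"]))
```

The fixed order keeps columns aligned across sentences, so model coefficients and SHAP values can be mapped back to feature names.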

Usage

  1. Dataset Preparation: Run the dataset extraction and preprocessing scripts
  2. Feature Extraction: Extract linguistic features using the feature extraction scripts
  3. Model Training: Train the SetFit model using paper_b_20_dl_setfit_train.py
  4. Model Evaluation: Evaluate the model using paper_b_23_dl_setfit_test.py
  5. Inference: Perform inference on new data using paper_b_22_dl_setfit_inference.py
  6. Explainability Analysis: Analyze feature importance using SHAP and LIME analysis scripts

Requirements

The project dependencies are listed in the requirements.txt file. Install them using:

pip install -r requirements.txt

Key dependencies include:

  • setfit
  • transformers
  • sentence-transformers
  • datasets
  • scikit-learn
  • shap
  • spacy
  • torch
  • pandas
  • matplotlib

Research Findings

The findings demonstrate the efficacy of few-shot learning in subjective stance classification and highlight the importance of linguistic features, particularly pro/con polarity and affective expressions. The StanceSentences dataset and the hybrid analytical approach offer a benchmark for future research, emphasizing the need for nuanced, multi-layered analysis in political discourse.

Citation

If you use this code or the findings in your research, please cite the original paper:

Reyes, J. F. (2024). Explainable Subjective Stance Classification with SetFit in Political Discourse. Institute of Computer Science, Brandenburgische Technische Universität Cottbus-Senftenberg.

License

This project is released under the MIT License.
