In recent years, sentiment analysis on social media posts has emerged as a key tool for understanding collective mood, the diffusion of opinions and public perception in real time. This particular type of analysis has proven useful in a wide range of contexts, including monitoring reactions to political events, communication campaigns, health crises and marketing campaigns.
In this context, the present work explores and compares sentiment analysis approaches of increasing modeling and computational complexity. The analysis is conducted on a subset of texts extracted from posts on Bluesky, an emerging platform that is rapidly gaining popularity as a decentralized alternative to traditional social networks.
The goal is to evaluate the performance of the models with respect to computational constraints, accuracy and simplicity of implementation, with particular attention to the trade-off between classification quality and computational cost. In particular, the following will be compared:
- Classical machine learning methods, such as Random Forest and Naive Bayes;
- Deep learning methods, such as MLPs, bidirectional RNNs, BERTweet, and RoBERTa.
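To give a feel for how light the classical baselines are, the sketch below implements a toy multinomial Naive Bayes over bag-of-words features using only the standard library. This is purely illustrative: the project itself uses full implementations of the models listed above, and the example texts and labels here are hypothetical.

```python
# Toy multinomial Naive Bayes with bag-of-words features (illustrative only).
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: iterable of (text, label) pairs."""
    label_counts = Counter()            # class priors (as raw counts)
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict_nb(model, text):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, for illustration only.
samples = [
    ("i love this platform", "positive"),
    ("great post really enjoyed it", "positive"),
    ("this is terrible news", "negative"),
    ("i hate spam posts", "negative"),
]
model = train_nb(samples)
print(predict_nb(model, "i love great posts"))  # prints "positive"
```

Training reduces to counting words per class, which is why Naive Bayes trains in about a minute in the experiments below while the deep models take hours.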
For more technical details about this work, see the technical report in the `deliverables` folder.
At the moment, only the Italian version is available. An English version is under development and will be added soon.
The original dataset employed in this study is available at withalim/bluesky-posts.
The extracted and processed subset used for training and validation is available here.
Our findings are somewhat unexpected: simpler models, such as MLP and Naive Bayes, outperformed more complex architectures, including RNNs and pre-trained models like BERTweet and RoBERTa.
This underscores the principle that model complexity does not always translate to better performance. This is particularly important in resource-constrained environments, where efficiency and simplicity can offer significant advantages.
The results are summarized in the table below.
| Model | Accuracy | Precision | Recall | F1-Score | Training time | # Parameters |
|---|---|---|---|---|---|---|
| RandomForest | 0.60 | 0.58 | 0.59 | 0.60 | 4 hours | 1746 |
| MLPClassifier | 0.65 | 0.65 | 0.66 | 0.65 | 2 hours | 1101 |
| Naive Bayes | 0.63 | 0.63 | 0.64 | 0.63 | 1 minute | 835K |
| RNN | 0.38 | 0.38 | 0.37 | 0.37 | 15 hours | 16M |
| BERTweet | 0.59 | 0.60 | 0.59 | 0.59 | - | 134M |
| RoBERTa | 0.59 | 0.60 | 0.59 | 0.59 | - | 124M |
The figure below gives a more direct visualization of the relationship between model size and performance.
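The same trend can also be read directly off the results. The snippet below (values copied from the table above) ranks the models by F1-score; a plotting library such as matplotlib could turn the same data into a size-vs-performance figure.

```python
# Results from the table above: (# parameters, F1-score) per model.
results = {
    "RandomForest":  (1_746,       0.60),
    "MLPClassifier": (1_101,       0.65),
    "Naive Bayes":   (835_000,     0.63),
    "RNN":           (16_000_000,  0.37),
    "BERTweet":      (134_000_000, 0.59),
    "RoBERTa":       (124_000_000, 0.59),
}

# Rank models by F1-score: the two smallest models lead the ranking.
ranked = sorted(results, key=lambda m: results[m][1], reverse=True)
for name in ranked:
    params, f1 = results[name]
    print(f"{name:<14} {params:>11,} params  F1={f1:.2f}")
```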
To install the necessary requirements for the project, please follow the steps below.
Verify you have Python installed on your machine. The project is compatible with Python 3.10 or higher.
If you do not have Python installed, please refer to the official Python Guide.
It's strongly recommended to create a virtual environment for the project and activate it before proceeding.
Feel free to use any Python package manager to create the virtual environment. However, for a smooth installation of the requirements we recommend you use pip. Please refer to Creating a virtual environment.
You may skip this step, but keep in mind that doing so could lead to dependency conflicts with other projects on your machine.
To clone this repository, download and extract the .zip project files using the <Code> button on the top-right, or run the following command in your terminal:

```shell
git clone https://github.com/amigli/NaLA.git
```

To install the requirements, please:
- Make sure you have activated the virtual environment where you installed the project's requirements. If activated, your terminal (assuming you are using bash) should look like the following:

  ```shell
  (name-of-your-virtual-environment) user@user path
  ```

- Install the project requirements using pip:

  ```shell
  pip install -r requirements.txt
  ```

If you use this project, please consider citing:
```bibtex
@techreport{costantemiglinonazzaro2025:nala,
  author      = {Luigina Costante and Annalaura Miglino and Angelo Nazzaro},
  title       = {NaLA: Natural Analysis of Language Attitudes in BlueSky Conversations},
  year        = {2025},
  institution = {University of Salerno}
}
```

