This project examines whether small language models (SLMs) can simulate human decisions such as political behavior. Prior research has largely depended on closed, proprietary LLMs and has delivered mixed results, especially for populations outside the United States. That reliance on commercial LLMs raises concerns about privacy, reproducibility, and access. SLMs can mitigate these issues because they run locally, but their performance remains uncertain. To address this gap, the repo evaluates several SLMs on predicting Swiss voters' choices in three direct-democratic votes held in 2025, using vote proposal summaries, party recommendations, and voter attributes commonly used to explain voting behavior, with data from swissvotes.ch.
- 📦 Local, reproducible runs with Docker + GGUF models.
- 🧪 Per-vote evaluation against real survey ground truth.
- 📈 Ready-to-share figures and CSV metrics.
- Prepares cleaned survey data with respondent profiles and issue-specific vote summaries.
- Prompts local LLMs to predict binary votes (yes/no) for each respondent.
- Stores model outputs alongside ground truth for analysis and visualization.
- `src/`: model download, prompting, and evaluation notebooks/scripts.
- `data/`: processed datasets, results, and summaries (raw survey data excluded).
- `models/`: local GGUF model files (ignored by git).
- `docker/`: container setup for reproducible runs.
- `start.sh`: convenience script to run the Docker image.
Each example combines (1) the vote summary, (2) party recommendations, and
(3) a respondent profile. The model must answer with exactly yes or no.
```text
System: You are simulating the Swiss voter described by the user.
Output exactly one of: yes, no.

User:
<vote summary text>

The major parties issued the following voting recommendations:
<party position text>

You are an <age> old <gender> with <education> education that earns <income>
per month. You are <political interest> in politics and you identify with
<party>.

How would you vote? Answer exactly: yes or no.
```
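The repo's own prompt-building code is not reproduced here; the following is a minimal sketch, assuming hypothetical respondent field names (`age`, `gender`, `education`, `income`, `political_interest`, `party`), of how these pieces could be combined into chat messages:

```python
# Minimal sketch: assemble chat messages for one respondent.
# The keys of `respondent` are assumptions, not the repo's actual column names.
def build_messages(vote_summary: str, party_positions: str, respondent: dict) -> list[dict]:
    system = (
        "You are simulating the Swiss voter described by the user. "
        "Output exactly one of: yes, no."
    )
    user = (
        f"{vote_summary}\n\n"
        "The major parties issued the following voting recommendations:\n"
        f"{party_positions}\n\n"
        f"You are an {respondent['age']} old {respondent['gender']} with "
        f"{respondent['education']} education that earns {respondent['income']} per month. "
        f"You are {respondent['political_interest']} in politics and you identify with "
        f"{respondent['party']}.\n\n"
        "How would you vote? Answer exactly: yes or no."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```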
The `data/results` folder contains datasets with the ground-truth vote in the
`vote` column and model predictions in columns named after each model
(e.g., `LFM`). LFM, Llama-3.2, and Qwen3 results are available.
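As a quick illustration (the exact results file name and prediction column casing are assumptions based on the descriptions in this README), predictions can be compared against the ground truth with pandas:

```python
import pandas as pd

# File name follows the `<abbrev>dat.parquet` pattern from the usage section (assumption).
df = pd.read_parquet("data/results/lfmdat.parquet")

# Ground truth lives in `vote`; each model's predictions sit in a column named after it.
agreement = (df["vote"] == df["LFM"]).mean()
print(f"LFM agreement with ground truth: {agreement:.3f}")
```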
Models evaluated
| Model | Params | Abbrev | Status |
|---|---|---|---|
| LiquidAI LFM2.5-1.2B-Instruct | 1.2B | lfm | complete |
| Llama-3.2-3B-Instruct | 3B | llama32 | complete |
| Qwen3-4B-Instruct-2507 | 4B | qwen3 | complete |
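The models are stored locally as GGUF files under `models/`. For reference, a minimal sketch of fetching one GGUF file and querying it locally, assuming `huggingface_hub` and `llama-cpp-python`; the repo ID, file name, and parameters below are illustrative, not the ones configured in `src/download_models.py`:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Illustrative repo and file names; the real ones are set in src/download_models.py.
model_path = hf_hub_download(
    repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",
    filename="Llama-3.2-3B-Instruct-Q4_K_M.gguf",
    local_dir="models",
)

llm = Llama(model_path=model_path, n_ctx=2048)
reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are simulating the Swiss voter described by the user. "
                                      "Output exactly one of: yes, no."},
        {"role": "user", "content": "<vote summary, party recommendations, respondent profile> "
                                    "How would you vote? Answer exactly: yes or no."},
    ],
    max_tokens=3,
    temperature=0.0,
)
print(reply["choices"][0]["message"]["content"].strip().lower())
```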
- 🧠 Summary: SLMs remain highly sensitive to prompt design and do not reliably outperform a simple majority-class baseline.
- 🧭 Qwen3 edges out the other SLMs overall, driven by a strong result on the Umweltverantwortung vote.
- 📌 LFM performs slightly better than the baseline, but analysis by vote shows this is driven by a single vote.
- ⚖️ Llama shows strong performance for one vote as well but does not consistently outperform the baseline across multiple votes.
- 📉 Majority baseline stays competitive or ahead for Eigenmietwert and E-ID.
- 🧰 Fine-tuning might help, but SLMs still have a long way to go before they can be reliably employed to simulate human voting behavior.
- 🔄 Future work could explore which features predict misclassification once SLMs consistently outperform the majority-class baseline.
Computed on the 3,883 examples. Majority is a baseline that always predicts the most common class.
| Model | Accuracy | Precision | Recall | F1 | Specificity | Balanced acc. |
|---|---|---|---|---|---|---|
| LFM | 0.604 | 0.599 | 0.609 | 0.604 | 0.599 | 0.604 |
| Llama-3.2 | 0.593 | 0.679 | 0.341 | 0.454 | 0.842 | 0.591 |
| Qwen3 | 0.609 | 0.636 | 0.495 | 0.557 | 0.722 | 0.608 |
| Majority | 0.504 | n/a | 0.000 | n/a | 1.000 | 0.500 |
Metrics are saved to `data/figures_tables/metrics_overall.csv` and the
per-metric export to `data/figures_tables/model_metrics.csv`.
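For context, a sketch of how these columns can be computed with scikit-learn, assuming yes/no answers are encoded as 1/0 (the actual computation lives in `src/analysis_results.py`):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, balanced_accuracy_score, confusion_matrix,
    f1_score, precision_score, recall_score,
)

def binary_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Metrics for yes/no predictions; the 1 = yes, 0 = no encoding is an assumption."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 avoids errors for the all-"no" majority baseline (reported as n/a above).
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "specificity": tn / (tn + fp),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
    }
```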
Computed on the 3,883 examples and saved to
`data/figures_tables/metrics_by_vote_type.csv`. Majority is a baseline that
always predicts the most common class.
| Model | Vote | n | Yes | No | Yes share | Accuracy | F1 | Balanced acc. |
|---|---|---|---|---|---|---|---|---|
| LFM | 677 Umweltverantwortung | 1,514 | 535 | 979 | 0.353 | 0.668 | 0.458 | 0.607 |
| Llama-3.2 | 677 Umweltverantwortung | 1,514 | 535 | 979 | 0.353 | 0.715 | 0.410 | 0.617 |
| Qwen3 | 677 Umweltverantwortung | 1,514 | 535 | 979 | 0.353 | 0.740 | 0.623 | 0.710 |
| Majority | 677 Umweltverantwortung | 1,514 | 535 | 979 | 0.353 | 0.647 | n/a | 0.500 |
| LFM | 678 Eigenmietwert | 1,180 | 715 | 465 | 0.606 | 0.564 | 0.680 | 0.509 |
| Llama-3.2 | 678 Eigenmietwert | 1,180 | 715 | 465 | 0.606 | 0.566 | 0.638 | 0.549 |
| Qwen3 | 678 Eigenmietwert | 1,180 | 715 | 465 | 0.606 | 0.579 | 0.654 | 0.558 |
| Majority | 678 Eigenmietwert | 1,180 | 715 | 465 | 0.606 | 0.606 | 0.755 | 0.500 |
| LFM | 679 E-ID | 1,189 | 676 | 513 | 0.569 | 0.562 | 0.613 | 0.554 |
| Llama-3.2 | 679 E-ID | 1,189 | 676 | 513 | 0.569 | 0.464 | 0.147 | 0.525 |
| Qwen3 | 679 E-ID | 1,189 | 676 | 513 | 0.569 | 0.473 | 0.335 | 0.511 |
| Majority | 679 E-ID | 1,189 | 676 | 513 | 0.569 | 0.569 | 0.725 | 0.500 |
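The per-vote breakdown follows the same idea, grouping by ballot before scoring; a sketch with assumed column names (`vote_id`, `vote`, `LFM`) and a 1 = yes / 0 = no encoding:

```python
import pandas as pd
from sklearn.metrics import balanced_accuracy_score, f1_score

# Column names and the results file name are assumptions, not the repo's exact schema.
df = pd.read_parquet("data/results/lfmdat.parquet")

per_vote = df.groupby("vote_id").apply(
    lambda g: pd.Series({
        "n": len(g),
        "yes_share": (g["vote"] == 1).mean(),
        "accuracy": (g["vote"] == g["LFM"]).mean(),
        "f1": f1_score(g["vote"], g["LFM"], zero_division=0),
        "balanced_acc": balanced_accuracy_score(g["vote"], g["LFM"]),
    })
)
print(per_vote.round(3))
```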
- Download models (uses Hugging Face Hub): `python3 src/download_models.py`
- Run model inference: `python3 src/run_models.py`
- Inspect outputs in `data/results/`.
- Generate figures and metrics tables: `python3 src/analysis_results.py`
- Results files are named `<abbrev>dat.parquet` and include all original respondent features plus model predictions in a new column.
- Use `SKIP_MODELS=llama32,qwen3` to skip specific models (see the sketch below).
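How `src/run_models.py` parses this variable is not shown here; a minimal sketch of honoring `SKIP_MODELS`, using the abbreviations from the models table:

```python
import os

# Abbreviations from the models table; the actual registry lives in src/run_models.py.
MODELS = {
    "lfm": "LiquidAI LFM2.5-1.2B-Instruct",
    "llama32": "Llama-3.2-3B-Instruct",
    "qwen3": "Qwen3-4B-Instruct-2507",
}

# e.g. SKIP_MODELS=llama32,qwen3 python3 src/run_models.py
skip = {m.strip() for m in os.environ.get("SKIP_MODELS", "").split(",") if m.strip()}

for abbrev, name in MODELS.items():
    if abbrev in skip:
        print(f"Skipping {name}")
        continue
    print(f"Running inference with {name}")
    # ... load the GGUF model and prompt each respondent ...
```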
