FAIR-SE is a comprehensive framework for analyzing fairness and bias in search engine results across different user contexts, search engines, and topics. This project employs a multi-layered approach to collect, analyze, and statistically verify bias patterns in search results.
Paper Title: FAIR-SE: Framework for Analyzing Information Disparities in Search Engines with Diverse LLM-Generated Personas
Authors: Jaebeom You, Seung-Kyu Hong, Ling Liu, Kisung Lee, Hyuk-Yoon Kwon
Conference: Proceedings of the 34th ACM International Conference on Information and Knowledge Management
Year: 2025
DOI: ---
Published: November 10--14, 2025
Link: --
Search engines are critical information gatekeepers that shape how users access and perceive information online. However, various user contexts (such as language settings, geographic location, browser environment, and search history) can potentially influence search results, leading to personalization bubbles or systematic biases. FAIR-SE provides a systematic methodology to detect, quantify, and analyze such biases.
The project consists of four main modules, each handling a specific aspect of the fairness analysis:
FAIR-SE/
├── Context-Aware Concurrent Data Collection/ # Data collection from search engines
├── LLM Persona-based Data Analyzation/ # Analysis of collected data using LLMs
├── Statistical Significance Verification/ # Statistical testing of bias patterns
├── Prompt/ # Prompt templates for LLM analysis
└── datasets/ # Storage for collected and processed data
This module systematically collects search results from multiple search engines (Google, Bing) under various simulated user contexts.
Key features:
- Simulation of different language preferences (
accept_language) - Simulation of different geographic regions (
region) - Simulation of different browser environments (
user_agent) - Simulation of different search histories (
search_history) - Serverless architecture using AWS Lambda for distributed collection
- Concurrent data collection with error handling and retry mechanisms
This module creates a comprehensive dataset of search results across different user contexts, which serves as the foundation for subsequent analysis.
This module uses Large Language Models (LLMs) like Claude and ChatGPT to analyze the collected search results from multiple political and stance perspectives.
Key features:
- Multi-dimensional analysis (Political, Stance, Subjectivity, Bias)
- Multi-perspective evaluation using four personas:
- Left-leaning opposed (opp_left)
- Right-leaning opposed (opp_right)
- Left-leaning supportive (sup_left)
- Right-leaning supportive (sup_right)
- Structured JSON output with quantitative scores
- Robust parsing and caching mechanisms
- Handles multiple LLM models (Claude, ChatGPT)
This module transforms raw search results into structured, multi-dimensional data that can be statistically analyzed.
This module applies rigorous statistical methods to verify whether observed differences in search results across user contexts are statistically significant.
Key features:
- Normality and homogeneity testing
- Parametric (ANOVA) and non-parametric (Kruskal-Wallis) testing
- Effect size calculation
- Multiple comparison corrections (Bonferroni, Benjamini-Hochberg)
- Visualization of statistical trends
- Comprehensive error handling and NaN management
This module provides scientific evidence for the presence or absence of search engine bias across different user contexts.
This module provides various prompt templates and guidelines used for LLM analysis.
Key features:
- Persona definitions for diverse political perspectives (left/right, opposed/supportive)
- Structured evaluation guidelines for search result analysis
- Guidelines and examples for search history simulation
- Consistent LLM output format definitions
This module helps LLMs analyze search results in a consistent and systematic manner.
-
Collection: The Context-Aware Concurrent Data Collection module gathers search results from multiple search engines under various user contexts.
-
Storage: Results are stored in the
datasetsdirectory, organized by date, search engine, and context type. -
Analysis: The LLM Persona-based Data Analyzation module processes the collected data, evaluating each search result across multiple dimensions and perspectives.
-
Statistical Testing: The Statistical Significance Verification module performs rigorous statistical testing to determine the significance of observed patterns.
-
Visualization: Results are visualized to highlight significant bias patterns and trends.
FAIR-SE addresses the following key research questions:
- Do search engines deliver different results based on user contexts?
- Are there systemic political or topical biases in search results?
- How do these biases vary across different search engines?
- Which user contexts have the most significant impact on search results?
- Are these differences statistically significant with meaningful effect sizes?
Each module has its own README.md with detailed instructions on setup and usage. In general, the modules should be run in sequence:
- First, collect data using the Context-Aware Concurrent Data Collection module
- Then, analyze the collected data using the LLM Persona-based Data Analyzation module
- Finally, perform statistical verification using the Statistical Significance Verification module
- Python 3.7+
- AWS account (for serverless functions)
- API keys for supported LLMs (Claude, ChatGPT)
- Required Python packages (see individual module READMEs)
The project includes analysis for several controversial topics:
- Russia Ukraine
- Trump Harris
- Israel Hamas
- Immigration
- Abortion
- LGBT
- Marijuana
Potential extensions to the FAIR-SE framework include:
- Support for additional search engines
- More granular user context simulations
- Advanced visualization and interactive dashboards
- Longitudinal analysis to track bias patterns over time
- Integration with browser extensions for real-user context analysis
This project is designed for research purposes to understand potential biases in search engines. The goal is to contribute to more transparent and fair information access systems.