A web application that analyzes online articles for bias, factual claims, and perspective using AI.
Unbias allows users to submit a URL of an online article. The backend fetches the article's content, analyzes it using Google's Gemini or OpenAI's language models for factual claims, bias, and perspective ("slant"), and displays the results. The system handles concurrent requests reliably using a background job queue and provides real-time status updates.
Key Features Include:
- 🤖 AI-Driven Analysis: Provides detailed insights into online articles covering bias, factual claims, and overall perspective using advanced language models.
- ⏱️ Real-Time Status Updates: Keeps users informed about the analysis progress from submission through fetching, processing, and completion via WebSockets.
- 🌐 Multi-Language Support: Offers both the user interface and the generated analysis content in English and German.
- 🗂️ Analysis History: Maintains a list of recently analyzed articles, persisted in the user's local browser storage for easy access and review of past results.
- 🛡️ Strategic Content Fetching: Employs Archive.is for a predefined list of major publications (often paywalled) to ensure reliable content retrieval for analysis.
- 🔗 Shareable Analysis Links: Users can copy and share unique URLs for completed analyses, allowing anyone to view the results without needing to re-analyze.
- 🔒 Cloudflare Turnstile Integration: Implements robust bot protection with a seamless human verification flow that adapts to the site's theme and persists verification across language changes.
- ⚙️ Efficient Analysis Reuse: (Configurable via an environment variable) The system intelligently checks if an article (based on its normalized URL and requested language) has already been analyzed and, if so, serves the existing results to save processing time and costs.
- User Submission:
  - User submits an article URL and preferred language (EN/DE) via the Frontend.
  - Turnstile Verification: The user completes the Cloudflare Turnstile challenge to verify they are human.
  - The Frontend sends the URL, language, and Turnstile token to the Backend API (`POST /api/submit`).
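Server-side Turnstile verification follows Cloudflare's documented `siteverify` endpoint. The sketch below is illustrative, not the project's actual implementation; error handling is simplified:

```typescript
// Sketch of server-side Turnstile token verification.
// The endpoint and the `secret`/`response`/`remoteip` fields follow
// Cloudflare's documented siteverify API.
const SITEVERIFY_URL =
  "https://challenges.cloudflare.com/turnstile/v0/siteverify";

// Pure helper: build the form-encoded body for the siteverify request.
export function buildSiteverifyBody(
  secret: string,
  token: string,
  remoteIp?: string,
): URLSearchParams {
  const body = new URLSearchParams({ secret, response: token });
  if (remoteIp) body.set("remoteip", remoteIp);
  return body;
}

export async function verifyTurnstileToken(
  secret: string,
  token: string,
  remoteIp?: string,
): Promise<boolean> {
  const res = await fetch(SITEVERIFY_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildSiteverifyBody(secret, token, remoteIp),
  });
  const data = (await res.json()) as {
    success: boolean;
    "error-codes"?: string[];
  };
  return data.success === true;
}
```

The backend rejects the submission before any job is created if verification fails.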
- Backend API (`/api/submit`):
  - Validates the input and verifies the Turnstile token with Cloudflare before proceeding.
  - URL Normalization: Normalizes the submitted URL (e.g., converts to HTTPS, removes `www.`, strips tracking parameters).
  - Analysis Reuse Check (if `REUSE_EXISTING_ANALYSIS` is enabled):
    - Queries the Database for an existing 'Complete' job matching the normalized URL and language.
    - If a match is found:
      - Retrieves the existing job's `jobId`, optimized article data (title, author, image URL, etc.), and `analysis_results`.
      - Returns `200 OK` with this full data payload directly to the Frontend, which then displays the existing analysis immediately.
    - If no match is found (or the feature is disabled, or the language differs): Proceeds to new job creation.
  - New Job Creation:
    - Creates a new job record in the Database (Supabase) with status 'Queued', storing the original URL, language, and `normalized_url`.
    - Adds the job (containing `jobId`, original `url`, and `language`) to the Job Queue (BullMQ).
    - Returns `202 Accepted` with the new `jobId` to the Frontend.
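The normalization step described above can be sketched as a small pure function. This is illustrative; the exact list of stripped tracking parameters in the real backend may differ:

```typescript
// Illustrative URL normalization: force HTTPS, drop a leading "www.",
// and strip common tracking parameters so reuse checks match reliably.
const TRACKING_PARAMS = new Set([
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
  "gclid", "fbclid", "ref",
]);

export function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.protocol = "https:";
  url.hostname = url.hostname.replace(/^www\./, "");
  for (const key of [...url.searchParams.keys()]) {
    if (TRACKING_PARAMS.has(key)) url.searchParams.delete(key);
  }
  // Drop the trailing "?" when no query parameters remain.
  const qs = url.searchParams.toString();
  return url.origin + url.pathname + (qs ? `?${qs}` : "");
}
```

Two submissions of the same article with different tracking parameters then map to the same `normalized_url`, which is what makes the reuse check effective.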
- Frontend (for New Job):
  - Receives the new `jobId`.
  - Updates its state (e.g., `jobStatus` to 'Queued', `isLoading` to true) via Zustand.
  - Subscribes to WebSocket events for this `jobId`.
  - May periodically poll `GET /api/status/:jobId` for fallback updates or initial preview data.
- Backend Worker:
  - Picks up the job from the `analysis-queue`.
  - Updates job status to 'Processing', then 'Fetching' (updates DB, emits WebSocket update via Redis Pub/Sub).
  - Fetches article content using a strategic approach: Archive.is snapshots via Firecrawl (for paywalled domains) or the Diffbot API (fallback).
  - Parses Content & Stores Optimized Data:
    - Extracts `article_title`, `article_text` (main content for the LLM), `article_author`, `article_source_name`, `article_canonical_url`, `article_preview_image_url` (selected best image), and `article_publication_date`.
    - Stores these in their dedicated columns in the `jobs` table.
    - Stores other minimal, non-bulky Diffbot metadata in the `job_details` JSONB column (full HTML is no longer stored here).
  - Updates job status to 'Analyzing' (DB + WebSocket).
  - Retrieves `article_title` and `article_text` from the database.
  - Sends these to the Google Gemini API (primary) or the OpenAI API (fallback) for analysis.
  - Stores `analysis_results` (slant, claims, bias report) from the AI model in the DB.
  - Updates job status to 'Complete' (DB + WebSocket, including `analysis_results`).
  - If any step fails, updates job status to 'Failed' with an error message.
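The worker's status progression can be summarized as a small transition check. This is a hypothetical helper for illustration; the real worker simply writes statuses in order as each step completes:

```typescript
// Hypothetical sketch of the worker's status pipeline. Status names match
// the flow described above; the helper validates transitions before the
// worker would persist them and emit WebSocket updates.
export type JobStatus =
  | "Queued" | "Processing" | "Fetching" | "Analyzing" | "Complete" | "Failed";

const FLOW: JobStatus[] = ["Queued", "Processing", "Fetching", "Analyzing", "Complete"];

// A job may fail from any non-terminal state; otherwise it must advance
// exactly one step along the pipeline.
export function canTransition(from: JobStatus, to: JobStatus): boolean {
  if (from === "Complete" || from === "Failed") return false;
  if (to === "Failed") return true;
  return FLOW.indexOf(to) === FLOW.indexOf(from) + 1;
}
```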
- Frontend (Receiving Updates/Results):
  - Receives `jobUpdate` events (status changes, final results if 'Complete', or an error if 'Failed').
  - Updates the UI accordingly.
  - For 'Complete' status:
    - Displays the full analysis.
    - Shows the "Share this analysis" UI element with a copyable link.
    - Adds the item to local history.
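Folding a `jobUpdate` event into client state can be expressed as a pure reducer. This is a hedged sketch: the `jobUpdate` event name and `analysis_results` field come from this README, while the state shape and helper name are illustrative:

```typescript
// Illustrative reducer for jobUpdate WebSocket events; the real app holds
// this state in Zustand, but the update logic is the same idea.
export interface JobUpdate {
  jobId: string;
  status: string;
  analysis_results?: unknown;
  error?: string;
}

export interface UiState {
  jobId: string | null;
  jobStatus: string | null;
  isLoading: boolean;
  results: unknown | null;
  error: string | null;
}

export function applyJobUpdate(state: UiState, update: JobUpdate): UiState {
  if (update.jobId !== state.jobId) return state; // event for another job
  const terminal = update.status === "Complete" || update.status === "Failed";
  return {
    ...state,
    jobStatus: update.status,
    isLoading: !terminal,
    results: update.status === "Complete" ? update.analysis_results ?? null : state.results,
    error: update.status === "Failed" ? update.error ?? "Unknown error" : null,
  };
}
```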
- Viewing a Shared Analysis:
  - User opens a shared link (e.g., `https://unbiased.adriancares.com/[locale]/analysis/{jobId}`).
  - The Frontend page for this route fetches data from `GET /api/status/{jobId}`.
  - The API returns the optimized article data and analysis results for the specified `jobId`.
  - The Frontend renders the article preview and analysis. The UI is in the `locale` from the URL; the analysis content is in its original stored language.
  - If the `jobId` is invalid or the job failed, an error page is shown.
```mermaid
graph LR
    User_New["User (New Submission)"] -->|"URL, Lang"| FE["Next.js Frontend"]
    User_Shared["User (Shared Link)"] -->|"/analysis/:jobId"| FE
    FE -->|"Turnstile Verification"| FE
    FE -->|"POST /api/submit (URL, Lang, Turnstile Token)"| API["Backend API Server"]
    FE -->|"GET /api/status/:jobId (for Shared/Status)"| API
    FE -->|"WebSocket"| API
    subgraph "Backend System"
        API -->|"Verify Turnstile Token"| CF["Cloudflare Turnstile"]
        CF -->|"Verification Result"| API
        API -->|"Normalized URL, Lang"| DB["Supabase DB (Jobs Table)"]
        DB -->|"Existing Analysis?"| API
        API -->|"Check Reuse"| Decision{{"Reuse Enabled & Match Found?"}}
        Decision -->|"Yes (200 OK, Full Analysis Data)"| FE
        Decision -->|"No (No Match or Reuse Disabled)"| CreateJob["Create Job (Normalized URL)"]
        CreateJob --> DB
        DB -->|"New Job ID"| API
        API -->|"Add to Queue"| Q["BullMQ Queue (Redis)"]
        API -->|"202 Accepted (New Job ID)"| FE
        Q -->|"Job"| Worker["Background Worker"]
        Worker -->|"Resolve Archive.is Snapshot (ScraperAPI)"| ExtService["External Services: ScraperAPI"]
        Worker -->|"Fetch Content (Firecrawl or Diffbot)"| ExtService2["External Services: Firecrawl/Diffbot"]
        Worker -->|"Store Optimized Article Data (New Columns + Min. job_details)"| DB
        Worker -->|"Get Title/Text for LLM"| DB
        DB -->|"Title/Text"| Worker
        Worker -->|"Analyze (Gemini/OpenAI)"| ExtService_AI["External Services: Gemini/OpenAI"]
        Worker -->|"Store Analysis Results"| DB
        Worker -->|"Status Updates"| RedisPubSub["Redis Pub/Sub"]
        RedisPubSub -->|"Job Updates"| API
    end
    API -->|"WebSocket Updates"| FE
    style User_New fill:#cde,stroke:#333
    style User_Shared fill:#cde,stroke:#333
    style FE fill:lightcyan,stroke:#333
    style API fill:honeydew,stroke:#333
    style Q fill:peachpuff,stroke:#333
    style Worker fill:lavenderblush,stroke:#333
    style DB fill:lightblue,stroke:#333
    style ExtService fill:whitesmoke,stroke:#333
    style ExtService2 fill:whitesmoke,stroke:#333
    style ExtService_AI fill:whitesmoke,stroke:#333
    style RedisPubSub fill:mistyrose,stroke:#333
    style CF fill:#f0f8ff,stroke:#333
    style Decision fill:#f0fff0,stroke:#333
```
The application follows a distributed architecture with the following core components:
- Frontend: Next.js application (Zustand for state, next-intl for i18n). Handles UI, URL submission, Turnstile verification, real-time status display, results viewing, history, the publication carousel, and displaying shared analyses via a new route (`/analysis/[jobId]`).
- Backend API: Node.js/Express.js server. Handles API requests (including Turnstile verification and the analysis-reuse logic in `/api/submit`), WebSocket management (Socket.IO + Redis Pub/Sub), DB interaction, and job queuing. The `/api/status/:jobId` endpoint is enhanced to serve data for shared links.
- Background Worker: Node.js process. Listens to BullMQ, performs analysis (ScraperAPI for Archive.is resolution → Firecrawl for paywalled content or Diffbot fallback → Gemini/OpenAI), parses content to store optimized data in dedicated DB columns and minimal `job_details`, updates the DB, and emits status updates.
- Database: Supabase (PostgreSQL). Stores job info, status, optimized article data (`article_title`, `article_text`, `article_preview_image_url`, etc. in dedicated columns), `normalized_url` for reuse checks, minimal `job_details` (fetch metadata), `analysis_results`, and errors.
- Job Queue: BullMQ (Redis) for asynchronous analysis.
- Real-time Communication: WebSockets (Socket.IO + Redis Pub/Sub).
- Security: Cloudflare Turnstile for bot protection with a seamless user experience.
- Configuration Management: Environment variables validated at startup. The new `REUSE_EXISTING_ANALYSIS` variable toggles analysis reuse.
- Node.js (v22.x LTS)
- npm (v10.x)
- Supabase Account (Project URL and Service Key)
- Google Gemini API Key (primary) and/or OpenAI API Key (fallback)
- Diffbot API Token
- Firecrawl API Key (for fetching archived content from Archive.is)
- Cloudflare Turnstile Account (Site Key and Secret Key)
- Redis instance
- Logo.dev API Key (for publication logos)
```
unbias/
├── backend/                          # Node.js/Express.js backend
│   ├── src/
│   │   ├── api/                      # API routes (submit with Turnstile verification, status, results, history, image-proxy)
│   │   ├── config/                   # Configuration (env validation, app config incl. reuse toggle)
│   │   ├── db/                       # Supabase client, jobsRepository (handles new cols, normalized_url)
│   │   ├── lib/                      # Core libraries (Diffbot, OpenAI, Redis, Sockets, utils incl. URL normalization, logger)
│   │   ├── queues/                   # BullMQ queue setup
│   │   ├── scripts/                  # Migration scripts (e.g., for DB optimization)
│   │   ├── types/                    # TypeScript type definitions (updated for new DB fields)
│   │   └── workers/                  # analysisWorker.ts (logic for optimized data saving)
│   ├── .env.example                  # Example environment variables (incl. REUSE_EXISTING_ANALYSIS, CLOUDFLARE_TURNSTILE_SECRET_KEY)
│   └── package.json
├── frontend/                         # Next.js frontend
│   ├── src/
│   │   ├── app/
│   │   │   └── [locale]/
│   │   │       ├── analysis/
│   │   │       │   └── [jobId]/
│   │   │       │       └── page.tsx  # New: Page for displaying shared analyses
│   │   │       └── page.tsx          # Main page
│   │   ├── components/
│   │   │   ├── TurnstileWidget.tsx   # Cloudflare Turnstile integration
│   │   │   ├── ShareableLink.tsx     # Shareable link component
│   │   │   ├── PublicationCarousel.tsx # Scrolling publication logos
│   │   │   ├── BiasScoreMeter.tsx    # Gradient bias meter visualization
│   │   │   └── ...                   # Other React components
│   │   ├── lib/                      # Core libraries (apiClient, socketClient, store updated for reuse/sharing)
│   │   └── ...
│   └── ...
├── LOGGING.md
└── README.md                         # This file
```
Backend:

1. `cd backend`
2. `npm install`
3. Create `.env` from `.env.example` and fill in credentials.
   - Required: Set `CLOUDFLARE_TURNSTILE_SECRET_KEY` for bot protection.
   - Optional: Set `REUSE_EXISTING_ANALYSIS=true` or `false`.
4. Run Supabase migrations if setting up the DB for the first time or after schema changes (e.g., for the optimized columns, `normalized_url`).
5. `npm run dev`
Frontend:

1. `cd frontend`
2. `npm install`
3. Create `.env.local` from `.env.local.example` with:
   - Backend URLs
   - `NEXT_PUBLIC_CLOUDFLARE_TURNSTILE_SITE_KEY` for the Turnstile widget
   - `NEXT_PUBLIC_LOGO_DEV_API_KEY` for publication logos
4. `npm run dev` (access at `http://localhost:3000`)
For more detailed frontend setup instructions, environment variables, and component information, please refer to the Frontend README.
- `POST /api/submit`: Submit a URL for analysis.
  - Body: `{ "url": "string", "language"?: "en" | "de", "cf-turnstile-response": "string" }`
  - Response (202 Accepted): `{ "jobId": "string (new job)" }`
  - Response (200 OK, if reuse is enabled and a match is found): `{ "existingAnalysis": true, "jobId": "string", "language": "string", "url": "string", "article_title": "string|null", ..., "analysis_results": {} }`
- `GET /api/status/:jobId`: Get status and data for a job. Serves full data for 'Complete' jobs to support sharing.
  - Response (200 OK): Job object including `status`, all optimized article fields (e.g., `article_title`, `article_preview_image_url`), `analysis_results` (if complete), and `normalized_url`.
- `GET /api/results/:jobId`: As before; preferred for completed job results.
- `GET /api/image-proxy`: As before.
- `GET /api/history`: As before, but `headline` and `article_preview_image_url` are now sourced from the new optimized columns, with fallback.
- `GET /api/health`: As before.
- `GET /api/debug/:jobId`: As before.
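A client calling `POST /api/submit` has to distinguish the two success shapes above. The sketch below is illustrative (the base URL is a placeholder, and the type names are hypothetical); the `existingAnalysis` flag is the documented discriminator:

```typescript
// Illustrative client for POST /api/submit.
export interface SubmitAccepted { jobId: string }
export interface SubmitReused extends SubmitAccepted {
  existingAnalysis: true;
  analysis_results: unknown;
}

// 200 responses carry the reused-analysis payload; 202 carries only jobId.
export function isReusedAnalysis(
  body: SubmitAccepted | SubmitReused,
): body is SubmitReused {
  return (body as SubmitReused).existingAnalysis === true;
}

export async function submitArticle(
  url: string,
  language: "en" | "de",
  turnstileToken: string,
) {
  const res = await fetch("https://unbiased.example.com/api/submit", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url, language, "cf-turnstile-response": turnstileToken }),
  });
  if (!res.ok) throw new Error(`Submit failed: ${res.status}`);
  const body = await res.json();
  // Reused analyses render immediately; new jobs are tracked via
  // WebSocket events and/or GET /api/status/:jobId polling.
  return { reused: isReusedAnalysis(body), body };
}
```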
(No changes to event names or structures in this update, but the data flow leading to `jobUpdate` for 'Complete' may be short-circuited by the reuse feature.)
- TypeScript: Programming language for frontend and backend (v5.x).
- Next.js / React: Frontend framework (Next.js v15.5.4, React v18.3.1).
- Zustand: Frontend state management (v5.x).
- Node.js / Express.js: Backend framework (Node.js v22.x, Express.js v4.x).
- PostgreSQL (Supabase): Database for job storage.
- Redis: In-memory data store for BullMQ and WebSocket Pub/Sub.
- BullMQ: Job queue library for managing asynchronous analysis tasks (v5.x).
- Socket.IO: Real-time, bidirectional, event-based communication over WebSockets (v4.x).
- Google Gemini API: Primary AI service for text analysis (v1.x, Model: gemini-2.0-flash-exp).
- OpenAI API: Fallback AI service for text analysis (SDK v4.x, Model: gpt-4o).
- Diffbot API: For robust article content extraction (fallback strategy).
- Firecrawl: Primary content extraction from Archive.is snapshots for paywalled domains (via @mendable/firecrawl-js v4.x).
- ScraperAPI: For resolving Archive.is snapshot URLs to bypass rate limits.
- Mozilla Readability: Content parsing and cleanup.
- Cloudflare Turnstile: Bot protection service with seamless user experience.
- Logo.dev: API for publication logo retrieval.
- Tailwind CSS: Utility-first CSS framework for the frontend (v4.x).
- Shadcn/ui: UI components for the frontend.
- Vitest: Unit and component testing (Backend v3.x, Frontend v3.x).
- Axios: HTTP client (v1.x).
- Sharp: Image processing (v0.34.x).
- next-intl: Internationalization for Next.js (v4.x).
A flexible, environment-aware logging system is implemented for both frontend and backend. It supports multiple log levels (`error`, `warn`, `info`, `debug`, `trace`) controllable via environment variables (`LOG_LEVEL` for the backend, `NEXT_PUBLIC_LOG_LEVEL` for the frontend) and browser localStorage. For detailed information, refer to LOGGING.md.
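The level gate behind such a logger is small. This is a minimal sketch, not the repository's actual logger; levels are ordered from most to least severe:

```typescript
// Minimal level-filtering sketch: a message is emitted when its level is
// at or above the configured threshold, e.g. LOG_LEVEL=info lets through
// error/warn/info but suppresses debug/trace.
const LEVELS = ["error", "warn", "info", "debug", "trace"] as const;
export type LogLevel = (typeof LEVELS)[number];

export function shouldLog(message: LogLevel, threshold: LogLevel): boolean {
  return LEVELS.indexOf(message) <= LEVELS.indexOf(threshold);
}
```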
The frontend uses Zustand for global state management. This includes:
- Current job ID, status, and analysis data.
- Error messages.
- Loading states.
- History of analyzed articles, which is also persisted to localStorage for continuity between sessions.
The store module (`frontend/src/lib/store.ts`) centralizes logic for API calls, WebSocket interactions, and state updates.
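The history bookkeeping that gets persisted to localStorage can be sketched as a pure helper (Zustand's `persist` middleware would handle the actual storage). The item shape and cap below are hypothetical:

```typescript
// Illustrative history helper: dedupe by URL, newest first, capped length.
export interface HistoryItem {
  jobId: string;
  url: string;
  headline: string;
  analyzedAt: string; // ISO timestamp
}

export function addToHistory(
  history: HistoryItem[],
  item: HistoryItem,
  max = 20,
): HistoryItem[] {
  // Re-analyzing the same URL replaces the older entry instead of duplicating it.
  const rest = history.filter((h) => h.url !== item.url);
  return [item, ...rest].slice(0, max);
}
```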
To handle paywalls and improve content extraction reliability for certain domains (e.g., major news publications), the backend implements a strategic multi-tier fetching approach:
- Archive.is with Firecrawl: For domains listed in `backend/src/config/index.ts` (`proactiveArchiveDomains`, currently 50 major publications across the US, UK, Germany, France, Italy, Spain, and international sources), the system:
  - First resolves the latest Archive.is snapshot of the article URL via ScraperAPI
  - Then uses Firecrawl to extract clean, structured content from the Archive.is snapshot
  - This bypasses paywalls while maintaining reliable content extraction
- Diffbot Fallback: If archive resolution fails or the domain is not in the proactive list, the system falls back to Diffbot's standard content extraction.
- Mozilla Readability: Additional fallback parsing for enhanced content extraction when needed.

The chosen strategy is recorded in `job_details` under `fetchStrategy` and `isArchiveContent` for transparency.
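The initial strategy decision reduces to a domain check. This sketch is illustrative; the strategy names are placeholders and the real list lives in `backend/src/config/index.ts`:

```typescript
// Illustrative strategy selection: proactive Archive.is + Firecrawl for
// listed (often paywalled) domains, Diffbot for everything else.
export type FetchStrategy = "archive-firecrawl" | "diffbot";

export function pickFetchStrategy(
  articleUrl: string,
  proactiveArchiveDomains: string[],
): FetchStrategy {
  const host = new URL(articleUrl).hostname.replace(/^www\./, "");
  // Match the domain itself or any subdomain of it.
  const proactive = proactiveArchiveDomains.some(
    (d) => host === d || host.endsWith(`.${d}`),
  );
  return proactive ? "archive-firecrawl" : "diffbot";
}
```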
This project is licensed under the MIT License.
