Twitter Embeddings Analysis

Analyze and visualize your Twitter activity using semantic embeddings, clustering, and similarity search. See what topics you've tweeted about and compare it to others. Powered by @chromadb

Features

Tweet Scraping: Extract tweets using twscrape
Semantic Embeddings: Convert tweets to vectors using local model (all-mpnet-base-v2 via sentence transformers library) or OpenAI (text-embedding-3-small)
Clustering: Group and visualize similar tweets using UMAP + HDBSCAN
Interactive Visualization: 2D/3D plots with animation option
Semantic Search: Find tweets by meaning

Demo Link

Twitter

Quick Start

1. Clone and Setup

git clone <your-repo-url>
cd twitter-embeddings
python -m venv env
# source env/bin/activate  # macOS/Linux
# or: env\Scripts\activate  # Windows

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment

cp .env.example .env

Edit .env with your API credentials:

Twitter/X tokens (required for scraping)
OpenAI API key (optional, for better embeddings)

4. Run the Pipeline

# Step 1: Scrape tweets
python extractTW.py

# Edit this line (68) for the user and tweet # to scrape (rec 200-300):
new_tweets = await fetch_user_tweets(api, "michaelyhan_", limit=250)

# Step 2: Create embeddings and cluster
python cluster.py

# Step 3: Search your tweets
python search.py

Usage Guide

Tweet Extraction (`extractTW.py`)

Scrapes tweets from specified users
Removes URLs, mentions, hashtags
Saves to tweets.json
Handles duplicate detection

Clustering Analysis (`cluster.py`)

Choose embedding provider (local or OpenAI)
Select users to analyze (single or compare two)
Adjust clustering parameters
Visualize in 2D/3D with animations

Visualization Controls:

t - Toggle between representative tweets and keywords
r - Show/hide all tweet text

Semantic Search (`search.py`)

Search across all tweets or filter by user
Find semantically similar content
Works with both local and OpenAI embeddings

Configuration

Twitter API Setup

Get your browser cookies from Twitter/X
Extract auth_token and ct0 values
Add to .env file

OpenAI Setup (Optional)

Get API key from OpenAI
Add OPENAI_API_KEY to .env
Choose "OpenAI API" option when running cluster.py

Troubleshooting

Import Errors: Make sure virtual environment is activated and requirements installed

Embedding Dimension Mismatch: Collections created with different embedding providers can't be mixed

No Collections Found: Run cluster.py first to create embeddings before using search.py

Twitter Scraping Issues: Check that your browser cookies are valid and up-to-date

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
animate.py		animate.py
cluster.py		cluster.py
embedTweets.py		embedTweets.py
extractTW.py		extractTW.py
requirements.txt		requirements.txt
search.py		search.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Twitter Embeddings Analysis

Features

Demo Link

Quick Start

1. Clone and Setup

2. Install Dependencies

3. Configure Environment

4. Run the Pipeline

Usage Guide

Tweet Extraction (`extractTW.py`)

Clustering Analysis (`cluster.py`)

Semantic Search (`search.py`)

Configuration

Twitter API Setup

OpenAI Setup (Optional)

Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Twitter Embeddings Analysis

Features

Demo Link

Quick Start

1. Clone and Setup

2. Install Dependencies

3. Configure Environment

4. Run the Pipeline

Usage Guide

Tweet Extraction (extractTW.py)

Clustering Analysis (cluster.py)

Semantic Search (search.py)

Configuration

Twitter API Setup

OpenAI Setup (Optional)

Troubleshooting

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Tweet Extraction (`extractTW.py`)

Clustering Analysis (`cluster.py`)

Semantic Search (`search.py`)

Packages