Analyze and visualize your Twitter activity using semantic embeddings, clustering, and similarity search. See which topics you've tweeted about and compare them with other users'. Powered by @chromadb
- Tweet Scraping: Extract tweets using twscrape
- Semantic Embeddings: Convert tweets to vectors using a local model (all-mpnet-base-v2 via the sentence-transformers library) or OpenAI (text-embedding-3-small)
- Clustering: Group and visualize similar tweets using UMAP + HDBSCAN
- Interactive Visualization: 2D/3D plots with animation option
- Semantic Search: Find tweets by meaning
git clone <your-repo-url>
cd twitter-embeddings
python -m venv env
# source env/bin/activate # macOS/Linux
# or: env\Scripts\activate # Windows
pip install -r requirements.txt
cp .env.example .env
Edit .env with your API credentials:
- Twitter/X tokens (required for scraping)
- OpenAI API key (optional, for better embeddings)
# Step 1: Scrape tweets
python extractTW.py
# Edit this line (68) to set the user and the number of tweets to scrape (200-300 recommended):
new_tweets = await fetch_user_tweets(api, "michaelyhan_", limit=250)
# Step 2: Create embeddings and cluster
python cluster.py
# Step 3: Search your tweets
python search.py
- Scrapes tweets from specified users
- Removes URLs, mentions, hashtags
- Saves to tweets.json
- Handles duplicate detection
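The cleaning and duplicate handling done by the scraping step can be sketched roughly as follows. This is a minimal illustration, not the code in extractTW.py: the regular expressions, the function names, and the assumption that each tweet is a dict with `id` and `text` keys are all hypothetical.

```python
import json
import re
from pathlib import Path

# Assumed patterns; the real script may clean tweets differently.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_tweet(text):
    """Strip URLs, @mentions, and #hashtags, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def merge_tweets(existing, new):
    """Append new tweets whose id is not stored yet; skip ones that clean to nothing."""
    seen = {tweet["id"] for tweet in existing}
    merged = list(existing)
    for tweet in new:
        cleaned = clean_tweet(tweet["text"])
        if tweet["id"] in seen or not cleaned:
            continue
        seen.add(tweet["id"])
        merged.append({**tweet, "text": cleaned})
    return merged


def save_tweets(tweets, path="tweets.json"):
    """Write the tweet list as UTF-8 JSON."""
    Path(path).write_text(
        json.dumps(tweets, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Deduplicating on the tweet id rather than on the text keeps repeated scrapes from duplicating entries while still allowing two different tweets with the same wording.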
- Choose embedding provider (local or OpenAI)
- Select users to analyze (single or compare two)
- Adjust clustering parameters
- Visualize in 2D/3D with animations
Visualization Controls:
- t - Toggle between representative tweets and keywords
- r - Show/hide all tweet text
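For orientation, the embed, reduce, and cluster pipeline described above might look like the sketch below when using the local model. It assumes `sentence-transformers`, `umap-learn`, and `hdbscan` are installed; the function names and parameter values are illustrative guesses, and whether cluster.py clusters the reduced coordinates or the full embeddings is not stated here.

```python
from collections import defaultdict


def embed_and_cluster(texts, min_cluster_size=5, n_neighbors=15):
    """Embed texts locally, reduce to 2D with UMAP, then cluster with HDBSCAN."""
    # Imported lazily because these third-party packages are slow to load.
    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    model = SentenceTransformer("all-mpnet-base-v2")
    embeddings = model.encode(texts)  # one 768-dimensional vector per text

    reducer = umap.UMAP(
        n_components=2, n_neighbors=n_neighbors, metric="cosine", random_state=42
    )
    coords = reducer.fit_transform(embeddings)

    # Assumption: cluster on the 2D coordinates that are also used for plotting.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(coords)
    return coords, labels.tolist()


def group_by_cluster(texts, labels):
    """Map each cluster label to its tweets; HDBSCAN labels noise points as -1."""
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[int(label)].append(text)
    return dict(groups)
```

Lowering `min_cluster_size` tends to produce more, smaller clusters, and raising it pushes more tweets into the noise group (`-1`).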
- Search across all tweets or filter by user
- Find semantically similar content
- Works with both local and OpenAI embeddings
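Searching "by meaning" comes down to ranking stored tweet embeddings by their similarity to the query embedding. The sketch below shows that idea in plain Python with cosine similarity; it is a stand-in for the lookup that Chroma performs, not the code in search.py, and the tweet dict layout (`text`, `user`, `embedding`) is assumed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, tweets, top_k=3, user=None):
    """Return the top_k (score, text) pairs, optionally restricted to one user."""
    candidates = [t for t in tweets if user is None or t["user"] == user]
    scored = [
        (cosine_similarity(query_vec, t["embedding"]), t["text"]) for t in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The query must be embedded with the same model as the stored tweets, otherwise the vectors are not comparable.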
- Get your browser cookies from Twitter/X
- Extract auth_token and ct0 values
- Add to .env file
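As a rough sketch of how the two cookie values could be read back from .env and combined into a cookie string: the variable names `TWITTER_AUTH_TOKEN` and `TWITTER_CT0` are hypothetical, so check .env.example for the names this project actually expects.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_cookie_string(env, token_key="TWITTER_AUTH_TOKEN", ct0_key="TWITTER_CT0"):
    """Join auth_token and ct0 into one cookie string (key names are assumed)."""
    missing = [key for key in (token_key, ct0_key) if not env.get(key)]
    if missing:
        raise ValueError(f"Missing in .env: {', '.join(missing)}")
    return f"auth_token={env[token_key]}; ct0={env[ct0_key]}"
```

Failing early on a missing value gives a clearer error than a scraping request that is rejected later.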
- Get API key from OpenAI
- Add OPENAI_API_KEY to .env
- Choose "OpenAI API" option when running cluster.py
Import Errors: Make sure the virtual environment is activated and the requirements are installed
Embedding Dimension Mismatch: Collections created with different embedding providers can't be mixed
No Collections Found: Run cluster.py first to create embeddings before using search.py
Twitter Scraping Issues: Check that your browser cookies are valid and up-to-date
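The embedding dimension mismatch arises because the two providers produce vectors of different lengths: all-mpnet-base-v2 outputs 768 dimensions, while text-embedding-3-small outputs 1536 by default. A small guard like the hypothetical one below (not part of this project) can catch the problem before a query is sent to a collection built with the other provider.

```python
# Default output sizes of the two supported embedding models.
EXPECTED_DIMS = {
    "all-mpnet-base-v2": 768,
    "text-embedding-3-small": 1536,
}


def check_dimension(model_name, vector):
    """Raise ValueError if the vector length does not match the model's output size."""
    expected = EXPECTED_DIMS[model_name]
    if len(vector) != expected:
        raise ValueError(
            f"{model_name} expects {expected} dimensions, got {len(vector)}; "
            "the collection was probably created with a different provider"
        )
    return True
```

If the check fails, re-run cluster.py with the provider you intend to search with so the collection is rebuilt with matching embeddings.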