Twitter Influencer Network Analytics

Course: MSBA Social Media Analytics, Spring 2026
Team: Alisha Surabhi, Shivangi Gupta, Simoni Dalal, Rohan Chimne, Radha Pawar, Justin Yang

The starting point for this project was a frustration with how "influencer" gets defined in practice. Most tools and most brand briefs use follower count as the primary metric. It's intuitive, it's easy to measure, and it's mostly wrong.

A user with 80,000 followers who gets 200 retweets per post and whose content doesn't travel outside their immediate network has less actual influence than a user with 4,000 followers who sits at the junction between five different topic communities. The second person is a bridge — when they share something, it reaches people who would never have encountered it otherwise. The first person is just popular within a bubble.

This project is about measuring that distinction at scale, using network structure rather than raw popularity metrics.

What we did

We took a Twitter interaction dataset and built a directed graph where users are nodes and retweets/mentions/replies are edges. Then for every user in the graph, we computed four centrality measures:

Degree centrality — how many direct connections a user has. This is roughly what follower count captures.
Betweenness centrality — how often a user appears on the shortest path between other pairs of users. High betweenness = bridge node.
Closeness centrality — how quickly a user can reach the rest of the network in terms of graph distance.
Eigenvector centrality — whether a user is connected to other well-connected users. Being known by influential people matters more than being known by many people.
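
As a rough illustration of the four measures above, here is a minimal sketch using networkx on a tiny hand-made graph. The edge list and user names are hypothetical and are not taken from tweets_sample.csv. Computing eigenvector centrality on the undirected view of the graph is also an assumption made here so that the power iteration converges reliably; the notebook may handle this differently.

```python
import networkx as nx
import pandas as pd

# Hypothetical interaction edges (source user -> target user).
# Two three-user communities, with "x" as the only link between them.
edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),   # community 1
    ("d", "e"), ("e", "f"), ("f", "d"),   # community 2
    ("c", "x"), ("x", "d"),               # bridge, one direction
    ("d", "x"), ("x", "c"),               # bridge, other direction
]

G = nx.DiGraph()
G.add_edges_from(edges)

# One row per user, one column per centrality measure.
features = pd.DataFrame({
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    # Assumption: use the undirected view for eigenvector centrality.
    "eigenvector": nx.eigenvector_centrality(G.to_undirected(), max_iter=1000),
})

print(features.sort_values("betweenness", ascending=False))
```

In this toy graph, "x" ranks first on betweenness even though its degree is no higher than that of "c" or "d", which is the bridge effect described above in miniature.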

We then framed influencer identification as a binary classification problem and trained a logistic regression model using these four centrality measures as features.
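
A classification setup along these lines could look like the sketch below. The data is synthetic and the labeling rule is invented purely for illustration, since this README does not describe how influencer labels were assigned. The train/test split and the scaling step are likewise assumptions rather than a description of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_users = 400
feature_names = ["degree", "betweenness", "closeness", "eigenvector"]

# Synthetic stand-in for the per-user centrality table.
X = pd.DataFrame(rng.random((n_users, 4)), columns=feature_names)

# Hypothetical labeling rule, for illustration only.
noise = rng.normal(0, 0.1, n_users)
y = (X["betweenness"] + 0.3 * X["degree"] + noise > 0.7).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit a plain logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```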

The result that actually mattered

Accuracy came out around 84%, which is reasonable. But the more interesting result was which features drove that accuracy.

Betweenness centrality was the strongest predictor by coefficient magnitude, well ahead of degree centrality, the measure closest to follower count. (Comparing coefficient magnitudes this way assumes the features are on a common scale.) The users the model was most confident about labeling as influencers were overwhelmingly the bridge nodes, people who connect otherwise-disconnected communities, rather than the nodes with the most direct connections.
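
One way to read feature importance off a logistic regression is to standardize the features first and then rank the absolute coefficients. The sketch below does that on synthetic data whose labels are deliberately driven by betweenness, so the ranking it prints is built in by construction and says nothing about the real dataset. The feature scales and the label rule are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
cols = ["degree", "betweenness", "closeness", "eigenvector"]

# Synthetic features on deliberately different scales.
X = pd.DataFrame(rng.random((500, 4)) * [100, 0.01, 1, 0.1], columns=cols)

# Hypothetical labels, driven mostly by betweenness.
signal = 300 * X["betweenness"] + 0.01 * X["degree"]
y = (signal + rng.normal(0, 0.3, 500) > signal.median()).astype(int)

# Standardize so coefficient magnitudes are comparable across features.
X_std = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(X_std, y)

importance = pd.Series(np.abs(clf.coef_[0]), index=cols)
importance = importance.sort_values(ascending=False)
print(importance)
```

Without the scaling step, a feature measured in small units (such as normalized betweenness) would tend to get a large raw coefficient regardless of how much it actually matters, so the comparison would be misleading.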

The implication for brand strategy is fairly direct: if you're allocating influencer marketing budget based on follower count, you're probably missing the people who would actually spread your message furthest.

Numbers

Metric	Score
Accuracy	~84%
Precision	~81%
Recall	~79%
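
For reference, these three metrics can be computed with scikit-learn as in the short sketch below. The label vectors are made up to show the calls and are not the project's predictions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = influencer).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),    # correct / all
    "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
    "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```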

Running it

pip install pandas numpy scikit-learn matplotlib seaborn networkx
jupyter notebook Assignment_1_SMA.ipynb

tweets_sample.csv needs to be in the same directory as the notebook.

Files

Assignment_1_SMA.ipynb — the full analysis, end to end
tweets_sample.csv — the tweet/interaction dataset
MSBA SMA S2026 Assignment 1.docx — original assignment brief

What we'd extend

The obvious next step is moving beyond binary classification to a ranked influence score that could directly power an outreach prioritization tool. You'd score every user in a brand's potential influencer pool, rank them by a weighted combination of centrality measures (weighted toward betweenness, given this analysis), and surface the top candidates. The logistic regression already outputs a probability that could serve as that score directly.
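
A minimal sketch of that ranking idea, using the predicted probability as the score, might look like this. The user IDs, features, and labels are all synthetic and hypothetical. The model here is also scored on the same users it was trained on to keep the example short; in practice the candidate pool would be scored separately from the training data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
cols = ["degree", "betweenness", "closeness", "eigenvector"]
users = [f"user_{i}" for i in range(200)]  # hypothetical user IDs

# Synthetic centrality table, one row per user.
X = pd.DataFrame(rng.random((200, 4)), columns=cols, index=users)

# Hypothetical labels, for illustration only.
y = (X["betweenness"] + rng.normal(0, 0.1, 200) > 0.6).astype(int)

clf = LogisticRegression().fit(X, y)

# Column 1 of predict_proba is the probability of the positive class.
scores = pd.Series(
    clf.predict_proba(X)[:, 1], index=X.index, name="influence_score"
)

top_candidates = scores.sort_values(ascending=False).head(10)
print(top_candidates)
```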

There's also a temporal dimension we didn't explore — centrality isn't static. A user's betweenness centrality can shift significantly as conversations evolve around new topics. Building a time-series version of this analysis would be a more realistic representation of how influence actually works.
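
One possible starting point for a time-series version is to rebuild the graph per time window and track each user's betweenness across windows. The sketch below assumes an interaction table with source, target, and timestamp columns; those column names, the weekly window, and the helper name betweenness_by_window are all hypothetical and may not match tweets_sample.csv.

```python
import networkx as nx
import pandas as pd

# Hypothetical interactions: "b" bridges in week 1, "c" in week 2.
interactions = pd.DataFrame({
    "source": ["a", "b", "a", "c"],
    "target": ["b", "c", "c", "b"],
    "timestamp": pd.to_datetime(
        ["2026-01-06", "2026-01-07", "2026-01-13", "2026-01-14"]
    ),
})

def betweenness_by_window(df, freq="W"):
    """Return a (window x user) table of betweenness centrality."""
    per_window = {}
    grouper = pd.Grouper(key="timestamp", freq=freq)
    for window, chunk in df.groupby(grouper):
        if chunk.empty:
            continue  # skip windows with no interactions
        G = nx.from_pandas_edgelist(
            chunk, "source", "target", create_using=nx.DiGraph
        )
        per_window[window] = nx.betweenness_centrality(G)
    # Users absent from a window get a betweenness of 0.
    return pd.DataFrame(per_window).T.fillna(0.0)

trend = betweenness_by_window(interactions)
print(trend)
```

Each row of the resulting table is one window, so a user whose column rises or falls over time is gaining or losing bridge status as the conversation shifts.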
