radhapawar/twitter-influencer-network-analytics

Twitter Influencer Network Analytics

Course: MSBA Social Media Analytics, Spring 2026

Team: Alisha Surabhi, Shivangi Gupta, Simoni Dalal, Rohan Chimne, Radha Pawar, Justin Yang


The starting point for this project was a frustration with how "influencer" gets defined in practice. Most tools and most brand briefs use follower count as the primary metric. It's intuitive, it's easy to measure, and it's mostly wrong.

A user with 80,000 followers who gets 200 retweets per post, but whose content never travels outside their immediate network, can have less actual influence than a user with 4,000 followers who sits at the junction between five different topic communities. The second person is a bridge: when they share something, it reaches people who would never have encountered it otherwise. The first person is just popular within a bubble.
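To make the bridge-versus-bubble distinction concrete, here is a minimal sketch on a toy graph. The node names and the graph itself are invented for illustration and are not taken from the project's dataset: "popular" has the most direct connections inside one tightly knit group, while "bridge" has fewer connections but links several groups.

```python
from itertools import combinations

import networkx as nx

G = nx.Graph()

# Hypothetical bubble: "popular" is connected to 8 fans,
# and the fans also interact with each other in a ring.
fans = [f"fan{i}" for i in range(8)]
G.add_edges_from(("popular", fan) for fan in fans)
G.add_edges_from((fans[i], fans[(i + 1) % 8]) for i in range(8))

# Three hypothetical topic communities, each fully connected internally.
communities = [["b1", "b2", "b3"], ["c1", "c2", "c3"], ["d1", "d2", "d3"]]
for members in communities:
    G.add_edges_from(combinations(members, 2))

# "bridge" has only 4 connections, but each one reaches a different group.
G.add_edges_from([("bridge", "fan0"), ("bridge", "b1"),
                  ("bridge", "c1"), ("bridge", "d1")])

degree = dict(G.degree())
betweenness = nx.betweenness_centrality(G)

for user in ("popular", "bridge"):
    print(f"{user:8s} degree={degree[user]}  betweenness={betweenness[user]:.3f}")
```

In this toy graph "popular" has twice as many direct connections, yet "bridge" has the higher betweenness, because every path between the groups runs through it.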

This project is about measuring that distinction at scale, using network structure rather than raw popularity metrics.

What we did

We took a Twitter interaction dataset and built a directed graph where users are nodes and retweets/mentions/replies are edges. Then for every user in the graph, we computed four centrality measures:

  • Degree centrality: how many direct connections a user has. In an interaction graph this is the closest analogue to follower count, a raw measure of popularity.
  • Betweenness centrality: how often a user appears on the shortest paths between other pairs of users. High betweenness = bridge node.
  • Closeness centrality: how quickly a user can reach the rest of the network in terms of graph distance.
  • Eigenvector centrality: whether a user is connected to other well-connected users. Being known by influential people matters more than being known by many people.
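The graph construction and the four measures above can be sketched with networkx roughly as follows. This is an illustration under assumptions, not the notebook's code: the interaction records and user names are made up, and computing eigenvector centrality on an undirected copy of the graph is a choice made here to keep the power iteration well behaved.

```python
import networkx as nx

# Hypothetical interaction records: (source_user, target_user, interaction_type).
interactions = [
    ("ana", "ben", "retweet"),
    ("ben", "cam", "mention"),
    ("cam", "ana", "reply"),
    ("cam", "dee", "retweet"),
    ("dee", "eli", "mention"),
    ("eli", "cam", "reply"),
    ("fay", "cam", "retweet"),
]

# Directed graph: users are nodes, each interaction is an edge.
G = nx.DiGraph()
for source, target, _kind in interactions:
    if G.has_edge(source, target):
        G[source][target]["weight"] += 1  # repeated interactions add weight
    else:
        G.add_edge(source, target, weight=1)

# Assumption: eigenvector centrality is computed on an undirected copy.
U = G.to_undirected()

centrality = {
    "degree": nx.degree_centrality(G),            # in-degree + out-degree, normalized
    "betweenness": nx.betweenness_centrality(G),  # unweighted shortest paths
    "closeness": nx.closeness_centrality(G),      # based on incoming distances for a DiGraph
    "eigenvector": nx.eigenvector_centrality(U, max_iter=1000),
}

# One feature row per user, in the order of the measures above.
features = {user: [centrality[m][user] for m in centrality] for user in G.nodes}

for user, row in features.items():
    print(user, [round(value, 3) for value in row])
```

The resulting `features` mapping has one four-value row per user, which is the shape a classifier expects as input.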

We then framed influencer identification as a binary classification problem and trained a logistic regression model using these four centrality measures as features.
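A minimal sketch of that classification setup with scikit-learn is shown below. The feature matrix is synthetic, and the labeling rule is a placeholder, since this README does not state how users were labeled as influencers. The train/test split and the scaling step are also assumptions rather than a description of the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix.
# Columns: degree, betweenness, closeness, eigenvector.
X = rng.random((200, 4))

# Placeholder labels (hypothetical rule, for illustration only).
y = (0.3 * X[:, 0] + 0.7 * X[:, 1] + rng.normal(0, 0.1, 200) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit a logistic regression on the four measures.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print("held-out accuracy:", round(test_accuracy, 3))
```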

The result that actually mattered

Accuracy came out around 84%, which is reasonable. But the more interesting result was which features drove that accuracy.

Betweenness centrality was the strongest predictor by coefficient magnitude, by a wide margin over degree centrality (the closest analogue to follower count). The users the model was most confident about labeling as influencers were overwhelmingly the bridge nodes, people who connect otherwise-disconnected communities, rather than the nodes with the most direct connections.
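Ranking features by coefficient magnitude is only meaningful when the features share a common scale. The sketch below illustrates that point on synthetic data; it is not the project's analysis. The same feature is expressed in two different units, and the standardized coefficients come out the same either way, while the raw coefficient depends on the unit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Two synthetic features that contribute equally to the label.
X = rng.random((300, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.2, 300) > 1.0).astype(int)

# Same information, but feature 0 is expressed in units 1000x larger.
X_rescaled = X * np.array([1000.0, 1.0])


def fit_coefficients(features, labels):
    """Fit a logistic regression and return its coefficient vector."""
    return LogisticRegression(max_iter=5000).fit(features, labels).coef_[0]


# Raw coefficients: the value for feature 0 depends on its unit.
raw_rescaled = fit_coefficients(X_rescaled, y)

# Standardized coefficients: the unit no longer matters.
std_original = fit_coefficients(StandardScaler().fit_transform(X), y)
std_rescaled = fit_coefficients(StandardScaler().fit_transform(X_rescaled), y)

print("raw (rescaled units):  ", raw_rescaled)
print("standardized, original:", std_original)
print("standardized, rescaled:", std_rescaled)
```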

For brand strategy that's a fairly direct implication: if you're allocating influencer marketing budget based on follower count, you're probably missing the people who would actually spread your message furthest.

Numbers

| Metric | Score |
| --- | --- |
| Accuracy | ~84% |
| Precision | ~81% |
| Recall | ~79% |
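For reference, this is how the three metrics are defined for a binary classifier. The labels and predictions below are a small made-up example and do not reproduce the scores reported above.

```python
# Hypothetical ground truth and predictions (1 = influencer, 0 = not).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # influencers correctly flagged
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-influencers wrongly flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # influencers missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-influencers correctly ignored

accuracy = (tp + tn) / len(pairs)   # share of all predictions that are right
precision = tp / (tp + fp)          # of the users flagged, how many are influencers
recall = tp / (tp + fn)             # of the true influencers, how many were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.80 precision=0.75 recall=0.75
```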

Running it

pip install pandas numpy scikit-learn matplotlib seaborn networkx
jupyter notebook Assignment_1_SMA.ipynb

tweets_sample.csv needs to be in the same directory.

Files

  • Assignment_1_SMA.ipynb — the full analysis, end to end
  • tweets_sample.csv — the tweet/interaction dataset
  • MSBA SMA S2026 Assignment 1.docx — original assignment brief

What we'd extend

The obvious next step is moving beyond binary classification to a ranked influence score that could directly power an outreach prioritization tool. You'd score every user in a brand's potential influencer pool, rank by weighted centrality combination (betweenness-heavy weighting based on this analysis), and surface the top candidates. The logistic regression gives you a probability that could serve as that score directly.
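One way the weighted ranking could look is sketched below. The candidate names, centrality values, and weights are all hypothetical; the betweenness-heavy weighting only mirrors the direction suggested by the analysis, not fitted values.

```python
# Hypothetical centrality values for a small candidate pool.
candidates = {
    "ana": {"degree": 0.10, "betweenness": 0.02, "closeness": 0.30, "eigenvector": 0.05},
    "ben": {"degree": 0.60, "betweenness": 0.05, "closeness": 0.45, "eigenvector": 0.40},
    "cam": {"degree": 0.25, "betweenness": 0.40, "closeness": 0.50, "eigenvector": 0.30},
}

# Hypothetical weights, betweenness-heavy; they sum to 1.
weights = {"degree": 0.15, "betweenness": 0.50, "closeness": 0.15, "eigenvector": 0.20}


def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


users = list(candidates)
scores = {user: 0.0 for user in users}
for measure, weight in weights.items():
    # Normalize each measure across the pool so the weights are comparable.
    normalized = min_max([candidates[user][measure] for user in users])
    for user, value in zip(users, normalized):
        scores[user] += weight * value

ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for user, score in ranking:
    print(f"{user}: {score:.3f}")
```

Here "cam" ranks first despite having fewer direct connections than "ben", because betweenness carries the largest weight.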

There's also a temporal dimension we didn't explore — centrality isn't static. A user's betweenness centrality can shift significantly as conversations evolve around new topics. Building a time-series version of this analysis would be a more realistic representation of how influence actually works.
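A rough sketch of what a time-windowed version might look like: split the interactions into fixed windows, build one graph per window, and track each user's betweenness across windows. The events, the day-number timestamps, and the 7-day window are assumptions for illustration.

```python
import networkx as nx

# Hypothetical timestamped interactions: (source_user, target_user, day_number).
events = [
    ("ana", "ben", 1),
    ("ben", "cam", 1),
    ("cam", "dee", 2),
    ("dee", "eli", 8),
    ("eli", "ana", 9),
    ("ana", "dee", 9),
]


def betweenness_by_window(events, window_days=7):
    """Return {window_index: {user: betweenness}} using one graph per window."""
    graphs = {}
    for source, target, day in events:
        index = (day - 1) // window_days  # days 1-7 -> window 0, 8-14 -> window 1
        graphs.setdefault(index, nx.DiGraph()).add_edge(source, target)
    return {index: nx.betweenness_centrality(graph)
            for index, graph in sorted(graphs.items())}


series = betweenness_by_window(events)

# Users absent from a window get 0.0 for that window.
for index, values in series.items():
    print(f"window {index}: dee={values.get('dee', 0.0):.2f} ben={values.get('ben', 0.0):.2f}")
```

In this made-up data "dee" sits at the end of a chain in the first window and has zero betweenness, then becomes part of a cycle in the second window and picks up a bridging role.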

About

Network analytics and ML-based social influencer identification from Twitter data using logistic regression and centrality features
