2022 ITM DB_WEB Hadoop Project - User analysis from Twitter log data
- Team5
Data Collecting : Emilie Greeker
Data Engineering : Jooseung Lee
Data Analysis : Suho Lee ✌️
Web : Jaeyou Lee
- Term
2022 1st semester of the ITM Programme, 05/01 ~ 05/31
- Received an A+ in this lecture and a high score on the project as well 💯
- Original Plan
Flume : collect and store logs with a Flume client and agent, but connection obstacles arose between the client and the agent
- New Plan
Combine the pipeline with a C# collector
Microsoft.Net.Http (HttpClient for HTTP requests)
- API testing with Postman
- C# Script
A substitute for the Flume client
Quick and effective log collection
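The C# collector itself is not part of this document; as a hedged illustration, its collect-and-append logic can be sketched in Python (the file name and record fields below are assumptions, and a real run would fetch records over HTTP instead of using dummy data):

```python
import json
from pathlib import Path

def append_logs(records, out_path):
    """Append tweet records to a log file, one JSON object per line."""
    path = Path(out_path)
    with path.open("a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path

# Dummy records stand in for an HTTP response body:
sample = [{"user_id": 1, "text": "hello", "created_at": "2022-05-01T20:15:00"}]
append_logs(sample, "user_log_0001.json")
```

Appending one JSON object per line keeps each file easy to split and load in later PySpark stages.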
- Result of log data collection
1,018 user log data files
1,000,000 user log records in JSON format
Name Node
Resource Manager
Worker Nodes
Hadoop / Java / setting environment variables
local → worker node instances → put into HDFS
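The local-to-HDFS step above can be done with the standard HDFS shell; the directory and file paths here are assumptions to adapt to the actual cluster layout:

```shell
# Create a target directory in HDFS, then upload the collected JSON logs.
hdfs dfs -mkdir -p /user/team5/twitter_logs
hdfs dfs -put ./logs/*.json /user/team5/twitter_logs/

# Verify the upload
hdfs dfs -ls /user/team5/twitter_logs
```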
- Result of PySpark analysis (partitioned) => exported back to local
1. Average ‘Tweet count’ by time of day
Analysis of the Twitter log data indicates that most users upload tweets between 20:00 and 24:00.
- Optimal Timing:
It is recommended to schedule important or high-engagement tweets during this time frame.
- Targeted Content:
Analyzing the type of content that performs well during this time period can provide insights for creating targeted content.
- Engage with Users:
Monitor relevant hashtags, join conversations, and respond to user queries or comments in real-time.
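The hourly aggregation behind finding 1 can be sketched in plain Python; this mirrors the PySpark group-by-hour logic on a local sample, and the field names `created_at` and `tweet_count` are assumptions about the log schema:

```python
from collections import defaultdict
from datetime import datetime

def avg_tweet_count_by_hour(records):
    """Average 'tweet_count' grouped by the hour of 'created_at'."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for rec in records:
        hour = datetime.fromisoformat(rec["created_at"]).hour
        sums[hour] += rec["tweet_count"]
        counts[hour] += 1
    return {h: sums[h] / counts[h] for h in sums}

logs = [
    {"created_at": "2022-05-01T20:15:00", "tweet_count": 4},
    {"created_at": "2022-05-01T20:45:00", "tweet_count": 6},
    {"created_at": "2022-05-01T09:30:00", "tweet_count": 1},
]
print(avg_tweet_count_by_hour(logs))  # hour 20 averages 5.0
```

In PySpark the same shape is a `groupBy` on the extracted hour followed by an average, run over the partitioned data instead of an in-memory list.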
2. tweet_count and followers_count
It was found that most users had a value of 0 in tweet_count and followers_count, followed by users with a value of 1-10, and then users with a value of 100 or greater.
- Targeting new users:
Users with a tweet_count and followers_count of 0 are identified as users who are not yet tweeting actively and do not yet have many followers.
- Build a small community:
Users with a tweet_count and followers_count of 1-10 have already posted a few tweets and have a small number of followers. A small community can be formed through communication and interaction with these users.
- Target influential users:
Users with a tweet_count and followers_count of 100 or greater are influential; their support and sharing can be encouraged through collaboration, partnerships, brand mentions, etc.
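The user segmentation in finding 2 amounts to bucketing counts into the ranges named above. A small Python sketch of that bucketing (the 11-99 bucket is an assumption added only to keep the ranges exhaustive; the document discusses 0, 1-10, and 100+):

```python
from collections import Counter

def bucket(value):
    """Map a count into the segments used in the analysis."""
    if value == 0:
        return "0"
    if 1 <= value <= 10:
        return "1-10"
    if value >= 100:
        return "100+"
    return "11-99"  # assumed filler bucket, not discussed in the analysis

def bucket_counts(users, field):
    """Count how many users fall into each bucket for the given field."""
    return Counter(bucket(u[field]) for u in users)

users = [
    {"tweet_count": 0, "followers_count": 0},
    {"tweet_count": 3, "followers_count": 7},
    {"tweet_count": 250, "followers_count": 120},
]
print(bucket_counts(users, "tweet_count"))
```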
3. Location Analysis
Prepare a heat map for each continent. Before that, each location string should be converted into 'latitude' and 'longitude'.
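A real pipeline would geocode the location strings through a geocoding service; as a minimal offline sketch of the conversion step, a tiny lookup table stands in (the city list is an assumption and the coordinates are approximate):

```python
# Hypothetical stand-in for a geocoding API call.
CITY_COORDS = {
    "Seoul": (37.5665, 126.9780),
    "New York": (40.7128, -74.0060),
    "London": (51.5074, -0.1278),
}

def to_lat_lon(location):
    """Convert a free-text location into (latitude, longitude), or None."""
    return CITY_COORDS.get(location)

points = [to_lat_lon(loc) for loc in ["Seoul", "Atlantis", "London"]]
print(points)  # unknown locations map to None
```

Keeping unknown locations as None lets the heat-map stage filter them out instead of plotting bad points.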
4. User Language Data Analysis
Visualized using a word cloud
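A word cloud is driven by per-language frequencies; this sketch computes those frequencies, which could then be rendered with a word-cloud library. The field name `lang` and the `"und"` (undetermined) fallback are assumptions:

```python
from collections import Counter

def language_frequencies(records):
    """Count user language codes; the result feeds a word-cloud renderer."""
    return Counter(rec.get("lang", "und") for rec in records)

logs = [{"lang": "en"}, {"lang": "ko"}, {"lang": "en"}, {}]
print(language_frequencies(logs))  # Counter({'en': 2, 'ko': 1, 'und': 1})
```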