This project was created using all the skills I learned in the Google Data Analytics Professional Certificate.
In order to answer the business question, I'm following all steps in the data analysis process: Ask, Prepare, Process, Analyse, Share and Act.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago, IL. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a solid opportunity to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
The director of marketing, Lily Moreno, has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
You are a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members.
But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Moreno has assigned you to answer the question: How do annual members and casual riders use Cyclistic bikes differently?
I'm using the last 12-month available data from divvy-tripdata (August 2023 to July 2024).
Note: The datasets have a different name because Cyclistic is a fictional company. For the purposes of this case study, the datasets are appropriate and will enable you to answer the business questions. The data has been made available by Motivate International Inc. under this license. This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes
There are 12 tables each containing information for a specific month. The columns are: ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng and member_casual.
After importing the csv.s to MySQL, I combined all data into one table combined_data. In total, the table had 5,715,693 rows.
First thing I did was finding the duplicates. There were 211 duplicates with the same ride_id. After much investigation, I realized the 211 duplicates were all trips that started on May and ended in June that ended up on both tables. I was having trouble deleting those rows so, to make things easier, I decided to truncate the table, delete those duplicates (211) in the June table and combine the data again.
I calculated the ride length for each trip using the started_at and ended_at columns. Then deleted all trips that had negative time values, trips shorter than a minute and trips longer than a day.
Created a column indicating the day of the week of each trip for further analysis.
There were a lot a of rows with missing information on these columns: start_station_name, start_station_id, end_station_name, end_station_id, end_lat, end_lng. A total of 1,397,330 null values, about 25% of the data. Since it was a big portion of the data and I don't know how crucial those columns would be in analysis, I flagged them creating a new column: incomplete_station_data. Any incomplete data now has a 1 value in this new column.
After data cleaning, we now have 5,576,344 rows to analyse.
Wrote some queries to find trends in the data. After that, I copied the resulting tables into Excel to save as a workbook for Tableau.
First valuable information obtained was the total number of rides completed by members. As expected, the majority of rides were conducted by members.
Then I calculated the average ride length for members and casuals. The result indicated members tend to take shorter trips.
Decided to see how this value changed over the year and discovered members' ride length mostly stays consistent throughout the year, while casuals tend to take longer trips during the summer season.
I also had access to the time of day of each trip. So I extracted the hour and calculated the count of trips in each hour of the day. Realized members have two high peaks at 8AM and 17PM, indicating they are probably using the bicycles for transportation during work hours.
Using the day of the week column created during the Process step, I discovered that members use the bicycles more often during work days while casuals tend to ride more during the weekend
I extracted the month from the started_at column and had a big discovery. There is a clear season effect in the usage of bicycles, especially for casual members.
I created a calculated field on Tableau that let me make this graph, separating the months in seasons. This indicated very clearly that the warmer months are a lot more profitable to the company.
Then it was time to use the stations information. I decided to only show the most popular stations for casual users, since these are the ones we are trying to target. The result shows that the top stations for casuals are in busy places, with access to tourist attractions and/or located near Lake Michigan.
| Casuals | Members |
|---|---|
| Tend to ride on weekends | Tend to ride on week days and during work hours |
| Travel for longer, but less frequently | Travel for shorter time, but take more trips |
| Prefer the warmer months | Also travel more during summer months, but some still use the membership in colder months |
| Tend to use the bicycles for leisure or tourism | Tend to use the bikes to go to work or run errands |
Top 3 recommendations based on my analysis:
-
Create subscription plans for the summer season
Consider creating special offers and discounts on subscription plans during the summer to capitalize on the increase of bike usage. Promote it with advertisements on tourist locations. -
Promote Cycling with City Partnership
One way to convert casuals into the membership is to make them consider biking as a means of transportation for daily life. We can collaborate with the city council to encourage people to choose cycling as a sustainable commuting option. The campaign can address climate issues, highlight how traffic can improve and promote healthy choices for the population. -
Develop weekend-specific benefits
Marketing can create a new membership that lowers cost of bike rentals during the weekends. The campaign could target the most popular stations used by casual riders.







