Cyclistic Bike-Share
How can a bike-share company use data-driven insights to increase membership?
_______________________________________________________________
Cyclistic Bike-Share
How can a bike-share company use data-driven insights to increase membership?
_______________________________________________________________
Introduction
In this exercise, I’m taking the role of a data analyst working for the marketing team at Cyclistic, a Chicago bike-share company. I’ll be analyzing this publicly-available dataset of Cyclistic historical trip data to search for trends, and ultimately make recommendations to the company.
Cyclisitic is a bike-share company that features more than 5,800 bicycles and 600 docking stations across the city of Chicago. The company offers multiple plans: single-ride passes, full-day passes, and annual memberships. Customers who opt for the single-ride or full-day option are referred to by Cyclistic as "casual" riders, whereas annual membership holders are called "members".
The company wants to increase the use of its bikes, customer satisfaction, and specifically annual membership, which previous analysis has found to be more profitable than casual rides. My job as a marketing analyst is to provide data-backed recommendations for achieving these goals.
Note: Because this is a work sample meant to showcase my methods, I’ve included a lot more technical details about the specific steps I took than I would present in a real-life work situation. On the job, my stakeholders would most likely be executives interested in analysis results, not the specific data cleaning and analysis steps I took in Excel.
Data Cleaning
First, I needed to clean the data so that it was ready to work with.
1) Preparation
a. Made a copy of the dataset to preserve the original data
b. Loaded the .csv file into excel, ensuring that the delimiter was set correctly
2) Inspected the Data
a. Looked for missing values, duplicates, and inconsistencies
b. Took note of the data types present
3) Removed Duplicates
a. Searched for and removed duplicate rows using Excel’s “remove duplicates” option
4) Handled Missing Values
a. Searched for empty cells using filters
b. Filled in or estimated missing values if possible
c. Deleted rows with irreplaceable missing values
5) Standardized Date Formats
a. Ensured that all dates and times were in a consistent format
6) Handled Outliers
a. Searched for possible errors by filtering for numbers much smaller or larger than expected
b. Replaced extreme values, or deleted them if necessary
Process the Data
Great, now the data was clean, but it still wasn't ready for real analysis. I needed some extra columns that didn't exist yet. I wanted to see each ride's length (in time), the day of the week, and I wanted to calculate the distance of each ride based on the start and end coordinates.
1) Created ride_length column by subtracting started_at time from ended_at, and formatted as HH:MM:SS using Excel formatting
2) Created day_of_week column calculating the day of the week for each ride start time, which output a number (1-7) corresponding to the weekday (1 being Sunday) (=WEEKDAY(C2,1))
3) Created weekday column displaying the day of the week as a word rather than a number (=IFS(O2=1, "Sunday", O2=2, "Monday", O2=3, "Tuesday", O2=4, "Wednesday", O2=5, "Thursday", O2=6, "Friday", O2=7, "Saturday"))
4) Created columns for the difference between the start and end latitude and start and end longitude
5) Converted lat_difference and lng_difference from degrees to miles by multiplying lat_difference by 69 (the number of miles in a latitude degree) and lng_difference by 45 (the number of miles in a latitude degree at latitude 41.9 degrees, which is where the data was collected)
6) Created total_distance_mi column by using the Pythagorean Theorem on the latitude and longitude distance with this equation: =SQRT((S2^2)+(T2^2))
Analysis
Finally, time for the main attraction: the data analysis! I needed to get some answers that I could translate into actionable recommendations for Cyclistic. The first few steps were pretty simple.
1) Calculated the median ride_length
2) Calculated the mean ride_length
3) Calculated the max ride_length
4) Calculated the mode day_of_week (3 = Tuesday)
The next steps in my analysis required some pivot tables.
5) Created a pivot table comparing average distance traveled per trip between members vs casual riders
6) Created pivot table comparing the average ride length on each day of the week for members vs casual riders
7) Created a pivot table and corresponding grouped bar chart comparing total number of rides taken per day of the week for members vs casual riders
Observations
1) Although the mean ride length was only 9 minutes 10 seconds, the maximum “ride length” was 2,607 hours, which is 108 days.
2) The average length for casual riders was 1 hour 35 minutes, while the average for members was under 13 minutes.
3) Ride distances (the distance between the start and end point) were very similar for casual riders and members, at 1.26 miles and 1.09 miles, respectively.
4) Casual riders rode the most on weekends, particularly Sunday
5) Members rode the most on weekdays, peaking on Tuesday
Conclusions
So what does it all mean? I crunched the numbers; I made some observations. Now it's time to pull everything I've done together and find some answers.
1) Some riders kept the bikes for an extremely long time, the longest being 108 days.
2) Members rode mainly for transportation, while casual riders rode mainly for leisure. The evidence for this is that casual riders had much longer ride times, but similar ride distances, compared with members. Apparently, casual riders often rode in loops, ending near the spot where they began, while members rode from point A to B.
3) Members’ higher use rate on weekdays suggests that they use the bikes as transportation to and from work.
4) Casual riders’ higher use rate on weekends further suggests rides taken for leisure, sightseeing, or exercise.
Recommendations
And on to the deliverable. I do like playing around with pivot tables, but the real reason I did all this work is because Cyclistic needed a plan of action.
1) To solve the issue of riders keeping the bikes for many days or months, a maximum ride time could be instituted, after which the rider is charged for the cost of the bike. This would increase profits and cut down on customer frustration at not being able to find an available bike when they need one.
2) To convert casual users to members (the most profitable type of users), digital marketing could focus on the benefits of biking as transportation. The data supports the idea that riders who use bikes or practical transportation are the ones who buy memberships. Ads could mention the health, financial, and environmental benefits of biking to work instead of driving.
3) Bikes could be outfitted with racks for carrying cargo, to fulfill the needs to practical users, who either ride to work or on errands, and may need to transport their belongings or purchased items. This will encourage practical use, e.g. membership.
Final Thoughts
Cyclistic's goal is to grow as a company by increasing profits, courting new customers, and encouraging yearly membership. After a methodical analysis of the data, I was able to present an evidence-based action plan to meet those goals. That's what I love about data. Instead of relying on hunches and guesswork, you can make a plan that works based on hard numbers. Thanks for sticking with me.