Bellabeat Fitness Tracker
What can we learn from the way customers use wearable fitness trackers?
Bellabeat Fitness Tracker
What can we learn from the way customers use wearable fitness trackers?
Introduction
In this scenario, I’ll be in the role of a data analyst working for Bellabeat, a provider of health-focused smart technology. I'll be analyzing this dataset on Kaggle of Fitbit user data to look for trends and insights on the way customers use fitness trackers, which can then be applied to Bellabeat's future marketing strategies.
Founded in 2013, Bellabeat is a manufacturer of high-tech health and fitness products, including a wearable fitness tracker in competition with Fitbit.
Bellabeat wishes to learn more about customers fitness habits, and how they currently use fitness trackers, in order to discover opportunities for the company to increase sales, growth, and customer satisfaction. Once I've processed and analyzed this dataset, I'll make recommendations for what Bellabeat can do to achieve its goals.
Note: Because this is a work sample meant to showcase my methods, I’ve included a lot more technical details about the specific steps I took than I would present in a real-life work situation. On the job, my stakeholders would most likely be executives only interested in analysis results, not the various data cleaning steps and SQL queries I ran.
Data Cleaning
Let's get started! As always, the first step was to clean the data. This dataset included seven csv files containing various user health and fitness data:
dailyActivity_merged.csv
sleepDay_merged.csv
hourlySteps_merged.csv
hourlyIntensities_merged.csv
hourlyCalories_merged.csv
heartRate_merged.csv
weightLogInfo_merged.csv
I relied on SQL queries in Google BigQuery for my analysis, but for cleaning I opened each of these files in Excel.
1) Preparation
a. Made a copy of the dataset to preserve the original data
b. Loaded the .csv file into excel, ensuring that the delimiter was set correctly
2) Inspected the Data
a. Looked for missing values, duplicates, and inconsistencies
b. Took note of the data types present
3) Removed Duplicates
a. Searched for and removed duplicate rows using Excel’s “remove duplicates” option
4) Handled Missing Values
a. Searched for empty cells using filters
b. Filled in or estimated missing values if possible
c. Deleted rows with irreplaceable missing values
5) Standardized Number and Date Formats
a. Reformatted numerical data to Number format with up to 2 decimal places
b. Reformatted date data into either mm/dd/yy or mm/dd/yy h:mm format
6) Handled Outliers
a. Searched for possible errors by filtering for numbers much smaller or larger than expected
b. Replaced extreme values, or deleted them if necessary
This isn't data cleaning, but while I had Daily_Activity_Merged open in Excel, I added a column to display the day of the week for each row's date using: =TEXT(B2,"dddd")
Initial Analysis
Now that the cleaning was done, it was time for the real fun to begin! I uploaded all seven csv files to BigQuery so I could analyze them with SQL queries, and then create visualizations using Tableau.
The first thing I wanted to learn was the number of distinct user IDs in each dataset, which I found using the query to the right.
I ran a version of this query for each of my seven datasets to find the number of distinct users in each, with these results:
dailyActivity_merged – 33
sleepDay_merged – 24
hourlySteps_merged – 33
hourlyIntensities_merged – 33
hourlyCalories_merged – 33
heartRate_merged – 7
weightLogInfo_merged – 8
At this point, I eliminated heartRate_merged and weightLogInfo_merged from my analysis due to the low number of users. For real-world data analysis, even a sample size of 33 would be too low to draw reliable results from, but since this was only a demonstration, I decided it was acceptable.
Next I found out how many days over the study period each user wore their Fitbit:
SELECT
Id,
COUNT(Id) AS Total_Id
FROM `analytics-demo-452102.Fitbit_Data.dailyActivity_merged`
GROUP BY Id
I then found out how many users logged data x number of times.
The time period of the study (04–12–2016 to 05–12–2016) was 31 days, so 61% of subjects wore their Fitbit and logged data every day, and 82% missed three or fewer days. Despite the low number of total users in the study, I was pleased to see that most of them did log data often enough to be relevant for my analysis.
Now I wanted to get a better sense of what the data was saying, so I wrote a query to show the min, max, and average daily steps, daily distance, daily calories, and activity level of each user by ID:
SELECT
Id,
MIN(TotalSteps) AS Min_Daily_Steps,
MAX(TotalSteps) AS Max_Daily_Steps,
AVG(TotalSteps) AS Avg_Daily_Stpes,
MIN(TotalDistance) AS Min_Daily_Distance,
MAX(TotalDistance) AS Max_Daily_Distance,
AVG(TotalDistance) AS Avg_Daily_Distance,
MIN(Calories) AS Min_Daily_Calories,
MAX(Calories) AS Max_Daily_Calories,
AVG(Calories) AS Avg_Daily_Calories,
MIN(SedentaryMinutes) AS Min_Minutes_Sedentary,
MAX(SedentaryMinutes) AS Max_Minutes_Sedentary,
AVG(SedentaryMinutes) AS Avg_Minutes_Sedentary,
MIN(LightlyActiveMinutes) AS Min_Minutes_Lightly_Active,
MAX(LightlyActiveMinutes) AS Max_Minutes_Lightly_Active,
AVG(LightlyActiveMinutes) AS Avg_Minutes_Lightly_Active,
MIN(FairlyActiveMinutes) AS Min_Minutes_Fairly_Active,
MAX(FairlyActiveMinutes) AS Max_Minutes_Fairly_Active,
AVG(FairlyActiveMinutes) AS Avg_Minutes_Fairly_Active,
MIN(VeryActiveMinutes) AS Min_Minutes_Very_Active,
MAX(VeryActiveMinutes) AS Max_Minutes_Very_Active,
AVG(VeryActiveMinutes) AS Avg_Minutes_Very_Active
From `analytics-demo-452102.Fitbit_Data.dailyActivity_merged`
Group BY Id
Activity Minutes
This was too much information to visualize all at once, so I ran another, more targeted query to only find the averages of each of these metrics, then created a visualization using Tableau:
SELECT
Id,
AVG(SedentaryMinutes) AS Avg_Minutes_Sedentary,
AVG(LightlyActiveMinutes) AS Avg_Minutes_Lightly_Active,
AVG(FairlyActiveMinutes) AS Avg_Minutes_Fairly_Active,
AVG(VeryActiveMinutes) AS Avg_Minutes_Very_Active
From `analytics-demo-452102.Fitbit_Data.dailyActivity_merged`
Group BY Id
You can see from chart that all users were sedentary the majority of the time, and sedentary OR lightly active for the overwhelming majority of the time.
Next I wanted to see the average number of minutes of each activity level per weekday.
This analysis confirms that sedentary time dwarfs all types of activity.
I was wondering how activity levels would vary on different days of the week, but the visualization below shows that they hardly vary at all, remaining very consistent throughout the week.
This shows an opportunity for Bellabeat to help users become more physically active.
However, before encouraging users to increase their daily activity, I wanted to see whether or not they were at an adequate activity level already. The CDC recommends 150 minutes per week of physical activity. Divided by seven, this works out to a goal of 21.4 minutes of activity per day, a pretty achievable goal.
First, I wanted to add up all activity intensities to see how many minutes per day our subjects were active at all, so I wrote this query:
SELECT
Id,
AVG(VeryActiveMinutes) + AVG(FairlyActiveMinutes) + AVG(LightlyActiveMinutes) AS Total_Avg_Active_Minutes
FROM `analytics-demo-452102.Fitbit_Data.dailyActivity_merged`
GROUP BY Id
ORDER BY Total_Avg_Active_Minutes DESC
Based on this metric, every single user greatly exceeded 21.4 minutes per day, with the minimum being 40.6. However, I was skeptical of “Lightly Active Minutes” meeting the CDC’s definition of physical activity at all.
The aforementioned CDC web page gives examples of the kind of activity they’re talking about, which includes water aerobics, doubles tennis, and riding a bike on level ground. My Fitbit data set doesn’t explain what types of exercise qualify as “lightly active,” but I suspect all the CDC’s examples would qualify as “fairly active”, not lightly.
I rewrote my query to only include very active and fairly active minutes, and reran it:
SELECT
Id,
AVG(VeryActiveMinutes) + AVG(FairlyActiveMinutes) AS Total_Avg_Active_Minutes
FROM `analytics-demo-452102.Fitbit_Data.dailyActivity_merged`
GROUP BY Id
ORDER BY Total_Avg_Active_Minutes DESC
Then I ran a query to simply report how many users hit the daily goal.
By this standard, 14 users, or 42%, did not meet the CDC guidelines for 150 minutes of physical activity per week. We should also keep in mind that 150 minutes is the suggested minimum, not the ideal amount of activity, and that Fitbit users are almost certainly more active than the average American.
All of this demonstrates plenty of room for improvement, highlighting a consumer need which Bellabeat could fulfill.
Step Count
In this article, based on the 2011 study by Tudor-Locke et. al. called “How many steps a day are enough? for adults,” Healthline divides individuals’ step count into six categories:
Basal Below 2,500
Limited 2,500–4,999
Low 5,000–7,499
Somewhat active 7,500–9,999
Active 10,000-12,499
Very active over 12,500
I wrote a query to divide the study participants into these six groups based on their average daily step count.
SELECT
Id,
AVG(TotalSteps) AS Avg_Total_Steps,
CASE
WHEN avg(TotalSteps) < 2500 THEN 'Basal'
WHEN avg(TotalSteps) BETWEEN 2500 AND 4999 THEN 'Limited'
WHEN avg(TotalSteps) BETWEEN 5000 AND 7499 THEN 'Low'
WHEN avg(TotalSteps) BETWEEN 7500 AND 9999 THEN 'Somewhat Active'
WHEN avg(TotalSteps) BETWEEN 10000 AND 12499 THEN 'Active'
WHEN avg(TotalSteps) >= 12500 THEN 'Very Active'
END User_Type
FROM `analytics-demo-452102.Fitbit_Data.dailyActivity_merged`
GROUP BY Id
ORDER BY Avg_Total_Steps
The results of this analysis were:
Very Active: 2 users (6.06%)
Active: 5 users (15.15%)
Somewhat Active: 9 users (27.27%)
Low: 9 users (27.27%)
Limited: 5 users (15.15%)
Basal: 3 users (9.09%)
The Tudor-Locke study famously recommended 10,000 steps per day as a fitness goal, which would include the Active and Very Active categories here.
Combining these, we can see that seven out of thirty three, or 21.21% of users, averaged 10,000 steps per day or above.
This puts 78.79% of users below the recommended number of steps per day, a deficit that Bellabeat could help to rectify.
Now I wanted to know if users had more average steps on certain days of the week:
I expected people to walk more on the weekend, so I was surprised to see that average steps remained fairly constant throughout the week.
Next I analyzed the steps per hour to find out what time of day users were most active.
The data covered several days, but I wanted the average number of steps taken each hour of the day.
SELECT
TIME(ActivityHour),
AVG(StepTotal) AS Avg_Steps_Per_Hour
FROM `analytics-demo-452102.Fitbit_Data.hourlySteps_merged`
GROUP BY TIME(ActivityHour)
Not surprisingly, users walked the most during the day, and very little from midnight to 5 am. The peak time with the most steps was noon to 8 pm.
Now I wanted to learn more about users’ sleep, and how it correlated to the other health factors of steps walked and calories burned.
SELECT a.Id,
AVG(a.TotalSteps) AS AvgSteps,
AVG(a.Calories) AS AvgCal,
AVG(s.TotalMinutesAsleep) AS AvgMinutesAsleep
FROM `analytics-demo-452102.Fitbit_Data.dailyActivity_merged` AS a
INNER JOIN `analytics-demo-452102.Fitbit_Data.sleepDay_merged` AS s ON a.Id=s.Id
GROUP BY a.Id
As you can see, users who slept between 5 and 8 hours per night tended to walk the most and burn the most calories. 7-8 hours of sleep per night is generally recommended, so it was somewhat surprising that these other health indicators, steps and calories, did not sharply spike over the 7-8 hour sleep range, but instead remained largely constant over a range beginning with only 5 hours of sleep a night.
Of course, this could be a consequence of the low number of users in the dataset. It would be interesting to run the same analysis on the larger dataset and see if the results remained the same.
Conclusions
1) Users did not log data consistently, especially weight and heart rate, with only 64% tracking daily activity every day
2) Users spent the overwhelming majority of their time sedentary, and were usually lightly active the rest of the time
3) Breakdown of different activity levels and step count was nearly identical every day of the week
4) 42% of users did not meet the modest CDC guidelines for 150 minutes of physical activity per week
5) Only seven out of thirty three, or 21.21% of users averaged the recommended 10,000 steps per day or above
6) Users walked the most steps per hour between noon to 8 pm
7) Users who got between 5 and 8 hours of sleep per night tended to walk the most and burn the most calories
Recommendations
The business task was to analyze FitBit usage data and draw conclusions that will be useful to Bellabeat in marketing fitness trackers, updating the Bellabeat app, and designing future products to maximize their sales and company growth. These are my recommendations to that effect based on the previous conclusions:
1) Bellabeat could make tracking and logging easier, or even automatic, to make up for users’ lapses, and avoid frustration among users tired of constantly logging data
2) To combat lengthy sedentary periods, Bellabeat fitness trackers could give users a notification when they’ve been sedentary for more than an hour
3) On average, users were no more active and got no more steps on weekends than on weekdays. The Bellabeat app could encourage users to be more active on weekends, when they presumably have more time to exercise
4) To encourage users to meet the CDC guidelines of 150 minutes of activity per week, gamification methods could be used on the app, similar to Duolingo, to leverage users’ sense of fun and competition
5) A social aspect to the app, in which users share and compare their steps per day with friends, could encourage more walking, while feeling fun to users rather than burdensome
6) Users’ peak walking period didn’t begin until noon. Via notifications, the app and tracker could encourage more physical activity in the AM hours
7) More steps per day was associated with healthy sleep habits, so Bellabeat products could be marketed to customers who struggle with with difficulty sleeping
Final Thoughts
Given the popularity of Fitbit, a relative upstart like Bellabeat would naturally need an edge to carve out a share of the smart fitness tracker market. However, Fitbit’s own user data, when carefully analyzed, presents a number of opportunities through which a company like Bellabeat could improve user experience, health outcomes, and fitness goal achievement.