Analysis of Cyclistic Bikeshare’s User Data

  • 2 months ago
  • Abdulmalik Mohammed

Background

Cyclistic, a bike-share company launched in 2016 has now grown to a fleet of 5,824 bicycles and 692 stations across Chicago. Users can be grouped into “casual riders” (purchase single-ride or full-day passes) and “annual members”.

The company’s finance analysts concluded that annual members generate more profit. The director of marketing, Lily Moreno believes that an ad campaign to convert casual riders to annual members would be optimum, as they already aware of and have chosen Cyclistic for their mobility needs.

I have been tasked with answering the question “How do annual members and casual riders use Cyclistic bikes differently?”

Business Task

How do casual riders and annual members use Cyclistic bikes differently?

Preparing and Cleaning Data for Analysis

To answer this question, I collected the data from January to December 2023 from here. The data has been made available by Motivate International Inc. under this license. Data privacy prevents me from accessing any of the users’ personal information. This historical data from the previous year is reliable, original, comprehensive, cited and current thus making it suitable for this analysis.

Preparation

I chose RStudio to work with the data, as it is best equipped to work with datasets as large as these. The 12 data sets contain 13 columns and about 160,000 to 780,000 rows. Some of the columns had datetime values stored as characters so I saved all the data sets as excel workbooks before uploading to Rstudio and merging them into a data frame trip_data

library(tidyverse)
library(readxl)

jan_2023 <- read_excel("jan_2023.xlsx")
feb_2023 <- read_excel("feb_2023.xlsx")
mar_2023 <- read_excel("mar_2023.xlsx")
apr_2023 <- read_excel("apr_2023.xlsx")
may_2023 <- read_excel("may_2023.xlsx")
jun_2023 <- read_excel("jun_2023.xlsx")
jul_2023 <- read_excel("jul_2023.xlsx")
aug_2023 <- read_excel("aug_2023.xlsx")
sep_2023 <- read_excel("sep_2023.xlsx")
oct_2023 <- read_excel("oct_2023.xlsx")
nov_2023 <- read_excel("nov_2023.xlsx")
dec_2023 <- read_excel("dec_2023.xlsx")

trip_data <- jan_2023 %>%
  bind_rows(feb_2023) %>%
  bind_rows(mar_2023) %>%
  bind_rows(apr_2023) %>%
  bind_rows(may_2023) %>%
  bind_rows(jun_2023) %>%
  bind_rows(jul_2023) %>%
  bind_rows(aug_2023) %>%
  bind_rows(sep_2023) %>%
  bind_rows(oct_2023) %>%
  bind_rows(nov_2023) %>%
  bind_rows(dec_2023)

The merged data has the same number of columns and 5,693,156 rows. It contains information such as ride IDs, start and end station names and longitude and latitude of start and end stations

Processing 

Below is a summary of all the steps taken in the data cleaning and manipulation processes. The code can be viewed in my GitHUB repository.

  • I viewed the column names and changed the heading of the member_casual column to user_type.
  • I dropped columns that were irrelevant to the analysis i.e the end_station_id, start_station_id, end_lng, end_lat, start_lng and start_lat columns.
  • I checked the dataset for null values with the apply() function. I discovered there were 848995 records and 916658 records missing from the start and end station names respectively. These records were removed as the names are necessary for the analysis.
  • I checked for any spelling errors in the rideable_type and user_type columns with unique(). No inconsistencies were found in either column.
  • To begin answering the business task, I created a column ride_length which contains the duration of each ride in minutes. I created a start_hour column to show the time of day for each ride
  • I checked the minimum, maximum and average trip durations. The minimum was a negative value and the maximum was way too long for a trip. I removed all the outliers from the data set by calculating the interquartile range.
  • I included the columns weekday and weekdaynum to show what day of the week a ride took place. I also included a month column to better aggregate the data.

The cleaned data frame now contains 3,931,001 rows and 12 columns

Analysis

In the Analysis phase of my work, I calculated summary statistics that would help answer the business question. Some of which were:

  • Calculating the total number of trips taken by casual riders and annual members in 2023, then the average duration of their trips in minutes as a group and for each day. I also calculated bike type usage for each group.
  • Calculating the number of trips for each day at the different times as I wanted to see what days and at what times of the day each group rode more.
  • Calculating the number of trips taken by each user type for the different months to see the monthly trend.
  • Finally I calculated the number of trips per start and end station to see the stations with the most takeoffs and those most visited by each group.

Date Visualisation

For my data Visualisation I exported the results from my analysis as csv files and used Tableau Public Desktop to create a dashboard to display my findings.

Tableau Dashboard

Insights and Recommendations

Insights

  • As expected, Cyclilstic is mostly used by annual members as they made up 67.34% of the total trips for the past year. Both user groups enjoy Classic bikes the most but only casual users make use of Docked bikes. Despite annual members making up more than 50% of the customer base, casual users still take longer trips on average, especially on weekends

 

Users’ Bike Preferences

 

 

Daily Average Trip Duration

 

  • Annual members typically use Cyclistic most around 7 to 8 AM and 4 to 5 PM. This suggests they are using the service mostly to commute to work and then back home. Casual riders however only have peak usage around 5PM which suggests they ride mostly in their leisure time. Both users groups have a surge of trips in the summer (June to August)

 

Trips per Time of Day (24-hour clock)

 

 

 

 

 

Monthly Usage

 

 

 

  • From the start station analysis, the top stations used by casual members are as shown below.

Top 5 Start Stations (Casual riders)

Recommendations

From the findings above, here are some recommendations that could prove useful in converting casual riders to annual members

  • Cyclistic can offer a subscription plan with some benefits to users who go on evening trips, weekend trips and also take longer rides.
  • Marketing efforts should be increased in the top stations used by the casual riders in the summer to showcase the new subscription offer.

Thank you for reading! Stay tuned for the next data journey.

 

References

Eyitayo, O. 2023. Google Data Analytics Capstone Project: Cyclistic Bike-Share Case Study