Turtle Games - Data Analytics Portfolio

William MacDonald

Executive Summary

This analysis provides strategic insights to boost Turtle Games' sales performance through data-driven decision-making. Key findings include:

These insights can drive significant improvements in customer engagement, loyalty, and overall sales performance.

Introduction

Turtle Games, a leading game manufacturer and retailer, aims to enhance its sales performance through data-driven strategies. This analysis addresses four key areas:

Loyalty Points Modelling

I developed and compared various regression models to predict customer loyalty points, a key indicator of customer engagement and potential sales.

Methodology

After data cleaning and normalisation, I tested several regression models:

Results

Gradient Boosting with cross-validation emerged as the best-performing model, achieving an R-squared value of 98% and a Mean Squared Error (MSE) of 33,311. The Support Vector Regression (SVR) model followed closely with an MSE of 36,593.
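
For readers less familiar with the two metrics quoted above, the following minimal sketch shows how MSE and R-squared are calculated. The loyalty point values are made up for illustration and are not taken from the Turtle Games data, and the function names are illustrative rather than those used in the analysis.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return 1 - ss_res / ss_tot


# Made-up loyalty point values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MSE: {mse:.1f}, R-squared: {r2:.3f}")
```

Because MSE is expressed in squared units, the reported MSE of 33,311 corresponds to a root mean squared error of roughly 183 loyalty points.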

Feature Importance

Spending score and remuneration were identified as the most significant predictors of loyalty points. While other variables had minimal individual impact, their collective contribution improved overall model accuracy.

Customer Segmentation

I employed K-means clustering to segment customers based on their spending patterns and remuneration levels. This segmentation allows for more targeted marketing strategies.

Methodology

K-means clustering is an unsupervised machine learning technique that groups similar data points into clusters. I used the elbow and silhouette methods to determine the optimal number of clusters.
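
The sketch below illustrates how the elbow and silhouette methods compare candidate values of k. It is a pure-Python illustration under stated assumptions, not the implementation used in this analysis: the points are made-up (remuneration, spending score) pairs, the farthest-point seeding is chosen only to make the run repeatable, and all function names are hypothetical.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    # Seed with the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (the elbow-method quantity)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up (remuneration, spending score) pairs forming three obvious groups.
points = [
    (10, 10), (12, 11), (11, 13),
    (50, 50), (52, 49), (51, 52),
    (90, 10), (91, 12), (89, 11),
]

results = {}
for k in range(2, 6):
    labels, centroids = kmeans(points, k)
    results[k] = (inertia(points, labels, centroids), silhouette(points, labels))
    print(f"k={k}: inertia={results[k][0]:.1f}, silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("Best k by silhouette:", best_k)
```

Inertia always falls as k increases, so the elbow method looks for the point where the decrease levels off, while the silhouette method simply picks the k with the highest mean score.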

Cluster Analysis

Key Insights and Recommendations

SVR Model and Cluster Analysis Visualisation

This visualisation combines the SVR model for loyalty points prediction with the K-means clustering analysis, providing a comprehensive view of customer segments and their predicted loyalty.

Sentiment Analysis

I analysed customer reviews to gain insights into product perception and overall customer satisfaction. This information is crucial for product development and marketing strategies.

Methodology

I cleaned the review data and compared two popular Natural Language Processing (NLP) libraries: TextBlob and VADER (Valence Aware Dictionary and sEntiment Reasoner).

VADER classified a higher proportion of reviews as positive than TextBlob did. After manually inspecting the reviews on which the two tools disagreed, I chose VADER because it identified sentiment more accurately.
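
The comparison described above can be sketched as follows. To keep the example self-contained it does not call either library: the review texts and the scores stored under "textblob" and "vader" are made-up stand-ins for each tool's polarity output. The 0.05 cutoff follows the convention commonly used for VADER's compound score; applying the same cutoff to TextBlob polarity is an assumption made here for simplicity.

```python
def label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical reviews with made-up scores standing in for each tool's output.
reviews = [
    {"text": "Great fun for the whole family", "textblob": 0.55, "vader": 0.80},
    {"text": "Not bad at all", "textblob": -0.35, "vader": 0.43},
    {"text": "Pieces were missing from the box", "textblob": -0.20, "vader": -0.30},
    {"text": "It is a board game", "textblob": 0.00, "vader": 0.00},
]


def positive_share(rows, key):
    """Fraction of reviews that the given tool labels as positive."""
    return sum(label(r[key]) == "positive" for r in rows) / len(rows)


# Reviews where the two tools disagree are the ones worth reading by hand.
divergent = [r["text"] for r in reviews if label(r["textblob"]) != label(r["vader"])]

print("TextBlob positive share:", positive_share(reviews, "textblob"))
print("VADER positive share:", positive_share(reviews, "vader"))
print("Reviews to inspect manually:", divergent)
```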

Key Findings

Product Recommendations

By combining insights from the loyalty points modelling, customer segmentation, and sentiment analysis, I developed personalised product recommendations for each customer cluster.

Overall Product Performance

This scatter plot shows the overall performance of products based on sentiment and popularity, helping identify top-performing and underperforming products.
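
One way to read such a plot is to split it into quadrants. The sketch below assumes, purely for illustration, that sentiment is a mean score per product, that popularity is a review count, and that the median of each measure is the dividing line; none of these definitions, nor the product data, comes from the original analysis.

```python
from statistics import median

# Hypothetical products: (product id, mean sentiment score, review count).
products = [
    ("A", 0.80, 120),
    ("B", 0.75, 15),
    ("C", 0.10, 110),
    ("D", 0.05, 10),
]

# Assumed dividing lines: the median of each measure across all products.
sentiment_cut = median(s for _, s, _ in products)
popularity_cut = median(n for _, _, n in products)


def quadrant(sentiment, popularity):
    """Place a product in one of four performance quadrants."""
    high_sentiment = sentiment >= sentiment_cut
    high_popularity = popularity >= popularity_cut
    if high_sentiment and high_popularity:
        return "top performer"
    if high_sentiment:
        return "well liked but niche"
    if high_popularity:
        return "popular but poorly rated"
    return "underperformer"


segments = {pid: quadrant(s, n) for pid, s, n in products}
for pid, name in segments.items():
    print(pid, "->", name)
```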

Top 10 Recommended Products

These are the top 10 products recommended across all customer segments.

Cluster-Specific Recommendations

Cluster 0: High Income, Low Spenders

Recommend the top products from Cluster 4, as the two clusters have similar remuneration but differing spending scores.

Cluster 1: Low Income, Low Spenders

Recommend the top products from Cluster 3, as the two clusters have similar remuneration but differing spending scores.

Cluster 2: Moderate Income and Spending

Suggest a mix of mid-range products to maintain consistency. Recommend the top-rated products found within this cluster.

Cluster 3: Low Income, Moderate Spenders

Recommend top products found within this cluster.

Cluster 4: High Income, High Spenders

Recommend top products found within this cluster.
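
The cluster rules above reduce to a lookup from each cluster to the cluster whose top products it should be shown. The sketch below encodes that mapping; the product IDs, scores, and the function name are hypothetical placeholders, and the original analysis may rank products differently.

```python
# For each customer cluster, the cluster whose top products are recommended.
# Clusters 0 and 1 borrow from the higher-spending cluster with similar
# remuneration; clusters 2, 3 and 4 use their own top products.
RECOMMEND_FROM = {0: 4, 1: 3, 2: 2, 3: 3, 4: 4}

# Hypothetical (product id, score) pairs per source cluster; higher ranks first.
product_scores = {
    2: [("P201", 0.61), ("P202", 0.74), ("P203", 0.58)],
    3: [("P301", 0.82), ("P302", 0.67), ("P303", 0.90)],
    4: [("P401", 0.88), ("P402", 0.93), ("P403", 0.71)],
}


def recommend(cluster, n=2):
    """Return the top-n product ids to recommend to a customer cluster."""
    source = RECOMMEND_FROM[cluster]
    ranked = sorted(product_scores[source], key=lambda item: item[1], reverse=True)
    return [product_id for product_id, _ in ranked[:n]]


for cluster in sorted(RECOMMEND_FROM):
    print(f"Cluster {cluster}: {recommend(cluster)}")
```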