Customer Satisfaction Prediction (Machine Learning Project)

Overview

This project applies multiple machine learning models to predict customer satisfaction scores using behavioral and interaction data.

The project explores two main questions:

Which model best predicts customer satisfaction?
Which customer behavior variables are most important in predicting satisfaction?

To answer these questions, five models were implemented and compared: three custom-built algorithms and two library-based models.

Dataset

The dataset used is the Customer Experience Dataset from Kaggle.

It contains 1,000 customer observations, with variables describing user interactions, behavior, and satisfaction.

In this project, the following variables are used.

Independent Variables (Features)

Num_Interactions
Feedback_Score
Products_Purchased
Products_Viewed
Time_Spent_on_Site

Dependent Variable (Target)

Satisfaction_Score

The goal is to predict Satisfaction_Score using these five behavioral features.

Data Preprocessing

Missing Values

Missing values are handled by filling them with the column mean.
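
A minimal sketch of mean imputation with pandas follows. The helper name fill_missing_with_mean and the tiny demo frame are illustrative assumptions, not the notebook's actual code or data.

```python
import numpy as np
import pandas as pd


def fill_missing_with_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaNs in numeric columns replaced by the column mean."""
    filled = df.copy()
    numeric_cols = filled.select_dtypes(include="number").columns
    # DataFrame.fillna with a Series fills each column using the value keyed by its name.
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())
    return filled


# Hypothetical demo data; the real values come from customer_experience_data.csv.
demo = pd.DataFrame(
    {
        "Num_Interactions": [1.0, np.nan, 3.0],
        "Feedback_Score": [4.0, 5.0, np.nan],
    }
)
filled_demo = fill_missing_with_mean(demo)
```

Here the missing Num_Interactions value becomes 2.0 and the missing Feedback_Score value becomes 4.5, the means of the observed entries in each column.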

Train/Test Split

The dataset is split into:

70% training data
30% testing data
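
The 70/30 split could be sketched with NumPy as below. Shuffling before the split and the fixed seed are assumptions; the README does not say how rows are assigned to each set.

```python
import numpy as np


def train_test_split_70_30(X, y, seed=0):
    """Shuffle row indices, then put the first 70% in train and the rest in test."""
    rng = np.random.default_rng(seed)  # seed is an assumption, used for reproducibility
    order = rng.permutation(len(X))
    n_train = int(round(0.7 * len(X)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Hypothetical demo arrays: 10 rows with 5 features each.
X_demo = np.arange(50, dtype=float).reshape(10, 5)
y_demo = np.arange(10, dtype=float)
X_train, X_test, y_train, y_test = train_test_split_70_30(X_demo, y_demo)
```

With the 1000 observations in the dataset, the same function would yield 700 training rows and 300 test rows.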

Feature Normalization

Z-score normalization is applied:

X_normalized = (X - mean) / std

The mean and standard deviation are calculated only from the training data and then applied to both the training and test datasets, so that no information from the test set leaks into preprocessing.
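
The fit-on-train, apply-to-both pattern can be sketched as follows. The function names and the zero-variance guard are assumptions added for illustration.

```python
import numpy as np


def fit_zscore(X_train):
    """Compute per-feature mean and standard deviation from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Assumption: replace a zero std with 1 so constant columns do not divide by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_zscore(X, mean, std):
    """Apply X_normalized = (X - mean) / std using previously fitted statistics."""
    return (X - mean) / std


# Hypothetical demo values.
X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_test = np.array([[3.0, 70.0]])

mean, std = fit_zscore(X_train)
X_train_norm = apply_zscore(X_train, mean, std)
X_test_norm = apply_zscore(X_test, mean, std)  # reuses the training statistics
```

After this step the training features have zero mean and unit variance, while the test features generally do not, which is expected.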

Models Implemented

Five models are implemented and compared.

1. Linear Regression (From Scratch)

A multiple linear regression model (five features, one target) implemented manually using gradient descent.

The model iteratively updates parameters until the loss improvement falls below a threshold.

It serves as the baseline against which the other models are compared.
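
One way such a training loop might look is sketched below. The learning rate, tolerance, iteration cap, and mean-squared-error loss are assumptions; the README only states that training stops once the loss improvement falls below a threshold.

```python
import numpy as np


def fit_linear_regression_gd(X, y, lr=0.1, tol=1e-10, max_iter=10000):
    """Fit y ~ theta[0] + X @ theta[1:] by batch gradient descent on the MSE loss."""
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])  # prepend a column of ones for the intercept
    theta = np.zeros(d + 1)
    prev_loss = np.inf
    for _ in range(max_iter):
        residual = Xb @ theta - y
        loss = np.mean(residual ** 2)
        # Stop once the improvement over the previous iteration is below the threshold.
        if prev_loss - loss < tol:
            break
        prev_loss = loss
        theta -= lr * (2.0 / n) * (Xb.T @ residual)  # gradient of the MSE loss
    return theta


# Hypothetical noise-free demo: y = 1.5 + 2.0 * x0 - 3.0 * x1.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 1.5 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
theta = fit_linear_regression_gd(X_demo, y_demo)
```

On normalized features a fixed learning rate of this size usually converges quickly; unscaled features would typically need a much smaller step.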

2. K-Nearest Neighbors Regression (From Scratch)

A custom implementation of KNN regression.

Steps:

Compute Euclidean distance between test and training samples
Select the k nearest neighbors
Predict using the average of their target values

In this project:

k = 3
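
The three steps above can be sketched in a few lines of NumPy. This is a brute-force version written for clarity, not necessarily how the notebook implements it.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Predict each test target as the mean target of its k nearest training points."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diffs = X_test[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k smallest distances for each test row.
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Hypothetical one-feature demo.
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([1.0, 2.0, 3.0, 100.0])
preds = knn_predict(X_train, y_train, np.array([[1.1]]))
```

For the query point 1.1, the three nearest training points are 1.0, 2.0, and 0.0, so the prediction is the mean of their targets (2, 3, and 1), which is 2.0.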

3. Polynomial Regression

Polynomial regression is implemented with scikit-learn.

PolynomialFeatures with degree 2 is used to allow the model to capture nonlinear relationships between variables.
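
A plausible setup is sketched below. Pairing PolynomialFeatures with LinearRegression in a pipeline is an assumption; the README names only PolynomialFeatures and the degree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Degree-2 expansion adds squares and pairwise products of the input features.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# Hypothetical demo with a purely quadratic target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 1.0 + X_demo[:, 0] ** 2 + X_demo[:, 0] * X_demo[:, 1]

poly_model.fit(X_demo, y_demo)
demo_pred = poly_model.predict(X_demo)
n_expanded = poly_model.named_steps["polynomialfeatures"].n_output_features_
```

With the five features used in this project, a degree-2 expansion produces 21 columns: 1 bias term, 5 linear terms, 5 squares, and 10 pairwise products.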

4. Decision Tree (From Scratch)

A regression decision tree implemented manually.

The splitting rule is based on variance reduction, where the algorithm selects the feature and threshold that reduce the variance of the target variable the most.

This model is also used to compute feature importance.
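
The variance-reduction rule for a single node can be sketched as an exhaustive search over features and thresholds. Using midpoints between consecutive unique values as candidate thresholds is an assumption; the notebook may enumerate candidates differently.

```python
import numpy as np


def best_split(X, y):
    """Return (feature index, threshold, variance reduction) of the best single split."""
    n, d = X.shape
    parent_var = y.var()
    best = (None, None, 0.0)
    for j in range(d):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive sorted unique values.
        thresholds = (values[:-1] + values[1:]) / 2.0
        for t in thresholds:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            # Size-weighted average of the child variances.
            child_var = (len(left) * left.var() + len(right) * right.var()) / n
            gain = parent_var - child_var
            if gain > best[2]:
                best = (j, t, gain)
    return best


# Hypothetical demo: feature 0 separates the targets perfectly, feature 1 is noise.
X_demo = np.array([[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]])
y_demo = np.array([10.0, 10.0, 20.0, 20.0])
feature, threshold, gain = best_split(X_demo, y_demo)
```

In this demo the best split is on feature 0 at threshold 2.5, which removes all of the parent variance (25.0). A full tree would apply the same search recursively to each child node.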

5. XGBoost Regression

A gradient boosting model implemented using XGBoost.

Key parameters include:

n_estimators = 500
learning_rate = 0.03
max_depth = 3
subsample = 0.8
colsample_bytree = 0.8

XGBoost is used both for prediction performance comparison and feature importance analysis.
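
The listed parameters map directly onto the constructor of xgboost.XGBRegressor. The sketch below assumes the scikit-learn style wrapper is used; the helper name build_xgb_model is hypothetical.

```python
# Parameters as listed above.
XGB_PARAMS = {
    "n_estimators": 500,
    "learning_rate": 0.03,
    "max_depth": 3,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}


def build_xgb_model(params=XGB_PARAMS):
    """Create an unfitted XGBRegressor configured with the given parameters."""
    # Imported here so this snippet can be loaded even where xgboost is not installed.
    from xgboost import XGBRegressor

    return XGBRegressor(**params)


# Typical usage, assuming normalized training arrays X_train and y_train exist:
#   model = build_xgb_model()
#   model.fit(X_train, y_train)
#   importances = model.feature_importances_  # one value per input feature
```

The small learning rate combined with many shallow trees is a common way to trade training time for smoother, less overfit ensembles.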

Model Evaluation

Model performance is evaluated using Root Mean Squared Error (RMSE).

RMSE = sqrt(mean((y - y_hat)^2))

Lower RMSE indicates better prediction performance.

Both training RMSE and testing RMSE are compared for all models.
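
The RMSE formula above translates directly into NumPy. The function name rmse is an illustrative choice.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors of 3 and 4 give squared errors 9 and 16, mean 12.5, so RMSE = sqrt(12.5).
example = rmse([3.0, 5.0], [0.0, 1.0])
```

Calling the same function on training and test predictions gives the two values compared for each model; a training RMSE well below the test RMSE is a sign of overfitting.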

Results

Model Performance

The results show that:

Linear Regression performs best on the test dataset
Polynomial Regression performs similarly but slightly worse
KNN shows some overfitting (a gap between its training and test RMSE)
Decision Tree performs moderately
XGBoost provides competitive predictive performance

Feature Importance

Feature importance is evaluated using two tree-based models:

Decision Tree
XGBoost

Both models suggest that the most important predictor of customer satisfaction is:

Time_S
