This project builds a regression model that predicts the number of calories burned during exercise from input features such as age, weight, height, gender, and session duration. It also includes an A/B testing analysis that uses statistical methods such as hypothesis testing with the chi-square test and t-test, interpreted through p-values, to better understand the relationships between the features in the dataset.
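The chi-square test and t-test mentioned above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's own analysis code: `chi_square_2x2` and `welch_t_statistic` are hypothetical helper names, the 2x2 table is assumed to hold observed counts (for example, a made-up Gender vs. high/low workout-frequency split), and the t-test shown is Welch's unequal-variance variant.

```python
import math
from statistics import mean, variance


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of observed counts.

    No continuity correction is applied. With 1 degree of freedom the p-value
    has the closed form erfc(sqrt(chi2 / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def welch_t_statistic(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each mean; statistics.variance uses the n - 1 denominator.
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical counts and samples, for illustration only.
chi2, p = chi_square_2x2([[10, 20], [20, 10]])
t, df = welch_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"chi2={chi2:.3f}, p={p:.4f}; t={t:.3f}, df={df:.2f}")
```

In practice `scipy.stats.chi2_contingency(table, correction=False)` and `scipy.stats.ttest_ind(a, b, equal_var=False)` compute the same statistics and also return p-values; the closed-form `erfc` p-value used here only holds for a chi-square statistic with 1 degree of freedom.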
To get started, you'll need Python installed, along with the following libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn (imported as sklearn)
- xgboost
- Flask
- pickle (included in the Python standard library)
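A quick way to check that the required libraries are available is to look up each import name with `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. This is a small sketch: `missing_packages` is a hypothetical helper, the import names differ from the pip package names for scikit-learn (`sklearn`) and Flask (`flask`), `pickle` ships with Python, and `xgboost` is included because the model is built with XGBoost.

```python
import importlib.util

# Import names (not pip package names) of the libraries this project uses.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "xgboost", "flask", "pickle"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, `pip install pandas numpy matplotlib seaborn scikit-learn xgboost flask` installs the third-party packages.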
The dataset used for this project is provided as the CSV file "gym_mem.csv". It includes the following features:
- Age
- Gender
- Weight (kg)
- Height (m)
- Max_BPM
- Avg_BPM
- Resting_BPM
- Session_Duration (hours)
- Calories_Burned (the target variable the model predicts)
- Workout_Type
- Fat_Percentage
- Water_Intake (liters)
- Workout_Frequency (days/week)
- Experience_Level
- BMI
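Before training, the target column has to be separated from the features and the text columns encoded as numbers. The sketch below assumes the CSV headers match the feature names listed above exactly and that Gender and Workout_Type are the only non-numeric columns; `split_features_target` is a hypothetical helper, and one-hot encoding with `pd.get_dummies` is one reasonable choice, not necessarily the encoding this project uses.

```python
import pandas as pd

TARGET = "Calories_Burned"
# Assumption: these are the only non-numeric columns in gym_mem.csv.
CATEGORICAL = ["Gender", "Workout_Type"]


def split_features_target(df: pd.DataFrame):
    """Split a DataFrame into one-hot encoded features X and the target y."""
    y = df[TARGET]
    X = df.drop(columns=[TARGET])
    present = [col for col in CATEGORICAL if col in X.columns]
    # drop_first=True keeps k - 1 indicator columns per category (one level is the baseline).
    X = pd.get_dummies(X, columns=present, drop_first=True, dtype=int)
    return X, y
```

Usage: `X, y = split_features_target(pd.read_csv("gym_mem.csv"))`.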
The regression model is built with XGBoost, a gradient-boosted decision tree ensemble method.
The model is trained to minimize the mean squared error between the predicted and actual calories burned.
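For reference, the mean squared error over n test samples, with y_i the actual and ŷ_i the predicted calories burned, is given below, together with the regularized objective that XGBoost minimizes as described in its documentation. Each f_k is one regression tree, T is its number of leaves and w_j are its leaf weights; γ and λ correspond to XGBoost's `gamma` and `reg_lambda` parameters, whose values for this project are not stated here. With the squared-error loss (XGBoost's `reg:squarederror` objective) the loss term equals n times the MSE, so apart from the tree-complexity penalty Ω, minimizing the objective is equivalent to minimizing the MSE.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

\mathcal{L} = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i \right) + \sum_{k=1}^{K} \Omega\left( f_k \right),
\qquad \hat{y}_i = \sum_{k=1}^{K} f_k(x_i),
\qquad l\left( y_i, \hat{y}_i \right) = \left( y_i - \hat{y}_i \right)^2,
\qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
```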
On the test data the model achieves an R² score of about 0.996, meaning it explains roughly 99.6% of the variance in calories burned.
The model's performance is evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R²).
The results are visualized using matplotlib to show the correlation between the predicted and actual values.
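A predicted-vs-actual scatter plot of this kind can be drawn with matplotlib as sketched below. `plot_predicted_vs_actual` is a hypothetical helper, the dashed y = x line marks perfect predictions, and the Pearson correlation in the title is computed with `np.corrcoef`; the project's actual plot may be styled differently.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line to show plots interactively
import matplotlib.pyplot as plt
import numpy as np


def plot_predicted_vs_actual(y_true, y_pred):
    """Scatter predicted against actual values with a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_true, y_pred, alpha=0.5, s=12)
    # Points on this dashed line are perfect predictions.
    low = min(y_true.min(), y_pred.min())
    high = max(y_true.max(), y_pred.max())
    ax.plot([low, high], [low, high], color="red", linestyle="--", label="y = x")
    r = np.corrcoef(y_true, y_pred)[0, 1]
    ax.set_title(f"Predicted vs. actual (Pearson r = {r:.4f})")
    ax.set_xlabel("Actual Calories_Burned")
    ax.set_ylabel("Predicted Calories_Burned")
    ax.legend()
    return fig, ax
```

For example, `fig, ax = plot_predicted_vs_actual(y_test, model.predict(X_test))` followed by `fig.savefig("predicted_vs_actual.png")`, where `model`, `X_test` and `y_test` are placeholder names for the trained regressor and the held-out test split.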
The model's scores for the evaluated metrics are the following:
- MSE: 238.03062993528332
- MAE: 11.468836831783063
- RMSE: 15.428241310508573
- R²: 0.9961636614263105
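The four metrics listed above can be computed from the test-set targets and predictions with NumPy, as in this sketch; `regression_metrics` is a hypothetical helper, and `sklearn.metrics.mean_squared_error`, `mean_absolute_error` and `r2_score` return the same MSE, MAE and R² values. RMSE is the square root of MSE, which is consistent with the reported scores (√238.03 ≈ 15.43), and R² = 1 − SS_res / SS_tot is the fraction of the target's variance explained by the model.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R-squared for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    mae = float(np.mean(np.abs(residuals)))
    rmse = mse ** 0.5
    # R^2 compares the residual sum of squares with the spread around the mean.
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}
```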
Feel free to contribute to this project by submitting issues or pull requests.
Enhancements, bug fixes, and optimizations are very welcome!