Project Overview
For my capstone project for the Google Advanced Data Analytics Certificate, I developed a machine learning solution for Salifort Motors to predict and reduce employee turnover. Using HR analytics, I identified the key factors behind employee departures and built a predictive model paired with personalized retention strategies.
Client
Salifort Motors (Case Study)
Tools Used
Python, scikit-learn, Pandas, Matplotlib, Seaborn
Team Size
1 data scientist
Business Problem
Salifort Motors, a leading automotive manufacturer, was experiencing a high employee turnover rate of 16.6%. Each lost employee cost the company between 50% and 200% of that employee's annual salary in recruitment, training, and lost productivity. The HR department needed to understand which employees were most likely to leave and why, so that it could take proactive steps to improve retention.
Dataset Description
I analyzed an HR dataset of 14,999 employee records with 10 variables:
- Satisfaction level (0-1 scale)
- Last performance evaluation score (0-1 scale)
- Number of projects assigned
- Average monthly working hours
- Years at company (tenure)
- Work accident history
- Promotion in last 5 years
- Department
- Salary level (low, medium, high)
- Whether the employee left (target variable)
The dataset was clean with no missing values but required feature engineering to capture complex relationships between variables.
Technical Approach
Data Exploration & Feature Engineering
- Created 10 engineered features to better capture employee behavior patterns:
  - work_intensity: Monthly hours divided by number of projects
  - relative_hours: Monthly hours relative to company average
  - overworked: Binary indicator for employees with above-average hours and high project count
  - poor_work_life_balance: Binary indicator for employees working excessive hours (>200/month)
  - productivity: Evaluation score per project
  - satisfaction_per_hour: Satisfaction level normalized by working hours
  - underutilized: High performers with few projects
  - burnout_risk: Combination of high hours, many projects, and low satisfaction
  - flight_risk: Low satisfaction despite high performance and tenure
  - high_performer_risk: High performers without recent promotion
- Uncovered clear bimodal distribution in satisfaction levels among employees who left
- Identified a striking U-shaped relationship between number of projects and turnover rate
- Discovered sharp differences in work patterns between employees who stayed versus left
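As an illustration of the feature engineering listed above, here is a minimal sketch of how six of the features could be derived with Pandas. It is not the project code: the column names, the choice of a ratio for relative_hours, and the "high project count" cut-off of 6 are assumptions made for the example. Only the 200 hours/month threshold comes from the description above.

```python
import pandas as pd

# Assumed cut-off for a "high project count"; the write-up does not state one.
HIGH_PROJECT_COUNT = 6


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with six illustrative engineered features added."""
    out = df.copy()
    hours = out["average_monthly_hours"]
    projects = out["number_project"]
    mean_hours = hours.mean()

    # Monthly hours divided by number of projects.
    out["work_intensity"] = hours / projects
    # Monthly hours relative to the company average (assumed to be a ratio).
    out["relative_hours"] = hours / mean_hours
    # Above-average hours combined with a high project count.
    out["overworked"] = (
        (hours > mean_hours) & (projects >= HIGH_PROJECT_COUNT)
    ).astype(int)
    # More than 200 hours per month.
    out["poor_work_life_balance"] = (hours > 200).astype(int)
    # Evaluation score per project.
    out["productivity"] = out["last_evaluation"] / projects
    # Satisfaction level normalized by working hours.
    out["satisfaction_per_hour"] = out["satisfaction_level"] / hours
    return out


# Three made-up employee records, used only to exercise the function.
sample = pd.DataFrame(
    {
        "satisfaction_level": [0.38, 0.80, 0.11],
        "last_evaluation": [0.53, 0.86, 0.88],
        "number_project": [2, 5, 7],
        "average_monthly_hours": [157, 262, 272],
    }
)
features = add_engineered_features(sample)
print(features[["work_intensity", "overworked", "poor_work_life_balance"]])
```

The remaining features (underutilized, burnout_risk, flight_risk, high_performer_risk) would follow the same pattern of combining boolean conditions, with thresholds chosen from the exploratory analysis.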
Data Preparation & Model Development
- Built a preprocessing pipeline that included:
  - Automated data quality checks for outliers and inconsistencies
  - Feature scaling with StandardScaler for numerical features
  - One-hot encoding for categorical variables
  - SMOTE oversampling to address class imbalance (16.6% positive class)
  - Feature selection to identify the most predictive attributes
- Split data into training (60%), validation (20%), and test (20%) sets
- Evaluated multiple classification algorithms:
  - Logistic Regression
  - Random Forest
  - XGBoost
- Conducted extensive hyperparameter tuning using RandomizedSearchCV and GridSearchCV
- Optimized classification threshold to maximize F1 score
- Analyzed learning curves to diagnose bias-variance tradeoff
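The split, preprocessing, and threshold steps above can be sketched with scikit-learn as follows. This is an illustrative example under stated assumptions, not the project pipeline: the data is synthetic, the column names and Random Forest settings are placeholders, and SMOTE is left out so that the example depends only on scikit-learn (when SMOTE is used, it should be fitted on the training split only, so that no synthetic samples leak into validation or test data).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the Salifort dataset); column names are assumed.
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame(
    {
        "satisfaction_level": rng.uniform(0, 1, n),
        "average_monthly_hours": rng.integers(96, 311, n),
        "salary": rng.choice(["low", "medium", "high"], n),
    }
)
# Toy target: low satisfaction or very long hours raise the chance of leaving.
p_leave = (
    0.05
    + 0.5 * (X["satisfaction_level"] < 0.3)
    + 0.3 * (X["average_monthly_hours"] > 260)
)
y = pd.Series((rng.uniform(0, 1, n) < p_leave.to_numpy()).astype(int), name="left")

# 60/20/20 split: hold out 40%, then halve it into validation and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

# Scale numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["satisfaction_level", "average_monthly_hours"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["salary"]),
    ]
)
model = Pipeline(
    [
        ("preprocess", preprocess),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ]
)
model.fit(X_train, y_train)

# Choose the threshold that maximizes F1 on the validation set.
val_proba = model.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, val_proba)
# precision and recall have one more entry than thresholds; drop the last one.
f1 = (
    2 * precision[:-1] * recall[:-1]
    / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
)
best_threshold = thresholds[np.argmax(f1)]

# Apply the chosen threshold once to the untouched test set.
test_pred = (model.predict_proba(X_test)[:, 1] >= best_threshold).astype(int)
test_f1 = f1_score(y_test, test_pred)
print(f"threshold={best_threshold:.2f}, test F1={test_f1:.2f}")
```

Tuning the threshold on the validation set rather than the test set keeps the final test score an honest estimate of performance on unseen employees.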
Satisfaction vs Evaluation by Turnover Status
Monthly Hours vs Number of Projects with turnover highlighted
Results & Insights
My analysis revealed distinct patterns in employee turnover and produced a highly accurate predictive model:
Key Findings
- Model Performance: The Random Forest model reached 98.4% accuracy and a 97% F1 score on the test data
- Primary Turnover Drivers:
  - Satisfaction level: By far the strongest predictor (23.6% importance)
  - Work overload: Employees with 7+ projects had nearly 100% turnover
  - Work underload: Employees with only 2 projects showed 55% turnover
  - Hour extremes: Both overworked (250+ hours) and underworked (130-150 hours) employees were at high risk
  - Tenure impact: Employees with five years of tenure had the highest turnover (45%), suggesting a critical career milestone
  - Unrecognized talent: High performers without a promotion showed 58% higher turnover risk
- Calibration analysis: Revealed model underconfidence in mid-range probabilities, which matters for risk stratification
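To show what the calibration finding means in practice, the sketch below runs scikit-learn's calibration_curve on synthetic scores that are deliberately pulled toward 0.5. The data and the shrink factor are assumptions chosen to reproduce the general shape of an underconfident model; they are not the project's predictions.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
n = 20_000

# Synthetic ground truth: each case has a true probability of leaving.
true_p = rng.uniform(0, 1, n)
y = (rng.uniform(0, 1, n) < true_p).astype(int)

# An underconfident model reports scores shrunk toward 0.5 (assumed factor 0.6).
reported = 0.5 + 0.6 * (true_p - 0.5)

# Observed turnover rate vs. mean predicted probability in each occupied bin.
prob_true, prob_pred = calibration_curve(y, reported, n_bins=10)
gap = prob_true - prob_pred

for predicted, observed in zip(prob_pred, prob_true):
    print(f"predicted {predicted:.2f} -> observed {observed:.2f}")
```

For an underconfident model, the observed rate sits above the predicted probability for scores over 0.5 and below it for scores under 0.5, so risk tiers built on raw probabilities can understate how decisive the model really is.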
Model-based feature importance showing key drivers of turnover
Learning curve showing model convergence and stability
Confusion matrix showing excellent classification performance
ROC curve with AUC of 0.98 indicating outstanding predictive power
Turnover Risk Prediction System
I developed a complete end-to-end prediction system that:
- Takes an employee's data as input
- Performs feature engineering and preprocessing
- Generates a turnover risk probability and classification (Low/Medium/High)
- Identifies specific risk factors for that employee
- Recommends personalized retention actions based on identified risk factors
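The last three steps of this flow can be sketched as a small rule layer on top of the model's probability. Everything specific here is hypothetical: the 0.30 and 0.60 tier cut-offs, the field names, the satisfaction and evaluation thresholds, and the wording of the actions are placeholders for illustration. The project and hour thresholds echo the findings and recommendations in this write-up.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical probability cut-offs for the three tiers (not from the project).
MEDIUM_CUTOFF = 0.30
HIGH_CUTOFF = 0.60

# Illustrative rules: (risk factor, test on the employee record, suggested action).
RULES: List[Tuple[str, Callable[[Dict], bool], str]] = [
    ("Project overload", lambda e: e["number_project"] >= 7,
     "Cap concurrent projects at 5"),
    ("Project underload", lambda e: e["number_project"] <= 2,
     "Review project allocation for underutilization"),
    ("Long hours", lambda e: e["average_monthly_hours"] > 200,
     "Trigger a workload review"),
    ("Low satisfaction", lambda e: e["satisfaction_level"] < 0.4,
     "Schedule a check-in with the manager"),
    ("High performer without promotion",
     lambda e: e["last_evaluation"] >= 0.8 and not e["promotion_last_5years"],
     "Include in the next promotion review"),
]


def risk_level(probability: float) -> str:
    """Map a turnover probability to Low, Medium, or High."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= HIGH_CUTOFF:
        return "High"
    if probability >= MEDIUM_CUTOFF:
        return "Medium"
    return "Low"


def assess(employee: Dict, probability: float) -> Dict:
    """Combine the model probability with rule-based risk factors and actions."""
    triggered = [(name, action) for name, test, action in RULES if test(employee)]
    return {
        "probability": probability,
        "risk_level": risk_level(probability),
        "risk_factors": [name for name, _ in triggered],
        "recommendations": [action for _, action in triggered],
    }


# Made-up employee record and probability, used only to exercise the function.
employee = {
    "satisfaction_level": 0.11,
    "last_evaluation": 0.88,
    "number_project": 7,
    "average_monthly_hours": 272,
    "promotion_last_5years": 0,
}
report = assess(employee, probability=0.92)
print(report["risk_level"], report["risk_factors"])
```

Keeping the rules in a plain list makes it easy for HR to review or adjust each risk factor and its recommended action without retraining the model.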
Sample output from the HR turnover prediction system showing personalized risk factors and recommendations
Recommendations & Business Impact
Strategic Recommendations
Based on my analysis, I developed five strategic recommendations to improve employee retention:
- Workload Management: Implement project caps to limit concurrent projects to a maximum of 5 per employee. Set up alerts for employees consistently working over 200 hours monthly and ensure project distribution avoids both overwork and underutilization.
- Career Development: Create clear advancement pathways, especially for employees approaching the 3-5 year tenure mark. Establish regular promotion reviews to ensure high performers aren't overlooked and develop lateral move opportunities when vertical promotion isn't available.
- Compensation Reviews: Conduct targeted salary audits focusing on high performers in low salary classifications. Develop clear compensation growth plans tied to performance metrics and ensure compensation is competitive, particularly in high-risk departments.
- Department-Specific Strategies: Create customized retention plans for departments with the highest turnover rates. Address department-specific issues through targeted interventions and management training.
- Prediction System Integration: Implement the turnover prediction model as a proactive management tool, establishing intervention protocols for employees flagged as high-risk and conducting quarterly risk assessments.
Expected Business Impact
Based on model projections and similar HR interventions, implementing these recommendations would likely result in:
- 20-30% reduction in turnover within 12 months
- Potential 15-20% reduction in annual recruitment and training costs
- 15% increase in employee satisfaction scores across high-risk departments
- 8% productivity improvement from better workload management
- Enhanced knowledge retention from keeping experienced employees
Technical Skills Demonstrated
This project showcased my abilities in:
- Advanced feature engineering: Creating meaningful derived features that capture complex patterns
- Machine learning pipeline development: Building end-to-end workflows with preprocessing, model training, and evaluation
- Classification modeling: Implementing and optimizing various algorithms for predictive accuracy
- Hyperparameter tuning: Systematically optimizing model parameters for best performance
- Class imbalance handling: Applying techniques to address unbalanced target distributions
- Data visualization: Creating insightful visualizations to communicate complex patterns
- Business analysis: Translating technical insights into actionable recommendations