Google Advanced Data Analytics · Course 7 Capstone

Employee Turnover Prediction
Salifort Motors

Dataset: 14,999 employees
Target: Binary (Left / Stayed)
Best Model: Random Forest
Tools: Python · scikit-learn · XGBoost
Topics: EDA · Logistic Regression · Decision Tree · Random Forest · XGBoost · Feature Importance · Classification
Employee Records: 14,999
Best Accuracy (RF): 96.2%
Best AUC (DT): 93.8%
Models Compared: 4

Core Finding: Employees are systematically overworked

Across all models, the same four features dominate turnover prediction: last evaluation score, number of projects, tenure, and the overwork flag. The data reveals two distinct high-risk groups: employees with too few projects (disengaged) and employees with too many (burned out). Neither extreme retains talent. The company's evaluation system rewards overwork, creating a perverse incentive structure that drives departures.

Model Comparison — Test Set Performance
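| Model               | Accuracy | Precision | Recall | F1 Score | AUC    | Note               |
|---------------------|----------|-----------|--------|----------|--------|--------------------|
| Logistic Regression | 83%      | 80%       | 83%    | 80%      | -      | Most interpretable |
| Decision Tree       | 96.2%    | 87.0%     | 90.4%  | 88.7%    | 93.8%  | Strong baseline    |
| Random Forest ★     | 96.2%+   | 87%+      | 90%+   | 88.7%+   | 93.8%+ | Best performer     |
| XGBoost             | ~96%     | ~87%      | ~90%   | ~88%     | ~93%   | Comparable to RF   |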
[Chart: Model Accuracy Comparison. Test set accuracy across all 4 models (%)]
[Chart: Feature Importance, Random Forest. Top predictors of employee departure]
[Chart: Precision vs Recall, All Models. Trade-off between false positives and false negatives]
[Chart: Turnover Rate by Number of Projects. U-shaped risk: both extremes drive departures]
Key Findings & Business Recommendations
🔴 Overwork is the primary driver
Employees working 200+ hours/month leave at high rates. High evaluation scores are disproportionately awarded to overworked employees, creating a perverse incentive. Recommend capping monthly hours and rebalancing evaluation criteria.
📊 Project load has a U-shaped risk
Both extremes carry elevated risk: employees with 2 projects tend to leave (disengaged), as do employees with 6–7 projects (burned out). The sweet spot is 3–5 projects. Recommend capping projects at 5 per employee.
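The U-shaped pattern can be checked by grouping on project count and averaging the binary target. The snippet below is a minimal sketch on toy rows invented for illustration, not the Salifort data; the column names `number_project` and `left` are assumptions.

```python
import pandas as pd

# Toy records for illustration only; these are NOT the Salifort data.
# Column names `number_project` and `left` are assumed.
df = pd.DataFrame({
    "number_project": [2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
    "left":           [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
})

# The mean of a 0/1 target within a group is that group's turnover rate.
turnover_by_projects = (
    df.groupby("number_project")["left"]
      .agg(employees="count", turnover_rate="mean")
)

print(turnover_by_projects)
```

Reporting the group size next to the rate helps avoid over-reading a high rate that comes from only a handful of employees.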
📅 4-year tenure is a critical inflection point
Employees at exactly 4 years show unusually high departure rates, possibly linked to promotion timelines. Recommend investigating promotion policies for this cohort specifically.
💬 Satisfaction score is a leading indicator
Self-reported satisfaction strongly predicts departure even when controlling for workload. Recommend regular pulse surveys and acting on results — not just collecting them.
Methodology
Framework
The project follows Google's PACE framework (Plan, Analyze, Construct, Execute): EDA first to understand distributions and correlations, then feature engineering (overwork flag, tenure buckets), then model building and comparison.
Feature Engineering
Created a binary overworked flag (average monthly hours > 175), tenure buckets, and interaction features. Removed features suspected of data leakage before final model training.
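As a rough illustration of the two named features, the sketch below derives an overworked flag and tenure buckets with pandas. Only the 175-hour threshold comes from the text; the column names, the bucket edges, and the sample rows are assumptions made for the example.

```python
import pandas as pd

OVERWORK_THRESHOLD = 175  # hours per month, as stated in the text

# Toy rows for illustration; column names are assumed.
df = pd.DataFrame({
    "average_monthly_hours": [150, 175, 176, 240, 310],
    "tenure": [2, 3, 4, 6, 10],
})

# Binary flag: 1 when average monthly hours strictly exceed the threshold.
df["overworked"] = (df["average_monthly_hours"] > OVERWORK_THRESHOLD).astype(int)

# Tenure buckets. These edges are hypothetical; the source does not
# state which cut points were used.
df["tenure_bucket"] = pd.cut(
    df["tenure"],
    bins=[0, 3, 5, float("inf")],  # intervals (0, 3], (3, 5], (5, inf)
    labels=["short (<=3y)", "mid (4-5y)", "long (6y+)"],
)

print(df)
```

The strict `>` comparison means an employee averaging exactly 175 hours is not flagged, matching the "> 175" wording.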
Evaluation Metrics
Accuracy, Precision, Recall, F1-Score, AUC-ROC. Prioritised Recall — in an HR context, missing a true leaver (false negative) is more costly than a false alarm.
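To make the recall priority concrete, the sketch below computes the threshold-based metrics from the confusion counts in plain Python. The labels are invented for illustration and `classification_metrics` is a hypothetical helper; scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities. AUC-ROC is omitted because it needs predicted probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = left)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # leavers caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # leavers missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stayers correctly cleared

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only: 4 true leavers, 6 stayers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case one of four leavers is missed, so recall is 0.75 even though accuracy is 0.80, which is why accuracy alone can understate the cost of false negatives.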
Dataset
14,999 employee records · 10 features · Binary target (left = 1/0) · Multinational vehicle manufacturer · Google Advanced Data Analytics Certificate capstone dataset.