Core Finding: Employees are systematically overworked
Across all models, the same 4 features dominate turnover prediction: last evaluation score, number of projects, tenure, and overwork flag. The data reveals two distinct high-risk groups — employees with too few projects (disengaged) and employees with too many (burned out). Neither extreme retains talent. The company's evaluation system rewards overwork, creating a perverse incentive structure that drives departures.
Model Comparison — Test Set Performance
| Model | Accuracy | Precision | Recall | F1 Score | AUC | Note |
|---|---|---|---|---|---|---|
| Logistic Regression | 83% | 80% | 83% | 80% | — | Most interpretable |
| Decision Tree | 96.2% | 87.0% | 90.4% | 88.7% | 93.8% | Strong baseline |
| Random Forest ★ | 96.2%+ | 87%+ | 90%+ | 88.7%+ | 93.8%+ | Best performer |
| XGBoost | ~96% | ~87% | ~90% | ~88% | ~93% | Comparable to RF |
Model Accuracy Comparison
Test set accuracy across all 4 models (%)
Feature Importance — Random Forest
Top predictors of employee departure
Precision vs Recall — All Models
Trade-off between false positives and false negatives
Turnover Rate by Number of Projects
U-shaped risk — both extremes drive departures
Key Findings & Business Recommendations
🔴 Overwork is the primary driver
Employees working 200+ hours/month leave at high rates. High evaluation scores are disproportionately awarded to overworked employees, creating a perverse incentive. Recommend capping monthly hours and rebalancing evaluation criteria.
📊 Project load has a U-shaped risk
Both extremes are dangerous — employees with 2 projects leave (disengaged), and employees with 6–7 leave (burned out). The sweet spot is 3–5 projects. Recommend capping projects at 5 per employee.
📅 4-year tenure is a critical inflection point
Employees at exactly 4 years show unusually high departure rates, possibly linked to promotion timelines. Recommend investigating promotion policies for this cohort specifically.
💬 Satisfaction score is a leading indicator
Self-reported satisfaction strongly predicts departure even when controlling for workload. Recommend regular pulse surveys and acting on results — not just collecting them.
Methodology
Framework
Google PACE framework — Plan, Analyze, Construct, Execute. EDA first to understand distributions and correlations, then feature engineering (overwork flag, tenure buckets), then model building and comparison.
Feature Engineering
Created
overworked binary flag (avg monthly hours > 175), tenure buckets, and interaction features. Removed data leakage candidates before final model training.Evaluation Metrics
Accuracy, Precision, Recall, F1-Score, AUC-ROC. Prioritised Recall — in an HR context, missing a true leaver (false negative) is more costly than a false alarm.
Dataset
14,999 employee records · 10 features · Binary target (left = 1/0) · Multinational vehicle manufacturer · Google Advanced Data Analytics Certificate capstone dataset.