Google Advanced Data Analytics · Courses 3–6 · End-to-End Pipeline
TikTok Claims vs Opinions
Full Analytics Pipeline
EDA · Hypothesis Testing · Logistic Regression · Random Forest · XGBoost · Feature Engineering · Classification
Core Finding: Engagement metrics near-perfectly separate claims from opinions
Videos that make claims generate dramatically higher engagement than opinion videos — more views, likes, shares, and downloads. This engagement pattern is so consistent that a Random Forest model classifies claim vs opinion videos with near-perfect accuracy (~100%). The model's most predictive features were all engagement-related: video view count, share count, download count — not video content or text.
Analysed the distribution of claims vs opinions, explored engagement metric distributions, identified outliers, and examined missing data patterns. Key finding: claim videos consistently show higher engagement across all metrics.
Avg Engagement — Claims vs Opinions
Claims drive significantly higher engagement across all metrics
Video Duration Distribution
Claims and opinions have similar duration profiles
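The engagement comparison behind these charts comes down to a groupby-and-aggregate step. A minimal sketch with made-up rows that mimic the dataset's schema (the column names are assumptions, not the project's exact ones):

```python
import pandas as pd

# Made-up rows standing in for the TikTok dataset (column names assumed)
df = pd.DataFrame({
    "claim_status": ["claim", "claim", "opinion", "opinion"],
    "video_view_count": [500_000, 320_000, 5_000, 2_500],
    "video_share_count": [25_000, 12_000, 120, 80],
})

# Mean engagement per class -- the grouping that drives the bar chart above
summary = df.groupby("claim_status")[["video_view_count", "video_share_count"]].mean()
```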
Two-sample t-test comparing mean video view counts between verified and unverified accounts. Significance level: 5%.
❌ Null Hypothesis (Rejected)
There is no difference in number of views between videos posted by verified vs unverified accounts.
✅ Alternative Hypothesis (Supported)
There is a statistically significant difference in mean view counts between verified and unverified accounts.
Result: t-statistic = 25.50, p-value ≈ 2.6 × 10⁻¹²⁰. The null hypothesis was rejected at the 5% significance level: unverified accounts have significantly higher mean view counts than verified accounts, suggesting behavioural differences such as clickbait content or bot-inflated views.
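The test itself is a one-liner with SciPy. This sketch uses synthetic view counts (the distributions are illustrative assumptions, not the project's data), and `equal_var=False` selects Welch's variant, which does not assume equal variances; whether the project used that variant is an assumption on my part:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic view counts standing in for the real data (illustrative only)
verified = rng.normal(80_000, 20_000, size=1_000)
not_verified = rng.normal(250_000, 60_000, size=1_000)

# Two-sample t-test at the 5% significance level (Welch's variant)
t_stat, p_value = stats.ttest_ind(not_verified, verified, equal_var=False)
reject_null = p_value < 0.05
```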
Built a logistic regression model to predict whether a TikTok account is verified, as an intermediate step toward the final claim classification model. Addressed class imbalance via upsampling and removed multicollinear features (video_like_count, r=0.86 with view count).
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Verified | 74% | 46% | 57% | 4,459 |
| Not Verified | 61% | 84% | 71% | 4,483 |
| Overall accuracy | | | 65% | 8,942 |
Key insight: each additional second of video duration is associated with a +0.009 increase in the log-odds of verified status. Performance was acceptable for an intermediate step; the final goal was the machine-learning claim classifier.
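The two data-preparation moves described above, upsampling the minority class and then fitting the classifier, can be sketched as follows. The frame below is a tiny synthetic stand-in; the column names (`video_duration_sec`, `verified_status`) are assumptions about the dataset schema:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Tiny synthetic stand-in for the real dataset (column names assumed)
df = pd.DataFrame({
    "video_duration_sec": [12, 24, 36, 48, 55, 60, 15, 42],
    "verified_status": ["not_verified"] * 6 + ["verified"] * 2,
})

# Upsample the minority class so both classes are equally represented
majority = df[df["verified_status"] == "not_verified"]
minority = df[df["verified_status"] == "verified"]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up])

# Fit the model; each coefficient is the change in log-odds of
# verified status per one-unit increase in that feature
X = balanced[["video_duration_sec"]]
y = (balanced["verified_status"] == "verified").astype(int)
clf = LogisticRegression().fit(X, y)
```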
Built and tuned Random Forest and XGBoost models using GridSearchCV to classify videos as claims or opinions. Both models achieved near-perfect performance — engagement metrics alone are sufficient to identify claim videos.
Model Performance Comparison
Precision, Recall, F1 across models (%)
Feature Importance — Random Forest
Engagement metrics dominate predictions
| Model | Precision | Recall | F1 | Accuracy | Champion? |
|---|---|---|---|---|---|
| Random Forest ★ | ~100% | ~100% | ~100% | ~100% | ✅ Champion |
| XGBoost | 99% | 99% | 99% | 99% | Close runner-up |
| Logistic Regression | 61–74% | 46–84% | 57–71% | 65% | Intermediate step |
Methodology & Skills Demonstrated
Framework
Google PACE (Plan → Analyze → Construct → Execute) applied across all 4 courses as a continuous case study building toward the final classification model.
Data Preparation
Class imbalance handled via upsampling. Multicollinearity detected and resolved (dropped video_like_count). Text length feature engineered from transcription data.
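The text-length feature mentioned above is a one-line derivation from the transcription column. A minimal sketch, where the column name `video_transcription_text` is an assumption about the dataset schema and the example strings are invented:

```python
import pandas as pd

# Hypothetical transcription column (name assumed from the dataset schema)
df = pd.DataFrame({
    "video_transcription_text": [
        "someone claimed that drone deliveries are already common",
        "in my opinion this trend is overrated",
    ],
})

# Character count of each transcription as a numeric feature
df["text_length"] = df["video_transcription_text"].str.len()
```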
Model Tuning
GridSearchCV used for hyperparameter tuning on both Random Forest and XGBoost. Champion model selected on F1 and recall, prioritising catching claim videos over minimising false alarms.
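The tuning step above can be sketched with scikit-learn's GridSearchCV. The feature matrix here is synthetic and the grid values are illustrative assumptions (the project's actual grid is not shown); scoring on recall reflects the stated priority of catching claim videos:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the engagement-feature matrix (illustrative only)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Small illustrative grid -- the project's actual grid values are assumed
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="recall",  # prioritise recall: catch as many claim videos as possible
    cv=3,
)
search.fit(X, y)
best_model = search.best_estimator_
```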
Business Context
TikTok's moderation team needs to prioritise user reports for claim-based content. A model that reliably flags claim videos reduces the backlog and allows human reviewers to focus where it matters.