One-page case study
Global YouTube Statistics — EDA & Earnings Analysis
Exploratory data analysis on 995 top YouTube creators using the Kaggle Global YouTube Statistics dataset. Uncovered subscriber trends, earnings by channel type, country distributions, and sentiment patterns using NLP and regression.
Proof Points
995 creators
28 features
NLP + regression workflow
Challenges
- Handling missing data across 28 features for 995 creators
- Encoding inconsistency in international channel names (windows-1251)
- High variance in earnings data skewing regression baselines
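A minimal sketch of how these three challenges might be handled in Pandas. It is not the project's actual code: the column names, the toy rows, the median imputation, and the log transform are all assumptions chosen for illustration, and a small in-memory CSV stands in for the Kaggle file.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for the Kaggle CSV. Column names and values are
# hypothetical and are NOT taken from the real dataset.
raw_text = (
    "channel,channel_type,subscribers,uploads,yearly_earnings\n"
    "Музыка ТВ,Music,1000000,120,50000\n"
    "Demo Channel,Entertainment,2500000,,900000\n"
    "Sample Games,Games,,800,12000\n"
)
# "cp1251" is Python's codec name for windows-1251.
raw_bytes = raw_text.encode("cp1251")

# Decode with an explicit encoding so non-Latin channel names survive.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="cp1251")

# Count missing values per column before deciding how to treat them.
missing_counts = df.isna().sum()

# Assumed strategy: fill numeric gaps with the column median, which is
# less sensitive to extreme values than the mean.
numeric_cols = ["subscribers", "uploads", "yearly_earnings"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Assumed strategy: log1p compresses the long right tail of earnings
# so that a few very large values do not dominate a regression fit.
df["log_earnings"] = np.log1p(df["yearly_earnings"])

print(missing_counts)
print(df[["channel", "uploads", "log_earnings"]])
```

Median imputation and a log transform are common defaults for skewed numeric data; whether they suit the real dataset depends on why values are missing and how earnings are distributed.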
Learnings
- Entertainment and Music channels dominate top earnings despite fewer uploads
- The US, India, and Brazil account for the majority of top global creators
- Subscriber count alone is a weak predictor of earnings; upload frequency matters more
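One way to compare predictors like this is to fit a single-feature regression for each candidate and compare R² scores. The sketch below uses scikit-learn on synthetic data that is deliberately built so that uploads drive earnings; it only illustrates the comparison method and says nothing about the real dataset. The variable names, the log scale, and the helper `single_feature_r2` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Synthetic data (NOT the Kaggle dataset), constructed so that
# uploads explain most of the variation in earnings.
log_subscribers = rng.normal(16.0, 0.5, n)
log_uploads = rng.normal(6.0, 1.0, n)
noise = rng.normal(0.0, 0.5, n)
log_earnings = 1.0 * log_uploads + 0.1 * log_subscribers + noise


def single_feature_r2(feature, target):
    """Fit a one-variable linear regression and return its in-sample R^2."""
    X = feature.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
    model = LinearRegression().fit(X, target)
    return model.score(X, target)


r2_subscribers = single_feature_r2(log_subscribers, log_earnings)
r2_uploads = single_feature_r2(log_uploads, log_earnings)

print(f"R^2 using subscribers only: {r2_subscribers:.3f}")
print(f"R^2 using uploads only:     {r2_uploads:.3f}")
```

In-sample R² is the simplest comparison; on real data a held-out split or cross-validation would give a more honest estimate.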
Stack
Python, Pandas, NumPy, Matplotlib, NLTK, TextBlob, scikit-learn