
One-page case study

Global YouTube Statistics — EDA & Earnings Analysis

Exploratory data analysis of the 995 top YouTube creators in the Kaggle Global YouTube Statistics dataset. The analysis covers subscriber trends, earnings by channel type, and country distributions, and uses NLP to surface sentiment patterns and regression to model earnings.

Proof Points

995 creators
28 features
NLP + regression workflow

Challenges

  • Handling missing data across 28 features for 995 creators
  • Encoding inconsistency in international channel names (windows-1251)
  • High variance in earnings data skewing regression baselines
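The three challenges above map onto a short pandas cleaning pass. The sketch below is a minimal illustration, assuming a local copy of the Kaggle CSV and an earnings column named `highest_yearly_earnings` (an assumed header). Median imputation for numeric gaps, an "Unknown" label for categorical gaps, and a `log1p` transform for the skewed earnings are illustrative choices, not necessarily what this project used.

```python
import numpy as np
import pandas as pd


def load_raw(path: str) -> pd.DataFrame:
    # cp1251 is Python's codec name for windows-1251, the encoding this case
    # study names for international channel names. `path` is your local copy.
    return pd.read_csv(path, encoding="cp1251")


def missing_report(df: pd.DataFrame) -> pd.Series:
    # Share of missing values per column, listing only columns that have gaps.
    share = df.isna().mean()
    return share[share > 0].sort_values(ascending=False)


def clean(df: pd.DataFrame, earnings_col: str = "highest_yearly_earnings") -> pd.DataFrame:
    # `earnings_col` defaults to an assumed header; pass the real one if it differs.
    out = df.copy()

    # Numeric gaps -> column median (less sensitive to the heavy earnings tail
    # than the mean); categorical gaps -> an explicit "Unknown" label.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    cat_cols = out.select_dtypes(exclude="number").columns
    out[cat_cols] = out[cat_cols].fillna("Unknown")

    # log1p compresses right-skewed earnings so a few very large channels do
    # not dominate least-squares fits; log1p(0) == 0 keeps zero earners valid.
    out["log_earnings"] = np.log1p(out[earnings_col].clip(lower=0))
    return out


# Usage on a tiny hand-made frame (hypothetical values, not dataset rows):
toy = pd.DataFrame({
    "channel_type": ["Music", None, "Entertainment"],
    "uploads": [100.0, np.nan, 300.0],
    "highest_yearly_earnings": [0.0, 1e6, np.nan],
})
print(missing_report(toy))
print(clean(toy))
```

Imputing before the log transform keeps every row usable for regression. Dropping incomplete rows instead is a reasonable alternative when the gaps are few.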

Learnings

  • Entertainment and Music channels dominate top earnings despite fewer uploads
  • US, India, and Brazil account for the majority of top global creators
  • Subscriber count alone is a weak predictor of earnings — upload frequency matters more
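The learnings above can each be checked with a few lines of pandas and NumPy. This is a hedged sketch, not the project's actual analysis. The column names (`channel_type`, `Country`, `subscribers`, `uploads`, `highest_yearly_earnings`) and the country spellings are assumptions about the CSV headers and values. The predictor comparison uses single-feature R² from NumPy's `lstsq` on `log1p` earnings, swapped in for scikit-learn's `LinearRegression` so the sketch depends only on pandas and NumPy.

```python
import numpy as np
import pandas as pd


def median_earnings_by_type(df, type_col="channel_type", earn_col="highest_yearly_earnings"):
    # Median rather than mean: a handful of outlier channels dominate the mean.
    return df.groupby(type_col)[earn_col].median().sort_values(ascending=False)


def country_share(df, country_col="Country", countries=("United States", "India", "Brazil")):
    # Fraction of creators whose country is one of `countries` (assumed spellings).
    return df[country_col].isin(countries).mean()


def single_feature_r2(x, y):
    # R^2 of an ordinary least-squares line y ~ a*x + b.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()  # assumes y is not constant
    return 1.0 - ss_res / ss_tot


def compare_predictors(df, features=("subscribers", "uploads"), earn_col="highest_yearly_earnings"):
    # Fit each feature alone against log1p(earnings) to damp the earnings
    # variance; expects already-cleaned input with no missing values.
    target = np.log1p(df[earn_col].clip(lower=0))
    return {f: single_feature_r2(df[f], target) for f in features}
```

A higher R² for `uploads` than for `subscribers` on the cleaned data would be consistent with the third learning. One-feature fits ignore interactions between predictors, so a multivariate model is the natural follow-up.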

Stack

Python, Pandas, NumPy, Matplotlib, NLTK, TextBlob, scikit-learn