Tube Of Plenty: Analyzing YouTube’s First Decade

Table Of Contents:

Introduction
The data (and a disclaimer)
The growth of YouTube
Peak YouTube?
Measuring interest: what are we (not) watching?
How popular is my video or channel?
There’s lopsided interest…but is it the Power Law?
Is this normal?
Double-checking the fit
Familiarity breeds likes and dislikes: views versus other popularity metrics
Does size matter?
Aging like wine…or milk?
Categorically popular
Wordiness: titles, descriptions, and tags
Things that maybe, possibly should be random: the first characters of titles and popularity
Things that definitely should be random: YouTube video IDs
So many cameras: where people are recording videos
Timing is…everything?
Projects that were inspired by this study
Methodology: Introduction
Methodology: Constructing the panel
Methodology: More random, please
Methodology: Choosing the order
Methodology: On the use of 4-character strings
Methodology: Collecting the data
Methodology: R
Methodology: Fitting distributions
Methodology: Programming notes
Methodology: Zeros
References

Introduction

It all began with this video:

This 18-second masterpiece, posted by company co-founder Jawed Karim on April 23, 2005, was the first video on YouTube. “Me at the zoo” kicked off a chain reaction that feels like some combination of the invention of the printing press, fast food, and the VCR. On the distribution side, YouTube made it possible for anybody to post a video that everybody could see. On the consumption side, you can now watch exactly what you want, when you want. Everything on the spectrum of culture, from highbrow to lowbrow, is on there: if YouTube were a restaurant, it would serve both greasy burgers and the fanciest haute cuisine. And as for its place in everyday culture, I can’t tell you the number of times I’ve heard someone start a conversation with “So I was watching this video on YouTube…”

A decade later, on-demand video is changing our lives beyond mere instant gratification. “I learned that on YouTube” is another phrase I’ve heard countless times. The educational value of video is democratizing how we help each other. Everybody is good at something: now, with only a cheap camera and a few clicks, you can teach other people what you know. Personally, I’ve learned how to reset the “check engine” light on my car, how to replace the hard drives in a variety of computers, how to locate and charge the battery on my motorcycle, how to make the “special sauce,” and much more.

That all sounds very high-minded, but you’re probably wondering what prompted this study. Was it to honor the milestone of the first decade of this cultural juggernaut? No. Truthfully…it’s a Korean rapper who dresses ironically in formalwear. Yep, I’m a fan of PSY’s Gangnam Style. Recently, I checked in on this mega-hit and found that it had over 2 billion views. Whoa. That figure blew my mind, but there was a secondary explosion when I learned it was about double the runner-up. Contemplating Gangnam Style’s huge success dovetailed nicely with some ideas that have been floating around in my head. Lately, I’ve been seeing a lot of references to the Power Law and the impact of unexpected outliers, both in Taleb’s brilliant The Black Swan and in a variety of articles about the ins and outs of angel investing.

I did some searching and I’m definitely not the first person to think that the Power Law might describe the popularity of YouTube videos. However, I wasn’t able to find a study that addressed it directly or in depth, so I thought I’d do my own. Then the project took on a life of its own: sometimes you start peeling the onion and there’s no end to the questions. I became a bit obsessed with this topic, and the result is this 11,000+ word study you’re reading. I originally just wanted to see if the views of YouTube videos followed the Power Law, but that led to many other interesting threads that I just had to follow. Also, as my capabilities with the YouTube API and R grew, so did my ability to ask and answer more difficult questions about the data.

Collecting metadata with the YouTube Data API
Stage	# of resulting video IDs
Search	11,720,992
After deduplication	10,182,114
Successful metadata retrieval	10,175,093
Data collected October 27-30, 2015
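
The counts above imply how much the search stage overlapped with itself and how rarely metadata retrieval failed. Here is a minimal Python sketch of that arithmetic. The variable names are illustrative and are not taken from the study's own code.

```python
# Counts copied from the table above.
search_results = 11_720_992
after_dedup = 10_182_114
with_metadata = 10_175_093

# Video IDs dropped at each stage.
duplicates = search_results - after_dedup
failed = after_dedup - with_metadata

# Shares relative to the input of each stage.
duplicate_rate = duplicates / search_results
failure_rate = failed / after_dedup

print(f"Duplicates removed: {duplicates:,} ({duplicate_rate:.2%})")
print(f"Failed retrievals:  {failed:,} ({failure_rate:.3%})")
```

In other words, roughly one search result in eight was a repeat, while only a tiny fraction of unique IDs failed to return metadata.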

Measuring popularity of YouTube videos: descriptive statistics
Measure	Mode (% of videos or channels)	Minimum	Median	Mean	Maximum
Views	0 (1.33%)	0	351	39,987	2,440,349,198
Channel Views	0 (0.65%)	0	476	87,618	3,939,621,394
Likes	0 (33.73%)	0	2	198	9,978,570
Dislikes	0 (71.77%)	0	0	11	1,372,424
Comments	0 (58.77%)	0	0	32	4,974,275
Data collected October 27-30, 2015

Popularity: YouTube video and channel views by percentile
Percentile	# of video views	# of channel views
10%	16	22
20%	48	62
30%	101	130
40%	191	250
50%	351	476
60%	644	880
70%	1,235	1,798
80%	2,916	4,600
90%	11,197	19,567
91%	13,533	23,923
92%	16,635	29,829
93%	20,916	38,056
94%	27,049	49,748
95%	36,336	67,656
96%	51,380	96,927
97%	78,895	150,397
98%	139,614	268,817
99%	344,532	670,042
Data collected October 27-30, 2015

Testing fitness: the Power Law vs log-normal
Measure	Vuong’s test statistic	p-value	Closer distribution
Views	-113.59	0	Log-normal
Likes	-27.71	0	Log-normal
Dislikes	-20.61	0	Log-normal
Comments	-20.67	0	Log-normal
Data collected October 27-30, 2015
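
To give a feel for what a Vuong test statistic measures, here is a simplified Python sketch. It is not the R workflow used in this study, and it rests on several assumptions: the data are synthetic, the power law's lower cutoff `xmin` is simply the sample minimum, the log-normal is fit without truncation, and the sign convention (negative favors the log-normal) is chosen to match the table above. All function names are hypothetical.

```python
import math
import random
import statistics

def power_law_loglik(xs, xmin):
    """Pointwise log-likelihoods of a continuous power law fit by maximum likelihood."""
    alpha = 1 + len(xs) / sum(math.log(x / xmin) for x in xs)
    return [math.log((alpha - 1) / xmin) - alpha * math.log(x / xmin) for x in xs]

def lognormal_loglik(xs):
    """Pointwise log-likelihoods of an (untruncated) log-normal fit by maximum likelihood."""
    logs = [math.log(x) for x in xs]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)
    return [
        -y - math.log(sigma) - 0.5 * math.log(2 * math.pi) - (y - mu) ** 2 / (2 * sigma ** 2)
        for y in logs
    ]

def vuong(loglik_a, loglik_b):
    """Return (statistic, two-sided p-value); positive favors model A, negative model B."""
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    stat = math.sqrt(len(diffs)) * statistics.fmean(diffs) / statistics.pstdev(diffs)
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value

# Synthetic "view counts" drawn from a log-normal, for illustration only.
random.seed(0)
views = [random.lognormvariate(6.0, 2.5) for _ in range(5000)]

stat, p_value = vuong(power_law_loglik(views, min(views)), lognormal_loglik(views))
print(f"Vuong statistic: {stat:.2f}, p-value: {p_value:.3g}")
```

A large negative statistic with a tiny p-value, as in the table, means the log-normal's likelihood is consistently higher than the power law's across the observations.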

Measure	Geometric Mean (x̄*)	Geometric Standard Deviation (s*)
Views	419.06	12.56
Channel Views	584.14	14.03
Likes	7.17	6.40
Dislikes	3.20	4.13
Comments	5.39	5.11
Data collected October 27-30, 2015
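
For readers unfamiliar with these two statistics, here is a short Python sketch of how a geometric mean and geometric standard deviation can be computed, and how they combine into a multiplicative range. The sample values are made up, the function name is hypothetical, and the choice to skip zeros and to use the sample standard deviation of the logs is an assumption, not necessarily what was done for the table above.

```python
import math
import statistics

def geometric_summary(values):
    """Return (geometric mean, geometric standard deviation) of the positive values."""
    logs = [math.log(v) for v in values if v > 0]  # zeros have no logarithm
    return math.exp(statistics.fmean(logs)), math.exp(statistics.stdev(logs))

# Hypothetical view counts, for illustration only.
sample_views = [3, 40, 120, 351, 900, 5_000, 250_000]
gm, gsd = geometric_summary(sample_views)
print(f"Sample geometric mean: {gm:.1f}, geometric SD: {gsd:.1f}")

# Multiplicative range built from the table's values for Views.
views_gm, views_gsd = 419.06, 12.56
low, high = views_gm / views_gsd, views_gm * views_gsd
print(f"Geometric mean divided/multiplied by geometric SD: {low:.0f} to {high:.0f}")
```

If views were exactly log-normal, roughly 68% of videos would fall inside that range, which for the table's values runs from about 33 to about 5,263 views.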

Descriptive statistics: video durations (in seconds)
Mode	Minimum	Median	Mean	Maximum
31	0	213	463	107,373
Data collected October 27-30, 2015

Descriptive statistics: video age (in days)
Mode	Minimum	Median	Mean	Maximum
3	0	782	951	3,787
Data collected October 27-30, 2015

Top 10 countries with YouTube videos geotagged within their borders
Country	% of geotagged videos
United States of America	25.36
Germany	5.38
Brazil	4.81
United Kingdom	4.28
France	4.08
India	3.28
Italy	3.08
Spain	2.78
Canada	2.71
Poland	2.53
Data collected October 27-30, 2015

Testing ordering methods: comparing result counts
# of Raw Results	# of Unique Results	Order Method
24,159	24,094	relevance
14,150	8,944	title
14,098	8,950	rating
14,091	8,922	date
14,088	8,944	viewCount
80,586	31,728 (deduped across all order methods)	TOTAL

Testing ordering methods: descriptive statistics
Order Method	Views				Duration (seconds)				Age (days)
	Mean	Median	Mode	St. Dev.	Mean	Median	Mode	St. Dev.	Mean	Median	Mode	St. Dev.
date	52,131	242	0	1,785,676	467	213	137	924	579	274	0	672
rating	54,513	449	0	1,809,347	464	226	31	888	837	657	622	705
relevance	216,762	348	0	6,137,273	461	210	31	949	963	790	1	767
title	28,652	357	0	464,158	445	219	31	817	839	650	1	722
viewCount	120,435	1,076	0	1,897,330	462	231	239	854	971	808	622	744

Testing ordering methods: ranges
Order Method	Views		Durations (seconds)		Ages (days)
	Min	Max	Min	Max	Min	Max
date	0	129,075,842	0	20,207	0	3,488
rating	0	129,075,842	0	23,571	0	3,488
relevance	0	634,469,694	0	43,151	0	3,563
title	0	26,775,376	0	11,526	0	3,581
viewCount	0	129,075,939	0	18,393	0	3,581

Popularity statistics: zero values
Measure	# of zero values	% of videos or channels
Views	135,536	1.33
Channel Views	30,095	0.65
Likes	3,432,051	33.73
Dislikes	7,302,216	71.77
Comments	5,980,236	58.77
Data collected October 27-30, 2015
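
The video-level percentages above can be cross-checked against the number of videos with successfully retrieved metadata reported earlier. This is a small Python sketch of that check, with illustrative variable names. The Channel Views row is left out because its denominator is the number of channels, which is not listed in these tables.

```python
total_videos = 10_175_093  # successful metadata retrievals reported earlier

# Zero counts copied from the table above (video-level measures only).
zero_counts = {
    "Views": 135_536,
    "Likes": 3_432_051,
    "Dislikes": 7_302_216,
    "Comments": 5_980_236,
}

# Share of all videos with a zero for each measure, in percent.
zero_share = {
    measure: round(100 * count / total_videos, 2)
    for measure, count in zero_counts.items()
}

for measure, share in zero_share.items():
    print(f"{measure}: {share}%")
```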
