---
date: "`r format(Sys.time(), '%d %B, %Y')` at `r format(Sys.time(), '%r')`"
title: "It's Time to Stop Complaining About the Amount of Sequels, Remakes, and Reboots"
author: "by Audrey Bertin and Eva Gerstle"
output:
  html_document:
    code_folding: hide
    fig_width: 12
    fig_height: 8
    fig_caption: true
    theme: yeti
    highlight: tango
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
*Tags:* movies, sequel, remake, reboot, hollywood, box office
```{r, message = FALSE, warning = FALSE}
library(mdsr)
library(tidyverse)
library(RMySQL)
library(ggthemes)
library(rvest)
library(knitr)
db <- dbConnect_scidb(dbname = "imdb")
```
<br>
</br>
Complaints that no movies are original anymore have begun to surface in recent years. "As a film fan it’s really disheartening to see Hollywood go back to the well so often," says one [ScreenRant](https://screenrant.com/movie-remakes-movie-sequels/) author. It certainly feels that way. [Just this year](http://uproxx.com/movies/2017-sequels/) we have *Cars 3*, a new *Guardians of the Galaxy*, a *Beauty and the Beast* remake, the __ninth__ *Planet of the Apes* movie, and the latest installment in the *Star Wars* franchise. However, in reality, far fewer of these movies are released than may appear.
In a 2016 [Tylt poll](https://thetylt.com/entertainment/are-there-too-many-hollywood-reboots-and-sequels), the vast majority (79.3%) of respondents said that there are “#TooManyReboots.” [^1] We set out to determine if this was actually true, or if the number of unoriginal movies is being blown out of proportion. As it turns out, we are not even at peak sequel or remake production, and when considering the fact that more movies are being made now than ever before, unoriginal films make up a significantly smaller proportion of the movie industry than in the past.
In order to determine the number of unoriginal movies produced around the world, we pulled data on all feature length movies released in the past century tagged with the keywords `remake`, `reboot`, or `sequel` in the IMDb.[^2]
Remakes generally keep the most important features of an older movie, such as its characters and story, and retell it with different actors, more advanced film techniques, and special effects. They tend to add new, socially relevant dialogue to appeal to modern audiences. Think *Scarface*, based on a 1932 movie of the same name.
Reboots are complete overhauls of the source material: a reimagining of a story. Typically, the source is stripped down to its core elements, and a new story is built predominantly from scratch. Think *Batman Begins* or the 2009 movie my family lovingly refers to as "Star Trek Kids."
Sequels continue the story of, or expand upon, earlier films. They often have the same characters and similar settings as the films they follow. Major film franchises have lots of sequels: *Star Wars*, Marvel, and *Pirates of the Caribbean*, to name a few.
<div style="margin-bottom:40px;">
```{r}
# Query to find all feature-length movies tagged with remake, reboot, or sequel.
# 82, 1134, and 20757 are the keyword id numbers that correspond to the
# keywords remake, reboot, and sequel.
remakes_reboots_sequels <- db %>%
  dbGetQuery(
    "SELECT kw.id, kw.keyword, t.title, t.production_year
     FROM imdb.keyword kw
     JOIN movie_keyword mkw ON kw.id = mkw.keyword_id
     JOIN title t ON t.id = mkw.movie_id
     LEFT JOIN movie_info mi ON mi.movie_id = t.id
     LEFT JOIN movie_info mi2 ON mi2.movie_id = t.id
     WHERE kw.id IN (82, 1134, 20757) AND t.kind_id = 1
       AND t.production_year BETWEEN 1917 AND 2017
       AND mi.info_type_id = 1 AND mi.info >= 80
     GROUP BY t.id;")

# This query is optimized because we use the primary keys in the keyword and
# title tables to join. When we use "where" to select the IDs from title and
# keyword, SQL only has to look at 1 and 3 rows, respectively. On movie keyword,
# keyword id is indexed, so we only have to look at 32 rows when referencing
# that table! Therefore, this query is extremely fast.

# Count the movies of each type per production year
remakes_by_year <- remakes_reboots_sequels %>%
  filter(!is.na(production_year)) %>%
  group_by(production_year) %>%
  summarize(
    reboot = sum(keyword == "reboot"),
    remake = sum(keyword == "remake"),
    sequel = sum(keyword == "sequel"))

# Reshape the counts into long form so that sequels, remakes, and reboots can
# all be drawn on one single graph.
remakes_by_year_gathered <- remakes_by_year %>%
  gather(key = "type", value = "count", reboot:sequel)

# Note that the movie_keyword table includes keywords for remake, sequel, AND
# reboot (unlike the link_type table), so here we can see reboots as well.
```
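
The reshaping step in the chunk above uses `gather()`, which tidyr now documents as superseded by `pivot_longer()`. A minimal sketch of the equivalent call follows, on a toy table: the 2017 row uses the counts reported later in this article, while the 2016 row and the `_toy` object names are made up for illustration.

```r
library(tidyr)

# Toy stand-in for remakes_by_year: one row per year, one column per movie type.
# The 2017 counts are the ones quoted in this article; the 2016 row is invented.
remakes_by_year_toy <- data.frame(
  production_year = c(2016, 2017),
  reboot = c(1, 2),
  remake = c(10, 9),
  sequel = c(50, 54)
)

# Wide to long: one row per (year, type) pair, with the same columns that
# gather(key = "type", value = "count", reboot:sequel) produces
# (the rows come out in a different order than with gather()).
remakes_long_toy <- pivot_longer(
  remakes_by_year_toy,
  cols = reboot:sequel,
  names_to = "type",
  values_to = "count"
)

print(remakes_long_toy)
```

The long table has one `count` per year and type, which is the layout `ggplot()` needs to draw one colored line per movie type.
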
```{r}
# Function to find the single year with the most movies of a given type
# (e.g. the year with the most sequels)
most_in_year <- function(movie_type) {
  remakes_reboots_sequels %>%
    filter(keyword == movie_type, !is.na(production_year)) %>%
    group_by(production_year, keyword) %>%
    summarize(N = n()) %>%
    arrange(desc(N)) %>%
    head(1)
}

# Apply the function to each movie type and stack the results into one table
movie_types <- c("reboot", "remake", "sequel")
top_year_by_type <- lapply(movie_types, FUN = most_in_year) %>% bind_rows()
```
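
The `most_in_year()` helper above is applied once per type with `lapply()`. A roughly equivalent single pipeline, assuming dplyr 1.0 or later for `slice_max()`, is sketched below on invented rows; the `_toy` names and the data are hypothetical, not IMDb results.

```r
library(dplyr)

# Toy stand-in for remakes_reboots_sequels (invented rows, one per tagged movie)
tagged_movies_toy <- data.frame(
  keyword = c("sequel", "sequel", "sequel", "remake", "remake", "reboot"),
  production_year = c(1994, 1994, 2017, 1999, 1999, 2017)
)

top_year_by_type_toy <- tagged_movies_toy %>%
  filter(!is.na(production_year)) %>%
  # One row per (type, year) with the number of movies in N
  count(keyword, production_year, name = "N") %>%
  # Keep the single highest-count year within each type
  group_by(keyword) %>%
  slice_max(N, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_year_by_type_toy)
```

Setting `with_ties = FALSE` mirrors the `head(1)` in the helper, which also returns a single year even when two years are tied.
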
```{r, warning = FALSE}
# Graph the numbers of remakes, reboots, and sequels over time
remakes_by_year_gathered$type <- factor(
  remakes_by_year_gathered$type,
  levels = c("sequel", "remake", "reboot"))

plot <- remakes_by_year_gathered %>%
  ggplot(aes(x = production_year, y = count, col = type)) +
  geom_line(size = 2) +
  # Add more labels (for each decade) to the plot
  scale_x_continuous(
    breaks = seq(from = 1920, to = 2020, by = 10),
    expand = c(0, 0)) +
  scale_y_continuous(
    breaks = seq(from = 0, to = 90, by = 15)) +
  # Add a line showing the current year. All movies after this line are planned and not actually finished.
  geom_vline(xintercept = 2017) +
  scale_color_brewer(
    palette = "Set1",
    name = "Movie Type",
    breaks = c("sequel", "remake", "reboot"),
    labels = c("Sequel", "Remake", "Reboot")) +
  xlab(NULL) +
  ylab("Number of Movies Produced") +
  ggtitle("Number of Movie Sequels, Remakes, and Reboots Over The Last 100 Years") +
  labs(caption = "Source: IMDb") +
  # Arrows pointing from each annotation to the peak year of its movie type
  geom_curve(
    x = 1983, xend = 1994,
    y = 60, yend = 73,
    arrow = arrow(length = unit(0.3, "cm")),
    curvature = -0.5,
    color = "black") +
  geom_curve(
    x = 2007, xend = 1999,
    y = 50, yend = 41,
    arrow = arrow(length = unit(0.3, "cm")),
    curvature = -0.5,
    color = "black") +
  geom_curve(
    x = 2008, xend = 2016,
    y = 10, yend = 2,
    arrow = arrow(length = unit(0.3, "cm")),
    curvature = 0.5,
    color = "black") +
  # Text annotations for the peak year of each movie type
  geom_text(
    x = 2006, y = 54,
    label = "Most Remakes:\n41 in 1999",
    size = 4,
    color = "black") +
  geom_text(
    x = 1980, y = 57,
    label = "Most Sequels:\n73 in 1994",
    size = 4,
    color = "black") +
  geom_text(
    x = 2010, y = 14,
    label = "Most Reboots:\n2 in 2017",
    size = 4,
    color = "black") +
  theme_fivethirtyeight()

# Plot top years for each type of movie as dots on the plot
plot +
  geom_point(
    data = top_year_by_type,
    aes(x = production_year, y = N),
    shape = 21, size = 3, fill = "white", color = "black")
```
</div>
According to the IMDb data, the numbers of unoriginal movies released in 2017 were:
Sequels: 54
Remakes: 9
Reboots: 2
These might seem like pretty big numbers when you consider how many movies you watch per year--who has time to go see 54 sequels? That's more than one movie a week!--but when you look at the history of these unoriginal movies, the only type currently at its peak is reboots, with two films in 2017. Remakes and sequels both reached their peak a while ago: in 1999 and 1994, respectively. We have fewer sequels and remakes being made now, and if we take into consideration the fact that the movie industry has grown significantly, these unoriginal films make up a smaller percentage of the movie industry than they ever have.
<div style="margin-bottom:40px;">
```{r}
# Count all feature-length movies per production year
total_movies <- db %>%
  dbGetQuery(
    "SELECT production_year, count(distinct t.id) AS number
     FROM imdb.title t
     LEFT JOIN movie_info mi ON mi.movie_id = t.id
     LEFT JOIN movie_info mi2 on mi2.movie_id = t.id
     WHERE kind_id = 1 AND production_year BETWEEN 1917 AND 2017
       AND mi.info_type_id = 1 AND mi.info >= 80
     GROUP BY production_year;")
```
```{r}
proportion <- total_movies %>%
inner_join(remakes_by_year, by = 'production_year') %>%
mutate(sum_srr = remake + reboot + sequel) %>%
mutate(prop_srr = sum_srr/number *100)
```
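
One thing to keep in mind about the join above: `inner_join()` keeps only years present in both tables, so any year with no tagged sequel, remake, or reboot drops out of `proportion` instead of appearing as 0%. If that matters, one possible alternative is a `left_join()` followed by filling the gaps with zero. The sketch below uses made-up toy tables; all `_toy` names and numbers are illustrative, not IMDb data.

```r
library(dplyr)

# Toy yearly totals (invented numbers)
total_movies_toy <- data.frame(
  production_year = c(2015, 2016, 2017),
  number = c(100, 200, 400)
)

# Toy counts of unoriginal movies; note that 2016 is missing entirely
unoriginal_toy <- data.frame(
  production_year = c(2015, 2017),
  sum_srr = c(5, 4)
)

share_by_year_toy <- total_movies_toy %>%
  # Keep every year from the totals table, even without a match
  left_join(unoriginal_toy, by = "production_year") %>%
  mutate(
    # Years with no match come back as NA; treat them as zero movies
    sum_srr = coalesce(sum_srr, 0),
    prop_srr = sum_srr / number * 100
  )

print(share_by_year_toy)
```

With `inner_join()` the 2016 row would be absent; here it is kept with a share of 0.
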
```{r, warning=FALSE}
ggplot(proportion, aes(x = production_year, y = prop_srr)) +
  geom_area() +
  scale_x_continuous(
    name = "Production Year",
    limits = c(1917, 2017),
    breaks = seq(from = 1920, to = 2020, by = 10)) +
  scale_y_continuous(
    # Label whatever breaks ggplot picks as percentages, rather than passing a
    # fixed label vector that has to match the number of breaks exactly
    labels = function(x) paste0(x, "%")) +
  ggtitle("Percentage of Movies Considered Unoriginal") +
  labs(caption = "Source: IMDb") +
  theme_fivethirtyeight()
```
</div>
As seen above, the highest percentage of unoriginal movies occurred around 1920, almost 100 years ago. Even in that biggest year for unoriginality, sequels, remakes, and reboots made up just over 10% of movies.
Additionally, over the last few decades, the percentage of unoriginal movies has been steadily declining. This year, only 1% of all movies made were unoriginal.
A big reason for this is the quickly increasing number of movies produced. A hundred years ago, around 50 movies were released annually. This year, there were over 7,000. To put that number into perspective, if you sat down right now and watched every movie back-to-back without stopping, it would take you more than 1 1/2 years to finish. Sequel, remake, and reboot production just can’t seem to keep up with the ever-growing number of movies.
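
As a rough check on the figures above, the sketch below redoes the arithmetic. The article gives only "over 7,000" movies and the 80-minute feature-length cutoff; the 115-minute average runtime is an assumption chosen for illustration, not a value taken from the IMDb data, and the variable names are hypothetical.

```r
movies_2017 <- 7000          # "over 7,000", taken as a lower bound
min_runtime <- 80            # feature-length cutoff used in the queries (minutes)
assumed_avg_runtime <- 115   # ASSUMPTION: illustrative average runtime (minutes)

# Convert a number of movies of a given length into years of nonstop viewing
years_of_viewing <- function(n_movies, minutes_each) {
  n_movies * minutes_each / 60 / 24 / 365.25
}

years_lower_bound <- years_of_viewing(movies_2017, min_runtime)
years_assumed_avg <- years_of_viewing(movies_2017, assumed_avg_runtime)

# Share of 2017 movies that were sequels, remakes, or reboots (54 + 9 + 2)
unoriginal_share_pct <- (54 + 9 + 2) / movies_2017 * 100

cat("Years of viewing at 80 min each: ", round(years_lower_bound, 2), "\n")
cat("Years of viewing at 115 min each:", round(years_assumed_avg, 2), "\n")
cat("Unoriginal share of 2017 movies: ", round(unoriginal_share_pct, 2), "%\n")
```

With these inputs, even if every film ran exactly 80 minutes the marathon would take a little over a year, an average near 115 minutes gives roughly a year and a half, and the unoriginal share comes out just under 1%.
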
So if there really aren’t that many unoriginal movies being made, why does it seem like they’re everywhere?
There is certainly a reason that sequels, remakes, and reboots fill our thoughts so much, and that reason is **money**. Of the top 20 [highest grossing movies](http://www.boxofficemojo.com/alltime/world/) of all time, only 4 can be classified as truly original: *Avatar,* *Titanic,* *Frozen,* and *Minions.* [^3]
```{r}
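# Scrape the all-time worldwide box office page from Box Office Mojo
url <- "http://www.boxofficemojo.com/alltime/world/"
html_bom <- read_html(url)
tables <- html_bom %>%
  html_nodes("table")
# The third table on the page is the one used for the rankings
top_grossing <- tables[[3]] %>%
  html_table(header = TRUE)
colnames(top_grossing)[4] <- "Gross (millions)"
# Show the top 20 rows, dropping columns 5 through 8
kable(top_grossing[1:20, -(5:8)], caption = "Top Grossing Movies of All Time")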
```
`^ = Movie made its gross over multiple releases.`
The success of unoriginal films becomes even more impressive when considering [individual years](https://stephenfollows.com/how-original-are-hollywood-movies/). In 2007, 2013, and 2014, the top ten grossing movies were all “derived from a pre-existing movie or source.” [^4]
Unoriginal films just seem to attract big crowds. We like things we are familiar with, a tendency known as the [Mere Exposure Effect](https://www.psychologytoday.com/blog/sapient-nature/201201/familiarity-breeds-enjoyment). Familiarity with a movie’s premise or characters makes us more likely to see it. Additionally, unoriginal movies require a lot less work on the part of viewers who are familiar with the original. Many people use movies as a way to relax, so having less work to do in figuring out the plot and characters can be appealing.
It’s no wonder, then, that unoriginal movies fill our thoughts. These are the blockbuster movies that bring in the most revenue, are the most seen, are the most heavily advertised, and are therefore the most talked about. You can probably count on one hand the number of people you know who are unaware that *The Last Jedi* was just released.
If people continue to purchase tickets to unoriginal movies, companies will continue to produce them because there’s little risk involved. Producers can predict how large their audience will be, there’s already a lot of time put into character development, and there’s already a dedicated fan base they know will go watch the movie when it’s released. Who wouldn’t make a movie guaranteed to be a success?
Moviegoers are causing their own frustration. The only way to reduce the number of unoriginal movies made and reduce the level of annoyance is to stop watching them. If you’re someone who votes “#TooManyReboots,” it might be time to consider watching some indie movies instead. You’re not likely to get anywhere by streaming into the theaters every time a new *Star Wars* movie gets announced.
*The iterations for this project can be found on GitHub*[^5]
[^1]: This poll is not a random sample. It was voluntary and only completed by users of the Tylt website. However, it does still provide a glimpse of what moviegoers think of reboots, since it can be assumed that most respondents were film fans.
[^2]: The IMDb was accessed through Smith College’s `scidb` database using MySQL. Feature-length movies are those with a runtime of at least 80 minutes, and our query included all movies released worldwide (not just in the US).
[^3]: All other movies were either a sequel (or series installment), remake, or reboot. Additionally, the trend of unoriginality can be seen all throughout the top 100 movies.
[^4]: A pre-existing source in this case does not have to be another movie. It could also mean the movie is an adaptation of a book. This is described in detail on the linked website.
[^5]: [Repository](https://github.com/ambertin/mp4.git)