Predicting Rotten Tomatoes Audience Score
Ishaan Rao, OIDD 245
As an avid movie watcher, I am always trying to find the next movie to watch. When selecting a movie, I tend to be very picky, and I always lean on IMDb and Rotten Tomatoes reviews to help me decide. Movies on IMDb are given a rating between 0 and 10, while Rotten Tomatoes reviews consist of both a Critics Score and an Audience Score, each between 0 and 100. The Critics Score is the average of ratings by movie critics, while the Audience Score is the average of ratings by people like you and me. Interestingly, there is a large portion of movies where the Audience Score is high while the Critics Score is very low. Similarly, there are plenty of movies where the Audience Score is low while the critics have praised the movie highly. I wanted to investigate what drives this stark difference in ratings that is so common in movies nowadays.
First, I found a Kaggle dataset containing basic information about every movie on Rotten Tomatoes (including ratings, genre, year, etc.). Since we are interested in investigating movies with a large disparity between audience and critic scores, I created two datasets: one containing the 1,500 movies with the largest disparity in scores, and one containing the 1,500 movies with the smallest disparity. This will help us investigate what causes movies to have a high disparity versus a low disparity.
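A minimal sketch of that split is below. The column names (tomatometer_rating for the Critics Score, audience_rating for the Audience Score) are assumptions about the Kaggle CSV and may need to be adjusted to match the actual file.

```python
import pandas as pd

# Load the Kaggle export (filename is illustrative).
movies = pd.read_csv("rotten_tomatoes_movies.csv")

# Keep only rows where both scores are present, then compute the gap.
movies = movies.dropna(subset=["tomatometer_rating", "audience_rating"])
movies["disparity"] = (movies["audience_rating"] - movies["tomatometer_rating"]).abs()

# 1,500 movies with the largest gap and 1,500 with the smallest gap.
high_disparity = movies.nlargest(1500, "disparity")
low_disparity = movies.nsmallest(1500, "disparity")
```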
We can plot the audience and critic scores from each of these datasets to visually observe this disparity. Interestingly, there are three movies with a disparity larger than 90 points: 96 Souls, Hating Breitbart, and Is That a Gun in Your Pocket?. In all of these cases, the Critics Score is 0%, but audiences gave each of these movies above a 90% rating (95, 92, and 92, respectively).
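The plot can be produced with something like the sketch below, which reuses the two frames from the previous snippet (the movie_title column name is again an assumption about the file).

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(7, 6))
ax.scatter(high_disparity["tomatometer_rating"], high_disparity["audience_rating"],
           s=8, alpha=0.5, label="High disparity")
ax.scatter(low_disparity["tomatometer_rating"], low_disparity["audience_rating"],
           s=8, alpha=0.5, label="Low disparity")
ax.plot([0, 100], [0, 100], linestyle="--", color="gray")  # perfect-agreement line
ax.set_xlabel("Critics Score")
ax.set_ylabel("Audience Score")
ax.legend()
plt.show()

# Movies where the gap exceeds 90 points.
extreme = high_disparity[high_disparity["disparity"] > 90]
print(extreme[["movie_title", "tomatometer_rating", "audience_rating"]])
```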
Now, let’s examine the different words and feelings expressed in reviews from movies in either dataset. Hopefully there are clues in the text reviews that show why certain movies fall into one dataset or the other. We built wordclouds from the critics consensus of every movie in each dataset: for each dataset, one wordcloud contains words that correspond with higher critic scores, and the other contains words that correspond with lower critic scores. For the first dataset (high disparity), the first wordcloud therefore also corresponds with a low audience score and the second with a high audience score; for the second dataset (low disparity), it is the opposite.
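One way to build the two wordclouds for a dataset is sketched below: score each word in the critics consensus by how its presence correlates with the Critics Score, then cloud the positively and negatively correlated words separately. This is an illustrative approach rather than the exact method behind the figures, and the critics_consensus column name is an assumption about the Kaggle file.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from wordcloud import WordCloud

def consensus_wordclouds(df, text_col="critics_consensus", score_col="tomatometer_rating"):
    docs = df[text_col].fillna("")
    vec = CountVectorizer(stop_words="english", min_df=10, binary=True)
    X = vec.fit_transform(docs).toarray()
    scores = df[score_col].to_numpy()

    # Correlation between each word's presence and the Critics Score.
    corr = np.array([np.corrcoef(X[:, j], scores)[0, 1] for j in range(X.shape[1])])
    words = vec.get_feature_names_out()

    # Words tied to higher critic scores vs. words tied to lower critic scores,
    # weighted by the strength of the correlation.
    high = {w: c for w, c in zip(words, corr) if c > 0}
    low = {w: -c for w, c in zip(words, corr) if c < 0}
    return (WordCloud(width=600, height=400).generate_from_frequencies(high),
            WordCloud(width=600, height=400).generate_from_frequencies(low))
```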
From observing the left wordcloud above, it seems that critics value factors that may not have to do with the actual movie plot, such as the writer, the director, the genre, and even the cast. Obviously there is some correlation between these factors and how the movie turns out, but since critics mention them so often in their analysis, it seems that certain A-list actors, directors, and writers can sometimes influence the rating by name alone. On the other hand, the right wordcloud above shows that critics don’t seem to value comedy movies, predictable movies, teen movies, or formulaic movies. However, it makes sense why audiences might actually enjoy these movies, as we as movie-watchers sometimes enjoy something that will make us laugh or that we can follow easily. Additionally, almost no critics are teenagers, which would explain why they might dislike teen movies even though the typical teenager finds them very relatable.
From observing the wordclouds corresponding to movies with a low disparity, we notice that both critics and audiences alike value strong, powerful performances by actors and characters. Also, movies that evoke emotion, are gripping, or are simply considered film classics are highly touted by both sides. On the other hand, both critics and audiences agree that movies that are hollow, cheesy, underdeveloped, or fail to evoke emotion should not be considered great movies.
From our exploration of the wordclouds above, we gained a major insight: factors such as certain genres, directors, writers, and cast could explain the disparity between critics and audiences. Maybe the average audience member prefers certain genres (comedy), while the average critic prefers something more involved (thriller, crime, horror). We will create a predictive model using these variables to see if our intuition is correct.
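A sketch of the feature setup for this step is below: multi-valued genre, director, and writer fields become 0/1 dummies, and we then look at how each dummy correlates with the Audience Score (the same dummies could feed a regression model). The column names "genres", "directors", and "authors", and the comma-space separator, are assumptions about the Kaggle file.

```python
import pandas as pd

def dummy_correlations(df, target="audience_rating"):
    features = []
    for col in ["genres", "directors", "authors"]:
        # e.g. "Comedy, Kids & Family" -> one 0/1 column per value.
        dummies = df[col].fillna("").str.get_dummies(sep=", ")
        features.append(dummies.add_prefix(col + ": "))
    X = pd.concat(features, axis=1)
    # Drop very rare dummies so one-off directors don't dominate the ranking.
    X = X.loc[:, X.sum() >= 5]
    return X.corrwith(df[target]).sort_values(ascending=False)

corr_high = dummy_correlations(high_disparity)  # high-disparity movies
corr_low = dummy_correlations(low_disparity)    # low-disparity movies
print(corr_high.head(10))   # features audiences favor most
print(corr_high.tail(10))   # features audiences rate lowest
```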
We immediately see that there are very strong correlations between certain genres and audience rating (directors and writers are cut out of the image above for space reasons). As hypothesized, comedy and kids & family movies are loved by audiences but are not generally considered critically acclaimed, while mystery, classics, and cult movies are loved by critics but not so much by fans. Chris Columbus stands out as a director that audiences love but critics do not.
For movies that are loved by critics and audiences alike, only animated movies and dramas seemed to be highly agreed upon. Every other genre has a negative correlation with audience rating (and, by association, critic rating). Al Pacino stands out as one name beloved by critics and audiences alike.
I hope you enjoyed reading my analysis of Rotten Tomatoes scores!