The 4 Types of Johnny Depp Movies

Jordan Boonstra
Jan 17

Inspired by this article from 538, which details five distinct types of Nicolas Cage movies, I wanted to try and take on this subject myself.

This type of analysis is a great opportunity to combine data science with an interesting topic such as the entertainment business.

To find distinct types of movies for a given actor or actress, you need someone who has had serious range in both box office success and critical acclaim.

After some thought and examination, Johnny Depp seems to fit that mold perfectly.

Depp has acted in over 80 movies, with some high highs (Pirates of the Caribbean, Charlie and the Chocolate Factory) and some low lows (Mortdecai, Private Resort).

With an actor in focus, I went forth and gathered box office revenue¹ and Metacritic² scores on every domestic film that Johnny Depp has had an acting credit in over the course of his acting career.

Once all the necessary data was collected, I used Python to cluster the movies into different groups, and R for creating the final visual.

I’ll first dive into the four types of Johnny Depp movies, and then I will explain the process and reasoning of placing the movies into groups.

Without further ado, here are the four types of Johnny Depp movies.

Breakdown of Movie Types

Pirate/Fantasy Blockbusters

Films: “Pirates of the Caribbean: Curse of the Black Pearl” (2003), “Charlie and the Chocolate Factory” (2005), “Pirates of the Caribbean: Dead Man’s Chest” (2006), “Pirates of the Caribbean: At World’s End” (2007), “Alice in Wonderland” (2010), “Pirates of the Caribbean: On Stranger Tides” (2011), “Fantastic Beasts and Where to Find Them” (2016)

These movies have all been huge hits at the box office and typically feature Johnny Depp as an “interesting” featured character such as Captain Jack Sparrow or Willy Wonka.

The movies that have had the most success at the box office were all released after 2000, and every major success involving Johnny Depp in this group has been in the fantasy genre.

Sherlock Gnomes Territory

Films: “Private Resort” (1985), “Benny & Joon” (1993), “Nick of Time” (1995), “Fear and Loathing in Las Vegas” (1998), “The Ninth Gate” (1999), “The Astronaut’s Wife” (1999), “The Man Who Cried” (2000), “Blow” (2001), “From Hell” (2001), “Once Upon a Time in Mexico” (2003), “Secret Window” (2004), “The Libertine” (2004), “The Tourist” (2010), “The Rum Diary” (2011), “Dark Shadows” (2012), “The Lone Ranger” (2013), “Transcendence” (2014), “Mortdecai” (2015), “Alice Through the Looking Glass” (2016), “Pirates of the Caribbean: Dead Men Tell No Tales” (2017), “Murder on the Orient Express” (2017), “Sherlock Gnomes” (2018), “Fantastic Beasts: The Crimes of Grindelwald” (2018)

The “Sherlock Gnomes” territory is the largest category for Johnny Depp and has had a big increase in size in recent years.

The lowest of the low for Johnny Depp have been all over the place.

Some have been uninspired continuations of past movies such as “Alice Through the Looking Glass” or “Pirates of the Caribbean: Dead Men Tell No Tales”.

Others, like “Private Resort” and “Sherlock Gnomes”, made you wonder why they were made in the first place.

There were not a lot of poorly performing movies at the beginning of Depp’s career, but he has had quite the stretch of them since 2010.

Unique and Well-Received

Films: “A Nightmare on Elm Street” (1984), “Cry-Baby” (1990), “Edward Scissorhands” (1990), “Arizona Dream” (1993), “What’s Eating Gilbert Grape” (1993), “Ed Wood” (1994), “Don Juan DeMarco” (1994), “Dead Man” (1995), “Donnie Brasco” (1997), “Sleepy Hollow” (1999), “Before Night Falls” (2000), “Chocolat” (2000), “Finding Neverland” (2004), “Corpse Bride” (2005), “Sweeney Todd: The Demon Barber of Fleet Street” (2007), “The Imaginarium of Doctor Parnassus” (2009), “Public Enemies” (2009), “Rango” (2011), “Into the Woods” (2014), “Black Mass” (2015)

This group makes up a decent amount of Depp’s filmography.

Many of the movies in this group are originals, and they display the wide variety of Depp’s acting throughout his career, from the classic horror film “A Nightmare on Elm Street” to the drama “What’s Eating Gilbert Grape” to the animated comedy “Rango”.

While they have not been as prevalent in recent years, Depp has been consistently putting out original and well-received movies throughout his 35+ year career.

Platoon

Film: “Platoon” (1986)

The last group is simply “Platoon”.

“Platoon” is quite the anomaly in the Depp filmography.

It was only the third film he acted in, and it is the only war film of his career.

It was a major box office success, bringing in $138.5 million on a budget of just $6 million.

Along with box office success, Platoon also won the Academy Award for Best Picture at the 59th Academy Awards.

Process of Clustering Movies

After gathering filmography data from IMDb, box office revenue data from Box Office Mojo, and critic rating data from Metacritic, it was time to prepare an algorithm that could adequately sort the movies into distinct groups.

To effectively separate movies into distinct groups, I implemented a clustering algorithm known as a Gaussian Mixture Model.

Why did I choose a Gaussian Mixture Model? This algorithm works very well when there are only two dimensions to evaluate (revenue and critic score), and its results may end up being more “exciting” for this type of analysis than those of other clustering algorithms, because Gaussian Mixture Modeling does not assume that the groups conform to a fixed geometric shape or structure.
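As a rough illustration of the shape point, here is a minimal sketch on made-up data (not the movie data). It assumes scikit-learn's `GaussianMixture` with `covariance_type='full'`, which fits a separate covariance matrix per cluster; the two synthetic clusters, their positions, and the `elongation` helper variable are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in data: one round cluster and one long, thin, tilted cluster.
round_cluster = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
stretched = rng.normal(size=(200, 2)) * [3.0, 0.3]  # long in x, thin in y
theta = np.pi / 4  # tilt the stretched cluster by 45 degrees
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
tilted_cluster = stretched @ rotation.T + [10.0, -10.0]
X = np.vstack([round_cluster, tilted_cluster])

# A full-covariance mixture learns one 2x2 covariance matrix per cluster.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = gmm.predict(X)

# Ratio of largest to smallest covariance eigenvalue for each fitted cluster.
elongation = [np.linalg.eigvalsh(cov)[-1] / np.linalg.eigvalsh(cov)[0]
              for cov in gmm.covariances_]
print(elongation)
```

Here `elongation` is near 1 for a roughly circular cluster and large for a long, thin one, so the two fitted components end up with very different shapes inside the same model.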

Before implementing a Mixture Model, how can we know how many clusters the algorithm should be searching for? Finding the right number of clusters can be tricky and very subjective, so I will use what is known as an elbow plot to identify an appropriate number of clusters.

With the assistance of the Python Scikit-Learn library, I ran an algorithm similar to Gaussian Mixture Modelling, known as “Expectation-Maximization” style K-Means clustering, while targeting the data for 1 to 10 clusters to see which number may work the best.

The “best” number of clusters to use is typically the point at which the Sum of Squared Errors (SSE) stops declining at a rapid pace³.
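To make the SSE concrete, here is a minimal sketch on made-up data (three synthetic blobs, not the movie data). The helper `sum_of_squared_errors` is a hypothetical name introduced for this example; it adds up the squared distance from each point to its assigned cluster center, which is the quantity scikit-learn reports as `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
centers = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])


def sum_of_squared_errors(X, labels, centroids):
    """Total squared distance from each point to its assigned centroid."""
    return float(((X - centroids[labels]) ** 2).sum())


sse_by_k = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse_by_k[k] = sum_of_squared_errors(X, km.labels_, km.cluster_centers_)
    # The hand-computed value should agree with scikit-learn's inertia_.
    assert np.isclose(sse_by_k[k], km.inertia_)

for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.1f}")
```

On data like this the SSE falls steeply up to the true number of blobs and only slowly afterwards, which is the "elbow" the plot is meant to reveal. Real data such as the movie set is rarely this clean.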

    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture
    from sklearn.preprocessing import StandardScaler
    import pandas as pd
    import matplotlib.pyplot as plt

    # Reading in the data
    movies = pd.read_csv('Johnny Depp Movies.csv')

    # Using the Sci-kit Learn Standard Scaler for z-score transformation of
    # numerical variables. ravel() flattens the (n, 1) result into a 1-D column.
    scaler = StandardScaler()
    movies['scores_scaled'] = scaler.fit_transform(movies[['Metacritic Score']]).ravel()
    movies['box_office_scaled'] = scaler.fit_transform(movies[['Box Office']]).ravel()

    # Running an Expectation-Maximization K-Means algorithm at increasing
    # amounts of clusters (1 through 10) to find an appropriate target amount
    # for the Gaussian Mixture Model.
    # Note: algorithm='lloyd' is the current name for the classic EM-style
    # K-Means; older scikit-learn releases called it 'full'.
    cluster_counts = range(1, 11)
    SSE = []
    for i in cluster_counts:
        kmean = KMeans(n_clusters=i, max_iter=1000, algorithm='lloyd')
        kmean_fit = kmean.fit(movies[['scores_scaled', 'box_office_scaled']])
        SSE.append(kmean_fit.inertia_)

    # Plotting the results of the Expectation-Maximization K-Means,
    # with the actual cluster counts on the x-axis
    plt.style.use('default')
    plt.suptitle('K-Means Sum of Squared Errors', fontsize=15)
    plt.xlabel('# of Clusters', fontsize=12)
    plt.ylabel('Sum of Squared Errors', fontsize=12)
    plt.plot(cluster_counts, SSE)

While there is no single perfect point to choose in this scenario, I decided to go with 4 clusters, as it seems to be the first point where the SSE stops declining at a high rate.
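Because reading an elbow is a judgment call, one optional cross-check is the Bayesian Information Criterion (BIC), which scikit-learn's `GaussianMixture` exposes through its `bic` method. This was not part of the analysis above; the sketch below is an assumption-laden illustration on made-up data (three synthetic blobs, not the movie data), where a lower BIC suggests a better trade-off between fit and model complexity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data: three well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
centers = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])

# Fit one mixture per candidate cluster count and record its BIC.
bic_by_k = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          n_init=10, random_state=0).fit(X)
    bic_by_k[k] = gmm.bic(X)  # lower is better

best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k)
print("Lowest BIC at k =", best_k)
```

If the BIC and the elbow plot point to similar cluster counts, that adds some confidence to the choice; if they disagree, the decision stays subjective.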

After choosing 4 as the set number of clusters, I employed a Gaussian Mixture Model.

    # Setting up a Gaussian Mixture Model with 4 clusters and 1000 iterations
    gmm = GaussianMixture(n_components=4, covariance_type='full', max_iter=1000, n_init=10)

    # Fitting the model to the data
    gmm_fit = gmm.fit(movies[['scores_scaled', 'box_office_scaled']])

    # Finally assigning labels to the movies
    movies['Cluster'] = gmm_fit.predict(movies[['scores_scaled', 'box_office_scaled']])

With the cluster labels now assigned to each movie, I plotted the data in R with the ggplot2 package and highlighted some of the more interesting data points.

Some final touches were made with Microsoft Word.

    # Loading the ggplot2 package and reading in the data
    # (df2 holds the clustered movie data; the line that reads it in is not shown)
    library(ggplot2)

    # Plotting out the data
    # (Cluster needs to be a factor or character column for scale_color_manual)
    ggplot(df2, aes(x = Metacritic.Score, y = box.millions, color = Cluster)) +
      geom_point(size = 2.5, alpha = .5) +
      # Highlighting some points of interest
      geom_point(data = df2[3, ], colour = "magenta4", size = 2.5) +
      geom_point(data = df2[30, ], colour = "red", size = 2.5) +
      geom_point(data = df2[35, ], colour = "red", size = 2.5) +
      geom_point(data = df2[28, ], colour = "red", size = 2.5) +
      geom_point(data = df2[29, ], colour = "green4", size = 2.5) +
      geom_point(data = df2[5, ], colour = "green4", size = 2.5) +
      geom_point(data = df2[44, ], colour = "blue", size = 2.5) +
      geom_point(data = df2[50, ], colour = "blue", size = 2.5) +
      # Finishing touches to the plot
      labs(title = "Types of Johnny Depp Movies") +
      xlab('Metacritic Score') +
      ylab('Domestic Box Office (Millions)') +
      scale_color_manual(values = c('magenta4', 'blue', 'red', 'green4')) +
      ylim(0, 600) +
      xlim(0, 100) +
      theme_classic() +
      theme(legend.position = "none") +
      theme(plot.title = element_text(face = 'bold', size = 20, hjust = 0.5),
            axis.title.x = element_text(face = 'bold', size = 15),
            axis.title.y = element_text(face = 'bold', size = 15))

That concludes the process of gathering data, implementing a clustering algorithm, and visualizing the discoveries.

I hope you all enjoyed this article on the wide variety of the Johnny Depp filmography while learning about how Gaussian Mixture Modelling can be used to make some interesting discoveries.

¹ Box office revenue was converted to 2019 dollars via Box Office Mojo.

² Rotten Tomatoes average critic score was used as a substitute for some movies that are not listed on Metacritic.

³ Z-score scaling was used on box office revenue and movie score data to structure the data for K-Means clustering.
