![](https://static.wixstatic.com/media/8441f7_be7ad52febaa45dfb38265045c6c1efd~mv2.png/v1/fill/w_800,h_1067,al_c,q_90,enc_auto/8441f7_be7ad52febaa45dfb38265045c6c1efd~mv2.png)
Walt Disney Studios is the foundation on which The Walt Disney Company was built. The studio has produced more than 600 films since its debut film, Snow White and the Seven Dwarfs, in 1937. While many of its films were big hits, some were not. In this notebook, I use a dataset of Disney movies to analyse what contributes to their success.
* The dataset can be found here >> Link
It contains 579 Disney movies with six features: movie title, release date, genre, MPAA rating, total gross, and inflation-adjusted gross.
Let's start by importing the data and exploring the features!
Data wrangling: exploration and manipulation

    # Import pandas library
    import pandas as pd
    # Read the file into gross and parse release_date from string to datetime
    gross = pd.read_csv('datasets/disney_movies_total_gross.csv', parse_dates=['release_date'])
    # Preview the first rows of gross
    gross.head()

![](https://static.wixstatic.com/media/8441f7_e0d29a4be7d44215a15b642c8d67ac6a~mv2.png/v1/fill/w_980,h_252,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/8441f7_e0d29a4be7d44215a15b642c8d67ac6a~mv2.png)
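To see what `parse_dates` does, here is a minimal sketch on a tiny in-memory CSV. The CSV text and the `release_year` column are made up for illustration; only the column name `release_date` comes from the dataset.

```python
import io
import pandas as pd

# Hypothetical two-column CSV standing in for the real file
csv_text = "movie_title,release_date\nExample Movie,1937-12-21\n"

# parse_dates turns the string column into a datetime column
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["release_date"])
print(toy.dtypes)

# A datetime column exposes the .dt accessor, e.g. to extract the year
toy["release_year"] = toy["release_date"].dt.year
print(toy)
```

With a datetime column, grouping or filtering by year later on does not need any string handling.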
Let’s have a look at data dimensionality.

    print("Total rows and columns:", gross.shape)

![](https://static.wixstatic.com/media/8441f7_700b7d695c0e4a2eb134b6136d510d2a~mv2.png/v1/fill/w_602,h_62,al_c,q_85,enc_auto/8441f7_700b7d695c0e4a2eb134b6136d510d2a~mv2.png)
From the output, we can see that the table contains 579 rows and 6 columns. Now let's see how many null values there are in each column.

    gross.isna().sum()

![](https://static.wixstatic.com/media/8441f7_50bcbab819d54281ac6e7c55d6ae4cb8~mv2.png/v1/fill/w_564,h_266,al_c,q_85,enc_auto/8441f7_50bcbab819d54281ac6e7c55d6ae4cb8~mv2.png)
Two columns contain null values: genre (17 rows) and mpaa_rating (56 rows). I will drop the rows that contain them, then check the result again.

    gross = gross.dropna()
    gross.isna().sum()

![](https://static.wixstatic.com/media/8441f7_681b2522902e4da58d7cfdba5772ebe4~mv2.png/v1/fill/w_528,h_262,al_c,q_85,enc_auto/8441f7_681b2522902e4da58d7cfdba5772ebe4~mv2.png)
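Because `dropna()` removes every row that has at least one missing value, it can be useful to count how many rows are lost. A minimal sketch on a toy frame follows; the rows are made up, and only the column names mirror the dataset.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "genre": ["Comedy", None, "Drama", "Musical"],
    "mpaa_rating": ["G", "PG", None, "G"],
})

# Count missing values per column, as gross.isna().sum() does
missing_per_column = toy.isna().sum()

# dropna() removes every row with at least one missing value
cleaned = toy.dropna()
rows_removed = len(toy) - len(cleaned)

print(missing_per_column)
print("Rows removed:", rows_removed)
```

On the real data, the number of rows removed can be at most 17 + 56 = 73, and fewer if some rows are missing both genre and mpaa_rating.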
I also want to preview the basic statistical characteristics of each numerical feature (number of non-missing values, mean, standard deviation, minimum and maximum, median, and the 0.25 and 0.75 quartiles).

    gross.describe()

![](https://static.wixstatic.com/media/8441f7_db3413870e06427c924c3a7732bdde8c~mv2.png/v1/fill/w_602,h_498,al_c,q_85,enc_auto/8441f7_db3413870e06427c924c3a7732bdde8c~mv2.png)
Here are the unique values of the 'genre' column:

    gross['genre'].unique()

![](https://static.wixstatic.com/media/8441f7_7127ffcd3f3e475592580509a7dee826~mv2.png/v1/fill/w_980,h_108,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/8441f7_7127ffcd3f3e475592580509a7dee826~mv2.png)
Top 10 movies at the box office
Sorting the movies by their inflation-adjusted gross in descending order gives the top 10 movies by revenue.

    inflation_adjusted_gross_desc = gross.sort_values(by='inflation_adjusted_gross', ascending=False)
    inflation_adjusted_gross_desc.head(10)

![](https://static.wixstatic.com/media/8441f7_3683be6d2902407faf74be014b7cdef1~mv2.png/v1/fill/w_980,h_411,al_c,q_90,usm_0.66_1.00_0.01,enc_auto/8441f7_3683be6d2902407faf74be014b7cdef1~mv2.png)
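As an aside, pandas also offers `DataFrame.nlargest`, which selects the top rows without sorting the whole table. The sketch below compares both approaches on a toy frame with made-up values; it is an alternative, not the method used in this notebook.

```python
import pandas as pd

# Toy frame with made-up values; column names mirror the dataset
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "inflation_adjusted_gross": [300, 900, 100, 500],
})

# Approach used above: sort everything, then take the head
top2_sorted = toy.sort_values(by="inflation_adjusted_gross", ascending=False).head(2)

# Alternative: ask directly for the n largest rows
top2_nlargest = toy.nlargest(2, "inflation_adjusted_gross")

print(top2_nlargest)
```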
The movie genre trend
From the top 10 movies above, it seems that some genres are more popular than others. To check this, I will group the movies by genre and calculate the average gross revenue.

    # Compute the mean of the numeric columns (both gross figures) per genre
    group = gross.groupby('genre').mean(numeric_only=True)
    # Move genre from the index back into a regular column
    genre = group.reset_index()
    # Inspect the per-genre averages
    genre.head(10)

![](https://static.wixstatic.com/media/8441f7_045058cf30f84ea3ae0d6a6a1aaaf140~mv2.png/v1/fill/w_802,h_596,al_c,q_90,enc_auto/8441f7_045058cf30f84ea3ae0d6a6a1aaaf140~mv2.png)
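A genre average can be pulled up by a single very large hit, so it may help to look at the median and the number of movies next to the mean. The sketch below does this on a toy frame; the values are made up and the `summary` name is only for illustration.

```python
import pandas as pd

# Toy frame with made-up values; column names mirror the dataset
toy = pd.DataFrame({
    "genre": ["Musical", "Musical", "Musical", "Comedy", "Comedy", "Comedy"],
    "inflation_adjusted_gross": [1000, 200, 150, 100, 120, 80],
})

# Mean, median and movie count per genre, sorted by mean
summary = (
    toy.groupby("genre")["inflation_adjusted_gross"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
    .reset_index()
)

print(summary)
```

In the toy data, one large value lifts the "Musical" mean well above its median, which is the kind of skew worth checking for in the real dataset.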
Let's visualise it.

    # Import seaborn and matplotlib
    import seaborn as sns
    import matplotlib.pyplot as plt
    # Plot the average inflation-adjusted gross per genre
    sns.set(font_scale=1.0)
    sns.catplot(x='genre', y='inflation_adjusted_gross', data=genre, kind='bar')
    # Use Matplotlib to rotate the tick labels and label the axes
    plt.xticks(rotation=90)
    plt.xlabel("Genre", fontsize=15)
    plt.ylabel("Gross revenue", fontsize=15)

![](https://static.wixstatic.com/media/8441f7_5939d24f41314017968c3be4da1979f7~mv2.png/v1/fill/w_726,h_708,al_c,q_90,enc_auto/8441f7_5939d24f41314017968c3be4da1979f7~mv2.png)
Conclusion
The Disney movie with the highest inflation-adjusted revenue is "Snow White and the Seven Dwarfs", released in 1937, at around 5.2B USD.
"Musical" is the genre with the highest average inflation-adjusted revenue, around 600M USD, which is more than 3X that of the second and third genres, "Adventure" and "Action".
"Snow White and the Seven Dwarfs" alone earned roughly 8.7X the average revenue of the "Musical" genre.
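The 8.7X figure follows directly from the two rounded numbers above. A quick check, using the approximate values quoted in this conclusion rather than the exact dataset values:

```python
# Approximate figures quoted in the conclusion (rounded, in USD)
snow_white_gross = 5.2e9      # "Snow White and the Seven Dwarfs"
musical_mean_gross = 600e6    # average for the "Musical" genre

# Ratio of the top movie to the genre average
ratio = snow_white_gross / musical_mean_gross
print(f"{ratio:.1f}x")
```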