Data visualization is one of the most important tools in a data scientist’s toolkit, and a good visualization is hard to produce. During a presentation, people often do not fully understand the data or the statistics involved, but showing them a good visualization helps them follow the story we are trying to convey. That is why they say a picture is worth a thousand words.
I believe that visualization is one of the most powerful means of achieving personal goals. — Harvey Mackay
One of the best and most commonly used libraries for visualization is matplotlib. It produces publication-quality plots. Throughout the tutorial, we will use the pyplot module. If you are using a Jupyter notebook, you can import the library directly; otherwise, you can install it manually with the command below:
At the time of writing, the latest version of matplotlib is 3.2.1. You can refer to the official installation docs here.
pip install matplotlib
If you are installing from inside a Jupyter notebook, add a “!” at the beginning of the command. This tells the notebook to run the line as a shell command rather than as Python code.
!pip install matplotlib
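To confirm that the installation worked, a quick check (a minimal sketch, not part of the original steps) is to import the library and print its version. The variable name installed_version is just for illustration.

```python
# Quick sanity check that matplotlib can be imported after installation
import matplotlib

# The version string depends on what is installed on your machine
installed_version = matplotlib.__version__
print("matplotlib version:", installed_version)
```

If the import fails with ModuleNotFoundError, the install most likely went to a different Python environment than the one running your code.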
A bar plot, or bar graph, represents the values of categorical data with rectangular bars. The bars can be horizontal or vertical. The categorical data can be the names of movies, countries, football players, etc. Correspondingly, the values can be the number of Oscars a movie won, the GDP of a country, the number of goals a player scored, etc.
Below is the general syntax of the bar plot:
bar(x, height, width, bottom, *, align)
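x = The ‘x’ coordinates of the bars. With categorical data, these are the category names.
bottom = The ‘y’ coordinate of the base of the bars. The default value is 0.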
height = The ‘height’ of the bars.
width = The ‘width’ of the bars. The default value is 0.8.
align = The alignment of the bars relative to the ‘x’ coordinates. The default value is “center”, which centers each bar on its ‘x’ position. The alternative value is “edge”, which aligns the left edge of each bar with its ‘x’ coordinate.
The ‘*’ marks the parameters after it as keyword-only. bar also accepts a number of other optional keyword arguments; I will mention only the most used ones:
color = The color of the bars. Valid values include single-letter shorthands such as ‘r’, ‘g’, and ‘b’, named colors such as ‘red’ and ‘cyan’, hex strings such as ‘#00ffff’, and RGB tuples.
orientation = The orientation of the bars, either ‘vertical’ or ‘horizontal’. In practice, horizontal bars are usually drawn with barh, which is shown later in this tutorial.
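To see what these parameters do in practice, here is a small sketch with made-up values (the numbers below are placeholders, not real data). It draws the same bars once with align='edge' and once with align='center', then reads the geometry back from the rectangles that bar returns. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Placeholder data, for illustration only
x = [0, 1, 2]
height = [3, 5, 2]

fig, ax = plt.subplots()

# align='edge': the left edge of each bar sits on its x coordinate
edge_bars = ax.bar(x, height, width=0.5, bottom=1, align='edge', color='cyan')

# align='center' (the default): each bar is centered on its x coordinate
center_bars = ax.bar(x, height, width=0.5, align='center', color='r')

# bar() returns a container of Rectangle patches, one per bar
first_edge = edge_bars[0]
first_center = center_bars[0]

print("edge-aligned left edge:  ", first_edge.get_x())
print("center-aligned left edge:", first_center.get_x())
print("base (bottom):", first_edge.get_y())
print("width, height:", first_edge.get_width(), first_edge.get_height())
```

With a width of 0.5, the centered bar starts a quarter unit to the left of its x position, while the edge-aligned bar starts exactly on it.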
From here on, I will explain the concepts through examples so that their usage is clear.
To draw the bar plot, I will use data from worldometer: the total coronavirus death counts for the top 6 countries. The data was taken on 8–6–20, at 10:18 AM (CST).
# Importing the matplotlib library
import matplotlib.pyplot as plt
# Categorical data: Country names
countries = ['USA', 'Brazil', 'Russia', 'Spain', 'UK', 'India']
# Integer values: total death counts
totalDeaths = [112596, 37312, 5971, 27136, 40597, 7449]
# Passing the parameters to the bar function, this is the main function which creates the bar plot
plt.bar(countries, totalDeaths)
# Displaying the bar plot
plt.show()
In the next plot, let us add some spice: a few more parameters and functions that make the plot look better and more informative. Below are the things I would add:
figsize = (12,7): Sets the width and height of the figure. Note that the order is (width, height), that is (x, y), and not (height, width).
width = 0.9: Sets the width of the bars.
color = ‘cyan’: Sets the fill color of the bars.
edgecolor = ‘red’: Sets the edge color of the bars.
annotate(‘text’, (x, y)): Annotates the plot with the given text at the given x and y coordinates. Here it is used to write each value above its bar.
legend(labels = [‘Text’]): Adds a legend with the given label for the bars.
title(‘Text’): Sets the title of the plot.
xlabel(‘Text’), ylabel(‘Text’): Set the names of the x and y axes.
savefig(‘Path’): Saves the plot to your local machine or anywhere else. The format follows the file extension, so you can save as “PNG”, “JPEG”, etc.
The ‘Text’ here can be replaced by the string of your choice, and the ‘Path’ represents the path where you want to store the plot.
# Importing the matplotlib library
import matplotlib.pyplot as plt
# Creating the figure; figsize is (width, height) in inches
plt.figure(figsize = (12,7))
# Categorical data: Country names
countries = ['USA', 'Brazil', 'Russia', 'Spain', 'UK', 'India']
# Integer values: total death counts
totalDeaths = [112596, 37312, 5971, 27136, 40597, 7449]
# Passing the parameters to the bar function, this is the main function which creates the bar plot
plt.bar(countries, totalDeaths, width= 0.9, align='center',color='cyan', edgecolor = 'red')
# Vertical offset so that each label sits just above its bar
j = 2000
# Annotating the bar plot with the values (total death count)
for i in range(len(countries)):
    plt.annotate(str(totalDeaths[i]), (-0.1 + i, totalDeaths[i] + j))
# Creating the legend of the bars in the plot
plt.legend(labels = ['Total Deaths'])
# Giving the title for the plot
plt.title("Bar plot representing the total deaths by top 6 countries due to coronavirus")
# Naming the x and y axis
plt.xlabel('Countries')
plt.ylabel('Deaths')
# Saving the plot as a 'png'
plt.savefig('1BarPlot.png')
# Displaying the bar plot
plt.show()
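The hand-tuned offsets above (-0.1 and 2000) work for this data but need adjusting whenever the bar width or value range changes. As an alternative, here is a sketch of my own, not the method used above, that reads each label position from the bar rectangle itself and lets the text alignment do the centering. The names x_center, y_top, and label_positions are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

countries = ['USA', 'Brazil', 'Russia', 'Spain', 'UK', 'India']
totalDeaths = [112596, 37312, 5971, 27136, 40597, 7449]

fig, ax = plt.subplots(figsize=(12, 7))
bars = ax.bar(countries, totalDeaths, width=0.9, color='cyan', edgecolor='red')

label_positions = []
for rect, value in zip(bars, totalDeaths):
    # Horizontal center and top of the bar, taken from the rectangle
    x_center = rect.get_x() + rect.get_width() / 2
    y_top = rect.get_height()
    # ha/va place the text centered horizontally, sitting on top of the bar
    ax.annotate(str(value), (x_center, y_top), ha='center', va='bottom')
    label_positions.append((x_center, y_top))

fig.savefig('annotated_bars.png')  # hypothetical file name
```

If your matplotlib is 3.4 or newer, ax.bar_label(bars) offers a similar one-call shortcut. This tutorial was written against 3.2.1, so the manual loop is the safer choice here.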
Yes, you read it right: by adding one extra character, ‘h’, to the function name (barh instead of bar), we can draw the bars horizontally. We can also draw the bars in two or more different colors, which makes the plot easier to read. Shown below is the code with these modifications.
# Importing the matplotlib library
import matplotlib.pyplot as plt
# Creating the figure; figsize is (width, height) in inches
plt.figure(figsize=[14, 10])
# Passing the parameters to the barh function, this is the main function which creates the bar plot
# To draw horizontal bars, append 'h' to the function name: barh instead of bar
plt.barh(['USA', 'Brazil', 'Russia', 'Spain', 'UK'], [2026493, 710887, 476658, 288797, 287399], label = "Danger zone", color = 'r')
plt.barh(['India', 'Italy', 'Peru', 'Germany', 'Iran'], [265928, 235278, 199696, 186205, 173832], label = "Not safe zone", color = 'g')
# Creating the legend of the bars in the plot
plt.legend()
# Naming the x and y axis
plt.xlabel('Total cases')
plt.ylabel('Countries')
# Giving the title for the plot
plt.title('Top ten countries most affected by\n coronavirus')
# Saving the plot as a 'png'
plt.savefig('2BarPlot.png')
# Displaying the bar plot
plt.show()
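One thing that can surprise you at first: barh places the first category at the bottom of the axis, so a list sorted from largest to smallest ends up with the largest bar at the bottom. Below is a minimal sketch, reusing the first five case counts from above, that flips the y-axis so the largest value sits on top. The invert_yaxis call is my addition and is not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

countries = ['USA', 'Brazil', 'Russia', 'Spain', 'UK']
totalCases = [2026493, 710887, 476658, 288797, 287399]

fig, ax = plt.subplots(figsize=(14, 10))

# In barh the value becomes the bar's width (its length along the x-axis)
bars = ax.barh(countries, totalCases, color='r')

# Flip the y-axis so the first category ('USA') is drawn at the top
ax.invert_yaxis()

ax.set_xlabel('Total cases')
ax.set_ylabel('Countries')
```

Notice that the roles of width and height swap compared to bar: the data value is now the width of each rectangle, and the bar thickness is its height.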
At times you might want to stack two or more bar plots on top of each other, which lets you compare two separate quantities visually. To do this, follow the code below.
# Importing the matplotlib library
import matplotlib.pyplot as plt
# Creating the figure; figsize is (width, height) in inches
plt.figure(figsize=[15, 5])
# Categorical data: Country names
countries = ['USA', 'Brazil', 'Russia', 'Spain', 'UK', 'India']
# Integer values: total cases
totalCases = (2026493, 710887, 476658, 288797, 287399, 265928)
# Integer values: total death counts
totalDeaths = (113055, 37312, 5971, 27136, 40597, 7473)
# Plotting both the total deaths and the total cases in a single plot.
# The deaths bar is stacked on top of the (total cases - total deaths) bar.
for i in range(len(countries)):
    plt.bar(countries[i], totalDeaths[i], bottom = totalCases[i] - totalDeaths[i], color='black')
    plt.bar(countries[i], totalCases[i] - totalDeaths[i], color='red')
# Creating the legend of the bars in the plot
plt.legend(labels = ['Total Deaths','Total Cases'])
# Giving the title for the plot
plt.title("Bar plot representing the total deaths and total cases country wise")
# Naming the x and y axis
plt.xlabel('Countries')
plt.ylabel('Cases')
# Saving the plot as a 'png'
plt.savefig('3BarPlot.png')
# Displaying the bar plot
plt.show()
In the above plot, we can see two quantities plotted together: the total deaths and the total cases.
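The loop above calls bar once per country. The same stacking can also be done with two calls, one per layer, by passing whole sequences and using bottom to lift the second layer. This is a sketch under my own assumptions: it uses numpy for the subtraction, and the name nonFatal and the label 'Cases excluding deaths' are mine, chosen because the red layer is total cases minus deaths.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

countries = ['USA', 'Brazil', 'Russia', 'Spain', 'UK', 'India']
totalCases = np.array([2026493, 710887, 476658, 288797, 287399, 265928])
totalDeaths = np.array([113055, 37312, 5971, 27136, 40597, 7473])

# Lower layer: cases that are not deaths (hypothetical name)
nonFatal = totalCases - totalDeaths

fig, ax = plt.subplots(figsize=(15, 5))

# Lower layer starts at 0; upper layer starts where the lower one ends
lower = ax.bar(countries, nonFatal, color='red', label='Cases excluding deaths')
upper = ax.bar(countries, totalDeaths, bottom=nonFatal, color='black', label='Total Deaths')

# With label= on each call, legend() needs no arguments
ax.legend()

# The top of each stack should equal the total cases for that country
stack_tops = [rect.get_y() + rect.get_height() for rect in upper]
```

Giving each layer its own label ties the legend entry to the bars it describes, instead of relying on the order in which the bars were drawn.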
Often you will want to group two or more sets of bars side by side to compare different quantities. The code below also shows how to override the x-axis tick labels with names of your choice.
# Importing the numpy and matplotlib libraries
import numpy as np
import matplotlib.pyplot as plt
# Creating the figure; figsize is (width, height) in inches
plt.figure(figsize=[15, 10])
# Data to be plotted
totalDeath = [113055, 37312, 5971, 7473, 33964]
totalRecovery = [773480, 325602, 230688, 129095, 166584]
activeCases = [1139958, 347973, 239999, 129360, 34730]
# Using numpy to group 3 different data with bars
X = np.arange(len(totalDeath))
# Passing the parameters to the bar function, this is the main function which creates the bar plot
# Using X now to align the bars side by side
plt.bar(X, totalDeath, color = 'black', width = 0.25)
plt.bar(X + 0.25, totalRecovery, color = 'g', width = 0.25)
plt.bar(X + 0.5, activeCases, color = 'b', width = 0.25)
# Creating the legend of the bars in the plot
plt.legend(['Total Deaths', 'Total Recovery', 'Active Cases'])
# Overriding the x axis ticks with the country names
plt.xticks([i + 0.25 for i in range(5)], ['USA', 'Brazil', 'Russia', 'India', 'Italy'])
# Giving the title for the plot
plt.title("Bar plot representing the total deaths, total recovered cases and active cases country wise")
# Naming the x and y axis
plt.xlabel('Countries')
plt.ylabel('Cases')
# Saving the plot as a 'png'
plt.savefig('4BarPlot.png')
# Displaying the bar plot
plt.show()
In the above plot, we can easily visualize which country is doing well in terms of recovery or active cases, etc.
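The offsets 0, 0.25, and 0.5 above are written by hand for exactly three series. If the number of series changes, the offsets can be computed instead. Here is a sketch of that idea, where group_offsets is a hypothetical helper of my own (it is not a matplotlib function) that centers each group on its tick.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def group_offsets(n_series, bar_width):
    """Hypothetical helper: x offset of each series so the group is centered on its tick."""
    return (np.arange(n_series) - (n_series - 1) / 2) * bar_width


countries = ['USA', 'Brazil', 'Russia', 'India', 'Italy']
series = {
    'Total Deaths': [113055, 37312, 5971, 7473, 33964],
    'Total Recovery': [773480, 325602, 230688, 129095, 166584],
    'Active Cases': [1139958, 347973, 239999, 129360, 34730],
}

X = np.arange(len(countries))
bar_width = 0.25
offsets = group_offsets(len(series), bar_width)

fig, ax = plt.subplots(figsize=(15, 10))
for offset, (name, values) in zip(offsets, series.items()):
    # Each series is shifted by its own offset around the tick position
    ax.bar(X + offset, values, width=bar_width, label=name)

# Because the groups are centered, the ticks sit at X itself
ax.set_xticks(X)
ax.set_xticklabels(countries)
ax.legend()
```

For three series with a width of 0.25, the helper gives offsets of -0.25, 0, and 0.25, which is the same layout as above shifted so that the middle bar sits on the tick.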
The bar plot is one of the simplest and most interesting plots available in the matplotlib library, and it is fun to learn. I hope you guys now understand the ins and outs of the bar plot. Below is a brief recap of what I covered in this tutorial. Go through it to make things clear in your head, and come back to me if anything is not.
The general syntax of the bar plot
Simple bar plot with no tricks involved
Learning how to use special parameters
Plotting a bar plot horizontally
Stacking two bar plots on top of one another
Plotting three bar plots next to one another (grouping)
Overriding the x-axis and learning what magic xticks can do
Lastly, saving a plot as an image (png).
Thank you, guys, that’s it for this tutorial, “Mastering the Bar Plot in Python”. I hope you have learned something new today. Feel free to explore more and create cool bar plots. Stay tuned for more updates, and until then, stay safe.