Matplotlib and Pyplot

To create data visualizations, we first need to import the necessary libraries. We will mostly use matplotlib, a popular Python plotting library that works well with data stored in pandas DataFrames. Within matplotlib, the pyplot module provides functions for creating and customizing figures. Common customizations include adding a legend, setting a title, and adjusting axis limits and tick increments. To use pyplot, we import it from matplotlib, usually under the alias plt.

For our visualizations, we will set a style, which controls the overall appearance of our graphs. To do this, we pass a style name to the plt.style.use() function; here we use the fast style. The fast style keeps matplotlib's default look while simplifying how lines are drawn so that plots render more quickly. It is one of several style options available through matplotlib, and the matplotlib documentation includes a reference of the available styles.

Other visualization libraries, such as seaborn, can also be used; we will use seaborn later in the chapter.

For now, let’s start by importing our libraries:

import numpy as np
import pandas as pd

# Imports the pyplot module from matplotlib
from matplotlib import pyplot as plt

# Sets the style for visualizations
plt.style.use('fast')
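To see which styles the installed matplotlib version offers, plt.style.available lists their names, and plt.style.context() applies a style to a single figure without changing the global setting. The following is a minimal sketch that assumes only that matplotlib is installed; the plotted values are placeholders and are not taken from our data.

```python
from matplotlib import pyplot as plt

# Names of the style sheets bundled with the installed matplotlib version
available_styles = sorted(plt.style.available)
print(available_styles)

# Apply a style to one figure only, leaving the global settings unchanged
with plt.style.context('fast'):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])  # placeholder values, for illustration only
    ax.set_title("Styled with 'fast'")

# Close the figure so it does not linger in memory
plt.close(fig)
```

Using plt.style.context() is handy when comparing styles side by side, whereas plt.style.use() is the simpler choice when one style should apply to every plot in a notebook.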

To practice making these visualizations, we will work with data from the World Bank. The data cover military spending in three North American countries (Canada, Mexico, and the United States) from 1960 to 2020.

Let’s load the data and begin to explore it.

military = pd.read_csv("../../data/NorthAmerica_Military_USD-PercentGDP_Combined.csv", index_col='Year')

military
      CAN-PercentGDP  MEX-PercentGDP  USA-PercentGDP    CAN-USD   MEX-USD     USA-USD
Year
1960        4.185257        0.673509        8.993125   1.702443  0.084000   47.346553
1961        4.128312        0.651780        9.156031   1.677821  0.086400   49.879771
1962        3.999216        0.689655        9.331673   1.671314  0.099200   54.650943
1963        3.620650        0.718686        8.831891   1.610092  0.112000   54.561216
1964        3.402063        0.677507        8.051281   1.657457  0.120000   53.432327
...              ...             ...             ...        ...       ...         ...
2016        1.164162        0.495064        3.418942  17.782776  5.336876  639.856443
2017        1.351602        0.436510        3.313381  22.269696  5.062077  646.752927
2018        1.324681        0.477517        3.316249  22.729328  5.839521  682.491400
2019        1.278941        0.523482        3.427080  22.204408  6.650808  734.344100
2020        1.415056        0.573652        3.741160  22.754847  6.116377  778.232200

61 rows × 6 columns

The data consist of an index and six columns. When importing the data, we used the index_col argument to set the Year column as the index. This makes it easier to extract data for a particular year later on, because we can look a row up by its year directly instead of working out which row position corresponds to that year. Information about our data is listed below:
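To illustrate why the Year index helps, here is a small sketch of a lookup by year. It rebuilds a two-row excerpt of the table above by hand (the name sample is hypothetical) so that it runs without the CSV file; with the full dataset, the same .loc calls would apply to military.

```python
import pandas as pd

# Two-row excerpt of the table above, rebuilt by hand for illustration
sample = pd.DataFrame(
    {
        "Year": [1960, 1961],
        "CAN-PercentGDP": [4.185257, 4.128312],
        "USA-PercentGDP": [8.993125, 9.156031],
    }
).set_index("Year")

# With Year as the index, a year is looked up directly by its label
row_1961 = sample.loc[1961]
usa_1961 = sample.loc[1961, "USA-PercentGDP"]

# Without the Year index, the same lookup needs a boolean filter on the column
no_index = sample.reset_index()
usa_1961_filtered = no_index.loc[no_index["Year"] == 1961, "USA-PercentGDP"].iloc[0]

print(usa_1961, usa_1961_filtered)
```

Both approaches return the same value, but the label-based lookup is shorter and states the intent more clearly.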

Year

The year of the collected data

CAN-PercentGDP

Percentage of the Gross Domestic Product of Canada spent on the military

MEX-PercentGDP

Percentage of the Gross Domestic Product of Mexico spent on the military

USA-PercentGDP

Percentage of the Gross Domestic Product of the United States spent on the military

CAN-USD

Amount of money (in billions, USD) spent on the military in Canada

MEX-USD

Amount of money (in billions, USD) spent on the military in Mexico

USA-USD

Amount of money (in billions, USD) spent on the military in the United States

In the upcoming exercises, we will explore these data using various visualizations. With these visualizations, we can construct a narrative of what the data show and mean.
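As a small preview, the sketch below applies the customizations described at the start of this section (a title, a legend, axis limits, and tick positions) to the first five rows of two columns copied from the table above. The name sample, the axis limits, and the label text are illustrative choices, not part of the dataset or the exercises.

```python
import pandas as pd
from matplotlib import pyplot as plt

# First five rows of two columns, copied from the table above for illustration
sample = pd.DataFrame(
    {
        "CAN-PercentGDP": [4.185257, 4.128312, 3.999216, 3.620650, 3.402063],
        "USA-PercentGDP": [8.993125, 9.156031, 9.331673, 8.831891, 8.051281],
    },
    index=pd.Index([1960, 1961, 1962, 1963, 1964], name="Year"),
)

fig, ax = plt.subplots()

# One line per country; the label text is what the legend displays
ax.plot(sample.index, sample["CAN-PercentGDP"], label="Canada")
ax.plot(sample.index, sample["USA-PercentGDP"], label="United States")

# Title, axis labels, axis limits, tick positions, and legend
ax.set_title("Military spending as a share of GDP, 1960-1964")
ax.set_xlabel("Year")
ax.set_ylabel("Percent of GDP")
ax.set_ylim(0, 10)
ax.set_xticks(list(sample.index))
ax.legend()
```

In a notebook, the figure is displayed when the cell finishes running; in a standalone script, a call to plt.show() would be needed to display it. The sketch uses the figure and axes objects returned by plt.subplots(), which is one of two common ways to work with pyplot; the same customizations are also available as plt.title(), plt.legend(), and similar functions.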

Resources