7.1. Matplotlib and Pyplot
To create data visualizations, we first need to import the necessary libraries and packages. We will mostly use the matplotlib library, a popular tool for visualizing data, including data stored in pandas DataFrames. Within matplotlib, the pyplot module provides functions for creating and customizing figures. Common customizations include adding a legend, creating a title, and setting axis limits and tick increments, among many others. To use pyplot, we import it from the matplotlib library, conventionally under the alias plt.
For our visualizations we will set a style, that is, a specific appearance for our graphs. To do this, we will pass the style name fast to the plt.style.use() function. The fast style is one of several style options that ship with matplotlib; the matplotlib documentation includes a style sheets reference that previews the available styles.
Other visualization libraries, such as seaborn, can be used as well; we will use seaborn later in the chapter.
For now, let’s start by importing our libraries:
import numpy as np
import pandas as pd
# Imports the pyplot module from matplotlib
from matplotlib import pyplot as plt
# Sets the style for visualizations
plt.style.use('fast')
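If you want to see which other styles are installed before settling on one, the short sketch below lists them and applies a style temporarily. It assumes a standard matplotlib installation; the ggplot style and the toy values being plotted are chosen only for illustration and are not part of the chapter's data.

```python
import matplotlib
matplotlib.use("Agg")  # optional: non-interactive backend, can be omitted in a notebook
from matplotlib import pyplot as plt

# Names of every style sheet that this matplotlib installation knows about
available_styles = sorted(plt.style.available)
print(available_styles)

# plt.style.use() changes the style for the rest of the session.
# plt.style.context() applies a style only inside the with-block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])  # toy values, for illustration only
    ax.set_title("Temporary ggplot style")
```

Using the context manager is a convenient way to compare styles without changing the style chosen for the rest of the chapter.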
To practice making these visualizations, we will work with data from the World Bank. The data covers military spending in three North American countries (Canada, Mexico, and the United States) from 1960 to 2020.
Let’s load the data and begin to explore it.
military = pd.read_csv("../../data/NorthAmerica_Military_USD-PercentGDP_Combined.csv", index_col='Year')
military
| Year | CAN-PercentGDP | MEX-PercentGDP | USA-PercentGDP | CAN-USD | MEX-USD | USA-USD |
|---|---|---|---|---|---|---|
| 1960 | 4.185257 | 0.673509 | 8.993125 | 1.702443 | 0.084000 | 47.346553 |
| 1961 | 4.128312 | 0.651780 | 9.156031 | 1.677821 | 0.086400 | 49.879771 |
| 1962 | 3.999216 | 0.689655 | 9.331673 | 1.671314 | 0.099200 | 54.650943 |
| 1963 | 3.620650 | 0.718686 | 8.831891 | 1.610092 | 0.112000 | 54.561216 |
| 1964 | 3.402063 | 0.677507 | 8.051281 | 1.657457 | 0.120000 | 53.432327 |
| ... | ... | ... | ... | ... | ... | ... |
| 2016 | 1.164162 | 0.495064 | 3.418942 | 17.782776 | 5.336876 | 639.856443 |
| 2017 | 1.351602 | 0.436510 | 3.313381 | 22.269696 | 5.062077 | 646.752927 |
| 2018 | 1.324681 | 0.477517 | 3.316249 | 22.729328 | 5.839521 | 682.491400 |
| 2019 | 1.278941 | 0.523482 | 3.427080 | 22.204408 | 6.650808 | 734.344100 |
| 2020 | 1.415056 | 0.573652 | 3.741160 | 22.754847 | 6.116377 | 778.232200 |
61 rows × 6 columns
The data consists of an index and six columns. When importing the data, we used the index_col argument to set the Year column as the index. This makes it easier to extract data for a particular year later on, because we can look up rows by year directly instead of working out which row position corresponds to that year. The index and columns are described below:
Year: The year of the collected data
CAN-PercentGDP: Percentage of the Gross Domestic Product of Canada spent on the military
MEX-PercentGDP: Percentage of the Gross Domestic Product of Mexico spent on the military
USA-PercentGDP: Percentage of the Gross Domestic Product of the United States spent on the military
CAN-USD: Amount of money (in billions, USD) spent on the military in Canada
MEX-USD: Amount of money (in billions, USD) spent on the military in Mexico
USA-USD: Amount of money (in billions, USD) spent on the military in the United States
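To illustrate why setting Year as the index is convenient, the sketch below rebuilds the first three rows shown in the table above from an in-memory string, so it runs without the CSV file. The variable names (csv_text, sample, positional) are made up for this example.

```python
import io
import pandas as pd

# First three rows of the table above, typed in as CSV text for illustration
csv_text = """Year,CAN-PercentGDP,MEX-PercentGDP,USA-PercentGDP,CAN-USD,MEX-USD,USA-USD
1960,4.185257,0.673509,8.993125,1.702443,0.084000,47.346553
1961,4.128312,0.651780,9.156031,1.677821,0.086400,49.879771
1962,3.999216,0.689655,9.331673,1.671314,0.099200,54.650943
"""

# With index_col='Year', rows are labeled by year
sample = pd.read_csv(io.StringIO(csv_text), index_col="Year")
usa_1961 = sample.loc[1961, "USA-USD"]   # look up a single value by year
early_years = sample.loc[1960:1961]      # label slices include both endpoints

# Without index_col, rows are labeled 0, 1, 2, ... and we must filter on Year
positional = pd.read_csv(io.StringIO(csv_text))
usa_1961_filtered = positional.loc[positional["Year"] == 1961, "USA-USD"].iloc[0]

print(usa_1961, usa_1961_filtered)
```

Both approaches return the same value, but the year-indexed version reads more directly and avoids the extra filtering step.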
In the upcoming exercises, we will explore these data using various visualizations. With these visualizations, we can construct a narrative of what the data show and mean.
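As a small preview of the pyplot customizations mentioned at the start of this section (title, legend, axis limits, and tick increments), here is a minimal sketch. It uses only the 2016-2020 rows shown in the table above, typed in by hand; the variable name recent, the axis limits, and the label text are choices made for this illustration, not part of the original exercises.

```python
import matplotlib
matplotlib.use("Agg")  # optional: non-interactive backend, can be omitted in a notebook
from matplotlib import pyplot as plt
import pandas as pd

# Percent-of-GDP values for 2016-2020, copied from the table above
recent = pd.DataFrame(
    {
        "CAN-PercentGDP": [1.164162, 1.351602, 1.324681, 1.278941, 1.415056],
        "MEX-PercentGDP": [0.495064, 0.436510, 0.477517, 0.523482, 0.573652],
        "USA-PercentGDP": [3.418942, 3.313381, 3.316249, 3.427080, 3.741160],
    },
    index=pd.Index([2016, 2017, 2018, 2019, 2020], name="Year"),
)

fig, ax = plt.subplots()

# One line per country; the label is what the legend will display
for column in recent.columns:
    plt.plot(recent.index, recent[column], marker="o", label=column)

plt.title("Military spending as a share of GDP, 2016-2020")
plt.xlabel("Year")
plt.ylabel("Percent of GDP")
plt.ylim(0, 4)                    # axis limits
plt.xticks(list(recent.index))    # one tick per year
plt.legend()
```

In a notebook, the figure is displayed automatically at the end of the cell; in a script, calling plt.show() or plt.savefig() would be needed to see it.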