Matplotlib and Pyplot
To create data visualizations, we first need to import the necessary libraries and packages. We will largely be using the matplotlib library, a popular tool for visualizing data from pandas DataFrames. Within matplotlib, the pyplot module provides functions for customizing figures. Common customizations include adding a legend, creating a title, and setting axis limits and tick increments. To use pyplot, we import it from the matplotlib library, conventionally under the alias plt.
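As a rough illustration of the customizations mentioned above, the sketch below draws two lines and then adds a title, axis labels, axis limits, tick increments, and a legend using pyplot functions. The numbers and the series names are hypothetical and serve only to give pyplot something to draw; they are not taken from the chapter's data.

```python
from matplotlib import pyplot as plt

# Hypothetical values, used only so there is something to plot
years = [2000, 2005, 2010, 2015, 2020]
series_a = [1.0, 1.5, 2.1, 2.4, 3.0]
series_b = [0.5, 0.9, 1.2, 1.8, 2.2]

plt.figure()
plt.plot(years, series_a, label="Series A")
plt.plot(years, series_b, label="Series B")

plt.title("Two illustrative series")  # figure title
plt.xlabel("Year")                    # axis labels
plt.ylabel("Value")
plt.xticks(range(2000, 2021, 5))      # tick increments of 5 years
plt.xlim(2000, 2020)                  # axis limits
plt.ylim(0, 4)
plt.legend()                          # legend built from the label= arguments
```

Each call acts on the current figure, so the order of the customization calls mostly does not matter as long as they come after the figure is created.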
For our visualizations we will set a style, that is, a specific appearance for our graphs. To do this, we will pass the fast style to the plt.style.use() function. The fast style has great visual appeal and is one of several style options available through matplotlib. The full set of styles is described in the matplotlib style sheets documentation.
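To see which styles are available in a given installation, one option is to inspect plt.style.available. The sketch below also shows plt.style.context(), which applies a style to a single figure without changing the global setting; the plotted values are hypothetical.

```python
from matplotlib import pyplot as plt

# List the style names this matplotlib installation provides
print(plt.style.available)

# Apply a style to one figure only, leaving the global style untouched
with plt.style.context("fast"):
    plt.figure()
    plt.plot([1, 2, 3], [2, 4, 8])  # hypothetical values
    plt.title("Drawn with the 'fast' style")
```

In contrast, plt.style.use('fast') changes the style for every figure created afterwards in the session, which is the approach this chapter takes.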
Other visualization libraries, such as seaborn, can be used as well and will be used later in the chapter.
For now, let’s start by importing our libraries:
import numpy as np
import pandas as pd
# Imports the pyplot module from matplotlib
from matplotlib import pyplot as plt
# Sets the style for visualizations
plt.style.use('fast')
To practice making these visualizations, we will be working with data from the World Bank. The data describe military spending in each of three North American countries (Canada, Mexico, and the United States) from 1960 to 2020.
Let’s load the data and begin to explore it.
military = pd.read_csv("../../data/NorthAmerica_Military_USD-PercentGDP_Combined.csv", index_col='Year')
military
Year | CAN-PercentGDP | MEX-PercentGDP | USA-PercentGDP | CAN-USD | MEX-USD | USA-USD |
---|---|---|---|---|---|---|
1960 | 4.185257 | 0.673509 | 8.993125 | 1.702443 | 0.084000 | 47.346553 |
1961 | 4.128312 | 0.651780 | 9.156031 | 1.677821 | 0.086400 | 49.879771 |
1962 | 3.999216 | 0.689655 | 9.331673 | 1.671314 | 0.099200 | 54.650943 |
1963 | 3.620650 | 0.718686 | 8.831891 | 1.610092 | 0.112000 | 54.561216 |
1964 | 3.402063 | 0.677507 | 8.051281 | 1.657457 | 0.120000 | 53.432327 |
... | ... | ... | ... | ... | ... | ... |
2016 | 1.164162 | 0.495064 | 3.418942 | 17.782776 | 5.336876 | 639.856443 |
2017 | 1.351602 | 0.436510 | 3.313381 | 22.269696 | 5.062077 | 646.752927 |
2018 | 1.324681 | 0.477517 | 3.316249 | 22.729328 | 5.839521 | 682.491400 |
2019 | 1.278941 | 0.523482 | 3.427080 | 22.204408 | 6.650808 | 734.344100 |
2020 | 1.415056 | 0.573652 | 3.741160 | 22.754847 | 6.116377 | 778.232200 |
61 rows × 6 columns
The data consist of an index and six columns. When importing the data, we used the index_col argument to set the Year column as the index. This will make it easier to extract data for a particular year later on, because we can refer to the year directly rather than working out which row position corresponds to it. Information about our data is listed below:
Year
The year of the collected data
CAN-PercentGDP
Percentage of the Gross Domestic Product of Canada spent on the military
MEX-PercentGDP
Percentage of the Gross Domestic Product of Mexico spent on the military
USA-PercentGDP
Percentage of the Gross Domestic Product of the United States spent on the military
CAN-USD
Amount of money (in billions, USD) spent on the military in Canada
MEX-USD
Amount of money (in billions, USD) spent on the military in Mexico
USA-USD
Amount of money (in billions, USD) spent on the military in the United States
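Because Year is the index, rows can be selected by year with .loc. The sketch below rebuilds a small sample from the first three rows of the table above (two columns only) so that it runs without the CSV file; the name sample and the inline CSV text are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# First three rows of the table shown above, restricted to two columns
csv_text = """Year,USA-PercentGDP,USA-USD
1960,8.993125,47.346553
1961,9.156031,49.879771
1962,9.331673,54.650943
"""
sample = pd.read_csv(io.StringIO(csv_text), index_col="Year")

# Select a whole row by its year label
row_1961 = sample.loc[1961]

# Select a single value by year label and column name
usa_usd_1961 = sample.loc[1961, "USA-USD"]

# Label-based slices include both endpoints
early = sample.loc[1960:1961]

print(usa_usd_1961)
```

The same lookups should work on the full military DataFrame, since it was loaded with the same index_col='Year' argument.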
In the upcoming exercises, we will explore these data using various visualizations. With these visualizations, we can construct a narrative of what the data show and mean.