Matplotlib and Pyplot

To create data visualizations, we first need to import the necessary libraries and packages. We will mostly use the matplotlib library, a popular tool for visualizing data, including data stored in pandas DataFrames. Within matplotlib, the pyplot module provides functions for creating and customizing figures. Common customizations include adding a legend, creating a title, and setting axis limits and tick increments. To use pyplot, we import it from the matplotlib library, conventionally under the alias plt.
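As a minimal sketch of the customizations listed above, the following draws two lines and then adds a title, axis labels, axis limits, tick increments, and a legend using pyplot functions. The numbers and series names are hypothetical and serve only to give the figure something to draw.

```python
from matplotlib import pyplot as plt

# Hypothetical values, used only for illustration
years = [2000, 2005, 2010, 2015, 2020]
series_a = [1.0, 1.5, 1.2, 1.8, 2.1]
series_b = [0.5, 0.7, 0.9, 0.8, 1.1]

# Draw one line per series; the label is what the legend will display
plt.plot(years, series_a, label="Series A")
plt.plot(years, series_b, label="Series B")

# Title and axis labels
plt.title("Two illustrative series")
plt.xlabel("Year")
plt.ylabel("Value")

# Tick increments on the x-axis, then axis limits
plt.xticks(years)
plt.xlim(2000, 2020)
plt.ylim(0, 2.5)

# Legend built from the labels passed to plt.plot()
plt.legend()
```

In a notebook the figure is displayed when the cell finishes; in a plain Python script, a call to plt.show() at the end opens the figure window.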

For our visualizations, we will set a style: a specific appearance for our graphs. To do this, we will call the plt.style.use() function with the fast style. The fast style is visually appealing and is one of several style options available through matplotlib. The matplotlib documentation includes a reference that lists the available styles.

Other visualization libraries, such as seaborn, can be used as well, and seaborn will be used later in the chapter.

For now, let’s start by importing our libraries:

import numpy as np
import pandas as pd

# Imports the pyplot module from matplotlib
from matplotlib import pyplot as plt

# Sets the style for visualizations
plt.style.use('fast')
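To see which style names plt.style.use() accepts, pyplot exposes the list plt.style.available. The short sketch below prints that list and confirms that fast is among the styles bundled with the installed matplotlib version; the exact contents of the list depend on that version.

```python
from matplotlib import pyplot as plt

# Names of the style sheets bundled with the installed matplotlib version
available_styles = sorted(plt.style.available)
print(available_styles)

# Check that the style used in this chapter is present before applying it
if 'fast' in available_styles:
    plt.style.use('fast')
```

A style set with plt.style.use() applies to figures created afterwards in the same session, so it is usually called once near the imports.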

To practice making these visualizations, we will work with data from the World Bank. The data describes military spending in Canada, Mexico, and the United States from 1960 to 2020.

Let’s load the data and begin to explore it.

military = pd.read_csv("../../data/NorthAmerica_Military_USD-PercentGDP_Combined.csv", index_col='Year')

military

Year	CAN-PercentGDP	MEX-PercentGDP	USA-PercentGDP	CAN-USD	MEX-USD	USA-USD
1960	4.185257	0.673509	8.993125	1.702443	0.084000	47.346553
1961	4.128312	0.651780	9.156031	1.677821	0.086400	49.879771
1962	3.999216	0.689655	9.331673	1.671314	0.099200	54.650943
1963	3.620650	0.718686	8.831891	1.610092	0.112000	54.561216
1964	3.402063	0.677507	8.051281	1.657457	0.120000	53.432327
...	...	...	...	...	...	...
2016	1.164162	0.495064	3.418942	17.782776	5.336876	639.856443
2017	1.351602	0.436510	3.313381	22.269696	5.062077	646.752927
2018	1.324681	0.477517	3.316249	22.729328	5.839521	682.491400
2019	1.278941	0.523482	3.427080	22.204408	6.650808	734.344100
2020	1.415056	0.573652	3.741160	22.754847	6.116377	778.232200

61 rows × 6 columns

The data consists of an index and six columns. When importing the data, we used the index_col argument to set the Year column as the index. This makes it easier to extract data for a particular year later on, because we can look up a row by its year directly instead of working out which row position corresponds to that year. The index and columns are described below:

Year: The year of the collected data
CAN-PercentGDP: Percentage of the Gross Domestic Product of Canada spent on the military
MEX-PercentGDP: Percentage of the Gross Domestic Product of Mexico spent on the military
USA-PercentGDP: Percentage of the Gross Domestic Product of the United States spent on the military
CAN-USD: Amount of money (in billions, USD) spent on the military in Canada
MEX-USD: Amount of money (in billions, USD) spent on the military in Mexico
USA-USD: Amount of money (in billions, USD) spent on the military in the United States
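Because Year is the index, a row can be selected by its year label with .loc. The sketch below rebuilds a three-row sample from values shown in the table above, since the full CSV file is not assumed to be available here; the names sample_csv, sample, and flat are introduced only for this illustration.

```python
import io

import pandas as pd

# Three rows copied from the table above; the full file has 61 rows
sample_csv = io.StringIO(
    "Year,CAN-PercentGDP,MEX-PercentGDP,USA-PercentGDP,CAN-USD,MEX-USD,USA-USD\n"
    "1960,4.185257,0.673509,8.993125,1.702443,0.084000,47.346553\n"
    "1961,4.128312,0.651780,9.156031,1.677821,0.086400,49.879771\n"
    "2020,1.415056,0.573652,3.741160,22.754847,6.116377,778.232200\n"
)
sample = pd.read_csv(sample_csv, index_col='Year')

# Select the whole row for one year by its index label
row_2020 = sample.loc[2020]
print(row_2020)

# Select a single value by year label and column name
usa_2020 = sample.loc[2020, 'USA-USD']
print(usa_2020)

# For comparison: with Year as an ordinary column, the same lookup
# needs a boolean filter on that column
flat = sample.reset_index()
usa_2020_filtered = flat.loc[flat['Year'] == 2020, 'USA-USD'].iloc[0]
```

Both approaches return the same value; the label-based lookup is simply shorter and reads closer to the question being asked.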

In the upcoming exercises, we will explore these data using various visualizations. With these visualizations, we can construct a narrative of what the data show and mean.
