14.1. Background
In 2004, the Illinois Department of Transportation (IDOT) launched a statistical study of traffic and pedestrian stops across the state [1]. The study has been renewed several times and continues as of the writing of this chapter. The creation of the Racial Profiling Prevention and Data Oversight Board in 2008 strengthened the collection of racial profiling data and made it a central focus of the study [2].
For our investigation, we will work with 2020 traffic stop data from the City of Chicago collected by IDOT as part of the Illinois Traffic and Pedestrian Stop Study (available by request [3]). The subset of data we use contains seven variables recorded by the officer at the time of the stop: when and where the stop occurred, the reason for the stop, the officer’s perception of the driver’s race, whether a search was conducted, and whether any contraband was found during the search.
You can see the number of stops in each police beat and district on the map below.
Our analysis will mirror one of the core approaches in the IDOT study, known as benchmarking. Benchmarking compares the proportion of stops involving a racial or ethnic group to that group’s share of the local population. For example, if 30% of stops involve Hispanic or Latino drivers and 30% of the local population is Hispanic or Latino, the group’s share of stops matches its share of the population. But if the population is only 10% Hispanic or Latino while 30% of stops involve members of this group, the group’s share of stops is three times its share of the population, a disparity that suggests potential bias.
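To make the comparison concrete, here is a minimal sketch of the arithmetic behind the example above. The function name `benchmark_ratio` and the idea of summarizing the comparison as a single ratio are our own illustrative assumptions, not necessarily the measure used in the IDOT reports, and the percentages are the hypothetical figures from the example, not real data.

```python
def benchmark_ratio(stop_share, population_share):
    """Divide a group's share of stops by its share of the benchmark population.

    Hypothetical helper for illustration only. A value near 1 means the
    group is stopped about as often as its population share predicts;
    a value above 1 means it is stopped more often than predicted.
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return stop_share / population_share


# Hypothetical scenario 1: 30% of stops, 30% of the population
matched = benchmark_ratio(0.30, 0.30)

# Hypothetical scenario 2: 30% of stops, 10% of the population
disparate = benchmark_ratio(0.30, 0.10)

print(f"Matched scenario ratio:   {matched:.1f}")
print(f"Disparate scenario ratio: {disparate:.1f}")
```

A ratio like this is only as trustworthy as the population share in its denominator, so the choice of benchmark matters a great deal.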
Benchmarking depends heavily on the accuracy of the chosen benchmark. In this case, the relevant population is not the general public, but drivers. Unfortunately, detailed data on the driving population is not freely available. IDOT has addressed this by estimating the driving population by race using additional data sources and probability methods. We will rely on these estimates, published in the 2019 and 2020 IDOT reports, as the benchmark for our investigation (link to reports).
In the next section, we will begin exploring this dataset in Python and see how benchmarking works in practice. As you follow along, keep this guiding question in mind: Are certain groups being stopped at rates higher than we would expect, given their share of the driving population?