Empirical and Probability Distributions

Susanna Lange and Amanda R. Kube Jotte

In the past few chapters, we have discussed methods of sampling individuals from a population and how biased samples can limit the generalizability of our results. Remember, sampling is used to make inferences about a population when gathering information about the entire population is difficult or impossible. We make these inferences by calculating statistics on our sample, with the goal of estimating the true population parameter we are interested in.
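To make the idea of estimating a parameter from a statistic concrete, here is a minimal sketch. It assumes NumPy is available, and the population of heights is simulated purely for illustration; it is not data from this book. In practice the population mean would be unknown, which is exactly why we sample.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated heights in cm (illustrative only).
population = np.random.normal(loc=170, scale=10, size=100_000)

# The parameter: the true population mean (normally unknown to us).
population_mean = population.mean()

# Draw a simple random sample of 500 units without replacement.
sample = np.random.choice(population, size=500, replace=False)

# The statistic: the sample mean, used as an estimate of the parameter.
sample_mean = sample.mean()

print("Population mean (parameter):", round(population_mean, 2))
print("Sample mean (statistic):    ", round(sample_mean, 2))
```

The two printed values should be close but not identical; the gap between them reflects sampling variability.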

Probabilistic Sampling

Earlier in this book, we learned how to slice dataframes or select elements from arrays. This is a type of sampling known as deterministic sampling, since there is no chance involved. In this section, we will build on our use of the random.choice function from Chapter 10 to create probabilistic samples, where the probability of each unit being chosen is known before sampling is done. Simple random samples (SRS), as we learned in Chapter 10, are samples in which each unit has an equal probability of being chosen. Since we know the probability of each unit being chosen, an SRS is an example of a probabilistic sample.
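The contrast between deterministic and probabilistic sampling can be sketched as follows. This example assumes the random.choice function mentioned above is NumPy's np.random.choice; the unit labels and the unequal selection probabilities are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical population of ten labeled units (illustrative only).
units = np.array(["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"])

# Deterministic sampling: slicing involves no chance,
# so it returns the same units every time it is run.
deterministic_sample = units[:3]

# Probabilistic sampling (SRS): with no p argument, np.random.choice
# gives every unit the same chance of being selected.
np.random.seed(0)  # fixed seed only so the sketch is reproducible
srs = np.random.choice(units, size=3, replace=False)

# Probabilistic sampling with unequal but known probabilities:
# each unit's chance of selection is specified before sampling.
selection_probs = np.array([0.3, 0.3, 0.05, 0.05, 0.05,
                            0.05, 0.05, 0.05, 0.05, 0.05])
weighted_sample = np.random.choice(units, size=3, replace=False,
                                   p=selection_probs)

print("Deterministic:", deterministic_sample)
print("SRS:          ", srs)
print("Weighted:     ", weighted_sample)
```

Both random draws are probabilistic samples because the selection probabilities are known in advance; only the first of the two is a simple random sample, since only there are the probabilities equal.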

In this chapter, we will use probabilistic sampling and the probability basics we learned in Chapter 11 to explore ways of understanding a population from a sample.

Introduction to Data Science I & II
