Data Collection

Amanda R. Kube Jotte

With recent increases in capacity for data storage and large-scale analyses, “Big Data” has become a hot topic in both data science and broader society. However, bigger data is not always better data.

For example, difficulties or errors with measurement, data management, and data entry can lead to data that are incorrect or uninformative. The popular adage “garbage in, garbage out” captures this idea: incorrect or uninformative data lead to incorrect or uninformative analyses and results. In addition, a large dataset that was not collected with the research question in mind may produce misleading results if researchers are not cognizant of what types of data were collected and in what manner.
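To make this concrete, here is a minimal simulation sketch. Everything in it is hypothetical and chosen only for illustration: the population (100,000 normally distributed values), the sample sizes, and the rule that the large dataset can only record values above 45. It compares how far each sample's mean falls from the true population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 measurements centered near 50.
population = [random.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.mean(population)

# A small sample drawn completely at random from the population.
small_random = random.sample(population, 200)

# A much larger sample, but collected in a way that can only
# record values above 45 (an assumed, illustrative collection flaw).
eligible = [x for x in population if x > 45]
large_biased = random.sample(eligible, 20_000)

# Distance of each sample mean from the true population mean.
small_error = abs(statistics.mean(small_random) - true_mean)
large_error = abs(statistics.mean(large_biased) - true_mean)

print(f"True mean:                    {true_mean:.2f}")
print(f"Small random sample (n=200):  off by {small_error:.2f}")
print(f"Large biased sample (n=20000): off by {large_error:.2f}")
```

Under these assumptions, the large sample misses the true mean by more than the small random sample does, even though it contains 100 times as many observations. Collecting more data does not correct for a flaw in how the data were collected.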

It is important for data scientists to collect data using well-thought-out experimental design, including proper measurement, sampling and tailoring to the research question(s). In this chapter, we will explore several important topics in data collection: causality versus association, observational versus experimental studies, sampling and biases.