Ethics and Pitfalls in Data Science

Evelyn Campbell, Ph.D.

Data ethics is an emerging discipline concerned with the ethical circumstances and decisions surrounding the possession and use of data. Because data can contain information about human subjects, and can affect them, data ethics is an important consideration throughout the data life cycle.

In recent years, data ethics has drawn considerable attention as various situations have arisen that call for moral and ethical evaluation. One notable incident was AOL's release of a massive dataset containing millions of web search queries. Although the data had been anonymized, it contained information that could be used to identify users of the search engine. This incident, along with the Netflix lawsuit, the U.S. Military/Strava case, and many others, raises a number of questions and concerns about how data is handled and processed.

Fields such as sociology, psychology, medicine, and business have established protocols for human-centered research and for the processing and handling of sensitive information. Because data ethics is still a new and developing discipline, it lacks the consistent standards and comprehensive legislation needed to guide ethical behavior and practices around data in the private and public sectors. In this chapter, we discuss essential ethical considerations, current laws pertaining to data ethics, and an ethical framework for working with human data.