Observational versus Experimental Studies#

In most research questions or investigations, we are interested in finding an association that is causal (the first scenario in the previous section). For example, “Is the COVID-19 vaccine effective?” is a causal question. The researcher is looking for an association between receiving the COVID-19 vaccine and contracting (symptomatic) COVID-19, but more specifically wants to show that the vaccine causes a reduction in COVID-19 infections (Baden et al., 2020) [1].

Experimental Studies#

There are three necessary conditions for showing that a variable X (for example, vaccination) causes an outcome Y (such as not catching COVID-19):

  • Temporal Precedence: We must show that X (the cause) happened before Y (the effect).

  • Non-spuriousness: We must show that the effect Y was not seen by chance.

  • No alternate cause: We must show that no other variable accounts for the relationship between X and Y.

If any of the three is not present, the association cannot be causal. If the proposed cause did not happen before the effect, it cannot have caused the effect. In addition, if the effect was seen by chance and cannot be replicated, the association is spurious and therefore not causal. Lastly, if there is another phenomenon that accounts for the association seen, then it cannot be a causal association. These conditions are therefore necessary to show causality.

The best way to show all three necessary conditions is by conducting an experiment. Experiments involve controllable factors, which are measured and determined by the experimenter; uncontrollable factors, which are measured but not determined by the experimenter; and experimental variability or noise, which is unmeasured and uncontrolled. Controllable factors that the experimenter manipulates in the experiment are known as independent variables. In our vaccination example, the independent variable is receipt of the vaccine. Uncontrollable factors that are hypothesized to depend on the independent variable are known as dependent variables. The dependent variable in the vaccination example is contraction of COVID-19. The experimenter cannot control whether participants catch the disease, but can measure it, and it is hypothesized that catching the disease is dependent on vaccination status.

Control Groups#

When conducting an experiment, it is important to have a comparison or control group. The control group is used to better understand the effect of the independent variable. For example, if all patients are given the vaccine, it would be impossible to measure whether the vaccine is effective, as we would not know the outcome if patients had not received the vaccine. In order to measure the effect of the vaccine, the researcher must compare patients who did not receive the vaccine to patients who did receive it. This comparison group of patients who did not receive the vaccine is the control group for the experiment. The control group allows the researcher to observe an effect or association. When scientists say that the COVID-19 vaccine is 94% effective, this does not mean that only 6% of people who got the vaccine in their study caught COVID-19 (the number is actually much lower!). That would not take into account the rate of catching COVID-19 for those without a vaccine. Rather, 94% effective refers to having 94% lower incidence of infection compared to the control group.

Let’s illustrate this using data from the efficacy trial by Baden and colleagues in 2020. In their primary analysis, 14,073 participants were in the placebo group and 14,134 in the vaccine group. Of these participants, a total of 196 were diagnosed with COVID-19 during the 78-day follow-up period: 11 in the vaccine group and 185 in the placebo group. This means that 0.08% of those in the vaccine group and 1.31% of those in the placebo group were diagnosed with COVID-19. Dividing 0.08 by 1.31, we see that the proportion of cases in the vaccine group was only 6% of the proportion of cases in the placebo group. Therefore, the vaccine is 94% effective.

Chicago has a population of almost 3,000,000. Extrapolating from the rounded rates above, without the vaccine, about 39,300 people would be expected to catch COVID-19 over a comparable 78-day period (in the trial, between 14 and 92 days after the second dose). If everyone were vaccinated, the expected number would drop to about 2,400. This is a large reduction! However, it is important that the researcher shows this effect is non-spurious, that is, unlikely to have arisen by chance. One way to do this is through replication: applying a treatment independently across two or more experimental subjects. In our example, researchers conducted many similar experiments for multiple groups of patients to show that the effect can be seen reliably.
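The arithmetic above can be checked with a short script. The sketch below is illustrative only and makes three assumptions. It uses crude proportions (cases divided by group size) rather than the person-time analysis used in the trial. It takes a round population of 3,000,000. And it adds a rough chance check that is not part of the text: if the vaccine did nothing, each of the 196 cases would land in the vaccine group with probability roughly equal to that group's share of participants, so a binomial tail gives an approximate probability of seeing 11 or fewer vaccine-group cases by chance. All function and variable names are hypothetical.

```python
from math import comb

# Counts from the primary analysis described above (Baden et al., 2020).
VACCINE_N, VACCINE_CASES = 14_134, 11
PLACEBO_N, PLACEBO_CASES = 14_073, 185
POPULATION = 3_000_000  # assumed round figure for Chicago


def attack_rate(cases: int, n: int) -> float:
    """Crude proportion of a group diagnosed during follow-up."""
    return cases / n


def efficacy(vaccine_rate: float, placebo_rate: float) -> float:
    """Relative reduction in incidence compared with the control group."""
    return 1 - vaccine_rate / placebo_rate


def chance_probability(vaccine_cases: int, total_cases: int, vaccine_share: float) -> float:
    """Binomial approximation to P(at most `vaccine_cases` of the cases
    fall in the vaccine group) if the vaccine had no effect at all."""
    return sum(
        comb(total_cases, k)
        * vaccine_share**k
        * (1 - vaccine_share) ** (total_cases - k)
        for k in range(vaccine_cases + 1)
    )


vaccine_rate = attack_rate(VACCINE_CASES, VACCINE_N)
placebo_rate = attack_rate(PLACEBO_CASES, PLACEBO_N)

# Extrapolate the unrounded rates to the whole population.
expected_unvaccinated = POPULATION * placebo_rate
expected_vaccinated = POPULATION * vaccine_rate

p_chance = chance_probability(
    VACCINE_CASES,
    VACCINE_CASES + PLACEBO_CASES,
    VACCINE_N / (VACCINE_N + PLACEBO_N),
)

print(f"vaccine group rate:  {vaccine_rate:.2%}")
print(f"placebo group rate:  {placebo_rate:.2%}")
print(f"crude efficacy:      {efficacy(vaccine_rate, placebo_rate):.0%}")
print(f"expected cases, nobody vaccinated:   {expected_unvaccinated:,.0f}")
print(f"expected cases, everyone vaccinated: {expected_vaccinated:,.0f}")
print(f"approx. probability of so few vaccine-group cases by chance: {p_chance:.1e}")
```

Because the script carries the unrounded rates through the extrapolation, its expected counts differ slightly from figures obtained by multiplying the rounded percentages. The chance check is a single-study calculation; it complements replication and does not replace it.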

Randomization#

A researcher must also be able to show there is no alternate cause for the association in order to prove causality. This can be done through randomization: random assignment of treatment to experimental subjects. Consider a group of patients where all male patients are given the treatment and all female patients are in the control group. If an association is found, it would be unclear whether this association is due to the treatment or to the fact that the groups were of differing sex. By randomizing experimental subjects to groups, researchers ensure that the groups differ only by chance in everything other than the treatment, so no systematic difference remains to serve as an alternate cause for the relationship between treatment and outcome.
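The contrast between the confounded design and a randomized one can be sketched in a few lines. The cohort below is entirely hypothetical (1,000 made-up participants, half male and half female), and the function names are illustrative assumptions, not part of any study protocol.

```python
import random


def make_participants(n):
    """Hypothetical cohort: sex alternates, so exactly half are male."""
    return [{"id": i, "sex": "M" if i % 2 == 0 else "F"} for i in range(n)]


def confounded_assignment(participants):
    """The flawed design from the text: all males treated, all females control."""
    return {
        "treatment": [p for p in participants if p["sex"] == "M"],
        "control": [p for p in participants if p["sex"] == "F"],
    }


def randomized_assignment(participants, rng):
    """Shuffle the cohort and split it in half, ignoring every characteristic."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


def share_male(group):
    """Fraction of a group that is male."""
    return sum(p["sex"] == "M" for p in group) / len(group)


rng = random.Random(2020)  # fixed seed so the sketch is reproducible
cohort = make_participants(1000)
confounded = confounded_assignment(cohort)
randomized = randomized_assignment(cohort, rng)

for name, design in [("confounded", confounded), ("randomized", randomized)]:
    print(
        f"{name:>10}: share male in treatment = {share_male(design['treatment']):.2f}, "
        f"in control = {share_male(design['control']):.2f}"
    )
```

In the confounded design, sex and treatment are perfectly aligned, so their effects cannot be separated. Under random assignment the two groups have nearly the same share of males, and the same holds on average for characteristics that were never measured.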

Another way of ensuring there is no alternate cause is by blocking: grouping similar experimental units together and assigning different treatments within such groups. Blocking is a way of dealing with sources of variability that are not of primary interest to the experimenter. For example, a researcher may block on sex by grouping males together and females together and assigning treatments and controls within each group. Best practice is to block on the largest and most salient sources of variability and to randomize what is difficult or impossible to block. In our example, blocking would account for variability introduced by sex, whereas randomization would account for sources of variability such as age or medical history, which are more difficult to block.
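Blocking on sex and randomizing within each block can be sketched as follows. This is a minimal illustration under assumed data: the cohort of 60 males and 40 females is made up, and `blocked_assignment` is a hypothetical helper, not a standard library routine.

```python
import random
from collections import Counter, defaultdict


def blocked_assignment(participants, block_key, rng):
    """Group participants by `block_key`, then randomly assign half of
    each block to treatment and the rest to control."""
    blocks = defaultdict(list)
    for person in participants:
        blocks[person[block_key]].append(person)

    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomization happens inside the block
        half = len(shuffled) // 2
        for person in shuffled[:half]:
            assignment[person["id"]] = "treatment"
        for person in shuffled[half:]:
            assignment[person["id"]] = "control"
    return assignment


# Hypothetical cohort: 60 males followed by 40 females.
cohort = [{"id": i, "sex": "M" if i < 60 else "F"} for i in range(100)]
rng = random.Random(7)  # fixed seed so the sketch is reproducible
arms = blocked_assignment(cohort, "sex", rng)

# Tally how many participants of each sex ended up in each arm.
tally = Counter((person["sex"], arms[person["id"]]) for person in cohort)
for (sex, arm), count in sorted(tally.items()):
    print(f"sex={sex} arm={arm}: {count}")
```

Because assignment is balanced inside each block, sex is split evenly between the arms by construction rather than only on average. Which individuals within a block receive the treatment is still decided at random, which handles the characteristics that were not blocked.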

Observational Studies#

Randomized experiments are considered the “Gold Standard” for showing a causal relationship. However, it is not always ethical or feasible to conduct a randomized experiment. Consider the following research question: Does living in Northern Chicago increase life expectancy? It would be infeasible to conduct an experiment that randomly allocates people to live in different parts of the city. Therefore, we must turn to observational data to investigate this question. Whereas experiments involve one or more variables controlled by the experimenter (the dose of a drug, for example), in observational studies there is no effort or intention to manipulate or control the object of study. Rather, researchers collect data without interfering with the subjects. For example, researchers may conduct a survey gathering both health and neighborhood data, or they may have access to administrative data from a local hospital. In these cases, the researchers are merely observing variables and outcomes.

There are two types of observational studies: retrospective studies and prospective studies. In a retrospective study, data is collected after events have taken place. This may be through surveys, historical data, or administrative records. An example of a retrospective study would be using administrative data from a hospital to study incidence of disease. In contrast, a prospective study identifies subjects beforehand and collects data as events unfold. For example, one might use a prospective study to evaluate how personality traits develop in children, by following a predetermined set of children through elementary school and giving them personality assessments each year.


[1] Baden LR, El Sahly HM, Essink B, Kotloff K, Frey S, Novak R, Diemert D, Spector SA, Rouphael N, Creech CB, McGettigan J. Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. New England Journal of Medicine. 2020 Dec 30.