Correlation or Causation?
Correlation and causation describe two different types of relationships between variables. Correlation shows only that two variables are associated, while causation means that a change in one variable produces a change in another.
These two ideas are related, but they are not interchangeable: correlation does not necessarily mean causation, whereas causation always involves correlation. Understanding this distinction will help you interpret scientific data correctly.
Are Correlation and Causation Different?
These two aspects of the associations between variables differ from each other.
In correlation, the variables change together: when one variable changes, the other tends to change as well, in the same or the opposite direction. Correlation is a statistical measure of how strongly these changes go together. Such joint changes are also known as co-variation. However, variables can co-vary without any causal link between them.
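As a minimal sketch of how co-variation is turned into a single number, the Pearson correlation coefficient can be computed with the Python standard library alone. The function name pearson_r and the sample values are illustrative assumptions, not data from a real study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation: how the two variables deviate from their means together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical measurements of two variables for six participants.
variable_a = [1, 2, 3, 4, 5, 6]
variable_b = [3, 4, 4, 6, 7, 8]
r = pearson_r(variable_a, variable_b)
print(round(r, 3))
```

A value close to +1 or -1 means the two variables move together almost in lockstep. The number alone says nothing about why they do.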
In causation, the pattern of change looks the same: if the independent variable changes, so does the dependent variable. That is why causation is a kind of correlation. The difference lies in the cause-and-effect relationship between the variables: a change in one variable directly brings about the change in the other.
Remember!
The most common way to state the difference is that correlation does not always imply causation, but causation always implies correlation.
Why Doesn’t Correlation Imply Causation?
There are at least two reasons why we cannot treat correlation as causation. Each stems from a specific statistical problem that must be correctly identified before you can draw sound scientific conclusions from research results.
The first is the so-called ‘third variable problem’. It arises when a confounding variable affects both the independent and the dependent variable. The two may then seem causally related even though they are not.
Chocolate consumption increases in cold weather. However, cold weather is not necessarily the reason people buy chocolate. The two variables correlate, but if you treat the link as causal, you may mistakenly conclude that cold weather itself makes most people buy chocolate. The variable behind this mistake is hot drink consumption, which rises in cold weather and usually goes together with something sweet. It acts as a confounding third variable that is tied both to how people respond to cold weather and to how much chocolate they consume.
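The third variable problem can be illustrated with a small simulation. This is only a sketch with made-up numbers, not an analysis of real chocolate sales: z stands for a hidden third variable, x and y are each driven by z and never by each other, and the helper names pearson_r and residuals are assumptions made for this example.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(values, control):
    """What is left of `values` after removing a straight-line fit on `control`."""
    n = len(values)
    mv, mc = sum(values) / n, sum(control) / n
    slope = (sum((c - mc) * (v - mv) for c, v in zip(control, values))
             / sum((c - mc) ** 2 for c in control))
    return [(v - mv) - slope * (c - mc) for c, v in zip(control, values)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # hidden third variable
x = [zi + rng.gauss(0, 1) for zi in z]    # driven by z, never by y
y = [zi + rng.gauss(0, 1) for zi in z]    # driven by z, never by x

raw_r = pearson_r(x, y)                                   # clearly positive
adjusted_r = pearson_r(residuals(x, z), residuals(y, z))  # close to zero
print(f"raw: {raw_r:.2f}, after controlling for z: {adjusted_r:.2f}")
```

The raw correlation between x and y is clearly positive even though neither causes the other. Once the part explained by z is removed from both, the correlation almost disappears. In real correlational research the third variable is often unmeasured, so this adjustment may not be available.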
The second problem is directionality. It occurs when two variables correlate and may well be causally related, but it is not possible to tell which of the two influences the other.
Watching violent videos correlates with high levels of stress. However, it is impossible to conclude whether watching these videos causes stress or whether a high level of stress makes people watch violent videos.
Therefore, you need to choose a suitable research design to distinguish between correlational and causal links between the variables you are interested in. A purely correlational design can only show that the variables are correlated. An experimental design is often more rewarding because it can test causation, too. You may also combine several research designs to get more valid results.
How to Conduct Correlational Research?
You do not manipulate variables in a correlational research design. You only collect and analyze data.
Suppose you want to find out whether there is a relationship between weather conditions and happiness. You ask participants about the weather conditions they normally live in and about their emotional reactions and physical well-being in various weather conditions. Then, you measure their positive emotions with the help of an inventory.
You may find that sunny weather is positively correlated with feeling happy: rainy weather is associated with feeling unhappy or depressed, while sunny and warm weather is associated with satisfaction and happiness.
Correlational research usually has high external validity because the data come from real-life settings. However, its internal validity is low because correlational research alone does not let you conclude that changes in one variable cause changes in the other.
This kind of research is often used when a controlled experiment that could demonstrate a cause-and-effect relationship would be unethical, too expensive, or too complicated and time-consuming. It is also helpful when you do not initially expect the relationship between the variables to be causal.
Suppose you need to study whether playing violent video games is related to a bad mood, so you collect data on how much pupils play violent video games and how they feel afterwards. You ask their parents how many hours per day the children play these games, and you collect data from both parents and teachers about the kids’ moods and emotions. You find a positive correlation: children who play violent video games feel down and unhappy more often than those who don’t.
The Problem of the Third Variable
You cannot do without a controlled experiment if you need to be sure that the variable you are interested in has caused the changes in other variables. Third variables, like any extraneous variables, can strongly influence these changes. In correlational research, your control over variables is limited, so you cannot rule out that the results are explained by an extraneous or confounding variable. Confounding variables create the risk of mistaking a correlational relationship for a causal one.
In your research, the attention parents and teachers pay to how much time the kids spend playing violent video games is a confounding variable: adults can regulate the amount of time spent at the computer, and they also influence the children’s mood and emotions. Insufficient parental involvement can lead both to more time spent playing violent games and to more cases of bad mood and depression in kids.
You cannot control these factors in a correlational design, so you can only draw a conclusion about the correlation between the two main variables, leaving the third one aside. The only thing you can be sure about is that the two variables change together, not that a change in one produces the change in the other.
False (Spurious) Correlations
Spurious (false) correlations arise when two variables are related only through a hidden third variable or by simple coincidence.
If you look at the number of chickenpox cases in Europe in 2020-2022 and the number of marriages during the same period, you may conclude that these two variables are highly correlated: the reduction in chickenpox cases is almost the same as the decrease in the number of marriages. Are they really related? In fact, the two variables are independent of each other.
Modern medical advances and vaccination rates cause chickenpox cases to drop, while fewer couples are getting married each year for many different reasons. Therefore, the correlation between these two variables is spurious.
Correlation analysis of large datasets with many different variables is likely to produce at least one statistically significant correlation by chance alone. This is a type I error, or false positive: concluding that variables are really correlated when the pattern is only a coincidence in the sample data.
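The following sketch shows how chance alone can produce "significant" correlations when many variables are compared. All of the data are independent random noise, so every significant pair is a false positive. The significance check uses the Fisher z approximation, which is one common choice and an assumption of this example, and the names looks_significant, n_variables, and n_samples are illustrative.

```python
import random
from itertools import combinations
from math import atanh, sqrt
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def looks_significant(r, n, alpha=0.05):
    """Two-sided test of r != 0 using the Fisher z approximation."""
    z = atanh(r) * sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_variables, n_samples = 30, 50
# Every variable is independent noise, so no real relationship exists.
data = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_variables)]

pairs = list(combinations(range(n_variables), 2))
false_positives = sum(
    looks_significant(pearson_r(data[i], data[j]), n_samples) for i, j in pairs
)
print(f"{false_positives} of {len(pairs)} pairs look significant at alpha = 0.05")
```

With a 5% significance level, roughly one pair in twenty is expected to pass the test by coincidence, so a large table of correlations will almost always contain a few type I errors.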
The Problem of Directionality
A correlational design can show that two variables are associated, but it cannot show the direction of a causal effect. Let’s explain it. Causal relationships can be unidirectional, when one variable causes changes in the other, or bidirectional, when the two variables cause changes in each other. With a correlational design, you cannot accurately distinguish between these cases. You should use an experimental design to determine the direction of the effect.
When you analyze the relationship between playing violent video games and a bad mood, these variables can be causally related in the following directions:
- Too much violent video game playing may lead to a bad mood.
- A bad mood may lead people to choose a violent video game.
- Playing violent video games and a bad mood may reinforce each other.
You cannot establish the direction of the relationship between the variables in correlational research because the data show only that the variables change together. Your control as a researcher is limited here, so you risk drawing a wrong conclusion about causality that does not exist.
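The directionality problem can be sketched with two simulated worlds. In the first, gaming time drives bad mood. In the second, bad mood drives gaming time. The scores, the effect size of 0.8, and all variable names are made-up assumptions for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(2)  # fixed seed so the sketch is repeatable
n = 5000

# World A: gaming time drives bad mood.
gaming_a = [rng.gauss(0, 1) for _ in range(n)]
mood_a = [0.8 * g + 0.6 * rng.gauss(0, 1) for g in gaming_a]

# World B: bad mood drives gaming time.
mood_b = [rng.gauss(0, 1) for _ in range(n)]
gaming_b = [0.8 * m + 0.6 * rng.gauss(0, 1) for m in mood_b]

r_a = pearson_r(gaming_a, mood_a)
r_b = pearson_r(gaming_b, mood_b)
print(f"world A: {r_a:.2f}, world B: {r_b:.2f}")

# Swapping the arguments does not change the coefficient either.
print(f"swapped arguments: {pearson_r(mood_a, gaming_a):.2f}")
```

Both worlds produce nearly the same correlation, and the coefficient is symmetric in its two arguments, so the number cannot tell you which variable is the cause.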
How to Conduct Causal Research?
You can detect and demonstrate causal links between the variables of your interest only with the help of a controlled experiment. Such an experiment can test your predictions, your hypotheses, and your assumptions about the direction of cause and effect.
Experiments provide high internal validity because they can demonstrate cause-and-effect relationships with confidence. Because you manipulate the independent variable first and only then measure the changes in the dependent variable, the direction of the effect is clear.
For example, to test directionality in an experimental design, you can first test your hypothesis that playing violent video games for a long time worsens the mood. You set up an experiment in which players in the experimental group play a violent game for a long time, while the control group plays a non-violent game for the same amount of time, and then you measure the changes in mood. Since the type of game is the only thing you change between the groups, you can establish the directionality.
If you want to test bidirectionality, you may set up another experiment to find out whether the choice of game depends on the mood. In a controlled experiment, you can also reduce the influence of extraneous variables and confounders by using random assignment. Random assignment distributes participants’ characteristics evenly, on average, between the groups so that you can compare them without significant bias. The control group receives either no manipulation at all or a comparable manipulation that leaves out the factor being tested.
If you place each participant randomly in either the experimental or the control group, you minimize the effects of third variables, such as personal characteristics or health conditions of participants, that could influence the final results.
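Random assignment itself takes only a few lines. The sketch below assumes a hypothetical list of participants, each with a baseline mood score that the researcher does not control, and checks how similar the two groups are before any intervention. The function name randomly_assign and the field names are illustrative.

```python
import random
from statistics import mean

def randomly_assign(participants, rng):
    """Shuffle a copy of the participants and split it into two groups."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(3)  # fixed seed so the sketch is repeatable
# Hypothetical participants: an id and a baseline mood score nobody controls.
participants = [
    {"id": i, "baseline_mood": rng.gauss(50, 10)} for i in range(1000)
]
experimental, control = randomly_assign(participants, rng)

# Compare the groups on the characteristic that was never manipulated.
gap = (mean([p["baseline_mood"] for p in experimental])
       - mean([p["baseline_mood"] for p in control]))
print(f"baseline mood gap between groups: {gap:.2f}")
```

The gap is small compared with the spread of individual scores because chance, not the participants or the researcher, decided who went where. Balance holds on average and improves with larger samples; it is not guaranteed in any single small study.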
For example, in a study of a physical activity intervention, your experimental group receives the intervention, while the control group receives a comparable non-physical intervention. You keep all other variables constant in both groups so that any changes can be attributed to your intervention and not to other influences.
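To see why a randomized comparison can be read causally, here is a simulated experiment under stated assumptions: each participant has a baseline score, the intervention adds a fixed TRUE_EFFECT of 5 points, and assignment is a coin flip. The numbers and names are invented for illustration and do not come from a real study.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # assumed size of the intervention effect, in score points

def run_experiment(n, rng):
    """Randomly assign n simulated participants and return both groups' outcomes."""
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(50, 10)   # personal characteristics
        noise = rng.gauss(0, 5)        # everything else that varies
        if rng.random() < 0.5:         # coin-flip assignment
            treated.append(baseline + TRUE_EFFECT + noise)
        else:
            control.append(baseline + noise)
    return treated, control

rng = random.Random(4)  # fixed seed so the sketch is repeatable
treated, control = run_experiment(4000, rng)
estimated_effect = mean(treated) - mean(control)
print(f"estimated effect: {estimated_effect:.2f} (true effect: {TRUE_EFFECT})")
```

Because assignment does not depend on the baseline, the difference between the group means lands close to the true effect. If participants had chosen their own group, the same subtraction could mix the effect of the intervention with the effect of whatever drove their choice.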
Final Thoughts
Here, we have explained the difference between correlational and causal relationships between variables in scientific research design. Be careful when choosing a research design before you start your study. Understanding these differences and choosing the right methods and techniques will help you avoid errors and distorted results. That will make a strong basis for further research in your field.