Internal Validity vs. External Validity
Internal and external validity both matter when you test cause-and-effect relationships between variables in an experimental research design.
- Internal validity is the degree of confidence that the results of the experiment were not influenced by any factors other than those you built into the experimental design.
- External validity refers to the extent to which the results of the study are applicable to other contexts and groups. You can also take into account measurement validity within the experiment, but these two types are the main ones.
The article compares them by providing an analysis of the main factors that can have an impact and how to deal with them.
What You Need to Know About Internal Validity
Establishing internal validity is crucial for an experimental design because it allows you to draw conclusions about the causal relationships between independent and dependent variables.
Suppose you want to test the hypothesis about the influence of quality 7-hour sleep on office workers’ productivity. You involve an equal number of office workers whose sleeping patterns approximately correspond to your goals. You schedule morning sessions in the laboratory for them.
As soon as the participants arrive at the lab in the morning, you ask them how many hours they slept and record the answers. You divide your sample into treatment and control groups according to the number of hours they slept - for convenience, you assign those who slept more than 7 hours to the control group and those who slept 7 hours or fewer to the treatment group. Next, you give both groups a memory test. After analyzing the results, you find that the control group performed better, so you conclude that office workers who do not sleep seven or more hours per night perform worse. However, to be sure that this conclusion is valid, you need to rule out any other factors that might explain the difference in performance.
Checking the Study for Internal Validity
Internal validity requires three conditions that must all hold at the same time to establish causality between the independent (treatment) and dependent (response) variables. They are the following:
- The treatment and response variables change together.
- Changes in the treatment variable occurred before the changes in the response variable.
- The experiment was not affected by any extraneous or confounding factors.
In our example, only two out of three conditions were met:
- ✔️ Performance on the memory test increased together with the number of sleeping hours.
- ✔️ The sleep came before the memory test.
- ❌ Neither the times at which participants went to bed and got up nor their ages were taken into account. These are extraneous factors that can affect the results of the experiment.
You assigned all the participants according to their sleeping hours only, regardless of their age, health conditions, and work experience, so the groups were not equivalent from the beginning. Any differences in performance could result from any one of these factors or even all of them. That is why you cannot argue that the number of sleeping hours alone influenced their memory performance. Therefore, your study does not have high internal validity.
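To see how an unmeasured factor can produce this kind of misleading result, here is a minimal simulation sketch. Everything in it is a hypothetical assumption made for illustration: the function names, the age range, and the numbers are invented, and the simulated memory score depends only on age, never on sleep. Because older participants are assumed to sleep less, splitting the sample at 7 hours still produces a gap between the groups.

```python
import random
from statistics import mean


def simulate_participants(n, seed=0):
    """Create hypothetical participants whose memory score depends on age only."""
    rng = random.Random(seed)
    participants = []
    for _ in range(n):
        age = rng.uniform(22, 60)
        # Assumption: older participants tend to sleep less.
        sleep_hours = 9.0 - 0.05 * age + rng.gauss(0, 0.5)
        # Assumption: sleep has NO causal effect here; only age lowers the score.
        score = 100.0 - 0.6 * age + rng.gauss(0, 3)
        participants.append({"age": age, "sleep": sleep_hours, "score": score})
    return participants


def group_means_by_sleep(participants, cutoff=7.0):
    """Split by observed sleep (as in the example) and return both mean scores."""
    long_sleepers = [p["score"] for p in participants if p["sleep"] > cutoff]
    short_sleepers = [p["score"] for p in participants if p["sleep"] <= cutoff]
    return mean(long_sleepers), mean(short_sleepers)


if __name__ == "__main__":
    long_mean, short_mean = group_means_by_sleep(simulate_participants(2000))
    print(f"Slept more than 7 h: {long_mean:.1f}")
    print(f"Slept 7 h or fewer:  {short_mean:.1f}")
```

Under these assumptions the group that slept longer scores noticeably higher, although sleep plays no causal role in the simulated data. The gap comes entirely from age, which is exactly the kind of confounding factor that lowers internal validity.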
What Threats Can Affect Internal Validity and How to Deal with Them
Threats to internal validity should be considered in any research design. There are eight main threats; which of them apply depends on the study’s format, and they affect single-group and multi-group studies differently.
Threats Related to Single-Group Studies
Here is an example of single-group research. A team of researchers studies whether free coffee and cookies influence the productivity of a group of office workers in a marketing company. All the participants had access to free treats for a month. They completed a productivity test before the experiment (pre-test) and after the experiment (post-test).
The internal validity was threatened by the following four factors:
- History. An unexpected event affected the results. Several days before the post-test, the company informed all the employees about a planned restructuring in which some workers would be laid off. The participants were shocked and worried because they did not know who would lose their jobs, so their post-test results were much lower than their pre-test ones.
- Maturation. The results could vary because many employees were new workers at the time when the experiment started, and they got accustomed to their positions after a month, so they performed much better in the post-test.
- Instrumentation. The pre-test and post-test used different time measurements. The pre-test lasted 20 minutes, and the post-test was 10 minutes longer.
- Testing influence. The same set of questions was offered for the pre-test and post-test. Participants knew what to expect and showed much higher productivity.
How to Deal with Threats in Single-Group Studies
You can manage these threats for single-group studies by changing the experimental design. You can do this in the following ways:
- Add a control group to your design to ensure the validity of the results. The control and treatment groups will experience the same threats, so the results will not be dependent on them.
- Increase the sample size for testing. Larger samples give more precise estimates and make the results less dependent on random variability between participants.
- Use questionnaires or filler tasks to hide the aim of testing from the participants.
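The reasoning behind adding a control group can be illustrated with simple arithmetic. The sketch below is one common way to make the comparison (a difference-in-differences calculation), not a method prescribed by this article. The function names and all scores are made up for illustration. It assumes that shared threats such as history or maturation shift both groups by about the same amount.

```python
from statistics import mean


def mean_change(pre_scores, post_scores):
    """Average post-test score minus average pre-test score for one group."""
    return mean(post_scores) - mean(pre_scores)


def effect_beyond_shared_threats(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus the change in the control group.

    Whatever affected both groups equally (for example, getting used to the
    test) cancels out in the subtraction.
    """
    return mean_change(treat_pre, treat_post) - mean_change(control_pre, control_post)


if __name__ == "__main__":
    # Hypothetical productivity scores, invented for this illustration.
    treat_pre, treat_post = [50, 55, 60], [58, 63, 68]
    control_pre, control_post = [52, 54, 59], [57, 59, 64]
    print(effect_beyond_shared_threats(treat_pre, treat_post, control_pre, control_post))
```

In this made-up data the treatment group improves by 8 points and the control group by 5, so only about 3 points can be attributed to the treatment. A single-group design would have reported the full 8.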
Threats Related to Multi-Group Studies
Let’s look at the following example. Suppose you want to test whether a mobile app or textbook entries are more effective for ESL students learning grammar rules. You recruit a sample of ESL college students and divide them into three groups - one group uses an app for two months, the second takes all the information from textbooks, and the third, as a control group, spends the time making flashcards for every grammar rule. The groups are divided equally according to the results of the pre-test. After two months, the students take a post-test.
In this study, internal validity is threatened by the following factors:
- Selection bias. The groups have different characteristics from the beginning of the experiment. More students with low pre-test scores ended up in the first group, and more students with high scores were placed in the second group. That is why the results do not give a true picture of the improvements caused by the treatment.
- Regression to the mean. Statistically, people who get extremely high or extremely low scores on one test tend to score closer to the average on the next one. So, it is difficult to tell whether the improvements came from the way of studying grammar or were simply a statistical artifact.
- Social interaction. Participants from different groups communicated with each other, so they had a chance to compare tasks and results and work out the true aim of the testing. They may even have resented the conditions another group had, for example, free access to mobile phones in class, and felt frustrated, which led to poor performance on the post-test.
- Attrition. Some participants get tired of the testing and drop out or are unwilling to respond at the post-test. If most of them belong to the control group, it will not be possible to compare its results with those of the treatment groups, and internal validity will be poor.
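Regression to the mean is easier to grasp with a small simulation. The sketch below is a hypothetical model, not data from any study. It assumes each student has a stable ability and that every test score equals that ability plus random noise, with no treatment applied between the two tests. The function names and numbers are invented for illustration.

```python
import random
from statistics import mean


def simulate_two_tests(n, seed=0):
    """Return (pre, post) score pairs with NO treatment between the tests."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        ability = rng.gauss(70, 8)        # assumed stable skill level
        pre = ability + rng.gauss(0, 8)   # test-day luck on the pre-test
        post = ability + rng.gauss(0, 8)  # independent luck on the post-test
        scores.append((pre, post))
    return scores


def mean_change_in_extreme_group(scores, fraction=0.2, lowest=True):
    """Average post-minus-pre change among the lowest (or highest) pre-test scorers."""
    ranked = sorted(scores, key=lambda pair: pair[0], reverse=not lowest)
    k = max(1, int(len(ranked) * fraction))
    return mean([post - pre for pre, post in ranked[:k]])


if __name__ == "__main__":
    scores = simulate_two_tests(5000)
    print("Lowest 20% on pre-test: ", round(mean_change_in_extreme_group(scores), 1))
    print("Highest 20% on pre-test:", round(mean_change_in_extreme_group(scores, lowest=False), 1))
```

Under these assumptions the lowest pre-test scorers appear to improve and the highest scorers appear to get worse, even though nothing was done to either group. A group made up mostly of low scorers can therefore look as if it benefited from a treatment that had no effect.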
How to Deal with Threats in Multi-Group Studies
You can also change the design of the experiment to obtain more valid results and diminish threats. You may prefer to do the following:
- Assign participants to groups randomly to avoid selection bias and make the groups comparable, so that regression to the mean affects all groups equally and cannot be mistaken for a treatment effect.
- Hide the purpose of the study from the participants so that they cannot predict or guess it from their social interactions.
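Random assignment, the first recommendation above, takes only a few lines of code. This is a minimal sketch that assumes participants are identified by simple IDs. The function name, the group labels, and the `seed` parameter are illustrative choices, not part of any standard procedure.

```python
import random


def randomly_assign(participant_ids, group_names, seed=None):
    """Shuffle participants and deal them into groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for index, participant in enumerate(shuffled):
        # Deal participants out like cards so group sizes differ by at most one.
        groups[group_names[index % len(group_names)]].append(participant)
    return groups


if __name__ == "__main__":
    students = [f"student_{i:02d}" for i in range(1, 31)]
    groups = randomly_assign(students, ["app", "textbook", "flashcards"], seed=42)
    for name, members in groups.items():
        print(name, len(members))
```

Because chance alone decides who lands in which group, pre-test scores, motivation, and other characteristics are not systematically linked to group membership, although small samples can still end up unbalanced by accident.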
What Threats Can Affect External Validity and How to Deal with Them
External validity also faces threats that you should recognize ahead of time and address in the research design.
Suppose you want to test the hypothesis that people with type 2 diabetes can benefit from eliminating cholesterol from their diets. Your experiment lasts for two months. You recruit patients from local communities who were diagnosed with type 2 diabetes a year ago, when they were between 40 and 50 years old. Participants are given a pre-test and a post-test that measure how much cholesterol-containing food they consume per day. During the experiment, all participants have consultations with diabetologists and are asked to change their diets according to the recommendations they receive.
The post-test shows that the participants who followed the recommendations closely experienced the fewest diabetes symptoms. So, you conclude that such patients can improve their condition by reducing cholesterol-containing products in their diets.
However, the external validity of such experiments can be severely threatened by the following factors:
- Sampling bias. Your sample may not be representative of the entire population. It includes only people who were diagnosed with diabetes a year ago and whose symptoms and ways of treatment are similar. However, there are many other patients who got the disorder earlier or later and whose symptoms and treatment differ within the population.
- History. Unexpected or one-off events can influence the results. Just seven days before the post-test, a forest fire broke out in a neighboring area, and it became impossible to deliver certain kinds of food that participants needed to comply with their diets. So, the post-test results showed no improvement, or even a worsening, because many patients were highly anxious.
- Experimenter influences. Experimenters can affect the outcomes unintentionally through their personal characteristics or attitudes. One of the consultants unexpectedly started talking to participants about the immense importance of the experiment and its results for the development of medical science in the country. So, many of them tried hard to meet these expectations by eating large amounts of cholesterol-free foods. That left their diets unbalanced, so the post-test results turned out to be much worse than the pre-test ones.
- Hawthorne effect. Participants tend to change their usual behavior simply because they know that they are being tested as part of a study. Some participants reduced their cholesterol-containing foods to the minimum, while others consumed larger amounts of such foods simply to see what effects they might experience.
- Effect of testing. Taking a pre-test can itself affect the post-test results. For example, the participants already know the format from the pre-test, so they do not worry about the post-test and its outcomes because they are sure that the answer will be positive if they consume less cholesterol-containing food.
- Situation effect. The setting, time of day, location, weather conditions, testers’ characteristics, and other unrelated factors during the test may influence the results. Even if the weather changes slightly, the participants may feel worse, and that will show in their post-test.
- Aptitude-treatment interaction. Individual characteristics of participants or of particular groups may interact with the independent variable and change its effect on the dependent variable. As a result, the findings may hold only for people who share those characteristics.
How to Diminish Threats to External Validity?
You can counter threats to external validity in some efficient ways. They are the following:
- Replicate the study to test whether the findings generalize to other situations, settings, or types of populations.
- Conduct field experiments to apply the testing in more natural contexts.
- Apply probability sampling to counter sampling bias by giving everyone in the population an equal chance of being selected for the sample.
- Counter sampling bias by reprocessing or recalibrating the data, for example by reweighting observations so that the sample better matches the makeup of the population.
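The sampling and reweighting ideas in the last two points can be sketched in code. This is a minimal illustration under stated assumptions: the population is available as a list, each participant belongs to one stratum (here, hypothetical labels for how long ago diabetes was diagnosed), and the true population share of each stratum is known. The function names, strata, and numbers are all invented, and the reweighting shown is simple post-stratification, which is only one of several recalibration techniques.

```python
import random
from collections import Counter


def simple_random_sample(population, k, seed=None):
    """Probability sampling: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def poststratification_weights(sample_strata, population_shares):
    """Weight per stratum = population share / share observed in the sample."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return {s: population_shares[s] / (counts[s] / n) for s in counts}


def weighted_mean(values, strata, weights):
    """Mean of values where each observation counts according to its stratum weight."""
    total_weight = sum(weights[s] for s in strata)
    return sum(v * weights[s] for v, s in zip(values, strata)) / total_weight


if __name__ == "__main__":
    # Hypothetical sample that over-represents recently diagnosed patients.
    strata = ["recent"] * 80 + ["long_term"] * 20
    symptom_scores = [10] * 80 + [20] * 20
    shares = {"recent": 0.4, "long_term": 0.6}  # assumed population makeup
    weights = poststratification_weights(strata, shares)
    print(sum(symptom_scores) / len(symptom_scores))        # unweighted sample mean
    print(weighted_mean(symptom_scores, strata, weights))   # reweighted estimate
```

In this made-up sample the unweighted mean is pulled toward the over-represented group, while the reweighted estimate moves toward the value implied by the assumed population shares. Reweighting can only correct for characteristics you have measured, so it complements probability sampling rather than replacing it.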
How to Compromise Between Internal and External Validity
You need to be sure that you will be able to use the results of your research in a broader context. It implies external validity for generalizing the findings to different groups, settings, and other measurements.
That is why it is important to learn how to compromise between internal and external validity. Control over variables is the key issue here. The more tightly you control extraneous variables in your experiment, the less you can apply your results to broader contexts.
Let’s come back to our test of the relationship between the number of sleeping hours and the memory performance of office workers. The external validity here depends on the choice of the memory test, the inclusion criteria for participants, and the laboratory setting. You can restrict your sample to people aged 25-30. Such a restriction will boost internal validity, but at the expense of external validity, because you will be able to generalize the findings only to people aged 25-30.
Therefore, better internal validity is usually achieved at the expense of external validity and vice versa. So, you need to be careful about the type of research you choose because it reflects your priorities in the study.
You also need to think about the setting for your study. You can test a causal relationship either in the laboratory (artificial setting) or in the real world (natural setting). Internal validity will be higher in lab conditions because there are fewer outside influences than in a natural environment. However, external validity will be lower because lab settings are idealized and leave out the factors that operate in the real world. The best solution here is to do the research twice - first in the controlled environment of the laboratory to confirm that the causal relationship between the variables exists, and then in the field to check whether the laboratory results apply to other, more natural environments.
Final Thoughts
You can now see the difference between internal and external validity and the factors that may influence them. You can also anticipate the possible threats to internal and external validity in your experiments and plan how to deal with them.
The best way to obtain valid results is to balance these two types of validity by conducting the experiment in the laboratory first and then checking the results in a field experiment to be sure that they apply to different settings and samples. If you manage to find such a compromise, your findings will be far more trustworthy, and you will make a lot of progress in your academic work.