Criterion Validity

Every test is developed to measure an outcome such as performance, behavior, or a health condition. To find out how accurately a test measures that outcome, you assess its criterion validity, also called criterion-related validity. Depending on when the criterion is measured, you establish it in one of two ways:

  • with concurrent validity if the criterion is measured at the same time as the test;
  • with predictive validity if the criterion is measured later, to check how well the test forecasts future performance.

To assess criterion validity, you compare the test results with a criterion variable, often called a “gold standard.” The criterion is usually an existing, widely used measure that is already accepted as valid for the same construct.

Example

As a researcher, you want to learn how well an entrance math test predicts first-year students’ performance and academic progress over the first two terms of their study program. You can use the results of the first- and second-term tests as criterion variables because they show what the students achieved during this time.

Then you compare the term test results with the entrance math test results. A sample of about one hundred students or more gives a more precise estimate than a small one. If the students’ first- and second-term results correlate strongly with their entrance test scores, the entrance exam has high criterion validity.
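The effect of sample size on precision can be sketched with an approximate 95% confidence interval for a correlation coefficient, based on the Fisher z-transformation. The function name `r_confidence_interval` and the observed correlation of 0.60 are hypothetical and chosen only for illustration. The approximation also assumes roughly bivariate normal scores.

```python
from math import atanh, sqrt, tanh


def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation r from n pairs.

    Uses the Fisher z-transformation; z_crit=1.96 gives roughly 95% coverage.
    """
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r)                 # transform r to the z scale
    se = 1 / sqrt(n - 3)         # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Hypothetical observed correlation between entrance and term test scores.
observed_r = 0.60
for n in (20, 100, 500):
    low, high = r_confidence_interval(observed_r, n)
    print(f"n={n}: r={observed_r:.2f}, interval width={high - low:.2f}")
```

With the same observed correlation, the interval narrows as the sample grows, which is why a larger group of students supports a firmer conclusion about the entrance test.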

Finding the criterion variable here is not difficult. In other situations and settings, however, choosing a suitable criterion can be much harder.

This article will tell you what criterion validity is and how to measure it properly.

The Meaning of Criterion Validity

Criterion validity shows how well a new test correlates with an established standard. Such standards are known as criteria.

A measurement instrument, such as a survey or questionnaire, has criterion validity if its results correspond to those obtained with an already accepted and widely used instrument. That accepted instrument is known as a “gold standard.”

A gold standard serves as the criterion variable. It measures the same construct, behavior, or performance, or several constructs that are conceptually related. When a gold standard is available, assessing criterion validity is straightforward. For instance, you can compare a new test with an existing one, or medical scores with the relevant clinical results.

Nevertheless, you will often face situations where there is no gold standard. For example, when you need to measure excitement, there is no objective standard, and you have to rely entirely on the subjective feedback of respondents. In that case, you cannot assess criterion validity.

You also need to consider the quality of the gold standard, not only its availability. If the criterion measure is itself biased, agreement with it does not show that your test is valid. Two measures that share the same bias will simply confirm each other, so a high correlation between them proves little. Only an unbiased gold standard supports a claim of criterion validity. When you doubt the criterion, check other types of validity as well.

Criterion Validity Types

Criterion validity comes in two types, concurrent and predictive. Which one you use depends on when the criterion is measured relative to the test.

  • Concurrent validity is applicable when the criterion variables and test scores are available simultaneously.
  • Predictive validity is helpful when the criterion variables are accessible after you get the test scores.

Concurrent Validity

Suppose you administer a new test at the same time as an established test that is already accepted as valid. The established test serves as the criterion. If the two sets of results correlate highly, the new test has concurrent validity.

Establishing concurrent validity is essential when you create a new measure that you expect to be faster, cheaper, or more objective than the existing one.

For example, a psychology researcher wants to develop a self-report test of aggression. To establish concurrent validity, the researcher compares the scores of the new test with clinical results obtained from other tests and observations. Both sets of results have to be collected at the same time. Concurrent validity can be assessed only if a valid measurement tool or criterion already exists.

Predictive Validity

This type of validity applies when a test is meant to forecast a future outcome. For example, if an entrance test predicts the academic performance of first-year students, it has predictive validity.

Suppose you want to develop a placement test that assigns your ESL students to the correct language level. Their future progress depends on being placed in groups that match their present skills. You give them a final test after the first semester and compare their results with their placement scores and group levels.

The final test serves as the criterion variable. If the students’ placement scores correlate strongly with their final test scores, your placement test has strong predictive validity. If the correlation is weak, the placement test is not predictively valid, and you need to revise it. Only a high correlation between the placement and final tests provides evidence of high predictive validity and supports your hypotheses.

Measuring Criterion Validity

You can measure criterion validity in two ways:

  • by comparing a new measurement procedure statistically with an established or independent criterion measured at the same time (this establishes concurrent validity);
  • by testing it against future performance and its results (this establishes predictive validity).

You have to correlate the measure being validated (e.g., a test or questionnaire) with a previously established measure that is considered valid because it clearly reflects the constructs under study. This established measure is your criterion variable.

You can calculate the correlation between the test scores and the criterion variable with a correlation coefficient, most commonly Pearson’s r. It shows the strength and direction of the linear relationship between the two variables. Its value lies between -1 and +1.

You can interpret this correlation coefficient in the following way:

  • r = 1: a perfect positive correlation.
  • r = 0: no linear correlation at all.
  • r = -1: a perfect negative correlation.
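The three reference points above can be reproduced with a short sketch that computes Pearson’s r from its definition: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the toy score lists are assumptions made for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


test_scores = [1, 2, 3, 4, 5]               # toy test scores
rises_with_test = [2, 4, 6, 8, 10]          # criterion rises in step with the test
falls_with_test = [10, 8, 6, 4, 2]          # criterion falls in step with the test
unrelated = [2, 1, 3, 1, 2]                 # no linear relationship with the test

print(round(pearson_r(test_scores, rises_with_test), 6))   # perfect positive case
print(round(pearson_r(test_scores, falls_with_test), 6))   # perfect negative case
print(round(pearson_r(test_scores, unrelated), 6))         # zero-correlation case
```

Real test data almost never reaches these extremes, so the value usually falls somewhere in between.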

To save calculation time, use statistical software: Excel, R, or SPSS all work well for this purpose.

A strong positive correlation is evidence that your test has criterion validity. The closer r is to +1, the stronger that evidence. If there is no correlation or it is negative, the test scores and the criterion variable do not appear to measure the same concept.

Example

You want to develop your own scale for measuring work-life balance satisfaction. To assess its criterion validity, you first need a criterion variable, so you choose an established, validated scale. You give both the established and the new scale to the same sample of participants and use a correlation coefficient to quantify the agreement between the two sets of answers. You find that the new scale correlates with the established one at r = 0.85. This is a strong positive correlation. It suggests that your new scale measures the same construct of work-life balance satisfaction that has been properly operationalized in the validated scale.

Final Thoughts

Criterion validity has clear practical applications. Use it when you need to check a newly developed test and establish how well it measures the outcome you want to study.

Still, remember that it may not be applicable if you cannot compare the obtained results with a gold standard. In that case, you will need other types of validity evidence to show that your test is an appropriate measurement technique for your research.
