How to Do a Statistical Analysis
Statistical analysis is one of the most important tools used by researchers, students, governments, non-governmental organizations, and businesses. It helps investigate patterns and trends in quantitative data.
Thorough planning from the very beginning of research leads to drawing relevant conclusions. To start with, you should identify a hypothesis and choose the sampling procedure. The size and design of the research matter a lot.
When the data collection is completed, you need to summarize the obtained results with the help of descriptive statistics. You can also use inferential statistics to test your hypothesis, and, after that, sum up your findings.
We have compiled some step-by-step tips on statistical analysis, which will be useful for all students and researchers who are not sure where to start.
Stage 1: Formulate a Hypothesis and Plan the Research Design
You need to identify one or several hypotheses to test in order to obtain the most valid data from your analysis.
How to Formulate a Statistical Hypothesis
Research should have a goal, such as investigating a relationship between variables within a population. You start with a prediction that you want to check.
This means writing a formal prediction about the population of interest and restating it as a null hypothesis and an alternative hypothesis. You then test these hypotheses using data from a sample.
The null hypothesis always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or a relationship.
For example, an effect can be tested with the following statistical hypotheses:
- Null hypothesis: Students’ income has no effect on their math exam results.
- Alternative hypothesis: Students’ income has an effect on their math exam results.
An example of hypotheses about a correlation:
- Null hypothesis: Students’ income and their math exam results do not have any relationships with each other.
- Alternative hypothesis: Students’ income and their math exam results are negatively correlated.
How to Plan a Research Design Correctly
If you want to collect data efficiently for a more thorough analysis, you need to plan the process carefully. You should know beforehand which statistical tests you are going to use.
The first thing to decide is what kind of research design to use. It can be correlational, descriptive, or experimental. The first two only measure and describe variables as they are, while the last one lets you manipulate a variable directly to assess cause and effect.
- In a correlational design, you investigate the relations between variables (e.g., students’ income and test results) using correlation coefficients and significance tests.
- In a descriptive design, you study the main features of a phenomenon or a population by drawing conclusions from sample data (e.g., the actual income of math students and their levels of anxiety before and after the exams).
- In an experimental design, you assess cause-and-effect relationships using statistical tests of comparison or regression (e.g., the effect of low income on the math exam results).
You also need to decide whether you will compare participants at the group level or at the individual level. When you compare separate groups (e.g., exam results of low-income and high-income students), your design is a between-subjects one. When you compare repeated measurements from the same individuals (e.g., each student’s results on several exams), you use a within-subjects design.
How to Measure Variables
To carry out a successful statistical analysis, you need to decide how to measure your variables, starting with the level of measurement. Categorical data represents groupings and can be nominal (gender) or ordinal (levels of academic performance). Quantitative data represents amounts and can be measured on an interval scale (math exam scores) or a ratio scale (age of students).
However, the same variable can often be measured at different levels. For example, the information about the age can be quantitative (19 years old) or categorical (high-school teenagers).
Identifying the level of measurement matters a lot when choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean exam score for low-income students because scores are quantitative, but you cannot calculate a mean of categorical data such as gender.
Stage 2: How to Collect Data from a Sample
Collecting data from every individual in a population is usually a long and expensive process. Collecting data from a sample costs less, but you need to know how to organize a correct sampling procedure. First of all, the sample should be representative of the researched population.
Statistical Analysis Sampling
You can use probability and non-probability sampling for statistical analysis.
- In probability sampling, the selection is random, and every member of the population has a chance of being chosen for the research.
- In non-probability sampling, the selection is based on non-random criteria such as convenience or voluntary participation, so some members of the population are more likely to be selected than others.
When you need findings that generalize to the population, use probability sampling. Random selection reduces sampling bias and makes the data more typical of the entire population. You can then use parametric tests for stronger statistical inferences.
However, it is rarely easy to obtain an ideal sample. That is why non-probability sampling, though it is more prone to bias, is often more practical: the data can be collected from volunteers who are easier to recruit. You can use non-parametric tests here, but they produce weaker inferences. You can use parametric tests as well, provided that you can argue the sample is representative of the population under research and free of systematic bias. Be aware of the limits this places on generalizing your findings, and address them in your discussion.
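As a minimal sketch of probability sampling, the Python snippet below draws a simple random sample from a sampling frame. The list of student IDs, the sample size, and the function name `simple_random_sample` are illustrative assumptions, not part of any particular study.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member has the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: student IDs 1..500
student_ids = list(range(1, 501))
sample = simple_random_sample(student_ids, 50, seed=42)

print(len(sample), "students selected, e.g.", sample[:5])
```

Recording the seed together with the sampling frame lets someone else reproduce exactly the same selection later.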
A Well-Structured Sampling Procedure Is Essential
First of all, you need to decide in what way you are going to recruit the participants.
- Will you advertise your study both inside and outside of your academic institution?
- Will you be able to recruit diverse representatives from the population under research?
- Will you be able to contact the representatives of hard-to-reach groups?
You can contact several educational establishments in your vicinity and check whether, and how easily, you can involve their students in the experiment. You can either select the participants yourself or let the institutions or the students volunteer. In any case, your aim is a sample that consists of the most appropriate individuals, and you need to find out whether such a sample is available.
Decide on the Size of Your Sample
You have to decide on the sample size before the start of the recruitment process. You may rely on other research in this field or on the available statistics on the topic. A sample that is too small may be unrepresentative, while an unnecessarily large sample costs more time and effort than needed.
You can use special online sample size calculators or statistical formulae. As a common rule of thumb, aim for at least 30 units per subgroup.
Even if you use an online calculator, you need to supply such key components as:
- statistical power (the probability that the study detects an effect of a given size if it really exists, usually set at 80% or more);
- standard deviation of the population (an estimate based on a pilot or previous study);
- significance level (the risk of rejecting a null hypothesis that is actually true, usually set at 5%);
- expected effect size (your prediction can be based on the effect sizes found in similar research).
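To show how these components combine, here is a rough sketch of one common approximation for comparing the means of two groups: n per group = 2 * ((z_alpha + z_power) / d)^2, where d is the expected effect size expressed in standard deviations. The formula choice, the function name, and the default values are assumptions made for illustration; dedicated calculators may use more exact methods.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means.

    effect_size is the expected mean difference divided by the
    population standard deviation (a standardized effect size).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical planning values: medium effect, 5% significance, 80% power
print(sample_size_per_group(effect_size=0.5))  # 63
```

Because the effect size is squared in the denominator, halving the expected effect roughly quadruples the required sample.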
Stage 3: Using Descriptive Statistics for Summarizing the Data
As soon as you have collected all the data, you need to summarize it. You can do this by using descriptive statistics in the following way.
Data Inspection
You can inspect the data by using different techniques:
- making frequency distribution tables on the basis of the data obtained for each variable;
- making a bar chart to display the data from the key variables;
- using a scatter plot to view the relationships between the two variables.
Tables and graphs are a powerful means of visualization: they show how the data is distributed and reveal missing values and values that may need to be excluded, such as outliers. In a normal distribution, the data is spread symmetrically around the center, where most values lie. In a skewed distribution, the data is asymmetric, with more values on one end of the scale than on the other. The shape of the distribution matters because only some descriptive statistics are appropriate for skewed distributions.
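A frequency distribution table can be built in a few lines. The sketch below uses Python's `collections.Counter`; the list of grades is made-up data for illustration only.

```python
from collections import Counter

# Hypothetical exam grades for ten students (ordinal categories)
grades = ["B", "A", "C", "B", "B", "D", "A", "C", "B", "C"]

frequency = Counter(grades)  # maps each grade to how often it occurs
total = len(grades)

# Print each category with its count and relative frequency
for grade in sorted(frequency):
    count = frequency[grade]
    print(f"{grade}: {count:2d}  ({count / total:.0%})")
```

If the counts do not add up to the number of participants, some values are missing and need a closer look.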
Measures of the Central Tendency
These describe where most of the values in a data set lie. There are three main measures of central tendency:
- Mode. It is the most frequently observed or reported value in the set.
- Median. It is the value in the exact middle of the data set when the values are ordered from low to high.
- Mean. This is the sum of all values divided by their number.
Depending on the shape of the distribution and the level of measurement, only one or two of these measures may be appropriate.
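The three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical and chosen so that one high value pulls the mean upward.

```python
from statistics import mean, median, mode

# Hypothetical math exam scores; 95 is an unusually high value
scores = [55, 60, 60, 65, 70, 75, 95]

print("mode:  ", mode(scores))            # most frequent value
print("median:", median(scores))          # middle value of the ordered data
print("mean:  ", round(mean(scores), 2))  # sum of values / number of values
```

Here the mean is higher than the median because of the single high score, which is why the median is often preferred for skewed data.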
Measures of Variability
You use these to see how the values in the data set are spread. There are four major measures of variability that are normally used:
- Range. It is the difference between the highest and lowest values in your data set.
- Interquartile range. It is the range of the middle half of the data set.
- Standard deviation. It indicates the average distance between each value in the data set and the mean.
- Variance. It is the square of the standard deviation.
The choice of these variability measures depends on the shape of the distribution. For example, you can opt for the interquartile range for skewed distributions.
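The four measures can be sketched with the standard library as follows. The scores are hypothetical, and the sample (n - 1) versions of the standard deviation and variance are assumed because the data is treated as a sample rather than a whole population.

```python
from statistics import quantiles, stdev, variance

# Hypothetical math exam scores
scores = [55, 60, 60, 65, 70, 75, 95]

value_range = max(scores) - min(scores)  # highest minus lowest value
q1, _, q3 = quantiles(scores, n=4)       # quartile cut points
iqr = q3 - q1                            # spread of the middle half
sd = stdev(scores)                       # sample standard deviation
var = variance(scores)                   # sample variance (sd squared)

print(f"range={value_range}, IQR={iqr}, SD={sd:.2f}, variance={var:.2f}")
```

Quartiles can be computed by several conventions, so the interquartile range reported by other software may differ slightly.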
Stage 4: Testing the Hypotheses
There is a difference between two notions - a statistic and a parameter. A statistic is a number that describes a sample, and a parameter is a number that describes the population. Inferential statistics lets you draw conclusions about population parameters on the basis of sample statistics.
There are two main methods for making inferences: estimation and hypothesis testing. Estimation means calculating population parameters from sample statistics, and hypothesis testing is a formal process for testing predictions about the population.
Estimation
Estimates can be of two types - point estimates and interval estimates.
- A point estimate is a single value that is your best guess of a certain parameter.
- An interval estimate is a range of values that is expected to contain the parameter.
You should use both types of estimates to infer the main characteristics of the population from the sample data. When you have a representative sample, for example, in a public opinion poll, the sample statistic is a reasonable point estimate. Remember that this type of estimate always carries some error. That is why you also need an interval estimate, such as a confidence interval. It uses the standard error and the z-score from the standard normal distribution to show where the population parameter is expected to lie.
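As a sketch under stated assumptions, the function below returns a point estimate of the population mean together with a z-based confidence interval, computed as mean ± z * (standard deviation / sqrt(n)). The function name and the scores are hypothetical; for small samples, a t-distribution is usually preferred over the z-score used here.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Return (point estimate, lower bound, upper bound) for the mean."""
    point = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    # z-score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return point, point - margin, point + margin


# Hypothetical exam scores from a sample of ten students
scores = [62, 70, 68, 75, 80, 66, 71, 73, 69, 77]
point, lower, upper = confidence_interval(scores)
print(f"mean = {point:.1f}, 95% CI = [{lower:.1f}, {upper:.1f}]")
```

Raising the confidence level widens the interval, while a larger sample narrows it.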
Testing of a Hypothesis
Hypothesis testing is a formal way to use sample data to assess relationships between variables in the population. You start with the assumption that the null hypothesis is true. Statistical tests then determine whether the sample data is unlikely enough under that assumption to reject the null hypothesis.
A statistical test produces two main outputs: a test statistic, which shows how much the obtained data differs from the null hypothesis, and a p-value, which shows how likely it is to obtain results at least this extreme if the null hypothesis is actually true.
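One way to make the p-value concrete is a permutation test, used here purely as an illustration and not as a method prescribed by this guide. If the null hypothesis is true, group labels are interchangeable, so the labels are shuffled many times and the share of shuffles with a difference at least as large as the observed one is counted. The function name and both score lists are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided p-value for a difference in means via label shuffling."""
    rng = random.Random(seed)
    n_a = len(group_a)
    # Test statistic: absolute difference between the two group means
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        first, second = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical exam scores for two groups of students
low_income = [58, 62, 65, 60, 70, 55, 63, 61]
high_income = [72, 75, 68, 80, 77, 74, 79, 71]
p_value = permutation_p_value(low_income, high_income)
print("p-value:", p_value)
```

A small p-value means that a difference this large would rarely appear if income were unrelated to the scores.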
Statistical tests come in three main varieties: comparison, regression, and correlation tests.
- Comparison tests evaluate the differences between the groups seen in the results.
- Regression tests assess the cause-and-effect relationships between the variables.
- Correlation tests evaluate the relationships between the variables without taking into account causation.
Your research questions, research design, data characteristics, and the chosen sampling method may contribute a lot to the choice of statistical test.
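For a correlation test, the usual starting point is a correlation coefficient. The sketch below computes Pearson's r from its definition; the choice of Pearson's r, the function name, and the income and score values are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: monthly income (in hundreds) and math exam scores
income = [8, 12, 15, 20, 25, 30]
exam_scores = [58, 62, 61, 70, 74, 79]
print(f"r = {pearson_r(income, exam_scores):.2f}")
```

The coefficient ranges from -1 to 1 and describes the strength and direction of a linear relationship, not causation.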
Parametric Tests
These tests are quite helpful in making inferences about the entire population on the basis of the sample data. However, the data has to meet some assumptions, and you can use only certain types of variables here. If the data does not meet the assumptions, you will have to use non-parametric tests instead or try some data transformations, which is not always desirable.
A regression models the extent to which changes in a predictor variable are associated with changes in an outcome variable. It can be either a simple or a multiple linear regression.
- A simple regression contains one predictor variable and one outcome variable.
- A multiple regression can have two or more predictor variables and one outcome variable.
Alternatively, you can use comparison tests, which usually compare means: the means of different groups within the sample, the means of one group taken at different times, or a sample mean and a population mean.
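For the simple regression described above, the slope and intercept can be sketched with ordinary least squares. The function name and the data are hypothetical, and this sketch only fits the line; it does not test the regression assumptions or compute significance.

```python
def fit_simple_regression(x, y):
    """Least-squares slope and intercept for one predictor and one outcome."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: covariation of x and y divided by the variation of x
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: weekly study hours (predictor) and exam score (outcome)
hours = [2, 4, 6, 8, 10]
exam_scores = [55, 61, 68, 72, 80]
slope, intercept = fit_simple_regression(hours, exam_scores)
print(f"score = {intercept:.1f} + {slope:.2f} * hours")
```

The slope estimates how much the outcome changes for each additional unit of the predictor.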
Stage 5: Interpreting the Results
It is usually the final step of statistical analysis. Here you consider the statistical significance, effect size, and possible decision errors.
Statistical Significance
You need this criterion to decide whether your results could plausibly have arisen by chance. You compare the p-value with the set significance level (usually 0.05) to make the conclusion. If the p-value is lower than the significance level, the results are considered statistically significant; otherwise, they are non-significant.
Effect Size
A statistically significant result is one whose p-value is lower than the set significance level (commonly 0.05). However, statistical significance alone does not mean that you can apply the results in practice. There is one more criterion to decide about this - the effect size, which indicates the practical significance of the results. It is best reported alongside the interval estimates and test results.
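One widely used effect size for a difference between two group means is Cohen's d, the mean difference divided by the pooled standard deviation. This guide does not name a specific measure, so the choice of Cohen's d, the function name, and the data are assumptions for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its df
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical exam scores for two groups of students
high_income = [72, 75, 68, 80, 77, 74, 79, 71]
low_income = [58, 62, 65, 60, 70, 55, 63, 61]
print(f"Cohen's d = {cohens_d(high_income, low_income):.2f}")
```

Unlike a p-value, this number does not shrink or grow just because the sample size changes, which makes it useful for judging practical importance.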
Decision Errors
Mistakes can occur in research conclusions. They are called decision errors and can be of Type I and Type II. A Type I error means rejecting the null hypothesis when it is actually true. A Type II error is the opposite one - failing to reject the null hypothesis when it is actually false. To reduce the risk of these mistakes, you need to choose a reasonable significance level and ensure high statistical power.
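The link between the significance level and Type I errors can be illustrated with a small simulation. In the sketch below, every simulated experiment draws data for which the null hypothesis is true (population mean 0, standard deviation 1), so each rejection by the two-sided z-test is a Type I error. The function name, sample size, and number of trials are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Share of true-null experiments that a two-sided z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true here: mean 0, known SD 1
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true null = Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Type I error rate at alpha = 0.05: {rate:.3f}")
```

The observed rate should land close to the chosen significance level, which shows that lowering the level reduces Type I errors, at the cost of lower power unless the sample grows.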
Final Thoughts
These are the main notions that characterize statistical analysis. You need to learn them in depth if you want to use the method in your research work. Understanding the basic stages of statistical analysis will help you to plan your research properly and get reliable and relevant results.