One-Tailed and Two-Tailed Hypothesis Testing

More statistical power does not necessarily make for a better hypothesis test.

 

Data of the kind typically used in quantitative research are often assumed to follow a symmetrical, bell-shaped distribution. Examples of such distributions are the z-distribution (the standard normal distribution, used when the standard deviation of the population is known) and the t-distribution (used when the standard deviation is estimated from a sample). The tails are the regions at either end of these distributions. Not all distributions have two tails: certain asymmetrical types, such as the F-distribution (associated with the analysis of variance and regression), do not (Vogt & Johnson, 2015).


Both one- and two-tailed tests generate a test statistic from the sample (for example, from the sample mean in comparison with a population mean). That statistic is then compared with a critical value, which marks the boundary of an established critical region or regions. The purpose of the comparison is inferential and allows the researcher to determine whether data from experimental observations support a claim (DeMoulin & Kritsonis, 2013). In other words, one- and two-tailed tests are differing approaches to hypothesis testing, in which a researcher attempts to explain a phenomenon of interest by either rejecting or failing to reject a null hypothesis (H0). In general, the researcher hopes to reject the H0 in favor of the alternative hypothesis, or Ha. This is because the H0 represents the current state of knowledge about a phenomenon, and the Ha is the claim the researcher believes explains it better, more accurately, and more justifiably.


Because not all data distributions are two-tailed, not all research designs will require the designation of a one-tailed versus a two-tailed test. With any statistical procedure planned for a research study with normally distributed data, however, determining which type of test will be used should happen in the initial design phase of the project (Scott & Morrison, 2006).

Assuming normally distributed data, a one-tailed test examines the critical region on only one side of a symmetrical distribution. The statistical procedure can therefore suggest whether a sample mean is higher than a population mean, or whether it is lower, but not both. For this reason, a one-tailed test is also referred to as a one-sided or directional test (Vogt & Johnson, 2015). The researcher must decide which of the two relationships, or tails, to consider because a one-tailed test cannot evaluate both. It considers the relationship of the test statistic in one direction only and gives no consideration to the relationship in the other direction, that is, the opposite tail of the distribution. Figure 1 below shows how this relationship is expressed through the H0 and Ha at the beginning of the research project.


Given the level of significance, or alpha, that a researcher is willing to accept (usually 0.05 or 5%, but sometimes as high as 0.10 or as low as 0.01 or lower), the value of a one-tailed test is its increased power. Because the researcher is examining only one direction of a possible relationship, the alpha is not divided between two tails. For an effect in the predicted direction, the power of the one-tailed test is thus greater than the power of the two-tailed test.


A two-tailed test establishes a critical region at each end of a normal distribution. It is, for this reason, often described as a non-directional test (Vogt & Johnson, 2015). As with the one-tailed test, a researcher determines the significance level (alpha) in advance of the test (again, typically 0.05 or 5%) and then applies the appropriate statistical procedure (for example, a one-sample z-test or t-test). Because both ends of the distribution are to be examined, the alpha is divided between them (0.025 in each tail for an alpha of 0.05). This divided alpha also means the power of the two-tailed test is lower than that of the one-tailed test. Should the test statistic fall within the critical region at either end of the distribution, the H0 will be rejected.
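The effect of dividing the alpha can be seen in the critical values themselves. The following is a minimal sketch, assuming a z-test on the standard normal distribution and an alpha of 0.05; the function name z_critical_values is illustrative and not taken from any of the cited sources.

```python
from statistics import NormalDist


def z_critical_values(alpha=0.05):
    """Return the critical z-values for an upper one-tailed and a two-tailed test."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # One-tailed: the whole alpha sits in a single (here, the upper) tail.
    one_tailed = std_normal.inv_cdf(1 - alpha)
    # Two-tailed: alpha is split, so each tail holds alpha / 2.
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)
    return one_tailed, two_tailed


one_tailed, two_tailed = z_critical_values(0.05)
print(f"one-tailed (upper) critical z: {one_tailed:.3f}")   # 1.645
print(f"two-tailed critical z: +/-{two_tailed:.3f}")        # +/-1.960
```

Because the two-tailed boundary lies further from the center of the distribution, a test statistic must be more extreme to reach it, which is where the loss of power comes from.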


One-tailed hypothesis tests can show that a sample mean is higher than the population mean or that it is lower, depending on the direction chosen. They consider whether a test statistic falls within the critical region on one side of a distribution only. If the test statistic falls into that critical region, the researcher rejects the H0 in favor of the Ha (DeMoulin & Kritsonis, 2013). By comparison, a two-tailed test considers whether an effect is evident at either of the two ends of a normal distribution. Using a one-tailed test, a researcher can only infer whether the test statistic falls within a single rejection region, which lies either above or below the critical value. A two-tailed test, while less powerful, uses rejection regions on both sides of the probability distribution (i.e., it covers both the positive and negative relationships).
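The decision rule described above can be sketched in a few lines. This is an illustration only, assuming a one-sample z-test with a known population standard deviation; the function one_sample_z_test and the numbers passed to it are hypothetical. Comparing the p-value with alpha is equivalent to checking whether the test statistic falls in the critical region.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05, tail="two"):
    """One-sample z-test.

    tail: "two"   -> Ha: mean differs from pop_mean (non-directional)
          "upper" -> Ha: mean is greater than pop_mean
          "lower" -> Ha: mean is less than pop_mean
    Returns (z, p_value, reject_h0).
    """
    std_normal = NormalDist()
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))  # test statistic
    if tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p_value = std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return z, p_value, p_value < alpha


# Hypothetical data: sample mean 103 from n = 81, population mean 100, sd 15.
z, p_upper, reject_upper = one_sample_z_test(103, 100, 15, 81, tail="upper")
_, p_two, reject_two = one_sample_z_test(103, 100, 15, 81, tail="two")
_, p_lower, reject_lower = one_sample_z_test(103, 100, 15, 81, tail="lower")

print(f"z = {z:.2f}")
print(f"upper-tailed: p = {p_upper:.3f}, reject H0: {reject_upper}")
print(f"two-tailed:   p = {p_two:.3f}, reject H0: {reject_two}")
print(f"lower-tailed: p = {p_lower:.3f}, reject H0: {reject_lower}")
```

With these illustrative numbers the test statistic is z = 1.8. The upper-tailed test rejects the H0 (p is about 0.036), the two-tailed test does not (p is about 0.072), and the lower-tailed test, which looks in the wrong direction, cannot detect the difference at all.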


A one-tailed test has more power than a two-tailed test because the entire alpha is applied to one relationship in a data distribution, but this increased power comes at a cost. The one-tailed test, by its nature, disregards any effect that falls in the opposite tail of the distribution. The cost of this added power is so great that, unless a research basis exists specifically to use a one-tailed test, the two-tailed test is the default approach for hypothesis testing (Cohen et al., 2018). The one-tailed test is appropriate in experimental scenarios only when the researcher needs to understand just one side of a relationship, or when ignoring the other side of the relationship would not be unethical. Take, for example, a study of the efficacy of a novel literacy program being piloted in a school district. Should the research question be limited simply to whether the new program is significantly less effective than the program currently in use, a one-tailed test would be appropriate. Because the one-tailed test is directional, however, it would not allow any inference as to whether the new program is significantly more effective. Moreover, a one-tailed test should not be chosen simply to obtain significance, nor should such a test be run on data after a pre-determined two-tailed statistical test has failed. Validity (in this case, the relevance and accuracy of the statistical procedure to the data) and reliability (the consistency and replicability of the procedure) depend upon appropriate use of one- and two-tailed tests (Scott & Morrison, 2006).
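The trade-off between power and direction can be made concrete with a short calculation. The sketch below assumes a one-sample z-test with a known population standard deviation and a hypothetical true difference between the sample and population means; the function z_test_power and all of its inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(true_diff, pop_sd, n, alpha=0.05, tail="two"):
    """Probability of rejecting H0 when the true mean differs by true_diff.

    tail: "upper" for a one-tailed test of Ha: mean > population mean,
          "two" for a two-tailed test.
    """
    std_normal = NormalDist()
    # Expected value of the z statistic under the true difference.
    shift = true_diff / (pop_sd / sqrt(n))
    if tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)
        return 1 - std_normal.cdf(crit - shift)
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        upper_region = 1 - std_normal.cdf(crit - shift)
        lower_region = std_normal.cdf(-crit - shift)
        return upper_region + lower_region
    raise ValueError("tail must be 'upper' or 'two'")


# Hypothetical scenario: population sd 15, n = 81, true difference of +3 or -3.
power_upper = z_test_power(3, 15, 81, tail="upper")
power_two = z_test_power(3, 15, 81, tail="two")
power_upper_opposite = z_test_power(-3, 15, 81, tail="upper")
power_two_opposite = z_test_power(-3, 15, 81, tail="two")

print(f"effect in predicted direction: one-tailed {power_upper:.3f}, two-tailed {power_two:.3f}")
print(f"effect in opposite direction:  one-tailed {power_upper_opposite:.4f}, two-tailed {power_two_opposite:.3f}")
```

Under these assumptions the one-tailed test has a power of roughly 0.56 against roughly 0.44 for the two-tailed test when the effect lies in the predicted direction. When the effect lies in the opposite direction, the power of the one-tailed test falls far below alpha, while the two-tailed test keeps the same power as before.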


Figure 1. Comparison of the null and alternative hypothesis in a one-tailed and two-tailed test.


References:

Cohen, L., Manion, L., and Morrison, K. (2018). Research methods in education (8th ed.). Routledge.

DeMoulin, D.F., and Kritsonis, W.A. (2013). A statistical journey: Taming of the skew! (2nd ed.). The AlexisAustin Group.

Scott, D., and Morrison, M. (2006). Key ideas in educational research. Continuum.

Vogt, W.P., and Johnson, R.B. (2015). The SAGE dictionary of statistics and methodology (5th ed.). SAGE Publications, Inc.
