
Is Your Variance Homogeneous?

A small violation of the homogeneity of variance assumption may have little practical effect on analysis due to the robust nature of parametric tests - but how small is reasonable?

 

Variance and the Assumption of Homogeneity of Variance

The distribution of a set of data is described in three ways: by its shape (for instance, unimodal or bimodal; lepto-, meso-, or platykurtic), by its central tendency (the mean, median, and mode), and by its variability (the spread or dispersion of scores across the data set, as indicated by measures such as variance, standard deviation, and range) (Vogt & Johnson, 2015). Variance, in particular, is the average squared deviation from the mean, denoted σ², where σ is the standard deviation of a population. Because it is calculated from deviations from the mean, variance indicates how widely scores are spread about the mean.
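
As a brief illustration of the calculation described above, the following sketch (using invented scores) computes variance as the average squared deviation from the mean, both step by step and with NumPy's built-in function.

```python
import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])      # hypothetical scores
mu = scores.mean()                                # the mean (central tendency)
deviations = scores - mu                          # deviations from the mean
population_variance = (deviations ** 2).mean()    # average squared deviation, i.e., sigma squared

print(population_variance)   # 4.0
print(np.var(scores))        # NumPy's population variance, also 4.0
```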


Homogeneity of variance is an assumption in statistics used for both t and F tests, especially in connection with analysis of variance, commonly known as the ANOVA procedure. The related term, homoskedasticity, is typically used in connection with correlations and regressions and refers to homogeneity of variance in arrays; the concepts are related, but the terms should be distinguished (Vogt & Johnson, 2015). Homogeneity of variance posits that multiple samples taken from a population will be similar in their measured behaviors or responses (i.e., the data they produce). Although the variances of multiple samples drawn from a given population will not be identical, they should be relatively similar. Take, for example, education, where a researcher may be investigating literacy among Grade 3 students in a series of independent and public schools in a district. Before comparing the schools with higher-order statistical procedures (which assume homogeneity), the researcher must first determine whether the sample variances are relatively similar or different for the results to be credible. Testing for homogeneity in this way differs from hypothesis testing proper but is closely related to it and, as such, may be referred to as the minor hypothesis (DeMoulin & Kritsonis, 2013); a sketch of such a check appears below.
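
One simple way to carry out this check is to compare two sample variances with an F ratio. The sketch below uses invented literacy scores for two hypothetical schools; a small p-value would suggest the variances are not homogeneous.

```python
import numpy as np
from scipy import stats

school_a = np.array([78, 82, 85, 88, 90, 91, 94, 97])     # hypothetical literacy scores
school_b = np.array([70, 75, 81, 84, 89, 93, 98, 104])

var_a = school_a.var(ddof=1)   # sample variances (n - 1 in the denominator)
var_b = school_b.var(ddof=1)

# F ratio with the larger variance in the numerator
f_stat = max(var_a, var_b) / min(var_a, var_b)
df1 = df2 = len(school_a) - 1

# Approximate two-tailed p-value (doubled upper-tail probability)
p_value = 2 * stats.f.sf(f_stat, df1, df2)

print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```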


For hypothesis testing with multiple samples (a t-test or ANOVA), the assumptions are: 1. that scores are independent; 2. that scores are normally distributed; and 3. that score variance is homogeneous (Vogt & Johnson, 2015). Independence is established through random selection; normality is verified by describing and plotting the data; and homogeneity of variance is verified with a test statistic, such as an F test. In all cases, these assumptions refer to the population as a whole, although samples may be used to check them. Should a data set fail to satisfy all three assumptions, a parametric test (i.e., one whose findings are generalizable to a population) may give misleading results.
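
A minimal sketch of this verification process, assuming three invented samples, might use the Shapiro-Wilk test to screen normality and Levene's test (whose statistic follows an F distribution) to screen homogeneity of variance before applying a one-way ANOVA.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group1 = rng.normal(loc=50, scale=10, size=30)   # hypothetical sample scores
group2 = rng.normal(loc=52, scale=10, size=30)
group3 = rng.normal(loc=55, scale=10, size=30)

# Normality: Shapiro-Wilk test for each group
for i, g in enumerate((group1, group2, group3), start=1):
    w, p = stats.shapiro(g)
    print(f"Group {i}: Shapiro-Wilk p = {p:.3f}")

# Homogeneity of variance: Levene's test
levene_stat, levene_p = stats.levene(group1, group2, group3)
print(f"Levene p = {levene_p:.3f}")

# Only if the assumptions hold does the standard one-way ANOVA apply
f_stat, anova_p = stats.f_oneway(group1, group2, group3)
print(f"ANOVA F = {f_stat:.2f}, p = {anova_p:.3f}")
```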



Impact of Violating Homogeneity of Variance on the Validity of Results

Parametric statistical procedures, by definition, are concerned with populations. A parameter is a value that remains constant in an experiment or calculation and describes the whole of the system being measured and interpreted; that whole is also known as a population. By comparison, a statistic is a value that describes and interprets a part of the whole, otherwise known as a sample. Given that homogeneity of variance is one of the assumptions of parametric statistics, it follows that when the assumption is violated, the validity of the calculation diminishes. Because statistics as a discipline is rooted in probabilities, levels of uncertainty are implicit. Research necessarily involves some acceptable level of error, usually expressed as a percentage that, by convention, falls between one and five percent but may be as high as ten percent. In a scenario where scores are independent and the distribution is normal, should the observed value of an F test statistic fall just outside the rejection region by a small margin, moving forward with a parametric procedure could seem logical. Because statistics is probabilistic, a very small violation of the assumption of homogeneity of variance might seem reasonable.


A challenge, in my view, arises from the fact that statistical decisions do not admit of degrees: a result is either significant or not significant, based upon the pre-established alpha of the research design. Whether an observed value falls inside the rejection region by 0.01 or by 100.01, the finding is significant – not barely significant, not highly significant; just significant. The same is true of non-significant findings. Whether using data in a multi-sample hypothesis test when the variances have been shown to differ has a small practical effect or a large one does not change the fundamental premise: statistical validity rests on fixed decision rules, not on gradations within them. Furthermore, as the difference in variance between two samples increases, so does the likelihood of rejecting the null hypothesis when the null hypothesis is, in fact, true – otherwise known as a Type I error or false-positive result. A simulation illustrating this point appears below.
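
The following simulation sketch (with invented population parameters) illustrates that last point: both groups share the same population mean, so the null hypothesis is true, yet because one group has a much larger variance and a smaller sample size, the standard t-test rejects the null more often than the nominal five percent alpha.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_simulations = 10_000
false_positives = 0

for _ in range(n_simulations):
    group_a = rng.normal(loc=100, scale=5, size=40)   # same mean, small variance, larger n
    group_b = rng.normal(loc=100, scale=20, size=10)  # same mean, large variance, smaller n
    _, p = stats.ttest_ind(group_a, group_b, equal_var=True)  # standard t-test assumes equal variances
    if p < alpha:
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / n_simulations:.3f}")
# With unequal variances and unequal sample sizes, this rate typically exceeds 0.05.
```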

Resolving the Two Viewpoints

A violation of homogeneity of variance will necessarily weaken the results of a parametric procedure. As such, I would argue that trying to resolve the competing viewpoints on whether some small difference in variance is acceptable misses the mark. Instead, my approach is that when homogeneity of variance is assumed, that assumption must be met. When the assumption cannot be met, I would look for an alternative. Every parametric procedure has alternatives that do not depend on this assumption (in this case, Levene's test can be used to check the equality of variances, and Welch's t-test, which does not assume equal variances, can replace the standard t-test; a sketch appears below). Rather than force data into a calculation they do not support, my preference is to consider an alternative that the data do support. Given that statistics is already a probabilistic field of inquiry, the goal should be to decrease, not increase, the likelihood of error.
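
A minimal sketch of that alternative route, using invented data, checks the equality of variances with Levene's test and falls back on Welch's t-test when the check fails.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(loc=100, scale=5, size=40)    # hypothetical samples with unequal variances
group_b = rng.normal(loc=103, scale=20, size=15)

levene_stat, levene_p = stats.levene(group_a, group_b)

if levene_p < 0.05:
    # Variances differ: use Welch's t-test, which does not assume equal variances
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
    test_used = "Welch's t-test"
else:
    # Variances similar: the standard (Student's) t-test applies
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
    test_used = "Student's t-test"

print(f"{test_used}: t = {t_stat:.2f}, p = {p_value:.3f}")
```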


References:


DeMoulin, D. F., & Kritsonis, W. A. (2013). A statistical journey: Taming of the skew! (2nd ed.). The AlexisAustin Group.


Vogt, W. P., & Johnson, R. B. (2015). The SAGE dictionary of statistics and methodology (5th ed.). SAGE Publications, Inc.




