When to use the F-test?
Comparing two variances is useful in several cases, including:
- When you want to perform a two-sample t-test and first need to check that the variances of the two samples are equal
- When you want to compare the variability of a new measurement method to that of an old one. Does the new method reduce the variability of the measurement?
Research questions and statistical hypotheses
Typical research questions are:
- Is the variance of group A (\(\sigma^2_A\)) equal to the variance of group B (\(\sigma^2_B\))?
- Is the variance of group A (\(\sigma^2_A\)) greater than the variance of group B (\(\sigma^2_B\))?
- Is the variance of group A (\(\sigma^2_A\)) less than the variance of group B (\(\sigma^2_B\))?
In statistics, we can define the corresponding null hypotheses (\(H_0\)) as follows:
- \(H_0: \sigma^2_A = \sigma^2_B\)
- \(H_0: \sigma^2_A \leq \sigma^2_B\)
- \(H_0: \sigma^2_A \geq \sigma^2_B\)
The corresponding alternative hypotheses (\(H_a\)) are as follows:
- \(H_a: \sigma^2_A \ne \sigma^2_B\) (different)
- \(H_a: \sigma^2_A > \sigma^2_B\) (greater)
- \(H_a: \sigma^2_A < \sigma^2_B\) (less)
Note that:
- The first pair of hypotheses (equal versus different) defines a two-tailed test
- The second and third pairs (greater, less) define one-tailed tests
Formula of F-test
The test statistic is the ratio of the two sample variances \(S_A^2\) and \(S_B^2\).
\[F = \frac{S_A^2}{S_B^2}\]
The degrees of freedom are \(n_A - 1\) (for the numerator) and \(n_B - 1\) (for the denominator).
Note that the more this ratio deviates from 1, the stronger the evidence for unequal population variances.
Note also that the F-test requires both samples to come from normally distributed populations.
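The formula above can be sketched by hand in R. This is a minimal illustration, assuming two small made-up vectors `a` and `b` (hypothetical values, not taken from any data set in this article). The two-sided p-value is taken as twice the smaller tail probability of the F distribution.

```r
# Hypothetical measurements for two groups (illustrative values only)
a <- c(10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 9.5, 10.7)
b <- c(10.0, 13.4, 8.1, 12.9, 9.2, 14.0, 7.8, 11.6)

# Test statistic: ratio of the two sample variances
f_stat <- var(a) / var(b)

# Degrees of freedom for the numerator and the denominator
df_num <- length(a) - 1
df_den <- length(b) - 1

# Two-sided p-value: twice the smaller tail probability
p_lower <- pf(f_stat, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)

c(F = f_stat, p = p_value)
```

The same two numbers should be returned by var.test(a, b), which is a convenient way to check the hand computation.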
Compute F-test in R
R function
The R function var.test() can be used to compare two variances as follows:
# Method 1
var.test(values ~ groups, data,
alternative = "two.sided")
# or Method 2
var.test(x, y, alternative = "two.sided")
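- values ~ groups: a formula, where values is a numeric variable and groups is a factor with two levels, both taken from the data frame data
- x, y: numeric vectors
- alternative: the alternative hypothesis. Allowed values are "two.sided" (default), "greater" or "less".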
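As an illustration of the alternative argument, here is a small sketch of the "new method versus old method" question. The vectors `old_method` and `new_method` are simulated and purely hypothetical; the standard deviations used to generate them are assumptions chosen for the example.

```r
set.seed(123)  # make the simulated data reproducible

# Hypothetical measurements from an old and a new measurement method
old_method <- rnorm(25, mean = 50, sd = 4)
new_method <- rnorm(25, mean = 50, sd = 2)

# H_a: the variance of new_method is less than the variance of old_method
res_less <- var.test(new_method, old_method, alternative = "less")

# H_a: the variance of new_method is greater than the variance of old_method
res_greater <- var.test(new_method, old_method, alternative = "greater")

res_less$p.value
res_greater$p.value
```

Because both one-tailed tests use the same F statistic and look at opposite tails, their p-values add up to 1. Only the direction stated in the research question should be tested.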
Import your data into R and check it
To import your data, use the following R code:
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
Here, we’ll use the built-in R data set named ToothGrowth:
# Store the data in the variable my_data
my_data <- ToothGrowth
To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function sample_n() [in the dplyr package]:
library("dplyr")
sample_n(my_data, 10)
    len supp dose
43 23.6   OJ  1.0
28 21.5   VC  2.0
25 26.4   VC  2.0
56 30.9   OJ  2.0
46 25.2   OJ  1.0
7  11.2   VC  0.5
16 17.3   VC  1.0
4   5.8   VC  0.5
48 21.2   OJ  1.0
37  8.2   OJ  0.5
We want to test the equality of variances between the two groups OJ and VC in the column “supp”.
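Before running the test, it can help to look at the group sizes and sample variances directly. The sketch below uses only base R functions on the ToothGrowth data; the object names `n_by_group` and `var_by_group` are introduced here for illustration.

```r
my_data <- ToothGrowth

# Number of observations in each group of "supp"
n_by_group <- table(my_data$supp)

# Sample variance of tooth length within each group
var_by_group <- tapply(my_data$len, my_data$supp, var)

n_by_group
var_by_group

# Ratio of the two sample variances (OJ over VC, following the factor level order)
var_by_group[["OJ"]] / var_by_group[["VC"]]
```

The ratio computed this way is the quantity that var.test() reports as the "ratio of variances".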
Preliminary test to check F-test assumptions
The F-test is very sensitive to departures from normality. You need to check whether the data in each group are normally distributed before using the F-test.
The Shapiro-Wilk test can be used to test whether the normality assumption holds. It is also possible to use a Q-Q plot (quantile-quantile plot) to evaluate the normality of a variable graphically. A Q-Q plot plots the quantiles of the sample against the theoretical quantiles of a normal distribution; points that fall close to a straight line suggest normality.
If there is doubt about normality, the better choice is to use Levene’s test or the Fligner-Killeen test, which are less sensitive to departures from normality.
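The checks described above can be sketched with base R as follows. This is one possible approach, not the only one: the Shapiro-Wilk test is applied to each group separately, and the Fligner-Killeen test is shown as the robust alternative. The names `shapiro_p` and `res_fligner` are introduced here for illustration.

```r
my_data <- ToothGrowth

# Shapiro-Wilk test on tooth length, separately for each group of "supp"
shapiro_p <- sapply(split(my_data$len, my_data$supp),
                    function(x) shapiro.test(x)$p.value)
shapiro_p

# Fligner-Killeen test of homogeneity of variances,
# less sensitive to departures from normality than the F-test
res_fligner <- fligner.test(len ~ supp, data = my_data)
res_fligner$p.value
```

A Shapiro-Wilk p-value below the chosen significance level in either group suggests a departure from normality, in which case the Fligner-Killeen result is the safer one to report. For a graphical check, the base R functions qqnorm() and qqline() can be applied to each group in the same way.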
Compute F-test
# F-test
res.ftest <- var.test(len ~ supp, data = my_data)
res.ftest
F test to compare two variances
data: len by supp
F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3039488 1.3416857
sample estimates:
ratio of variances
0.6385951
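Interpretation of the result
The p-value of the F-test is p = 0.2331, which is greater than the significance level 0.05. In addition, the 95 percent confidence interval for the ratio of variances (0.30 to 1.34) contains 1. We therefore find no significant difference between the two variances.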
Access the values returned by the var.test() function
The function var.test() returns a list containing the following components:
- statistic: the value of the F test statistic.
- parameter: the degrees of freedom of the F distribution of the test statistic.
- p.value: the p-value of the test.
- conf.int: a confidence interval for the ratio of the population variances.
- estimate: the ratio of the sample variances
The format of the R code to use for getting these values is as follows:
# ratio of variances
res.ftest$estimate
ratio of variances
0.6385951
# p-value of the test
res.ftest$p.value
[1] 0.2331433
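The remaining components can be read in the same way. The sketch below also rebuilds the confidence interval from the quantiles of the F distribution, to show where it comes from, and applies a decision rule at an assumed significance level of 0.05 (`alpha`, `f_ratio` and `ci_manual` are names introduced here for illustration).

```r
res.ftest <- var.test(len ~ supp, data = ToothGrowth)

# F statistic and its degrees of freedom (numerator, denominator)
res.ftest$statistic
res.ftest$parameter

# 95% confidence interval for the ratio of the population variances
res.ftest$conf.int

# Rebuild the interval by dividing the observed ratio by F quantiles
f_ratio <- unname(res.ftest$estimate)
df <- unname(res.ftest$parameter)
ci_manual <- c(f_ratio / qf(0.975, df[1], df[2]),
               f_ratio / qf(0.025, df[1], df[2]))
ci_manual

# Decision rule: TRUE would mean the variances differ significantly
alpha <- 0.05  # assumed significance level
res.ftest$p.value < alpha
```

Storing the result in an object and extracting components this way is useful when the test is run inside a script and the values are needed for reporting or further computation.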
Infos
This analysis has been performed using R software (ver. 3.3.2).