Difference between revisions of "T-test"


Latest revision as of 20:01, 17 November 2008

t-test in R

The t-test is probably the best-known statistical test.
Basically, the t-test can be used to check a) whether the average of a given sample differs from 0, or b) whether the averages of two (independent) samples differ.
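
Case a) can be sketched as follows. The vector x is hypothetical data made up for this illustration; mu = 0 is the default of t.test and is spelled out only for clarity.

```r
# Hypothetical sample, invented for illustration only
x <- c(0.8, -0.3, 1.9, 1.1, 0.4, 2.2, -0.1, 1.5)

# One-sample t-test: the null hypothesis is that the true mean equals mu
res_one <- t.test(x, mu = 0)

res_one$estimate   # the sample mean
res_one$p.value    # p-value of the two-sided test against mu = 0
```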

The individual values in each sample should follow the normal distribution, and the samples should be independent. To test normality in R you may use the Shapiro-test.
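
A minimal sketch of such a normality check, using shapiro.test from base R on a small example vector. Note that a large p-value only means that no departure from normality was detected; with so few values the check has little power.

```r
# Small example sample
samp1 <- c(2:10, 4:6)

# Shapiro-Wilk test: the null hypothesis is that the values are normally distributed
sw <- shapiro.test(samp1)

sw$statistic   # the W statistic
sw$p.value     # a small p-value suggests a departure from normality
```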

Before launching the test it is essential to define the hypothesis to be tested and the corresponding null hypothesis H0 (the opposite statement, typically that the averages are equal). Averages may be tested "two-sided" for (in)equality (the hypothesis doesn't specify whether average_1 is larger or smaller than average_2), or one-sided (where larger or smaller has to be chosen). The original t-test assumes equal variance in both samples. If you think this is not the case, the Welch correction allows a separate estimate of the standard deviation for each sample. In fact, the default implementation in R already applies the Welch correction.
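
These choices map onto two arguments of t.test: alternative ("two.sided", "less" or "greater") and var.equal. The sketch below uses two small example samples; which alternative is appropriate depends on your hypothesis, so "less" here is only an illustration.

```r
samp1 <- c(2:10, 4:6)
samp2 <- c(6:11, 9, 10, 14)

# Default: two-sided test with Welch correction (var.equal = FALSE)
two_sided <- t.test(samp1, samp2)

# One-sided: the alternative is that mean(samp1) is smaller than mean(samp2)
one_sided <- t.test(samp1, samp2, alternative = "less")

# Classical t-test assuming equal variance in both samples
pooled <- t.test(samp1, samp2, var.equal = TRUE)

two_sided$method   # names the variant that was actually run
pooled$parameter   # degrees of freedom: n1 + n2 - 2
```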

Run the test in R as follows:

samp1 <- c(2:10, 4:6)
samp2 <- c(6:11, 9, 10, 14)
# test the null hypothesis H0 that the averages of samp1 and samp2 are equal, i.e. mean(samp1) equals mean(samp2)
t.test(samp1, samp2)

This will return the t-value, the degrees of freedom, the p-value, the 95% confidence interval and the sample (estimated) means. If you simply want the p-value, type:

t.test(samp1, samp2)$p.value
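
The other parts of the result can be extracted the same way, since t.test returns a list-like object of class "htest". A short sketch, storing the result in a variable (here called res, a name chosen for this example) so the test is run only once:

```r
samp1 <- c(2:10, 4:6)
samp2 <- c(6:11, 9, 10, 14)

res <- t.test(samp1, samp2)

res$statistic   # t-value
res$parameter   # degrees of freedom
res$conf.int    # 95% confidence interval for the difference of the means
res$estimate    # estimated means of samp1 and samp2
```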

In this particular example the p-value (the probability of observing a difference at least this large if both averages were in fact equal) is quite small. One may therefore consider the averages of both samples as significantly different (i.e. the p-value is below the classical alpha = 5% threshold), since the following returns TRUE:

t.test(samp1, samp2)$p.value < 0.05


Special cases and assumptions:

As mentioned before, the t-test assumes independence of the variables to be tested! Note that in many settings in bioinformatics such independence is not entirely guaranteed (e.g. genes may be co-regulated).

When running many t-tests, a correction for multiple testing should be applied. This is the case, for example, when testing the many genes present on a single microarray.
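
One possible way to apply such a correction is p.adjust from base R. The sketch below runs one t-test per row of two simulated matrices; the data, the matrix sizes and the choice of the Benjamini-Hochberg ("BH") method are assumptions made for illustration, not a recommendation for any particular dataset.

```r
set.seed(1)

# Simulated data: 100 hypothetical "genes" (rows), 5 measurements per group
group1 <- matrix(rnorm(500), nrow = 100)
group2 <- matrix(rnorm(500), nrow = 100)

# One t-test per gene, keeping only the p-value
pvals <- sapply(seq_len(nrow(group1)), function(i) {
  t.test(group1[i, ], group2[i, ])$p.value
})

# Adjust for multiple testing
p_bh   <- p.adjust(pvals, method = "BH")          # controls the false discovery rate
p_bonf <- p.adjust(pvals, method = "bonferroni")  # more conservative

sum(pvals < 0.05)   # number of "significant" genes before correction
sum(p_bh < 0.05)    # number after BH correction
```

Since both groups are drawn from the same distribution here, any gene that looks significant before correction is a false positive, which is exactly what the adjustment is meant to guard against.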