Example t-test and answers to sample size questions

Here are two questions that I would like to be able to answer.

1. Given a certain standard deviation and number of samples in each of two groups, how large must the difference between the two population means be in order to achieve significance, and is this difference detectable and greater than the standard deviation?

2. Given a certain standard deviation and difference between two population means, how many samples in each group would be necessary to achieve enough significance to conclude that the two group means truly are different (with a 5% false positive error)?

Note that in order to answer these questions the approximate standard deviation would need to be known from an experiment or previous values in the literature.

Excel has two nice functions for answering these questions, TDIST and TINV, which go back and forth between t and p values. Note that determining the p value from the t value essentially means finding the area under the tails of Student's t distribution curve beyond the t value. The equations for doing this are long and involved, so these Excel functions greatly speed up the calculations.

Before answering these two questions, I'll first go through an example two-sample t-test using formulas that can handle unequal sample sizes and unequal variances.

Let's say I have signal intensities for a particular spot on a microarray due to tumor sera binding, and I also have signal intensities for a particular spot on a microarray due to naive sera binding.


 * tumor1 || tumor2 || naive1 || naive2 ||
 * 1297 || 1293 || 2519 || 2567 ||

I then calculate the average and standard deviation for the tumor group and the naive group.
 * tumor avg || tumor stdv || naïve avg || naïve stdv ||
 * 1295 || 2.828427 || 2543 || 33.94113 ||
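
As a cross-check outside Excel, the averages and standard deviations can be reproduced with Python's standard library. This is a minimal sketch; the variable names are my own, and it assumes the sample (n - 1) standard deviation, which is what reproduces the tabulated values.

```python
from statistics import mean, stdev

# Raw signal intensities from the table above.
tumor = [1297, 1293]
naive = [2519, 2567]

# statistics.stdev divides by n - 1 (sample standard deviation).
tumor_avg, tumor_sd = mean(tumor), stdev(tumor)
naive_avg, naive_sd = mean(naive), stdev(naive)

print("tumor:", tumor_avg, round(tumor_sd, 6))
print("naive:", naive_avg, round(naive_sd, 6))
```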

Next I need to calculate t from this information.
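
The values tabulated in this example are consistent with the unequal-variance (Welch) form of the t statistic, so I'll write the calculation that way. The symbols are my own notation: x̄ is a group average, s a group standard deviation, and n a group sample size.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}},
\qquad
S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
$$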



Note that when computing the t value, x1 must be greater than x2; otherwise, take the absolute value of the difference. With this data the values become:
 * n || Sx1-x2 || t ||
 * 2 || 24.08319 || 51.82038 ||
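
The same numbers can be checked with a few lines of Python. This sketch assumes the unequal-variance standard error, sqrt(s1^2/n1 + s2^2/n2), which reproduces the Sx1-x2 value in the table; the variable names are hypothetical.

```python
import math

# Summary statistics from the tables above.
tumor_avg, tumor_sd, n_tumor = 1295.0, 2.828427, 2
naive_avg, naive_sd, n_naive = 2543.0, 33.94113, 2

# Standard error of the difference between the two means (unequal variances).
se_diff = math.sqrt(tumor_sd**2 / n_tumor + naive_sd**2 / n_naive)

# Absolute value, so the order of the two groups does not matter.
t_value = abs(tumor_avg - naive_avg) / se_diff

print(round(se_diff, 5), round(t_value, 5))
```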

Next, before using the t value to determine the p value, we need to determine the degrees of freedom.
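
The df value reported for this example matches the Welch-Satterthwaite approximation for unequal variances, so I'll assume that is the formula in use. With s and n as the standard deviation and sample size of each group (my notation):

$$
df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
          {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$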



For this data I obtain:


 * df ||
 * 1.013888 ||
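
A quick check of this value in Python, assuming the Welch-Satterthwaite approximation (which reproduces the number above); the variable names are my own.

```python
# Standard deviations and sample sizes from the tables above.
tumor_sd, n_tumor = 2.828427, 2
naive_sd, n_naive = 33.94113, 2

# Per-group variance of the mean.
v1 = tumor_sd**2 / n_tumor
v2 = naive_sd**2 / n_naive

# Welch-Satterthwaite degrees of freedom.
df = (v1 + v2) ** 2 / (v1**2 / (n_tumor - 1) + v2**2 / (n_naive - 1))

print(round(df, 6))
```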

The Excel function TDIST(x, df, number of tails) can then be used to determine the p value. Researchers often set the "alpha" value to 0.05 and consider the means of the two populations to be significantly different if the p value is below this alpha value. Note that alpha is also the Type I error rate, or false positive error rate. I just think of it as the probability of rejecting the null hypothesis (that the two population means are equal) when it is actually true. Note also that you can never really accept the alternative hypothesis; you can only reject, or fail to reject, the null hypothesis.

With this data I obtain:
 * p value ||
 * 0.012284 ||
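
To see what TDIST is doing, here is a rough sketch that computes the two-tailed p value by numerically integrating the t density with Simpson's rule. The function names are hypothetical, and this is not how Excel computes it internally. My understanding is that TDIST truncates a non-integer df to an integer, so the value above corresponds to df = 1; the sketch computes the p value both ways.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=20000):
    """1 minus the area between -t and t, using Simpson's rule (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

t_value = 51.82038
df = 1.013888

p_truncated = two_tailed_p(t_value, math.floor(df))  # integer df, TDIST-style
p_fractional = two_tailed_p(t_value, df)             # fractional Welch df

print(round(p_truncated, 6), round(p_fractional, 6))
```

Keeping the fractional df gives a slightly smaller p value, because the tails get lighter as df increases.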

Now let's answer the original questions. 1. Given a certain standard deviation and number of samples in each of two groups, how large must the difference between the two population means be in order to achieve significance, and is this difference detectable and greater than the standard deviation?

Let's keep the standard deviations and the number of samples the same, and set the p value to 0.05. Then let's use TINV(p value, df) to determine the t value. From the t value we can calculate the difference in population means required to obtain that p value: since t = difference / Sx1-x2, the required difference is t × Sx1-x2. I obtain the following values.


 * tumor stdv || naïve stdv || n || Sx1-x2 || df || p value || t value || difference between population means required ||
 * 2.828427 || 33.94113 || 2 || 24.08319 || 1.013888 || 0.05 || 12.7062 || 306.0059 ||
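
The TINV step can be sketched in Python by inverting the two-tailed p value with bisection. This is only an illustration: the function names are hypothetical, the p value comes from Simpson's-rule integration of the t density, and df is truncated to an integer on the assumption that TINV does the same.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=4000):
    """1 minus the area between -t and t, using Simpson's rule (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

def critical_t(alpha, df):
    """Find t with two_tailed_p(t, df) == alpha by bisection (stand-in for TINV)."""
    lo, hi = 0.0, 1.0
    while two_tailed_p(hi, df) > alpha:  # double hi until the root is bracketed
        lo, hi = hi, hi * 2.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

se_diff = 24.08319   # Sx1-x2 from the table above
df = 1.013888
alpha = 0.05

t_crit = critical_t(alpha, math.floor(df))  # integer df, TINV-style
min_difference = t_crit * se_diff           # difference = t * Sx1-x2

print(round(t_crit, 4), round(min_difference, 2))
```

The required difference of about 306 is roughly nine times the larger (naive) standard deviation of 33.94, which addresses the second half of the question for this data.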

Now let's answer the 2nd question. 2. Given a certain standard deviation and difference between two population means, how many samples in each group would be necessary to achieve enough significance to conclude that the two group means truly are different (with a 5% false positive error)?

A solution to this problem can be found by using the Goal Seek function in Excel. Set a cell for the difference between the population means; set a cell for n (just choose some arbitrary number like 2); then calculate df and t using the other information (including the standard deviations). Then use the TDIST function to get the p value from the t value. Finally, use Goal Seek (Data -> What If Analysis -> Goal Seek) to set the p value to 0.05 by changing n. So for this example data, if I know that the two population means are expected to differ by 40, Goal Seek settles on an n of about 5.5, which rounds up to 6 samples in each group to show that this difference is significant.


 * tumor stdv || naïve stdv || n || Sx1-x2 || df || p value || t value || difference between population means required ||
 * 2.828427 || 33.94113 || 5.523944 || 14.4912 || 4.586773 || 0.050834 || 2.760296 || 40 ||
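
Instead of Goal Seek, the same answer can be found by simply trying whole-number values of n until the p value drops below 0.05. The sketch below makes the same assumptions as the spreadsheet approach (equal n in both groups, unequal-variance formulas, df truncated to an integer as I believe TDIST does); the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=4000):
    """1 minus the area between -t and t, using Simpson's rule (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

def samples_needed(sd1, sd2, difference, alpha=0.05, max_n=1000):
    """Smallest n per group for which the given difference yields p < alpha."""
    for n in range(2, max_n + 1):
        v1, v2 = sd1**2 / n, sd2**2 / n
        se_diff = math.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / ((v1**2 + v2**2) / (n - 1))
        p = two_tailed_p(difference / se_diff, math.floor(df))
        if p < alpha:
            return n, p
    raise ValueError("no n up to max_n reaches significance")

n_required, p_at_n = samples_needed(2.828427, 33.94113, difference=40)
print(n_required, round(p_at_n, 4))
```

Searching over integers avoids the fractional n that Goal Seek produces, so no rounding step is needed afterwards.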

Spreadsheet file can be found here: https://imtiewhl.livedrive.com/files/5183 or here "F:\kurt\storage\CIM Research Folder\DR\2012\5-17-12\example t-test 5-17-12.xlsx"