
Notes on Student's t-test

Sources of Information
http://en.wikipedia.org/wiki/T_test
http://www.tc3.edu/instruct/sbrown/stat/sampsiz.htm#Case5
http://www.andrews.edu/~calkins/math/edrm611/edrm11.htm
http://martin-bell.suite101.com/sample-size-calculations---confidence-and-power-a209629
http://www.surveysystem.com/sscalc.htm#one
Tiger also found this link for looking at how many samples are needed for a microarray with a certain number of spots: http://bioinformatics.mdanderson.org/MicroarraySampleSize/
Sample size software (from Tiger): http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
Here's a paper about this program: G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences
Other papers on sample sizes for microarrays

The t-test is used for data that follow a normal distribution (a normality test such as the Shapiro-Wilk or Kolmogorov-Smirnov test can be used to check whether the data are normally distributed).
unpaired t-test: two independent groups of samples are compared (one treatment group and one non-treatment group). Null hypothesis: the two group means are equal.
paired t-test: often a repeated measures t-test (one group is tested before and after treatment). Null hypothesis: the difference between two responses measured on the same statistical unit has a mean value of zero.
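To make the unpaired/paired distinction concrete, here is a minimal sketch using only the Python standard library. The function names and the before/after numbers are made up for illustration, and the unpaired version assumes equal variances (pooled Student's t-test).

```python
import math
from statistics import mean, stdev

def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2

def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the per-unit differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical measurements on the same five units before and after treatment.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [11.0, 14.0, 10.0, 13.0, 14.0]

t_unpaired, df_unpaired = unpaired_t(after, before)  # treats the groups as independent
t_paired, df_paired = paired_t(before, after)        # uses the pairing
print(t_unpaired, df_unpaired, t_paired, df_paired)
```

With these numbers the paired statistic is much larger than the unpaired one, because pairing removes the unit-to-unit variation from the comparison.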

p-value: determined from the t value using a table of values from Student's t-distribution (there are also formulas for Student's t-distribution that can be used instead of tables).
Excel has some normal distribution formulas, found here: http://www.exceluser.com/explore/statsnormal.htm
This page explains how to use tables to look up areas under the standard normal curve: http://www.math.unco.edu/facstaff/Powers/M550/ContinuousDistributions/pdf/7.2.1%20Finding%20Areas%20under%20the%20Standard%20Normal%20Curve.pdf
Excel has three functions useful for t tests: TTEST, TINV, and TDIST.
The p value can be found from Student's t distribution; this page looks like it has some good information about how to do this: http://commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section8/smallsampmean/tdist/tdist.htm
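As an alternative to a table lookup, the two-sided p value can be approximated by numerically integrating the density of Student's t distribution. The sketch below uses only the Python standard library. The function names and the step count are my own choices, and `math.gamma` overflows for very large degrees of freedom, so treat this as an illustration rather than a robust routine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is computed with Simpson's rule (steps must be even)
    and doubled, since the density is symmetric about zero.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# t = 2.228 with 10 degrees of freedom is the familiar 5% two-sided table value.
p = two_sided_p(2.228, 10)
print(round(p, 4))
```

This should give roughly the same number as a two-tailed lookup in a t table, or Excel's TDIST with the tails argument set to 2.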

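degrees of freedom for a two sample t test is n1 + n2 - 2 (the total number of samples minus 2) when the two groups are assumed to have equal variance. For unequal variances (Welch's t-test) the degrees of freedom are much more complicated (the Welch-Satterthwaite approximation) and are given in the Wikipedia article.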
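For the unequal-variance case, the degrees of freedom are commonly computed with the Welch-Satterthwaite approximation. A small sketch follows. The function name and arguments are my own: s1 and s2 are the sample standard deviations, n1 and n2 the group sizes.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for two groups."""
    v1 = s1 ** 2 / n1  # squared standard error of the first group mean
    v2 = s2 ** 2 / n2  # squared standard error of the second group mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# With equal spreads and equal sizes this reduces to n1 + n2 - 2.
print(welch_df(1.0, 10, 1.0, 10))
# With unequal spreads and sizes the result is usually not an integer.
print(welch_df(2.0, 10, 1.0, 20))
```

The result always lies between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2.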

Some Khan Academy videos on the t test start here: http://www.khanacademy.org/math/statistics/v/t-statistic-confidence-interval

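One-sample t-test
t = (x_bar - u0) / (s / sqrt(n))
df = n - 1
where x_bar is the sample mean, u0 is the hypothesized population mean, s is the sample standard deviation, and n is the number of samples.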
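A minimal worked example of the one-sample formula, using only the Python standard library. The data values and the hypothesized mean of 5.0 are invented for illustration.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of the sample mean against mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

# Hypothetical measurements; is their mean different from 5.0?
t_stat, df = one_sample_t([5.1, 4.9, 5.3, 5.2, 5.0], mu0=5.0)
print(t_stat, df)
```

Here the sample mean is 5.1 and s is about 0.158, so t is about 1.41 with 4 degrees of freedom.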

Two-sample t test: the formulas for unequal sample sizes and unequal variances are long and can be found in the Wikipedia article: http://en.wikipedia.org/wiki/T_test

Selecting appropriate sample sizes
confidence level = 1 - alpha, where alpha is the preselected error level (often 0.05). The confidence interval is not the same thing: it is the range around the estimate that corresponds to this confidence level.
characteristics of sample or population
margin of error

Determining the sample size needed to estimate the true mean of a population within a certain margin of error: this page talks all about that http://www.tc3.edu/instruct/sbrown/stat/sampsiz.htm#Case5
The same page also has information about estimating the difference of two population proportions (but I think this is for a binomial distribution and not a normal distribution).
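A common textbook formula for this is n = (z * sigma / E)^2, rounded up, where z is the standard normal critical value for the chosen confidence level, sigma is the population standard deviation and E is the margin of error. The sketch below assumes sigma is known (or well estimated) and uses the normal distribution rather than the t distribution. The function name and the example numbers are mine, not taken from the linked page.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Samples needed so the confidence interval for the mean is +/- margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical example: standard deviation 15, desired margin of error 3.
n_needed = sample_size_for_mean(sigma=15, margin=3)
print(n_needed)
```

Halving the margin of error roughly quadruples the required sample size, since n grows with 1 / E^2.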

power = probability of rejecting the null hypothesis when you should (when the specific alternative hypothesis is true) = 1-beta. In other words, the power is the probability of detecting an effect when there truly is an effect.
beta = Type II error rate (like a false negative): the probability of failing to reject the null hypothesis when the alternative hypothesis is actually true.
alpha = Type I error rate (like a false positive): the probability of rejecting the null hypothesis when it is actually true (commonly set to 0.05).

This page talks about power analysis: http://www.ats.ucla.edu/stat/sas/dae/t_test_power2.htm
Information needed:
expected average difference
standard deviations of groups
pre-specified level of statistical power for calculating the sample size (could be set to 0.8)
"good estimate of effect size is the key to a good power analysis"
I could use SAS to determine the power of a test with a certain effect size, stdev, and sample sizes.
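As a rough check that does not need SAS, the power of a two-sided two-sample test can be approximated with the normal distribution. This sketch assumes equal group sizes and a common standard deviation. It slightly overestimates power for small groups because it ignores the heavier tails of the t distribution. The function name and the example numbers are mine.

```python
import math
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power to detect a true mean difference delta (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic: difference / standard error of the difference.
    shift = abs(delta) / (sigma * math.sqrt(2 / n_per_group))
    # Probability of landing in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Hypothetical example: true difference equal to one standard deviation, 16 per group.
power = approx_power(delta=1.0, sigma=1.0, n_per_group=16)
print(round(power, 3))
```

When delta is zero the function returns alpha, which matches the definition of the Type I error rate.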

Notes on t-test for specific applications
Notes on q-value for many simultaneous tests

Normal distribution function http://mathworld.wolfram.com/NormalDistributionFunction.html Cumulative distribution function

Here are two questions that I would like to be able to answer.
1. Given a certain standard deviation and number of samples in each of two groups, how large would the difference between the two population means need to be in order to achieve significance, and is this difference detectable and greater than the standard deviation?

2. Given a certain standard deviation and difference between two population means, how many samples in each group would be necessary to achieve significance?
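Both questions can be answered approximately with the standard normal-approximation formulas for comparing two means. The sketch below assumes equal group sizes, a common standard deviation sigma, and a two-sided test. The function names, the default power of 0.8 and the example numbers are my own choices. An exact t-based calculation (for example in G*Power or R) would give slightly larger sample sizes for small groups.

```python
import math
from statistics import NormalDist

ND = NormalDist()

def min_significant_difference(sigma, n_per_group, alpha=0.05):
    """Question 1a: smallest observed difference in means that reaches significance."""
    return ND.inv_cdf(1 - alpha / 2) * sigma * math.sqrt(2 / n_per_group)

def min_detectable_difference(sigma, n_per_group, alpha=0.05, power=0.8):
    """Question 1b: true difference that would be detected with the given power."""
    z_sum = ND.inv_cdf(1 - alpha / 2) + ND.inv_cdf(power)
    return z_sum * sigma * math.sqrt(2 / n_per_group)

def samples_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Question 2: samples per group needed to detect a true difference delta."""
    z_sum = ND.inv_cdf(1 - alpha / 2) + ND.inv_cdf(power)
    return math.ceil(2 * (z_sum * sigma / delta) ** 2)

# Hypothetical example: sigma = 1 and 16 samples in each group.
threshold = min_significant_difference(sigma=1.0, n_per_group=16)
detectable = min_detectable_difference(sigma=1.0, n_per_group=16)
n_for_one_sd = samples_per_group(sigma=1.0, delta=1.0)
n_for_half_sd = samples_per_group(sigma=1.0, delta=0.5)
print(threshold, detectable, n_for_one_sd, n_for_half_sd)
```

Comparing the detectable difference with sigma answers the last part of question 1: with 16 samples per group, the detectable difference at 80% power is just under one standard deviation.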

Example t-test and answers to sample size questions

Calculating false discovery rate

The R statistics program could be useful for the analysis.