P values in practice

Found a great article by Andrew Gelman at Columbia on how to think about P values from a Bayesian perspective. Although I don’t really understand Bayesian statistics yet, the article still has some very clear explanations of how to interpret traditional P values as they are used in practice.

Some key points:

“The P value is a measure of discrepancy of the fit of a model or ‘null hypothesis’ H to data y. Mathematically, it is defined as Pr(T(y^rep) > T(y) | H), where y^rep represents a hypothetical replication under the null hypothesis and T is a test statistic (i.e., a summary of the data, perhaps tailored to be sensitive to departures of interest from the model).”
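
To make that definition concrete, here’s a small Python sketch that estimates Pr(T(y_rep) > T(y) | H) by brute-force simulation. The null hypothesis (independent draws from a standard normal), the test statistic (the absolute sample mean), and the function names are my own illustrative assumptions, not anything from Gelman’s article. The code just generates many hypothetical replications of the data under H and counts how often their test statistic is more extreme than the observed one.

```python
import random
import statistics


def t_stat(data):
    # T(y): absolute sample mean (an assumed choice, sensitive to a shifted mean)
    return abs(statistics.fmean(data))


def simulate_p_value(y, n_reps=10_000, seed=0):
    """Monte Carlo estimate of Pr(T(y_rep) > T(y) | H).

    Assumed null hypothesis H (illustration only): the observations are
    independent draws from a normal distribution with mean 0 and sd 1.
    """
    rng = random.Random(seed)
    t_obs = t_stat(y)
    exceed = 0
    for _ in range(n_reps):
        # One hypothetical replication y_rep, same size as y, generated under H
        y_rep = [rng.gauss(0.0, 1.0) for _ in range(len(y))]
        if t_stat(y_rep) > t_obs:
            exceed += 1
    return exceed / n_reps


print(simulate_p_value([0.5, -0.5] * 10))  # observed mean is exactly 0
print(simulate_p_value([1.0] * 20))        # observed mean is 1
```

The first call should print a value very close to 1, since almost every replication has a sample mean farther from zero than an observed mean of exactly 0. The second should print a value near 0, since 20 standard-normal draws almost never average as far out as 1.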

“[…] the P value is itself a statistic and can be a noisy measure of evidence. This is a problem not just with P values but with any mathematically equivalent procedure, such as summarizing results by whether the 95% confidence interval includes zero.”
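
A quick simulation illustrates both halves of this point. The Python sketch below repeats the same hypothetical study 20 times, assuming a true mean of 0.3, a known standard deviation of 1, samples of 30, and a simple two-sided z-test; all of these are illustrative choices, not details from the article. The P values bounce around from one replication to the next even though the underlying effect never changes, and “p < 0.05” always agrees with “the 95% confidence interval excludes zero”, because for this test both are the same comparison of |mean| against about 1.96 standard errors.

```python
import math
import random
import statistics

# Two-sided 95% critical value of the standard normal, about 1.96
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)


def z_test(sample, sigma=1.0):
    """Two-sided z-test of 'true mean = 0', assuming sigma is known."""
    mean = statistics.fmean(sample)
    se = sigma / math.sqrt(len(sample))
    z = mean / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    ci = (mean - Z_CRIT * se, mean + Z_CRIT * se)
    return p, ci


def replicate_studies(true_mean=0.3, n=30, n_studies=20, seed=1):
    # Re-run the same hypothetical study several times with fresh data
    rng = random.Random(seed)
    return [z_test([rng.gauss(true_mean, 1.0) for _ in range(n)])
            for _ in range(n_studies)]


for p, (lo, hi) in replicate_studies():
    ci_excludes_zero = lo > 0 or hi < 0
    # "Significant at 5%" and "95% CI excludes zero" are the same decision here
    assert (p < 0.05) == ci_excludes_zero
    print(f"p = {p:.3f}   95% CI = ({lo:.2f}, {hi:.2f})")
```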

“[…] we cannot interpret a nonsignificant result as a claim that the null hypothesis was true or even as a claimed probability of its truth. Rather, nonsignificance revealed the data to be compatible with the null hypothesis;”
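
The same kind of simulation shows why a nonsignificant result isn’t evidence that the null is true. In this Python sketch the null hypothesis (true mean 0) is false by construction: the assumed true mean is 0.2, with known standard deviation 1 and 20 observations per study. Those numbers are illustrative assumptions, not values from the article. Most simulated studies still come out nonsignificant, simply because a study this small can’t tell an effect of 0.2 apart from zero, so the data are compatible with the null and with the true effect at the same time. The analytic line computes the same share as 1 minus the power of the z-test, as a cross-check.

```python
import math
import random
import statistics

TRUE_MEAN = 0.2  # assumed true effect: the null (mean = 0) is false
N = 20           # assumed observations per study


def two_sided_p(sample, sigma=1.0):
    # z-test of "true mean = 0" with known sigma
    z = statistics.fmean(sample) / (sigma / math.sqrt(len(sample)))
    return math.erfc(abs(z) / math.sqrt(2))


def fraction_nonsignificant(true_mean=TRUE_MEAN, n=N, n_studies=5_000, seed=2):
    """Share of simulated studies with p >= 0.05 when the null is actually false."""
    rng = random.Random(seed)
    nonsig = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if two_sided_p(sample) >= 0.05:
            nonsig += 1
    return nonsig / n_studies


# Analytic check: P(-z_crit < Z + shift < z_crit) for a standard normal Z
std_normal = statistics.NormalDist()
z_crit = std_normal.inv_cdf(0.975)
shift = TRUE_MEAN * math.sqrt(N)
expected = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

print(f"simulated: {fraction_nonsignificant():.3f}   expected: {expected:.3f}")
```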

“we accept that sample size dictates how much we can learn with confidence; when data are weaker, it can be possible to find reliable patterns by averaging.”
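
The basic arithmetic behind averaging: the standard error of a mean of n independent measurements with standard deviation σ is σ/√n, so averaging 25 times as many observations makes the estimate about 5 times less noisy. The Python sketch below checks this by simulation; the noise level, the sample sizes, and the function name are assumptions chosen for illustration, and it shows only this √n effect, not any specific method from the article.

```python
import math
import random
import statistics


def spread_of_means(n, n_reps=1_000, sigma=1.0, seed=3):
    """Standard deviation of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
             for _ in range(n_reps)]
    return statistics.stdev(means)


for n in (10, 250):
    # Simulated spread vs. the theoretical standard error sigma / sqrt(n)
    print(f"n = {n:4d}: simulated {spread_of_means(n):.3f}, "
          f"theory {1 / math.sqrt(n):.3f}")
```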

“The focus on P values seems to have both weakened [the] study (by encouraging the researcher to present only some of his data so as to draw attention away from nonsignificant results) and to have led reviewers to inappropriately view a low P value (indicating a misfit of the null hypothesis to data) as strong evidence in favor of a specific alternative hypothesis […] rather than other, perhaps more scientifically plausible, alternatives such as measurement error and selection bias.”