In a 2016 paper published in the Journal of the American Statistical Association, Valen Johnson, university distinguished professor and head of the statistics department, and graduate students Alex Asher and Tianying Wang suggested that statistical significance should be redefined, or at least refined.
The most widely accepted measure of statistical significance is the p-value, according to Wang. It is the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true.
“This number, in statistics, is like a threshold: the smaller the number is, the more strict your standard,” Wang said. “In many cases it’s very flexible. The researcher can choose whatever they want to use, but usually, people just use 0.05.”
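As a rough illustration of how a p-value is compared against a threshold, the sketch below computes an exact two-sided p-value for a coin-flip experiment. The coin-flip scenario and the function name `two_sided_p_value` are assumptions made for this example and do not come from the paper.

```python
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the null hypothesis of a fair coin."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every outcome at least as far from the expected number of
    # heads as the observed one, then divide by all 2**flips sequences.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme / 2 ** flips


# Hypothetical experiment: 9 heads in 10 flips.
p = two_sided_p_value(9, 10)
print(f"p-value: {p:.4f}")  # 22 / 1024, about 0.0215
print("significant at 0.05: ", p < 0.05)
print("significant at 0.005:", p < 0.005)
```

In this made-up case the result clears the usual 0.05 threshold but not the stricter 0.005 one, which is the kind of borderline finding the researchers are concerned about.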
There are two main interpretations of probability within statistics: Bayesian and frequentist. Johnson was studying Bayesian probability, which treats probability as a degree of belief that is updated as new data are observed.
“In studying Bayesian hypothesis testing, I came to the conclusion that p-values of 0.05 were as likely to be evidence for a null hypothesis as against it,” Johnson said.
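One way to get a feel for how weak a p-value of 0.05 can be is a well-known upper bound on the Bayes factor against the null hypothesis, 1 / (-e · p · ln p), which holds for p below 1/e. The sketch below is a different and simpler calculation than the Bayesian tests Johnson studied, and the function name `max_odds_against_null` is an assumption made for this example.

```python
import math


def max_odds_against_null(p_value: float) -> float:
    """Upper bound on the Bayes factor against the null hypothesis.

    Uses the bound 1 / (-e * p * ln p), which applies only for 0 < p < 1/e.
    This is an illustrative bound, not the method used in Johnson's paper.
    """
    if not 0 < p_value < 1 / math.e:
        raise ValueError("bound applies only for 0 < p < 1/e")
    return 1 / (-math.e * p_value * math.log(p_value))


for p in (0.05, 0.005):
    bound = max_odds_against_null(p)
    print(f"p = {p}: odds against the null are at most {bound:.1f} to 1")
```

Under this bound, a p-value of 0.05 corresponds to odds of at most about 2.5 to 1 against the null hypothesis, while 0.005 corresponds to at most about 14 to 1.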
The null hypothesis, according to Asher and Wang, is the idea that the results of a study are due entirely to random chance, or that there is no real relationship between the things being measured. To see how often a significant p-value reflects a real effect, they chose a set of older studies that had recently been repeated.
“We used a data set that had been previously gathered by a consortium of scientists called the Open Science Collaboration, and they spent more than a year reproducing about a hundred experiments that were published in three top psychology journals,” Asher said.
According to Asher, the Open Science Collaboration (OSC) experiments, conducted in 2015, were intended to be fact-checks of the original studies; the scientists worked closely with the authors of the original papers in order to see if the results of the original experiments were still valid.
However, according to Johnson, there was a discrepancy between the OSC results and the original trials. “In 97 of the original experiments, there was a significant finding of a p-value less than 0.05,” Johnson said. “When they were replicated, only 36 of the experiments replicated, or got a p-value of less than 0.05.”
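The figures Johnson cites work out to a replication rate of a little over a third. A quick check in Python, using only the two counts from the quote (the variable names are chosen for this example):

```python
# Counts quoted by Johnson for the OSC replication project.
original_significant = 97    # original studies with p < 0.05
replicated_significant = 36  # replications that also reached p < 0.05

replication_rate = replicated_significant / original_significant
print(f"Replication rate: {replication_rate:.1%}")  # 37.1%
```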
This prompted Asher and Johnson to propose certain changes to the status quo.
“One of our recommendations is that 0.05, the sort of traditional rule of thumb that people have been using, is not stringent enough to really consider something significant,” Asher said.
According to Asher and Wang, the value that represents true statistical significance varies from trial to trial, but is generally below 0.005 and sometimes as low as 0.001. According to Johnson, other researchers have begun accepting his proposed change since the paper was published.
“In a paper in Nature Human Behaviour, 72 senior scientists and I endorsed the idea of redefining statistical significance to be a p-value of less than 0.005,” Johnson said.
A survey later found that around 70 percent of scientists support the idea of lowering the p-value threshold for statistical significance to 0.005.
Johnson said that percentage could be even higher if more scientific journals began treating only results with p-values below 0.005 as statistically significant. He intends to continue research in this area.
“We’re also looking at ways of designing experiments more efficiently, so that using the same number of subjects that are currently used in fixed designs to get a p-value of 0.05, or getting a significant result with a given power, you can use maybe even fewer subjects and get a significant result at the 0.005 level,” Johnson said.
Setting statistics straight
November 8, 2017