Working together, we get the job done: our paper on justifying your alpha, recently accepted by Nature Human Behavior, was the fruit of a great collaboration by 88 authors. How? Google Docs!
Not So Blue Monday
On 'Blue Monday' this year I received a rather encouraging e-mail: our recent paper had been accepted by Nature Human Behavior. The paper is a response to Benjamin et al. (2017), who suggested redefining statistical significance by lowering the standard alpha level, the threshold against which p-values are compared. In our paper, we argue that this new level is just as arbitrary as the current standard and would be problematic for several reasons. Instead, we argue that researchers should think through their priors, methods, and analyses thoroughly and justify their alpha on that basis. Although I was just one of 88 authors, having this paper accepted felt like a real victory, because the document had come about on a uniquely democratic platform for scientific writing: the Google Document.
Redefine Statistical Significance
It all started in the summer of 2017. A preprint by Daniel Benjamin and colleagues, “Redefine Statistical Significance”, appeared online on the Open Science Framework. This paper also had an impressive list of authors: more than 70 researchers had contributed. In their paper, the authors argued that for null-hypothesis significance testing the standard significance level, the alpha level, should be lowered from 0.05 to 0.005, so that only results with p < 0.005 would count as statistically significant. Their suggestion was a direct response to the replication crisis that is thought to affect many scientific fields, including psychology. According to the authors, a threshold of p < 0.05 provides too weak evidence against the null hypothesis ('there is no effect') for claiming new discoveries. They argued that with this alpha level, too many false positive results were reported as valid effects, which in turn made it harder for others to replicate these effects. A lower threshold for statistical significance would make the evidence against the null hypothesis stronger, and with it, conclusions more reliable.
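To make the role of the alpha level concrete, here is a minimal simulation sketch. It assumes a simple two-sided z-test on normally distributed data with a known standard deviation of 1, and the function name, sample size, and number of simulations are illustrative choices of mine, not taken from either paper. It only shows how often a true null hypothesis is rejected at each threshold; the share of false positives among published findings also depends on statistical power and on how often tested hypotheses are actually true.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(alpha, n=20, sims=10_000, seed=1):
    """Fraction of simulated studies that reject a TRUE null hypothesis.

    Each simulated study draws n observations from a standard normal
    distribution (so the true effect is exactly zero) and runs a
    two-sided z-test with known standard deviation 1.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5          # test statistic under H0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / sims


rate_05 = false_positive_rate(0.05)
rate_005 = false_positive_rate(0.005)
print(f"alpha = 0.05  -> false positive rate about {rate_05:.3f}")
print(f"alpha = 0.005 -> false positive rate about {rate_005:.3f}")
```

In this setup the long-run rejection rate under a true null should sit close to whichever alpha level is chosen, which is exactly what the threshold is meant to control.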
Is this really the solution?
The proposal by Benjamin et al. gave rise to many responses within the scientific community. Discussions on Twitter and in multiple blog posts revolved around whether this was a good idea or not. I, too, had my reservations about this proposal. Of course, we must continue to monitor our methods critically and avoid drawing unreliable conclusions based on weak evidence. But changing an arbitrary standard significance level to a lower – but equally arbitrary – level seemed a rather weak solution to the larger replication problem. Moreover, such a move would create problems of its own. For example, you would need a much larger sample to detect the same effects with the same statistical power. Finding suitable and motivated test subjects can already be quite difficult, especially among patient populations. In addition, dramatically increasing sample sizes would make studies much more expensive, and financial resources are already quite limited in science.
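The sample-size point can be illustrated with a small back-of-the-envelope calculation. This is only a sketch: it uses the common normal approximation for a two-sided comparison of two group means, and the effect size (Cohen's d = 0.5), the 80% power target, and the function name are assumptions chosen for illustration, not values from our paper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(alpha, power=0.80, d=0.5):
    """Approximate participants needed per group for a two-sided,
    two-group comparison of means (normal approximation).

    alpha: significance level, power: desired power,
    d: assumed standardized effect size (Cohen's d).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # critical value
    z_power = std_normal.inv_cdf(power)           # quantile for power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


n_05 = n_per_group(0.05)
n_005 = n_per_group(0.005)
print(f"alpha = 0.05:  about {n_05} participants per group")
print(f"alpha = 0.005: about {n_005} participants per group")
print(f"relative increase: {n_005 / n_05:.2f}x")
```

Under these assumptions the required sample comes out roughly 70% larger at the stricter threshold. An exact t-test calculation would give slightly different numbers, but the relative increase is similar.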
Shortly after the appearance of the preprint, Daniël Lakens, experimental psychologist and statistical expert at Eindhoven University of Technology, took on the task of formally responding to the paper. He started a Google Document and put out a call on Twitter for anyone who wanted to contribute to a joint response to the paper by Benjamin et al. Soon, a group of about 100 scientists had joined the interactive discussion in Google Docs. A wide range of researchers – from big names in the field to PhD students, from large and small institutes worldwide – were all motivated to share their insights and knowledge on the matter.
Consensus through Google Docs
The first brainstorming document quickly reached 100 pages. The main topics of this first brainstorm session were summarized in a new Google Document, and there the article gradually took shape. As is characteristic of the Google Docs format, authors were able to delete and add pieces of text online and in real time, often with several of us editing the file at the same time. Most contributors took part in the discussion about the main arguments we wanted to cover. Everyone also contributed to the article in their own way, for example by performing data simulations to substantiate our arguments, contributing to figures and captions, suggesting relevant literature, monitoring language, grammar, and the word limit, or keeping the reference list up to date. Within just a few weeks, the group of authors had reached consensus about the text and main message, and in mid-September the paper “Justify Your Alpha” was published in its final form as a preprint. The paper was also submitted to Nature Human Behavior, the journal that had published the paper by Benjamin et al. two weeks previously. After a similarly iterative and interesting revision round, the paper was accepted for publication in January 2018.
The main message
The main message of our article was that rather than lowering an arbitrary level of significance, scientists should justify their choices of analytical methods, alpha level, and other statistical decisions. At present, far too few researchers make the effort to do this properly. In the proposed approach, researchers would have to take into account prior knowledge, the (clinical) implications of their findings, the relevance and standards in their field of research, and multiple indicators of evidence, both for and against the null hypothesis. In the end, all of this information should be reflected in the conclusion. An example of how to justify your alpha is described elsewhere.
All in all, this collaborative effort was a very interesting and certainly also educational experience for me. I would really advise anyone to take up such an opportunity if it comes their way. A few weeks of lively discussion with experts via a Google Document has given me many new insights into the statistics of science and has made me even more critical of my own work and that of my peers: an important trait that contributes to the wonderful self-correcting character of science!