Digital Scholarship@Leiden

Leiden University Libraries & Elsevier Seminars on Reproducible Research: Wrap-up Seminar 3 (Reproducibility: the Turing Way)


The speakers in the third session of the Leiden University Libraries & Elsevier seminars on Reproducible Research explained the concrete steps researchers can take to make their research reproducible.

Did you miss this seminar or would you like to watch some parts (again)?
You can watch the recording of the seminar on this playlist:

Catriona Fennell first spoke about some of the ways in which academic publishers can enable and encourage researchers to make their research more reproducible. Elsevier has experimented with publishing various new article types and journal types. The journal Data in Brief, in which researchers can publish and describe data sets, has been quite successful, mainly because the editors incorporated the process of sharing data sets into the existing workflow for publishing regular papers as much as possible. An initiative to set up a journal focusing on negative results was, unfortunately, less successful. Fennell also discussed STAR Protocols, an initiative set up by Cell Press to publish peer-reviewed protocols: on this platform, researchers can share detailed and consistent instructions on how to conduct specific experiments.

Elsevier also works with artificial intelligence to make peer review processes more efficient. In addition to various tools that can detect duplicated images and cases of plagiarism, Elsevier makes use of statcheck, a tool developed at Tilburg University to detect statistical reporting inconsistencies. A number of Elsevier journals have implemented the concept of Registered Reports, in which papers are submitted before the eventual results are known. Some journals also work with the principle of results-masked review, which means that papers are sent to the peer reviewers without the actual results. Initiatives such as these can clearly help to counter publication bias.
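To illustrate the kind of consistency check a tool like statcheck performs, the snippet below recomputes the p-value for a reported z statistic and flags a mismatch. This is a simplified sketch of the principle only, not statcheck's actual interface (statcheck itself is an R package and also handles t, F and other test statistics); the function names and the tolerance are illustrative assumptions.

```python
import math
import re

def p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def check_reported(text: str, tolerance: float = 0.005) -> bool:
    """Parse a reported result like 'z = 1.96, p = .050' and verify
    that the reported p-value matches the recomputed one."""
    match = re.search(r"z\s*=\s*([\d.]+),\s*p\s*=\s*([\d.]+)", text)
    if not match:
        raise ValueError("no z/p pair found in text")
    z, p_reported = float(match.group(1)), float(match.group(2))
    # Flag a reported p-value that deviates from the recomputed one.
    return abs(p_from_z(z) - p_reported) <= tolerance

print(check_reported("z = 1.96, p = .050"))  # consistent -> True
print(check_reported("z = 1.96, p = .030"))  # inconsistent -> False
```

Checks like this are cheap to automate, which is precisely why they fit naturally into an editorial screening pipeline.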

In the second lecture, Kristina Hettne discussed the concrete steps that can be followed to reproduce a given study. To reproduce a study, it is generally necessary to have access to the data, the tools, the code, the documentation of the code, and a clear description of the final results. She illustrated this process of enhancing the reproducibility of a study by discussing the workflow that was followed for one of her own research papers, published in PLOS ONE. The data for this paper were made available in Dryad, and the research software was published on GitHub. The software repository included an overview of the structure of the code and a file specifying all the dependencies. The results were shared as nanopublications and in the form of a CSV file. Hettne explained that, in this particular project, the whole process of making the research reproducible demanded about 20% of the total time, but she added that the process will probably be less labour-intensive in new projects, as these can follow the best practices established during this earlier project. Hettne concluded her talk by sharing her experiences with the organisation of ReproHacks, hackathons in which participants aim to reproduce the results of submitted papers. Researchers who attend such ReproHack sessions can improve their knowledge and skills in the field of reproducible research, and the authors whose papers are scrutinised also receive feedback on the transparency and the reproducibility of their work.
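A dependency file of the kind Hettne described can be generated in most languages. As a hedged illustration (the function name and package list below are my own, not taken from the talk), this Python sketch pins the installed version of each package in requirements.txt style, so that others can later recreate the same environment:

```python
from importlib import metadata

def freeze_environment(packages):
    """Return pinned 'name==version' lines (requirements.txt style)
    for the given packages, noting any that are not installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing silently.
            lines.append(f"# {name}: not installed")
    return lines

# Example: pin the packages an analysis depends on.
print("\n".join(freeze_environment(["numpy", "pandas"])))
```

Committing such a pinned list alongside the code is what lets a later reader rebuild the environment the results were produced in, rather than whatever happens to be installed at the time.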

Gabriele Hayden delved more deeply into the topic of computational reproducibility. She stressed again that computational reproducibility demands data and code, and that both of these components need to be documented well. In addition, it is important to ensure that software is archived: GitHub is an environment that can make code directly available to colleagues, but it does not necessarily safeguard the software for the longer term. Even when the code and the data have been curated carefully, running the code may still pose certain challenges. Software typically has specific dependencies, and tools and programming languages evidently develop over time. Such changes may cause the code to break or, more insidiously, may lead to completely different results. A range of tools can be used to replicate computational analyses. Virtual machines can be very useful, but they often demand a large storage capacity and considerable processing power. As a more lightweight alternative, researchers can work with containers such as Docker, in which the application, the dependencies and the configuration files are packaged together. During her presentation, Hayden bravely gave a live demonstration of a replication of a specific research paper. The code fortunately ran well, to a large extent; there was only one error message in RStudio about a specific dependency. Finally, recognising that the process of making research reproducible can take time, Hayden advised researchers to start small and to resist the urge to get everything right immediately. It can also be helpful to ask colleagues to test the code, because the original researchers may sometimes suffer from expert blindness.
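The dependency drift Hayden described can often be caught before an analysis is rerun. As a hedged sketch (the helper below is my own illustration, not a tool shown in the talk), one can compare a pinned requirements list against the installed environment and report any mismatch:

```python
from importlib import metadata

def check_pins(lockfile_lines):
    """Compare 'name==version' pins against the installed environment
    and return a list of human-readable problems (empty means OK)."""
    problems = []
    for line in lockfile_lines:
        name, _, wanted = line.partition("==")
        try:
            found = metadata.version(name)
            if found != wanted:
                # Same package, different version: results may differ.
                problems.append(f"{name}: pinned {wanted}, installed {found}")
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
    return problems

# Example: verify the environment before rerunning an analysis.
for problem in check_pins(["numpy==1.26.4"]):
    print("WARNING:", problem)
```

Running such a check at the start of an analysis script turns silent dependency drift into a visible warning; containers such as Docker go a step further by freezing the entire environment, not just the package versions.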