Keeping up with the Jobses: the role of technology in reproducible research

AllBio's workshop on 'reproducibility in research' saw a metaphorical bottle smashed against the bow of The Genome Analysis Centre (TGAC)'s shiny new training facility.

Fueled by hackpads, marker pens and a mountain of tea and biscuits, the workshop (a mixture of research scientists, PhD students, coders, funders and publishers) set about asking the question: 'what are the barriers to reproducible research?'

Running to stand still

AllBio was established to bring the technology of bioinformatics to a diverse set of biological disciplines, but with this workshop it stepped across to research's flipside: publishing.

Whether the output is data or papers, it is clear that advances in technology have much to offer when it comes to improving how scientific advances are reported.

It's been done before. BioMed Central was founded on just such a principle – that the newly widespread availability of the world wide web made Open Access, online-only publishing the sensible thing to do as we entered a new millennium.

At the time radical, BioMed Central's vision would prove to be contagious, and the Open Access model has now been adopted by hundreds of journals from publishers across all areas of biomedicine.

An ecosystem

But we don't want to stand frozen in time in the year 2000: technological advancements continue to redefine both what is possible in publishing and what types of data are being reported, and we need to respond to them.

One area where change will surely have to occur is in the way large biological datasets and their subsequent bioinformatics analyses are disseminated. Frequently, one or both of these are not disseminated at all. When they are, the default is a data and code dump with minimal curation, minimal discoverability and minimal interoperability.

Ideas emerging from the workshop included a call for an ecosystem to be developed that would make it easy for researchers to change the way they release their data and code.

Critically, this would allow others not only to reuse and recycle with no duplication of effort, but also to test the rigor of the claims made in a study.
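
One small piece of such an ecosystem can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something the workshop produced: the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical, and the idea is simply that a release of data and code ships with a checksum manifest, so anyone re-running an analysis can first confirm they hold exactly the files the authors used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file under root (as a relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

An author would call `build_manifest` on the release directory and publish the result alongside the data (for instance as a JSON file); a skeptical reader would call `verify_manifest` on their download and expect an empty list before testing any of the paper's claims.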

Got no rhythm

The focus on code and ecosystems might seem to have a bioinformatics bent – which is to be expected given the origins of AllBio and TGAC, and hence the make-up of the workshop – but principles of reproducibility should apply universally to all fields within biomedicine.

BMC Biology was pleased to be able to illustrate this recently in a paper that took apart a decades-old dogma holding that fruit flies belted out their courtship song with a characteristic rhythm.

David Stern's combination of new experiments and reanalysis of the original data showed that these rhythms simply do not exist. As he explained in an accompanying blog post, the so-called rhythms were merely an artifact of the way data had been grouped together.

In the AllBio workshop's envisaged ecosystem, it would be possible for skeptics to interact with the analyses immediately upon publication. The hope is that, with such openness in place, far fewer than 30 years would elapse before fatal flaws are exposed.

Even the workshop was open

The bonus of a hackpad-driven workshop is that all the discussion, ideas and outcomes are accessible to anyone with the URL for the motherpad. Well, that includes you, because here it is:

Highlights included a proof-of-principle 'checklist for dummies' of standards in next-generation sequencing analysis, which used RNA-seq pipelines as an example, and a discussion about building a Docker-centered ecosystem to make computational analyses easier to reproduce. First released only last year, Docker is an instructive example of how technology continues to change what is possible in science publishing.

Based at TGAC, the workshop was surrounded by plenty of great examples of the tools and practices we had set our sights on.

From this month's launch of the naked mole-rat genome resource to leadership of the BioJS project for data visualization, it was a reminder that many initiatives are already under way to improve the openness and reproducibility of research.

Creative incentives

That said, these principles are far from universally accepted, as PLOS discovered when it faced an anti-open data backlash. And where they are accepted, the cost in time and effort can prove just as big a barrier as ideology.

The three directions we can take to lower these barriers are clear:

(1) to create user-friendly, quick-and-easy systems for data and code deposition;

(2) to provide more training for researchers in how to share data and code;

and (3) to incentivize good practice in a way that might just about make the time and effort worth it.

Even public conveniences can be awarded an 'open data certificate'

Even when it comes to incentives, new technology might have a role to play, suggested the European Bioinformatics Institute's Dan Bolser.

He put forward the concept of using a Bitcoin-like cryptocurrency to reward good practice, following the 'carrot' approach already adopted by initiatives such as the 'open data certificate'.

Metrics can also help motivate, giving researchers easy numbers to use as evidence of their contributions to, and impact on, the community beyond traditional citation measures.

But despite the current fad for altmetrics, we still don't have standard metrics for the popularity, in terms of usage, of bioinformatics software tools and biological data resources.

Others favored the 'big stick' approach, with more proactivity from journal editors, reviewers and funders in enforcing standards. Ultimately, the future most likely lies in a mix of carrot, stick, training and better systems, quite possibly driven by a technology that does not yet exist.
