Reproducible research and public data-sharing in ecology: A Q+A with LAGOS creators

Research published today in GigaScience combines hundreds of ecology datasets into an open database, known as LAGOS (LAke multi-scaled GeOSpatial and temporal database), spanning a significant chunk of North America and tens of thousands of lakes. GigaScience asked one of the creators, Pat Soranno, Professor in Michigan State University’s Department of Fisheries and Wildlife, and her team to tell us more about this data resource.

What is macrosystems ecology research, and why is it important?

Macrosystems ecology recognizes that important interactions also happen at regional and continental scales. By definition, macrosystems ecology is multi-scaled, data-intensive, interdisciplinary, and team-based.

Despite its relative youth, macrosystems ecology has many examples of teams working on a wide range of today’s environmental challenges, from trying to understand how invasive beetles are going to affect forests in North America as the climate changes, to understanding and predicting the importance of freshwater ecosystems in global carbon cycles.

What was the motivation behind collecting these types of data together and creating this new database?

Ultimately, we wanted to ask a basic-science question related to macrosystems ecology. We wanted to better understand general properties of macrosystems, particularly related to a class of relationships between predictor and response variables called ‘cross-scale interactions’.

We wanted to know under what conditions these types of interactions occur and when they do not. The answer to this question matters a lot if we want to translate all of the rich detailed knowledge that ecologists have gleaned from finer-scaled studies to broader scales, and to be able to predict how ecosystems might respond regionally or globally to stressors.

We realized early on that this database also provides a rich resource to address some important applied problems too, particularly related to water quality at regional to continental scales, current and possible future threats to water quality, and potential policy applications. A lot of macrosystems ecology research questions have important applications because at broad scales the imprint of people is hard to miss.

In fact, this is one of the things that’s really fun about studying macrosystems ecology – the line between basic and applied research blurs and the focus is on the research needed to better understand the systems that humans rely on.

One of the problems you mention in the paper is ‘dark data’. What is this, and why is it a problem in your field?

Dark data have been defined as data that are not well indexed or stored and typically are not publicly accessible. In ecology, data are relatively expensive to collect: they require time in the field, often with repeated measurements and complementary laboratory analyses, and ultimately work to make sure the data are in a good format for analysis, understanding, and publication.

However, such datasets often reside on the computers of the lead scientist who runs the project, and it is not common practice to share data across these different projects. Consequently, little effort is put into documenting those data for other people to use or making the data publicly accessible, mostly because there often has not been a need or a desire.

But ecologists are increasingly recognizing the value of combining these datasets and working collaboratively to conduct research at broader scales by taking advantage of some emerging trends: improved technology and training in managing and merging datasets, increased interest in understanding ecological systems at broader scales, and increased team-based science.

How does LAGOS help overcome the challenges in macrosystems ecology research?

My colleagues and I hope that one important contribution that this GigaScience paper will make is to lay out the many steps that have to happen for successfully creating heterogeneous, large, and somewhat complicated databases for conducting macrosystems ecology research that are open, reproducible, and extensible.

We hope that this paper will save future research teams a lot of start-up time in building LAGOS-like databases by providing them a road-map to take a similar approach for building an integrated, multi-thematic, geospatial, and extensible database.

Because, after all of the work we’ve put in, we don’t want others to reinvent this wheel! For example, when we started down this road about four years ago, we searched high and low for documents or articles that explained the steps to create such a database. We couldn’t find one.

Is there a need for more databases such as LAGOS and open-science/open data approaches?

Absolutely. We have learned a lot of important lessons by building LAGOS, one of them being that it takes a lot of thought at the very beginning of the effort to make a database that is reproducible and usable by other scientists.

We are planning on continuing to add to LAGOS ourselves, to expand LAGOS to other regions and continents and to eventually create a network of such large, integrated lake databases to address continental to global questions for freshwater systems.

We are fortunate that quite a few other lake science groups and networks, such as GLEON and GloboLakes, have similar visions that complement our efforts, and we hope to work with them.

How do you see open science and open data changing the way the greater ecology community conducts research?

The field of macrosystems ecology and the open science perspective are helping to facilitate more interdisciplinary, team-based, and network science. We think these trends are good for science and give scientists the capacity to ask broad, synthetic research questions. We feel strongly that ecology can only be improved by embracing open science in general and open data in particular.

Who wouldn’t want their data, and hopefully themselves, to be a part of that movement?

GigaScience is finding new ways to foster open data, open science and reproducibility. The broad scope covers the entire spectrum of life and biomedical sciences – also encompassing ecology and the numerous research communities within it.

Nicole Nogoy

Commissioning Editor at GigaScience
Nicole studied at the University of Goettingen and completed a PhD in Natural Sciences. Previously the launch and Managing Editor of Genome Medicine, she joined GigaScience in 2012 and is an Open Science, Open Data and Open Access advocate.
