BMC Bioinformatics: Review of 2015

2015 has been another stellar year for BMC Bioinformatics. Here we look back at five of the articles published this year that have drawn the most attention from our readers.

Use and mis-use of supplementary material in science publications

One of our devoted Section Editors highlighted an issue that is highly relevant in these times of increasing attention to reproducibility and development of open data policies – how should we handle the growing amount of supplementary information, which in many cases can surpass the length of the main text several fold? Supplementary material is necessary for a complete scientific record, especially in articles governed by size limitations. However, improperly implemented, such files might both obstruct thorough peer-review, reuse of data, and fair recognition of previous work through citation tracking.

Mihai Pop and Steven L. Salzberg propose that a solution requires the action of both scientists and publishers: while authors should ensure that the main text explicitly refers to specific supplementary text sections, figures and tables, the journals should consider removing article size limits and develop (and reinforce) clear and consistent policies for using and reviewing supplementary material. Do not forget to check the supplementary information provided for this article!

George Shuklin, via Wikipedia, CC1.0

Deep convolutional neural networks for annotating gene expression patterns in the mouse brain

Gene expression patterns, changing with time and location, control the development of neuronal connectivity, which in turn results in the final brain structure and function. A wealth of data for researchers aiming to study the relationship between gene expression signals and the brain structure in mice is provided by the Allen Developing Mouse Brain Atlas. This atlas contains images of in situ hybridization gene expression patterns at seven different embryonic developmental stages. Characterization of local expression levels and gradients, and their correspondence to different brain structures, has until now required manual assessment by expert neuroscientists. Tao Zeng and colleagues describe in this paper a new computational method that could automate this arduous task.

Implementing a deep convolutional neural network, a computational architecture itself inspired by brain information processing, they managed to show that it could be trained to extract gene expression pattern features at multiple levels of brain structures, for the four development stages that have so far been manually annotated. They are now hoping to develop their technique in order to be able to analyse data and predict classifications for the remaining three stages.

User Pascal on Flickr, CC0 1.0

QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments

High throughput RNA sequencing has blossomed during the past years, offering snapshots of the transcriptomics with unprecedented precision. This enables, for example, tools for in-depth exploration of variations in gene expression and regulation, and genetic causes of diseases, thereby driving further the development of personalised health care. As with every sequencing technology, this method is not devoid of potential biases, artefacts and other errors, which means that high-standard quality control measures are indispensable. Quality control tools for RNA sequencing do exist, but may be insufficient due to feature limitations and output inconsistencies.

Stephen Hartley and James Mullikin aimed to improve on existing analysis tools, and present in this article QoRTs (Quality of RNA-Seq Toolset). The idea is to incorporate a diverse set of quality control metrics, and to enable identification of systematic biases and artefacts by allowing the data to be grouped according to sample, run, biological characteristics or any other arbitrary criteria. Analysis is further facilitated by visualization tools for cross-comparison between replicates, and with the possibility to generate input files for differential expression or regulation analysis software, the authors hope that QoRTs will offer users a unified, time-saving data-processing and quality-control package.

User Teoman Cimit on Flickr, CC BY 2.0

A network model for angiogenesis in ovarian cancer

Based on findings that a subtype of ovarian cancer with poor clinical prognosis involves a distinct expression of genes associated with angiogenesis (the process of developing new blood vessels from existing ones), Kimberly Glass and colleagues aimed with this study to understand the underlying mechanisms leading to such differential expression. Applying their method PANDA, they analysed topological differences in the gene regulatory networks, evaluating the interplay between transcription factors and their downstream target genes by integrating three types of data: transcription factor binding site motifs, protein-protein interactions and gene expression.

The results indicate that many transcription factors regulating genes that ultimately show significant subtype-specific differences in their expression levels, are themselves not strongly differentially expressed. Using experimental cell line data, the authors also examine how key genes identified through their network analysis respond to therapeutic treatments, and suggest that some widely used drugs may in fact disrupt regulatory networks that repress angiogenesis. Instead, the authors propose alternative and complementary treatments that are yet to be tested and validated.

User Ninjatacoshell on Wikipedia, CC BY-SA 3.0

Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities

The techniques of metagenomics are offering us expanding possibilities to directly analyse genetic material extracted from environmental samples (for example water and saliva), and determine the microbial content or the functions of the genes present. Shotgun sequencing is one of the identification methods, whereby the DNA chains are randomly divided into short fragments which are sequenced, and afterwards reconstructed to be used for further matching and classification. As to the subsequent taxonomical categorisation, Michael Peabody et al. point out in this study that there is still no method that is complete and clearly outperforming the rest. They test 38 programs on two datasets: a computer generated collection of several microorganism phyla, and less diverse data from distilled water spiked with bacteria.

The authors find that the methods differ in their strengths and weaknesses with respect to accuracy, specificity, sensitivity and computational demand. Eleven of the programs that allowed for clade exclusion analysis (removing sequences at a chosen taxonomic level from the reference database in order to evaluate the accuracy in classifying higher levels) were tested further, some of them greatly over-predicting the true number of species in the datasets. With this evaluation, the authors emphasize the importance of choosing a method appropriate for the question to be investigated, and hope to have highlighted some points that should be considered when choosing an analysis tool.

View the latest posts on the BMC Series blog homepage