
RegioneReloaded software boosts the analysis of how multiple genomic datasets are associated

Researchers from the Buschbeck lab at the Josep Carreras Leukaemia Research Institute have improved their previous package, regioneR, so that it can analyse multiple region sets along complete genomes in a computationally efficient way. The new software, named RegioneReloaded, will be used to compare the relative positions of multiple genomic features, like transcription factor binding sites or methylation hot spots, all at once, to extract valuable information on how the genome is organised in health and disease.


The genome is a complex place. It harbours all the instructions needed to keep a cell going, and that is saying a lot: with more than 20,000 genes and around 80,000 proteins, regulation is no small feat. There are many regulatory layers, from the binding of proteins to specific regions to the establishment of transient epigenetic marks on the DNA or histones, to mention just a few. To make matters worse (for those who want to understand it), many regulatory signals often cooperate in the same place.

Modern omics technologies allow us to identify all these features and create huge databases of what researchers call region sets: lists of precisely localised genomic regions containing a particular feature, like all sites where a specific protein binds, or all methylated spots on the DNA. Whenever two or more of these features colocalise, it is up to the research community to determine the biological significance.

Colocalisation, however, is a tricky thing. Given the complexity of the genome and the vast number of regions detected in most genomic experiments, we need a way to tell whether what we observe is meaningful or has arisen just by chance. regioneR was originally developed to answer this question by, to keep it simple, shuffling the positions of one dataset and comparing it against another one, thousands of times. With this trick, and the right statistical methods, one can figure out whether the association is strong or just coincidence. Since its launch in 2016, regioneR has been widely used by researchers and cited in several published studies.
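To make the shuffling idea concrete, here is a minimal sketch in Python. It is not regioneR's code or API (regioneR is an R package); the function names, the single linear "chromosome", the uniform random placement and the choice of overlap count as the statistic are all assumptions made for illustration.

```python
import random

def count_overlaps(a, b):
    """Count regions in `a` that overlap at least one region in `b`.
    Regions are (start, end) tuples treated as half-open intervals."""
    return sum(any(s < b_end and b_start < e for b_start, b_end in b)
               for s, e in a)

def randomize(regions, genome_length, rng):
    """Hypothetical shuffling step: move each region to a uniformly
    random position on one linear chromosome, keeping its width."""
    shuffled = []
    for s, e in regions:
        width = e - s
        new_start = rng.randrange(0, genome_length - width + 1)
        shuffled.append((new_start, new_start + width))
    return shuffled

def permutation_test(a, b, genome_length, n_perm=1000, seed=0):
    """Compare the observed overlap count with counts obtained after
    shuffling `a` n_perm times. Returns (observed, z_score, p_value)."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    null = [count_overlaps(randomize(a, genome_length, rng), b)
            for _ in range(n_perm)]
    mean = sum(null) / n_perm
    sd = (sum((x - mean) ** 2 for x in null) / (n_perm - 1)) ** 0.5
    z_score = (observed - mean) / sd if sd > 0 else float("nan")
    # One-sided empirical p-value; the +1 terms keep it above zero.
    p_value = (sum(x >= observed for x in null) + 1) / (n_perm + 1)
    return observed, z_score, p_value

# Toy data: ten 100 bp regions, and a second set shifted by 50 bp,
# so every region in `a` overlaps one region in `b`.
a = [(i * 50_000, i * 50_000 + 100) for i in range(10)]
b = [(s + 50, e + 50) for s, e in a]
observed, z_score, p_value = permutation_test(a, b, genome_length=1_000_000)
print(observed, z_score, p_value)
```

In this toy case the two sets were built to overlap, so the shuffled sets almost never reach the observed count and the empirical p-value comes out small. Real analyses also have to handle multiple chromosomes, masked regions and other randomisation strategies, which this sketch ignores.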

The downside of regioneR is that it can only compare pairs of datasets. Now, the team led by Dr. Marcus Buschbeck and spearheaded by former lab member Dr. Roberto Malinverni, senior postdoc Dr. David Corujo and IGTP member Dr. Bernat Gel presents RegioneReloaded, a new piece of software able to compare multiple datasets at once. This new package retains and expands on the versatility of regioneR to tackle many different biological questions and will be a valuable tool for the genome analysis research community. RegioneReloaded has recently been published in the top journal Bioinformatics and is freely available online at Bioconductor, the reference open-source repository for bioinformatics R packages.

In the publication, the team describes the new features of the tool and demonstrates its use with well-characterised datasets. The next step is to apply it to new research questions, to start extracting information on how multiple genomic elements cooperate physically on the DNA to sustain basic cellular functions. This understanding is essential to learn how a cell organises its genetic programs and to understand what goes wrong in diseases like cancer, which could point researchers towards new ways to treat them.


