Campus ICO-Germans Trias i Pujol
Josep Carreras Leukaemia Research Institute
Can Ruti Campus
Ctra de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Barcelona
At the interface between genomics, digital pathology and artificial intelligence, the Cellular Systems Genomics group aims to define the spatiotemporal organization of complex tissues in health and disease. We do so by identifying key regulatory mechanisms that drive heterogeneity in cellular identity and function, particularly in the context of inflammation, inflammatory disorders and autoimmune diseases.
To address these questions, we will adopt a single-cell perspective, enabling the fine-grained and spatially resolved molecular profiling of tissues. We will develop new machine learning approaches and open-source tools to unlock molecular mechanisms hidden in large-scale datasets.
In the short term, these methods will help us understand disease mechanisms and stratify patients based on their molecular and cellular characteristics, ultimately providing new therapeutic targets for treatment.
Single-cell sequencing makes it possible to profile thousands of individual cells per experiment, enabling the unbiased analysis of tissues, organs and even entire organisms at unprecedented resolution.
These data represent a powerful tool for cell biology, with relevant clinical applications including the diagnosis and treatment of diseases. Despite the many advantages of this approach, the data are noisy and sparse, which makes computational analysis challenging. To address these challenges, we apply machine learning and other statistical methods to develop new analytical frameworks and open-source tools to analyze, interpret and integrate data from single-cell and spatial genomics experiments.
As part of the Human Cell Atlas (HCA) consortium, which aims to create a catalogue of all cell types in the human body, we have extensive experience in the systematic comparison of single-cell RNA sequencing (scRNA-seq) protocols. Together with the Institute's new Single Cell Unit, which is equipped with a Chromium controller for single-cell analysis, we will provide support for designing new experiments, generating high-quality data and performing computational analysis.
Beyond transcriptomic profiling with scRNA-seq, other cellular modalities can now be measured, including single-cell epigenetics (scATAC-seq), spatial transcriptomics, and the joint profiling of chromatin accessibility and transcription in the same cell.
However, the integration of multimodal data poses new analytical challenges, and new benchmarks are needed to assess the reproducibility and integrity of these methods. We are working on new mathematical frameworks for the integration of multimodal data, enabling the comprehensive characterization of cells in their identity and function.
In the European Pancreas Atlas consortium (ESPACE, https://www.espace-h2020.eu), we are working to build a first version of the Human Cell Atlas of the Pancreas by profiling the transcriptome and epigenome of cells from distinct anatomical regions of the adult pancreas. The integration of distinct single-cell and spatial data types will allow us to chart the comprehensive transcriptional and epigenetic landscape of pancreatic cell types within their spatial context.
Our experience in single-cell data analysis of healthy and diseased tissues has allowed us to build a deep understanding of cell-type structure and plasticity in different research contexts. To accelerate biological discovery and advance science, our group will share user-friendly computational solutions, promote open science and diversity, and support an inclusive and collaborative environment.
Juan de la Cierva Senior Postdoctoral Fellowship (2020)
SPOTlight: Seeded NMF regression to Deconvolute Spatial Transcriptomics Spots with Single-Cell Transcriptomes. Nucleic Acids Research. 2021 Feb 5.
The integration of orthogonal data modalities greatly supports the interpretation of transcriptomic landscapes in complex tissues. In particular, spatially resolved gene expression profiles are key to understand tissue organization and function. However, spatial transcriptomics (ST) profiling techniques lack single-cell resolution and require a combination with single-cell RNA sequencing (scRNA-seq) information to deconvolute the spatially indexed datasets. Leveraging the strengths of both data types, we developed SPOTlight, a computational tool that enables the integration of ST with scRNA-seq data to infer the location of cell types and states within a complex tissue. SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes, and non-negative least squares (NNLS) to subsequently deconvolute ST capture locations (spots). Using synthetic spots, simulating varying reference quantities and qualities, we confirmed high prediction accuracy also with shallowly sequenced or small-sized scRNA-seq reference datasets. We trained the NMF regression model with sample-matched or external datasets, resulting in accurate and sensitive spatial predictions. SPOTlight deconvolution of the mouse brain correctly mapped subtle neuronal cell states of the cortical layers and the defined architecture of the hippocampus. In human pancreatic cancer, we successfully segmented patient sections into healthy and cancerous areas, and further fine-mapped normal and neoplastic cell states. Trained on an external pancreatic tumor immune reference, we charted the localization of clinical-relevant and tumor-specific immune cell states. Using SPOTlight to detect regional enrichment of immune cells and their co-localization with tumor and adjacent stroma provides an illustrative example in its flexible application spectrum and future potential in digital pathology.
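The NNLS deconvolution idea mentioned in the abstract above can be illustrated with a small, dependency-free sketch. This is not the SPOTlight implementation: it skips the seeded NMF step entirely, it solves the non-negative least squares problem with plain projected gradient descent rather than a dedicated NNLS solver, and the signature matrix, spot values, and function names (`nnls_projected_gradient`, `deconvolute_spot`) are hypothetical and chosen only for illustration.

```python
def nnls_projected_gradient(A, b, n_iter=2000):
    """Approximately solve min ||A x - b||^2 subject to x >= 0.

    A is a list of rows (genes x cell types), b is a list (one value per gene).
    Projected gradient descent is used here for simplicity; it is an
    assumption of this sketch, not the solver used by any specific tool.
    """
    n_rows, n_cols = len(A), len(A[0])
    # Precompute A^T A and A^T b.
    AtA = [[sum(A[r][i] * A[r][j] for r in range(n_rows)) for j in range(n_cols)]
           for i in range(n_cols)]
    Atb = [sum(A[r][i] * b[r] for r in range(n_rows)) for i in range(n_cols)]
    trace = sum(AtA[i][i] for i in range(n_cols))
    if trace == 0:
        return [0.0] * n_cols
    # trace(A^T A) bounds the largest eigenvalue, so this step size is safe.
    step = 1.0 / trace
    x = [0.0] * n_cols
    for _ in range(n_iter):
        grad = [sum(AtA[i][j] * x[j] for j in range(n_cols)) - Atb[i]
                for i in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n_cols)]
    return x


def deconvolute_spot(signatures, spot):
    """Return estimated cell-type proportions for one capture spot."""
    weights = nnls_projected_gradient(signatures, spot)
    total = sum(weights)
    if total == 0:
        return weights
    return [w / total for w in weights]


# Hypothetical signature matrix: 4 genes (rows) x 3 cell types (columns).
signatures = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [2.0, 2.0, 2.0],
]
# Hypothetical spot built as a 0.6 / 0.3 / 0.1 mixture of the three columns.
spot = [6.3, 2.5, 1.5, 2.0]

proportions = deconvolute_spot(signatures, spot)
print([round(p, 3) for p in proportions])
```

In this toy setting the spot is an exact mixture of the signature columns, so the estimated proportions recover the mixing weights. Real spots are noisy, and the non-negativity constraint is what keeps the estimated contributions interpretable as cell-type proportions.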
Zonation of Ribosomal DNA Transcription Defines a Stem Cell Hierarchy in Colorectal Cancer. Cell Stem Cell. 2020 Jun 4;26(6):845-861.e12.
Colorectal cancers (CRCs) are composed of an amalgam of cells with distinct genotypes and phenotypes. Here, we reveal a previously unappreciated heterogeneity in the biosynthetic capacities of CRC cells. We discover that the majority of ribosomal DNA transcription and protein synthesis in CRCs occurs in a limited subset of tumor cells that localize in defined niches. The rest of the tumor cells undergo an irreversible loss of their biosynthetic capacities as a consequence of differentiation. Cancer cells within the biosynthetic domains are characterized by elevated levels of the RNA polymerase I subunit A (POLR1A). Genetic ablation of POLR1A-high cell population imposes an irreversible growth arrest on CRCs. We show that elevated biosynthesis defines stemness in both LGR5+ and LGR5- tumor cells. Therefore, a common architecture in CRCs is a simple cell hierarchy based on the differential capacity to transcribe ribosomal DNA and synthesize proteins.
Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nat Biotechnol. 2020 Jun;38(6):747-755.
Single-cell RNA sequencing (scRNA-seq) is the leading technique for characterizing the transcriptomes of individual cells in a sample. The latest protocols are scalable to thousands of cells and are being used to compile cell atlases of tissues, organs and organisms. However, the protocols differ substantially with respect to their RNA capture efficiency, bias, scale and costs, and their relative advantages for different applications are unclear. In the present study, we generated benchmark datasets to systematically evaluate protocols in terms of their power to comprehensively describe cell types and states. We performed a multicenter study comparing 13 commonly used scRNA-seq and single-nucleus RNA-seq protocols applied to a heterogeneous reference sample resource. Comparative analysis revealed marked differences in protocol performance. The protocols differed in library complexity and their ability to detect cell-type markers, impacting their predictive value and suitability for integration into reference cell atlases. These results provide guidance both for individual researchers and for consortium projects such as the Human Cell Atlas.
Robustness and applicability of functional genomics tools on scRNA-seq data. Genome Biol. 2020 Feb 12;21(1):36.
Background: Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events and low library sizes. It is thus not clear if functional TF and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way.
Results: To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We include the bulk-RNA tools PROGENy, GO enrichment, and DoRothEA that estimate pathway and transcription factor (TF) activities, respectively, and compare them against the tools SCENIC/AUCell and metaVIPER, designed for scRNA-seq. For the in silico study, we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments. We complement the simulated data with real scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks on simulated and real data reveal comparable performance to the original bulk data. Additionally, we show that the TF and pathway activities preserve cell type-specific variability by analyzing a mixture sample sequenced with 13 scRNA-seq protocols. We also provide the benchmark data for further use by the community.
Conclusions: Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint gene sets can be applied to scRNA-seq data, partially outperforming dedicated single-cell tools. Furthermore, we find that the performance of functional analysis tools is more sensitive to the gene sets than to the statistic used.
BigSCale: An Analytical Framework for Big-Scale Single-Cell Data. Genome Res. 2018 Jun;28(6):878-890.
Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin (Reln)-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets.
Single-cell transcriptome conservation in cryopreserved cells and tissues. Genome Biol. 2017 Mar 1;18(1):45.
A variety of single-cell RNA preparation procedures have been described. So far, protocols require fresh material, which hinders complex study designs. We describe a sample preservation method that maintains transcripts in viable single cells, allowing one to disconnect time and place of sampling from subsequent processing steps. We sequence single-cell transcriptomes from >1000 fresh and cryopreserved cells using 3'-end and full-length RNA preparation methods. Our results confirm that the conservation process did not alter transcriptional profiles. This substantially broadens the scope of applications in single-cell transcriptomics and could lead to a paradigm shift in future study designs.
Dual MET and ERBB inhibition overcomes intratumor plasticity in osimertinib-resistant-advanced non-small-cell lung cancer (NSCLC). Ann Oncol. 2017 Oct 1;28(10):2451-2457.
Background: Third-generation epidermal growth factor receptor tyrosine kinase inhibitors (EGFR-TKIs) such as osimertinib are the last line of targeted treatment of metastatic non-small-cell lung cancer (NSCLC) EGFR-mutant harboring T790M. Different mechanisms of acquired resistance to third-generation EGFR-TKIs have been proposed. It is therefore crucial to identify new and effective strategies to overcome successive acquired mechanisms of resistance.
Methods: For Amplicon-seq analysis, samples from the index patient (primary and metastasis lesions at different timepoints) as well as the patient-derived orthotopic xenograft tumors corresponding to the different treatment arms were used. All samples were formalin-fixed paraffin-embedded, selected and evaluated by a pathologist. For droplet digital PCR, 20 patients diagnosed with NSCLC at baseline or progression to different lines of TKI therapies were selected. Formalin-fixed paraffin-embedded blocks corresponding to either primary tumor or metastasis specimens were used for analysis. For single-cell analysis, orthotopically grown metastases were dissected from the brain of an athymic nu/nu mouse and cryopreserved at -80°C.
Results: In a brain metastasis lesion from a NSCLC patient presenting an EGFR T790M mutation, we detected MET gene amplification after prolonged treatment with osimertinib. Importantly, the combination of capmatinib (c-MET inhibitor) and afatinib (ErbB-1/2/4 inhibitor) completely suppressed tumor growth in mice orthotopically injected with cells derived from this brain metastasis. In those mice treated with capmatinib or afatinib as monotherapy, we observed the emergence of KRAS G12C clones. Single-cell gene expression analyses also revealed intratumor heterogeneity, indicating the presence of a KRAS-driven subclone. We also detected low-frequent KRAS G12C alleles in patients treated with various EGFR-TKIs.
Conclusion: Acquired resistance to subsequent EGFR-TKI treatment lines in EGFR-mutant lung cancer patients may induce genetic plasticity. We assess the biological insights of tumor heterogeneity in an osimertinib-resistant tumor with acquired MET-amplification and propose new treatment strategies in this situation.