Our research focuses on the processing, annotation, and interpretation of cutting-edge omics datasets to understand human diseases such as cancer. Our approaches include computational and statistical modeling, machine learning, large-scale data integration, and close collaboration with wet labs.


Integrative Omics

The accumulation of multi-omics data enables us to better interpret gene and protein functions through integrative approaches. Our group develops computational methods to integrate data on gene expression (RNA-seq and scRNA-seq), chromatin status and accessibility (ChIP-seq, ATAC-seq, scATAC-seq), 3D chromatin looping (Hi-C and HiChIP), genome editing (CRISPR screens), and protein expression (spatial proteomics).

Project Highlights:

  • Understanding enhancer function with multi-omics data integration. Key questions in interpreting enhancer function include which regulators act on enhancers and how enhancers organize gene regulation. We integrate sequencing assays that measure enhancer activities, enhancer-protein binding, gene expression, and enhancer-gene interactions to answer these questions quantitatively. With well-designed integrative approaches, we have uncovered novel insights into the internal organization of super enhancers and their roles in defining cancer identities. By integrating public datasets with in-house validations, we have identified critical enhancer regulators in cancers.

  • Harmonizing genomic sequencing variability across diverse biological conditions. One barrier to efficiently integrating genomic sequencing datasets is data heterogeneity, both within a single data modality and across modalities. This heterogeneity arises from the diverse biological and technical parameters of different studies. For example, ChIP-seq datasets are subject to high measurement variability due to PCR-induced GC-content bias and other intrinsic bias factors. We develop statistical models to deconvolute such data heterogeneity and improve the interpretation of epigenomic sequencing signals.


Cancer Biomarkers

Cancer is one of the main biological settings where we apply our computational methodologies. Our recent work focuses on understanding the mechanisms of oncogenesis through close collaboration with investigators from diverse backgrounds. We study oncogenesis mechanisms triggered by oncogenic viruses (e.g., Epstein–Barr virus and human papillomavirus) and by clonal hematopoiesis.

Project Highlights:

  • Oncogenesis driven by virus-triggered 3D chromatin looping. One of our key hypotheses is that viruses alter the 3D looping of the host genome during infection in different cancers. This gives us a unique angle from which to evaluate novel gene biomarkers in virus-triggered cancers. We have also found that viruses reorganize critical super enhancers to rewire gene expression in tumors.

  • Clonal hematopoiesis biomarkers across solid tumors. Clonal hematopoiesis has been found to be strongly associated with reduced survival in cancer patients. The underlying molecular mechanisms, however, remain unclear. We work closely with the ORIEN network to decode the relationships between cancers and clonal hematopoiesis and to understand its common and unique roles across solid tumors.


Open Positions

We have a few open positions for postdocs, graduate students, and interns. Please email us with inquiries. Let’s solve data science challenges and conquer human diseases together.