Research

Computational biology for gene regulation and epigenomics

The Sheffield Lab uses computation to ask and answer biological questions. Our biological interest is to understand gene regulation: How does DNA encode regulatory networks that enable cellular differentiation? Gene regulatory systems are finely tuned, and when they break down, the result can be diseases such as cancer. To better understand normal and diseased gene regulation, we collect high-throughput genome-scale data in single cells and cell populations, and then harness the power of supercomputing, machine learning, and software engineering to answer questions about biological systems. This research is inherently interdisciplinary, approaching questions in biology and medicine with tools from computer science and statistics.

Biology

  • Gene regulation, chromatin, and epigenetics
  • Cancer epigenomics
  • Cell state and fate in development
  • Single-cell heterogeneity

Tools

  • R/Bioconductor tools for large-scale bioinformatic analysis
  • Integrating large genome-scale datasets using high performance computing
  • Applied machine learning
  • Scientific computing, reproducibility, open data, and data sharing in genomics

Interest Areas

Gene regulation and chromatin structure

The group studies how DNA encodes regulatory networks that enable cellular differentiation, and how these systems break down in disease. We ask fundamental questions about gene regulation, such as how regulatory DNA interacts to drive cellular programs, or how cells develop and respond to stimuli through chromatin remodeling at the single-cell level.

Keywords: genomics, epigenetics, pediatric cancer (Ewing sarcoma), gene regulatory networks, primate evolution, phylogenetics, chromatin, population genetics, systems biology

Computational cancer epigenomics

Driven by biological interests in epigenomics and gene regulation, we analyze DNA methylation and chromatin accessibility and how these signals characterize cancers. Cancer is caused by a regulatory process run amok, and we study these regulatory programs in both their normal and diseased states.

Keywords: high-throughput sequencing, ENCODE, epigenomic datasets

Single-cell sequencing analysis

Using microfluidics and sequencing technology, we investigate how cells differentiate and respond to their environments at single-cell resolution. We develop computational methods to analyze single-cell RNA-seq, ATAC-seq, and multi-omic data to understand cellular heterogeneity, identify rare cell populations, and map developmental trajectories in normal development and disease.

Keywords: high-throughput sequencing, ENCODE, epigenomic datasets

Scientific computing and large-scale biomedical data management

We develop research infrastructure for scientific computing in genomics, focusing on data interoperability and analysis. The group builds novel models of genomic data and state-of-the-art APIs and systems that help biologists manage and analyze large-scale genomic and multi-omic datasets efficiently.

Keywords: Big data, high-performance clusters, machine learning, database design, Linux, Docker, software engineering, web development, cloud computing, statistical analysis, computational models, pipeline management, bioinformatics tools, algorithm development

Publications