The Sheffield Lab uses computation to ask and answer biological questions. Our biological interest is gene regulation: how does DNA encode regulatory networks that enable cellular differentiation? Gene regulatory systems are finely tuned, and their breakdown can lead to diseases like cancer. To better understand normal and diseased gene regulation, we collect high-throughput genome-scale data in single cells and cell populations, and then harness supercomputing, machine learning, and software engineering to answer questions about biological systems. This research is inherently interdisciplinary, approaching questions in biology and medicine with tools from computer science and statistics.

Biology

  • Gene regulation, chromatin, and epigenetics
  • Cancer epigenomics
  • Cell state and fate in development
  • Single-cell heterogeneity

Tools

  • R/Bioconductor tools for large-scale bioinformatic analysis
  • Integrating large genome-scale datasets using high performance computing
  • Applied machine learning
  • Scientific computing, reproducibility, open data, and data sharing in genomics

Publications

Interest areas

Gene regulation and chromatin structure

The group studies how DNA encodes regulatory networks that enable cellular differentiation, and how these systems break down in disease. We ask fundamental questions about gene regulation, such as how regulatory DNA interacts to drive cellular programs, or how cells develop and respond to stimuli through chromatin remodeling at the single-cell level.

Selected relevant publications:
  • Smith et al. (2021). PEPPRO: quality control and processing of nascent RNA profiling data
    Genome Biology. DOI: 10.1186/s13059-021-02349-4
  • Smith et al. (2020). PEPATAC: An optimized ATAC-seq pipeline with serial alignments
    bioRxiv. DOI: 10.1101/2020.10.21.347054
  • Lawson et al. (2020). COCOA: coordinate covariation analysis of epigenetic heterogeneity
    Genome Biology. DOI: 10.1186/s13059-020-02139-4
  • Smith and Sheffield (2020). Analytical Approaches for ATAC-seq Data Analysis
    Current Protocols in Human Genetics. DOI: 10.1002/cphg.101
  • Lawson et al. (2018). MIRA: An R package for DNA methylation-based inference of regulatory activity
    Bioinformatics. DOI: 10.1093/bioinformatics/bty083
  • Wang et al. (2018). BART: a transcription factor prediction tool with query gene sets or epigenomic profiles
    Bioinformatics. DOI: 10.1093/bioinformatics/bty194
  • Nagraj et al. (2018). LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis
    Nucleic Acids Research. DOI: 10.1093/nar/gky464
  • Sheffield et al. (2017). DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma
    Nature Medicine. DOI: 10.1038/nm.4273
  • Sheffield and Bock (2016). LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor
    Bioinformatics. DOI: 10.1093/bioinformatics/btv612
  • Klughammer et al. (2015). Differential DNA Methylation Analysis without a Reference Genome
    Cell Reports. DOI: 10.1016/j.celrep.2015.11.024
  • Schmidl et al. (2015). ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors
    Nat. Methods. DOI: 10.1038/nmeth.3542
  • Tomazou et al. (2015). Epigenome mapping reveals distinct modes of gene regulation and widespread enhancer reprogramming by the oncogenic fusion protein EWS-FLI1
    Cell Reports. DOI: 10.1016/j.celrep.2015.01.042
  • Sheffield et al. (2013). Patterns of regulatory activity across diverse human cell-types predict tissue identity, transcription factor binding, and long-range interactions
    Genome Res. DOI: 10.1101/gr.152140.112
  • Tewari et al. (2012). Chromatin accessibility reveals insights into androgen receptor activation and transcriptional specificity
    Genome Biol. DOI: 10.1186/gb-2012-13-10-r88
  • ENCODE Consortium (2012). An integrated encyclopedia of DNA elements in the human genome
    Nature. DOI: 10.1038/nature11247
  • Natarajan et al. (2012). Predicting cell-type-specific gene expression from regions of open chromatin
    Genome Res. DOI: 10.1101/gr.135129.111
  • Thurman et al. (2012). The accessible chromatin landscape of the human genome
    Nature. DOI: 10.1038/nature11232
  • Shibata et al. (2012). Extensive Evolutionary Changes in Regulatory Element Activity during Human Origins Are Associated with Altered Gene Expression and Positive Selection
    PLoS Genet. DOI: 10.1371/journal.pgen.1002789
  • Sheffield and Furey (2012). Identifying and characterizing regulatory sequences in the human genome with chromatin accessibility assays
    Genes. DOI: 10.3390/genes3040651
  • Song et al. (2011). Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity
    Genome Res. DOI: 10.1101/gr.121541.111

Computational cancer epigenomics

Driven by biological interests in epigenomics and gene regulation, we analyze DNA methylation and chromatin accessibility and how these signals characterize cancers. Cancer is caused by a regulatory process run amok, and we study these regulatory programs in their normal and diseased states.

Selected relevant publications:
  • Lawson et al. (2020). COCOA: coordinate covariation analysis of epigenetic heterogeneity
    Genome Biology. DOI: 10.1186/s13059-020-02139-4
  • Corces et al. (2018). The chromatin accessibility landscape of primary human cancers
    Science. DOI: 10.1126/science.aav1898
  • Klughammer et al. (2018). The DNA methylation landscape of glioblastoma disease progression shows extensive heterogeneity in time and space
    Nature Medicine. DOI: 10.1038/s41591-018-0156-x
  • Sheffield et al. (2017). DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma
    Nature Medicine. DOI: 10.1038/nm.4273
  • Kovar et al. (2016). The second European interdisciplinary Ewing sarcoma research summit – A joint effort to deconstructing the multiple layers of a complex disease
    Oncotarget. DOI: 10.18632/oncotarget.6937
  • Tomazou et al. (2015). Epigenome mapping reveals distinct modes of gene regulation and widespread enhancer reprogramming by the oncogenic fusion protein EWS-FLI1
    Cell Reports. DOI: 10.1016/j.celrep.2015.01.042

Single-cell sequencing analysis

Using new microfluidics and sequencing technology, we ask fundamental questions about how cells differentiate and respond to their environments at the single-cell level.

Selected relevant publications:
  • Litzenburger et al. (2017). Single-cell epigenomic variability reveals functional cancer heterogeneity
    Genome Biol. DOI: 10.1186/s13059-016-1133-7
  • Bock et al. (2016). Multi-Omics of Single Cells: Strategies and Applications
    Trends Biotechnol. DOI: 10.1016/j.tibtech.2016.04.004
  • Schmidl et al. (2015). ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors
    Nat. Methods. DOI: 10.1038/nmeth.3542
  • Farlik et al. (2015). Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics
    Cell Reports. DOI: 10.1016/j.celrep.2015.02.001

Scientific computing and large-scale biomedical data management

We are interested in research infrastructure to enable broader scientific computing in genomics and beyond, particularly the interoperability of data and analysis. The group is developing core infrastructure to solve general problems in scientific computing. As genomic and multi-omic data have increased in size, we develop novel models of genomic data and build state-of-the-art APIs and systems that help biologists get the most out of data.

Selected relevant publications:
  • Sheffield et al. (2021). Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects
    bioRxiv. DOI: 10.1101/2020.10.08.331322
  • Feng and Sheffield (2020). IGD: high-performance search for large-scale genomic interval datasets
    Bioinformatics. DOI: 10.1093/bioinformatics/btaa1062
  • Stolarczyk et al. (2020). Refgenie: a reference genome resource manager
    GigaScience. DOI: 10.1093/gigascience/giz149
  • Sheffield (2019). Bulker: a multi-container environment manager
    OSF Preprints. DOI: 10.31219/osf.io/natsj
  • Feng et al. (2019). Augmented Interval List: a novel data structure for efficient genomic interval search
    Bioinformatics. DOI: 10.1093/bioinformatics/btz407
  • Lawson et al. (2018). MIRA: An R package for DNA methylation-based inference of regulatory activity
    Bioinformatics. DOI: 10.1093/bioinformatics/bty083
  • Sheffield et al. (2018). simpleCache: R caching for reproducible, distributed, large-scale projects
    The Journal of Open Source Software. DOI: 10.21105/joss.00463
  • Sheffield and Bock (2016). LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor
    Bioinformatics. DOI: 10.1093/bioinformatics/btv612

Literature threads

Here are lists of papers on relevant topics that I am interested in: