<style> #title { height: 100% !important; display: flex !important; flex-direction: column !important; justify-content: center !important; } </style> <section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow"> # Computational methods for region-based analysis of epigenome signals Nathan Sheffield <div class="bullet"> <img src="/images/external/uva_dgs_logo.svg" height="85"> <img src="/images/logo/logo_databio_long.svg" height="65"> </div> <span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span> </section> --- <!-- .slide: data-background="/images/presentations/bg.svg.png" data-transition-speed="slow" --> ### Outline <style> .previewblock { float: left; width: 20px; height: 45px; margin: 0; border: none; white-space: nowrap; box-sizing: border-box; } .questionblock { float: left; width: 100%; margin: 5px 0; border: 1px solid rgba(255, 255, 255, .2); } </style> <div class="previewblock" style="width:20%">The (epi)genome revolution</div> <div class="previewblock" style="width:40%">Project organization</div> <div class="previewblock" style="width:40%">Epigenome tools</div> <div class="previewblock" style="width:20%">|</div> <div class="previewblock" style="width:40%">|</div> <div class="previewblock" style="width:40%">|</div> <br clear="all"> <div class="previewblock" style="width:20%; background:#883388">20%</div> <div class="previewblock" style="width:40%; background:#338833">40%</div> <div class="previewblock" style="width:40%; background:#338888">40%</div> <div class="previewblock" style="width:20%"></div> <div class="previewblock" style="width:40%"></div> <div class="previewblock" style="width:40%"></div> <br clear="all"> <div class="previewblock" style="width:20%"></div> <div class="previewblock" style="width:40%"></div> <div class="previewblock" style="width:40%"></div> <div class="questionblock" style="background:#222; color:#eee; font-size: 0.6em; margin-top: 35px">◁ 
Questions ▷</div>

---

<style> .reveal section img { margin:0px; } </style>

## Outline

**The (epi)genome revolution** (20%)

**Project organization** (40%)

**Epigenome tools** (40%)

---

## The genome revolution

---

<img src="https://www.genome.gov/images/content/costpermb2015_4.jpg" width="650">

A revolution driven by DNA sequencing technology

---

Sequencing technology can also measure epigenome signals

<div class="well">
Epigenomics is the study of the chemical modification and physical conformation of cellular DNA and bound proteins
<img src="/slides/computational-methods-for-region-based-analysis/genome-epigenome.svg" width="650"> <!-- .element: class="fragment" -->
</div>

---

<div class="col2">
<img src="/slides/computational-methods-for-region-based-analysis/rosa2013_chromatin.png" width="550">
Rosa et al. 2013
</div>
<div class="col2">

- Histone modification: ChIP-seq <!-- .element: class="fragment" -->
- DNA methylation: Bisulfite-seq <!-- .element: class="fragment" -->
- Chromatin accessibility: ATAC-seq <!-- .element: class="fragment" -->

</div>

---

<img src="/slides/computational-methods-for-region-based-analysis/IGV_DNase.png" width="550">

---

### The Sequence Read Archive is growing

<img src="/slides/computational-methods-for-region-based-analysis/sra_growth.png" width="650">

https://www.ncbi.nlm.nih.gov/sra/docs/sragrowth/

---

## Data is becoming more...
<div style="display: flex; justify-content: space-between;"> <div style="width: 30%; text-align: center;"> abundant <img src="/_modules/pep/sequencing_costs_2015.jpg" width="100%"> </div> <div style="width: 30%; text-align: center;"> available <img src="/_modules/pep/cos_black.jpg" width="100%"> </div> <div style="width: 30%; text-align: center;"> powerful <img src="/_modules/pep/gwasw.jpg" width="100%"> </div> </div> --- ## So why are the world's problems not solved? <div class="fragment"> <img src="/_modules/pep/data_not_info.svg" width="500"> </div> --- <img src="/_modules/pep/data_analysis_info.svg" width="700"> --- ## First step in bioinformatics analysis <div style="display: flex; justify-content: space-between;"> <div style="width: 45%; text-align: center;"> <img src="/_modules/pep/icons/pipeline.svg" width="200"> pipeline </div> <div class="fragment" style="width: 45%; text-align: center;"> <img src="/_modules/pep/search_trends.svg" width="100%"> Papers with "bioinformatics pipeline" in title </div> </div> --- ## Problem solved? <img src="/_modules/pep/data_pipeline_info.svg" width="700"> --- ## Problem solved? <img src="/_modules/pep/data_pipeline_info_highlight.svg" width="700"> --- ## Problem solved? <img src="/_modules/pep/data_interface.svg" width="700"> --- ## Data munging <div style="display: flex; justify-content: space-between;"> <div style="width: 45%;"> <img src="/_modules/pep/data_input-new_white.svg" width="100%"> </div> <div class="fragment" style="width: 45%;"> <img src="/_modules/pep/data_input-rev_white.svg" width="100%"> </div> </div> --- ## Then, downstream tools need a different organization <img src="/_modules/pep/data_input_steps.svg" width="525"> --- ## What if? 
<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<img src="/_modules/pep/data_input2.svg" width="100%">
</div>
<div style="width: 45%;">
<img src="/_modules/pep/data_input2_rev.svg" width="100%">
</div>
</div>

---

## The solution

<img src="/_modules/pep/data_input_plug.svg" width="625">

---

## The solution

<img src="/_modules/pep/data_input_steps_plug.svg" width="475">

---

## PEP: A standard format for project metadata

<img src="/_modules/pep/pep_center_white.svg" width="700">

---

## PEP Ecosystem

<img src="/_modules/pep/pep_ecosystem.svg" width="750">

---

## <img src="/_modules/pep/pep_logo.svg" width="70" style="vertical-align: middle;"> PEP format

<span class="bullet"><img src="/_modules/pep/icons/file.svg" width="30" class="bullet"> project_config.yaml</span>

```yaml
metadata:
  sample_annotation: /path/to/samples.csv
  output_dir: /path/to/output/folder
```

---

<span class="bullet"><img src="/_modules/pep/icons/file.svg" width="30" class="bullet"> samples.csv</span>

```csv
sample_name, protocol, organism, data_source
frog_0h, RNA-seq, frog, /path/to/frog0.gz
frog_1h, RNA-seq, frog, /path/to/frog1.gz
frog_2h, RNA-seq, frog, /path/to/frog2.gz
frog_3h, RNA-seq, frog, /path/to/frog3.gz
```

---

#### Microwave syndrome

<div>
<img src="/_modules/microwave-syndrome/IFB_17PM-MEC1.png" height="180">
<img src="/_modules/microwave-syndrome/LG_LMV2031SB.png" height="180">
<img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ.png" height="180">
</div>

<div class="well fragment">In user interface design, prioritizing easy access to integrated functions over their individual components.
</div> --- <section style="font-size: 0;"> <img src="/_modules/microwave-syndrome/IFB_17PM-MEC1.png" height="120"> <img src="/_modules/microwave-syndrome/LG_LMV2031SB.png" height="120"> <img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ.png" height="120"><br> <img src="/_modules/microwave-syndrome/IFB_17PM-MEC1_console.jpg" height="560"> <img src="/_modules/microwave-syndrome/LG_LMV2031SB_console.jpg" height="560" class="fragment"> <img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ_console.jpg" height="560" class="fragment"> </section> --- <section transition="fade-in"> <img src="/_modules/microwave-syndrome/pipelines/pipeline_chunk.svg" height="650"> </section> --- <section transition="fade-in"> <img src="/_modules/microwave-syndrome/pipelines/pipeline_chunk2.svg" height="650"> </section> --- ### The UNIX philosophy <div class="col2"> <img src="/_modules/microwave-syndrome/unix_book.jpg" height="450"> </div> <div class="col2"> <div class="well"><span style="color:#ffb; font-weight:bold">[T]he power of a system comes more from the relationships among programs than from the programs themselves.</span><br><br> <span style="font-size: 0.8em">Many UNIX programs do quite trivial tasks in isolation, but, combined with other programs, become general and useful tools.</span><br/><br/> <span class="small">- Kernighan and Pike, The UNIX Programming Environment (1983, p. 
viii)</span> </div> </div> --- <section transition="fade-in"> <img src="/_modules/microwave-syndrome/pipelines/modularity_spectrum.svg" height="650"> </section> --- <section transition="fade-in"> <img src="/_modules/microwave-syndrome/pipelines/modularity_spectrum2.svg" height="650"> </section> --- <section transition="fade-in" id="links1"> <img src="/_modules/microwave-syndrome/pipelines/pipeline_links1.svg" height="650"> </section> --- <section transition="fade-in" id="links2"> <img src="/_modules/microwave-syndrome/pipelines/pipeline_links2.svg" height="650"> </section> --- <section transition="fade-in" id="links3"> <img src="/_modules/microwave-syndrome/pipelines/pipeline_links3.svg" height="650"> </section> --- <section transition="fade-in" id="links5"> <img src="/_modules/microwave-syndrome/pipelines/pipeline_links5.svg" height="650"> </section> --- <section transition="fade-in" id="links6"> <img src="/_modules/microwave-syndrome/pipelines/pipeline_links6.svg" height="650"> </section> --- <div class="col2"> ### Problem <img src="/_modules/microwave-syndrome/pep/data_input-new_white.svg" width="375"> </div> <div class="col2 fragment"> ### Solution <img src="/_modules/microwave-syndrome/pep/data_input_plug.svg" width="375"> </div> --- <div class="col2"> ### Problem <img src="/slides/computational-methods-for-region-based-analysis/data_input-new_white.svg" width="375"> </div> <div class="col2"> ### Solution <!-- .element: class="fragment" --> <img src="/slides/computational-methods-for-region-based-analysis/data_input_plug.svg" width="375"> <!-- .element: class="fragment" --> </div> --- ## PEP: Portable Encapsulated Projects <img src="/_modules/pep-format/pep_center_white.svg" width="700"> --- <div class="bullet"> <h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2> </div> Start with a simple CSV with tabular data. 
<hr>

<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">samples.csv
</div>

```
sample_name,protocol,organism,input_file
frog_0h,RNA-seq,frog,/path/to/frog0.gz
frog_1h,RNA-seq,frog,/path/to/frog1.gz
frog_2h,RNA-seq,frog,/path/to/frog2.gz
frog_3h,RNA-seq,frog,/path/to/frog3.gz
```

---

<div class="bullet">
<h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2>
</div>

Add a YAML file for project-level data.

<hr>

<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">samples.csv
</div>

```
sample_name,protocol,organism,input_file
frog_0h,RNA-seq,frog,/path/to/frog0.gz
frog_1h,RNA-seq,frog,/path/to/frog1.gz
frog_2h,RNA-seq,frog,/path/to/frog2.gz
frog_3h,RNA-seq,frog,/path/to/frog3.gz
```

<hr>

<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">project_config.yaml
</div>

```yaml
sample_table: /path/to/samples.csv
output_dir: /path/to/output/folder
other_variable: value
```

---

### Add programmatic sample and project modifiers.
<div style="text-align: left">
<span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br>
<span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br>
<span class="bullet"><img src="/_modules/pep-format/subproject_white.svg" width="50" class="bullet">Subprojects</span><br>
</div>

---

<span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br>

<div class="well">Automatically build new sample attributes from existing attributes.</div>

Without derived attribute:

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | /path/to/frog0.gz |
| frog_1h | 1 | RNA-seq | frog | /path/to/frog1.gz |
| frog_2h | 2 | RNA-seq | frog | /path/to/frog2.gz |
| frog_3h | 3 | RNA-seq | frog | /path/to/frog3.gz |

Using derived attribute:

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | my_samples |
| frog_1h | 1 | RNA-seq | frog | my_samples |
| frog_2h | 2 | RNA-seq | frog | my_samples |
| frog_3h | 3 | RNA-seq | frog | my_samples |
| crab_0h | 0 | RNA-seq | crab | your_samples |
| crab_3h | 3 | RNA-seq | crab | your_samples |

---

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | my_samples |
| frog_1h | 1 | RNA-seq | frog | my_samples |
| frog_2h | 2 | RNA-seq | frog | my_samples |
| frog_3h | 3 | RNA-seq | frog | my_samples |
| crab_0h | 0 | RNA-seq | crab | your_samples |
| crab_3h | 3 | RNA-seq | crab | your_samples |

Project config file:

```yaml
sample_modifiers:
  derive:
    attributes: [input_file]
    sources:
      my_samples: "/path/to/my/samples/{organism}_{t}h.gz"
      your_samples: "/path/to/your/samples/{organism}_{t}h.gz"
```
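The substitution that this config describes can be sketched in a few lines of Python. This is only an illustration, assuming each sample is a plain dict keyed by column name; the `derive` helper is hypothetical and is not the pepkit implementation.

```python
# Hypothetical helper for illustration; not part of peppy or any PEP tool.
def derive(samples, attributes, sources):
    """Replace source keys in the listed attributes with filled-in templates."""
    derived = []
    for sample in samples:
        row = dict(sample)  # copy so the input rows stay unchanged
        for attr in attributes:
            template = sources.get(row.get(attr))
            if template is not None:
                # {organism}, {t}, ... are looked up in the sample's own columns
                row[attr] = template.format(**row)
        derived.append(row)
    return derived


sources = {
    "my_samples": "/path/to/my/samples/{organism}_{t}h.gz",
    "your_samples": "/path/to/your/samples/{organism}_{t}h.gz",
}
samples = [
    {"sample_name": "frog_0h", "t": 0, "organism": "frog", "input_file": "my_samples"},
    {"sample_name": "crab_3h", "t": 3, "organism": "crab", "input_file": "your_samples"},
]
result = derive(samples, ["input_file"], sources)
for row in result:
    print(row["sample_name"], row["input_file"])
```

The sample table keeps only the short source key, so moving the data means editing one template in the config rather than every row.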
{variable} identifies sample annotation columns

<div class="well">Benefit: Enables distributed files, portability</div>

---

<span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br>

<div class="well">Add new sample attributes conditioned on values of existing attributes</div>

<div class="col2">
Before:<br>

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

</div>
<div class="col2">
After:<br>

| sample_name | protocol | organism | genome |
|-------------|:--------:|----------|--------|
| human_1 | RNA-seq | human | hg38 |
| human_2 | RNA-seq | human | hg38 |
| human_3 | RNA-seq | human | hg38 |
| mouse_1 | RNA-seq | mouse | mm10 |

</div>

---

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

Project config file:

```yaml
sample_modifiers:
  imply:
    - if:
        organism: human
      then:
        genome: hg38
    - if:
        organism: mouse
      then:
        genome: mm10
```

<div class="well">Benefit: Divides project from sample metadata</div>

---

<span class="bullet"><img src="/_modules/pep-format/subproject.svg" width="50" class="bullet">Subprojects</span><br>

<div class="well">Define activatable project attributes.</div>

```yaml
project_modifiers:
  amendments:
    diverse:
      metadata:
        sample_annotation: psa_rrbs_diverse.csv
    cancer:
      metadata:
        sample_annotation: psa_rrbs_intracancer.csv
```

<div class="well">Benefit: Defines multiple similar projects in a single file</div>

---

<img src="/slides/computational-methods-for-region-based-analysis/geofetch_pep_looper_pepatac_peppy.svg" height="650">

---

<img src="/_modules/lola-intro/LOLA-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px">
<br>

### Locus Overlap Analysis

<div class="small">
<a
href="http://code.databio.org/LOLA/">http://code.databio.org/LOLA/</a><br>
</div>

<span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Sheffield and Bock (2016). <i>Bioinformatics</i>.</span><br/>
<span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Nagraj, Magee, and Sheffield (2018). <i>Nucleic Acids Research</i>.</span>

---

## LOLAweb

<img src="/shorts/lola/LOLAweb-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px">

A Shiny app and server for interactive LOLA analysis.

Public server: [http://lolaweb.databio.org](http://lolaweb.databio.org)

GitHub: [https://github.com/databio/LOLAweb](https://github.com/databio/LOLAweb)

---

### DEMO

<video controls width="800">
<source src="lw.webm" type="video/webm">
Your browser does not support the video tag.
</video>

---

# LOLA refresher

---

# LOLA requires comparing sets of intervals

Can we improve the efficiency to enable faster, larger-scale analysis?
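---

# Sketch: a brute-force baseline

To make the efficiency question concrete, here is a minimal Python sketch of the naive all-against-all comparison. It assumes half-open `(start, end)` intervals on a single chromosome. The names `overlaps` and `count_overlapping` are hypothetical, and this is not LOLA's code.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(query_set, subject_set):
    """Count query intervals that overlap any subject interval.

    Every query is checked against every subject, so the cost grows
    with len(query_set) * len(subject_set).
    """
    return sum(
        any(overlaps(q, s) for s in subject_set)
        for q in query_set
    )


query_set = [(1, 5), (10, 12), (20, 25)]
subject_set = [(4, 6), (24, 30)]
n_hits = count_overlapping(query_set, subject_set)
print(n_hits)
```

Repeating this against every region set in a large database is what motivates an indexed subject list.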
---

# If subject list has no containment, identifying overlaps is fast

<!-- .element: class="fragment" -->

Binary search on interval start positions, followed by backward steps: <!-- .element: class="fragment" -->

---

# The problem arises with contained interval overlaps

---

# How can we improve efficiency without guaranteeing no containment?

---

# Many approaches to solve the 'containment' issue:

- Nested Containment Lists (GRanges) (Alekseyenko and Lee, 2007; Aboyoun, Pages, and Lawrence, 2012)
- R-trees (bedtools) (Kent et al., 2002; Quinlan and Hall, 2010)
- Augmented interval trees (Cormen et al., 2001)

These methods structure the data to provide non-containment guarantees

---

# Methods provide non-containment guarantees

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">

### R-trees

Annotates tree nodes with a *minimum bounding rectangle* of elements. A query that does not intersect the bounding rectangle will not intersect any child element.

</div>
<div style="width: 45%;">

### Nested Containment Lists

</div>
</div>

---

# Augmented Interval List

1. Augment the list with the running maximum *end* value.
   *Solves the problem for lists with little containment.*
2. Decompose the list to minimize containment.
   *Extends the solution to highly contained lists.*

---

# Augment with the running maximum end value, `maxE`

Provides a *local guarantee* of no containment.

---

# AIList works on contained lists

---

# But long containment runs are problematic

---

# Decompose long runs with constant `maxE`

---

# Performance

- How does the `maxE` minimum run length affect performance?
- How does it compare to existing approaches?
- How does it scale with increasing size of subject?

---

# Datasets

---

# How does the `maxE` minimum run length affect performance?

---

# How does it compare to existing approaches?

---

# How does it scale with increasing size of subject?
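---

# Sketch: querying with `maxE`

The augmentation step from the earlier slides can be sketched in Python as follows. This is a simplified illustration, not the published AIList implementation. It assumes half-open `(start, end)` intervals on one chromosome and leaves out the decomposition step. The names `build_augmented` and `query` are hypothetical.

```python
from bisect import bisect_left


def build_augmented(intervals):
    """Sort intervals by start and record the running maximum end (maxE)."""
    ivs = sorted(intervals)
    starts = [s for s, _ in ivs]
    max_e = []
    running = float("-inf")
    for _, end in ivs:
        running = max(running, end)
        max_e.append(running)
    return ivs, starts, max_e


def query(index, qs, qe):
    """Return all stored intervals overlapping the half-open query [qs, qe)."""
    ivs, starts, max_e = index
    hits = []
    # Binary search: last interval whose start lies before the query end.
    i = bisect_left(starts, qe) - 1
    # Step backward; once maxE <= qs, nothing at or before i can reach the query.
    while i >= 0 and max_e[i] > qs:
        if ivs[i][1] > qs:
            hits.append(ivs[i])
        i -= 1
    return hits


subject = [(1, 10), (2, 4), (5, 7), (8, 12), (15, 20)]
index = build_augmented(subject)
found = sorted(query(index, 6, 9))
print(found)
```

With a long containing interval such as `(1, 10)`, the backward scan still visits `(2, 4)` even though it cannot overlap. Runs like this are what the decomposition step is meant to shorten.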
--- # Conclusion and Directions AIList is best-in-class for one-to-one interval comparisons --- ## Acknowledgments <div style="display: flex; justify-content: space-between;"> <div style="width: 30%; font-size: 0.6em;"> <img src="/shorts/ailist/University_of_Virginia_Rotunda_logo.svg" height="40"><img src="/shorts/ailist/University_of_Virginia_logo_white.svg" height="40"> **Sheffield lab** - John Lawson - Vince Reuter - Ognen Duzlevski - Jason Smith - **Jianglin Feng** - Michal Stolarczyk - Aaron Gu - Anant Tewari </div> <div style="width: 30%; font-size: 0.6em;"> **Funding:** <img src="/shorts/ailist/University_of_Virginia_logo_white.svg" height="40"> <img src="/shorts/ailist/NIH_logo_black.svg" height="80"> <img src="/shorts/ailist/hfsp_logo.svg" height="60"> </div> </div> --- ## Conclusion <div class="col2"> <div style="padding:7px"> Pepkit provides a start-to-finish toolkit for processing epigenome data. <img src="/slides/computational-methods-for-region-based-analysis/pep_logo.svg" width="125" style="padding-top:35px; padding-bottom:35px"> [pepkit.github.io](http://pepkit.github.io) </div> </div> <div class="col2"> <div style="padding:7px"> <!-- .element: class="fragment" --> LOLA is one of our tools to ask biological questions of genomic regions <img src="/slides/computational-methods-for-region-based-analysis/LOLA-logo-white.svg" width="255" style="padding-top:45px; padding-bottom:45px"> [code.databio.org/LOLA](http://code.databio.org/LOLA) </div> </div> --- ## Acknowledgments <div class="col3" style="font-size:.6em"> <img src="/slides/computational-methods-for-region-based-analysis/University_of_Virginia_Rotunda_logo.svg" height="40"><img src="/slides/computational-methods-for-region-based-analysis/University_of_Virginia_logo_white.svg" height="40"> **Sheffield lab** - John Lawson - Vince Reuter - Jason Smith - Jianglin Feng - Michal Stolarczyk - Aaron Gu - Ognen Duzlevski </div> <div class="col3" style="font-size:.6em"> <img 
src="/slides/computational-methods-for-region-based-analysis/University_of_Virginia_Rotunda_logo.svg" height="40"><img src="/slides/computational-methods-for-region-based-analysis/University_of_Virginia_logo_white.svg" height="40"> **SOM research computing** - Pete Nagraj - Neal Magee </div> <div class="col3" style="font-size:.6em"> **Funding:** <img src="/slides/computational-methods-for-region-based-analysis/University_of_Virginia_logo_white.svg" height="40"> <img src="/slides/computational-methods-for-region-based-analysis/NIH_logo_black.svg" height="80"> <img src="/slides/computational-methods-for-region-based-analysis/hfsp_logo.svg" height="60"> </div> --- <style> #acknowledgements { height: 100% !important; display: flex !important; flex-direction: column !important; justify-content: center !important; } </style> <section id="acknowledgements" data-background="/images/presentations/bg.svg.png"> # Thank You <br clear="all"/> <span class="small bullet"><img src="/images/external/github_bug_black.svg" height="20" class="bullet"><a href="http://github.com/nsheff">nsheff</a></span> · <span class="small bullet"><img src="/images/icons/web.svg" height="25" class="bullet"><a href="http://databio.org">databio.org</a></span> · <span class="small bullet"><img src="/images/icons/letter.svg" height="25" class="bullet"><a href="mailto:nsheffield@virginia.edu">nsheffield@virginia.edu</a></span> <div class="bullet" style="background-color:rgb(45,45,45,.65); border-radius: 25px; opacity:0.9"> <img src="/images/external/uva_dgs_logo.svg" height="65"> <img src="/images/logo/logo_databio_long.svg" height="45"> </div> </section>