<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Bioinformatics data management and epigenome analysis methods

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

## Outline

**Motivation and background** (20%)

**Data management** (50%)

**Epigenome analysis** (30%)

---

# Most pipelines require individual metadata organization

<div class="col2">
<img src="/slides/bioinformatics-data-management-epigenome-analysis/pep/data_input-new_white.svg" width="375">
</div>
<div class="col2 fragment">
<img src="/slides/bioinformatics-data-management-epigenome-analysis/pep/data_input-rev_white.svg" width="375">
</div>

---

# What if?

<div class="col2">
<img src="/slides/bioinformatics-data-management-epigenome-analysis/pep/data_input2.svg" width="325">
</div>
<div class="col2">
<img src="/slides/bioinformatics-data-management-epigenome-analysis/pep/data_input2_rev.svg" width="325">
</div>

<div class="fragment">
Why is this hard to do?
<br>Because of <i>microwave syndrome</i>...
</div>

<aside class="notes">
Why is this hard to do? One reason is something I call microwave syndrome...
</aside>

---

#### Microwave syndrome

<div>
<img src="/_modules/microwave-syndrome/IFB_17PM-MEC1.png" height="180">
<img src="/_modules/microwave-syndrome/LG_LMV2031SB.png" height="180">
<img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ.png" height="180">
</div>

<div class="well fragment">In user interface design, prioritizing easy access to integrated functions over their individual components.
</div>

---

<section style="font-size: 0;">
<img src="/_modules/microwave-syndrome/IFB_17PM-MEC1.png" height="120">
<img src="/_modules/microwave-syndrome/LG_LMV2031SB.png" height="120">
<img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ.png" height="120"><br>
<img src="/_modules/microwave-syndrome/IFB_17PM-MEC1_console.jpg" height="560">
<img src="/_modules/microwave-syndrome/LG_LMV2031SB_console.jpg" height="560" class="fragment">
<img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ_console.jpg" height="560" class="fragment">
</section>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_chunk.svg" height="650">
</section>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_chunk2.svg" height="650">
</section>

---

### The UNIX philosophy

<div class="col2">
<img src="/_modules/microwave-syndrome/unix_book.jpg" height="450">
</div>
<div class="col2">
<div class="well"><span style="color:#ffb; font-weight:bold">[T]he power of a system comes more from the relationships among programs than from the programs themselves.</span><br><br>
<span style="font-size: 0.8em">Many UNIX programs do quite trivial tasks in isolation, but, combined with other programs, become general and useful tools.</span><br/><br/>
<span class="small">- Kernighan and Pike, The UNIX Programming Environment (1983, p.
viii)</span>
</div>
</div>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/modularity_spectrum.svg" height="650">
</section>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/modularity_spectrum2.svg" height="650">
</section>

---

<section transition="fade-in" id="links1">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links1.svg" height="650">
</section>

---

<section transition="fade-in" id="links2">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links2.svg" height="650">
</section>

---

<section transition="fade-in" id="links3">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links3.svg" height="650">
</section>

---

<section transition="fade-in" id="links5">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links5.svg" height="650">
</section>

---

<section transition="fade-in" id="links6">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links6.svg" height="650">
</section>

---

<div class="col2">

### Problem

<img src="/_modules/microwave-syndrome/pep/data_input-new_white.svg" width="375">
</div>
<div class="col2 fragment">

### Solution

<img src="/_modules/microwave-syndrome/pep/data_input_plug.svg" width="375">
</div>

---

<img src="/slides/bioinformatics-data-management-epigenome-analysis/pepatac/pep_looper_pepatac.svg" height="650">

---

## PEP: Portable Encapsulated Projects

<img src="/_modules/pep-format/pep_center_white.svg" width="700">

---

<div class="bullet">
<h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2>
</div>

Start with a simple CSV file of sample metadata.
<hr> <div class="bullet"> <img src="/_modules/pep-format/file.svg" width="30">samples.csv </div> ``` sample_name,protocol,organism,input_file frog_0h,RNA-seq,frog,/path/to/frog0.gz frog_1h,RNA-seq,frog,/path/to/frog1.gz frog_2h,RNA-seq,frog,/path/to/frog2.gz frog_3h,RNA-seq,frog,/path/to/frog3.gz ``` --- <div class="bullet"> <h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2> </div> Add a YAML for project-level data. <hr> <div class="bullet"> <img src="/_modules/pep-format/file.svg" width="30">samples.csv </div> ``` sample_name,protocol,organism,input_file frog_0h,RNA-seq,frog,/path/to/frog0.gz frog_1h,RNA-seq,frog,/path/to/frog1.gz frog_2h,RNA-seq,frog,/path/to/frog2.gz frog_3h,RNA-seq,frog,/path/to/frog3.gz ``` <hr> <div class="bullet"> <img src="/_modules/pep-format/file.svg" width="30">project_config.yaml </div> ```yaml sample_table: /path/to/samples.csv output_dir: /path/to/output/folder other_variable: value ``` --- ### Add programmatic sample and project modifiers. 
<div style="text-align: left"> <span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br> <span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br> <span class="bullet"><img src="/_modules/pep-format/subproject_white.svg" width="50" class="bullet">Subprojects</span><br> </div> --- <span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br> <div class="well">Automatically build new sample attributes from existing attributes.</div> Without derived attribute: | sample_name | t | protocol | organism | input_file | |-------------|---|:--------:|----------|------------| | frog_0h | 0 | RNA-seq | frog | /path/to/frog0.gz | | frog_1h | 1 | RNA-seq | frog | /path/to/frog1.gz | | frog_2h | 2 | RNA-seq | frog | /path/to/frog2.gz | | frog_3h | 3 | RNA-seq | frog | /path/to/frog3.gz | Using derived attribute: | sample_name | t | protocol | organism | input_file | |-------------|---|:--------:|----------|------------| | frog_0h | 0 | RNA-seq | frog | my_samples | | frog_1h | 1 | RNA-seq | frog | my_samples | | frog_2h | 2 | RNA-seq | frog | my_samples | | frog_3h | 3 | RNA-seq | frog | my_samples | | crab_0h | 0 | RNA-seq | crab | your_samples | | crab_3h | 3 | RNA-seq | crab | your_samples | --- | sample_name | t | protocol | organism | input_file | |-------------|---|:--------:|----------|------------| | frog_0h | 0 | RNA-seq | frog | my_samples | | frog_1h | 1 | RNA-seq | frog | my_samples | | frog_2h | 2 | RNA-seq | frog | my_samples | | frog_3h | 3 | RNA-seq | frog | my_samples | | crab_0h | 0 | RNA-seq | crab | your_samples | | crab_3h | 3 | RNA-seq | crab | your_samples | Project config file: ```yaml sample_modifiers: derive: attributes: [input_file] sources: my_samples: "/path/to/my/samples/{organism}_{t}h.gz" your_samples: "/path/to/your/samples/{organism}_{t}h.gz" ``` 
`{variable}` identifies sample annotation columns

<div class="well">Benefit: Enables distributed files and portability</div>

---

<span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br>

<div class="well">Add new sample attributes conditioned on values of existing attributes</div>

<div class="col2">

Before:<br>

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

</div>
<div class="col2">

After:<br>

| sample_name | protocol | organism | genome |
|-------------|:--------:|----------|--------|
| human_1 | RNA-seq | human | hg38 |
| human_2 | RNA-seq | human | hg38 |
| human_3 | RNA-seq | human | hg38 |
| mouse_1 | RNA-seq | mouse | mm10 |

</div>

---

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

Project config file:

```yaml
sample_modifiers:
  imply:
    - if:
        organism: human
      then:
        genome: hg38
    - if:
        organism: mouse
      then:
        genome: mm10
```

<div class="well">Benefit: Separates project metadata from sample metadata</div>

---

<span class="bullet"><img src="/_modules/pep-format/subproject.svg" width="50" class="bullet">Subprojects</span><br>

<div class="well">Define activatable project attributes.</div>

```yaml
project_modifiers:
  amend:
    diverse:
      sample_table: psa_rrbs_diverse.csv
    cancer:
      sample_table: psa_rrbs_intracancer.csv
```

<div class="well">Benefit: Defines multiple similar projects in a single file</div>

---

<img src="/slides/bioinformatics-data-management-epigenome-analysis/pepatac/pep_looper_pepatac.svg" height="650">

---

## <img src="/_modules/looper/logo_looper.svg" width="150" style="vertical-align: middle;"> Looper

Deploys pipelines across samples by connecting samples to any command-line
tool <div class="small"> <a href="https://looper.databio.org">https://looper.databio.org</a> </div> --- <img src="/_modules/looper/looper_role_white_v2.svg" width="100%"> --- ## pipeline_interface.yaml ```yaml protocol_mappings: RNA-seq: rna-seq pipelines: rna-seq: name: RNA-seq_pipeline path: path/to/rna-seq.py arguments: "--option1": sample_attribute "--option2": sample_attribute2 ``` - maps protocols to pipelines <!-- .element: class="fragment" --> - maps sample attributes (columns) to pipeline arguments <!-- .element: class="fragment" --> --- ## Looper features <div style="display: flex; justify-content: space-between;"> <div style="width: 45%; text-align: left;"> <span class="bullet"><img src="/_modules/looper/icons/input-mouse.svg" width="50" class="bullet"> Single-input runs</span><br> <span class="bullet"><img src="/_modules/looper/icons/flexible.svg" width="50" class="bullet"> Flexible pipelines</span><br> <span class="bullet"><img src="/_modules/looper/icons/piechart.svg" width="50" class="bullet"> Flexible resources</span><br> </div> <div style="width: 45%; text-align: left;"> <span class="bullet"><img src="/_modules/looper/icons/computer.svg" width="50" class="bullet"> Flexible compute</span><br> <span class="bullet"><img src="/_modules/looper/icons/flag_checker.svg" width="50" class="bullet"> Job status-aware</span><br> </div> </div> --- ## <span class="bullet"><img src="/_modules/looper/icons/input-mouse.svg" width="50" class="bullet"> Single-input runs</span> Run your entire project with one line: ```bash looper run project_config.yaml ``` --- ## <span class="bullet"><img src="/_modules/looper/icons/flexible.svg" width="50" class="bullet"> Flexible pipelines</span> ```yaml protocol_mappings: RRBS: rrbs WGBS: wgbs EG: wgbs.py SMART-seq: rnaBitSeq -f; rnaTopHat -f ATAC-SEQ: atacseq DNase-seq: atacseq CHIP-SEQ: chipseq ``` Many-to-many mappings --- ## <span class="bullet"><img src="/_modules/looper/icons/piechart.svg" width="50" class="bullet"> Flexible 
resources</span> ```yaml pipeline_key: name: pipeline_name arguments: "--option" : value resources: default: file_size: "0" cores: "2" mem: "6000" time: "01:00:00" large_input: file_size: "2000" cores: "4" mem: "12000" time: "08:00:00" ``` Resources can vary by input file size --- ## <span class="bullet"><img src="/_modules/looper/icons/computer.svg" width="50" class="bullet"> Flexible compute</span> ```yaml compute: slurm: submission_template: templates/slurm_template.sub submission_command: sbatch localhost: submission_template: templates/localhost_template.sub submission_command: sh ``` --- Adjust compute package on-the-fly: ```bash looper run project_config.yaml --compute localhost ``` --- ## <span class="bullet"><img src="/_modules/looper/icons/flag_checker.svg" width="50" class="bullet"> Job status-aware</span> Looper only submits jobs for samples not already flagged as running, completed, or failed. ```bash looper check project_config.yaml ``` ```bash looper summarize project_config.yaml ``` --- <div> <img src="/_modules/pepatac/logo_pepatac.svg" width="175" style="padding-top:25px; padding-bottom:25px"> <br> A robust ATAC-seq pipeline <br> built on the PEP toolkit <div class="small"> <a href="http://code.databio.org/PEPATAC">http://code.databio.org/PEPATAC</a><br> </div> </div> --- <img src="/_modules/pepatac/pepatac_workflow_white.svg" width="600"> --- ### Comparison <img src="/_modules/pepatac/computation_comparison.svg" width="800"> --- ### Prealignments Nuclear-mitochondrial DNA (NuMts) confuse aligners --- <img src="/_modules/pepatac/numts.svg" width="800"> <img src="/_modules/pepatac/numts_simultaneous_alignment.svg" width="800" class="fragment"> <img src="/_modules/pepatac/numts_alignment.svg" width="800" class="fragment"> --- <img src="/_modules/pepatac/numts.svg" width="800"> <img src="/_modules/pepatac/numts_alignment_problems.svg" width="800"> <img src="/_modules/pepatac/numts_alignment.svg" width="800"> --- <img src="/_modules/pepatac/numts.svg" 
width="800"> <img src="/_modules/pepatac/numts_alignment_problems.svg" width="800"> <img src="/_modules/pepatac/numts_alignment_blacklist.svg" width="800"> --- <img src="/_modules/pepatac/numts_alignment_blacklist.svg" width="800"> <div> <li>Inaccurate alignment statistics</li> <li>Requires pre-defined NuMt locations</li> <li>Wastes compute power</li> </div> --- <img src="/_modules/pepatac/prealignments.svg" width="800"> --- <img src="/_modules/pepatac/prealignments2.svg" width="800"> --- ### Advantages of serial alignments - Accuracy (better rates plus no blacklist needed). - Speed. - Modular reference assemblies. --- <img src="/_modules/pepatac/prealignment_speed.svg" width="800"> --- ### Output <img src="/_modules/pepatac/pepatac_output_sample.svg" width="650"> --- <img src="/_modules/pepatac/pepatac_summary.png" width="800"><br> <a href="http://code.databio.org/PEPATAC/files/examples/gold/summary.html">http://code.databio.org/PEPATAC/files/examples/gold/summary.html</a> --- <iframe src="http://code.databio.org/PEPATAC/files/examples/gold/summary.html" width="100%" height="675"></iframe> --- <img src="/slides/bioinformatics-data-management-epigenome-analysis/pepatac/pep_looper_pepatac_peppy.svg" height="650"> --- # Questions --- ## Epigenome analysis methods <div class="col2"> <span class="bullet"><a href="#LOLA">LOLA</a></span> <br>Locus Overlap Analysis </div> <div class="col2"> <span class="bullet"><a href="#MIRA">MIRA</a></span> <br> Methylation-based Inference of Regulatory Activity </div> --- <img src="/_modules/lola-intro/LOLA-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px"> <br> ### Locus Overlap Analysis <div class="small"> <a href="http://code.databio.org/LOLA/">http://code.databio.org/LOLA/</a><br> </div> <span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Sheffield and Bock (2016). 
<i>Bioinformatics</i>.</span><br/>
<span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Nagraj, Magee, and Sheffield (2018). <i>Nucleic Acids Research</i>.</span>

---

## LOLAweb

<img src="/shorts/lola/LOLAweb-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px">

A Shiny app and server for interactive LOLA analysis.

Public server: [http://lolaweb.databio.org](http://lolaweb.databio.org)

GitHub: [https://github.com/databio/LOLAweb](https://github.com/databio/LOLAweb)

---

### DEMO

<video controls width="800">
<source src="lw.webm" type="video/webm">
Your browser does not support the video tag.
</video>

---

## Methylation-based Inference of Regulatory Activity (MIRA)

<div class="small">
<a href="http://code.databio.org/MIRA/">http://code.databio.org/MIRA/</a><br>
</div>

<span class="small bullet"><img src="/_modules/mira/icons/paper.svg" height="25" class="bullet">Lawson et al. (2018). <i>Bioinformatics</i>.</span>

---

### DNA methylation

<img src="/_modules/bis-seq-intro/dnameth_intro.svg" />

---

### DNA methylation

<img src="/_modules/bis-seq-intro/dnameth_intro2.svg" />

---

### <span class="bullet"><img src="/_modules/bis-seq-intro/bolt.svg" width="50" class="bullet">Bisulfite-seq</span>

<img src="/_modules/bis-seq-intro/dnameth_bisulfite.svg" />

---

# <span class="bullet"><img src="/_modules/region-pooling/merge.svg" width="50" class="bullet">Region pooling</span>

---

## MIRA concept

<img src="/_modules/mira/mira.svg" />

---

## MIRA workflow

<img src="/_modules/mira/mira2.svg" />

---

## MIRA analysis

<img src="/_modules/mira/mira3.svg" />

---

## MIRA results: Differential activity

<img src="/_modules/mira/ews_pat/MIRA_result_1.svg" width="700"/>

<span class="small bullet"><img src="/_modules/mira/icons/paper.svg" height="25" class="bullet">Sheffield et al. (2017).
<i>Nature Medicine</i>.</span>

---

## MIRA results: Activity scores

<img src="/_modules/mira/ews_pat/MIRA_result_2.svg" />

---

## MIRA results: Enrichment analysis

<img src="/_modules/mira/ews_pat/MIRA_result_3.svg" width="800"/>

---

<style>
#acknowledgements {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="acknowledgements" data-background="/images/presentations/bg.svg.png">

# Thank You

<br clear="all"/>

<span class="small bullet"><img src="/images/external/github_bug_black.svg" height="20" class="bullet"><a href="http://github.com/nsheff">nsheff</a></span> ·
<span class="small bullet"><img src="/images/icons/web.svg" height="25" class="bullet"><a href="http://databio.org">databio.org</a></span> ·
<span class="small bullet"><img src="/images/icons/letter.svg" height="25" class="bullet"><a href="mailto:nsheffield@virginia.edu">nsheffield@virginia.edu</a></span>

<div class="bullet" style="background-color:rgb(45,45,45,.65); border-radius: 25px; opacity:0.9">
<img src="/images/external/uva_dgs_logo.svg" height="65">
<img src="/images/logo/logo_databio_long.svg" height="45">
</div>

</section>