<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Tools for epigenome analysis of genomic regions and data-intensive project management

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

## Overview

<div class="col3">
<span class="bullet"><a href="#LOLA"><img src="/images/icons/link_white.svg"></a>LOLA</span>
<br>Locus Overlap Analysis
</div>
<div class="col3">
<span class="bullet"><a href="#MIRA"><img src="/images/icons/link_white.svg"></a>MIRA</span>
<br>Methylation-based Inference of Regulatory Activity
</div>
<div class="col3">
<span class="bullet"><a href="#PEP"><img src="/images/icons/link_white.svg"></a>PEP</span>
<br>Portable Encapsulated Projects
</div>

---

<img src="/_modules/lola-intro/LOLA-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px">
<br>

### Locus Overlap Analysis

<div class="small">
<a href="http://code.databio.org/LOLA/">http://code.databio.org/LOLA/</a><br>
</div>

<span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Sheffield and Bock (2016). <i>Bioinformatics</i>.</span><br/>
<span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Nagraj, Magee, and Sheffield (2018).
<i>Nucleic Acids Research</i>.</span>

---

<img src="/shorts/lola/LOLA-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px">
<br>

<div class="small">
<a href="http://code.databio.org/LOLA/">http://code.databio.org/LOLA/</a><br>
</div>

<span style="font-size: 0.8em;"><img src="/shorts/lola/paper.svg" height="25" style="vertical-align: text-bottom; margin-right: 5px;">Sheffield and Bock (2016). *Bioinformatics*.</span><br/>
<span style="font-size: 0.8em;"><img src="/shorts/lola/paper.svg" height="25" style="vertical-align: text-bottom; margin-right: 5px;">Nagraj, Magee, and Sheffield (2018). *Nucleic Acids Research*.</span>

---

## LOLAweb

<img src="/shorts/lola/LOLAweb-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px">

A shiny app and server for interactive LOLA analysis.

Public server: [http://lolaweb.databio.org](http://lolaweb.databio.org)

GitHub: [https://github.com/databio/LOLAweb](https://github.com/databio/LOLAweb)

---

### DEMO

<video controls width="800">
<source src="lw.webm" type="video/webm">
Your browser does not support the video tag.
</video>

---

<img src="/_modules/lola-web/LOLAweb-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px">
<br>

<div class="small">
A shiny app and server for interactive LOLA analysis.<br>
Public server: <a href="http://lolaweb.databio.org">http://lolaweb.databio.org</a><br>
GitHub: <a href="https://github.com/databio/LOLAweb">https://github.com/databio/LOLAweb</a>
</div>

---

### DEMO

<video controls>
<source src="lw.webm" type="video/webm">
Your browser does not support the video tag.
</video>

---

## Methylation-based Inference of Regulatory Activity (MIRA)

<div class="small">
<a href="http://code.databio.org/MIRA/">http://code.databio.org/MIRA/</a><br>
</div>

<span class="small bullet"><img src="/_modules/mira/icons/paper.svg" height="25" class="bullet">Lawson et al. (2018).
<i>Bioinformatics</i>.</span>

---

### DNA methylation

<img src="/_modules/bis-seq-intro/dnameth_intro.svg" />

---

### DNA methylation

<img src="/_modules/bis-seq-intro/dnameth_intro2.svg" />

---

### <span class="bullet"><img src="/_modules/bis-seq-intro/bolt.svg" width="50" class="bullet">Bisulfite-seq</span>

<img src="/_modules/bis-seq-intro/dnameth_bisulfite.svg" />

---

# <span class="bullet"><img src="/_modules/region-pooling/merge.svg" width="50" class="bullet">Region pooling</span>

---

<!-- .slide: data-transition="fade-in" -->

---

<!-- .slide: data-transition="fade-in fade-out" -->

---

<!-- .slide: data-transition="fade-in fade-out" -->

---

<!-- .slide: class="center" -->
<img src="/_modules/project-cover/{{project_logo}}" width="275" style="padding-top:25px; padding-bottom:25px">
<br>

<div class="small">
<a href="{{project_url}}">{{project_url}}</a><br>
</div>

<span class="small bullet"><img src="/_modules/project-cover/icons/paper.svg" height="25" class="bullet" style="vertical-align: text-bottom; margin-right: 5px;">{{project_citations}}</span>

---

## MIRA concept

<img src="/_modules/mira/mira.svg" />

---

## MIRA workflow

<img src="/_modules/mira/mira2.svg" />

---

## MIRA analysis

<img src="/_modules/mira/mira3.svg" />

---

## MIRA results: Differential activity

<img src="/_modules/mira/ews_pat/MIRA_result_1.svg" width="700"/>

<span class="small bullet"><img src="/_modules/mira/icons/paper.svg" height="25" class="bullet">Sheffield et al. (2017).
<i>Nature Medicine</i>.</span>

---

## MIRA results: Activity scores

<img src="/_modules/mira/ews_pat/MIRA_result_2.svg" />

---

## MIRA results: Enrichment analysis

<img src="/_modules/mira/ews_pat/MIRA_result_3.svg" width="800"/>

---

## Organizing large-scale biological data around standardized projects

---

<!-- .slide: class="center" -->
<img src="/_modules/project-cover/{{project_logo}}" width="275" style="padding-top:25px; padding-bottom:25px">
<br>

<div class="small">
<a href="{{project_url}}">{{project_url}}</a><br>
</div>

<span class="small bullet"><img src="/_modules/project-cover/icons/paper.svg" height="25" class="bullet" style="vertical-align: text-bottom; margin-right: 5px;">{{project_citations}}</span>

---

## Data is becoming more...

<div style="display: flex; justify-content: space-between;">
<div style="width: 30%; text-align: center;">
abundant
<img src="/_modules/pep/sequencing_costs_2015.jpg" width="100%">
</div>
<div style="width: 30%; text-align: center;">
available
<img src="/_modules/pep/cos_black.jpg" width="100%">
</div>
<div style="width: 30%; text-align: center;">
powerful
<img src="/_modules/pep/gwasw.jpg" width="100%">
</div>
</div>

---

## So why are the world's problems not solved?

<div class="fragment">
<img src="/_modules/pep/data_not_info.svg" width="500">
</div>

---

<img src="/_modules/pep/data_analysis_info.svg" width="700">

---

## First step in bioinformatics analysis

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%; text-align: center;">
<img src="/_modules/pep/icons/pipeline.svg" width="200">
pipeline
</div>
<div class="fragment" style="width: 45%; text-align: center;">
<img src="/_modules/pep/search_trends.svg" width="100%">
Papers with "bioinformatics pipeline" in title
</div>
</div>

---

## Problem solved?

<img src="/_modules/pep/data_pipeline_info.svg" width="700">

---

## Problem solved?

<img src="/_modules/pep/data_pipeline_info_highlight.svg" width="700">

---

## Problem solved?
<img src="/_modules/pep/data_interface.svg" width="700">

---

## Data munging

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<img src="/_modules/pep/data_input-new_white.svg" width="100%">
</div>
<div class="fragment" style="width: 45%;">
<img src="/_modules/pep/data_input-rev_white.svg" width="100%">
</div>
</div>

---

## Then, downstream tools need a different organization

<img src="/_modules/pep/data_input_steps.svg" width="525">

---

## What if?

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<img src="/_modules/pep/data_input2.svg" width="100%">
</div>
<div style="width: 45%;">
<img src="/_modules/pep/data_input2_rev.svg" width="100%">
</div>
</div>

---

## The solution

<img src="/_modules/pep/data_input_plug.svg" width="625">

---

## The solution

<img src="/_modules/pep/data_input_steps_plug.svg" width="475">

---

## PEP: A standard format for project metadata

<img src="/_modules/pep/pep_center_white.svg" width="700">

---

## PEP Ecosystem

<img src="/_modules/pep/pep_ecosystem.svg" width="750">

---

## <img src="/_modules/pep/pep_logo.svg" width="70" style="vertical-align: middle;"> PEP format

<span class="bullet"><img src="/_modules/pep/icons/file.svg" width="30" class="bullet"> project_config.yaml</span>

```yaml
metadata:
  sample_annotation: /path/to/samples.csv
  output_dir: /path/to/output/folder
```

---

<span class="bullet"><img src="/_modules/pep/icons/file.svg" width="30" class="bullet"> samples.csv</span>

```csv
sample_name, protocol, organism, data_source
frog_0h, RNA-seq, frog, /path/to/frog0.gz
frog_1h, RNA-seq, frog, /path/to/frog1.gz
frog_2h, RNA-seq, frog, /path/to/frog2.gz
frog_3h, RNA-seq, frog, /path/to/frog3.gz
```

---

## PEP: Portable Encapsulated Projects

<img src="/_modules/pep-format/pep_center_white.svg" width="700">

---

<div class="bullet">
<h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2>
</div>

Start with a simple CSV with tabular data.
<hr>

<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">samples.csv
</div>

```
sample_name,protocol,organism,input_file
frog_0h,RNA-seq,frog,/path/to/frog0.gz
frog_1h,RNA-seq,frog,/path/to/frog1.gz
frog_2h,RNA-seq,frog,/path/to/frog2.gz
frog_3h,RNA-seq,frog,/path/to/frog3.gz
```

---

<div class="bullet">
<h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2>
</div>

Add a YAML for project-level data.

<hr>

<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">samples.csv
</div>

```
sample_name,protocol,organism,input_file
frog_0h,RNA-seq,frog,/path/to/frog0.gz
frog_1h,RNA-seq,frog,/path/to/frog1.gz
frog_2h,RNA-seq,frog,/path/to/frog2.gz
frog_3h,RNA-seq,frog,/path/to/frog3.gz
```

<hr>

<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">project_config.yaml
</div>

```yaml
sample_table: /path/to/samples.csv
output_dir: /path/to/output/folder
other_variable: value
```

---

### Add programmatic sample and project modifiers.
<div style="text-align: left">
<span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br>
<span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br>
<span class="bullet"><img src="/_modules/pep-format/subproject_white.svg" width="50" class="bullet">Subprojects</span><br>
</div>

---

<span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br>

<div class="well">Automatically build new sample attributes from existing attributes.</div>

Without derived attribute:

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | /path/to/frog0.gz |
| frog_1h | 1 | RNA-seq | frog | /path/to/frog1.gz |
| frog_2h | 2 | RNA-seq | frog | /path/to/frog2.gz |
| frog_3h | 3 | RNA-seq | frog | /path/to/frog3.gz |

Using derived attribute:

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | my_samples |
| frog_1h | 1 | RNA-seq | frog | my_samples |
| frog_2h | 2 | RNA-seq | frog | my_samples |
| frog_3h | 3 | RNA-seq | frog | my_samples |
| crab_0h | 0 | RNA-seq | crab | your_samples |
| crab_3h | 3 | RNA-seq | crab | your_samples |

---

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | my_samples |
| frog_1h | 1 | RNA-seq | frog | my_samples |
| frog_2h | 2 | RNA-seq | frog | my_samples |
| frog_3h | 3 | RNA-seq | frog | my_samples |
| crab_0h | 0 | RNA-seq | crab | your_samples |
| crab_3h | 3 | RNA-seq | crab | your_samples |

Project config file:

```yaml
sample_modifiers:
  derive:
    attributes: [input_file]
    sources:
      my_samples: "/path/to/my/samples/{organism}_{t}h.gz"
      your_samples: "/path/to/your/samples/{organism}_{t}h.gz"
```
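
As a rough illustration of the config above, the following Python sketch expands a derived attribute with plain `str.format`. It is not peppy's implementation: `derive_attribute` and the `SOURCES` dictionary are hypothetical names, and samples are assumed to be simple dictionaries keyed by column name.

```python
# Hypothetical sketch of derived-attribute expansion; not peppy's actual code.
SOURCES = {
    "my_samples": "/path/to/my/samples/{organism}_{t}h.gz",
    "your_samples": "/path/to/your/samples/{organism}_{t}h.gz",
}


def derive_attribute(sample, attribute, sources):
    """Return a copy of `sample` with `attribute` expanded from its source template."""
    derived = dict(sample)
    template = sources.get(sample[attribute])
    if template is not None:  # values that name no known source are left untouched
        # Each {column} placeholder is filled from the sample's own attributes.
        derived[attribute] = template.format(**sample)
    return derived


frog = {"sample_name": "frog_1h", "t": 1, "protocol": "RNA-seq",
        "organism": "frog", "input_file": "my_samples"}
crab = {"sample_name": "crab_3h", "t": 3, "protocol": "RNA-seq",
        "organism": "crab", "input_file": "your_samples"}

print(derive_attribute(frog, "input_file", SOURCES)["input_file"])
# /path/to/my/samples/frog_1h.gz
print(derive_attribute(crab, "input_file", SOURCES)["input_file"])
# /path/to/your/samples/crab_3h.gz
```
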
Each `{variable}` in a source template is replaced by the sample's value in the annotation column of that name.

<div class="well">Benefit: Enables distributed files, portability</div>

---

<span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br>

<div class="well">Add new sample attributes conditioned on values of existing attributes</div>

<div class="col2">
Before:<br>

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

</div>
<div class="col2">
After:<br>

| sample_name | protocol | organism | genome |
|-------------|:--------:|----------|--------|
| human_1 | RNA-seq | human | hg38 |
| human_2 | RNA-seq | human | hg38 |
| human_3 | RNA-seq | human | hg38 |
| mouse_1 | RNA-seq | mouse | mm10 |

</div>

---

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

Project config file:

```yaml
sample_modifiers:
  imply:
    - if:
        organism: human
      then:
        genome: hg38
    - if:
        organism: mouse
      then:
        genome: mm10
```

<div class="well">Benefit: Separates project-level metadata from sample metadata</div>

---

<span class="bullet"><img src="/_modules/pep-format/subproject.svg" width="50" class="bullet">Subprojects</span><br>

<div class="well">Define activatable project attributes.</div>

```yaml
project_modifiers:
  amendments:
    diverse:
      metadata:
        sample_annotation: psa_rrbs_diverse.csv
    cancer:
      metadata:
        sample_annotation: psa_rrbs_intracancer.csv
```

<div class="well">Benefit: Defines multiple similar projects in a single file</div>

---

<div>
<img src="/_modules/geofetch/geofetch_logo.svg" width="275"><br>
Connects the Gene Expression Omnibus (GEO) <br>
and Sequence Read Archive (SRA) <br>
with PEP format<br>
</div>

<br><span class="small bullet"><img src="/_modules/geofetch/web.svg" height="25" class="bullet"><a
href="https://geofetch.databio.org">geofetch.databio.org</a></span>

---

## <img src="/_modules/looper/logo_looper.svg" width="150" style="vertical-align: middle;"> Looper

Deploys pipelines across samples by connecting samples to any command-line tool

<div class="small">
<a href="https://looper.databio.org">https://looper.databio.org</a>
</div>

---

<img src="/_modules/looper/looper_role_white_v2.svg" width="100%">

---

## pipeline_interface.yaml

```yaml
protocol_mappings:
  RNA-seq: rna-seq

pipelines:
  rna-seq:
    name: RNA-seq_pipeline
    path: path/to/rna-seq.py
    arguments:
      "--option1": sample_attribute
      "--option2": sample_attribute2
```

- maps protocols to pipelines <!-- .element: class="fragment" -->
- maps sample attributes (columns) to pipeline arguments <!-- .element: class="fragment" -->

---

## Looper features

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%; text-align: left;">
<span class="bullet"><img src="/_modules/looper/icons/input-mouse.svg" width="50" class="bullet"> Single-input runs</span><br>
<span class="bullet"><img src="/_modules/looper/icons/flexible.svg" width="50" class="bullet"> Flexible pipelines</span><br>
<span class="bullet"><img src="/_modules/looper/icons/piechart.svg" width="50" class="bullet"> Flexible resources</span><br>
</div>
<div style="width: 45%; text-align: left;">
<span class="bullet"><img src="/_modules/looper/icons/computer.svg" width="50" class="bullet"> Flexible compute</span><br>
<span class="bullet"><img src="/_modules/looper/icons/flag_checker.svg" width="50" class="bullet"> Job status-aware</span><br>
</div>
</div>

---

## <span class="bullet"><img src="/_modules/looper/icons/input-mouse.svg" width="50" class="bullet"> Single-input runs</span>

Run your entire project with one line:

```bash
looper run project_config.yaml
```

---

## <span class="bullet"><img src="/_modules/looper/icons/flexible.svg" width="50" class="bullet"> Flexible pipelines</span>

```yaml
protocol_mappings:
  RRBS: rrbs
  WGBS: wgbs
  EG: wgbs.py
  SMART-seq: rnaBitSeq -f; rnaTopHat -f
  ATAC-SEQ: atacseq
  DNase-seq: atacseq
  CHIP-SEQ: chipseq
```

Many-to-many mappings

---

## <span class="bullet"><img src="/_modules/looper/icons/piechart.svg" width="50" class="bullet"> Flexible resources</span>

```yaml
pipeline_key:
  name: pipeline_name
  arguments:
    "--option" : value
  resources:
    default:
      file_size: "0"
      cores: "2"
      mem: "6000"
      time: "01:00:00"
    large_input:
      file_size: "2000"
      cores: "4"
      mem: "12000"
      time: "08:00:00"
```

Resources can vary by input file size

---

## <span class="bullet"><img src="/_modules/looper/icons/computer.svg" width="50" class="bullet"> Flexible compute</span>

```yaml
compute:
  slurm:
    submission_template: templates/slurm_template.sub
    submission_command: sbatch
  localhost:
    submission_template: templates/localhost_template.sub
    submission_command: sh
```

---

Adjust compute package on-the-fly:

```bash
looper run project_config.yaml --compute localhost
```

---

## <span class="bullet"><img src="/_modules/looper/icons/flag_checker.svg" width="50" class="bullet"> Job status-aware</span>

Looper only submits jobs for samples not already flagged as running, completed, or failed.
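
The idea of skipping flagged samples can be sketched in a few lines of Python. This is only an illustration under an assumed layout, where each sample has its own output folder containing a file named `<status>.flag`; the function name `samples_to_submit` and the flag file naming are hypothetical and may differ from what looper actually does.

```python
import tempfile
from pathlib import Path

# Statuses that should prevent resubmission (taken from the slide text).
FLAGS = ("running", "completed", "failed")


def samples_to_submit(sample_names, results_dir):
    """Return the samples that have no status flag file in their output folder."""
    pending = []
    for name in sample_names:
        sample_dir = Path(results_dir) / name
        # Assumed layout: <results_dir>/<sample_name>/<status>.flag
        flagged = any((sample_dir / f"{flag}.flag").exists() for flag in FLAGS)
        if not flagged:
            pending.append(name)
    return pending


# Usage with a throwaway directory: frog_0h is already completed, frog_1h is not.
with tempfile.TemporaryDirectory() as results_dir:
    done = Path(results_dir) / "frog_0h"
    done.mkdir()
    (done / "completed.flag").touch()
    print(samples_to_submit(["frog_0h", "frog_1h"], results_dir))  # ['frog_1h']
```
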
```bash
looper check project_config.yaml
```

```bash
looper summarize project_config.yaml
```

---

## PEP Ecosystem

<img src="/_modules/pep-details/pep_ecosystem.svg" width="750">

---

## peppy package <img src="/_modules/pep-details/logo/logo_python.svg" height="50" style="vertical-align: middle;">

```python
import peppy

prj = peppy.Project("pep_config.yaml")
samples = prj.samples  # list of Sample objects
for sample in samples:
    print(sample.sample_name)
    # do further analysis on each sample
```

<span class="bullet"><img src="/_modules/pep-details/icons/ftnetwork-connected.svg" width="50" class="bullet">Project API</span>

---

## pepr package <img src="/_modules/pep-details/logo/logo_R.svg" height="50" style="vertical-align: middle;">

```r
library("pepr")

prj = pepr::Project("pep_config.yaml")
samples = pepr::pepSamples(prj)
for (sample in samples) {
  message(pepr::sampleName(sample))
  # do further analysis on each sample
}
```

---

## Conclusion

- PEP format is a novel approach to standardizing *projects*.
- Initial tools like `geofetch` and `looper` build PEP projects and connect them to pipelines.
- Python and R packages provide a universal interface to PEP metadata for tools and analysis.

More information at [pepkit.github.io](http://pepkit.github.io/).


---

<style>
#acknowledgements {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="acknowledgements" data-background="/images/presentations/bg.svg.png">

# Thank You

<br clear="all"/>

<span class="small bullet"><img src="/images/external/github_bug_black.svg" height="20" class="bullet"><a href="http://github.com/nsheff">nsheff</a></span> ·
<span class="small bullet"><img src="/images/icons/web.svg" height="25" class="bullet"><a href="http://databio.org">databio.org</a></span> ·
<span class="small bullet"><img src="/images/icons/letter.svg" height="25" class="bullet"><a href="mailto:nsheffield@virginia.edu">nsheffield@virginia.edu</a></span>

<div class="bullet" style="background-color:rgb(45,45,45,.65); border-radius: 25px; opacity:0.9">
<img src="/images/external/uva_dgs_logo.svg" height="65">
<img src="/images/logo/logo_databio_long.svg" height="45">
</div>

</section>