<style>
.reveal section img { margin:0px; }
</style>

<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# A modular project computing ecosystem and application to ATAC-seq data processing

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

<!-- .slide: data-background="/images/presentations/bg.svg.png" data-transition-speed="slow" -->

### Outline

<style>
.previewblock {
  float: left;
  width: 20px;
  height: 45px;
  margin: 0;
  border: none;
  white-space: nowrap;
  box-sizing: border-box;
}
.questionblock {
  float: left;
  width: 100%;
  margin: 5px 0;
  border: 1px solid rgba(255, 255, 255, .2);
}
</style>

{{outlineContent}}

<div class="questionblock" style="background:#222; color:#eee; font-size: 0.6em; margin-top: 35px">◁ Questions ▷</div>

---

#### Most pipelines require individual metadata organization

<div class="col2">
<img src="/_modules/motivation-modular-pipelines/pep/data_input-new_white.svg" width="375">
</div>
<div class="col2 fragment">
<img src="/_modules/motivation-modular-pipelines/pep/data_input-rev_white.svg" width="375">
</div>

---

### What if?

<div class="col2">
<img src="/_modules/motivation-modular-pipelines/pep/data_input2.svg" width="325">
</div>
<div class="col2">
<img src="/_modules/motivation-modular-pipelines/pep/data_input2_rev.svg" width="325">
</div>

<div class="fragment">
Why is this hard to do?
<br>Because of <i>microwave syndrome</i>....
</div>

---

#### Microwave syndrome

<div>
<img src="/_modules/motivation-modular-pipelines/IFB_17PM-MEC1.png" height="180">
<img src="/_modules/motivation-modular-pipelines/LG_LMV2031SB.png" height="180">
<img src="/_modules/motivation-modular-pipelines/panasonic_NN-CT585SBPQ.png" height="180">
</div>

<div class="well fragment">In user interface design, prioritizing easy access to integrated functions over their individual components. </div>

---

<section style="font-size: 0;">
<img src="/_modules/motivation-modular-pipelines/IFB_17PM-MEC1.png" height="120">
<img src="/_modules/motivation-modular-pipelines/LG_LMV2031SB.png" height="120">
<img src="/_modules/motivation-modular-pipelines/panasonic_NN-CT585SBPQ.png" height="120"><br>
<img src="/_modules/motivation-modular-pipelines/IFB_17PM-MEC1_console.jpg" height="560">
<img src="/_modules/motivation-modular-pipelines/LG_LMV2031SB_console.jpg" height="560" class="fragment">
<img src="/_modules/motivation-modular-pipelines/panasonic_NN-CT585SBPQ_console.jpg" height="560" class="fragment">
</section>

---

<section transition="fade-in">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_chunk.svg" height="650">
</section>

---

<section transition="fade-in">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_chunk2.svg" height="650">
</section>

---

### The UNIX philosophy

<div class="col2">
<img src="/_modules/motivation-modular-pipelines/unix_book.jpg" height="450">
</div>
<div class="col2">
<div class="well"><span style="color:#ffb; font-weight:bold">[T]he power of a system comes more from the relationships among programs than from the programs themselves.</span><br><br>
<span style="font-size: 0.8em">Many UNIX programs do quite trivial tasks in isolation, but, combined with other programs, become general and useful tools.</span><br/><br/>
<span class="small">- Kernighan and Pike, The UNIX Programming Environment (1983, p.
viii)</span>
</div>
</div>

---

<section transition="fade-in">
<img src="/_modules/motivation-modular-pipelines/pipelines/modularity_spectrum.svg" height="650">
</section>

---

<section transition="fade-in">
<img src="/_modules/motivation-modular-pipelines/pipelines/modularity_spectrum2.svg" height="650">
</section>

---

<section transition="fade-in" id="links1">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_links1.svg" height="650">
</section>

---

<section transition="fade-in" id="links2">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_links2.svg" height="650">
</section>

---

<section transition="fade-in" id="links3">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_links3.svg" height="650">
</section>

---

<section transition="fade-in" id="links4">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_links4.svg" height="650">
</section>

---

<section transition="fade-in" id="links5">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_links5.svg" height="650">
</section>

---

<section transition="fade-in" id="links6">
<img src="/_modules/motivation-modular-pipelines/pipelines/pipeline_links6.svg" height="650">
</section>

---

<div class="col2">

### Problem

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/data_input-new_white.svg" width="375">

</div>
<div class="col2">

### Solution <!-- .element: class="fragment" -->

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/data_input_plug.svg" width="375"> <!-- .element: class="fragment" -->

</div>

---

<div class="col2">

### Problem

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/data_input-new_white.svg" width="375">

</div>
<div class="col2">

### Solution <!-- .element: class="fragment" -->

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/data_input_plug.svg" width="375"> <!-- .element: class="fragment" -->

</div>

---

<img
src="/slides/a-modular-project-computing-ecosystem-atacseq/pep_looper_pepatac.svg" height="650">

---

## PEP: Portable Encapsulated Projects

<img src="/_modules/pep-format/pep_center_white.svg" width="700">

---

<div class="bullet">
<h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2>
</div>

Start with a simple CSV with tabular data.

<hr>
<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">samples.csv
</div>

```
sample_name,protocol,organism,input_file
frog_0h,RNA-seq,frog,/path/to/frog0.gz
frog_1h,RNA-seq,frog,/path/to/frog1.gz
frog_2h,RNA-seq,frog,/path/to/frog2.gz
frog_3h,RNA-seq,frog,/path/to/frog3.gz
```

---

<div class="bullet">
<h2><img src="/_modules/pep-format/pep_logo.svg" width="70">PEP format</h2>
</div>

Add a YAML for project-level data.

<hr>
<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">samples.csv
</div>

```
sample_name,protocol,organism,input_file
frog_0h,RNA-seq,frog,/path/to/frog0.gz
frog_1h,RNA-seq,frog,/path/to/frog1.gz
frog_2h,RNA-seq,frog,/path/to/frog2.gz
frog_3h,RNA-seq,frog,/path/to/frog3.gz
```

<hr>
<div class="bullet">
<img src="/_modules/pep-format/file.svg" width="30">project_config.yaml
</div>

```yaml
sample_table: /path/to/samples.csv
output_dir: /path/to/output/folder
other_variable: value
```

---

### Add programmatic sample and project modifiers.
<div style="text-align: left">
<span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br>
<span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br>
<span class="bullet"><img src="/_modules/pep-format/subproject_white.svg" width="50" class="bullet">Subprojects</span><br>
</div>

---

<span class="bullet"><img src="/_modules/pep-format/replace_white.svg" width="50" class="bullet">Derived attributes</span><br>

<div class="well">Automatically build new sample attributes from existing attributes.</div>

Without derived attribute:

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | /path/to/frog0.gz |
| frog_1h | 1 | RNA-seq | frog | /path/to/frog1.gz |
| frog_2h | 2 | RNA-seq | frog | /path/to/frog2.gz |
| frog_3h | 3 | RNA-seq | frog | /path/to/frog3.gz |

Using derived attribute:

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | my_samples |
| frog_1h | 1 | RNA-seq | frog | my_samples |
| frog_2h | 2 | RNA-seq | frog | my_samples |
| frog_3h | 3 | RNA-seq | frog | my_samples |
| crab_0h | 0 | RNA-seq | crab | your_samples |
| crab_3h | 3 | RNA-seq | crab | your_samples |

---

| sample_name | t | protocol | organism | input_file |
|-------------|---|:--------:|----------|------------|
| frog_0h | 0 | RNA-seq | frog | my_samples |
| frog_1h | 1 | RNA-seq | frog | my_samples |
| frog_2h | 2 | RNA-seq | frog | my_samples |
| frog_3h | 3 | RNA-seq | frog | my_samples |
| crab_0h | 0 | RNA-seq | crab | your_samples |
| crab_3h | 3 | RNA-seq | crab | your_samples |

Project config file:

```yaml
sample_modifiers:
  derive:
    attributes: [input_file]
    sources:
      my_samples: "/path/to/my/samples/{organism}_{t}h.gz"
      your_samples: "/path/to/your/samples/{organism}_{t}h.gz"
```
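
As a rough illustration of what the `derive` modifier in the config above does, the sketch below expands the two source templates with plain Python string formatting. It is not the PEP tooling's implementation: the `derive_attributes` helper and the in-memory sample dictionaries are hypothetical, and only the template strings come from the slide.

```python
# Hypothetical sketch of derived-attribute expansion (not the real PEP code).
# The templates are copied from the project config on this slide.
SOURCES = {
    "my_samples": "/path/to/my/samples/{organism}_{t}h.gz",
    "your_samples": "/path/to/your/samples/{organism}_{t}h.gz",
}


def derive_attributes(sample, attributes, sources):
    """Return a copy of `sample` with each derived attribute expanded."""
    derived = dict(sample)
    for attr in attributes:
        key = sample[attr]
        if key in sources:  # values that are not source keys stay untouched
            # {organism} and {t} are filled from the sample's own columns
            derived[attr] = sources[key].format(**sample)
    return derived


samples = [
    {"sample_name": "frog_0h", "t": 0, "organism": "frog", "input_file": "my_samples"},
    {"sample_name": "crab_3h", "t": 3, "organism": "crab", "input_file": "your_samples"},
]

for sample in samples:
    print(derive_attributes(sample, ["input_file"], SOURCES)["input_file"])
# /path/to/my/samples/frog_0h.gz
# /path/to/your/samples/crab_3h.gz
```

The sample table stays free of absolute paths, so moving the data only means editing the two templates.
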
{variable} identifies sample annotation columns

<div class="well">Benefit: Enables distributed files, portability</div>

---

<span class="bullet"><img src="/_modules/pep-format/implies_white.svg" width="50" class="bullet">Implied attributes</span><br>

<div class="well">Add new sample attributes conditioned on values of existing attributes</div>

<div class="col2">
Before:<br>

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

</div>
<div class="col2">
After:<br>

| sample_name | protocol | organism | genome |
|-------------|:--------:|----------|--------|
| human_1 | RNA-seq | human | hg38 |
| human_2 | RNA-seq | human | hg38 |
| human_3 | RNA-seq | human | hg38 |
| mouse_1 | RNA-seq | mouse | mm10 |

</div>

---

| sample_name | protocol | organism |
|-------------|:--------:|----------|
| human_1 | RNA-seq | human |
| human_2 | RNA-seq | human |
| human_3 | RNA-seq | human |
| mouse_1 | RNA-seq | mouse |

Project config file:

```yaml
sample_modifiers:
  imply:
    - if:
        organism: human
      then:
        genome: hg38
    - if:
        organism: mouse
      then:
        genome: mm10
```

<div class="well">Benefit: Divides project from sample metadata</div>

---

<span class="bullet"><img src="/_modules/pep-format/subproject.svg" width="50" class="bullet">Subprojects</span><br>

<div class="well">Define activatable project attributes.</div>

```yaml
project_modifiers:
  amendments:
    diverse:
      metadata:
        sample_annotation: psa_rrbs_diverse.csv
    cancer:
      metadata:
        sample_annotation: psa_rrbs_intracancer.csv
```

<div class="well">Benefit: Defines multiple similar projects in a single file</div>

---

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/pep_looper_pepatac.svg" height="650">

---

## <img src="/_modules/looper/logo_looper.svg" width="150" style="vertical-align: middle;"> Looper

Deploys pipelines across samples by connecting samples to any command-line tool

<div
class="small">
<a href="https://looper.databio.org">https://looper.databio.org</a>
</div>

---

<img src="/_modules/looper/looper_role_white_v2.svg" width="100%">

---

## pipeline_interface.yaml

```yaml
protocol_mappings:
  RNA-seq: rna-seq

pipelines:
  rna-seq:
    name: RNA-seq_pipeline
    path: path/to/rna-seq.py
    arguments:
      "--option1": sample_attribute
      "--option2": sample_attribute2
```

- maps protocols to pipelines <!-- .element: class="fragment" -->
- maps sample attributes (columns) to pipeline arguments <!-- .element: class="fragment" -->

---

## Looper features

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%; text-align: left;">
<span class="bullet"><img src="/_modules/looper/icons/input-mouse.svg" width="50" class="bullet"> Single-input runs</span><br>
<span class="bullet"><img src="/_modules/looper/icons/flexible.svg" width="50" class="bullet"> Flexible pipelines</span><br>
<span class="bullet"><img src="/_modules/looper/icons/piechart.svg" width="50" class="bullet"> Flexible resources</span><br>
</div>
<div style="width: 45%; text-align: left;">
<span class="bullet"><img src="/_modules/looper/icons/computer.svg" width="50" class="bullet"> Flexible compute</span><br>
<span class="bullet"><img src="/_modules/looper/icons/flag_checker.svg" width="50" class="bullet"> Job status-aware</span><br>
</div>
</div>

---

## <span class="bullet"><img src="/_modules/looper/icons/input-mouse.svg" width="50" class="bullet"> Single-input runs</span>

Run your entire project with one line:

```bash
looper run project_config.yaml
```

---

## <span class="bullet"><img src="/_modules/looper/icons/flexible.svg" width="50" class="bullet"> Flexible pipelines</span>

```yaml
protocol_mappings:
  RRBS: rrbs
  WGBS: wgbs
  EG: wgbs.py
  SMART-seq: rnaBitSeq -f; rnaTopHat -f
  ATAC-SEQ: atacseq
  DNase-seq: atacseq
  CHIP-SEQ: chipseq
```

Many-to-many mappings

---

## <span class="bullet"><img src="/_modules/looper/icons/piechart.svg" width="50" class="bullet"> Flexible resources</span>

```yaml
pipeline_key:
  name: pipeline_name
  arguments:
    "--option" : value
  resources:
    default:
      file_size: "0"
      cores: "2"
      mem: "6000"
      time: "01:00:00"
    large_input:
      file_size: "2000"
      cores: "4"
      mem: "12000"
      time: "08:00:00"
```

Resources can vary by input file size

---

## <span class="bullet"><img src="/_modules/looper/icons/computer.svg" width="50" class="bullet"> Flexible compute</span>

```yaml
compute:
  slurm:
    submission_template: templates/slurm_template.sub
    submission_command: sbatch
  localhost:
    submission_template: templates/localhost_template.sub
    submission_command: sh
```

---

Adjust compute package on-the-fly:

```bash
looper run project_config.yaml --compute localhost
```

---

## <span class="bullet"><img src="/_modules/looper/icons/flag_checker.svg" width="50" class="bullet"> Job status-aware</span>

Looper only submits jobs for samples not already flagged as running, completed, or failed.

```bash
looper check project_config.yaml
```

```bash
looper summarize project_config.yaml
```

---

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/pep_looper_pepatac.svg" height="650">

---

## <img src="/_modules/pepatac/logo_pepatac.svg" width="175" style="padding-top:25px; padding-bottom:25px; vertical-align: middle;"> PEPATAC

An optimized ATAC-seq pipeline with serial alignments

<div class="small">
<a href="http://pepatac.databio.org">http://pepatac.databio.org</a>
</div>

<span class="small bullet"><img src="/_modules/pepatac/icons/paper.svg" height="25" class="bullet">Smith et al. (2021, in press).
<i>NAR Genomics and Bioinformatics</i>.</span>

---

## PEPATAC workflow

<img src="/_modules/pepatac/pepatac_workflow_white.svg" width="600">

---

## PEPATAC strengths

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<span style="color:goldenrod">Modular system</span>
<br><br>
<span>Prealignments</span>
</div>
<div style="width: 45%;">
<span>Flexibility and portability</span>
<br><br>
<span>Outputs</span>
</div>
</div>

---

## Command-line interface with only 3 required arguments

```
$ /pipelines/pepatac.py -h
```

```
usage: pepatac.py [-h] [-R] [-N] [-D] [-F] [-C CONFIG_FILE]
                  [-O PARENT_OUTPUT_FOLDER] [-M MEMORY_LIMIT]
                  [-P NUMBER_OF_CORES] -S SAMPLE_NAME -I INPUT_FILES
                  [INPUT_FILES ...] [-I2 [INPUT_FILES2 [INPUT_FILES2 ...]]]
                  -G GENOME_ASSEMBLY [-Q SINGLE_OR_PAIRED] [-gs GENOME_SIZE]
                  [--frip-ref-peaks FRIP_REF_PEAKS] [--TSS-name TSS_NAME]
                  [--anno-name ANNO_NAME] [--keep]
                  [--peak-caller {fseq,macs2}]
                  [--trimmer {trimmomatic,skewer}]
                  [--prealignments PREALIGNMENTS [PREALIGNMENTS ...]] [-V]

PEPATAC version 0.7.3

optional arguments:
  -h, --help            show this help message and exit
  -R, --recover         Overwrite locks to recover from previous failed run
  -N, --new-start       Overwrite all results to start a fresh run
  -D, --dirty           Don't auto-delete intermediate files
  -F, --force-follow    Always run 'follow' commands
  -C CONFIG_FILE, --config CONFIG_FILE
                        Pipeline configuration file (YAML). Relative paths are
                        with respect to the pipeline script.
  -O PARENT_OUTPUT_FOLDER, --output-parent PARENT_OUTPUT_FOLDER
                        Parent output directory of project
  -M MEMORY_LIMIT, --mem MEMORY_LIMIT
                        Memory limit (in Mb) for processes accepting such
  -P NUMBER_OF_CORES, --cores NUMBER_OF_CORES
                        Number of cores for parallelized processes
  -I2 [INPUT_FILES2 [INPUT_FILES2 ...]], --input2 [INPUT_FILES2 [INPUT_FILES2 ...]]
                        Secondary input files, such as read2
  -Q SINGLE_OR_PAIRED, --single-or-paired SINGLE_OR_PAIRED
                        Single- or paired-end sequencing protocol
  -gs GENOME_SIZE, --genome-size GENOME_SIZE
                        genome size for MACS2
  --frip-ref-peaks FRIP_REF_PEAKS
                        Reference peak set for calculating FRiP
  --TSS-name TSS_NAME   Name of TSS annotation file
  --anno-name ANNO_NAME
                        Name of reference bed file for calculating FRiF
  --keep                Keep prealignment BAM files
  --peak-caller {fseq,macs2}
                        Name of peak caller
  --trimmer {trimmomatic,pyadapt,skewer}
                        Name of read trimming program
  --prealignments PREALIGNMENTS [PREALIGNMENTS ...]
                        Space-delimited list of reference genomes to align to
                        before primary alignment.
  -V, --version         show program's version number and exit

required named arguments:
  -S SAMPLE_NAME, --sample-name SAMPLE_NAME
                        Name for sample to run
  -I INPUT_FILES [INPUT_FILES ...], --input INPUT_FILES [INPUT_FILES ...]
                        One or more primary input files
  -G GENOME_ASSEMBLY, --genome GENOME_ASSEMBLY
                        Identifier for genome assembly
```

---

## Portable Encapsulated Projects (PEP) provide interoperability

<img src="/_modules/pepatac/pepatac_modularity_1.svg" width="600">

---

## Portable Encapsulated Projects (PEP) provide interoperability

<img src="/_modules/pepatac/pepatac_modularity_2.svg" width="600">

---

## PEP specification for sample metadata

1. Configuration file: `config.yaml`

```yaml
pep_version: 2.0.0
sample_table: "path/to/sample_table.csv"
```

2. Sample annotation table: `sample_table.csv`

```csv
"sample_name", "protocol", "file"
"frog_1", "ATAC-seq", "frog1.fq.gz"
"frog_2", "ATAC-seq", "frog2.fq.gz"
"frog_3", "ATAC-seq", "frog3.fq.gz"
"frog_4", "ATAC-seq", "frog4.fq.gz"
```

<a href="http://pep.databio.org">pep.databio.org</a>

---

## MapReduce or Scatter/Gather

1. Map/Scatter PEPATAC across individual samples

```bash
looper run config.yaml
```

2. Gather results and do cross-sample analysis

```bash
looper runp config.yaml
```

---

## PEPATAC strengths

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<span>Modular system</span>
<br><br>
<span style="color:goldenrod">Prealignments</span>
</div>
<div style="width: 45%;">
<span>Flexibility and portability</span>
<br><br>
<span>Outputs</span>
</div>
</div>

<div class="fragment">
Nuclear-mitochondrial DNA (NuMts) confuse aligners
</div>

---

## Nuclear-mitochondrial DNA (NuMts)

<img src="/_modules/pepatac/numts.svg" width="800">
<img src="/_modules/pepatac/numts_simultaneous_alignment.svg" width="800" class="fragment">
<img src="/_modules/pepatac/numts_alignment.svg" width="800" class="fragment">

---

## NuMts alignment problems

<img src="/_modules/pepatac/numts.svg" width="800">
<img src="/_modules/pepatac/numts_alignment_problems.svg" width="800">
<img src="/_modules/pepatac/numts_alignment.svg" width="800">

---

## NuMts with blacklist approach

<img src="/_modules/pepatac/numts.svg" width="800">
<img src="/_modules/pepatac/numts_alignment_problems.svg" width="800">
<img src="/_modules/pepatac/numts_alignment_blacklist.svg" width="800">

---

## Problems with region masking

<img src="/_modules/pepatac/numts_alignment_blacklist.svg" width="800">

- Inaccurate alignment statistics
- Requires pre-defined NuMt locations
- Wastes compute power

---

## Serial prealignments

<img src="/_modules/pepatac/prealignments.svg" width="800">

---

## Serial prealignments process

<img src="/_modules/pepatac/prealignments2.svg" width="800">

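
The control flow of serial prealignment can be sketched in a few lines of Python. This is a toy model under stated assumptions, not PEPATAC's implementation: `serial_align` and `toy_aligner` are hypothetical stand-ins for a real aligner, and the read names and hit table are made up. It shows only that reads aligning to an earlier reference are removed before the next alignment.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def serial_align(
    reads: Iterable[str],
    references: List[str],
    aligns: Callable[[str, str], bool],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Align to each reference in order; pass only unaligned reads onward."""
    remaining = list(reads)
    assigned: Dict[str, List[str]] = {}
    for ref in references:
        hits, misses = [], []
        for read in remaining:
            (hits if aligns(read, ref) else misses).append(read)
        assigned[ref] = hits
        remaining = misses  # the next reference never sees reads already placed
    return assigned, remaining


# Made-up hit table: which references each toy read is able to align to.
TOY_HITS = {
    "read_mt": {"chrM"},
    "read_ambiguous": {"chrM", "hg38"},  # also matches a nuclear NuMt copy
    "read_nuclear": {"hg38"},
    "read_unmapped": set(),
}


def toy_aligner(read: str, ref: str) -> bool:
    """Hypothetical aligner: look the answer up in the toy hit table."""
    return ref in TOY_HITS[read]


assigned, unaligned = serial_align(TOY_HITS, ["chrM", "hg38"], toy_aligner)
print(assigned)
print(unaligned)
```

Because `chrM` comes first in the list, the ambiguous read is placed there and never reaches the primary genome alignment, so no blacklist of NuMt regions is needed in this toy setup.
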
---

## Advantages of serial alignments

- Accuracy (better rates plus no blacklist needed).
- Speed.
- Modular reference assemblies.

---

## Prealignment mapping rates

<img src="/_modules/pepatac/prealignment_mapping.svg" width="800">

---

## Prealignment speed improvements

<img src="/_modules/pepatac/prealignment_speed.svg" width="800">

---

## PEPATAC strengths

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<span>Modular system</span>
<br><br>
<span>Prealignments</span>
</div>
<div style="width: 45%;">
<span style="color:goldenrod">Flexibility and portability</span>
<br><br>
<span>Outputs</span>
</div>
</div>

---

## Flexibility and Portability

- trimmer options: `skewer` and `trimmomatic`
- peak caller options: `macs2` and `fseq`
- aligner options: `bowtie2` and `bwa`

```bash
./pepatac.py --trimmer trimmomatic --peak-caller fseq
```

---

## Flexibility and Portability

Parameterization via config file `pepatac.yaml`:

```yaml
# basic tools
tools:  # absolute paths to required tools
  java: java
  python: python
  samtools: samtools
  bedtools: bedtools
  bowtie2: bowtie2
  fastqc: fastqc
  macs2: macs2
  picard: ${PICARD}
  skewer: skewer
  perl: perl
  # ucsc tools
  bedGraphToBigWig: bedGraphToBigWig
  wigToBigWig: wigToBigWig
  bigWigCat: bigWigCat
  bedSort: bedSort
  bedToBigBed: bedToBigBed
  # optional tools
  fseq: fseq
  trimmo: ${TRIMMOMATIC}
  Rscript: Rscript

# user configure
resources:
  genomes: ${GENOMES}
  adapters: null  # Set to null to use default adapters

parameters:  # parameters passed to bioinformatic tools
  samtools:
    q: 10
  macs2:
    f: BED
    q: 0.01
    shift: 0
  fseq:
    of: npf  # narrowPeak as output format
    l: 600  # feature length
    t: 4.0  # "threshold" (standard deviations)
    s: 1  # wiggle track step
```

---

## Flexibility and Portability

Running options:

- natively
- conda
- containers using `docker` or `singularity`.
- use bulker to manage containers (http://bulker.io)

```bash
git clone https://github.com/databio/pepatac
docker pull databio/pepatac
docker run --rm -it databio/pepatac pipelines/pepatac.py
```

---

## PEPATAC strengths

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<span>Modular system</span>
<br><br>
<span>Prealignments</span>
</div>
<div style="width: 45%;">
<span>Flexibility and portability</span>
<br><br>
<span style="color:goldenrod">Outputs</span>
</div>
</div>

---

## Output

<img src="/_modules/pepatac/pepatac_output_sample.svg" width="650">

---

## Summary report

<img src="/_modules/pepatac/pepatac_summary.png" width="800">

<a href="http://pepatac.databio.org/en/latest/files/examples/gold/gold_summary.html">http://pepatac.databio.org/en/latest/files/examples/gold/gold_summary.html</a>

---

## PEPATAC in practice

<div style="font-size:0.5em">

- **O'Connor et al. (2021).** *bioRxiv*. DOI: [10.1101/2021.07.15.452570](http://dx.doi.org/10.1101/2021.07.15.452570)
- **Ram-Mohan et al. (2021).** *Life Science Alliance*. DOI: [10.26508/lsa.202000976](http://dx.doi.org/10.26508/lsa.202000976)
- **Robertson et al. (2021).** *Nature Genetics*. DOI: [10.1038/s41588-021-00880-5](http://dx.doi.org/10.1038/s41588-021-00880-5)
- **Cheung et al. (2021).** DOI: [10.1038/s41590-021-00928-y](http://dx.doi.org/10.1038/s41590-021-00928-y)
- **Hasegawa et al. (2021).** *bioRxiv*. DOI: [10.1101/2021.04.28.441728](http://dx.doi.org/10.1101/2021.04.28.441728)
- **Weber et al. (2021).** *Science*. DOI: [10.1126/science.aba1786](http://dx.doi.org/10.1126/science.aba1786)
- **Tovar et al. (2021).** *bioRxiv*. DOI: [10.1101/2021.01.29.428733](http://dx.doi.org/10.1101/2021.01.29.428733)
- **Granja et al. (2021).** *Nature Genetics*. DOI: [10.1038/s41588-021-00790-6](http://dx.doi.org/10.1038/s41588-021-00790-6)
- **Fan et al. (2020).** *Cell Reports*.
DOI: [10.1016/j.celrep.2020.108473](http://dx.doi.org/10.1016/j.celrep.2020.108473)
- **Smith and Sheffield (2020).** *Current Protocols in Human Genetics*. DOI: [10.1002/cphg.101](http://dx.doi.org/10.1002/cphg.101)
- **Liu (2020).** DOI: [10.18632/oncotarget.27584](http://dx.doi.org/10.18632/oncotarget.27584)
- **Zhou et al. (2020).** *bioRxiv*. DOI: [10.1101/2020.05.16.099325](http://dx.doi.org/10.1101/2020.05.16.099325)
- **Cai et al. (2020).** DOI: [10.1186/s12920-020-0695-0](http://dx.doi.org/10.1186/s12920-020-0695-0)
- **Li et al. (2020).** DOI: [10.1038/s41419-020-2303-9](http://dx.doi.org/10.1038/s41419-020-2303-9)
- **Liang et al. (2019).** DOI: [10.1002/1873-3468.13549](http://dx.doi.org/10.1002/1873-3468.13549)
- **Corces et al. (2018).** *Science*. DOI: [10.1126/science.aav1898](http://dx.doi.org/10.1126/science.aav1898)

</div>

---

## Conclusion

<div class="col2">
<div style="padding:7px">

If you're doing ATAC-seq analysis

Try pepatac!

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/pepatac_logo.svg" width="185" style="padding-top:35px; padding-bottom:35px">

[code.databio.org/PEPATAC](http://code.databio.org/PEPATAC)

</div>
</div>
<div class="col2">
<div style="padding:7px"> <!-- .element: class="fragment" -->

If you're developing pipelines

Try looper!

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/logo_looper.svg" width="125">

[looper.readthedocs.io](http://looper.readthedocs.io)

</div>
</div>

---

<div style="padding:7px">

Everyone else

Eat chicken nuggets!

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/white_microwave.svg" width="125">

</div>

---

## Acknowledgments

<div class="col3" style="font-size:.6em">
<img src="/slides/a-modular-project-computing-ecosystem-atacseq/University_of_Virginia_Rotunda_logo.svg" height="40"><img src="/slides/a-modular-project-computing-ecosystem-atacseq/University_of_Virginia_logo_white.svg" height="40">

**Sheffield lab**

- John Lawson
- Vince Reuter
- Jason Smith
- Jianglin Feng
- Michal Stolarczyk
- Aaron Gu

</div>
<div class="col3" style="font-size:.6em">
<img src="/slides/a-modular-project-computing-ecosystem-atacseq/logo_cemm.svg" height="30">

**Christoph Bock**

- Andre Rendeiro
- Johanna Klughammer

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/stanford.svg" height="30">

**Howard Chang**

- Ryan Corces
- Yuning Wei
- Jin Xu

</div>
<div class="col3" style="font-size:.6em">

**Funding:**

<img src="/slides/a-modular-project-computing-ecosystem-atacseq/University_of_Virginia_logo_white.svg" height="40">
<img src="/slides/a-modular-project-computing-ecosystem-atacseq/NIH_logo_black.svg" height="80">
<img src="/slides/a-modular-project-computing-ecosystem-atacseq/hfsp_logo.svg" height="60">
</div>

---

<style>
#acknowledgements {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="acknowledgements" data-background="/images/presentations/bg.svg.png">

# Thank You

<br clear="all"/>
<span class="small bullet"><img src="/images/external/github_bug_black.svg" height="20" class="bullet"><a href="http://github.com/nsheff">nsheff</a></span> ·
<span class="small bullet"><img src="/images/icons/web.svg" height="25" class="bullet"><a href="http://databio.org">databio.org</a></span> ·
<span class="small bullet"><img src="/images/icons/letter.svg" height="25" class="bullet"><a href="mailto:nsheffield@virginia.edu">nsheffield@virginia.edu</a></span>

<div class="bullet"
style="background-color:rgb(45,45,45,.65); border-radius: 25px; opacity:0.9">
<img src="/images/external/uva_dgs_logo.svg" height="65">
<img src="/images/logo/logo_databio_long.svg" height="45">
</div>

</section>

---

### Parallelism Philosophy

<img src="/_modules/parallelism/parallel_sequential.svg" width="100%"><br>

<div class="fragment">
<div class="col3" style="background-color:#211">by process
<img src="/_modules/parallelism/parallel_process.svg" width="300">
</div>
<div class="col3" style="background-color:#112">by sample
<img src="/_modules/parallelism/parallel_sample.svg" width="300">
</div>
<div class="col3" style="background-color:#121">by dependence
<img src="/_modules/parallelism/parallel_dependency.svg" width="300">
</div>
</div>
<br clear="all">
<div class="fragment">
<div class="col3" style="background-color:#211">Very easy</div>
<div class="col3" style="background-color:#112">Easy</div>
<div class="col3" style="background-color:#121">Hard</div>
</div>
<br clear="all">
<div class="fragment">
<div class="col3" style="background-color:#211">
<img src="/_modules/parallelism/parallel_process_benefit.svg" width="300">
</div>
<div class="col3" style="background-color:#112">
<img src="/_modules/parallelism/parallel_sample_benefit.svg" width="300">
</div>
<div class="col3" style="background-color:#121">
<img src="/_modules/parallelism/parallel_dependency_benefit.svg" width="300">
</div>
</div>
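
Parallelism by sample is listed as easy because samples do not depend on each other, so it reduces to mapping one function over a list. A minimal sketch using Python's standard `concurrent.futures`, assuming a hypothetical `process_sample` placeholder in place of a real pipeline run; it illustrates the idea only and is not how looper submits jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample_name):
    """Hypothetical placeholder for one independent per-sample pipeline run."""
    return f"{sample_name}: done"


sample_names = ["frog_0h", "frog_1h", "frog_2h", "frog_3h"]

# No sample needs another sample's output, so all of them can run concurrently.
# Executor.map yields results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_sample, sample_names))

print(results)
```

Parallelism by dependence would instead need a task graph, which is why the slide rates it as hard.
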