<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Scattering CWL across tabular samples and interactive computing environments from CWL

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

## Problem 1

I got a sample table from my collaborator.<br>
How do I run a CWL workflow on it?

---

## A simple CWL job

```
cwl-runner wc-tool.cwl wc-job.yml
```

<div class="col2">
wc-tool.cwl

```
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
  file:
    type: File
    inputBinding:
      position: 1
outputs: []
```

</div>
<div class="col2">
wc-job.yml

```
file:
  class: File
  path: data/frog1_data.txt
```

</div>
<br clear="all"/>

<div class="fragment">Scatter across multiple samples...</div>

---

## Scattering with nested workflows

<div class="col2">
main.cwl

```
steps:
  alignment:
    run: alignment.cwl
    scatter: fq
    in:
      fq: fq
      genome: genome
      gtf: gtf
    out: [qc_html, bam]
  featureCounts:
    requirements:
      ResourceRequirement:
        ramMin: 500
    run: featureCounts.cwl
    in:
      n_input_bam: alignment/bam
      gtf: gtf
    out: [featurecounts]
```

</div>
<div class="col2">
inputs.yaml

```
fq:
  - class: File
    location: rnaseq/raw_fastq/s1.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/s2.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/s3.fq
    format: http://edamontology.org/format_1930
genome:
  class: Directory
  location: hg19-chr1-STAR-index
gtf:
  class: File
  location: rnaseq/ref/genes.gtf
```

</div>
<br clear="all"/>

<div class="footnote" style="font-size:0.5em">Adapted from Peter Amstutz<br><a
href="http://github.com/common-workflow-library/rnaseq-cwl-training">github.com/common-workflow-library/rnaseq-cwl-training</a></div>

---

## But I have a CSV sample table

```
sample_name,library,file
frog_1,anySampleType,data/frog1_data.txt
frog_2,anySampleType,data/frog2_data.txt
```

---

## Introduction to looper

<img src="/shorts/short-cwl-pep/looper_role_white_simple.svg" style="width:800px">

```
looper run config.yaml
```

---

## CWL Interface Configuration

cwl_interface.yaml:

```
pipeline_name: count_lines
pipeline_type: sample
input_schema: input_schema.yaml
command_template: >
  cwl-runner wc-tool.cwl {sample.sample_yaml_cwl}
pre_submit:
  python_functions:
    - looper.write_sample_yaml_cwl
```

project_config.yaml:

```
pep_version: 2.0.0
sample_table: file_list.csv
sample_modifiers:
  append:
    pipeline_interfaces: cwl_interface.yaml
looper:
  output_dir: pipeline_results
```

---

## Scattering across samples using looper

```
> looper run project_config.yaml
```

```
Looper version: 1.3.1-dev
Command: run

## [1 of 2] sample: frog_1; pipeline: count_lines
Calling pre-submit function: looper.write_sample_yaml_cwl
Writing sample yaml to pipeline_results/submission/frog_1_sample_cwl.yaml
Writing script to /home/nsheff/code/incubator/learn_cwl/cwl-pep/simple_demo/pipeline_results/submission/count_lines_frog_1.sub
Job script (n=1; 0.00Gb): pipeline_results/submission/count_lines_frog_1.sub
Compute node: zither
Start time: 2021-01-26 14:54:50
INFO /home/nsheff/.local/bin/cwl-runner 3.0.20200807132242
INFO Resolved 'wc-tool.cwl' to 'file:///home/nsheff/code/incubator/learn_cwl/cwl-pep/simple_demo/wc-tool.cwl'
INFO [job wc-tool.cwl] /tmp/7vhoojf2$ wc \
    -l \
    /tmp/tmpxcekd0he/stg6b7f7559-6e4f-409b-8b9f-adc73dd5ca82/frog1_data.txt
4 /tmp/tmpxcekd0he/stg6b7f7559-6e4f-409b-8b9f-adc73dd5ca82/frog1_data.txt
INFO [job wc-tool.cwl] completed success
{}
INFO Final process status is success

## [2 of 2] sample: frog_2; pipeline: count_lines
Calling pre-submit function: looper.write_sample_yaml_cwl
Writing sample yaml to pipeline_results/submission/frog_2_sample_cwl.yaml
Writing script to /home/nsheff/code/incubator/learn_cwl/cwl-pep/simple_demo/pipeline_results/submission/count_lines_frog_2.sub
Job script (n=1; 0.00Gb): pipeline_results/submission/count_lines_frog_2.sub
Compute node: zither
Start time: 2021-01-26 14:54:51
INFO /home/nsheff/.local/bin/cwl-runner 3.0.20200807132242
INFO Resolved 'wc-tool.cwl' to 'file:///home/nsheff/code/incubator/learn_cwl/cwl-pep/simple_demo/wc-tool.cwl'
INFO [job wc-tool.cwl] /tmp/96ojstvh$ wc \
    -l \
    /tmp/tmp09syks28/stgd2a8068d-a68f-4759-9cf6-0f359aa49740/frog2_data.txt
7 /tmp/tmp09syks28/stgd2a8068d-a68f-4759-9cf6-0f359aa49740/frog2_data.txt
INFO [job wc-tool.cwl] completed success
{}
INFO Final process status is success

Looper finished
Samples valid for job generation: 2 of 2
Commands submitted: 2 of 2
Jobs submitted: 2
```

---

## Looper uses a generic input format

<img src="/shorts/short-cwl-pep/pep_as_input.svg" style="width:800px">

---

## Using PEP across tools

<img src="/shorts/short-cwl-pep/logo_looper.svg" style="width:100px; float:left">

```
looper run config.yaml
```

<br clear="all"/>

<div class="col2"><img src="/shorts/short-cwl-pep/logo_R.svg" style="width:100px">

```
install.packages("pepr")
library("pepr")
p = pepr::Project("config.yaml")
projConfig = config(p)
mySamples = sampleTable(p)
```

</div>
<div class="col2">
<img src="/shorts/short-cwl-pep/logo_python.svg" style="width:280px">

```
# In a shell first: pip install peppy
import peppy
prj = peppy.Project("config.yaml")
samples = prj.samples
sample_table = prj.sample_table
```

</div>

---

## Features of PEP

<div class="col3">Project modifiers
<img src="/shorts/short-cwl-pep/cartoon_imports.svg" style="background-color:white; padding: 9px; width:280px">
<img src="/shorts/short-cwl-pep/cartoon_amendments.svg" style="background-color:white; padding: 9px; width:280px">
</div>
<div class="col3">Sample modifiers <img
src="/shorts/short-cwl-pep/cartoon_sample_modifiers.svg" style="background-color:white; padding: 9px; width:280px">
</div>
<div class="col3">Schema validation
<img src="/shorts/short-cwl-pep/validation.svg" style="background-color:white; padding: 9px; width:280px">
</div>
<br>

Learn more:<br>
<a href="http://pep.databio.org">http://pep.databio.org</a>

---

## Problem 2

I want to test something in a workflow computing environment:<br>

- I'm troubleshooting a failing command <!-- .element: class="fragment" -->
- I want to try a step interactively with other data <!-- .element: class="fragment" -->
- I want to demo a different approach using the same tools <!-- .element: class="fragment" -->

<br>
<span class="fragment">How do I run interactive code at the terminal... <br> as if I were a workflow?</span>

---

## Docker in CWL

<div class="col2">

```
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: node
hints:
  DockerRequirement:
    dockerPull: node:slim
inputs:
  src:
    type: File
    inputBinding:
      position: 1
outputs:
  example_out:
    type: stdout
stdout: output.txt
```

</div>
<div class="col2 fragment">

```
docker run -i --rm \
  --volume /home:/home \
  --volume /tmp:/tmp \
  --volume /ext:/ext \
  --env=TMPDIR --workdir `pwd` \
  --user=1000:1000 \
  --network="host" \
  --docker-arg \
  --another-docker-arg \
  --yet-another-docker-arg \
  node:slim node ... command
```

</div>

---

## Intro to Bulker

Simple commands run in containers behind the scenes

<img src="http://bulker.io/img/bulker_executables.svg" style="background:white;width:600px">
<br>
<a href="http://bulker.io">bulker.io</a>

---

## Bulker basics

```
pip install bulker
bulker load demo
bulker activate demo
cowsay Hello world!   # actually runs in docker
```

---

## Bulker + CWL = cwl2man

```
bulker cwl2man -c workflow.cwl -m manifest.yaml
bulker load my-interactive-env -m manifest.yaml
bulker activate my-interactive-env
# voila!
```

---

## Summary

looper: scattering CWL workflows across tabular data
<br><br>
bulker: portable, interactive environments from CWL
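
---

## Sketch: sample table to CWL job files

Looper's `write_sample_yaml_cwl` pre-submit function writes one job file per sample, and the `command_template` then passes that file to `cwl-runner`. The Python below is a minimal, hypothetical sketch of that idea and is not looper's implementation. `INPUT_TYPES`, `row_to_cwl_job`, `write_cwl_jobs`, and `build_commands` are illustrative names. The sketch assumes the table has a `sample_name` column and that relative paths are relative to the table's own directory.

```python
"""Hypothetical sketch: one CWL job file per row of a CSV sample table.

Not looper's implementation. Assumptions: a `sample_name` column exists,
INPUT_TYPES maps table columns to CWL input types, and relative paths
are relative to the sample table's directory.
"""
import csv
import json
import shlex
from pathlib import Path

# Hypothetical column -> CWL type mapping; "file" matches wc-tool.cwl's input.
INPUT_TYPES = {"file": "File"}


def row_to_cwl_job(row, base_dir, input_types=INPUT_TYPES):
    """Build a CWL job-order dict from one sample-table row."""
    job = {}
    for column, cwl_type in input_types.items():
        value = row[column]
        if cwl_type in ("File", "Directory"):
            # CWL resolves relative paths against the job file's own folder,
            # so write absolute paths to keep them pointing at the data.
            abs_path = (Path(base_dir) / value).resolve()
            job[column] = {"class": cwl_type, "path": str(abs_path)}
        else:
            job[column] = value  # plain values (e.g. strings) pass through
    return job


def write_cwl_jobs(sample_table, out_dir, input_types=INPUT_TYPES):
    """Write <sample_name>_sample_cwl.yaml per row; return {sample: path}."""
    base_dir = Path(sample_table).parent
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    job_files = {}
    with open(sample_table, newline="") as handle:
        for row in csv.DictReader(handle):
            job = row_to_cwl_job(row, base_dir, input_types)
            job_path = out_dir / f"{row['sample_name']}_sample_cwl.yaml"
            # JSON is valid YAML, so cwl-runner can read this as a job file.
            job_path.write_text(json.dumps(job, indent=2) + "\n")
            job_files[row["sample_name"]] = job_path
    return job_files


def build_commands(job_files, tool="wc-tool.cwl"):
    """Render one cwl-runner command per sample, like a command_template."""
    return [
        f"cwl-runner {shlex.quote(tool)} {shlex.quote(str(path))}"
        for path in job_files.values()
    ]
```

Applied to the two-row `file_list.csv` table, this would write `frog_1_sample_cwl.yaml` and `frog_2_sample_cwl.yaml` and return two `cwl-runner wc-tool.cwl ...` commands. Looper also handles work this sketch skips, such as writing the per-sample `.sub` submission scripts seen in the run log.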