<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Organizing large-scale biological data around standardized projects

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

<!-- .slide: class="center" -->

<img src="/_modules/project-cover/{{project_logo}}" width="275" style="padding-top:25px; padding-bottom:25px">
<br>
<div class="small">
<a href="{{project_url}}">{{project_url}}</a><br>
</div>
<span class="small bullet"><img src="/_modules/project-cover/icons/paper.svg" height="25" class="bullet" style="vertical-align: text-bottom; margin-right: 5px;">{{project_citations}}</span>

---

## Data is becoming more...

<div style="display: flex; justify-content: space-between;">
<div style="width: 30%; text-align: center;">
abundant
<img src="/_modules/pep/sequencing_costs_2015.jpg" width="100%">
</div>
<div style="width: 30%; text-align: center;">
available
<img src="/_modules/pep/cos_black.jpg" width="100%">
</div>
<div style="width: 30%; text-align: center;">
powerful
<img src="/_modules/pep/gwasw.jpg" width="100%">
</div>
</div>

---

## So why are the world's problems not solved?

<div class="fragment">
<img src="/_modules/pep/data_not_info.svg" width="500">
</div>

---

<img src="/_modules/pep/data_analysis_info.svg" width="700">

---

## First step in bioinformatics analysis

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%; text-align: center;">
<img src="/_modules/pep/icons/pipeline.svg" width="200">
pipeline
</div>
<div class="fragment" style="width: 45%; text-align: center;">
<img src="/_modules/pep/search_trends.svg" width="100%">
Papers with "bioinformatics pipeline" in title
</div>
</div>

---

## Problem solved?

<img src="/_modules/pep/data_pipeline_info.svg" width="700">

---

## Problem solved?

<img src="/_modules/pep/data_pipeline_info_highlight.svg" width="700">

---

## Problem solved?

<img src="/_modules/pep/data_interface.svg" width="700">

---

## Data munging

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<img src="/_modules/pep/data_input-new_white.svg" width="100%">
</div>
<div class="fragment" style="width: 45%;">
<img src="/_modules/pep/data_input-rev_white.svg" width="100%">
</div>
</div>

---

## Then, downstream tools need a different organization

<img src="/_modules/pep/data_input_steps.svg" width="525">

---

## What if?

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<img src="/_modules/pep/data_input2.svg" width="100%">
</div>
<div style="width: 45%;">
<img src="/_modules/pep/data_input2_rev.svg" width="100%">
</div>
</div>

---

## The solution

<img src="/_modules/pep/data_input_plug.svg" width="625">

---

## The solution

<img src="/_modules/pep/data_input_steps_plug.svg" width="475">

---

## PEP: A standard format for project metadata

<img src="/_modules/pep/pep_center_white.svg" width="700">

---

## PEP Ecosystem

<img src="/_modules/pep/pep_ecosystem.svg" width="750">

---

## <img src="/_modules/pep/pep_logo.svg" width="70" style="vertical-align: middle;"> PEP format

<span class="bullet"><img src="/_modules/pep/icons/file.svg" width="30" class="bullet"> project_config.yaml</span>

```yaml
metadata:
  sample_annotation: /path/to/samples.csv
  output_dir: /path/to/output/folder
```

---

<span class="bullet"><img src="/_modules/pep/icons/file.svg" width="30" class="bullet"> samples.csv</span>

```csv
sample_name, protocol, organism, data_source
frog_0h, RNA-seq, frog, /path/to/frog0.gz
frog_1h, RNA-seq, frog, /path/to/frog1.gz
frog_2h, RNA-seq, frog, /path/to/frog2.gz
frog_3h, RNA-seq, frog, /path/to/frog3.gz
```

---

#### Microwave syndrome

<div>
<img src="/_modules/microwave-syndrome/IFB_17PM-MEC1.png" height="180">
<img src="/_modules/microwave-syndrome/LG_LMV2031SB.png" height="180">
<img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ.png" height="180">
</div>

<div class="well fragment">In user interface design, prioritizing easy access to integrated functions over their individual components.
</div>

---

<section style="font-size: 0;">
<img src="/_modules/microwave-syndrome/IFB_17PM-MEC1.png" height="120">
<img src="/_modules/microwave-syndrome/LG_LMV2031SB.png" height="120">
<img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ.png" height="120"><br>
<img src="/_modules/microwave-syndrome/IFB_17PM-MEC1_console.jpg" height="560">
<img src="/_modules/microwave-syndrome/LG_LMV2031SB_console.jpg" height="560" class="fragment">
<img src="/_modules/microwave-syndrome/panasonic_NN-CT585SBPQ_console.jpg" height="560" class="fragment">
</section>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_chunk.svg" height="650">
</section>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_chunk2.svg" height="650">
</section>

---

### The UNIX philosophy

<div class="col2">
<img src="/_modules/microwave-syndrome/unix_book.jpg" height="450">
</div>
<div class="col2">
<div class="well"><span style="color:#ffb; font-weight:bold">[T]he power of a system comes more from the relationships among programs than from the programs themselves.</span><br><br>
<span style="font-size: 0.8em">Many UNIX programs do quite trivial tasks in isolation, but, combined with other programs, become general and useful tools.</span><br/><br/>
<span class="small">- Kernighan and Pike, The UNIX Programming Environment (1983, p.
viii)</span>
</div>
</div>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/modularity_spectrum.svg" height="650">
</section>

---

<section transition="fade-in">
<img src="/_modules/microwave-syndrome/pipelines/modularity_spectrum2.svg" height="650">
</section>

---

<section transition="fade-in" id="links1">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links1.svg" height="650">
</section>

---

<section transition="fade-in" id="links2">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links2.svg" height="650">
</section>

---

<section transition="fade-in" id="links3">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links3.svg" height="650">
</section>

---

<section transition="fade-in" id="links5">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links5.svg" height="650">
</section>

---

<section transition="fade-in" id="links6">
<img src="/_modules/microwave-syndrome/pipelines/pipeline_links6.svg" height="650">
</section>

---

<div class="col2">

### Problem

<img src="/_modules/microwave-syndrome/pep/data_input-new_white.svg" width="375">
</div>
<div class="col2 fragment">

### Solution

<img src="/_modules/microwave-syndrome/pep/data_input_plug.svg" width="375">
</div>

---

## PEP Ecosystem

<img src="/_modules/pep-details/pep_ecosystem.svg" width="750">

---

## peppy package <img src="/_modules/pep-details/logo/logo_python.svg" height="50" style="vertical-align: middle;">

```python
import peppy

# Project must be referenced through the imported module
prj = peppy.Project("pep_config.yaml")
samples = prj.get_samples()
for sample in samples:
    print(sample.name)
    # do further analysis to each sample
```

<span class="bullet"><img src="/_modules/pep-details/icons/ftnetwork-connected.svg" width="50" class="bullet">Project API</span>

---

## pepr package <img src="/_modules/pep-details/logo/logo_R.svg" height="50" style="vertical-align: middle;">

```r
library("pepr")
prj = pepr::Project("pep_config.yaml")
samples = pepr::pepSamples(prj)
for (sample in samples) {
  message(pepr::sampleName(sample))
  # do further analysis to each sample
}
```

---

## Conclusion

- The PEP format is a novel approach to standardizing *projects*.
- Initial tools like `geofetch` and `looper` build PEP projects and connect them to pipelines.
- Python and R packages provide a universal interface to PEP metadata for tools and analysis.

More information at [pepkit.github.io](http://pepkit.github.io/).

---

<style>
#acknowledgements {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="acknowledgements" data-background="/images/presentations/bg.svg.png">

# Thank You

<br clear="all"/>

<span class="small bullet"><img src="/images/external/github_bug_black.svg" height="20" class="bullet"><a href="http://github.com/nsheff">nsheff</a></span> ·
<span class="small bullet"><img src="/images/icons/web.svg" height="25" class="bullet"><a href="http://databio.org">databio.org</a></span> ·
<span class="small bullet"><img src="/images/icons/letter.svg" height="25" class="bullet"><a href="mailto:nsheffield@virginia.edu">nsheffield@virginia.edu</a></span>

<div class="bullet" style="background-color:rgb(45,45,45,.65); border-radius: 25px; opacity:0.9">
<img src="/images/external/uva_dgs_logo.svg" height="65">
<img src="/images/logo/logo_databio_long.svg" height="45">
</div>

</section>
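
---

## Backup: a sample table is just a table

A minimal sketch, not the peppy implementation: it reads the example `samples.csv` from the PEP format slide using only the Python standard library, to illustrate that any tool can consume the sample annotation. The inline `SAMPLES_CSV` string and the `read_samples` helper are hypothetical names introduced for this illustration.

```python
import csv
import io

# Hypothetical inline copy of the example samples.csv from the PEP format slide.
SAMPLES_CSV = """\
sample_name, protocol, organism, data_source
frog_0h, RNA-seq, frog, /path/to/frog0.gz
frog_1h, RNA-seq, frog, /path/to/frog1.gz
frog_2h, RNA-seq, frog, /path/to/frog2.gz
frog_3h, RNA-seq, frog, /path/to/frog3.gz
"""


def read_samples(text):
    """Return one dict per sample row, keyed by column name."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


samples = read_samples(SAMPLES_CSV)
for sample in samples:
    # Each sample carries its own metadata and the path to its data.
    print(sample["sample_name"], sample["data_source"])
```

Because each row is self-describing, a pipeline and a downstream analysis tool can read the same table without a project-specific parser.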