<style> #title { height: 100% !important; display: flex !important; flex-direction: column !important; justify-content: center !important; } </style>
<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# BiocProject: a Bioconductor-oriented project management package

Nathan Sheffield, Michal Stolarczyk

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

<!-- .slide: data-background="/images/presentations/bg.svg.png" data-transition-speed="slow" -->

### Outline

<style>
.previewblock { float: left; width: 20px; height: 45px; margin: 0; border: none; white-space: nowrap; box-sizing: border-box; }
.questionblock { float: left; width: 100%; margin: 5px 0; border: 1px solid rgba(255, 255, 255, .2); }
</style>

<div class="previewblock" style="width:40%">Motivation for PEPs</div>
<div class="previewblock" style="width:60%"></div>
<div class="previewblock" style="width:40%">|</div>
<div class="previewblock" style="width:60%"></div>
<br clear="all">
<div class="previewblock" style="width:40%; background:#883388">40%</div>
<div class="previewblock" style="width:60%; background:#333388">60%</div>
<div class="previewblock" style="width:40%"></div>
<div class="previewblock" style="width:60%">|</div>
<br clear="all">
<div class="previewblock" style="width:40%"></div>
<div class="previewblock" style="width:60%">Demonstration of BiocProject</div>
<div class="questionblock" style="background:#222; color:#eee; font-size: 0.6em; margin-top: 35px">◁ Questions ▷</div>

---

## Most workflows require their own metadata organization

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<img src="/shorts/short-biocproject/data_input-new_white.svg" width="375">
</div>
<div style="width: 45%;"
class="fragment">
<img src="/shorts/short-biocproject/data_input-rev_white.svg" width="375">
</div>
</div>

---

## What if?

<div style="display: flex; justify-content: space-between;">
<div style="width: 45%;">
<img src="/shorts/short-biocproject/data_input2.svg" width="325">
</div>
<div style="width: 45%;">
<img src="/shorts/short-biocproject/data_input2_rev.svg" width="325">
</div>
</div>

---

## The solution

<img src="/shorts/short-biocproject/data_input_plug.svg" width="625">

---

## The solution

<img src="/shorts/short-biocproject/data_input_steps_plug.svg" width="475">

---

## PEP: A standard format for project metadata

<img src="/shorts/short-biocproject/pep_center_white.svg" width="700">

---

## PEP Ecosystem

<img src="/shorts/short-biocproject/pep_ecosystem.svg" width="750">

---

## <img src="/shorts/short-biocproject/pep_logo.svg" width="70" style="vertical-align: middle;"> PEP format

<span class="bullet"><img src="/shorts/short-biocproject/icons/file.svg" width="30" class="bullet"> project_config.yaml</span>

```yaml
metadata:
  sample_table: /path/to/samples.csv
  output_dir: /path/to/output/folder
```

---

<span class="bullet"><img src="/shorts/short-biocproject/icons/file.svg" width="30" class="bullet"> samples.csv</span>

```csv
sample_name,protocol,organism,data_source
frog_0h,RNA-seq,frog,/path/to/frog0.gz
frog_1h,RNA-seq,frog,/path/to/frog1.gz
frog_2h,RNA-seq,frog,/path/to/frog2.gz
frog_3h,RNA-seq,frog,/path/to/frog3.gz
```

---

## BiocProject integrates PEP into Bioconductor

It provides:

- automated data loading
- functions for interacting with project metadata
- PEP-annotated Bioconductor data objects

---

## Install

```r
devtools::install_github(repo='pepkit/pepr')
devtools::install_github(repo='pepkit/BiocProject')
```

---

## Load an example PEP with a `bioconductor` section

Here's a demo project config included with the package:

```yaml
metadata:
  sample_table: sample_table.csv

bioconductor:
  readFunName: readBedFiles
  readFunPath: readBedFiles.R
```

`readFunName`
is the name of the R function that reads in the data for your PEP, and `readFunPath` is the path to the R file that defines that function.

---

## sample_table.csv

| sample_name | file_path |
|-------------|-----------|
| laminB1Lads | data/laminB1Lads.bed |
| vistaEnhancers | data/vistaEnhancers.bed |

---

## readBedFiles.R

```r
readBedFiles = function(project) {
  # Sample metadata: one row per sample, with 'sample_name' and 'file_path'
  smp = pepr::samples(project)
  # In this example, file_path values are relative to the config file's
  # folder, so read from there and always restore the working directory
  cwd = getwd()
  on.exit(setwd(cwd), add = TRUE)
  setwd(dirname(project@file))
  result = lapply(smp$file_path, function(x) {
    df = read.table(x)
    colnames(df) = c("chr", "start", "end")
    GenomicRanges::GRanges(df)  # one GRanges object per BED file
  })
  names(result) = smp$sample_name
  return(GenomicRanges::GRangesList(result))
}
```

---

```r
library(BiocProject)
configFile = system.file("extdata", "example_peps-master",
                         "example_BiocProject", "project_config.yaml",
                         package = "BiocProject")
bp = BiocProject(file = configFile)
#> Loaded config file: .../example_BiocProject/project_config.yaml
#> The 'bioconductor' key found in the Project config
#> Used function 'readBedFiles' from the environment
```

---

```r
bp = BiocProject(file = configFile)
bp
#> GRangesList object of length 2:
#> $laminB1Lads
#> GRanges object with 1302 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>      [1]     chr1   11401198-11694590      *
#>      [2]     chr1   14877629-15246452      *
#>      [3]     chr1   18229570-19207602      *
#>      [4]     chr1   29618442-31162049      *
#>      [5]     chr1   33943885-35623392      *
#>      ...      ...                 ...    ...
#>   [1298]     chrX 154066672-154251301      *
#>   [1299]     chrY     2880166-7112793      *
#>   [1300]     chrY   15047033-15333970      *
#>   [1301]     chrY   15603977-16627892      *
#>   [1302]     chrY   16966225-21013116      *
#>
#> ...
#> <1 more element>
#> -------
#> seqinfo: 24 sequences from an unspecified genome; no seqlengths
#>
#> metadata: PEP project object. Class: Project
#>   file: example_BiocProject/project_config.yaml
#>   samples: 2
```

---

```r
samples(bp)
#>       sample_name               file_path
#> 1:    laminB1Lads    data/laminB1Lads.bed
#> 2: vistaEnhancers data/vistaEnhancers.bed
```

---

```r
config(bp)
#> Config object. Class: Config
#>  metadata:
#>    sample_table: example_BiocProject/sample_table.csv
#>  bioconductor:
#>    readFunName: readBedFiles
#>    readFunPath: example_BiocProject/readBedFiles.R
#>  name: example_BiocProject
```

---

## BiocProject in action

## Zero to hero in 3 lines of code

---

1. Create a PEP:

```
geofetch -i GSE129383 -P /pepatac/pipeline_interface.yaml
...
Finished processing 1 accessions
Creating complete project annotation sheets and config file...
Sample annotation sheet:${SRAMETA}/GSE129383/GSE129383_annotation.csv
Sample subannotation sheet:${SRAMETA}/GSE129383/GSE129383_subannotation.csv
Config file: ${SRAMETA}/GSE129383/GSE129383_config.yaml
```

This downloads the raw data and creates your PEP.

---

2. Run the PEPATAC ATAC-seq pipeline:

```
looper run GSE129383_config.yaml --sp sra_convert
looper run GSE129383_config.yaml
```

This runs PEPATAC on your newly created PEP.

---

3. Load processed data into R with BiocProject:

```r
bp = BiocProject::BiocProject("GSE129383_config.yaml")
bp
```

This loads your BED files into R.

---

### Conclusion

- PEP is a language-agnostic standard *project* representation.
- BiocProject loads the metadata and data for a PEP into R.
- Add `geofetch`, `looper`, and `PEPATAC` to go from raw data all the way to analysis.

More information at [pepkit.github.io](http://pepkit.github.io/) and [code.databio.org/BiocProject](http://code.databio.org/BiocProject).

---

## Acknowledgments

<div style="display: flex; justify-content: space-between;">
<div style="width: 30%; font-size: 0.6em;">

<img src="/shorts/short-biocproject/University_of_Virginia_Rotunda_logo.svg" height="30"><img src="/shorts/short-biocproject/University_of_Virginia_logo_white.svg" height="30">

**Sheffield lab**

- Ognen Duzlevski
- Jianglin Feng
- Aaron Gu
- Kristyna Kupkova
- John Lawson
- Vince Reuter
- Jason Smith
- **Michal Stolarczyk**

</div>
<div style="width: 30%; font-size: 0.6em;">

**Bioconductor**

- Levi Waldron
- Sean Davis

</div>
<div style="width: 30%; font-size: 0.6em;">

**Funding:**

<img src="/shorts/short-biocproject/University_of_Virginia_Rotunda_logo.svg" height="40"><img src="/shorts/short-biocproject/University_of_Virginia_logo_white.svg" height="40">

<img src="/shorts/short-biocproject/NIH_logo_black.svg" height="100">

NIGMS 1R35GM128636

</div>
</div>
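
---

## Reading BED files without `setwd()`

`readBedFiles` changes the working directory so that the relative paths in `file_path` resolve. As a hedged alternative, here is a minimal, dependency-free sketch in base R. `readBedTables()` is a hypothetical helper, not part of BiocProject or pepr: it reads the sample table itself and resolves each path with `file.path()`. It assumes a CSV with `sample_name` and `file_path` columns, paths relative to the table's folder, and it returns plain data.frames rather than `GRanges`.

```r
# Hypothetical helper (not a BiocProject/pepr function), base R only:
# read every BED file listed in a PEP-style sample table into a named list
# of data.frames. Assumes 'sample_name' and 'file_path' columns, and that
# relative paths are relative to the folder holding the sample table.
readBedTables = function(sampleTableFile) {
  samples = read.csv(sampleTableFile, stringsAsFactors = FALSE,
                     strip.white = TRUE)
  baseDir = dirname(sampleTableFile)
  result = lapply(samples$file_path, function(p) {
    # Keep absolute paths as-is; resolve relative ones without setwd()
    fullPath = if (grepl("^(/|[A-Za-z]:)", p)) p else file.path(baseDir, p)
    df = read.table(fullPath, header = FALSE, stringsAsFactors = FALSE)
    df = df[, 1:3]  # keep only the chrom, start, end columns
    colnames(df) = c("chr", "start", "end")
    df
  })
  names(result) = samples$sample_name
  result
}

# Usage: build a tiny throwaway project in a temporary folder
projectDir = file.path(tempdir(), "toy_pep")
dir.create(file.path(projectDir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("chr1\t100\t200", "chr2\t300\t400"),
           file.path(projectDir, "data", "regionsA.bed"))
writeLines("chrX\t10\t20", file.path(projectDir, "data", "regionsB.bed"))
writeLines(c("sample_name,file_path",
             "regionsA,data/regionsA.bed",
             "regionsB,data/regionsB.bed"),
           file.path(projectDir, "sample_table.csv"))

beds = readBedTables(file.path(projectDir, "sample_table.csv"))
str(beds)
```

In a real BiocProject read function, the sample table would instead come from `pepr::samples(project)`, and each data.frame could be passed to `GenomicRanges::GRanges()` as `readBedFiles` does.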