<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Refgenie and Bioconductor

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

# Refgenie and Bioconductor

## The problem

Many tools require genome-related assets, such as aligner indexes. How should we organize these on disk?

<img src="/shorts/short-refgenie-bioc/refgenie/folder_structures.svg" style="background:white" width="600">

---

## A standard organization simplifies tool interfaces

```
pipeline.py --genome hg38
```

instead of:

```
pipeline.py --bowtie2-index path/to/hg38/bowtie2-index \
  --tss_annotation path/to/hg38/tss_annotation.bed \
  --ensembl_anno path/to/hg38/ensembl_v86.gtf
```

---

## Illumina's iGenomes is one answer

iGenomes is *a collection of reference sequences and annotation files for commonly analyzed organisms*. You download a tarball with a standard folder structure for your genome of interest, then build tools around that structure.

---

## The 'central repository' approach is limited

- *Not scripted.* You can't produce iGenomes for an arbitrary genome or asset.
- *Not modular.* You can't download individual assets.
- *Not programmatic.* You can't access data or metadata via an API.

---

## Refgenie addresses these limitations

- *Two ways to retrieve an asset.*
  - `build` any asset from a recipe.
  - `pull` any individual asset from a server.
- *Better discoverability.*
  - `list`/`listr` show local/remote assets.
  - `refgenieserver` provides a browsable web interface and API.
- *Managed locations.*
  - `seek` returns the local path to an asset.
  - `add`/`remove` manage your own assets.

---

## Refgenie CLI example

```
refgenie pull hg38/fasta
refgenie build hg38/bowtie2_index
refgenie seek hg38/bowtie2_index
```

---

## Using Refgenie from R

```
# Import the refgenconf Python package via reticulate
mod <- reticulate::import("refgenconf", convert = FALSE)
# Load the refgenie config file named by the REFGENIE environment variable
rgc <- mod$RefGenConf(Sys.getenv("REFGENIE"))
rgc$pull("hg38", "bowtie2_index", "default")
# convert = FALSE returns Python objects; convert the path to an R string
reticulate::py_to_r(rgc$seek("hg38", "bowtie2_index"))
```

---

## Resources

- These slides: [databio.org/slides](https://databio.org/slides)
- Refgenie documentation: [refgenie.databio.org](https://refgenie.databio.org)
- Refgenieserver instance: [refgenomes.databio.org](https://refgenomes.databio.org)
- GitHub: [github.com/databio/refgenie](https://github.com/databio/refgenie/)

---

<style>
#acknowledgements {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="acknowledgements" data-background="/images/presentations/bg.svg.png">

# Thank You

<br clear="all"/>

<span class="small bullet"><img src="/images/external/github_bug_black.svg" height="20" class="bullet"><a href="http://github.com/nsheff">nsheff</a></span> · <span class="small bullet"><img src="/images/icons/web.svg" height="25" class="bullet"><a href="http://databio.org">databio.org</a></span> · <span class="small bullet"><img src="/images/icons/letter.svg" height="25" class="bullet"><a href="mailto:nsheffield@virginia.edu">nsheffield@virginia.edu</a></span>

<div class="bullet" style="background-color:rgb(45,45,45,.65); border-radius: 25px; opacity:0.9">
<img src="/images/external/uva_dgs_logo.svg" height="65">
<img src="/images/logo/logo_databio_long.svg" height="45">
</div>

</section>
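
---

## Appendix: resolving asset paths from a standard layout

This base R sketch shows why a standard layout lets a tool accept just `--genome hg38`. The tool derives each asset path from a root folder, a genome, an asset, and a tag. The `<root>/<genome>/<asset>/<tag>` layout and the `seek_asset()` helper are hypothetical illustrations. They are not refgenie's actual on-disk layout or API.

```r
# Hypothetical helper: map (genome, asset, tag) to a folder under `root`,
# assuming a simplified <root>/<genome>/<asset>/<tag> layout.
# This is NOT refgenie's real layout or API.
seek_asset <- function(root, genome, asset, tag = "default") {
  path <- file.path(root, genome, asset, tag)
  if (!dir.exists(path)) {
    stop(sprintf("asset not found: %s/%s:%s", genome, asset, tag))
  }
  path
}

# Build a toy layout in a temporary directory.
root <- file.path(tempdir(), "genomes")
for (asset in c("bowtie2_index", "tss_annotation")) {
  dir.create(file.path(root, "hg38", asset, "default"),
             recursive = TRUE, showWarnings = FALSE)
}

# One genome name is enough to locate every asset a pipeline needs.
paths <- vapply(c("bowtie2_index", "tss_annotation"),
                function(a) seek_asset(root, "hg38", a),
                character(1))
print(paths)
```

In refgenie, `seek` fills this role for managed assets, so tools don't need to hard-code paths.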