<style> #title { height: 100% !important; display: flex !important; flex-direction: column !important; justify-content: center !important; } </style> <section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow"> # Methods for analyzing non-coding genomic intervals and their applications in cancer biology Nathan Sheffield <div class="bullet"> <img src="/images/external/uva_dgs_logo.svg" height="85"> <img src="/images/logo/logo_databio_long.svg" height="65"> </div> <span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span> </section> --- <!-- .slide: data-background="/images/presentations/bg.svg.png" data-transition-speed="slow" --> ### Outline <style> .previewblock { float: left; width: 20px; height: 45px; margin: 0; border: none; white-space: nowrap; box-sizing: border-box; } .questionblock { float: left; width: 100%; margin: 5px 0; border: 1px solid rgba(255, 255, 255, .2); } </style> <div class="previewblock" style="width:20%">Background, LOLA/MIRA</div> <div class="previewblock" style="width:35%">COCOA</div> <div class="previewblock" style="width:35%">RegionSet2vec</div> <div class="previewblock" style="width:10%">Other projects</div> <div class="previewblock" style="width:20%">|</div> <div class="previewblock" style="width:35%">|</div> <div class="previewblock" style="width:35%">|</div> <div class="previewblock" style="width:10%">|</div> <br clear="all"> <div class="previewblock" style="width:20%; background:#333388">20%</div> <div class="previewblock" style="width:35%; background:#338888">35%</div> <div class="previewblock" style="width:35%; background:#338833">35%</div> <div class="previewblock" style="width:10%; background:#883388">10%</div> <div class="previewblock" style="width:20%"></div> <div class="previewblock" style="width:35%"></div> <div class="previewblock" style="width:35%"></div> <div class="previewblock" style="width:10%"></div> <br clear="all"> <div 
class="previewblock" style="width:20%"></div> <div class="previewblock" style="width:35%"></div> <div class="previewblock" style="width:35%"></div> <div class="previewblock" style="width:10%"></div> <div class="questionblock" style="background:#222; color:#eee; font-size: 0.6em; margin-top: 35px">◁ Questions ▷</div> --- # Biological motivation <div class="col2"> <br><br> <img src="/_modules/bio-motivation-regulatory-dna/dna_folding_diversity.svg" width="400"><br/> Cells alter phenotype by using DNA differently. <br> </div> <div class="col2 fragment"> <img src="/_modules/bio-motivation-regulatory-dna/differentation_gone_awry.svg" width="500"><br/> Breakdowns lead to disease </div> --- <section> <img src="/slides/methods-for-analyzing-non-coding-genomic-intervals-and-cancer/chromatin-generic.svg"> </section> --- <section> <img src="/slides/methods-for-analyzing-non-coding-genomic-intervals-and-cancer/chromatin-generic-experiments.svg"> </section> --- # <span class="bullet"><img src="/_modules/region-pooling/merge.svg" width="50" class="bullet">Region pooling</span> --- <!-- .slide: data-transition="fade-in" -->  --- <!-- .slide: data-transition="fade-in fade-out" -->  --- <!-- .slide: data-transition="fade-in fade-out" -->  --- <img src="/_modules/lola-intro/LOLA-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px"> <br> ### Locus Overlap Analysis <div class="small"> <a href="http://code.databio.org/LOLA/">http://code.databio.org/LOLA/</a><br> </div> <span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Sheffield and Bock (2016). <i>Bioinformatics</i>.</span><br/> <span class="small bullet"><img src="/_modules/lola-intro/paper.svg" height="25" class="bullet">Nagraj, Magee, and Sheffield (2018). 
<i>Nucleic Acids Research</i>.</span> --- ## LOLAweb <img src="/shorts/lola/LOLAweb-logo-white.svg" width="275" style="padding-top:25px; padding-bottom:25px"> A shiny app and server for interactive LOLA analysis. Public server: [http://lolaweb.databio.org](http://lolaweb.databio.org) GitHub: [https://github.com/databio/LOLAweb](https://github.com/databio/LOLAweb) --- ### DEMO <video controls width="800"> <source src="lw.webm" type="video/webm"> Your browser does not support the video tag. </video> --- ## Methylation-based Inference of Regulatory Activity (MIRA) <div class="small"> <a href="http://code.databio.org/MIRA/">http://code.databio.org/MIRA/</a><br> </div> <span class="small bullet"><img src="/_modules/mira/icons/paper.svg" height="25" class="bullet">Lawson et al. (2018). 
<i>Bioinformatics</i>.</span> --- ## MIRA concept <img src="/_modules/mira/mira.svg" /> --- ## MIRA workflow <img src="/_modules/mira/mira2.svg" /> --- ## MIRA analysis <img src="/_modules/mira/mira3.svg" /> --- ## MIRA results: Differential activity <img src="/_modules/mira/ews_pat/MIRA_result_1.svg" width="700"/> <span class="small bullet"><img src="/_modules/mira/icons/paper.svg" height="25" class="bullet">Sheffield et al. (2017). <i>Nature Medicine</i>.</span> --- ## MIRA results: Activity scores <img src="/_modules/mira/ews_pat/MIRA_result_2.svg" /> --- ## MIRA results: Enrichment analysis <img src="/_modules/mira/ews_pat/MIRA_result_3.svg" width="800"/> --- ## Coordinate Covariation Analysis (COCOA) <img src="/_modules/cocoa/cocoa_logo_light.svg" width="225" style="padding-top:25px; padding-bottom:25px"> <br> <div class="small"> <a href="http://code.databio.org/COCOA/">http://code.databio.org/COCOA/</a><br> </div> <span class="small bullet"><img src="/_modules/genomicdistributions/paper.svg" height="25" class="bullet">Lawson et al. (2020). 
<i>Genome Biology</i>.</span> --- @cocoa --- ## Acknowledgments <div class="col3" style="font-size:.6em"> <img src="/slides/cocoa/University_of_Virginia_Rotunda_logo.svg" height="40"><img src="/slides/cocoa/University_of_Virginia_logo_white.svg" height="40"> **Collaborators** - Fran Garrett-Bakelman - Stefan Bekiranov </div> <div class="col3" style="font-size:.6em"> **Sheffield lab** - **John Lawson** - **Jason Smith** - Jianglin Feng - Michal Stolarczyk - Kristyna Kupkova - Aaron Gu - Jose Verdezoto - Tessa Danehy </div> <div class="col3" style="font-size:.6em"> **Funding:** <img src="/slides/cocoa/University_of_Virginia_logo_white.svg" height="40"> UVA 
Cancer Center <img src="/slides/cocoa/NIH_logo_black.svg" height="80"> </div> --- ### Region-set 2 Vec Embeddings of genomic region sets <br> in lower dimensions. <div class="small"> <a href="https://github.com/databio/regionset-embedding">https://github.com/databio/regionset-embedding</a><br> </div> <span class="small bullet"><img src="/_modules/regionset2vec/paper.svg" height="25" class="bullet"><a href="https://doi.org/10.1093/bioinformatics/btab439">Gharavi et al. (2021). <i>Bioinformatics</i>.</a></span> <br> <div style="padding:12px; font-size: 16pt; display:inline-block;"> <span style="border: 0px solid grey; float:right; margin: 0px 4px; padding: 0px 4px"> <img src="/_modules/regionset2vec-extension/Erfaneh.jpg" width="100" style="margin:0px;"> <br>Erfaneh Gharavi </span> </div> --- ### Word embeddings <img src="/_modules/regionset2vec/word-vector-space-similar-words.jpg" width="680"> <div class="small">http://suriyadeepan.github.io</div> --- ### Word2vec model <img src="/_modules/regionset2vec/mikolov2013_fig1.png" width="680"> <br><span class="small bullet"><img src="/_modules/regionset2vec/paper.svg" height="25" class="bullet"><a href="https://arxiv.org/abs/1301.3781">Mikolov et al. (2013). <i>arXiv:1301.3781v3</i>.</a></span> --- ### Word context <img src="/_modules/regionset2vec/word-context.png" width="640" style="background:white"> <div class="well"> You shall know a word by the company it keeps. (Firth 1957)<br> Words that occur in similar contexts tend to have similar meanings. 
</div> <div class="small">Image credit: Shubham Agarwal</div> --- ### Genomic context <div class="well"> A genomic interval is more likely to appear in a BED file with other genomic intervals of a similar function. </div> --- <img src="/_modules/regionset2vec/complexity-scale.svg" width="1040"> --- ### Genomic Interval Embeddings <img src="/_modules/regionset2vec/method_detail_v3.svg" width="1040" style="background:white"> --- ### Evaluation We have created unsupervised 100-dimensional vector representations (embeddings) of region sets.<br> Do relationships among vectors reflect biology? <div class="fragment"> <img src="/_modules/regionset2vec/method_overview_v3.svg" width="1040" style="background:white"> </div> --- ## Evaluation 1: Classification performance <img src="/_modules/regionset2vec/evaluation-classification-result.svg" width="740"> --- ## Evaluation 1: Classification performance <img src="/_modules/regionset2vec/evaluation-classification-result-2.svg" width="740"> --- ### Evaluation 1: Classification performance <img src="/_modules/regionset2vec/umap_classification.svg" width="740" style="background:white"> <div class="fragment"> <img src="/_modules/regionset2vec/umap_classification2.svg" width="740" style="background:white"> </div> --- ### Conclusion <ul> <li>Regionset2vec adapts word2vec to learn genomic region embeddings</li> <li>Regionset2vec embeddings capture biological information</li> <li>NLP approaches can be adapted for applications in genomic interval analysis</li> </ul> --- <section> ### Future applications <div class="col2">Cancer mutations <img src="/slides/methods-for-analyzing-non-coding-genomic-intervals-and-cancer/nlp/mutation_maf.svg" style="background:white"> </div> <div class="col2">Single-cell <img src="/slides/methods-for-analyzing-non-coding-genomic-intervals-and-cancer/nlp/single-cell.svg" style="background:white"> </div> </section> --- ### BEDbase A high-performance server and API <br> for genomic interval data. 
<br><span class="small bullet"><img src="/_modules/bedbase-teaser/paper.svg" height="25" class="bullet"><a href="https://bedbase.org">bedbase.org</a></span> --- <div class="col2"> <img src="/_modules/refgenie-teaser/refgenie_logo_light.svg" style="padding-top:25px; padding-bottom:25px; width: 350px"> <br> Reference genome manager <div class="small"> <a href="http://refgenie.databio.org">http://refgenie.databio.org</a><br> </div> <span class="small bullet"><img src="/_modules/refgenie-teaser/paper.svg" height="25" class="bullet"><a href="https://www.biorxiv.org/content/10.1093/gigascience/giz149">Stolarczyk et al. (2020).</a> <i>GigaScience</i>.</span><br/> <span class="small bullet"><img src="/_modules/refgenie-teaser/paper.svg" height="25" class="bullet"><a href="https://doi.org/10.1093/nargab/lqab036">Stolarczyk, Xue, and Sheffield (2021).</a> <i>NAR Genomics and Bioinformatics</i>.</span><br/> </div> <div class="col2"> <img src="/_modules/refgenie-teaser/refgenie_interfaces.svg" style="background:white" width="550"> <pre><code>refgenie pull hg38/bowtie2_index</code></pre> </div> --- <div> ### <span class="bullet"><img src="/_modules/pep/pep_logo_white.svg" width="60"> Portable Encapsulated Projects (PEP)</span> A structure and toolkit for organizing large-scale, <br> sample-intensive biological research projects<br> </div> <span class="small bullet"><img src="/_modules/genomicdistributions/paper.svg" height="25" class="bullet"><a href="http://dx.doi.org/10.1093/gigascience/giab077">Sheffield et al. (2021).</a> <i>GigaScience</i>. 
<a href="http://pep.databio.org/">http://pep.databio.org/</a></span> <img src="/_modules/pep/pep_workflow.svg" width="550" style="align:center"><br/> --- <style> #acknowledgements { height: 100% !important; display: flex !important; flex-direction: column !important; justify-content: center !important; } </style> <section id="acknowledgements" data-background="/images/presentations/bg.svg.png"> # Thank You <br clear="all"/> <span class="small bullet"><img src="/images/external/github_bug_black.svg" height="20" class="bullet"><a href="http://github.com/nsheff">nsheff</a></span> · <span class="small bullet"><img src="/images/icons/web.svg" height="25" class="bullet"><a href="http://databio.org">databio.org</a></span> · <span class="small bullet"><img src="/images/icons/letter.svg" height="25" class="bullet"><a href="mailto:nsheffield@virginia.edu">nsheffield@virginia.edu</a></span> <div class="bullet" style="background-color:rgb(45,45,45,.65); border-radius: 25px; opacity:0.9"> <img src="/images/external/uva_dgs_logo.svg" height="65"> <img src="/images/logo/logo_databio_long.svg" height="45"> </div> </section>
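
---

### Backup: genomic context, sketched

The "Genomic context" slide says that intervals of similar function tend to appear in the same BED files. Below is a minimal sketch of that intuition in Python, using only the standard library. It is not the Regionset2vec method, which adapts word2vec; as a stand-in it uses a plain file-membership profile per region and cosine similarity. The file names, region IDs, and helper names (`region_profiles`, `cosine`) are hypothetical toy choices made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: each "BED file" is reduced to a set of region IDs.
bed_files = {
    "tf_a_chipseq": {"r1", "r2", "r3"},
    "tf_a_replicate": {"r1", "r2", "r3", "r4"},
    "tf_b_chipseq": {"r5", "r6"},
    "tf_b_replicate": {"r5", "r6", "r4"},
}


def region_profiles(files):
    """Map each region to the set of files that contain it (its 'context')."""
    profiles = defaultdict(set)
    for name, regions in files.items():
        for region in regions:
            profiles[region].add(name)
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two binary file-membership profiles."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


profiles = region_profiles(bed_files)
sim_same = cosine(profiles["r1"], profiles["r2"])  # always found in the same files
sim_diff = cosine(profiles["r1"], profiles["r5"])  # never found in the same file
print(sim_same, sim_diff)
```

In this toy setup, regions that always share files score 1 and regions that never co-occur score 0. A learned embedding replaces these sparse profiles with dense vectors, but the signal it draws on is the same co-occurrence.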