<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Coordinate covariation analysis

John Lawson, Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

## Coordinate Covariation Analysis (COCOA)

<a href="http://code.databio.org/COCOA/">http://code.databio.org/COCOA/</a>

<span class="small bullet"><img src="/images/icons/paper.svg" height="25" class="bullet">Lawson et al. (2020). <i>Genome Biology</i>.</span>

---

#### We want to understand variation between individuals.

<div class="col3">
<img src="/_modules/cocoa-intro/differential-variation.svg">
<br>
Differential analysis.
</div>

<div class="col3 fragment">
<img src="/_modules/cocoa-intro/continuous-variation.svg">
<br>
Continuous variation.
</div>

<div class="col3 fragment">
<img src="/_modules/cocoa-intro/unsupervised-variation.svg">
<br>
Unsupervised analysis.
</div>

<br clear="all">

<div class="fragment" style="text-align: left;">
COCOA focuses on continuous variation and doesn't require known groups.
</div>

---

#### DNA methylation is high-dimensional and hard to interpret

![](/_modules/cocoa-intro/highdim-lowint.svg)

Can we use region sets to biologically annotate the source of that variation?

---

#### Big picture steps:

1. Quantify variation with PCA.
2. Annotate PCs with region sets.

---

#### COCOA workflow

![](/_modules/cocoa-intro/cocoa-workflow.svg)

---

#### PCA for breast cancer data from TCGA

<div style="float:left">

![](/_modules/cocoa-intro/pca-brca.svg)

</div>

---

#### Top hits

1. GATA3
2. H3R17me2
3. ER
4. FOXA1
5. AR

- FOXA1 is a key determinant of estrogen receptor function and endocrine response.
- GATA-3 expression in breast cancer has a strong association with estrogen receptor.
- Upon estrogen stimulation, the E2F1 promoter is subject to H3R17me2...

---

# PC1 does indeed split samples by ER status

![](/_modules/cocoa-intro/pc1-er-status.png)

---

#### Raw DNA methylation in ER binding regions

![](/_modules/cocoa-intro/raw-data-er.png)

---

#### Rank distribution for ER-related regions

![](/_modules/cocoa-intro/rank-distro.png)

ER-related regions have higher loadings on PC1.

---

#### But is the variation specific to the binding sites?

![](/_modules/cocoa-intro/meta-region-profiles.png)

A peak suggests specificity to that genomic locus.

---

#### Conclusions

- COCOA provides a method to understand continuous regulatory variation.
- Availability: Bioconductor, http://code.databio.org/COCOA/
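
---

#### Sketch: scoring region sets on a PC

The two big-picture steps (quantify variation with PCA, then annotate PCs with region sets) can be illustrated with a toy example. This is a minimal sketch in Python with NumPy, not the COCOA package's implementation or API: the function names, the simulated data, the single-chromosome coordinates, and the choice of mean absolute loading as the region set score are all assumptions made for illustration.

```python
import numpy as np

def pca_loadings(methylation, n_components=2):
    """PCA via SVD on a samples x CpGs matrix (hypothetical helper)."""
    centered = methylation - methylation.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # samples x PCs
    loadings = vt[:n_components].T                    # CpGs x PCs
    return scores, loadings

def score_region_set(loadings, cpg_positions, regions, component=0):
    """Mean absolute loading of CpGs inside any (start, end) region.

    Assumed scoring rule for illustration; all coordinates are on one chromosome.
    """
    in_set = np.zeros(len(cpg_positions), dtype=bool)
    for start, end in regions:
        in_set |= (cpg_positions >= start) & (cpg_positions < end)
    if not in_set.any():
        return float("nan")
    return float(np.abs(loadings[in_set, component]).mean())

# Simulated data: 20 samples, 100 CpGs spaced 10 bp apart.
rng = np.random.default_rng(0)
n_samples, n_cpgs = 20, 100
cpg_positions = np.arange(n_cpgs) * 10
latent = np.linspace(-1.0, 1.0, n_samples)            # continuous source of variation
methylation = 0.5 + rng.normal(0, 0.02, size=(n_samples, n_cpgs))
methylation[:, 40:60] += 0.3 * latent[:, None]        # CpGs at 400-590 bp covary

scores, loadings = pca_loadings(methylation)

# Two hypothetical region sets: one overlapping the covarying CpGs, one not.
region_sets = {"covarying": [(400, 600)], "background": [(0, 200)]}
set_scores = {name: score_region_set(loadings, cpg_positions, regions)
              for name, regions in region_sets.items()}
ranked = sorted(set_scores, key=set_scores.get, reverse=True)
print(ranked, set_scores)
```

In this simulation the region set overlapping the covarying CpGs ranks first on PC1, mirroring the idea that ER-related regions have higher loadings on PC1. Real data would also need a significance assessment, which this sketch omits.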