<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Using Docker Bioconductor containers for portable, sharable, and reproducible R analysis and package development

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

## Outline

<div class='col3'>
<img src="/slides/docker-bioc/docker_large_v-dark-trans.png" width="125" height="125"><br>
Docker basics
</div>

<div class='col3'>
<object data="bioconductor_logo_grey.svg" type="image/svg+xml" width="125" height="125">Browser does not support SVG</object><br><br>
Bioconductor containers
<div class="small"><a href="#/bioconductor">Jump there</a></div>
</div>

<div class='col3'>
<object data="computer.svg" type="image/svg+xml" width="125" height="125">Browser does not support SVG</object><br><br>
Use Cases and Demo
</div>

---

## What is Docker?

<div class="well">
Docker allows you to package an application <em>with all of its dependencies</em> into a standardized unit ... that <em>contains everything it needs to run</em>: code, runtime, system tools, system libraries ... This guarantees that it will <em>always run the same</em>, regardless of the environment
</div>

<div class="small" style="text-align:right;">From <a href="https://www.docker.com/whatisdocker">docker.com</a></div>

---

## Sounds like a virtual machine?
Docker vs Virtual Machines<br>

<div class="col2">
<img src="/slides/docker-bioc/virtualization-icon-300px.png" width="175" height="175"><br>
<div style="font-size:.7em; padding:10px">
Virtual machines include the application, the necessary binaries and libraries and an <em>entire guest operating system</em> - which may be tens of GBs in size.
</div></div>

<div class="col2">
<img src="/slides/docker-bioc/layered_filesystems_sm.png" width="175" height="175"><br>
<div style="font-size:.7em; padding:10px">
Containers include the application and all of its dependencies, but <em>share the kernel</em> with other containers. They run as an isolated process in userspace on the host operating system.
</div></div>

<div class="small" style="text-align:right;">From <a href="https://www.docker.com/whatisdocker">docker.com</a></div>

---

## How is Docker useful?

<div style="vertical-align:middle; text-align:left">
<img src="/slides/docker-bioc/yes.png" width="40" style="vertical-align:middle; padding-right:10px">Version controlled environments
</div>

<div style="vertical-align:middle; text-align:left">
<img src="/slides/docker-bioc/yes.png" width="40" style="vertical-align:middle; padding-right:10px">Increased reproducibility
</div>

<div style="vertical-align:middle; text-align:left">
<img src="/slides/docker-bioc/yes.png" width="40" style="vertical-align:middle; padding-right:10px">Environment sharing and distribution (<a href="http://hub.docker.com">DockerHub</a>)
</div>

---

## Some Terminology

- **Image:** A read-only template for containers. Think "Class"
- **Container:** An instance of an image (it is created from an image). Think "Object" or "Instance"
- **Layer:** An image consists of a series of layers, which are merged in the container.
- **Dockerfile:** Instructions used to build an image.
- **Registry:** An image storage center, holding public or private images which can be uploaded or downloaded (DockerHub).
- **Repository:** A storage area of version-controlled images, like GitHub repositories.

---

## Dockerfiles

Dockerfiles contain instructions for building an image. You can link Dockerfiles on GitHub to DockerHub to trigger auto-builds.

```Docker
FROM bioconductor/devel_core
MAINTAINER Nathan Sheffield <nathan@code.databio.org>

# Updating is required before any apt-gets
RUN sudo apt-get update && apt-get install -y --force-yes \
    # Required for R Package XML
    libxml2-dev \
    # Curl; required for RCurl; but present in upstream images
    # libcurl4-gnutls-dev \
    # GNU Scientific Library; required by MotIV
    libgsl0-dev \
    # Open SSL is used, for example, devtools dependency git2r
    libssl-dev \
    # CMD Check requires to check pdf size
    qpdf

# Boost libraries are helpful for some R packages
RUN sudo apt-get update && apt-get install -y --force-yes \
    libboost-all-dev

COPY Rprofile .Rprofile
COPY Rsetup/install_fonts.R Rsetup/install_fonts.R
COPY Rsetup/fonts Rsetup/fonts
RUN Rscript Rsetup/install_fonts.R

# Install packages
COPY Rsetup/Rsetup.R Rsetup/Rsetup.R
RUN Rscript Rsetup/Rsetup.R
COPY Rsetup/rpack_basic.txt Rsetup/rpack_basic.txt
COPY Rsetup/rpack_bio.txt Rsetup/rpack_bio.txt
RUN Rscript Rsetup/Rsetup.R --packages=Rsetup/rpack_basic.txt
RUN Rscript Rsetup/Rsetup.R --packages=Rsetup/rpack_bio.txt

# If you want to develop R packages on this machine (need BiocCheck):
COPY Rsetup/rpack_biodev.txt Rsetup/rpack_biodev.txt
RUN Rscript Rsetup/Rsetup.R --packages=Rsetup/rpack_biodev.txt

# CMD Check requires to check pdf size
RUN sudo apt-get install -y --force-yes qpdf

# Copy over the stuff in Rpack and add it to path
COPY Rpack/ Rpack/
ENV PATH Rpack:$PATH
```

<div class="small footnote">
You can find some examples in my <a href="http://github.com/sheffien/docker">Dockerfile repository on GitHub</a>
</div>

---

## Some basic commands

```bash
user@host$ docker
```

```bash
Commands:
    build     Build an image from a Dockerfile
    commit    Create a new image from a container's changes
    images    List images
    info      Display system-wide information
    ps        List containers
    pull      Pull an image or a repository from a Docker registry server
    push      Push an image or a repository to a Docker registry server
    rm        Remove one or more containers
    rmi       Remove one or more images
    run       Run a command in a new container

(and lots more)...
```

---

## Docker meets Bioconductor

<img src="/slides/docker-bioc/docker_large_v-dark-trans.png" width="125" height="125">
<object data="bioconductor_logo_grey.svg" type="image/svg+xml" width="125" height="125">Browser does not support SVG</object><br>

Examples of available images:

```bash
bioconductor/release_core
bioconductor/devel_core
bioconductor/release_sequencing
bioconductor/devel_sequencing
```

<div class="small">More information at <a href="https://www.bioconductor.org/help/docker/">Bioconductor's Docker page</a>.</div>

---

## 3 Example Use Cases

1. Containerize R CMD check and BiocCheck
2. Containerize an analysis as a deployable application
3. Maintain a personal/team R container to work from anywhere

---

## Use case 1

An R CMD check container

---

## Rpack.sh Script

<object data="file.svg" type="image/svg+xml" width="25" style="vertical-align: top"></object> Rpack.sh

```sh
#!/bin/bash
# bash is needed for [[ =~ ]] and BASH_REMATCH below
roxygenize.sh -i "$1"
R --no-save <<END
devtools::install_deps("$1");
END
a=$(R CMD build "$1")
echo "Building..$a"
# Get the name of the built tarball
regex="building '(.*)'"
[[ $a =~ $regex ]]
name="${BASH_REMATCH[1]}"
echo "R CMD check $name..."
R CMD check "$name"
echo "R CMD BiocCheck $name..."
R CMD BiocCheck "$name"
```

This script roxygenizes a package, builds it, then runs R CMD check and R CMD BiocCheck on the built tarball.

---

## Container Layers

<object data="rdevel_layers.svg" type="image/svg+xml" width="625">Browser does not support SVG</object><br>

<object data="file.svg" type="image/svg+xml" width="25" style="vertical-align: top"></object> dockrpack.sh (outside the container)

```bash
#!/bin/bash
echo "$1"
# Mount the package directory at the same path inside the container
docker run -it -v "$1:$1" sheffien/rdev bash -c "Rpack.sh $1"
```

---

## Running R CMD check

Now you can run R CMD check and BiocCheck in a container with all requirements, in a single command.

```bash
dockrpack.sh $HOME/code/LOLA
```

```bash
Building...
* checking for file '/home/nsheffield/code/LOLA/DESCRIPTION' ... OK
* preparing 'LOLA':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* looking to see if a 'data/datalist' file should be added
* building 'LOLA_0.99.9.tar.gz'
Built tarball: LOLA_0.99.9.tar.gz
R CMD check LOLA_0.99.9.tar.gz...
* using log directory '//LOLA.Rcheck'
* using R version 3.2.2 (2015-08-14)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file 'LOLA/DESCRIPTION' ... OK
...
R CMD BiocCheck LOLA_0.99.9.tar.gz...
* This is BiocCheck, version 1.5.8.
* BiocCheck is a work in progress. Output and severity of issues may change.
* Installing package...
* Checking for version number mismatch...
...
Summary:
REQUIRED count: 0
RECOMMENDED count: 0
CONSIDERATION count: 4
```

---

## Use case 2

Package up your application to make distribution easy

---

## ENTRYPOINT Configuration

Add an <a href="https://docs.docker.com/v1.8/reference/builder/#entrypoint">ENTRYPOINT</a> to configure a container as an executable.
```Docker
# Dockerfile for sheffien/lola
FROM sheffien/rdev
RUN wget http://big.databio.org/regionDB/LOLACoreCaches_latest.tgz
RUN tar -xf LOLACoreCaches_latest.tgz
RUN wget http://big.databio.org/regionDB/lola_vignette_data_150505.tgz
RUN tar -xf lola_vignette_data_150505.tgz
COPY LOLA bin/LOLA
ENTRYPOINT ["LOLA", "-d", "LOLACore/hg19", "-u", "data/activeDHS_universe.bed"]
```

Any arguments placed after the image name in `docker run` are appended to the ENTRYPOINT command, like so:

```bash
docker run -v $HOME:/data sheffien/lola -i /data/setA_100.bed -o /data
```

We're running a Bioconductor package in a <em>portable, version-controlled, and self-contained environment</em> (!)

---

## Use case 3

Switch your R production environment to a container

---

## Two Approaches

There are two ways to do this:

<br clear="all"/>

<div class="col2">
<div style="font-size:.7em; padding:10px">
<object data="file.svg" type="image/svg+xml" height="35" style="vertical-align: middle"></object> 1. <em>Use a Dockerfile</em><br>
Rebuild the container with each Dockerfile update.
</div>
</div>

<div class="col2">
<div style="font-size:.7em; padding:10px">
<object data="cloud.svg" type="image/svg+xml" height="35" style="vertical-align: middle"></object> 2. <em>Commit changes GitHub-style</em> <br>
Push interactive changes to DockerHub.
</div>
</div>

<br clear="all">

<div style="font-size:.7em; padding:10px">
<img src="/slides/docker-bioc/no.png" width="40" style="vertical-align:middle; padding-right:10px"> Both require your production compute environment to allow running Docker
</div>

---

## DEMO

---

## Try it!

```bash
# Grab the latest Bioc devel image (may take a while)
docker pull bioconductor/devel_base
# Create and start a container running R (starts instantly!)
docker run --name myR -it bioconductor/devel_base R --save --restore
```

Now, from inside R in the container:

```r
# Install some new packages, change the environment
> install.packages("data.table")
> biocLite("LOLA")
> variable = 12345
```

```bash
# Now, exit (Ctrl+D) and view the containers (-n 5 lists the 5 most recent, including stopped ones)
docker ps -n 5
# Start it up again and see your changes
docker start -i myR
# Commit and share!
docker commit -m "Added LOLA" myR sheffien/newrepo
docker images
docker push sheffien/newrepo
```

---

## Thanks for listening!
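
---

## Backup: the tarball-name step on its own

The tarball-name step in Rpack.sh (use case 1) relies on bash-only `[[ =~ ]]` matching. Below is a minimal sketch of the same step using `sed` instead, so it also runs under plain `sh`. The helper name `tarball_name` is hypothetical and not part of the original script, and the pattern assumes `R CMD build` prints ASCII single quotes, as in the log shown for use case 1.

```sh
# Hypothetical helper, not part of the original Rpack.sh.
# Reads `R CMD build` output on stdin and prints the tarball name, if any.
tarball_name() {
  # Assumption: the build log has a line of the form
  #   * building 'NAME.tar.gz'
  # with ASCII single quotes around the file name.
  sed -n "s/^\* building '\([^']*\)'.*/\1/p"
}

# Two sample lines copied from the build log in use case 1.
printf '%s\n' "* preparing 'LOLA':" "* building 'LOLA_0.99.9.tar.gz'" | tarball_name
```

In Rpack.sh this could stand in for the regex lines as `name=$(echo "$a" | tarball_name)`.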