<style>
#title {
  height: 100% !important;
  display: flex !important;
  flex-direction: column !important;
  justify-content: center !important;
}
</style>

<section id="title" data-background="/images/presentations/bg.svg.png" data-transition-speed="slow">

# Collaborative software development

Nathan Sheffield

<div class="bullet">
<img src="/images/external/uva_dgs_logo.svg" height="85">
<img src="/images/logo/logo_databio_long.svg" height="65">
</div>

<span style="font-size:0.6em"><a href="http://www.databio.org/slides">www.databio.org/slides</a></span>

</section>

---

## The three levels of collaboration

0- None
1- One-way communication
2- Conferencing
3- Coordination

---

## Why collaborate on software?

<div class="col2">
<img src="/slides/collaborative-software-development/afgan.png" width=400 style="margin:0px; padding:0px">
<img src="/slides/collaborative-software-development/morin.png" width=400 style="margin:0px; padding:0px">
<img src="/slides/collaborative-software-development/nature.png" width=400 style="margin:0px; padding:0px">
<img src="/slides/collaborative-software-development/groen.png" width=400 style="margin:0px; padding:0px">
</div>
<div class="col2">
<img src="/slides/collaborative-software-development/garijo.png" width=400 style="margin:0px; padding:0px">
<img src="/slides/collaborative-software-development/nbt.png" width=400 style="margin:0px; padding:0px">
<img src="/slides/collaborative-software-development/stodden.png" width=400 style="margin:0px; padding:0px">
</div>

---

## Why collaborate on software?

Because collective progress increases with increased collaboration.

<img src="/slides/collaborative-software-development/increase.svg" height=400>

---

## But I don't develop software!

<img src="/slides/collaborative-software-development/development.svg" height="400">

<h2 class="fragment">Yes you do. <br> Data analysis is software development</h2>

---

## Levels of collaboration

---

### 0. None

<img src="/slides/collaborative-software-development/none.svg" height=100>

I write and use code for my project.

---

### 1. One-way Communication.

<img src="/slides/collaborative-software-development/communication.svg" height=100>

I give you my script and you run it.

Analogy: TV

---

### 2. Conferencing.

<img src="/slides/collaborative-software-development/conferencing2.svg" height=200>

Interactive work toward a shared goal; collecting bug reports and user feedback.

Analogy: Brainstorming conference call.

---

### 3. Coordination.

<img src="/slides/collaborative-software-development/coordination.svg" height=200>

Interdependent work toward a shared goal.

Analogy: a sports team. Everyone contributes, adjusts to others, and does something different.

---

### How do we move toward coordination?

<div class="bullet">0- None <img src="/slides/collaborative-software-development/none.svg" height="50"></div>
<div class="bullet">1- One-way communication <img src="/slides/collaborative-software-development/communication.svg" height="50"> </div>
<div class="bullet">2- Conferencing <img src="/slides/collaborative-software-development/conferencing2.svg" height="100"></div>
<div class="bullet">3- Coordination <img src="/slides/collaborative-software-development/coordination.svg" height="100"></div>

---

<img src="/slides/collaborative-software-development/git_logo_white.svg" height="400">
<img src="/slides/collaborative-software-development/github_bug_black.svg" height="400">

---

<div class="col2">
Git<br>
<img src="/slides/collaborative-software-development/git_logo_white.svg" height="100"><br>
a distributed version-control system that tracks changes in software development
<br><br>
<ul>
<li>created by Linus Torvalds in 2005 for development of the Linux kernel</li>
<li>free and open-source (GPLv2)</li>
</ul>
</div>
<div class="col2">
GitHub<br>
<img src="/slides/collaborative-software-development/github_bug_black.svg" height="100"><br>
a web-based hosting service for version
control using Git <br>
<br><br>
<ul>
<li>company started Feb. 2008</li>
<li>purchased by Microsoft for $7.5 billion in 2018</li>
</ul>
</div>

---

## git/github ecosystem

### version control

[centralized vs distributed](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
[git vs svn](https://trends.google.com/trends/explore?date=all&geo=US&q=git,svn)

### distribution

[the octoverse](https://octoverse.github.com/)

### collaboration

[dashboard](https://github.com/orgs/databio/dashboard)

---

## Git solves problems

### Version control

---

### Problem 1

#### My computer crashed and I lost all my code.

Solution: Remote backup (S3?)
*or* git + GitHub

---

### Problem 2

#### I want to work on my code from my home and work computers

Solution: Remote working copy (Dropbox?)
*or* git + GitHub

---

### Problem 3

#### My changes broke this function and I can't remember how it used to work.

Solution: Manual version control: "code1.R" and "code2.R"?
*or* git + GitHub

---

### Problem 4

#### I can't remember what code I used on this sample last year. Or, I want to note this particular version because I used it for the initial paper submission.

Solution: Version control + unstructured notes/logs?
*or* git + GitHub tags

---

### Problem 5

#### My remote backup crashed and I lost all my history.

Solution: More remote backups (*distributed* VCS)?
*or* git + GitHub

---

## Git solves problems

### Distribution

---

### Problem 1

#### I want to publish my code with my paper so others can find and use it. How should I do it?

Solution: Website?
*or* git + GitHub

---

### Problem 2

#### How can I get a permanent, fast URL for my software so I can build an automated container that will download and install it automatically?

Solution: A high-quality code hosting service?
*or* git + GitHub

---

### Problem 3

#### I'd like other people to be able to find and use my code. How can I advertise it?

Solution: Google adwords?
*or* git + GitHub

---

### Problem 4

#### How can I find software that people actually use that's relevant for my project?

Solution: Google?
*or* git + GitHub

---

## Git solves problems

### Collaboration

---

### Problem 1

#### Someone else found a bug in my code and wants to show me how to fix it.

Solution: E-mail?
*or* User submits a pull request on GitHub. You can also [point to specific lines](https://github.com/databio/pypiper/blob/653216887cb2b2ad8e9119b76f40b39da58ec115/pypiper/ngstk.py#L72-L75).

---

### Problem 2

#### My friend and I are working on a similar problem. How can we share our code with one another, but not with anyone else?

Solution: E-mail? Dropbox?
*or* GitHub collaborators or organizations

---

### Problem 3

#### My collaborator wants to keep using my code for this current project while I develop and test a new feature.

Solution: Duplicate the code?
*or* git branches + GitHub

---

### Problem 4

#### A user is having trouble getting something to work. How do they know who I am and how to contact me?

Solution: An e-mail address on a web page?
*or* git + public GitHub issues

---

### Problem 5

#### I figured out how to adapt this published tool to work for my data. How can I contribute these changes back to the original authors?

Solution: E-mail?
*or* git + GitHub pull request

---

### Problem 6

#### Everyone in our lab/center needs to do a similar thing over and over, with slight differences. How can we share effort but also keep things separate?

Solution: Lots of duplicated scripts with minor tweaks?
*or* git + GitHub branches and tags

---

## Key git/github concepts

### repository *vs* remote

### branch *vs* clone

### clone *vs* fork

### pull request *vs* merge

### commit *vs* push

### issue, tag, [stage](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository)

---

## How git works

### And things to avoid

---

### Do: commit text files

Git uses line-by-line comparison.
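
A minimal sketch of what line-by-line comparison means in practice. Python's standard `difflib` module stands in for git's own diff machinery here (an assumption made for illustration, not a description of how git is implemented), and the file contents and the helper name `line_diff` are hypothetical.

```python
import difflib

# Two hypothetical versions of a small text file, already split into lines.
old = ["sample: frog_1", "protocol: ATAC", "genome: hg19"]
new = ["sample: frog_1", "protocol: ATAC", "genome: hg38"]


def line_diff(before, after):
    """Return unified-diff lines describing how `before` became `after`."""
    return list(
        difflib.unified_diff(before, after, fromfile="old", tofile="new", lineterm="")
    )


# Unchanged lines start with a space, removed lines with "-", added lines with "+".
for line in line_diff(old, new):
    print(line)
```

Only the one line that changed is reported as removed and re-added; the other two appear as unchanged context. A binary file has no meaningful lines to compare, so a diff of this kind tells a reader nothing useful.
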
See this [pull request on the `peppy` repository](https://github.com/pepkit/peppy/pull/238/files)

### Don't: commit binary files

---

### Do: commit small versioned files

Git retains a copy of everything you've committed, even if you delete it.

### Don't: commit large static files

---

### Do: make commits frequently

Nearly anything can be undone. Frequent commits help you track your work.

### Don't: be scared to break something

---

### Do: learn to use branches

[Branches](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) are a super useful organizational structure.

### Don't: be scared of using branches

---

### Do: use the command line

Write your own [aliases](https://github.com/nsheff/env/blob/master/alias_git.sh) for commands you use frequently.

### Don't: just rely on the web interface

---

### Do: use the issue tracker

Every project can enable a GitHub issue tracker, which links nicely to code.

### Don't: use e-mail to document problems and solutions

---

### Other niceties

- *GitHub Pages*: free hosting for static web pages
- *Jekyll*: the blog-aware static site generator built into GitHub Pages
- *Git hooks*: execute scripts before or after events
- *GitHub Wiki*: a no-frills wiki on every repository
- *GitHub project tracker*: integrates a simple kanban system
- *GitHub API*: provides programmatic access
- *Gists*: small code snippets
- Free private repositories for individuals
- Free private repositories for academic groups

---

### Git's utility transcends software

- analytical code, not just tools
- VCS/collaboration for writing grants, papers, CV/biosketch
- VCS and host for lab web page and all code documentation
- citation management database
- shared lab instructions
- environments: modulefiles, Dockerfiles, config files
- a shared figure repository for lab members
- presentations
- communicating with groups of people, brainstorming

---

### Git is a single infrastructure that provides solutions to a huge number of problems

---

[peppy repository](https://github.com/pepkit/peppy)

---