Posts

Musings about science, technology, biology, and computing.

Recently, I needed to run a process on an HPC cluster that required a secret, but for security reasons I wanted to avoid storing my private key as a file on the cluster. Instead, I looked for a way to decrypt an encrypted secret on the cluster while keeping my private key on my local machine. A great solution for this is GPG agent forwarding, which allows a remote machine to use a local GPG agent to decrypt secrets. This worked well when I could log into a single head node, but it broke when my HPC cluster implemented a load balancer that assigned me to a random node each time I logged in. The typical approach of deleting the existing agent socket and reconnecting became unreliable. This post explains the problem in detail, walks through several failed solutions, and presents the working method I found to maintain secure GPG agent forwarding even when connecting through a randomized load balancer.

I use pandoc to convert Markdown into PDF. Using LaTeX templates, I can produce nice-looking PDF output that, in my opinion, looks as good as a professional publication. But one challenge I've struggled with is how to handle supplementary citations. Here, I outline the problem and describe a Lua filter I wrote to solve it.

Video conferencing is on the rise, but people are still learning how to use it. It's convenient in many ways, but also comes with some new challenges that make it harder to communicate than talking face-to-face. In this post I share some specific things I've learned about how to improve communication and reduce fatigue when using video technology.

What's next for the revolution in scientific publishing? The scientific publishing industry is facing challenges to pillars of tradition like peer review and journal subscription. In the face of these challenges, the traditional roles of scientific publishers are deteriorating. New tools are now putting scientific publication within reach of the masses, leading to an explosion of new journals and funding models. As the material cost to publish a scientific paper approaches zero, what stops a scientist from simply self-publishing research? Here, I present my experience building a basic self-publishing system and my perspective on the changing modes of scholarly publication.

Success is nonrandom

By Nathan Sheffield

A philosophical exploration of Darwin's 'survival of the fittest,' arguing that fitness isn't merely about leaving offspring, but represents the nonrandomness of success—challenging Waddington's claim that natural selection is a tautology.

Parallelism in bioinformatics

By Nathan Sheffield

Many common bioinformatic tasks are embarrassingly parallel, and there are many ways to parallelize. The way you decide to parallelize will affect both performance and developer cost, and choosing the best way in practice depends on the specifics of the project. Implementing parallelism in a project usually requires significant developer resources and, in my opinion, these resources are unfortunately not always well spent. This post distills my experience with parallelism in bioinformatics into a few useful concepts to guide how to employ it to the greatest benefit in terms of both computational efficiency and respect for developer time.

On content and style: the beauty of markdown

By Nathan Sheffield

How separating content from style—a fundamental principle in CSS—can improve scientific productivity through markdown, enabling portable, reusable content across papers, presentations, CVs, and web pages.

Stop writing shell scripts!

By Nathan Sheffield

Why shell scripts are great for simple tasks but become unmaintainable for complex programs—and why Python is almost always the better choice for anything requiring functions or control structures.

The default path for downloading SRA data

By Nathan Sheffield

How to configure sratoolkit to download SRA files to a shared filesystem instead of your home directory, and how to manage the large temporary .sra files that accumulate over time.

Using git's push-to-deploy feature to push code changes directly to a server's non-bare repository, automatically updating working files—perfect for deploying websites and maintaining read-only code repositories on remote servers.

Open software and data freedom

By Nathan C. Sheffield

A personal essay on why open source software matters, not for cost savings but for data freedom and control, illustrated through the author's experience with proprietary finance software versus GnuCash.

A lightweight Python class that converts nested dictionaries into objects with attribute-style access (object.value instead of object['value']), perfect for working with YAML configuration files.
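
As a rough illustration of the idea, and not the class from the post itself, attribute-style access to a nested dictionary can be sketched as follows. The name `AttrDict` and the sample configuration keys are hypothetical, chosen only for this example.

```python
class AttrDict:
    """Hypothetical helper: expose a nested dict through attribute access."""

    def __init__(self, entries):
        # Copy each key onto the instance, wrapping nested structures.
        for key, value in entries.items():
            setattr(self, key, self._wrap(value))

    @classmethod
    def _wrap(cls, value):
        # Nested dicts become AttrDict objects; lists are wrapped item by item.
        if isinstance(value, dict):
            return cls(value)
        if isinstance(value, list):
            return [cls._wrap(item) for item in value]
        return value


# Example input resembling a parsed YAML config (contents are made up).
config = AttrDict({"genome": {"name": "hg38", "chroms": ["chr1", "chr2"]}, "threads": 4})
print(config.genome.name)  # attribute access instead of config["genome"]["name"]
print(config.threads)
```

This sketch only works for keys that are valid Python identifiers; a fuller implementation would also need to decide how to handle other keys.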

Modular presentations with reveal.js and Jekyll

By Nathan Sheffield

A guide to creating modular, web-shareable presentations using reveal.js and Jekyll, combining the power of HTML presentations with templating engines for reusable components and consistent styling across multiple presentations.

Running R CMD check in Docker

By Nathan Sheffield

Simplify R package testing with Docker containers. Run R CMD check and BiocCheck without installing dependencies using a pre-configured Docker image with all necessary prerequisites.

Managing citations

By Nathan Sheffield

Exploring text-based citation management with JabRef and leveraging DOI-based tools like CrossRef's citation formatter for a more stable, programmatic approach to managing academic references.

Saving docker images locally

By Nathan Sheffield

Quick guide on how to save Docker images as tar archives and transfer them between computers without using Docker Hub, using docker save and docker load commands.

Welcome

By Nathan Sheffield

Introducing a new Jekyll-based site for sharing code tutorials, software development, and data resources in computational biology and bioinformatics.

Word Count Line Estimate

By Nathan Sheffield

A fast bash script for estimating line counts in massive files by sampling the first lines and extrapolating based on file size, providing quick approximations without processing entire files.
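
The sampling idea can be sketched in Python, as an illustration under stated assumptions and not the bash script from the post. The function name `estimate_line_count` and the default sample size are hypothetical. The sketch measures the average byte length of the first lines and divides the file size by it.

```python
import os


def estimate_line_count(path, sample_lines=1000):
    """Estimate a file's line count from the byte length of its first lines."""
    total_bytes = os.path.getsize(path)
    sampled_bytes = 0
    sampled = 0
    # Read in binary mode so line lengths are measured in bytes, like the file size.
    with open(path, "rb") as handle:
        for line in handle:
            sampled_bytes += len(line)
            sampled += 1
            if sampled >= sample_lines:
                break
    if sampled == 0:
        return 0  # empty file
    # Extrapolate: total size divided by the average sampled line length.
    return round(total_bytes * sampled / sampled_bytes)
```

The estimate is only as good as the assumption that the first lines are representative; files with long headers or highly variable line lengths will be estimated less accurately.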

What are chromosomal breakpoints?

By Nathan Sheffield

An introduction to chromosomal breakpoints—the locations where chromosomes break and reattach during recombination, and how abnormal reattachment can lead to translocations like the Philadelphia chromosome.

What is allelic imbalance?

By Nathan Sheffield

A clear explanation of allelic imbalance—when the two copies of a gene (maternal and paternal) are expressed at different levels, caused by factors like gene imprinting or cis-acting mutations.