<style> .reveal .slides section .fragment.highlight-skyblue.visible { color: skyblue; } .reveal .slides section .fragment.highlight-gold.visible { color: gold; } .reveal .slides section .fragment.highlight-so.visible { text-decoration: line-through; } .reveal .slides section .fragment.highlight-skyblue, .reveal .slides section .fragment.highlight-so, .reveal .slides section .fragment.highlight-gold { opacity: 1; visibility: inherit; } .reveal .blackwell { min-height:20px;padding:15px;margin-bottom:10px;background-color:#111; border:1px solid #444; font-size:95%; } .reveal h1 { font-size:40pt; } </style> # Clarity: # Strategies for revising scientific writing <img src="/slides/clarity-scientific-writing/time_machine.jpg" height="400"><br/> Tempers flare when Professors Carlson and Lazzell, working independently, ironically set their time machines to identical coordinates. --- ## Let's start with an example --- ## Example 1 What makes this sentence unclear? <div class="blackwell"> The assumptions that all sites evolve at one of two evolutionary rates (conserved and nonconserved), that these rates are uniform across the genome, that sites evolve independently conditional on whether they are in conserved or nonconserved regions, and that the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns, all represent oversimplifications of the complex process of sequence evolution in eukaryotic genomes. 
</div> --- ## Example 1 Distance between subject and verb <div class="blackwell"> <span style="color:gold">The assumptions</span> that all sites evolve at one of two evolutionary rates (conserved and nonconserved), that these rates are uniform across the genome, that sites evolve independently conditional on whether they are in conserved or nonconserved regions, and that the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns, <span style="color:gold">all represent</span> oversimplifications of the complex process of sequence evolution in eukaryotic genomes. </div> --- ## Example 1 Complex subject <div class="blackwell"> <span style="color:skyblue">The assumptions</span> <span style="color:orchid">that all sites evolve at one of two evolutionary rates (conserved and nonconserved)</span>, <span style="color:lightgreen">that these rates are uniform across the genome</span>, <span style="color:cyan">that sites evolve independently conditional on whether they are in conserved or nonconserved regions</span>, <span style="color:orange"> and that the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns</span>, all represent oversimplifications of the complex process of sequence evolution in eukaryotic genomes. </div> --- ## Example 1 <span style="color:lime">verbs</span> vs. 
<span style="color:skyblue">Implied actions</span> (nominalizations) <div class="blackwell"> The <span style="color:skyblue">assumptions</span> that all sites <span style="color:lime">evolve</span> at one of two <span style="color:skyblue">evolutionary</span> rates (<span style="color:skyblue">conserved</span> and <span style="color:skyblue">nonconserved</span>), that these rates <span style="color:lime">are</span> uniform across the genome, that sites evolve independently conditional on whether they <span style="color:lime">are</span> in <span style="color:skyblue">conserved</span> or <span style="color:skyblue">nonconserved</span> regions, and that the phylogenetic <span style="color:skyblue">models</span> for <span style="color:skyblue">conserved</span> and <span style="color:skyblue">nonconserved</span> regions <span style="color:lime">have</span> the same branch-length proportions, base <span style="color:skyblue">compositions</span>, and <span style="color:skyblue">substitution patterns</span>, all <span style="color:lime">represent</span> <span style="color:skyblue">oversimplifications</span> of the complex <span style="color:skyblue">process</span> of <span style="color:skyblue">sequence evolution</span> in eukaryotic genomes. 
</div> --- ## Example 1 <span style="color:gold">List</span> precedes its <span style="color:skyblue">context</span> <div class="blackwell"> <span style="color:gold">The assumptions that all sites evolve at one of two evolutionary rates (conserved and nonconserved), that these rates are uniform across the genome, that sites evolve independently conditional on whether they are in conserved or nonconserved regions, and that the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns,</span> <span style="color:skyblue">all represent oversimplifications of the complex process of sequence evolution in eukaryotic genomes.</span> </div> --- ## Conciseness <br> ↑ ↓ <br> Clarity <br> ↑ ↓ <br> Cohesion --- ## The Four Problems --- ## The Four Problems Things that make scientific writing unclear 1. Subjects and verbs too far apart 2. Overabundance of nominalizations 3. Poor flow (misplacement of old vs new information) 4. Excessive or unnecessary use of passive voice <br><br> **NOT the complexity of the topic!** --- ## The Four Problems <span style="color:gold">Subjects</span> and <span style="color:skyblue">verbs</span> too far apart - <span style="color:gold">Who</span> did it, and <span style="color:skyblue">what</span> did they do? English readers expect <span style="color:gold">doers</span> to be near their <span style="color:skyblue">actions</span>. - Complex subjects (subjects modified with essential clauses) can violate this expectation. 
--- ## The Four Problems <span style="color:gold">Subjects</span> and <span style="color:skyblue">verbs</span> too far apart Complex subject: <div class="blackwell"> <span style="color:gold">Analysis of peak composition of clusters containing more than 20 peaks (large clusters hereafter) to identify a minimal required set to determine the clusters</span> <span style="color:skyblue">identified</span> mediator and cohesin subunits as the best individual features. </div> Simplified subject: <div class="blackwell"> <span style="color:gold">Mediator and cohesin subunits</span> <span style="color:skyblue">were identified</span> as the best individual features by analysis of peak composition of clusters containing more than 20 peaks <span style="color:gray">(large clusters hereafter) to identify a minimal required set to determine the clusters.</span> </div> --- ## The Four Problems Overabundance of nominalizations - English readers expect actions to be in verbs. - Nominalizations are actions that appear in parts of a sentence other than a verb (e.g. in nouns or adjectives). - Some nominalizations are clear, but many reduce clarity. --- ## The Four Problems Overabundance of nominalizations Actions in Nominalizations: <div class="blackwell"> The <span style="color:skyblue">assumption</span> that all RNAs are polyadenylated <span style="color:gold">is</span> an <span style="color:skyblue">oversimplification</span> of the transcription process.</div> Actions in Verbs: <div class="blackwell"> The model <span style="color:skyblue">oversimplifies</span> the transcription process because it <span style="color:skyblue">assumes</span> that all RNAs are polyadenylated.</div> --- ## The Four Problems Poor flow (lack of cohesion) A cohesive sentence <span style="color:lightgreen">links</span> with neighboring sentences by <span style="color:skyblue">starting with familiar ideas</span> and <span style="color:gold">ending with new ideas</span>. 
<br><br> <span style="color:skyblue">old</span> → <span style="color:gold">new</span> <br><br> Disrupt flow by: - Starting with unfamiliar ideas - Ending with backwards-linking ideas <br><br> Cohesion matters at both sentence-level and paragraph-level. --- ## Cohesion visualization <img src="/slides/clarity-scientific-writing/cohesion.svg" height="500"><br/> --- <img src="/slides/clarity-scientific-writing/time_machine.jpg" height="500"><br/> Tempers flare when Professors Carlson and Lazzell, working independently, ironically set their time machines to identical coordinates. Note: - From the opening --- <img src="/slides/clarity-scientific-writing/time_machine.jpg" height="500"><br/> With their time machines ironically set to identical coordinates while working independently, Professors Carlson and Lazzell's tempers flare. Note: - This is not as funny because the ordering is off. - The nugget of hilarity is at the beginning instead of the end. --- ## The Four Problems Poor flow (lack of cohesion) (in a paper about farmers...) <div class="blackwell"> Farmers try to provide optimal growing conditions for crops by using soil additives to adjust soil pH. Garden lime, or agricultural limestone, is made from pulverized chalk, and can be used to raise the pH of the soil. Clay, which is a naturally acidic soil type, often requires addition of agricultural lime.</div> --- ## The Four Problems <span style="color:skyblue">old information</span> vs <span style="color:gold">new information</span> (in a paper about farmers...) <div class="blackwell"><span style="color:skyblue">Farmers</span> try to provide optimal growing conditions for crops by using soil additives to adjust <span style="color:gold">soil pH</span>. <span style="color:gold">Garden lime</span>, or agricultural limestone, is made from pulverized chalk, and can be used to raise the <span style="color:skyblue">pH of the soil</span>. 
<span style="color:gold">Clay</span>, which is a naturally acidic soil type, often requires addition of <span style="color:skyblue">agricultural lime</span>.</div> --- ## The Four Problems <span style="color:skyblue">old information</span> vs <span style="color:gold">new information</span> (in a paper about farmers...) <div class="blackwell"><span style="color:skyblue">Farmers</span> try to provide optimal growing conditions for crops by using soil additives to adjust <span style="color:gold">soil pH</span>. One way to raise the <span style="color:skyblue">pH of the soil</span> is an additive made from pulverized chalk called <span style="color:gold">garden lime</span> or agricultural limestone. <span style="color:skyblue">Agricultural limestone</span> is often added to naturally acidic soils, such as <span style="color:gold">clay</span>.</div> --- ## The Four Problems Excessive or unnecessary use of passive voice <div class="col2"> ### Active <span style="color:skyblue">I</span> stole <span style="color:gold">the money</span> </div> <div class="col2"> ### Passive <span style="color:gold">The money</span> was stolen <span class="fragment fade-out"> by <span style="color:skyblue">me</span></span> </div> <div> Passive voice has side-effects:<br> - It often increases length - It can eliminate the actor (causing ambiguity) - **Reverses the order of the sentence (A-B vs. B-A)** </div> --- ## The Four Problems Excessive or unnecessary use of passive voice - Consider cohesion: Don't choose passive voice simply out of habit. Do choose passive voice when it <span style="color:gold">improves cohesion by putting familiar ideas first.</span> - Most scientific journals encourage authors to use active voice for the sake of clarity, conciseness, and cohesion. - **Passive voice is NOT inherently scientific!** --- ## Passive voice What do the journals say? 
### Science Use active voice when suitable, particularly when necessary for correct syntax (e.g., 'To address this possibility, we constructed a λZap library...,' not 'To address this possibility, a λZap library was constructed...'). --- ## Passive voice What do the journals say? ### Nature Active voice has been Nature policy for as long as I can remember; it is enshrined in our style manual and is specifically recommended to all authors as part of our standard acceptance procedure. However, if an author insists on the passive, we would probably allow it... So you will see papers in Nature in the passive voice, but you can be assured that this is at the author's insistence rather than Nature policy. - Maxine Clarke, editor --- ## Revision techniques --- ## Revision techniques Ways to improve clarity, conciseness, and cohesion - Omit needless words - Put actions in verbs - Use nominalizations to summarize - Place verbs near subjects - Put familiar information first --- ## Revision techniques Omit needless words 1. It is absolutely vital that... <br> → We must... 2. At the same time... <br> → Simultaneously/furthermore... 3. There were five mice receiving antibiotics... <br> → Five mice received antibiotics. --- ## Revision techniques Put actions in verbs 1. We performed an <span style="color:skyblue">analysis</span>... <br> → We <span style="color:skyblue">analyzed</span>... 2. The <span style="color:skyblue">quantification</span> of the atoms was done... <br> → The atoms were <span style="color:skyblue">quantified</span>. 3. The MS managed the <span style="color:skyblue">measurement</span> and <span style="color:skyblue">identification</span> of the proteins. <br> → The MS <span style="color:skyblue">measured</span> and <span style="color:skyblue">identified</span> the proteins. 
--- ## Revision techniques Use summarizing nominalizations Nominalizations are useful when they summarize the action of the previous sentence: <div class="blackwell">Our analysis using regression and k-means clustering revealed...</div> <div class="blackwell"> → We <span style="color:skyblue">analyzed</span> the data with regression and k-means clustering. This <span style="color:skyblue">analysis</span> revealed...</div> --- ## Revision techniques Use summarizing nominalizations Complex subject: <div class="blackwell"> Analysis of peak composition of clusters containing more than 20 peaks (large clusters hereafter) to identify a minimal required set to determine the clusters identified mediator and cohesin subunits as the best individual features. </div> Summarizing nominalization: <div class="blackwell"> For large clusters (containing more than 20 peaks), we identified a minimal required set of peaks that determine the cluster. <span style="color:gold">This analysis</span> <span style="color:skyblue">identified</span> mediator and cohesin subunits as the best individual features. </div> --- ## Revision techniques Place verbs near subjects <div class="blackwell"> <span style="color:skyblue">DNA</span> in repeat regions or small microsatellites or with long stretches of the same base <span style="color:gold">causes</span> problems for next-gen sequencers. </div> <div class="blackwell"> → <span style="color:skyblue">DNA</span> <span style="color:gold">causes</span> problems for next-gen sequencers when it is in repeat regions or small microsatellites or has long stretches of the same base. </div> --- ## Revision techniques Put familiar information first <div class="blackwell">We <span style="color:skyblue">searched</span> the database of sequences to look for similar structures. <span style="color:gold">A protein</span> involved in the regulation of the BRCA1 gene in humans was found by <span style="color:skyblue">the search</span>. 
</div> <div class="blackwell">→ We <span style="color:skyblue">searched</span> the database of sequences to look for similar structures. <span style="color:skyblue">This search</span> found <span style="color:gold">a protein</span> involved in the regulation of the BRCA1 gene in humans. </div> --- ## Now for some practice --- ## Example 2 - What would you do? <div class="blackwell">This component will chiefly involve a description and quantitative analysis of the study's data collection process.</div> <div class="fragment"> <br>We suggest: put actions in verbs <div class="blackwell"> → This component describes and quantitatively analyzes the data collection process.</div> <br><span style="color:lime">✓</span> The sentence is more concise (10 vs 16 words). <br><span style="color:lime">✓</span> The meaning is clearer. </div> --- ## Example 3 - What would you do? <div class="blackwell"> Detailed <span class="fragment highlight-skyblue" data-fragment-index="1">analyses</span> of the <span class="fragment highlight-skyblue" data-fragment-index="1">evolutionary</span> features of <span class="fragment highlight-skyblue" data-fragment-index="1">different</span> types of <span class="fragment highlight-skyblue" data-fragment-index="1">regulatory</span> elements <span class="fragment highlight-gold" data-fragment-index="1">are</span> an important area for future <span class="fragment highlight-skyblue" data-fragment-index="1">research</span>. </div> <div class="fragment" data-fragment-index="1"> ### We suggest: put actions in verbs Consider <span style="color:skyblue">implied actions</span> vs. <span style="color:gold">verb</span>. </div> <div class="fragment"> <div class="blackwell">→ Future research should analyze the evolutionary features of different types of regulatory elements.</div> <span style="color:lightgreen">✓</span> The sentence is more concise (13 vs 19 words). <br><span style="color:lightgreen">✓</span> The subject is clearer. 
<br><span style="color:lightgreen">✓</span> The subject and verb are closer together. </div> --- ## Example 4 - What would you do? <div class="blackwell">Improvements are expected in the predictive power of all the scores being computed on multispecies alignments.</div> <div class="fragment"> ### We suggest: use active voice <div class="blackwell">→ [We expect to] improve the predictive power of our multispecies alignment scores.</div> <span style="color:lightgreen">✓</span> The sentence is more concise (12 vs 16 words). <br><span style="color:lightgreen">✓</span> Prepositions no longer disrupt flow. <br><span style="color:lightgreen">✓</span> Sentence is more direct. </div> --- ## Example 5 - What would you do? <div class="blackwell">Some astonishing questions about <span class="fragment highlight-so" data-fragment-index="1">the nature of </span>the universe have been raised by scientists studying <span class="fragment highlight-so" data-fragment-index="1">the nature of</span> black holes <span class="fragment highlight-so" data-fragment-index="1">in space</span>. The collapse of a dead star into a point perhaps no larger than a marble creates a black hole.</div> <div class="fragment" data-fragment-index="1"> ### We suggest: put familiar info first, omit needless words <div class="blackwell">→ Scientists studying black holes have raised some astonishing questions about the universe. A black hole is created by the collapse of a dead star into a point perhaps no larger than a marble.</div> <span style="color:lightgreen">✓</span> The link is clearer; these sentences are more cohesive. </div> --- ## Example 6 - What would you do? <div class="blackwell">The second reaction is <span class="fragment highlight-so" data-fragment-index="1">really</span> the <span class="fragment highlight-so" data-fragment-index="1">end</span> result of <span class="fragment highlight-skyblue" data-fragment-index="1">a very large number</span> of reactions. 
<span class="fragment highlight-so" data-fragment-index="1">It is also worth noting that</span> these two reactions form a simple linear chain whereby the product of the first reaction is the reactant for the second.</div> <div class="fragment" data-fragment-index="1"> ### We suggest: omit needless words <div class="blackwell">→ The second reaction is the result of <span style="color:skyblue">numerous</span> reactions. Moreover, these two reactions form a simple linear chain whereby the product of the first reaction is the reactant for the second.</div> <br><span style="color:lightgreen">✓</span> More concise (31 vs. 42 words) </div> --- ## Example 7 - What would you do? <div class="blackwell">Significant positive correlations were evident between the substitution rate and a nucleosome score from resting human T-cells.</div> <div class="fragment"> ### We suggest: Put actions in verbs <div class="blackwell">→ In resting human T-cells, the substitution rate correlated with a nucleosome score.</div> <br><span style="color:lightgreen">✓</span> More concise (12 vs. 17 words) <br><span style="color:lightgreen">✓</span> The verb is <span style="color:lightgreen">correlate</span> rather than the nebulous <span style="color:lightgreen">were evident</span> </div> --- ## Example 1 (again, in context) - What would you do? 
<div class="blackwell"> <span style="color:skyblue">The model used by the software is a fairly rich probabilistic model, but it is clearly not realistic in several respects.</span> The assumptions that all sites evolve at one of two evolutionary rates (conserved and nonconserved), that these rates are uniform across the genome, that sites evolve independently conditional on whether they are in conserved or nonconserved regions, and that the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns, all represent oversimplifications of the complex process of sequence evolution in eukaryotic genomes. </div> --- ## Example 1 (again, in context) ### We suggest: Put verbs near subjects The gist of the sentence: <span style="color:gold">Certain assumptions oversimplify the complex process of sequence evolution in eukaryotic genomes.</span><br><br> Should the gist of the sentence go first or last? Before the list of assumptions or after it? 
--- ## Example 1 (again, in context) ### A possible revision <div class="blackwell">→ [Our model admittedly] oversimplifies the complex process of sequence evolution in eukaryotic genomes by assuming that: (1) all sites evolve at one of two evolutionary rates (conserved and nonconserved), (2) these rates are uniform across the genome, (3) sites evolve independently, conditional on whether they are in conserved or nonconserved regions, and (4) the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns.</div> --- ## Example 1 (again, in context) ### Positive consequences <br><span style="color:lightgreen">✓</span> The most important action (oversimplify) is now a verb <br><span style="color:lightgreen">✓</span> The verbs follow closely after the subjects <br><span style="color:lightgreen">✓</span> The sentence is more cohesive: familiar information links to the previous sentence at the beginning <br><span style="color:lightgreen">✓</span> The sentence contains cues for parsing information (by, [1, 2, 3, 4], etc.) --- <img src="/slides/clarity-scientific-writing/wakefield.JPG" height="500"><br/> <div class="fragment"> By selecting the soda cracker over the graham<br> during snack time, kindergarten history <br>is made by Kevin Wakefield, Nov. 12, 1957. </div> Note: - In this example, I'm first showing you the "less funny version". --- <img src="/slides/clarity-scientific-writing/wakefield.JPG" height="500"><br/> Nov. 12, 1957: Kevin Wakefield, during snack <br>time, makes kindergarten history by selecting<br>the soda cracker over the graham. Note: - This is the funny version, with the nugget of hilarity at the end. 
--- ## References and further reading - The [Duke Scientific Writing Resource](https://cgi.duke.edu/web/sciwriting/) - Style: Toward clarity and grace (1990), Joseph Williams - Expectations (2004) and The Sense of Structure (2004), George Gopen - How to write consistently boring scientific literature (2007), Kaj Sand-Jensen - The infectiousness of pompous prose (1992), Martin W. Gregory - How we write about biology (1991), Randy Moore - Writing intelligible English prose for biomedical journals (2007), John Ludbrook - Whose literature is science? (2003), Judith A. Swan - What is the scientific literature? (1986), John Maddox - Scientific literature: Clear as mud (2003), Jonathan Knight - The science of scientific writing (1990), George Gopen, Judith Swan - The readability of marketing journals: are award-winning articles better written? (2008), Sawyer, Laran, & Xu --- ## Thanks for listening! <span class="small footnote"> Comics by Gary Larson (The Far Side)<br> Color by [Gareth Wonfor](https://www.flickr.com/photos/mrmorodo/8286969175/)<br> </span>