Calanus Finmarchicus Descriptive Essay
To date, the majority of publications describing de novo transcriptomes of calanoid copepods have targeted a single genus, Calanus [23, 45–48]. The individuals used in the current study are from the coastal region of Oahu, Hawai‘i: they belong to the L. madurae species complex [3,4]. Illumina sequencing of six libraries yielded 528 million paired-end reads ranging in length from 50 to 151 base pairs (bp) and over 92% of these reads were of high quality (Phred score ≥30). These reads were de novo assembled using the Trinity software package (see Methods)(Table 1). The first step in quality assessment was to generate the battery of standard statistical measures characterizing the results. The assembly produced 211,002 transcripts with an average length of 872 bp, a maximum of 23,836 bp and an N50 value of 1,184 bp (Table 1). It contained 153,604 “Trinity predicted genes” that is transcripts with unique “TR# | c#_g#” identifiers (Table 1). Of the “Trinity predicted genes”, the majority (127,025) were singletons (83%), with the remaining genes (26,579) possessing from two to 71 “Trinity predicted isoforms” (TR#|c#_g#_i#). This is similar to the percentage reported for C. finmarchicus .
For the L. madurae assembly, the mapping rate was high, ranging from 88 to 92% for the six individual samples (Table 1, S1 Table), which is above the suggested cut-off at 80% mapping rate for a successful assembly. Ambiguous mapping, which was ~30% (31–37% of reads that aligned >1 time; S1 Table), is likely due to the large number of multiple isoforms assembled by Trinity.
The complete de novo transcriptome containing 211,002 transcripts was used in three separate workflows to further assess the quality of the assembly (Fig 2, see methods). First, bioinformatic tools were used to identify transcripts with coding regions (CDS), which were then annotated against SwissProt, Gene Ontology and KEGG databases, followed by BUSCO analysis. Next, targeted protein discovery was focused on large conserved and complex proteins (“giant proteins” and NaVs), proteins of interest of this copepod group (GFPs and crystallins), and proteins members of a complex pathway (circadian signalling system). Finally, several approaches were tested for generating a representative reference transcriptome for gene expression studies.
Functional annotation of the transcriptome
TransDecoder (see Methods) identified 72,391 transcripts with coding regions (CDS; length ≥ 100 amino acids) in the de novo assembly. Nearly 87% of the CDS retrieved significant hits with E-values of 10−3 or lower when blasted against the SwissProt database, and over 95% of these were further annotated with gene ontology terms (Table 1). Within the “biological process” category, L. madurae transcripts covered broadly conserved eukaryotic processes with “cellular process”, “metabolic process” and “single-organism process” representing more than 60% of the annotated transcripts (Fig 3). Eighty percent of transcripts with GO terms were annotated within the KEGG database (Table 1), indicating good coverage of transcripts encoding proteins/enzymes involved in lipid, amino acid and energy metabolism pathways (S1 Fig). BUSCO analysis identified 76% (2,036) complete orthologs of 2,679 core eukaryotic genes in the CDS with <1% of these genes present in more than one copy (duplicated). An additional 11% of fragmented core genes were found among the CDS, with only 12% of core genes missing completely (Table 1).
Biological processes represented in L. madurae transcriptome.
The assembly and annotation statistics of the L. madurae de novo transcriptome were compared with those of other non-model arthropods: three insect species and five other copepods (Table 2) [23, 47,49–53]. The number of assembled transcripts is quite variable across de novo transcriptomes with the number in the L. madurae transcriptome (~200K) being among the highest (Tables 1 and 2). The number of transcripts with coding regions is higher in copepods, including L. madurae, than that reported for the insect, Lygus hesperus (Western tarnished plant bug). Interestingly, the L. madurae annotation rates (87% of transcripts with coding regions) were higher than those reported in the other copepods which can in part be attributed to limiting annotation to protein encoding transcripts (Table 2). The number of predicted core proteins was similar across the transcriptomes with an approximate coverage of 80 to 90% based on the BUSCO analysis (Table 2). Overall, the annotation statistics suggests that the L. madurae transcriptome is at least as good in quality and depth as the others with which it was compared.
Comparison of de novo transcriptomes generated for non-model arthropods.
The large number of putative lncRNA transcripts in L.madurae suggests that there may be more lncRNA loci in this crustacean than in D. melanogaster [54–55]. However, a shotgun assembly only produces predicted transcripts, and further analyses are needed to confirm which transcripts are indeed lncRNAs, as opposed to genes coding for very small proteins (<100 amino acids long), incomplete transcripts, or assembly artifacts (e.g. fragmented UTRs which have been found in this transcriptome).
Searches of target genes based on automated annotation
The presence of transcripts encoding “giant” proteins (those >4,000 amino acids) was used as an indicator of quality of the assembly. The L. madurae assembly included 23 transcripts that exceeded 15,000 bp in length. The lengths of these transcripts are comparable to those reported for six of the transcriptomes listed in Table 2. The majority of the long transcripts encoded “giant” proteins belonging to titin/connectin family, such as “twitchin”, and proteins involved in cellular architecture/cytoskeleton such as “nesprin”. Examples of long transcripts, all of which are predicted to be full-length and annotations are given in Table 3.
An unusual feature of Labidocera and other pontellids is a sophisticated frontal eye structure [56, 57]. Unlike most copepods, the pontellid eye includes a clear lens, which requires structural proteins that are both stable and transparent. However not much is known about the structure of invertebrate lenses . In vertebrates, the structural proteins of lenses include crystallins, which have been well characterized. A search of the L. madurae list of automated annotated transcripts identified 20 putative crystallins. Fifteen of these encode putative α-crystallins, with others encoding putative members of the β-crystallin (2), the γ-crystallin (1) and λ-crystallin (1) families (S2 Table). The β- and γ-crystallins, which form a partnership with α-crystallins, are the primary structural proteins of the vertebrate lens [59,60]. Thus, one or more of these transcripts might be involved in lens formation in L. madurae.
Manual sequence annotation using targeted gene discovery
Green fluorescent proteins (GFPs)
Pontellids are well known for the presence of GFPs, which include some of the brightest GFPs currently known . In L. madurae, GFPs are concentrated at the base of the appendages as seen in the side view of an adult female (Fig 1C and 1D). Three transcripts were found that putatively encode GFPs (S2 Table). Two of the predicted proteins, both full lengths, shared 90% amino acid identity with a pair of GFPs identified in a closely related species, Pontella mimocerami . The third L. madurae GFP is most similar to a jellyfish (Aequorea victoria) GFP with which it shares 90% amino acid identity (S2 Table); this protein appears to represent a new class of copepod GFP. These putative transcripts encoding crystallins could serve as a starting point for any study investigating lens formation in copepods, specifically the pontellids, which possess modified naupliar eyes.
Large proteins with splice variants: voltage-gated sodium channels (NaV)
Large proteins that belong to families with closely-related members and which possess multiple splice sites or other regions of variation can be challenging to assemble and group dependably. One such protein family comprises the NaVs. In arthropods and in particular copepods de novo transcriptomes, incomplete or fragmented genes are common within this family (e.g. see publicly accessible transcriptomes in the following references: [23, 45, 48, 52] and NCBI Bioprojects PRJEB20069, PRJNA231234). Thus, as a stringent test of transcriptome quality, we assessed the assembly of the L. madurae NaVs proteins (Labma NaVs), comparing it with that from our previously published well-vetted transcriptome for C. finmarchicus [23, 38, 62,63]. We examined whether expectations were met in: 1) the number and completeness of predicted NaV genes, identified by their expected characteristics (match statistics, conserved motifs, length); 2) the occurrence and nature of predicted splice variants; 3) how well NaVs were grouped into the Trinity hierarchy; 4) the occurrence and nature of irregularities (incorrect or incomplete sequences).
Characteristics of NaVs expected to be present in an invertebrate transcriptome include occurrence of contigs from two families of orthologous genes, designated NaV1 and NaV2 . However, in L. madurae three predicted gene families (TR#) were identified as NaVs by the automated annotation. This is one more than expected (Table 4). These had low E-values (<8e-156) and were identified either as para or 60E, the D. melanogaster designations for NaV1 and NaV2 respectively. Querying the full transcriptome with a well-vetted arthropod sodium-channel sequence from D. melanogaster (SwissProt SP3500) retrieved 13 sequences from the same three gene families with E-values < 1e-88. Sequences with the next higher E-values had features of voltage-gated calcium channels. The retrieved sequences are shown diagrammatically in Fig 4. ReBLASTing each of the NaV contigs into Flybase returned either para or 60E (Table 4). To further resolve the identity of the contigs, they were used to query the C. finmarchicus transcriptome , retrieving top hits for 7 isoforms corresponding to NaV1.1 (TR7852|c0_g1), 2 corresponding to NaV1.2 (TR7852|c0_g2) and 4 in two TR groupings corresponding to the NaV2 gene (TR65477_c0_g1 and TR68660_c0_g1). The motifs expected of NaVs, shown diagrammatically at the top of Fig 4 (see caption for details), were validated through sequence alignment (MAFFT) and hence the various groups have been designated Labma NaV1.1, 1.2 and 2.
Labidocera madurae voltage-gated sodium channel sequences assembled by Trinity.
Labidocera madurae (Labma) voltage-gated sodium channel transcripts/predicted proteins.
Full-length proteins of the NaV family are expected to be around 200 kD in size. Completeness of predicted proteins was verified for one or more contigs from each Labma NaV1 gene as well as from the single reconstructed Labma NaV2 gene (see below). Start and stop codons as well as 5' and 3' UTRs are present in all three. When all optional sequence segments (putative exons) are included, predicted proteins 2072 and 2069 amino acids long result for Labma NaV1.1 and Labma NaV1.2, respectively. These match the lengths predicted for corresponding genes of C. finmarchicus (2094 and 2079 respectively) , for D. melanogaster NaV1 (2131; UniProtKB P35500), and for human NaV1.1 (2009; UniProtKB P35498). Similarly, the 2533 residue length of the reconstructed Labma NaV2 was within 2% of that for C. finmarchicus (2485aa) and 10% of that for D. melanogaster (2821aa). Thus, three NaV genes, with appropriate characteristics, are well assembled in the L. madurae transcriptome. Two or more sites of splice variation separated by more than a cDNA-insert-length of identical bridging sequence cannot be assembled reliably without additional information. Labma NaV1.1 has two sites with variant segments at opposite ends of the molecule. Site I is an N-terminal region of optional segments (putative exons; Fig 4); site II is an alternatively spliced segment nearer the C-terminal end. Both sites correspond to ones in Calfi NaV1.1 (Table 4). The two sites are separated by a minimum of 630 residues in L. madurae, well over a cDNA-insert-length (200–300 bp mean value), so the associations implied by the contigs assembled that include those two regions are unreliable. This does not imply a poorer quality of assembly compared with other paired-end assemblies of cDNA inserts of the same length: it is intrinsic to the shotgun approach. This caveat applies to four of the seven contigs of Labma NaV1.1 (Fig 4), but as well to the long contigs (18 in all) of Calfi NaV1.1 (see Fig 10 of Lenz et al ). Despite this ambiguity, the Labma NaV1.1 contigs gave solid evidence for the presence of four optional segments at Site I and one alternative segment at Site II, which is qualitatively similar to the pattern found in C. finmarchicus. No clear evidence for splice variants was found for Labma NaV1.2 (i2 is an anomalous fragment, possibly artifactual), the same being the case for Calfi NaV1.2. For Labma NaV2, the two members of each pair of fragments (TR65477 and TR68660) differ in the presence of "optional" segments in each, a feature not found in Calfi NaV2 (arrows in the NaV2 diagram of Fig 4). Thus, aside from this last case, the L. madurae transcriptome showed splicing features expected from the C. finmarchicus assembly. Most differences in the details (see below) are likely species differences.
Hierarchical transcript grouping by Trinity, as outlined in Methods, enables classifying assembled sequences into likely gene families, genes and isoforms. It performed well on the Labma NaV1 genes, separating them correctly into two genes nested within a single family. In contrast, transcripts for the same Calfi NaV1 genes are more broadly assigned, spanning four "Chrysalis components" (comps = gene proxies; Table 4). Reassembly of the C. finmarchicus transcriptome using Trinity 2.0.6 only reduced this number from four to three and failed to include them in the same gene family. Thus the L. madurae transcriptome is of higher quality in this respect than that of C. finmarchicus. On the other hand, a single transcript coded for Calfi NaV2, while Labma NaV2 was present as two fragments assigned to different Trinity (2.0.6) predicted gene families (Table 4). Still, these fragments had overlapping ends and could be amalgamated to form a full-length predicted protein with all of the expected properties. Thus the overall structure of the three NaV genes was successfully assembled in the L. madurae transcriptome with about the same quality as for that of the C. finmarchicus.
Irregularities in the L. madurae assembly were of several types, described in more detail in S5 Fig. To summarize, the number of Labma NaVs assembled was smaller (three vs. six) than for C. finmarchicus. This is likely in part a species difference. Anomalous sequences of various origins were also noted. These include a short contig (TR25803|c0_g1_i1) that may represent an additional Labma NaV1 (Table 4) and a sequence with a frame-shift that is probably an error. In addition, several issues appear to have arisen from the ambiguity in assembling regions of variation bridged by segments with identical sequences that are longer than one cDNA-insert- length: 1) isoforms, especially within the Labma NaV1.1 gene, code for partial rather than full-length proteins (Table 4); 2) Calfi NaV1.1s have many more full-length contigs (18 vs 2) perhaps reflecting a greater leniency of Trinity 1.0 for matching variable regions; 3) genetic variability within the population may have increased the number of variable regions, possibly contributing to premature truncation of sequences.
Overall, the L. madurae transcriptome assembled NaVs as well as or better than that of C. finmarchicus . However, it highlighted the limitations inherent in matching variant segments separated by stretches of identical sequence longer than a cDNA-insert-length.
Key regulatory pathways: circadian signaling system
The number of full-length circadian signaling system proteins deduced from Labidocera assembly supports the conclusion that this transcriptome is of high quality. Twenty-one protein families [65–68] were searched for and putative homologs were identified in the L. madurae assembly (Table 5), with the proteins encoded by the identified transcripts predicted (S3 Table), and vetted via reciprocal BLAST searches (S4 Table and S5 Table) and protein structural motif analysis (S6 Table). The protein families included: 1) the core clock proteins clock (CLK): cryptochrome 2 (CRY2), cycle (CYC), period (PER) and timeless (TIM); 2) the clock-associated proteins: casein kinase II α (CKII α), casein kinase IIß (CKIIß), clockwork orange (CWO), doubletime (DBT), jetlag (JET), PAR-domain protein 1 (PDP1), protein phosphatase 1 (PP1), protein phosphatase (PP2A) catalytic subunit microtubule star (MTS), PP2A regulatory subunit twins (TWS), PP2A regulatory subunit widerborst (WDB), shaggy (SGG), supernumerary limbs (SLIMB) and vrille (VRI); 3)the clock input pathway protein cryptochrome 1 (CRY1); and 4) the putative clock output pathway proteins: pigment dispersing hormone (PDH) and pigment dispersing hormone receptor (PDHR).
Putative Labidocera madurae (Labma) circadian signaling system transcripts/proteins identified via in silico transcriptome mining.
Translation of the identified transcripts revealed that the vast majority encoded full-length proteins (Table 5, S3 Table), with just two encoding partial sequences (Table 5). For many protein groups, multiple variants, all likely derived from a common gene, were predicted. These variants were most likely derived from alternative splicing, as well as single nucleotide polymorphisms (e.g., the five CYC variants shown in S2 Fig). In addition, for a number of groups, proteins derived from multiple genes were identified (e.g., the four distinct PP1s shown in S3 Fig). PDP1 was represented with four predicted genes, one with a splice variant, as shown in Fig 5. While parts of the molecule were very conserved, there were significant differences between the predicted proteins, which may reflect diversity in function. In the case of the CRY2 protein, 12 distinct transcripts were identified, and while they differed in length (Table 5), the predicted proteins were all identical. These transcripts differed in the two untranslated regions (5'UTR and 3'UTR), which may be related to differential processing and/or tissue-specific expression.
Alignment of five PDP1 protein sequences predicted from the L. madurae de novo transcriptome.
In addition to vetting the completeness/quality of the L. madurae transcriptome, the mining of this resource for circadian protein-encoding transcripts has shed light on the clock system of this species, and for that matter, those of crustaceans in general. The large suite of proteins predicted from the Labidocera transcriptome (Table 5), include, among others, the canonical core clock proteins CLK, CYC, PER and TIM, all showing significant homology to those of D. melanogaster (S4 Table). They possess structural domains consistent with their fruit fly homologs, domains required for normal function (S1 Fig). Moreover, putative L. madurae homologs of both CRY1 and CRY2 were identified (Table 5), a finding that suggests that the Labidocera circadian system is organized more similarly to the “ancestral-type” clock proposed for lepidopteran/mosquito species than to that of D. melanogaster . Specifically, CRY2, which is missing in Drosophila, but participates in the core clock itself, is likely to be a repressor of CLK-CYC-mediated transcription, while CRY1 functions as a photoreceptor, putatively providing photic input to the core clock. This result is consistent with the “ancestral-type” circadian systems described in other crustaceans that have been examined via genome/transcriptome analyses [33–35, 69], suggesting that this type of clock organization is broadly conserved within members of this arthropod subphylum.
The mining of the Labidocera transcriptome resulted in the discovery of the first PDP1s from a member of the Copepoda. The results suggest the presence of multiple genes from several protein families: DBT (three genes), PDP1 (four genes), PP1 (four genes), MTS (two genes), TWS (two genes) and SGG (two genes). No members of PDP1 had been identified previously from either C. finmarchicus or T. californicus [33,34]. The identification of the L. madurae PDP1 genes allowed for the revisitation of the C. finmarchicus and T. californicus transcriptomes for putative homologs. Using the Labidocera PDP1 predicted proteins as queries, related proteins have now been discovered in these two copepod species (A. E. Christie, unpublished). Moreover, mining of the assembly led to the prediction of a novel isoform of PDH, NSEMLHILRSMPKDMGKIIRNamide, which is just the second member of this peptide family identified from a copepod , a peptide that may serve as an output signal from the Labidocera clock for controlling its physiology and behavior.
Comparison between the results from the targeted gene discovery workflow with the results from the automated annotation is shown in Fig 6. The circadian system pathway retrieved from the KEGG database (map0471) resulted in the identification of five of the eight expected genes (Fig 6). The automated annotation programs failed to identify VRI, PDP1 and PER among the L. madurae transcripts with coding regions (CDS). These results underscore the value of targeted gene discovery in combination with the automated bioinformatics tools to obtain a complete annotation for a de novo transcriptome.
Predicted gene mapping to the circadian rhythm pathway obtained through KEGG annotation.
Reference transcriptome analysis
Identification of differentially expressed genes between L. madurae developmental stages
The generation of a transcriptome that provides robust results for gene expression profiling is key for application to physiological ecology. While sequenced and annotated genomes are used as reference in model species, de novo assembled transcriptomes, in combination with bioinformatic tools for annotation and statistical testing, provide a powerful alternative. However, for a transcriptome of a non-model species to be used as an alternative for a genome, it needs to be of high quality and complete. Here, we compare four strategies for obtaining a reference for read mapping and identification of differentially expressed genes (DEGs). While the full transcriptome (211,002 transcripts) is optimal for targeted gene discovery, including the identification of genetic variants (i.e., splice variants, indels, SNPs), it also generates a large percentage of ambiguous mapping that could affect statistical testing. In addition to the full transcriptome (“Full”), we generated three alternative “reference” transcriptomes from the “Full” assembly by: 1) selecting the longest transcript for Trinity predicted genes (unique TR#_c#_g#; “Pred. genes”); 2) selecting only transcripts with coding regions (CDS) (“Full-CDS”); and 3) selecting only transcripts with coding regions (CDS) from the “Trinity predicted genes” transcriptome (“Pred. genes-CDS”).
Table 6 shows the effects of applying these filters. The number of transcripts decreased from 211K to 45K in the smallest “reference”. Nevertheless, the four transcriptomes were comparable with respect to the number of core eukaryotic proteins, which declined only by 3% between the full and the Trinity-predicted “unique” gene transcriptomes (“Pred. gene”, “Pred. gene-CDS”). With the exception of the full transcriptome, the number of duplicated genes (genes with more then one copy) was low (< 0.5%). The percentage of mapped reads using Bowtie decreased from 91% to 68% between the Full and Pred. genes-CDS references Furthermore, the three derived reference transcriptomes had fewer ambiguous reads than the full transcriptome, and the “unique gene” approach led to the lowest number of reads mapped more than once (14% and 6% for “Pred. genes” and “Pred. genes-CDS”, respectively).
Comparison across four possible reference transcriptomes generated from the de novo assembly for gene expression studies.
Differences among these potential “reference transcriptomes” were further evaluated by testing for differential gene expression between copepodites and adult females. Thus, we mapped reads to the four “reference transcriptomes” using two different bioinformatics tools (Bowtie and kallisto) followed by statistical testing to DEGs (edgeR). While the number of counts (= mapped reads) associated with each transcript is higher in Bowtie than in kallisto, this did not affect the number of transcripts tested for relative gene expression after applying the 1 cpm filter (Table 6). The number of DEGs identified by edgeR using counts generated by Bowtie varied by more than a factor of two among the references used. Nevertheless, 8,970 DEGs were shared among the four references (S4 Fig). In contrast, the number of DEGs identified with kallisto was similar for all four transcriptomes (Table 6), with 6,229 shared among all references (Fig 7). A comparison between Bowtie and kallisto of the shared DEGs identified 5,438 common DEGs (S4 Fig). The smallest reference transcriptome (“Pred. genes-CDS”) had best agreement between Bowtie and kallisto with 9,827 shared DEGs, which represented approximately 89% (kallisto) and 77% (Bowtie) of identified DEGs, which is not surprising given that this transcriptome had the smallest number of ambiguous reads (S4 Fig). In general, mapping by kallisto is more conservative, making it the preferred mapping program for the identification of DEGs, in particular in association with an assembly program like Trinity, which is designed to preserve isoform variants .
Non-proportional Venn diagram for the number of differentially expressed genes (DEGs) identified using four different transcriptomes as a reference for mapping of reads.
While the large number of shared DEGs regardless of mapping program or reference transcriptome (5,438 DEGs) was reassuring, there were still many of DEGs that were identified in one or two references but not the others as shown in the Venn diagram for DEGs generated from kallisto mapped reads (Fig 7). There was good agreement between the Full and Full-CDS (9,637 DEGs) and the Pred. genes and Pred. genes-CDS (9,028 DEGs; Fig 7).
Differential expression of transcripts identified through targeted gene discovery
To gain further insight into differences in expression, we examined expression results for the targeted genes identified in the previous sections (Tables 3, 4 and 5). For all investigated transcripts, expression rate was higher such that the transcripts did pass the 1 cpm filter and were considered for the statistical test. The transcripts encoding “giant” proteins were represented in all reference transcriptomes, and two transcripts, fibrillin-1 and nesprin-1, were consistently identified as differentially expressed (Table 7). Other target genes that contributed to the shared DEGs (6,229) included a NaV (Labma1.2) and one transcript each of PER-v1, CWO-v1 and VRI (Table 7; Fig 7).
Comparison among reference transcriptomes in the identification of differentially expressed genes (DEGs) between L. madurae copepodites and adult females among transcripts encoding for “giant” proteins, voltage-gated sodium channels and...
The transcriptomes differed in the number of NaV transcripts given the presence of isoforms. Thus, NaV
“Ocean acidification” (OA), a change in seawater chemistry driven by increased uptake of atmospheric CO2 by the oceans, has probably been the most-studied single topic in marine science in recent times. The majority of the literature on OA report negative effects of CO2 on organisms and conclude that OA will be detrimental to marine ecosystems. As is true across all of science, studies that report no effect of OA are typically more difficult to publish. Further, the mechanisms underlying the biological and ecological effects of OA have received little attention in most organismal groups, and some of the key mechanisms (e.g. calcification) are still incompletely understood. For these reasons, the ICES Journal of Marine Science solicited contributions to this special issue. In this introduction, I present a brief overview of the history of research on OA, call for a heightened level of organized (academic) scepticism to be applied to the body of work on OA, and briefly present the 44 contributions that appear in this theme issue. OA research has clearly matured, and is continuing to do so. We hope that our readership will find that, when taken together, the articles that appear herein do indeed move us “Towards a broader perspective on ocean acidification research”.
Background to this theme issue
“Ocean acidification” (OA), a change in seawater chemistry driven by increased uptake of atmospheric CO2 by the oceans, has probably been the most-studied single topic in marine science in recent times (Figure 1). According to the Web of Science© (WOS), no articles used the term “OA” before 2000. Five articles responded to the search term “OA” in 2005, although only two of these deal with the topic in a manner consistent with contemporary research. From 2006 to 2015, >3100 articles on OA appeared—>300 per year! In 2015 alone, >600 articles were published on this topic. At the time of this writing, one of the first articles on OA, Orr et al. (2005), had been cited >1500 times according to the WOS and >2400 in Google Scholar. There have also been an extraordinary number of review articles, meta-analyses, special article theme sets, journal issues, and “white” (position) papers on OA produced by national and international agencies. The explosion of research on this topic is unprecedented in the marine sciences.
The number of articles per year (a) and the number of citations to those articles per year (b) returned in response to a topical search for “OA” conducted using the WOS© database on 16 January 2016.
The number of articles per year (a) and the number of citations to those articles per year (b) returned in response to a topical search for “OA” conducted using the WOS© database on 16 January 2016.
The majority of the literature on OA, particularly in the early days of research on the topic, report negative effects of CO2 on organisms and conclude that OA will be detrimental to marine ecosystems. Some of these, typically published in “high impact” journals and covered by the mass media, predict an OA-generated ocean calamity (sensuDuarte et al., 2014—see their Table 1). As is true across all of science, studies that report no effect of OA are typically more difficult to publish and, when published, seem to appear in lower-ranking journals (Browman, 1999). Further, the mechanisms underlying the biological and ecological effects of OA—via higher concentrations of CO2, hydrogen ions (=lower pH), and/or carbonate chemistry (less carbonate ions)—have received little attention in most organismal groups (although see, for example, Pörtner, 2008; Kelly and Hoffmann, 2012; Bozinovic and Pörtner, 2015), and some of the key mechanisms (e.g. calcification) are still incompletely understood (see Roleda et al., 2012; Cyronak et al., 2016a, b; Waldbusser et al., 2016). For these reasons, the ICES Journal of Marine Science (IJMS) solicited contributions to a special issue with the theme, “Towards a broader perspective on ocean acidification research”.
Contributions were encouraged of studies showing no effect of OA; emphasizing the variability of responses to OA, including individuals that are not affected by the OA treatment, and how selection might act on that variability; multiple stressor experiments in which the effect of OA is assessed relative to and/or in combination with other drivers; assessments of the limits of using acute experiments to make inferences/conclusions about a chronic driver that will change slowly over decades; mechanisms of action of low pH and/or high CO2 that consider realistic time-scales for changes in the driver; acclimation, carry-over, and epigenetic effects that consider realistic time-scales for changes in the driver; and evolutionary effects (e.g. development of resistance).
In the remainder of this introduction, I will (i) present a brief overview of the history of research on OA, (ii) call for a heightened level of organized (academic) scepticism to be applied to the body of work on OA (sensuDuarte et al., 2014; Boonstra et al., 2015), and (iii) briefly present the contributions that appear in this theme issue of the IJMS. Although I call for a more sceptical scrutiny and balanced interpretation of the body of research on OA, it must be emphasized that OA is happening and it will have effects on some marine organisms and ecosystem processes.
The progression of research on OA
The following qualitative and subjective overview is based on a perusal of the titles and abstracts of articles appearing in a WOS search for “ocean acidification” for each year (individually) from 2005 to 2015. Given the very large number of articles that could be used as examples of any of the study types referred to below, I have not cited any—the interested reader will not have any difficulty finding them.
The first articles on OA were descriptions of the process itself (CO2-driven changes in the biogeochemistry of seawater and sediments) and its implications. This was followed by an explosion of work (mainly laboratory-based) on the possible effects of OA on various marine organisms, at first mainly calcifiers or the calcified hard parts of organisms without calcarious shells. These were mostly restricted to part of one generation (a limited number of life history stages), or at most a single complete life cycle, with one or a small number of biological endpoints measured as effect indicators. In early work, treatment exposure levels often greatly exceeded those predicted to occur hundreds of years into the future even without any reduction in CO2 emissions. The majority of these early works reported significant negative effects of high CO2, from which it was inferred that there would be a detrimental effect of OA over the coming decades–centuries. Thereafter, longer-term effect studies began to appear, which first included single-generation carry-overs and then multiple generations. By necessity, these have been on organisms with short generation times. As the approach to CO2 exposures matured, very high treatment levels became less common. More studies that showed no effect of high CO2 (predicted for the next century)—and even beneficial effects (e.g. for some phytoplankton and macrophytes)—appeared. Upwelling and vent systems were used as in situ case studies of natural future OA-like conditions. Some in situ work mimics such systems by injecting CO2 and following the response of organisms/communities locally. Results of experiments that included multiple stressors in addition to CO2 were published. The most common of these has been temperature, but salinity, oxygen, and a variety of others have also been included (in a global climate change context). Such studies typically report that the additional driver(s) has a stronger effect than CO2, although it is difficult to isolate the effect of the individual variables. The reality that the functional response curve of each driver will likely differ, as will the organism's ability to adapt to them, further complicates interpretations of multiple driver experiments. Studies on the effect of CO2 on trophic interactions (indirect effects) are sparse—such experiments are logistically complex and difficult to interpret. A small number of recent studies integrate the results of the preceding body of work into risk assessments and scenario modelling, typically on economically important species of fish and shellfish; most conclude that the prognosis is dire, although in the context of what follows, that conclusion might be premature.
The preceding describes how OA research has matured. The following describes how it still has a way to go.
Applying organized scepticism to research on the effects of OA
Scientific or academic scepticism calls for critical scrutiny of research outputs before they are accepted as new knowledge (Merton, 1973). Duarte et al. (2014) stated that “…there is a perception that scientific skepticism has been abandoned or relaxed in many areas…” of marine science. They argue that OA is one such area, and conclude that there is, at best, weak evidence to support an OA-driven decline of calcifiers. Below, I raise some of the aspects of OA research to which I contend an insufficient level of organized scepticism has been applied (in some cases, also to the articles in this theme issue). I arrived at that conclusion after reading hundreds of articles on OA (including, to be fair, some that also raise these issues) and overseeing the peer-review process for the very large number of submissions to this themed issue. Importantly, and as Duarte et al. (2014) make clear, a retrospective application of scientific scepticism such as the one that follows could—and should—be applied to any piece of/body of research.
Exposure levels, water chemistry, and limits to making inferences about the effect of a long-term driver from a short-term experiment
Many early studies on OA applied treatment levels that greatly exceeded even worst-case climate change scenarios and did not report water chemistry in sufficient detail to determine if the treatment mimicked future OA-driven seawater conditions. Although most recent work has improved with respect to treatment levels, mimicking future water chemistry remains tricky.
A rationale commonly used to justify high CO2/low pH treatments is the need to identify at what levels organisms are affected. However, the limits to making inferences about how an organism or ecosystem will respond to a climate-change scale variable (i.e. one that changes over decades–centuries) from their response during a short-term challenge experiment (i.e. hours–days–weeks) has not been adequately addressed—or even mentioned—in most studies. This is reflected in a confusion of terms common in OA studies—when describing the outcome of a short-term CO2 challenge, authors often make the inferential leap and use “OA” when discussing their results, without any caveats. Oddly, incorporation of the extensive toxicology literature is almost entirely missing from OA studies, either when it comes to adopting established exposure protocols or to framing the inferences that can/cannot be drawn from short-term experiments. Also missing from most studies is anything more than a superficial statement about the possibility for acclimation, adaptation, or evolution, something that is necessary to extend the outcome of a short-term challenge experiment into an inference about the effect of a long-term driver (see below).
Spatio-temporal variability in CO2 and pH
Biogeochemists are well aware of the spatio-temporal variability in CO2 and pH—daily (high productivity areas), seasonal (blooms), interannual (higher temperatures), horizontal (coastal upwelling areas, high turbidity zones), and vertical (deep vs. surface waters) ranges in these can be extensive (e.g. Wootton et al., 2008; Hofmann et al., 2011; Waldbusser and Salisbury, 2014; Kapsenberg et al., 2015). Biologists have struggled to incorporate this variability into experiments designed to test the effects of OA, and into their interpretations of the outcomes (Eriander et al., 2016). Some researchers have pointed out that organisms that are exposed to large ranges in CO2 and pH during their daily lives (e.g. vertical migrators), life cycles (e.g. organisms that reside offshore as larvae but move to the coast as juveniles or adults), or somewhere in their distributional ranges, should be more tolerant of OA (e.g. Lewis et al., 2013).
Imbalanced focus on individuals that are affected and insufficient focus on inter-individual variability and within-experiment selection bias interpretations of ecological impacts
Almost all CO2 challenge experiments produce a range of responses in the test organism—some individuals are badly affected, others less, and some not at all. There are several issues associated with all such experiments that it is important to be cognizant of and account for: (i) analyses and interpretations should not ignore or minimize individuals that are little affected or unaffected (after all, these are the ones whose genes will be passed on to the next generation); (ii) inter-individual variability should be highlighted; (iii) the longer that the experiment runs the more likely it is that an internal selection process for the tolerant individuals has occurred. All of these are important in the context of the next section.
Acclimation, generational carry-over effects, adaptation, epigenetics, and evolution
Almost all experiments conducted to assess OA are short-term toxicity challenges. Therefore, using them as the basis from which to make inferences about a process that will occur slowly over the next decades–centuries must be made with appropriate caution. That is, the experiments and the interpretations made from them must consider how populations might acclimatize, adapt, and evolve to climate change, including OA (e.g. Donelson et al., 2011; Hoffmann and Sgrò, 2011; Sunday et al., 2013; Harvey et al., 2014). Recent studies indicate that even the effects of OA that are considered most worrisome—various behavioural impairments resulting from short-term exposure to high CO2 (see Nagelkerken and Munday, 2016)—might be reduced or overcome through adaptation and evolution (Regan et al., 2016). More knowledge of the mechanisms of direct action of OA-related drivers—higher concentrations of CO2, hydrogen ions (=lower pH), and/or carbonate chemistry (less carbonate ions)—and of indirect drivers such as the effects of OA on food quality, are essential to understand what degree of adaption is possible. Readers should be duly sceptical of studies that completely ignore the possibility of adaptation when presenting their inferences about OA, particularly scenario modelling of socio-economic impacts.
We must also do better to incorporate analogous work in other fields, for example, rapid evolution of tolerance to envirotoxins (e.g. Whitehead et al., 2012) and environmental change (e.g. Collins et al., 2014; Stoks et al., 2015; Thibodeau et al., 2015) via a combination of genetic and epigenetic mechanisms (Yona et al., 2015).
Negative results—those that do not support a research hypothesis (e.g. OA will have detrimental effects on marine organisms)—can provide more balance for a subject area for which most published research reports positive results. Negative results can indicate that a subject area is not mature or clearly enough defined, or that our current methods and approaches are insufficient to produce a definitive result. Gould (1993) asserted that positive results tell more interesting stories than negative results and are, therefore, easier to write about and more interesting to read. He calls this a privileging of the positive. This privileging leads to a bias that acts against the propagation of negative results in the scholarly literature (see also Browman, 1999). Further, it is also important to recognize that studies showing no effect of OA are less equivocal than those that do, for all of the reasons noted above. Following from this, it is essential that authors writing about possible effects of OA present and discuss research that is inconsistent with their results and/or their interpretations—openly, honestly, and rigorously. Readers should be duly sceptical of articles that do not do this.
Despite all of the above, OA research has clearly matured and, as reflected in the contributions to this theme issue, is continuing to do so.
An overview of the contributions in this theme issue
Studies on the mechanisms of action of OA
Jokiel (2016) revisits four key simplifying assumptions that have been made to assess how OA will impact coral reefs and finds that there is currently insufficient evidence to support them and that they should be further evaluated. He presents the hypothesis that calcification in corals is limited by their ability to rid themselves of the protons generated by calcification rather than strictly by the aragonite saturation state of seawater. This is further developed by Cyronak et al. (2016a, b), although Waldbusser et al. (2016), in a comment on Cyronak et al., discuss how a bicarbonate/proton flux and omega-based sensitivity model do not have to be mutually exclusive. These studies will hopefully serve to shift thinking towards a more synthetic view of the mechanisms through which OA will impact marine calcifiers.
Studies on methodological issues in OA research
Cornwall and Hurd (2016) assessed 465 OA studies published between 1993 and 2014, focusing on the methods used to replicate experimental units. They found that 95% of these studies had interdependent or non-randomly interspersed treatment replicates, or did not report sufficient methodological detail, and they provide guidelines on how to improve the design of such experiments. Reum et al. (2016) discuss the implications of co-variation among environmental variables (e.g. oxygen and temperature) for the interpretation of OA experimental results and suggest an approach for designing experiments with CO2 levels that better reflect this co-variation. Eriander et al. (2016) (for the barnacle, Balanus improvisus) and Small et al. (2016) (for the sea urchins, Paracentrotus lividus and Arbacia lixula) report how introducing natural fluctuations in the pH/CO2 levels to which organisms are exposed in experiments (vs. exposure to constant average levels) yields different responses. They conclude that these results support including fluctuating acidification treatments in experiments on species that live in variable environments.
Studies on the behavioural effects of OA
Kim et al. (2016) report that the metabolic rate of the deep-sea hermit crab (Pagurus tanneri) exposed to low pH increased transiently, and that olfactory behaviour was impaired. However, they also observed that crabs exposed to low pH exhibited higher individual variation in the speed of antennular flicking and prey detection than observed in the control pH treatment, and suggested that this phenotypic diversity could lead to adaptation to OA. Sundin and Jutfelt (2016) report that increased levels of CO2 had no effect on behavioural lateralization, swimming activity, and prey olfactory preferences of goldsinny wrasse (Ctenolabrus rupestris), although there was an effect on predator olfactory cue avoidance—control fish initially avoided predator cue while the high CO2 group was indifferent to it. Heinrich et al. (2016) report no effect of elevated CO2 on the foraging or shelter seeking behaviour of the reef-dwelling epaulette shark (