Monday, April 7, 2014

Peaks and Valleys in the Genome

(This post is by Marius – I am just putting it up. Andrew.)

Driven by methodological advances, evolutionary biology is currently much concerned with understanding how selection shapes the genome. In the search for signatures of selection – and ultimately the loci associated with them – we often pursue a similar strategy: we compare populations at thousands of genetic markers in the hope of finding genomic regions of particularly high or low differentiation relative to the genome-wide baseline. We then tend to assume that such regions can be linked directly to distinct selective processes. On the one hand, genomic regions of high divergence are thought to result from selection acting in opposite directions (divergent selection) between populations. Low-divergence regions, on the other hand, are commonly taken as evidence for balancing selection. The results of our recent paper published in Molecular Ecology, however, challenge these common assumptions.

Figure 1. Parallel adaptation to similar derived habitats (blue) from a common source population inhabiting an ecologically distinct habitat (gray).
For our paper, we first implemented theoretical models in which several populations derive from a common source population and colonize habitats that are selectively new, yet similar to each other – that is, parallel adaptation (Figure 1). We demonstrate that among derived populations, this process produces a region of particularly low divergence around a selected locus. How come? Due to common ancestry, the derived populations share not only the actual variant being selected, but also the genomic background linked to that variant. Thus, the same variant, together with this background, is driven to fixation in each of the derived populations. Consequently, when we compare such populations, we find a genomic region of low divergence surrounding a locus involved in parallel adaptation (Figure 2). Admittedly, this explanation for low divergence within parts of a genome is intuitive. Nevertheless, it is normally overlooked when interpreting low-divergence regions in genome scans. From now on, let us call such a region a ‘divergence valley’.

Figure 2. The peak-valley-peak divergence signature of parallel adaptation from shared genetic variation. Important to note here is that the selected locus is actually located at the bottom of the divergence valley, and not at any of the flanking peaks!
To either side of the divergence valley, our models reveal exaggerated divergence (Figure 2). This is because the selected variant and its linked genomic background are associated with different haplotypes in the different derived populations. Such haplotypes are then, to some extent, selected along with the actual variant and its immediately linked background (this process is called ‘genetic hitchhiking’). As a consequence, different haplotypes rise to different frequencies in the different populations. It is important to note here that this phase of genetic hitchhiking only initiates the high-divergence regions flanking the divergence valley! These regions of high divergence – I will refer to them as ‘twin peaks’ from now on – grow higher and become sharper over time, even after selection has fixed the favorable variant at the selected locus in all derived populations (Figure 2).

The reason for this is that the selected locus, which is under divergent selection between the source and derived populations and under parallel selection among the derived populations, acts as a barrier to ongoing gene flow from the source to the derived populations. The divergence valley will thus, to a great extent, be sheltered from such genetic introgression and remain a low-divergence region among derived populations. Next to it, however, some genetic variation will introgress from migrants stemming from the source population. Because this happens only occasionally and randomly, the twin peaks initiated by hitchhiking grow higher. The twin peaks also become sharper over time because even further away from the selected locus, genetic variation can flow almost unconstrained between the source and derived populations. This gene flow homogenizes the genomes of the derived populations, bringing divergence there back to the baseline level. A detailed explanation and a graphical illustration of these different – yet jointly acting – processes can be found in our paper. There you will also find a thorough dissection of many factors influencing these divergence patterns (recombination rate, strength of selection, time, migration, number of initial colonizers, and multiple interacting selected loci). What I can tell you already is that the patterns explained above emerged very consistently in all our simulations!

Picture 1: Top: The Sayward River estuary, where one of the marine stickleback samples was taken for our empirical analysis. Bottom: A typical marine stickleback. Note its characteristic armor plating all along the body axis, which is absent in most freshwater stickleback (Picture: M. Roesti).
Agreed, up to now, this blog post has been quite theoretical. Luckily, we have a great model system at hand to take these theoretical predictions out into the wild. That model system is the threespine stickleback fish. Since the last glaciation period, stickleback have repeatedly colonized and adapted to freshwater (parallel adaptation) from a common marine source population. This corresponds exactly to the situation we modeled above. In the second part of our paper, we thus predicted that we would find a divergence valley flanked by twin peaks (together, we can refer to them as ‘peak-valley-peak’; Figure 2) around three particular genes. These genes are great candidates for being under strong divergent marine-freshwater selection, and thus seemed ideal for testing whether we would find the peak-valley-peak divergence signature of parallel adaptation to freshwater. In our empirical analyses, we included a total of eight freshwater populations from Vancouver Island (BC, Canada) and two marine samples from the coast of that island (Pictures 1 and 2). As expected, marine and freshwater stickleback proved strongly differentiated at all three genes. To calculate differentiation, we used haplotype information from targeted sequencing as well as the classic divergence measure FST calculated at thousands of polymorphisms along the genome (RAD sequencing data). We further applied an alternative approach to calculating differentiation, for which we looked at the separation of marine and freshwater stickleback within many phylogenetic trees along the genome.

Now, our main interest was in divergence among the derived populations adapted in parallel to freshwater. Excitingly, comparing these freshwater populations with each other indeed revealed the predicted peak-valley-peak divergence signature around all three genes! As this worked out so well, we then searched the entire stickleback genome for further such signatures and found many more of them. This allowed us to propose new genes that have been important for replicate freshwater adaptation. Interestingly, we also found that the chromosomes harboring many of these signatures of selection exhibited the strongest overall divergence between marine and freshwater stickleback. This indicates that divergently selected loci can drive heterogeneity in genomic divergence on a chromosome-wide scale.
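
For readers who have not met FST before, here is a minimal Python sketch of what the measure boils down to at a single marker. It is not the pipeline from the paper: it assumes biallelic markers, two equally weighted populations, and known allele frequencies (no sample-size correction), and the function name and the toy frequencies are made up purely for illustration.

```python
import math


def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's FST at one biallelic marker for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Illustrative helper only; not the estimator used in the paper.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # Marker is monomorphic across both populations: FST is undefined.
        return math.nan
    return (h_total - h_within) / h_total


# Made-up allele frequencies at five markers along a chromosome for two
# freshwater populations; the middle marker mimics a shared, nearly fixed variant.
freshwater_a = [0.50, 0.90, 0.98, 0.15, 0.45]
freshwater_b = [0.55, 0.20, 0.99, 0.85, 0.50]

fst_profile = [fst_two_pops(a, b) for a, b in zip(freshwater_a, freshwater_b)]
for position, value in enumerate(fst_profile):
    print(f"marker {position}: FST = {value:.3f}")
```

With these toy numbers, the two markers flanking the middle one give a much higher FST than the middle marker itself, which is the qualitative shape of a peak-valley-peak profile. Real genome scans work with estimators that account for sample sizes and many more markers, so treat this only as a way to build intuition.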

Picture 2: One of many breathtaking watersheds on Vancouver Island (BC, Canada) inhabited by freshwater stickleback (Picture: M. Roesti).
So what does this all mean? Our results show that parallel adaptation – the very process involving similar selection pressures – can drive high population divergence within parts of a genome. These high-divergence regions, however, do not hold the actual targets of selection themselves; instead, when the same genetic variation has been re-used for adaptation, these targets are located in regions of particularly low divergence. Our results are certainly relevant to many organisms for which we have evidence, or at least a strong suspicion, that parallel adaptation from shared variation has happened. Also, the case where similar selection pressures act on parts of the genome in different populations may be more common than what appears ‘ecologically intuitive’ to us. Threespine stickleback fish provide a particularly neat model system because here we can draw on many independent and parallel events of adaptation to freshwater. We can also sample marine stickleback, the contemporary representatives of the genetic source underlying this parallelism.

Overall, our findings should be taken into consideration when interpreting divergence signatures within a genome. Finally, our insights can be used as explicit tools in the hunt for selection signatures and, ultimately, adaptation genes. I hope you will enjoy reading our paper!

Full story:

Roesti M, Gavrilets S, Hendry AP, Salzburger W, Berner D (2014). The genomic signature of parallel adaptation from shared genetic variation. Molecular Ecology (From the Cover).
