[ This post is by Marius Roesti; I'm just putting it up. -B. ]
Current adaptation research increasingly employs genome scans to search for signatures of selection, in the form of markers that show relatively elevated divergence (typically quantified by Fst) between populations from ecologically distinct habitats. While performing such a study using RADseq in threespine stickleback (see Roesti et al. 2012, Mol. Ecol.), we came across an analytical issue that is crucial to genome scans but has so far been largely ignored.
The point is that polymorphisms with a low minor allele frequency (MAF) are uninformative for inferring genetic differentiation between populations and can therefore bias the outcome and interpretation of genome scans. Low-MAF polymorphisms can easily arise from technical problems (e.g. sequencing or PCR errors), but theory also clearly predicts that an abundance of low-MAF polymorphisms (especially singletons) is a natural feature of biological samples. Regardless of how low-MAF polymorphisms arise, we argue in our just-published BMC Evolutionary Biology paper (Roesti et al. 2012, in press) that genetic markers with a low MAF can bias any study that involves calculating Fst or any other divergence metric, not only genome scans.
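To get a feel for how common low-MAF polymorphisms are expected to be even without any technical errors, here is a minimal Python sketch of the folded site frequency spectrum under the standard neutral model. It assumes a constant-size, randomly mating population and infinite-sites mutation; under these assumptions the expected number of sites at which the derived allele appears i times in a sample of n chromosomes is proportional to 1/i, and folding merges counts i and n - i into one minor-allele count. The function names are hypothetical, and this is an illustration only, not an analysis taken from the paper.

```python
def folded_neutral_sfs(n):
    """Expected share of polymorphic sites for each minor-allele count j = 1..n//2.

    Assumes the standard neutral model, under which the expected number of
    sites with i derived copies in a sample of n chromosomes is proportional to 1/i.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    folded = {}
    for i in range(1, n):              # derived-allele count
        j = min(i, n - i)              # minor-allele count after folding
        folded[j] = folded.get(j, 0.0) + 1.0 / i
    total = sum(folded.values())
    return {j: w / total for j, w in sorted(folded.items())}


def fraction_below_maf(n, threshold):
    """Expected share of polymorphic sites whose sample MAF (j / n) is below threshold."""
    return sum(p for j, p in folded_neutral_sfs(n).items() if j / n < threshold)


sfs = folded_neutral_sfs(40)           # e.g. 20 diploid individuals
print(f"singletons: {sfs[1]:.2f}")
print(f"MAF < 0.25: {fraction_below_maf(40, 0.25):.2f}")
```

For 40 sampled chromosomes this prints a singleton share of about 0.24 and a share of about 0.73 for sites with MAF below 0.25. Real populations depart from the neutral model (demography, selection, structure), so these numbers are only a rough idealized expectation.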
The reason low-MAF markers should be discarded is that researchers conducting genome scans tacitly assume that their genetic markers adequately capture two key processes: hitchhiking with a selected locus, and drift. The markers used in a genome scan must therefore mirror these historical processes. Low-MAF polymorphisms do not, because they mostly represent short-lived mutations without historical depth, and their highly imbalanced allele frequencies prevent them from displaying the genetic footprints of selective sweeps. In other words, genetic markers with a low MAF simply lack the potential to capture possible frequency shifts in nearby genomic regions. The best way to see this is to take the idea to its extreme: if a monomorphic locus were included in a genome scan, it would clearly bias the outcome, because such a locus cannot capture hitchhiking and drift even when these processes occur! For exactly that reason, nobody considers using monomorphic loci in genome scans, yet it is generally overlooked that low-MAF markers are nearly as uninformative as fully monomorphic ones.
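The point that imbalanced allele frequencies cap the divergence a marker can display can be made concrete with a small calculation. This sketch assumes Nei's G_ST computed from known allele frequencies in two equally weighted populations, with no sample-size correction; it is not necessarily the Fst estimator used in the paper, and the function names are hypothetical. If the pooled minor-allele frequency is fixed at m (with m at most 0.5), the largest possible G_ST arises when all minor alleles sit in one population (frequency 2m) and none in the other. Then H_T = 2m(1 - m) and H_S = 2m(1 - 2m), so the maximum is m/(1 - m).

```python
def gst(p1, p2):
    """Nei's G_ST for one biallelic locus from allele frequencies p1 and p2
    in two equally weighted populations (no sample-size correction)."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population heterozygosity
    if h_t == 0:
        raise ValueError("monomorphic locus: G_ST is undefined")
    return (h_t - h_s) / h_t


def max_gst_for_maf(maf):
    """Largest G_ST attainable when the pooled minor-allele frequency is maf.

    The most extreme split puts every minor allele in one population
    (frequency 2 * maf) and none in the other; this equals maf / (1 - maf).
    """
    if not 0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    return gst(2 * maf, 0.0)


for m in (0.01, 0.05, 0.1, 0.25, 0.5):
    print(f"MAF {m:.2f}: max G_ST {max_gst_for_maf(m):.3f}")
```

Under these assumptions a marker with a pooled MAF of 0.05 can never exceed a G_ST of about 0.053, however strong a nearby sweep, whereas a marker at MAF 0.5 can reach 1. As m approaches zero the ceiling approaches zero, which is the monomorphic limit described above.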
Again, low-MAF polymorphisms are predicted to be very common in any study system. Marker numbers reported in genome scan studies that do not exclude uninformative polymorphisms are therefore likely massively inflated, and their divergence patterns and outliers potentially inaccurate. For instance, in our recent genome scans we applied a minimum MAF threshold of 25%. This led to a loss of 60-70% of markers, but the remaining markers were genuinely informative. In the BMC paper we suggest how this problem can be addressed.
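A MAF filter of this kind takes only a few lines. This Python sketch assumes biallelic markers with diploid genotypes coded as 0, 1 or 2 copies of an arbitrary allele (None for a missing call), and computes the MAF on the pooled sample of all populations; the genotype coding, the function names and the toy markers are hypothetical and not taken from the paper. Markers whose MAF equals the threshold are kept.

```python
def minor_allele_frequency(genotypes):
    """MAF of one biallelic marker from diploid genotypes coded 0/1/2.

    Each value counts copies of an arbitrary allele; None marks a missing call.
    Returns 0.0 if no genotype was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1.0 - freq)            # fold to the minor allele


def filter_by_maf(markers, threshold=0.25):
    """Keep markers whose pooled MAF (all populations combined) is >= threshold."""
    return {
        name: genotypes
        for name, genotypes in markers.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


# Hypothetical toy data: genotypes from two populations pooled into one list.
markers = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 1],
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 1],      # a singleton
    "snp_c": [0, 0, 1, 1, 0, 0, 1, 1, None],
}
kept = filter_by_maf(markers)
lost = 1 - len(kept) / len(markers)
print(sorted(kept), f"{lost:.0%} of markers removed")
```

In this toy example only the singleton snp_b is removed, while snp_c sits exactly at the 25% threshold and is retained. Computing the MAF on the pooled sample rather than per population is a design choice of this sketch.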
Overall, our study highlights how important it is to handle and interpret molecular data carefully, especially in the current golden age of evolutionary genetics and genomics.
Roesti M, Salzburger W and Berner D: Uninformative polymorphisms bias genome scans for signatures of selection. BMC Evolutionary Biology 2012, 12:94. http://www.biomedcentral.com/1471-2148/12/94/abstract