Friday, May 11, 2012

Fusing theory and data: a plea for help from Dan Bolnick

I have a question for the readers of this blog - something I need to "poll" you on.
     First, a little bit of background. When I was a beginning graduate student at the University of California at Davis, a faculty member on my committee asked me whether I intended to be "a consumer, or a producer, of theory". His goal was to determine which statistics or pure math classes he should direct me to take, a decision every student faces. My goal at the time was to do empirical tests of theory (a vague concept; see below), so I casually said "consumer" and chose the statistics options. Fourteen years later, I look back at that decision as both correct and unfortunate. Correct, in that the solid grounding in statistics I gained has served me well ever since. Unfortunate, in that it perpetuates an either/or view of empirical and theoretical work that has come back to haunt me this spring.
      Here's why: Eva Kisdi and the University of Helsinki have invited me to give a series of lectures (6 hours) in August on the intersection of theory and empirical work on speciation. Having naively agreed, I now need to figure out how theory and empirical approaches actually intersect. This turns out to be harder than I expected, for several reasons. I raise two questions here and invite readers to respond:

1) What does it mean for theory and data to intersect? I see several options, but would welcome additional ideas here:
    A) Theory can make predictions about what is possible, and empiricists can test whether the predicted phenomenon actually exists. This is a weak test because it does not show that the theoretical justification is correct, only that the end product is correct. Many different models might generate the same pattern (cf. ecological neutral theory's "predictions"). Another problem is that many models predict many possible outcomes, depending on exact parameter values. For instance, in my graduate work I "tested" the theoretical prediction that intraspecific competition drives disruptive selection (Bolnick 2004 Evolution), but really the model predicts either disruptive OR stabilizing selection, depending on parameter values that I didn't (couldn't) measure. I can think of a number of examples in the speciation literature, but would welcome suggestions as to your favorites.
    B) Theory can make quantitative predictions that can be tested for exact quantitative fit (see optimal foraging theory work in the 1970s, for example). I can't think of examples in the speciation literature; suggestions welcome.
    C) Empirical data can demonstrate that something occurs biologically (e.g., sympatric speciation), and models can then be developed to formalize our thoughts about how it works. Examples include Schliewen's Nature paper on Cameroon cichlids, which made sympatric speciation respectable again (for some people), and the subsequent irrational exuberance about the topic among theoreticians.
    D) Empirical data can be used to parameterize a model to make biologically informed predictions (see my new paper with Mark Kirkpatrick in Current Zoology).
    E) A model can be built for strictly statistical purposes, to estimate parameters from empirical data. For instance, coalescent theory has been used to build analysis programs (e.g., Migrate, IM) that estimate parameters relevant to speciation (timing of divergence, subsequent gene flow). There are many examples, mostly in molecular evolution, phylogenetics, and population genetics, which sometimes spill over into the speciation literature.
    F)  The holy grail: All of the above. Make an empirical observation, build a model to explain it. Parameterize the model with independent data to make a specific novel prediction, test this empirically. Are there ANY examples?

2) Which papers are the BEST examples you can think of that actually fuse theory and empirical data to address questions in speciation?
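The problem raised in A), that a single model can predict either disruptive or stabilizing selection depending on parameters that are hard to measure, can be made concrete with a toy calculation. What follows is a minimal sketch that assumes a standard Gaussian resource-competition model (logistic Lotka-Volterra competition, with a Gaussian carrying-capacity curve of width sigma_k and a Gaussian competition kernel of width sigma_a). It is not necessarily the model tested in Bolnick 2004, and the function names are hypothetical.

```python
import math


def invasion_fitness(y, x, sigma_k, sigma_a, r=1.0):
    """Growth rate of a rare mutant (trait y) in a resident population
    (trait x) at its carrying capacity, under an ASSUMED logistic
    Lotka-Volterra competition model with Gaussian kernels."""

    def carrying_capacity(z):
        # Gaussian resource distribution peaking at z = 0 (width sigma_k)
        return math.exp(-z * z / (2 * sigma_k ** 2))

    def competition(d):
        # competition strength declines with trait distance d (width sigma_a)
        return math.exp(-d * d / (2 * sigma_a ** 2))

    return r * (1 - competition(y - x) * carrying_capacity(x) / carrying_capacity(y))


def classify_selection(sigma_k, sigma_a, x=0.0, h=1e-3):
    """Classify frequency-dependent selection around resident trait x from the
    curvature of invasion fitness (central finite difference in y).
    At x = 0, directional selection vanishes in this model."""

    def fitness(y):
        return invasion_fitness(y, x, sigma_k, sigma_a)

    curvature = (fitness(x + h) - 2 * fitness(x) + fitness(x - h)) / (h * h)
    # positive curvature: resident sits at a fitness minimum (disruptive);
    # otherwise it sits at a fitness maximum (stabilizing)
    kind = "disruptive" if curvature > 0 else "stabilizing"
    return kind, curvature


# Same model, two parameter choices, opposite predictions.
for sigma_a in (0.5, 2.0):
    kind, c = classify_selection(sigma_k=1.0, sigma_a=sigma_a)
    print(f"sigma_a = {sigma_a}: {kind} selection (curvature {c:.3f})")
```

Under these assumptions the curvature at x = 0 works out analytically to -r(1/sigma_k^2 - 1/sigma_a^2), so selection is disruptive only when competition is narrower than the resource distribution (sigma_a < sigma_k). A field measurement of disruptive selection therefore "confirms" the model only if one can also show that this inequality holds.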

Please submit comments (or email me privately at 


  1. Hi Dan. Sorry it's taken me a while to comment; just got back from a tenth anniversary in Paris, then immediately had to move to a new apartment, so things have been a bit chaotic. :->

    For your A), you're thinking at the level of theory predicting particular patterns that empirical work then confirms, and I agree that this method is weak (although presumably the more patterns the theory can predict, the stronger the method gets). But there's another level as well: theory can call attention to the importance of a mechanism, and empirical work can then confirm that the mechanism exists in nature (without necessarily confirming that it leads to the consequences predicted by theory). I'm thinking here of magic traits. When Gavrilets first coined the term "magic trait", it was, as I understand it, somewhat a term of derision, because such traits were thought unlikely to be common in nature. Empiricists quickly latched onto the idea, however, and have been finding magic traits in many systems. Whether they are as important for speciation as theory suggests they might be remains to be determined; the "effect size" of magic traits is unknown. See the recent review by Servedio et al. in TREE (and the Hendry Lab comment on that review, by Haller et al.). Still, this mechanism-level version of A) is much stronger than working only at the level of predicting patterns. Now, if, for example, phylogenetic tests could show that clades with magic traits are more species-rich than clades without them, that would be a pretty solid result, in my view, because it started at the level of mechanism, on both the theoretical and the empirical side, and proceeded to the prediction of pattern only once the mechanism was established to exist. Am I making sense?

  2. Regarding your B), quantitative predictions of speciation... well, at the level of predicting adaptive divergence you've got models like Hendry, Day & Taylor 2001, and I think that model has been used by various researchers to make quantitative predictions that they then tested in the field, although I don't have the cites at hand. But taking that a step further, to quantitative prediction of speciation rate, is tough for obvious reasons: so many things likely affect the development of reproductive isolation, and the process probably depends quite intimately on genetic details. I wouldn't trust any theoretical prediction of absolute speciation rate; that seems silly on the face of it. But predicting relative rates, that is, the effect of some parameter, is much more plausible. As I mentioned above, this would be interesting for magic traits, and one could imagine doing it for many other traits too. I think I've seen some studies of this sort using phylogenetic data, for example, looking at the effect of latitude, dispersal, or sexual selection on the species richness of clades, although separating a high speciation rate from a low extinction rate is a difficult problem with the phylogenetic approach.

  3. I don't think I have much to contribute to your question #2. The first papers that spring to mind are mostly by you. :-> Rice and Salt (1988 I think?) made a strong impression on me. Gavrilets and Vose 2007 (Palms on an oceanic island). I don't think I'm good at this game, though. Maybe I'm too far over on the opposite side of that dumb theory/empirical divide.

  4. In thinking about this, I realize that I think in terms of option C. As early examples, the models I built were intended to do several things:

    1. Demonstrate that an interpretation of empirical data was theoretically possible. For example, when we showed that partial reproductive isolation could evolve quite rapidly (14 generations) when salmon populations colonize new environments (Hendry et al. 2000 - Science), a number of people didn't believe this was possible. Evolution just couldn't happen that rapidly, so some other explanation needed to be invoked, or maybe our data were just wrong. So, with Troy Day's help, I built a simple model showing that, yes, a substantial amount of RI could evolve on the order of 14 generations (Hendry 2004 – EER).

    2. Provide a theoretical framework for making inferences from empirical data. Take, for example, the problem of inferring that gene flow constrains adaptive divergence. Just how strong would this effect be? For instance, would a gene flow rate of 10% cause a decrease in adaptive divergence of 1%, 10%, or 20%? What would it look like in an analysis plotting the amount of gene flow (x-axis) against the amount of adaptive divergence (y-axis) among multiple independent population pairs? Would the relationship be linear, decay asymptotically – or what? So, again with Troy Day’s help, we built a model that answered these questions, and we could then use this model to interpret the data we got from stickleback populations.

    I was generally pleased with how these approaches worked but have become less excited about the simplifying assumptions that were necessary in these Gaussian analytical models. So, more recently, my students (Xavier Thibert-Plante, Jacques Labonne, and Ben Haller) have been using individual-based models to address evolutionary questions. Here are just a few examples. Xavier explored the idea that selection against migrants and hybrids could substantially reduce gene flow at unlinked neutral markers (2010 – Mol Ecol). It could, but only within a limited region of parameter space. Jacques (2010 – Am Nat) explored the idea that adaptive divergence in secondary sexual traits could simultaneously create a reproductive barrier (owing to natural selection) and a reproductive “enhancer” (owing to sexual selection). Again, it could, under a number of conditions. And Ben is exploring the idea that once populations are well adapted to a given fitness peak, stabilizing selection will no longer be detectable – thus resolving the paradox of stasis. I will let him reveal the answer in a future blog post.

    So, in essence, I (we) almost always use theory to explore and formalize initially-verbal hypotheses that had been developed from empirical data – your option C, I think. That is, we had an idea based on empirical data but weren’t certain whether it was viable under a plausible range of parameter values.

    In short, my feeling is that theory should be attempting to explain natural phenomena. Thus, when I go to talks and see models that are clearly unrealistic regarding fundamental properties of the phenomena that they are seeking to explain, I find them uninteresting. They are fine mathematical exercises but they are irrelevant for explaining speciation in nature, which is what we need to understand.

    Please note that I am not saying that models should be parameterized for particular systems. Instead, I am saying that models should be general but realistic enough to include the key parameters that are relevant to the phenomenon of interest. As the most obvious example, models of speciation should be sexual models, not asexual!!!!

    As for my favorite theory papers, I have to say that Lande and Arnold (1983) was the bomb for linking theory and empirical data. I can’t imagine a more important and influential paper in this respect.

  5. Quick response:

    1/ I do not think empiricists should restrict themselves to "testing" theoretical models. Rather, they should come up with new observations or experimental results that demand explanation and that challenge existing theory. Thus, empiricists should not be the "slaves" of theoreticians; rather, theoreticians should develop models that explain natural phenomena and processes.

    Personally, I feel that the statistical framework of evolutionary quantitative genetics, as outlined by Lande & Arnold (1983) and highlighted by Andrew above, is one of the most successful integrations of theory and empirical work to date in evolutionary biology. This is because evolutionary quantitative genetics, and the formalism it represents, provides a natural language (or framework) for connecting field data with theory. Much of my own work, both as a postdoc (Sinervo, Svensson & Comendant 2000 Nature) and as an independent PI (Svensson, Abbott & Härdling 2005 Am. Nat.), has been inspired by this largely American research tradition, which is still rather underdeveloped on my home continent (Europe).

    2/ The best paper that fuses theory and empirical data in speciation is, in my opinion, Rice & Hostert's (1993, Evolution) review of "by-product speciation", i.e., how reproductive isolation develops as a correlated response to divergent selection in different environments.


  6. Hi. Here is a late post of some papers that I think would fit in: species-area relationship: Losos & Schluter (2000, Nature); ecological theory of adaptive radiation: Nosil & Crespi (2006, PNAS); genetic basis and one-allele mechanism: Ortiz-Barrientos & Noor (2005, Science); genetic correlations and speciation: Hawthorne & Via (2001, Nature); sensory drive: Seehausen et al. (2008, Nature), Boughman (2001, Nature).

  7. For theory and data to intersect means for each to highlight the strengths and weaknesses of the other. Models are great tools for exploring "the possible" but are limited to the parameters they are given. Empiricism is a powerful way to investigate "the actual" but is muddled by the complexity of nature.

    Perhaps this is too obvious to even post, but it is worth reminding students of these points.
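Comments 4 and 5 both single out Lande & Arnold (1983) as a model integration of theory and data. The core calculation behind that framework is the linear selection gradient, beta = P^-1 s, where P is the phenotypic (co)variance matrix of the traits and s is the vector of selection differentials (covariances of each trait with relative fitness); equivalently, beta is the set of partial regression coefficients of relative fitness on the traits. A minimal pure-Python sketch for two traits follows. It covers linear gradients only, with no quadratic terms, standardization, or significance tests. The function name and the toy data set are hypothetical illustrations, not real measurements.

```python
from statistics import mean


def selection_gradients(z1, z2, fitness):
    """Selection differentials s and linear selection gradients beta = P^-1 s
    for two traits, in the spirit of Lande & Arnold (1983). Sketch only."""
    n = len(fitness)
    wbar = mean(fitness)
    w = [f / wbar for f in fitness]  # relative fitness

    def cov(a, b):
        # covariance with divisor n; the divisor cancels in beta = P^-1 s
        ma, mb = mean(a), mean(b)
        return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

    # phenotypic (co)variance matrix P, entries p11, p12, p22
    p11, p22, p12 = cov(z1, z1), cov(z2, z2), cov(z1, z2)
    # selection differentials: covariance of each trait with relative fitness
    s1, s2 = cov(z1, w), cov(z2, w)

    det = p11 * p22 - p12 * p12
    if det == 0:
        raise ValueError("P is singular (traits perfectly correlated or invariant)")
    # beta = P^-1 s, using the closed-form inverse of a 2x2 matrix
    beta1 = (p22 * s1 - p12 * s2) / det
    beta2 = (p11 * s2 - p12 * s1) / det
    return (s1, s2), (beta1, beta2)


# Toy, made-up data: fitness depends only on trait 1, but trait 2 is correlated with it.
z1 = [-2, -1, 0, 1, 2]
z2 = [-1, -1, 0, 1, 1]
fitness = [1 + 0.2 * t for t in z1]
(s1, s2), (b1, b2) = selection_gradients(z1, z2, fitness)
print(f"differentials: s1={s1:.3f}, s2={s2:.3f}")
print(f"gradients:     b1={b1:.3f}, b2={b2:.3f}")
```

In this toy example trait 2 shows a nonzero selection differential (about 0.24) purely because it is correlated with trait 1, while its selection gradient is essentially zero. Separating direct from indirect selection in this way is a large part of why the framework connects field data to evolutionary theory so naturally.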