Sunday, January 1, 2017

F**k replication. F**k controls.

Just kidding – high replication and proper controls are the sine qua non of experimental science, right? Or are they, given that high replication and perfect controls are sometimes impossible or trade off with other aspects of inference? The point of this post is that low replication and an absence of perfect controls can sometimes indicate GOOD science – because the experiments are conducted in a context where realism is prioritized.

Replication and controls are concepts that are optimized for laboratory science, where both aspects of experimental design are quite achievable with relatively low effort – or, at least, low risk. The basic idea is to have some sort of specific treatment (or treatments) that is (are) manipulated in a number of replicates but not others (the controls), with all else being held constant. The difference between the shared response for the treatment replicates and the shared response (or lack thereof) for the control replicates is taken as the causal effect of the specific focal manipulation.

However, depending on the question being asked, laboratory experiments are not very useful because they are extracted from the natural world, which is – after all – the context we are attempting to make inferences about. Indeed, I would argue that pretty much any question about ecology and evolution cannot be adequately (or at least sufficiently) addressed in laboratory experiments because laboratory settings are too simple and too controlled to be relevant to the real world.

1. Most laboratory experiments are designed to test for the effect of a particular treatment while controlling for (eliminating) variation in potential confounding and correlated factors. But why would we care about the effect of some treatment abstracted from all other factors that might influence its effects in the real world? Surely what we actually care about is the effect of a particular causal factor specifically within the context of all other uncontrolled – and potentially correlated and confounding – variation in the real world.

2. Most laboratory experiments use highly artificial populations that are not at all representative of real populations in nature – and which should therefore evolve in unrepresentative ways and have unrepresentative ecological effects (even beyond the unrealistic laboratory “environment”). For example, many experimental evolution studies start with a single clone, such that all subsequent evolution must occur through new mutations – but when is standing genetic variation ever absent in nature? As another example, many laboratory studies use – quite understandably – laboratory-adapted populations; yet such populations are clearly not representative of natural populations.

In short, laboratory experiments can tell us quite a bit about laboratory environments and laboratory populations. So, if that is how an investigator wants to focus inferences, then everything is fine – and replicates and controls are just what one wants. I would argue, however, that what we actually care about in nearly all instances is real populations in real environments. For these more important inferences, laboratory experiments are manifestly unsuitable (or at least insufficient) – for all of the reasons described above. Charitably, one might say that laboratory experiments are “proof of concept.” Uncharitably, one might say they tend to be “elegantly irrelevant.”

After tweeting a teaser about this upcoming post, I received a number of paper suggestions. I like this set.

To make the inferences we actually care about – real populations in real environments – we need experiments WITH real populations in real environments. Such experiments are the only way to draw robust and reliable and relevant inferences. Here then is the rub: in field experiments, high replication and/or precise controls can be infeasible or impossible. Here are some examples from my own work:

1. In the mid-2000s, I trotted a paper around the big weeklies about how a bimodal (in beak size) population of Darwin’s finches had lost their bimodality in conjunction with increasing human activities at the main town on Santa Cruz Island, Galapagos. Here we had, in essence, an experiment where a bimodal population of finches was subject to increasing human influences. Reviewers at the weeklies complained that we didn’t have any replicates of the “experiment.” (We did have a control – a bimodal population in the absence of human influences.) It was true! We did not have any replicates simply because no other situation is known where a bimodal population of Darwin’s finches came into contact with an expanding human population. Based on this criticism of no replication – despite the fact that replication was both impossible and irrelevant – our paper was punted from the weeklies. Fortunately, it did end up in a nice venue (PRSB) – and has since proved quite influential.

Bimodality prior to the 1970s has been lost to the present at a site with increasing human influence (AB: the "experiment") but not at a site with low human influence (EG: the "control"). This figure is from my book.

2. More recently, we have been conducting experimental evolution studies in nature with guppies. In a number of these studies, we have performed REPLICATE experimental introductions in nature: in one case working with David Reznick and collaborators to introduce guppies from one high-predation (HP) source population into several low-predation (LP) environments that previously lacked guppies. Although several of these studies have been published, we have received – and continue to receive – what seem to me to be misguided criticisms. First, we don’t have a true control, which is suggested to be introducing HP guppies into some guppy-free HP environment. However, few such environments exist and, when such introductions are attempted (Reznick, pers. comm.), the guppies invariably go extinct. So, in essence, this HP-to-HP control is impossible. Second, our studies have focused on only two to four of the replicate introductions, which has been criticized because N=2 (or N=4) is too low to make general conclusions about the drivers of evolutionary change. Although it is certainly true that N=10 would be wonderful, it is simply not possible in nature owing to the limited availability of suitable introduction sites. Moreover, N=2 (N=1 even) is quite sufficient to infer how those specific populations are evolving, and, for N>1, whether they are evolving similarly or differently.

Real, yes, but not unlimited.

3. Low numbers of replicate experiments have also been criticized because too many other factors vary idiosyncratically among our experimental sites (they are real, after all) to allow general conclusions. The implication is that we should not be doing such experiments in nature because we can’t control for other covarying and potentially confounding factors – and because the large numbers of replicates necessary to statistically account for those other factors are not possible. I first would argue that the other covarying and confounding factors are REAL, and we should not be controlling them but rather embracing their ability to produce realism. Hence, if two replicates show different responses to the same experimental manipulation, those different responses are REAL and show that the specific manipulation is NOT generating a common response when layered onto the real complexities of nature. Certainly, removing those other factors might yield a common response to the manipulation but that response would be fake – in essence, artificially increasing an effect size by reducing the REAL error variance.
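
To put illustrative numbers on that last point, here is a minimal sketch (in Python) of how a standardized effect size changes when context variation is stripped out. It assumes a Cohen's-d-style ratio of the raw treatment difference to the total standard deviation, and it assumes that independent variance components simply add. The function name and all of the numbers are hypothetical and are not taken from any of the studies described here.

```python
import math


def standardized_effect(mean_diff, sd_within, sd_context=0.0):
    """Raw treatment difference divided by the total standard deviation.

    Assumes (for illustration only) that within-replicate variation and
    among-site context variation are independent, so their variances add.
    """
    total_sd = math.sqrt(sd_within ** 2 + sd_context ** 2)
    return mean_diff / total_sd


# Hypothetical values: the same raw shift in a trait under both settings.
raw_shift = 2.0

# "Lab": context variation has been controlled away (sd_context = 0).
lab_d = standardized_effect(raw_shift, sd_within=1.0)

# "Field": the same manipulation layered onto real among-site variation.
field_d = standardized_effect(raw_shift, sd_within=1.0, sd_context=3.0)

print(f"lab effect size:   {lab_d:.2f}")
print(f"field effect size: {field_d:.2f}")
```

With these made-up values the raw shift is identical in both settings, yet the standardized effect drops from 2.0 to roughly 0.63 once the context variation is left in. The larger lab number reflects the smaller denominator, not a stronger manipulation.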

For the experiments that matter, replication and controls trade off with realism – and realism is much more important. A single N=2 uncontrolled field experiment is worth many N=100 lab experiments. A single N=1 controlled field experiment is worth many different controlled lab experiments. Authors (and reviewers and editors) should prioritize accordingly.

1. It is certainly true that limited replication and imperfect controls mean that some inferences are limited. Hence, it is important to summarize what can and cannot be inferred under such conditions. I will outline some of these issues in the context of experimental evolution.

2. Even without replication and controls, inferences are not compromised about evolution in the specific population under study. That is, if evolution is documented in a particular population, then evolution did occur in that population in that way in that experiment. Period.

3. With replication (let’s say N=2 experiments), inferences are not compromised about similarities and differences in evolution in the two experiments. That is, if evolution is similar in two experiments, it is similar. Period. If evolution is different in two experiments, it is different. Period.

4. What is more difficult is making inferences about specific causality: that is, was the planned manipulation the specific cause of the evolution observed, or was a particular confounding factor the specific cause of the difference between two replicates? Despite these limitations, an investigator can still make several inferences. Most importantly, if evolution occurs differently in two replicates subject to the same manipulation (predation or parasitism or whatever), then that manipulation does NOT have a universal over-riding effect on evolutionary trajectories in nature. Indeed, experiment-specific outcomes are a common finding in our studies: despite a massive shared shift in a particular set of environmental conditions, replicate populations can sometimes respond in quite different ways. This outcome shows that context is very important and, thereby, highlights the insufficiency of laboratory studies that reduce or eliminate context-dependence and, critically, its idiosyncratic variation among populations. Ways to improve causal inferences in such cases are to use “virtual controls,” which amount to clear a priori expectations about ecological and evolutionary effects of a given manipulation, and/or “historical replicates,” which can come from other experimental manipulations done by other authors in other studies. Of course, such alternative methods are still attended by caveats that need to be made clear.

I argue that ecological and evolutionary inferences require experiments with actual populations in nature, which should be prioritized at all levels of the scientific process even if replication is low and controls are imperfect. Of course, I am not arguing for sloppy science – such experiments should still be designed and implemented in the best possible manner. Yet only experiments of this sort can tell us how the real world works. F**k replication and f**k controls if they get in the way of the search for truth.

Additional points:

1. I am not the first frustrated author to make these types of arguments. Perhaps the most famous defense of unreplicated field experiments was that by Stephen Carpenter in the context of whole-lake manipulations. Carpenter also argued that mesocosms were not very helpful for understanding large-scale phenomena.

2. Laboratory experiments are obviously useful for some things, especially physiological studies that ask, for example, how do temperature and food influence metabolism in animals and how do light and nutrients influence plant growth. Even here, however, those influences are likely context dependent and could very well differ in the complex natural world. Similarly, laboratory studies are useful for asking questions such as “If I start with a particular genetic background and impose a particular selective condition under a particular set of otherwise controlled conditions, how will evolution proceed?” Yet those studies must recognize that the results are going to be irrelevant outside of that particular genetic background and that particular selective condition under that particular set of controlled conditions.

3. Skelly and Kiesecker (2001 – Oikos) have an interesting paper where they compare and contrast effect sizes and sample sizes in different “venues” (lab, mesocosms, enclosures in nature) testing for effects of competition on tadpole growth. They report that the different venues yielded quite different experimental outcomes, supporting my points above that lab experiments don’t tell us much about nature. They also report that replication did not decrease from the lab to the more realistic venues – but the sorts of experiments reviewed are not the same sort of real-population real-environment experiments described above, where trade-offs are inevitable.

From Skelly and Kiesecker (2001 - Oikos).

4. Speaking of mesocosms (e.g., cattle tanks or bags in lakes), perhaps they are the optimal compromise between the lab and nature, allowing for lots of replication and for controls in realistic settings. Perhaps. Perhaps not. It will all depend on the specific organisms, treatments, environments, and inferences. The video below is an introduction to the cool new mesocosm array at McGill.

5. Some field experimental evolution studies can have nice replication, such as the islands used for Anolis lizard experiments. However, unless we want all inferences to come from these few systems, we need to also work in other contexts, where replication and controls are harder (or impossible).

6. Some investigators might read this blog and think “What the hell, Hendry just rejected me because I lacked appropriate controls in my field experiment?” Indeed, I do sometimes criticize field studies for the lack of a control (or replication) but that is because the inferences attempted by the authors do not match the inferences possible from the study design. For instance, inferring a particular causal effect often requires replication and controls – as noted above. 


  1. Interesting and important post, Andrew. As an experimental biologist who generally works in the field I'm often asked by lab biologists questions along the lines of 'Why can't you do that in the lab?'. Given that lab experiments are obviously useful under some circumstances, it would be good to have a 'decision tree' schematic to outline the conditions under which a lab/mesocosm/field experiment is ideal.

    There are a number of cases where a field experiment would likely be 'overkill'. And as you say there are many more cases where the abstractions of the lab are far too removed from nature. There's a real cost of field studies though, because data collection is slower and the possibility of failure is higher.

    When you say ... "the results are going to be irrelevant outside of that particular genetic background and that particular selective condition under that particular set of controlled conditions.", somebody could say the same about field experiments. For example, if I do an experiment about the evolution of defenses in plants, your critique could be applied to my chosen genetic background (i.e., white clover), particular selective condition (i.e., herbivores in Toronto), and controlled conditions (i.e., plants in a 1x1 m array). While my study was a powerful test of my specific hypothesis, is it really more generally relevant just because it was conducted outside? Or is it only relevant for clover, in Toronto, in 1x1 m arrays?

    I plan to keep working outside, but I think we ought to recognize situations where field experiments are unnecessarily complicated and results from the lab may hold up generally.

  2. Two other refs to consider:

    Oksanen, L., 2001. Logic of experiments in ecology: is pseudoreplication a pseudoissue? Oikos 94, 27–38.
    I've always been fond of Alternative #3 here - in some cases, for experiments of realistic scale when replicating a treatment is prohibitive, replicate the heck out of your controls to establish a reference distribution.


    Ruesink, J., 2000. Intertidal mesograzers in field microcosms: linking laboratory feeding rates to community dynamics. J. Exp. Mar. Biol. Ecol. 248, 163–176.
    A really nice study that shows the difference in grazing rates/effect size with the same species in the lab versus field.

