Saturday, August 6, 2016

On Failure

By Dan Bolnick

Sitting on the beach tonight playing chess and drinking wine with my postdoc Yoel Stuart, I couldn’t help but worry about tomorrow. Tomorrow morning is a crucial step in an experiment that colleagues and I have planned for years. The idea came in 2008, but took years to get all the pieces in place. One NSF grant was funded, and completed in three years, to get preliminary data to plan a second NSF grant that was also funded to do this experiment. We also had to convince the Howard Hughes Medical Institute to build a huge fish room in a remote field station (Bamfield Marine Sciences Center). Then we had to get permits. Then we spent a year and a half breeding fish. Field trips to collect parents, personnel time to take care of F1s, a grad student RA position to live in Bamfield and cross fish to make F2s, then more personnel time to breed the fish. Then, weeks spent sewing a kilometer of netting into little cages, and cutting up and assembling two and a half kilometers of PVC piping into cage frames. A week of field work with six people to install the 160 cages in 4 locations on northern Vancouver Island.

Ready for stickleback - but will it work?
Years of preparation, multiple grants, and now a moment of truth: will the juvenile fish survive the 5 hour drive north over rough dirt roads, from their rearing facility at BMSC to their grandparents’ native lakes and streams? If not, the intended experiment will have failed before it really began. No data. No conclusions. (Admittedly, this upcoming transplant is just one of 8 planned transplants in this project, so we have the opportunity to learn, adjust, and recover.)

Naturally, I am thinking a lot about failure tonight. Not just the potential failure of this transplant experiment in particular, but broader questions of failure in science. Evening on a beach is a good time and place for broad contemplation. Pachena Beach, especially, with its slanting sunlight through drifting fog and tall trees.

So, I find myself wondering:
  • How many attempted experiments fail for logistical reasons and just never get reported?
  • What are the various reasons why we fail?
  • What do we learn from our various experimental failures?
  • When is failure a productive source of insight, versus a plain old flop?

I started to also write the question: “How do we best insulate ourselves from failure?”, then paused. The fact is, failure is not uniformly bad. Sure, too many high-risk projects may leave us empty-handed. But over-attention to failure can be bad too. Fear of failure can drive some people to paralysis. Others may take risks but falsify the results of failed attempts. Still others opt to rely exclusively on ‘safe’ projects that often cover well-trodden ground and thus teach us little that is new or interesting. This leads to the conclusion that we shouldn’t insulate ourselves from failure. Instead, we need to become good judges of scientific risk, choosing an intellectual ‘portfolio’ of projects that combine an appropriate range of risks. A mix of high- and low-risk.

So instead of asking how to avoid failed experiments, I would rather ask how we can teach aspiring students to judge risk in advance, and how to be brave but not foolhardy in taking on projects. This is surely fodder for an entire book. Such books probably even exist. I don’t know, because I am sitting on a beach without wifi (thank goodness). Lacking access to the web, I will attempt a much more modest goal with this blog post. I will attempt a taxonomy of scientific failures. And, I will illustrate these failures with vignettes from my own experience. Consider this my mea culpa of failed attempts at science. Hopefully it will be cathartic for me, entertaining for you, and put the right karma in place to keep my north-bound fish alive in the coming day.

Spoiler alerts: the following contains some references to events in Game of Thrones. If you don’t care, fine, you can just ignore the ‘literary’ references and focus on the ideas and biology. If you’ve read the books or seen the TV show, fine. If you haven’t yet read these but intend to, then you might want to skip down to the line saying . Sorry.

Failure category 1: The Viserys Targaryen. For “Game of Thrones” aficionados, Viserys is a minor but entertaining character: the child of a dethroned king who connived to reconquer his ancestral kingdom. He thought he had a plan to do so, but sort of bumbled along and didn’t implement things very deftly, with the result that his plan fell apart (and he had molten gold poured over his head). Had he thought a bit more clearly, he should have foreseen some of these problems. So, I’ll invoke Viserys to represent a very common category of failure, in which the basic plan seems reasonable at a cursory glance, but the details and implementation don’t live up to expectations. This is perhaps the most common and most avoidable form of failure. You come away empty-handed, except perhaps with a better understanding of how NOT to design an experiment (which is indeed useful).
Viserys and his golden crown.

Failure category 2: The Wise Masters. Continuing with our literary theme (if you choose to call it that), the Wise Masters really thought that they had a well-worked-out path to their goals. They simply overlooked a colossal and totally unexpected fact: their adversary had massive pet dragons. Oops. Not really something you can plan ahead for. So, we’ll give the Wise Masters a nod in naming failures in which truly unforeseeable problems undermine otherwise well-thought-through plans. These may be more common than we like to think, but are inherently less avoidable than the Viserys Targaryen. Admittedly, these two kinds of failure are going to overlap a bit: an unexpected ‘dragon’ to one researcher might be foreseeable to another. This is why you should show your research plan to colleagues and mentors as much as possible – someone out there may anticipate your particular dragon.

The Wise Masters are about to meet Drogon the dragon

Failure category 3: The Eddard Stark. This one is simple: many beautiful hypotheses are slain by ugly facts. Much like the idealistic Eddard Stark, who tried to govern but was undermined by the sad fact that political reality was different than he naively believed. We could equally name this after his son, Robb Stark, King of the North, who went to a wedding of an aggrieved subordinate, incorrectly assuming that the rules of hospitality could be trusted. This is the kind of failure that philosophers of science have indeed written volumes about: we have hypotheses about how the world works. We design experiments or other kinds of studies to test these hypotheses (or their null alternatives). Sometimes we ‘fail to reject the null hypothesis’. This is a failure in the entirely constructive sense that we do indeed learn something. Unlike the previous two kinds of failure, we actually get data, we analyze it, and we find that we were wrong about something. We learn something in the process.

Ned Stark pays the price for honor - or is it naivete?

Failure category 4: The Great Houses. The core of the book series, of course, is the battle for political dominion among several families (the ‘Great Houses’), which are so focused on their squabbles that they totally overlooked the fundamental fact that their collective existence was threatened by semi-human magical winter beings. Kind of an important thing to know about, and they had some warnings thanks to the Night’s Watch. Likewise, every now and then we scientists get a hint of something really substantially new and surprising, and we often are so focused on our previous agenda that we overlook the hint, not recognizing the importance of what we just saw. This is perhaps the most problematic failure, because it represents lost opportunities for novel insight.

To summarize, our taxonomy of failures includes 1) poor planning leading to avoidable problems, 2) unexpected interference, 3) incorrect hypotheses, and 4) overlooking important things. I’ve probably failed to include something here – feel free to chime in in the comments.

Now, in the spirit of full disclosure I want to give a few examples of my own, in the hope these will help students or colleagues avoid similar mistakes, raise awareness that one’s career can survive failures (I think…), and perhaps even entertain.

Vignette 1: When I pulled a ‘Viserys Targaryen’, also known among my graduate students and postdocs as ‘Bolnick’s folly’. When I first started working on stickleback I did an experiment in one half of an hour-glass shaped lake. I later returned to that lake to examine the other half in more detail, discovering that the stickleback in the larger deeper basin and shallow small basin were dramatically different in diet (more so than the famed benthic-limnetic species pairs of stickleback). Yet despite this massive ecological difference, their phenotypes were only subtly divergent. Why not diverge as much as the species pairs? 

Ormond and Dugout Lakes on Vancouver Island. The narrow marsh separating them can be clearly seen. The barrier to dispersal was built across that marsh.

Presumably because the two lake basins are connected by a narrow marsh (~20 meters wide) that permits free movement of migrants between the basins (Bolnick et al 2008 Biol J. Linn Soc.). So, obvious experiment: create a barrier to movement, and track the subsequent emergence of genetic and phenotypic differences, then remove the barrier and watch those differences collapse. All I needed was a barrier. So, in 2007 I found myself back in British Columbia with two field assistants and an extra week on our hands between other tasks. I had planned ahead and obtained permission to build a barrier and leave it in for a decade (~10 generations). All that remained was to make the barrier a reality. We installed sturdy steel 8-foot-tall fence posts in a transect across the entire neck of marsh connecting the two lakes. We attached chain-link fencing, carefully sunk into the substrate of the marsh all the way across (~30 meters wide including semi-marshy habitat that probably rarely allows migration, but we had to block that just in case). We then attached a fine screen to this fencing. We had to build it with fine enough mesh to prevent passage even of juvenile fish, so we used a sturdy type of coarse mosquito netting. One layer of netting on either side of the fence. Then we installed another layer of chain-link fencing to sandwich a mosquito net in between, for added strength. All of this was buried deeply into the substrate, which involved several days of lying face down in muck in our wetsuits cutting into the peat with a saw. The end product looked satisfyingly sturdy (Fig. 2).

Building the barrier across the marsh.

Now, I knew all along that water flowed from the smaller basin into the larger one – imperceptibly slowly, but still flowing. And I knew therefore that the fence would get water pressure and sediment build-up. But I figured water would keep seeping through, maybe raise the water level a bit. I knew this might be wrong, and the whole thing might fall apart due to water pressure. But, it was a risk I was willing to take.

Ten months later, when we returned to the site, it was a mess. The barrier had clearly worked at keeping stuff from moving between the lakes – including small sediment, which built up. The fence became a dam. And those 8-foot-tall fence posts were stuck firmly in the sediment (job well done!) but were not up to the task of holding back a 4-hectare lake. They bent over like straws. We found the whole fence lying on its side (Fig. 3), not because the posts came out but because the steel beams bent to nearly 90-degree angles to let the water over them. Experiment finished, no data, no biological lesson. I’d still love to do that experiment, but I just don’t know how to engineer it myself.

The experimental outcome – no experiment

So, I took a risk, and my design did not work, so the experiment flopped, literally on its side. On the plus side, the total cost was maybe $1000 in materials and three people’s time for 5 days to build it. Low cost, high possible reward, high risk. Did I make the right decision to try this? Perhaps not, but it was exciting while it lasted and makes a fun story.

Vignette 2: When ‘dragons’ – specifically, trout – ate my graduate student’s experiment. My student Brian Lohman and I planned a study in which we would capture individual fish and collect detailed data on their microhabitat at the capture location – then mark and release them. We’d do that for a month, every day, all over a small 4-hectare lake (a different one from the lake above). Hopefully we’d get multiple captures of many individuals, obtaining detailed measures of individual movement distances and habitat use. Then we could use a habitat map to evaluate the role of habitat choice in dispersal decisions within a single lake. Things went swimmingly for weeks – it was wet and windy and grey, but otherwise Brian was able to mark a large number of fish. But as time went on, and the number of recaptures stayed at fewer than 10, he was puzzled. Then, on the first sunny calm day he could finally see what was going on below the surface of the water. Some local trout had apparently learned to associate his small boat with the periodic arrival of momentarily disoriented stickleback. Fish after fish was released back at its capture site, only to be instantly eaten. Not something we had ever experienced before or thought to anticipate, but the end result was too few recaptured (surviving) fish to execute the intended study.

Vignette 3: My ‘Stark’ mistakes – or ‘misStarks’: hypotheses I thought would be correct, but ultimately proved to be unsupported. There are quite a few of these. And reassuringly, many are published – you can publish negative results. I’ll pick one example that I find most instructive. In 2009 I had dinner with Rob Knight, and over wine afterwards we compared our research projects (I talked about individual diet variation in natural populations, he talked about diet effects on gut microbiota in humans and lab mice). We conceived of a simple side-project collaboration: I take an already-existing sample of 200 stickleback from one day in one lake, and get stable isotope data from each individual to characterize their diets. I send Rob DNA extracted from their intestines, and he uses next-generation sequencing to characterize their gut microbiota. Then we ask whether among-individual diet variation in wild vertebrates correlates with among-individual variation in gut microbiota. We knew how to execute each lab step, and had done it before. We had the samples in hand. All systems go. Then, when we had the data, the first-pass analysis found no significant effect of individual diet (carbon or nitrogen isotope ratio as the metric) on individual microbiota composition. To give you an idea of how odd this was, let me point out that there are tons of studies in humans and mice showing that diet changes the microbiota. This was such an accepted thing that everyone I talked to about this just said “well of course it’ll work, but it’ll certainly be cool to show this in a wild population for once” – or some variant on that sentiment. But, no significant effect.

After some head-scratching, the reason for our false expectation became clear: although sex had little significant effect on the microbiota, and diet had no significant effect on the microbiota, there was a strong sex*diet interaction. Basically, diet alters the microbiota in males, and in females, but it does so differently in each sex so that in a mixed sample (even keeping sex as a factor), the diet effect is obscured. So, our initial expectation failed because something more subtle was going on (Bolnick et al 2014 Nature Communications).
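For readers who want to see how an interaction can hide a main effect, here is a minimal sketch in Python. It uses made-up toy numbers, not the stickleback data, and it deliberately exaggerates the pattern: the hypothetical microbiota score rises with diet in one sex and falls by the same amount in the other. The names `pearson_r`, `diet`, `male_microbiota` and `female_microbiota` are all invented for this illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: the same ten diet scores for each sex.
diet = [float(i) for i in range(10)]

# Assumption for illustration only: the microbiota score increases with
# diet in males and decreases with diet in females (opposite slopes).
male_microbiota = [d for d in diet]
female_microbiota = [-d for d in diet]

r_males = pearson_r(diet, male_microbiota)
r_females = pearson_r(diet, female_microbiota)

# Pool both sexes into one sample, as a first-pass analysis might.
r_pooled = pearson_r(diet + diet, male_microbiota + female_microbiota)

print(f"males:   r = {r_males:+.2f}")
print(f"females: r = {r_females:+.2f}")
print(f"pooled:  r = {r_pooled:+.2f}")
```

In this toy case the diet signal is perfect within each sex and vanishes in the pooled sample. Adding sex as an additive factor would only shift each group's average, so it would not recover the diet slope either; only a model that lets the slope differ by sex would.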

This particular story illustrates the point that sometimes our failures are because we over-simplified, and if we dig a bit deeper we discover something even more interesting. That’s not to say every failure to reject a null hypothesis leads to some more interesting and subtle insight. Sometimes our alternate hypotheses are truly incorrect, or at least not supported in any way. I’ve put out substantial effort in some studies only to get ambiguous results or no significant support for a core hypothesis (Ingram et al 2011).

Vignette 4: My most embarrassing Great-House failure, however, is just now making itself clear. I’ve collected stickleback for almost 17 years now, and have dissected large numbers of wild-caught fish to determine sex, obtain stomach contents, or examine parasite loads. In all that time, I would frequently dissect fish whose internal organs were oddly fused together – like someone had injected glue inside. I didn’t really know what to make of it, so I ignored it. But now, that overlooked observation is turning out to be a key feature of a story my lab is building up at the moment and approaching publication. Rather than spoil the surprise, I’ll leave the details for another post when this paper is done and published – suffice to say, there are cool biological processes under our noses, and we sometimes just pass them by because we are so busy with our pre-planned agenda.

I suppose the moral of vignette 4 is to remain observant of the natural history of your system, to avoid missing the proverbial “White Walker in the Room”. Ask questions about oddities that you notice, even if it is not in your planned linear trajectory. Constant vigilance! Because it might be something really neat that you are just about to pass by. Let’s face it, we spend so much of our time meticulously planning our experiments to avoid Viserys or Wise Masters type mistakes, and we spend money and time pursuing large sample sizes so we minimize the risk of statistical errors. But the best laid experimental design also generates some blinders that may stop us from noticing the things we never even thought to ask questions about.

To put this all together, I hope it is clear that there are many ways we can fail in science, and that some failures are to be expected – you just don’t know in advance which experiments will fail, and in what way. But sometimes you have a pretty good idea which might fail. That’s not a reason to abandon all hope – sometimes it is worth trying anyway, just in case. Just keep a broad portfolio of studies so you always have a variety of levels and types of risk of failure – that way something will pan out. Speaking of which, (this being the day after I started writing this post), I should be hearing momentarily from my crew whether the 690 fish survived the drive north. I’ll keep you posted. In the meantime, please feel free to respond to this post by putting in comments of any field or lab experiments of your own that just crashed and burned.


  1. Updates:
    1) The fish being transported from Bamfield Marine Sciences Center to the field were fine! Very high survival. Which is not to guarantee that survival remains high enough for the next stage of the experiment.
    2) A minor failure: once the experiment was over we had a couple extra days for side-projects. One side-project was a simple task of sampling along a transect. The idea was to set traps at 1-meter intervals across a lake-stream boundary to fine-map the location of a remarkably abrupt cline that we have found. We've collected at these sites before with high (>10 fish/trap) capture rates. Not this time. A full day of two of us crashing through thick brush, deep mud, balancing across deep streams over slippery wet logs while carrying scores of traps... yielded a total of 6 fish. The fish were there in abundance, we could see them! They just didn't care for traps that day. An unforeseen complication that ran contrary to my experience at that site...

    1. Sometimes the small failures are needed to make up for the big successes.

  2. In doing some experiments with Drosophila on BCI, I needed to collect large species pools of flies from the forest from hanging bait buckets of rotting fruit.

    What could go wrong--surely nothing else in the forest will be interested in my rotting fruit right? Long story short, it turns out coatis are very smart, and it took me several weeks to figure out a relatively coati-proof portable enclosure in which to hang the bucket which consisted of a roll-up camp table, heavy chicken wire, and lots of tent pegs and bungee cords.

    All in all a simple, low-cost failure that ended up ok. (The bigger failure was never writing the 2 or 3 other Drosophila papers I meant to from all that work, but that's different.) But it does seem like there should be a special category of field experiment failure that involves swearing at other creatures.