By Dan Bolnick
Sitting on the beach tonight playing chess and
drinking wine with my postdoc Yoel Stuart, I couldn’t help but worry about
tomorrow. Tomorrow morning is a crucial step in an experiment that colleagues
and I have planned for years. The idea came in 2008, but took years to get all
the pieces in place. One NSF grant was funded and completed over three years to gather the preliminary data needed to plan a second NSF grant, also funded, to carry out this experiment. We also had to convince the Howard Hughes Medical Institute to
build a huge fish room in a remote field station (Bamfield Marine Sciences Center).
Then we had to get permits. Then we spent a year and a half breeding fish.
Field trips to collect parents, personnel time to take care of F1s, a grad
student RA position to live in Bamfield and cross fish to make F2s, then more
personnel time to breed the fish. Then, weeks spent sewing a kilometer of
netting into little cages, and cutting up and assembling two and a half
kilometers of PVC piping into cage frames. A week of field work with six people
to install the 160 cages in 4 locations on northern Vancouver Island.
Ready for stickleback - but will it work?
Years of preparation, multiple grants, and now a
moment of truth: will the juvenile fish survive the 5-hour drive north over
rough dirt roads, from their rearing facility at BMSC to their grandparents’
native lakes and streams? If not, the intended experiment will have failed
before it really began. No data. No conclusions. (Admittedly, this upcoming
transplant is just one of 8 planned transplants in this project, so we have the
opportunity to learn, adjust, and recover.)
Naturally,
I am thinking a lot about failure tonight. Not just the potential failure of
this transplant experiment in particular, but broader questions of failure in
science. Evening on a beach is a good time and place for broad contemplation.
Pachena Beach, especially, with its slanting sunlight through drifting
fog and tall trees.
So, I find myself wondering:
- How many attempted experiments fail for logistical reasons and just never get reported?
- What are the various reasons why we fail?
- What do we learn from our various experimental failures?
- When is failure a productive source of insight, versus a plain old flop?
I started to also write the question: “How do we
best insulate ourselves from failure?”, then paused. The fact is, failure is
not uniformly bad. Sure, too many high-risk projects may leave us empty-handed.
But, over-attention to failure can be bad too. Fear of failure can drive some
people to paralysis. Others may take risks but falsify the results of failed
attempts. Still others opt to rely exclusively on ‘safe’ projects that often
cover well-trodden ground and thus teach us little that is new or
interesting. This leads to the
conclusion that we shouldn’t insulate ourselves from failure. Instead, we need
to become good judges of scientific risk, choosing an intellectual ‘portfolio’
of projects that combine an appropriate range of risks. A mix of high- and
low-risk.
So instead of asking how to avoid failed
experiments, I would rather ask how we can teach aspiring students to judge
risk in advance, and how to be brave but not foolhardy in taking on projects.
This is surely fodder for an entire book. Such books probably even exist. I
don’t know, because I am sitting on a beach without wifi (thank goodness).
Lacking access to the web, I will attempt a much more modest goal with this
blog post. I will attempt a taxonomy of scientific failures. And, I will
illustrate these failures with vignettes from my own experience. Consider this
my mea culpa of failed attempts at science. Hopefully it will be cathartic for me, entertaining for you, and put the right karma in place to keep my north-bound fish alive in the coming day.
Spoiler alert: the following contains some references to events in Game of Thrones. If you don’t care, fine, you can just ignore the ‘literary’ references and focus on the ideas and biology. If you’ve read the books or seen the TV show, fine. If you haven’t yet read these but intend to, then you might want to skip ahead. Sorry.
Failure category 1: The Viserys Targaryen. For
“Game of Thrones” aficionados, Viserys is a minor but entertaining character:
the child of a dethroned king who connived to reconquer his ancestral kingdom.
He thought he had a plan to do so, but sort of bumbled along and didn’t implement
things very deftly, with the result that his plan fell apart (and he had molten
gold poured over his head). Had he thought a bit more clearly, he should have
foreseen some of these problems. So, I’ll invoke Viserys to represent a very
common category of failure, in which the basic plan seems reasonable at a
cursory glance, but the details and implementation don’t live up to
expectations. This is perhaps the most common and most avoidable form of
failure. You come away empty-handed, except perhaps with a better understanding
of how NOT to design an experiment (which is indeed useful).
Viserys and his golden crown.
Failure category 2: The Wise Masters. Continuing
with our literary theme (if you choose to call it that), the Wise Masters
really thought that they had a well-worked out path to their goals. They simply
overlooked a colossal and totally unexpected fact: their adversary had massive
pet dragons. Oops. Not really something you can plan ahead for. So, we’ll give
the Wise Masters a nod in naming failures in which truly unforeseeable problems
undermine otherwise well-thought-through plans. These may be more common than
we like to think, but are inherently less avoidable than the Viserys Targaryen.
Admittedly, these two kinds of failure are going to overlap a bit: an unexpected
‘dragon’ to one researcher might be foreseeable to another. This is why you
should show your research plan to colleagues and mentors as much as possible –
someone out there may anticipate your particular dragon.
The Wise Masters are about to meet Drogon the dragon.
Failure category 3: The Eddard Stark. This one
is simple: many beautiful hypotheses are slain by ugly facts, much as the idealistic Eddard Stark tried to govern but was undermined by the sad fact that political reality was different from what he naively believed. We could equally name
this after his son, Robb Stark, King of the North, who went to a wedding of an
aggrieved subordinate, incorrectly assuming that the rules of hospitality could
be trusted. This is the kind of failure that philosophers of science have
indeed written volumes about: we have hypotheses about how the world works. We
design experiments or other kinds of studies to test these hypotheses (or their
null alternatives). Sometimes we ‘fail to reject the null hypothesis’. This is
a failure in the entirely constructive sense that we do indeed learn something.
Unlike the previous two kinds of failure, we actually get data, we analyze it, and we find we were wrong about something. We learn something in the process.
Ned Stark pays the price for honor - or is it naivete?
Failure category 4: The Great Houses. The core of the book series, of course, is the battle for political dominion among several families (the ‘Great Houses’), which are so focused on their squabbles that they totally overlook a fundamental fact: their collective existence is threatened by semi-human magical winter beings. Kind of an important thing
to know about, and they had some warnings thanks to the Night’s Watch.
Likewise, every now and then we scientists get a hint of something really
substantially new and surprising, and we often are so focused on our previous
agenda that we overlook the hint, not recognizing the importance of what we
just saw. This is perhaps the most problematic failure, because it represents
lost opportunities for novel insight.
To summarize, our taxonomy of failures includes
1) poor planning leading to avoidable problems, 2) unexpected interference, 3)
incorrect hypotheses, and 4) overlooking important things. I’ve probably failed
to include something here – feel free to chime in via the comments.
Now, in the spirit of full disclosure I want to
give a few examples of my own, in the hope these will help students or
colleagues avoid similar mistakes, raise awareness that one’s career can
survive failures (I think…), and perhaps even entertain.
Ormond and Dugout Lakes on Vancouver Island. The narrow marsh separating them can be clearly seen. The barrier to dispersal was built across that marsh.
Vignette 1: The barrier between two lakes. Ormond and Dugout Lakes, two adjoining lake basins on Vancouver Island, are connected by a narrow marsh (~20 meters wide) that permits free movement of migrants between the basins (Bolnick et al. 2008, Biol. J. Linn. Soc.), presumably keeping their stickleback from diverging. So, the obvious experiment: create a barrier to movement, track the subsequent emergence of genetic and phenotypic differences, then remove the barrier and watch those differences collapse. All I needed was a barrier.
So, in 2007 I found myself back in British Columbia with two field assistants and an extra week on our hands between other tasks. I had planned ahead and obtained permission to build a barrier and leave it in for a decade (~10 generations). All that remained was to make the barrier a reality. We installed sturdy 8-foot-tall steel fence posts in a transect across the entire neck of marsh connecting the two lakes (~30 meters wide including semi-marshy habitat that probably rarely allows migration, but we had to block that just in case). We attached chain-link fencing, carefully sunk into the substrate of the marsh all the way across. To this fencing we attached a fine screen - it had to have mesh fine enough to prevent passage even of juvenile fish, so we used a sturdy type of coarse mosquito netting, one layer on either side of the fence. Then we installed another layer of chain-link fencing to sandwich the mosquito netting in between, for added strength. All of this was buried deeply into the substrate, which involved several days of lying face down in muck in our wetsuits, cutting into the peat with a saw. The end product looked satisfyingly sturdy (Fig. 2).
Building the barrier across the marsh.
Now, I knew all along that water flowed from the smaller basin into the larger one – imperceptibly slowly, but still flowing. And I knew, therefore, that the fence would face water pressure and sediment build-up. But I figured water would keep seeping through, maybe raising the water level a bit. I knew this might be wrong, and the whole thing might fall apart due to water pressure. But it was a risk I was willing to take.
The experimental outcome – no experiment.
So, I took a risk, and my design did not work, so the experiment flopped, literally on its side. On the plus side, the total cost was maybe $1000 in materials and three people’s time for 5 days to build it. Low cost, high possible reward, high risk. Did I make the right decision to try this? Perhaps not, but it was exciting while it lasted and makes a fun story.
Vignette 2: When ‘dragons’ – specifically, trout
– ate my graduate student’s experiment. My student Brian Lohman and I planned a
study in which we would capture individual fish and collect detailed data on
their microhabitat at the capture location – then mark and release them. We’d
do that for a month, every day, all over a small 4-hectare lake (a different one from those above). Hopefully we’d get multiple captures of many individuals,
obtaining detailed measures of individual movement distances and habitat use.
Then we could use a habitat map to evaluate the role of habitat choice in
dispersal decisions within a single lake. Things went swimmingly for weeks – it
was wet and windy and grey, but otherwise Brian was able to mark a large number
of fish. But as time went on and the number of recaptures stayed below 10, he
was puzzled. Then, on the first sunny calm day he could finally see what was
going on below the surface of the water. Some local trout had apparently
learned to associate his small boat with the periodic arrival of momentarily
disoriented stickleback. Fish after fish was released back at its capture site, only to be instantly eaten. Not something we had ever experienced before
or thought to anticipate, but the end result was too few recaptured (surviving)
fish to execute the intended study.
Vignette 3: My ‘Stark’ mistakes – or
‘misStarks’: hypotheses I thought would be correct, but ultimately proved to be
unsupported. There are quite a few of these. And reassuringly, many are
published – you can publish negative results. I’ll pick one example that I find
most instructive. In 2009 I had dinner with Rob Knight, and over wine
afterwards we compared our research projects (I talked about individual diet
variation in natural populations, he talked about diet effects on gut
microbiota in humans and lab mice). We conceived of a simple side-project collaboration:
I take an already-existing sample of 200 stickleback from one day in one lake,
and get stable isotope data from each individual to characterize their diets. I
send Rob DNA extracted from their intestines, and he uses next generation
sequencing to characterize their gut microbiota. Then we ask whether
among-individual diet variation in wild vertebrates correlates with
among-individual variation in gut microbiota. We knew how to execute each lab
step, and had done it before. We had the samples in hand. All systems go. Then,
when we had the data, the first-pass analysis found no significant association between individual diet (carbon or nitrogen isotope ratio as the metric) and individual microbiota composition. To give you an idea of how odd this was, let me point
out that there are tons of studies in humans and mice showing that diet changes
the microbiota. This was such an accepted thing that everyone I talked to
about this just said “well of course it’ll work, but it’ll certainly be cool to
show this in a wild population for once” – or some variant on that sentiment.
But, no significant effect.
After some head-scratching, the reason for our
false expectation became clear: although sex by itself had little effect on the microbiota, and diet by itself had no significant effect either, there was a strong sex*diet interaction. Basically, diet alters the microbiota in males,
and in females, but it does so differently in each sex so that in a mixed
sample (even keeping sex as a factor), the diet effect is obscured. So, our
initial expectation failed because something more subtle was going on (Bolnick
et al 2014 Nature Communications).
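For readers who want to see that statistical point concretely, here is a minimal simulated sketch (not our actual analysis; it uses hypothetical variable names and a single made-up ‘microbiota’ metric rather than real community composition data) of how diet effects of opposite sign in the two sexes can cancel out in a model that omits the interaction term:

```python
# Toy simulation (hypothetical data, not from the real study) showing how a
# sex*diet interaction can hide a diet main effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
sex = rng.choice(["M", "F"], size=n)
diet = rng.normal(size=n)  # stand-in for a standardized stable-isotope ratio

# Diet shifts the (made-up) microbiota metric in opposite directions in each
# sex, so the average diet effect across the pooled sample is roughly zero.
slope = np.where(sex == "M", 1.0, -1.0)
microbiota = slope * diet + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"microbiota": microbiota, "diet": diet, "sex": sex})

# Additive model (diet + sex): the diet term looks non-significant.
print(smf.ols("microbiota ~ diet + sex", data=df).fit().pvalues["diet"])

# Model with the interaction: the sex-dependent diet effect is recovered.
print(smf.ols("microbiota ~ diet * sex", data=df).fit().pvalues)
```

In the toy data the additive model sees essentially no diet effect, while the interaction model recovers a strong sex-dependent one, the same qualitative pattern described above.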
This particular story illustrates the point that
sometimes our failures are because we over-simplified, and if we dig a bit deeper
we discover something even more interesting. That’s not to say every failure to
reject a null hypothesis leads to some more interesting and subtle insight.
Sometimes our alternate hypotheses are truly incorrect, or at least not
supported in any way. I’ve put out substantial effort in some studies only to
get ambiguous results or no significant support for a core hypothesis (Ingram
et al 2011).
Vignette 4: My most embarrassing Great-House
failure, however, is just now making itself clear. I’ve collected stickleback for almost 17 years now, and have dissected large numbers of wild-caught fish
to determine sex, obtain stomach contents, or examine parasite loads. In all
that time, I would frequently dissect fish whose internal organs were oddly
fused together – like someone had injected glue inside. I didn’t really know
what to make of it, so I ignored it. But now, that overlooked observation is turning
out to be a key feature of a story my lab is building up at the moment and
approaching publication. Rather than spoil the surprise, I’ll leave the details
for another post when this paper is done and published – suffice to say, there
are cool biological processes under our noses, and we sometimes just pass them
by because we are so busy with our pre-planned agenda.
I suppose the moral
of vignette 4 is to remain observant of the natural history of your system, to
avoid missing the proverbial “White Walker in the Room”. Ask questions about
oddities that you notice, even if they are not part of your planned linear trajectory.
Constant vigilance! Because it might be something really neat that you are just
about to pass by. Let’s face it, we spend so much of our time meticulously
planning our experiments to avoid Viserys- or Wise Masters-type mistakes, and we spend money and time pursuing large sample sizes to minimize the risk of statistical errors. But even the best-laid experimental design generates some
blinders that may stop us from noticing the things we never even thought to ask
questions about.
Updates:
1) The fish being transported from Bamfield Marine Sciences Center to the field were fine! Very high survival. Which is not to guarantee that survival remains high enough for the next stage of the experiment.
2) A minor failure: once the experiment was over we had a couple extra days for side-projects. One side-project was a simple task of sampling along a transect. The idea was to set traps at 1-meter intervals across a lake-stream boundary to fine-map the location of a remarkably abrupt cline that we have found. We've sampled these sites before with high (>10 fish/trap) capture rates. Not this time. A full day of two of us crashing through thick brush and deep mud, balancing across deep streams on slippery wet logs while carrying scores of traps... yielded a total of 6 fish. The fish were there in abundance, we could see them! They just didn't care for traps that day. An unforeseen complication that ran contrary to my experience at that site...
Sometimes the small failures are needed to make up for the big successes.
In doing some experiments with Drosophila on BCI, I needed to collect large species pools of flies from the forest using hanging bait buckets of rotting fruit.
What could go wrong? Surely nothing else in the forest will be interested in my rotting fruit, right? Long story short, it turns out coatis are very smart, and it took me several weeks to figure out a relatively coati-proof portable enclosure in which to hang the bucket, consisting of a roll-up camp table, heavy chicken wire, and lots of tent pegs and bungee cords.
All in all a simple, low-cost failure that ended up ok. (The bigger failure was never writing the 2 or 3 other Drosophila papers I meant to from all that work, but that's different.) But it does seem like there should be a special category of field experiment failure that involves swearing at other creatures.