Wednesday, September 25, 2019

Does startup size predict subsequent grant success?

Warning: the following is a small sample size survey not conducted especially scientifically. This is simply out of curiosity and should not guide policy decisions.

Based on a Twitter query about start-up sizes, I found myself wondering whether the size of a professor's start-up package has a measurable effect on their subsequent grant-writing success. In particular, do people who get larger start-up packages then get more money, representing a larger return on the larger investment? I designed a brief 5-question survey on Google Forms, advertised it on Twitter, and got 65 responses. This blog post is a brief summary of the results, which surprised me only somewhat.

First off, a brief summary of who replied:

I then wondered whether initial start-up package size depends on gender or the university type, and found a clear and expected result: R1 universities have larger start-up packages. Encouragingly, with this small self-reported sample size there was no sign of a gender bias:

I'd show you the statistical results, but they just obviously match the visuals above.
Subject matter had no significant effect on start-up package size (mostly ecology versus evolution).

Now for the big reveal: does initial start-up package size matter for later grant income?

Using the first five years as a focus, the answer is...
no, once you account for university type.

Black dots in the figure are R1 universities, green are non-R1 universities, yellow are private colleges, blue is other. There's no significant trend within either of the well-represented categories (R1, non-R1). If we do a single model with grant income as a function of university type and start-up, only university type is significant.  The pattern after 10 years is even stranger:

In both cases it is certainly clear that (1) there's a lot of noise, and (2) people who get start-up packages in excess of about 400,000 do seem to have an advantage at getting more grant money. After 10 years people who got more than 400,000 in start-up all had at least 2 million in grant intake. That seems like a good return on investment. But more is not all better: the biggest grant recipients were middle-of-the-road start-up recipients. Note that gender had no detectable effect in the trend above, but sample sizes were low and the data self-reported.

Briefly, here's my own story: In 2003 I negotiated a position with the University of Texas at Austin. They gave me just shy of $300,000 (might have been 275,000), plus renovation costs. Before even arriving at UT I already had a >$300,000 NSF grant. Within 5 years of arriving I also had received an $800,000 Packard Foundation grant, and a position at HHMI whose total value exceeded $3,000,000. By the time I left UT, I had obtained five additional NSF grants and an NIH R01 and had pulled in somewhere in excess of $10 million cumulatively. I recognize that I have been quite lucky in funding over the years. My point though is that I was able to achieve that leveraging relatively little start-up compared even with many of my peers at the time. That anecdotal experience is confirmed, tentatively, by this survey, which finds that lots of people end up being quite successful with relatively small start-ups. The data seem to suggest that above a certain start-up package size, universities see little additional benefit. It is essential to recognize, however, that this is a small sample size with questionable data and poor distribution (a few days via twitter). So, this should not guide policy. But, it does make me wonder: surely someone has done a proper job of this analysis?

Thursday, September 19, 2019



Part 2 of a series on choosing a research topic.

One of my favorite songs in college was by Natalie Merchant:

Climbing under 
A barbed wire fence 
By the railroad ties 
Climbing over 
The old stone wall 
I am bound for the riverside 
Well I go to the river 
To soothe my mind 
Ponder over 
The crazy days of my life 
Just sit and watch the river flow 

It still brings me joy to hear the song, and more and more it feels like it's about my research process. Academia is so hectic: teach, write proposals, publish, committee meetings, editing... where does one find time to contemplate, to let ideas bubble up and mature? Personally, I find that some of the most valuable time is sitting by water, hence my continued love of the song above.

Which leads me to the question, where do ideas come from? What can you do to generate ideas?

Suggestion 1: Go Forth And Observe: I used to teach a class on Research Methods at UT Austin, aimed at future K-12 science teachers. The students had to come up with their own questions in any area remotely science-like, and do a project. Four projects, actually, of increasing complexity. And finding questions was always hard. So we walked them through an exercise: everyone blew up balloons, then all at once we popped our balloons. It wakes them up. As an aside: if you do this, check that nobody has a latex allergy or PTSD that might be triggered by the sound [we had a veteran in the class once and learned this], and don't hold the balloons close to eyes or they can tear corneas. Then, everyone wrote down five observations: it split into 3 pieces, they are different sizes, it was loud, the inside of the balloon is damp, there are little ripples on the edge of the tear. Then, convert those into questions. Then we discussed which questions were testable in the classical sense (yes/no answers), quantitative (numbers are the answer), and which were untestable. We'd talk about how a well-phrased question clearly points you towards what you should do to answer it. And about how poorly phrased or vague questions (why did it make a sound) can be broken down into testable and more specific sub-questions. It's a great exercise, not least because my co-instructor Michael Marder, a physicist, had actually spent two decades working on the physics of that sinusoidal ripple at the margin of the torn rubber (inspired by noticing this at a child's birthday party), and discovered it has applications to predicting how earthquake cracks propagate through the earth's crust. So, students could see how a mundane thing like a balloon can lead to big science.

You can do the balloon exercise, or something like it that's more biological: go out in the woods, or snorkel in a stream or ocean. Watch the animals around you. Visit the greenhouse. Write down observations, and turn them into questions. Write down 50. There's got to be a good one in there somewhere, right?

Suggestion 2: Don't take the first idea or question that you can do. The exercise described above will almost surely lead you to a question that you can answer. But, is it the question that you SHOULD answer? Will other people care about it? If so, why?  There's this idea in economics of "opportunity cost". Sure, writing this blog is valuable. But it is taking time I could otherwise be spending on revising that manuscript for Ecology Letters, or writing lectures for my Evolutionary Medicine class, or preparing my lecture for a fast-approaching trip to Bern and Uppsala. Is this blog the best thing I could be doing with this hour of my day? Choosing a research project is even more prone to opportunity costs: you are embarking on a project that may take you six months, a year, or five years. Sure, you can do it, and you can publish some results. But is it the best and most impactful (variously defined) thing you can do with that time? In general I bet that the first idea that crosses your mind isn't the best idea you'll have. Personally, I had two ideas for research when I first entered grad school, then I went through a series of maybe 6 ideas over the course of my first year, and only landed on my actual project in early fall of my second year. The other ideas weren't bad, just not as exciting to me (and, I think, to others).
Opportunity cost, by SMBC Comics' Zach Weinersmith

Suggestion 3: Don't get stuck on local optima. I love to think of self-education in a research field as a Bayesian Markov chain Monte Carlo (MCMC) search on an intellectual landscape. Search widely, visit many different topics and ideas and questions. The ones that you keep coming back to, and spend the most time on, are probably a good indicator of a high posterior probability for your future research. But, again, if you start on an actual project too soon, you limit your ability to explore that intellectual landscape by slowing your search rate and might falsely conclude you are on a great peak for an idea, when really you've just stopped making those long jumps to new places in the intellectual landscape of relevant topics.
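To make the analogy concrete, here is a toy Metropolis sampler, offered as a sketch rather than anything rigorous. Everything in it is an assumption made up for illustration: the one-dimensional `landscape` function (a minor peak at -3 and a peak three times higher at +3), the jump sizes, and the fixed random seed. Two searchers start on the minor peak; one only ever takes small steps, the other keeps making long jumps.

```python
import math
import random


def landscape(x):
    """Hypothetical 'interest' landscape: minor peak at -3, higher peak at +3."""
    minor = math.exp(-((x + 3.0) ** 2) / 0.5)
    major = 3.0 * math.exp(-((x - 3.0) ** 2) / 0.5)
    return minor + major


def metropolis(start, max_jump, n_steps, seed):
    """Random-walk Metropolis search with uniform proposals of width max_jump."""
    rng = random.Random(seed)
    x = start
    visited = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-max_jump, max_jump)
        # Accept with probability min(1, landscape(proposal) / landscape(x)).
        if rng.random() < landscape(proposal) / landscape(x):
            x = proposal
        visited.append(x)
    return visited


def share_on_major_peak(visited):
    """Fraction of steps spent on the side of the landscape with the higher peak."""
    return sum(1 for x in visited if x > 0) / len(visited)


cautious = metropolis(start=-3.0, max_jump=0.2, n_steps=5000, seed=1)
adventurous = metropolis(start=-3.0, max_jump=8.0, n_steps=5000, seed=1)

print("small steps, share of time on the higher peak:", share_on_major_peak(cautious))
print("long jumps, share of time on the higher peak:", share_on_major_peak(adventurous))
```

The small-step searcher stays on the minor peak and would happily conclude it is the best idea around, while the long-jump searcher finds the higher peak and spends most of its time there.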

Suggestion 4: Know your history. There are a vast number of ideas, and study systems, stashed away in the literature, going back decades and beyond. As a mid-stage graduate student, I read Ernst Mayr's Animal Species and Evolution, and I was struck by how many hundreds of study systems were left lying by the wayside because someone lost interest, retired, left academia, or whatever. The questions were never fully resolved, just left partly answered. There are so many great ideas and systems waiting for your attention. And the great thing is, when pitching an idea to readers or grant reviewers, they tend to love to see the historical context: it helps to justify in their own mind that this is something interesting, if it is a topic people have been thinking of for a long time. Also, knowing your history helps you avoid repeating it. Being scooped by a contemporary is frustrating, but being scooped by somebody 40 years ago because you didn't know it was done already, that's worse.
Ernst Mayr

Suggestion 5: Read theory. A lot of evolution and ecology students are wary of mathematical theory. That's unfortunate, because it means you are cutting yourself off from a major fountain of inspiration. Learn to read theory, including the equations. It is absolutely worthwhile. Here's why. From my viewpoint, theory does a lot of things that an empiricist should pay attention to. 

First, it makes our thinking more rigorous. For example, it is intuitive to think that co-evolution between host and parasite can lead to frequency-dependent cycles where the host evolves resistance to parasite A, so the parasite evolves phenotype B, then hosts evolve resistance to B but thereby become susceptible to A again, so the parasites switch back. Cyclical evolution, maintenance of genetic variation in both players. Sure, it's possible, but by writing out the math theoreticians identified all sorts of requirements that we might have overlooked in our verbal model. This cyclical dynamic is harder to get than we might think, and the math helps us avoid falling into a trap of sloppy thinking that leads us down a blind alley.

Second, and related, the math identifies assumptions that we might not realize we were making. Is assortative mating during sympatric speciation based on a magic trait that affects both mating and adaptation, or a sexual-signalling trait unrelated to adaptation? Do individuals really tend to compete for food more strongly with phenotypically similar members of their population? When writing out theory, those assumptions are often brought into the light of day (though sometimes theoreticians are unclear about them, making implicit assumptions too). These assumptions are often things we empiricists don't know much about. How strongly do females prefer phenotypically similar males within a panmictic population? I didn't know. How many stickleback males does a searching female visit before settling on a mate? No idea... Theory brought my attention to these assumptions, and they become something I can go and measure. So, the assumptions underlying the equations are an opportunity for empirical investigation, with a ready-made justification: "Theory assumes X, so we need to know if/when/where/how often this is biologically valid".

Third and hardest, theory makes predictions: if X is true, then Y should happen. These predictions can, in principle, be tested. But beware: If the entire set of assumptions X is true, then the math argues that Y is inevitable. Is it really worth testing, then? If you don't know that all features of X are true, then the theory no longer guarantees Y. If you fail to demonstrate Y, arguably you weren't actually testing the theory.

Suggestion 6: P-hack and prepare to be surprised. Having read theory, read the literature, and been observant, go back out and do something. Do a little experiment, start a little observational pilot study, just get some data. Now, do something everyone tells you not to: P-hack it. Analyze the data in many possible ways, look for relationships that you might not have identified a priori. Sure, this can lead to false positives. A lot of people argue strongly against unguided post-hoc data analysis for this reason. But we aren't at the stage yet of publishing, this is exploration, an information-finding foray. Here's a concrete example: most stickleback biologists like myself have long treated each lake as a single genetic population and assumed it is well-mixed in terms of genotypes and phenotypes (except in a few lakes with 2 species present). This has practical consequences. This past summer I watched colleagues throw 10 traps into a lake, along a mere 10 meter stretch of shoreline, then take the first trap out and it has >100 fish, so we use them and release the fish from the other 9 traps. BAD IDEA. It turns out, we now know, there is a lot of trap-to-trap variation in morphology and size and diet and genotype that arises from microgeographic variation within lakes. Here's how I got clued into this. A graduate student of mine, Chad Brock, hand-collected ~30 nesting male stickleback from each of 15 lakes in British Columbia, and immediately did spectroscopy to measure color wavelength reflectance on each male. He also happened to note the substrate, depth, and so on, of the male's nest. Six months later, back in Texas, he P-hacked, and noticed that in the first lake he was examining intensively, male color covaried with nest depth: males 0.5 meters deep were redder and males 1.5 meters deep (just meters away horizontally) were bluer. The different-colored males were within maybe 10 seconds' swimming distance of each other. 
This clued us in to the fact that something interesting might be going on, and we later confirmed this pattern in 10 other lakes, replicated it across years, and ultimately replicated it experimentally as well. I'm not here to tell you about our male color work though. The key point is, theory would have told me to never expect trait variation among individuals at this spatial scale, because gene flow should homogenize mobile animals at this spatial scale. But it doesn't, apparently. Here's a case where theory puts blinders on us, telling us to not bother looking for microgeographic variation. Then, when we P-hacked we were surprised and ultimately cracked open what turns out to be a very general phenomenon that we might otherwise have overlooked. 

(a caveat: P-hacking shouldn't be the end-game, and if you do try that, please at least be totally up front when you write about which analyses were predetermined, and which (and how many) were post-hoc analyses).
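Why does the number of post-hoc analyses matter so much? A back-of-envelope sketch, under two simplifying assumptions that are mine and rarely hold exactly for real data: every test is independent, and there is no real effect anywhere. Under those assumptions the chance of at least one "significant" result among n tests at threshold alpha is 1 - (1 - alpha)^n. The function names below are just illustrative.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """P(at least one p < alpha) across n independent tests when no effect is real."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the overall false-positive rate at or below alpha."""
    return alpha / n_tests


for n_tests in (1, 5, 20, 100):
    print(
        f"{n_tests:>3} tests: "
        f"chance of a spurious hit = {chance_of_false_positive(n_tests):.2f}, "
        f"Bonferroni threshold = {bonferroni_threshold(n_tests):.4f}"
    )
```

With 20 exploratory analyses the chance of at least one spurious hit is roughly 64%, which is why an exploratory finding should be treated as a lead to replicate, and why reporting how many analyses you ran lets readers judge it for themselves.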

Suggestion 7: Have a portfolio. In financial investment theory, it is always recommended that you invest in a portfolio. Some investments (e.g., stocks of start-ups) have the potential to go sky-high, but also the potential to crash out entirely. Other investments are solid safe bets with little risk. If you put all your money in the former, you will either be spectacularly wealthy or lose everything. If you put all your money in the latter, you are guaranteed to have some savings in the future, but maybe just keeping up with inflation. The recommendation, therefore, is to have a portfolio that mixes these alternatives. The same is true in research. There are projects you can do that would be super-cool if they succeeded and gave you a particular answer. They'd make you famous, get you that Nobel Prize your mom has been pestering you about. But, either it might not work at all, or perhaps a negative result is just uninterpretable or uninteresting. High potential reward, high risk. Or, you could go to the nearest 10 populations of your favorite organism, and do some sequencing and build a phylogenetic tree or a phylogeographic model. Guaranteed to work, not very exciting. Low reward, no risk. Pick some of each to work on, and be aware which is which.

Note also that in economics the optimal ratio of risky to safe investments shifts with time: as you age, you have less time before retirement to recover from a crash, so you want to shift your investments increasingly into the safe category. In science I'd say the opposite is true. A consequence of the tenure system is that once people get tenure, they become less risk-averse, more likely to shoot the moon (a card game reference, not a ballistic one) for that wildly risky but high-reward idea. As a grad student, though, if you want to end up at an R1 university (disclaimer, other careers are great too!) don't get sucked into a safe-bet-only philosophy, because it probably won't make the splash you need to be recognized and excite people.

Suggestion 8: Have a toolbox. Whatever question you pick, you'll need a toolkit of skills to answer it. Bioinformatics. Bayesian hierarchical modeling. ABC. Next generation sequencing. GIS. CRISPR. These skills are "just tools". But, sometimes academic departments choose to hire faculty who can bring in skill sets that existing faculty lack (e.g., so we can have you help us analyze the data we collected but don't really know how to use). And, those "just tools" are often highly sought-after by industry. So, if you are thinking of moving into NGOs, or the private sector, often the skills you gain along the way turn out to be far more valuable for landing a job, than the splashy journal article you published.

Suggestion 9: Don't be dissuaded. Here's the riskiest advice yet. If you have a truly transformative idea, don't be dissuaded by nay-sayers. There will be people on your PhD committee, or among your colleagues and peers, who think you are full of $#it, on the wrong track, or it just won't be feasible. Listen to them. And defend yourself, rather than just abandoning your idea. Sure, you might be wrong. But, they might be wrong too. A personal example. Tom Schoener was on my PhD committee. I was intimidated by him, he was so foundational to ecology, so smart, so prolific. So when I presented my research plan, I was initially dismayed by his response. My ideas on disruptive selection and competition depended on the assumption that individuals within a population eat different foods from each other. So, whoever eats commonly-used foods competes strongly, whoever eats rarely-used foods escapes competition, and voila, you have disruptive selection. Tom, however, pointed to a series of theoretical papers from the 1980s by Taper and Case, and by Roughgarden, to argue that selection should ultimately get rid of among-individual diet variation. Therefore, Tom said, most natural populations should be ecologically homogeneous: every individual eating pretty much the same thing as every other individual if they happen to encounter it. But, that didn't jibe with my reading of the fish literature. So, I assembled a group of fellow graduate students (as yet uncontaminated by preconceptions on the topic) and we did a review / meta-analysis of diet variation within populations. In a sense, I did it just to prove to myself, and to Tom Schoener, that the real core of my dissertation wasn't a wild goose chase. The resulting paper has turned out to be my most-cited article by far (Bolnick et al 2003 American Naturalist). And I did it to prove a PhD committee member wrong, on a minor point of disagreement.
To be clear: Tom loved that paper and assigns it in his ecology graduate course, and we get along great. But the point is, your committee members and peers have both accumulated wisdom that you should draw on, but also have preconceptions and biases that may be wrong. Defend your ideas, and if you are able to, you might really be on to something.
Tom Schoener


Part 1 of a series on choosing your research topic


I might be guilty of stereotyping here, but I suspect relatively few readers of this blog would consider themselves fashion-conscious. Do you go to fashion shows? Regularly read fashion magazines? Discard last month's clothes in favor of the latest trends? That's not something I normally associate with the crunchy-granola environmentally-conscious caricature of an evolutionary ecologist. [if you do, my apologies for stereotyping]

But we do follow fashions in our own way. Science too has its academic fashions, and in particular I'm thinking of fads in research topics (see Fads in Ecology by Abrahamson Whitham and Price, 1989 Bioscience). My goal today is to contemplate the role of fashions, for good and ill, and what you should do about them when planning your own research. This post is inspired by a discussion I co-led yesterday with Janine Caira, with our first year Ecology and Evolutionary Biology graduate students at the University of Connecticut. The focal topic was, "How to choose a good research question".

A core rule I tell students is: when choosing a research topic you must have an audience in mind. Who will want to read your resulting paper? How large is that audience, and how excited will they be? If the audience is small (e.g., researchers studying the same species as you), you aren't going to gain the recognition (citations, speaking invitations, collaboration requests) you likely crave and which will help your career progress. If your audience is large, but you are doing incremental work that will be met with a widespread yawn, that's not very helpful either. Ideally of course you want to present something that is really exciting to as many people as possible. But, the more exciting and popular it is, the more likely it is somebody has gotten there first.

Which is what brings me to fads. A fad is defined (in Google's Dictionary) as "an intense and widely shared enthusiasm for something, especially one that is short-lived and without basis in the object's qualities; a craze". Intense. Widely-shared. And with at least a hint of irrational exuberance (a reference to former Federal Reserve Chairman, Alan Greenspan).

Fads happen in science, with a caveat that they aren't always irrational exuberance: there are research topics that genuinely have value, but which nevertheless have a limited lifespan. I'll give an example: When I was a beginning graduate student, Dolph Schluter [for whom I have immense respect] had recently started publishing a series of papers on ecological speciation, along with his book The Ecology of Adaptive Radiation, which I heartily recommend. The core innovation here was that ecology plays a role in (1) driving trait divergence between populations that leads incidentally to mating isolation, and (2) eliminating poorly-adapted hybrids. Both ideas can be found in the literature of course; few ideas are truly 100% new. But what Dolph did was to crystallize the idea in a simple term, clearly explained, and solidly justified with data, making it compelling. And suddenly everyone wanted to study ecological speciation, it seemed to me. There was a rapid rise in publications (and reviews) on the topic. Then at a certain point it seemed like fatigue set in. I began encountering more conversations that were skeptical: how often ecological speciation might fail to occur, where and why is it absent, how common is it really. At one point, an applicant for a postdoc position in my lab said he/she wanted to work on ecological speciation and I couldn't help wondering, okay that's interesting material but what do you have to say that's new, or is this yet another case study in our growing stockpile of examples? And I think I wasn't alone: the number of papers and conference talks on the topic seemed to wane. It's not that the subject was misguided, wrong, or uninteresting: I'm not saying it was irrational exuberance. Just that the low-hanging (and medium-hanging) fruit had been picked, and people seemed to move on.
To drive that point home, below is a Web of Science graph of the peak and maybe slight decline in the number of publications per year invoking "ecological speciation" in a topic word search. Interestingly, total citations to articles about "ecological speciation" peaked just three years ago, after a steady rise, and the past two years showed somewhat lower total citations to the topic.
Ecological speciation articles by year

Meanwhile, other topics seem to be on the rise, such as "speciation continuum" (next bar chart), which Andrew Hendry, Katie Peichel, and I were the first to use in a paper title in 2009 (it showed up in sentences in 2 prior papers) and was the topic of a session at the recent Gordon Conference on Speciation [still not anywhere near a fad, just 72 papers use the term, and there are reasons to argue it shouldn't catch on]
Speciation continuum
And of course "eco-evolutionary dynamics" and its permutations are fast-rising and very popular these days:
Eco-evolutionary dynamics, total citations

Life cycle of a scientific fad:

1) Birth: someone either has a great new idea, or effectively re-brands an old idea in a way that makes it catch on. Sometimes an old idea gets new life with a clever experiment or model (e.g., both reinforcement and sympatric speciation were old ideas, that caught fire in the early 1990's and late 1990's respectively after new data or theory rekindled the topics). The simplest and least valuable way to start a new fad is re-branding. Don't do this, it sometimes works but really annoys people. Take a familiar idea that's been in the literature for ages and give it a name, or rename it, and pretend it's an innovative concept.
2) The sales pitch. For the idea to become a fad, someone needs to really hit the streets (or, printed pages) and sell the idea. Giving lots of talks, writing theory/empirical/data papers in journals where the idea is seen.

3) People get excited, and start thinking what they can do to contribute. There's a lag here, where the idea spreads slowly at first, then accelerates as people start to find the time to run models and write papers. For empiricists, there's a lag while people design experiments, get funding, do the experiments, analyze and write. This takes years, and doesn't all come out in one burst, so there's an exponential growth phase. This is a good time to get in on the topic. Personally, as a second year graduate student I read the Dieckmann & Doebeli 1999 and Kondrashov & Kondrashov 1999 Nature papers on theory of sympatric speciation, and immediately started designing lab and field experiments to test their model assumptions about disruptive selection and assortative mating, work that I started publishing in 2001, peaked around the mid-2000's, and touched on only occasionally since then. In short, I was part of the rising initial tide, after their theoretical advance rekindled the topic. In the graph below on "sympatric speciation" papers you can see an uptick after the 1993 paper by Schliewen et al on Cameroon crater lake cichlids, and again an acceleration after 1999 theory papers. I came in right in the middle of the wave, and published my AREES paper with Ben Fitzpatrick in 2007, right as it crested and soon began to fall off again.
Sympatric speciation

4)  Fads don't go away entirely, usually. Both Ecological Speciation and Sympatric Speciation, for example, declined slightly after their peaks (see graphs above), but are very much still with us. Because they have value. But the initial excitement has passed, the honeymoon is over.

5) Fall from favor. At some point, it becomes increasingly hard to say something creative and new about a topic. Not impossible, mind you. And so grant reviewers and journal editors become increasingly skeptical. Journals that favor innovative and flashy results get harder to publish in. I hit this, sort of, when I briefly toyed with gut microbiome research: we studied how natural variation in diet among individuals affected the gut microbiome. Science reviewed it, and the Editor was enthusiastic but wanted some more manipulative experiments to prove a core claim of ours in a controlled setting. It took a year (of postdoc salary, time, and $10,000 in sequencing) to get the data the Editor asked for. It confirmed our initial claim, beautifully. But in the intervening year, gut microbiome research had become increasingly saturated. To get a Science paper you now needed molecular mechanisms, not just documenting that phenomena occur. The same Editor who had expressed enthusiasm before, now said it wasn't interesting enough. I'm not complaining (too much), but use this to point out that when you hit a fad at its crest, standards of publication become more stringent and it's harder to impress or surprise.

6) Rebirth. Some fads come in waves. Think Bell Bottoms. Or jazz swing-dancing. But I'm wrestling with finding a good example. Lamarckian evolution seems a safe one, or even sympatric speciation which in the 1960's Ernst Mayr said was dead, but like the Lernean hydra, it would grow new heads again (which it did).

Avoid or embrace the fad?

Given that fads exist, what should you do about them? On the one hand, they represent a ready-made audience. This is the hot topic of the day, and publishing in that area will surely draw many readers to your work, right? Perhaps. That depends on when you are coming in on the fad. Here are some options:

1) Start a new fad. Come up with an idea so brilliant and widely appealing that many people pile on and build on your work. This is a guaranteed ticket to fame, if not fortune. Of course, it rarely happens and you have to have some combination of exceptional brilliance and luck and good salesmanship. So, don't bank on this approach: a lot of attempted new fads quickly become failed fads (see photo below). 

2) Catch the wave: Contribute to a fad in its early days. This requires reading the current literature very closely and widely, and acting quickly on great new ideas as they appear in print (or, in conference talks, etc). You still need a good intuition for what people will find exciting in your field, but less initial creativity than option (1). This is more or less where I came into the sympatric speciation field, with a couple of somewhat skeptical theory papers, and some somewhat supportive lab and field experiments on disruptive selection. 

3) As a fad nears its peak, the audience is now very large, but truly new ideas are becoming more and more scarce. Even so, there are usually new directions you can take it. Sure we know X and Y are true, but what about Z? Be careful though: as fads near their peak, your audience starts to experience some fatigue with the topic and is more likely to say, "oh, it's another paper on gene expression, yawn". Might be a good time to avoid. Or, do a meta-analysis or review that synthesizes the topic, wrapping it all up in an easily accessible package.

4) Be contrarian.  Sure, this fad thing exists. But how common is it? How strong is its effect size relative to other things we might get excited by? Might we be over-interpreting the evidence or being too simplistic? One of the reasons fads go away, is that people shift from being excited that a phenomenon even happens, to taking a more measured quantitative and objective view. Sure, there's parallel evolution, but are we just cherry-picking extreme cases and ignoring the bulk of traits and situations where evolution is less parallel? 

5) Merge fads. There used to be these TV advertisements for Reese's Peanut Butter Cups. Two people walking down the street, one eating peanut butter with a spoon (really??? who does this?), the other eating a bar of chocolate. They collide, and discover their combined food is so much better than either alone. Some great scientific papers are like Reese's Peanut Butter Cups. They take two familiar subjects and merge them in an unfamiliar way. Two fads put together can make a super-fad.

6) Revive old fads (zombie ideas). Old fads never truly die, they just hide away in a quiet steady tick of more papers that perhaps aren't making a big splash anymore. The key thing is, their audience never truly went away, they just reached a point where they moved on. But like many failed relationships, you often never truly stop loving your ex. So, if you can locate a former fad and give it new life, you have a ready-made audience and a small field of competitors. This is especially easy to do when a previous fad ran out of steam because people in the old days lacked analytical tools that we have now: sequencers or flow cytometers or Bayesian statistics or whatever. If you can apply modern lab or computational technology to an old fad, you might make fundamental new progress on a widely-known topic. Doing this requires reading your history, to know where the good zombies are buried. When I was a graduate student, I spent a summer reading Ernst Mayr's Animal Species and Evolution. It's a seriously dry book, packed to the gills with case studies and examples, and ideas. Many of these were abandoned, for various reasons, and are just waiting around to be exhumed, re-examined in light of new perspectives and tools, and maybe re-animated.

I'm sure there are more variants on this theme, but I think the point is made: fads are a great way to make your name in academic science. They are also a trap, if you hop on the bandwagon just as it goes over the cliff into obscurity. To know which is which, you need to read, read, and read, and go to conferences and talk and listen, to get a sense for the pulse of your field.

Now, your turn: 

What do you see as passed or passing fads in your field? How can we know if something is a fad-to-be and get in on it early?

Tuesday, September 17, 2019

How to make rational inferences from data

This post is motivated by the paralysis that many students encounter when attempting to fit a model to their data, typically in R. I have long been frustrated by how this process sometimes turns thinking students that are seeking new ideas into desperate technicians that are seeking engineering solutions. However, even more recently, I have become concerned by the counter-productive self-questioning hand-wringing that so many students encounter during this process – to the point that they sometimes don’t believe their own data and start to second-guess their biological questions and experiments and outcomes. Hence, I have here written a “take a step back” approach to inference where only the last 5% of the process is worrying about generating a P value or AIC difference or the equivalent, thus leaving the other 95% for thinking!

A.     TURN OFF YOUR COMPUTER. Don’t look at your data. Get out a pen and paper – or, better yet, a white board. Invite your lab mates over – maybe even your supervisor. Then proceed with the following steps. Of course, it is useful to do all of this before designing your study, but the realities of field data collection can mean that the data you end up with dictate the need to redo the following steps after data collection – but before analysis.

1.      Decide – based on your question/hypothesis/prediction – what the published “money plot” should be - the plot that will "get you paid"! That is, what graphical representation of the data will convey to the reader the specific answer to your biological question? Draw this figure or figures and indicate what interpretation you will draw from a given pattern. An example might be an x-y plot where a positive correlation would mean one thing, no correlation would mean another, and a negative correlation would mean something else again. Don’t just imagine various ways to plot your data; instead specifically design the plot(s) that will convey to the reader the answer to the question. You should be able to point to a specific pattern that would result in a specific conclusion directly relevant to your specific question.

2.      Decide how you will interpret a given effect size. For instance, if you are looking for a positive correlation coefficient between x and y, then perhaps you will compare that coefficient to a meta-analysis showing a distribution of similarly-obtained correlation coefficients. Or, to what other correlation between variables will you compare your target correlation – that is, can you define a non-causal variable that you can plot your y-axis data against – a “control” correlation that should show no true effect? Determining effect size and conveying its relative importance to the reader will be the absolute key to rational and useful inference.

3.      Figure out your unit of replication when it comes specifically to your questions and the money plot you intend to generate. In one sense, this point might be re-stated as “don’t pseudoreplicate”, which might seem obvious but – in practice – can be confusing; or, at the least, misapplied. If, for example, your question is to what extent populations or species show parallel evolution in response to a given environmental gradient, then your unit of replication for inference is the number of populations, not the number of individuals. If you have two experimental treatments that were each imposed on five experimental tanks – those tanks become your unit of replication.

4.      Decide what your fixed and random effects are. Fixed effects are factors for which you are interested in making inferences about differences between the specific levels within the factor. Random effects are, in essence, factors where the different levels are a conceptually-random selection of replicates. Random effects are things for which you can make an inference about the overall factor (e.g., different populations have different values) but not the individual levels of that factor (you would not, with a random effect, say “population A differed from population B but not population C”). Those sorts of direct among-level comparisons are not relevant to a random effect.
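
For readers who like to see the arithmetic, here is a minimal sketch of the unit-of-replication point from step 3, written in Python rather than R purely for illustration. The records, tank names, and function names are all made up for this example; nothing here comes from a real dataset. It collapses individual fish to one mean per tank before comparing treatments, and contrasts that with the naive approach of treating every fish as a replicate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (treatment, tank, trait value for one fish).
# Tank "C3" is deliberately over-sampled and anomalous.
records = [
    ("control", "C1", 10.0), ("control", "C1", 11.0),
    ("control", "C2", 10.0), ("control", "C2", 12.0),
    ("control", "C3", 20.0), ("control", "C3", 21.0),
    ("control", "C3", 19.0), ("control", "C3", 20.0),
    ("treated", "T1", 14.0), ("treated", "T1", 15.0),
    ("treated", "T2", 13.0), ("treated", "T2", 16.0),
]


def tank_means(rows):
    """Collapse individual measurements to one mean per tank."""
    by_tank = defaultdict(list)
    for treatment, tank, value in rows:
        by_tank[(treatment, tank)].append(value)
    return {key: mean(values) for key, values in by_tank.items()}


def treatment_summary(rows):
    """Return {treatment: (number of tanks, mean of tank means)}."""
    by_treatment = defaultdict(list)
    for (treatment, _tank), tank_mean in tank_means(rows).items():
        by_treatment[treatment].append(tank_mean)
    return {t: (len(m), mean(m)) for t, m in by_treatment.items()}


def pooled_summary(rows):
    """Naive version: treats every fish as an independent replicate."""
    by_treatment = defaultdict(list)
    for treatment, _tank, value in rows:
        by_treatment[treatment].append(value)
    return {t: (len(v), mean(v)) for t, v in by_treatment.items()}


if __name__ == "__main__":
    print("tank-level:", treatment_summary(records))
    print("fish-level:", pooled_summary(records))
```

In this invented example the control treatment has three replicates (tanks), not eight (fish), and the one over-sampled tank pulls the pooled control mean above the treated mean, while the tank-level means point the other way. That is the kind of trap that deciding your unit of replication on the whiteboard is meant to avoid.
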

B.     TURN ON YOUR COMPUTER AND OPEN A DATABASE AND GRAPHING PROGRAM. Excel, or something like that, is ideal here. If you are very comfortable in R already, then go ahead and use that but, importantly, do not open any module that will do anything other than plot data. Don’t attempt to fit any inferential models. Don’t attempt to statistically infer fits to distribution or specific outliers. Don’t generate any P values or AIC values or BIC values or log-likelihoods, etc. You are going to use your eye and your brain only! Now proceed with the following steps.

5.      Plot your data for outliers. Don’t use a program to identify them (not yet anyway) – use your eye. Look at data distributions and plot every variable against every other variable. Extreme outliers are obvious and are typically errors. These must be fixed or removed – or they will poison downstream analyses. Some errors can be easily identified and corrected by reference to original data sheets or other sources of information. If dramatic outliers cannot be fixed, delete the entire record from the dataset. Note: Don’t change or delete data just because they are contradictory to your hypotheses – the examination suggested here is hypothesis free.

6.      Decide which covariates you need to consider. If you are measuring organisms, an obvious example is body size. If you are doing experiments or observations, other examples include temperature or moisture. These covariates are things NOT directly related to your question but are instead things that might get between your data and your inference. Plot your data against these covariates to see if you need to consider them when making initial inferences from your data. It is very important to evaluate covariates within each level that you have in your data. For instance, you need to know whether body size is influencing your measured trait WITHIN each population or treatment, not across ALL data pooled.

7.      Plot your data in a fashion as close as possible to the money-plot you previously designed. If you have important covariates, make sure to correct for them as necessary. For instance, you can add an axis to your money plot that allows you to assess the key result across the range of body sizes. Make sure that your plot does not have unequal representation of experimental units (e.g., numbers of fish in different tanks) within a given level of your treatment. Otherwise, you might get tricked by one over-represented unit that has an anomalous result. This point is obviously related to the above comment about determining your unit of replication.

8.      Look at your plot and draw your inference. Does the (for example) line go in the direction you predicted? How steep is that line – that is, the effect size? How big is the difference between your control and your treatment in relation to the variation in each group (at the correct level of replication)? How does that result qualitatively compare to previous work – so that you have some idea of the relative importance of the effect you have (or have not) uncovered?
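
The within-level point in step 6 is easy to get wrong, so here is a small sketch of why it matters, again in Python and again with invented numbers and hypothetical function names. It computes a plain descriptive correlation between body size and a trait, once within each population and once with all data pooled. This is only a companion to the plots, not a replacement for looking at them, and it involves no P values.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient (descriptive only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical records: (population, body size, trait value).
fish = [
    ("A", 1.0, 5.0), ("A", 2.0, 4.0), ("A", 3.0, 3.0),
    ("B", 6.0, 10.0), ("B", 7.0, 9.0), ("B", 8.0, 8.0),
]


def within_group_correlations(rows):
    """Correlation of trait with body size inside each population."""
    sizes, traits = defaultdict(list), defaultdict(list)
    for population, size, trait in rows:
        sizes[population].append(size)
        traits[population].append(trait)
    return {p: pearson(sizes[p], traits[p]) for p in sizes}


def pooled_correlation(rows):
    """Correlation across all records, ignoring population."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])


if __name__ == "__main__":
    print("within:", within_group_correlations(fish))
    print("pooled:", pooled_correlation(fish))
```

In this made-up dataset the trait declines with body size inside each population, yet the pooled relationship is positive simply because population B is both larger and has higher trait values. Pooling would tell you the wrong story about how body size relates to the trait, which is why the covariate should be examined within each population or treatment.
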

OK YOU ARE DONE. Congratulations. You know the answer. Write up your paper, defend your thesis, get a postdoc, get a job, get tenure, retire, and give your Nobel lecture.

Well, I suppose there is one more thing you should do – but, really, you are 95% done here in most cases. What you see in the data is the reality of the situation and you have interpreted it in light of previous work. Your eye is really really good at this stuff. The one small thing left to do is to figure out a way to state the level of confidence you have in the interpretation you have just drawn from the data. This minor thing is all that p values, AIC levels, BIC levels, confidence intervals, and so on are for. That is, the data are the real thing and you have drawn an interpretation from them – now all you need is a way of conveying to a reader how confident you are in that interpretation. I will make some suggestions in this regard, especially in relation to model fitting, in the next post.

A 25-year quest for the Holy Grail of evolutionary biology

When I started my postdoc in 1998, I think it is safe to say that the Holy Grail (or maybe Rosetta Stone) for many evolutionary biologists w...