Wednesday, September 25, 2019

Does startup size predict subsequent grant success?

Warning: the following is a small sample size survey not conducted especially scientifically. This is simply out of curiosity and should not guide policy decisions.



Based on a Twitter query about start-up sizes, I found myself wondering whether the size of a professor's start-up package has a measurable effect on their subsequent grant-writing success. In particular, do people who get larger start-up packages then get more money, representing a larger return on the larger investment? I designed a brief 5-question survey on Google Forms, advertised it on Twitter, and got 65 responses. This blog post is a brief summary of the results, which surprised me only somewhat.

First off, a brief summary of who replied:




I then wondered whether initial start-up package size depends on gender or university type, and found a clear and expected result: R1 universities have larger start-up packages. Encouragingly, with this small self-reported sample there was no sign of a gender bias:




I'd show you the statistical results, but they just obviously match the visuals above.
Subject matter had no significant effect on start-up package size (mostly ecology versus evolution).


Now for the big reveal: does initial start up package size matter for later grant income?

Using the first five years as a focus, the answer is...
no, once you account for university type.



Black dots in the figure are R1 universities, green are non-R1 universities, yellow are private colleges, blue is other. There's no significant trend within either of the well-represented categories (R1, non-R1). If we do a single model with grant income as a function of university type and start-up, only university type is significant.  The pattern after 10 years is even stranger:


In both cases it is clear that (1) there's a lot of noise, and (2) people who get start-up packages in excess of about $400,000 do seem to have an advantage at getting more grant money. After 10 years, everyone who got more than $400,000 in start-up had at least $2 million in grant intake. That seems like a good return on investment. But more is not always better: the biggest grant recipients were middle-of-the-road start-up recipients. Note that gender had no detectable effect in the trend above, but sample sizes were low and the data self-reported.
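For anyone wanting to replicate this, the model I described is just a linear regression of grant income on university type plus start-up size. Here's a minimal sketch in Python; the numbers below are invented stand-ins for the survey responses (which aren't published here), shown only to illustrate the structure of the test:

```python
import numpy as np

# Hypothetical survey rows: (university type, start-up in $1000s, 5-year grant intake in $1000s).
# These numbers are invented for illustration; the real survey data are not shown in the post.
data = [
    ("R1", 600, 1800), ("R1", 900, 2100), ("R1", 400, 1700), ("R1", 750, 2500),
    ("non-R1", 200, 500), ("non-R1", 350, 700), ("non-R1", 150, 400), ("non-R1", 300, 900),
]

# Design matrix: intercept, dummy variable for R1, start-up size.
X = np.array([[1.0, 1.0 if u == "R1" else 0.0, s] for u, s, _ in data])
y = np.array([g for _, _, g in data], dtype=float)

# Ordinary least squares fit of: grant_income ~ university_type + startup
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, r1_effect, startup_effect = beta
print(f"R1 effect: {r1_effect:.0f}k; start-up slope: {startup_effect:.3f}")
```

With the real survey data substituted in, the question is simply whether the start-up coefficient remains distinguishable from zero once the R1 dummy is in the model.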


Briefly, here's my own story: In 2003 I negotiated a position with the University of Texas at Austin. They gave me just shy of $300,000 (it might have been $275,000), plus renovation costs. Before even arriving at UT I already had a >$300,000 NSF grant. Within 5 years of arriving I had also received an $800,000 Packard Foundation grant, and a position at HHMI whose total value exceeded $3,000,000. By the time I left UT, I had obtained five additional NSF grants and an NIH R01, and had pulled in somewhere in excess of $10 million cumulatively. I recognize that I have been quite lucky in funding over the years. My point, though, is that I was able to achieve that by leveraging relatively little start-up compared even with many of my peers at the time. That anecdotal experience is tentatively confirmed by this survey, which finds that many people end up being quite successful with relatively small start-ups. The data seem to suggest that above a certain start-up package size, universities see little additional benefit. It is essential to recognize, however, that this is a small sample with questionable data and poor distribution (a few days via Twitter). So, this should not guide policy. But it does make me wonder: surely someone has done a proper job of this analysis?








Thursday, September 19, 2019

Inspiration


Part 2 of a series on choosing a research topic.

One of my favorite songs in college was by Natalie Merchant:

Climbing under 
A barbed wire fence 
By the railroad ties 
Climbing over 
The old stone wall 
I am bound for the riverside 
Well I go to the river 
To soothe my mind 
Ponder over 
The crazy days of my life 
Just sit and watch the river flow 

It still brings me joy to hear the song, and more and more it feels like it's about my research process. Academic life is so hectic: teaching, writing proposals, publishing, committee meetings, editing... where does one find time to contemplate, to let ideas bubble up and mature? Personally, I find that some of my most valuable time is spent sitting by water, hence my continued love of the song above.

Which leads me to the question, where do ideas come from? What can you do to generate ideas?


Suggestion 1: Go Forth And Observe: I used to teach a class on Research Methods at UT Austin, aimed at future K-12 science teachers. The students had to come up with their own questions in any area remotely science-like, and do a project. Four projects, actually, of increasing complexity. And finding questions was always hard. So we walked them through an exercise: everyone blew up balloons, then all at once we popped our balloons. It wakes them up. As an aside: if you do this, check that nobody has a latex allergy or PTSD that might be triggered by the sound [we had a veteran in the class once and learned this], and don't hold the balloons close to anyone's eyes or they can tear corneas. Then, everyone wrote down five observations: it split into 3 pieces, the pieces are different sizes, it was loud, the inside of the balloon is damp, there are little ripples on the edge of the tear. Then, convert those into questions. Then we discussed which questions were testable in the classical sense (yes/no answers), which were quantitative (numbers are the answer), and which were untestable. We'd talk about how a well-phrased question clearly points you towards what you should do to answer it. And about how poorly phrased or vague questions (why did it make a sound?) can be broken down into testable and more specific sub-questions. It's a great exercise, not least because my co-instructor Michael Marder, a physicist, had actually spent two decades working on the physics of that sinusoidal ripple at the margin of the torn rubber (inspired by noticing this at a child's birthday party), and discovered it has applications to predicting how earthquake cracks propagate through the earth's crust. So, students could see how a mundane thing like a balloon can lead to big science.

You can do the balloon exercise, or something like it that's more biological: go out in the woods, or snorkel in a stream or ocean. Watch the animals around you. Visit the greenhouse. Write down observations, and turn them into questions. Write down 50. There's got to be a good one in there somewhere, right?

Suggestion 2: Don't take the first idea or question that comes along just because you can do it. The exercise described above will almost surely lead you to a question that you can answer. But is it the question that you SHOULD answer? Will other people care about it? If so, why? There's an idea in economics called "opportunity cost". Sure, writing this blog is valuable. But it is taking time I could otherwise be spending on revising that manuscript for Ecology Letters, or writing lectures for my Evolutionary Medicine class, or preparing my lecture for a fast-approaching trip to Bern and Uppsala. Is this blog the best thing I could be doing with this hour of my day? Choosing a research project is even more prone to opportunity costs: you are embarking on a project that may take you six months, a year, or five years. Sure, you can do it, and you can publish some results. But is it the best and most impactful (variously defined) thing you can do with that time? In general, I bet that the first idea that crosses your mind isn't the best idea you'll have. Personally, I had two ideas for research when I first entered grad school, then went through a series of maybe six ideas over the course of my first year, and only landed on my actual project in early fall of my second year. The other ideas weren't bad, just not as exciting to me (and, I think, to others).
Opportunity cost, by SMBC Comics' Zach Weinersmith


Suggestion 3: Don't get stuck on local optima. I love to think of self-education in a research field as a Markov chain Monte Carlo (MCMC) search on an intellectual landscape. Search widely; visit many different topics and ideas and questions. The ones that you keep coming back to, and spend the most time on, are probably a good indicator of a high posterior probability for your future research. But, again, if you start on an actual project too soon, you limit your ability to explore that intellectual landscape by slowing your search rate, and you might falsely conclude you are on a great peak for an idea when really you've just stopped making those long jumps to new places in the intellectual landscape of relevant topics.
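To stretch the search metaphor into something concrete, here's a toy hill-climbing search on an invented one-dimensional "interest landscape" (every number here is made up): with only small local steps, the search settles on the nearest peak, while occasional long jumps give it a chance to discover a much better one.

```python
import math
import random

def interest(x):
    # A made-up "intellectual landscape": a modest peak near x = 0
    # and a much higher peak near x = 10.
    return math.exp(-x ** 2) + 3.0 * math.exp(-(x - 10.0) ** 2 / 4.0)

def explore(steps, jump_prob, seed):
    """Hill-climbing search that mostly takes small local steps, but with
    probability jump_prob takes a long jump somewhere new entirely."""
    rng = random.Random(seed)
    x = 0.0
    best = interest(x)
    for _ in range(steps):
        if rng.random() < jump_prob:
            proposal = x + rng.uniform(-15.0, 15.0)  # long jump: a whole new subfield
        else:
            proposal = x + rng.gauss(0.0, 0.5)       # local step: a nearby idea
        if interest(proposal) >= interest(x):        # only move to better ideas
            x = proposal
        best = max(best, interest(x))
    return best

stuck = explore(5000, jump_prob=0.0, seed=1)    # never leaves the first peak's basin
roving = explore(5000, jump_prob=0.05, seed=1)  # long jumps can find the high peak
```

The jump probability plays the role of reading outside your subfield: most of the time you refine nearby ideas, but occasionally you leap somewhere unfamiliar, which is the only way off a merely local optimum.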

Suggestion 4: Know your history. There are vast numbers of ideas, and study systems, stashed away in the literature, going back decades and beyond. As a mid-stage graduate student, I read Ernst Mayr's Animal Species and Evolution, and I was struck by how many hundreds of study systems were left lying by the wayside because someone lost interest, retired, left academia, or whatever. The questions were never fully resolved, just left partly answered. There are so many great ideas and systems waiting for your attention. And the great thing is, when pitching an idea to readers or grant reviewers, they tend to love to see the historical context: it helps justify in their own minds that this is something interesting, if it is a topic people have been thinking about for a long time. Also, knowing your history helps you avoid repeating it. Being scooped by a contemporary is frustrating, but being scooped by somebody 40 years ago because you didn't know it was done already, that's worse.
Ernst Mayr

Suggestion 5: Read theory. A lot of evolution and ecology students are wary of mathematical theory. That's unfortunate, because it means you are cutting yourself off from a major fountain of inspiration. Learn to read theory, including the equations. It is absolutely worthwhile. Here's why. From my viewpoint, theory does a lot of things that an empiricist should pay attention to. 

First, it makes our thinking more rigorous. For example, it is intuitive to think that co-evolution between host and parasite can lead to frequency-dependent cycles where the host evolves resistance to parasite A, so the parasite evolves phenotype B, then hosts evolve resistance to B but thereby become susceptible to A again, so the parasites switch back. Cyclical evolution, maintenance of genetic variation in both players. Sure, it's possible, but by writing out the math, theoreticians identified all sorts of requirements that we might have overlooked in our verbal model. This cyclical dynamic is harder to get than we might think, and the math helps us avoid falling into a trap of sloppy thinking that leads us down a blind alley.

Second, and related, the math identifies assumptions that we might not realize we were making. Is assortative mating during sympatric speciation based on a magic trait that affects both mating and adaptation, or on a sexual-signalling trait unrelated to adaptation? Do individuals really tend to compete for food more strongly with phenotypically similar members of their population? When writing out theory, those assumptions are often brought into the light of day (though sometimes theoreticians are unclear about them, making implicit assumptions too). These assumptions are often things we empiricists don't know much about. How strongly do females prefer phenotypically similar males within a panmictic population? I didn't know. How many stickleback males does a searching female visit before settling on a mate? No idea... Theory brought my attention to these assumptions, and they became something I could go and measure. So, the assumptions underlying the equations are an opportunity for empirical investigation, with a ready-made justification: "Theory assumes X, so we need to know if/when/where/how often this is biologically valid".

Third and hardest, theory makes predictions: if X is true, then Y should happen. These predictions can, in principle, be tested. But beware: if the entire set of assumptions X is true, then the math argues that Y is inevitable. Is it really worth testing, then? If you don't know that all features of X are true, then the theory no longer guarantees Y. If you fail to demonstrate Y, arguably you weren't actually testing the theory.



Suggestion 6: P-hack and prepare to be surprised. Having read theory, read the literature, and been observant, go back out and do something. Do a little experiment, start a little observational pilot study, just get some data. Now, do something everyone tells you not to: P-hack it. Analyze the data in many possible ways, look for relationships that you might not have identified a priori. Sure, this can lead to false positives. A lot of people argue strongly against unguided post-hoc data analysis for this reason. But we aren't at the stage yet of publishing, this is exploration, an information-finding foray. Here's a concrete example: most stickleback biologists like myself have long treated each lake as a single genetic population and assumed it is well-mixed in terms of genotypes and phenotypes (except in a few lakes with 2 species present). This has practical consequences. This past summer I watched colleagues throw 10 traps into a lake, along a mere 10 meter stretch of shoreline, then take the first trap out and it has >100 fish, so we use them and release the fish from the other 9 traps. BAD IDEA. It turns out, we now know, there is a lot of trap-to-trap variation in morphology and size and diet and genotype that arises from microgeographic variation within lakes. Here's how I got clued into this. A graduate student of mine, Chad Brock, hand-collected ~30 nesting male stickleback from each of 15 lakes in British Columbia, and immediately did spectroscopy to measure color wavelength reflectance on each male. He also happened to note the substrate, depth, and so on, of the male's nest. Six months later, back in Texas, he P-hacked, and noticed that in the first lake he was examining intensively, male color covaried with nest depth: males 0.5 meters deep were redder and males 1.5 meters deep (just meters away horizontally) were bluer. The different-colored males were within maybe 10 seconds' swimming distance of each other. 
This clued us in to the fact that something interesting might be going on, and we later confirmed this pattern in 10 other lakes, replicated it across years, and ultimately replicated it experimentally as well. I'm not here to tell you about our male color work though. The key point is, theory would have told me to never expect trait variation among individuals at this spatial scale, because gene flow should homogenize mobile animals at this spatial scale. But it doesn't, apparently. Here's a case where theory puts blinders on us, telling us to not bother looking for microgeographic variation. Then, when we P-hacked we were surprised and ultimately cracked open what turns out to be a very general phenomenon that we might otherwise have overlooked. 

(A caveat: P-hacking shouldn't be the end-game, and if you do try it, please at least be totally up front when you write about which analyses were predetermined, and which (and how many) were post-hoc.)
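For concreteness, here's the kind of exploratory scan I mean, as a Python sketch with invented fish data (the variable names echo the stickleback example above, but the numbers are fabricated): compute every pairwise trait correlation, and treat anything strong as a lead to confirm with new data, not as a publishable result.

```python
import random
from itertools import combinations

random.seed(0)

# Invented data: 100 fish, several traits. "color_redness" is secretly built to
# covary with "nest_depth", mimicking the surprise described in the post.
n = 100
depth = [random.uniform(0.5, 1.5) for _ in range(n)]
fish = {
    "nest_depth": depth,
    "color_redness": [2.0 - d + random.gauss(0, 0.3) for d in depth],
    "body_size": [random.gauss(50, 5) for _ in range(n)],
    "gill_rakers": [random.gauss(20, 2) for _ in range(n)],
}

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# The "P-hack": scan every trait pair and flag strong correlations as leads
# to follow up, remembering how many pairs were searched.
leads = [
    (a, b, round(pearson(fish[a], fish[b]), 2))
    for a, b in combinations(fish, 2)
    if abs(pearson(fish[a], fish[b])) > 0.5
]
```

The key discipline is what happens next: a flagged pair (here, the constructed depth versus color relationship) becomes a hypothesis for an independent dataset, and the number of pairs scanned has to be reported so readers can judge the multiple-comparisons risk.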


Suggestion 7: Have a portfolio. In financial investment theory, it is always recommended that you invest in a portfolio. Some investments (e.g., stocks of start-ups) have the potential to go sky-high, but also the potential to crash out entirely. Other investments are solid safe bets with little risk. If you put all your money in the former, you will either be spectacularly wealthy or lose everything. If you put all your money in the latter, you are guaranteed to have some savings in the future, but maybe just keeping up with inflation. The recommendation, therefore, is to have a portfolio that mixes these alternatives. The same is true in research. There are projects you could do that would be super-cool if they succeeded and gave you a particular answer. They'd make you famous, get you that Nobel Prize your mom has been pestering you about. But either it might not work at all, or perhaps a negative result is just uninterpretable or uninteresting. High potential reward, high risk. Or, you could go to the nearest 10 populations of your favorite organism, do some sequencing, and build a phylogenetic tree or a phylogeographic model. Guaranteed to work, not very exciting. Low reward, no risk. Pick some of each to work on, and be aware which is which.

Note also that in economics the optimal ratio of risky to safe investments shifts with time: as you age, you have less time before retirement to recover from a crash, so you want to shift your investments increasingly into the safe category. In science I'd say the opposite is true. A consequence of the tenure system is that once people get tenure, they become less risk-averse, more likely to shoot the moon (a card game reference, not a ballistic one) for that wildly risky but high-reward idea. As a grad student, though, if you want to end up at an R1 university (disclaimer, other careers are great too!) don't get sucked into a safe-bet-only philosophy, because it probably won't make the splash you need to be recognized and excite people.
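The portfolio logic is easy to simulate. Below is a toy Python model with invented payoffs (a risky project pays off big 10% of the time and otherwise yields nothing; a safe project always pays a little), comparing all-risky, mixed, and all-safe ten-project careers:

```python
import random

def project_payoff(risky, rng):
    """Invented payoffs, in arbitrary 'impact units': risky projects usually
    fail but occasionally pay off hugely; safe projects always pay a little."""
    if risky:
        return 100.0 if rng.random() < 0.1 else 0.0
    return 5.0

def career(n_projects, n_risky, seed):
    # Total payoff of one hypothetical career mixing risky and safe projects.
    rng = random.Random(seed)
    risky = [project_payoff(True, rng) for _ in range(n_risky)]
    safe = [project_payoff(False, rng) for _ in range(n_projects - n_risky)]
    return sum(risky) + sum(safe)

# Simulate many hypothetical careers of 10 projects each.
trials = 2000
all_risky = [career(10, 10, s) for s in range(trials)]
mixed     = [career(10, 3, s) for s in range(trials)]
all_safe  = [career(10, 0, s) for s in range(trials)]

busts = sum(1 for c in all_risky if c == 0.0)  # careers with zero total payoff
```

The all-safe career is perfectly predictable but capped; the all-risky one sometimes yields nothing at all; the mix floors the downside while preserving a shot at the big payoff, which is the whole argument for a portfolio.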

Suggestion 8: Have a toolbox. Whatever question you pick, you'll need a toolkit of skills to answer it. Bioinformatics. Bayesian hierarchical modeling. ABC. Next generation sequencing. GIS. CRISPR. These skills are "just tools". But, sometimes academic departments choose to hire faculty who can bring in skill sets that existing faculty lack (e.g., so we can have you help us analyze the data we collected but don't really know how to use). And, those "just tools" are often highly sought-after by industry. So, if you are thinking of moving into NGOs, or the private sector, often the skills you gain along the way turn out to be far more valuable for landing a job, than the splashy journal article you published.

Suggestion 9: Don't be dissuaded. Here's the riskiest advice yet. If you have a truly transformative idea, don't be dissuaded by nay-sayers. There will be people on your PhD committee, or among your colleagues and peers, who think you are full of $#it, on the wrong track, or that it just won't be feasible. Listen to them. And defend yourself, rather than just abandoning your idea. Sure, you might be wrong. But they might be wrong too. A personal example: Tom Schoener was on my PhD committee. I was intimidated by him; he was so foundational to ecology, so smart, so prolific. So when I presented my research plan, I was initially dismayed by his response. My ideas on disruptive selection and competition depended on the assumption that individuals within a population eat different foods from each other. So, whoever eats commonly-used foods competes strongly, whoever eats rarely-used foods escapes competition, and voila, you have disruptive selection. Tom, however, pointed to a series of theoretical papers from the 1980s by Taper and Case, and by Roughgarden, to argue that selection should ultimately get rid of among-individual diet variation. Therefore, Tom said, most natural populations should be ecologically homogeneous: every individual eating pretty much the same thing as every other individual if they happen to encounter it. But that didn't jibe with my reading of the fish literature. So, I assembled a group of fellow graduate students (as yet uncontaminated by preconceptions on the topic) and we did a review / meta-analysis of diet variation within populations. In a sense, I did it just to prove to myself, and to Tom Schoener, that the real core of my dissertation wasn't a wild goose chase. The resulting paper has turned out to be my most-cited article by far (Bolnick et al 2003 American Naturalist). And I did it to prove a PhD committee member wrong, on a minor point of disagreement.
To be clear: Tom loved that paper and assigns it in his ecology graduate course, and we get along great. But the point is, your committee members and peers have both accumulated wisdom that you should draw on, but also have preconceptions and biases that may be wrong. Defend your ideas, and if you are able to, you might really be on to something.
Tom Schoener


Fads

Part 1 of a series on choosing your research topic


I might be guilty of stereotyping here, but I suspect relatively few readers of this blog would consider themselves fashion-conscious. Do you go to fashion shows? Regularly read fashion magazines? Discard last month's clothes in favor of the latest trends? That's not something I normally associate with the crunchy-granola environmentally-conscious caricature of an evolutionary ecologist. [if you do, my apologies for stereotyping]



But we do follow fashions in our own way. Science too has its academic fashions, and in particular I'm thinking of fads in research topics (see "Fads in Ecology" by Abrahamson, Whitham, and Price, 1989, BioScience). My goal today is to contemplate the role of fashions, for good and ill, and what you should do about them when planning your own research. This post is inspired by a discussion I co-led yesterday with Janine Caira, with our first-year Ecology and Evolutionary Biology graduate students at the University of Connecticut. The focal topic was, "How to choose a good research question".

A core rule I tell students is: when choosing a research topic you must have an audience in mind. Who will want to read your resulting paper? How large is that audience, and how excited will they be? If the audience is small (e.g., researchers studying the same species as you), you aren't going to gain the recognition (citations, speaking invitations, collaboration requests) you likely crave and which will help your career progress. If your audience is large, but you are doing incremental work that will be met with a widespread yawn, that's not very helpful either. Ideally of course you want to present something that is really exciting to as many people as possible. But, the more exciting and popular it is, the more likely it is somebody has gotten there first.

Which is what brings me to fads. A fad is defined (in Google's Dictionary) as "an intense and widely shared enthusiasm for something, especially one that is short-lived and without basis in the object's qualities; a craze". Intense. Widely-shared. And with at least a hint of irrational exuberance (a reference to former Federal Reserve Chairman, Alan Greenspan).



Fads happen in science, with the caveat that they aren't always irrational exuberance: there are research topics that genuinely have value, but which nevertheless have a limited lifespan. I'll give an example: when I was a beginning graduate student, Dolph Schluter [for whom I have immense respect] had recently started publishing a series of papers on ecological speciation, along with his The Ecology of Adaptive Radiation book, which I heartily recommend. The core innovation was that ecology plays a role in (1) driving trait divergence between populations that leads incidentally to mating isolation, and (2) eliminating poorly-adapted hybrids. Both ideas can be found in the earlier literature, of course; few ideas are truly 100% new. But what Dolph did was to crystallize the idea in a simple term, clearly explained and solidly justified with data, making it compelling. And suddenly everyone wanted to study ecological speciation, it seemed to me. There was a rapid rise in publications (and reviews) on the topic. Then at a certain point it seemed like fatigue set in. I began encountering more conversations that were skeptical: how often ecological speciation might fail to occur, where and why is it absent, how common is it really. At one point, an applicant for a postdoc position in my lab said he/she wanted to work on ecological speciation, and I couldn't help wondering: okay, that's interesting material, but what do you have to say that's new, or is this yet another case study in our growing stockpile of examples? And I think I wasn't alone: the number of papers and conference talks on the topic seemed to wane. It's not that the subject was misguided, wrong, or uninteresting: I'm not saying it was irrational exuberance. Just that the low-hanging (and medium-hanging) fruit had been picked, and people seemed to move on.
To drive that point home, below is a Web of Science graph of the peak and maybe slight decline in the number of publications per year invoking "ecological speciation" in a topic word search. Interestingly, total citations to articles about "ecological speciation" peaked just three years ago, after a steady rise, and the past two years showed somewhat lower total citations to the topic.
Ecological speciation articles by year

Meanwhile, other topics seem to be on the rise, such as "speciation continuum" (next bar chart), which Andrew Hendry, Katie Peichel, and I were the first to use in a paper title in 2009 (it showed up in sentences in 2 prior papers) and was the topic of a session at the recent Gordon Conference on Speciation [still not anywhere near a fad, just 72 papers use the term, and there are reasons to argue it shouldn't catch on]
Speciation continuum
And of course "eco-evolutionary dynamics" and its permutations are fast-rising and very popular these days:
Eco-evolutionary dynamics, total citations


Life cycle of a scientific fad:

1) Birth: someone either has a great new idea, or effectively re-brands an old idea in a way that makes it catch on. Sometimes an old idea gets new life through a clever experiment or model (e.g., both reinforcement and sympatric speciation were old ideas that caught fire in the early 1990s and late 1990s, respectively, after new data or theory rekindled the topics). The simplest and least valuable way to start a new fad is re-branding: take a familiar idea that's been in the literature for ages, give it a new name, and pretend it's an innovative concept. Don't do this; it sometimes works, but it really annoys people.
2) The sales pitch. For the idea to become a fad, someone needs to really hit the streets (or, printed pages) and sell the idea. Giving lots of talks, writing theory/empirical/data papers in journals where the idea is seen.

3) People get excited, and start thinking about what they can contribute. There's a lag here, where the idea spreads slowly at first, then accelerates as people find the time to run models and write papers. For empiricists, there's a lag while people design experiments, get funding, do the experiments, analyze, and write. This takes years, and doesn't all come out in one burst, so there's an exponential growth phase. This is a good time to get in on the topic. Personally, as a second-year graduate student I read the Dieckmann & Doebeli 1999 and Kondrashov & Kondrashov 1999 Nature papers on the theory of sympatric speciation, and immediately started designing lab and field experiments to test their models' assumptions about disruptive selection and assortative mating, work that I started publishing in 2001, that peaked around the mid-2000s, and that I've touched on only occasionally since then. In short, I was part of the rising initial tide after their theoretical advance rekindled the topic. In the graph below of "sympatric speciation" papers, you can see an uptick after the 1993 paper by Schliewen et al on Cameroon crater lake cichlids, and again an acceleration after the 1999 theory papers. I came in right in the middle of the wave, and published my AREES paper with Ben Fitzpatrick in 2007, right as it crested and soon began to fall off again.
Sympatric speciation


4) Fads don't usually go away entirely. Both Ecological Speciation and Sympatric Speciation, for example, declined slightly after their peaks (see graphs above), but are very much still with us. Because they have value. But the initial excitement has passed; the honeymoon is over.

5) Fall from favor. At some point, it becomes increasingly hard to say something creative and new about a topic. Not impossible, mind you. And so grant reviewers and journal editors become increasingly skeptical. Journals that favor innovative and flashy results get harder to publish in. I hit this, sort of, when I briefly toyed with gut microbiome research: we studied how natural variation in diet among individuals affected the gut microbiome. Science reviewed it, and the Editor was enthusiastic but wanted some more manipulative experiments to prove a core claim of ours in a controlled setting. It took a year (of postdoc salary, time, and $10,000 in sequencing) to get the data the Editor asked for. It confirmed our initial claim, beautifully. But in the intervening year, gut microbiome research had become increasingly saturated. To get a Science paper you now needed molecular mechanisms, not just documentation that phenomena occur. The same Editor who had expressed enthusiasm before now said it wasn't interesting enough. I'm not complaining (too much), but I use this to point out that when you hit a fad at its crest, standards of publication become more stringent and it's harder to impress or surprise.

6) Rebirth. Some fads come in waves. Think bell bottoms. Or swing dancing. But I'm wrestling with finding a good scientific example. Lamarckian evolution seems a safe one, or even sympatric speciation, which Ernst Mayr declared dead in the 1960s but which, like the Lernaean hydra, grew new heads again.

Avoid or embrace the fad?

Given that fads exist, what should you do about them? On the one hand, they represent a ready-made audience. This is the hot topic of the day, and publishing in that area will surely draw many readers to your work, right? Perhaps. That depends on when you are coming in on the fad. Here are some options:

1) Start a new fad. Come up with an idea so brilliant and widely appealing that many people pile on and build on your work. This is a guaranteed ticket to fame, if not fortune. Of course, it rarely happens and you have to have some combination of exceptional brilliance and luck and good salesmanship. So, don't bank on this approach: a lot of attempted new fads quickly become failed fads (see photo below). 




2) Catch the wave: Contribute to a fad in its early days. This requires reading the current literature very closely and widely, and acting quickly on great new ideas as they appear in print (or, in conference talks, etc). You still need a good intuition for what people will find exciting in your field, but less initial creativity than option (1). This is more or less where I came into the sympatric speciation field, with a couple of somewhat skeptical theory papers, and some somewhat supportive lab and field experiments on disruptive selection. 



3) As a fad nears its peak, the audience is now very large, but truly new ideas are becoming more and more scarce. Still, there are usually new directions you can take it. Sure, we know X and Y are true, but what about Z? Be careful though: as fads near their peak, your audience starts to experience some fatigue with the topic and is more likely to say, "oh, it's another paper on gene expression, yawn". Might be a good time to avoid it. Or, do a meta-analysis or review that synthesizes the topic, wrapping it all up in an easily accessible package.



4) Be contrarian.  Sure, this fad thing exists. But how common is it? How strong is its effect size relative to other things we might get excited by? Might we be over-interpreting the evidence or being too simplistic? One of the reasons fads go away, is that people shift from being excited that a phenomenon even happens, to taking a more measured quantitative and objective view. Sure, there's parallel evolution, but are we just cherry-picking extreme cases and ignoring the bulk of traits and situations where evolution is less parallel? 



5) Merge fads. There used to be these TV advertisements for Reese's Peanut Butter Cups. Two people walking down the street, one eating peanut butter with a spoon (really??? who does this?), the other eating a bar of chocolate. They collide, and discover their combined food is so much better than either alone. Some great scientific papers are like Reese's Peanut Butter Cups. They take two familiar subjects and merge them in an unfamiliar way. Two fads put together can make a super-fad. 



6) Revive old fads (zombie ideas). Old fads never truly die; they just hide away in a quiet, steady tick of papers that perhaps aren't making a big splash anymore. The key thing is, their audience never truly went away; they just reached a point where they moved on. But like many failed relationships, you often never truly stop loving your ex. So, if you can locate a former fad and give it new life, you have a ready-made audience and a small field of competitors. This is especially easy to do when a previous fad ran out of steam because people in the old days lacked analytical tools that we have now: sequencers or flow cytometers or Bayesian statistics or whatever. If you can apply modern lab or computational technology to an old fad, you might make fundamental new progress on a widely-known topic. Doing this requires reading your history, to know where the good zombies are buried. When I was a graduate student, I spent a summer reading Ernst Mayr's Animal Species and Evolution. It's a seriously dry book, packed to the gills with case studies, examples, and ideas. Many of these were abandoned, for various reasons, and are just waiting around to be exhumed, re-examined in light of new perspectives and tools, and maybe re-animated.



I'm sure there are more variants on this theme, but I think the point is made: fads are a great way to make your name in academic science. They are also a trap, if you hop on the bandwagon just as it goes over the cliff into obscurity. To know which is which, you need to read, read, and read, and go to conferences and talk and listen, to get a sense for the pulse of your field.

Now, your turn: 

What do you see as past or passing fads in your field? How can we know if something is a fad-to-be and get in on it early?





Tuesday, September 17, 2019

How to make rational inferences from data

This post is motivated by the paralysis that many students encounter when attempting to fit a model to their data, typically in R. I have long been frustrated by how this process sometimes turns thinking students who are seeking new ideas into desperate technicians who are seeking engineering solutions. More recently, however, I have become concerned by the counter-productive self-questioning and hand-wringing that so many students experience during this process – to the point that they sometimes don’t believe their own data and start to second-guess their biological questions, experiments, and outcomes. Hence, I have here written a “take a step back” approach to inference where only the last 5% of the process is worrying about generating a P value or AIC difference or the equivalent, thus leaving the other 95% for thinking!

A.     TURN OFF YOUR COMPUTER. Don’t look at your data. Get out a pen and paper – or, better yet, a white board. Invite your lab mates over – maybe even your supervisor. Then proceed with the following steps. Of course, it is useful to do all of this before designing your study, but the realities of field data collection can mean that the data you end up with dictate the need to redo the following steps after data collection – but before analysis.

1.      Decide – based on your question/hypothesis/prediction – what the published “money plot” should be: the plot that will "get you paid"! That is, what graphical representation of the data will convey to the reader the specific answer to your biological question. Draw this figure or figures and indicate what interpretation you will draw from a given pattern. An example might be an x-y plot where a positive correlation would mean one thing, no correlation would mean another, and a negative correlation would mean something else again. Don’t just imagine various ways to plot your data; instead, specifically design the plot(s) that will convey to the reader the answer to the question. You should be able to point to a specific pattern that would lead to a specific conclusion directly relevant to your specific question.

2.      Decide how you will interpret a given effect size. For instance, if you are looking for a positive correlation coefficient between x and y, then perhaps you will compare that coefficient to a meta-analysis showing a distribution of similarly-obtained correlation coefficients. Or, to what other correlation between variables will you compare your target correlation – that is, can you define a non-causal variable that you can plot your y-axis data against – a “control” correlation that should show no true effect? Determining effect size and conveying its relative importance to the reader will be the absolute key to rational and useful inference.

3.      Figure out your unit of replication when it comes specifically to your questions and the money plot you intend to generate. In one sense, this point might be re-stated as “don’t pseudoreplicate”, which might seem obvious but – in practice – can be confusing; or, at the least, misapplied. If, for example, your question is to what extent populations or species show parallel evolution in response to a given environmental gradient, then your unit of replication for inference is the number of populations, not the number of individuals. If you have two experimental treatments that were each imposed on five experimental tanks – those tanks become your unit of replication.
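The tank example above can be sketched in code. This is a minimal Python sketch with hypothetical column names (the post's readers would more likely do the equivalent `aggregate()` or `dplyr::summarise()` in R); the point is only that you collapse to tank means before any inference:

```python
# Sketch: aggregating to the true unit of replication (hypothetical column
# names). With two treatments imposed on tanks, the tank means,
# not the individual fish, are the replicates.
import pandas as pd

fish = pd.DataFrame({
    "treatment": ["A"] * 6 + ["B"] * 6,
    "tank":      [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "length_mm": [10, 12, 11, 13, 9, 10, 15, 16, 14, 17, 16, 15],
})

# Collapse fish to one value per tank: n = 6 tanks, not 12 fish.
tank_means = (fish.groupby(["treatment", "tank"], as_index=False)["length_mm"]
                  .mean())
print(tank_means)       # one row per tank
print(len(tank_means))  # 6
```

Any downstream comparison of treatments then uses `tank_means`, so the sample size honestly reflects six independent units.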

4.      Decide what your fixed and random effects are. Fixed effects are factors for which you are interested in making inferences about differences between the specific levels within the factor. Random effects are, in essence, factors where the different levels are a conceptually-random selection of replicates. Random effects are things for which you can make an inference about the overall factor (e.g., different populations have different values) but not the individual levels of that factor (you would not, with a random effect, say “population A differed from population B but not population C”). Those sorts of direct among-level comparisons are not relevant to a random effect.
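To make the fixed/random distinction concrete, here is a hedged sketch on simulated data, using statsmodels' MixedLM as one possible Python tool (the post's audience would more likely use lme4 in R; all the numbers and names below are invented for illustration). Treatment is fixed (we care about the specific control-vs-heated contrast); population is random (we only care that populations vary, not which one differs from which):

```python
# A minimal simulated example of fixed (treatment) vs random (population)
# effects. All data are made up; MixedLM is one of several suitable tools.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_pop, n_per = 6, 15
df = pd.DataFrame({
    "population": np.repeat([f"pop{i}" for i in range(n_pop)], n_per),
})
df["treatment"] = np.tile(["control", "heated"], len(df) // 2)

# Random intercept per population; fixed shift of +2 for the heated treatment.
pop_shift = dict(zip(df["population"].unique(), rng.normal(0, 1.0, n_pop)))
df["mass"] = (10.0
              + 2.0 * (df["treatment"] == "heated")
              + df["population"].map(pop_shift)
              + rng.normal(0, 0.5, len(df)))

# Fixed effect estimated as a coefficient; populations enter only through
# a variance term ('Group Var'), not population-by-population contrasts.
fit = smf.mixedlm("mass ~ treatment", df, groups=df["population"]).fit()
print(fit.params)  # treatment coefficient near 2, plus a 'Group Var' term
```

Note how the output matches the inference you are allowed to make: a specific contrast for the fixed effect, but only an overall variance for the random effect.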

B.     TURN ON YOUR COMPUTER AND OPEN A DATABASE AND GRAPHING PROGRAM. Excel, or something like that, is ideal here. If you are very comfortable in R already, then go ahead and use that but, importantly, do not open any module that will do anything other than plot data. Don’t attempt to fit any inferential models. Don’t attempt to statistically infer fits to distribution or specific outliers. Don’t generate any P values or AIC values or BIC values or log-likelihoods, etc. You are going to use your eye and your brain only! Now proceed with the following steps.

5.      Plot your data and look for outliers. Don’t use a program to identify them (not yet anyway) – use your eye. Look at data distributions and plot every variable against every other variable. Extreme outliers are obvious and are typically errors. These must be fixed or removed – or they will poison downstream analyses. Some errors can be easily identified and corrected by reference to original data sheets or other sources of information. If dramatic outliers cannot be fixed, delete the entire record from the dataset. Note: Don’t change or delete data just because they are contradictory to your hypotheses – the examination suggested here is hypothesis-free.
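The every-variable-against-every-other scan can be done in one call. A rough Python sketch with made-up measurements (in R, `plot()` on a data frame gives the same grid of panels); the planted decimal-point typo leaps out of every panel involving that column, no test statistic required:

```python
# Sketch: eyeballing outliers via histograms-plus-pairwise-scatter.
# All measurements are hypothetical.
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "length_mm": rng.normal(40, 4, 50),
    "mass_g":    rng.normal(12, 2, 50),
    "depth_mm":  rng.normal(8, 1, 50),
})
df.loc[10, "mass_g"] = 1200.0  # a decimal-point typo: should be 12.00

# One panel per variable pair, distributions on the diagonal.
axes = pd.plotting.scatter_matrix(df, figsize=(6, 6))
print(axes.shape)  # (3, 3)
```

Only after your eye has flagged a point like this do you return to the data sheets to decide whether it can be corrected or must be dropped.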

6.      Decide which covariates you need to consider. If you are measuring organisms, an obvious example is body size. If you are doing experiments or observations, other examples include temperature or moisture. These covariates are things NOT directly related to your question but are instead things that might get between your data and your inference. Plot your data against these covariates to see if you need to consider them when making initial inferences from your data. It is very important to evaluate covariates within each level that you have in your data. For instance, you need to know whether body size is influencing your measured trait WITHIN each population or treatment, not across ALL data pooled.

7.      Plot your data in a fashion as close as possible to the money-plot you previously designed. If you have important covariates, make sure to correct for them as necessary. For instance, you can add an axis to your money plot that allows you to assess the key result across the range of body sizes. Make sure that your plot does not have unequal representation of experimental units (e.g., numbers of fish in different tanks) within a given level of your treatment. Otherwise, you might get tricked by one over-represented unit that has an anomalous result. This point is obviously related to the above comment about determining your unit of replication.

8.      Look at your plot and draw your inference. Does the (for example) line go in the direction you predicted? How steep is that line – that is, the effect size? How big is the difference between your control and your treatment in relation to the variation in each group (at the correct level of replication)? How does that result qualitatively compare to previous work, so that you have some idea of the relative importance of the effect you have (or have not) uncovered?

OK YOU ARE DONE. Congratulations. You know the answer. Write up your paper, defend your thesis, get a postdoc, get a job, get tenure, retire, and give your Nobel lecture.

Well, I suppose there is one more thing you should do – but, really, you are 95% done here in most cases. What you see in the data is the reality of the situation and you have interpreted it in light of previous work. Your eye is really really good at this stuff. The one small thing left to do is to figure out a way to state the level of confidence you have in the interpretation you have just drawn from the data. This minor thing is all that p values, AIC levels, BIC levels, confidence intervals, and so on are for. That is, the data are the real thing and you have drawn an interpretation from them – now all you need is a way of conveying to a reader how confident you are in that interpretation. I will make some suggestions in this regard, especially in relation to model fitting, in the next post.

Sunday, August 25, 2019

Self-Citation Revisited

A few years ago, I wrote a post here called A Narcissist Index (n-index) for Academics in which I jokingly argued for a self-citation index based on the proportion of a person's total citations that are to their own work. Mine was 10%, which an on-the-fly search by a group of us in attendance at the American Society of Naturalists meeting revealed to be kind of middle-of-the-road. But that analysis was only a quick and cursory tongue-in-cheek survey.

Now, it seems, someone has taken on the task of analyzing a database of self-citations that includes more than 100,000 scientists. They calculate a number of indices of impact for authors with and without their self citations. Here now was the chance to figure out my true self-citation impact in a large pool of scientists in related fields.

Of course, a number of caveats can be kept in mind. First, the data are from Scopus, which is much less complete than Google Scholar - so the reported citations are far lower for everyone. For me, for instance, my current citations are 20,101 on Google Scholar (h-index = 77), 13,567 on Web of Science (h-index = 63), and 13,250 on Scopus (h-index = 63). Yet, as long as no bias exists, perhaps it is still a reliable indicator of self-citation impact. Second, the authors calculated a number of indices of impact, some of which seem to be completely nonsensical. So I merely used total citations and h-indexes.



To calculate my self-citation impact relative to everyone else, I first sorted to include only the categories "ecology" and "evolutionary biology", yielding 2126 people. Then I plotted h-index in 2017 including self-citations versus h-index in 2017 excluding self-citations. (The first time I posted this, I had the axes reversed - the current version is corrected.) On this, I plotted the two authors of this blog and also Steve Cooke, who has written a spirited defense of self-citation.

A first point is that, as before, I am kind of middle of the road when it comes to the impact of self-citation. To return to our original n-index, I calculated here that the proportion of my total citations due to self-citations is 16%, which is in the 60th percentile so, again - middle of the road. The co-host of this blog, Dan Bolnick, has a lower self-citation rate, with an n-index of 10.7%, which is in the 30th percentile. Give him time. As a side-bar, I was disappointed to see that - for unknown reasons - my colleague Rowan Barrett was not in the index for ecology and evolutionary biology. I wanted to check my previous conclusion that he had an extremely low self-citation rate.

A second point is that Steve Cooke is pretty close to the best at it. In fact, he has the 12th highest n-index in the entire database of 2126 ecologists and evolutionary biologists. Again, self-citation is not necessarily a bad thing. Check out Steve's paper on Self-citation by researchers: narcissism or an inevitable outcome of a cohesive and sustained research program?

A final point is that, really, self-citation doesn't matter much. In fact, a regression through the data yields an r-squared of 96.7%. In short, the variation among researchers is vastly higher than the effect of self-citation within researchers. Everyone can chill out.
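As a rough sketch of the arithmetic behind these two statistics (all citation counts below are toy numbers, not the actual Scopus data from the post):

```python
# Sketch: the n-index (percent of total citations that are self-citations)
# and the with/without-self-citation h-index regression. Toy numbers only.
import numpy as np

total_cites = 13250
self_cites = int(0.16 * total_cites)  # the post reports a 16% n-index

n_index = 100 * self_cites / total_cites
print(f"n-index: {n_index:.1f}%")

# Hypothetical h-indices for eight researchers, with and without self-cites.
h_with    = np.array([63, 77, 40, 55, 30, 22, 48, 35])
h_without = h_with - np.array([4, 6, 3, 5, 2, 1, 4, 2])

r = np.corrcoef(h_with, h_without)[0, 1]
print(f"r-squared: {r**2:.3f}")  # near 1: self-citation barely reorders people
```

When the r-squared of that regression is this high, removing self-citations shuffles almost no one's rank, which is the post's "everyone can chill out" point.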

But, more importantly, are these counts and indices and ranks even useful? Much has been written on this topic, much of which I agree with. I had my own take in the post Should I be Proud of my H Index?



Monday, July 29, 2019

Publish - or it perishes.


This post was inspired by the following line in Lord of the Rings, which I read to my kids while on “vacation” at my cabin in BC: “’Follow what may, great deeds are not lessened in worth,’ said Legolas. Great deed was riding the Paths of the Dead, and great it shall remain, though none in Gondor be left to sing of it in the days to come.”


If a tree falls in the woods, and no one is there to hear it, does it make a sound? NO.

I have always maintained that – no matter the research you did and its quality and results – it never existed if you don’t publish it. Unpublished research only enriches (or deriches, I suppose) you and the people that did it – it has no influence outside that limited sphere. Given that no one outside of the researchers knows the results, it ceases to exist as research – in the sense that research is conducted for the benefit not just of the researchers that did it but for the wider world. So – always publish your research – even if you don’t like the results, even if you move on to other jobs, even if you lose interest. If you don’t publish it, it is a waste of taxpayer dollars and en(de)riches no one but yourself.

Exceptions exist, of course. If you know that the experiment or research was BAD – that is, all the fish died or the field assistant mixed up the data irretrievably or all the camera traps failed or whatever – then you obviously don’t want to publish it. But, importantly, the decision to NOT publish something should never be a function of the actual RESULTS of the study. If the study was conducted well, then the results are the results and should be published regardless of what the actual result is. If you predicted a positive correlation between x and y and you felt like you did the study right BEFORE you saw the results, then you need to publish it regardless of whether the correlation is positive or negative or non-existent. Stated another way, your perception of the quality of the research should not change AFTER you see the outcome of the research – otherwise what is the point of conducting the research in the first place?

If you don’t publish your results, you run the risk of confirmation bias (only publishing results if they conform to prior expectations), a file-drawer problem, wasting taxpayer money, wasting future researchers’ time and resources, and so on.

If a tree falls in the woods, and people are there to hear it – but then they all die afterward and leave no descendants, did it make a sound? YES.

Many people are disappointed when they publish a study but it ends up in a “lower-tier” journal after they first tried top-tier journals where they thought it belonged; or they publish the paper and few people cite it. These are reasonable feelings for a researcher to have, of course. You poured your heart and soul – and blood and sweat and tears – and money and time and resources – into your study; and the results were cool and graphs are engaging and you did an awesome job writing it up and so on. If it doesn’t shake the foundations of your field, or at least cause them to quiver, then you feel let down; like your research wasn’t that good after all. Critically, however, what matters – once your work is published – is how YOU feel about it. Are you proud of it? If so, then external validation is nice, but not important in the end.

My daughter has a t-shirt that reads “Don’t let the number of likes define your art.” – to which I like to jokingly add “unless you get a lot of likes.” The sentiment of the original saying is what I am talking about above, of course; if you like your research, then it is good! The tongue-in-cheek addition, however, also acknowledges that perhaps you don’t see the worth of your own work – but that others do. [See my post on "Should I be Proud of my H-index?"] This can happen if you have spent so much time endlessly revising and reanalyzing a study that you are simply deathly sick of it – and just want to be done with it so you can move on to good research in your future (or some non-research endeavor).


So, be proud of your research if others like it – or even if only you like it. But, regardless, you need to publish it first.

Wednesday, July 24, 2019

What Associate Editors (ideally) do

The following is a cross-post by Dan Bolnick, copied verbatim from his blog platform as Editor-In-Chief of The American Naturalist. The blog was written aimed at new Associate Editors (AEs), to articulate the journal's aspirations for the role of AEs. Unlike many journals, AmNat AEs often write more extensive and helpful comments than the reviewers. I have seen many cases where papers were fundamentally changed for the better, really transformed, by careful Associate Editor feedback. While some authors find this extent of feedback off-putting (not to name any names, Andrew), many authors respond very positively to the care that we put into our decisions. This becomes a cultural norm, as repeat authors may become new Associate Editors who pay it forward. The following essay is meant to articulate these norms. Although it is aimed at AmNat AEs, I am reposting it here to reach a broader audience of authors, reviewers, and AEs for other journals.

-Dan Bolnick

_________________________________________________________________________________________________________________________________________________________________

Associate Editors that join The American Naturalist's editorial board are given a set of detailed guidelines for the practical side of being an AE: how to use Editorial Manager, for instance, to select reviews or return a recommendation to the Editors once reviews are in. But there are also parts of this job that we assume Associate Editors know, without necessarily articulating it clearly. We assume that new AEs know the journal's editorial style from having been reviewers for us, and having received reviews and decisions when submitting to us. But, it always helps to make some expectations explicit, so the Editors of the journal decided we should more clearly articulate our vision of the role of Associate Editors at The American Naturalist.

There is a lot of negativity about the peer review process today. In person and on social media, scientists love to complain about the reviews that seem unreasonable, the decisions that felt cursory. In contrast, our journal office regularly receives ‘thank you’ emails, even from authors whose papers we declined. For example, last week I received a thank-you from an author whose paper I declined without even sending to an AE. I had spotted what I considered a fundamental logical flaw in the paper, and tried to kindly but firmly explain why it was a problem: the author couldn’t claim their data were evidence for the phenomenon in question. The authors used that feedback to rethink their approach to interpreting the data, collected additional data, and ultimately generated a stronger paper at another equally-good journal. I’ve seen papers we rejected end up in Ecology Letters and other high-end journals. I don’t see that as a failure on our part, but rather a success: we helped authors improve their papers, contributing to the quality of the published literature, even if it ended up at a competing journal. Of course, this is only a helpful contribution on our part if our reviews and decision letters pushed the authors to go an extra step to improve the clarity of their writing, accuracy of their analysis and interpretation, or add key data.

Authors recognize this value-added. At the Evolution meeting this summer I had numerous people praise the review process at the American Naturalist. I’ve seen similarly complimentary comments on Twitter. And I think this is a key part of our brand. Yes, we aim to publish conceptually innovative cross-cutting work. But we also aim to provide a positive review and decision-making experience, whatever the outcome. So, what does that mean for our expectations for you, our Associate Editors?

1)      Look at the paper carefully before review, and decide if it is worth using reviewer time. At this stage, I normally take notes that help me later with writing a decision letter.

2)      If you believe it is not a sufficiently important advance, or has flaws that will raise serious barriers to publication, write a review that clearly explains your logic. Editorial declines without review, by you or by me, should come with enough of a review that authors are convinced we read the paper carefully, and thought about it. It should contain enough feedback that authors feel the submission was worthwhile even if they did not get accepted, because it helps them adjust their approach for the next journal.

3)      Sometimes, a paper has real but unrealized potential: there’s a gem of an idea, but it is buried in shoddy writing, a flawed model or statistical analysis, or poor graphics. You know it’s not going to survive peer review very well, but you think it might stand a good chance if the authors fix it up first. This is where the Editorial Decline Without Prejudice (DWOP) comes into play. You have a chance to send this back to the authors with a clear statement of what the value of the paper could be, and what needs to happen to realize that potential. You are doing everyone a favor here: the authors have a better shot at success, and you spare the reviewers by letting them see a better paper first. I’ve found that authors are shocked and thrilled to have an editor or AE give them pre-review feedback to improve their chances with reviewers. This strategy shouldn’t be applied to all papers, but to those with really high value but which will almost surely encounter serious but avoidable reviewer resistance.

4)      When reviews come in, write a review of your own. I tended to first read the reviews, then I would revisit the paper’s text, figures, and tables and any notes I took on my first look through. I would then write a review that includes:

a.      My own feedback on the paper, on points that the reviewers may have missed.
b.      A summary of the essential points from reviewers (and my own reading) that either preclude further consideration, or that must be dealt with in a revision.
c.       If you disagree with the reviewers on some point, or need to arbitrate between conflicting reviews,  do so while being respectful to the reviewers. This is important when the reviewers ask for something unreasonable, or especially when they express themselves in an overly aggressive or negative manner. You are an arbiter who can provide a buffer between the authors and a mean or thoughtless reviewer. Luckily, I rarely feel like this is an issue, which speaks well of our reviewers, but it does happen sometimes.

5)      Keep a lookout for ‘diamonds in the rough’: a paper with a great underlying idea, a unique dataset, that might get negative reviews in its current form, but which might be truly great with work. Sometimes one is tempted to just knee-jerk decline papers that get negative reviews. But look closely. Some of our best papers were met with initially very critical reviews, or editorial DWOPS that might have been declines. I want to especially draw your attention to this blog post by Meg Duffy (now one of our AEs):
In this post, Meg describes a paper that was initially rejected from Ecology, and might easily have been rejected from AmNat next. But, then-AE Yannis Michalakis delved deeply into the paper. Through a series of revisions, his careful recommendation letters led Meg and co-author Spencer Hall (also now an AE) to hone the paper into a publication that went on to win the Ecological Society of America’s Mercer Award. I think we’d all aspire to be the AE who helps hone an initially rough submission into a citation classic. 

6)      Note, of course, that this doesn’t mean you should never decline papers. Not everything is a citation-classic in the making. We should also avoid endless cycles of revision and re-review, and you should feel free to recommend "decline" for papers that don't improve after resubmission or revision. Authors are likely to be upset if their paper ultimately gets declined after multiple rounds of revision, so it’s best to try to assess how likely the authors are to be able to address the editors’ and reviewers’ major concerns after the first round of review or first revision, and decline papers that are unlikely to improve sufficiently to meet our requirements. Again, it is rare for us to hear from an irate author, but when we do it is almost always because their paper got declined after two or more reviews.

7)      Keep in mind that, at Am Nat, the three Editors also play an active role in assessing papers and reviews, and that the Editor might occasionally disagree with your recommendation or might feel the need to seek additional advice on a paper. Please don’t feel offended when this happens. Editors see many manuscripts and try to ensure that all submissions are treated fairly while also monitoring our overall acceptance rate. Editors may often change a recommendation of “Major Revision” to “DWOP” or vice versa, or might change a “DWOP” to a “Decline”. Sometimes this is because we see something differently than you do. In cases of more serious disagreement, the Editor may discuss the manuscript with you and try to reach a consensus. More often, it is because we are trying to manage the bulk flow of papers into print. If we accept too many papers, we get a backlog and authors get upset about delayed printing. If we accept too few papers per month for a while, we might have too few articles to physically bind together into a print issue. So, one of the Editors’ jobs is to quietly manage the overall acceptance rate. The most effective way for us to do this is to nudge papers from Decline to DWOP or vice versa, or from DWOP to major revision or back.

8)   Find the right balance between pushing authors to improve their writing, while allowing them to retain their own voice as writers. It sometimes happens that a great scientific idea comes wrapped in hard-to-read prose. It is okay to guide authors through the difficult process of improving their writing. It can make all the difference between a great idea that nobody reads, and a great idea that moves the field in a new direction. But, try not to push authors out of a perfectly sound and readable writing style that happens to differ from your own.

9)  Make a decision.  You don't always need to take reviewers' time, especially when receiving a revision. If the authors have made substantial technical changes, that you are not in a position to evaluate, then of course send a revision back out for review. But in general our first instinct should be to read the response letter and manuscript ourselves to decide whether the authors satisfied the reviewers' concerns. If so, check the manuscript for any remaining concerns, and try to reach a final decision. For more about re-review, and also the DWOP / Decline distinction, see a previous editors blog post here: http://comments.amnat.org/2017/11/an-open-letter-from-incoming-amnat.html 


In sum, handling a paper as AE for AmNat requires a bit more than a typical review that you might do for another journal. I want to articulate these expectations explicitly to discourage the temptation to write relatively cursory decision letters that mostly just summarize a couple of key points from the reviews. I understand that temptation. We editors sometimes succumb to it ourselves by just pointing to your recommendations or the reviews, without adding much (especially on days with a half dozen decision letters to write). But I always strive to have at least a few insights of my own to add. When authors know that the AE and Editor have read their paper carefully enough to have their own opinions (as opposed to echoing reviewers’ opinions), they feel like their paper has been given a fair evaluation. 

Dan Bolnick
Editor-in-Chief
American Naturalist
July 2019
