Eco-Evo Evo-Eco

Saturday, March 7, 2026

The Null Hypothesis is Always Wrong

No two populations are identical for any trait. No two communities have the same species composition. No detectable phenotype is ever completely neutral (i.e., “some” selection is always present). No trait is completely absent of genetic (or plastic) influences. No evolutionary trajectory is ever random. No behaviour lacks at least some repeatability. In all statistical tests, the data violate at least some of the assumptions. In short, the null hypothesis is always wrong – unless, of course, the researcher has a completely vacuous (i.e., not informed by the biology of the system) or intentionally disingenuous (e.g., known already to be wrong) alternative hypothesis.

The issue, of course, is that the EFFECT of any focal hypothesis (two populations are different, a phenotype is heritable, a behaviour is repeatable, data always violate some assumptions) is often very weak. When effects are weak, it becomes very hard to reject a null hypothesis, even though that null hypothesis is, in reality, inevitably wrong. A clear example comes from simulation modeling. Researchers do not put variables into models unless they plausibly have an effect – and so whether a statistical test confirms that a model term has a “significant” effect is simply a function of the number of replicate simulations one runs. Stated the opposite way, even an infinitesimal effect will be significant if you have enough data – in the simulation model case, you can just run more replicates of your simulations and voila – the effect will become significant.
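To make the sample-size point concrete, here is a minimal sketch (not taken from any particular study). It assumes a two-sample z-test with equal group sizes and a known common standard deviation, and it treats the observed difference as fixed at a hypothetical 1% of a standard deviation. The function name and all numbers are illustrative assumptions.

```python
import math


def two_sided_p(effect, n_per_group, sd=1.0):
    """Two-sided p-value of a two-sample z-test for a fixed observed difference.

    Illustrative assumptions: equal group sizes and a known common SD.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = effect / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


tiny_effect = 0.01  # hypothetical difference: 1% of a standard deviation
for n in (100, 10_000, 1_000_000):
    # The effect size never changes; only the number of replicates does.
    print(f"n per group = {n:>9,}  p = {two_sided_p(tiny_effect, n):.3g}")
```

Under these assumptions the same tiny effect is far from significant at 100 replicates per group and far beyond the usual 5% threshold at a million, so the p-value here mostly reports how many replicates were run.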

I will first outline a series of problems associated with the use of null hypotheses, while also emphasizing that the solution is to focus on effect sizes (and sampling distribution for those effect sizes). I realize that some statisticians will (perhaps violently) object to some of the statements below – and will also point out problems with specific statements or suggestions. My point in this post, however, is to bring some biology back and to promote an understanding of what statistics are used for in a biological sense: they are simply a way of indicating a degree of confidence that one has when stating a conclusion from data.

1. Null hypothesis testing distracts from what matters – the effect size.

When researchers use a null hypothesis, they tend to make a dichotomous decision: that is, “my alternative hypothesis is not correct” or “my alternative hypothesis is correct.” (Of course, the technical way to state the outcomes is different – but the preceding statement is what everyone actually wants to infer.) In the first case (failing to reject the null), researchers often then don’t even report the estimated effect size – which has the well-known effect of biasing meta-analyses. Researchers should always report the estimated effect size even if the null is not rejected! (Remember, it is the estimated effect size that matters.) Further, researchers struggle with p values that are barely non-significant, and so describe them as “marginally significant” or “suggestive” or “a trend.” Again, all of this goes away if the focus is on effect size.

Perhaps most insidious, researchers conclude that their data fit the assumptions of their statistical model (e.g., error distribution) when they fail to reject a null hypothesis that the assumptions are not violated. The reality is that the data ALWAYS violate the assumptions of the model: no data/residual distribution is ever normal (or log-normal or Poisson or whatever distribution you achieve with transformations or not), no two populations ever have the same variance, no X-Y relationship is ever perfectly linear, and so on. What matters is the extent to which (that is, the effect size) the data violate the assumptions and WHETHER THAT MATTERS FOR YOUR CONCLUSION. Similar issues attend the removal of “non-significant” terms from models (e.g., terms that do not provide a “significant improvement to the fit of the model to the data”) – especially when those non-significant terms account for structure in the data (Arnqvist 2019 - TREE).

In the reverse case (rejecting the null), researchers often then just state they have found an effect without emphasizing the magnitude of that effect. For example, a researcher who rejects a null hypothesis will often state that two populations differ in trait value or that a trait is repeatable or heritable. The reality is that no populations are identical for any trait, all traits are (at least to some extent) heritable, and all behaviours are (at least to some extent) repeatable; what matters is HOW heritable or repeatable. Simply stating that a trait is repeatable or heritable provides no useful information – what we need to know is HOW heritable (as that predicts the rate of evolution) or HOW repeatable the trait is (Hendry 2023 - Bioscience).

2. The null hypothesis is given favorable status when it is really just one of several competing hypotheses.

With limited data, many substantial effect sizes will be deemed non-significant in a statistical test. The reason is that the null often encompasses such a huge potential range of possibilities that it becomes very hard to reject the null even when – as noted above – the null hypothesis is surely wrong. Stated another way, the null hypothesis is given “favored status” among a set of competing hypotheses. The solution to this problem is to instead gauge the level of support for several alternative hypotheses, including – if one wishes – a “random” equivalent to a null hypothesis.

An excellent example of this problem (and solution) comes from attempts to detect non-random changes in paleontological time series. In a particularly telling instance, Mike Bell collected fossil time series at high resolution for threespine stickleback traits (Bell et al. 2006 – Paleobiology). Although the traits (armour plates and pelvic structures) were well known to be functional and to repeatedly evolve in response to selection in contemporary populations, and even though the dramatic change in the traits through time was perfectly consistent with contemporary time series, all available statistical tests failed to reject the null hypothesis of random change. Gene Hunt then took the much more reasonable approach, for the same data set, of estimating the level of support for alternative hypotheses (e.g., drift versus adaptation to a fitness peak), revealing that the adaptive hypotheses had much more support than a drift hypothesis (Hunt et al. 2008 - Evolution). Gene Hunt also applied this competing-hypothesis approach to other fossil time series, again revealing much more support for selective mechanisms than had previously been inferred using null-hypothesis approaches (Hunt 2007 - PNAS).

This competing hypothesis approach is embraced by methods such as model comparisons based on AIC or BIC. Importantly, however, one must not fall into the trap of simply then stating that one hypothesis is a better fit than the other; one should instead emphasize the level of support for the different hypotheses (e.g., AIC weights).
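As a rough illustration of reporting the level of support rather than just a winner, the sketch below computes Akaike weights from a set of AIC values using the standard formula (each model's relative likelihood exp(-Δ/2), normalized to sum to 1). The model names and AIC values are made up for the example and are not the stickleback results.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]


# Hypothetical AIC values for three competing models of a trait time series.
models = {"random walk (drift)": 104.0, "directional": 101.0, "adaptive peak": 100.0}
for name, weight in zip(models, akaike_weights(list(models.values()))):
    print(f"{name:<22} weight = {weight:.2f}")
```

With these invented numbers the best-fitting model carries only a bit more than half the total weight, which is quite different information from simply declaring it “the best model.”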

The time series of fossil stickleback traits analyzed by Bell et al. (2006) and then re-analyzed by Hunt et al. (2008). This figure is from the latter paper.

3. The null hypothesis is subjective.

It might seem that null hypotheses are objective: that is, “a variable of interest has no effect” or “two populations are not different.” In many instances, however, the null hypothesis is subjective – and thus, when combined with the above “favored status” given to a null hypothesis can dramatically influence conclusions for the same set of data.

A starting example here occurs for correlations between two variables (X and Y) in two groups (e.g., two populations or two treatments). A logical question is whether the two slopes differ – and so the null hypothesis would state that Y is not statistically influenced by the interaction between group and the X variable. Assume the null hypothesis is then rejected and the researcher goes home happy. At the same time, we would presumably only be interested in the relationship between X and Y within a population if the relationship within that population were significant (in the classical approach), in which case the null hypothesis would be that no relationship was evident within a population. Thus, in the case where one population shows a slight positive correlation between X and Y and the other population shows a negative correlation between X and Y, the two slopes can be significantly different from each other (i.e., “the relationship differs between the two populations”) even though neither slope is significantly different from zero (i.e., “X does not influence Y in either population”). (I am not here saying that both null hypotheses SHOULD be tested in such cases. I am merely here pointing out the cognitive dissonance that can arise via null hypothesis logic.)
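The two-slope situation can be reproduced with a few lines of arithmetic. This is only a sketch: the slope estimates and standard errors are invented, the two estimates are assumed independent, and a normal (z) approximation stands in for the t-tests a real linear model would use.

```python
import math


def z_test_p(estimate, standard_error):
    """Two-sided p-value for H0: the true value is zero (normal approximation)."""
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical slope estimates and standard errors for two populations.
slope_a, se_a = 0.30, 0.20
slope_b, se_b = -0.30, 0.20

p_a = z_test_p(slope_a, se_a)  # is the slope in population A different from zero?
p_b = z_test_p(slope_b, se_b)  # is the slope in population B different from zero?

# Difference between slopes; the SE assumes the two estimates are independent.
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
p_diff = z_test_p(slope_a - slope_b, se_diff)

print(f"slope A vs 0: p = {p_a:.3f}")
print(f"slope B vs 0: p = {p_b:.3f}")
print(f"A vs B:       p = {p_diff:.3f}")
```

With these made-up values, neither slope differs “significantly” from zero (p of about 0.13 each) while the difference between them does (p of about 0.03). Reporting the two slopes with their uncertainties conveys the same data without the apparent contradiction.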

Another example comes from the use of one-tailed versus two-tailed hypotheses. In many cases, researchers are interested in the direction of an effect (e.g., Y increases as X increases) and so the null hypothesis is that the correlation is NOT positive and the alternative hypothesis is one-tailed (a positive relationship), thus increasing power. What does a researcher do, then, when they find a strong relationship in the opposite direction to the alternative hypothesis? Yes, the alternative hypothesis was incorrect and yet it would be silly to leave it at that. The evidence instead clearly suggests a strong effect in the opposite direction – and yet, in the null hypothesis approach, all the researcher can technically conclude is that the effect is not positive – they cannot conclude the effect is negative. Alternatively, the researcher might choose a two-tailed test, in which case the null hypothesis is that the effect is zero. In this two-tailed case, however (again, technically in the null hypothesis approach), the researcher cannot state the DIRECTION of the effect they find because the alternative hypothesis was simply that the slope was not zero.

Other examples of subjective null hypotheses abound. In studies of parallel evolution, for example, the null hypothesis can completely flip depending on your interest. On the one hand, the null hypothesis might be that evolutionary trajectories are random, with the alternative being that they are not random. For other researchers, the null hypothesis might be that the evolutionary trajectories are parallel (because they are interested in deviations from parallelism), with the alternative being that they are not parallel (De Lisle and Bolnick 2020 - Evolution).

Again, the solution is to estimate the effect size and the sampling distribution for those effect sizes.

4. Other issues

I probably haven’t yet mentioned one of the criticisms you were expecting – that the critical value for rejection (usually 5%) is arbitrary – in essence, made up by Ronald Fisher when he invented the null hypothesis statistical approach. This specific arbitrary cut-off contributes to many of the above effects, such as the “favored” status for the null and the lack of emphasis on effect size. Importantly, however, this problem can’t be fixed simply by changing the critical value.

Another problem arises when multiple comparisons are involved. In some cases, it is argued that one needs to adjust one’s P value (e.g., Tukey post-hoc tests, Bonferroni “corrections,” False Discovery Rates, etc.) to some experiment-wide or study-wide level. This adjustment becomes advantageous when one doesn’t want to reject a null hypothesis – usually in the case of testing for violated assumptions. But it is also problematic in the other direction. Imagine, for instance, that you measure whether some particular trait differs between populations and find that it does (because you can reject the null that it doesn’t). Then you decide to measure more traits and realize that – to maintain a study-level p-value – you need to adjust for the chance that you will mistakenly reject the null hypothesis 5% of the time FOR EACH TRAIT. It doesn’t take a lot of traits for your original significant relationship to become non-significant. (Yes, I realize you can employ multivariate tests but – often – trait-specific conclusions are necessary.) Another common example is when you test for differences between more than two populations – and find that you can reject the null hypothesis that they are not different. However, with such a test, you can’t conclude which specific populations are different from which other populations – so you then conduct post-hoc tests of one sort or other and find that you can’t find ANY differences – even though you “know” that at least one difference must be present.
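The arithmetic behind the “more traits” problem is simple enough to sketch. The example below uses the Bonferroni adjustment as one representative correction, and it assumes independent tests when computing the chance of at least one false rejection. The p-value of 0.02 and the trait counts are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni adjustment."""
    return alpha / n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false rejection, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


original_p = 0.02  # hypothetical p-value for the first trait measured
for n_traits in (1, 2, 5, 10):
    threshold = bonferroni_threshold(0.05, n_traits)
    verdict = "significant" if original_p < threshold else "not significant"
    print(
        f"{n_traits:>2} traits: threshold = {threshold:.4f}, "
        f"unadjusted family-wise error = {familywise_error_rate(0.05, n_traits):.2f}, "
        f"first trait is {verdict}"
    )
```

Nothing about the first trait or its estimated effect changes as traits are added; only the verdict does, which flips once three or more traits are measured in this example.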

The solution

As noted above, the solution to these problems is to work with effect sizes, such as R² or AIC or Hedges’ g or Cohen’s d or estimated slopes in linear models. Importantly, however, these estimates should be combined with presentations of their sampling distributions (the probability distribution of effect size estimates given the data and the sample sizes), which can then be used to estimate confidence intervals (or equivalents) on effect size estimates. Of course, each estimator has its own set of uses and misuses – but that is for another time.
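One way to pair an effect size with an approximation of its sampling distribution is a bootstrap. The sketch below is just one option among many: it computes Cohen’s d with a pooled standard deviation and a percentile bootstrap interval. The data are simulated, and the function names, seeds, and number of resamples are arbitrary assumptions.

```python
import math
import random
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference (A minus B) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)


def bootstrap_ci(group_a, group_b, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for Cohen's d (resampling within groups)."""
    rng = random.Random(seed)
    estimates = sorted(
        cohens_d(rng.choices(group_a, k=len(group_a)),
                 rng.choices(group_b, k=len(group_b)))
        for _ in range(n_boot)
    )
    lower_index = int((1 - level) / 2 * n_boot)
    upper_index = int((1 + level) / 2 * n_boot) - 1
    return estimates[lower_index], estimates[upper_index]


# Simulated trait values for two hypothetical populations.
data_rng = random.Random(42)
pop_a = [data_rng.gauss(10.5, 2.0) for _ in range(30)]
pop_b = [data_rng.gauss(10.0, 2.0) for _ in range(30)]

d = cohens_d(pop_a, pop_b)
low, high = bootstrap_ci(pop_a, pop_b)
print(f"Cohen's d = {d:.2f}, 95% bootstrap interval = [{low:.2f}, {high:.2f}]")
```

The reported quantity is then the estimate together with its interval, which a reader can interpret biologically whether or not the interval happens to include zero.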

In closing

Many of you might be thinking that I am a hypocrite because work from my lab does often use null hypothesis-based statistical testing. The reason that we do this is, of course, that reviewers and editors and journals often insist on it – and so, if one is to publish a paper, one needs to use that approach. HOWEVER, we always report and emphasize and compare the effect sizes – as opposed to null hypothesis rejection. Stated another way, we do report p-values so that people who care about them can see them – but we also report and emphasize the effect sizes, which is what matters.

Wednesday, June 4, 2025

Unlocking understanding in undergraduate evolution education

Suegene Noh wrote this blog post with input from Nancy Chen, Alejandra Carmago-Cely, Kiyoko Gotanda, Amanda Puitiza, Lucia Ramirez, Juleyska Vasquez-Cardona, Yaamini Venkataraman

A version of this post is also published on the Genes to Genomes blog https://genestogenomes.org/unlocking-understanding-in-undergraduate-evolution-education/

Teaching is an integral part of many of our jobs as academics, so collaborating with like-minded scientists and scholars to think about how best to present biological concepts to students can be a valuable and rewarding experience. More critically, the way we teach evolution in undergraduate courses, particularly how we address student misconceptions or address the legacy of harmful ideologies, can have important impacts on student social beliefs and senses of belonging in science.

The RIE2 (Resources for Inclusive Evolution Education) organizing committee is composed of 8 women of color at various career stages in the field of evolutionary biology (i.e. Ph.D Candidates, Postdoctoral Researchers, Assistant Professors, and Associate Professors) who came together at a National Science Foundation and Dartmouth College funded Networks of Success conference organized by the Women and Non-Binary People of Colour in Ecology and Evolutionary Biology. We developed this initiative to increase the accessibility of teaching resources that improve and deepen the understanding of evolutionary concepts that are prone to misuse. We conceptualized a working group that would convene participants from the fields of evolutionary biology, science, technology, and society studies (STS), history of science, and education to co-create such resources (https://qubeshub.org/community/groups/rie2). To this end, we invited two external scholars (Dr. Banu Subramanian, Wellesley College, STS, and Dr. Angela Google, University of Rhode Island, Biology Education) as consultants to assist with building bridges across disciplines and provide feedback during resource creation.

We first developed an online repository of educational materials provided by educators (https://qubeshub.org/community/groups/rie2/resources). We subsequently recruited participants through society listservs, social media, personal networks, and email outreach to department chairs. From these potential participants, we invited participants at the beginning of June 2024 with a goal of balancing career stages and geographical locations.

We started the first RIE2 working groups at a kick-off event on June 17, 2024. At this event we set community guidelines for how to work together and held a workshop on inclusive practices in STEM teaching with Dr. Corrie Moreau (Cornell University). From there on, participants were organized into two timelines based on preference. The short timeline groups (13 participants in two groups: adaptation and genotype-by-phenotype association) were slated to wrap up resource creation by the end of August 2024. The long timeline groups (18 participants in three groups: genetic drift, natural selection, and phenotypic plasticity) were slated to wrap up resource creation by the end of February 2025. We provided both sets of groups with virtual community check-in meetings with opportunities for immediate peer feedback and with scaffolded deadlines for teaching resource components. Individual working groups self-organized to create components of teaching modules over the subsequent weeks or months.

With funding from the American Genetics Association, the European Society for Evolutionary Biology, the Genetics Society of America, and the Society for Molecular Biology and Evolution, our initiative has supported 30 total participants, including 7 organizers, in creating inclusive resources for teaching evolution. Our participants comprised 10 faculty, 7 postdocs, 10 grad students, and 3 independent scholars. Given our goal of critically examining eugenic and settler-colonial underpinnings of the field of evolutionary biology in the process of the initiative, we were excited to have 5 STS scholars included in our participants. Our first iteration working groups focused on five topics: adaptation, genotype-phenotype associations, natural selection, genetic drift, and phenotypic plasticity. Five modules have been published to QUBES (https://qubeshub.org/community/groups/rie2/publications). As of May 31, 2025, the five modules together have been viewed 1888 times and downloaded 949 times.

Based on the number of downloads and individual comments received, we expect these modules to be broadly useful for instructors interested in offering more inclusive takes on these complex evolutionary concepts. To better assess their impact, we have recently partnered with Dr. Robin Costello (University at Buffalo, Biology education) to create a survey to quantify student outcomes based on application of our teaching modules in a course setting. One of us, Dr. Suegene Noh at Colby College, had the opportunity to teach the Genetic Drift module in March 2025. This module includes six new activities developed by the genetic drift working group over the past nine months. The survey results revealed that student understanding of genetic drift, as measured using the Genetic Drift Inventory, significantly increased. More importantly, students were more inclined to agree that society affects the way scientific knowledge is built and how scientific knowledge is applied. The unique activities in our module, including the one that engages with Queer theory, helped students more readily make this connection to society.

We believe our working group initiative was a success. So far we’ve collected anonymous feedback from the participants of the short timeline. They reacted positively to meeting and collaborating with other participants from different institutions and countries. A total of 8 participants were not based in the United States and their experience with different education systems and curriculums provided valuable input into the module development. We are excited to see our modules in use in the world and welcome any feedback instructors have. We will continue assessing the impact of the resources created during this first iteration, and we hope to organize additional working groups in the future.

Friday, May 2, 2025

Tales from the Hill

Contributed by Dan Bolnick

On April 30 2025 (the same day NSF froze all funding actions), about 20 colleagues and I were crisscrossing Capitol Hill in Washington DC meeting with staff of our Senators and Representatives (and occasionally the congressfolk themselves). I was there with a delegation from the American Institute of Biological Sciences (AIBS) to ask Congress to support science in the US through continued funding at or above recent levels. Allocating funding, after all, is where Congress’ primary power lies. But in addition to our “ask”, we also came to our meetings with an “offer”. We would ask the staff, “what can we do to help the Representative or Senator push through science-friendly policy?” That offer of help changed the nature of our meetings – the aides began giving us specific advice, questions, and requests. Here, I want to convey some of the lessons I learned from those conversations. I’ll give the short social-media post ‘executive summary’ version, then elaborate below in a longer post.

Quick version:

When asked about what we scientists can do to help federal funding of science, Congressional aides suggested the following:

1) Send your representatives specific, personal, heart-warming stories of the benefits of science funding, and also examples of the harm done by cuts to science. Be sure it connects with everyday lives in ways anyone can relate to.

2) Engage in science communication and public outreach to get the wider public excited about science, and concerned about cuts.

3) Focus your letter writing and phone calling wisely, on those in power (GOP), and on those who need to be persuaded (GOP). So those of you in “red states” are most important in advocating for science. Find common ground with opponents of science funding to try to persuade them.

4) Your reps (especially Democrats) are cut off from information coming from federal agencies under the executive branch. They are learning about science cuts from us, or from the media, so when you learn about breaking examples of terminated grants, interference in free speech, etc., call or mail your reps to inform them.

5) You can ghost write material (Dear Colleague Letters, Oversight Letters, questions for committees and hearings) that your representative might find useful, and their staff are not qualified to write.

6) There is hope: there remains bipartisan support for science in Congress, they just need the spine to stand up for their beliefs. Hearing from constituents who support science, and hearing the benefits of NSF and harms from cutting it, helps strengthen their resolve.

Now, the long version…

First, a brief explanation about the AIBS Congressional Visits Day. The AIBS offers a two-day training event in Washington DC each year, which I very highly recommend. This event covers some basics of science communication, the federal budget process, and how to develop a ‘script’ for meetings with congressfolk or their aides to succinctly convey your “ask”. I found the training itself to be very useful, a nice mix of presentations about dos and don’ts, coupled with active group work, presentation, and giving feedback on each other’s scripts. The event was attended by a couple dozen scientists – mostly graduate students and postdocs – with a strong field and organismal biology representation. At the end of the training, we split up into delegations by geographic region and had a full day of meetings with our representatives. Congressional offices tend to only book meetings with their own constituents, so my group (myself from Connecticut, plus two Massachusetts and one New Hampshire residents) met with Senators’ staff from our three states, and staff from our particular House districts. Other AIBS attendees from elsewhere in the country met with staff from southern, east coast, Midwest, southwest, mountain, and west coast representatives.

As members of the New England delegation, we had it rather easy: every one of our Senators and Representatives had previously voted for the CHIPS and Science Act (e.g., funding NSF at nearly $12 billion). New Hampshire is very much a swing state where we thought it was important to clearly convey how important science funding is. There, we went with our script, which had a central “ask” (please protect science in the US) with three pillars to it (funding, protecting free speech and inquiry, and protecting the people who do science).

But for the most part, I felt a bit guilty taking 20 minutes of a busy congressional aide’s time to argue for something they were likely to do anyway. So we pivoted to giving more time to the “offer”. And that’s where things began to get interesting. Here are some of the responses that we got, when we asked what we as scientists (and you, too) can do to help congress help science:

1. Tell the reps stories.

This was the most universal theme. “Stories, stories, stories,” said one aide. Ultimately, lawmaking and funding are about how constituents feel, and making their constituents feel heard and supported. And a key rule in politics is that emotionally resonant stories are far more effective at winning hearts and minds than numbers can be. So, call or write to your congressperson with stories about how science funding is benefitting their constituents. This gives your representative a supply of stories that they can draw on in conversations with skeptical colleagues, in speeches, and in press conferences.

The stories need to be something that they, and a random constituent, would value. If I just say that the money makes it possible for me to study fish speciation, that’s not going to strike a non-biologist as valuable. So, what kinds of stories connect for a politician? Stories that relate to their constituents’ jobs, workforce training, income, health, safety, and quality of life. To some extent, that means applied research producing innovations that help small businesses or agriculture or fisheries, conservation, health care advances, technology, patents. Applications. Now, many of us don’t actually pursue applied research (I sure don’t). But “basic” science (or what I prefer to call discovery science) is nevertheless a great engine for producing unexpected benefits. You might not have any applications from your work – yet. But might you in the future? I’ll give a couple examples of stories that I prepared for my congressional meetings, just for a flavor, at the end of this post.

Even if you aren’t doing applied work you can tell personal stories about employment. I told my House Representative’s aide that I had brought in $X in research funding, that we largely spent employing students and research technicians, generating over 250 jobs in my district over the years. Those employees spent their income within the Representative’s district. And when they did research they often bought supplies from local businesses. So those research dollars coming into the Representative’s district are spent several times over, contributing to the local economy.

Workforce training is important to representatives also. Academics sometimes complain that we train more PhDs than there are faculty jobs to receive them. But that’s a strength, not a weakness – those ‘excess’ PhDs may be the most important thing we do, because they go on and use their skills in data analysis and communication to contribute to the US economy. As an example, I brought a brilliant student into the US from another country, and after she graduated with a PhD in biology she used her statistical savvy to work as a data scientist, first at AirBnB, then Oculus, and now as a leading data scientist at Meta, contributing to the growth of our tech economy. Then, there are all the undergraduates who we work with in the classroom, lab, and field, who go on to use the skills we teach them to work as doctors, pharmacists, epidemiologists, start-up company owners, K-12 teachers, and far more. Stories of these former students’ later careers convey the key role NSF funding has in creating an educated workforce needed for the modern economy.

That said, statistics do help sometimes. An aide for one senator with a strong foreign-affairs focus clearly took notes when I gave her specific numbers on how US spending on science R&D compared to Chinese spending on R&D, and the opposing trends (US cutting R&D, China increasing it dramatically). Another took notes when I mentioned specific numbers about grant funding received and number of people employed. Still others liked a colleague’s numbers about the seafood trade deficit ($20 billion, twice the total spending on NSF). I find this report on the economic value of federal R&D funding especially compelling: https://aura.american.edu/articles/report/Preliminary_Estimates_of_the_Macroeconomic_Costs_of_Cutting_Federal_Funding_for_Scientific_Research/28746446?file=53480237 from the Institute for Macroeconomic & Policy Analysis.

2. Engage in public outreach and science communication.

Congressional support of science funding will pretty much always be as strong, or as tepid, as public support for science. Many of us have struggled, at one time or another, to explain to our relatives (or, friends) what we do, and why it is interesting. If your own relatives don’t understand your work’s value, they can at least appreciate that it is giving you some income (I hope). But now imagine a random tax payer who is equally skeptical of the value of your work, but actively dislikes the idea of their tax dollars going to your salary. We need that random person to see the value in your science, so that they are motivated enough to write to their own representatives in support of science funding. Or, at least, not object to science funding. To bring that random person on board, we need effective science communication aimed at the public. Often that might take the form of articles or OpEds in local papers, local news channels, which reach the constituents in your own district. Convince people that grants aren’t wasteful, that the money is going to something that generates something they value. You might value knowledge for its own sake, but don’t assume others do, so think about value also in terms of economic growth, jobs, health, safety, conservation. One congressional aide emphasized these “tangible impacts” multiple times, connecting science to benefits that non-scientists will feel in their “everyday lives”.

3. Focus advocacy efforts wisely.

One of the things that surprised me is how focused Congressfolk are, on their own constituents. If you want to schedule a meeting with them, or their staff, you must live in their district. It’s not enough to have information relevant to their district. Nor is it enough to do research or spend funds in their district. Therefore, our New England delegation met with exclusively Democratic Senators and Representatives (we didn’t have anyone from Maine along). So we were preaching to the choir (a phrase that at least two aides used, which felt a tiny bit like a rebuke for wasting their time). But those same aides emphasized that what we really should be doing is pushing our peers in swing states and Republican-represented states to write letters, call, and arrange similar meetings. Focus our advocacy efforts on the people currently in power, who chair committees and have a majority of votes. The Democrats we met with seemed to feel nearly as helpless as we do. Also focus advocacy on the anti-science representatives who need to hear from constituents who disagree with them. So, we really need those of you who are in Republican-controlled districts to do some of the heavy lifting here. This isn’t just folks with Republican senators or house members; you may have a Democratic Congressional representative but a Republican governor (looking at you, New Hampshire) who can be pressured to express concerns to his party members in Washington DC. The rest of us can help out by working with our colleagues to hone a message, help you practice your delivery. But, it is crucial that our red-state colleagues be especially vocal. And given the current climate of fear of reprisals, we really need to encourage our senior colleagues who are retired, or nearing retirement, to lead the charge here. They don’t have to fear having a grant cancelled, or their employer targeted, so have far more freedom to speak their minds.

When reaching out to a more skeptical audience, there are some things to keep in mind. First, tell stories that have emotional and economic resonance (see point 1, above). Second, find common ground. There’s always something that your opposite will value, that you can draw on. One member of our delegation works on population genetics of marine organisms, and could point to the seafood trade deficit: we spend 20 billion dollars more on buying foreign seafood than we export. That’s twice the entire NSF budget! So if we can spend a bit of NSF money to improve our seafood industry by better matching genotypes of aquaculture organisms (e.g., oysters) to warming water climates, that’s an investment that can improve economic productivity and cut into the trade deficit. Republican politicians are currently obsessed with efficiency, so we can make the point that grant terminations are downright inefficient. If you build a house, and stop just before you put the roof on because you want to stop spending money, that’s not savings, that’s wasting the funds you’ve already put in, without seeing any of the benefit that would later accrue. Hunters and gun lobbyists value nature. And everyone values health. If you are willing to go down a slightly jingoistic path, you can raise alarms about how US government funding for scientific research & development is declining as a share of the economy, whereas in China it increased by over 8% just last year. They are on track to spend 2.6% of their budget on R&D, compared to our 0.6%. Lots of Republicans are worried about geopolitical competition from China, and a powerful argument for them is to invoke memories of Soviet achievements during the Sputnik era, and how that spurred scientific funding that allowed the US to dominate in science globally. Argue that this is a new Sputnik moment. Find that common ground and exploit it to explain why science funding, and freedom of inquiry, yield benefits that your interlocutor wants.

The good news is, after our congressional visits day we reconvened with a few of our colleagues from other regions of the US, for dinner. They reported that even their Republican representatives were broadly supportive of continuing NSF, NIH, and NOAA funding. During the last Trump administration, he proposed a 50% cut to NSF, but congress ignored that request and passed a budget that gave NSF a slight increase.

4. Pass along breaking information

I was astonished to hear that federal agency staff have been largely prohibited from communicating with Democratic members of Congress. This means that our Democratic representatives are mostly in the dark about grant terminations, prohibited speech, and other political interference within science funding agencies. They only know what they learn through the news (which is often significantly behind on breaking events in science), and from what their constituents tell them. So please, if you have specific evidence of something that Congress should know about (a terminated grant, impounded funds, grant programs being mothballed), let your representatives know about it. It may be that they hear it first from you. For example, some of the staff we talked to did not know that the NSF only awarded half the usual number of Graduate Research Fellowships – that’s over 1000 of our best students who normally would have received salaries to pursue innovative science. That’s less income for their districts, and less economic growth as a result. Other staff did not know that grant programs earmarked by Congress had nevertheless been archived, an illegal impoundment of Congressionally-allocated funds. So, every grant termination, every cancelled review panel, let your representatives know! This is especially important because the US Constitution gives Congress the “power of the purse” to determine spending levels. In practice, the White House is using a strategy called “impoundment” (which, I am told, is illegal, though I am not an expert on this point) in which they simply drag their feet on actually spending allocated funds. Congress needs to know of specific examples of impoundment (and the harm it causes) if they are to pursue protecting their constitutional role in government.

5. Ghost write

Your congressional representative has a small staff of mostly 20-year-olds, very few of whom have any scientific training. Those aides don’t have the expertise, nor the time, to provide their boss with detailed text that can be used to move forward legislation on science topics. But you do. We were told by an aide to Elizabeth Warren that we are welcome to provide ghost-written documents that the Senator could use. For example, within the Senate, and within the House, there are “Dear Colleague Letters” that express a representative’s intent to support funding for a particular item in the budget, at a particular amount. For instance, there is currently a Dear Colleague Letter within the House of Representatives written by Representatives Neguse and Fitzpatrick calling for $9.9 billion in funding for NSF – that’s a shade more than the last budget, though less than the CHIPS and Science Act of 2023 ($11.96 billion), and is probably our best case scenario. These Dear Colleague Letters get signed by other reps, as a way of gauging the level of support for a topic before a vote. These are an important tool for selecting subjects to legislate and fund, and building political consensus. What surprised me was that Warren’s aide said they are willing to see drafts of Dear Colleague Letters written by us, their constituents, which they might then use. In a congressional office that is thin on science expertise and short on time, this can be a powerful tool for moving forward congressional action on a topic you care about. Similarly, we constituents can ghost write Oversight Letters, which congressfolk can send to federal agencies asking for information and explanations. Finally, when there is an upcoming Congressional hearing on a topic, your Senator or Representative has a chance to ask hard questions of people like NSF or NIH directors, DOGE staff, or whoever is being grilled that day. You can submit questions that you would like your representative to ask during these hearings, either of the people who are testifying or of their fellow representatives.

These are grim times for science to be sure. US federal funding of science has been a cornerstone of nearly a century of global scientific leadership and economic growth. That leadership is quickly being squandered and once lost may not be regained (for example, Germany never recovered the global scientific leadership it had before World War 2, though now many US scientists are looking wistfully across the Atlantic). But let’s not cede the field uncontested. You have allies in Congress and in government, you simply need to give them the information and support that they need to be effective at achieving our common goals. That requires time and effort from you, but what better use of your time could there be at this juncture in history?

Example Story 1: 35% of deaths involve tissue damage called fibrosis. This is a build up of scar tissue in or around organs such as lungs, heart, or liver. I didn’t set out to study fibrosis. I began as an evolutionary biologist studying how fish split into new species. While pursuing NSF-funded research about evolution of fish in small coastal lakes, we stumbled on something unexpected: fish in some lakes experience severe fibrosis, fusing together all their organs. In other lakes, the same fish species shows no fibrosis at all, or can recover from prior fibrosis (whereas in humans the disease is irreversible). Today, my NIH-funded research team uses this same small fish to study the genetics and cellular causes of fibrosis, hoping to discover tools to prevent, or reverse, this pervasive immune disease. This is an example of how NSF funding for discovery-oriented research, driven by mere curiosity, can yield benefits that may launch new therapies and ultimately improve human well-being. The combination of federal investment in discovery-oriented research at NSF, and applied research at NIH, is a powerful combination that drives innovation for long-term economic growth in the US, and to improve health and well-being of your constituents.

Example Story 2: Funding science at NSF benefits your constituents and the nation. I’ll use my own team as an example: In 2 decades as a professor I’ve brought my university $XX million in funding, from NSF, NIH, and private foundations. That money generates new knowledge – for instance, I study evolutionary biology but this led me to research on an immune pathology contributing to millions of deaths in the US. But the funds have a much wider impact. We buy research supplies from local businesses and equipment from start-ups. My lab alone has employed over 250 people – students, postdocs, technicians whose incomes contribute to the local economy. And the skills they learn in the process contribute to the CT workforce: my students have become statisticians, doctors, brewers, epidemiologists, K-12 STEM teachers, and start-up owners.

Example Story 3: Let me give you an example of how NSF funding for discovery-focused science has benefitted your constituency. My student ------- moved to CT where she won a prestigious NSF graduate research fellowship to study evolutionary biology. This fellowship gave her freedom to explore different research directions, and she eventually started research on yeasts that live on apples in nearby orchards. In the process of studying how yeasts evolve to attack particular apple varieties, she unexpectedly found a poorly known species of yeast. In the lab, she bred it for cider production, creating a new commercial product we are bringing to market with LongView Cider House at Rodgers Orchards in New Britain. NSF support for her salary (which she spent living in your district) made it possible for her to help a family farm and a start-up company in your district. NSF grants and fellowships can often have this kind of unanticipated benefit, and contribute to economic growth and prosperity. A recent study by the Macroeconomic Policy Institute confirms what we have long known, that public funding of science has large economic benefits. Conversely, they found that “A 25 percent cut to public R&D spending would reduce GDP by an amount comparable to the decline in GDP during the Great Recession”. NSF is facing the possibility of 55% budget cuts. NIH faces 45% cuts. This year NSF awarded only half as many Graduate research fellowships. Postdoc fellowships are getting cut. Grants are being revoked arbitrarily. We need your support for maintaining, and growing, science funding, for the good of your constituents’ prosperity, health, quality of life, and yes, education and new knowledge.

Saturday, March 23, 2024

A 25-year quest for the Holy Grail of evolutionary biology

When I started my postdoc in 1998, I think it is safe to say that the Holy Grail (or maybe Rosetta Stone) for many evolutionary biologists was a concept called the Adaptive Landscape. The reason for such exalted status is that the adaptive landscape was then – and remains – the only formal quantitative way to predict and interpret an adaptive radiation of few organisms into many. I was heavily indoctrinated into this framework - as my postdoc was at UBC during precisely the time when Dolph Schluter was writing his now-classic book The Ecology of Adaptive Radiation.

Adaptive landscapes come in several forms (e.g., genotype based, allele frequency based, phenotype based), and the one we are concerned with here is the “phenotypic adaptive landscape.” Perhaps the first serious description of this landscape was the one presented by George Gaylord Simpson in his 1953 book The Major Features of Evolution. In essence, Simpson’s landscape depicts phenotype combinations (e.g., population average values for two important traits, such as beak size and beak shape for birds) and the expected fitness associated with a population having those mean values. The resulting landscapes are expected to be “rugged,” with various peaks of high fitness separated by valleys of low fitness. The idea is that a lineage (e.g., of birds) is expected to diversify phenotypically into multiple species that each occupy one of those different high-fitness peaks, with few individuals having phenotypes in the low-fitness combinations between the peaks. From one species thus come many in a predictable way, with different species matching their traits to particular resources or habitats that represent high-fitness peaks on the adaptive landscape.

This adaptive landscape idea remained largely conceptual, or heuristic, until Lande (1976, 1979) figured out how to represent it mathematically. Then a bit later, Lande and Arnold (1983) showed how to link the adaptive landscape to formal estimates of natural selection at the phenotypic level – thus providing a means to formally estimate the landscape. This theoretical work spurred several decades of intense interest in attempting to quantify adaptive landscapes – or parts of them – for particular adaptive radiations. Yet the effort required turned out to be rather extreme for several reasons. Most critically, perhaps, the full landscape can be specified only if one knows the fitness of individuals across the entire range of phenotypic space (e.g., from small beaks to large beaks for every possible beak shape value from pointy to blunt). Yet, almost by definition, individuals with phenotypes of low fitness should be rare in nature - because they are constantly selected against; and so the fitness of phenotypes between species tends to be unknown in nature.

The Holy Grail

Owing to this problem of “missing phenotypes,” as well as other difficulties, I would argue that no formal adaptive landscape – in the Lande sense – existed for any natural radiation of organisms at the time I started my postdoc. Yet its predictive promise made it the Holy Grail of the time.

Although no adaptive landscapes had been formally estimated, some studies got part way there (see the Appendix below on “other landscapes I have known and loved”). As one example, Benkman (2003) measured the performance of crossbills on different types of conifer cones and used these estimates and other information to construct an “individual fitness landscape” spanning the range of phenotypes – that is, the expected fitness of an individual having each possible combination of beak trait values. As had been predicted, the landscape had peaks of high fitness near the average phenotypes of the different crossbill types – and those peaks were separated by phenotypes with low fitness – yet something was missing.

This figure is from my 2017 book.

In particular, a formal adaptive landscape relates MEAN phenotypes for a population (assuming a particular phenotypic variance) to the expected MEAN fitness of that population, which requires a particular conversion – as the gif below (made by Marc-Olivier Beausoleil) illustrates. When this conversion takes place, the fitness peaks tend to sink and the fitness valleys tend to rise – because the adaptive landscape averages fitness across all of the phenotypes in a population, which inevitably span a range of fitnesses on either side of a peak (or valley). That is, because a population adapted to a fitness peak will have some variation in phenotypes, the off-peak individuals will reduce mean population fitness relative to a population where all individuals were identical and had phenotypes EXACTLY on the peak. The inevitable result is that adaptive landscapes based on population means are always smoother than those based on individual fitness values. As an example, the individual fitness peaks for crossbills shown above tend to disappear if converted to a formal adaptive landscape based on population means (C. Benkman pers. comm.).
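For readers who like to see the arithmetic, the conversion can be sketched in a few lines of Python. This is purely illustrative and is not the analysis applied to any real dataset: the two-peak individual fitness function, the single trait axis, and the phenotypic standard deviation of 0.5 are all invented for the example. The only idea it demonstrates is that mean fitness at a given population mean is the individual fitness function averaged over the (here assumed normal) distribution of phenotypes around that mean.

```python
import numpy as np

def individual_fitness(z):
    """Hypothetical individual fitness W(z): two peaks, at z = -1 and z = +1."""
    return (np.exp(-0.5 * ((z + 1.0) / 0.4) ** 2)
            + np.exp(-0.5 * ((z - 1.0) / 0.4) ** 2))

def mean_fitness(z_bar, sd, grid):
    """Mean fitness of a population with mean phenotype z_bar and SD sd.

    Averages W(z) over a normal phenotype distribution centred on z_bar,
    approximated as a weighted average on a fine grid of phenotypes.
    """
    weights = np.exp(-0.5 * ((grid - z_bar) / sd) ** 2)
    weights /= weights.sum()
    return float(np.sum(weights * individual_fitness(grid)))

grid = np.linspace(-6.0, 6.0, 2401)   # phenotypes to average over
means = np.linspace(-2.0, 2.0, 81)    # candidate population means; index 40 is 0
individual = individual_fitness(means)
adaptive = np.array([mean_fitness(m, 0.5, grid) for m in means])

peak_drop = individual.max() - adaptive.max()   # how far the peaks sink
valley_rise = adaptive[40] - individual[40]     # how far the valley rises
print(f"peaks sink by {peak_drop:.2f}, valley rises by {valley_rise:.2f}")
```

In this toy case the peaks sink and the valley rises, yet the valley stays below the peaks, so the smoothed landscape remains rugged. Increasing the assumed phenotypic standard deviation eventually merges the two peaks into one.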

So, as time went on, a formal adaptive landscape for an adaptive radiation in nature remained elusive.

My Galapagos Dreams

In 2002, I had the opportunity to visit Galapagos at the behest of Jeff Podos (UMASS Amherst) who had just received an NSF Career Grant that could fund my participation. I was really excited and read all of the classic books by Lack (1947), Bowman (1961), and Grant (1998). Yet, I still had no real practical experience with Galapagos or even with bird research. Thus, on my first visit, I decided to not actually conduct any of my own research but rather learn about the system – both by helping Jeff with his research and also simply walking around in the field making natural history observations that might motivate future experiments. A highlight from that year was the afternoon I spent in the field with the deacons of all things finch, Peter and Rosemary Grant.

One of the major efforts of Jeff’s team was to capture finches at a focal site (El Garrapatero), measure their beak traits, band them, and then relate those traits to song features and mating. Within a year, I had encouraged an increasing emphasis on using the banded birds to track inter-annual survival. When it was clear that this effort would be productive, I set the goal of – once and for all – estimating the adaptive landscape – in the formal way – for Darwin’s finches at this site.

My first step in this effort was to use the extremely-variable population of Geospiza fortis (the medium ground finch) at El Garrapatero to measure selection across their phenotypic distribution. This population had long been suggested to be bimodal in its beak size distribution, with somewhat distinct “large” and “small” beak morphs. My idea was that the fitness landscape should show a valley between the two morphs, such that intermediate birds have lower fitness – a selective function called disruptive selection. Using two years of data, lo and behold, disruptive selection between the beak modes (and stabilizing selection around each mode) is what we found. This initial discovery was exhilarating but (1) we had only two years of data, (2) we had a potentially imprecise measure of fitness (survival over one year), and (3) we were looking at only one species.
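To make "disruptive selection" concrete, here is a minimal Python sketch of one common way to look for it: a quadratic regression of relative fitness on a standardized trait, in the spirit of Lande and Arnold (1983). It is not a reconstruction of the finch analysis. The data are simulated, and the sample size, survival probabilities, and random seed are arbitrary assumptions. A positive quadratic gradient (gamma) means fitness is lowest at intermediate trait values, whereas a negative one would suggest stabilizing selection.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Simulated standardized beak size (mean 0, SD 1) for 2000 hypothetical birds.
z = rng.normal(0.0, 1.0, size=2000)

# Assumed survival probability: lowest for intermediate birds (z near 0).
p_survive = np.clip(0.2 + 0.15 * z ** 2, 0.0, 1.0)
survived = rng.binomial(1, p_survive)

# Relative fitness: individual survival divided by mean survival.
w = survived / survived.mean()

# Fit w = a + beta * z + (gamma / 2) * z^2.
# np.polyfit returns coefficients from the highest power down.
quad_coef, beta, intercept = np.polyfit(z, w, 2)
gamma = 2.0 * quad_coef  # the quadratic gradient is twice the z^2 coefficient

print(f"beta = {beta:.3f}, gamma = {gamma:.3f}")
```

With real survival data one would also want standard errors, and a positive gamma alone does not prove a fitness valley sits within the observed range of phenotypes, so a flexible fit (for example a spline) is a useful check on the shape.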

This figure is from my 2017 book.

It took another 10 years of accumulated data – contributed by many team members and collaborators working at El Garrapatero – to solve the first of those problems. In particular, Marc-Olivier Beausoleil (graduate student of Rowan Barrett at McGill) was able to compile years of the most intensive data collection for G. fortis to show that, yes, disruptive selection was always working to maintain some separation of the two morphs (large and small) – and that the intensity of this selection could be explained by weather conditions. Specifically, disruptive selection was strongest when dry periods followed wet periods – probably because many fledglings were produced during wet periods which then increased competition (and hence mortality) during subsequent dry periods.

But these landscapes were still at the level of the individual (rather than population mean), and they still showed only one species, whereas the Geospiza radiation at this location has four species: the small ground finch (Geospiza fuliginosa), the medium ground finch (Geospiza fortis), the large ground finch (Geospiza magnirostris), and the cactus finch (Geospiza scandens). When would we have enough data for this more comprehensive effort?

After 17 years of data collection that included more than 3400 individual birds, we set out to give it a try. For a fitness surrogate, we chose longevity, which Peter and Rosemary had previously shown was strongly correlated with total fitness (note: we could not track reproductive output, such as the number of fledglings, in our population). Marc-Olivier Beausoleil led this effort and first calculated an individual fitness landscape, relating the fitness of individual birds to their individual phenotypes across the entire dataset. As expected, and as shown in other studies (crossbills, African seedcrackers, and more – see Appendix), the individual fitness landscape was rugged, showing clear peaks and valleys. Further, the average phenotypes of each species (and the two morphs of G. fortis) were situated fairly close – in phenotypic space – to the estimated peaks of high fitness.

Would this reasonable and logical landscape hold up when the individual fitness landscape was converted to a formal population mean adaptive landscape? We held our breath – and I was skeptical given how no study of an adaptive radiation had been comfortable doing so before. As expected, the landscape was smoother and a peak or two sank to the point of obscurity: yet, remarkably, the adaptive landscape still had peaks and valleys and the mean phenotypes of each species (or G. fortis morph) were reasonably close to a peak of mean fitness. IT WORKED – and it only took 20 years of effort!

So, what did we learn from all this work? Well, for starters, one can – with enough effort – estimate a formal adaptive landscape for a real adaptive radiation in nature. Second, these landscapes do have the expected peaks and valleys even when using MEAN phenotypes and fitness – they are rugged indeed! Third, evolution of the various species and morphs seems to follow the estimated landscape, with about as many phenotypic modes (species or morphs) as estimated peaks and with the mean trait values of each close to a different peak. Yet some deviations from these expectations were also seen – note in the figure above how the circles (mean phenotypes) are always displaced a bit off the peak. At this point, we expect the deviations arise from our incomplete fitness surrogate: longevity. Perhaps, for instance, the birds that live the longest had the fewest offspring (or, stated the other way around, birds that have the fewest offspring live the longest) – as has been documented in some species.

Was it worth it?

The first ten years of effort were accompanied by high optimism that “this could work.” But, then, as various constraints and funding issues came into play, fatalism set in: “well, it was a nice idea but impractical.” Then I kind of forgot about it for several years while the data stream stayed alive – until Marc-Olivier came in and cleaned up the database and applied his statistical wizardry to G. fortis. Then hope sprang again that we could do the full adaptive landscape and here we are. Despite the effort, I think the accomplishment by the entire team of field organizers, crews, funders, and analyzers is quite remarkable given that – to my knowledge – this is the first formal adaptive landscape estimated for a natural radiation of local species. It has been 25 years coming for me – but we made it.

I should note in closing that the adaptive landscape is more like a Rosetta Stone than a Holy Grail. First, different evolutionary biologists would likely search for different Holy Grails – but, of course, there can be only one. Second, the phenotypic adaptive landscape links phenotypes, fitness, and evolution – and thus is something of a translation between them.

Finally, I suppose the real Holy Grail is to link not just phenotypes and fitness across all species in a radiation (or part of a radiation) but to also integrate individual genotypes. Did I mention that Rowan Barrett’s team (with Marc-Olivier) have more than 400 individual whole genomes for these birds – via collaborators in Switzerland (Daniel Wegmann)? Stay tuned to this channel!

Appendix: Some other landscapes I have known (and loved) for vertebrates:

1. Smith (1993) estimated a single-trait individual fitness landscape for African seedcrackers, which showed disruptive selection between two beak size morphs – and thus inspired my initial analysis testing for the same thing in G. fortis at El Garrapatero.

2. Schluter and Grant (1984) used seed size distributions on different Galapagos islands to estimate the beak sizes of Darwin’s finches that would be expected to evolve. This landscape was inspiring for our own efforts on finches – in part because it examined all of the ground finch species – as opposed to only a single species.

3. Martin and Wainwright (2013) used hybridization between pupfish species to increase phenotypic variation and then estimate individual fitness landscapes in natural environments – albeit in enclosures. Fortunately, hybridization in our finches inflates the variation naturally – precluding the need to perform artificial crosses (which would be impossible anyway).

4. Arnegard et al. (2014) performed a study similar to the pupfish work by hybridizing threespine stickleback populations and testing their fitness in artificial ponds. The additional innovation in this second study was to link the traits in question to genomic regions.

5. Stroud et al. (2023) estimated fitness landscapes through time for a community (although not a radiation) of Anolis lizards on a very small island.