Eco-Evo Evo-Eco: 2018

Sunday, December 23, 2018

What's the worst that could happen?

By Dan Bolnick and Steve Brady with contributions from Easton White

To be a trip leader for Outward Bound, NOLS, or any other outdoor adventure educational program, you need to have wilderness first aid training. As the responsible adult, you are the first responder to handle an emergency in remote locations, potentially hours or days away from professional medical help. Many of this blog’s readers are organismal biologists who also work in remote locations, similarly far from medical help. Yet, unlike outdoor leadership jobs, most departments don’t require first aid training to do field work, or to lead a field expedition. Accidents happen, sometimes fatal accidents.

Anecdotes

To make this less abstract, here are some personal anecdotes. If you prefer, skip to the section below, “What we are advocating”.

Dan Bolnick: In 2013, I was running a large (~25 person) field crew for two projects in British Columbia. One day I left both sub-groups to scout on my own, looking at new research locations. It was lovely to be in the field with some “me time”. Quiet. I drove to the northern tip of Vancouver Island, stopping here and there to look at lakes and streams for stickleback. I saw more black bears than I saw cars that day. It was great. At the end of a long day, I pulled back into the camp where we were staying in Port Hardy, and was surprised to see none of the crew was there. Odd. Then someone pulled up and told me to go to the hospital. An undergraduate on the crew, Cole Thompson, had been walking through thick underbrush by the side of a lake. While jumping from log to log, he had jumped onto a side-branch of a log. It penetrated the sole of his rubber chest waders, and impaled his foot (gruesome photo at the very end of this blog post, be warned). The postdoc in charge of that team, Yoel Stuart, did the right thing. He got Cole onto dry land, irrigated the wound and cleaned it of wood chips as best he could, sterilized it with iodine solution, bound it, elevated it, treated Cole for shock, and drove him an hour to the nearest hospital. We filed accident reports with UT. Several surgeries later, over months of time, Cole made a recovery. But it could have been worse: could have become infected, or resulted in lost blood pressure (shock). I give a lot of credit to Yoel and the rest of the crew for not panicking, and handling the incident professionally. It helped that everyone that summer had taken a 16-hour wilderness first aid class before going to the field.

That’s probably the worst incident on a field research outing for my lab. But we’ve had:

1. fractures (a broken arm when someone tripped over a root while walking down a dirt road),

2. cuts needing stitches (from a razor used to cut plastic zip-ties),

3. a sprained ankle (while carrying a canoe down a rough dirt foot path),

4. a capsized canoe on the far side of a large lake on a cold and very windy day, which verged on hypothermia,

5. mild hypothermia from extended snorkeling in cold water,

6. a subcutaneous bacterial infection from having hands in lake water too often, which turned into a nascent ‘flesh-eating bacteria’ infection that luckily was identified very early and treated aggressively at the ER.

And my lab is not especially accident prone, and these are not exceptionally bad events. While traveling in remote places (for work or personal travel) I’ve seen people unaffiliated with my work who were bitten by a venomous snake (in Zambia), run over by a pickup truck (Tanzania), and fatally gored by a buffalo (Mana Pools, Zimbabwe). I’ve had friends & acquaintances killed doing field work of their own (hit by a bus in Sulawesi; capsized small boat in the Sea of Cortez in Baja California; disappeared while swimming in a lake in the Amazon).

Steve Brady: In 2008, I was a new PhD student eager to figure out how road salt and other runoff pollutants might be affecting frog and salamander evolution in roadside “vernal pools”. The study system was very familiar to me. Most everyone in my advisor’s lab worked in these types of pools. I had already spent two field seasons as a master’s student dashing through the woods and wading around these habitats. They felt like home. Places full of wonder and excitement, for sure. Exhaustion and mosquitoes were part of the deal, but no serious hazards or dangers were all that apparent. After all, these pools are typically small and shallow, often (but not always) waist deep or less. Apart from ticks and poison ivy lurking in between pools, these charming sites seemed to pose very little risk to anyone’s safety. Indeed (at least at the time), no formal training was required to conduct fieldwork and no special training was offered for working in these pools. (More on this later.) What’s more, these ponds would certainly not be considered wilderness locations by any definition, particularly the roadside ones. I was not cocky about working alone in the woods, but I was never all that concerned about it. After all, no one else seemed to be terribly worried about such things. Besides, it was just fieldwork – what was the worst that could happen? Lots.

It was the middle of winter, late January or early February. I decided to search for suitable pools for my first field season by surveying a whole bunch of them for road salt contamination. Easy enough: an axe to chop a hole in the ice, a conductivity meter to effectively measure saltiness, and a pair of waders to stay dry from snow, moist ground, and a splashing axe. Most of the pools I found were frozen solid and easily supported my weight. It happened that one pool was, unbeknownst to me, not so frozen in the middle despite its solid appearance. While standing on the ice in the middle of this pool, I suddenly broke through. As it turns out this was not a shallow pool. My arms instinctively flew out to my sides as I fell, stopping me just barely from plunging through deeper than armpit height. My feet never touched bottom. Fortunately, my waders never had a chance to fill because I managed to lurch myself out as fast as I fell in. Pure adrenaline no doubt. The whole thing was over before it started. And I was fine. I’m probably more shaken now writing this and realizing what a close call it was than in the moment, when I brushed it off because of the outcome. But the whole thing could have turned out much, much worse. It scares me to think of what would have happened if my waders filled. Because I know what would have happened. I was alone. Out of sight of the road. No one was there to help me. No one even knew where I was except to say somewhere in CT searching for pools. Disaster was very narrowly avoided, purely by luck.

Bigger picture:

These are not isolated incidents. Here are a few observations pointing to a broader pattern that should worry us:

1) There is a blog that keeps a list of field biologists, naturalists, and conservationists killed in their work (https://strangebehaviors.wordpress.com/2011/01/14/the-wall-of-the-dead/).

2) There’s a peer reviewed article on job-related mortality of wildlife workers in the US (https://www.jstor.org/stable/3784446), recording 91 deaths in a 63 year period. Most of those were aviation accidents, which are in general likely to happen far from medical help unless you crash near your origin or destination. These are non-trivial numbers, and they do not cover severe accidents that people survived.

3) An unscientific twitter survey on December 21 2018 by Dan Bolnick (https://twitter.com/DanielBolnick/status/1075846068510822400) received 480 responses. 49% of responding field biologists reported not having experienced a serious medical accident on their field work teams (note: a proper survey would stratify by age & experience). But 27% of responses reported one incident in the field that required medical attention. 24% of responses reported more than one incident. The lesson here is that at the average career stage of Dan’s twitter network, half of field biologists have had to handle a medical emergency in the field. The point is, you should EXPECT to have something happen that requires first aid. Maybe you’ll be lucky and be in the half that don’t. The twitter survey also received a lot of specific anecdotes, which we reprint at the end of this blog post.

A blithe dismissal of the need for wilderness safety?

It is worth reflecting on how we got here. Or at least the aspects that need attention. It seems that in general, we as field biologists do not require much in the way of safety or medical training for fieldwork. There are of course exceptions to this generality. We were told NSF Polar Programs has required a First Responder course and medical exams. The Mountain Lake Biological Station, and OTS, have had various levels of first aid requirements. But we can probably all agree that in general, we as field biologists operate in a culture where wilderness safety and medicine are really not considered a core part of what we do. Perhaps we like the autonomy too much. Or perhaps we don’t think of our work as being particularly dangerous. Or perhaps it’s because we are not ‘guides’ in a traditional sense and therefore the liability and responsibility is less apparent. Of course, this prompts questions concerning the roles of PIs and institutions. And this is an area we should all hope to clarify.

This lack of special training surprised us because in many other contexts of outdoor pursuits, trainings and certifications were required. For a fair part of our lives, we've spent much of our free time in the wilderness. We were (are) climbers (Steve was formerly a guide with some AMGA certification) and backcountry skiers. We’d completed numerous trainings and held various certifications in wilderness medicine, self-rescue, and outdoor leadership. As undergrads, our ‘Wilderness Programs’ required trainings to become student trip leaders, with different levels and specialties of training corresponding to different levels of responsibility (e.g. day trips versus overnights) and specific activities (e.g. sea kayaking vs. winter backpacking vs. mountaineering). All of which to say is that we were well trained on standards of safety and medicine in the wilderness. The comparative absence of such requirements in field biology continues to surprise us.

What we are advocating:

The purpose of this blog post is to advocate for better training for field emergencies and risk management that is all too often lacking in academic field research. In outdoor leadership jobs, you wouldn’t think of leading a group into the deep woods, mountains, or across a big lake, without first aid training. Why is it okay for scientists to go to the same places, also for work, without even modest training?

We realize we are calling for more training, when many of us are tired of a culture of excessive red-tape and training in academia. Having just started new faculty jobs in the past year, both of us have had to sit through many hours of lab safety, chemical safety, sexual harassment, and nondiscrimination training. These are all worthwhile goals to be sure, but they also add up to a significant use of precious time. So why advocate for more?

There’s a simple answer: whether you are a graduate student, postdoc, or faculty member, when you are running a field research project you are the responsible supervisor. You have paid or unpaid assistants, perhaps. Maybe a peer. Maybe you are alone (something we’ve both done, and don’t advocate). There could be a lightning strike (which kills more people outdoors than any other cause). Or a car accident, or plane crash. Bear attack. Or just a bad fall. A burn when someone trips into a firepit at the campsite (I’ve seen 3rd-degree burns on someone’s palms from exactly this). If something happens, what is your ethical and legal liability? When someone gets severely injured, are you prepared to stabilize them until medical professionals arrive? How will you do that? How do you reach out to get help? How long will help take to reach you? Is that enough time?

In emergency medicine, there’s the idea of the golden hour (Fig 1). That’s the period of time, after a traumatic injury or medical event, in which emergency intervention is most effective at keeping someone alive. So, if you are within half an hour of a hospital by ambulance, you have a chance at treating someone within the golden hour: up to 30 minutes for the ambulance to reach you, and 30 minutes to get to the hospital. But, if you are more than 30 minutes from the nearest hospital, and especially if you work outside cell phone range (as we often do), then you ARE the first responder. The golden hour treatment to keep someone alive? That’s your job now. Are you ready?

(https://en.wikipedia.org/wiki/Golden_hour_(medicine))

Are you responsible? Liability considerations

We aren’t saying you need a Wilderness Emergency Medical Technician (EMT) month-long training to go for a casual walk in the woods with your friends. Most people don’t have serious first aid training, and when they do most trainings are not tailored to remote outdoor settings with their unique considerations (unusual accidents like hypothermia, lack of ambulance access, etc). And that lack of training does not stop us from going hiking, canoeing, rock climbing, SCUBA diving, as recreation. What if we bring our binoculars with us on a recreational hike and do some natural history along the way? What if we happen to collect some specimens? At some point though we transition from personal recreation (you are responsible for your own wellbeing), to a professional activity. It seems fair to say that the dividing line should be when the primary purpose of the outing is to achieve a work-related scientific objective. And the real distinction arises when you shift from a mere outing with friends and peers, to one in which you have a supervisory role: students or employees or even peer colleagues who are there to help you achieve your work goals or their own goals under your mentorship. Then, they are relying on you, in part, for their safety. You are obligated to do your best to provide adequate training in advance so that all parties can be safe in the field and manage the scene when accidents happen.

Prepare for the worst: Recommendations

We suggest the following list of actions field researchers should consider doing. Some might become institutional expectations, some might not. But all are recommended.

1) Above all else, we recommend professional training for at least two people per team in the field. There is no substitute for such trainings when it comes to preparedness. Many excellent options exist. The authors are not endorsing any in particular, but we both have taken courses through SOLO and suggest that their website is a good starting point to see what types of wilderness medicine trainings are available. Training should ideally include CPR, stabilizing spine injuries, wound treatment, and anything especially likely in your field location (for the Bolnick lab, that would include water-related and cold-related injuries; luckily, neither has actually been an issue). There are many excellent training resources ranging from weekend courses to week-long courses to month-long Wilderness EMT classes (which both authors of this essay have taken). See https://en.wikipedia.org/wiki/Wilderness_medical_emergency#First_aid for some details on training options.

2) Use some form of a ‘time control plan’. Notify someone where you are going to be. Talk to them about when they should expect you back, when to get nervous if you are not back, who to contact to initiate a search, and where/how to find you. In the field in British Columbia, the Bolnick lab uses a white board where we list each subgroup’s outing: what sites they are visiting that day, when they expect to return. Brady encountered similar procedures when working for King County.

3) Everyone should have emergency contact information: local law enforcement and medical responders, University administrative and emergency contacts, Medical insurance contacts, and next of kin for all participants.

4) Participants should discuss safety with their supervisor. What are the risks, and how should they be minimized? What safety equipment is needed? Ideally, write this out and have people sign it to indicate they are aware of the risks associated with field work. Note that liability waivers are pretty much useless. Unless drawn up by a lawyer, they will probably not stand up in court. Waivers don’t cover negligence on the part of the person in charge, so if an accident is a result of your negligence, that waiver won’t matter. However, negligence is notoriously difficult to establish in wilderness situations, for what that might be worth. But it is a good idea to be sure that participants are aware of the basics of risk. When the Bolnick lab goes to the field, I check to make sure everyone can swim in case a canoe capsizes and that they are aware of the risks of hypothermia. We discuss the risks of going jogging alone in mountain lion country. We especially talk about whether the students’ US insurance covers them out of the country.

5) Get information on medical conditions of group members that you need to know of: allergies to antibiotics, or foods, for instance. These are privileged medical issues, and you cannot require someone to reveal what medicines they are taking. But you can ask and explain why. Keep this information on hand in case you need to deliver an unconscious person to a hospital.

6) Don’t go out alone. It’s tempting. We’ve done it. Help can be hard to find, and can be expensive. But consider the following story of someone getting caught in an old beaver trap while wading in a pond. Had they been alone, it would have been much, much worse.

7) One of the biggest recommendations we make is that departments work with risk management offices at your universities to arrange first aid course offerings tailored to field researchers, and require these courses of students and postdocs and faculty working in remote locations. Coordinate with other departments with similar concerns (geology, anthropology, geography, etc) for economies of scale. A twitter survey follow-up to the one mentioned above asked whether departments required first aid training for field work. Of 51 responses, 86% said no (https://twitter.com/DanielBolnick/status/1076145667368730625). Most people seem to recognize some training is helpful though, and despite the lack of requirement many people do seek training. Another follow-up poll (https://twitter.com/DanielBolnick/status/1076146256811048960) asked about people’s training: of 69 responses, 28% had no training, 17% had CPR only, 33% had less than a day of training, and 22% had wilderness first responder or more (e.g., wilderness EMT). We recommend at least a 2-day course equivalent to the Wilderness First Aid courses by WMA, SOLO, or NOLS (e.g., http://www.wildmed.com/wilderness-medical-courses/first-aid/wilderness-first-aid/). Beyond covering the material, the courses are important as they run participants through realistic scenarios. These scenarios solidify the material, teach participants to remain calm in stressful situations, and have people work in teams to solve problems.

8) With first aid training in hand, you need supplies. Every field outing should have a well-stocked first aid kit. Here again, numerous resources exist. Several good examples of first aid kits for fieldwork can be found here: https://store.nols.edu/collections/first-aid-supplies/first-aid-kits. And some key additional items (e.g. epipens) are listed here: https://ehs.berkeley.edu/first-aid-kit-field-safety. Context will of course dictate additional items that might be key for certain areas (e.g. venom extractor kits, malaria treatment, heat packs and reflective emergency blanket, etc.). Wilderness medicine is largely defined by a lack of access to good medical equipment. In addition to your first aid kit, everything on you or in your environment becomes a resource. A t-shirt becomes a sling and a stick is used to build a leg splint.

9) Have a mechanism for reaching emergency personnel from a remote location if needed. Where is the nearest hospital? How do you reach EMTs? How do you get to the hospital? What is your evacuation strategy from the field site, and from the region overall?

10) Have a plan for evacuation.

11) Other trainings may be warranted based on the particulars of your research site and activities. These might include boat capsize recovery, fast-water rescue, SCUBA emergencies, tropical medicine, etc.

12) Health is more than physical well-being, and medicine is more than treating cuts and breaks. Mental health crises happen in the field. The best preparation is knowledge: know what mental illnesses are, how to help people with them, who to contact for professional help. If possible, know who in your team might be at risk, but mental health patients often prefer to not share this information, out of fear of stigma.

13) Sexual harassment happens in the field. This presents unique challenges in the field, where victims may be unable to get away from harassers, and may be unable to contact help. What would you do? If you are the PI and have a crew in the field without you, have you talked to them about how to reach you? One summer I (DIB) had a period of time where there was one male graduate student and one female undergraduate in the field for a week without others around. I trust that graduate student deeply, but it still made me nervous. I ended up having a very blunt talk (separately) with both of them about what is expected, what is not acceptable, the consequences of violating that, and (for the undergraduate) what to do if the unacceptable happens. That was one of the hardest conversations I’ve had, but in retrospect I’m relieved I did it (nothing happened).

14) Have a plan to deal with the worst case scenarios. If you think through it in advance, you will minimize damage and stress.

What next

Ultimately, we are advocating for a cultural shift, in which liability is clear and appropriate training measures are in place and fully supported by institutions and funding agencies. We might all shudder at the prospect of new policies and fear for how they might impact our work. But such concerns should be allayed at least in part by the fact that many professionals have already worked out these details, allowing them to operate more safely and with an understanding of responsibility and liability. Crucially, let’s not forget that the main point here is that we can all do better to keep ourselves and our personnel safe in the field, and that we need a health system in place to support this goal.

For immediate actions, we think departments should require that at least some participants in remote field research (> 30 minutes from a hospital) have some first aid training. At a bare minimum, offer first aid courses, subsidized if at all possible. Field workers should discuss safety, and develop documents with necessary information so you have it on hand when there is a problem (e.g., points 1-4 above). This seems like an imposition, extra work. It is. But it is easier than dealing with the consequences of being underprepared and someone getting hurt, or killed. And injuries and deaths both happen.

More stories from twitter, in no particular order:

And here's the promised gruesome photo of a puncture wound suffered by an undergraduate researcher in British Columbia:

Wednesday, December 19, 2018

(Mis)adventures with P Values

In our multi-lab paper reading group at McGill, we have been reading "classics" - or various related engaging papers. Our most recent one was The Cult of Statistical Significance by Ziliak and McCloskey, in which the authors argue for the stupidity and damage caused by adherence to inference from p values - as opposed to effect size. During the discussion, a couple of profs (myself and Ehab Abouheif) related a couple of stories about our adventures with P values. Ehab's story was very interesting and so I asked if he could write it down. Here it is - followed by mine:

Ehab working on Rensch's Rule at Concordia.

By Ehab Abouheif

Today, I presented a paper by Ziliak and McCloskey called the “Cult of statistical significance,” where they argued that the Fisherian 5% statistical cutoff was misleading researchers and leading to fatal outcomes. I was first exposed to Ziliak and McCloskey’s work during my last sabbatical by the late Werner Callebaut, the then Director of the KLI Institute, and found it fascinating. It brought me right back to my days as an MSc student, when I struggled with the meaning of statistical significance in biology.

As an MSc student in Daphne Fairbairn’s lab at Concordia University (Montreal) during the Phylogenetic Comparative Method revolution in the 1990s, I was fast at work collecting comparative data and getting my hands on as many phylogenetic trees as I could from the literature. I spent hours scouring journals in the Blacker-Wood Library at McGill University for phylogenies of birds, spiders, primates, water striders, reptiles, snakes – you name it – and I was looking for it. Back then, phylogenies were hard to come by, but evolutionary biologists were pushing hard to incorporate any kind of phylogenetic information, in part or whole, into their comparative analyses. After two years, I published my first paper ever using the comparative method (Abouheif and Fairbairn, 1997, A comparative analysis of allometry for sexual size dimorphism. American Naturalist 149: 540-562). We coined the term Rensch’s Rule, and today it is one of my most highly cited papers.

This was my first taste of statistical inference and evolutionary biology up close and personal. While this experience left me excited about evolutionary biology and I was ready to jump into a PhD, I was left with many questions about the comparative method, one could say even doubts, about the big assumptions we were making to account for evolutionary history while testing adaptive hypotheses. It felt like we were introducing a great deal of statistical uncertainty in the process of trying to achieve statistical precision.

In 1996, Emília P Martins published a method of accounting for evolutionary history when the phylogeny is not known. In other words, she devised a method to account for phylogeny for the group we are working with even if the phylogeny for that group is unknown. Emília’s method randomly generated phylogenies, analyzed the data on each random tree, took the average across all random trees as the parameter estimate, and gave confidence intervals around this average estimate. I thought this was brilliant, and I had always admired Emília's pioneering work on the Phylogenetic Comparative method. I was really curious to see how this method would perform on my MSc data. Would I come to the same conclusions about the patterns and significance of Rensch’s rule if I assumed no knowledge about the phylogeny? This question consumed me during the early days of my PhD, and so I started reanalyzing all of my MSc data using Emília’s method. It brought flashbacks of the Blacker-Wood Library at McGill and all the dust I had to breathe in for the sake of science.

Ehab and Daphne Fairbairn

Several months later, the answer finally came. The average of all random trees was … almost the same as not using a tree at all! Somehow the trees were ‘cancelling each other out’ and their average gave a similar estimate as not using a tree at all. The difference was in the confidence intervals, which had been inflated dramatically because they were taking into account phylogenetic uncertainty. For example, for water striders, a group where the statistical power for estimating the slope and the statistical difference from a slope of 1 was very high (0.998), we had estimated a slope of 0.859 with 95% confidence intervals of 0.805-0.916. Using random trees, the slope was 0.946 and the 95% confidence intervals were between -12.6 and 14.4! I published these results in Evolution in 1997, and needless to say, Emília was not very happy. I have an enormous amount of respect for Emília’s work, but this was about something larger. In an attempt to achieve greater precision and reduce statistical errors when not accounting for phylogeny, we introduce another, perhaps larger error: not recognizing real patterns in nature that actually exist!

Ever since, I have kept with me a healthy skepticism about statistical significance testing and the Fisherian 5% cutoff. Patterns should at least be weighted equally with the statistical significance of that pattern, but trying to convince my own students has been hard, and the journals even harder! Things are starting to change though, and so I am hopeful for the future. Thanks Andrew Hendry for getting me to write this, it brought me back a good number of years, and made me go back and read my own paper. I was pleasantly surprised, and realized how far back our thinking goes!

By Andrew Hendry

I first started to work on threespine stickleback during my postdoc at UBC in 1999-1999. As a diligent postdoc working on a new system, I read a bunch of papers about stickleback - especially those by the folks at UBC with whom I would be interacting. One of those was Dolph Schluter's 1994 Science paper Experimental Evidence that Competition Promotes Divergence in Adaptive Radiation. This paper included the now-famous figure, often republished in textbooks and reviews, showing that when a competitor was added, the slope of the relationship between traits and fitness (growth) changed in predicted ways.

By chance I happened to see the commentaries written about this paper in Science. Some were based on interesting conceptual questions about whether the manipulation was the right way to do it, but others were more narrowly focused on the statistics, saying, in essence, your P value is wrong:

To which Schluter replied: "ooops ..."

My point here isn't to criticize Dolph - he is an exceptionally insightful, rigorous (including statistically), and helpful scientist. Instead, it is to point out the particular attention paid to P values and whether they do or do not exceed some alpha level. If Dolph had calculated his P value correctly, he almost certainly would not have submitted his paper to Science - and, if he had, saying "My one-tailed p value is 0.11" - it never would have been published. Thus, a simple mistake in calculating a P value led to the publication of a paper that has proven extremely influential - it has been cited nearly 500 times.

Closing notes:

P values provide little insight into, well, anything. Consider these illustrative arguments:

1. Imagine you are comparing two populations to see if their mean values (for anything) are significantly different. The truth is that no two populations ever have the same mean trait value in reality - the population PARAMETER is never identical. Hence, the test for significance is pointless - you already know they differ. All a statistical test does is reveal whether your sample size is large enough to confirm what you already know to be true - the means differ.

2. Most statistical tests involve a test for whether the assumptions of that test are met. The reality is that they NEVER are. That is, no distribution of residuals is ever normal (or any other modeled distribution) and homoscedastic. If you fail to reject the null hypothesis of normality and homoscedasticity, it simply reflects low power of your test to reveal reality.

3. Ever see a paper that calculates P values for a parameter in a simulation model? As the parameter is always (by the mechanics of a model) having an effect, a P value simply reflects how many simulations you have run. Want your parameter to be significant - simply run more replicates of your simulations until it becomes so.
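Arguments 1 and 3 are easy to see for yourself. Here is a minimal sketch (not taken from any of the papers above) that simulates two populations whose true means differ by a tiny, fixed amount and then computes a P value at increasing sample sizes. The effect size of 0.1 standard deviations, the normal data, the large-sample z-test, and the function names `two_sample_p` and `simulate` are all assumptions made purely for illustration.

```python
import math
import random
import statistics


def two_sample_p(a, b):
    """Two-sided P value from a large-sample z-test on the difference in means.

    This is only an approximation at small n; it is used here for simplicity.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n, effect=0.1, seed=1):
    """Draw n individuals per population; the true means differ by `effect` SD."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(effect, 1.0) for _ in range(n)]
    return two_sample_p(a, b)


if __name__ == "__main__":
    # Same true difference every time; only the sample size changes.
    for n in (20, 200, 2000, 20000):
        print(n, simulate(n))
```

The true difference never changes across runs, yet with enough replicates the P value becomes as small as you like. All the test tells you is how many samples (or simulations) you ran.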

Coda

What matters is effect size (difference between means, variance explained, etc.), your certainty in that estimate (credibility intervals, AIC weights, sampling distributions, etc.), and the extent to which uncertainty is due to sampling error or true biological variation (closely related to sample size).

Of course, oceans of ink have already been spilled on this, so I will stop now and suggest just a few additional resources and tidbits:

1. My post on How To Do Statistics

2. Steve Heard's Defence of the P value

3. Dan Bolnick's coding error, which made a P value look significant when it wasn't and then led to a paper retraction. (The picture might show the moment he realized his P value was wrong.)

Sunday, December 16, 2018

Abiding in the midst of ignorance

"Abiding in the midst of ignorance, thinking themselves wise and learned, fools go aimlessly hither and thither, like blind led by the blind." - Katha Upanishad (800 to 200 BCE; ancient Sanscrit writing with core philosophical ideas in Hinduism and Buddhism) *(see footnote)*

Double-blind* review is widely seen as a positive step towards a fairer system of publication. We all intuitively expect this to reduce implicit bias on the part of reviewers, making it easier for previously underrepresented groups to publish. But, is that true? Not surprisingly there's a fair amount of research on the impact of double-blind. Here are a few links:

The Case For and Against Double-blind Reviews is a new BioRxiv paper that I learned about on Twitter, which finds little benefit. But, the paper isn't well replicated at the level of journals, and lacks information on submission sex ratios.

The effects of double-blind versus single-blind reviewing is a classic 1991 experimental study from economics, which found that double-blind led to more critical reviews across the board, equally for men and women, and lower acceptance rates. The strongest drop in acceptance rate was for people at top institutions.

In contrast "Double blind review favors increased representation of female authors" followed the 2001 shift by Behavioral Ecology to double-blind review, finding an increase in female authorship. But again it's not clear whether this is an increased acceptance rate, or an increased submission rate.

Then there's a meta-analysis of the issue that found fairly ambiguous evidence, though with some evidence of bias (especially in favor of authors from top-ranked universities).

In short, the literature on the literature isn't a slam dunk. Most people tend to agree that double-blind is a good thing. There are some administrative costs (time editorial staff spend policing the system, and time authors spend blinding their own work). But all in all it seems worth it. Indeed, some people won't review for journals that don't do double-blind (although some people refuse to review when it IS double-blind, a catch-22 for journals that just want to find good, qualified reviewers).

To wade into this, I wanted to offer a bit of data from The American Naturalist. The journal went double blind in early 2015, one of the earlier journals in our field to do so, but not the very first to be sure. There's a blog post by Trish Morse in Nov 2015 summarizing the results in the first 10 months. I have the luxury of being able to reach into our historical database and see what double-blind review is doing. The journal is especially valuable in this regard because we have an opt-out double-blind review policy. Our default is to go double-blind, but authors may choose to ignore that, and some do. This gives us an imperfect but useful comparison between those who do and do not opt for double blind. What's their acceptance rate, and how does it depend on the gender of the first author?

Methods:
I'm lazy, doing this on a Sunday afternoon while my kids are watching a movie, so forgive me some imperfections here. This isn't a peer-review-ready study.

I looked back at the 500 most recent acceptances at AmNat. The actual number is a bit less than this because some things in the database include editorials, notes from the ASN secretary, and so on, that I don't want to count. I also looked at the 500 most recent declines at AmNat (including Decline Without Prejudice that sometimes turn into subsequent acceptances, but I didn't have a simple way to trace this). I did my best to infer the gender of the first author based on their first name. I didn't count papers where I couldn't tell.

Note that because we decline more papers than we accept (20% acceptance rate), the ~500 acceptances cover a couple-year period, whereas the 500 declines were all from 2018. That's not ideal, I know, but see the first Methods paragraph above. That also means that the acceptance rates here are not the journal's true values; it is their relative values (double-blind or not; male vs female) that are useful for us. Here's the data with marginal totals:

		Male	Female	Total
Double blind	Accepted	205	136	341
	Declined	257	155	412
	Total	462	291	753

Opt out of double blind	Accepted	84	40	124
	Declined	69	24	93
	Total	153	64	217

To digest this for you a bit, here are a few observations:

1) As we might expect, women were less likely to opt out than men

Proportion submitted
	Male	Female
Double blind	0.75	0.82
Opt out	0.25	0.18

This is significant (Chi-square test P = 0.017), though the effect size is modest (7%). This fits with the notion that double-blind is fixing a bias problem and should protect female authors whereas men are privileged to have the option to be identified without harm.
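For anyone who wants to check that arithmetic, here is a minimal, standard-library Python sketch of a chi-square test on the gender by opt-out counts from the table above. The post does not say exactly how the test was run, so the Yates continuity correction used here is an assumption (it is the default for 2x2 tables in some software); with it, the P value comes out close to the reported 0.017. The function name `chi_square_2x2` is just illustrative.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]].

    Returns (statistic, P value) for 1 degree of freedom. With yates=True,
    the Yates continuity correction is applied (an assumption here).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    statistic = 0.0
    for obs, exp in zip(observed, expected):
        diff = abs(obs - exp)
        if yates:
            diff = max(0.0, diff - 0.5)
        statistic += diff ** 2 / exp
    # For 1 degree of freedom, the chi-square tail area equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from the table above: rows are male/female, columns are
# double-blind/opt-out (accepted and declined papers pooled).
male_db, male_opt = 462, 153
female_db, female_opt = 291, 64

prop_male_db = male_db / (male_db + male_opt)
prop_female_db = female_db / (female_db + female_opt)
chi2_optout, p_optout = chi_square_2x2(male_db, male_opt, female_db, female_opt)

print(f"Proportion double-blind: male {prop_male_db:.2f}, female {prop_female_db:.2f}")
print(f"Chi-square = {chi2_optout:.2f}, P = {p_optout:.3f}")
```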

2) Double-blind papers are less likely to be accepted than opt-out papers (P = 0.002). This is partly because things like the ASN Presidential Address articles and other invited papers of award winners are by necessity not anonymous, and have a higher acceptance rate. But note that it seems like women get a stronger benefit from NOT going double-blind than men do (though this is not significant). So there's clearly a cost to going double-blind: it seems to hurt authors' prospects overall. And the most widely cited benefit (reducing bias against women) does not seem to be visible for our journal in the time period covered here. That matches the experimental study from economics from 1991, linked to above, which also found double-blind reduces acceptance rates. Here's the detail:

Proportion accepted
	Male	Female
Double blind	0.444	0.467
Opt out	0.549	0.625
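The proportions in this table, and the overall double-blind vs opt-out comparison, can be reproduced from the raw counts with a short Python sketch. The post does not state which test produced P = 0.002, so the pooled two-proportion z-test below (equivalent to a Pearson chi-square without continuity correction) is an assumption; it gives a P value of roughly that size.

```python
import math

# (accepted, declined) counts by review type and first-author gender,
# taken from the data table above.
counts = {
    ("double_blind", "male"): (205, 257),
    ("double_blind", "female"): (136, 155),
    ("opt_out", "male"): (84, 69),
    ("opt_out", "female"): (40, 24),
}

# Acceptance proportion within each of the four cells.
rates = {key: acc / (acc + dec) for key, (acc, dec) in counts.items()}
for (review_type, gender), rate in rates.items():
    print(f"{review_type:>12} {gender:>6}: {rate:.3f}")


def pooled_totals(review_type):
    """Accepted count and total submissions for one review type, genders pooled."""
    accepted = sum(acc for (rt, _), (acc, _) in counts.items() if rt == review_type)
    total = sum(acc + dec for (rt, _), (acc, dec) in counts.items() if rt == review_type)
    return accepted, total


# Two-proportion z-test with a pooled standard error.
acc_db, n_db = pooled_totals("double_blind")
acc_opt, n_opt = pooled_totals("opt_out")
pooled_rate = (acc_db + acc_opt) / (n_db + n_opt)
standard_error = math.sqrt(pooled_rate * (1 - pooled_rate) * (1 / n_db + 1 / n_opt))
z_review_type = (acc_db / n_db - acc_opt / n_opt) / standard_error
p_review_type = math.erfc(abs(z_review_type) / math.sqrt(2))

print(f"z = {z_review_type:.2f}, two-sided P = {p_review_type:.4f}")
```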

Remember, these acceptance rates aren't the overall acceptance rate of the journal, because I chose to examine 500 accepted and 500 declined papers. But their relative values are telling: opt-out is the way to go. It's tempting to suggest that might be especially true for women authors, but the gender by review-type interaction is not significant in a binomial GLM. So it doesn't seem to matter. There's no gender difference in acceptance rates in double-blind papers. There's no gender difference in acceptance rates in non-double-blind. But, double-blind reduces everyone's chances by a bit. Not much, but...
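The interaction claim can be sketched without fitting a full GLM. In a saturated logistic model (acceptance as a function of gender, review type, and their interaction), the interaction coefficient equals the difference between the two log odds ratios (female vs male acceptance within each review type), and its Wald standard error is the square root of the sum of the reciprocals of the eight cell counts. The Python below uses that shortcut; whether this matches the exact model specification behind the result reported above is an assumption, but it likewise gives a clearly non-significant interaction.

```python
import math

# (accepted, declined) counts by review type and first-author gender,
# taken from the data table above.
counts = {
    ("double_blind", "male"): (205, 257),
    ("double_blind", "female"): (136, 155),
    ("opt_out", "male"): (84, 69),
    ("opt_out", "female"): (40, 24),
}


def log_odds_ratio(review_type):
    """Log odds ratio of acceptance, female vs male, within one review type."""
    f_acc, f_dec = counts[(review_type, "female")]
    m_acc, m_dec = counts[(review_type, "male")]
    return math.log((f_acc / f_dec) / (m_acc / m_dec))


# Interaction term: how much the female-vs-male log odds ratio differs
# between opt-out and double-blind submissions.
interaction = log_odds_ratio("opt_out") - log_odds_ratio("double_blind")

# Wald standard error: sqrt of the summed reciprocals of all eight cells.
standard_error = math.sqrt(
    sum(1.0 / cell for pair in counts.values() for cell in pair)
)
z_interaction = interaction / standard_error
p_interaction = math.erfc(abs(z_interaction) / math.sqrt(2))

print(f"interaction (log odds scale) = {interaction:.3f}")
print(f"z = {z_interaction:.2f}, two-sided P = {p_interaction:.2f}")
```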

3) And now the bad news: we still aren't at gender parity in submissions. Overall in the time period covered by this survey, 36.5% of our SUBMISSIONS were by women first-authors. That's not good. I'm not sure what to do to fix this, because it concerns the pipeline leading up to our front door. So I'll be continuing my attempt to be proactive at encouraging people, especially students, and women, to submit articles to us. The good news, I can tell them, is we have strong evidence there's no gender bias between submission and decision.

So, why should double blind reduce acceptance rates? That's odd at first glance. But as Editor I've received notes from reviewers. Some say they won't review something because it's not double-blind. But quite a few have told me they prefer to review non-blind. They note that if they are aware the author is a student, for example, they are more likely to go an extra mile to provide guidance on how to improve the paper. Now, I would hope that we would do that for everyone. All our reviews should clearly identify weaknesses, and should point towards paths towards improvement. But the truth is we feel protective towards students, including other people's students. There's only so much time each of us can afford to put into reviewing (though a ton of thanks to the many AmNat reviewers who put their heart and soul into it). So we make decisions on how much time to invest in a given paper. Knowing someone is a student can make us be a bit more generous with our time, which might in the long run help them get accepted. When that information is obscured, perhaps reviewers become equal-opportunity grinches.

Interestingly, a year or two after AmNat went double blind, our Managing Editor and lodestar Trish Morse looked at the results. There was a dip in acceptance rates then, as well. It wasn't quite statistically significant, and we wanted to accrue a few more years' worth of data. But it matches what I find in the more recent data.

An alternative hypothesis is that people opting out of double-blind are famous, and influential, and more likely to get a pass. That fits a bit with the observation that men are more likely to opt out. But who are those people? I obviously won't name names. What I will say is that I was surprised. Sure, there were big names who opted out (some by necessity, such as authors of the ASN Presidential Address articles). But there were also many lesser-known authors opting out. And many big-names who went ahead with double-blind. In fact, only a small minority of the opt-out crowd were celebrity authors. Many opt-outs were actually by authors from non-US and non-EU nations who might not be as aware of the double-blind cultural trend.

To summarize: Double-blind seems to slightly reduce everyone's acceptance rate regardless of gender.
That matches results from Economics in the late 1980s. Not a strong endorsement of double-blind, which we tend to think of as fixing gender (and other) bias in publication. For the past 1000 decisions I don't see evidence of such bias. So did we adopt double blind to fix a problem that itself has faded away (# see footnote below)?

Some important caveats:
1) I didn't look at last author or other author names. I didn't categorize by career stage, or university.

2) As noted above, I sampled 500 accept and 500 decline, not over exactly the same time period.

3) AmNat practices reviewer blind. But the Editors (e.g., me, Judie Bronstein before me, Russell Bonduriansky, Alice Winn, Yannis Michalakis) and the Associate Editors can see author names. That's by necessity: someone needs to be able to choose reviewers without asking the authors or their advisors or closest colleagues to do the job. That requires knowledge of names.

4) This might be worth a more careful analysis and publication, but I don't have the time & energy to do that right now. And it's not ethical to give someone else access to all our data on accept/decline decisions and author identities.

Footnotes:
* I have been told that the phrase double-blind is ableist, and upsetting to some visually impaired people. This is the phrase we have inherited, and we discussed last year switching to doubly-anonymous or something like that, but once a term becomes entrenched it is hard to change.

# There are clearly other barriers still present, generating the strongly significant male-bias in the papers that come in our door. These need to be addressed proactively. So my comment that double-blind might be meant to fix a problem that has faded away refers only to the review and decision-making process as a statistical aggregate effect. I also recognize that if there were one or two sexually biased reviewers, affecting decisions on just a few papers, that would be undetectable in this statistical analysis yet still constitutes a problem worth fixing.

*Footnote: the Dude abides, also