Thursday, January 30, 2020

The Pruitt retraction storm Part 2: An Editor's narrative

This blog post documents my experience concerning retractions of some of Dr. Jonathan Pruitt's papers. I am writing this from three perspectives: first, as Editor of a journal affected by the series of recent retractions of papers by Dr. Jonathan Pruitt and colleagues; second, as a one-time co-author with Pruitt; and third, as a friend of Jonathan's from long discussions about science at conferences and pubs over the past decade.

Disclaimer: this post represents my personal experience, with minimal opinion. It does not represent the opinion of The American Naturalist or the University of Connecticut, and it is not intended to cast aspersions on or attack anyone.

A companion post will provide a summary of the current state of retractions and validations of his papers [please email me updates at daniel.bolnick@uconn.edu].

A third companion post will contain reflections on what this all means for science broadly and behavioral ecology specifically.

Before diving in, I want to emphasize that parts 1 and 2 of this series are meant to be a strictly factual record of the sequence of events and communications, and do not imply any judgement about guilt or innocence on Dr. Pruitt's part. For transparency, I should also reiterate that although I do not know Jonathan well, we have been academic friends for quite a few years.

1. A narrative of events

On November 19, 2019, a colleague (Niels Dingemanse) emailed me a specific and credible critique of the data underlying a paper by Dr. Jonathan Pruitt and colleagues that was published in The American Naturalist, for which I am Editor-in-Chief. The critique (from an analysis by Erik Postma) identified biologically implausible patterns in the data file used in the paper (Laskowski et al. 2016). Specifically, certain numbers appeared far more often than one would expect from any plausible probability distribution. For specifics, see the recent blog post explanation by Kate Laskowski. The complaint did not make specific accusations about how the suspect data might have come to be.
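(To give a concrete sense of the kind of check involved, here is a minimal base-R sketch, not Dr. Postma's actual analysis; the file name spider_data.csv and the column name latency are hypothetical.)

```r
## Illustrative sketch only (not the actual analysis): tabulate how often each
## exact value occurs in a hypothetical measurement column "latency" and list
## the most frequently repeated values.
dat    <- read.csv("spider_data.csv")         # hypothetical data file
counts <- sort(table(dat$latency), decreasing = TRUE)

head(counts, 10)                              # the ten most-repeated exact values
mean(duplicated(dat$latency))                 # fraction of rows that are exact repeats
```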

The lead author on this paper, Dr. Kate Laskowski, had received the data from the last author, Jonathan Pruitt; Pruitt does not contest this point. She had analyzed the data and written the paper, trusting that the data were accurate. On hearing about the odd patterns in the data, she did exactly the right thing: she examined the data files herself very carefully. She found additional odd patterns that have no obvious biological cause. She asked Jonathan about these patterns, as did I. His initial explanation did not satisfy us. He said the duplicated numbers arose because up to 40 spiders were measured simultaneously, often responded simultaneously, and were recorded with a single stopwatch. The methods and analyses did not reflect this pseudoreplication, so Jonathan offered to redo the analyses. The new analyses did not recover the same results as the original paper. Moreover, the duplicated numbers were in fact spread across spiders from different time points, webs, etc., so his initial rationale did not explain the data. At this point, Dr. Laskowski decided to retract the American Naturalist paper because she could not determine how the data were generated. She obtained consent from her co-authors, including Dr. Pruitt, on wording that acknowledged that the data were not reliable (without specifying how) and that Pruitt had provided the data to the other authors. The retraction was then run past the University of Chicago Press publisher and lawyers, copyedited, typeset, and published online in mid-January.

Dr. Laskowski also examined two other Pruitt-provided datasets: one for a paper she also lead-authored with Pruitt in Proceedings of the Royal Society B, and one for a paper she co-authored. The former paper is now officially retracted. Her analysis and request for retraction of this PRSB paper were concurrent with the American Naturalist one, and PRSB Editor Spencer Barrett and I were in close contact throughout this process. Problems with the third paper were brought to our attention on January 13 by Dr. Laskowski. The retraction of the third paper is being processed by the journal, and we were asked not to publicize specifics until the journal posts the retraction statement.

My involvement in this process was to field the initial complaint, to exchange a series of emailed queries with Jonathan seeking an explanation, and to accept the retraction of the American Naturalist article without passing judgement on the cause of the problems with the data. However, once the authors requested the retraction (of both the AmNat paper and the PRSB paper), I consulted the Committee on Publication Ethics (COPE) guidelines in depth. Four points emerged.

First, it is clear that when oddly flawed data lead to a retraction, the Editor is supposed to report this to the author's Academic Integrity Officer (or equivalent). I contacted the relevant personnel at Pruitt's current and former institutions to notify them of the concerns. Pruitt's current employer is best positioned to conduct an inquiry. It is not my job, nor even my right, to render judgement about whether data were handled carelessly in a way that accidentally introduced errors, whether the data were fabricated, or whether there is a real biological explanation for the repeated patterns. So I encourage community members not to engage in summary judgement and to await the (likely slow) process of official inquiry.

Second, Editors of multiple affected journals are encouraged to communicate with each other (which I have done with Spencer Barrett of PRSB and other Editors elsewhere) to identify recurrent patterns that might not be apparent from any one journal's smaller sample of papers.

Third, it seemed wise to investigate the data underlying the other articles that Pruitt published in The American Naturalist, for which I am responsible as the journal's Editor. I asked an Associate Editor with strong analytical skills (which could describe any of them) who is not caught up in behavioral ecology debates (i.e., a neutral arbiter) to examine the original concern and then to examine the data files for other papers. The AE put in an impressive effort to do so, and reported to me that at least one paper appeared to have legitimate data and results (on Pisaster sea stars), but that other papers had flaws that to varying degrees resembled the problems that drove the retraction. The Pisaster paper apparently involved data collected entirely by Pruitt, so make of that what you will, but we found no evidence of unrealistic patterns. Analysis and discussion concerning the other papers are ongoing; we have not yet rendered a judgement. It is the authors' prerogative to request a retraction, and in a desire to approach this fairly we are giving authors time to examine their data closely and to exchange concerns with Pruitt (who is in the field with limited connectivity) before reaching a final decision on retraction. The Associate Editor also examined a few data files for Pruitt articles at other journals, and found some problems, which we conveyed to the relevant co-authors and journal Editors.

Fourth, it seems clear at this point that the data underlying a number of Pruitt papers are not reliable. Whether the problem is data-handling error or intentional manipulation, the outcome will be both a series of retractions (the two public ones are just the beginning, I fear) and mistrust of unretracted papers. This is harmful to the field, and especially harmful to the authors and co-authors of those papers. Many of them (myself included) became involved in Pruitt-authored papers on the basis of lively conversations that generated ideas he turned into exciting articles; or by giving feedback on ideas and papers he already had in progress (Charles Goodnight, for example, is the second of two authors on a Nature paper with Jonathan, having been invited on after giving feedback on the manuscript); or, often, as first authors who analyzed data provided by Pruitt and wrote up the results. These people have seen their CVs grow shorter and become tarnished by the fact of retraction. They have experienced emotional stress and concern about how this affects their careers. I want to emphasize that regardless of the root cause of the data problems (error or intent), these people are victims who have been harmed by trusting data that they themselves did not generate. Having spent days sifting through these data files, I can also attest that the suspect patterns are often non-obvious, so we should not blame these victims for failing to see something that takes significant effort to uncover, by examining the data in ways that are not standard practice. So, to be clear, the co-authors have in every instance I know of reacted admirably and honorably to a difficult and stressful situation. They should in no way be penalized for being the victims of either carelessness or fraud by someone they had reason to trust.

As the realization dawned on me that (1) many people were going to be affected, and (2) they are victims, I felt that a proactive approach was necessary to help them. Dr. Laskowski, for example, was seeing some of her favorite articles retracted while she is junior faculty at a top-notch institution. For some of Pruitt's more recent students, the majority of their publication list may be at risk. With this in mind, I agreed with Dr. Laskowski that public acknowledgement of the retractions was the best strategy (via Twitter and her blog). I was deeply relieved to see the intense outpouring of support, sympathy, and respect that she and her fellow victims deserve.

Fundamentally, I believe that if we stigmatize retractions, we will see fewer of them and the scientific record will retain its errors longer than we'd like. When mistakes are found, transparency helps science progress and move on more quickly. I experienced this myself when I had to retract a paper because of an R code error (it was the first paper I published using R for the analyses), and I received very positive support for that action (my blog post about that retraction is here). So I encourage you to continue to support the affected co-authors.

Because the first retraction came out in The American Naturalist, and because of Dr. Laskowski's tweets tagging me, I inadvertently became a go-to participant in the process. I have received numerous emails every day this January about data concerns, retraction requests, and related communications; the process has often consumed half or more of my day, several days per week. Most of these I responded to as best I could, or forwarded to the relevant people (Editors, Academic Integrity Officers, etc.), redacting details when the initial sender requested anonymity. Analyses and discussion of some of the emerging concerns can be found here.

The Associate Editor I mentioned above went as far back as digging into some of Pruitt's PhD work, from when he was a student with Susan Riechert at the University of Tennessee Knoxville. Similar problems were identified in those data, including formulas in Excel spreadsheets where logic and biology suggest no formula belongs. Seeking an explanation, I had the dubious role of emailing and then calling his PhD mentor, Susan Riechert, to discuss the biology of the spiders, his data collection habits, and his integrity. She was shocked, disturbed, and surprised. That someone who knew him so well for many years could be unaware of this problem (and its extent) highlights for me how reasonable it is that the rest of us could be caught unaware.
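(For readers wondering how one even finds formulas hiding in a spreadsheet: here is a minimal base-R sketch, not the Associate Editor's actual workflow, that counts formula cells in each worksheet of a hypothetical file data.xlsx. An .xlsx file is simply a zip archive whose worksheet XML marks formulas with <f> elements.)

```r
## Minimal sketch (not the actual workflow used): count formula cells in each
## worksheet of a hypothetical spreadsheet "data.xlsx".
xlsx_file <- "data.xlsx"                     # hypothetical file name

## List and extract the worksheet XML parts inside the .xlsx zip archive.
sheet_parts <- grep("^xl/worksheets/.*\\.xml$",
                    unzip(xlsx_file, list = TRUE)$Name, value = TRUE)
tmp <- tempdir()
unzip(xlsx_file, files = sheet_parts, exdir = tmp)

for (part in sheet_parts) {
  xml  <- paste(readLines(file.path(tmp, part), warn = FALSE), collapse = "")
  hits <- gregexpr("<f[ >]", xml)[[1]]       # <f> elements hold cell formulas
  n_formulas <- if (hits[1] == -1) 0 else length(hits)
  cat(part, ":", n_formulas, "formula cells\n")
}
```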

Meanwhile, I have delved into the one dataset underlying my own co-authorship with Pruitt (a PRSB paper on behavioral hypervolumes). The analytical concept remains interesting and relevant, so not all of that paper is problematic. But the analytical approach presented there was test-run on social spider behavior data (from DRYAD) that does turn out to have several apparent problems: an unexpected distribution of the data (not as overdispersed as we would expect behavioral data to be); some runs of increasingly large numbers that do not make sense; a mean across the raw data file of 1,800 individuals that is almost exactly 100.0; many duplicate raw values; and an excess of certain last digits, which data forensics suggests can be a red flag of data manipulation BUT IS NOT CONCLUSIVE. None of these problems is a smoking gun, and none is as clear as those in the other articles. We have requested a response from Pruitt, who is traveling and doing field work in remote locations at the moment, and we are holding off on deciding whether to retract the paper until we see a response.
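(For concreteness, here is a rough base-R sketch of the kinds of checks described above: the round mean, the duplicate values, and the last-digit frequencies. The file name hypervolume_data.csv and column name boldness are hypothetical, and, as noted above, a skewed last-digit distribution is a red flag rather than proof of anything.)

```r
## Rough sketch of the checks described above, assuming a hypothetical column
## "boldness" of whole-number behavioral scores in "hypervolume_data.csv".
dat <- read.csv("hypervolume_data.csv")
x   <- dat$boldness

mean(x)                                    # is the mean suspiciously round (e.g., ~100.0)?
mean(duplicated(x))                        # fraction of values that are exact duplicates

## Last-digit check: for many measurement processes the final recorded digit is
## expected to be roughly uniform on 0-9; a strong excess of particular digits
## is a red flag (but not conclusive evidence) of manipulation.
last_digit <- x %% 10
obs <- table(factor(last_digit, levels = 0:9))
chisq.test(obs)                            # tests the digit counts against uniformity
```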



The last thing I want to say is that I am increasingly intrigued and troubled by the lack of first-hand witnesses who actually did the raw data collection, and by the lack of raw data sheets. If anyone was an undergrad with Pruitt who can attest to how these data were collected, their perspective would be very, very welcome. (Update: I have since been contacted by two undergraduates who did collect data for Pruitt; they confirm that data were recorded on paper and that the experiments they were involved with were actually done.)

-Dan Bolnick
January 30, 2020


Some follow-up thoughts added later:
1. These investigations take a great deal of time per paper (the AmNat retraction took two months of back-and-forth, data examination, and wrangling over retraction wording), and there are many papers. Please be patient and do not assume every paper is unreliable.

2. It is true that the co-authors did not catch the flaws in the datasets. But having been deeply involved in examining these data, I can say that the red flags were revealed only by unusual steps, such as digging through the original data looking for duplicated runs of numbers, that are not habitual or automatic things to do to raw data. Having had no reason to mistrust the data, what the co-authors did (proceeding to analyze them) was quite natural.


6 comments:

  1. hi Dan,

    i wouldn't dare to predict the distribution of the -last- digit of data in a dataset, especially after being mangled by excel, or to expect it to conform to Benford's law. That specifically uses the first significant digit. Have you checked the distribution of those as well?
    Secondly, i wouldn't check those distributions against a randomly generated sample in R directly; you'd have to export the generated sample to excel as well and then re-import it (and after that, you probably don't want to use excel ever again)

    thirdly, i'd say classical Bayesian rules apply. The question is not per se how weird the distribution is compared to a randomly sampled dataset, but what the probability is that a dataset with a distribution of this level of weirdness is generated by a ...non-biological process.

    cheers,

    anne (Niels is my husband, in case you're wondering)

  2. additionally, i think "data fabrication 101" should be part of every undergrad statistics course as this is a bit of a disgrace. Still comes in second place after the student who just duplicated a datafile and altered the date.
    Passing the tests of the new R data forensics package will be useful for grading students' efforts, i guess.

  3. Thanks for the post, it is well written and thanks for all the effort you've put into handling this mess. I agree that most - if not all - of Pruitt coauthors are victims here. However, please also spare a thought for the people crowded out of the field because their carefully collected and analyzed data couldn't compare to Pruitt's. Obviously we can't know with certainty who those were, but unlike Pruitt's coauthors who stayed in the field (plausibly at least in part due to their coauthorship), their careers will almost certainly never recover.

  4. You can't compare the empirical distribution of numbers to a single random distribution in R. That's not statistically sound. You'd either define a statistic, calculate it on 100 random samples, and compare - or, in this case where the behaviour is well-defined, simply check the p-value of the empirical last-digit data against a multinomial distribution.

  5. There is also the two-finger problem: many data entry devices (such as a keyboard) have the numbers arranged in a specific way, so that when someone enters data, common errors creep in, as well as common signs of intentional attempts to randomize.

    Since 1-5 on a keyboard are typed with the left hand and 6-0 with the right, a person moving quickly and suffering from 'wide-finger errors' would create ab/ba combinations, where a and b are both from 1-5 or both from 6-0, more frequently than normal. Additionally, when someone is quickly trying to mimic randomness, dominant pairs emerge. One such pattern is left-right or right-left hand pairings occurring too frequently. Then there are the dominant-finger cases, where numbers such as 53 or 35 are typed by two fingers on the same hand and, if the mimic does not think about it, occur with too much frequency. I have frequently tested my students on their keyboards and at the blackboard to emulate randomness. Slow exercises and fast exercises produce different signatures as well. Slowly mimicked randomness reflects the belief that random means never repeating, while quickly mimicked randomness shows the muscle-memory issues of left-right/right-left and dominant-finger pairs. It would not take much analysis to see these and other issues arise if enough mimicked data are available to investigate.

  6. On Twitter ...
    "Jonathan Pruitt
    @Agelenopsis
    Behavioral ecologist, avid gaymer, and fast talker"

    "Fast talker" should have warned this scientific community.

