The following is a guest post by Ken Thompson.
Recently I took the unusual step of publicly posting a technical comment about my own first-authored published work. Why not just publish a correction or a retraction? This warrants an explanation and I hope to provide that with this post.
The study’s premise was simple: identify species in the same plots using either (a) trained botanists identifying plants with morphology or (b) untrained workers who collect tissue to be sent back to the lab for molecular identification using DNA barcodes. My co-author, Dr. Steven Newmaster, provided me with the data, which consisted of (a) species identifications for both survey methods and (b) environmental data about each plot (e.g., aspect, canopy cover). I never saw any of the raw data or the DNA sequences, and the work was done when I was an undergraduate.
I have had concerns for a while but was too afraid to say anything. Related recent events motivated me to get to the bottom of it. I reached out to folks who are familiar with COPE (Committee on Publication Ethics) guidelines (who I won’t name but you can name yourselves if you wish) and they advised me to reach out to the University of Guelph to investigate.
I did just that in February of 2020. The University of Guelph launched an inquiry, which took approximately eight months when it should have taken less than one, and then declined to investigate further. Even as I transmitted additional information, they continued to see no reason to investigate further. They did not provide any justification, nor did they speak to me at any point.
I then went to the journal (Biodiversity and Conservation) and asked them to investigate. After several months, the Springer Nature Research Integrity Group decided that it was not their responsibility since the University of Guelph had already drawn a conclusion.
Having come up short with both the University of Guelph and the journal, I feel it is now prudent to share details publicly. I submitted a technical Comment to bioRxiv as a pre-print to publicly discuss in detail my issues with the data, and at the same time submitted the Comment to the journal for formal review. bioRxiv declined (as a matter of policy) to post a Comment on another paper, so I have posted it on my own platform.
My technical comment makes four points which I will briefly summarise here. First, although we claim the study was done at a particular site in Timmins, ON, evidence I gathered suggests that the data are from sites that are over 500 km away. Second, we did not archive the molecular data as we claimed in the paper.
Before listing points three and four there is one additional important piece of information. After being alerted that I was looking into the data, my co-author began uploading thousands of DNA sequences to GenBank that are associated with the paper and my name. These data are key to points three and four.
The third issue I outline in my technical comment is that, using the data recently uploaded to GenBank, I cannot reproduce some of the key results of the paper (i.e., distinguishing congeneric species). Finally, I show that the recently uploaded GenBank data are unexpectedly similar to the data uploaded from an independent study at the University of Guelph.
I have submitted the technical comment to Biodiversity & Conservation for consideration too, hopefully to force them to respond substantively to my critiques. I have also formally requested an authorship removal retraction because I can no longer stand by the results of the study.
All the data are now public, linked via the pre-print. I invite and encourage anybody who is interested to have a look for themselves.
So, why go public? Doing this alone behind the scenes has been incredibly isolating. I don’t want to deal with this alone anymore and hope that by sharing an evidence-based critique of our paper some people will choose to support me here. Ultimately, I want to arrive at the truth. I am convinced that doing so publicly is now the best way to get there. Let me also take this opportunity to say that I am not accusing anybody of anything untoward.
I do have another reason for going public. I truly feel that the University of Guelph and Springer Nature have failed to uphold their standards of research integrity. I believe that the evidence presented above is sufficiently serious that any responsible body would find it prudent to investigate.
Ken A. Thompson
One’s first publication is always a major career milestone, so Ken Thompson clearly took an unusual step in questioning the legitimacy of his 2014 co-authored paper in Biodiversity and Conservation. His May 10th post on this blog summarizes his concerns and his limited success in provoking scrutiny of the publication by the university that hosted the research or the journal that published it. As key elements of the research occurred at my home institution, I use this post to comment on matters relevant to issues raised by Thompson.
While the University of Guelph launched a probe, its breadth was clearly limited. If consulted, I would have stated that a key assertion in the 2014 paper (perfect resolution of plant species with two-locus DNA barcodes) cannot be true because no publication on plants, and there are many of them, has ever reported this result. Moreover, because a comprehensive barcode reference library is available for the Canadian flora, it is certain that the study site (Timmins) is no exception.
How could the authors have reached their conclusion? To examine this matter, one must access the underlying data, but they were not submitted to GenBank as reported in the paper. By failing to require accession numbers before publication, Biodiversity and Conservation clearly failed to meet its responsibility to the scientific community. About a third (2569/8252) of the sequence records surfaced in late 2020, but nearly all were ITS2 (just 9 rbcL). Moreover, they provide coverage for only 163 of the 202 species and lack the collateral data required for validation: there are no collection dates and no locality information. Although the paper indicates these sequences were generated at the Canadian Centre for DNA Barcoding, this was not the case, so the source of the records needs clarification.
Thompson reported a surprisingly strong length correspondence between ITS2 records in the Canada-wide plant barcode library and those in the 2020 submission. The latter sequences have another unusual feature; the multiple ITS2 records for a species show near identity, a result which deviates from the heterogeneity typically encountered. The uncertain provenance and unusual attributes of these sequences together with the incompleteness of species coverage reinforce the concerns raised by Thompson in relation to legitimacy of the data that underpinned the 2014 publication.
Viewed from a broader context, this case reveals the need for a mechanism to ensure both the independence and rigor of investigations whenever there is suspicion that trust in the scientific process has been broken.
Paul Hebert
Let's take a look at this from another angle. Ken Thompson acknowledges me for providing him the data in his manuscript, but in these public forums states that Dr. Newmaster gave him the data. Interesting, but then Dr. Hebert states in this forum that the study site was my research site at Timmins, but that is also not the truth. Based on this it would appear that neither Ken nor Dr. Hebert knows what the truth is. Thus, when viewed in the broader context I am inclined to dismiss the arguments and classify this as a smear campaign rather than a scientific discussion and conclude that Ken couldn't have bungled this one up more if he had tried.