Maintaining Trust AND Data Integrity - a forum for discussion. INFO HERE.
#IntegrityAndTrust 5. With Data Editors, Everyone Wins.
Trust is everything. If we don’t trust each other, then science ceases to be a creative and fun collaborative endeavor – which is what makes it so wonderful. For me, it all starts and ends with trust, from which the rest follows. Thus, I will NOT now start interrogating the data of each student and collaborator for fraud. I will, of course, interrogate it for outliers, data entry mistakes, and so on – but NOT for dishonesty. I want every student in my lab to know they have my complete trust, and I want every collaborator working with me to know the same.
OK, fine, that sounds good – but then how do we catch fraud from this position of trust? At the outset, I would suggest that outright fraud is exceptionally rare (I will later write a post on the many “flavors of scientific dishonesty”) – to the point that we can adopt a default position of not worrying about it on a daily basis. Moreover, if the science matters, the fraud will eventually be caught by someone and corrected. After all, that is what is currently happening: the scientific record is being corrected, and we will move forward from a better place. (I am not diminishing the harm done to those who have been impacted, nor the incredible work being done to correct it.) This self-correction has happened in the past in several well-publicized cases, and also in a few you haven’t heard about. Hence, although we should always strive for improved recording, presentation, and archiving of data, I do not think that we should enter into any collaboration with the expectation that collaborators will be actively checking each other for fraud.
Moreover, any sort of data fraud checking among collaborators will not catch situations where collaborators are cheating together, it won’t help for single-authored papers, and it won’t help in cases where only one author (e.g., a bioinformatician) has the requisite knowledge to generate and analyze the data. After all, if everyone had the ability to generate, analyze, and interpret the data equally well, then we wouldn’t need collaborations, would we? Instead, collaborators tend to be brought together for their complementary, rather than redundant, skills. I certainly won’t be able to effectively check the (for example) genomic data generated by specialist collaborators. Nor should all authors be spending their time on this – they should be focusing their efforts on areas where they have unique skills. Otherwise, why collaborate?
Yet we obviously need a better way to detect the cheaters. I can see one clear solution for detecting fraud before it hits the scientific record while not compromising an atmosphere of trust among collaborators. Consider this: the one place where trust does not currently exist is between journals and the authors submitting papers. That is, reviewers/editors at journals don’t “trust” that the authors have chosen the right journal, that the experimental design is sound, that the analyses are appropriate, or that the interpretation is valid. Instead, reviewers/editors interrogate these aspects of each submitted paper to see whether they trust the authors’ work and choice of journal.
Why not, then, have fraud detection enter from the journal side of the process? For instance, many journals already run the text of submitted papers through a program that checks for plagiarism. Indeed, I was editor for a paper where plagiarism by a junior lead author was caught by just such a program, and the senior author was able to fix it. Why not do the same for data? R packages are being developed to check for common approaches to fraud, and officially sanctioned and recognized Data Editors could use them to interrogate data. These Data Editors could be acknowledged just like regular editors on the back pages of journals, and even on the papers themselves. The Data Editors could put this role on their CVs and be recognized for this contribution. I expect that many scientists – especially those with coding skills and a passion for data integrity – would jump at the opportunity to be an official Data Editor at a journal.
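To make the idea concrete, here is a minimal illustrative sketch (written in Python for readability; it is not one of the actual R packages in development) of the kind of automated screen a Data Editor might run on a submitted dataset: a Benford’s-law leading-digit comparison and an exact-duplicate-row flag. The sample data, function names, and output are hypothetical.

```python
# Purely illustrative sketch: the kind of automated screen a journal Data Editor
# might run on submitted raw data. This is NOT one of the R packages in
# development; the sample data and names here are hypothetical.
from collections import Counter
import math
import random


def first_digit(value):
    """Return the leading non-zero digit of a numeric value, or None."""
    for ch in str(value).lstrip("-+0."):
        if ch.isdigit():
            return int(ch)
    return None


def benford_screen(values):
    """Compare observed leading-digit frequencies with Benford's law.

    Returns (digit, observed frequency, expected frequency) tuples so a
    Data Editor can eyeball large departures from the expected pattern.
    """
    digits = [d for d in (first_digit(v) for v in values) if d]
    n = len(digits)
    counts = Counter(digits)
    return [
        (d, counts.get(d, 0) / n if n else 0.0, math.log10(1 + 1 / d))
        for d in range(1, 10)
    ]


def find_duplicate_rows(rows):
    """Flag positions of exact duplicate rows (possible copy-pasted observations)."""
    seen, dupes = set(), []
    for i, row in enumerate(rows, start=1):
        key = tuple(row)
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes


if __name__ == "__main__":
    # Toy demonstration on simulated measurements; a real screen would be
    # pointed at the authors' archived raw-data files.
    random.seed(1)
    simulated = [random.lognormvariate(3, 1) for _ in range(500)]
    for digit, observed, expected in benford_screen(simulated):
        print(f"digit {digit}: observed {observed:.3f}, Benford expects {expected:.3f}")

    toy_rows = [["plot", "mass"], ["A", "3.2"], ["B", "4.1"], ["A", "3.2"]]
    print("Exact duplicate rows at positions:", find_duplicate_rows(toy_rows))
```

The point is not any particular statistical test – no single screen proves or disproves fraud – but that such checks can be standardized and run at the journal by a recognized Data Editor, rather than by collaborators policing one another.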
Yes, I hear you saying “That would be a ton of work” – and so here is a suggestion to minimize unnecessary effort. Specifically, the Data Editors kick in only once a paper has been accepted. This would avoid duplication of effort when the same paper is rejected from one journal and then serially submitted to others. I suppose a few instances would arise where a paper passed normal peer review and then was rejected (or corrected) after being examined by a Data Editor – but I expect this would be rare. Also, I am not suggesting Data Editors should be checking code – only the raw data itself. And I think they should be EDITORS, not reviewers. That is, the journal would engage 10 or so Data Editors who would then “handle” the data for all accepted papers according to their expertise.
I hope that the scientific community will seriously consider this Data Editor suggestion, because it seems to me by far the best (perhaps the only) way to maintain trust and complementarity among collaborators while also improving data integrity. I also think it would be an opportunity for data enthusiasts, programmers, and coders to get more recognition for their skills and efforts. Everyone wins.