Tuesday, February 11, 2020

#IntegrityAndTrust 5. With Data Editors, Everyone Wins

Maintaining Trust AND Data Integrity - a forum for discussion.

Trust is everything. If we don’t trust each other, then science ceases to be a creative and fun collaborative endeavor – which is what makes it so wonderful. For me, it all starts and ends with trust, from which the rest follows. Thus, I will NOT now interrogate the data of each student and collaborator for fraud. I will, of course, interrogate it for outliers and data entry mistakes and so on – but NOT for dishonesty. I want every student in my lab knowing they have my complete trust, and I want every collaborator working with me knowing likewise.

Ok, fine, that sounds good, but then how do we catch fraud from this position of trust? At the outset, I would suggest that outright fraud is exceptionally rare (I will later write a post on the many “flavors of scientific dishonesty”) – to the point that we can start from a default position of not worrying about it on a daily basis. Moreover, if the science matters, the fraud will eventually be caught by someone and corrected. After all, that is what is currently happening. The scientific record is being corrected and we will move forward from a better place. (I am not diminishing the harm done to those who have been impacted, nor the incredible work being done to correct it.) This self-correction has happened in the past in several well publicized cases, and also in a few you haven’t heard about. Hence, although we definitely should always strive for improved recording and presenting and archiving of data, I do not think that we should enter into any collaboration with the expectation that collaborators will be actively checking each other for fraud.

Moreover, any sort of data fraud checking among collaborators will not catch situations where collaborators are cheating together, and it won’t help for single-authored papers, and it won’t help in cases where only one author (e.g., a bioinformatician) has the requisite knowledge to generate and analyze the data. After all, if everyone had the ability to generate, analyze, and interpret the data equally well, then we wouldn’t need collaborations, would we? Instead, collaborators tend to be brought together for their complementary, rather than redundant, skills. I certainly won’t be able to effectively check the (for example) genomic data generated by specialist collaborators. Nor should all authors be spending their time on this – they should be focusing their efforts on areas where they have unique skills. Otherwise, why collaborate?

Yet we obviously need a better way to detect the cheaters. I can see one clear solution for detecting fraud before it hits the scientific record while not compromising an atmosphere of trust among collaborators. Consider this: the one place where trust does not currently exist is between journals and authors submitting papers. That is, reviewers/editors at journals don’t “trust” that the authors have chosen the right journal, that the experimental design is correct, that their analyses are appropriate, and that their interpretation is valid. Instead, reviewers/editors interrogate these aspects of each submitted paper to see if they trust the authors’ work and choice of journal.

Why not then have fraud detection enter in from the journal side of the process? For instance, many journals already run the text of submitted papers through a program that checks for plagiarism. Indeed, I was editor for a paper where plagiarism by a junior lead author was caught by just such a program and the senior author was able to fix it. Why not do the same for data? R packages are being developed to check for common approaches to fraud and can be used to interrogate data by officially sanctioned and recognized Data Editors. These Data Editors could be acknowledged just like regular editors on the back pages of journals and even on the papers themselves. The Data Editors can put this role on their CVs and be recognized for this contribution. I expect that many scientists – especially those with coding skills and a passion for data integrity – would jump at the opportunity to be an official Data Editor at a journal.
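To give a flavor of what an automated data check might look like, here is a minimal sketch in Python (not one of the R packages mentioned above). The two function names and the choice of checks are my own assumptions for illustration: one looks at whether the last digits of integer-valued measurements are roughly uniform, and the other looks for short runs of consecutive values that appear more than once in a column. Anything these checks flag is only a prompt for a closer look by a human, never proof of fraud, since legitimate data can also show rounding habits or repeated values.

```python
from collections import Counter


def terminal_digit_chi2(values):
    """Chi-square statistic for uniformity of last digits (9 degrees of freedom).

    Hypothetical helper: assumes integer-valued measurements. Values above
    roughly 16.92 (the 5% critical value for 9 degrees of freedom) suggest
    the last digits are not uniform and deserve a closer look.
    """
    digits = [abs(int(v)) % 10 for v in values]
    if not digits:
        raise ValueError("need at least one value")
    expected = len(digits) / 10
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def repeated_runs(values, run_length=3):
    """Return each run of `run_length` consecutive values that occurs more than once.

    Hypothetical helper: copied-and-pasted blocks of data show up as
    identical runs in different places within the same column.
    """
    windows = Counter(
        tuple(values[i:i + run_length])
        for i in range(len(values) - run_length + 1)
    )
    return {run: count for run, count in windows.items() if count > 1}


if __name__ == "__main__":
    column = [12, 47, 33, 58, 12, 47, 33, 91, 64, 20]
    print(terminal_digit_chi2(column))
    print(repeated_runs(column))
```

A Data Editor would presumably run a battery of such checks over every numeric column of the archived raw data and then inspect only the flagged columns by hand.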

Yes, I hear you saying “That would be a ton of work” – and so here is a suggestion to minimize unnecessary effort. Specifically, the Data Editors kick in only when the paper is already accepted. This would avoid duplication of effort when the same paper is rejected from one journal and then serially submitted to other journals. I suppose a few instances would be detected where the paper passed normal peer review and then was rejected (or corrected) after being examined by a Data Editor – but I expect this would be rare. Also, I am not suggesting Data Editors should be checking code – only the raw data itself. Also, I think they should be EDITORS, not reviewers. That is, the journal would engage 10 or so Data Editors who would then “handle” the data for all accepted papers according to their expertise.

I hope that the scientific community will seriously consider this Data Editor suggestion because it seems to me by far the best (perhaps the only) way to maintain trust and complementarity among collaborators while also improving data integrity. I think also that it would be an opportunity for data enthusiasts, programmers, and coders to get more recognition for their skills and efforts. Everyone wins.


  1. Data editors could be the future, but there are challenges:

    Let me start by saying that having data editors would be great! I know members of editorial boards who have come up with the same suggestion; adopting code editors has also been raised on similar occasions. I also want to say again (see post #1) that data integrity goes much beyond tackling data fraud. Data editors could be essential in assuring data accuracy and consistency, catching problems that are not a result of malicious behaviour.

    There are, however, potential challenges with adopting a system of data editors. In the meantime, I still think that having authors provide statements on the steps they took to assure data integrity would be an immediate action to improve (not solve) data integrity and public trust. This would also make authors think much harder about how they can do their best to lighten the work of data editors.

    1) The legal challenges: how to build an ironclad system that protects journals and scientists?
    What happens when a paper is rejected on the basis of “data fraud”? If we don’t want a paper that was rejected on the basis of data fraud in one journal to end up published elsewhere, what should we do? Will we be able to share the issue publicly? If not, it is likely that the fraudulent manuscript will get published elsewhere, particularly if authors change publishing companies. Therefore, we may have to adhere to some sort of legal arrangement in which journals can state these cases publicly without fear of legal repercussions such as authors suing journals for slander. How do we protect data editors from being sued personally if journals end up in court? Perhaps I’m being pessimistic, but I think that the potential implications may be bigger than one can imagine at first.

    2) The challenges involving data diversity: There are obviously several data standards and formats. Likely, we will need diverse expertise on the variety of data formats, particularly for more general journals (e.g., American Naturalist, Ecology, Ecology Letters, Oikos) or journals that often deal with different formats and data integration (e.g., Ecography, Global Ecology and Biogeography, Methods in Ecology and Evolution).

    3) An army of data editors: Given the ~500000 papers published a year in biological/biomedical sciences, we will need 1000s of data editors for these two fields alone. Some journals may not adopt data editing immediately as they won’t be able to find appropriate expertise. While we build the army, three issues may arise along the way: a) are we going to generate different views of papers published in journals that have and don’t have data editors? b) will the number of papers decline for a while to cope with the short- to medium-term potential scarcity of data editors (a potential positive side effect in my mind)? c) are we going to slow down the publication process in general in the near future as a result of data editing?

    4) Can data editors deal with data issues outside of their fields of expertise? Do we need expertise within fields to spot issues with data accuracy, consistency or fraud? If we do come to the conclusion that we need field-specific data expertise, additional challenges in finding the necessary expertise will mount. As a result, some fields may suffer more than others.

    5) Editorial responsibility: we will need to consider a system in which data editors are respected and protected. They are not to become responsible for not having spotted issues that are discovered by others in the future. While we do have such a system for editorial members in general, I wonder if the pressure could be bigger on data editors if issues involving accuracy, consistency or fraud are to be found in papers that passed their “approval”.

    There are challenges ahead but adopting a system of data editors would be a great step. In parallel, we need to start making sure that we properly structure data in a way that reduces the time data editors need to spend on it.

    Pedro Peres-Neto
    Concordia University

  2. Pedro

    Thank you for your well considered comments. Many of them seem to me to be no different from those faced by a non-data editor. That is, they are certainly real issues but journals and editors already have to face them in the context of their current roles and functions.

    However, the one that is new is definitely the comment about needing MORE people - an "army of data editors." To reduce this real problem, I suggest:

    1. Only accepted papers have their data interrogated by the data editor.
    2. Only one data editor per paper - unless that data editor needs help.
    3. And - of course - only papers with data need data editors.

    I think that the current growing expertise of early-career scientists in data analysis, and their enthusiasm for open science and data integrity, would yield a large group of people excited to take on this role - and to be recognized formally as such.



    1. Andrew,

      We agree that the idea of having data editors has great merit. I disagree that the challenges involved are the same as those faced by current editors (see above). That said, we gave a good account of our opinions and others participating in the publication system (pretty much all of us) can now use these as topics for further discussion.



      PS: looking forward to sharing a glass of beer, scotch or wine to discuss this (and other topics) further!

