Scientists now work in an environment that might be called #OA-Shaming, where publishing behind a “paywall” is increasingly considered elitist: at best, unhelpful to science and, at worst, downright nefarious. Set against this backdrop, the DRY-BAR (Hendry-Barrett) lab meeting this week discussed the recent Trends in Ecology and Evolution (TREE) paper Archiving Primary Data: Solutions for Long-Term Studies. (Although I am on sabbatical in California, I was back in Montreal this last week.)
The paper has 63 authors, all scientists with individual-based long-term datasets. The paper was written as a response to the newish policy, now increasingly enforced by many journals and funding agencies, requiring that the data used to produce a paper be made freely and unconditionally available online. The main point of the TREE paper was that what might seem like a meritorious philosophy and policy (free data availability forever for all humanity!) might not be so obviously beneficial in some cases. I won’t detail their arguments in relation to long-term data sets, but rather reflect a bit on the issues from the perspective of someone who has experienced the transition in policy.
A first important point for young advocates of #OA data accessibility to recognize is that their philosophy is a logical extension of other areas of societal change – most obviously in entertainment. When I grew up, we mostly paid for our music and movies. Sure, we could copy tapes (one of my most prized possessions was my huge ghetto-blaster with high-speed tape-to-tape dubbing capacity) or VHS movies; but it was a pain and, really, we wanted to own the real thing. It was just the way it was. Now, of course, many young people rarely buy their music or movies, preferring instead to get them for free, a practice that really started with Napster and has progressed from there through various reincarnations. Without passing judgement on the merits of this philosophy, it is important to realize that free access to any product (music, movies, data) produced by someone else means that the other person (or entity or company) might not be receiving appropriate compensation for what could well have been produced at massive expense and effort. Sure, much of the science is publicly funded, but scientists still clearly invest much of their lives in procuring and analyzing data and writing papers.
Don’t worry, be happy.
Regardless of these personal opinions and any realities they might or might not reflect, my main point in this post might simply be characterized in one statement: Don’t worry, be happy! This sentiment is based on two realizations.
First, scientists who fear that others will scoop their research or use their data poorly just because it is freely accessible online are likely deluding themselves as to the demand for, and the likely use of, their data. This statement paraphrases something that I was told had been said by the editor of a major ecology/evolution journal at the time it started to require that data be published on DRYAD or other online archives. In essence, the basic argument is that most data published online will never be used or, if it is used, it will not be used in a way that harms the data-collector’s future research or career. This got me to wondering – how have my data fared in this new regime?
A quick search for “Andrew Hendry” in DRYAD found data for 16 papers published between 2010 and 2015 (stats for all DRYAD on Dec. 12, 2015 are above). One of my papers was, in fact, based on a long-term (20 years) individual-based study run by my collaborators, who are not authors on the above TREE paper. These data “packages” (the webpage showing the paper information with the list of data files) have been viewed a total of 3788 times, a much larger number than I had expected. Three of the packages have been viewed more than 500 times and one nearly 1000 times! However, only a fraction of these views led to downloads. Counting only the most-downloaded data file per paper, downloads totaled 564, still a surprisingly large number. One data file has been downloaded more than 148 times! Some interesting (and perhaps obvious) patterns were evident. First, the number of downloads was strongly correlated with the number of views (first figure below), although this correlation is quite imperfect. For instance, one data file has been downloaded 25 times on 26 views (96%), whereas another has been downloaded only 68 times on 975 views (7%). Downloads and views are, not surprisingly, higher for older papers, whereas the highest frequency of downloads to views (96%) is for one of the most recent papers. Finally, the number of times a paper is cited is correlated with the number of downloads, considering only data packages posted before 2014 (second figure below). Only part of this association is due to the effects of publication date.
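For anyone who wants to check the arithmetic behind those percentages, here is a minimal Python sketch. The function name `download_rate` and the labels are my own illustrative choices, not anything provided by DRYAD; the counts are simply the ones quoted above. The overall figure is only rough, because it compares downloads of the most-downloaded file per paper against views of the whole package.

```python
def download_rate(downloads: int, views: int) -> float:
    """Return downloads as a percentage of views."""
    if views <= 0:
        raise ValueError("views must be positive")
    return 100.0 * downloads / views


# Counts quoted in the text; the labels are illustrative only.
examples = {
    "highest-ratio data file": (25, 26),
    "lower-ratio data file": (68, 975),
    "all 16 packages (top file per paper)": (564, 3788),
}

for label, (downloads, views) in examples.items():
    print(f"{label}: {download_rate(downloads, views):.0f}%")
```

The first two lines reproduce the 96% and 7% mentioned above; the last suggests that, very roughly, about one package view in seven ends in a download.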
At this point, my first thought was "Wow, it looks like freely-accessible data is, indeed, freely accessed – frequently." So how often have I been scooped or how often has my data been used inappropriately? Never, to my knowledge. As far as I can tell, an analysis of these data has never been published anywhere. How can this be? Perhaps robots are downloading my data. Perhaps my data sucks and this is only noticed after a download. Perhaps the data are being used but only in meta-analyses. Or, perhaps, I am about to be scooped soon! However, I suggest the more innocent alternative. People are curious and interested but they have no intention of taking the data and publishing it to their own ends. Don’t worry, be happy.
My second realization speaks to the counterpoint. That is, even if data aren’t freely available, it won’t have a major negative impact on science. First, I would bet that nearly all reasonably recent data are accessible in one way or another. In fact, nearly every time I have asked a scientist for their data, they have provided it – though, admittedly, it has often taken some repeated prompting. My favorite instance occurred in 2005 when I was writing a paper about morphological changes in Darwin’s finches. It was 2004 and I was working at a site (Academy Bay, Santa Cruz Island) from which finches had been sampled in 1968 (the year I was born!) by Hugh Ford, who had published the data in 1973. I searched online and found that Hugh was a professor at the University of New England in Armidale, Australia. I emailed him and he responded that the data were in old notebooks and that he would happily dig them out, enter the data in Excel, and send it to me. These data became a key part of the paper and I invited Hugh to be a co-author even though he hadn’t asked. On the flip side, I have been asked many times (I will speculate the number is over 50) to provide raw data from my previous work and I have – every single time – provided it. A few times I was a collaborator on the resulting paper, but most of the time none of us saw the need for me to be an author.
The simple point is that data will generally be there simply for the asking, regardless of whether it is “archived” online. (An exception from my own experience is given below - but I just digitized it from the paper in the end anyway.) One might complain that such data often come with unreasonable demands for co-authorship but, really, if one subscribes to the #OA philosophy about the betterment of science and society, then who cares, really, if you add another author to your paper? If you want to exclude from co-authorship someone who contributes data to your paper, then surely you shouldn’t simultaneously complain when people don’t want to share their data.
So, no matter how this plays out, and I think where it is going is pretty clear, I am confident that science won’t really be that compromised either way. If data are truly valuable, then they can be obtained even when they are not freely accessible online. At the same time, if one puts their data online and makes it freely accessible, then it is extremely unlikely that doing so will ever harm their research program. In fact, I have never heard of a scientist who has had a bad experience with data they have placed online – although I suspect there must be such an instance.
In conclusion, data archiving and #OA advocates and data archiving and #OA detractors both: don’t worry, be happy!