Scientists now work in an environment that might be called
#OA-Shaming, where publishing behind a “paywall” is increasingly considered
elitist: at best, unhelpful to science and, at worst, downright nefarious. Set
against this backdrop, the DRY-BAR (Hendry-Barrett) lab meeting this week discussed
the recent Trends in Ecology and
Evolution (TREE) paper Archiving Primary Data: Solutions for Long-Term Studies. (Although I am on sabbatical
in California, I was back in Montreal this last week.)
The paper has 63 authors, all scientists with individual-based
long-term datasets. The paper was written as a response to the newish policy,
now increasingly enforced by many journals and funding agencies, requiring that
the data used to produce a paper be made freely and unconditionally available
online. The main point of the TREE paper was that what might seem like a
meritorious philosophy and policy (free data availability forever for all
humanity!) might not be so obviously beneficial in some cases. I won’t detail
their arguments in relation to long-term data sets, but rather reflect a bit on
the issues from the perspective of someone who has experienced the transition
in policy.
A first important point for young advocates of #OA data
accessibility to recognize is that their philosophy is a logical extension of
other areas of societal change – most obviously in entertainment. When I grew
up, we mostly paid for our music and movies. Sure, we could copy tapes (one of
my most prized possessions was my huge ghetto-blaster with high-speed
tape-to-tape dubbing capacity) or VHS movies; but it was a pain and, really, we
wanted to own the real thing. It was just the way it was. Now, of course, many
young people rarely buy their music or movies, preferring instead to get them
for free, really starting with Napster and then progressing through various
re-incarnations. Without passing judgement on the merits of this philosophy, it
is important to realize that free access to any product (music, movies, data)
produced by someone else means that the other person (or entity or company)
might not be receiving appropriate compensation for what could well have been
produced at massive expense and effort. Sure, much of the science is publically-funded but scientists still clearly invest much of their life in procuring and
analyzing data and writing papers.
Don’t worry, be happy.
Regardless of these personal opinions and any realities they
might or might not reflect, my main point in this post might simply be characterized
in one statement: Don’t worry, be happy! This sentiment is based on two
realizations.
First, scientists who fear that others will scoop their
research or use their data poorly just because it is freely accessible online
are likely deluding themselves as to the demand for, and the likely use of,
their data. This statement paraphrases something that I was told had been said
by the editor of a major ecology/evolution journal at the time it started to
require data be published on DRYAD or other online archives. In essence, the basic argument is that most
data published online will never be used, or if it is used, it will not be used
in a way that harms the data-collector’s future research or career. This got me
to wondering – how have my data fared in this new regime?
A quick search for “Andrew Hendry” in DRYAD found data for 16 papers published between 2010 and 2015 (stats for all DRYAD on Dec. 12, 2015 are above). One of my papers was, in fact, based on a long-term (20 years) individual-based study run by my collaborators, who are not authors on the above TREE paper. These data “packages” (the webpage showing the paper information with the list of data files) have been viewed a total of 3788 times, a much larger number than I had expected. Three of the packages have been viewed more than 500 times and one nearly 1000 times! However, only a fraction of these views led to downloads. Counting only the most-downloaded data file per paper, downloads totaled 564, still a surprisingly large number. One data file has been downloaded more than 148 times! Some interesting (and perhaps obvious) patterns were evident. First, the number of downloads was strongly correlated with the number of views (first figure below), although this correlation is quite imperfect. For instance, one data file has been downloaded 25 times on 26 views (96%), whereas another has been downloaded only 68 times in 975 views (7%). Downloads and views are, not surprisingly, higher for older papers; and the highest ratio of downloads to views (96%) is for one of the most recent papers. Finally, the number of times a paper is cited is correlated with the number of downloads, considering only data packages posted before 2014 (second figure below). Only part of this association is due to the effects of publication date.
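For anyone who wants to run the same check on their own DRYAD packages, the download-to-view percentages quoted above are simple to reproduce. The snippet below is only a rough sketch: the counts are the two examples given in this post, while the labels ("file A", "file B") and the function name `download_rate` are made up for illustration and do not come from DRYAD.

```python
# Download and view counts for the two example data files quoted in the post.
# The labels "file A" and "file B" are hypothetical placeholders.
files = {
    "file A": {"downloads": 25, "views": 26},
    "file B": {"downloads": 68, "views": 975},
}


def download_rate(downloads: int, views: int) -> float:
    """Return downloads as a percentage of views."""
    if views <= 0:
        raise ValueError("views must be a positive number")
    return 100.0 * downloads / views


for label, counts in files.items():
    rate = download_rate(counts["downloads"], counts["views"])
    print(f"{label}: {rate:.0f}% of views led to a download")
```

Rounded to whole percentages, the two files come out at 96% and 7%, matching the figures above.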
At this point, my first thought was "Wow, it looks like freely-accessible data is, indeed, freely accessed – frequently." So how often have I been scooped or how often has my data been used inappropriately? Never, to my knowledge. As far as I can tell, an analysis of these data has never been published anywhere. How can this be? Perhaps robots are downloading my data. Perhaps my data sucks and this is only noticed after a download. Perhaps the data are being used but only in meta-analyses. Or, perhaps, I am about to be scooped soon! However, I suggest the more innocent alternative. People are curious and interested but they have no intention of taking the data and publishing it to their own ends. Don’t worry, be happy.
My second realization speaks to the counterpoint. That is,
even if data aren’t freely available, it won’t have a major negative impact on
science. First, I would bet that nearly all reasonably recent data are accessible
in one way or another. In fact, nearly every time I have asked a scientist for
their data, they have provided it – though, admittedly, it has often taken some
repeated prompting. My favorite instance occurred in 2005 when I was writing a
paper about morphological changes in Darwin’s finches. It was 2004 and I was
working at a site (Academy Bay, Santa Cruz Island) from which finches had been
sampled in 1968 (the year I was born!) by Hugh Ford, who had published the data
in 1973. I searched online and found that Hugh was a professor at the
University of New England in Armidale,
Australia. I emailed him and he responded that the data were in old notebooks and
he would happily dig them out, enter the data in Excel, and send it to me. These
data became a key part of the paper and I invited Hugh to be a co-author even
though he hadn’t asked. On the flip side, I have been asked many times (I will
speculate the number is over 50) to provide raw data from my previous work and
I have – every single time – provided it. A few times I was a collaborator on
the resulting paper but most of the time none of us saw the need for me to be
an author.
The simple point is that data will generally be there simply
for the asking, regardless of whether it is “archived” online. (An exception from my own experience is given below - but I just digitized it from the paper in the end anyway.) One might
complain that such data often come with unreasonable demands for co-authorship
but, really, if one subscribes to the #OA philosophy about the betterment of
science and society, then who cares, really, if you add another author to your
paper? If you want to exclude from co-authorship someone who contributes data
to your paper, then surely you shouldn’t simultaneously complain when people
don’t want to share their data.
So, no matter how this plays out, and I think where it is going
is pretty clear, I am confident that science won’t really be that compromised
either way. If data are truly valuable, then they can be obtained even without
free online access. At the same time, if one makes their data freely
accessible online, then it is extremely unlikely that doing so will ever
harm their research programs. In fact, I have never heard of a scientist who
has had a bad experience with data they have placed online – although I suspect
there must be such an instance.
In conclusion, data archiving and #OA advocates and data archiving
and #OA detractors both: don’t worry, be happy!