Sunday, August 25, 2019

Self-Citation Revisited

A few years ago, I wrote a post here called A Narcissist Index (n-index) for Academics in which I jokingly argued for a self-citation index based on the proportion of a person's total citations that are to their own work. Mine was 10%, which an on-the-fly search by a group of us in attendance at the American Society of Naturalists meeting revealed to be kind of middle-of-the-road. But that analysis was only a quick, tongue-in-cheek survey.

Now, it seems, someone has taken on the task of analyzing a database of self-citations that includes more than 100,000 scientists. They calculate a number of indices of impact for authors with and without their self-citations. Here now was the chance to figure out my true self-citation impact in a large pool of scientists in related fields.

Of course, a number of caveats should be kept in mind. First, the data are from Scopus, which is much less complete than Google Scholar - so the reported citations are far lower for everyone. For me, for instance, my current citations are 20,101 on Google Scholar (h-index = 77), 13,567 on Web of Science (h-index = 63), and 13,250 on Scopus (h-index = 63). Yet, as long as the undercount is not biased with respect to self-citations, perhaps Scopus is still a reliable indicator of self-citation impact. Second, the authors calculated a number of indices of impact, some of which seem to be completely nonsensical. So I merely used total citations and h-indexes.



To calculate my self-citation impact relative to everyone else, I first sorted to include only the categories "ecology" and "evolutionary biology", yielding 2126 people. Then I plotted h-index in 2017 including self-citations versus h-index in 2017 excluding self-citations. (The first time I posted this, I had the axes reversed - the current version is corrected.) On this, I plotted the two authors of this blog and also Steve Cooke, who has written a spirited defense of self-citation.
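For anyone who wants to reproduce the two axes of that plot from per-paper counts, here is a minimal sketch of the h-index calculation with and without self-citations. It is not the method used to build the database; the function name `h_index` and the example numbers are hypothetical, and it assumes you have, for each paper, its total citations and how many of those are self-citations.

```python
def h_index(citation_counts):
    """Largest h such that h papers each have at least h citations."""
    ordered = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ordered, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical researcher: (total citations, self-citations) per paper.
papers = [(10, 2), (8, 2), (5, 2), (4, 2), (3, 2)]

h_with_self = h_index([total for total, _ in papers])
h_without_self = h_index([total - own for total, own in papers])

print(h_with_self, h_without_self)
```

Each researcher then contributes one point to the plot: h-index including self-citations on one axis, h-index excluding them on the other.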

A first point is that, as before, I am kind of middle of the road when it comes to the impact of self-citation. To return to our original n-index, I calculated here that the proportion of my total citations due to self-citations is 16%, which is in the 60th percentile so, again - middle of the road. The co-host of this blog, Dan Bolnick, has a lower self-citation rate, with an n-index of 10.7%, which is in the 30th percentile. Give him time. As a side-bar, I was disappointed to see that - for unknown reasons - my colleague Rowan Barrett was not in the index for ecology and evolutionary biology. I wanted to check my previous conclusion that he had an extremely low self-citation rate.
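The n-index and its percentile are simple enough to sketch in a few lines. This is only an illustration under my own assumptions: the function names and the toy population are made up, and "percentile" here means the percentage of researchers with a strictly lower n-index, which may not match how other tools define it.

```python
from bisect import bisect_left


def n_index(total_citations, self_citations):
    """Proportion of a researcher's total citations that are self-citations."""
    if total_citations <= 0:
        raise ValueError("total_citations must be positive")
    return self_citations / total_citations


def percentile_rank(value, population):
    """Percentage of the population with a value strictly below `value`."""
    ordered = sorted(population)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Hypothetical numbers, not taken from the database.
mine = n_index(total_citations=1000, self_citations=160)
everyone = [0.05, 0.10, 0.12, 0.20, 0.30]

print(mine, percentile_rank(mine, everyone))
```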

A second point is that Steve Cooke is pretty close to the best at it. In fact, he has the 12th highest n-index in the entire database of 2126 ecologists and evolutionary biologists. Again, self-citation is not necessarily a bad thing. Check out Steve's paper on Self-citation by researchers: narcissism or an inevitable outcome of a cohesive and sustained research program?

A final point is that, really, self-citation doesn't matter much. In fact, a regression through the points yields an r-squared of 96.7%. In short, the variation among researchers is vastly higher than the effect of self-citation within researchers. Everyone can chill out.
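For the curious, the r-squared of a simple linear regression can be computed directly from the two columns of h-index values. The sketch below assumes an ordinary least-squares fit of one variable on the other; the function name and the toy data are hypothetical and are not the values from the database.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For simple linear regression, r-squared equals the squared correlation.
    return sxy ** 2 / (sxx * syy)


# Toy data: h-index without and with self-citations for four researchers.
h_without = [10, 20, 30, 40]
h_with = [12, 23, 33, 45]

print(round(r_squared(h_without, h_with), 3))
```

A value close to 1 means that knowing the h-index without self-citations tells you almost everything about the h-index with them.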

But, more importantly, are these counts and indices and ranks even useful? Much has been written on this topic, much of which I agree with. I had my own take in the post Should I be Proud of my H Index?



3 comments:

  1. Thanks for doing this. Agree that there were lots of nonsensical things in there... I don't buy the idea that co-authors should be considered self-cites unless specific to a given paper. For example, if I publish one paper with Hendry (we have one in review), does that mean that forevermore, if Hendry cites one of my papers (that he is not co-author on) would count as a self cite for me... I am having a tough time deciphering the methods in the PLoS Biol original paper.

  2. Yes, I agree, that doesn't make sense.

  3. Something may be amiss in the plot of h-index values with and without self-citations. Out of 2126 authors publishing in ecology and evolutionary biology, no author has fewer than 7 publications that have each been cited at least 7 times? Every new author has to start with paper #1, and there will be a lag before it accrues any citations. Considering the steady flow of new scientists emerging from universities around the world, there have to be multiple points on the origin (0,0). I would expect a dense blob of points with early-career scientists with h-scores <10, yet only 3 of 2126 ecologists and evobiologists have h-scores? I am with you on your main point that criticism of self-citation is over-broad.
