Sunday, August 25, 2019

Self-Citation Revisited

A few years ago, I wrote a post here called A Narcissist Index (n-index) for Academics in which I jokingly argued for a self-citation index based on the proportion of a person's total citations that are to their own work. Mine was 10%, which an on-the-fly search by a group of us in attendance at the American Society of Naturalists meeting revealed to be kind of middle-of-the-road. But that analysis was only a quick and cursory tongue-in-cheek survey.

Now, it seems, someone has taken on the task of analyzing a database of self-citations that includes more than 100,000 scientists. They calculate a number of indices of impact for authors with and without their self-citations. Here now was the chance to figure out my true self-citation impact in a large pool of scientists in related fields.

Of course, a number of caveats should be kept in mind. First, the data are from Scopus, which is much less complete than Google Scholar - so the reported citations are far lower for everyone. For me, for instance, my current citations are 20,101 on Google Scholar (h-index = 77), 13,567 on Web of Science (h-index = 63), and 13,250 on Scopus (h-index = 63). Yet, as long as no bias exists, perhaps the data are still a reliable indicator of self-citation impact. Second, the authors calculated a number of indices of impact, some of which seem to be completely nonsensical. So I merely used total citations and h-indexes.

To calculate my self-citation impact relative to everyone else, I first sorted to include only the categories "ecology" and "evolutionary biology", yielding 2126 people. Then I plotted h-index in 2017 including self-citations versus h-index in 2017 excluding self-citations. (The first time I posted this, I had the axes reversed - the current version is corrected.) On this, I plotted the two authors of this blog and also Steve Cooke, who has written a spirited defense of self-citation.

A first point is that, as before, I am kind of middle of the road when it comes to the impact of self-citation. To return to our original n-index, I calculated here that the proportion of my total citations due to self-citations is 16%, which is in the 60th percentile so, again - middle of the road. The co-host of this blog, Dan Bolnick, has a lower self-citation rate, with an n-index of 10.7%, which is in the 30th percentile. Give him time. As a side-bar, I was disappointed to see that - for unknown reasons - my colleague Rowan Barrett was not in the index for ecology and evolutionary biology. I wanted to check my previous conclusion that he had an extremely low self-citation rate.
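For anyone who wants to repeat the n-index calculation, here is a minimal sketch in Python. The author labels and citation counts are made up for illustration (they are not from the Scopus database), and the function names `n_index` and `percentile_rank` are my own. I also assume a percentile is the share of authors with a strictly lower n-index, which is only one of several common definitions.

```python
def n_index(total_citations, self_citations):
    """Proportion of an author's total citations that are self-citations."""
    if total_citations <= 0:
        raise ValueError("total_citations must be positive")
    return self_citations / total_citations


def percentile_rank(value, population):
    """Percentage of the population with a value strictly below `value`."""
    population = list(population)
    below = sum(1 for v in population if v < value)
    return 100.0 * below / len(population)


# Hypothetical (total citations, self-citations) pairs, for illustration only.
authors = {
    "A": (1000, 50),
    "B": (2000, 320),
    "C": (500, 150),
    "D": (4000, 400),
    "E": (800, 200),
}

# n-index for every author in the toy data set.
scores = {name: n_index(total, own) for name, (total, own) in authors.items()}

# Where author "B" falls relative to the other authors.
rank_b = percentile_rank(scores["B"], scores.values())
print(f"B: n-index = {scores['B']:.1%}, percentile = {rank_b:.0f}")
```

With real data, the same two functions would be applied to the total and self-citation columns after filtering to the fields of interest.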

A second point is that Steve Cooke is pretty close to the best at it. In fact, he has the 12th highest n-index in the entire database of 2126 ecologists and evolutionary biologists. Again, self-citation is not necessarily a bad thing. Check out Steve's paper on Self-citation by researchers: narcissism or an inevitable outcome of a cohesive and sustained research program?

A final point is that, really, self-citation doesn't matter much. In fact, a regression of h-index including self-citations on h-index excluding self-citations yields an r-squared of 96.7%. In short, the variation among researchers is vastly higher than the effect of self-citation within researchers. Everyone can chill out.
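The r-squared itself is easy to compute by hand. The sketch below assumes a simple least-squares line with an intercept, in which case r-squared equals the squared Pearson correlation between the two h-index columns. The h-index values are invented for illustration, and the function name `r_squared` is my own.

```python
def r_squared(x, y):
    """R-squared of a simple least-squares line of y on x (with intercept).

    For a single predictor this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical h-indexes for five authors, for illustration only.
h_with_self = [10, 20, 30, 40, 50]     # including self-citations
h_without_self = [9, 17, 28, 35, 46]   # excluding self-citations

r2 = r_squared(h_without_self, h_with_self)
print(f"r-squared = {r2:.3f}")
```

A value near 1 means that knowing an author's h-index without self-citations tells you almost everything about their h-index with them.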

But, more importantly, are these counts and indices and ranks even useful? Much has been written on this topic, much of which I agree with. I had my own take in the post Should I be Proud of my H Index?

How to Write a Thesis

 [ This piece is, supposedly, originally by one A. W. James, who seems to have been a professor in the Dept. of Biology at Canisius College ...