Now, it seems, someone has taken on the task of analyzing a database of self-citations that includes more than 100,000 scientists. They calculate a number of indices of impact for authors with and without their self-citations. Here, finally, was the chance to figure out my true self-citation impact in a large pool of scientists in related fields.
Of course, a number of caveats should be kept in mind. First, the data are from Scopus, which is much less complete than Google Scholar, so the reported citations are far lower for everyone. For me, for instance, my current citations are 20,101 on Google Scholar (h-index = 77), 13,567 on Web of Science (h-index = 63), and 13,250 on Scopus (h-index = 63). Yet, as long as that incompleteness is not biased with respect to self-citation, perhaps Scopus is still a reliable indicator of self-citation impact. Second, the authors calculated a number of indices of impact, some of which seem to be completely nonsensical. So I used only total citations and h-indexes.
Has anyone noticed what a train wreck their composite indicator is? - Carl T. Bergstrom (@CT_Bergstrom), August 22, 2019
It sums logs of a) # citations to solo authors papers + b) # cites to solo or 1st auth papers + c) # cites to solo or 1st or last + d) # cites to all papers.
But a ⊆ b ⊆ c ⊆ d. Why would you do this?
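To make the objection in the tweet concrete, here is a minimal sketch of a score built the way the tweet describes it: the sum of the logs of four nested citation counts. This is only an illustration of the tweet's description, not the paper's actual formula. The function name, the use of log(1 + x) to avoid log(0), and the numbers are all my own assumptions.

```python
import math

def tweet_composite(solo, first, last, other):
    """Sum of logs of nested citation counts, as described in the tweet.

    Arguments are citations to papers where the author is the solo author,
    first author (not solo), last author, or in any other position.
    log1p (log of 1 + x) is an assumption, used here to avoid log(0).
    """
    a = solo            # citations to solo-authored papers
    b = a + first       # ... plus first-authored papers
    c = b + last        # ... plus last-authored papers
    d = c + other       # citations to all papers
    return sum(math.log1p(x) for x in (a, b, c, d))

# Two hypothetical authors with the same 1000 total citations.
all_solo = tweet_composite(solo=1000, first=0, last=0, other=0)
all_middle = tweet_composite(solo=0, first=0, last=0, other=1000)
print(all_solo, all_middle)
```

Because a ⊆ b ⊆ c ⊆ d, a citation to a solo-authored paper enters all four terms, whereas a citation to a middle-author paper enters only the last one, so the two hypothetical authors get very different scores from identical citation totals.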
To calculate my self-citation impact relative to everyone else, I first filtered the data to include only the categories "ecology" and "evolutionary biology", yielding 2126 people. Then I plotted h-index in 2017 including self-citations versus h-index in 2017 excluding self-citations. (The first time I posted this, I had the axes reversed; the current version is corrected.) On this plot, I marked the two authors of this blog and also Steve Cooke, who has written a spirited defense of self-citation.
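For anyone who wants to see what the two axes mean, here is a small sketch of how an h-index changes when self-citations are removed. The per-paper numbers are invented for illustration and are not from the Scopus database. The h-index is the largest h such that h papers each have at least h citations.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author: (total citations, self-citations) for each paper.
papers = [(50, 5), (30, 10), (12, 2), (6, 3), (5, 1), (4, 4)]

h_with = h_index([total for total, self_cites in papers])
h_without = h_index([total - self_cites for total, self_cites in papers])
print(h_with, h_without)
```

Each author in the plot is one such pair of values: h-index with self-citations on one axis and without on the other.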
A second point is that Steve Cooke is pretty close to the best at it. In fact, he has the 12th-highest n-index in the entire database of 2126 ecologists and evolutionary biologists. Again, self-citation is not necessarily a bad thing. Check out Steve's paper, "Self-citation by researchers: narcissism or an inevitable outcome of a cohesive and sustained research program?"
A final point is that, really, self-citation doesn't matter much. In fact, a linear regression of h-index with self-citations on h-index without self-citations yields an r-squared of 96.7%. In short, the variation among researchers is vastly higher than the effect of self-citation within researchers. Everyone can chill out.
But, more importantly, are these counts and indices and ranks even useful? Much has been written on this topic, much of which I agree with. I had my own take in the post "Should I be Proud of my H Index?"
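For reference, the r-squared of a simple linear regression is just the squared correlation between the two variables, so it can be computed as below. This is a generic sketch with made-up h-index pairs, not the calculation on the real data.

```python
def r_squared(x, y):
    """R-squared of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical h-indexes for five authors, without and with self-citations.
h_without = [12, 20, 25, 33, 41]
h_with = [14, 21, 28, 35, 45]
print(r_squared(h_without, h_with))
```

A value near 1 means that knowing the h-index without self-citations tells you almost everything about the h-index with them.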
 
Thanks for doing this. Agree that there were lots of nonsensical things in there... I don't buy the idea that co-authors should be considered self-cites unless specific to a given paper. For example, if I publish one paper with Hendry (we have one in review), does that mean that forevermore, any time Hendry cites one of my papers (that he is not a co-author on), it would count as a self-cite for me? I am having a tough time deciphering the methods in the original PLoS Biol paper.
Yes, I agree, that doesn't make sense.
Something may be amiss in the plot of h-index values with and without self-citations. Out of 2126 authors publishing in ecology and evolutionary biology, no author has fewer than 7 publications that have each been cited at least 7 times? Every new author has to start with paper #1, and there will be a lag before it accrues any citations. Considering the steady flow of new scientists emerging from universities around the world, there have to be multiple points on the origin (0,0). I would expect a dense blob of points with early-career scientists with h-scores <10, yet only 3 of 2126 ecologists and evobiologists have h-scores? I am with you on your main point that criticism of self-citation is over-broad.