Saturday, January 16, 2016

A Narcissist Index (n-index) for academics


We all know the h-index, the number of papers by a given author that have been cited at least that many times. For instance, if you have 52 papers cited at least 52 times but the 53rd most-cited paper is cited only 52 times (rather than 53), your h-index is 52. This index is widely used to assess the performance of academics and has sometimes been praised but much more often vilified.
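For anyone who wants to check the arithmetic, here is a minimal sketch of the h-index definition given above. The function name `h_index` and the citation counts in the example are made up for illustration (the 52 papers are simply assumed to have 60 citations each, which satisfies "at least 52").

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# The example from the text: 52 papers cited at least 52 times
# (assumed here to be 60 each) plus a 53rd paper cited only 52 times.
example = [60] * 52 + [52]
print(h_index(example))  # 52
```

The 53rd paper would need 53 citations to push the index to 53, which is why the example stops at 52.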

Other indices have been suggested, some seriously and some just for fun. For instance, the Kardashian Index (k-index) is the number of Twitter followers divided by the number of citations you have accumulated. If, for example, you have 1594 Twitter followers and 8564 total citations, your k-index is 0.19.

While at this week’s American Society of Naturalists meeting (Asilomar, CA), I happened to join a conversation that – through a series of unplanned segues – led to the suggestion of a new index by which to assess scientists – the Narcissist Index (n-index).

The conversation started on a totally different topic – how to convince more authors to submit to The American Naturalist. The first question posed to the group was “How many Am Nat papers would you give up to have one Nature or Science paper?” and progressed to questions like “What difference in number of citations for a paper would convince you to publish in Am Nat rather than Science or Nature?” The basic point was to evaluate perceptions about how journal prestige, as opposed to actual paper impact (citations), influences how people choose journals.

At this point, the conversation switched to citation rates and what influences them, which led inevitably to the topic of self-citation. For instance, one might inflate their citations simply by citing themselves. Or, even if that is not the intent, self-citation is generally frowned upon as narcissistic – or, I suppose, as an indication that you are just not widely read.

Although it is typical to criticize self-citation, some have argued that it is merely “an inevitable outcome of a cohesive and sustained research program.” I won’t go into the details here – you can read the paper with this alternative viewpoint – but the whole discussion led someone to say “What we really need to do is figure out what proportion of a person’s citations are self-citations and from this we can calculate a Narcissist Index – the n-index.”


Out came Web of Science (Google Scholar does not track self-citations) and I knew in seconds that I had 829 self-citations out of 8540 total citations for an n-index of 0.1. Hmmm, that sounded high to my friends and colleagues, but we needed some frame of reference. So we started calculating the n-index for everyone in the conversation and for some of our friends (and fathers and mentors), most of whom were at the meeting.
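The calculation itself is just a proportion. A minimal sketch, using the two Web of Science numbers quoted above; the function name `n_index` is made up for illustration.

```python
def n_index(self_citations, total_citations):
    """Proportion of a person's citations that are self-citations."""
    if total_citations <= 0:
        raise ValueError("total_citations must be positive")
    if not 0 <= self_citations <= total_citations:
        raise ValueError("self_citations must be between 0 and total_citations")
    return self_citations / total_citations


# Numbers quoted in the text: 829 self-citations out of 8540 total citations.
print(round(n_index(829, 8540), 3))  # 0.097
```

Rounded to one decimal place, that is the 0.1 reported above.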

We quickly realized that several factors could influence (or be influenced by) the n-index, including the h-index, total number of papers, “scientific age”, and total number of citations. Back to Web of Science for a bit more data. At this point it became clear that Rowan Barrett was by far the greatest outlier, with the lowest proportion of self-citations, including in relation to all of these other variables – so he has been deleted from the analyses that follow.

The strongest correlation was with scientific age – longer-established researchers (based on their first paper in Web of Science) have a lower n-index (R2 = 9.2%). Perhaps young people have had less time for their work to become well known – or perhaps self-citation was simply lower in “the good old days” – or maybe the literature is so vast now that it is hard to stand out and so be cited by others.


A similar correlation was evident with total citations – the n-index is lower for people with more total citations (R2 = 8.9%). At one level, this association might suggest that self-citation is not a very effective way of increasing your total citations – instead it acts against you. However, it seems much more likely that the association occurs simply because a person can only cite themselves a limited number of times, whereas other people can cite them much more frequently.


These two variables (total citations and scientific age) are obviously correlated, and this was certainly the case even in our tiny non-random data set (R2 = 61.2%). So, if one really cared about such things, I guess the n-index could be corrected for its various correlates. Indeed, the h-index is often adjusted for scientific age. Interestingly, both the above correlations (n-index on scientific age and n-index on total citations) appear much lower than the correlation between h-index and scientific age (59%) in the same data set, which I guess proves that you really can succeed in narcissism (or research program cohesion) at any age.

So how did I fare? I am sad to say that I do not have the highest n-index in the pool of 19 people we examined – I am only 3rd. Nor do I have the highest residual of n-index regressed on scientific age – only 4th. Nor do I have the highest residual of n-index regressed on total citations – only 3rd.
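For the curious, a residual ranking of this kind can be sketched as follows. This assumes an ordinary least-squares fit with a single predictor, which is an assumption on my part since the post does not say how the fits were made. The function name `ols_residuals` and all of the numbers are invented for illustration; they are not the data from the 19 people.

```python
def ols_residuals(x, y):
    """Residuals of y regressed on x by ordinary least squares (one predictor)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted by the fitted line.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up toy data: scientific age (years) and n-index for five hypothetical people.
ages = [5, 10, 20, 30, 40]
n_values = [0.12, 0.10, 0.09, 0.05, 0.04]

residuals = ols_residuals(ages, n_values)
# Rank people from highest to lowest residual (most self-citing for their age first).
ranking = sorted(range(len(ages)), key=lambda i: residuals[i], reverse=True)
print(ranking)
```

A positive residual means more self-citation than the fitted line predicts for that scientific age, and a negative residual means less.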

Perhaps I am not as good as I thought I was.

Bah - just check out my awesome papers (Hendry et al. 1995, 1996, 1997, 1998, 1999, 2000a, 2000b, 2000c, 2000d, 2000e, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020).



Notes and disclaimers:
  1. Dan Bolnick came up with the name "n-index"
  2. This analysis was done completely in fun and is not meant to single out anyone one way or the other (except for Rowan). Moreover, none of us consider this to be a useful index in any way; merely something that entertained us briefly while drinking beer.
  3. Previous authors have calculated and discussed the n-index or something like it. However, we had more fun ignoring this literature and exploring the ideas ourselves.
  4. The data presented here will not be accurate. First, they are a non-random selection of the people who happened to be standing around supplemented by our friends, people who write papers about the merits of self-citation, and more senior folks who we could use to better populate the range of parameter space. The small number of people included in the analysis is simply an indication of how long my computer battery lasted. Second, because Web of Science does not have a unique identifier for people, at least one person in the analysis is a combination of several people. However, we did confirm that at least all of the highly cited papers were for the people in question. Third, we searched only for last name plus the initials. If someone left an initial off some of their papers, we would miss those papers. We would also miss papers if someone changed their name. Fourth, no formal stats were conducted – nor do I recommend wasting your time to do so.
Finally:

I recently (April 9, 2016) produced a word cloud for my entire forthcoming book Eco-Evolutionary Dynamics. The most common words were "et" and "al", which I guess is not surprising.
Removing those uninteresting words, I generated the word cloud below.
I put it out on Twitter and Menno Schilthuizen immediately noted the "relative modest" appearance of the word "Hendry." I hadn't even noticed my name in the cloud but, looking more closely, I realized that it is the only obvious name in the entire thing, which suggests a new index of narcissism: the appearance of your own name in a word cloud of your work.


21 comments:

  1. Andrew - this is awesome. Wish I'd been there to take part in this discussion.

    One reason the n-index falls with scientific age is probably that it should also fall with paper age. I can self-cite my paper long before it's published, because I know it's coming out. Anyone else has to see it published, and only THEN insert it into a MS which will have a lag before publication. Call this the head-start effect. (Widespread preprint server use should weaken this effect, but not remove it). Early in my career, by definition ALL my papers were young, and had a strong head-start effect; now that I'm (cough) not young, this effect averaged over my total citations is trivial.

    I haven't calculated my own n-index only because UNB has discontinued WoS. I speculate it's high because I do self-cite, although oddly for somewhat the reverse reason you mention. It's not so much that I have a "cohesive and sustained" research programme; more that I have a not-very-cohesive programme, but I want to show that there are connections that might not be obvious at first glance! But I'll test that, whenever I can convince someone who still has WoS to calculate my n-index for me (hint hint hint)...

    1. Approximately 5% - right in the middle of the pack, including for residuals from age and a bit low for residuals from total citations. You need to cite yourself more.

  2. Re: the correlation with scientific age, my hypothesis would be that science is much more competitive now than it used to be (competition for funding, for tenure-track positions, for, well, everything), and that as a result, self-promotion is both more necessary and more culturally acceptable than it used to be. At the same time, I suspect this has also caused the epidemic of "imposter syndrome" among younger scientists. So according to this hypothesis, your strong negative correlation between scientific age and n-index is a sign of a sickness at the heart of modern science...

  3. Yeah, but it is a pretty weak correlation. In fact, self citation seems to have very little impact on things like total citations and h-index.

    1. Doesn't look that weak – your plot above shows that n-index for scientific age 10 is about twice the n-index for scientific age 50, and you report an R2 of 61.2%. That seems like quite a large effect to me!

    2. This analysis confirms what we all knew, that Rowan Barrett is an exceptionally modest & excellent scientist.

      What about applying this idea to journals as well. In addition to their impact factor, calculate a metric of how much that IF entails self-citations / external citations. That detects journals that disproportionately self-cite, typically as a strategy to inflate their own IF.

    3. Dan- this would actually be interesting if calculated over time for different journals. For instance, I have heard rumors that in the early 2000s certain high profile eco/evo journals sent emails with a suggested list of references to add to a paper, all of which were previously published in that journal. They were intentionally asking authors to self cite the journal. This type of analysis might be able to confirm or deny the generality of those rumors.

      Colin Averill

  4. Ben.

    The huge r2 is for total citations versus age. The correlation of n-index and age is much lower. What I am saying is that n-index is much less affected by age than are h-index and total citations.

    1. Ah, I see, the 61% R2 is for total citations versus scientific age. But the effect size of scientific age on n-index still looks quite large to me (albeit with a smaller R2 than I thought – i.e. there are other things causing a lot of variance too); as I observed above, those with an age of 10 have a (fit-predicted) n-index about twice as high as those with an age of 50! Of course the effect of scientific age on h-index and total citations is large, since those metrics would both be expected to naturally rise over time if a scientist simply maintained constant behavior; nothing surprising about that. If a scientist maintained constant behavior, it is less obvious that one would expect a large effect of scientific age on n-index (although Stephen Heard's explanation above would account for a small effect, at least). It still seems to me that there is a large effect here in need of an explanation, and that a cultural shift in science, as I propose above, is a very plausible hypothesis. No?

  5. This is interesting. I would also be curious to see what measures of in- vs. out-of-network citations look like; in other words what fraction of someone's recent citations are from outside the network of people they've collaborated with (defined by coauthorship) in some relevant time frame. The rise in importance of networking to scientific impact in the last few decades (necessitated by the rise of big data and evidenced by large collaborations overtaking the lone-wolf geniuses as the most common authors of major breakthroughs) seems more pronounced to me (at least anecdotally) than the rise of narcissistic self-citation.

  6. A self-citation corrected journal impact factor would be great but could reflect a "cohesive and sustained publication program"

  7. Regarding journal self-citation, my friend Philip Cohen has done some interesting digging on this in his field (sociology). Check this out: https://familyinequality.wordpress.com/2015/11/30/journal-self-citation-practices-revealed/

  8. Self-citations in web of science seem to include all of your coauthors. If, in a paper you are not on, they cite a paper both you and they are on, that paper will be considered a self citation in your count - I think.

    I do agree that huge collaborations are emphasized now.

  9. One potential (charitable) explanation for the correlation with "scientific age" might be something related to how a researcher builds a program.

    As an edge case, imagine a Ph.D. student writes a cohesive dissertation, such that each chapter builds upon the last. When publishing, each chapter would have to cite (at least) the one before it, inflating the n-index dramatically given that these are new papers with few citations. Continuing on, early in one's career, it might make sense to cite your earlier work because you're directly building upon it.

    Another potential charitable explanation: Science becoming more specialized, so that newer folks are working in a smaller subdiscipline, resulting in more self-citations because there aren't that many other people in their tiny niche.

    1. Indeed. Many potential explanations - some narcissistic (or at least self-promoting), some not.

  10. So - I am the author of a thought-piece in the journal Ideas in Ecology and Evolution where I (along with co-author Mike Donaldson) ask whether self-citation is a form of narcissism or an indicator of a cohesive and productive research program. It can go either way depending on a variety of factors - some within the control of the author and some beyond their control.

    Not surprisingly, the blog post caught my attention - indeed, I caught wind of it from a random post on Facebook written by Hendry as he was working on the blog entry. I was certainly interested to see where I stood... However, it took some time for me to reconcile two ORCID accts and then harmonize them with WOS. Today they became "synced" and I was able to see how I fared. I am pleased to report that I crushed Hendry - scoring an impressive N score of 0.36 - that is, 36% of my WOS citations are from me or a paper on which I am a co-author. A few other numbers - I have 420 papers that WOS tracks. I scored my 1st peer-reviewed paper in 1999, which gives me an academic age of 17. According to WOS, I have 7458 total citations (over 12,000 in Google Scholar) of which 4746 are from papers I have not authored while 2712 represent self-citations. Some of you reading this may be surprised by such a number and begin to "judge" me... I was not overly surprised. Before I state why, I should first note that as an editor and mentor I call people out on egregious self-citation quite regularly. I am a fan of historic papers and also a total publication nerd, trolling journals for papers that are "in press". I impress upon my students the importance of giving credit where credit is due, reading widely, and recognizing that there is much great work that occurs outside of North America (i.e., concepts transcend geography).

    I consider my N-score to be a product of a few things… 1) I engage in LOTS of synthesis/lit review/perspective articles and not surprisingly they are in topical areas where I have significant research activity (e.g., recreational fisheries, conservation physiology, biotelemetry and biologging). This is partly because I like doing so and (in my apparently narcissistic mind) I am good at it. I also like to build teams and frequently get my grad students and PDFs to work together and loop in whomever in the world I think is the BEST collaborator on a given topic. I am really proud of these syntheses as they serve as rallying points for reflection and progress. 2) Related to 1, we have a number of themes in our lab where we devote much of our research activity. We build upon our own work as we move forward. That does NOT mean we avoid or exclude work by others… it simply means that one thesis (set of pubs) leads to some answers and more questions which we then build upon with the next student. This is what we have done with our work on Pacific salmon migration and centrarchid parental care as good examples. Some of these topical research areas are ones where our lab has taken a leadership role in defining – e.g., Conservation Physiology – now a “named” discipline with the same-named journal. 3) The volume of papers we produce (which is a function of meaningful and engaged collaboration and having a talented and rather large team of students and post docs with a culture of writing and SciComm) consistent with these themes leads to more opportunities for self-citation.

    I should note that although I am proud of all of my papers, I don't have any of those "home run" 5000 citation papers. My most cited paper is a TREE synthesis on Biotelemetry and it has been cited around 500 times according to Google Scholar. A SINGLE home run paper would drive down one's N index in a big way. Someone with a few, well cited papers would inherently have a very low N-score while someone with lots of pubs and steady citations but no "5000 citation home run" papers would be more likely to have a high N score - or at least that is a testable hypothesis.

    Steven Cooke - www.FECPL.ca

  11. An addendum - A few comments that arose from Twitter. Caleb Hasler suggested that N index could be driven up by having MANY co-authors (which I do... I collaborate with engineers, lawyers, social scientists, etc plus lots of biologists from many institutions). I also continue to think about Google Scholar - where I have 12000 cites. I suspect that if my N index were to be calculated using those data it would be much lower. Most of the 5000+ of my cites that appear in GS that are not in WOS are in theses, govt tech reports, book chapters, env consulting reports, etc - i.e., because I do applied work, my "influence" is not well measured by WOS. It would be great if the bibliometric analytical tools in GS could be expanded.

    PS - I am not advocating FOR self-citation BUT a high N index does not necessarily mean that someone is willy-nilly citing their own work egregiously. I DO see those papers as an Editor and reviewer... and I call people out on it.

    1. I agree that having lots of collaborators would be a likely contributor. Of course, I suppose the other way to do it would be to simply calculate citations and h-index and so on excluding self citations (I realize others have said this). I expect rankings (or whatever) would be the same with as without.

  12. Missing from this is a discussion of how gender influences the n-index. See http://www.the-scientist.com/?articles.view/articleNo/39450/title/Self-Citation-Gender-Gap/

    1. Interesting indeed - and I suppose to be expected (and decried?). I will have to look to see if they corrected for the important covariates - total number of papers and academic "age"

  13. While I agree that journal self-citation sometimes reflects a dishonest journal strategy, this number would be very hard to interpret because more specialist journals would naturally have fairly high self-citation rates, while more generalist journals would have lower rates (e.g., J Veg Sci vs TREE). Also, journals with strict citation limits are likely to have a different profile to those with unlimited citations and longer maximum page counts (e.g., Science vs BBS).

