Wednesday, August 1, 2018

Do certain subdisciplines lead to a higher H Index?

"Two roads diverged in an academic study. I took the one less-traveled by, and that has made all the difference [to my H-index]." With apologies to Robert Frost.

Andrew recently posted a blog here about H indices and how much weight to give them. I like his basic message that your H index, like the number of Twitter followers you have, is best viewed as a measure of your H index, or your Twitter presence, and not as a fundamental metric of your worth as a scientist. I won't disagree. But I will point out that, for better or for worse, there are people who don't share Andrew's casual view. There are people who like metrics, and use them to do important things. Those people are sometimes called "administrators". And they do things like hire, promote, and dole out raises. So, it's not necessarily safe to dismiss H indices and the like, unless you can convince your local friendly bureaucrat to do the same. Good luck with that.

I recently had the dubious pleasure of being on the Merit Review Committee at my previous institution. We had a rubric that we used to judge our fellow faculty, based on publications (number, journals, citations), grants received, students mentored, course evaluations, service, and such. This was used to rank faculty by three criteria. Perversely, research scores were used to judge who got a lighter teaching load. Total scores were used to dole out merit raises (we had no annual built-in cost of living increase to adjust for inflation, just merit raises). Now, we were given about two weeks to score 40ish faculty, just as the fall semester began. You can be sure we didn't read every paper that everyone produced in the previous year, let alone the past five years (our relevant period). We might look at some papers, but time was tight. Instead, we were given H indices, cumulative citations, and numbers of papers published, both career-total and over the previous 5 years (so as to not favor older faculty). There was quite a range of H-values (to pick one of several possible metrics), which made them very tempting as a tool for ranking our peers.

I want to raise three points now.

First, we really did look at the whole 5-year report, and rank people holistically. We didn't just use Excel to order people by their H index, or dollars earned, or the product of the two. We really were trying to be just; these are our friends and colleagues, after all.

Second, we are aware that these indices can be skewed by implicit (or explicit) bias within the research community, based on sex or ethnicity or sexual orientation. Andrew's excellent H-index (far greater than my own) probably does benefit from his being a white male, and especially an outgoing and admittedly self-promoting white male.

Third, and what I really want to talk about here, is that these indices can vary by field. I noticed early on in my Merit Review involvement that some colleagues I really respect had much lower H-indices (and cumulative citations, total or 5-year) than others. I began to wonder how much that had to do with their sub-discipline of choice. For instance, there's a crazy ton of gut microbiome research out there today, so does studying that give you an edge? Certainly my own few gut microbiome papers (N = 4 so far) accrued citations at a much higher annual rate early on than most of my other papers. I voiced this issue out of concern that some of our colleagues were at a disadvantage in the H-index race because of their choice of topic. I was quickly shot down by a colleague who argued (I am paraphrasing) that, on average, the number of citations your paper accrues should be independent of field. Here's the math:

Let's assume that discipline A publishes N_a papers per year. Discipline B publishes N_b papers per year. A is more popular than B, so N_a >> N_b. For example, there are more gut microbiome papers than fish immunology papers.

Next, let's assume that the typical paper has R entries in its Literature Cited section. If you publish only in Nature or Science, R ~ 30; if in AmNat or Evolution or Ecology, R ~ 80. To be flexible, we will acknowledge that each discipline has its own average number of references per paper, R_a and R_b, but to start let's let R_a ~ R_b.

Okay, now the total number of entries in all Lit Cited sections of all papers in a field is N_a * R_a for field A and N_b * R_b for field B. These citations are to a smattering of other papers in the field. To calculate the average number of citations to the typical paper, we just divide the total number of things cited by the number of things there are to cite. So in field A, the average paper gets cited N_a * R_a / N_a times, which is just R_a. And the average in field B = N_b * R_b / N_b = R_b. So as long as fields A and B have similarly long Lit Cited sections (R_a ~ R_b), they will have the same average citation rate. That's unrelated to the total number of papers in each discipline (popularity). Actually, it might even be better to publish in an unpopular field, because popular topics get into Science and Nature and PNAS, which slightly drive down R because of their more stringent caps on citations. The punch line here is, if you are an average author, your field shouldn't matter; you won't get cited much per year anyway (on average, R times per paper per year, times the number of papers you publish).
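
For readers who prefer symbols, the colleague's argument can be written compactly. This is only a restatement of the paragraph above; the notation \bar{c}_A and \bar{c}_B (mean citations received per paper in each field) is shorthand introduced here, and it assumes every citation stays within its own field.

```latex
\bar{c}_A = \frac{N_a R_a}{N_a} = R_a,
\qquad
\bar{c}_B = \frac{N_b R_b}{N_b} = R_b,
\qquad
R_a \approx R_b \;\Longrightarrow\; \bar{c}_A \approx \bar{c}_B .
```

With made-up numbers, say N_a = 10,000, N_b = 500, and R_a = R_b = 50, field A hands out 500,000 citations and field B hands out 25,000, yet both work out to 50 citations per paper.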

So, was my colleague right that H values and total citations are independent of discipline? Of course not, or I wouldn't be writing this. Let's forget the average for a minute and focus on the distribution. At a minimum, you could write a paper that nobody cited at all. Regardless of discipline. So your minimum citations per paper is zero. But your maximum, that's another story. If you publish in popular discipline A (remember, N_a >> N_b), maybe you'll do something brilliant and EVERYONE will cite you. So your maximum number of citations in a year, for that one great paper, is N_a. In the other field, you could write the best paper ever, and if EVERYONE cites you, your total citations received is N_b. And we already said N_a >> N_b. The punch line here is: if you are going to write an above-average paper, do so in a popular field, if you care about citation indices.
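
The contrast between the mean and the maximum can be checked with a toy simulation. The sketch below is a deliberately rigged illustration, not a model of any real field: the field sizes (1000 and 100 papers), the value R = 30, the function names, and the rule that the k-th best paper is cited with probability proportional to 1/(k+1) (capped at 1, so the very best paper is cited by everyone) are all assumptions made up for this example. It also lets a paper cite itself, to keep the code short.

```python
import random

def citation_probabilities(n_papers, refs_per_paper):
    """Chance that any one paper cites the k-th ranked paper (k = 0 is the best).

    Assumed shape: min(1, c / (k + 1)), with c found by bisection so that each
    paper hands out refs_per_paper citations on average.
    """
    lo, hi = 0.0, float(n_papers)
    for _ in range(60):
        c = (lo + hi) / 2
        expected_refs = sum(min(1.0, c / (k + 1)) for k in range(n_papers))
        if expected_refs < refs_per_paper:
            lo = c
        else:
            hi = c
    return [min(1.0, hi / (k + 1)) for k in range(n_papers)]

def simulate_field(n_papers, refs_per_paper, seed=0):
    """Return the number of citations received by each paper in one field."""
    rng = random.Random(seed)
    probs = citation_probabilities(n_papers, refs_per_paper)
    # Each of the n_papers citing papers independently decides whether to cite paper k.
    return [sum(rng.random() < p for _ in range(n_papers)) for p in probs]

def summarize(counts):
    """Mean and maximum citations received per paper."""
    return {"mean": sum(counts) / len(counts), "max": max(counts)}

field_a = summarize(simulate_field(n_papers=1000, refs_per_paper=30))  # popular field
field_b = summarize(simulate_field(n_papers=100, refs_per_paper=30))   # sparse field

print("Field A:", field_a)
print("Field B:", field_b)
```

Under these assumptions both fields should report a mean close to R, while the best paper in each field is cited once by every paper in that field, so the maximum tracks the field size rather than R.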

To summarize: your average citations-received per paper should be independent of your discipline. But your highly-cited papers will be really really highly cited in a popular field, and ignored in an empty field. The tails of the distribution matter because we all write some below-average papers and some above-average papers, and it is the latter that drive the citation indices that our administrators use. The upshot is, when administrators use these indices they implicitly favor people contributing above-average papers to busy fields of study over those contributing the best papers to sparse fields of study. That's not necessarily wrong, but it certainly can stifle innovation and discourage people from forging their own path.
