Wednesday, September 12, 2018

Mistakes were made

We make mistakes. Just look at a search for GIFs using the keyword "mistake". It's worth it.

When you are a graduate student, and you make a mistake, you think it's the end of the world, or at least the end of your career. When you are a PI and you make a mistake, you think maybe it's Monday. And you take a deep breath and figure out how to correct it.

To help early stage researchers gain some confidence, it might help if more established researchers point out their own errors, to help put your own trials in perspective. So, in that spirit, here are some of my greatest hits. Or should I say my worst misses? As I go through the following (incomplete) list, keep in mind two things: first, these were excruciating at the time. Second, despite them I managed to get a job, get tenure, get awards, and generally do the various things we consider "success" in academic biology.

So, at the risk of deep self-loathing and embarrassment, here are my top 10:

1) Unexpected design flaw 1: When I was an undergraduate, I did a senior thesis on how drought-stress differentially affected hybrids between two species of willows (Orians et al 1999 Canadian Journal of Botany). I reared hundreds of willows in a greenhouse on the roof of the Williams College biology building. The greenhouse was necessary so I could keep the rain off the plants, to carefully control their watering schedule. But, there was no greenhouse large enough. So, I built one. On the roof. I made it out of electric conduit pipe and plastic sheets, at the suggestion of my advisor Colin Orians. It was lightweight, and about 30 feet long and 20 feet wide. Basically, it was a parasail. So, one day early in my experiment a big thunderstorm came up. Actually, there was a tornado that day just a bit south in Great Barrington MA, which is highly unusual in that neck of the woods. Where I was, it was really really windy. You see it, right? The greenhouse becomes a great big sail, and wafts off the building in a gust. It ended up blocks away, almost downtown. #oops The plants got more than their fair share of water. Luckily, that happened to be one of the rare days when every plant was scheduled to get watered. So, I just had to clean up and build a new greenhouse, this time out of 2x4 wood beams, bolted to the brick wall of the building. Lesson learned. And when the next thunderstorm came, I got to watch my greenhouse dance and thrash but stay safely in place. Which is lucky, because it was much heavier and would have done real damage had it flown away and landed on something.

2) Youthful exuberance: At the end of the willow experiment, I had to wash the dirt off all the root balls to weigh below-ground investment. The hose was on the ground floor. The willows on the 4th floor roof. Maybe it was the 5th floor. What's the fastest way down? Gravity. So my friend (Freudian typo a moment ago, I wrote fiend) Brian Spitzer and I dropped them, one by one. Well over 100 large potted plants. It was AWESOME. And dropping them loosened the root balls up, making it easier to separate the soil from the roots. But when we got down, we realized (drum roll) that there was now a very large quantity of potting soil covering the biology faculty's parking lot. And there was no question where it would have come from and who was to blame. #oops

3) In the zone: In graduate school I dissected many hundreds of stickleback and made careful measurements of their morphology. To pass the time, I listened to every episode of This American Life. It was kind of meditative and relaxing. So much so, that I once worked for 5 straight hours, forgetting I had to TA a discussion section for Genetics, taught by the very severe Les Gottlieb, who many grad students were afraid of. I got along well with Les, but... this was not okay. He very calmly told me I messed up and suggested I get up in front of the whole class of 300 and apologize and offer to meet anyone who needed to catch up, whenever they needed to, for the next few weeks, as penance. It was mortifying to stand up and own my mistake in front of the whole crowd, many of whom were just a couple of years younger than me. But I did it. And to this day I carry that lesson from Les Gottlieb: we all mess up sometimes. It's not good. But we can compensate if we own it, apologize, and fix it and then go an extra leg to make up for it. And really, that's the point of this post. But if you want to keep reading, there are more funny and (for me) embarrassing examples, ending with a big one (literally) that went viral on Twitter.

4) Escape! I did a project in graduate school using experimental evolution with Drosophila. I had a dozen or two large tupperware cages, each about 0.5 by 1 meter, with a plastic lid that I sealed shut except to change trays of media. One day I realized a lid wasn't sealed well and a few flies had gotten out. Just a few, not enough to end the study. But they found a supply of apples that were stored in a neighboring office. The office belonged to National Academy of Sciences member Thomas Schoener, who some people fondly called "Conan the Ecologist" for his weight lifter's build. The flies reproduced, as flies are wont to do. He came back from a trip to find HUNDREDS of fruit flies in his office. And, like the potting soil from the willows, there was no question as to the source. He was not happy, to say the least, and I was mortified. I apologized in a long carefully-worded letter, and in person. And now we collaborate. It was terrifying but ultimately okay.

5) "Bolnick's Folly":  I have had a few experiments fail. My favorite is what my students call "Bolnick's Folly". I did a study in the mid-2000s documenting a pair of highly connected lake populations of stickleback, one large and one small lake joined by a short channel that fish easily passed through. It was a great system for studying migration-selection balance, and I thought "maybe I could prevent gene flow", and then track evolutionary divergence, then resume gene flow and watch the populations collapse again. Unreplicated, but uniquely cool test of migration-selection balance in the field. I still kind of want to do it. So, a tech (On Lee Lau) and a middle school science teacher (Tania Tanseem) and I spent a week of back-breaking labor installing a carefully designed fence to prevent gene flow, made from a combination of mosquito screen (prevent fry from moving across) and fencing and fence posts and lots of hardware. We installed it, and it looked clumsy but effective. A year later, however, I learned otherwise. The mosquito screen (multiple layers for safety) built up sediment. Water that would have flowed through the channel built up, and the pressure bent the thick metal fence posts right over in the middle. When we arrived the next spring, the whole thing was shredded and bent and destroyed. I just had to curse, then laugh, then clean up and move on.

Planning to experimentally stop gene flow through this channel between two lakes

Gene flow stopped

Gene flow resumed

6) Retraction: Here's the most painful one, really. I made a mistake in my R code in 2007, when I was first learning R. It made me think a result was significant, when it was definitely not. Basically I calculated (1-P) when I meant to calculate P. I published it. Then in 2016 someone said they couldn't replicate my stats (I had made the data public). I shared my code with him, and we together found the error. I instantly retracted the paper, and wrote a blog about it. That stimulated others to blog about their own retractions or publication errors. The whole experience turned out to be good, because by being honest and open and apologetic people thought it was a great example of how science should self-correct. I got lots of compliments, and even had an article about me in Wired magazine. But this one I don't laugh at. I still wince and shy away from it, but force myself to acknowledge it now and then, like the ancient mariner recounting his albatross (I hope everyone gets the literary reference, if not, go read the poem for cryin' out loud).

7) Denied: Another painful one. A prospective postdoc and I worked hard together to write an NSF grant to fund them (no, I won't reveal who or their gender even). I was busy so I left more of the fine details in their hands than I might have normally done, but the future postdoc was so incredibly organized and effective. I was really excited & confident. But when we finally got the grant submitted just before the deadline, the news broke: we had made a small red-tape error in how we budgeted for a collaborator (we had them file as a co-PI, whereas this RFP required that they be a subaward). The grant was returned, and in the year before we had another opportunity to apply for this RFP the prospective postdoc took a different position. (PS, if anyone wants to go in with me on a grant proposal to fund you as a postdoc, I have a mostly written proposal!)

8) Denying. As Editor In Chief I accidentally sent someone a decision letter meant for another author / paper. I immediately caught it, emailed them (before they even saw the incorrect rejection), and at my suggestion they deleted my mistake without even reading it. Apologies were accepted in good humor, and all was well.

9) Mistakes were made. I just moved to the University of Connecticut. UConn just started a brand new online purchasing system. New PI, new system, what could possibly go wrong. Well, I set out to purchase some supplies for our cell culture work. I put together the shopping list, and filled the cart on FisherSci online, and proceeded to checkout. Everything looked right. About $2,000 total. Okay. Then I moved the FisherSci cart over to HuskyBuy to pass it on to our purchasing folks. I thought it was moving through the system, but it turns out it was waiting for some information from me. A few weeks later I discovered that it was delayed waiting for me. I went to fill in some info (commodity code, delivery location, etc). And somehow, the 1 case of 10 ml, 1 case of 20 ml, 1 case of 50 ml pipette tips I wanted were turned into 285 cases. Of each. Some several times over. As best as I can reconstruct, 285 is the commodity code for research supplies. Somehow that got swapped with the number of cases. Not sure how. But whatever or whoever is to blame (maybe me, maybe the software, maybe gremlins)... I got an email saying there were 4 pallets of supplies waiting in the hall outside my lab. If you don't know, a pallet is one of those wooden squares. 4 feet on a side. These were overhanging with boxes. Hundreds of cases. 12,000 pounds. And apparently something like $14,000 worth of goods (not what I double-checked when I finished filling my cart). That was nearly a half million pipettes. Each is about a foot long. That's about 150 kilometers of pipettes. I could more than lay them end to end across the width of the state of Connecticut.

When I got in and saw what happened, my many years of mistakes kicked in: the panic subsided. There will be a solution. I emailed the FisherSci rep. To their immense credit (earning my long-lasting devotion), they are processing the return and refund, except for the few boxes I actually wanted. Panic turned to pragmatism turned to humor. My postdoc Lauren Fuess and I were laughing about this non-stop. Then I posted this picture on Twitter, and the whole thing went viral.

There were some really fun GIFs to go with it.

The best part of going viral with an embarrassing mistake is that people come out of the woodwork to tell the stories of their own crazy embarrassing mistakes. At the end of this post I provide a sample, just to drive home the point: anyone can make a big mistake. But it takes strength of mind to get pragmatic and fix it, apologize, and move on. So to you younger scientists: don't panic about your mistakes. Just fix them, own them, and move along.

10) I've not yet retired, so I'm sure there is a doozy to put in here sooner or later. They say we learn from our mistakes, and that's (mostly) true. But I, for one, keep finding new and creative ways to screw up. But that's life, and I'll just move on. Or, if it happens close enough to retirement time, I'll just go out of the field in a blaze of embarrassing glory.

Saturday, September 8, 2018

On simulation, and genealogies

I’m writing this post about what I think will be a really useful new technique for forwards-time population genetics simulations, tree-sequence recording. (“I” in this post is Peter Ralph; see the author lists on the papers below for who else has been working on this.) This isn’t just about an under-the-hood technical advance: tree-sequence recording enables a bunch of other neat analyses that let you get more out of your simulations, in addition to making simulations faster (by orders of magnitude if your simulations are big). I’ll explain how and why, below. I’m particularly excited because it finally lets us simulate reasonably large populations with lots of selection on a whole-chromosome scale, and life is so much better when I can simulate the processes I am thinking about.
Everything I’m talking about here is described in two recent preprints:
There’s also some good explanation and examples in the latest SLiM manual and in the msprime documentation.
The basic idea is that by recording the genealogy of the population as you go along, you don’t have to carry around neutral mutations - you can add them in afterwards. This makes things much faster, and also gives us all the genealogies! I’ve been wanting to do this for a long time, and probably quite a few other people have had the idea also, but to make it happen took several new ideas and a lot of hard work by several people.

When should I use forwards-time versus coalescent simulations?

Let me get one thing out of the way. A forwards-time, individual-based simulation is the simple, natural thing to do: the computer keeps track of digital organisms, has them reproduce, mutate, die, etcetera. However, population genetics often uses coalescent simulations, which work by tracing paths of inheritance back through time from your sampled individuals. This is much faster than a forwards simulation because you skip over a lot of irrelevant detail, such as possible ancestors that never had offspring. Producing genome sequences with a coalescent simulation is equivalent to a forwards-time, individual-based simulation, but only if (a) there are no selective differences between genetic variants, and (b) all your populations are large. (There are some additional caveats - it’s possible to do a coalescent simulation around a single selected locus, for instance - but that is pretty much true.) This means that if those two conditions aren’t satisfied – neutrality and large populations – then you can’t use a coalescent simulation. (The technical reason is that these conditions are necessary for the reverse-time process to be Markovian.) Condition (b) also means, for subtle reasons, that you can’t do coalescent simulations in continuous space.
So, coalescent simulations are useful if you’re willing to assume neutral evolution in large, randomly mating populations that exchange migrants, but if you want to get much more complicated than that, you need to use forwards-time simulation.
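
To make "tracing paths of inheritance back through time" a bit more concrete, here is a minimal sketch of the simplest possible coalescent in plain Python. It is not how msprime does it, and the function and variable names are made up for illustration. The assumptions are the ones above plus a few more: a single neutral, randomly mating population of pop_size haploid genomes, no recombination, and a population large enough that with k lineages remaining the waiting time to the next merger is roughly exponential with rate k(k-1)/2 divided by pop_size, measured in generations.

```python
import random

def coalescent_tree(n_samples, pop_size, seed=1):
    """Simulate one gene tree backwards in time for n_samples genomes.

    Returns (parent, node_time): parent maps each node to its ancestor node,
    and node_time gives each node's age in generations before the present.
    """
    rng = random.Random(seed)
    lineages = list(range(n_samples))        # lineages still being traced
    node_time = {i: 0.0 for i in lineages}   # samples live at time 0
    parent = {}
    time, next_id = 0.0, n_samples
    while len(lineages) > 1:
        k = len(lineages)
        # Any of the k(k-1)/2 pairs may find a common ancestor.
        rate = k * (k - 1) / 2 / pop_size
        time += rng.expovariate(rate)
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        parent[a] = parent[b] = next_id      # a and b merge into a new ancestor
        node_time[next_id] = time
        lineages.append(next_id)
        next_id += 1
    return parent, node_time

parent, node_time = coalescent_tree(n_samples=5, pop_size=1000)
root = max(node_time, key=node_time.get)
print("time to most recent common ancestor:", round(node_time[root]), "generations")
```

Notice that the loop runs only n_samples - 1 times no matter how big pop_size is: nobody outside the sample's ancestry is ever simulated, which is where the speed comes from.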

What is a tree sequence?

Suppose you’ve sampled a collection of genomes. At any location on the genome, your samples are related by a genealogical tree (the “gene tree” there). As you move along the genome, the tree changes because of recombinations in ancestors somewhere back in the tree. Since each recombination only changes a small bit of the tree, nearby trees along the genome are similar. A tree sequence is a particularly compact and useful way of storing that sequence of correlated trees. Once you’ve got the trees, it’s easy to keep track of who has which alleles by just marking on the trees where every mutation occurred. The tree sequence does this, too - it stores both all the gene trees, and all the genotypes. It stores the trees very efficiently because adjacent trees are highly correlated, and it stores genome sequence very efficiently using the trees.
This data structure was introduced by Jerome Kelleher, Alison Etheridge, and Gil McVean in Kelleher et al for the purposes of implementing the super-fast coalescent simulator msprime. Its properties are discussed in that paper, and also in this recent preprint. It has been subsequently extended in various ways, notably to allow storing information about polyploid individuals. See the msprime documentation for the definitive specification. In information content, a tree sequence is equivalent to an Ancestral Recombination Graph (or “ARG”), but Kelleher et al chose the name “tree sequence” to refer to the data structure representing that information.
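
Here is a toy illustration of the idea in plain Python, not the actual msprime data structure (see the msprime documentation for that). I am assuming the simplest representation that captures the point: a list of edges, each saying "this child inherited the stretch of genome from left to right from this parent". The Edge tuple and the local_trees function are names invented for this sketch.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["left", "right", "parent", "child"])

# Three sampled genomes (0, 1, 2) on a genome of length 10.
# Nodes 3, 4, and 5 are ancestors. A recombination at position 5
# changes which ancestor joins sample 2 to the other two samples.
edges = [
    Edge(0, 10, 3, 0),
    Edge(0, 10, 3, 1),
    Edge(0, 5, 4, 2),
    Edge(0, 5, 4, 3),
    Edge(5, 10, 5, 2),
    Edge(5, 10, 5, 3),
]

def local_trees(edges):
    """Yield (left, right, parent_map) for each stretch between breakpoints."""
    breakpoints = sorted({e.left for e in edges} | {e.right for e in edges})
    for left, right in zip(breakpoints, breakpoints[1:]):
        # An edge belongs to this tree if it spans the whole stretch.
        parent = {e.child: e.parent for e in edges
                  if e.left <= left and right <= e.right}
        yield left, right, parent

trees = list(local_trees(edges))
for left, right, parent in trees:
    print(f"[{left}, {right}): {parent}")
# [0, 5): {0: 3, 1: 3, 2: 4, 3: 4}
# [5, 10): {0: 3, 1: 3, 2: 5, 3: 5}
```

The two edges from node 3 to samples 0 and 1 are stored once even though they appear in both trees. That sharing between adjacent trees is the reason the format is so compact.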

Why can it make my simulation go faster?

The reason that storing extra information can speed up your simulation is that it lets you ignore a bunch of other stuff. In many simulations, most mutations are neutral, and neutral mutations - by definition - don’t affect the ongoing dynamics of the simulation in any way. That means that if you know all the gene trees, you can add the neutral mutations in after the fact, which saves you the trouble of producing lots of mutations that don’t end up segregating in the final sample, and keeping track of genotypes at all those locations as you go along. It turns out that keeping track of all those neutral mutations really slows down forwards-time simulators, and that adding mutations after the fact turns out to be extremely fast.
Tree sequence recording isn’t free, so it doesn’t always speed things up. If your simulation has no neutral mutations (or if you forget to turn them off), it won’t help at all. Generally speaking, tree sequence recording helps more for larger simulations, and slows down very small ones. In our tests, we find that simulations that take more than a few hours are typically 1-2 orders of magnitude faster. This lets us simulate tens of thousands of individuals with whole-chromosome-scale genomes - simulations that previously would have taken weeks now take hours.
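
As a rough sketch of what "adding mutations after the fact" means, here is a plain-Python version under some assumptions of mine: edges are (left, right, parent, child) tuples, node_time gives each node's age in generations, and neutral mutations fall as a Poisson process with rate mu per unit of genome per generation. The function names are hypothetical, and the real tools do this far more efficiently.

```python
import random

def poisson(lam, rng):
    """Draw a Poisson(lam) count by counting unit-rate arrivals before lam."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count

def add_neutral_mutations(edges, node_time, mu, seed=1):
    """Return a sorted list of (position, node) neutral mutations.

    Each edge gets a Poisson number of mutations with mean
    mu * (genome span of the edge) * (branch length in generations).
    """
    rng = random.Random(seed)
    mutations = []
    for left, right, parent, child in edges:
        branch_length = node_time[parent] - node_time[child]
        expected = mu * (right - left) * branch_length
        for _ in range(poisson(expected, rng)):
            # The mutation sits above `child`, somewhere on this stretch.
            mutations.append((rng.uniform(left, right), child))
    return sorted(mutations)

# A small made-up genealogy: samples 0, 1, 2 and ancestors 3, 4, 5.
edges = [(0, 10, 3, 0), (0, 10, 3, 1), (0, 5, 4, 2),
         (0, 5, 4, 3), (5, 10, 5, 2), (5, 10, 5, 3)]
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 3.0, 5: 2.5}

mutations = add_neutral_mutations(edges, node_time, mu=0.5)
print(len(mutations), "neutral mutations placed on the recorded genealogy")
```

The work here scales with the number of edges that survive to the end, not with the number of individuals who ever lived, which is why this step is cheap compared with carrying the mutations through the whole simulation.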

What else can I do with tree sequences?

The other really exciting thing about tree sequence recording is that, well, we end up with the tree sequence! That means that we don’t have to guess from genotypes how things are related; we can actually interrogate the gene trees themselves to find out. The msprime python package provides a lot of nice tools to work with tree sequences, and lets you iterate through all the trees in a genome very quickly. The tree sequence keeps all sorts of information: for instance, you can find out the ages of all the selected mutations, and identify any sites that had more than one mutation. You can see how well your local gene tree reconstruction method is doing, or whether those long IBD blocks are actually inherited from single common ancestors. Here are some other fun things you can do.
Remembering individuals: By default, the tree sequence you get at the end of a simulation contains the history of the entire final generation. This might be more information than you need, but the tree sequence format is so efficient that it doesn’t matter; you can select the samples you want in post-simulation analysis. You can also ask SLiM to “remember” other individuals along the way. Normally, information extraneous to the genealogies of the final generation is discarded along the way, but remembered individuals won’t be discarded - their histories will be kept in the tree sequence forever.
This lets us determine true local ancestry: for instance, you could introduce a few Neanderthals as remembered individuals into a simulated population of modern humans, and at the end ask which bits of genome in the final generation were inherited from those Neanderthals, just by seeing which branches of the trees trace through them. Or, you could remember entire populations at some point in the past, and then identify which segments of the final generation inherit from which ancestral population.
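
To show what "seeing which branches of the trees trace through them" could look like, here is a small sketch in plain Python. It assumes the local trees have already been extracted as (left, right, parent_map) triples, and the node numbers are invented for the example: node 2 plays the remembered Neanderthal genome and node 3 an ordinary ancestor. None of this is the pyslim or msprime API.

```python
def ancestry_segments(trees, samples, remembered):
    """For each sample, list the stretches of genome inherited through a
    remembered node, found by walking up each local tree from the sample."""
    segments = {s: [] for s in samples}
    for left, right, parent in trees:
        for s in samples:
            node = s
            while node in parent:            # climb towards the root
                node = parent[node]
                if node in remembered:
                    segments[s].append((left, right))
                    break
    return segments

# Sample 0 inherits [0, 4) from the remembered genome 2 and the rest from 3;
# sample 1 inherits everything from 3.
trees = [
    (0, 4, {0: 2, 1: 3}),
    (4, 10, {0: 3, 1: 3}),
]

print(ancestry_segments(trees, samples=[0, 1], remembered={2}))
# {0: [(0, 4)], 1: []}
```

The answer is exact rather than inferred, because the trees record what actually happened in the simulation.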
Recapitation: Forwards-time simulations have to start somewhere. But, you might ask, where did the first generation come from? Who were their parents? What are their genotypes? To avoid taking a stand on these questions, we generally try to simulate for long enough that their answers don’t matter - by the end of the simulation, the first generation is so far in the remote past that each of the initial individuals is an ancestor to either all or none of the final generation. In other words, all the trees in the final tree sequence have coalesced to a single ancestor. However, this can take quite a long time: in a population of size N, you have to simulate for something like 10N generations. One way that people have made this requirement less onerous is by using a (much faster) coalescent simulation to provide the initial generation with history and genotypes. As I discussed above, this isn’t quite right, but using the coalescent for deep-time history is a lot less wrong than using it for modern dynamics. For instance, the effects of geography in the remote past wash out, and the deep-time portions of a genealogy, even with strong population structure, are expected to look like an unstructured coalescent. And besides: if N is very large, probably your species’ population structure looked pretty different N generations ago.
It’s easy to imagine gluing a coalescent simulation to the top of a recorded tree sequence to provide the initial generation a genealogical history (and genotypes). But, we can go one step further: we can run a forwards-time simulation without providing the history of the first generation, and then, when it’s done, scan through the tree sequence looking for any uncoalesced segments of the genome. We can then perform a coalescent simulation using only those segments, thereby filling in only the portions of history necessary for understanding the final generation. Since those trees are missing their “heads”, which we then fill in, we call the method “recapitation”.
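
The "scan for uncoalesced segments" step can be sketched in a few lines of plain Python. I am again assuming local trees stored as (left, right, parent_map) triples, with made-up node numbers; the function names are hypothetical. A stretch of genome still needs a head whenever the samples there trace back to more than one root.

```python
def root_of(node, parent):
    """Follow parent links upward until reaching a node with no parent."""
    while node in parent:
        node = parent[node]
    return node

def uncoalesced_segments(trees, samples):
    """Return (left, right, roots) for every stretch of genome where the
    samples have not yet found a single common ancestor."""
    out = []
    for left, right, parent in trees:
        roots = {root_of(s, parent) for s in samples}
        if len(roots) > 1:
            out.append((left, right, sorted(roots)))
    return out

# On [0, 6) everyone coalesces in node 4. On [6, 10) sample 2 has no
# recorded ancestor, so nodes 2 and 3 are both roots there.
trees = [
    (0, 6, {0: 3, 1: 3, 2: 4, 3: 4}),
    (6, 10, {0: 3, 1: 3}),
]

print(uncoalesced_segments(trees, samples=[0, 1, 2]))
# [(6, 10, [2, 3])]
```

A coalescent simulation then only has to be started from the listed roots on the listed stretches, here lineages 2 and 3 on [6, 10), rather than from the whole first generation.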

How does tree sequence recording work?

Imagine that as the simulator proceeds, we ask it to write down, for each new genome, who its parents were, which segments of genome it inherited from them, and the locations and states of any new mutations. This clearly tells us everything we need to know: from this information, we can reconstruct the tree sequence for every individual, ever. This is precisely what we do, and this description of how segments of genome are inherited perfectly mirrors the tree sequence data structure developed for msprime. That was the first good idea that made this possible: the data structure, and the excellent set of algorithms and tools that Jerome Kelleher implemented for it. But, we still needed another good idea, because it turns out that the history of everyone, ever, is way too much data. If N=10,000 and you simulate for 10N generations, then “everyone ever” is 10^9 individuals. So, we developed an algorithm, called simplification, to discard the extra stuff. It works by tracing the ancestry of the individuals we are interested in back up through the tree sequence: everything that doesn’t get traced through is discarded, for instance. (In fact, the information this process discards is exactly the information that gets ignored by a coalescent simulation; the difference is that using this method lets us extract that minimal information from (much more flexible) forwards simulations.)
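
Here is a toy version of both halves, recording and discarding, in plain Python. It is a sketch under heavy assumptions and not the algorithm from the preprints: the model is a haploid Wright-Fisher population in which each new genome has exactly one crossover at an integer position, and the prune function only drops material that no sample inherits. Real simplification goes further (for example, it also removes ancestors that are merely passed through). All names are invented for the example.

```python
import random

def record_generations(n, generations, length, seed=1):
    """Run the toy model, writing down every inheritance as an edge
    (left, right, parent, child). Returns (edges, birth, final_generation)."""
    rng = random.Random(seed)
    birth = {i: 0 for i in range(n)}      # node id -> generation of birth
    current, edges, next_id = list(range(n)), [], n
    for g in range(1, generations + 1):
        offspring = []
        for _ in range(n):
            child, next_id = next_id, next_id + 1
            birth[child] = g
            a, b = rng.choice(current), rng.choice(current)
            x = rng.randrange(1, length)  # crossover point
            if a == b:
                edges.append((0, length, a, child))
            else:
                edges.append((0, x, a, child))
                edges.append((x, length, b, child))
            offspring.append(child)
        current = offspring
    return edges, birth, current

def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for left, right in sorted(intervals):
        if merged and left <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged

def prune(edges, birth, samples, length):
    """Keep only the pieces of edges that carry genome ancestral to samples."""
    ancestral = {s: [(0, length)] for s in samples}
    by_child = {}
    for edge in edges:
        by_child.setdefault(edge[3], []).append(edge)
    kept = []
    # Visit the youngest genomes first so that a node's ancestral material
    # is fully known before we pass it on to that node's parents.
    for node in sorted(birth, key=birth.get, reverse=True):
        segments = merge(ancestral.get(node, []))
        for left, right, parent, child in by_child.get(node, []):
            for seg_left, seg_right in segments:
                lo, hi = max(left, seg_left), min(right, seg_right)
                if lo < hi:
                    kept.append((lo, hi, parent, child))
                    ancestral.setdefault(parent, []).append((lo, hi))
    return kept

def total_span(edge_list):
    return sum(right - left for left, right, _, _ in edge_list)

edges, birth, samples = record_generations(n=20, generations=50, length=100)
kept = prune(edges, birth, samples, length=100)
print("recorded:", total_span(edges), "kept:", total_span(kept))
```

In the actual implementations this kind of cleanup happens periodically during the run rather than once at the end, so the recorded tables never grow to "everyone, ever".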

What is the software?

  • SLiM v3.1: SLiM now has full support for tree sequence recording, including recording mutations, remembering specified individuals, and detecting coalescence, and can output recorded tree sequences as .trees files for analysis in Python.
  • a fwdpy11 implementation: an implementation of tree sequence recording using fwdpp, used for benchmarking in this paper.
  • ftprime: a python implementation interfacing with simuPOP, also in this paper.
  • msprime, the python package: tools for working with tree sequences, in python.
  • pyslim: a python package for reading and manipulating the extra information that SLiM stores in tree sequences, mostly a thin wrapper around msprime.
  • tskit: all the tools above use, under the hood, the same set of C code for working with tree sequences, bundled with msprime. In the future, we plan to split this code from the coalescent simulator as a standalone library.

Prior work

While working on this project we came upon a number of previous attempts in this direction. Padhukasahasram, Marjoram, Wall, Bustamante, and Nordborg kept track of a rolling window of the eight previous generations, only tracking segments having descendants over that period. A program called AnA-FiTS by Aberer and Stamatakis stored genealogies to allow putting neutral mutations on afterwards, but without the important step of simplification.
And, while working on this project, I passed the discard pile from a lab down the hall, and stumbled upon the following 28-year-old unpublished thesis, titled “The Tree Simulator”:

 It was a hefty tome, mostly 300 pages in an obscure-to-me programming language. Was this a prior implementation of the very same project? Were we only treading in the footsteps of a previous Oregon evolutionary biologist? Fortunately, Ben Haller was visiting at the time, and is fluent in Mac Pascal, the language the work was written in. We were able to determine that, in fact, the program’s purpose was to simulate realistic pictures of trees for landscape architecture drawings.

Tuesday, September 4, 2018

Adapting to America

Exactly ten years ago, I moved to the US from Shanghai, China, after finishing my public school and college education, and started graduate school at Georgia Tech.  I am now a postdoc at the University of Pittsburgh, working with Martin Turcotte.  My research is on the eco-evolutionary dynamics in microbial and plant communities.  
In the past week, I exchanged ideas with friends to write this blog post on the challenges Chinese students face in pursuing graduate studies in America. Chinese students have recently become the largest international student group in many American research universities. As I talked with other Chinese students, I recognized three unique challenges that we have been facing and fighting over the past decade and that have greatly impacted our research and life. We hope this post will be helpful for newcomers and their advisors and mentors.

Communicating the Chinese way or the American way?
I used to believe that there is only one way to interact with people: “My” way. However, communicating the Chinese way and the American way are quite different: the Chinese way involves subtle hints, and the American way is more straightforward. I learned this the hard way. One of my committee members complained that I had been ignoring him over the years we worked together. I eventually realized that Chinese and Americans start a conversation in different ways. Between two Chinese people, the talk-initiator looks at the other person; when that person looks back and eye contact is established, talking starts naturally. Between two Americans, people begin the conversation by calling the other person’s name.
My name was rarely called in conversations with others; the committee member could not remember the correct pronunciation of my name. In turn, with no visual sign demanding my attention, I did not realize on many occasions that he wanted to talk to me. Over the years, I noticed many such small but less-than-cheerful points, and I cautiously and intentionally shifted toward the American way of communicating, much like the creatures in my research that adapt to their environment.
In academic and research communications, our previous education, unfortunately, does not prepare us enough to express ideas effectively. Even though we already speak fluent English when we land in America, communication problems sneak in with us and sprout in the alien cultural environment. One of my friends was criticized by his advisor for being terrible at critical thinking and a slow writer, while we think the culprit was a mismatch in the logic flow of writing. Connectors (e.g., although, because, thus, until, etc.) are key to creating clear sentences in English, while they are not essential in Chinese writing. Struggling to apply effective organization skills in writing is a common challenge for us Chinese students.
I would like to share a piece of advice from my PhD advisor, who also came to the US as an international student many years ago. He suggested that when listening to a seminar or a conversation between other people, or reading a paper, I pay attention to the organization of the content in addition to the content itself.

Taking selfies is quite a Chinese thing.  Most of us have a few pictures of ourselves on our phones.  These are me, and I am not narcissistic.

Whom do we socialize with?
I agree with all of my friends that the majority of friends we have here are Chinese, and we are not exposed enough to American culture. Even though we grew up with American movies, TV dramas, and music, when we are completely stuck in America, we are more comfortable hanging out with Chinese friends in our spare time.
Common social activities differ drastically between Chinese and American cultures. Typical Chinese students do not enjoy social hours, parties, or department retreats here. To be honest, I go to these activities as if they were part of my work. Several reasons make it hard for Chinese students to socialize with Americans. Personally, I am introverted, and it exhausts me to talk to strangers. From others, I have heard complaints about not knowing the politicians and celebrities other people mention, or just about the bad taste of beer.
One consequence of not being as social as other groups is that Chinese students get less practice at finding openly available resources. I was not surprised to learn that quite a lot of Chinese students had no idea about the communication and career development courses and services which, though not common in China, are available in most American universities.
Then, how can we improve our social skills? Most of us agree, from our own experiences, that we should keep going to all the department social events and all the private ones we are invited to. We brand ourselves, establish connections with new friends, and improve our interpersonal and communication skills. In addition, I personally recommend that we invite people from different backgrounds to activities we enjoy and share our joy. For example, I am good at finding new fancy restaurants. I have made a good number of international “foodie” friends who can enjoy food and talk about science together.

Karaoke is a common social activity that many Chinese enjoy.  This picture from @dailylifeofahvia describes different Karaoke singers you will meet.
No standard answer.  What should we do?
One stereotype of Chinese students is that we show great interest in, and even respect for, “Standards”. We are eager to find out the standard answer to a test question, the standard routine for an experimental protocol for problems in research, and the standard procedure for manuscript preparation. Many Chinese students are champions in standardized exams and get accepted by graduate schools like elite athletes. This worship of standards has created misunderstandings between our advisors and us. Some of my friends felt that their advisors were not supportive enough because they did not provide standard protocols for their research. In turn, the advisors felt that the students were not proactive enough in learning and research.
I learned the “no standard answer” mode in my qualifying exams, one of the biggest headaches for most first- and second-year PhD students. At Georgia Tech, the first part (written exam) of the exam was writing essays on several topics. I was distressed before the exam not only because of the heavy reading and writing loads in my second language, but also because of my anxiety about the openness of the topics.
I recall that one question in my written exam was to discuss the connections between the metapopulation and metacommunity theories. I reviewed all the papers I could remember and argued that they basically dealt with the same issue but at different levels: interactions between subtypes in populations versus between species in communities.
Two of my committee members read and commented on my essay. Both of them gave my essay a “high pass”, but with opposite comments. One agreed with my argument and pointed me to several papers I did not cite for further reading; the other one, however, seemed unhappy with my answer and summarized several distinctions between the two theories in the comment, making the “high pass” beyond my expectation. That was when I realized that committee members did not have a standard answer in mind when they wrote the exam, and that we were judged not against a standard answer but on our ability to interpret the information we had acquired. During my time in graduate school, I found that a simple and standard answer is a rare thing, especially in research. Whenever there is one, I ought to find it myself and share it with the scientific community.
All of my Chinese friends prospering in the US have their own stories of getting out of a fixed mindset of standards. We have thriving creativity inside us. Sometimes, we just need a little bit of encouragement.

In this blog, I did not make generalizations over other international groups due to the above-mentioned fact that I have limited social circles.  I believe students from other countries are struggling with similar problems emerging from language barriers and cultural differences.  We are happy to hear their stories. 
I would like to thank Na Wei (University of Pittsburgh), Qixin He (University of Chicago), Tianze Song (Georgia Tech), Xia Hua (Australian National University, graduated from Stony Brook University), Zhongyun Huang (Roche Pharmaceuticals, graduated from the University of Massachusetts), and Chenxi Yang (homemaker from Boston, graduated from Brown University) for inspiring me on this blog post.

A 25-year quest for the Holy Grail of evolutionary biology

When I started my postdoc in 1998, I think it is safe to say that the Holy Grail (or maybe Rosetta Stone) for many evolutionary biologists w...