Tuesday, December 21, 2021

Individual Development Plans

Many institutions encourage the use of an Individual Development Plan (IDP) to guide conversations between faculty and their lab members (postdocs, grad students, technicians, undergraduates). There are a variety of examples available on the web. After reviewing a number of these, I've collated what I found most helpful to create this document that my lab will use as a standard form for conversations about career goals and training needs:

Individual Development Plan

Instructions: Please create a duplicate copy of this document and fill out whatever parts are most helpful to you, to a level of detail that suits you. This document is for you first and foremost; it is not meant to be onerous busywork, but rather to give you a chance to reflect, plan, and discuss the resulting insights and plans with your mentor.




Reviewed and discussed with ____<mentor name>________  on  _____<date>________.

Trainee signature:__________________

Mentor signature:__________________


  1. What type of job do you aspire to have in 5-10 years? Why? What do you find rewarding about this choice?

  2. What do you see as an alternative job path, if you have any in mind? What situations might move you to adopt this alternate path?

  3. Regarding research, what long-term scientific outcomes (if any) do you wish to achieve in your career?

  4. What skills, knowledge, or products have your previous training and work given you that help you toward achieving goals (1) or (2)?

  5. What do you see as your primary weaknesses that limit your ability to achieve your career aspirations?


For each of the following areas, describe your current strengths, what needs to be improved, and what skills you lack but feel are needed. These may include coursework needs/plans.

  1. Discipline-specific conceptual knowledge (e.g., literature reading). What fields are you most familiar with, and what fields / topics are you weakest in but feel you need to master? We will use this to plan some reading assignments.

  2. Laboratory skills (if applicable)

  3. Field work skills (if applicable)

  4. Computational biology skills (if applicable)

  5. Statistical analysis

  6. Graphical presentation of data

  7. Writing manuscripts

  8. Grant writing

  9. Oral presentation & conference attendance. Have you given posters/talks at conferences? What was this experience like? What could be improved? What conferences do you feel you should be attending, and why?

  10. Networking: what do you need to do to improve your academic network? Who should this network include, why, and how will you connect with these people?

  11. Leadership, mentoring, and project management

  12. Responsible Conduct of Research: Ethics, Animal Care, Data Archiving, Open Code, Permitting, etc.

  13. Professionalism: What professional development skills do you feel you lack or need to know more about?

  14. What are you doing to improve the social / cultural setting in which you work, including promoting Diversity, Equity, and Inclusion?

  15. Health and wellness. What do you do to maintain a healthy work-life balance to decrease the risk of burnout? You do not need to list specifics, but you should reflect on how you maintain personal physical and mental health, and healthy relationships with friends, partners, family, and colleagues.


Goals should be SMART: Specific, Measurable, Achievable, Relevant, and Time-bound.

  1. List and briefly describe your research goals for the coming 12 months. What biological questions are you asking, and what product(s) do you hope to generate?

  2. List and briefly describe your training goals for the coming 12 months, given your self-assessment above. What are your priorities for improvement, and how do you plan to achieve those improvements? These may include readings, professional development courses, etc.

  3. How will we determine, in 12 months, whether you have succeeded at these goals?

  4. For graduate students: what courses do you plan / need to take?

Optional: Mock Job Application

For postdocs: You may wish to develop a draft job application for a generic job that you anticipate applying for, and share that with your mentor and fellow lab members for feedback. A typical job application includes:

  1. A cover letter, one page, briefly articulating who you are, your expertise, and why you are applying for this position.

  2. A 2-3 page research statement articulating your past achievements, ongoing work, and ending with a 5-year vision for your research directions. Citations and figures may be included.

  3. A ~2 page teaching experience and philosophy statement, covering your approach to teaching, the courses you might be qualified to teach, what you have taught in the past, any pedagogical training experiences, and experience mentoring undergraduates or others.

  4. A 1-2 page Diversity statement articulating what you have done, are doing, or plan to do to advance diversity and equity in the classroom, lab, and science more broadly.

Friday, December 3, 2021

Guidelines for archiving data AND code

The following is a cross-post from the Editor's blog of The American Naturalist, developed with input from various volunteers (credited below).



December 1, 2021

Daniel I. Bolnick (daniel.bolnick@uconn.edu), Roger Schürch (rschurch@vt.edu), Daniel Vedder (daniel.vedder@idiv.de), Max Reuter (m.reuter@ucl.ac.uk), Leron Perez (leron@stanford.edu), Robert Montgomerie (mont@queensu.ca)

Starting January 1, 2022, The American Naturalist will require that any analysis and simulation code (R scripts, Matlab scripts, Mathematica notebooks) used to generate reported results be archived in a public repository (we specifically prefer Dryad, see below). This has been our recommendation for a couple of years, and author compliance has been very common. As part of our commitment to Open and Reproducible Science, we are transitioning to make this a requirement. The following document, developed with input from a variety of volunteers, is intended to be a relatively basic guide to help authors comply with this new requirement.




The fundamental question you should ask yourself is, “If a reader downloads my data and code, will my scripts be comprehensible, and will they run to completion and yield the same results on their computer?” Any computer code used to generate scientific results should be readily usable by reviewers or readers. Sharing this information is vital because it promotes: (i) the appropriate interpretation of results, (ii) checking the validity of analyses and conclusions, (iii) future data synthesis, (iv) replication, and (v) use as a teaching tool for anyone learning to do analyses themselves. Shared code provides greater confidence in results.

The recommendations below are designed to help authors conduct a final check when finishing a research project, before submitting a manuscript for publication. In our experience, you will find it easier to build reusable code and data if you adhere to these recommendations from the start of your research project. We separately list requirements and recommendations in each category below.




  A great template is available here: https://github.com/gchure/reproducible_research




Checklist for the README file

Prepare a single README file with important information about your data repository as a whole (code and data files). Text (.txt or .rtf) and Markdown (.md) README files are readable by a wider variety of free and open-source software tools, so they have greater longevity. The README file should simply be called README.txt (or .rtf or .md). That file should contain, in the following order:

Citation to the publication associated with the datasets and code 

Author names, contact details

A brief summary of what the study is about 

 Identify who was responsible for collecting the data and writing the code.

List of all folders and files by name, and a brief description of their contents. For each data file, list all variables (e.g., columns) with a clear description of each variable (e.g., units)

Information about versions of packages and software used (including operating system) and dependencies (if these are not installed by the script itself). An easy way to get this information is to use sessionInfo() in R, or 'pip list --format freeze' in Python.
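The version information in the last item above can be captured programmatically. As one minimal, illustrative sketch (not a required tool), this Python snippet prints the operating system, interpreter version, and installed packages in pip-freeze style, ready to paste into a README:

```python
import importlib.metadata
import platform
import sys

# Record the operating system and interpreter version for the README
print("OS:", platform.platform())
print("Python:", sys.version.split()[0])

# Collect every installed package with its version (pip-freeze style)
versions = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in importlib.metadata.distributions()
)
print("\n".join(versions))
```

In R, sessionInfo() produces the equivalent summary in one call.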


RECOMMENDED (for inclusion in the README file):

  Provide workflow instructions for users to run the software (e.g., explain the project workflow, and any configuration parameters of your software)

 Use informative names for folders and files (e.g., “code”, “data”, “outputs”)

  Provide license information, such as Creative Commons open source license language granting readers the right to reuse code. For more information on how to choose and write a license, see choosealicense.com. This is not necessary for DRYAD repositories, as you choose licensing standards when submitting your files.

 If applicable, list funding sources used to generate the archived data, and include information about permits (collection, animal care, human research). This is not necessary for DRYAD repositories, as it is also recorded when submitting your files.

  Link to Protocols.io or equivalent methods repositories where applicable



Checklist for preparing code

REQUIRED:

  Scripts should start by loading required packages, then importing raw data from files archived in your data repository.

  Use relative paths to files and folders (e.g. avoid setwd() with an absolute path in R), so other users can replicate your data input steps on their own computers. 

 Annotate your code with comments indicating the purpose of each set of commands (i.e., “why?”). If the functioning of the code (i.e., “how?”) is unclear, strongly consider rewriting it to be clearer/simpler. In-line comments can provide specific details about a particular command.

 Annotate code to indicate how commands correspond to figure numbers, table numbers, or subheadings of results within the manuscript.

  If you are adapting other researchers' published code for your own purposes, acknowledge and cite the sources you are using. Likewise, cite the authors of packages that you use in your published article.
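Several of these practices (relative paths, "why" comments, and section headers tied to a figure) can be seen in one minimal Python sketch. The file and column names here are hypothetical, and a temporary stand-in file replaces the archived data so the sketch runs end to end; in a real repository, the script would read the archived .csv directly via a relative path:

```python
import csv
import tempfile
from pathlib import Path

# In a real repository the raw data would live at a relative path such as
# data/wing_measurements.csv (hypothetical name); a tiny stand-in file is
# created here so the sketch runs end to end.
data_dir = Path(tempfile.mkdtemp())
data_path = data_dir / "wing_measurements.csv"
data_path.write_text("individual,wing_length\n1,12.3\n2,11.8\n")

# --- Figure 1: wing length distribution --------------------------------
# Why: summarize the raw wing lengths before any model fitting
with data_path.open() as fh:
    wing_lengths = [float(row["wing_length"]) for row in csv.DictReader(fh)]

print(wing_lengths)
```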



RECOMMENDED:

  Test code before uploading to your repository, ideally on a pristine machine without any packages installed, but at a minimum using a fresh session.

  Use informative names for input files, variables, and functions (and describe them in the README file).

  Any data manipulations (merging, sorting, transforming, filtering) should be done in your script, for fully transparent documentation of any changes to the data.

  Organise your code by splitting it into logical sections, such as importing and cleaning data, transformations, analysis, and graphics and tables. Sections can be separate script files run in order (as explained in your README), blocks of code within one script separated by clear breaks (e.g., comment lines, #--------------), or a series of function calls (which can facilitate reuse of code).

  Label code sections with headers that match the figure number, table number, or text subheading of the paper.

  Omit extraneous code not used for generating the results of your publication, or place any such code in a Coda at the end of your script.

  Where useful, save and deposit intermediate steps as their own files. Particularly, if your scripts include computationally intensive steps, it can be helpful to provide their output as an extra file as an alternative entry point to re-running your code. 

  If your code contains any stochastic process (e.g., random number generation, bootstrap re-sampling), set a random number seed at least once at the start of the script or, better, for each random sampling task. This will allow other users to reproduce your exact results.

  Include clear error messages as annotations in your code that explain what might go wrong (e.g., if the user gave a text input where a numeric input was expected) and what the effect of the error or warning is.
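The random-seed recommendation can be sketched in a few lines of Python (the seed value and data are arbitrary placeholders; only the pattern of seeding before sampling matters):

```python
import random

# Set a random seed once at the start so other users can reproduce the
# exact bootstrap draws (seed value is arbitrary but must be fixed)
random.seed(20211201)

data = [4.2, 5.1, 3.8, 4.9, 5.5]  # hypothetical measurements

# One bootstrap resample: draw len(data) values with replacement
resample = random.choices(data, k=len(data))
print(resample)
```

Re-running the script from the top always yields the same resample; in R, set.seed() plays the same role.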



Checklist for preparing data to upload to DRYAD (or other repository)


Repository contents 


  All data used to generate a published result should be included in the repository. For papers with multiple experiments or sets of observations, this may mean more than one data file.

 Save each file with a short, meaningful file name and extension (see DRYAD recommendations here).

 Prepare a README text file to accompany the data. Our recommendation is to put this in the single README file described above. For complex repositories where this README becomes unmanageably long, you may opt to create a set of separate README files, with one master README and more specific README files for code and for data; but our preference is one README. The README file(s) should provide a brief overall description of each data file's contents, and a list of all variable names with explanations (e.g., units). This should allow a new reader to understand what the entries in each column mean and relate this information to the Methods and Results of your paper. Alternatively, this may be a “Codebook” file in table format, with each variable as a row and columns providing variable names (as in the file), descriptions (e.g., for axis labels), units, etc.

 Save the README file(s) as text (.txt) or Markdown (.md) files.

 Save all of the data files as comma-separated values (.csv) files. If your data are in EXCEL spreadsheets you are welcome to submit those as well (to be able to use colour coding and provide additional information, such as formulae), but each worksheet of data should also be saved as a separate .csv file.


 We recommend also archiving any digital material used to generate data (e.g., photos, sound recordings, videos, etc), but this may require too much storage space for some repository sites. At a minimum, upload a few example files illustrating the nature of the material and a range of outcomes. We recognize that some projects entail too much raw data to archive all the photos / videos / etc in their original state.

Data file contents and formatting  


 Archived files should include the raw data that you used when you first began analyses, not group means or other summary statistics; for convenience, summary statistics can be provided in a separate file, or generated by code archived with the data.

 Identify each variable (column names) with a short name. Variable names should preferably be <10 characters long and not contain any spaces or special characters that could interfere with reading the data and running analysis code. Use an underscore (e.g., wing_length) or camel case (e.g., WingLength) to distinguish words if needed.


 Omit variables not analyzed in your code.

 A common data structure is to ensure that every observation is a row and every variable is a column.

 Each column should contain only one data type (e.g., do not mix numerical values and comments or categorical scores in a single column).

  Use “NA” or equivalent to indicate missing data (and specify what you use in the README file).
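These formatting conventions are easy to verify programmatically. A minimal Python sketch (the table and column names are hypothetical) shows a tidy layout, one row per observation and one column per variable, with "NA" as the agreed missing-data code, plus a simple check that each value is either numeric or exactly "NA":

```python
import csv
import io

# Tiny inline example of a tidy table: one row per observation, one column
# per variable, and "NA" as the agreed code for missing values
raw = """individual,site,wing_length
1,lake,12.3
2,stream,NA
3,lake,11.8
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Check that every wing_length entry is either numeric or exactly "NA"
for row in rows:
    value = row["wing_length"]
    assert value == "NA" or float(value) > 0
print(len(rows), "observations checked")
```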



 Upload your data and code to a curated, version-controlled repository (e.g., DRYAD, Zenodo). Your own GitHub account (or another privately or agency-controlled website) does not qualify as a public archive, because you control access and might take down the data at a later date.

 Provide all the metadata and information requested by the repository, even if this is optional and redundant with information contained in the README files. Metadata makes your archived material easier to find and understand.

 From the repository, get a private URL and provide this on submission of your manuscript so that editors and reviewers can access your archive before your data are made public.


 Prepare your data, code, and README files before or during manuscript preparation (analysis and writing).



More detailed guides to reproducible code principles can be found here:

Documenting Python Code: A Complete Guide - https://realpython.com/documenting-python-code/

A guide to reproducible code in Ecology and Evolution, British Ecological Society: https://www.britishecologicalsociety.org/wp-content/uploads/2019/06/BES-Guide-Reproducible-Code-2019.pdf?utm_source=web&utm_medium=web&utm_campaign=better_science 

Dockta, a tool for building containerized code environments: https://github.com/stencila/dockta#readme

Version management for python projects:  https://python-poetry.org/

Principles of Software Development - an Introduction for Computational Scientists (https://doi.org/10.5281/zenodo.5721380), with an associated code inspection checklist (https://doi.org/10.5281/zenodo.5284377).

Style Guide for Data Files

 See the Google R style guide (https://google.github.io/styleguide/Rguide.html) and the Tidyverse style guide (https://style.tidyverse.org/syntax.html#object-names) for more information.

Google style guide for Python: https://google.github.io/styleguide/pyguide.html




The American Naturalist requests that authors use DRYAD or Zenodo for their archives when possible, for several reasons:

· DRYAD and Zenodo are curated, which means there is some initial checking by DRYAD for completeness and consistency in both the data files and the metadata. DRYAD requires some compliance before they will allow a submission.

· We find it much easier and more convenient to download repositories from DRYAD/Zenodo than to search the manuscript, etc., for the files or repository.

· Files on DRYAD/Zenodo cannot be arbitrarily deleted or changed by authors or others after publication. DRYAD will allow changes if a good case can be made; all changes are documented and all versions are retained.

· DRYAD is free for Am Nat authors; we have a good working relationship with them, and they take our suggestions for improvement seriously.

· Editors, reviewers, and authors will all become familiar with the workings of DRYAD.

· DRYAD and Zenodo are now linked: DRYAD is for data files and Zenodo for everything else (code, PDFs, etc.). You need only upload all files to DRYAD, and they will separate your archive into the appropriate parts. Your DRYAD repository provides a link to the files on Zenodo, and vice versa.

A 25-year quest for the Holy Grail of evolutionary biology

When I started my postdoc in 1998, I think it is safe to say that the Holy Grail (or maybe Rosetta Stone) for many evolutionary biologists w...