Over more than a decade of teaching the Bodega Bay Workshop in Applied Phylogenetics, we’ve started developing a list of suggestions that we think will help ensure a successful career in phylogenetics. Drop us a line if you have any suggested additions to our list!
Use a Text Editor
Ironically, the most advanced programs in phylogenetics often require the simplest input: plain text files. To produce such files, you should never use modern word processors such as Microsoft Word or Apple Pages. Don’t even use the simpler text editors that were included in the base install of your operating system (e.g., TextEdit in Mac OSX or Notepad in Windows). All of these programs have a tendency to introduce hidden formatting instructions that will prevent you from using the resulting files in programs that have exacting syntax and formatting requirements (Figs. 1 & 2). To prepare text files for input into phylogenetics applications, you should obtain either TextWrangler (Mac OSX) or TextPad (Windows). When using these programs, be sure to turn on the “View invisibles” feature so that you can see potentially hidden formatting problems that might gum up your analyses. Using these programs will save you an enormous amount of trouble. They are also easy to use and full of useful features (try, for example, holding the option key to select columns of text in TextWrangler). Another powerful text editor is gedit, which works on any platform (Mac, PC, or Linux).
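If you want to check a file without opening an editor, a short script can do roughly what “View invisibles” does. The Python sketch below is only an illustration: the list of suspect characters (carriage returns, non-breaking spaces, curly quotes, a byte-order mark) is our assumption about what hidden formatting typically looks like, and the function names are hypothetical.

```python
# Characters that commonly sneak into "plain text" files.
# This list is illustrative, not exhaustive.
SUSPECT_CHARS = {
    "\ufeff": "byte-order mark",
    "\u00a0": "non-breaking space",
    "\u201c": "left curly double quote",
    "\u201d": "right curly double quote",
    "\u2018": "left curly single quote",
    "\u2019": "right curly single quote",
    "\r": "carriage return (Windows or classic Mac line ending)",
}


def find_hidden_characters(text):
    """Return (line_number, column, description) for each suspect or non-ASCII character."""
    problems = []
    # Split on "\n" only, so that stray carriage returns stay visible.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if char in SUSPECT_CHARS:
                problems.append((line_number, column, SUSPECT_CHARS[char]))
            elif ord(char) > 127:
                description = f"non-ASCII character U+{ord(char):04X}"
                problems.append((line_number, column, description))
    return problems


def check_file(path):
    """Read a file without newline translation and report suspect characters."""
    with open(path, encoding="utf-8", newline="") as handle:
        return find_hidden_characters(handle.read())


if __name__ == "__main__":
    # A small made-up sample with a Windows line ending, a non-breaking
    # space, and curly quotes.
    sample = "#NEXUS\r\nbegin data;\u00a0\n  title \u201cmy data\u201d;\nend;\n"
    for line_number, column, description in find_hidden_characters(sample):
        print(f"line {line_number}, column {column}: {description}")
```

Whether a given character is actually a problem depends on the program that will read the file; some applications accept Windows line endings, for example, while others do not.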
Don’t Be Intimidated by Command Line Applications
We all love programs with beautiful graphical user interfaces (GUIs). We should all use a few moments of the time these interfaces have saved us to thank the developers who created them. New and highly specialized analyses, however, are often only available in the form of somewhat-more-difficult-to-use command line applications. If you are going to do phylogenetics right, you must learn to use these applications. Early in the process of learning how to use command line software, you should do yourself a favor and learn the basics of UNIX syntax and file system structure. You will need them for even simple tasks such as locating your input files or checking how large your output files are. If you put in a bit of time on the front end, you’ll soon find that command line applications actually have many advantages. For example, command line applications are more likely to let you save specific sets of commands that can be reused to automate subsequent analyses. The first time you dig through menus in a GUI application to delete a taxon, it may seem convenient, but the hundredth time, you will probably want to just type “del 3” instead. See also Phylogenetics on Linux.
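The two tasks mentioned above, locating files and checking their sizes, can also be scripted once you understand how paths work. The following Python sketch uses only the standard library; the function names and the “*.tre” file pattern are hypothetical choices for illustration, not conventions required by any particular program.

```python
from pathlib import Path


def list_files_with_sizes(directory, pattern="*"):
    """Return sorted (path, size_in_bytes) pairs for files under directory matching pattern."""
    root = Path(directory)
    return sorted(
        (path, path.stat().st_size)
        for path in root.rglob(pattern)  # rglob searches subdirectories too
        if path.is_file()
    )


def human_readable(size_in_bytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    size = float(size_in_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Hypothetical example: report every tree file below the current directory.
    print("Working directory:", Path.cwd())
    for path, size in list_files_with_sizes(".", "*.tre"):
        print(f"{human_readable(size):>12}  {path}")
```

Knowing your working directory matters because relative paths such as “.” are resolved against it, which is a common source of “file not found” errors in command line programs.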
Learn a Scripting Language
As the size and complexity of phylogenetic analyses grow, it becomes increasingly difficult (if not impossible) to curate data, produce input files, and summarize results by hand. The ability to write short programs that can automate some of these tasks will make your life much easier. Perl, Python, and R are all good choices of programming languages to start with. Wondering what language to learn? Trying to improve your computer savvy? See Becoming a programmer.
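As a taste of what such a short program might look like, here is a Python sketch that reads sequences in FASTA format and checks whether they all have the same length, a quick sanity check before building an input file. The choice of FASTA and the function names are our assumptions for illustration; real projects would often use an established library instead of a hand-written parser.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping sequence name to sequence."""
    chunks_by_name = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line with no sequence name")
            name = fields[0]  # use the first word of the header as the name
            if name in chunks_by_name:
                raise ValueError(f"duplicate sequence name: {name}")
            chunks_by_name[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks_by_name[name].append(line)
    return {name: "".join(chunks) for name, chunks in chunks_by_name.items()}


def summarize_alignment(sequences):
    """Report the number of taxa, each sequence length, and whether all lengths match."""
    lengths = {name: len(seq) for name, seq in sequences.items()}
    return {
        "n_taxa": len(sequences),
        "lengths": lengths,
        "aligned": len(set(lengths.values())) <= 1,
    }


if __name__ == "__main__":
    # A tiny made-up data set; taxon_C is one site short.
    example = ">taxon_A\nACGT\nACGT\n>taxon_B\nACGTACGT\n>taxon_C\nACGTACG\n"
    summary = summarize_alignment(parse_fasta(example))
    print(summary["n_taxa"], "taxa; equal lengths:", summary["aligned"])
    for name, length in summary["lengths"].items():
        print(f"{name}\t{length}")
```

Checking a few dozen sequences by eye is feasible; checking a few thousand is not, which is where even a script this small starts to pay off.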
Learn New Programs
If you’re going to do modern phylogenetic analyses, you are going to need to constantly learn how to use new programs. Some programs are not easy to use. Some take weeks, months, or even years to master. You cannot allow this to lead you on a detour of convenience that results in the use of inappropriate analyses. It is your responsibility as a scientist to do the best analyses possible. Don’t be lazy: when you stop learning new programs, you stop doing modern phylogenetics. After you’ve learned a new program, come back to this site and post a tutorial to help your fellow phylogeneticists.
Ask Questions of Developers (After You Have Some Clue What You’re Talking About)
One surprisingly common impediment to progress is the reluctance of end-users to seek support, or their inability to get this support when it’s needed. One problem is that developers tend to become unresponsive (or annoyed) when users of their applications constantly ask them questions that are mundane or naive. To a degree, this is reasonable. These people have already put a ton of time into helping you, and are justified in recoiling if you seem unwilling to put the same time into helping yourself. As a matter of respect, you should read over the instructions and try some basic troubleshooting on your own before asking the developer for help directly. We don’t learn programs with someone holding our hand; we learn them primarily through experimentation, trial, and error. Having said this, of course, some developers may deserve a bit of hassling if they haven’t taken even the most basic measures to make their software accessible to the public. Moreover, it’s important to remember that most developers are your colleagues and are eager to communicate with informed users of their applications. They’re even hoping for your help spreading their methods, extending their application, and catching bugs. Developers also (generally) appreciate detailed error reports and emails of files that consistently reproduce an error. You’re much more likely to get help this way (and developers aren’t going to “steal” the file you send them and publish it, which is a concern some users seem to have). Consider reading this to help understand how developers like to see questions structured. Providing details (like program and error output) will help you get answers from developers and mailing lists more quickly.
Subscribe to Help Mailing Lists for Software You Commonly Use
You’ll learn from questions that others ask and stay abreast of the latest updates. It is usually preferable to post your question to the appropriate list rather than to contact the developer directly.
Common mailing lists for phylogenetic methods:
r.sig.phylo: The phylogenetics mailing list for the R language and its packages.
BEAST Mailing List
MrBayes Mailing List
Don’t Rush or Half-Ass Your Analyses
Reconstructing phylogenetic trees is one of the most challenging problems in biology (under many optimization criteria, it’s part of a class of extremely hard problems). You are not going to learn how phylogenetic algorithms work and implement them in a single afternoon. This has led many people to use relatively simple methods that are available in easy-to-use programs. Don’t fall into this trap.
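One way to get a feel for why the problem is so hard is to count how many trees there are to choose from. For n labeled taxa, the number of distinct unrooted, fully bifurcating tree topologies is the double factorial (2n - 5)!!, that is, 3 × 5 × 7 × … × (2n - 5). The Python sketch below simply evaluates this standard formula; the function name is hypothetical, and the count illustrates the size of the search space rather than defining the problem class mentioned above.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating topologies for n_taxa labeled tips: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (an empty product when n_taxa == 3).
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n_taxa in (4, 5, 10, 20, 50):
        print(f"{n_taxa:>3} taxa: {count_unrooted_trees(n_taxa):.3e} unrooted trees")
```

With only 10 taxa there are already 2,027,025 possible unrooted trees, and the count grows faster than exponentially from there, so exhaustively scoring every tree quickly becomes impossible and programs must rely on carefully designed search strategies.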