Tuesday, 15 January 2013

My Essential PhD Toolkit

With three years of this PhD under my belt, I have a good grasp of which tools I couldn't live without. I'm thankful that my colleagues at U.Laval and DUC have always been great about sharing new software, tips, tricks, etc. Here's my attempt to pay it forward.

R
What is it: An open source programming language and software environment for statistical computing and graphics (Wikipedia)
How I use it: I perform all data compilation, manipulation, organization, plus descriptive and statistical analyses using R.
Why I like it: My favourite part of using a programming language rather than a traditional stats program is having a record of everything I do in the form of scripts. I was introduced to R via the Tinn-R script editor (although now I'm using RStudio) and therefore rarely work directly in the R console. I save all my scripts, so it's trivial to re-run an analysis and regenerate the associated graphics whenever the data or the analysis changes.
More info:  R-Project, CodeSchool's Try-R: a simple, interactive free course

RStudio, Markdown, and knitr
What is it: A user-friendly, integrated trifecta of editor, language, and R package that combines R scripts and formatted text to generate HTML reports (kind of like a simpler form of LaTeX). The lines blur for me on where Markdown ends and knitr picks up, but I think that Markdown is the formatting language, while the integration between knitr and RStudio runs the embedded R code and passes the results to Markdown for formatting (please correct me if I'm wrong).
How I use it: After learning about this combination (and ProjectTemplate, next) in the summer of 2012, I've coded almost exclusively in RStudio using Markdown files, even if it means transferring old code to this new format.
Why I like it: I was finding that my Tinn-R scripts consisted of about half #-commented text either explaining what I was doing or showing the outcome of a particular line of code. This trifecta of R tools provides a more appropriate method to do exactly what I'd been trying to do in my previous scripts. Instead of typing nrow(dat), running it and then manually typing #[1] 3124 in my Tinn-R script, I simply write nrow(dat) and the number gets printed in my eventual output. I then have a permanent record of all of the code, quality-checks, outputs, and results. Figures are produced in whatever format and resolution the user wants, and dataframes can be printed as pre-formatted tables to be copied and pasted into a word processing document -- infinitely easier than trying to reformat a raw data.frame. I have so much more to say on this topic, but I think it'll need its own post.
More info: RStudio & Markdown

ProjectTemplate:
What is it: An R package that sets up a standardized file structure and automates data loading, pre-processing, and saving of files.
How I use it: For every new project I initiate, I start with a ProjectTemplate-generated template. The standardized folder structure means that I maintain consistency in where and how I save files. I also use it to auto-load all necessary functions for a given project, plus required datasets. Finally, I make heavy use of the cache() function to save results so I don't need to recompute them.
Why I like it: It provides standardization and automation to improve my organization and efficiency.
More info:  In praise of Project Template

Evernote
What is it: A hierarchically-organized notebook that works like a hybrid between a file organizer and a simple word processor.
How I use it: How DON'T I use it? I've got a notebook for each project/chapter of my thesis. Emails, decisions, outlines, drafts, and notes on papers all get stored in the relevant notebook. I have a separate notebook for "Reading", which stores the notes I take within Evernote while reading each morning. I've got a notebook for R tips and tricks, plus one for Meetings, where I take notes after speaking with my advisors. This definitely needs its own post.
Why I like it: It helps me store all of my idle thoughts, serious decisions, and information in one place so I don't need to search my computer for it. I think my usage could probably be optimized, since it's still not terribly organized, but it's better than storing everything in independent Word files.
More info:  Next Scientist: Get a second brain with Evernote

(Free) Citation Software
What is it: Zotero and Mendeley provide automated methods to import references and then insert them into word processing documents to generate correctly-formatted citations and bibliographies.
How I use it: I use Zotero as follows: Any time I download a PDF or just see a paper I want to read, I import it to Zotero. I try to add a note about why I imported/downloaded that particular article. Sometimes I'll add notes back to Zotero, but I'm moving more towards keeping that information in Evernote... Most importantly, citations and bibliographies are ridiculously easy to insert into drafts using the Word plug-in. I still cringe when I think back on the 100s of references I typed by hand for my MSc thesis. I've just started using Mendeley, and it seems just as good. It allows drag-and-drop importing of PDFs, but preliminary attempts show some potential problems with data accuracy. I haven't looked into a possible browser plug-in yet, but I'm sure it exists.
Why I like it: ^ Enough said.
More info:  Zotero. Mendeley.

Cmap Tools
What is it: A simple piece of software that allows you to create concept maps (boxes linked with labelled arrows).
How I use it: I'll build a Cmap any time I get the urge to draw on a big piece of paper with lots of words and arrows. Typically I do this when synthesizing information, such as trying to review everything known about habitat selection in ducks.
Why I like it: Because it's specifically made for conceptual mapping, the tools are more tailored than something like PPT. It's easy to use, and I can colour-code freely.
More info:  Cmap Tools

The basics and every-days: 
Dropbox: Over the course of my PhD, I've used 5-7 different computers, including a remotely-accessed server. I keep my on-going projects on Dropbox to ensure that I can work on them no matter where I am.  I also use it for collaborating and sharing large files.
PDF reader/editor: The free Adobe Reader allows one to open/read PDFs, while Adobe Acrobat has useful functionality for commenting, assembling, and editing PDFs. However, its price tag may be beyond most grad students. I've used CutePDF for printing documents to PDF, and I now use my Nexus app ezPDFReader for editing and signing PDFs. I welcome additional suggestions for free PDF software.
Word: I feel old-fashioned for using Word, but I'm used to it at this point. That said, I'm trying to switch many of my old Word tasks to Evernote.  I'm also curious about Scrivener, but I'm deterred by the $40 price tag. I'm curious if Evernote could be adapted for Scrivener-like advantages. On the other hand, apparently it's possible to replace Word with Markdown + Mendeley + pandoc, and some are calling for a more scholarly Markdown.
Excel: After switching to R, I barely use Excel. However, every once in a while it is easier to make or tweak small datasets using Excel instead of R. For example, I needed to reclassify 39 land cover classes, so I manually created the classification key in Excel and then merged that system with the raw data in R.
PowerPoint: Most conferences I go to still expect presentations in PowerPoint. I'm interested in Prezi, but think it takes more skill than I have to utilize it effectively.
iTunes:  I listen to music every day, all day. It helps me tune out the surrounding world and provides me with a mental signal that I'm "working" (which is especially important when I work from home). I have a separate playlist specifically dedicated to "deep work", which now provides that mental trigger that I need to focus.
Chrome:  The main reason I prefer Chrome over other browsers is the shortcut search function. Being able to search RSeek, Scholar, or Wikipedia with a single keystroke is awesome for efficiency.
Remote Desktop: I work from my home in Toronto by accessing a server at U.Laval in Quebec via Remote Desktop. I actually wouldn't be able to run some of my more taxing models on my home computer, so Remote Desktop (in association with VPN) has been essential over the past few months.
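
For anyone curious about the Excel-plus-R reclassification mentioned under Excel above, here is a minimal sketch of how that kind of merge might look. The data, column names (point_id, cover, new_class), and class labels are all made up for illustration; in practice the key would be typed up in Excel, saved as a CSV, and read in with something like read.csv().

```r
# Hypothetical raw data: one row per sampled point, with its original land cover code.
raw <- data.frame(
  point_id = 1:6,
  cover    = c("spruce", "fir", "bog", "fen", "lake", "spruce"),
  stringsAsFactors = FALSE
)

# Hypothetical classification key, standing in for the table built by hand in Excel.
key <- data.frame(
  cover     = c("spruce", "fir", "bog", "fen", "lake"),
  new_class = c("forest", "forest", "wetland", "wetland", "water"),
  stringsAsFactors = FALSE
)

# Left join: keep every raw row and attach the new class from the key.
reclassified <- merge(raw, key, by = "cover", all.x = TRUE, sort = FALSE)

# An NA here would mean a cover code in the raw data is missing from the key.
stopifnot(!anyNA(reclassified$new_class))

# merge() can reorder rows, so restore the original order by point_id.
reclassified <- reclassified[order(reclassified$point_id), ]

# Quick check of how many points ended up in each new class.
table(reclassified$new_class)
```

Using all.x = TRUE keeps rows whose code is missing from the key (with NA in new_class) instead of silently dropping them, which makes typos in a hand-made key easier to catch.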

What's in your essential toolkit?

Do you use Scrivener, Prezi, or other software?
Which free/open source alternatives to MS and Adobe would you recommend?
Which mind-mapping software do you use to help organize your thoughts?
Do you prefer Zotero or Mendeley?
How do you use Evernote?


  1. Nice post (and nice blog!) Nicole!

    I discovered Evernote recently, but I see a great potential.
    R and RStudio are definitely in my essential toolkit. I was looking for an editor for R for months when I heard about RStudio. I'm working on Mac and other editors for Mac aren't user friendly. I tried Aquamacs and TextWrangler before I switched to RStudio.
    Remote Desktop and VPN are really useful tools, as they allow me to access the ULaval library and electronic books.
    I also use Focus Booster every day. It's a timer based on the Pomodoro technique. It helps me stay focused on my work.

    1. I'd heard of pomodoro apps, but I thought that my Google Calendar alarms would be good enough. I wonder if an even more tangible timer (such as this) would help me keep my focus until the end of a pomo. You've prompted me to give it a try! Thanks Juliane!
      And yes, VPN for accessing the ULaval library is essential! I was just thinking about how many e-books we have access to because of ULaval.

  2. oooh... A fellow woman Canuck kind-of-techie scientist! Thanks for this great list of resources (and for making me feel less alone).
    I'm using RStudio more and more for writing in Markdown for all sorts of notes and documents since I have it open for my statistical reports. There is an R package for almost everything (instead of system command prompt for pandoc; publishing an .Rmd straight to WordPress). Amazing!
    I am still looking for an attractive way to organize and display the various project files I use. I experimented with Evernote, but I don't think it's for me. I like the RStudio project system plus Dropbox. I'm hoping there's a way to have RStudio and Scrivener use the same project directory. Have you come across anything like that?

  3. Thanks for the comment Tanya!

    I didn't end up using Scrivener (I don't like that we have to pay), so I don't have any advice on your directory question.

    I use R's ProjectTemplate to set up my folder structure so that it's standardized across projects/papers. I also have an Evernote folder for each project/paper. Within each project Evernote folder, I typically have a "project file" note and "journal of what I accomplished" note. The project note summarizes the overall intent of the project and sometimes I dump references or random ideas in there, just to keep them in one place. I try to add to the journal note each day to keep a record of decisions I made, analyses I ran or re-ran. The idea is to have a record of which files I used for what. I could probably be better about this if I used Git or some other version control, but I'm not quite there yet :)

    Sorry that I'm not more helpful!

  4. Nice post which had me nodding along at several places. Didn't know about Cmap. Will definitely check it out! Thanks :)

  5. Thanks Chandni!

    I recently learned of two more mind-mapping tools, although I haven't tried them myself:


  6. Hello,
    Great post -- thanks!
    I have a question, though. How do you use the knitr functions with ProjectTemplate? If I just create a standard project in RStudio without using PT, I have no difficulty creating (and, more importantly, "knitting") Markdown scripts into an HTML file. However, every time I try to "knit" a Markdown file relying on ProjectTemplate (that is, loading the project, with all the libraries defined in the config file and all the data files in the ./data directory), the process fails. Hence my question: are knitr and Markdown compatible with ProjectTemplate? What is your experience?



    1. Hi François! Can you be more specific about why the process fails - is there a specific error you receive? I think the knitr/ProjectTemplate pairing requires that files are saved in specific places, and directories must be specified in specific ways. I'll try to write up a blog post on it shortly. In the meantime, do let me know what the error message says.

    2. Hi Nicole,

      OK, found it. For it to work, the Markdown document must be created in the base folder of the project -- not in the ./src folder, and not in a ./md folder. Setting the working directory in a chunk of code in an Rmd document will not do. A bit messy when one has several analyses to run, but what is essential is that it works.

      This said, as you seem not to have run into the same problems, yes, I will certainly be interested in reading your next blog post on this.



    3. Interesting... I did notice some problems with working directory changes, but I was able to find work-around solutions. I've posted them in a new blog post. Please check it out and let me know if it solves your problems!

  7. Hello Nicole,

    It's a great pleasure to find another PhD student in natural resources who blogs!

    I use a lot of similar tools, though I'm going to have to look into R Markdown. That sounds awesome; I am always amazed at what R can accomplish.

    I'm a heavy Chrome user as well, and I use the StayFocused add-on to block out websites that I find to be non-productive time sucks (Facebook, Reddit, etc.). I have them set so I only get 5 minutes a day from my school computer, and it definitely helps prevent me from losing time.

    I've also started learning LaTeX, right now just to make my C.V., but I'm hoping to get good enough with it over the next few years (I'm at the end of my first year of my PhD) to be able to write my dissertation in it, since older grad students here have told me it makes formatting and such a lot easier. Mendeley also exports into LaTeX really easily. Once I got my C.V. set up, it's a breeze to add/change now, and it always looks great, unlike my previous Word versions, which were always slightly off.

    Thanks for the great post! Looking forward to digging more into R.