Introduction to Programming in R
Thursdays 1-3PM | Fall 2016
Waller Hall 203 | Rutgers University

Chris Free, Oceanography (DMCS 309F)
Joe Caracappa, Oceanography (Haskin Shellfish Lab)

Requirements: Please come with R and RStudio already installed on your computer. Request to join the Rutgers R Google Group to receive communications.

Summary: This 10-week course is designed to teach programming in R to beginners (no experience necessary). The focus of the course is on using R to manipulate data powerfully and reproducibly and to create beautiful, publication-ready graphics. There will be little emphasis on statistical analysis because this material is covered by other courses on campus. We will spend the first hour of class working through a script with interspersed exercises and the last hour on an independent exercise. We will also assign homework each week that asks you to apply what we learned in class to your own data (if you don’t have any data yet, we can provide you with some). In the final class, you will give a 5-minute presentation on your data visualization.

Datasets used in class: We will use two datasets in this course: (1) Mongolian fish data and (2) Brazilian mahogany tree data. The Mongolian fish data is a product of Dr. Olaf Jensen’s lab at Rutgers University and includes one-time measurements (e.g., species, length, weight, sex, age, egg count, etc.) of individual fish from lakes and rivers in northern Mongolia. The Brazilian mahogany tree data is a product of Dr. Jimmy Grogan’s lab at Mt. Holyoke College and includes repeated measurements (i.e., diameter, growth rate, fruit production) of individual mahogany trees in natural forests in the Brazilian Amazon. These datasets are made available for educational purposes only and are not to be used without the owner’s permission. More information on these datasets is available here.

A guide to good data management practices is available here.


Week 1 (10/6): Intro to R
In this class, we will learn to use the RStudio environment to code in R. We will learn how to use R to do math, perform logical tests, create objects, and create and index vectors. We will also learn how to read CSV files into R, inspect data, subset and index data, add new columns to a data frame, export data, and summarize and plot data.

  • Data: mongolia_fish_data.csv
  • Script 1 (R basics): worksheet | answers
  • Script 2 (data in R): worksheet | answers
  • Homework: Write an R script that will read YOUR data into R. Inspect your data. Reduce it to the important columns. Add a new column based on calculations from another column. Isolate an interesting subset of your data and export it. Make a plot of your data and look up some tricks for making the plot even prettier.
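
For a preview of the Week 1 workflow, here is a minimal sketch. It does not use the course data: the data frame, its column names (species, length_mm, weight_g), and the file names are all made up for illustration, and the CSV is written to a temporary folder so the script runs on its own.

```r
# Math, logical tests, objects, and vector indexing
lengths_mm <- c(250, 410, 385, 120, 530)  # made-up fish lengths
lengths_cm <- lengths_mm / 10             # vectorized math
is_large <- lengths_mm > 400              # one TRUE/FALSE per element
large_lengths <- lengths_mm[is_large]     # index a vector with a logical vector

# Build a small stand-in data frame (hypothetical columns, not the course data)
# and save it so there is a CSV file to read
fish <- data.frame(
  species = c("lenok", "taimen", "lenok", "grayling", "taimen"),
  length_mm = lengths_mm,
  weight_g = c(180, 750, 640, 20, 1600),
  stringsAsFactors = FALSE
)
csv_path <- file.path(tempdir(), "example_fish.csv")
write.csv(fish, csv_path, row.names = FALSE)

# Read the CSV and inspect it
fish_in <- read.csv(csv_path, stringsAsFactors = FALSE)
str(fish_in)
head(fish_in)
summary(fish_in$length_mm)

# Add a new column, then subset rows and columns
fish_in$weight_kg <- fish_in$weight_g / 1000
taimen <- fish_in[fish_in$species == "taimen",
                  c("species", "length_mm", "weight_kg")]

# Summarize by group and make a quick plot
mean_length <- tapply(fish_in$length_mm, fish_in$species, mean)
plot(fish_in$length_mm, fish_in$weight_g,
     xlab = "Length (mm)", ylab = "Weight (g)")

# Export the subset
out_path <- file.path(tempdir(), "taimen_only.csv")
write.csv(taimen, out_path, row.names = FALSE)
```

With your own data, you would replace csv_path with the path to your file and swap in your own column names.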

Week 2 (10/13): Basic plotting
In this class, we will learn to set axis labels and limits, change point styles, sizes, and colors, add shading to plots using rect() and polygon(), add lines using abline() and lines(), add text using text() and mtext(), add legends using legend(), draw customized axes using axis(), and export plots using png() and other export functions. We will explore the barplot(), boxplot(), and hist() plotting functions in an exercise at the end of class.

  • Data: mongolia_fish_data.csv (from Week 1)
  • Script 1 (basic plotting): worksheet | answers
  • Homework: Incorporate the extra plotting parameters we used today in your plot from Week 1. Can you add custom axis labels and limits? How about shading, lines, and text? Can you use color, size, or style to improve your data visualization? Can you add a legend to your plot? Export your plot as a PNG and a JPG file.
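
The plotting functions listed above fit together roughly as in the sketch below. The data are simulated and the variable names, colors, and file name are placeholders, so treat this as one possible arrangement and not the class script. It assumes your R installation can write PNG files, which most can.

```r
# Simulated length-weight data (not the course data)
set.seed(1)
length_mm <- runif(30, 100, 600)
weight_g <- 1e-5 * length_mm^3 * exp(rnorm(30, sd = 0.2))
sex <- sample(c("F", "M"), 30, replace = TRUE)
point_cols <- c(F = "darkorange", M = "steelblue")

# Open a PNG device; everything drawn until dev.off() goes into the file
out_file <- file.path(tempdir(), "length_weight.png")
png(out_file, width = 600, height = 450)

# Set up an empty plot (type = "n") with custom labels and limits,
# and suppress the default x-axis so we can draw our own
plot(length_mm, weight_g, type = "n",
     xlim = c(0, 700), ylim = c(0, max(weight_g) * 1.1),
     xlab = "Total length (mm)", ylab = "Weight (g)",
     xaxt = "n", las = 1)
axis(1, at = seq(0, 700, by = 100))

# Shade a region first so the points are drawn on top of it
usr <- par("usr")  # plot limits: x1, x2, y1, y2
rect(xleft = 400, ybottom = usr[3], xright = 600, ytop = usr[4],
     col = "grey90", border = NA)

# Points colored by group, plus a reference line, text, and a legend
points(length_mm, weight_g, pch = 16, cex = 1.2, col = point_cols[sex])
abline(v = mean(length_mm), lty = 2)
text(mean(length_mm), max(weight_g), labels = "mean length", pos = 4)
mtext("Simulated data", side = 3, line = 0.5)
legend("topleft", legend = c("Female", "Male"),
       pch = 16, col = point_cols, bty = "n")

dev.off()  # close the device to finish writing the file
```

Swapping png() for jpeg() or pdf() exports the same figure in another format.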

Week 3 (10/20): Reshaping data with reshape2 and dplyr, for loops, and functions
In this class, we will learn to reshape data using the reshape2 package and to summarize and aggregate data using the dplyr package. We will also learn to program for loops, use the apply() family of functions, and write custom functions.

  • Data (for script 1): mongolia_fish_data.csv (from Week 1)
  • Data (for script 2): brazilian_mahogany_tree_data.csv
  • Script 1 (for loops, dplyr): worksheet | answers
  • Script 2 (apply(), reshape2, custom functions): worksheet | answers
  • Class exercise: worksheet | answers
  • Packages: Please install the dplyr, reshape2, Hmisc, and plotrix packages before class.
  • Homework: Use what we learned in class today to expand the analysis you began in Weeks 1 and 2. Can you use reshape2 or dplyr to ask new questions about your data? For example, can you summarize some value (e.g., growth rate, body size, plant height, sea surface temperature, etc.) by some factor (e.g., location, experimental treatment, species, etc.)? Extra credit if you can use a loop to iterate over the levels of one of your factors and make a plot for each level. We will do this in more detail next week.
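
To make the loop, apply, and custom function ideas concrete, here is a small sketch that computes the same per-species summary three ways. It uses only base R and a made-up data frame so that it runs without any packages; the std_error() helper is our own illustration, not a built-in function. In dplyr, group_by() followed by summarise() expresses the same kind of per-group summary.

```r
# Made-up data (not the course data)
fish <- data.frame(
  species = c("lenok", "lenok", "taimen", "taimen", "grayling", "grayling"),
  length_mm = c(250, 390, 410, 530, 120, 180),
  stringsAsFactors = FALSE
)

# A custom function: standard error of the mean
std_error <- function(x) {
  sd(x) / sqrt(length(x))
}

# 1. A for loop: create an empty container, then fill it one species at a time
species_names <- unique(fish$species)
mean_by_loop <- numeric(length(species_names))
names(mean_by_loop) <- species_names
for (i in seq_along(species_names)) {
  sp_rows <- fish$species == species_names[i]
  mean_by_loop[i] <- mean(fish$length_mm[sp_rows])
}

# 2. sapply(): the same calculation without managing the container yourself
mean_by_sapply <- sapply(species_names, function(sp) {
  mean(fish$length_mm[fish$species == sp])
})

# 3. tapply(): apply any function, including our own, to each group
se_by_species <- tapply(fish$length_mm, fish$species, std_error)

print(mean_by_loop)
print(se_by_species)
```

The loop is the most explicit version and the easiest to extend (for example, to draw one plot per species inside the loop body), while the apply-style calls are shorter when all you need is a summary value.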

Week 4 (Wed 10/26 2-4pm): Bring your own data to class!
Bring your data and questions and leave with a plot! Chris and Joe will help you apply everything we’ve covered in class so far to your own data.

Week 5 (11/3): Intermediate plotting: multi-panel figures, custom layouts, par(), RColorBrewer, and a discussion of effective graphical practices
In this class, we will learn to set up multi-panel figures using the mfrow and mfcol settings of par(). We will learn to set up custom multi-panel layouts using layout(). We will learn to customize plot appearance using par() and we will learn to make beautiful color palettes using RColorBrewer and other color packages. We will also assign some reading on graphic design.

  • Data: mongolia_fish_data.csv (from Week 1)
  • Script: worksheet | answers
  • Packages: Please install the RColorBrewer and GISTools packages before class.
  • Homework: In last week’s homework, we challenged you to loop over a factor (e.g., species, location, treatment, etc.) in your data and create a plot for each level of that factor. This week we’ll challenge you to use this loop to make a multi-panel figure. Adjust the par() settings to personalize your figure and try using RColorBrewer to display the data in each panel with a different color. After creating a multi-panel figure with identically shaped panels, can you use layout() to feature the most important panel (make it bigger) and make the other panels smaller? Good luck!
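
As a starting point for the multi-panel homework, the sketch below draws the same four histograms twice: once on a regular grid set with par(mfrow = ...) and once with layout(), where the first panel spans the whole top row. The data are simulated, and the species list and hex colors are placeholders typed by hand; RColorBrewer's brewer.pal() is one way to generate a palette instead.

```r
# Simulated lengths for four groups (not the course data)
set.seed(42)
species <- c("lenok", "taimen", "grayling", "burbot")
mean_length_mm <- c(300, 500, 150, 400)
panel_cols <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")  # hand-typed colors
lengths_by_species <- lapply(mean_length_mm, function(m) {
  rnorm(50, mean = m, sd = 40)
})
names(lengths_by_species) <- species

# Option 1: a regular 2 x 2 grid, filled row by row
old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1), las = 1)
for (i in seq_along(species)) {
  hist(lengths_by_species[[i]], col = panel_cols[i],
       main = species[i], xlab = "Length (mm)")
}
par(old_par)  # restore the previous settings

# Option 2: a custom layout. Each number in the matrix is a panel;
# panel 1 fills the top row and panels 2-4 share the bottom row.
layout_matrix <- matrix(c(1, 1, 1,
                          2, 3, 4), nrow = 2, byrow = TRUE)
n_panels <- layout(layout_matrix, heights = c(2, 1))  # top row twice as tall
par(mar = c(4, 4, 2, 1))
panel_order <- c(2, 1, 3, 4)  # draw taimen first so it gets the big panel
for (i in panel_order) {
  hist(lengths_by_species[[i]], col = panel_cols[i],
       main = species[i], xlab = "Length (mm)")
}
layout(1)  # back to a single panel
```

Calling layout.show(n_panels) right after layout() draws the panel outlines, which is a handy way to check a layout before plotting into it.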

Week 6 (11/10): Intermediate plotting: plotting regression lines and confidence intervals, heat maps and contours, and date/time data
In this class, we will continue to practice making multi-panel figures with custom layouts and par() settings but will also learn to fit and plot regression lines and confidence intervals. We will also learn to plot “heat map”-like figures and will learn to handle date/time data.

  • Slides: Lab 6 figures to create
  • Data (for script 1): mongolia_fish_data.csv (from Week 1)
  • Data (for script 2): Lake Hovsgol temperature data (new data!)
  • Data (for script 2): Lake Hovsgol temperature collection dates (new data!)
  • Script 1 (fit/plot regression lines): worksheet | answers
  • Script 2 (plot heat maps, date/time data): worksheet | answers
  • Packages: Please install the shape and colorRamps packages before class.
  • Homework: Try to create a figure similar to the one we created today with your data. Can you use a for loop to set up a multi-panel plot where each panel is a scatter plot with a linear regression line and confidence intervals plotted with the data? Can you figure out how to use the text() or mtext() function to add panel letters to your plot? Can you figure out how to create vectors of x-axis and y-axis limits to customize the limits of the plot created within your for loop?
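
One common way to draw a regression line with a confidence band in base R is sketched below: fit the model with lm(), predict over a grid of x values with predict(..., interval = "confidence"), and shade the band with polygon(). The data are simulated and the variable names are placeholders, so this may differ from the class script. The last few lines show basic date handling with as.Date().

```r
# Simulated length-weight data (not the course data)
set.seed(7)
length_mm <- seq(150, 600, length.out = 40)
weight_g <- -300 + 3 * length_mm + rnorm(40, sd = 120)

# Fit a linear regression
fit <- lm(weight_g ~ length_mm)

# Predict over an evenly spaced grid; the column name in newdata
# must match the predictor name used in the model formula
new_x <- data.frame(
  length_mm = seq(min(length_mm), max(length_mm), length.out = 100)
)
pred <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
# pred is a matrix with columns "fit", "lwr", and "upr"

# Empty plot, then band, then points, then line (drawn back to front)
plot(length_mm, weight_g, type = "n", las = 1,
     xlab = "Total length (mm)", ylab = "Weight (g)")
polygon(x = c(new_x$length_mm, rev(new_x$length_mm)),
        y = c(pred[, "lwr"], rev(pred[, "upr"])),
        col = "grey85", border = NA)
points(length_mm, weight_g, pch = 16)
lines(new_x$length_mm, pred[, "fit"], lwd = 2)

# Date handling: convert text to Date objects, then do arithmetic on them
sample_dates <- as.Date(c("7/15/2012", "8/1/2012", "8/20/2012"),
                        format = "%m/%d/%Y")
days_elapsed <- as.numeric(diff(sample_dates))  # days between samples
print(days_elapsed)
```

Wrapping the plotting steps in a for loop, with one fit per group, gives the multi-panel version described in the homework.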

Week 7 (11/17): “Heat map”-like figures, mapping, and spatial data
In this class, we will learn to use image() and filled.contour() to make heat maps (e.g., temperature profile of Lake Hovsgol, Mongolia over time). We will also learn how to make maps using the maps, mapdata, and ggmap packages. These packages work well with ggplot2 and this lecture will also serve as an introduction to plotting with ggplot2.

  • Data 1 (for maps): mongolia_fish_data.csv (from Week 1)
  • Data 2 (for heat maps): Lake Hovsgol temperature data (from Week 6)
  • Script: worksheet | answers
  • Packages: Please install the maps, mapdata, ggmap, and ggplot2 packages. Make sure shape, colorRamps, and RColorBrewer are also installed.
  • Homework: If you have some of your own spatial data, make a map of your data using the skills we learned today. Can you color points by some grouping in your data? Can you size the points by some continuous variable? If you don’t have any of your own spatial data, can you plot taimen and lenok caught in the Eg and Uur Rivers of Mongolia, where the points indicating taimen and lenok are different colors and are sized by their total length? Can you figure out how to add a scale bar?
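
For the heat map part of this class, the sketch below shows the basic shape of an image() call: a vector of x values, a vector of y values, and a matrix z with one row per x value and one column per y value. The temperature profile here is generated from an arbitrary formula, not the Lake Hovsgol data, and the color choice is just one option from base R. The mapping packages are not shown.

```r
# Made-up temperature profile: warm at the surface in mid-season,
# cooling with depth (not the Lake Hovsgol data)
days <- 1:60
depths_m <- seq(0, 50, by = 5)
temp_c <- outer(days, depths_m, function(day, depth) {
  4 + 12 * exp(-depth / 15) * sin(pi * day / 60)
})
# temp_c has one row per day and one column per depth

# Heat map with depth increasing downward (reversed y-axis limits)
heat_cols <- rev(heat.colors(20))  # pale = cold, red = warm
image(x = days, y = depths_m, z = temp_c,
      ylim = rev(range(depths_m)), col = heat_cols, las = 1,
      xlab = "Day of sampling", ylab = "Depth (m)")

# Overlay labeled contour lines on the same axes
contour(x = days, y = depths_m, z = temp_c, add = TRUE)
```

filled.contour() takes the same x, y, and z arguments and adds a color key, at the cost of being harder to combine with other panels because it sets up its own layout.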


Week 9 (12/1-LAST CLASS): Nested for loops and plotting with ggplot2
In this class, we will learn to code nested for loops and will discuss if/when to use them as well as preferable alternatives. We will also learn to make plots in ggplot2, which follows a completely different syntax from making plots in base R.
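
As a small illustration of the nested loop discussion, the sketch below fills a matrix with a nested for loop and then builds the same matrix with outer(), one possible vectorized alternative. The lengths, the condition multipliers, and the length-weight formula are all made up for the example.

```r
# Made-up inputs: three fish lengths and three "condition" multipliers
lengths_mm <- c(200, 400, 600)
condition <- c(0.8, 1.0, 1.2)

# Nested for loop: the outer loop walks over rows (lengths),
# the inner loop walks over columns (condition values)
weights_loop <- matrix(NA_real_, nrow = length(lengths_mm),
                       ncol = length(condition),
                       dimnames = list(as.character(lengths_mm),
                                       as.character(condition)))
for (i in seq_along(lengths_mm)) {
  for (j in seq_along(condition)) {
    weights_loop[i, j] <- condition[j] * 1e-5 * lengths_mm[i]^3
  }
}

# Alternative: outer() evaluates the function for every combination at once
weights_outer <- outer(lengths_mm, condition, function(len, k) {
  k * 1e-5 * len^3
})

print(weights_loop)
print(weights_outer)
```

The nested loop is easier to read when each step involves several operations (such as drawing a plot for every combination), while outer() or a similar vectorized call is shorter when each cell is a single calculation.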


R Resources

  • A Primer of Ecology with R | M. Henry H. Stevens – this is a really good resource for learning to implement population models in the R programming language. Stevens provides a few of the chapters for free download here.
  • R-Bloggers – provides a nice list of resources for expanding your R skills
  • Quick-R – this is a really good online resource for learning R
  • OHI’s Resources for R and Data Scientists – an incredible compilation of resources for novice, intermediate, and advanced R programmers