This repository contains the material for the PhD Workshop R for Data Science in the tidyverse by Matteo Sostero at Sant'Anna School of Advanced Studies (Pisa) in May 2018. The workshop is intended mainly for first-year PhD students in Economics, but all members of the Computational Modellers Society are welcome to attend!
Image credit: Kevin Ku
The workshop covers the data science tasks that come up throughout a typical research project, using the tidyverse. No prior knowledge of R is required! Hopefully even those of you familiar with “base R” (but not the tidyverse) will find something new.
The material is based on R for Data Science by Garrett Grolemund and Hadley Wickham. See also their book R for Data Science: Import, Tidy, Transform, Visualize, and Model Data published by O'Reilly Media, 2017.
Please bring your laptop 💻 and charger 🔌!
Preparation (10 minutes ⌛):
- Install an up-to-date version of R;
- Install an up-to-date version of RStudio;
- (Windows only) install Rtools;
- Install git on your system;
- Get a copy of the material by creating a new project in RStudio:
  - File > New Project > Version Control > Git
  - The repository URL is https://github.com/CoMoS-SA/workshop-R-tidyverse.git
  - Create the project as a subdirectory of your choice;
  - ☑️ “Open in new session”
- In RStudio, install tidyverse and nycflights13:
install.packages("tidyverse")
install.packages("nycflights13")
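To check that the installation worked, something along these lines can be run in the RStudio console. This is only a quick sanity check, not part of the workshop material; it assumes both packages above installed without errors.

```r
# Load the two packages installed above; an error here means the
# installation did not complete.
library(tidyverse)
library(nycflights13)

# Print the installed version of each package.
packageVersion("tidyverse")
packageVersion("nycflights13")

# Preview the `flights` dataset that ships with nycflights13.
glimpse(flights)
```

If `glimpse(flights)` prints a list of columns with their types, the setup is ready.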
Date | Time | Room |
---|---|---|
Wednesday 16/05/2018 | 10:30–12:30 | Aula 3 Toscanelli |
Thursday 17/05/2018 | 10:30–12:30 | Aula 3 Toscanelli |
Friday 18/05/2018 | 10:30–12:30 | Aula 3 Toscanelli |
By popular demand, more sessions will be scheduled in the next few weeks.
Topics (in no particular order):
- 📃 Data input: importing data from “messy” files, reading common and exotic formats and directory structures; preserving metadata; web scraping.
- 📐 Data transformation: “tidying” strategies for fixing common issues with data; text processing with regular expressions; reshaping (wide-long) tabular data; merging and appending data.
- ♻️ Automation: using the pipe %>%; automating tasks through functional programming with purrr.
- ✨ Tidy statistical modeling: automated approaches to estimation and diagnostics.
- 📊 Visualization: plot data with ggplot2; principles and recipes for visualization.
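As a rough illustration of how the topics above fit together, here is a minimal sketch using the `flights` data from nycflights13. It is not taken from the workshop notebooks: the object names (`delays`, `models`, `slopes`, `p`) and the choice of model are assumptions made for the example.

```r
library(tidyverse)
library(nycflights13)

# Data transformation with the pipe: mean arrival delay per carrier and month.
delays <- flights %>%
  filter(!is.na(arr_delay)) %>%
  group_by(carrier, month) %>%
  summarise(mean_delay = mean(arr_delay)) %>%
  ungroup()

# Automation with purrr: split the data by carrier and fit one linear
# model per carrier (arrival delay explained by departure delay).
models <- flights %>%
  filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
  split(.$carrier) %>%
  map(~ lm(arr_delay ~ dep_delay, data = .x))

# Collect the slope on dep_delay from each model into a named numeric vector.
slopes <- map_dbl(models, ~ coef(.x)[["dep_delay"]])

# Visualization with ggplot2: one line per carrier across the months.
p <- ggplot(delays, aes(x = month, y = mean_delay, colour = carrier)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Mean arrival delay (minutes)", colour = "Carrier")

p
```

Each step passes its result to the next with `%>%`, and `map()` replaces what would otherwise be a loop over carriers.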
Course slides.
Session notebooks: