Introduction to Data Analysis with R for Social Science

UCSB Library, rm 1312

Mondays, Jan. 23, 30 & Feb. 6

10:00 am - 11:50 am

Instructors: Jon Jablonski, Renata Curty, Seth Erickson

Helpers: Kristi Liu, Amber Budden

Registration for this workshop begins on January 2, 2023 at 8:00 am PST

Some adblockers block the registration window. If you do not see the registration box below, please check your adblocker settings.

General Information

Software Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at students who want to use R for data analysis and for creating charts and graphs, and who want a basic introduction to scientific programming. You don't need to have any previous programming experience.
If you are comfortable downloading and installing software, you have the skills to complete this workshop.

Where: 525 UCen Road. Get directions with OpenStreetMap or Google Maps.

When: Mondays, Jan. 23, 30 & Feb. 6. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody. For workshops at a physical location, the workshop organizers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email dreamlab@library.ucsb.edu for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Schedule

Setup: Download files required for the lesson

Day 1
10:00  1. Introduction to R and RStudio
       How to find your way around RStudio?
       How to interact with R?
       How to manage your environment?
       How to install packages?
10:55  2. Project Management With RStudio
       How can I manage my projects in R?
11:25  3. Seeking Help
       How can I get help in R?
11:45  4. Data Structures
       How can I read data in R?
       What are the basic data types in R?
       How do I represent categorical information in R?
12:40  Finish

Day 2
10:00  5. Exploring Data Frames
       How can I manipulate a data frame?
10:30  6. Subsetting Data
       How can I work with subsets of data in R?
11:20  7. Data Frame Manipulation with dplyr
       How can I manipulate data frames without repeating myself?
12:15  Finish

Day 3
10:00  8. Creating Publication-Quality Graphics with ggplot2
       How can I create publication-quality graphics in R?
11:20  9. Combining ggplot2 and dplyr
       How can I use ggplot2 and dplyr together?
11:50  Finish

Day 4
10:00  10. Control Flow (extra)
       How can I make data-dependent choices in R?
       How can I repeat operations in R?
10:00  11. Vectorization (extra)
       How can I operate on all the elements of a vector at once?
10:00  12. Functions Explained (extra)
       How can I write a new function in R?
10:00  13. Writing Data (extra)
       How can I save plots and data created in R?
10:00  14. Splitting and Combining Data Frames with plyr (extra)
       How can I do different calculations on different sets of data?
11:00  15. Data Frame Manipulation with tidyr (extra)
       How can I change the layout of a data frame?
11:00  16. Producing Reports With knitr (extra)
       How can I integrate software and reports?
11:00  17. Writing Good Software (extra)
       How can I write software that other people can use?
11:15  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.


Setup

To participate in a Software Carpentry workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but it may be useful to you as well.

Pre-Workshop Setup

For this workshop, you have the option of using RStudio through the Jupyter Hub instance. If you would rather install the software on your own device, you will need to access the software and data as described below.

Scroll past the Jupyter Hub section to view the data download and R/RStudio installation instructions.

Logging into the RStudio Jupyter Hub Instance

For this workshop, we will be using a Jupyter Hub instance that LSIT has graciously set up for us with the software and packages preinstalled.
Please use your UCSB NetID to sign into the Jupyter Hub at https://carpentryworkshop.lsit.ucsb.edu/ and, once you have signed in, click the RStudio Launcher button. You do not need to follow the setup instructions below if you plan on using the Jupyter Hub interface rather than RStudio on your own computer.



Data Download

Download the data from https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv

Save this file as gapminder_data.csv in an easily accessible place, like your Desktop.

To download it directly into your RStudio environment:

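# Create the data/ folder if it does not exist yet, then download and read the file
dir.create("data", showWarnings = FALSE)
download.file("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv", destfile = "data/gapminder_data.csv")
gapminder <- read.csv("data/gapminder_data.csv", stringsAsFactors = TRUE)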
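After reading a CSV file, it can help to confirm that it loaded as expected. The sketch below is only an illustration: it reads a tiny inline stand-in (the object name gapminder_sample and the three rows are assumptions chosen for the example, not the full dataset) with the same column names as the gapminder file, and shows how stringsAsFactors = TRUE turns text columns into factors.

```r
# Illustrative stand-in for the downloaded file: same column names, three rows
csv_text <- "country,year,pop,continent,lifeExp,gdpPercap
Afghanistan,1952,8425333,Asia,28.801,779.4453145
Afghanistan,1957,9240934,Asia,30.332,820.8530296
Albania,1952,1282697,Europe,55.23,1601.056136"

# read.csv() accepts inline text through its `text` argument
gapminder_sample <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Compact overview: column names, types, and the first values
str(gapminder_sample)

# Number of rows read
nrow(gapminder_sample)

# Text columns were converted to factors; list the categories of one of them
levels(gapminder_sample$continent)
```

The same calls (str(), nrow(), levels()) can be run on the gapminder object once the real file has been read.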

Install R and RStudio

R and RStudio are two separate pieces of software:

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis.

RStudio is an integrated development environment (IDE) that makes using R easier. In this course we use RStudio to interact with R.

If you don’t already have R and RStudio installed, follow the instructions for your operating system below. You have to install R before you install RStudio.

Update R and RStudio

If you already have R and RStudio installed, first check if your R version is up to date:

When you open RStudio, your R version is printed in the console on the bottom left. Alternatively, you can type sessionInfo() into the console. If your R version is 4.2.1 or later, you don’t need to update R for this lesson. If your version of R is older than that, download and install the latest version of R from the R project website for Windows, for MacOS, or for Linux.

It is not necessary to remove old versions of R from your system, but if you wish to do so you can check How do I uninstall R?

Note: The changes introduced by new R versions are usually backwards-compatible. That is, your old code should still work after updating your R version. However, if breaking changes happen, it is useful to know that you can have multiple versions of R installed in parallel and that you can switch between them in RStudio by going to Tools > Global Options > General > Basic.

After installing a new version of R, you will have to reinstall all your packages with the new version. For Windows, there is a package called installr that can help you with upgrading your R version and migrating your package library.

To update RStudio to the latest version, open RStudio and click on Help > Check for Updates. If a new version is available, follow the instructions on screen. By default, RStudio will also automatically notify you of new versions every once in a while.
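The R version check described above can also be done from the console. This is a minimal sketch, assuming the 4.2.1 threshold stated in these instructions; the variable names are chosen only for illustration.

```r
# Minimum R version named in the setup instructions (assumed threshold)
minimum_version <- "4.2.1"

# getRversion() returns the running R version as a comparable version object
current_version <- getRversion()

# Version objects can be compared directly against a version string
r_is_current <- current_version >= minimum_version

if (r_is_current) {
  message("R ", as.character(current_version), " is recent enough for this lesson.")
} else {
  message("R ", as.character(current_version), " is older than ", minimum_version,
          "; please update R.")
}
```

Comparing version objects rather than raw strings avoids mistakes such as treating "4.10.0" as older than "4.2.1".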

R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows: Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

Video Tutorial

Linux: Instructions for R installation on various Linux platforms (Debian, Fedora, Red Hat, and Ubuntu) can be found at https://cran.r-project.org/bin/linux/. These will instruct you to use your package manager (e.g. for Fedora run sudo dnf install R, and for Debian/Ubuntu add a PPA repository and then run sudo apt-get install r-base). Also, please install the RStudio IDE.

Install required R packages

During the course we will need a number of R packages. Packages contain useful R code written by other people. We will use the packages tidyverse, hexbin, patchwork, and RSQLite.

To install these packages, open RStudio and copy and paste the following command into the console window (look for a blinking cursor on the bottom left), then press Enter (Windows and Linux) or Return (macOS) to execute the command.

install.packages("tidyverse")

Alternatively, you can install the packages using RStudio’s graphical user interface by going to Tools > Install Packages and typing the names of the packages separated by commas.

R tries to download and install the packages on your machine. When the installation has finished, you can try to load the packages by pasting the following code into the console:

library(tidyverse)

If you do not see an error like there is no package called ‘…’, you are good to go!
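The command above installs tidyverse only. If you want to check all four packages named in this section in one step, something along the following lines may help. It is a sketch, not part of the official setup: missing_packages() is a hypothetical helper written for this example, and the install step is guarded so that it only runs in an interactive session such as the RStudio console.

```r
# Packages named in the setup instructions above
workshop_packages <- c("tidyverse", "hexbin", "patchwork", "RSQLite")

# Hypothetical helper: return the names in `pkgs` whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE or FALSE instead of raising an error
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(workshop_packages)

# Only download when run interactively and when something is actually missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running missing_packages(workshop_packages) again after installation should return an empty character vector if everything installed correctly.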