Introduction to R and RStudio
- R is a programming language and software used to run commands in that language
- RStudio is software to make it easier to write and run code in R
- Use R Projects to keep your work organized and self-contained
- Write your code in scripts for reproducibility and portability
Data visualization with ggplot2
- the ggplot() function initiates a plot, and geom_ functions add representations of your data
- use aes() when mapping a variable from the data to a part of the plot
- use scale_ functions to modify the scales used to represent variables
- use premade theme_ functions to broadly change appearance, and the theme() function to fine-tune
- start simple and build your plots iteratively
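
The points above can be sketched as a plot built up one layer at a time. This is only an illustration: it assumes the ggplot2 package is installed and uses the built-in mtcars data set as a stand-in for your own data. The axis and legend names are made up for the example.

```r
library(ggplot2)

# Start simple: ggplot() initiates the plot, aes() maps variables to parts
# of the plot, and a geom_ function draws the data.
# mtcars ships with R and stands in for your own data (an assumption).
p <- ggplot(data = mtcars,
            mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Then modify the scales used to represent the mapped variables.
p <- p +
  scale_x_continuous(name = "Weight (1000 lbs)") +
  scale_color_viridis_d(name = "Cylinders")

# Finally, change the overall appearance with a premade theme_ function
# and fine-tune a single element with theme().
p <- p +
  theme_bw() +
  theme(legend.position = "bottom")

p
```

Running the code after each step shows what that step contributes, which makes it easier to spot the layer responsible when something looks wrong.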
Exploring and understanding data
- functions like head(), str(), and summary() are useful for exploring data.frames
- most things in R are vectors, vectors stitched together, or functions
- make sure to use class() to check vector types, especially when using new functions
- factors can be useful, but behave differently from character vectors
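
As a small sketch of these points, the code below explores a made-up data.frame using base R only. The surveys object and its values are hypothetical and exist just for the example.

```r
# A hypothetical data.frame: each column is a vector
surveys <- data.frame(
  species = c("DM", "PP", "DM", "NL"),
  weight = c(40, 17, 44, 160),
  stringsAsFactors = FALSE
)

head(surveys, n = 2)   # first two rows
str(surveys)           # compact overview of columns and their types
summary(surveys)       # per-column summaries

# Check vector types before passing columns to unfamiliar functions
class(surveys$species)   # "character"
class(surveys$weight)    # "numeric"

# A factor stores integer codes plus a set of levels, so it behaves
# differently from the character vector it was made from
species_fct <- factor(surveys$species)
class(species_fct)        # "factor"
levels(species_fct)       # "DM" "NL" "PP"
as.integer(species_fct)   # 1 3 1 2
```

The last line shows the difference: converting the factor to integers returns the level codes, not anything derived from the text itself.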
Working with data
- use filter() to subset rows and select() to subset columns
- build up pipelines one step at a time before assigning the result
- it is often best to keep components of dates separate until needed, then use mutate() to make a date column
- group_by() can be used with summarize() to collapse rows or mutate() to keep the same number of rows
- pivot_wider() and pivot_longer() are powerful for reshaping data, but you should plan out how to use them thoughtfully
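
The verbs above might be combined as in the following sketch. It assumes the dplyr and tidyr packages are installed. The surveys table and its column names are hypothetical, and base R's as.Date() with paste() is one possible way to assemble the date column, not necessarily the approach taken in the lesson.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with the date components kept in separate columns
surveys <- tibble(
  year = c(1990, 1990, 1991, 1991),
  month = c(1, 2, 1, 2),
  day = c(15, 20, 10, 5),
  plot_id = c(1, 2, 1, 2),
  species = c("DM", "DM", "PP", "DM"),
  weight = c(40, 44, 17, 42)
)

# Build the pipeline one step at a time, then assign the result:
# filter() subsets rows, select() subsets columns, and mutate() builds
# a date column only once it is needed
dm_weights <- surveys %>%
  filter(species == "DM") %>%
  select(year, month, day, plot_id, weight) %>%
  mutate(date = as.Date(paste(year, month, day, sep = "-")))

# group_by() + summarize() collapses each group to a single row
yearly <- surveys %>%
  group_by(year) %>%
  summarize(mean_weight = mean(weight), n = n())

# group_by() + mutate() keeps the same number of rows as the input
centered <- surveys %>%
  group_by(year) %>%
  mutate(weight_diff = weight - mean(weight)) %>%
  ungroup()

# pivot_wider() spreads species into columns; pivot_longer() reverses it
wide <- surveys %>%
  group_by(year, species) %>%
  summarize(mean_weight = mean(weight), .groups = "drop") %>%
  pivot_wider(names_from = species, values_from = mean_weight)

long <- wide %>%
  pivot_longer(cols = -year, names_to = "species", values_to = "mean_weight")
```

Here yearly has one row per year while centered still has four rows. The wide table gets an NA where a species was not recorded in a given year, and that NA is carried back into long, which is the kind of detail worth planning for before reshaping.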
Getting started with Quarto
- Quarto is useful for creating reproducible documents combining text, code and figures.
- Specify chunk options to control the formatting of the output document.
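
As a rough illustration of chunk options, the lines below show what the body of an R code chunk in a Quarto document might look like. In the .qmd file the chunk would open with a {r} fence. Options are written as special comments starting with #| at the top of the chunk. The label and the choice of options here are hypothetical.

```r
#| label: mpg-summary
#| echo: false
#| warning: false

# With echo: false the code is hidden in the rendered document and only
# its output is shown; warning: false suppresses any warnings.
mpg_summary <- summary(mtcars$mpg)
mpg_summary
```

Because the options are ordinary comments as far as R is concerned, the same lines also run unchanged in the console or in a script.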