Why R Markdown?

Overview

Teaching: 15 min
Exercises: 5 min

Questions

What is reproducible research?

How can R Markdown help research to be more reproducible?

What are the benefits of using R Markdown?

Objectives

Understand what scientific reproducibility entails.

Identify the benefits of using R Markdown to create research reports.

Understand how R Markdown is a useful tool in Open Science approaches.

Learn how R Markdown can help one’s research.

Warm-up

Let’s get into breakout rooms. What does reproducibility mean to you? Have you ever had trouble reproducing someone else’s study, or even your own research?

The importance of Reproducibility in Research

PhD Comics cartoon

Discussion: A scary anecdote

A group of researchers obtain great results and submit their work to a high-profile journal.
Reviewers ask for new figures and additional analysis.
The researchers start working on revisions and generate modified figures, but find inconsistencies with old figures.
The researchers can’t find some of the data they used to generate the original results, and can’t figure out which parameters they used when running their analyses.
The manuscript is still languishing in the drawer…

According to the U.S. National Science Foundation (NSF) subcommittee on replicability in science:

Science should routinely evaluate the reproducibility of findings that enjoy a prominent role in the published literature. To make reproduction possible, efficient, and informative, researchers should sufficiently document the details of the procedures used to collect data, to convert observations into analyzable data, and to analyze data.

Reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results. Reproducibility is a minimum necessary condition for a finding to be considered rigorous, believable and informative.

Why all the talk about reproducible research?

A 2016 survey in Nature revealed that irreproducible experiments are a problem across all domains of science:

Nature Report - 2016

Factors behind irreproducible research

Science is not a miracle

Not enough documentation on how the experiment was conducted and how the data was generated
Data used to generate original results unavailable
Software used to generate original results unavailable
Difficult to recreate software environment (libraries, versions) used to generate original results
Difficult to rerun the computational steps

Reproducible, replicable, robust, generalizable

While reproducibility is the minimum requirement and can be solved with “good enough” computational practices, replicability/robustness/generalizability of scientific findings are an even greater concern involving research misconduct, questionable research practices (p-hacking, HARKing, cherry-picking), sloppy methods, and other conscious and unconscious biases.

How science should be

If contributing to science and helping other researchers does not seem compelling enough, here are 5 selfish reasons to work reproducibly (Markowetz, 2015):

Helps to avoid data loss and disaster
Makes it easier to write papers
Helps reviewers see it your way
Enables continuity of your work
Helps to build your reputation

When do you need to worry about reproducibility?

Let’s assume that I have convinced you that reproducibility and transparency are in your own best interest. Then what is the best time to worry about it?

Throughout the whole research life cycle! Before you start the project, because you might have to learn tools like R or Git. While you do the analysis, because if you wait too long you might lose a lot of time trying to remember what you did two months ago. When you write the paper, because you want your numbers, tables, and figures to be up to date. When you co-author a paper, because you want to make sure that the analyses presented in a paper with your name on it are sound. When you review a paper, because you can’t judge the results if you don’t know how the authors got there.

Levels of Reproducibility

A published article is like the top of a pyramid: a reproducible paper or report rests on multiple levels that each contribute to its reproducibility.

What is R Markdown and how does it connect to reproducible research?

R Markdown is a variant of Markdown, a plain text format that is easy to read, easy to write, and easily converted to HTML.

R Markdown belongs to the field of literate programming which is about weaving text and source code into a single document to make it easy to create reproducible web-based reports. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents and much, much more. R Markdown provides the flexibility of Markdown with the implementation of R input and output. For more details on using R Markdown check http://rmarkdown.rstudio.com.

The idea of literate programming shines some light on this dark area of science. This is an idea from Donald Knuth in which you combine your text with your code and its output to create a single document. It is a blend of your literature (text) and your programming (code) that creates something you can read from top to bottom. Imagine your paper - the introduction, methods, results, discussion, and conclusion - and all the bits of code that produce each section. With R Markdown, you can see all the pieces of your data analysis together.

You can include both text and code to execute. It is a convenient tool for reproducible and dynamic reports with R! With R Markdown, you are able to:

Keep an eye on text (the paper) AND the source code. These computational steps are essential to ensure computational reproducibility.
Conduct the entire analysis pipeline in an R Markdown document: data (pre-)processing, analysis, outputs, visualization.
Apply a formatting syntax that is part of the R ecosystem and supports LaTeX.
Combine text written in Markdown and source code written in R (and other languages).
Easily share R Markdown documents with colleagues, as supplemental material, or as the paper under review. Thanks to the package knitr, others can execute the document with a single click and receive, for example, HTML or PDF renderings.
Get figures automatically updated if you change the underlying parameters in the code. The error-prone task of exporting figures and uploading the right figure version to another platform is thus not needed anymore.
Since Markdown is a text-based format, you can also use version control with Git.
If you do not make any changes to the document after creating the output document, you can be sure that the paper was executable at least at the time of submission.
Refer to the corresponding code lines in the methodology section making it unnecessary to use pseudocode, high-level textual descriptions, or just too many words to describe the computational analysis.
Use packages such as rticles to use templates from publishers and create submission-ready documents.

Some Real-world Applications

Finally, three real-world examples that motivated the authors of this lesson to value and use R Markdown:

In the early days of the COVID-19 pandemic, ecologist Chris Lortie quickly put together a simple but compelling COVID trends page. The ease with which he created his plots is a testament to the power of R as a data analysis environment, but the ease with which he was able to publish a page on the web is a testament to R Markdown and GitHub as a publishing environment. Notice that he did not have to: create plots in a tool and then export the plots as images; write any HTML; embed plot images in HTML; or create a site under WordPress or another web hosting service. Instead, he directly published his R code as he wrote it and, using GitHub, made it appear on the web with a button click.
One of us wanted to create a short document that included some math formulas. The LaTeX document preparation system can be used for this, but it is difficult to use and is overkill for just a few formulas in otherwise plain text. R Markdown lets you use just the best part of LaTeX (math formatting) while letting you write your text in a user-friendly way.
In this lesson we will be constructing a scientific paper that is based on an actual Nature publication and attendant survey and data. In trying to recreate the plots the original authors created, we found it difficult and time-consuming to figure out exactly how the authors created their plots. Out of the many columns in their data, many with similar-sounding names, which did they use? How did they handle missing data? Exactly what operations did they perform to compute aggregate values? How much easier it would have been if they had published the code they used along with their paper. R Markdown allows you to do this.

Our goal is that by the end of this workshop you will be able to create a reproducible report, using R Markdown and Knitr, to publish a paper such as this example. This template is used exclusively for instruction purposes and is based on a shortened and adapted version of the following academic paper:

Knudtson, K. L., Carnahan, R. H., Hegstad-Davies, R. L., Fisher, N. C., Hicks, B., Lopez, P. A., ... & Sol-Church, K. (2019). Survey on scientific shared resource rigor and reproducibility. Journal of biomolecular techniques: JBT, 30(3), 36. doi: 10.7171/jbt.19-3003-001

Key Points

Reproducible research is key for scientific advancement.

R Markdown can help you to organize, have better control over and produce reproducible research.

Getting Started with R Markdown

Overview

Teaching: 15 min
Exercises: 10 min

Questions

How to find your way around RStudio?

How to start an R Markdown document in RStudio?

How is an R Markdown document configured & what is our workflow?

Objectives

Learn the key functions in RStudio

Learn how to start an R Markdown document

Understand the workflow of an R Markdown file

Getting Around RStudio

Throughout this lesson, we’re going to teach you some of the fundamentals of using R Markdown as part of your RStudio workflow.

We’ll be using RStudio: a free, open-source R Integrated Development Environment (IDE). It provides a built-in editor, works on all platforms (including on servers), and provides many advantages such as integration with version control and project management.

This lesson assumes you already have a basic understanding of R and RStudio, but we will take a brief tour of the IDE, review R projects and best practices for organizing your work, and show how to install packages you may want to use with R Markdown.

Basic layout

When you first open RStudio, you will be greeted by three panels:

The interactive R console/Terminal (entire left)
Environment/History/Connections (tabbed in upper right)
Files/Plots/Packages/Help/Viewer (tabbed in lower right)

RStudio layout

Once you open files, such as .Rmd files or .R files, an editor panel will also open in the top left.

RStudio layout with .R file open

Working in an R Project

The scientific process is naturally incremental, and many projects start life as random notes, some code, then a manuscript, and eventually everything is a bit mixed together.

Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier.
— Vince Buffalo (@vsbuffalo) April 15, 2013

Most people tend not to think about how to organize their files which may result in something like this:

There are many reasons why we should ALWAYS avoid this:

It is really hard to tell which version of your data is the original and which is the modified;
It gets really messy because it mixes files with various extensions together;
It probably takes you a lot of time to actually find things, and to relate the correct figures to the exact code that was used to generate them.

A good project layout will ultimately make your life easier:

It will help ensure the integrity of your data;
It makes it simpler to share your code with someone else (a lab-mate, collaborator, or supervisor);
It allows you to easily upload your code with your manuscript submission;
It makes it easier to pick the project back up after a break.

A possible solution

Fortunately, there are tools and packages which can help you manage your work effectively.

One of the most powerful and useful aspects of RStudio is its project management functionality. We’ll be using an R project today to complement our R Markdown document and bundle all the files needed for our paper into a self-contained, reproducible project. After opening the project we’ll review good ways to organize your work.

The simplest way to open an RStudio project once it has been created is to click through your file system to get to the directory where it was saved and double click on the .Rproj file. This will open RStudio and start your R session in the same directory as the .Rproj file. All your data, plots and scripts will now be relative to the project directory. RStudio projects have the added benefit of allowing you to open multiple projects at the same time each open to its own project directory. This allows you to keep multiple projects open without them interfering with each other.
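As a rough illustration of what "relative to the project directory" means, the sketch below simulates a project root in a temporary folder. The folder and file names are hypothetical, and `setwd()` only stands in for what opening the .Rproj file does for you.

```r
# Simulate a project root in a temporary directory (hypothetical layout).
project_root <- file.path(tempfile("demo_"), "demo_project")
dir.create(file.path(project_root, "data"), recursive = TRUE)
write.csv(data.frame(id = 1:3),
          file.path(project_root, "data", "example.csv"),
          row.names = FALSE)

# Opening the .Rproj file starts R in the project root;
# here setwd() stands in for that step.
old_wd <- setwd(project_root)

# A relative path now resolves against the project root.
relative_path <- file.path("data", "example.csv")
example <- read.csv(relative_path)

setwd(old_wd)  # restore the previous working directory
```

Because the path never mentions where the project lives on disk, the same line should work for a collaborator who unzips the project somewhere else.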

CHALLENGE 2.1 - Opening a Project in RStudio

Open an RStudio project through the file system

Exit RStudio.

Navigate to the directory where you downloaded & unzipped the zip folder for this workshop

Double click on the .Rproj file in that directory.

SOLUTION

Double-click the RMarkdown_Workshop.Rproj file to open the project automatically in RStudio.

Best practices for project organization

Although there is no “best” way to lay out a project, there are some general principles to adhere to that will make project management easier:

Treat data as read only

This is probably the most important goal of setting up a project. Data is typically time-consuming and/or expensive to collect. Working with it interactively (e.g., in Excel), where it can be modified, means you are never sure where the data came from or how it has been modified since collection. It is therefore a good idea to treat your data as “read-only”.
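One possible way to back this habit up on disk is to remove write permission from raw files once they are saved. This is only a sketch: the file name is made up, and it assumes a Unix-like file system, since `Sys.chmod()` has limited effect on Windows.

```r
# Hypothetical raw data file, created in a temporary folder for illustration.
raw_dir <- tempfile("data_")
dir.create(raw_dir)
raw_file <- file.path(raw_dir, "survey_raw.csv")
write.csv(data.frame(id = 1:3, score = c(4, 5, 3)), raw_file, row.names = FALSE)

# Remove write permission (read-only for owner, group, and others).
Sys.chmod(raw_file, mode = "0444", use_umask = FALSE)

# Reading still works; the permission bits now show read-only access.
raw <- read.csv(raw_file)
file_mode <- format(file.info(raw_file)$mode)
```

File permissions are a safety net rather than a guarantee, so keeping an untouched copy of the raw data elsewhere is still worthwhile.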

Data Cleaning

In many cases your data will be “dirty”: it will need significant preprocessing to get into a format R (or any other programming language) will find useful. This task is sometimes called “data munging”. Storing the scripts that do this cleaning in a separate folder, and creating a second “read-only” data folder to hold the “cleaned” data sets, can prevent confusion between the two sets.
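A minimal sketch of such a cleaning step is shown here. The folder names (`data`, `data_clean`), the file names, and the columns are all hypothetical stand-ins, not the workshop's actual data.

```r
# Hypothetical project folders, created under a temporary directory.
project <- tempfile("project_")
raw_dir <- file.path(project, "data")
clean_dir <- file.path(project, "data_clean")
dir.create(raw_dir, recursive = TRUE)
dir.create(clean_dir)

# Stand-in for a raw file: inconsistent text and a missing value.
raw_file <- file.path(raw_dir, "survey_raw.csv")
write.csv(
  data.frame(role = c(" Staff", "faculty ", "STAFF"), years = c(3, NA, 10)),
  raw_file, row.names = FALSE
)

# The cleaning step reads the raw file and never writes back to it.
raw <- read.csv(raw_file, stringsAsFactors = FALSE)
clean <- raw
clean$role <- tolower(trimws(clean$role))  # normalize text values
clean <- clean[!is.na(clean$years), ]      # drop rows with missing years

# Cleaned data goes to its own folder.
clean_file <- file.path(clean_dir, "survey_clean.csv")
write.csv(clean, clean_file, row.names = FALSE)
```

Keeping the raw and cleaned files in different folders makes it hard to overwrite the original by accident.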

Treat generated output as disposable

Anything generated by your scripts should be treated as disposable: it should all be able to be regenerated from your scripts.

There are lots of different ways to manage this output. Having an output folder with a separate sub-directory for each analysis makes things easier later, since many analyses are exploratory and don’t end up being used in the final project, and some analyses get shared between projects.
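To make "disposable" concrete, the sketch below deletes a results folder and rebuilds it from code. The function name `make_summary`, the folder names, and the use of R's built-in `cars` data set are illustrative assumptions.

```r
# Hypothetical output folder with one sub-directory per analysis.
results_dir <- file.path(tempfile("project_"), "results")

# A script step that regenerates its output from scratch.
make_summary <- function(out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, "cars_summary.csv")
  # `cars` is a data set that ships with R, used here as a stand-in.
  write.csv(data.frame(mean_speed = mean(cars$speed)), out_file,
            row.names = FALSE)
  out_file
}

first <- make_summary(file.path(results_dir, "speed_analysis"))

# Delete everything that was generated, then rebuild it from the script.
unlink(results_dir, recursive = TRUE)
second <- make_summary(file.path(results_dir, "speed_analysis"))
```

If deleting the output folder ever feels risky, that is usually a sign that something in it was produced by hand rather than by a script.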

Use Rmd files to combine code/analysis and narrative

Rmd combines the power of R code and analysis with narratives describing methods and results. Keeping your code and narrative in the same document increases reproducibility by bundling the components of a paper together, which decreases the amount of work you, your collaborators, and your audience have to do to find the different components of your project: raw data, analysis and plots, narrative, and citations.

Tip: Good Enough Practices for Scientific Computing

Good Enough Practices for Scientific Computing gives the following recommendations for project organization:

Put each project in its own directory, which is named after the project.

Put text documents associated with the project in the doc directory.

Put raw data and metadata in the data directory, and files generated during cleanup and analysis in a results directory.

Put source for the project’s scripts and programs in the src directory, and programs brought in from elsewhere or compiled locally in the bin directory.

Name all files to reflect their content or function.

For this project, we used the following setup for folders and files:
bin: contains a .csl file for changing the bibliography to APA format
code: a different name for the src folder. This will contain our R scripts for plots and R Markdown scripts for writing our paper.
data: this folder contains our raw data files. We have 3 .csv files.
docs: This contains our raw text file paper_raw.txt for the paper and our bibliography.bibtex for reference. We could add any other notes we may have about the project here too.
figs: This is for the .png figures we found and want to add into our paper. It can also be used to save .png or .jpeg copies of the figures we output from our code.
results: This is where the rendered version of our .Rmd file (and .R scripts if we ran them) will save to in html form.
RMarkdown_Workshop.Rproj lives in the root directory.
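If you want to set up the same folders for a project of your own, a short script can do it. This is a sketch that builds the layout under a temporary directory; in practice you would point `project_root` at your own project folder.

```r
# Create the workshop-style layout under a temporary directory.
project_root <- file.path(tempfile("workshop_"), "RMarkdown_Workshop")
folders <- c("bin", "code", "data", "docs", "figs", "results")

for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created, relative to the project root.
created <- sort(list.dirs(project_root, full.names = FALSE, recursive = FALSE))
print(created)
```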

Optional Files to add to root directory:

README.md A detailed project description with all collaborators listed.
CITATION.txt Directions to cite the project.
LICENSE.txt Instructions on how the project or any components can be reused.

Again, there are no hard and fast rules here, but remember that it is important at least to keep your raw data files separate and to make sure they don’t get overwritten after you use a script to clean your data. It’s also very helpful to keep the different files generated by your analysis organized in a folder.

Version Control

It is important to use version control with projects. Go here for a good lesson which describes using Git with RStudio.

R Packages

It is possible to add functions to R by writing a package, or by obtaining a package written by someone else. As of this writing, there are over 10,000 packages available on CRAN (the comprehensive R archive network). R and RStudio have functionality for managing packages:

You can see what packages are installed by typing installed.packages()
You can install packages by typing install.packages("packagename"), where packagename is the package name, in quotes.
You can update installed packages by typing update.packages()
You can remove a package with remove.packages("packagename")
You can make a package available for use with library(packagename)

Packages can also be viewed, loaded, and detached in the Packages tab of the lower right panel in RStudio. Clicking on this tab will display all of the installed packages with a checkbox next to each. If the box next to a package name is checked, the package is loaded; if it is empty, the package is not loaded. Click an empty box to load that package and click a checked box to detach it.

Packages can be installed and updated from the Packages tab with the Install and Update buttons at the top of the tab.

CHALLENGE 2.2 - Installing Packages

Install the following packages: bookdown, tidyverse, knitr
SOLUTION

We can use the install.packages() command to install the required packages.
install.packages("bookdown")   
install.packages("tidyverse")   
install.packages("knitr")  
An alternate solution, to install multiple packages with a single install.packages() command is:
install.packages(c("bookdown", "tidyverse", "knitr"))  
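A variation you may find handy is to install only the packages that are not already present. The helper name `missing_packages` is made up for this sketch, and the install line is commented out because it needs network access.

```r
# Packages this lesson asks for.
wanted <- c("bookdown", "tidyverse", "knitr")

# Return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(wanted)

# Install only what is missing (commented out: requires network access).
# if (length(to_install) > 0) install.packages(to_install)
```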

Starting an R Markdown File

Start a new R markdown document in RStudio by clicking File > New File > R Markdown…

Opening a new R Markdown document

If this is the first time you have ever opened an R Markdown file, a dialog box will open up to tell you what packages need to be installed.

First time R Markdown install packages dialog box

Click “Yes”. The packages will take a few seconds to install. You should see that each package was installed successfully in the dialog box.

Once the package installs have completed, a dialog box will pop up and ask you to name the file and add an author name (RStudio may already know your name). The default output is HTML and, as the wizard indicates, it is the best way to start; in later or final versions you have the option of changing to PDF or Word document (among many other output formats! We’ll see this later).

New R Markdown will always pop up with a generic template…

If you see this template you’re good to go.

.Rmd new file generic template

Now we’ll get into how our R Markdown file & workflow is organized and then on to editing and styling!

R Markdown Workflow

R Markdown has four distinct steps in the workflow:

create a YAML header (optional)
write R Markdown-formatted text
add R code chunks for embedded analysis
render the document with Knitr

R Markdown Workflow

Let’s dig in to those more:

1. YAML header:

What is YAML anyway?

YAML, pronounced “Yeah-mul”, stands for “YAML Ain’t Markup Language”. YAML is a human-readable data-serialization language which, as its name suggests, is not a markup language. YAML has no executable commands, though it is compatible with all programming languages and virtually any application that deals with storing or transmitting data. YAML itself is made up of bits of many languages including Perl, MIME, C, & HTML. YAML is also a superset of JSON. When used as a stand-alone file the file ending is .yml or .yaml.

R Markdown’s default YAML header includes the following metadata surrounded by three dashes ---:

title
author
date
output

R Markdown template YAML header

The first three are self-explanatory, but what’s the output? We saw this in the wizard for starting a new document: by default you can pick from PDF, HTML, and Word document. Basically, this lets you export your Rmd file as a file type of your choice. There are other output options, and even more can be added by installing certain packages, but these are the three default options.

We’ll see other formatting options for YAML later on, including how to add bibliography information, customize our output, and change the default settings of the knit function. Below is an example of how our YAML header will look at the end of this workshop.

---
title: "An Adapted Survey on Scientific Shared Resource Rigor and Reproducibility"
author: UCSB Carpentry
date: "December 16, 2020"
output:
  html_document:
    number_sections: true
bibliography: ../docs/bibliography.bibtex
csl: ../bin/apa-5th-edition.csl
knit: (function(inputFile, encoding) { 
      out_dir <- '../results';
      rmarkdown::render(inputFile,
                        encoding=encoding, 
                        output_file=file.path(dirname(inputFile), out_dir, 'Paper_Template_html.html')) })
---
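To see how the header is set apart from the rest of the document, the base R sketch below pulls out the lines between the two `---` delimiters. The document content is a made-up stand-in, and the key extraction only handles flat `key: value` lines, not nested options like the ones above.

```r
# A small R Markdown document held as lines of text (hypothetical content).
rmd_lines <- c(
  "---",
  'title: "An Adapted Survey"',
  "author: Add Your Name Here",
  "output: html_document",
  "---",
  "",
  "INTRODUCTION"
)

# The header sits between the first two lines that consist only of '---'.
delimiters <- which(rmd_lines == "---")
header <- rmd_lines[(delimiters[1] + 1):(delimiters[2] - 1)]
body <- rmd_lines[-seq_len(delimiters[2])]

# Take the text before the first colon of each line as the field name.
keys <- sub(":.*$", "", header)
print(keys)
```

For real documents, the rmarkdown package offers `rmarkdown::yaml_front_matter()`, which parses the header properly, including nested options.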

2. Formatted text:

This one is simple: it’s just narrative text formatted with Markdown (more on Markdown syntax later). Markdown-formatted text is one of the benefits R Markdown adds beyond the capabilities of a regular R script. Any text section will have the default white background in the Rmd document. In a regular R script, everything is treated as code unless it follows a #, which starts a comment. In R Markdown it is the other way around: plain text is narrative that appears in the document, and code must be enclosed in special characters. Any symbols you see that aren’t regular punctuation are for formatting, such as ##, ** **, and < >.

Rmd text chunks

CHALLENGE 2.3 - Formatting with Symbols (optional)

In Rmd certain symbols are used to denote formatting that should happen to the text (after we “knit” or render). Before we knit, these symbols will show up seemingly “randomly” throughout the text and don’t contribute to the narrative in a logical way. In the generic Rmd document, there are three types of such symbols (##, **, <>). Each symbol represents a different kind of formatting (think of the text formatting buttons you use in Word). Can you deduce from the surrounding text how these symbols format the surrounding text?
## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
SOLUTION

## is a heading, ** is to bold enclosed text, and <> is for hyperlinks. Don’t worry about this too much right now! This is an example of R Markdown syntax for styling, we’ll dive into this next.

3. Code Chunks:

R code chunks appear highlighted in gray throughout the Rmd document. They are surrounded by three backticks on either side (```), with the opening three backticks followed by curly brackets {} that contain some other code. The backticks mark the start and end of a code section, and the contents of the curly brackets {} indicate how R should read and display the code (more on this in the Knitr syntax episodes). These are the sections where you add R code such as summary statistics, analysis, tables, and plots. If you’ve already written an R script, you can copy and paste your code between the few lines of required formatting to embed and run whichever piece you want at that particular spot in the document.
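The body of a chunk is ordinary R. As an illustration, the lines below are the kind of code you might paste into one; the chunk label `cars-summary` is made up, and `cars` is a data set that ships with R.

```r
# In the .Rmd file these lines would sit between an opening fence such as
#   ```{r cars-summary, echo=TRUE}
# and a closing fence of three backticks.

# `cars` ships with R, so no data file is needed for this illustration.
speed_summary <- summary(cars$speed)
print(speed_summary)

# A chunk can also draw a plot, which knitr places in the rendered document.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```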

rmd template code chunks

4. Rendering your Rmd document:

This is called “knitting”, and the button looks like a spool of yarn with a knitting needle. Clicking the Knit button will compile the code, check for errors, and finally output the type of file indicated in your YAML header. One nice thing about the Knit button is that it saves the .Rmd document each time you run it. Your Rmd document may not render to the indicated output if it contains any errors, so knitting also functions somewhat as a code checker.
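Rendering can also be started from the R console with `rmarkdown::render()`, which is roughly what happens behind the Knit button. The sketch below writes a tiny stand-in document to a temporary folder and renders it only if rmarkdown and pandoc are available; the file name and content are hypothetical.

```r
# Write a tiny stand-in document to a temporary folder.
work_dir <- tempfile("knit_demo_")
dir.create(work_dir)
rmd_file <- file.path(work_dir, "demo.Rmd")
writeLines(c(
  "---",
  'title: "Knit demo"',
  "output: html_document",
  "---",
  "",
  "Some narrative text."
), rmd_file)

# Render only if rmarkdown and pandoc are available on this machine.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() returns the path of the output file it created.
  html_file <- rmarkdown::render(rmd_file, output_dir = work_dir, quiet = TRUE)
  print(basename(html_file))
}
```

Unlike the Knit button, calling `render()` from the console runs in your current R session, so objects already in your environment can affect the result.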

Try it yourself

We’re going to pause here and see what the R Markdown does when it’s rendered. We’ll just use the generic template, but when we’re working on our own project, knitting periodically while we’re editing allows us to catch errors early. We’ll continue rendering our rmd throughout the lesson to see what happens when we add our markdown and knitr syntax and to make sure we aren’t making any errors.

This is a little preview of what’s to come in the Knitr syntax episodes later on. Click the “Knit” button.

Before you can render your document, you’ll need to give it a file name and choose what folder you want to save it to. Choose rmd-workshop-paper.rmd as your file name and save the file to your code sub-folder.

First knit choose filename

This is how our HTML document will render after clicking the knit button and choosing a file name:

Knit html output

CHALLENGE 2.4 - echo=TRUE Option (optional)

Can you deduce what the echo=TRUE option stands for?

Solution

The echo=TRUE piece is knitr syntax that sets a global default for the whole paper. This piece of code specifically, echo=TRUE, tells the rmd document to display the R code that generates the plots & analysis when the rmd document is rendered by hitting the “knit” button. Don’t worry too much about this now, we’ll learn more about this syntax in the Knitr Syntax episodes.
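For reference, the global default described here is set with `knitr::opts_chunk$set()`, normally inside a setup chunk near the top of the document. The sketch below sets the option and reads it back; it is guarded so that it does nothing if knitr is not installed.

```r
# This line usually lives in a setup chunk near the top of the .Rmd file.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Show the R code of every chunk in the rendered document by default.
  knitr::opts_chunk$set(echo = TRUE)

  # Read the current default back to confirm it.
  echo_default <- knitr::opts_chunk$get("echo")
  print(echo_default)
}
```

An individual chunk can still override the default by putting `echo=FALSE` inside its curly brackets.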

Starting our paper

Ok, now let’s start on our own rmd document.

Do this on your own in your new rmd file:

1) Delete EVERYTHING except the yaml header

2) Edit the yaml header to add the title of the paper we’re working on and to add yourself as author.

---
title: "An Adapted Survey on Scientific Shared Resource Rigor and Reproducibility"
author: [Add Your Name Here]
date: "December 16, 2020"
output: html_document
---

3) Navigate to the docs folder and open paper_raw.txt. Select all the text (ctrl-a or cmd-a), copy it (ctrl-c or cmd-c), and paste it (ctrl-v or cmd-v) AFTER the YAML header in our Rmd file.

...
output:
  html_document
---
INTRODUCTION
Reproducible research practices include rigorously controlled and documented experiments using validated reagents. These practices are integral to the scientific method, and they enable acquisition of reliable and actionable research results. However, the art and practice of science is affected by challenges 
...

Your file should now look something like this:
Rmd file with raw text

Now we’ll be set for our next episode, which is about adding Markdown syntax to style your text sections (white sections), including headers, bold, italics, citations, footnotes, links, etc.

Key Points

Starting a new Rmd File

Anatomy of an Rmd File (YAML header, Text, Code chunks)

How to knit an Rmd File to html

R Markdown Syntax: Headings & Lists

Overview

Teaching: 15 min
Exercises: 15 min

Questions

How does markdown in R compare to markdown in other programs?

How to create headings and sub-headings in R Markdown?

How to create bulleted and numbered lists in R Markdown?

Objectives

Understand how R markdown relates to the markdown universe

Learn how to create headings and sub-headings in R Markdown

Learn how to create bulleted and numbered lists in R Markdown

Intro to R Markdown Syntax

Before we dive into learning R Markdown Syntax, let’s talk a little about the “markdown” part.

R Markdown is a format for writing reproducible, dynamic reports with R which allows you to weave together narrative and code to produce elegantly formatted outputs. In practice, it allows you to use plain text for a document with bits of other things thrown in, but which will ultimately be converted to any number of other languages, for eventual display in the format you desire. It supports dozens of static and dynamic output formats such as HTML, PDF, and MS Word.

The text in an R Markdown document is written with the markdown syntax, which is a basic markup language that conveys how text should be displayed. The basic markdown syntax has dozens of flavors, of which R Markdown is one. Most markdown syntax is preserved and works identically no matter what flavor you use. However, the different flavors will have different options or slightly different implementations of certain things.

R Markdown syntax is relatively simple and there are a number of tutorials and cheat sheets available online that you can consult while working on your reproducible report (for example, Pandoc’s Markdown documentation). In the next episodes we will be covering a subset of it, focusing on the most common formatting you may need to apply while writing reproducible documents.

First Things First - Line Breaks

It may seem strange to talk about line breaks when writing text, but knowing how they work is important for your text to render properly. You need to add line breaks explicitly; otherwise consecutive lines will run together when the document renders (even if you hit enter/return and start typing on a new line in RStudio).

You can add line breaks by using:

two spaces at the end of the line
an html break <br>
2 enters/returns (leaving a blank line)

These line breaks will also be important to get your formatting to render correctly. In some cases you MUST have a blank line (just two spaces or a break won’t do the trick). Blank lines will be required before/after all headings, horizontal lines, and lists.

example:

If I'm writing some text
an enter should work for a line break, but doesn't

Here I am writing again,
I need to make certain I have a line break by:  
adding two spaces  
or an html break<br>
or by adding two returns

and I carry on writing

*Notice the spacing difference with 2 returns versus the other two options.

Creating Headings and Subheadings

Most papers or articles need headings and subheadings to distinguish different parts of the paper. We can insert headings and subheadings in R Markdown using the pound sign #. There are six heading/subheading sizes in R Markdown. The number of pound signs before your line of text determines the heading size, 1 being the largest heading and 6 being the smallest.

# Heading 1  
## Heading 2  
### Heading 3 
#### Heading 4  
##### Heading 5  
###### Heading 6 

Displays as:

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5

Heading 6

Note that this is GitHub’s markdown “flavor” for headers; they look different in R Markdown. However, the relative sizing and hierarchy, with 1 being the largest and 6 the smallest, remain the same.

Tip: Add a Space!

It’s good practice to put a space between the last # and the start of your heading. While R-flavored markdown will still render #Title, other flavors of markdown (e.g., GitHub) require a space between the #s and the heading text: # Title.

Numbered Sections

We would like to have numbered section headings in our paper. To do that, we add a bit of code to the YAML header at the top.

Specifically, after output: start a new indented line with html_document: (don’t forget the :), then add another line, indented one level further, with number_sections: true.
It should look like the following:

```
---
title: "An Adapted Survey on Scientific Shared Resource Rigor and Reproducibility"
author: Add Your Name Here
date: "December 15, 2020"
output:
  html_document:
    number_sections: true
---
```

We want to insert headings and subheadings to divide our paper into more readable parts. Let’s start by adding one at the beginning to start our introduction.

To conform to markup best practices, header 1 # should only be used for the title. From there, for each sub-heading level you use one heading level lower, for example the introduction will be header 2 ##.

Tip: More Heading Convention

For best practices regarding all heading levels, you should never skip a heading level: heading levels convey the hierarchy of your paper’s composition, not styling choices. For styling you may use CSS stylesheets, either importing an existing “theme” or creating your own.

In the first line of our paper, make the word “Introduction” into a heading 2 by adding a ## before the line.

## INTRODUCTION

Now let’s knit to see how the heading is formatted.

Oops! For our Introduction we get 0.1 Introduction. The numbering isn’t right here…

What’s going on?

This is an exception in R Markdown. Because an R Markdown document defines the title within the YAML header, you should actually use header 1 # for the next highest header level after the title, (Introduction, Conclusion, etc.). Otherwise, if you use numbered sections, the numbering will be off.

So, let’s try this again, this time with heading 1 #:

# INTRODUCTION

Now we can go to the next section and add a main heading and a subheading. Find the “Materials and Methods” section (right after the introduction) and make the line that says “Materials and Methods” into heading 1 and the lines that say “Survey Overview” and “Data Analysis” into heading 2 for subheadings.

# MATERIALS AND METHODS  
## Survey Overview  
## Data Analysis  

Tip: Finding Content on RStudio

Use the Ctrl+F (Windows) or Command+F (Mac) shortcut keys, or Edit -> Find from the menu, to locate content in your paper. We will be using that quite a lot during this workshop.

CHALLENGE 3.1 - Applying Headings and Subheadings

Insert headings and subheadings throughout the rest of the paper.

Make these lines into headings so our paper is split into 5 main sections:

Introduction (already done)

Materials and Methods (already done)

Results and Discussion

Conclusion

References

Make these lines into subheadings:

Survey Overview (already done)

Data Analysis (already done)

Survey Demographics

Current Landscape for Rigor and Transparency in Represented Shared Resources

Core Implementation of Research Best Practices

Strategies for Improving R&R in Core Operation

Think carefully about which heading levels you should use for consistency throughout your paper. *Use the search function in RStudio (Ctrl+F or Command+F) to find these lines in the document quickly.

SOLUTION

# INTRODUCTION
# MATERIALS AND METHODS
# RESULTS AND DISCUSSION
# CONCLUSION
# REFERENCES

## Survey Demographics
## Current Landscape for Rigor and Transparency in Represented Shared Resources
## Core Implementation of Research Best Practices
## Strategies for Improving R&R in Core Operation

Time to Knit!

Check how the headings look in your paper.

Horizontal Lines

If you wish to create divisions between sections, you can insert a horizontal line using three (or more) dashes, asterisks, or underscores (---, ***, or ___):

---  

See some paragraph text   
between horizontal lines---  
Now you know markdown  

***    

The above renders as:

See some paragraph text
between horizontal lines—
Now you know markdown

*Note again that this shows the GitHub styling for horizontal lines; they look different when rendered in R Markdown.

Tip: Leave Blank Line Before & After Horizontal Lines

Depending on the platform, the markdown parser may interpret your attempt at a horizontal line as some other styling unless you add a blank line before and after the line. A break <br> may not even work, it should be a completely blank line.

Ok, let’s add a horizontal line in our paper under the title:

--- (yaml end)

---

# INTRODUCTION

Time to Knit!

Check how the horizontal line looks in your paper

CHALLENGE 3.2 - Adding Horizontal Lines (optional)

Add horizontal lines after each section header.

SOLUTION

# INTRODUCTION

***

# MATERIALS AND METHODS

***

# RESULTS AND DISCUSSION

***

# CONCLUSION

***

# REFERENCES

***

Bulleted & Numbered Lists

Academic articles often include lists to make important findings stand out more or to summarize key points for readers. We will learn how to create both unordered lists with bullet points, and ordered numbered lists.

Unordered Bullet Lists

Creating unordered lists is relatively simple. You can use asterisk, dash, or plus characters (*, - or +):

* A bullet point  
- Also a bullet point  
+ Still a bullet point   

Outputs as:

- A bullet point
- Also a bullet point
- Still a bullet point

You can also add sub-levels, to create sub-lists by indenting the next list item evenly by two or four spaces:

* A bullet point  
  * Sub-level one  
    * Sub-level two   

Outputs to:

- A bullet point
  - Sub-level one
    - Sub-level two

Ordered Numbered Lists

For ordered lists, you use a number followed by a dot, e.g., 1. Your numbers do not need to be sequential: Markdown numbers the items in the order in which they appear rather than by the numbers you type.

1. First item in our numbered list
2. Second item in our numbered list
3. Third item in our numbered list

The above will appear as:

1. First item in our numbered list
2. Second item in our numbered list
3. Third item in our numbered list

Tip: No ) for Numbered Lists

HTML output does not preserve a closing parenthesis as a list delimiter, so if you use one, the output will be the same as above, i.e. 1) outputs as 1.

CHALLENGE 3.3 - Inserting Bullet Points

Now let’s practice creating bullet lists. Search in the paper “it is important to highlight:” and apply bullet points for each of the next 3 sentences.
SOLUTION
* At least 170 (∼80%) respondents use documentation, in the form of quality control and standard operation procedures (SOPs) to support practices. 
* The incorporation of an instrumentation management plan, was not as highly utilized (56%).
* Oversight of data analyses and double-checking results were some of the least widely used ones (26%).
Remember: You can use + or - too.

CHALLENGE 3.4 - Applying Numbered Lists

Use RStudio to locate the paragraph that ends with “in grant applications, as follows:”. The next four sentences should be shown as a numbered list.
SOLUTION
1. scientific premise forming the basis of the proposed research 
2. rigorous experimental design for robust and unbiased results
3. consideration of sex and other relevant biologic variables
4. authentication of key biologic and chemical resources

Time to Knit!

Check how the bulleted and numbered lists look in your paper.

If needed, you can also nest numbered sub-levels, or combine bulleted and numbered items in the same list, by indenting the different levels.
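
As a minimal sketch of such a combined list (the item text is placeholder wording, not taken from the paper), nested items are indented by four spaces, which Pandoc accepts for sub-levels:

```markdown
1. First numbered item
    * A bullet nested under item 1
    * Another nested bullet
2. Second numbered item
    1. A numbered sub-item
    2. Another numbered sub-item
```

Remember to leave a blank line before and after the whole list so it is recognized as a list when you knit.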

Key Points

Heading syntax (#, ##, ###, etc.)

Bulleted lists (*, - , or +)

Numbered lists (1., 2., etc.)

R Markdown Syntax: Hyperlinks, Images & Tables

Overview

Teaching: 10 min
Exercises: 10 min

Questions

How do I create hyperlinks in R Markdown?

How do I insert images or tables into R Markdown?

How do I resize images?

Objectives

Learn how to add hyperlinks to an R Markdown document

Find out how to insert images into an R Markdown document

Learn how to add tables into an R Markdown document

Creating Hyperlinks

Hyperlinks are created using the syntax [text](link) with no spaces in between the parentheses and the square brackets.

For example:

[RStudio](https://www.rstudio.com)

RStudio

Now, let’s apply it to the template paper. Find where the “Center for Open Science” is mentioned and link the institution to its official website:

[Center for Open Science](https://www.cos.io/)

Time to Knit!

Check if your hyperlinks are working properly.

Challenge 4.1: Adding links

Let’s add another link to our paper. Now it is your turn! We want to create a hyperlink to the survey platform used in the study, SurveyMonkey (https://www.surveymonkey.com/).

Solution

[SurveyMonkey](https://surveymonkey.com)

Tip:

You can use HTML directly in your .Rmd document to add a link that will open in a new tab, such as <a href="http://www.ucsb.edu/" target="_blank">UCSB</a>. Alternatively, Pandoc’s link_attributes extension, which is enabled by default in R Markdown, lets you attach the same attribute to a markdown link.
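
As a sketch of the attribute syntax that the link_attributes extension provides, the same new-tab link can be written in markdown by placing the attribute in curly braces right after the link, with no space in between. The UCSB URL is the one from the tip above; the attribute only has an effect in HTML output:

```markdown
[UCSB](http://www.ucsb.edu/){target="_blank"}
```

This keeps the document in markdown instead of mixing in raw HTML, which can matter if you later knit to a format other than HTML.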

Inserting Images

You can add images to an R Markdown report using markdown syntax as follows: ![Image Name](path-to-image-here)

You’ll notice this format is exactly the same as hyperlinks, but with the addition of an ! before the brackets and parentheses.

However, when you knit the report, RStudio will only be able to find your image if you have placed it in the right place RELATIVE to your .Rmd file. This is where good file management becomes extremely important. We have placed all our images in the figs folder in the R-markdown project folder. In that case, make sure your path starts with ../figs/ followed by the correct image name and file extension. Also, the closing bracket and the opening parenthesis must be right next to each other, without any spaces in between.

Tip: Paths to Files

The specification of the list of folders to travel and the file name is called a path. A path that starts at the root folder of the computer is called an absolute path. A relative path starts at a given folder and lists the folders and file from there. Using relative paths keeps your project portable, because the paths still work when the project folder is moved or shared. A path is made up of folder names; if the path is to a file, it ends with a file name. The folders and files of a path are separated by a directory separator. There are a few special directory names: a single period . indicates the current working directory, and two periods .. indicate moving up one directory.

In our paper template there are three images (two pie charts and one bar chart) we want to include. Those are named fig1_paper.jpg, fig2_paper.jpg and fig3_paper.jpg.

To start let’s identify where Fig. 1 is mentioned in the paper. We will insert the image right after that. This image will have the caption labeled “FIGURE 1 - Knowledge and awareness of the current NIH guidelines on rigor and reproducibility.” (that we will be pasting in the chat). We need that caption to render the image.

The markdown should look like:

![FIGURE 1 - Knowledge and awareness of the current NIH guidelines on rigor and reproducibility.](../figs/fig1_paper.jpg)

Note: A preview of your image should pop up automatically in RStudio if you have the correct relative path. But this will be only true if you type out the code, not if you copy and paste it.

This will output as:

FIGURE 1 - Knowledge and awareness of the current NIH guidelines on rigor and reproducibility.

Resizing Images

The image you just added looks a little too big, right? We can resize it by adjusting the width and height ratio. Let’s say we want this image to be half of the original size. In order to do that, we will have to add to the syntax:

![FIGURE 1 - Knowledge and awareness of the current NIH guidelines on rigor and reproducibility.](../figs/fig1_paper.jpg){width=50% height=50%}

This will output as:

CHALLENGE 4.2 - Inserting Images

Locate the places for Fig. 2 and Fig. 3 and add them to the document using the captions below:

FIGURE 2 - Lack of requests for rigor and reproducibility documentation by users of shared resources

FIGURE 3 - Types of tools that cores would like to implement in their operations

*The bar chart should use a ratio of 60% x 80% in order to improve readability.
Solution:
![FIGURE 2 - Lack of requests for rigor and reproducibility documentation by users of shared resources](../figs/fig2_paper.jpg){width=50% height=50%}

![FIGURE 3 - Types of tools that cores would like to implement in their operations.](../figs/fig3_paper.jpg){width=60% height=80%}

Time to Knit!

Check how your images look now.

Inserting Tables

We can also use markdown syntax to insert a formatted table into our document. The basic syntax to insert a table looks like this:

Column Header | Column Header
---           | ---              
Cell 1        |  Cell 2       
Cell 3        |  Cell 4          

Output:

Column Header	Column Header
Cell 1	Cell 2
Cell 3	Cell 4

Start with the column names/headers. Separate columns with the pipe ( | ) symbol. Right below the column headers, use at least three dashes in each column to separate the headers from the cells of the table. Then fill in the contents of the table row by row, separating columns using the pipe ( | ) symbol.

Note: Table Spacing

The spacing between cells in each row can help with readability in the R Markdown file, but is not necessary to get the correct output. As long as the pipe symbol is there, R Markdown will automatically format the table in your output. The following syntax will print the same table as the spaced-out table above.

Column Header|Column Header
--- |---
Cell 1|Cell 2
Cell 3|Cell 4

You can use text emphasis in the table using the same syntax as you use when emphasizing other plain text. The following change will bold the column headers in the output.

 **Column Header** | **Column Header**
 ---               | ---        
 Cell 1            |  Cell 2 
 Cell 3            |  Cell 4     

Output:

Column Header	Column Header
Cell 1	Cell 2
Cell 3	Cell 4

Let’s create Table 1 in our paper in section 3.2 Current Landscape for Rigor and Transparency in Represented Shared Resources.

Start with the column headers “Category” and “N” in bold. Then add the separator between the header and the cells. We’ll also type out the first two rows of the table.

**Category** | **N**
--- | ---
Poor sample quality from users/sample variability/limited biological material | 51
Lack of well-trained principle investigators and lab members/Poor oversight | 45

CHALLENGE 4.3 - Complete the Table

Finish Table 1 by adding the rest of the rows.

SOLUTION

**Category** | **N**  
--- | ---  
Poor sample quality from users/sample variability/limited biological material | 51  
Lack of well-trained principle investigators and lab members/Poor oversight | 45  
Poor experimental design: Lack of sufficient replicates/inadequate sample size/lack of adequate controls | 43   
Inadequate standardization of protocols or guidelines, and data analysis | 43   
Cost and time | 39   
Failure to leverage the core’s expertise/following the core’s advice/no consulting beforehand | 23   
Inadequate documentation of experiments/data management | 19   
Instruments: maintenance, upgrades, changes | 15   
Responses that could not be assigned to a category | 11  

Time to Knit!

Check how the table you have just created looks.

Note: Advanced & Interactive Tables

There are some packages that allow you to make more advanced and interactive tables. Here are some references for these packages: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html and https://www.htmlwidgets.org/showcase_datatables.html

Key Points

R Markdown syntax for hyperlinks

R Markdown syntax for images

You can resize images with R Markdown

You can easily create basic tables with R Markdown

R Markdown Syntax: Emphasis, Formulas & Footnotes

Overview

Teaching: 10 min
Exercises: 10 min

Questions

How do I emphasize text in an R Markdown document?

How do I add LaTeX formulas?

How can I make superscript text?

How can we add footnotes?

Objectives

Learn how to apply emphases to words or phrases in R Markdown

Understand the power of LaTeX (Lay-techhh) for mathematics formatting

Learn how to add equations and formulas in R Markdown

Learn how to add footnotes in R Markdown

Adding Emphasis

Another way we can customize our R Markdown file output is by emphasizing words or phrases. In our paper, there are several instances of words and phrases that are italicized or bolded. We use markdown syntax to add emphasis to words or phrases by surrounding them with matching symbols. Different symbols give different kinds of emphasis.

Put text in italics with single asterisks or single underscores.

*italics* will give output italics
_italics_ will give output italics

Make text bold with double asterisks or double underscores.

**bold** will give output bold
__bold__ will give output bold

Let’s try italics by applying them to two organization names that appear in the text. First, let’s put the Association of Biomolecular Resource Facilities in italics. Either _Association of Biomolecular Resource Facilities_ or *Association of Biomolecular Resource Facilities* will render the name in italics. Now try the same with the Committee on Core Rigor and Reproducibility.

CHALLENGE 5.1 - Applying Bold Emphasis

To try out bold text, let’s search for the mention of the Transparency and Openness Promotion in the paper and make it bold.

SOLUTION

Either __Transparency and Openness Promotion__ or **Transparency and Openness Promotion** will render Transparency and Openness Promotion

Which symbol do you prefer to use? Do you prefer to stick with the same for both emphases?

Time to Knit!

Check how the emphases you have just applied look in your paper.

Tip: When you realllllly want to emphasize something

You can combine emphasis styles by combining the symbols surrounding the word or phrase. Make text bold and italicized with triple asterisks or triple underscores.

***super emphasized*** will give output super emphasized
___super emphasized___ will give output super emphasized

Adding Blockquotes

R Markdown also allows you to emphasize a pull quote using blockquotes. Let’s say you want to transform the first sentence in the paper into one and also apply italic emphasis. For this you will have to add a greater-than symbol (>) at the start of the line and asterisks or underscores for the italics, as demonstrated below:

>_"I have seen further it is because I have stood on the shoulders of giants"_ (Isaac Newton)

The output you will get should look like this:

“I have seen further it is because I have stood on the shoulders of giants” (Isaac Newton)

Time to Knit!

Check for the block quote in your paper.

Adding Equations & Formulas

LaTeX (pronounced Lay-techhh) is a comprehensive document formatting and preparation system. It is very powerful, but also famously difficult to use. A few journals require that papers be written in LaTeX, and some fields, such as high energy physics, use it exclusively. Why? Because, despite its difficulty, its mathematics formatting (its formatting of equations and formulas) is better than anything else out there.

RStudio has a most wonderful feature that allows you to use just the mathematics formatting portion of LaTeX without having to use LaTeX as a whole, without having a LaTeX installation on your system, and without having to understand LaTeX generally.

An inline formula is delimited with single dollar signs, as in $2+2 = 4$, with no space just inside the dollar signs (Pandoc will not treat $ 2+2 = 4 $ as math). A display equation uses double dollar signs, $$2+2 = 4$$. What goes between the single or double dollar signs is LaTeX math formatting. This is its own language, and you just have to learn it. A decent online reference (Overleaf is an online LaTeX editor):

https://www.overleaf.com/learn/latex/Mathematical_expressions

The canonical LaTeX reference, written by the author (hardback, but viewable online):

That said, the LaTeX math language is quite intuitive once you get a feel for its style. Put your math head on, not your programming head. If you want to say that a equals b times c, in a programming language you might write something like a = b*c, but in LaTeX you would say $a = bc$. Spaces inside a formula generally don’t matter in LaTeX; it “understands” your formula and uses rules to determine how to display things.

Challenge 5.2 - Adding Formulas

Let’s add the following text and formula to our data analysis section where the paper talks about confidence level:

Using the sampling error formula

$$e = { Zp(1-p) \over \sqrt{n} }$$

we compute that at a 95% confidence level (i.e., Z=1.96), with base probability p=1/2 and sample size n=243, the margin of error is +/-3%.

Time to Knit!

Check how the formula just rendered in your paper.

Double click on the equation. Notice how RStudio gives you a preview of it. Nice! The formula here uses curly braces for grouping (kind of like invisible parentheses). \over gives a fraction with a big horizontal line. Try replacing that with just / for an alternative rendering.
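
As a sketch of an equivalent way to write the same fraction, the standard LaTeX command \frac{numerator}{denominator} can replace \over. The second line simply plugs the values quoted in the text (Z=1.96, p=1/2, n=243) into the formula exactly as it is written above, to show how a worked calculation can be typeset:

```latex
$$e = \frac{Zp(1-p)}{\sqrt{n}}$$

$$e = \frac{1.96 \times 0.5 \times (1 - 0.5)}{\sqrt{243}} \approx \frac{0.49}{15.59} \approx 0.031$$
```

The result, roughly 0.031, corresponds to the margin of error of about 3% stated in the text.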

Challenge 5.3 - Inline Formulas

For inline formulas, in the same section of text replace Z=1.96 with $Z=1.96$, and do the same for p=1/2 and n=243. Notice the different formatting, and notice again RStudio’s preview when you hover over the formula. In LaTeX, you would say $\pm 3 \%$ for the +/-3%. Later we’ll see how to have R compute this value inline.

Time to Knit!

Check how the new formulas just rendered in your paper.

To appreciate the beauty of LaTeX’s typesetting, just look at how formulas are typeset by other systems. Here’s an example: https://www.educba.com/confidence-interval-formula/

RStudio’s facility to bring in LaTeX for math formatting makes it a wonderful authoring environment for math-rich papers that are not computational and have nothing to do with R at all.

Adding Footnotes

We can add footnotes to our paper using the caret symbol ^. Similar to adding emphasis, putting caret symbols around your text will print it as a superscript.

^superscript^ will give output: ˢᵘᵖᵉʳˢᶜʳⁱᵖᵗ

footnote^1^ will give output: footnote¹

In our paper, we will create a footnote in the introduction when we reference a notice from the U.S. National Institutes of Health (NIH). We will add a footnote after the word “notice” by adding ^1^ right after the word.

notice^1^

Challenge 5.4: Creating Footnotes

Let’s add a footnote to our paper. Right before the References section, add the footnote text, starting with a superscript number that matches the inline footnote marker. The text of the footnote will be: Through these four elements, the NIH intends to “enhance the reproducibility of research findings through increased scientific rigor and transparency” https://ori.hhs.gov/images/ddblock/ORI%20Data%20Graphs%202006-2015.pdf

Solution

^1^Through these four elements, the NIH intends to...

Time to Knit!

Take a look at the footnote you have just created.
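
The superscript approach above numbers and places the footnote by hand. As an alternative sketch, Pandoc (which R Markdown uses to knit) also has a built-in footnote syntax that numbers footnotes and links them for you. The surrounding sentence fragments below are placeholders, and the footnote text is left elided as in the solution above:

```markdown
... a notice[^1] ...

[^1]: Through these four elements, the NIH intends to...

... a notice^[An inline footnote can also be written right where it is referenced.] ...
```

With this syntax the footnote definition can sit anywhere in the document; Pandoc collects the footnotes at the end of the output.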

Key Points

You can add *italicized* and **bolded** text in R Markdown

There is an extensive LaTeX guideline for mathematics formatting

You can create superscript text & linked footnotes

R Markdown Syntax: Citations & Bibliography

Overview

Teaching: 20 min
Exercises: 10 min

Questions

How to include citations?

How to create a list of references?

How to apply different citation styles?

Objectives

Learn how to include citations

Create a list of references

Learn how to apply different citation styles

Getting the Bibliography

Let’s now turn our attention to including citations and listing the references (bibliography) in our example paper. Before adding citations we need to list all citable items and set up a bibliography. To add a bibliography we need to specify a bibliography file in the YAML header. The bibliography file should be in one of the formats supported by Pandoc:

MODS: .mods
BibLaTeX: .bib
BibTeX: .bibtex
RIS: .ris
EndNote: .enl
EndNote XML: .xml
ISI: .wos
MEDLINE: .medline
Copac: .copac
JSON citeproc: .json

Note that bibliography formats are not the same as citation styles. Citation styles are specified by a CSL (Citation Style Language) file, which we will cover later on. For now, we will stick to the bibtex format supported by Google Scholar, which will be used to retrieve example references for our practice paper. If you use a reference manager such as Zotero, EndNote, Mendeley, etc. to manage your library, you can also export the .bibtex file directly, with all the citable items you are considering including in the paper.

A *.bibtex file contains bibliography entries in plain-text format. Go to your R-markdown project folder, then paperToRmd, then docs, and open bibliography.bibtex. We already have a couple of citable items listed in this file. Let’s take a closer look to understand their anatomy:

@misc{nature_nature_2018,
	type = {Repository},
	title = {Nature {Reproducibility} survey 2017},
	url = {10.6084/m9.figshare.6139937.v4},
	journal = {Figshare},
	author = {Nature},
	year = {2018},
}

@article{springer_reality_2016,
	title = {Reality check on reproducibility.},
	volume = {533},
	doi = {10.1038/533437a},
	number = {7604},
	journal = {Nature},
	author = {Springer, Nature},
	month = may,
	year = {2016},
	pages = {437},
}

Note that the first line specifies the type of citation, MISC for miscellaneous, and Article for papers, along with the main entry which will be used to link in-text citations further in the episode. The other lines include the metadata that describes different parts of the bibliography, such as the date, the author, etc.

Let’s now understand the process of how to get a bibtex, using as example the item 1,500 scientists lift the lid on reproducibility authored by Baker (2016), following the steps below:

1) Search for the first paper listed on Google Scholar by copying and pasting the title of the paper. Make sure to use quotations to better filter results and get the right paper. A tricky part is that if you want more complete files that will render to more accurate citations you have to check for all existing versions of the same result (if any). Google Scholar amasses them altogether into one in the link “All 6 versions”, listing out different repositories and websites the paper can be found.

Fig. 7.1 - Bibtex from Google Scholar (1)

2) Click on the link to check for other existing versions. The first result does not include the journal name, so let’s choose the second one instead, which won’t require us to adjust the metadata.

Fig. 7.2 - Bibtex from Google Scholar (2)

3) When you click on the quotation icon right below the version you chose, it will prompt a window that will give you the option to choose BibTeX.

Fig. 7.3 - Bibtex from Google Scholar (3)

4) Choose the BibTeX option. It will prompt a file in your browser, like this:

@article{baker_1500_2016,
	title = {1,500 scientists lift the lid on reproducibility},
	volume = {533},
	url = {http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970},
	doi = {10.1038/533452a},
	abstract = {Survey sheds light on the ‘crisis’ rocking research.},
	language = {en},
	number = {7604},
	urldate = {2020-11-13},
	journal = {Nature News},
	author = {Baker, M.},
	month = may,
	year = {2016},
	pages = {452},
}

5) If you did not have that item already you would simply copy and paste this to your bibliography.bibtex file. The order would not matter, since references will be listed according to the style. It is strongly recommended to have entries separated by blank lines though. That makes them look like paragraphs, and easier to locate.

Tip: How can you get many bibtex at once?

Alternatively, you can conduct multiple searches, save the results to your personal library on Google Scholar, and export multiple items as a .bibtex file in bulk.

We already have a .bibtex file with all the items we need to proceed. But how will RStudio be able to link this file with the .rmd file on the other tab? Well, remember we mentioned before that we should include a bibliography file in the YAML header? In this case we will add to the existing YAML the following information:

---
bibliography: "../docs/bibliography.bibtex"
---

Note again the importance of the relative path calling the right folder. The connection between the two files is all set to get us started. For now, we won’t need to specify which format we would like to use.
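
As a sketch of how the complete header might look at this point, assuming you kept the title, author, date and numbered-sections settings from the earlier episode unchanged, the bibliography field sits at the top level, next to output rather than inside it:

```yaml
---
title: "An Adapted Survey on Scientific Shared Resource Rigor and Reproducibility"
author: Add Your Name Here
date: "December 15, 2020"
output:
  html_document:
    number_sections: true
bibliography: "../docs/bibliography.bibtex"
---
```

Indentation matters in YAML: if bibliography were indented under html_document, it would be read as an option of the output format rather than as document metadata.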

By default, Pandoc will use a Chicago author-date format for citations and references and we will stick with that for now, but later we will learn how to change citation styles.

Adding Citations

Each item in the bibliography.bibtex file starts with a @ entry which specifies the type of document followed by a curly opening bracket which specifies the key that should be included to create in-text citations.

We will call a citation using the @ followed by the key. It is important to use this exact key in the .bibtex to render correct mentions in the text. Let’s see how that should be included in the R Markdown syntax for different types of citations.

Single citation

At the end of the first paragraph of the Introduction, you will find [citation example 1]. Remove this placeholder and let’s practice adding our first citation. Let’s use the item with the key freedman_2020_2017 from the bibliography.bibtex file in the second tab. To cite this work, add the key, prefixed with @, between square brackets, as follows:

[@freedman_2020_2017]

The output you will get in Chicago style will be:

(Freedman, Venugopalan and Wisman 2017)

Challenge 7.1: Adding single citation

Locate [citation exercise 1] in the document, and replace it with a citation to Munafo’s (2017) study.

Solution

[@munafo_manifesto_2017] The output you will get in Chicago style will be: (Munafo, 2017)

Multiple Citations

If you want to add multiple citations in a row (same parentheses) you will have to separate keys by semicolon. So let’s add Bustin (2014) and Freedman, Venugopalan and Wisman (2017) to [citation example 2]

[@bustin_reproducibility_2014; @freedman_2020_2017]

The output you will get in Chicago style will be:

(Bustin, 2014; Freedman, Venugopalan and Wisman, 2017)

Tip: You can simplify items key if you want. For instance, you can keep only the first author and year, but for the purpose of the exercises we will keep keys exactly how we got them from Google Scholar.

Challenge 7.2: Adding multiple citations

Now it is your turn! Locate in the document the note [citation exercise 2]. Remove it and include a citation to Baker (2016) and Freedman, Venugopalan and Wisman’s (2017) studies.

Solution

[@baker_1500_2016; @freedman_2020_2017] The output you will get in Chicago style will be: (Baker, 2016; Freedman, Venugopalan and Wisman, 2017)

Keeping Authors in the narrative

In some cases the authors are named in the text, and therefore their names shouldn’t go between parentheses. Let’s say you want to add a citation to support the statement about Springer’s survey. To keep the institutional author out of the parentheses, we add a hyphen - before the @ followed by the citation key. Let’s add that to the [citation example 3] remark in the paper.

In that case you’ll have to first type in the last name(s) of the author(s) as only the year will be rendered. For this example the author is an organization, so let’s type Springer, then enter the key for the item, as follows:

“A recent survey conducted by Springer [-@springer_reality_2016] reported that 90%…”

The output you will get will be:

“A recent survey conducted by Springer (2016) reported that 90%…”

Challenge 7.3: Keep author(s) in the narrative

Let’s now practice how to insert citations outside the parentheses! In the same paragraph, where you find [citation exercise 3], add a citation (year only) to your mention of Nature’s survey in order to indicate the dataset you are referring to.

Solution

[-@nature_nature_2018] Did you remember to type in the organization that authored the publication? If so, the output you will get should be: Nature (2018)

Time to Knit!

Check how the citations you have just created render in your paper.

Setting the Reference List

All cited items will be listed under the section References which you created before while practicing headings and subheadings. Items will be placed automatically in alphabetical order.

Adding an item to a bibliography without citing it

By default, the bibliography will only display items that are directly referenced in the document. If you want to include items in the bibliography without actually citing them in the body text, you can define a dummy nocite metadata field in the YAML and put the citations there.

nocite: |
  @item1, @item2
  

To demonstrate that, I will add a new BibTeX entry from my Google Scholar Library and specify the @key in the YAML. Note that this will force all items added in the YAML to be displayed in the bibliography.

Changing Citation Styles

There are a number of existing citation styles (CSL), but we won’t cover their differences and applications during this workshop. To use another style, we will need to specify a CSL (Citation Style Language) file in the metadata field in the YAML header.

Let’s assume that we want to use the APA 5th edition style (apa-5th-edition.csl) instead. In order to do so, you have to make sure the CSL you want to apply is correctly named in the YAML, matching the .csl file saved in the project folder and opened in RStudio. We have done that for you. The CSL file for APA is saved in the bin folder, so we will add the relative path to the file to the YAML:


csl: ../bin/apa-5th-edition.csl

Time to Knit!

Knit the document and note that citations and references now conform to the APA style.

Tip: Change the CSL default

By default, citations are rendered in the Chicago author-date style. You can override this default by copying a CSL style of your choice to default.csl in your user data directory. The CSL project provides further information on finding and editing styles. More information about CSL can be found here: https://docs.citationstyles.org/en/1.0.1/primer.html.

Key Points

R Markdown supports different citation styles

Finding & Applying Existing Journal Templates

Overview

Teaching: 10 min
Exercises: 0 min

Questions

What is the advantage of using the rticles package?

How to find existing journal templates?

Objectives

Learn about the rticles package and its functions

Locate existing templates for creating R Markdown papers and reports

Installing the “rticles” Package

We have learned how to start a new document on RStudio and apply some important R Markdown syntax to format your reports. But, let’s say you are writing a paper and you already know which journal you are submitting it to. Writing it in your own style and then formatting prior to submission would be too time-consuming, right? The good news is that RStudio can make our lives easier! Through a package called “rticles” you can access a number of existing journals’ templates that will let you easily and quickly format and prepare your paper draft for peer review.

Let’s take a look at that! In RStudio, install the package using the command install.packages("rticles") or by clicking Install in the Packages tab of the right-hand side pane and typing rticles. Once the installation is completed, use the plus icon at the upper-left side of your screen to create a new document or proceed with File > New File > R Markdown. This will prompt the window below:

Fig. 8.1 - rticles Templates (Step 1)

Clicking on “from template” will prompt a couple of dozen templates listed as {rticles}. Let’s choose the Biometrics Journal template and then, OK.

Fig. 8.2 - rticles Templates (Step 2)

Note that along with the skeleton of the paper you will see a message on top indicating additional packages you may need to install for that particular template. Creating templates and adding other templates is beyond the scope of this workshop, but that is also possible. To learn more you can check the link Using R Markdown Templates on the right-hand side or check the rticles package documentation.
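The same steps can also be run from the R console instead of the RStudio menus. The following is a minimal sketch, not the workshop's own procedure: the file name my-paper.Rmd is a placeholder, and the template name "biometrics" is an assumption that should be checked against the list printed by rticles::journals() for your installed version.

```r
# Install once if needed (commented out so the example has no side effects):
# install.packages("rticles")

template_name <- "biometrics"  # assumed template name; check rticles::journals()
draft_file <- file.path(tempdir(), "my-paper.Rmd")  # hypothetical draft location

if (requireNamespace("rticles", quietly = TRUE)) {
  # List the journal templates shipped with the installed version of rticles
  available <- rticles::journals()
  print(available)

  # Create a new draft from the chosen template without opening an editor
  if (template_name %in% available) {
    rmarkdown::draft(draft_file, template = template_name,
                     package = "rticles", create_dir = TRUE, edit = FALSE)
  }
} else {
  message("rticles is not installed; run install.packages(\"rticles\") first.")
}
```

The draft is written to a temporary folder here only to keep the example tidy; in a real project you would point draft_file at your project folder.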

Key Points

The rticles package provides some journal templates

Whenever available, if you already know which journal you are submitting to, start your paper using the template

Knitr Syntax: Inline Code & Code Chunks

Overview

Teaching: 40 min
Exercises: 15 min

Questions

What is “Knitr”?

When would I want to add inline code?

How to add inline code?

When would I want to use code chunks?

How do I add code chunks?

Objectives

Understand the basic functions of Knitr

Learn how to add inline code to your document

Learn how to add code chunks to your document

Distinguish when inline code vs. code chunks would be appropriate

Understand how to change the output characteristics of code chunks

What is Knitr?

Knitr is the engine in RStudio which creates the “dynamic” part of R Markdown reports. It’s specifically a package that allows the integration of R code into the HTML, Word, PDF, or LaTeX document you have specified as your output for R Markdown. It utilizes Literate Programming to make research more reproducible. There are two main ways to process code with Knitr in R Markdown documents:

Inline code
Code Chunks

Adding Inline code

Inline code is best for calculating simple expressions integrated into your narrative. For example, use inline code to calculate an error margin or a summary statistic of your dataframe, such as the number of observations, in your results section. One of the benefits of using this method is that if something about your data set changes (like leaving out NAs or null values), the code will automatically update the calculation specified.

We’re going to go ahead and replace the LaTeX code we used to input the error margin and calculate it dynamically using R code. So, instead of $ \pm 3 \% $ to display our error margin as +/-3%, let’s add this:

`r round(1.96*0.5*(1-0.5)/sqrt(243)*100)`%

Notice how we put the % sign after the closing backtick. In this case the percentage sign should be plain text. If we had put it inside the backticks, R would have tried to parse it as part of the expression (in R, % is used for operators such as the modulo operator %%) and knitting would have failed.
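To see what the inline expression produces before knitting, it can be evaluated in the console. This sketch simply repeats the expression from the inline code above, using the value 243 quoted in the paper text; the variable names are only for illustration.

```r
n <- 243  # number of respondents quoted in the paper text

# Same expression as the inline code, evaluated step by step
margin <- 1.96 * 0.5 * (1 - 0.5) / sqrt(n) * 100
margin_rounded <- round(margin)

# The percent sign is added as plain text, just as it sits outside the backticks
paste0(margin_rounded, "%")
```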

Time to Knit!

See that inline code evaluates to calculate the error margin of +/- 3%.

Where else can we add inline code? We can replace observation counts!

e.g. “There are `r nrow(my_data)` individuals who completed the survey”

Now, we’re going to find one such example in our paper and convert a static number to inline code. In our paper text we read “a total of 243 individuals from 21 countries completed this section.” Here we can use inline R code to calculate the total responses instead of typing it in.

However, because we don’t have access to the original dataset (only aggregate counts), we can’t use nrow() to count our number of observations. Instead, we will sum the count column in our data1 dataframe, which holds the number of responses for each level of familiarity with current NIH guidelines on reproducibility and is used to create Fig 1. We will use r sum(data1$count) in between the tick marks to total the count across the levels of familiarity (“Very Aware”, “Somewhat Aware”, “Completely Unaware”).
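The difference between nrow() and sum() on aggregated data can be sketched with a small stand-in data frame. The column name awareness and the counts below are made up for illustration; the real values come from figure-1-data.csv.

```r
# Hypothetical stand-in for data1 (made-up counts, illustration only)
data1 <- data.frame(
  awareness = c("Very Aware", "Somewhat Aware", "Completely Unaware"),
  count = c(10L, 20L, 30L)
)

# nrow() counts rows, i.e. the number of familiarity levels, not respondents
n_levels <- nrow(data1)

# sum() over the count column gives the total number of respondents
n_respondents <- sum(data1$count)

n_levels
n_respondents
```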

We will add the inline code to the sentence in question:

a total of `r sum(data1$count)` individuals from 21 countries completed this section.

Output:

"a total of 242 individuals from 21 countries completed this section."

Time to Knit!

See that the r inline code evaluates in the sentence.

Oh wow! We were off on our total count by one anyway. Good thing we added this inline code!

Tip: Inline code cannot span lines

You need to be sure that these in-line bits of code aren’t split across lines in your document. Otherwise you’ll just see the raw code and not the result that you want.

CHALLENGE 9.1 - Converting a static number to inline code

There are two more spots in the paper where the count 243 was stated (search for ‘243’ or look around the paragraph we just edited). Find both and replace them with code. What part of the paper is that?
SOLUTION
1. the margin of error is ±`r round(1.96*0.5*(1-0.5)/sqrt(sum(data1$count))*100)`%
Note: since we just added inline code to calculate the error margin, we only need to substitute sum(data1$count) for 243 inside it (and don’t need to add the backticks and r a second time).
2. sample size $n=`r sum(data1$count)`$
Note: Look at that! You can add inline R code inside LaTeX formatting: the R code is evaluated first and the result is then displayed in LaTeX format!

Time to Knit!

Let’s make sure everything looks right for our inline code

Inserting Code Chunks

Code chunks are better when you need to do something more sophisticated with your code, such as building plots or tables. There is also syntax which allows you to change how that code gets rendered. We’ll learn more about that as we walk through the “anatomy” of a code chunk.

Basic Anatomy of the Code Chunk

You can quickly insert chunks like these into your file with:

the keyboard shortcut Ctrl + Alt + I (OS X: Cmd + Option + I)
the Add Chunk command in the editor toolbar
or by typing the chunk delimiters ```{r} and ```.

The most basic code chunk looks like so:

Other than our backticks ``` for code chunks that surround the code top and bottom, the only necessary piece is the specified language (r) placed between the curly brackets. This indicates that the language to read the code is R.

Fun fact: Other Programming Languages

Although we will (mostly) be using R in this workshop, it’s possible to use other programming or markup languages. For example, we have seen that we can use LaTeX code for equations. You can also use Python, and we (may) show an example with CSS. Other languages include SQL, Julia, Bash, and C. It should be noted, however, that some languages (like Python) will require installing and loading additional packages.

Add a code chunk

Ok, let’s add some code! Earlier, we added three images to our document. Now, images of our plots are great and all, but since R Markdown allows us to evaluate live code it would be more reproducible to use code chunks to display those plots. Like with our inline code, this assures that if there are any changes to the data, the plots update automatically. This also makes our life easier because when there’s a change we don’t have to re-generate plots, save them as images and then add them back in to our paper. This will potentially help prevent version errors as well! So we’re actually going to go ahead and convert a few of our plots to code chunks.

We’ll start by typing out our starting backticks & r between curly brackets (in your own workflow you may want to add the ending three backticks as well so you don’t forget after adding your code):

Now, let’s open our plot-figure-1.r file in our code folder. Copy the code and paste it in between the two lines with backticks.

code chunk with plot1 code

Tip:

There’s actually a button you can use in the RStudio menu to generate the code chunks automatically. Automatic code chunk generation is available for several other languages as well. Also, you can use the keyboard shortcut ctrl+alt+I for Windows and command+option+I for Mac.

Run your code

Now, to check to make sure our code renders, we could click the “knit” button as we have been doing. However, with the code chunks we have other opportunities for rendering.

1) Knit button - knitting will automatically run the code in all code chunks

2) Run from Rmd file (green play button on the right top corner)

code chunk with plot1 code

3) Run menu

run code menu

4) Keyboard shortcuts:

Task	Windows & Linux	macOS
Run all chunks above	Ctrl+Alt+P	Command+Option+P
Run current chunk	Ctrl+Alt+C	Command+Option+C
Run current chunk	Ctrl+Shift+Enter	Command+Shift+Enter
Run next chunk	Ctrl+Alt+N	Command+Option+N
Run all chunks	Ctrl+Alt+R	Command+Option+R
Go to next chunk/title	Ctrl+PgDown	Command+PgDown
Go to previous chunk/title	Ctrl+PgUp	Command+PgUp

Time to Knit!

Use one of the above options to run your code.

Name your code chunks

While not necessary for running your code, better practice is to give a name to each code chunk:

{r chunk-name}

Some things to keep in mind

The chunk name is the only value other than r in the code chunk options that doesn’t require a tag (unlike, e.g., echo=FALSE)
The chunk label has to be unique (i.e. you can’t use the same name for multiple chunks)

We’ll see in a bit where this code chunk label comes in handy. But, for now let’s go back and give our first code chunk a name:

{r fig-1}

Tip: Don’t use spaces, periods or underscores in code chunk labels

Try to avoid spaces, periods (.), and underscores (_) in chunk labels and paths. If you need separators, you are recommended to use hyphens (-) instead. For example, setup-options is a good label, whereas setup.options and chunk 1 are bad; fig.path = 'figures/mcmc-' is a good path for figure output, and fig.path = 'markov chain/monte carlo' is bad. See more at: https://yihui.org/knitr/options/

Code Chunk Options

There are over 50 different code chunk options!!! Obviously we will not go over all of them, but they fall into several larger categories including: code evaluation, text output, code style, cache options, plot output and animation. We’ll talk about a few options for code evaluation, text output and plot output specifically.

Again, the chunk name is the only value other than r in the code chunk options that doesn’t require a tag (i.e. the “= VALUE” part of option = VALUE). So these chunk options will always require a tag, with syntax that looks like:

{r chunk-label, option = VALUE}

The options always follow the code chunk label (don’t forget to add a , after the label).

Some common options:

eval = (logical or numeric) TRUE/FALSE to evaluate (or not) or a numeric value like c(1,3) (only evaluate expressions 1 and 3).
echo = (logical or numeric - following the same rules as above) whether to display source code or not.
warning = (logical) whether to display the warnings in the output (default:TRUE). FALSE will output warnings to the console only
include = (logical) whether to include the chunk output in the output document (default TRUE)
message = (logical) whether or not to display messages that appear when running the code (default TRUE)
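The current values of these options can be inspected from the console with knitr::opts_chunk$get(). This is a small sketch, assuming the knitr package is installed; in a fresh session the values shown are knitr's built-in defaults.

```r
library(knitr)

# Look up the current values of the five options discussed above
opts <- opts_chunk$get(c("eval", "echo", "warning", "include", "message"))

# Each entry is a single logical value unless it has been changed
str(opts)
```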

CHALLENGE 9.2 - Rendering Codes

How will some hypothetical code render given the following options? {r global-chunk-challenge, eval = TRUE, include = FALSE}

SOLUTION

The expressions in the code chunk will be evaluated, but neither the code nor its output (including figures/plots) will be included in the knit document.
When might you want to use this?
If you need to calculate some value or do something on your dataset for a further calculation or plot, but the output is not important to be included in your paper narrative.

CHALLENGE 9.3 - add options to your code

Add the following options to your code:
echo = FALSE, message = FALSE, warning = FALSE

What will this do?

SOLUTION

These options mean the source code will not be printed in the knit html document, messages from the code will not be printed in the knit html document, and warnings will not be printed in the knit html document (but will still output to the console). Plots, figures or whatever is printed by the code WILL show up in the final html document.

Time to Knit!

Make sure the options you added to your code chunk seem right.

Global Code Chunk Options:

With our first plot we set the options separately. However, we may end up with quite a few code chunks in our paper and it might be a lot of work to keep track of what options we’re using throughout the paper. We can automate setting options by adding a special code chunk at the beginning of the document. Then, each code chunk we add will refer to the options in this special chunk when it runs.

To set global options that apply to every chunk in your file, we will call knitr::opts_chunk$set() in a new code chunk right after our YAML header (name the new code chunk setup). Knitr will treat each option that you add to this call as a global default. However, we will need to set the options for this code chunk in the first place, so we’ll use echo = FALSE. Then in the () after knitr::opts_chunk$set add the three options we used for our first code chunk.

Add to your file (with backticks):

{r setup, echo = FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

Alright! Now let’s go back and remove the options we set in the individual code chunks, since we’ve set the global options for the document instead.
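To check that global defaults have taken effect, the same call can be tried in the console and read back with opts_chunk$get(). This is only a sketch of that check, assuming knitr is installed; the restore() call at the end puts knitr's built-in defaults back so the rest of the session is unaffected.

```r
library(knitr)

# Set the same three global defaults used in the setup chunk
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Read back what later chunks would inherit
current <- opts_chunk$get(c("echo", "message", "warning"))
print(unlist(current))

# Put knitr's built-in defaults back
opts_chunk$restore()
```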

Time to Knit!

Again, let’s make sure our global options look right after knitting.

Tip: Yaml chunk options

We can also tweak some settings in our yaml which changes how code chunks are displayed. We’re not going to get into this in the workshop, but many of the same options you set in your global code chunk settings are also configurable in the yaml.

load our libraries and data “globally”

We can actually make our lives easier in one other way too. So far we’ve loaded the library tidyverse and the dataframe data1 we need in the first code chunk. Now if we want to replace, say, Figure 3 (which we will do next), we would load tidyverse and the data for Figure 3, meaning we would be loading tidyverse for a second time unnecessarily. This is because once libraries and data are loaded they are available for the rest of the Rmd document.

Instead, we can load libraries and data at the beginning of our document, which makes them available for all other figures or calculations and lets us avoid the repetition. This also makes it easier for us to keep track of all the libraries and data we need to use in any given document. If anything needs to be tweaked, we don’t need to search through every code chunk in our Rmd document to make a change.

Let’s add our libraries and data to a code chunk at the top of the document (and we can take this code out of Fig-1):

#load libraries
#tidyverse for plots
library(tidyverse)

#load data
#data for figure 1
data1 <- read_csv("../data/figure-1-data.csv", col_types="fi")
#data for table 1
table1 <- read_csv("../data/table.csv")

It’ll look like the following: load libraries & data
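The col_types = "fi" argument in the read_csv() call is a compact column specification: one letter per column, where f means factor and i means integer. A minimal sketch of its effect follows, using two made-up rows and hypothetical column names in place of figure-1-data.csv, and assuming readr 2.0 or later so that literal text can be passed with I().

```r
library(readr)

# Two made-up rows standing in for figure-1-data.csv (illustration only)
csv_text <- "level,count\nVery Aware,10\nSomewhat Aware,20\n"

# "fi": first column parsed as a factor, second column as an integer
example <- read_csv(I(csv_text), col_types = "fi")

str(example)
```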

Time to Knit!

Make sure our code runs for Figure 1 now that we moved it around.

CHALLENGE 9.4 - Change the Fig 3 image to code

Now, let’s add the code to regenerate Figure 3 from the r script plot-figure-3.r in the code folder:

1) load the data for the figure in the designated code chunk at the top of our file.
2) Make a new code chunk where you want to replace the image for Figure 3.
3) Give it the name: fig-3
4) Don’t worry about options! we already set those in the global options.

SOLUTION

From the plot-figure-3.r script, copy the code that reads figure-3-data.csv and paste it into the load-data-libraries chunk:

then, create a new code chunk at the spot in the paper we want figure 3 and paste in the rest of the code from plot-figure-3.r:

Make sure the code chunk is named fig-3

Time to Knit!

Make sure the code runs for Figure 3

Tip: Overriding global options

What if you want most of your code chunks to render with the same options (e.g. echo = FALSE), but you have just one or two chunks where you want to tweak the options (e.g. display code with echo = TRUE)? Good news! The global options can be overridden on a case by case basis in each individual code chunk.
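One way to watch an override happen without touching the paper is to knit a tiny throwaway document from the console with knitr::knit(text = ...). The sketch below is an illustration under assumptions: the chunk labels and the toy expressions are made up, and the three backticks are built with strrep() only so they do not clash with the formatting of this page.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks

doc <- c(
  paste0(fence, "{r setup, include = FALSE}"),
  "knitr::opts_chunk$set(echo = FALSE)",   # global default: hide code
  fence,
  "",
  paste0(fence, "{r hidden-code}"),        # inherits echo = FALSE
  "1 + 1",
  fence,
  "",
  paste0(fence, "{r shown-code, echo = TRUE}"),  # overrides the global default
  "2 + 2",
  fence
)

out <- knit(text = doc, quiet = TRUE)
cat(out)

# Reset chunk options so the global default set above does not linger
opts_chunk$restore()
```

In the printed result, the source line of the second chunk should be absent while its result is present, and the third chunk should show both its source and its result.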

CHALLENGE 9.5 (optional) global & individual code chunk options

What would appear in our html document if we knit a code chunk with the following options?
{r challenge-5, warning = TRUE, echo = TRUE}

…considering the global chunk settings were as listed: knitr::opts_chunk$set(echo = FALSE, include = FALSE)

SOLUTION

In this case, the global settings are set so neither the code nor the output will display. The individual chunk sets echo = TRUE and warning = TRUE, but because include = FALSE still applies from the global settings, nothing from the chunk (code, warnings, or output) is written to the knit document; the code is still evaluated. To actually see the code and warnings, for example for debugging while writing the Rmd document, the chunk would also need include = TRUE.

Key Points

What is Knitr?

Inline code

Code chunks

Code chunk options

Global code chunk options

Knitr Syntax: Styling Code Outputs

Overview

Teaching: 35 min
Exercises: 5 min

Questions

How do I make better styled tables in R Markdown?

How do I add captions to code generated figures?

How do I consistently size and align images and figures?

How do I cross reference figures & tables?

Objectives

Create better styled tables

Add captions to code-generated figures, tables, images

Globally size and align images and figures

Use cross referencing

Create better tables in R Markdown

Our markdown table is formatted weirdly after knitting. Unfortunately, it isn’t possible to style tables created with basic markdown syntax. In order to create a table that is stylable, we need to save it as a dataframe and use a package function to style it. There are MANY packages that allow you to create tables in Rmd. See list (https://bookdown.org/yihui/rmarkdown-cookbook/table-other.html)

Restyle the table with kable

We will use kable to recreate our table. kable() is a function in the knitr package. kable() creates a basic stylable table.

1) load data & package in load-data-libraries chunk:

```{r load-data-libraries}
## ADD to existing libraries & data

#load knitr for kable table
library(knitr)

###

#load table 1 data
table1 <- read_csv("../data/table.csv")
```

2) Delete the existing table (leave the caption for now)

3) create a new code chunk in place of the table, name it table-1

4) call kable() function on table1 dataframe

```{r table-1}
kable(table1)
```

Time to Knit!

Check to see if our table generated properly

Your code chunk should output the following:

Category	N
Poor sample quality from users/sample variability/limited biological material	51
Lack of well-trained principle investigators and lab members/Poor oversight	45
Poor experimental design: Lack of sufficient replicates/inadequate sample size/lack of adequate controls	43
Inadequate standardization of protocols or guidelines, and data analysis	43
Cost and time	39
Failure to leverage the core’s expertise/following the core’s advice/no consulting beforehand	23
Inadequate documentation of experiments/data management	19
Instruments: maintenance, upgrades, changes	15
Responses that could not be assigned to a category	11

Note: Styling kable tables

You can use the format argument to choose between html, latex, pipe, or rst styles. This can also be set in the global options with knitr.table.format. For additional styling, the kableExtra package provides a function called kable_styling() to change the look of your kable tables. https://www.rdocumentation.org/packages/kableExtra/versions/1.3.1/topics/kable_styling
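As a small sketch of the two ways to choose a table format mentioned in this note, the example below uses a made-up two-row data frame in place of table1 and assumes knitr is installed.

```r
library(knitr)

# Small made-up data frame standing in for table1 (illustration only)
example_table <- data.frame(Category = c("A", "B"), N = c(3L, 5L))

# Ask for a specific format for one table
pipe_table <- kable(example_table, format = "pipe")
print(pipe_table)

# Or set a session-wide default that kable() uses when format is not given
old <- options(knitr.table.format = "html")
html_table <- kable(example_table)
options(old)  # restore the previous setting
```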

Note: Other table packages

Even though we can do a bit more styling with kable(), it’s still pretty basic. To create more complex tables, the kableExtra package was developed; it allows more control over formatting. There are multiple other packages that help with table formatting as well; officer (for Word) and xtable are two examples. officer: https://www.rdocumentation.org/packages/officer/versions/0.3.15 See also: https://bookdown.org/yihui/rmarkdown-cookbook/table-other.html

Add captions to code-generated tables & plots

Also, we learned how to add captions to images earlier on, but now, how do we add captions to the code-generated plots and the new table we just added? It is a little different between tables and code-generated plots/images.

Add captions to code-generated tables

Ok, let’s start with the table. The kable() function has an argument called caption.

1) cut the original table caption from its spot in the paper

2) add caption = to the kable function and paste the caption in between quotation marks

caption="Table 1. Major challenges to rigor observed in shared resources"

code will look like so:

```{r table-1}
kable(table1, caption="Table 1. Major challenges to rigor observed in shared resources")
```

Time to Knit!

Did the caption output for Table 1?
Hint: it will appear above the table.
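The effect of the caption argument can also be previewed in the console. This is a sketch with a made-up data frame and a made-up caption, printed in pipe format so the caption line is easy to spot in plain text.

```r
library(knitr)

# Made-up table and caption (illustration only)
example_table <- data.frame(Category = c("A", "B"), N = c(3L, 5L))
example_caption <- "Table 1. A made-up example caption"

captioned <- kable(example_table, format = "pipe", caption = example_caption)
print(captioned)
```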

Add captions to code-generated plots, figures & images

To add captions for Figure 1 and 3 we’re going to go back to our favorite: code chunk options. Anytime after defining the code chunk name we can add the chunk option: fig.cap = (so either before or after the other options, order doesn’t matter). Notice that the syntax is slightly different than our other options but it has the same = . Instead of indicating TRUE/FALSE however, we can type our caption right after the = in quotes fig.cap = "". NOTE: we are adding this to the code chunk options, NOT as an argument to the code like for the table.

1) type in fig.cap = "" after the chunk label (and a comma)

2) paste or type in the caption:
FIGURE 1 - Knowledge and awareness of the current NIH guidelines on rigor and reproducibility.

Your code chunk options should now look like this:

{r fig-1, fig.cap="FIGURE 1 - Knowledge and awareness of the current NIH guidelines on rigor and reproducibility."}
#code for figure 1 here

Time to Knit!

Check to see if the caption for Figure 1 appeared

CHALLENGE 9.1 - add a caption for Figure 3

Add the caption to the code for Figure 3:
FIGURE 3 - Types of tools that cores would like to implement in their operations.

SOLUTION

{r fig-3, fig.cap="FIGURE 3 - Types of tools that cores would like to implement in their operations." }

Time to Knit!

Check to see if caption for Figure 3 appeared

Globally & Automatically align & size plots/figures

We changed the sizing of our images in a previous episode. However, instead of changing the sizes one by one, we can do it within our knitr::opts_chunk$set settings in our setup code chunk.

Automatically size figures & images

We can size with a number of inches, a percentage, or pixels (in some cases). The default for inches is 7in height and width for code-generated figures, plots & images (knitr), whereas it is 5in height by 7in width for markdown generated images (rmarkdown).

There is no default for percentages because the default size is in inches. When using percentages it’s important to note that % refers to the percent of the HTML container. So images are not sized in relation to their original size, but in relation to the HTML document size. For example, if the block of text on a webpage is 1000px wide then resizing at 20% width means the image will be 200px in width.
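The percentage arithmetic from the example above can be written out in a few lines of R. The variable names are only for illustration; the numbers are the ones used in the text.

```r
container_px <- 1000  # width of the text block from the example above
out_width <- "20%"    # value as it would be written in out.width

# Convert the percentage string to a fraction and apply it to the container
fraction <- as.numeric(sub("%", "", out_width, fixed = TRUE)) / 100
image_px <- container_px * fraction

image_px
```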

Chunk options for resizing

There are two families of options: fig.* and out.*. The out.* options are more flexible than fig.*, both in how size can be defined and in what kind of figures/plots/images they can be applied to.

fig.width, along with fig.height, ONLY applies to R-generated figures defined within a code chunk. It sets the width of the plot, given as a number of inches.
fig.height, along with fig.width, ONLY applies to R-generated figures defined within a code chunk. It sets the height of the plot, given as a number of inches.

out.width, along with out.height, can be applied to any type of graphic defined within a code chunk. It changes the width of the images based on the specifications which can be defined in inches, percentage & number of pixels.
out.height, along with out.width, can be applied to any type of graphic defined within a code chunk. It changes the height of the images based on the specifications which can be defined in inches, percentage & number of pixels.

We want to be inclusive of the images in our Rmd file so we will use out.width & out.height which offers more flexibility. Also, we previously sized the images to 50%, but that was a bit small, so let’s try 60% instead.

Add to (before or after) the other options, in the global code chunk:

knitr::opts_chunk$set(out.width="60%", out.height="60%", echo = FALSE, message = FALSE, warning = FALSE)

Time to Knit!

See how our images re-sized.

Oh no, Fig 2 isn’t following the new settings! Unfortunately, the settings ONLY apply to knitr-rendered code chunks, but our Figure 2 is still just a Markdown-styled image.

We can add the Fig 2 .jpg to a code chunk instead:

1) create a new code chunk and name it fig-2
2) copy and paste the caption between quotes with the fig.cap="" option.
3) use the include_graphics() function from the knitr package (alternatively, use the syntax knitr::include_graphics() if you haven’t loaded the package).
4) Add the file path in quotes between the () of include_graphics().

{r fig-2, fig.cap="FIGURE 2 - Lack of requests for rigor and reproducibility documentation by users of shared resources."}
include_graphics("../figs/fig2_paper.jpg")
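If the image does not show up after knitting, a wrong relative path is a common cause. The following is a cautious sketch of the same call with a file check added; the check and the message are additions for illustration and are not part of the workshop's chunk.

```r
fig_path <- "../figs/fig2_paper.jpg"  # path used in this lesson; adjust to your project

if (file.exists(fig_path)) {
  # Inside a code chunk this inserts the image, so chunk options such as
  # out.width and fig.cap apply to it
  knitr::include_graphics(fig_path)
} else {
  message("Image not found at ", fig_path,
          "; check the path relative to the Rmd file.")
}
```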

Time to Knit!

Check to see if all images re-sized

Automatically align code chunk outputs

To automatically & globally align code chunk outputs like we did with sizing, add to the setup code chunk:

fig.align = "center"

so the whole chunk will look like so:

knitr::opts_chunk$set(fig.align = "center", out.width="60%", out.height="60%", echo = FALSE, message = FALSE, warning = FALSE)

Time to Knit!

Check to see if all images and figures were aligned to the center

Cross-referencing (optional)

from bookdown.org:

Cross-referencing is a useful way of directing your readers through your document, and can be automatically done within R Markdown. While this has been explained in Chapter 2 from the bookdown book, we want to present a brief summary below.

To use cross-references, you will need:

A bookdown output format: Cross-referencing is not provided directly within the base rmarkdown package, but is provided as an extension in bookdown (Xie 2020c). We must therefore use an output format from bookdown (e.g., html_document2, pdf_document2, and word_document2, etc.) in the YAML output field.
A caption to your figure (or table): Figures without a caption will be included directly as images and will therefore not be a numbered figure.
A labeled code chunk: This provides the identifier for referencing the figure generated by the chunk.

After these conditions are met, we can make cross-references within the text using the syntax \@ref(type:label), where label is the chunk label and type is the environment being referenced (e.g. tab, fig, or eqn). An example is provided below:

We’ve already installed bookdown in our setup steps at the beginning of the workshop, but we need to make sure to load the bookdown library.

Add into your load-data-libraries code chunk:

#load bookdown for cross-referencing
library(bookdown)

We also need to change the output in the YAML to a bookdown format, so change the output field in the YAML to the following (the rest stays the same):

...
output:
  bookdown::html_document2:
...

Now to add cross referencing:

This is the syntax for a cross-reference link:

\@ref(type:label)

the \@ref will always remain the same. We will edit type and label

type options:

tab : for tables
fig : for figures
eqn : for equations

label is replaced with the chunk label of the code chunk figure/table/equation you want to link to.

Let’s try this out on our paper:

1) Find or search for the note: (See Table 1) in the Core Implementation section.

2) To add a cross reference link we will replace the “Table 1” text with:

\@ref(tab:table-1)

Here we replace type with tab and label with table-1.
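For a table reference to resolve, the chunk labeled table-1 needs to produce a table that has a caption. A minimal sketch of what such a chunk could contain is shown below; the data frame and the caption text are invented for illustration and are not the table from the paper.

```r
library(knitr)

# Illustrative data only (not the paper's Table 1)
demo_table <- data.frame(
  practice = c("Version control", "Literate programming"),
  tool     = c("Git", "R Markdown")
)

# In the Rmd, this call would sit inside the chunk whose label is table-1.
# The caption argument is what lets bookdown number the table.
table_1 <- kable(demo_table, caption = "Example practices and tools (illustrative data).")
table_1
```

With a caption in place and a bookdown output format, \@ref(tab:table-1) in the text should turn into the table number.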

Time to Knit!

Check to see if the cross reference link for Table 1 appeared. Notice what happens when you click the link.

CHALLENGE 9.2: Add a Cross Reference to Figure 2

Search for or find in the text: “(See Figure 2)” in the “Strategies for Improving R&R” subsection. Make this into a cross-reference link.
SOLUTION
\@ref(fig:fig-2)

*Note: adding cross-referencing numbers the figures automatically, so you should take the manual figure numbering out of the captions.

Time to Knit!

Check to see if the cross reference links appeared

Styling conclusions

There is a plethora of options for styling, from additional bookdown options, to things you can edit in the YAML (including adding themes to style text), to adding your own custom CSS style sheets. Explore more on your own!

Key Points

Use kable() to create tables from dataframes

Add captions to figures and images generated by code chunks

Set global chunk settings to automatically size & align outputs

Use bookdown to enable cross referencing

Publishing a Report

Overview

Teaching: 20 min
Exercises: 10 min

Questions

How to change the output folder for your rendered document?

How to publish your report on RPubs?

How to use Github to publish your document as a webpage?

Objectives

Explore different methods for publishing your paper

Changing the behavior of the Knit Button

In episode 2 we said we would show you later how to change the behavior of the knit button so you could conform to best practices for organizing your R project. We would prefer to save the output to the results folder instead of the code folder, because it’s best practice to save your outputs (whether knit Rmd documents, the outputs of R scripts, or cleaned data) in a folder separate from your ‘raw’ code and data. That way, for instance, you can avoid deleting or overwriting your raw data, and you can save different iterations of your outputs without worry.

RStudio provides options to set the output type and a few other things, but does not let the user select a destination folder for knitr output. Users can, however, add code to the YAML header that allows finer control over the behavior of the knit button. Once you’ve set directives in the YAML header of your R Markdown file, every time you hit ‘knit’, RStudio follows your customized directives as it processes the code to render your output.

So, let’s copy and paste the following into our YAML header (placing the code after the bibliography and/or csl section should work):

...
...
knit: (function(rmdfile, ...) { rmarkdown::render(rmdfile, output_dir='../results') })
---

What’s going on in this YAML code we’re adding:

knit we are changing the behavior of the knit function by defining our own:

function(rmdfile, …) as stated above, we are defining our own function by calling function() so we can redesign what happens when we hit ‘knit’. rmdfile is the main parameter required for our render function. This just means that the input file is ‘this’ file: the .Rmd file we are knitting.

rmarkdown::render When you click the knit button, RStudio runs rmarkdown::render.
Typing ?rmarkdown::render in the console will bring up further information about the rmarkdown::render function. We will be using two arguments of rmarkdown::render.

input: The file we’re feeding the function. We already determined this is rmdfile or ‘this’ document.
output_dir: designates which folder we want to save the knit document to (relative path). In our case we know this is the results folder so we use .. to go back to our root directory (from the code folder). The “default” folder is the folder where your rmd file is saved, so any ‘directions’ you give by indicating a relative path will be from the folder where the rmd file is to the folder where you want to save the html rendered document.
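The same idea can be written out as a named R function, which may be easier to read than the YAML one-liner. The sketch below is an illustration only: the helper names results_dir_for and knit_to_results are hypothetical, and it builds the results path from the location of the Rmd file itself so that the destination does not depend on the current working directory (an assumption about what is convenient, not something the knit button requires).

```r
# Hypothetical helper: path of the results folder, one level above the folder
# that holds the Rmd file (e.g. code/paper.Rmd -> code/../results).
results_dir_for <- function(rmdfile) {
  file.path(dirname(rmdfile), "..", "results")
}

# Hypothetical wrapper with the same shape as the function in the YAML header:
# it takes the Rmd file being knit and passes it on to rmarkdown::render().
knit_to_results <- function(rmdfile, ...) {
  rmarkdown::render(rmdfile, output_dir = results_dir_for(rmdfile))
}

# The path helper can be tried without rendering anything:
results_dir_for("code/paper.Rmd")
# [1] "code/../results"
```

Calling knit_to_results("code/paper.Rmd") from the console would then render the document into the results folder, assuming rmarkdown and pandoc are available.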

Time to Knit!

Check if your html file was saved to the results folder.

Rendering R Markdown script

Once you have your .Rmd document refined and the Knit output looks good, it is ready to publish. When using RStudio, the simplest option is to publish to Rpubs.com. Rpubs is a bare-bones web hosting service created for the sole purpose of hosting RMarkdown documents. It provides no other services: no support for search, collaboration, persistent identifiers, or versioning. But it is the simplest and most direct way of getting your RMarkdown document published on the web directly from RStudio. For a more full-featured hosting service, consider GitHub and GitHub Pages, described later.

Notice the “Publish” button in the upper right corner of your Knit output. Click this to publish to Rpubs. This is where you’ll need an Rpubs account as mentioned in setup for this workshop.

Click the publish button

and you’ll be presented with the following panels:

(The other option in the dialog box, RStudio Connect, is a standalone publishing platform for teams to share content. It requires purchase to host and use.)

The first time you publish, RStudio will likely ask if you want to install some needed packages; say yes. RStudio will then open up a web browser to allow you to sign in to rpubs.com.

At the end of the publish process your paper will be live on the internet with a URL similar to: https://rpubs.com/yourname/678624

RStudio also saves an HTML version of your knit document to your local file system. Look for it in the results directory of your RStudio project (or, if you have not changed the knit output folder, in the same directory as the R Markdown file).
This html document is self-contained and highly portable. Images are encoded directly into the HTML so you can easily move it to any web hosting you have available.

Publishing as website on GitHub*

Another, better, but slightly more involved option for publishing an RMarkdown document on the web is to use GitHub and GitHub Pages. It is out of the scope of this lesson to use GitHub, but briefly, GitHub is a widely used version control and collaboration system. RStudio has built-in support for Git and GitHub: in the upper right panel of your RStudio window, look for the Git tab, which allows you to sync your RMarkdown project with a remote repository stored on github.com. To enable publishing to GitHub Pages, go to the Settings page of your repository on GitHub and select a branch (“branch” is a repository term) to publish. Name your main RMarkdown file index.Rmd, and render it to HTML as index.html. With GitHub Pages enabled on a project repository, the index.html file at the top level of the published branch of https://github.com/myusername/myrepo will appear on the web at https://myusername.github.io/myrepo/.
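A small sketch of the two steps described above, rendering index.Rmd to index.html and working out where the page will appear, might look like the following. The function names render_for_pages and pages_url are hypothetical, and the site_dir argument is an assumption added for flexibility (GitHub Pages can publish from the top level of a branch or from a docs folder).

```r
# Hypothetical helper: render the main Rmd file as index.html.
# site_dir = "." keeps the HTML at the top level of the repository;
# "docs" would be the alternative if Pages is set to publish from /docs.
render_for_pages <- function(rmdfile = "index.Rmd", site_dir = ".") {
  rmarkdown::render(rmdfile, output_file = "index.html", output_dir = site_dir)
}

# Hypothetical helper: expected GitHub Pages address for a repository.
# A repository named <username>.github.io is served at the site root;
# any other repository is served under its own name.
pages_url <- function(username, repo) {
  if (identical(repo, paste0(username, ".github.io"))) {
    sprintf("https://%s/", repo)
  } else {
    sprintf("https://%s.github.io/%s/", username, repo)
  }
}

pages_url("myusername", "myrepo")
# [1] "https://myusername.github.io/myrepo/"
```

After rendering, the index.html file still has to be committed and pushed to the published branch before the page updates.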
