Last updated: 2021-09-24



Dynamic Reports

Dynamic reports mix text and code. The key feature of a dynamic report is that when the document is “rendered”, all of the code is executed in order from a clean slate. So the rendering of the document is a testament to the reproducibility of its contents.

Dynamic reports can be used for entire analyses or just for summarizing a larger analysis that had to be run on a compute cluster. They are a convenient way to insert plots and summaries into your text, they save you the headache of accurately copying numbers over, and they mean that you always know which version of the analysis or data goes with your report.

I have used two methods to create dynamic reports, both using R. The first is R Markdown (see R Markdown Cookbook), which uses Markdown syntax that is simple and intuitive. The second is Sweave (intro here), which integrates R code with LaTeX syntax, giving you access to a wider array of formatting options. In the past several years I have preferred R Markdown because it is faster to write and easier to render into multiple different formats (it is easy to change from html to pdf or vice versa). Both Sweave and R Markdown are powered by knitr, so code display options are the same between the two.

For Python there is Jupyter Notebook/JupyterLab, which I believe has very similar functionality to R Markdown.

When to Use Dynamic Reports

You will probably use dynamic reports most often when you are summarizing your analysis for yourself or for someone you want to share it with. It is possible to make dynamic reports look more or less “professional” by controlling what output to show. For your adviser, you might let all the code show, while you might hide most of the code for someone who won’t find it as interesting.

I have used R Markdown and Sweave for “final” reports that I sent to collaborators (I think I also used one for my applied exam in graduate school), but I don’t use them for manuscripts that go to journals. This is because journals usually either want the source LaTeX code to compile all on its own without running anything extra or they want a Word document.

One tool that is useful, either within a dynamic report or for transferring a table from R to LaTeX, is the xtable R package, which lets you print R tables as LaTeX-formatted code.
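To give a flavor of xtable, here is a minimal sketch that assumes the xtable package is installed. The summary table (mean petal length by species in the built-in iris data) and the caption are made up for illustration.

```r
# Assumes the xtable package is installed: install.packages("xtable")
library(xtable)

# Illustrative summary table: mean petal length for each species
petal_means <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)

# Convert the data frame to an xtable object (caption is hypothetical)
latex_table <- xtable(petal_means,
                      caption = "Mean petal length by species",
                      digits = 2)

# Printing an xtable object writes LaTeX code for the table to the console
print(latex_table, include.rownames = FALSE)
```

The printed LaTeX can be pasted into a manuscript, or the chunk can be run inside a Sweave or R Markdown document that renders to pdf.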

R Markdown

This whole website is actually made out of dynamic reports that are written in R Markdown and then stitched into a website using workflowr (more below). So far we haven’t actually used any R code. Here is a bit of code to give a flavor of what that looks like.

Here is a plot of the famous iris data set, which is one of R’s built-in data sets.

library(ggplot2)

data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
ggplot(iris) + geom_point(aes(x=Sepal.Length, y = Petal.Length, color=Species))


By default, R Markdown will display the code followed by the output, but if we wanted a very clean look, we could suppress showing the code. There are tons of code chunk options that control how code and results are displayed. Check here for a full list. These will also work with Sweave.
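As a rough sketch of how chunk options can be set, the snippet below changes the defaults for every chunk in a document using knitr::opts_chunk$set(). The particular options chosen here (echo, message, warning) are just one plausible "clean look" configuration, not a recommendation. The same options can also be set on a single chunk in its header, for example {r, echo=FALSE}.

```r
# Typically placed in a setup chunk near the top of the R Markdown file.
# Assumes knitr is installed (it is a dependency of rmarkdown).
knitr::opts_chunk$set(
  echo = FALSE,     # do not show the source code
  message = FALSE,  # do not show messages, e.g. from loading packages
  warning = FALSE   # do not show warnings
)

# Check the current default for one option
knitr::opts_chunk$get("echo")
```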

It is also possible to use R code in line. For example the text

The Iris data have `r nrow(iris)` observations.

renders as

The Iris data have 150 observations.

To create an R Markdown document using RStudio, just choose the R Markdown option under File -> New File. In RStudio you can then view your document by clicking the “Knit” button. If you want to use a different development environment or for some reason can’t use RStudio, simply write your R Markdown document like any other text file and then render it using rmarkdown::render().
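Rendering outside RStudio might look something like the sketch below. It assumes the rmarkdown package and pandoc are installed (RStudio bundles pandoc; otherwise it has to be installed separately). The tiny report written to a temporary file is hypothetical and only there to make the example self-contained.

```r
# Write a tiny, hypothetical R Markdown file to a temporary location
rmd_file <- tempfile(fileext = ".Rmd")
fence <- strrep("`", 3)  # the three backticks that delimit a code chunk
writeLines(c(
  "---",
  "title: \"A tiny report\"",
  "output: html_document",
  "---",
  "",
  "The Iris data have `r nrow(iris)` observations.",
  "",
  paste0(fence, "{r}"),
  "summary(iris$Petal.Length)",
  fence
), rmd_file)

# render() runs all of the code in a fresh environment and returns the
# path of the output file; it needs pandoc, so check for it first
if (rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(html_file)
}
```

Passing output_format = "pdf_document" to rmarkdown::render() would produce a pdf instead, provided a LaTeX installation is available.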

A good command to know about is knitr::purl(), which extracts only the code bits from your R Markdown or Sweave file and writes them to their own R script.
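Here is a small sketch of knitr::purl() in action, assuming knitr is installed. The input file is a hypothetical two-line report created in a temporary location so that the example can be run as is.

```r
# Create a small, hypothetical R Markdown file with one code chunk
rmd_file <- tempfile(fileext = ".Rmd")
fence <- strrep("`", 3)  # the three backticks that delimit a code chunk
writeLines(c(
  "Some text describing the analysis.",
  "",
  paste0(fence, "{r}"),
  "n_obs <- nrow(iris)",
  fence
), rmd_file)

# Extract only the code into an R script;
# documentation = 0 drops the chunk headers and the text
script_file <- tempfile(fileext = ".R")
knitr::purl(rmd_file, output = script_file, documentation = 0, quiet = TRUE)

# Look at the extracted script
readLines(script_file)
```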

workflowr

workflowr, developed by John Blischak, is one of my favorite reproducible research tools! workflowr helps you put many R Markdown files describing analyses together into a research website.

This is a great way to aggregate all of the analyses you’ve done for a project in one place so it is easy to see and share your work. It is also useful for making a companion site for your paper which may ultimately just be a cleaned up version of the research site you’ve been using throughout the project.

Here is an example of a paper companion site built using workflowr.

Find more examples here.
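For a sense of what starting a workflowr site involves, here is a rough sketch based on the package's standard functions (wflow_start(), wflow_build(), wflow_publish()). It assumes workflowr, Git support via git2r, and pandoc are available. The wrapper function start_research_site(), the project directory, the name and email, and the commit message are all hypothetical.

```r
# Hypothetical helper wrapping the usual first steps of a workflowr project.
# Nothing is run until the function is called.
start_research_site <- function(dir, user_name, user_email) {
  # Create the project skeleton and a Git repository in `dir`
  workflowr::wflow_start(dir, change_wd = FALSE,
                         user.name = user_name, user.email = user_email)

  # Render the R Markdown files in analysis/ to HTML in docs/
  workflowr::wflow_build(project = dir, view = FALSE)

  # Commit the R Markdown files, rebuild them, and commit the HTML
  workflowr::wflow_publish(
    files = file.path(dir, "analysis", "index.Rmd"),
    message = "Publish the initial version of the site",
    view = FALSE,
    project = dir
  )
}

# Example call (not run here):
# start_research_site("myproject", "Your Name", "you@example.com")
```

After this, each new analysis is just another R Markdown file in the analysis/ directory, published with wflow_publish().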

Environment

One piece of computing research that can be very challenging to track is your coding environment. You might know exactly what version of your own code you used to run something, but how do you know what versions of other software you used? There are a couple of options that start to address this issue:

  1. sessionInfo(). The easiest thing you can do is just make a note of what R packages you used at the time of your analysis. If you record your analysis in R Markdown, add a sessionInfo() command to the end to make a note of all the versions of loaded packages. This doesn’t help you with software outside of R but it can go a long way for chasing down discrepancies. If you use workflowr, this is added automatically.

  2. Conda is a package management system (I like Miniconda) that lets you easily switch between or distribute software environments. For example, suppose I want to make sure that users of my code have specific versions of R and Python. I can build a conda environment to distribute with my code. The user can then switch to that environment by activating it, preventing issues like version conflicts.

  3. Docker lets you package your entire software environment into a container. I have never used Docker so I will leave it at that!
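For the first option, if you are not using workflowr or R Markdown, one possibility is to save the session information to a text file next to your results. This is only a sketch using base R. The file name session_info.txt and the use of a temporary directory are assumptions for illustration.

```r
# Save the session information to a text file
# (hypothetical file name; a temporary directory is used for illustration)
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# Versions of R and of individual packages can also be queried directly
r_version <- R.version.string
stats_version <- as.character(packageVersion("stats"))
cat(r_version, "\nstats package version:", stats_version, "\n")
```

The output of sessionInfo() for this page is shown below.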


sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.5   workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       highr_0.9        pillar_1.6.2     compiler_4.1.1  
 [5] later_1.3.0      jquerylib_0.1.4  git2r_0.28.0     tools_4.1.1     
 [9] digest_0.6.27    evaluate_0.14    lifecycle_1.0.0  tibble_3.1.4    
[13] gtable_0.3.0     pkgconfig_2.0.3  rlang_0.4.11     DBI_1.1.1       
[17] yaml_2.2.1       xfun_0.26        fastmap_1.1.0    withr_2.4.2     
[21] stringr_1.4.0    dplyr_1.0.7      knitr_1.34       generics_0.1.0  
[25] fs_1.5.0         vctrs_0.3.8      tidyselect_1.1.1 rprojroot_2.0.2 
[29] grid_4.1.1       glue_1.4.2       R6_2.5.1         fansi_0.5.0     
[33] rmarkdown_2.11   whisker_0.4      farver_2.1.0     purrr_0.3.4     
[37] magrittr_2.0.1   scales_1.1.1     promises_1.2.0.1 ellipsis_0.3.2  
[41] htmltools_0.5.2  assertthat_0.2.1 colorspace_2.0-2 httpuv_1.6.3    
[45] labeling_0.4.2   utf8_1.2.2       stringi_1.7.4    munsell_0.5.0   
[49] crayon_1.4.1