class: center, middle, inverse, title-slide .title[ # Stat 585 - R packages: Adding Data ] .author[ ### Heike Hofmann ] --- ## Outline - data in packages - code examples in the help file - checking a package <br> <br> <br> Resource: - [usethis](https://usethis.r-lib.org/) by Hadley Wickham and Jenny Bryan - [R packages (2ed)](https://r-pkgs.org/) by Hadley Wickham --- ## Package `usethis` and `R packages` book R package development is not generally linear, we generally need a lot of details on specific things: `usethis` is a package to help with organizing the workflow in packages as well as other non-package projects [README](https://usethis.r-lib.org/) gives an overview of the functionality of `usethis`. [R packages (2ed)](https://r-pkgs.org/) by Hadley Wickham is giving a general overview of the workflow with details --- ## The your turns for today Are going to be commits to a joint github repo ... Follow the github classroom invite - there should be one team: 'all-for-one-and-one-for-all' ![](images/musketeeRs.png) Fighting the good fight one merge conflict at a time --- class: inverse ## Your turn Find the github classroom invite Clone the repo to your machine Run the code from the first code chunk to install all packages --- ## Data in a package Assume we have the data frame `mydata` (you might consider using a better name :)). - Data frames must be stored in the `data` folder. Use the command `save` and make sure to set the path to write into the folder `data`. Then you need to start the documentation ... - Data the `usethis` way: `use_data(mydata)`. This will use the information of the RStudio project to save things into the proper path and give you a reminder what to do next. ``` use_data function (..., internal = FALSE, overwrite = FALSE, compress = "bzip2") ``` `use_data` places a data object into the `data` folder. internal data (used by the package not by the users of a package) is placed in the file `sysdata.rda` --- ## Data provenance https://r-pkgs.org/data.html Keep the data set's origin story! Origin story has at least two parts: original data and the script that converts the raw data to the R package data: - script: has extension `.R` and goes into a folder called `data-raw` (not the `R` folder!!) - in folder: inst/extdata include (some) of the raw data (csv, pdf, ...) The folder `inst` is special in an R package: `inst` is a folder that is being unloaded into the R library during the `inst`allation process. --- class: inverse ## Your Turn Aim for one commit/push to the github repo that makes one of the packags a bit better. Do all the packages have an `ames_presslog` data set? Is there a help available for the data? Where would you go to find the origin story of the data? Where are the elements of the `inst` folder stored on your machine? ... read through some of https://r-pkgs.org/data.html#sec-data-extdata --- ## Data documentation - Documenting data in packages: https://r-pkgs.org/data.html ``` #' Prices of 50,000 round cut diamonds. #' #' A dataset containing the prices and other attributes of almost 54,000 #' diamonds. #' #' @format A data frame with 53940 rows and 10 variables: #' \describe{ #' \item{price}{price, in US dollars} #' \item{carat}{weight of the diamond, in carats} #' ... #' } #' @source \url{http://www.diamondse.info/} "diamonds" ``` General convention is to have a `data.r` file in the folder `R` that consists of data documentation. --- # Examples in packages - The Roxygen tag `@examples` allows you to write code as an example in the help file. - Always make use of this feature! It is not only good practice, it is also a first step in checking that your package is running properly - for new packages CRAN expects an example for every function - Your example has to be **stand alone**, i.e. if you are using some data, it needs to be included in the package. ``` #' @examples #' # you can include comments #' x <- 1 #' 5*x #' hello("this is a test") ``` --- # Checking packages - The Comprehensive R Archive Network (CRAN) is the place for publishing packages for general use. - Once a package is on CRAN it can be installed using the command `install.packages("<package name>")` - CRAN has installed certain rules that all packages must comply before being uploaded `devtools::check()` or `Ctrl/Cmd + Shift + E` runs these tests on your local package Similar to `check()` it looks for the package to check in the working directory --- # Three levels of faults - **`ERROR`**: severe problem that has to be fixed beofr esubmitting to CRAN - you should fix it anyway - **`WARNING`**: also has to be fixed before going onto CRAN, but not as severe. - **`NOTE`**: mild problem. You should try to get rid of all notes before submitting to CRAN, but sometimes there is a specific reason that does not allow you to fix all notes. In that case, write a careful description why this note happens in the submission comments. --- # Checking cycle ``` R CMD check results 0 errors | 0 warnings | 0 notes ``` This is what you want to see, but not a likely outcome of your first run in a real package Checking workflow: 1. Run `devtools::check()`, or press `Ctrl/Cmd + Shift + E`. 2. Fix the first problem. 3. Repeat until there are no more problems. --- class: inverse # Your turn `git pull` In what state are the current packages? Run a check to see Fix one of the problems Run the check again - is that problem fixed? No? `git pull`, and try again Yes? `git commit` `git pull` `git push` --- # Likely Problems - missing documentation for functions and data ``` checking for code/documentation mismatches ... WARNING Codoc mismatches from documentation object 'hello': hello Code: function(string) Docs: function() Argument names in code not in docs: string ``` - missing dependencies to other packages ``` checking dependencies in R code ... WARNING '::' or ':::' import not declared from: ‘stringr’ ``` - making assumptions about data structures ``` checking R code for possible problems ... NOTE hello: no visible binding for global variable ‘string’ ``` --- class: inverse # Your turn - Round (2) `git pull` In what state are the current packages? Run a check to see Fix one of the problems Run the check again - is that problem fixed? No? `git pull`, and try again Yes? `git commit` `git pull` `git push` --- # Continuous Integration - one of the problems of ensuring that a package is sound, is to actually *remember* to run the tests - [Github actions](https://github.com/features/actions) allows us to create an automatized workflow (written in yaml): e.g. every time you push a change, the package is being built and checked for CRAN compliance - github actions can be adjusted to specific use cases - for a start use [example actions](https://github.com/r-lib/actions) :) - the command `usethis::use_github_actions()` sets up a folder .github > workflows with the file `R-CMD-check.yaml`. (This will only work out of the box when there is one package in the repo, and the package is not in a subfolder) --- class: inverse # Your turn - Round (3) `git pull` In what state are the current packages? Run a check to see Fix one of the problems Run the check again - is that problem fixed? No? `git pull`, and try again Yes? `git commit` `git pull` `git push` --- # Package website The package `pkgdown` creates web pages based on all the files in a package `pkgdown` supports making websites for your package (see e.g. [x3ptools](https://heike.github.io/x3ptools/), [ggplot2](https://ggplot2.tidyverse.org/), ... and soon yours) Process: ``` pkgdown::build_site() ``` Assuming your package has a github repo, add the `docs` folder to the repo In `Settings` for your repo, switch GitHub Pages to the docs folder (see screenshot) --- ![](images/github-docs.png)