Happy Git with R

Author

Kelly Nascimento Thompson

Published

February 2, 2023

Prompt:

Git and GitHub are tools that help with versioning files in collaborative efforts as well as archiving entries for your future self. Unfortunately, working with Git isn’t always completely straightforward. Jenny Bryan’s book “Happy Git and GitHub for the useR” helps with that. The book is available from http://happygitwithr.com/. Have a look over the index and pick one of the chapters for a more in-depth read.

Write a blog post answering the following questions:

  1. Write a short (100-150 words) summary of the chapter you read in-depth.

Chapter 20 (“Repo, commit, diff, tag”): Git is a version control system whose original purpose was to help developers collaborate on big software projects. A repo, or repository, is a set of files that Git manages as they evolve. For new or existing projects, RStudio users are advised to dedicate a local directory to the project, make it an RStudio Project, and make it a Git repository. RStudio and Git then leave notes for themselves in hidden files or directories. A commit is a snapshot of all files in the project at a specific moment, rather than a save of an individual file. A diff is the set of differences between two commits. It is common practice to write a commit message each time a commit is made; these messages help colleagues understand what changed in the project. A tag is a name of your choosing attached to a specific commit, for example to mark a version of the project.
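To make the four terms concrete, here is a minimal shell sketch, not taken from the book, that walks through them in a throwaway directory. The file name `data.csv`, its contents, the tag `v0.1`, and the placeholder identity are all made up for illustration.

```shell
# Work in a temporary directory so no real project is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Repo: turn the directory into a Git repository (creates the hidden .git folder)
git init --quiet
git config user.name "Demo User"          # placeholder identity, stored in this repo only
git config user.email "demo@example.com"

# Commit: take a snapshot of the project, with a message describing it
echo "rainfall,runoff" > data.csv         # hypothetical file
git add data.csv
git commit --quiet -m "Add header row for field data"

# Change the file and take a second snapshot
echo "1.2,0.4" >> data.csv
git commit --quiet -am "Add first daily measurement"

# Diff: show what changed between the previous commit and the latest one
git diff HEAD~1 HEAD

# Tag: give the latest commit a memorable name
git tag v0.1
git log --oneline --decorate
```

The diff shows the added measurement line prefixed with `+`, and the log lists both commit messages with the tag next to the most recent commit.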

  2. Looking back at all of the team projects you have been involved in, describe the biggest mishap you had. Could that have been avoided using Git? How?

My research involves modeling soil erosion at the STRIPS sites (https://www.nrem.iastate.edu/research/STRIPS/). For ground-truth validation, I get rainfall, runoff, and sediment field data from another research group. These daily measurements from 2016 to 2022 are saved in Excel files generated by R scripts. The data include columns such as “treatment”, “site”, “date”, “rainfall”, and “runoff”. When I filtered the data, I noticed that rainfall values had been placed in the treatment column, where they do not belong. I contacted the collaborator who handles the data and asked about the reasoning behind it. He explained that it was for graphing purposes and said he would upload an updated file to the Cybox folder with the rainfall data removed from the treatment column. Git could have helped avoid this issue: with a shared repository, I would have access to the scripts that generate these files and could see what changed between versions, instead of going to Cybox to download a new copy of the field data. A version control system is especially valuable in a case like this, where the files are updated every year.

  3. Give an example of one new Git feature that you learned about from Jenny Bryan’s book.

I followed the steps in Chapter 11 (“Connect to GitHub”). I created a new repository on github.com and cloned it to my local computer using the Git Bash shell. Following the book’s instructions, I changed my working directory to the cloned repo, listed its files, and displayed the README file. I also installed GitHub Desktop and used it to clone repos from github.com, testing it by cloning Blog 1 to my computer.
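The clone, change-directory, list, and display steps look roughly like the sketch below. So that it runs without a GitHub account or network access, it uses a local repository named `github_stand_in` in place of the one created on github.com. That name, `myrepo`, and the README contents are placeholders of mine, not values from the book or from my actual repo.

```shell
# Work in a temporary directory
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the repository created on github.com (assumption: local, not hosted)
git init --quiet github_stand_in
cd github_stand_in
git config user.name "Demo User"          # placeholder identity, stored in this repo only
git config user.email "demo@example.com"
echo "# myrepo" > README.md               # mimics a repo created with a README
git add README.md
git commit --quiet -m "Initial commit"
cd ..

# Clone the repository to a local folder
git clone --quiet github_stand_in myrepo

# Change into the clone, list its files, and display the README
cd myrepo
ls
head README.md

# Check where the clone came from
git remote show origin
```

With a real GitHub repository, the only change is to pass the repository's HTTPS or SSH URL to `git clone` instead of the local path.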