Ethically Managing Data

Ethics and Reproducibility…
Author

Sabrena Rutledge

Published

February 23, 2023

[Frontmatter check badge]

Prompt:

In May 2015, Science retracted, without the consent of the lead author, a paper on how canvassers can sway people's opinions about gay marriage; see also: http://www.sciencemag.org/news/2015/05/science-retracts-gay-marriage-paper-without-agreement-lead-author-lacour. The Science Editor-in-Chief cited as reasons for the retraction that the original survey data were not made available for independent reproduction of the results, that survey incentives were misrepresented, and that statements made about sponsorships turned out to be incorrect.
The investigation resulting in the retraction was triggered by two Berkeley graduate students who attempted to replicate the study and discovered that the data must have been faked.

FiveThirtyEight has published an article with more details on the two Berkeley students’ work.

Malicious changes to the data, such as in the LaCour case, are hard to prevent, but more rigorous checks should be built into the scientific publishing system. All too often, papers have to be retracted for unintended reasons. Retraction Watch is a database that keeps track of retracted papers (see the related Science magazine publication).

Read the paper Ten Simple Rules for Reproducible Computational Research by Sandve et al.

Write a blog post addressing the questions:

  1. Pick one of the papers from Retraction Watch that were retracted because of errors in the paper (you might want to pick a paper from the set of featured papers, because there are usually more details available). Describe what went wrong. Would any of the rules by Sandve et al. have helped in this situation?

I chose the paper “Synthetic lethality of combined glutaminase and Hsp90 inhibition in mTORC1-driven tumor cells.” The paper was retracted because of data duplication in several figures: the authors appear to have re-used several images across different figures. The Retraction Watch post also mentioned that this was not the only time these authors had data issues leading to retractions or corrections. Each time, the authors responded that the errors “do not compromise the conclusions in the paper.” Rule 9 from Sandve et al. (“Connect Textual Statements to Underlying Results”) would have helped in this situation: generating every figure from scripted, traceable output would have helped the authors keep track of which images go with which figures, as sketched below.
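As one illustration of Rule 9 (a minimal sketch with hypothetical file names, not the retracted paper's actual workflow), writing every figure to a uniquely named file from the script that produced it ties each image to its underlying code and data, which makes accidental duplication much easier to spot:

library(ggplot2)

# Build the plot from the data it claims to show, then write it to a file
# whose name records which figure panel it belongs to.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
ggsave("fig1a_weight_vs_mpg.png", plot = p, width = 5, height = 4)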

  2. After reading the paper by Sandve et al., describe which rule you are most likely to follow and why, and which rule you find the hardest to follow and will likely not (be able to) follow in your future projects.

I am most likely to follow Rule 5 (“Record All Intermediate Results, When Possible”). I frequently need to break an analysis pipeline down in order to understand it, so I already tend to save intermediate “progress” steps, whether as an image or as an .rds file. If I don’t need to save a step, I can comment out (or toggle off) the line in the code that saves the data; a sketch of this pattern appears below. The rule I find hardest to follow is Rule 3 (“Archive the Exact Versions of All External Programs Used”). While this can be done with containers that pin an R version, a container alone will not help others replicate the work later if they cannot access it. In addition, package updates may change the analysis itself: functions get deprecated or replaced, and new functions open new avenues of analysis. As such, the code should be flexible enough to tolerate small changes in package versions until it is updated after an extended amount of time has passed.
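Here is a minimal sketch of the Rule 5 pattern I described (the data, model, and file names are hypothetical placeholders): each intermediate result is written to its own .rds file, and a single flag toggles saving on or off instead of commenting out individual lines.

save_intermediates <- TRUE

raw_data <- mtcars                    # stand-in for the real input data
cleaned  <- na.omit(raw_data)
if (save_intermediates) saveRDS(cleaned, "01_cleaned.rds")

fit <- lm(mpg ~ wt, data = cleaned)   # placeholder model
if (save_intermediates) saveRDS(fit, "02_model_fit.rds")

For Rule 3, even without a container, base R can at least record the exact versions in play: writing the output of sessionInfo() to a file archives the R version and every loaded package alongside the analysis.

writeLines(capture.output(sessionInfo()), "session_info.txt")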

Submission

  1. Push your changes to your repository.

  2. You are ready to call it good once all your GitHub Actions pass without an error. You can check on that by selecting ‘Actions’ in the menu and ensuring that the last item has a green checkmark. The action for this repository checks the YAML of your contribution for the existence of an author name, a title, a date, and categories. Don’t forget the space after the colon! Once the action passes, the badge along the top will also change its color accordingly. As of right now, the status for the YAML front matter is:

[Frontmatter check badge]

---
author: "Your Name"
title: "Specify your title"
date: "2023-02-23"
categories: "Ethics and Reproducibility..."
---