Prompt:
In May 2015, Science retracted, without the consent of the lead author, a paper on how canvassers can sway people’s opinions about gay marriage; see also: http://www.sciencemag.org/news/2015/05/science-retracts-gay-marriage-paper-without-agreement-lead-author-lacour The Science Editor-in-Chief cited three reasons for the retraction: the original survey data were not made available for independent reproduction of the results, the survey incentives were misrepresented, and statements made about sponsorships turned out to be incorrect.
The investigation resulting in the retraction was triggered by two Berkeley grad students who attempted to replicate the study and discovered that the data must have been faked.
FiveThirtyEight has published an article with more details on the two Berkeley students’ work.
Malicious changes to the data, such as in the LaCour case, are hard to prevent, but more rigorous checks should be built into the scientific publishing system. All too often, papers have to be retracted because of unintentional errors. Retraction Watch is a database that keeps track of retracted papers (see the related Science magazine publication).
Read the paper Ten Simple Rules for Reproducible Computational Research by Sandve et al.
Write a blog post addressing the questions:
- Pick one of the papers from Retraction Watch that were retracted because of errors in the paper (you might want to pick a paper from the set of featured papers, because there are usually more details available). Describe what went wrong. Would any of the rules by Sandve et al. have helped in this situation?
The article The Antidiabetic Metformin as an Adjunct to Antidepressants in Patients with Major Depressive Disorder: A Proof-of-Concept, Randomized, Double-Blind, Placebo-Controlled Trial has been retracted. The data in this article contain oddities: for example, the numbers of patients who experienced adverse events differed by exactly one with unusual regularity. The data are also suspicious because they bear a striking similarity to the numbers in a clinical trial of another add-on drug published by the same author.
I think the first two, most basic rules from Sandve et al. might have helped:
Rule 1: For every result, keep track of how it was produced.
Rule 2: Avoid manual data manipulation steps.
For the author of the retracted article, being honest about the source of the data is the key to avoiding such a situation. Even if data look highly improbable or coincidental, they can still validate the experiment as long as there is no flaw in how they were produced, and a documented record of each step (Rule 1) is what lets others verify that. If the real data are not ideal, the author should stay away from manipulation and report what they actually have.
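To make Rules 1 and 2 concrete, here is a minimal Python sketch under assumptions of my own: the raw data sit in a small CSV with hypothetical columns `patient_id`, `group` and `adverse_event`, and the toy rows are made up purely for illustration. The adverse-event counts are derived by a script rather than typed in by hand (Rule 2), and each result is stored together with a record of how it was produced (Rule 1): a SHA-256 hash of the input, the function name, its parameters and the Python version. The helper names `count_events` and `run_with_provenance` are hypothetical and do not come from Sandve et al.

```python
import csv
import hashlib
import io
import json
import sys

# Hypothetical toy data, made up for illustration only.
RAW_CSV = """patient_id,group,adverse_event
1,A,nausea
2,B,none
3,A,none
4,B,nausea
5,A,nausea
"""


def count_events(raw_csv: str, event: str) -> dict:
    """Rule 2: derive the counts from the raw data by script, never by hand."""
    counts = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if row["adverse_event"] == event:
            counts[row["group"]] = counts.get(row["group"], 0) + 1
    return counts


def run_with_provenance(raw_csv: str, event: str) -> dict:
    """Rule 1: keep the result together with a record of how it was produced."""
    result = count_events(raw_csv, event)
    return {
        "result": result,
        "provenance": {
            # Hash of the exact input, so any later change to the raw data is detectable.
            "input_sha256": hashlib.sha256(raw_csv.encode("utf-8")).hexdigest(),
            "function": count_events.__name__,
            "parameters": {"event": event},
            "python_version": sys.version.split()[0],
        },
    }


if __name__ == "__main__":
    record = run_with_provenance(RAW_CSV, "nausea")
    print(json.dumps(record, indent=2, sort_keys=True))
```

A reviewer who re-hashes the raw file can tell immediately whether it differs from the file the reported counts were computed from, which is the kind of check that makes silent manual edits harder to hide.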
- After reading the paper by Sandve et al. describe which rule you are most likely to follow and why, and which rule you find the hardest to follow and will likely not (be able to) follow in your future projects.
Rules 1 and 2 are the rules I am most likely to follow, because they are the foundations of data analysis. I would also follow Rule 4 (version control all custom scripts): I find tracking versions with Git easy, and it saves time whenever I need to look back at what I did previously. Rules 7 and 10 are also important: storing the raw data behind plots and providing public access to scripts, runs, and results, so that the work is ready to be inspected, are essential for reproducing results.
I find Rule 5 hard to follow. Recording all intermediate results, when possible in standardized formats, costs both time and storage, especially since intermediate results sometimes turn out to be flawed or incomplete. Rule 6 (note the underlying random seeds) is hard to follow as a matter of habit. In a consistent analysis, it is normal and necessary to set a random seed so that results can be tracked and checked for correctness, but it is hard to remember every step that involves randomness, so this rule is a good reminder. Rule 9 (connect textual statements to underlying results) is hard to follow because it builds on Rule 5. If I could follow Rule 5 strictly, I would like to take some notes connecting my textual interpretation to the results, but recording every detail would be too time-consuming.
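For Rules 5 and 6, here is a small Python sketch of the habit I want to build, again under assumptions of my own: a hypothetical bootstrap of a mean on made-up toy scores, where the seed (an arbitrary `SEED = 42`) is passed explicitly to a local `random.Random` generator, and the intermediate resampled means are written out as plain CSV with the seed recorded in every row. The function names `bootstrap_means` and `to_csv` are illustrative only.

```python
import csv
import io
import random

SEED = 42  # arbitrary, hypothetical seed; record it alongside the results (Rule 6)


def bootstrap_means(values, n_resamples, seed):
    """Resample `values` with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # local generator: no hidden dependence on global state
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return means


def to_csv(means, seed):
    """Rule 5: store intermediate results in a standard, plain-text format."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["resample", "mean", "seed"])
    for i, m in enumerate(means):
        writer.writerow([i, m, seed])
    return buf.getvalue()


if __name__ == "__main__":
    scores = [3, 5, 4, 6, 2]  # hypothetical toy data
    means = bootstrap_means(scores, n_resamples=1000, seed=SEED)
    # Re-running with the same seed reproduces exactly the same resamples.
    assert means == bootstrap_means(scores, n_resamples=1000, seed=SEED)
    print(to_csv(means[:3], SEED))
```

Using a local generator instead of the module-level `random.seed` keeps the randomness of this step independent of any other code. In a real project the CSV would be written to a file rather than an in-memory buffer, which is exactly where the storage cost I mention for Rule 5 comes from.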
Submission
Push your changes to your repository.
You are ready to call it good once all your GitHub Actions pass without an error. You can check this by selecting ‘Actions’ in the menu and making sure that the last item has a green checkmark. The action for this repository checks the YAML of your contribution for the existence of the author name, a title, date and categories. Don’t forget the space after the colon! Once the action passes, the badge along the top will also change its color accordingly. As of right now, the status for the YAML front matter is:
---
author: "Your Name"
title: "Specify your title"
date: "2023-02-23"
categories: "Ethics and Reproducibility..."
---