2024-01-30
My goal for this workshop is to give everyone the tools to:
Ensuring that your workflow is transparent is important for:
Past/Current/Future You
ZULE Lab
Collaborators
Other grad students
Scientific Community
PUBLIC
Good file structure is important because it [1]:
Best practices include (but are not limited to) [1]:
Some tasks do not need to be “as reproducible” (e.g., fixing typos); these can be done in OpenRefine.
In general, if you are:
Combining data sources
Making decisions about the data itself (e.g., removing or adding data)
Performing calculations
Renaming things
Do this in R (you will be grateful later!)
tidyverse
packages (e.g., ggplot2)
Principles:
Let’s set up a new project using R Projects
Add input, output, script, and figure folders
(I recommend you have a place on your computer dedicated to this)
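If you prefer the terminal, you can also create the folder layout above with a few commands. This is a minimal sketch that assumes a Unix-like shell (e.g., Git Bash on Windows) and a hypothetical project folder named my-project. The R Project file (.Rproj) is still created through RStudio's New Project dialog, so this sketch does not make it.

```shell
#!/usr/bin/env bash
# Sketch: create the input/output/script/figure folders
# inside a hypothetical project directory called my-project.
set -euo pipefail

project_dir="my-project"   # hypothetical name; replace with your own project folder

# -p creates parent folders as needed and does not fail if a folder already exists
mkdir -p "$project_dir/input" "$project_dir/output" "$project_dir/script" "$project_dir/figure"

# List the resulting folders so you can check the layout
ls "$project_dir"
```

Running the script a second time is harmless, because `mkdir -p` skips folders that already exist.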
GitHub is a website and software that documents your progress on a project and lets you use version control
If you save rough drafts of your writing as you go along, that is version control
Really useful for when you want to go back/change your mind/re-run a test/etc.
Facilitates peace of mind + reproducible science + collaboration/sharing
ZULE’s GitHub has lots of repositories (including examples) if you are looking for inspiration for folder organization, ReadMe documentation, metadata, etc.
Git (the version-control tool behind GitHub) tracks the changes you make to the repository on your computer
After making changes, select the ones you want to include, describe them in a short message, and commit them
After committing, push your changes to your remote repository on GitHub
If you are collaborating on a project with multiple contributors, always pull from the remote repository before starting your work
Pull uses the same button as push (Ctrl + Shift + P)
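The cycle above (select, describe, commit, push, pull) can also be run with plain Git commands. This is a self-contained sketch built on several assumptions. A bare repository in a temporary folder stands in for the real GitHub remote. The file name, commit message, and user identity are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the commit / push / pull cycle using plain Git.
# A bare repository in a temporary folder stands in for the GitHub remote.
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for the remote repository on GitHub
git init --quiet --bare remote.git

# Local copy of the project (the repository "on your computer")
git clone --quiet remote.git my-project
cd my-project

# Hypothetical identity; on your own machine, set these once with --global
git config user.name "Example Student"
git config user.email "student@example.com"

# 1. Make a change
echo "# My project" > README.md

# 2. Select the change (stage it)
git add README.md

# 3. Describe and commit it
git commit --quiet -m "Add README describing the project"

# 4. Push the current branch to the remote and remember it as the upstream
git push --quiet -u origin HEAD

# 5. On a shared project, pull collaborators' changes before starting new work
git pull --quiet

# Show the commit history
git log --oneline
```

The point-and-click buttons described above carry out these same steps. The commands only make each step explicit.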
Archiving your project in the lab requires 4 things:
These things can be organized however you’d like, as long as someone can easily understand them after you have left the lab.
Add projects to the lab computer in the Lab_Alumni folder on the D: drive
A GitHub repository does not have a DOI, so it does not point to a specific moment in time
Can be changed continuously
Not dedicated to longevity
You can, however, import a GitHub repository into a true data archive
Zenodo is a great option for archiving data
Easily links to GitHub repositories
Preserves file structures
Can be updated after reviews/changes with a new DOI
FREE
Other options include Dryad, figshare, and more topic-specific archives (e.g., GenBank)
As always, use what works for you
To connect and archive your code/data with Zenodo from GitHub, there are three main steps
(see an example walkthrough here)
NOTE: you do not need to use Git to use Zenodo; you can also upload local files directly
This workshop, including examples & code, can be found here, and the formatted slides are here
Software Carpentry: R for Reproducible Scientific Analysis & Version Control with git
Data Carpentry: Data Analysis & Visualization in R for Ecologists & Data Organization in Spreadsheets for Ecologists
biost@ts: Version Control with Git and GitHub
Happy Git: happygitwithr
University of Bergen: Open Access to Research Data
Smart People I Know: Dr. Christie Bahlai’s Reproducible Quantitative Methods Course & Wildlife Ecology & Evolution Lab’s Guide by Alec Robitaille & Val Lucet’s Git Workshop
PLUS: check out our zup “stats” thread for lots of helpful resources! AND ASK YOUR LABMATES!!!!