Using GitHub

Introduction to Git/GitHub

When you engage in any kind of data science or programming, there comes a (frustrating) point when you need to understand how Git and GitHub work. Learning to use them is especially important for keeping versions of your work (think something like Dropbox + MS Word’s Track Changes) and for collaborating with others.

Git is essentially a boring time machine. Remember when you worked on a Word file and saved it by adding the date, or calling it “mywork-version1”, “mywork-final”, “mywork-final-final”, etc.? Git keeps that history for you, so you do not need to juggle copies of the same file.

Git is organised around repositories (repos): folders where you keep a project with all its necessary files (code, data, images, etc.). So you first need to tell Git which files/folders to track for changes.
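As a minimal sketch of what "telling Git which files to track" looks like, the commands below turn a throwaway folder into a repo and stage one file. The folder (made with mktemp) and the file name analysis.R are made up for illustration, and the sketch starts the repo locally with git init rather than cloning one from GitHub.

```shell
# Create a throwaway folder to play in (hypothetical; use your own project folder).
repo="$(mktemp -d)"
cd "$repo"

# Turn the folder into a Git repository.
git init

# Add a file; Git notices it but does not track it yet.
echo 'print("hello")' > analysis.R
git status --short        # untracked files are listed with "??"

# Tell Git to keep track of the file.
git add analysis.R
git status --short        # newly staged files are listed with "A"
```

Until you run git add, Git leaves a new file alone; staging is how you opt a file in.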

As you keep adding code to your project/assignment/etc., you commit changes to your repository and add an explanatory message to yourself, briefly describing the changes/additions/new work you have done.

When you commit changes, it’s as though you take a snapshot of your work and write a short comment to yourself; it would be the same as saving your Word document with today’s date in the filename, or v1, v2, final, final-final, etc.
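To make the snapshot idea concrete, here is a small sketch that commits the same file twice and then lists the snapshots. The file name mywork.txt, the commit messages, and the name/email are placeholders, and the repo lives in a throwaway folder.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init

# Identify yourself for this repo only (placeholder name and email).
git config user.name "Your Name"
git config user.email "you@example.com"

# First snapshot.
echo "first draft" > mywork.txt
git add mywork.txt
git commit -m "Add first draft"

# Second snapshot of the same file: no "mywork-final-final" needed.
echo "second draft" > mywork.txt
git add mywork.txt
git commit -m "Rewrite draft after feedback"

# List the snapshots, newest first, one line each.
git log --oneline
```

There is only ever one mywork.txt in the folder; the earlier versions live in the repo's history.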

After committing your changes, you need to pull first, so you get the latest copy from GitHub, and then push your commits to GitHub; pushing is when you actually upload your changes.
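Why pull before you push? The sketch below tries to show it under some loud assumptions: a local bare repository stands in for GitHub, and a second clone plays the role of a collaborator. All folder names, file names, and identities are hypothetical. The flags --no-rebase and --no-edit just tell git pull to merge without opening an editor, so the script runs unattended.

```shell
# A local bare repo stands in for GitHub (assumption for this sketch).
work="$(mktemp -d)"
git init --bare "$work/remote.git"

# Your clone: make a first commit and push it.
git clone "$work/remote.git" "$work/you"
cd "$work/you"
git config user.name "You"
git config user.email "you@example.com"
echo "intro" > notes.txt
git add -A
git commit -m "Add notes"
git push -u origin HEAD

# A collaborator clones, adds a file, and pushes before you do.
git clone "$work/remote.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo "data" > data.csv
git add -A
git commit -m "Add data"
git push

# Back in your clone: commit new work, then try to push straight away.
cd "$work/you"
echo "more" >> notes.txt
git add -A
git commit -m "Extend notes"
git push || echo "Push rejected: the remote has commits you do not have yet"

# Pull first (merging in the collaborator's work), then push.
git pull --no-rebase --no-edit
git push
```

If nobody else has pushed in the meantime, the pull simply reports that you are up to date, so pulling first costs you nothing.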

Git workflow

The following lists the main steps to create a repository and keep it updated:

  1. Create a repo on GitHub and initialize with a README.
  2. Clone the repo to your local machine. You can either do it as an RStudio Project, or using a shell command: $ git clone REPOSITORY-URL
  3. Add (stage) any changes you make: $ git add -A
  4. Commit your changes: $ git commit -m "Helpful message to yourself/collaborators"
  5. Pull from GitHub: $ git pull
  6. Push your changes to GitHub: $ git push

Repeat steps 3-6, but especially steps 3-4, often.

Git keeps track of all the changes you have made in your repo, just in case you made a mistake and need to go back to an earlier version where things actually worked. GitHub is a website built on top of Git that lets you collaborate with others on code fixes, documentation, and more.
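Going back to an earlier version can be sketched as follows. This is one of several ways to do it, not the only one, and the file name script.txt, the commit messages, and the identity are placeholders. HEAD~1 means "one commit before the latest".

```shell
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Your Name"
git config user.email "you@example.com"

# A version that works...
echo "works" > script.txt
git add -A
git commit -m "Working version"

# ...followed by a change that breaks things.
echo "broken" > script.txt
git add -A
git commit -m "Change that broke things"

# Look at the file as it was one commit ago, without changing anything.
git show HEAD~1:script.txt

# Bring that earlier version back into the folder and record it as a new commit.
git checkout HEAD~1 -- script.txt
git commit -m "Restore working version"
```

Nothing is erased here: the broken commit stays in the history, and the restore is just one more snapshot on top.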

Further resources

For R users, Jenny Bryan et al. have created Happy Git with R, a brilliant resource that shows you how to use Git and GitHub effectively in RStudio.

One final thing: Git can be confusing and frustrating as hell (ask me for details). Add Git to the challenges of coding and you sometimes end up with people asking themselves interesting questions.

When things do go wrong (they will), have a look at https://ohshitgit.com/ and http://happygitwithr.com/burn.html