Git Best Practices for Researchers

This guide is meant to help Duke researchers choose a Git hosting platform and avoid the most common security mistakes. It is practical guidance for research teams, not a formal compliance checklist. Your data classification and any IRB, contract, or sponsor requirements still apply.

Quick Recommendation

Use this table as a quick starting point when deciding where your lab should host code, notebooks, and project documentation.

If your lab needs... | Start with...
A Duke-managed service, Duke NetID login, Duke-internal collaboration, or a home for unpublished work | Duke GitLab
Public or open-source code, broad external collaboration, or visibility in the wider software and research community | GitHub

Duke GitLab

Start with Duke GitLab if your work is:

  • Unpublished, preliminary, or proprietary
  • Primarily shared with Duke researchers, staff, or students
  • Better managed with Duke NetID-based access
  • Better suited to repositories hosted on premises at Duke
  • Intended to stay within a Duke-managed collaboration environment

Duke GitLab uses Duke authentication and is often the best default for internal research software collaboration.

Large files usually don't belong in repos

Do not upload large files unless they truly belong in source control. Large files can create storage and performance issues.
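
A quick scan of the project folder before committing can catch oversized files early. The sketch below is illustrative only: the function name find_large_files and the 50 MB default threshold are assumptions made for this example, not limits set by Duke GitLab or GitHub.

```python
import os


def find_large_files(root, limit_bytes=50 * 1024 * 1024):
    """Return (path, size) pairs for files under root larger than limit_bytes.

    The 50 MB default is an arbitrary example threshold, not a platform limit.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata folder; it is not part of the working files.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not measure symbolic links
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((path, size))
    # Largest first, so the biggest files appear at the top of the report.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling find_large_files(".") from the top of a project folder returns the files worth reviewing before the next commit.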

GitHub

Use GitHub if your work is:

  • Intended to be public or open source
  • Built with collaborators outside Duke
  • Meant to be easy to discover, cite, reuse, or contribute to
  • Part of a software project that benefits from visibility in the broader research and developer community

Private does not mean inside Duke

A private GitHub repository is still hosted by GitHub's cloud service. "Private" does not mean "inside Duke."

What Should Never Go in Any Git Repo

Never commit sensitive data or secrets

Do not place the following in Duke GitLab, GitHub, or any other Git repository:

  • Sensitive or regulated research data
  • Participant identifiers or identifiable data exports
  • Passwords, API keys, access tokens, SSH private keys, or other secrets
  • .env files or configuration files that contain real credentials
  • Files whose data classification is unknown
  • Sensitive content embedded in code comments, commit messages, or file names

If you are not sure whether something belongs in a repository, check the Duke Data Classification Standard or contact security@duke.edu.

Version control platforms are best used for code, documentation, small example datasets that are clearly safe to share, and reproducible workflows. They are not a storage location for sensitive research data or secrets.
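
As a rough illustration of checking for secrets before a commit, the sketch below scans text for a few credential-like patterns. The pattern list, the labels, and the function name scan_text are assumptions made for this example. They are far from complete and will miss many kinds of secrets, so a clean result is not proof that a file is safe to commit.

```python
import re

# Illustrative patterns only; real secret scanners use much larger rule sets.
SECRET_PATTERNS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "AWS-style access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded credential": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, label) pairs for lines that look like secrets."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

A check like this could run on each file before it is staged. Any finding is a prompt to stop and review the file, not a verdict on it.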

Security Practices That Apply Whether You Use Duke GitLab or GitHub

Accounts and Collaboration

  • Repository and group owners are responsible for making sure projects are set to private when the work is not intended to be public
  • Turn on multi-factor authentication (MFA) for GitHub; it is on by default for Duke GitLab
  • Use Duke's approved password management tools instead of saving passwords in plain text
  • Give collaborators the least privilege they need to do their work
  • Review collaborator access when students, staff, or external partners leave a project

Secrets and Local Copies

  • Initialize the Git repository in a different folder from where research data is stored when that data is not intended to be public
  • Store tokens, SSH keys, passwords, and other credentials in a password manager or other approved secret storage, never in the repository
  • If a secret is committed by accident, stop using it immediately, revoke or rotate it, and follow Duke's Secrets Management guidance
  • Use .gitignore to keep local data files, credentials, generated outputs, and other non-public materials out of version control
  • Keep local clones on devices that are appropriate for your project's data classification
  • Treat laptops and desktops that sync repositories as part of your research data security plan
  • If you use CI/CD, review pipeline, job, artifact, and related visibility settings and make them private or otherwise restricted as appropriate for the project
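
Two of the habits above can be sketched in a few lines: adding ignore rules to .gitignore, and reading a token at run time instead of writing it into code. Everything named here is hypothetical, including the rule list, the data/ and outputs/ folder names, the RESEARCH_API_TOKEN variable, and both function names. Reading from an environment variable is one common pattern and is used here as an assumption; Duke's Secrets Management guidance remains the authority on approved approaches.

```python
import os
from pathlib import Path

# Example ignore rules; the folder and file names are hypothetical.
STARTER_IGNORE_RULES = [
    ".env",      # local environment files with real credentials
    "*.pem",     # private keys and certificates
    "*.key",
    "data/",     # local research data kept out of version control
    "outputs/",  # generated results that can be rebuilt from code
]


def add_ignore_rules(repo_dir, rules=STARTER_IGNORE_RULES):
    """Append any missing rules to repo_dir/.gitignore; return the rules added."""
    ignore_file = Path(repo_dir) / ".gitignore"
    text = ignore_file.read_text() if ignore_file.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [rule for rule in rules if rule not in existing]
    if missing:
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with ignore_file.open("a") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return missing


def read_token(variable="RESEARCH_API_TOKEN"):
    """Read a token from the environment so it never appears in the code."""
    token = os.environ.get(variable)
    if not token:
        raise RuntimeError(f"Set the {variable} environment variable first.")
    return token
```

Ignore rules only affect files that Git is not already tracking. A file committed earlier stays in the repository history even after a matching rule is added.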

When to Ask for Help Before Deciding

Pause before using Duke GitLab or GitHub for work involving:

  • Sensitive, high-risk, or regulated research data
  • Contractually restricted or sponsor-restricted information
  • HIPAA, FERPA, export-controlled, or similar protected data
  • Work that may require a formal security review
  • Non-Duke tools for projects that have Duke compliance or security obligations

Need Help Deciding?

If you are choosing between Duke GitLab and GitHub for a lab, center, or study team:

  • Contact security@duke.edu if the project may involve sensitive, restricted, or regulated data
  • Contact rescomputing@duke.edu if you want help choosing a workflow for code, notebooks, and collaboration