Managing Sensitive Research Data at Duke

Definitions

Sensitive Data

Information that, if inadvertently released, could place the research participants and/or the institution at risk of harm. Failure to protect sensitive information could result in financial losses, reputational damage, and legal repercussions for the institution.
Harms to participants could include damage to their relationships, status, employability, or insurability. Participants could face criminal or civil prosecution and, in some cases, physical harm. The assessment of risk must take into account the participants’ culture, age, life experience, and any other relevant characteristics.

Individually Identifiable Data

This includes:

  • Direct identifiers – participants’ names, email addresses, or other information that explicitly identifies an individual.
  • Indirect identifiers – also known as demographic data; combinations of characteristics that could allow someone to deduce a participant’s identity.

Examples of Indirect Identifiers

  • Position, gender, and length of service in a named company
  • Age, major, ethnicity, and year in school

Data Classification and Protection Plans

  • Consult the Campus Institutional Review Board (IRB) about the classification of your data before/during the development of your research protocol.

  • See the Duke Data Classification Standards.

  • The online SecureIt tool was designed to help researchers identify approved Duke services they can use to collect, store, and analyze research data.

  • The Information Technology Security Office (ITSO) reviews research protocols that involve sensitive data or data governed by contractual agreements. ITSO will notify researchers and the IRB if any changes to the proposed data protection procedures are necessary.

  • ITSO can be contacted directly at security@duke.edu.

NOTE: The University has determined that any research studies that collect or use direct (e.g., names) or indirect (i.e., demographic) data about Duke students may meet the “sensitive” data classification. If the data you intend to collect are considered “sensitive,” they must be protected to mitigate risk.


Duke's Research Data Policy

Per Duke’s Research Data Policy, the Principal Investigator is responsible for all project personnel who interact with sensitive data, and for the data’s appropriate management and protection over the course of a project.


Best Practices for Protecting Sensitive Data

Sensitive, individually identifiable data must be protected during all phases of a research project. Below are recommended best practices from the Campus IRB and Duke's IT Security Office (ITSO) to help prevent inadvertent breaches throughout the research lifecycle:

  • Data Collection
  • Data Transfer
  • Data Storage
  • Data Analysis
  • Reporting

Collection

  • Data collection using online services should be conducted on a secure platform, such as Qualtrics or Duke Box.
  • If carrying out virtual interviews or focus groups (including recording), researchers must use their Duke-sponsored Zoom accounts.

If limited resources make it necessary to collect sensitive data on paper in the field, label documents with a unique ID number, not participants’ names or any other direct identifiers.
If the key linking direct identifiers to unique ID numbers must be taken into the field, store it only on an encrypted device.
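The unique-ID approach described above can be sketched as a small script. The participant names, ID format, and `assign_study_ids` helper here are hypothetical choices for illustration, not a Duke-prescribed tool:

```python
import secrets

def assign_study_ids(names):
    """Map each participant name to a unique, random study ID.

    The returned mapping is the linking key: store it separately from
    the data itself (ideally on an encrypted device), and label field
    documents with the study ID only.
    """
    key = {}
    used = set()
    for name in names:
        # token_hex(3) yields 6 hex characters (~16.7 million codes);
        # redraw on the rare collision so every ID is unique.
        study_id = "P-" + secrets.token_hex(3).upper()
        while study_id in used:
            study_id = "P-" + secrets.token_hex(3).upper()
        used.add(study_id)
        key[name] = study_id
    return key

# Hypothetical participants, for illustration only
key = assign_study_ids(["Ada Lovelace", "Alan Turing"])
```

Because the IDs are random rather than derived from names, the paper forms alone reveal nothing about identity; only the separately stored key can re-link them.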


Transfer

  • All sensitive, identifiable data collected in the field should be transferred as soon as possible to a secure environment (e.g., Protected Network for Research, Duke Box) at Duke.
  • Never send sensitive data as email attachments. Instead:
      • Upload to Duke Box
      • Use an encrypted transfer protocol such as SFTP

Data Use Agreements (DUAs)

If your research involves storage or analysis of existing data and requires a DUA:

  • Append the DUA to your IRB protocol.
  • The IRB will route it to:
      • Office of Research Support (ORS) for contractual review
      • ITSO for data security review

The Protected Research Data Support team in the Office of Research & Innovation (OR&I) can assist with DUA questions or concerns.
Email researchdatasupport@duke.edu to request a consultation.

Some data providers require a DUA even for non-identifiable data; such data is then considered sensitive.

Compliance is the responsibility of the Principal Investigator and Duke personnel involved.


Storage and Analyses

🔒 Protected Network for Research (PNR)

Duke’s most trusted environment for securely handling sensitive research data.

Using the Protected Network for Research (PNR) expedites the IRB/ITSO data security review process.

Benefits:

  • Free tier available for most studies
  • Web browser access on Duke-managed or personal devices; no VPN required
  • Easy installation of analytics software: Stata, RStudio, SAS, NVivo
  • Integrates with Microsoft OneDrive, Qualtrics, Zoom, and Box (Box integration enables self-service export)
  • Adheres to the NIST 800-171 standard
  • Built-in controls for de-identification and key storage

Other Options for Secure Storage and Analysis

NOTE: Removal of data from Duke Box for analysis should be temporary.
Avoid using Box Drive for data analysis; it stores files locally.
See Using Duke Box with Sensitive Research Data

If the research involves only qualitative writing—without the use of analysis tools (e.g., SAS, NVivo)—then Duke Box may suffice, and the PNR may not be necessary.


Transcription and Translation

When working with recorded interviews, focus groups, or multilingual research materials, researchers must use approved and secure AI solutions that comply with Duke’s data protection standards.

🔒 Secure AI-Powered Transcription Using Microsoft Word

Duke researchers can now take advantage of Microsoft’s AI-powered transcription engine—integrated directly into Microsoft Word Online—to convert recordings into text securely within Duke’s Microsoft 365 environment.

This built-in AI service uses advanced speech recognition models to create accurate, time-stamped transcripts while ensuring all processing and storage remain governed by Duke’s enterprise Microsoft agreement.
When accessed through your Duke NetID, the transcription process and resulting files are fully contained within Duke’s secure Microsoft 365 cloud, meeting the university’s data protection and privacy standards.

Key features:

  • AI-driven transcription built into Word for the Web
  • Runs within Duke’s Microsoft 365 enterprise environment, not a public AI service
  • Accessible via Microsoft Word Online after signing in with your Duke NetID
  • Upload audio or video files for automated, high-quality transcription
  • Transcripts and source recordings are stored in your Duke OneDrive
  • Fully aligned with Duke’s Data Classification Standard and ITSO security guidelines

⚠️ Do not use personal or non-Duke Microsoft accounts (e.g., @outlook.com, @gmail.com) for transcription.
Consumer Microsoft accounts do not meet Duke’s security or data protection requirements for sensitive or identifiable data.

For official instructions on using this AI-powered feature, see:
👉 Microsoft Support: Transcribe Your Recordings

For research involving highly sensitive or contractually restricted data (e.g., HIPAA, FERPA, or controlled datasets), consult the Information Technology Security Office (ITSO) by emailing security@duke.edu or consider using the Protected Network for Research (PNR) for transcription and storage.


🌐 Secure AI-Powered Translation Using Duke ChatGPT or Microsoft Copilot

Duke researchers can securely translate text or transcripts within Duke’s approved AI platforms:

  • Microsoft Copilot (integrated into Word, Excel, and Edge)
  • Duke ChatGPT (Duke-licensed instance of OpenAI’s GPT-4 Turbo)

Both services are provisioned under Duke’s enterprise agreements, ensuring that no data is shared outside Duke’s protected environments. These tools can translate documents, transcripts, survey responses, or qualitative data between languages.

Key features:

  • Supports secure, AI-powered translation in multiple languages
  • All data processed under Duke’s enterprise Microsoft or OpenAI agreements
  • No data used for model training or shared with external services
  • Accessible via your Duke NetID credentials

⚠️ Do not use public AI translators (e.g., Google Translate, DeepL free tier, ChatGPT via openai.com) for sensitive, identifiable, or restricted research data.

For translation of Sensitive (High) or Restricted data (e.g., HIPAA, FERPA, or export-controlled materials), contact security@duke.edu before proceeding to ensure compliance with Duke’s approved data handling environments.


Reporting

To reduce the risk of re-identification, follow these practices:

  • Report aggregate data with groups large enough to prevent identification
  • Use general terms (e.g., income or age ranges)
  • Use pseudonyms instead of real names
  • Use broad roles (e.g., “tradesperson” instead of “carpenter”)
  • Use vague locations (e.g., “a midsize city in Western Africa”)
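As an illustration, the generalization practices above can be sketched in a short script. The field names, decade-wide age bins, and the `pseudonyms` and `broad_roles` mappings are arbitrary choices for this sketch, not Duke requirements:

```python
def generalize_age(age):
    """Report a decade-wide age range (e.g., 34 -> "30-39") instead of an exact age."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def prepare_for_reporting(record, pseudonyms, broad_roles):
    """Replace direct and indirect identifiers in one record with generalized values.

    `pseudonyms` maps real names to pseudonyms; `broad_roles` maps
    specific occupations to broad role labels. Both are hypothetical
    structures used only for this example.
    """
    return {
        "name": pseudonyms[record["name"]],        # pseudonym, not real name
        "age_range": generalize_age(record["age"]),  # range, not exact age
        "role": broad_roles[record["role"]],       # broad role, not specific job
    }

record = {"name": "J. Smith", "age": 34, "role": "carpenter"}
report = prepare_for_reporting(
    record,
    pseudonyms={"J. Smith": "Participant 7"},
    broad_roles={"carpenter": "tradesperson"},
)
# report == {"name": "Participant 7", "age_range": "30-39", "role": "tradesperson"}
```

Generalizing each field independently like this is only a starting point; combinations of indirect identifiers can still narrow a group, so aggregate counts should also be checked for small cells.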


Use of Software/Applications with Sensitive Data

Email security@duke.edu with questions as early as possible when planning your IRB submission.

The IT Security Office (ITSO) and Privacy Office assess software usage on a case-by-case basis.

NOTE: Inform the IRB if the sensitive data involved is low risk (e.g., de-identified); this can expedite review.

Assessment & Review Options

Vendor Risk Assessment

Used when the software has not already been assessed.
Takes approximately one month.
Researchers should consult the Duke Services and Data Classification Guide to determine whether a vendor or tool has already been assessed and approved.

Minimal-Risk Software Review

Used if:

  • The vendor will not undergo a full risk assessment
  • The data involved is not “contractually restricted” or classified as “high-risk sensitive”

This streamlined review is conducted jointly by Duke’s Privacy Office and the Information Technology Security Office (ITSO).
It emphasizes the importance of transparent data practices, particularly the use of informed consent related to how data will be collected, stored, and managed. This process is intended for low-risk use cases where traditional vendor review is not feasible but privacy considerations still apply.


Use of AI Tools with Sensitive Data

When working with sensitive or protected data, researchers must exercise caution when using AI tools such as chatbots, large language models, or other AI-powered services. AI-generated outputs may retain or expose elements of the input data, and many AI platforms store user prompts for future model training, creating risk for reidentification or data leakage.

To mitigate these risks, Duke recommends using only Duke-managed machines and services when interacting with AI platforms and working with sensitive data for research purposes. The Duke AI Suite and AI Dashboard provide secure, institutionally supported environments for AI use that align with university policies.

Duke’s Data Classification Standard helps researchers identify what constitutes sensitive data. To ensure proper handling, researchers should also consult the Duke Services and Data Classification Guide, which outlines which Duke-approved tools and services are appropriate based on the classification of the data.

Do not input sensitive or identifiable information into public or non-Duke AI tools.

When in doubt, consult with the Information Technology Security Office (security@duke.edu) before using AI tools for research involving protected data.


Use of Personal Devices with Sensitive Data

Use of non-Duke-managed devices for analyzing sensitive data is not recommended and may be prohibited for contractually restricted data. Sensitive data should never be stored on personal devices.
Transfer collected or analyzed data to secure storage (e.g., Duke Box) as soon as possible.

If personal devices are used, researchers must attest to following the fundamental practices listed below:

The Fundamental Practices (Laptops & Tablets)

Follow these steps to ensure good cyber hygiene for personal devices:


The Expert Practices

After addressing the basics, further enhance your personal security:


Practices to Avoid

When working with research data, avoid these practices:

  • Forwarding your Duke email to a personal email account
  • Using a personal collaboration or storage suite (e.g., Dropbox) instead of a Duke service (e.g., Duke Box, Duke OneDrive)
  • Using a personal account on a development platform (e.g., GitHub) instead of Duke’s approved development tools (e.g., Gitlab.oit.duke.edu)

Mobile Phones

  • Use a passcode, Face ID, or fingerprint to restrict access
  • Turn on encryption
  • Regularly update the operating system (iOS or Android)
  • Use remote-wipe software such as Find My iPhone or Prey

Personal Device Security

For more details on how to properly secure personal devices used in research, please refer to the following guide:

Personal Device Security Guide (PDF)

This guide outlines essential best practices for endpoint security, including password management, encryption, software updates, and more.