
Announcements

DCC Winter Maintenance

Most researchers will not experience downtime, only a temporary reduction in capacity.

Maintenance window: January 16 – March 27, 2026

We will have rolling updates across the Duke Compute Cluster (DCC) throughout this period. Please plan for:

  • Reduced capacity: compute nodes will be drained in scheduled batches so we can patch, validate, and return them to service. Performance and queue availability may fluctuate while work is in progress.
  • Balanced lab impacts: for labs or research groups with dedicated nodes outside the common partition, we will distribute nodes across multiple batches so that your total capacity is never reduced by more than 50% at any time (with the exception of labs with only a single physical host).
  • Direct communication: affected lab groups will receive advance notice with batch schedules, expected downtime, and any required follow-up actions.

Planned work:

  • Hardware and operating system patches (Current: AlmaLinux 9.6, Planned: AlmaLinux 9.7)
  • SLURM upgrade (Current: 24.11.5, Planned: 25.11.1)
  • CUDA / NVIDIA driver updates (current: 12.9, planned: 13.x)
  • Conversion from virtual hosts to bare metal (see VMware retirement details below)

Estimated Impacted Nodes Per DCC Group

The table below summarizes the current planned impacts. Labs may appear in more than one date range, and node counts or timing may change as we rebalance the schedule.

DCC group Feb 20–27, 2026 (Complete) Feb 27–Mar 6, 2026 (Underway) Mar 6–13, 2026 (Planned) Mar 13–20, 2026 (Planned)
bartesaghilab2977
biostat10
caperlab1
carlsonlab1
cdsa1
chsi1
coganlab1
dhvi78
dhvi-md2
dhvi-md-cryo (CryoSPARC)1
dhvi-strucbio (CryoSPARC)286
dhvi-strucbio-relion23
dhvimdchsi2616
dhvimdcommon4058
dunsonlab1
econ4
engelhardlab1
fanglab1
fergusonlab2
goldberg1
gunschlab1
igvf21
katzlab1
lefkowitz (CryoSPARC)22
liulab2
mastatlab1
nsoeclimate1
ochoalab11
pcharbon1
peterchev1
pearsonlab1
physics1
plusds2
qcd1
rescomp77
schmidlab1
schmidler1
schultzlab1
tdunn21
viplab1
ultrasound2
valdivialab1
velmeshevlab1
volfovskylab1
wrapplab (CryoSPARC)1
yaolab1
youlab10
zjhuanglab1

What to Expect During Maintenance

  • Where possible, we will make only a subset of lab-owned nodes unavailable during each weekly maintenance cycle.
  • Running jobs on affected nodes may be automatically re-queued if they have not completed during the drain state.
  • Compute node names will change.
  • DCC compute nodes also have a /scratch volume that is local to each compute node. It can be used when highly performant storage is needed during a job, but data should be deleted when the job completes. Please note that any data stored on /scratch volumes will be permanently deleted during this maintenance.
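As a sketch of how node-local /scratch can be used safely, the hypothetical job script below stages input onto /scratch, runs against the fast local disk, copies results back, and cleans up. The /work paths and the myanalysis program are placeholders, not real DCC paths:

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --time=04:00:00

# Hypothetical paths: replace /work/mylab and myanalysis with your own.
SCRATCH_DIR="/scratch/${USER}/${SLURM_JOB_ID}"
mkdir -p "$SCRATCH_DIR"

cp /work/mylab/input.dat "$SCRATCH_DIR"/    # stage input onto fast local disk
cd "$SCRATCH_DIR"
myanalysis input.dat > results.out          # do I/O against local storage

cp results.out /work/mylab/                 # copy results back to shared storage
rm -rf "$SCRATCH_DIR"                       # clean up: /scratch is node-local and purged
```

The final rm -rf matters: anything left behind on /scratch will be deleted anyway, so cleaning up keeps the local disk free for the next job.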

CryoSPARC

  • Maintenance will run Friday, February 20, 2026, through Friday, March 13, 2026, and will occur on a rolling basis, with no individual VM down for the full three-week window.
  • Estimated downtime for each CryoSPARC server (VM) is approximately 2 business days.
  • VMs noted for maintenance in the table above will be unavailable during the maintenance window.
  • We will back up the database from the existing CryoSPARC application server and securely copy it off the system.
  • When the maintenance is complete, we will reinstall CryoSPARC on the new VM and restore the backed-up database.

GPU Servers

  • GPU servers currently presented as multiple separate virtual machines (one GPU per DCC node) will be migrated to a single bare-metal node that provides access to all GPUs on the server.
  • The underlying hardware remains the same, but users will need to explicitly request GPUs when submitting jobs rather than implicitly receiving an entire GPU VM.
  • To help users adjust to this change, Duke Research Computing will be offering a short workshop that walks through the updated GPU usage model and best practices: Intro to Efficient GPU Utilization on the DCC (dates are currently being scheduled).
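Under the bare-metal model a GPU must be requested explicitly. A minimal sketch using SLURM's standard GRES syntax (the job name and command are illustrative, and your partition or GPU type names may differ):

```shell
#!/bin/bash
#SBATCH --job-name=gpu-demo
#SBATCH --gres=gpu:1      # explicitly request one GPU (new requirement)
#SBATCH --time=02:00:00

# SLURM sets CUDA_VISIBLE_DEVICES to the GPU(s) allocated to this job,
# so your code only sees the GPUs it was granted.
echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
nvidia-smi
```

Requesting multiple GPUs follows the same pattern, e.g. --gres=gpu:2.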

If you or your lab have long-running jobs that cannot be re-queued, or require assistance with alternate capacity during these windows, please contact rescomputing@duke.edu as soon as possible so we can assist or adjust scheduling where feasible.

Thank you for your patience: we appreciate your partnership as we work to minimize impacts to ongoing research.

Research Computing VMware Retirement

As announced by Duke OIT (VMware Transition Announcement), OIT will be converting from VMware to Microsoft’s Hyper-V virtualization platform due to significant licensing cost changes with VMware.

Duke Research Computing has historically been a heavy user of VMware to support our research computational services. Planning and conversions are underway to transition our services away from VMware by April 2026.

Since Hyper-V is not considered a standard platform for virtualizing high-performance computing systems, we recommend that all high-performance computing use cases move to a bare-metal build for optimal performance. Hyper-V virtualization is the recommended solution for less computationally intensive workloads that may have higher uptime requirements.

Service | Transition Plan | Estimated Number of Virtual Machines
Duke Compute Cluster and other OIT supported HPC Clusters | Bare Metal | 1400
Rapid Virtual Machines | Hyper-V | 200
Protected Network for Research | Hyper-V | 130
Hosted Researcher-Owned Servers | Bare Metal | 200

Migrations will begin in October 2025 and continue in scheduled batches through the spring. Users will receive direct email notifications with details, timelines, and any scheduled downtime for their systems.

In addition to the planned downtime, some faculty lab groups who own servers hosted by Duke Research Computing may see impacts to their workflows. This most frequently affects GPU system owners who rely on VMware to partition their servers into multiple nodes for ease of use by their research teams. Labs with this configuration will be contacted directly to discuss transition options, including bare metal and Hyper-V.

We are committed to working closely with faculty and research groups to ensure a smooth transition and to minimize impacts to ongoing research.

For questions, please contact OITResearchSupport@duke.edu.

Maintenance Outage Window: June 23-25, 2025

Maintenance Complete 6/25/25 2pm

All maintenance is now complete. Please note two important DCC changes (details below):

  • /work file system has been updated to a new, more performant platform
  • SLURM now has a max wall time limit of 30 days and a default wall time of 1 day

/work File System Upgrade for DCC

During the outage, we replaced the Isilon-based /work file system with a new, faster file system from VAST Data, which will provide improved performance for I/O-intensive workloads. The previous /work was moved to /work-old, where it will remain accessible until 9/17/2025 (consistent with our existing purge policy), giving users time to retrieve any necessary files. Note: if you tested with /vwork, that is now the new /work.

DCC Slurm Updates

As part of the maintenance window, we updated the SLURM maximum job wall time to 30 days and implemented a default wall time of 1 day. This change was made to improve the efficiency and predictability of the cluster.

To specify your own wall time (longer or shorter), use:

sbatch --time=1-00:00:00 (job script)

(for 1 day), or

sbatch --time=30-00:00:00 (job script)

(for 30 days), or other time formats like 12:00:00 for 12 hours. Alternatively, you can add these directly to the job script, e.g.

#SBATCH --time=7-00:00:00

To see an estimate of when your job might start, use: squeue --start -j (job id)

To see the performance details of a completed job, use: seff (job id), e.g. seff 12345 (replace 12345 with your actual job ID).

You can get a list of job IDs for recently completed jobs by running sacct -u (NetID), or sacct -u (NetID) -S0601 for all jobs since the beginning of June, etc.
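Putting the directives together, a minimal job script with an explicit wall time might look like the sketch below (the job name is illustrative). Because #SBATCH directives are ordinary comments to bash, sbatch reads them while the script still runs as plain shell:

```shell
#!/bin/bash
#SBATCH --job-name=walltime-demo
#SBATCH --time=7-00:00:00     # 7-day wall time in SLURM's D-HH:MM:SS format

# The requested wall time, repeated here only so the script can report it.
requested="7-00:00:00"
echo "Requested wall time: ${requested}"
```

Jobs that omit --time now receive the 1-day default, so long-running jobs should set this explicitly.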

Hosted Systems Retirement Dates

We have projected the next round of researcher-owned systems to be retired in December 2025 and May 2026. As always, we do our best to minimize retirements; only systems we will no longer be able to support due to hardware or software constraints are included. Please check your Research Toolkits group to see which systems will be retired.

Maintenance Outage Window Announcement: June 23–25

From 12:01 AM on June 23 through end of day June 25, all OIT-provided research computing services will be unavailable. Affected services include:

  • All stand-alone research virtual machines (VMs) (usually research-*)
  • Research Toolkits VMs such as RAPID and SAFER (PN for Research)
  • All research VMs in protected enclaves (PN for Research, Protected Network)
  • PACE GPU VMs
  • Student GPU access through Container Manager
  • Duke Compute Cluster (DCC) and all associated resources
  • New: All research storage services (Globus, Data Attic, Datacommons, and Research Standard Storage)

As in prior maintenance windows, non-DCC services will be returned to service about halfway through the window.

/work File System Upgrade for DCC

During the outage, we will replace the current Isilon-based /work file system with a new, faster file system from VAST Data, which will provide improved performance for I/O-intensive workloads. Here’s what to expect:

  • The new VAST system is currently available for testing at /vwork.
  • During the maintenance window:
    • /vwork will be renamed to /work and become the new active workspace.
    • The existing /work will be moved to /old_work, where it will remain accessible for 75 days (consistent with our existing purge policy), giving users time to retrieve any necessary files.

We encourage you to test your workflows against /vwork in advance of the outage to ensure compatibility and performance expectations.
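As one quick way to sanity-check the new mount before the cutover, a hypothetical write/read smoke test (the file path and sizes are illustrative; this must be run from a DCC node where /vwork is mounted):

```shell
# Write and read back 1 GiB on /vwork to spot-check throughput (illustrative).
testfile="/vwork/${USER}/iotest.bin"
dd if=/dev/zero of="$testfile" bs=1M count=1024 oflag=direct   # sequential write
dd if="$testfile" of=/dev/null bs=1M iflag=direct              # sequential read
rm -f "$testfile"                                              # clean up the test file
```

A throughput check like this is no substitute for testing your actual workflow, but it will quickly surface mount or permission problems.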

DCC Slurm Updates

We will be tuning several SLURM parameters to enhance the performance of the DCC. Most notably, we will begin enforcing stricter job wall times. More details will be provided after the maintenance, but users should expect to edit their job submission scripts.

Hosted Systems Retirement Dates

We have projected the next round of researcher-owned systems to be retired in December 2025 and May 2026. As always, we do our best to minimize retirements; only systems we will no longer be able to support due to hardware or software constraints are included. Please check your Research Toolkits group to see which systems will be retired.

Completed: 1/6/25 to 1/8/25 Maintenance activities

The DCC was restored to service on 1/8/25 at 6 pm. DCC-specific notes:

  • The SLURM GRES GPU type names were updated for all GPUs (more info)

The following services were restored to service on 1/6/25 at 5 pm:

  • All research virtual machines (VMs), including Research Toolkits RAPID VMs
  • Research computing resources (cluster and individual virtual machines) in the PN for Research (PNR), PRDN, and Protected Network
  • PACE GPU VMs (OIT provided CPU and GPU Resources)
  • FAST Storage Resources (/cwork, CephFS, and Duke Data Attic)
  • All Research Computing standard and data commons storage – you can now access DCC via Globus

1/6/25 to 1/10/25 Maintenance Outage

There will be a complete outage of OIT provided Research Computing services from 1/6/25 to 1/10/25 for routine maintenance and patching.

Impacted services include:

  • Duke Compute Cluster, Open OnDemand, and associated services
  • All research virtual machines (VMs), including Research Toolkits RAPID VMs
  • Research computing resources (cluster and individual virtual machines) in the PN for Research (PNR), PRDN, and Protected Network
  • PACE GPU VMs (OIT provided CPU and GPU Resources)
  • FAST Storage Resources (/cwork, CephFS, and Duke Data Attic)

Additional changes:

  • End-of-support researcher-purchased hardware retirements will be completed before the maintenance window. Impacted owners can view the list of impacted hardware in Research Toolkits.
  • Datacommons storage archive volumes will no longer be mounted directly on DCC compute nodes; this helps optimize our storage platform. Impacted users will receive more detailed information.

A more detailed schedule of impacts and a reminder will be sent in December. Email OITResearchSupport@duke.edu with questions.