Announcements
Research Computing VMware Retirement
As announced by Duke OIT (VMware Transition Announcement), OIT will be converting from VMware to Microsoft’s Hyper-V virtualization platform due to significant licensing cost changes with VMware.
Duke Research Computing has historically been a heavy user of VMware to support our research computational services. Planning and conversions are underway to transition our services away from VMware by April 2026.
Because Hyper-V is not considered a standard platform for virtualizing high-performance computing systems, we recommend that all high-performance computing use cases move to a bare-metal build. Hyper-V virtualization is the recommended solution for less computationally intensive workloads that may have higher uptime requirements.
| Service | Transition Plan | Estimated Number of Virtual Machines |
|---|---|---|
| Duke Compute Cluster and other OIT supported HPC Clusters | Bare Metal | 1400 |
| Rapid Virtual Machines | Hyper-V | 200 |
| Protected Network for Research | Hyper-V | 130 |
| Hosted Researcher-Owned Servers | Bare Metal | 200 |
Migrations will begin in October 2025 and continue in scheduled batches through the spring. Users will receive direct email notifications with details and timelines for their systems and any scheduled downtime.
In addition to the planned downtime, some faculty lab groups who own servers hosted by Duke Research Computing may see impacts to their workflows. This most frequently affects GPU system owners who rely on VMware to partition their servers into multiple virtual nodes for ease of use by their research teams. Labs with this configuration will be contacted directly to discuss transition options, including bare metal and Hyper-V.
We are committed to working closely with faculty and research groups to ensure a smooth transition and to minimize impacts to ongoing research.
For questions, please contact OITResearchSupport@duke.edu.
Maintenance Outage Window: June 23-25, 2025
Maintenance Complete 6/25/25 2pm
All maintenance is now complete. Please note two important DCC changes (details below):
- /work file system has been updated to a new, more performant platform
- SLURM now has a max wall time limit of 30 days and a default wall time of 1 day
/work File System Upgrade for DCC
During the outage, we replaced the Isilon-based /work file system with a new, faster file system from VAST Data, which will provide improved performance for I/O-intensive workloads. The previous /work was moved to /work-old, where it will remain accessible until 9/17/2025 (consistent with our existing purge policy), giving users time to retrieve any necessary files. Note: if you tested with /vwork, that is the new /work.
DCC Slurm Updates
As part of the maintenance window, we updated the SLURM maximum job wall time to 30 days and implemented a default wall time of 1 day. This change was made to improve the efficiency and predictability of the cluster.
To specify your own wall time (longer or shorter), use:
sbatch --time=1-00:00:00 (job script)
(for 1 day), or
sbatch --time=30-00:00:00 (job script)
(for 30 days), or other time formats like 12:00:00 for 12 hours. Alternatively, you can add these directly to the job script, e.g.
#SBATCH --time=7-00:00:00
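For reference, a minimal batch script using this directive might look like the following sketch (the job name, memory, and task counts are illustrative, not site requirements):

```shell
#!/bin/bash
#SBATCH --job-name=example_job    # illustrative name
#SBATCH --time=7-00:00:00         # 7-day wall time (cluster max is 30-00:00:00)
#SBATCH --ntasks=1
#SBATCH --mem=4G                  # adjust to your workload

# The #SBATCH lines above are read by Slurm at submission time; the rest
# is an ordinary shell script that runs on the allocated node.
NODE=$(hostname)
echo "Job running on $NODE"
```

Submit with `sbatch example_job.sh`; options given on the sbatch command line override the in-script directives.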
To see when your job might start (estimate), use:
squeue --start -j (job id)
To see the performance details of a completed job, use:
seff (job id)
e.g. seff 12345 (replace 12345 with your actual job ID). You can get a list of job IDs for recently completed jobs by running the command
sacct -u (NetID)
or
sacct -u (NetID) -S0601
(all jobs from the beginning of June, etc.)
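The two commands can be combined; the following sketch (the `-S` date and options are illustrative) runs seff over each of your completed jobs:

```shell
#!/bin/sh
# Run seff on each of this user's jobs completed since June 1.
# A sketch: assumes the Slurm client tools (sacct, seff) are on PATH,
# as on a DCC login node; prints a notice elsewhere.
HAVE_SLURM=no
command -v sacct >/dev/null 2>&1 && HAVE_SLURM=yes

if [ "$HAVE_SLURM" = yes ]; then
    # -X limits output to allocations (no per-step rows); --noheader and
    # --format=JobID give a clean list of IDs to feed to seff.
    for jobid in $(sacct -u "$USER" -S0601 --state=COMPLETED -X --noheader --format=JobID); do
        seff "$jobid"
    done
else
    echo "Slurm tools not found; run this on a DCC login node."
fi
```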
Hosted Systems Retirement Dates
We have projected the next round of researcher-owned systems to be retired in December 2025 and May 2026. As always, we are doing our best to minimize retirements; only systems that we will not be able to support due to hardware or software constraints are included. Please check your Research Toolkits group to see which systems will be retired.
Maintenance Outage Window Announcement: June 23–25
From 12:01 AM on June 23 through end of day June 25, all OIT-provided research computing services will be unavailable. Affected services include:
- All stand-alone research virtual machines (VMs) (usually research-*)
- Research Toolkits VMs such as RAPID and SAFER (PN for Research)
- All research VMs in protected enclaves (PN for Research, Protected Network)
- PACE GPU VMs
- Student GPU access through Container Manager
- Duke Compute Cluster (DCC) and all associated resources
- New: All research storage services (Globus, Data Attic, Datacommons, and Research Standard Storage)
As in prior maintenance windows, non-DCC services will be returned to service about halfway through the window.
/work File System Upgrade for DCC
During the outage, we will replace the current Isilon-based /work file system with a new, faster file system from VAST Data, which will provide improved performance for I/O-intensive workloads. Here’s what to expect:
- The new VAST system is currently available for testing at /vwork.
- During the maintenance window:
  - /vwork will be renamed to /work and become the new active workspace.
  - The existing /work will be moved to /old_work, where it will remain accessible for 75 days (consistent with our existing purge policy), giving users time to retrieve any necessary files.
We encourage you to test your workflows against /vwork in advance of the outage to ensure compatibility and performance expectations.
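As a starting point, a simple smoke test along these lines can confirm that your account can read and write under the new file system (the path and sizes are illustrative; the script falls back to a temporary directory where /vwork is not mounted):

```shell
#!/bin/sh
# Minimal read/write smoke test for the /vwork scratch space.
WORKDIR="/vwork/$USER"
[ -d "$WORKDIR" ] || WORKDIR=$(mktemp -d)   # fall back when off-cluster
TESTFILE="$WORKDIR/vwork_io_test.$$"

# Write ~64 MiB and read it back; dd's final line reports the throughput.
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 2>&1 | tail -n 1
dd if="$TESTFILE" of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f "$TESTFILE"
echo "smoke test complete in $WORKDIR"
```

Real workflows should also be exercised end to end; a raw dd pass only checks basic connectivity and streaming throughput, not your application's I/O pattern.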
DCC Slurm Updates
We will be tuning several SLURM parameters to enhance the performance of the DCC. Most notably, we will begin enforcing stricter job wall times. More details will be provided after the maintenance, but users should expect to edit their job submission scripts.
Hosted Systems Retirement Dates
We have projected the next round of researcher-owned systems to be retired in December 2025 and May 2026. As always, we are doing our best to minimize retirements; only systems that we will not be able to support due to hardware or software constraints are included. Please check your Research Toolkits group to see which systems will be retired.
Completed: 1/6/25 to 1/8/25 Maintenance activities
The DCC was restored to service on 1/8/25 at 6 pm. DCC-specific notes:
- The SLURM GRES GPU type names were updated for all GPUs (more info)
The following services were returned to service on 1/6/25 at 5 pm:
- All research virtual machines (VMs), including Research Toolkits RAPID VMs
- Research computing resources (cluster and individual virtual machines) in the PN for Research (PNR), PRDN, and Protected Network
- PACE GPU VMs (OIT provided CPU and GPU Resources)
- FAST Storage Resources (/cwork, CephFS, and Duke Data Attic)
- All Research Computing standard and data commons storage; you can now access DCC via Globus
1/6/25 to 1/10/25 Maintenance Outage
There will be a complete outage of OIT provided Research Computing services from 1/6/25 to 1/10/25 for routine maintenance and patching.
Impacted services include:
- Duke Compute Cluster, Open OnDemand, and associated services
- All research virtual machines (VMs), including Research Toolkits RAPID VMs
- Research computing resources (cluster and individual virtual machines) in the PN for Research (PNR), PRDN, and Protected Network
- PACE GPU VMs (OIT provided CPU and GPU Resources)
- FAST Storage Resources (/cwork, CephFS, and Duke Data Attic)
Additional changes:
- End-of-support researcher-purchased hardware retirements will be completed before the maintenance window. Impacted owners can view the list of impacted hardware in Research Toolkits.
- Datacommons storage archive volumes will no longer be mounted directly on DCC compute nodes; this helps optimize our storage platform. Impacted users will receive a more detailed set of information.
A more detailed schedule of impacts and a reminder will be sent in December. Email OITResearchSupport@duke.edu with questions.