Data Attic Tutorial

Overview

This Data Attic tutorial guides you through:

  • Adding data to your attic
  • Sharing your attic

It is intended for people new to using Data Attics. It assumes that you have basic knowledge of:

  • Creating Projects in Research Toolkits

Before you start

Duke Data Attics can be created in either DCC Research Groups or Self-Service Projects in Research Toolkits. Faculty can create Self-Service Projects on their own. If you do not know how to do this, follow our Quickstart tutorials for creating a project and for Storage Creation, then come back here.

Part 1: Add Data to your Attic

To add data to your attic, you should use Globus, a data transfer service optimized for large transfers. Globus is the recommended method for transferring data to and from the DCC and can be used with any OIT-provided research storage. Connect to Globus via the web to access your OIT research storage from anywhere.

Step 1: Access Globus

Before transferring data to your data attic, log in to Globus using your Duke NetID. After you do this, you'll be able to access your Duke Data Attic directly from Research Toolkits.

Selecting the "Globus" button in your Research Toolkits project will take you to the Globus UI, which should show your data attic.

Step 2: Transfer Data

From here, you can upload your data in three ways: (1) direct upload, (2) transfer from another collection, or (3) Globus Flows.

Preparing data

Duke Data Attic is an archival service. We strongly recommend compressing data sets before moving them to your attic. Both storage space and file count are tracked and limited in data attics, so moving a large number of small files into your attic will exhaust your file utilization limit much faster than your storage space limit. If you're unsure how best to compress your data, please see the Globus Flows section, which describes the Package flow.
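If you would rather package a data set yourself before uploading, the following is a minimal Python sketch using only the standard library. It counts the files and bytes in a directory, then writes the directory into a single gzip-compressed tarball, so that the attic stores one file instead of many. The directory name `my_dataset` and the helper names `summarize` and `pack` are hypothetical, chosen for illustration; this is not the Package flow itself and does not produce its metadata.

```python
import os
import tarfile
from pathlib import Path


def summarize(directory: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under directory."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = Path(root) / name
            if path.is_file():
                file_count += 1
                total_bytes += path.stat().st_size
    return file_count, total_bytes


def pack(directory: Path, archive: Path) -> Path:
    """Write directory into a gzip-compressed tarball and return the archive path."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(directory, arcname=directory.name)
    return archive


if __name__ == "__main__":
    source = Path("my_dataset")  # hypothetical local directory
    if source.is_dir():
        count, size = summarize(source)
        print(f"{count} files, {size} bytes before packaging")
        # Write the archive next to the directory, not inside it.
        print(f"wrote {pack(source, Path('my_dataset.tar.gz'))}")
```

The resulting `my_dataset.tar.gz` counts as a single file against the attic's file limit, however many files the original directory held.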

Direct upload

This is the simplest method for transferring data into your data attic from your local machine. We recommend this for single, small items.

  • Go to your project in Research Toolkits and select the "Globus" button to access your data attic in Globus.
  • Look for the "Upload" icon. Select this, then navigate to wherever your data is stored on your local machine. Select it and start the upload.

Globus transfer

You can use Globus to transfer data from your local machine, Duke Box, or the DCC to your data attic. One of the easiest ways to transfer data is to select the two-panel Globus UI so that you can see a source and destination for your transfer.

  • Go to your project in Research Toolkits and select the "Globus" button to access your data attic in Globus
  • Look to the right of the screen and select the two-panel Globus UI (if not automatically selected)
  • This will allow you to see both the source and destination of your data
  • Since you're transferring data into your data attic, one panel will contain the collection of "Duke Data Attic Collection" with the path specifying your data attic; this will be your destination
  • To select your source, click on the "Search" option in the other panel.
  • You can transfer data from the Duke DCC, your Duke Box, or a personal collection you set up when you installed Globus Connect Personal; simply search for "Duke DCC", "Duke Box", or the name of the collection you created
    • You can also select one of the buttons below the "Globus" link in your Research Toolkits project if you know you want to copy data from a particular source; for instance, if you wanted to copy data from your Duke Box to your data attic, simply click on the "Box to/from Attic" button
  • From here, you can select your data to transfer. Once done, simply select the "Start" button and Globus will begin copying the selected data into your data attic

Globus Flows

Globus Flows are like pre-written scripts for transferring data. If you are not comfortable with compressing data before moving it to your data attic, you can use the "Package to Attic" flow to compress and copy your data. This flow will create an enhanced tarball with metadata for your compressed data.

  • From Research Toolkits, select the "Globus Flows" button on the data attic service
  • Here, you'll see three flows: Move, Copy, and Package.
    • Move will copy your data from your source location, confirm that it has all been copied to the attic, then delete it from the source
    • Copy will copy your data from your source location to the attic without deleting anything from the source
    • Package will create a tarball of your selected data and copy that to your data attic
  • As an example, select the "Package to Attic" flow
  • You'll see a template for initiating this flow, with the following fields:
    • Source: Here, you can select the data that you want to compress
    • Attic Name: Copy the name of your attic; you can find this on your Research Toolkits project page
    • Suitcase Name: The suitcase is the enhanced tarball; enter the name you want the tarball to have
    • Tags: This is optional, but it will help you identify this particular flow in the Globus "Activity" tab
    • Label: This is a required name for this flow; it will help you identify this action in the Globus "Flows" tab
  • Click "Start Run" and this will begin the flow. Once finished, the suitcase will be placed in your data attic, along with the suitcase's metadata.
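The difference between the Move and Copy flows can be illustrated with a small local-filesystem sketch in Python. This is only an analogy under stated assumptions: it works on local files, uses a SHA-256 comparison as the "confirm" step, and the function names `copy_file` and `move_file` are hypothetical. How the Globus flows verify a transfer internally is not described in this tutorial.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_file(source: Path, destination: Path) -> None:
    """Copy semantics: the source is left untouched."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)


def move_file(source: Path, destination: Path) -> None:
    """Move semantics: copy, confirm the copy matches, then delete the source."""
    copy_file(source, destination)
    if sha256(source) != sha256(destination):
        # Keep the source if the copy could not be confirmed.
        raise IOError(f"verification failed for {source}; source kept")
    source.unlink()
```

The point of the ordering in `move_file` is that the source is deleted only after the copy has been confirmed, which mirrors the Move flow's described behavior of copying, confirming, then deleting.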

Part 2: Share your Attic

Globus makes it easy to work with collaborators from other institutions. When you create a data attic, you will have the option to enable sharing. You must attest that sharing the data adheres to all policies and protocols governing the data set. If you choose not to enable sharing when you create the attic, you can change this decision later by going to your Research Toolkits project and selecting the Globus sharing button.

This will prompt you for an attestation.

Agree to this and update the data attic. You should now see a "Shareable Collection Link". If you do not, refresh the webpage. Click that link to go to the shareable collection on Globus.

Here, you can specify the permissions of the shareable data set. Click on "Permissions" to view the permissions granted on the data attic collection.

Click "Add Permissions - Share With" to share the data collection with new users. From here, you can specify what content to share, with whom, and what level of permissions they should have.

Once you have your options selected, simply click "Add Permission" and the user will now have access to your shared data collection. That user will receive an email specifying their permissions and a link to the data collection.

Now that you've completed this quickstart, try these to learn more about data transfers via Globus.