Archive Engram data to AWS Glacier Deep Archive

- If you wish to archive data to AWS in a self-service manner, we recommend that you request a Cortex virtual machine that you will use to complete the transfer to Glacier.
- Your lab must already have a Columbia-provisioned AWS account. For assistance in requesting an account, please contact ITS.
- Request Administrative access to your lab's AWS account if you need to become an administrator.
- Log in to the AWS Web Console
- Create an S3 bucket for Glacier Deep Archive in the console using the following steps.
  1. Type S3 in the top left search box and select S3 from the list of options
  2. Select Create bucket to start the bucket configuration
  3. The initial configuration should look similar to the following:
    
    Bucket Versioning is disabled by default, but some use cases may call for enabling it. When it is enabled, every changed file is kept as an additional version, which takes up extra storage space. For example, if 10% of the data is changed once, both versions of that data are stored, which incurs roughly 10% additional storage cost.
  4. Go to the Bucket from the list of buckets once it is created.
  5. Create a folder within the bucket by selecting “Create folder” in the top right section of the bucket page. Name it using the convention labname_archive.
  6. Select the folder, open the Actions dropdown in the top right corner, and select Edit storage class.
  7. Change the storage class from Standard to Glacier Deep Archive.
- To complete the transfer, you will need to use Identity and Access Management (IAM) to grant programmatic access. You will be given an AWS access key ID and a secret access key in a CSV file, which you will later use to access AWS from the command line interface.
  1. Once you're logged in to your AWS account, navigate to IAM using the search box in the top left.
  2. In the left pane, select Users > Create user and proceed to create a new user.
  3. Attach the policy directly to the user being created. In this case, the policy is AmazonS3FullAccess.
  4. Click Next and then Create user.
  5. Navigate to the Users tab, where you will see the user that you created. Select the user by double-clicking on the user name.
  6. Navigate to the Security credentials tab and select Create access key.
  7. Select Command Line Interface (CLI) as the use case.
    The access key will be created, and you will be able to download it as a CSV file. Do not share this key with anyone.
- Once the files are identified for Glacier archiving, mount the Engram share from a Linux terminal (preferably a virtual Ubuntu machine hosted in our data center).
  - Ubuntu
    If cifs-utils is not installed, install it using the following command:
      $ sudo apt-get install cifs-utils
    Create a local mount point. For example:
      $ mkdir ~/engram
    Mount the share using the following command:
      $ sudo /sbin/mount.cifs --verbose -o vers=2.1,user=UNI,domain=adcu.columbia.edu,uid=$(id -u),forceuid,gid=$(id -g),forcegid,file_mode=0755,dir_mode=0755,rw,noacl //TIERNAME-smb.engram.rc.zi.columbia.edu/LABNAME-TIERNAME /home/$(id -un)/engram
    Replace the uppercase placeholder strings with real values, as follows:
      - UNI: your UNI
      - TIERNAME: locker, labshare or staging
      - LABNAME: name of your lab
  - To install the AWS CLI, run the following commands.
    $ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    $ unzip awscliv2.zip
    $ sudo ./aws/install
    Confirm the installation with the following command:
    $ aws --version
  - Before running the sync, run aws configure in the terminal and provide the AWS Access Key ID and the Secret Access Key. You will also be asked for the default region, which will be us-east-1, and the default output format, which can be json.
    $ aws configure
    AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
    AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    Default region name [None]: us-east-1
    Default output format [None]: json
  - Create a directory called archive_logs
  - Install tmux using the following commands:
  - Navigate to the directory that will be archived and type the command pwd. This prints the path of the working directory to be archived.
  - Copy and paste the path into the aws s3 sync command, along with the S3 location where you would like to save the backup. Note that you can create the destination folder from the command instead of creating it in the console beforehand. The folder being uploaded in this case is archive data.
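
The last few steps (log directory, sync command, log capture) can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact commands from the original page: the bucket name, folder prefix and source path are hypothetical placeholders, and the --storage-class DEEP_ARCHIVE and --no-progress options are additions that the steps above do not mention. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch only. BUCKET, PREFIX and SRC_DIR are hypothetical placeholders;
# replace them with your bucket, folder and the path printed by pwd.
BUCKET="my-lab-bucket"
PREFIX="labname_archive/archive_data"
SRC_DIR="$HOME/engram/archive_data"
LOG_DIR="${LOG_DIR:-$HOME/archive_logs}"
LOG_FILE="$LOG_DIR/sync_$(date +%Y%m%d_%H%M%S).log"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

# Create the log directory described in the steps above.
mkdir -p "$LOG_DIR"

# Assumption: --storage-class DEEP_ARCHIVE writes new objects straight to
# Glacier Deep Archive; --no-progress keeps progress lines out of the log.
run aws s3 sync "$SRC_DIR" "s3://$BUCKET/$PREFIX" \
    --storage-class DEEP_ARCHIVE --no-progress 2>&1 | tee "$LOG_FILE"
```

To run the transfer for real, set DRY_RUN=0. Starting the script inside a tmux session (for example tmux new -s archive, later re-attached with tmux attach -t archive) lets a long transfer survive a dropped SSH connection. On Ubuntu, tmux is typically installed with sudo apt-get install tmux.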
Check logs to ensure that the files are uploaded correctly.
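
One possible way to check a log is to count successful and failed transfers. The sketch below assumes the log contains aws s3 sync output in which successful transfers include the text "upload: " and failures include "upload failed: "; the function name summarize_log and the demo file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the log holds aws s3 sync output, where successful
# transfers contain "upload: " and failures contain "upload failed: ".
summarize_log() {
    log_file="$1"
    uploaded=$(grep -c 'upload: ' "$log_file")
    failed=$(grep -c 'upload failed: ' "$log_file")
    echo "uploaded=$uploaded failed=$failed"
    # Return a non-zero status when any upload failed.
    [ "$failed" -eq 0 ]
}

# Demonstration on a small made-up log (hypothetical file names).
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
upload: ./a.tif to s3://my-lab-bucket/labname_archive/archive_data/a.tif
upload: ./b.tif to s3://my-lab-bucket/labname_archive/archive_data/b.tif
upload failed: ./c.tif to s3://my-lab-bucket/labname_archive/archive_data/c.tif An error occurred
EOF
summarize_log "$demo_log" || echo "Some uploads failed; re-run the sync to retry them."
rm -f "$demo_log"
```

Because aws s3 sync only transfers files that are missing or changed at the destination, re-running the same sync command is a reasonable way to retry failed uploads.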

Zuckerman Institute
