Archive Engram Data to AWS Glacier Deep Archive
If you wish to archive data to AWS in a self-service manner, we recommend that you request a Cortex virtual machine that you will use to complete the transfer to Glacier.
Your lab must already have a Columbia-provisioned AWS account. For assistance in requesting an account, please contact ITS.
Request Administrative access to your lab's AWS account if you need to become an administrator.
Log in to the AWS Web Console
Create an S3 bucket for deep archive storage in the console using the following steps.
Type S3 in the top left search box and select S3 from the list of options.
Select Create bucket to start the bucket configuration.
The initial configuration should have a similar setup to the following:
Some use cases may call for Bucket Versioning, which is disabled by default. If it is enabled, every changed file keeps its earlier versions, and those versions take up additional storage space. For example, if 10% of the data is changed once, both versions of that data are stored, which incurs roughly 10% additional storage cost.
Once the bucket is created, open it from the list of buckets.
Create a folder within the bucket by selecting "Create folder" in the top right section of the bucket, using the naming convention labname_archive.
Select the folder, click the Actions dropdown in the top right corner, and select Edit storage class.
Change the storage class from Standard to Glacier Deep Archive.
To complete the transfer, you will need to use Identity and Access Management (IAM) to grant programmatic access. You will be given an AWS access key ID and a secret access key in a CSV file, which you will later use to access AWS via the command line interface.
Once you're logged in to your AWS account, navigate to IAM using the search option in the top left.
In the left pane, select the Users > Create user option and proceed to create a new user.
Attach the policy directly to the user being created. In this case, the policy is AmazonS3FullAccess.
Click Next and then create the user.
Navigate to the Users tab, where you will see the user that you created. Select the user by clicking the user name.
Navigate to the Security credentials tab and select Create access key.
Select Command Line Interface (CLI) as the use case.
The access key will be created, and you will be able to download it as a CSV file. Do not share this key with anyone.
Once the files are identified for Glacier archiving, mount the Engram share from a Linux terminal (preferably a virtual Ubuntu machine hosted in our data center).
Ubuntu
If cifs-utils is not installed, install it using the following command:
$ sudo apt-get install cifs-utils
Create a local mount point. For example:
$ mkdir ~/engram
Mount the share using the following command:
$ sudo /sbin/mount.cifs --verbose -o vers=2.1,user=UNI,domain=adcu.columbia.edu,uid=$(id -u),forceuid,gid=$(id -g),forcegid,file_mode=0755,dir_mode=0755,rw,noacl //TIERNAME-smb.engram.rc.zi.columbia.edu/LABNAME-TIERNAME /home/$(id -un)/engram
Replace the uppercase placeholder strings with real values, as follows:
UNI - your UNI
TIERNAME - locker, labshare or staging
LABNAME - name of your lab
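Before starting a transfer, it can help to confirm that the share is actually mounted. The following is a minimal sketch and not part of the original procedure: engram_is_mounted is a hypothetical helper name, and it assumes a Linux system where /proc/mounts is readable, that $HOME resolves to /home/$(id -un), and that the mount point is given without a trailing slash.

```shell
# Hypothetical helper: succeeds if the given directory is an active mount point.
# Defaults to the ~/engram mount point used in the mount command.
engram_is_mounted() {
  local mount_point="${1:-$HOME/engram}"
  # Each line of /proc/mounts is: device, mount point, filesystem type, options, ...
  awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

# Example usage:
#   engram_is_mounted && echo "Engram share is mounted" || echo "Engram share is NOT mounted"
```

If the check fails, rerun the mount command with --verbose and review its output before syncing, since syncing an empty mount point would upload nothing.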
To install the AWS CLI, run the following commands.
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
Confirm the installation with the following command:
$ aws --version
To complete the sync, run aws configure in the terminal and provide the AWS Access Key ID and the Secret Access Key. You will also be asked for the default region, which will be us-east-1, and the default output format, which can be json.
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json
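Once the credentials are configured, a quick check can confirm that the key works and that the bucket is reachable. This is a sketch under assumptions: check_aws_access is a hypothetical helper name and BUCKETNAME is a placeholder for your bucket. The two AWS CLI commands it runs, aws sts get-caller-identity and aws s3 ls, are standard CLI commands.

```shell
# Hypothetical helper: confirms the configured credentials work and the bucket is reachable.
check_aws_access() {
  local bucket="$1"   # bucket name only, without the s3:// prefix
  # Prints the account and IAM user that the configured access key belongs to.
  aws sts get-caller-identity || return 1
  # Lists the top level of the bucket; fails if the key cannot access it.
  aws s3 ls "s3://${bucket}/" || return 1
}

# Example usage (BUCKETNAME is a placeholder for your bucket):
#   check_aws_access BUCKETNAME
```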
Create a directory called archive_logs
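The command for this step is not shown. One way to do it, assuming the directory should live in your home directory (the location is an assumption, not stated in the original):

```shell
# Create the log directory; -p avoids an error if it already exists.
mkdir -p "$HOME/archive_logs"
```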
Install tmux using the following commands:
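The commands for this step are not shown. A minimal sketch for Ubuntu follows, assuming apt-get is the package manager (as with cifs-utils). The helper name start_archive_session and the session name archive are hypothetical choices. Running the transfer inside tmux lets it continue if your SSH connection drops.

```shell
# Hypothetical helper: installs tmux if it is missing, then opens a named session.
start_archive_session() {
  local session="${1:-archive}"   # session name is an arbitrary choice
  if ! command -v tmux >/dev/null 2>&1; then
    sudo apt-get update && sudo apt-get install -y tmux || return 1
  fi
  # -A attaches to the session if it already exists instead of failing.
  tmux new-session -A -s "$session"
}

# Example usage:
#   start_archive_session
# Detach with Ctrl-b d; the session keeps running and can be rejoined
# later with: tmux attach -t archive
```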
Navigate to the directory that will be archived and type the command pwd. This will print the path of the working directory to be archived.
Copy and paste the path into the aws s3 sync command, along with the S3 location where you would like to save the backup. Note that you can create the destination folder from the command instead of creating it in the console beforehand. The folder being uploaded in this case is archive data.
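The sync command itself is not shown, so the following is a sketch under assumptions rather than the original command. archive_sync is a hypothetical helper name, and the bucket name and destination prefix in the usage line are placeholders. Because S3 sets the storage class per object, the sketch passes --storage-class DEEP_ARCHIVE so that each uploaded file is stored directly in Glacier Deep Archive. Output goes to a timestamped file in the archive_logs directory.

```shell
# Hypothetical helper: syncs a local directory to S3 as Glacier Deep Archive
# objects and writes the command output to a timestamped log file.
archive_sync() {
  local src="$1"      # local path, as printed by pwd
  local dest="$2"     # S3 destination; the prefix is created on upload if missing
  local log_dir="${3:-$HOME/archive_logs}"
  local log_file
  log_file="$log_dir/sync_$(date +%Y%m%d_%H%M%S).log"
  mkdir -p "$log_dir" || return 1
  echo "Logging to $log_file"
  # --storage-class DEEP_ARCHIVE stores each uploaded object in Glacier Deep Archive.
  # --no-progress keeps transient progress lines out of the log.
  aws s3 sync "$src" "$dest" --storage-class DEEP_ARCHIVE --no-progress >"$log_file" 2>&1
}

# Example usage (BUCKETNAME and the prefix are placeholders):
#   archive_sync "$(pwd)" s3://BUCKETNAME/labname_archive/archive_data/
```

Running this inside a tmux session allows you to detach and follow progress later with tail -f on the log file.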
Check the logs to ensure that the files were uploaded correctly.
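A log check can be sketched as follows. It assumes the log holds the plain output of aws s3 sync, where each transferred file appears on a line containing "upload: " and each failure on a line containing "upload failed: ". summarize_sync_log is a hypothetical helper name.

```shell
# Hypothetical helper: counts successful and failed uploads in a sync log.
# Returns non-zero if the log is unreadable or any upload failed.
summarize_sync_log() {
  local log_file="$1"
  local uploaded failed
  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 2; }
  # grep -c prints the number of matching lines; "|| true" tolerates a count of zero.
  uploaded=$(grep -c 'upload: ' "$log_file" || true)
  failed=$(grep -c 'upload failed: ' "$log_file" || true)
  echo "uploaded=$uploaded failed=$failed"
  [ "$failed" -eq 0 ]
}

# Example usage (the log file name is a placeholder):
#   summarize_sync_log ~/archive_logs/LOGFILE.log || echo "Some uploads failed; rerun the sync"
```

Rerunning aws s3 sync with the same source and destination transfers only the files that are missing or changed, so a failed upload can usually be retried by repeating the command.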