File: AWS Archiving Documentation.docx

  • If you wish to archive data to AWS in a self-service manner, we recommend that you request a Cortex virtual machine that you will use to complete the transfer to Glacier.

  • Your lab must already have a Columbia-provisioned AWS account. For assistance in requesting an account, please contact ITS.

  • Request Administrative access to your lab's AWS account if you need to become an administrator.

  • Log in to the AWS Web Console.

  • Create an S3 bucket for Glacier Deep Archive storage in the console using the following steps.

    1. Type S3 in the top-left search box and select S3 from the list of options.

      [Image: image-20250215-005320.png]
    2. Select Create bucket to start the bucket configuration.

    3. The initial configuration should be similar to the following:

      [Image: image-20250215-005518.png]

      Some use cases may call for enabling Bucket Versioning, which is disabled by default. If it is enabled, each change to a file stores a new version alongside the old one, and every version takes up storage space. E.g., if 10% of the data is changed once, both versions of that data are kept, which incurs roughly 10% additional storage cost.

      [Image: image-20250215-005715.png]
    4. Once the bucket is created, open it from the list of buckets.

      [Image: image-20250215-005741.png]

    5. Create a folder within the bucket by selecting “Create folder” in the top-right section of the bucket page. Use the naming convention labname_archive.

      [Image: image-20250215-005907.png]

    6. Select the folder, click the Actions dropdown in the top-right corner, and select Edit storage class.

      [Image: image-20250215-005940.png]

    7. Change the storage class from Standard to Glacier Deep Archive.

      [Image: image-20250215-010054.png]
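As a rough illustration of the Bucket Versioning note in step 3, the shell sketch below works through the arithmetic. The 50,000 GB archive size is a made-up figure, and the calculation assumes that each changed file is stored once more at full size.

```shell
#!/usr/bin/env bash
# Estimate total stored size when Bucket Versioning keeps one extra copy
# of the data that changed.
# Arguments: archive size in GB, percentage of the data changed once.
versioned_size_gb() {
    local size_gb=$1 changed_pct=$2
    # Integer arithmetic: original size plus one extra copy of the changed part.
    echo $(( size_gb + size_gb * changed_pct / 100 ))
}

# Hypothetical example: a 50,000 GB archive in which 10% of the data changes once.
versioned_size_gb 50000 10   # prints 55000
```

The extra 5,000 GB is the 10% additional storage described above; it is billed for as long as the older versions are retained.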

  • To complete the transfer, you will need to use Identity and Access Management (IAM) to grant programmatic access. You will be given an AWS Access Key ID and a Secret Access Key in a CSV file, which you will later use to access AWS via the command line interface.

    1. Once you're logged in to your AWS account, navigate to IAM using the search box in the top left.

      [Image: image-20250215-010203.png]

    2. In the left pane, select Users > Create user and proceed to create a new user.

    3. Attach the policy directly to the user being created. In this case, the policy is AmazonS3FullAccess.

      [Image: image-20250215-010231.png]

    4. Click Next, then Create user.

    5. Navigate to the Users tab, where you will see the user that you created. Select the user by double-clicking the user name.

      [Image: image-20250215-010254.png]
    6. Navigate to the Security credentials tab and select Create access key.

    7. Select Command Line Interface (CLI) as the use case.

      [Image: image-20250215-010350.png]

      The access key will be created, and you will be able to download it as a CSV file. Do not share this key with anyone.

      [Image: image-20250215-010456.png]
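Because the downloaded CSV contains the secret key in plain text, it may be worth restricting its file permissions once it is on the Linux machine. The following is a minimal sketch; the function name and the example file name are hypothetical and not part of the original procedure.

```shell
#!/usr/bin/env bash
# Restrict a downloaded access-key CSV so that only your own user can read it.
protect_key_file() {
    local csv_file=$1
    # Stop early if the path does not point to a regular file.
    [ -f "$csv_file" ] || { echo "not found: $csv_file" >&2; return 1; }
    # 600 = read/write for the owner, no access for group or others.
    chmod 600 "$csv_file"
}

# Usage (the file name is hypothetical; use the name of the CSV you downloaded):
#   protect_key_file ~/Downloads/labname_accessKeys.csv
```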

  • Once the files are identified for Glacier archiving, mount the Engram share from a Linux terminal (preferably a virtual Ubuntu machine hosted in our data center).

      • Ubuntu

        Code Block
        # If cifs-utils is not installed, install it using the following command:
        sudo apt-get install cifs-utils
        # Create a local mount point. For example:
        mkdir ~/engram
        # Mount the share using the following command:
        sudo /sbin/mount.cifs --verbose -o vers=2.1,user=UNI,domain=adcu.columbia.edu,uid=$(id -u),forceuid,gid=$(id -g),forcegid,file_mode=0755,dir_mode=0755,rw,noacl //TIERNAME-smb.engram.rc.zi.columbia.edu/LABNAME-TIERNAME /home/$(id -un)/engram
        # Replace the uppercase placeholders with real values, as follows:
        #   UNI - your UNI
        #   TIERNAME - locker, labshare or staging
        #   LABNAME - name of your lab
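To make the placeholder substitution concrete, the sketch below builds the share path from a tier name and a lab name. The function name and the lab name "examplelab" are hypothetical; the host name pattern and the three tier names are taken from the mount command above.

```shell
#!/usr/bin/env bash
# Build the Engram SMB share path from a tier name and a lab name.
engram_share_path() {
    local tier=$1 lab=$2
    # Accept only the tier names listed in the instructions.
    case "$tier" in
        locker|labshare|staging) ;;
        *) echo "unknown tier: $tier" >&2; return 1 ;;
    esac
    echo "//${tier}-smb.engram.rc.zi.columbia.edu/${lab}-${tier}"
}

# Hypothetical lab "examplelab" on the locker tier.
engram_share_path locker examplelab
# prints //locker-smb.engram.rc.zi.columbia.edu/examplelab-locker
```

The printed path is the value that replaces //TIERNAME-smb.engram.rc.zi.columbia.edu/LABNAME-TIERNAME in the mount.cifs command.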
      • To install the AWS CLI, run the following commands.

        Code Block
        curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
        unzip awscliv2.zip
        sudo ./aws/install
        # Confirm the installation with the following command.
        aws --version
      • To complete the sync, run aws configure in the terminal and provide the AWS Access Key ID and the Secret Access Key. You will also be asked for the default region, which will be us-east-1, and the default output format, which can be json.

        Code Block
        aws configure
        AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
        AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
        Default region name [None]: us-east-1
        Default output format [None]: json
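One way to check that the keys entered above are accepted is to ask AWS which account they belong to. The following is a minimal sketch rather than part of the original procedure; the function name is hypothetical, while aws sts get-caller-identity is a standard AWS CLI command.

```shell
#!/usr/bin/env bash
# Print the AWS account ID that the configured credentials belong to.
# Returns a non-zero exit status if the credentials are missing or rejected.
show_configured_account() {
    aws sts get-caller-identity --query Account --output text
}

# Usage, after running `aws configure`:
#   show_configured_account
```

If the command prints your lab's account ID, the access key and secret key were entered correctly.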
      • Create a directory called archive_logs to hold the transfer logs.

      • Install tmux using the following commands:

        Code Block
        sudo apt update
        sudo apt install tmux
        # Verify that tmux has been installed by running
        which tmux
        # To start a new session, run the tmux command
        tmux
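For a long transfer it can help to give the tmux session a name so that it is easy to return to after a disconnect. A minimal sketch follows; the function name and the session name "archive" are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Attach to the named tmux session if it already exists, otherwise create it.
# The -A flag of new-session provides this attach-or-create behavior.
open_archive_session() {
    local name=${1:-archive}
    tmux new-session -A -s "$name"
}

# Usage:
#   open_archive_session            # uses the session name "archive"
#   open_archive_session mylab      # hypothetical custom session name
```

With the default key bindings, pressing Ctrl-b and then d detaches from the session while anything running inside it continues; running the function again reattaches.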
      • Navigate to the directory that will be archived and type the command pwd. This will print the path of the working directory to be archived.

      • Copy and paste the path into the aws s3 sync command, along with the S3 location where you would like to save the backup. Note that you can create the destination folder from the command instead of creating it in the console beforehand. In this example, the folder being uploaded contains the archive data.

        Code Block
        sudo aws s3 sync "/ifs/locker-smb/shohamy/" s3://shohamy-engram-smb-locker-backup/Engram-smb-locker-backup/ --storage-class DEEP_ARCHIVE >> >(tee -a archive_logs/20250121_shohamyLabArchive.log) 2>> archive_logs/20250121_shohamyLabArchive.log
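Before starting a large transfer, it may be useful to preview what would be uploaded and to generate the date-stamped log name automatically. The sketch below assumes the log naming pattern used in the command above; the function names and the lab name "examplelab" are hypothetical, and --dryrun is the AWS CLI option that lists the operations without performing them.

```shell
#!/usr/bin/env bash
# Build a date-stamped log path such as archive_logs/20250121_examplelabLabArchive.log.
archive_log_path() {
    local lab=$1
    echo "archive_logs/$(date +%Y%m%d)_${lab}LabArchive.log"
}

# List what a sync would upload, without transferring anything.
# Arguments: local source directory, destination S3 URI.
preview_sync() {
    local source_dir=$1 dest_uri=$2
    aws s3 sync "$source_dir" "$dest_uri" --storage-class DEEP_ARCHIVE --dryrun
}

# Usage (paths and bucket names are placeholders):
#   preview_sync "/path/to/data/" s3://your-bucket/your-folder/
#   archive_log_path examplelab
```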
  • Check the logs to ensure that the files were uploaded correctly.
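A quick way to review a large log is to count the successful and failed uploads. The sketch below assumes that the AWS CLI reports each result on a line beginning with "upload: " or "upload failed: ", which matches its usual output but should be checked against your own log; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Count successful and failed uploads recorded in an aws s3 sync log.
summarize_sync_log() {
    local log_file=$1
    local ok failed
    # Progress updates are written with carriage returns, so convert them to
    # newlines first; otherwise a result can be hidden in the middle of a line.
    ok=$(tr '\r' '\n' < "$log_file" | grep -c '^upload: ' || true)
    failed=$(tr '\r' '\n' < "$log_file" | grep -c '^upload failed: ' || true)
    echo "uploaded=${ok} failed=${failed}"
}

# Usage (log name taken from the sync command above):
#   summarize_sync_log archive_logs/20250121_shohamyLabArchive.log
```

A non-zero failed count means that some files need to be retried; running the same aws s3 sync command again uploads only the files that are missing or different at the destination.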