TL;DR

The main takeaway: if your research data consists of many millions of small files spread across millions of directories, backing it up to a cloud provider is likely to be both expensive and complex. You may be better off keeping your data on Engram and letting Research Computing handle the backups.

All-in-one providers

Backblaze, SpiderOak, MEGA, and CrashPlan all offer backup services designed for a single machine or a small group of machines, rather than for multiple terabytes of scientific data. Each bundles the backup software and the cloud storage into a single fee. Backblaze supports only Windows and Mac, while SpiderOak also supports Linux. All of them list prices on their websites. Read the fine print: none is designed to handle the needs of a lab, and while they may be able to accommodate you, pricing will most likely be higher.

If you don't use an all-in-one provider, then you will need to choose both backup software and a destination for your data. Here are some options.

Some Cloud Storage Provider Options and Pricing as of 11/2020. Trade-offs will be in terms of storage cost, API cost, and restore speed. While "archive" tier storage is listed, we do not suggest using it for backups, primarily because of API-related costs.

Provider | Storage Cost per GB | API Requests Cost | Restore Transfer Cost per GB
AWS S3 | $0.023 | $0.0004 - $0.005 per 10,000 depending on query type | $0.01
AWS Glacier | $0.004 | $0.0004 - $0.005 per 10,000 depending on query type | $0.01
AWS Glacier Deep Archive | $0.00099 | $0.0004 - $0.005 per 10,000 depending on query type | $0.02
Google Cloud Nearline Storage | $0.01 | $0.01 - $0.10 per 10,000 depending on query type | $0.01
Google Cloud Coldline Storage | $0.004 | $0.05 - $0.10 per 10,000 depending on query type | $0.01
Google Cloud Archive Storage | $0.0012 | $0.50 per 10,000 | $0.01
Azure Cool | $0.01 | $0.004 - $0.10 per 10,000 depending on query type | $0.01
Azure Archive | $0.00099 | $0.004 - $0.10 per 10,000 depending on query type; reads are $5 per 10,000 | $0.02
Backblaze B2 | $0.005 | $0.004 per 1,000 or $0.004 per 10,000 depending on query type | $0.01
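To see how these three cost components interact for a dataset with many small files, here is a rough Python sketch. The rates are copied from the table above (using the upper end of each API range). The dataset size and file count are hypothetical. It also assumes one API request per uploaded file and treats the storage rate as a monthly charge, neither of which is stated in the table. Real backup tools may issue more requests per file.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    storage_per_gb: float   # storage cost per GB, from the table (assumed monthly)
    api_per_10k: float      # cost per 10,000 API requests (upper end of listed range)
    restore_per_gb: float   # restore transfer cost per GB, from the table


def estimate(tier, size_gb, n_files):
    """Return (storage cost, upload request cost, full-restore transfer cost)."""
    storage = size_gb * tier.storage_per_gb
    # Assumption: exactly one API request per uploaded file.
    requests = n_files / 10_000 * tier.api_per_10k
    restore = size_gb * tier.restore_per_gb
    return storage, requests, restore


tiers = [
    Tier("Google Cloud Nearline Storage", 0.01, 0.10, 0.01),
    Tier("Google Cloud Archive Storage", 0.0012, 0.50, 0.01),
]

# Hypothetical dataset: 10 TB (taken as 10,000 GB) spread over 50 million files.
for tier in tiers:
    storage, requests, restore = estimate(tier, size_gb=10_000, n_files=50_000_000)
    print(
        f"{tier.name}: storage ${storage:,.2f}, "
        f"upload requests ${requests:,.2f}, restore ${restore:,.2f}"
    )
```

Under these assumptions, the archive tier's request charges for the initial upload (about $2,500) far exceed its storage charge (about $12). That is the API cost concern noted above.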

Some software options for cloud storage backups

Duplicati
Server Platform: Runs on client
Clients:
  • Windows Vista and higher (both 32 and 64 bit versions)
  • Windows Server 2008 and higher (both 32 and 64 bit versions)
  • Linux
  • Apple Mac OSX
  • Some NAS devices
Destinations:
  • Amazon Cloud Drive
  • Amazon S3
  • Azure blob
  • B2 Cloud Storage
  • Box.com
  • Dropbox
  • Google Cloud Storage
  • Google Drive
  • HubiC
  • Jottacloud
  • Mega.nz
  • Microsoft Office 365 Groups
  • Microsoft OneDrive for Business
  • Microsoft OneDrive
  • Microsoft SharePoint
  • OpenStack Simple Storage
  • Rackspace CloudFiles
  • Rclone
  • Sia Decentralized Cloud
Cost: Free
Concerns/Gotchas: Millions of files in millions of directories can be very slow to back up, and detecting changes to that many files can be beyond many free backup options. Make sure this can handle your data.

duplicity
Server Platform: Runs on client
Clients: Linux, or other POSIX compliant system (best on Linux, included in Ubuntu)
Destinations:
  • Amazon S3
  • Backblaze B2
  • Dropbox
  • ftp
  • GIO
  • Google Docs
  • Google Drive
  • HSI
  • Hubic
  • IMAP
  • local filesystem
  • Mega.co
  • Microsoft Azure
  • Microsoft Onedrive
  • par2
  • Rackspace Cloudfiles
  • rclone
  • rsync
  • Skylabel
  • ssh/scp
  • SwiftStack
  • Tahoe-LAFS
  • WebDAV
Cost: Free
Concerns/Gotchas: rsync-based, so it may choke on large numbers of small files.

Cloudberry Backup
Server Platform: Runs on client
Clients:
  • Linux
  • Windows
  • macOS
Destinations:
  • AWS
  • Microsoft Azure
  • Backblaze B2
  • Wasabi
  • Google Cloud Storage
Cost: Commercial, $50 per Windows computer, up to 5 computers. For Linux and macOS the cost is $30 per computer. Free trial available. The same company has more expensive software for larger scale backups.

Bacula
Server Platform: Linux
Clients:
  • Linux
  • Windows
  • macOS
Destinations:
  • S3 compatible Cloud
  • Tape
  • Disk
Cost: Free, with a commercial version
Concerns/Gotchas: Server/client model, so you need a server. It is based on tape backups, so it is important to understand that you are emulating tapes when using other storage destinations.

Rclone
Server Platform: Runs on client
Clients:
  • Linux
  • Windows
  • macOS
Destinations:
  • 1Fichier
  • Alibaba Cloud (Aliyun) Object Storage System (OSS)
  • Amazon S3
  • Backblaze B2
  • Box
  • Ceph
  • Citrix ShareFile
  • C14
  • DigitalOcean Spaces
  • Dreamhost
  • Dropbox
  • FTP
  • Google Cloud Storage
  • Google Drive
  • Google Photos
  • HTTP
  • Hubic
  • Jottacloud
  • IBM COS S3
  • Koofr
  • Mail.ru Cloud
  • Memset Memstore
  • Mega
  • Microsoft Azure Blob Storage
  • Microsoft OneDrive
  • Minio
  • Nextcloud
  • OVH
  • OpenDrive
  • OpenStack Swift
  • Oracle Cloud Storage
  • ownCloud
  • pCloud
  • premiumize.me
  • put.io
  • QingStor
  • Rackspace Cloud Files
  • rsync.net
  • Scaleway
  • Seafile
  • StackPath
  • SugarSync
  • Tardigrade
  • Tencent Cloud Object Storage (COS)
  • Wasabi
  • WebDAV
  • Yandex Disk
  • The local filesystem
Cost: Free
Concerns/Gotchas: rsync-like, so it may choke on large numbers of small files.

With all of the above choices, I cannot stress enough the importance of making sure that they can handle your data. Most backup solutions are not designed to handle the extremely large numbers of small files and directories that are quite often found in neuroscience research. This is only one reason we suggest using Engram, which is backed up by RC, so you don't have to worry about backups.
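Before testing any backup tool, it can help to know how many files and directories you actually have. The following is a minimal Python sketch, not an official Research Computing tool. The function name `profile_tree` and the 64 KiB "small file" threshold are illustrative assumptions; adjust the threshold to whatever is meaningful for your data.

```python
import os


def profile_tree(root, small_file_bytes=64 * 1024):
    """Count directories, files, small files, and total bytes under root.

    small_file_bytes is an arbitrary, illustrative threshold.
    """
    n_dirs = n_files = n_small = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            n_files += 1
            total_bytes += size
            if size < small_file_bytes:
                n_small += 1
    return {
        "dirs": n_dirs,
        "files": n_files,
        "small_files": n_small,
        "total_bytes": total_bytes,
    }


# Example call (the path is a placeholder):
# stats = profile_tree("/path/to/your/data")
# print(stats)
```

If the file and directory counts come back in the millions, run a trial backup of a representative subset before committing to any of the tools above.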

Additional Information

Google Cloud Backup - making the most of Google's storage tiers
