
STILL A WIP

TL;DR

The main takeaway: if your research data is made up of many millions of small files spread across millions of directories, then between the expense and the complexity, you may be better off keeping your data on Engram and letting Research Computing handle the backups.

Some Cloud Storage Provider Options and Pricing as of 11/2020 - Trade-offs will be in terms of storage cost, API cost and restore speed

Provider | Storage Cost per GB | API Requests Cost | Restore Transfer Cost per GB
AWS S3 | $0.023 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.01
AWS Glacier | $0.004 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.01
Google Cloud Nearline Storage | $0.01 | $0.01 - $0.10 per 10,000 depending on query type: details | $0.01
Google Cloud Coldline Storage | $0.004 | $0.05 - $0.10 per 10,000 depending on query type: details | $0.01
Google Cloud Archive Storage | $0.0012 | $0.50 per 10,000 | $0.01
Azure Cool | | |
Azure Archive | $0.00099 | |
Backblaze B2 | $0.005 | $0.004 per 1,000 or $0.004 per 10,000 depending on query type: details | $0.01
Spider Oak | Contact Sales for >5TB | |
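
As a rough illustration of how these numbers combine, here is a minimal Python sketch that estimates costs for a hypothetical 10 TB dataset made up of 5 million files, using the Google Cloud Archive Storage rates from the table above and counting one API request per file uploaded. The data volume and file count are assumptions for illustration only; substitute your own figures and your provider's current rates.

    # Rough cost sketch: storage vs. per-request vs. restore charges.
    # All inputs are illustrative assumptions; check your provider's current pricing.
    data_gb = 10_000                # ~10 TB of research data (assumed)
    n_files = 5_000_000             # number of individual files (assumed)

    storage_per_gb_month = 0.0012   # Google Cloud Archive Storage, $/GB/month (table above)
    ops_cost_per_10k = 0.50         # write operations, $/10,000 requests (table above)
    restore_per_gb = 0.01           # restore transfer, $/GB (table above)

    monthly_storage = data_gb * storage_per_gb_month       # $12.00 per month
    upload_api_cost = n_files / 10_000 * ops_cost_per_10k  # $250.00, one request per file
    restore_transfer = data_gb * restore_per_gb            # $100.00 for a full restore

    print(f"Monthly storage:         ${monthly_storage:,.2f}")
    print(f"Initial upload requests: ${upload_api_cost:,.2f}")
    print(f"Full restore transfer:   ${restore_transfer:,.2f}")

Note how, with millions of small files, the per-request charges for the initial upload can dwarf the monthly storage bill, and the cheaper archive tiers charge the most per operation.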

Some software options for cloud storage backups

Duplicati
  Server Platform: Runs on client
  Clients:
    • Windows Vista and higher (both 32 and 64 bit versions)
    • Windows Server 2008 and higher (both 32 and 64 bit versions)
    • Linux
    • Apple Mac OSX
    • Some NAS devices
  Destinations:
    • Amazon Cloud Drive
    • Amazon S3
    • Azure blob
    • B2 Cloud Storage
    • Box.com
    • Dropbox
    • Google Cloud Storage
    • Google Drive
    • HubiC
    • Jottacloud
    • Mega.nz
    • Microsoft Office 365 Groups
    • Microsoft OneDrive for Business
    • Microsoft OneDrive
    • Microsoft SharePoint
    • OpenStack Simple Storage
    • Rackspace CloudFiles
    • Rclone
    • Sia Decentralized Cloud
  Cost: Free
  Concerns/Gotchas: Millions of files in millions of directories can be very slow to back up, and detecting changes to that many files can be beyond many free backup options. Make sure this can handle your data.

duplicity
  Server Platform: Runs on client
  Clients: Linux, or other POSIX-compliant system (best on Linux; included in Ubuntu)
  Destinations:
    • Amazon S3
    • Backblaze B2
    • Dropbox
    • ftp
    • GIO
    • Google Docs
    • Google Drive
    • HSI
    • Hubic
    • IMAP
    • local filesystem
    • Mega.co
    • Microsoft Azure
    • Microsoft OneDrive
    • par2
    • rclone
    • rsync
    • Skylabel
    • ssh/scp
    • SwiftStack
    • Tahoe-LAFS
    • WebDAV
  Cost: Free
  Concerns/Gotchas: rsync based, so it may choke on large numbers of small files.

CloudBerry Backup
  Server Platform: Runs on client
  Clients:
    • Linux
    • Windows
    • MacOS
  Destinations:
    • AWS
    • Microsoft Azure
    • Backblaze B2
    • Wasabi
    • Google Cloud Storage
  Cost: Commercial. $50 per Windows computer (up to 5 computers); $30 per computer for Linux and MacOS. A free trial is available.
  Concerns/Gotchas: The same company sells more expensive software for larger-scale backups.

Bacula
  Server Platform: Linux
  Clients:
    • Linux
    • Windows
    • MacOS
  Destinations:
    • S3-compatible cloud
    • Tape
    • Disk
  Cost: Free, with a commercial version available
  Concerns/Gotchas: Based on tape backups, so it's important to understand that you are emulating tapes when using other storage destinations.
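
If you end up scripting your own uploads to an S3-compatible destination, one way to sidestep the many-small-files problem is to bundle a directory into a single archive before uploading, so each billable request covers many files instead of one. A minimal sketch, assuming the boto3 library is installed, AWS credentials are already configured, and a bucket exists; the bucket name and paths below are placeholders, not a recommended workflow:

    import tarfile
    import boto3

    # Bundle many small files into one compressed archive so the upload is a
    # handful of requests rather than millions of per-file requests.
    with tarfile.open("project-2020-11.tar.gz", "w:gz") as archive:
        archive.add("/data/my_project", arcname="my_project")  # placeholder source path

    # Upload the archive as a single object in an archive-tier storage class.
    # boto3 transparently uses multipart uploads for large files.
    s3 = boto3.client("s3")
    s3.upload_file(
        "project-2020-11.tar.gz",
        "my-backup-bucket",                      # placeholder bucket name
        "backups/project-2020-11.tar.gz",
        ExtraArgs={"StorageClass": "GLACIER"},
    )

The trade-off is that restoring a single file means retrieving and unpacking the whole archive, which is essentially the same tape-style access pattern noted for Bacula above.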

With all of the above choices, I cannot stress enough the importance of making sure that they can handle your data. Most backup solutions are not designed to handle the extremely large numbers of small files and directories that are quite often found in Neuroscience research, and this is only one reason we suggest using Engram, which is backed up by Research Computing, so you don't have to worry about backups.
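
If you are not sure how many files and directories your data actually contains, a quick count before committing to any of these tools is worth the time. A minimal sketch in Python (the root path is a placeholder; point it at your own data):

    import os

    # Count files, directories, and total size under a data directory to gauge
    # whether a client-side backup tool will have to track millions of entries.
    root = "/path/to/my_project"  # placeholder; use your actual data location

    n_files = n_dirs = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files that are unreadable or vanish mid-scan

    print(f"{n_files:,} files in {n_dirs:,} directories, {total_bytes / 1e9:,.1f} GB total")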

Additional Information

Google Cloud Backup - making the most of Google's storage tiers
