STILL A WIP

TL;DR

Please note: If your research data is made up of many millions of small files inside of many directories, backing it up to a cloud provider can be expensive and complex, and we recommend contacting our team for guidance and assistance at rc@zi.columbia.edu.
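Before choosing a tool, it helps to know how many files and directories you actually have. The script below is a minimal sketch, not an RC-provided tool. It walks a directory tree using only Python's standard library and reports the file count, directory count, total size, and how many files fall under a size threshold. The 1 MB cutoff for a "small" file is an arbitrary assumption that you can change.

```python
import os
import sys


def inventory(root: str, small_file_bytes: int = 1_000_000) -> dict:
    """Count files, subdirectories, total bytes and "small" files under root.

    Symlinks are not followed. Entries that vanish or cannot be read are skipped.
    The root directory itself is not included in the directory count.
    """
    counts = {"files": 0, "dirs": 0, "bytes": 0, "small_files": 0}
    stack = [root]  # iterative walk avoids recursion limits on very deep trees
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            counts["dirs"] += 1
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            size = entry.stat(follow_symlinks=False).st_size
                            counts["files"] += 1
                            counts["bytes"] += size
                            if size < small_file_bytes:
                                counts["small_files"] += 1
                    except OSError:
                        continue  # unreadable or vanished entry
        except OSError:
            continue  # unreadable directory
    return counts


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    c = inventory(root)
    print(f"{c['files']:,} files in {c['dirs']:,} directories, "
          f"{c['bytes'] / 1e9:,.1f} GB total, "
          f"{c['small_files']:,} files under 1 MB")
```

To use it, save the script under any name and pass the data directory as the only argument. On very large trees the walk itself can take a long time, which gives a rough sense of what a backup tool's change detection will face.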

All-in-one providers

Backblaze, Spider Oak, MEGA, and Code42 (CrashPlan) all have backup services designed to back up a single machine or a small group of machines, rather than multiple terabytes of scientific data. These services bundle the backup software and the cloud storage costs into a single fee. Backblaze is only for Windows and Mac, while Spider Oak also supports Linux. All of them list their prices on their websites. Read the fine print: none are designed to handle the needs of a lab, and while they may be able to accommodate you, pricing will most likely be higher. CUIT supports Code42 CrashPlan backup software and offers bulk, discounted pricing on it.

If you don't use an all-in-one provider, then you will need to choose both backup software and a destination for your data. Here are some options.

Some Cloud Storage Provider Options and Pricing as of March 2023

Trade-offs will be in terms of storage cost, API request cost, and restore speed.

Provider | Storage cost per GB per month | API request cost | Restore transfer cost per GB
AWS S3 | $0.023 | $0.0004 - $0.005 per 1,000, depending on request type | n/a
AWS Glacier Instant Retrieval | $0.004 | $0.0004 - $0.005 per 10,000, depending on request type | $0.01
AWS Glacier Flexible Retrieval | $0.0036 | $0.0004 - $0.005 per 10,000, depending on request type | $0.01 - $0.03
AWS Glacier Deep Archive | $0.00099 | $0.0004 - $0.005 per 10,000, depending on request type | $0.02
Google Cloud Standard Storage | $0.02 | $0.004 - $0.05 per 10,000, depending on request type | $0.01
Google Cloud Nearline Storage | $0.01 | $0.01 - $0.10 per 10,000, depending on request type | $0.01
Google Cloud Coldline Storage | $0.004 | $0.05 - $0.10 per 10,000, depending on request type | $0.01
Google Cloud Archive Storage | $0.0012 | $0.50 per 10,000 | $0.01
Azure Cool | $0.01 | $0.004 - $0.10 per 10,000, depending on request type | $0.01
Azure Archive | $0.00099 | $0.004 - $0.10 per 10,000, depending on request type; reads are $5 per 10,000 | $0.02
Backblaze B2 | $0.005 | $0.004 per 1,000 or $0.004 per 10,000, depending on request type | $0.01
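Per-request prices matter more than they might seem when data is made up of many small files, because request charges scale with the number of objects rather than the number of bytes. The sketch below estimates costs using the Google Cloud Archive Storage row from the table. It rests on two stated assumptions: each file becomes one uploaded object needing one write request, and the dataset is a hypothetical 10,000 GB. It ignores minimum storage durations, early-deletion fees, and network egress, so treat the output as a rough comparison, not a quote.

```python
# Prices copied from the March 2023 table (Google Cloud Archive Storage row).
# Check the provider's current pricing before relying on them.
STORAGE_PER_GB_MONTH = 0.0012  # $ per GB per month
WRITE_PER_10K = 0.50           # $ per 10,000 operations
RESTORE_PER_GB = 0.01          # $ per GB restored


def monthly_storage_cost(total_gb: float) -> float:
    return total_gb * STORAGE_PER_GB_MONTH


def upload_request_cost(num_objects: int) -> float:
    # Assumption: one write request per uploaded object (no bundling, no multipart).
    return num_objects / 10_000 * WRITE_PER_10K


def restore_cost(total_gb: float) -> float:
    return total_gb * RESTORE_PER_GB


if __name__ == "__main__":
    total_gb = 10_000  # hypothetical 10 TB dataset
    # Hypothetical object counts: small files uploaded as-is vs. bundled into archives.
    for n_objects in (50_000_000, 10_000):
        print(f"{n_objects:>12,} objects: "
              f"storage ${monthly_storage_cost(total_gb):,.2f}/month, "
              f"upload requests ${upload_request_cost(n_objects):,.2f} one-time, "
              f"full restore ${restore_cost(total_gb):,.2f}")
```

With these numbers, uploading 50 million files as one object per file costs about $2,500 in write requests alone, roughly 200 times the $12 monthly storage charge. Bundling the same data into 10,000 archives brings the request charge down to about $0.50.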


Note: rsync-based tools may choke on large numbers of small files.
Backup software options (each entry lists server platform, clients, destinations, cost, and concerns/gotchas):

Duplicati
Server platform: Runs on client
Clients:
  • Windows Vista and higher (both 32 and 64 bit versions)
  • Windows Server 2008 and higher (both 32 and 64 bit versions)
  • Linux
  • Apple Mac OSX
  • Some NAS devices
Destinations:
  • Amazon Cloud Drive
  • Amazon S3
  • Azure blob
  • B2 Cloud Storage
  • Box.com
  • Dropbox
  • Google Cloud Storage
  • Google Drive
  • HubiC
  • Jottacloud
  • Mega.nz
  • Microsoft Office 365 Groups
  • Microsoft OneDrive for Business
  • Microsoft OneDrive
  • Microsoft SharePoint
  • OpenStack Simple Storage
  • Rackspace CloudFiles
  • Rclone
  • Sia Decentralized Cloud
Cost: Free
Concerns/Gotchas: Millions of files in millions of directories can be very slow to back up, and detecting changes across that many files is beyond many free backup tools. Make sure Duplicati can handle your data.

duplicity
Server platform: Runs on client
Clients: Linux or other POSIX-compliant systems (best on Linux; included in Ubuntu)
Destinations:
  • Amazon S3
  • Backblaze B2
  • DropBox
  • ftp
  • GIO
  • Google Docs
  • Google Drive
  • HSI
  • Hubic
  • IMAP
  • local filesystem
  • Mega.co
  • Microsoft Azure
  • Microsoft Onedrive
  • par2
  • Rackspace Cloudfiles
  • rclone
  • rsync
  • Skylabel
  • ssh/scp
  • SwiftStack
  • Tahoe-LAFS
  • WebDAV
Cost: Free

CloudBerry Backup and MSP360 Managed Backup
Server platform: Runs on client
Clients:
  • Linux
  • Windows
  • MacOS
Destinations:
  • AWS
  • Microsoft Azure 
  • Backblaze B2
  • Wasabi
  • Google Cloud Storage
Cost: Commercial, starting from $19.99 depending on operating system and version. Free trial available.
Concerns/Gotchas: The same company has more expensive software for larger-scale backups.


Bacula
Server platform: Linux
Clients:
  • Linux
  • Windows
  • MacOS
Destinations:
  • S3-compatible cloud storage
  • Tape
  • Disk
Cost: Free, with a commercial version available
Concerns/Gotchas: Server/client model, so you need a server. Bacula is designed around tape backups, so it's important to understand that you are emulating tapes when using other storage destinations.

Rclone
Server platform: Runs on client
Clients:
  • Linux
  • Windows
  • MacOS
Destinations:
  • 1Fichier
  • Alibaba Cloud (Aliyun) Object Storage System (OSS)
  • Amazon S3
  • Backblaze B2
  • Box
  • Ceph
  • Citrix ShareFile
  • C14
  • DigitalOcean Spaces 
  • Dreamhost 
  • Dropbox 
  • FTP
  • Google Cloud Storage
  • Google Drive
  • Google Photos
  • HTTP
  • Hubic
  • Jottacloud
  • IBM COS S3
  • Koofr
  • Mail.ru Cloud
  • Memset Memstore 
  • Mega
  • Microsoft Azure Blob Storage
  • Microsoft OneDrive 
  • Minio 
  • Nextcloud 
  • OVH 
  • OpenDrive
  • OpenStack Swift
  • Oracle Cloud Storage
  • ownCloud 
  • pCloud 
  • premiumize.me
  • put.io 
  • QingStor 
  • Rackspace Cloud Files
  • rsync.net 
  • Scaleway 
  • Seafile 
  • StackPath
  • SugarSync
  • Tardigrade
  • Tencent Cloud Object Storage (COS)
  • Wasabi
  • WebDAV
  • Yandex Disk
  • The local filesystem
Cost: Free
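Rclone is driven from the command line. The Python helper below is a hedged sketch that only builds an `rclone sync` command from flags documented by rclone (`--fast-list`, `--transfers`, `--checkers`, `--dry-run`). The source path, the remote name `labremote`, and the parallelism values are hypothetical placeholders. `rclone sync` makes the destination match the source, which includes deleting destination files that no longer exist in the source, so the helper defaults to a dry run.

```python
import shlex
import subprocess
import sys


def rclone_sync_command(source: str, destination: str, transfers: int = 8,
                        checkers: int = 16, dry_run: bool = True) -> list[str]:
    """Build an argument list for `rclone sync` (hypothetical helper)."""
    cmd = [
        "rclone", "sync", source, destination,
        "--fast-list",                  # fewer listing requests, at the cost of more memory
        "--transfers", str(transfers),  # files transferred in parallel
        "--checkers", str(checkers),    # parallel checks comparing source and destination
    ]
    if dry_run:
        cmd.append("--dry-run")         # report changes without copying or deleting anything
    return cmd


if __name__ == "__main__":
    # "labremote" is a hypothetical remote created beforehand with `rclone config`.
    cmd = rclone_sync_command("/path/to/lab/data", "labremote:lab-backup/data")
    print(shlex.join(cmd))
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)
```

Review the printed command and the dry-run output before passing `dry_run=False`. Whether `--fast-list` helps depends on the remote and on available memory, since it trades memory for fewer listing requests.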

With all of the above choices, we cannot stress enough the importance of making sure that they can handle your data. Many backup solutions are not designed to handle the extremely large numbers of small files and directories that are quite often found in neuroscience research, and this is one reason we suggest using Engram, which is backed up by RC, so you don't have to worry about backups. If you'd like assistance backing up your data, please contact our team at rc@zi.columbia.edu.
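One common workaround for the small-files problem, which this page does not prescribe, is to pack directories into archive files before uploading. The backup tool and the cloud provider then see a few large objects instead of millions of small ones. The sketch below uses Python's `tarfile` module to write one `.tar.gz` per top-level subdirectory. The function name and directory layout are illustrative assumptions.

```python
import os
import sys
import tarfile


def bundle_subdirectories(source: str, staging: str) -> list[str]:
    """Pack each top-level subdirectory of `source` into one .tar.gz in `staging`.

    Files sitting directly in `source` (not in a subdirectory) are not included.
    `staging` should be outside `source` so archives are not packed into themselves.
    """
    os.makedirs(staging, exist_ok=True)
    with os.scandir(source) as it:
        subdirs = sorted(e.name for e in it if e.is_dir(follow_symlinks=False))
    archives = []
    for name in subdirs:
        archive_path = os.path.join(staging, f"{name}.tar.gz")
        with tarfile.open(archive_path, "w:gz") as tar:
            # arcname keeps paths inside the archive relative to `source`
            tar.add(os.path.join(source, name), arcname=name)
        archives.append(archive_path)
    return archives


if __name__ == "__main__" and len(sys.argv) == 3:
    for path in bundle_subdirectories(sys.argv[1], sys.argv[2]):
        print(path)
```

The trade-off is that changing a single file means re-creating and re-uploading its whole archive, and restoring one file means retrieving the whole archive first.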

Additional Information

Google Cloud Backup: Making the Most of Google's Storage Tiers