STILL A WIP
TL;DR
Please note: If your research data is made up of many millions of small files inside millions of directories, then due to the expense and complexity of backing it up to a cloud provider, you may be better off keeping your data on Engram and letting Research Computing handle the backups. We recommend contacting our team for guidance and assistance at rc@zi.columbia.edu.
All-in-one providers
Backblaze, Spider Oak, MEGA, and Code42 (CrashPlan) all have backup services designed to back up a single machine or small group of machines, rather than multiple terabytes of scientific data. Each bundles the backup software and the cloud storage costs into a single fee. Backblaze is only for Windows and Mac, while Spider Oak also supports Linux. They all list prices at the above links. Read the fine print, as none are designed to handle the needs of a lab, and while they may be able to handle your data, pricing will most likely be higher than the list prices. CUIT supports and offers bulk and discounted pricing on Code42 CrashPlan backup software.
If you don't use an all-in-one provider, then you will need to choose both backup software and a destination for your data. Here are some options.
Some Cloud Storage Provider Options and Pricing as of March 2023
Trade-offs will be in terms of storage cost, API cost and restore speed.
Provider | Storage Cost per GB per Month | API Requests Cost | Restore Transfer Cost per GB
---|---|---|---
AWS S3 | $0.023 | $0.0004 - $0.005 per 10,000 depending on query type: details | n/a
AWS Glacier Instant Retrieval | $0.004 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.01
AWS Glacier Flexible Retrieval | $0.0036 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.01 - $0.03
AWS Glacier Deep Archive | $0.00099 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.02
Google Cloud Standard Storage | $0.02 | $0.004 - $0.05 per 10,000 depending on query type: details | $0.01
Google Cloud Nearline Storage | $0.01 | $0.01 - $0.10 per 10,000 depending on query type: details | $0.01
Google Cloud Coldline Storage | $0.004 | $0.05 - $0.10 per 10,000 depending on query type: details | $0.01
Google Cloud Archive Storage | $0.0012 | $0.50 per 10,000 | $0.01
Azure Cool | $0.01 | $0.004 - $0.10 per 10,000 depending on query type: details | $0.01
Azure Archive | $0.00099 | $0.004 - $0.10 per 10,000 depending on query type (reads are $5 per 10,000): details | $0.02
Backblaze b2 | $0.005 | $0.004 per 1,000 or $0.004 per 10,000 depending on query type: details | $0.01
Spider Oak | Contact Sales for >5TB | | |
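
To make the trade-off between storage cost and API cost concrete, here is a rough sketch of a cost estimate for a dataset with many small files. It is an illustration under stated assumptions, not a quote: the rates are copied from the Google Cloud rows of the table above, the upper end of each API range is assumed to apply to uploads, each file is assumed to need exactly one upload request, and the dataset size (10,000 GB in 50 million files) is hypothetical. The `Tier` class and `estimate_costs` function are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_per_gb_month: float  # USD per GB stored per month
    api_per_10k: float           # USD per 10,000 requests (upper end of listed range)
    restore_per_gb: float        # USD per GB transferred on restore


def estimate_costs(tier: Tier, total_gb: float, file_count: int) -> dict:
    """Rough costs, ASSUMING one upload request per file."""
    return {
        "monthly_storage": tier.storage_per_gb_month * total_gb,
        "initial_upload_requests": tier.api_per_10k * file_count / 10_000,
        "full_restore_transfer": tier.restore_per_gb * total_gb,
    }


# Rates copied from the table above (March 2023); check current provider
# pricing before relying on them.
TIERS = [
    Tier("Google Cloud Standard Storage", 0.02, 0.05, 0.01),
    Tier("Google Cloud Archive Storage", 0.0012, 0.50, 0.01),
]

if __name__ == "__main__":
    # Hypothetical dataset: 10,000 GB spread over 50 million files.
    for tier in TIERS:
        costs = estimate_costs(tier, total_gb=10_000, file_count=50_000_000)
        print(tier.name)
        for label, usd in costs.items():
            print(f"  {label}: ${usd:,.2f}")
```

With inputs like these, the request charges for the initial upload to an archive tier can be far larger than a month of storage, which is why the number of files matters as much as the total size.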
Some software options for cloud storage backups
Software | Server Platform | Clients | Destinations | Cost | Concerns/Gotchas
---|---|---|---|---|---
Duplicati | Runs on client | | | Free | Millions of files in millions of directories can be very slow to back up, and detecting changes to that many files can be beyond many free backup options. Make sure this can handle your data.
duplicity | Runs on client | Linux, or other POSIX compliant system (best on Linux, included in Ubuntu) | | Free | rsync based, so it may choke on large numbers of small files.
Cloudberry Backup and MSP360 Managed Backups | Runs on client | | | Commercial, $50 per Windows computer, up to 5 computers. For Linux and macOS, cost is $30 per computer. Free trial available. | The same company has more expensive software for larger scale backups. Pricing starts from $19.99 and above depending on operating system and version.
Bacula | Linux | | | Free with commercial version | Server/Client model, so you need a server. Based on tape backups, so it's important to understand that you are emulating tapes when using other storage destinations.
Rclone | Runs on client | | | Free | |
With all of the above choices, we cannot stress enough the importance of making sure that they can handle your data. Many backup solutions are not designed to handle the extremely large numbers of small files and directories that are quite often found in neuroscience research. This is only one reason we suggest using Engram, which is backed up by RC, so you don't have to worry about backups. If you'd like assistance backing up your data, please contact our team at rc@zi.columbia.edu.
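
Before testing any backup tool, it can help to know how many files and directories you actually have. The following is a minimal sketch that walks a directory tree and reports counts and total size. The function name `survey_tree` and the 1 MB threshold for a "small" file are assumptions chosen for illustration, not a Research Computing standard. Note that walking millions of files can itself take a long time.

```python
import os
import sys


def survey_tree(root, small_file_bytes=1_000_000):
    """Count files, directories, and bytes under root (symlinks are not followed)."""
    files = dirs = total_bytes = small_files = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # skip files that vanish or cannot be read
            files += 1
            total_bytes += size
            # "Small" is a hypothetical cutoff; adjust to suit your data.
            if size < small_file_bytes:
                small_files += 1
    return {
        "files": files,
        "directories": dirs,
        "total_bytes": total_bytes,
        "small_files": small_files,
    }


if __name__ == "__main__":
    # Usage: python survey.py /path/to/data  (defaults to the current directory)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for label, value in survey_tree(root).items():
        print(f"{label}: {value:,}")
```

If the file count runs into the millions, include those numbers when you contact our team or evaluate a tool, since they affect both backup speed and per-request cloud costs.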
Additional Information
Google Cloud Backup - Making the Most of Google's Storage Tiers