TL;DR
The main takeaway is that if your research data consists of many millions of small files spread across millions of directories, then given the expense and complexity of backing it up to a cloud provider, you may be better off keeping your data on Engram and letting Research Computing handle the backups.
All-in-one providers
Backblaze, SpiderOak, MEGA, and CrashPlan all have backup services designed to back up a single machine or a small group of machines, rather than multiple terabytes of scientific data. Each bundles the backup software and the cloud storage costs into a single fee. Backblaze is only for Windows and Mac, while SpiderOak also supports Linux. They all list prices at the above links. Read the fine print: none are designed to handle the needs of a lab, and while they may be able to handle your data, pricing for lab-scale data will most likely be higher.
If you don't use an all-in-one provider, then you will need to choose both backup software and a destination for your data. Here are some options.
Some Cloud Storage Provider Options and Pricing as of 11/2020 - Trade-offs will be in terms of storage cost, API cost, and restore speed. While "archive" tier storage is listed, we do not suggest using it for backups, primarily because of API-related costs.
Provider | Storage Cost per GB | API Requests Cost | Restore Transfer Cost per GB |
---|---|---|---|
AWS S3 | $0.023 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.01 |
AWS Glacier | $0.004 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.01 |
AWS Glacier Deep Archive | $0.00099 | $0.0004 - $0.005 per 10,000 depending on query type: details | $0.02 |
Google Cloud Nearline Storage | $0.01 | $0.01 - $0.10 per 10,000 depending on query type: details | $0.01 |
Google Cloud Coldline Storage | $0.004 | $0.05 - $0.10 per 10,000 depending on query type: details | $0.01 |
Google Cloud Archive Storage | $0.0012 | $0.50 per 10,000 | $0.01 |
Azure Cool | $0.01 | $0.004-$0.10 per 10,000 depending on query type: details | $0.01 |
Azure Archive | $0.00099 | $0.004-$0.10 per 10,000 depending on query type, reads are $5 per 10,000: details | $0.02 |
Backblaze B2 | $0.005 | $0.004 per 1,000 or $0.004 per 10,000 depending on query type: details | $0.01 |
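To see why API costs matter for archive tiers, the prices above can be turned into a rough estimate. The sketch below is only illustrative. It assumes the storage price is a monthly rate, that every file costs one API request on upload and one on restore, and that requests are billed at the upper end of the listed range. The `Tier` class, the function names, and the 5 TB / 10 million file lab are hypothetical, and the prices are the 11/2020 figures from the table, so check current pricing before budgeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_per_gb: float    # USD per GB stored; assumed to be a monthly rate
    requests_per_10k: float  # USD per 10,000 API requests; upper end of the listed range
    restore_per_gb: float    # USD per GB transferred on restore


# Prices copied from the table above (as of 11/2020); verify before budgeting.
TIERS = [
    Tier("AWS S3", 0.023, 0.005, 0.01),
    Tier("AWS Glacier Deep Archive", 0.00099, 0.005, 0.02),
    Tier("Google Cloud Archive Storage", 0.0012, 0.50, 0.01),
]


def request_cost(tier, n_files):
    """API cost of touching every file once (assumes one request per file)."""
    return n_files / 10_000 * tier.requests_per_10k


def monthly_storage_cost(tier, total_gb):
    """Cost of keeping total_gb stored for one month."""
    return total_gb * tier.storage_per_gb


def full_restore_cost(tier, total_gb, n_files):
    """Transfer cost plus per-file request cost for restoring everything."""
    return total_gb * tier.restore_per_gb + request_cost(tier, n_files)


if __name__ == "__main__":
    # Hypothetical lab: 5 TB spread over 10 million small files.
    total_gb, n_files = 5_000, 10_000_000
    for tier in TIERS:
        print(
            f"{tier.name}: "
            f"storage ${monthly_storage_cost(tier, total_gb):.2f}/month, "
            f"upload requests ${request_cost(tier, n_files):.2f}, "
            f"full restore ${full_restore_cost(tier, total_gb, n_files):.2f}"
        )
```

Under these assumptions, AWS S3 comes to roughly $115 per month in storage and about $5 in upload requests, while Google Cloud Archive Storage comes to about $6 per month in storage but about $500 in requests for the same 10 million files. Backup tools that bundle many files into larger archives would issue fewer requests, so treat the per-file figure as a worst case.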
Some software options for cloud storage backups
Software | Server Platform | Clients | Destinations | Cost | Concerns/Gotchas |
---|---|---|---|---|---|
Duplicati | Runs on client | | | Free | Millions of files in millions of directories can be very slow to back up, and detecting changes to that many files can be beyond many free backup options. Make sure this can handle your data. |
duplicity | Runs on client | Linux, or other POSIX-compliant system (best on Linux, included in Ubuntu) | | Free | rsync-based, so it may choke on large numbers of small files. |
Cloudberry Backup | Runs on client | | | Commercial, $50 per Windows computer, up to 5 computers. For Linux and macOS the cost is $30 per computer. Free trial available. | The same company has more expensive software for larger-scale backups. |
Bacula | Linux | | | Free with commercial version | Server/client model, so you need a server. Based on tape backups, so it's important to understand that you are emulating tapes when using other storage destinations. |
Rclone | Runs on client | | | Free | |
With all of the above choices, I cannot stress enough the importance of making sure that they can handle your data. Most backup solutions are not designed to handle the extremely large numbers of small files and directories that are often found in neuroscience research. This is only one reason we suggest using Engram, which is backed up by RC, so you don't have to worry about backups.
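One way to check whether your data falls into the "millions of small files" category before committing to a tool is to survey the directory tree first. The following is a minimal sketch using only the Python standard library. The function name `survey_tree`, the `TreeSurvey` result type, and the 4 KiB "small file" threshold are assumptions made for illustration, not part of any backup product.

```python
import os
from typing import NamedTuple


class TreeSurvey(NamedTuple):
    dirs: int         # number of subdirectories below the root
    files: int        # number of files that could be inspected
    total_bytes: int  # combined size of those files
    small_files: int  # files smaller than the chosen threshold


def survey_tree(root, small_bytes=4096):
    """Walk `root` and count directories, files, bytes, and small files.

    `small_bytes` is an arbitrary illustrative threshold (4 KiB by default).
    """
    dirs = files = total_bytes = small_files = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are sized as links.
                size = os.lstat(path).st_size
            except OSError:
                continue  # skip files that vanished or cannot be read
            files += 1
            total_bytes += size
            if size < small_bytes:
                small_files += 1
    return TreeSurvey(dirs, files, total_bytes, small_files)


if __name__ == "__main__":
    import sys

    # Usage: python survey.py /path/to/data (defaults to the current directory)
    result = survey_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(result)
```

If the file and directory counts run into the millions, it is worth testing any candidate backup tool on a representative subset of the data before relying on it.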
Additional Information
Google Cloud Backup - making the most of Google's storage tiers