Research Computing Services Compared with Public Cloud Services

The information below is accurate as of August 2020.

Zuckerman researchers often have questions about vendor-provided cloud services, including Columbia-provided Amazon AWS accounts, as well as the Google Cloud and Microsoft Azure platforms, and how those compare to using the Research Computing (RC) team’s Engram storage and Cortex on-demand compute services (which run on infrastructure located in the Jerome L. Greene Science Center and are managed by RC staff).

Our team can arrange to meet and discuss the workflows and aims of a specific project and help assess different options. Simply contact us at rc@zi.columbia.edu.

Today, a rough rule of thumb among those overseeing highly utilized infrastructure for the specialized needs of scientific disciplines is that replacing an organization’s local storage and computation services with vendor-provided, cloud-only equivalents would be roughly 2-5x more expensive in operational costs. However, estimating cloud service costs is notoriously difficult, and specific use cases could, of course, differ. Cloud offerings are also evolving quickly; part of our team’s mission is to keep up to date with the latest technology and pricing so that we can offer labs the options that make the most sense.
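As a rough illustration of how that rule of thumb translates into a budget range, the sketch below multiplies a local annual operating cost by the low and high ends of the 2-5x range. It reads "2-5x" as two to five times the local cost, which is an interpretation. The function name and the $10,000 figure are hypothetical and chosen only for illustration; they are not actual Engram, Cortex, or AWS prices.

```python
def cloud_cost_range(local_annual_cost, low_multiplier=2.0, high_multiplier=5.0):
    """Return a (low, high) estimate for a cloud-only equivalent.

    The default multipliers are the 2-5x rule of thumb, read here as
    "2 to 5 times the local cost" (an assumption, not a quoted price).
    """
    if local_annual_cost < 0:
        raise ValueError("local_annual_cost must be non-negative")
    return local_annual_cost * low_multiplier, local_annual_cost * high_multiplier


# Hypothetical example: a lab whose local services cost $10,000 per year.
low, high = cloud_cost_range(10_000)
print(f"Estimated cloud-only cost: ${low:,.0f} to ${high:,.0f} per year")
```

An estimate like this is only a starting point; actual cloud costs depend heavily on the workload, so a project-specific assessment remains the more reliable guide.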

We provide detailed comparisons in the pages below. Basic conclusions we draw include:

  • For general-purpose, everyday use, Engram storage is faster and more cost-efficient than AWS.

  • AWS S3 has a clear advantage for public dataset sharing over Engram, and this is the application for which we recommend using it.  If this is something you'd like to set up, please reach out to rc@zi.columbia.edu so that we can configure S3 storage through our Amazon contacts.

  • Both AWS S3 and EFS may make sense if you are processing data that is already available in AWS (such as a public dataset) and you prioritize speed over cost.

  • In most scenarios, Cortex on-demand compute servers (virtual machines or VMs) are more cost-effective than the equivalent AWS EC2 instance types, and include data protection features at no additional cost.  EC2 instances make the most sense when Cortex VMs cannot meet your needs or when you only need to run a program for a short period of time.