As your key transactional systems start moving to the cloud like Salesforce, Workday, Netsuite etc. it makes sense to move your data warehouse to the cloud also. One of the fastest growing database in the cloud is Snowflake and It is rapidly gaining a lot of customers who want to move their analytic workloads to the cloud from on-premise based Oracle, SQL Server or Teradata based data warehouses
Snowflake is a serverless cloud only database which means you can use as little or as much as you need of various resources like CPU, memory and storage. It is not limited by the capacity of the server it is on. We have recently started several projects to move on-premise Oracle and SQL Server data warehouses to Snowflake. I will write about our experience in this journey in future articles.
To begin with I would like to go over why I think Snowflake is a viable cloud database to replace legacy databases which has been in use for years. There has been a lot of articles written on the technical capabilities of Snowflake. You can read them here – Article 1, Article 2 and many more if you search on the web or ready up Snowflake blog.
What I want to emphasize below is how it will help the technical and functional BI and analytics practitioners.
- Freedom from DBAs – You can create and destroy databases as and when you please. This is very important as in the analytics world you end up starting a lot of projects without knowing how large the data volume will be and how many users will end up using your solution.
The provisioning of on-premise databases is totally depended on CPU, memory and storage. So, before you talk to the DBA team you need to make an estimate on exactly how much of each is needed. Then you must wait for a DBA to create a database, give it the right resources, after you justify why you need this much CPU, memory and storage etc. In the meantime, you just want to load your data and start modeling and analyzing it. Snowflake helps you be your own DBA. Imagine that! It also simplifies backup and recovery as well cloning your databases to create new environments. It does need some budget to get started but it’s not much and you have to deal with way less red tape and politics to get started.
- Its “big data” without the complexity – For the last several years there has been so much talk about Hadoop, Hive, HBase, Spark and a whole bunch of Apache projects which promised a lot of performance and cost improvements over traditional databases. The problem was that it moved the discussion of BI and analytics to an extremely technical level and thereby losing sight of the whole purpose of the effort which is to let the end users analyze the data and gain insights. It turned out that for a lot of companies the traditional Hadoop based “big data” was an overkill. They took on something which was very complex to install, develop and run and the business users lost patience and moved on by building their own custom solutions.
A lot of companies used big data to store csv, XML and JSON files in addition to relational data. All this can be easily accomplished with Snowflake without any of the complexity of maintaining a big data platform, writing Map Reduce etc.
- SQL Compatibility – There has been a lot of databases that has come to the market which promised a lot of very high performance at low cost. The problem with most of these are that they don’t provide full relational SQL capabilities like sub-queries, union queries, inline SQL etc. When you are trying to replace a traditional Oracle or SQL Server database there will be a lot of usage of these SQL features. You will have to end up rewriting this logic in some other way leading to tremendous increase in complexity of your migration effort. Snowflake from the beginning was built to support all the capabilities of relational SQL. I have found this to be mostly true with some gaps which Snowflake is actively closing with every release.
- BI Tools support – Snowflake has partnerships with almost all major BI tools vendors like Tableau, Qlik, MicroStrategy, Looker, Spotfire and many others. All these tools have native connectors to Snowflake which gives immediate connectivity to the data that is loaded into Snowflake.
- Cloud Connectors – Snowflake is built for Extract Load and Transform (ELT) data integrations. The best way to load data into Snowflake is to load data in the raw format from the source system and then transform it within Snowflake whether the source is files, cloud systems like Salesforce or On-premise applications. There is a rapid ecosystem developing with partners like FiveTran, Stitch or Matillion where they are starting to build native connectors to popular cloud applications like Salesforce. These tools sync the data from Salesforce and other systems and then allow the developers to transform it the way the business needs require.
In summary, if you are looking to move your data warehouse to the cloud Snowflake is one of the best options out there due to its serverless architecture, full SQL support, ease of maintenance and strong partnerships with BI and ETL tools.