The elastic future of data warehousing

This post was authored by Joseph Sirosh, Corporate Vice President, Data Group.

Announcing the general availability of Azure SQL Data Warehouse, an elastic, parallel, columnar data warehouse as a service.

A defining characteristic of cloud computing is elasticity – the ability to rapidly provision and release resources to match what a workload requires – so that a user pays no more and no less than what they need to for the task at hand. Such just-in-time provisioning can save customers enormous amounts of money when their workloads are intermittent and heavily spiked. And in the modern enterprise, there are few workloads that have a desperate need for such elastic capabilities as data warehousing. Traditionally built on-premises with very expensive hardware and software, most enterprise Data Warehouse (DW) systems have very low utilization except during peak periods of data loading, transformation and report generation.

With the general availability of the Azure SQL Data Warehouse, we are delivering the true promise of cloud elasticity to data warehousing. It is a fully managed DW as a Service that you can provision in minutes and scale up to 60 times larger in seconds. With a few clicks in the Azure Portal, you can launch a data warehouse, and start analyzing or querying data at the scale of hundreds of terabytes. Our architecture separates compute and storage so that you can independently scale them, and use just the right amount of each at any given time. A very unique pause feature allows you to suspend compute in seconds and resume when needed while your data remains intact in Azure storage. And SQL Data Warehouse offers an availability SLA of 99.9% – the only public cloud data warehouse service that offers an availability SLA to customers.

According to Gartner, “For years, many data warehousing vendors have been operating from a playbook of tightly balanced storage and compute configuration units. Cloud architectures are forcing a shift in this approach, with vendors starting to decouple storage and compute, and allowing them to independently scale. We believe this approach to be the correct one, and that other vendors in the space will need to adopt this methodology if they are to stay competitive.1

Azure SQL Data Warehouse uses an elastic massively parallel processing (MPP) architecture built on top of the industry-leading SQL Server 2016 database engine. It allows you to interactively query and analyze data using the broad set of existing SQL-based tools and business intelligence applications that you use today. It uses column stores for high performance analytics and storage compression, a rich collection of aggregation capabilities of SQL Server, and state of the art query optimization capabilities. With built-in capabilities such as Polybase, it allows you to query Hadoop systems directly, enabling a single SQL-based query surface for all your data.

Azure SQL Data Warehouse is also part of the Cortana Intelligence Suite, which is a fully managed big data and advanced analytics suite to transform your data into intelligent action. It easily integrates with components of the suite such as Azure Data Factory for data integration pipelines, with Azure Machine Learning for predictive analytics, Power BI for business intelligence, HDInsight for big data insights, R and Spark for big data analytics. For an example of such integration, see the airline industry sample on PowerBI.com. This shows a Power BI report based on a real world predictive maintenance solution for a major airline. The data for this report comes from a variety of sources including IoT streams from aircraft engines, air traffic control information, route restrictions and fuel usage data. All this is integrated and landed into a Azure SQL DW and processed with Azure Machine Learning to detect operational anomalies and trends.  The report is “live” and you can interact with it and experience Power BI in conjunction with Azure SQL DW and Azure ML.

The distinct capabilities of Azure SQL Data Warehouse include:

Data warehousing as a service

Gone are the pains associated with administering, managing, patching and manual tuning of data warehouses. There are no knobs to turn, no physical or virtual infrastructure to manage and the service is simple, resilient and secure with reliable storage. This enables the focus on driving the analytics and getting the value from your data rather than on managing your data warehousing software and hardware; Azure SQL Data Warehouse handles it all for you.

Unmatched security and access control

With malicious and even insider attacks becoming a key concern for enterprises, an alarm system over your critical enterprise data is a must have to avoid serious damage to your business and reputation. Only Azure SQL Data Warehouse delivers auditing and threat detection built into the service, with advanced machine learning to detect abnormal query patterns and alert you of potential security issues before it is too late. Data at rest is protected by Transparent Data Encryption.

Additionally, SQL Data Warehouse is the only cloud data warehouse service that works seamlessly with Azure Active Directory which currently supports 1.3 billion daily authentications across 600 million user accounts. This enables Single Sign-On (SSO) and role-based access control. You can even have finer-granularity permissions that let you control which operations a user can do on individual columns, tables, views, procedures, and other objects in the database. These features further protect data by ensuring just the right users have access to the right data—a critical capability when centralizing vast amounts of proprietary and sensitive data for analytics in an enterprise.

Multidimensional elasticity

Currently the majority of cloud database and data warehouse services are provisioned with fixed storage and compute resources. Resizing of resources typically compromises availability and/or performance. This means that service users typically end up with over-provisioned and expensive underutilized resources to accommodate possible peak demand or in the worst case, under-provisioned resources unable to handle sudden work overloads.

Unlike existing cloud services which can take anywhere from a couple of hours to a couple of days to do the data warehouse resizing, SQL Data Warehouse’s unique elastic technology decouples storage and compute, enabling each layer to become independently scalable almost instantaneously. This makes it possible to provision one or more data warehouses in minutes, and then independently scale users, data, and workloads in seconds to optimally match the demand. Further, elastic scaling also makes it possible to simultaneously load and query data, because every user and workload can have exactly the resources needed, without contention, and with minimal impact to production queries.

Getting featured in the iOS App Store was a big deal for a small company like ours as our users increased from 3,000 to 300,000 in 48 hours. To keep up with this 100x increase in workload, we simply added data warehouse compute capacity by moving a slider and our services just scaled in minutes—we didn’t miss an insight,” notes Paul Ohanian, CTO, PoundSand.

Save as you go, with fast pause and resume

Starting and shutting data warehouse clusters may take a considerable amount of time. Leaving the data warehouse running continuously incurs potentially high and unnecessary costs, especially if you are running your jobs periodically and the data warehouse is sitting idle in-between for extended periods of time. Now you can pause your data warehouse for the required time, saving compute costs, and quickly resume it later when needed. You can even write a PowerShell script, then automate the schedule with Azure Automation to automatically pause or resume the cluster based on the specific needs of your business.

When we learned about the pause and resume capabilities of SQL Data Warehouse and integrated services like Azure Machine Learning and Data Factory, we switched from AWS Redshift, migrating over 7TB of uncompressed data over a week for the simple reasons of saving money and enabling a more straight-forward implementation for advanced analytics. To meet our business intelligence requirements, we load data once or twice a month and then build reports for our customers. Not having the data warehouse service running all the time is key for our business and our bottom line,” said Bill Sabo, managing director of information technology at Integral Analytics.

Seamless querying of structured and unstructured data

An increasing amount of data in today’s rapidly digitizing world is unstructured data such as clickstreams, sensor data, location data, customer support emails and chat transcripts, much of which is harnessed for analysis in big data systems. The ability to integrate and join such data with your core relational enterprise data is often a highly desired capability. With built-in PolyBase technology, SQL Data Warehouse allows you to access and combine both non-relational and relational data. You can run queries on external data in Hadoop or Azure blob storage using familiar SQL, often without making any changes to the existing queries. Underneath, the queries are optimized for optimal execution without any burden on the user for tuning. Furthermore, you can quickly import and export data back and forth between relational tables in SQL Data Warehouse and non-relational data in Hadoop or Azure Blob Storage using simple T-SQL statements. The rich SQL programmability support (stored procs, functions and PolyBase) empower users to query the data however they want.

Christoph Leinemann, senior director data engineering at Jet.com says, “with Azure SQL Data Warehouse, we use PolyBase to ingest data from HDInsight then run thousands of analytical queries per day over tens of billions of records—about 20TB of data. This enables us to monitor price history and market dynamics to adjust pricing and ensure we’re offering our customers the best price.”

Integration with the SQL Server tool ecosystem you already use and love

Azure SQL Data Warehouse already fits into the tool ecosystem you already use, with native JBDC and ODBC connectors, and with a broad set of independent software vendors and partners who already support SQL Server, such as Alteryx, Attunity, Informatica, Redgate and SnapLogic. For BI capabilities, it integrates with the industry-leading Power BI service in Azure, and even with Microsoft Excel. For a beautifully visualized walkthrough of Microsoft Power BI and SQL Server 2016 Reporting Services including Mobile BI, please watch this demo. Microsoft also works with a set of popular BI partners to ensure the tools your teams use work great with SQL Data Warehouse, including Looker Data Sciences, Tableau Software and Qlik Technologies.

Experience modern data warehousing in the cloud for yourself

Today we have thousands of customers who are already using Azure SQL Data Warehouse. Many of these customers are experiencing significant performance gains over existing multi-million dollar data warehouses on-premises. With SQL Data Warehouse, some multi-hour queries in our customer environments finish now in under an hour, and some queries that took five to ten minutes now complete in seconds. Get started with SQL Data Warehouse today and experience the speed, scale, elasticity, security and ease of use of a true modern data warehouse as a service for yourself.

– Joseph

1Source: Gartner, The Data Warehouse and DMSA Market: Current and Future States, 201, June 16, 2016.