New SQL Server Big Data Clusters solution takes the stage at HPE Discover Virtual Experience

The HPE Discover Virtual Experience starts Tuesday, June 23rd, when tens of thousands of people will join online to learn about new technologies to transform their businesses, such as intelligent edge, hybrid cloud, IoT, exascale computing and much more. Our team will be part of this event, showing off our newest solution for Microsoft SQL Server Big Data Clusters running on the HPE Container Platform. Here’s the direct link for session and speaker information, or search for us once you’ve registered and joined the event: we’re session “D139”.

The inspiration for our work is that data growth is taking off like a rocket, and in that spirit the HPE Storage team staged our approach to the new enterprise database capability from Microsoft: SQL Server 2019 Big Data Clusters. We lifted off with an initial enterprise-grade solution for SQL Server Big Data Clusters (BDC), and laid in a course for more features, capabilities and scale. As introduced in previous blogs, SQL Server BDC uses a new architecture that combines the SQL Server database engine, Apache Spark and the Hadoop Distributed File System (HDFS) into a unified data platform.

Microsoft SQL Server 2019 features new Big Data Clusters capability

This approach escapes the gravitational constraints of traditional relational databases: it can read, write, and process big data from traditional SQL or Spark engines, which lets organizations combine and analyze high-value relational data along with high-volume big data, all within their familiar SQL Server environment. Our first-stage effort includes an initial implementation guide, collateral and a number of related activities, including a live demo in this year’s HPE Discover Virtual Experience.

Following soon will be ‘stage 2’, where we’ll publish technical guidance on deploying your own BDC that takes advantage of data virtualization, also known as the PolyBase feature. PolyBase lets you virtualize and query other data sources from within SQL Server without having to copy and convert that outside data. It eliminates the time and expense of traditional extract, transform, and load (ETL) cycles and, perhaps more importantly, lets organizations leverage existing SQL Server expertise and tools to extract the value of third-party data sources from across the organizational data estate, such as NoSQL, Oracle, and HDFS, to name just a few.
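As a rough illustration of what data virtualization looks like in practice, the Python sketch below assembles the two T-SQL statements typically involved in exposing an outside table through PolyBase: CREATE EXTERNAL DATA SOURCE and CREATE EXTERNAL TABLE. Every name in it (the Oracle host, the credential, the table and its columns) is a hypothetical placeholder rather than part of the HPE solution, and the exact options vary by source type, so treat it as a sketch under those assumptions and check Microsoft's documentation before running anything.

```python
# Sketch only: builds T-SQL text for a PolyBase external table.
# All object names below are hypothetical placeholders, and a database
# scoped credential is assumed to have been created beforehand.
# Inputs are interpolated as-is, so only pass trusted identifiers.


def external_data_source_sql(name, location, credential):
    """Return T-SQL that registers an external data source."""
    return (
        f"CREATE EXTERNAL DATA SOURCE {name}\n"
        f"WITH (LOCATION = '{location}', CREDENTIAL = {credential});"
    )


def external_table_sql(table, columns, data_source, remote_location):
    """Return T-SQL that maps a remote table into SQL Server.

    columns is a list of (column_name, sql_type) pairs.
    """
    cols = ",\n    ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n    {cols}\n)\n"
        f"WITH (LOCATION = '{remote_location}', DATA_SOURCE = {data_source});"
    )


if __name__ == "__main__":
    # Hypothetical Oracle source; host, port and object names are made up.
    print(external_data_source_sql(
        "OracleSales", "oracle://oracle-host.example.com:1521", "OracleCredential"
    ))
    print(external_table_sql(
        "dbo.RemoteOrders",
        [("order_id", "INT"), ("amount", "DECIMAL(10, 2)")],
        "OracleSales",
        "XE.SALES.ORDERS",
    ))
```

Once an external table like this exists, it can be queried and joined with local tables using ordinary T-SQL, without copying the remote data first.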

The last stage of this mission will add HPE Apollo 4200 storage systems for a cost-effective storage pool, especially for larger BDC deployments in the petabytes.

Info on our overall SQL Server BDC solution is available online in the new solution brief.

Putting BDC boots on the moon

There are a number of key considerations for deploying your own SQL Server BDC. It’s going to be a very different environment from what you may be familiar with for traditional Microsoft SQL Server. Rather than a Windows environment, with or without VMs, BDC runs in containers on Linux, and the architecture includes a number of technologies that may be new to traditional IT teams: Kubernetes, Apache Spark, Hadoop Distributed File System (HDFS), Kibana and Grafana.

Microsoft Azure Data Studio showing a dashboard for a Big Data Cluster

Many companies have begun to use Kubernetes as an efficient way to deploy and scale applications. It’s often referenced as a key part of a typical Continuous Integration and Continuous Deployment (CI/CD) process, and one survey found that 78% of respondents use Kubernetes in production[1]. So bringing Kubernetes to SQL Server may be a timely way to merge a couple of areas of significant investment for companies: traditional RDBMS and the evolving DevOps space.

Another unique feature of this solution is container management. Our initial technical guidance includes the use of the HPE Container Platform, which provides a multi-tenant, multi-cluster management infrastructure for Kubernetes (K8s). Creating a highly available K8s cluster is as easy as importing the hosts into the platform and defining their master and worker roles. In addition, the platform simplifies persistent access to data through the integration of Container Storage Interface (CSI) drivers. This makes connecting with HPE storage easy, not only providing persistent volumes but also enabling access to valuable array-based capabilities such as encryption and data protection features like snapshots. The latest HPE CSI package supports HPE Primera storage, HPE Nimble storage and HPE 3PAR storage.
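To make the persistent volume idea a little more concrete, here is a minimal Python sketch that builds a standard Kubernetes PersistentVolumeClaim manifest and prints it as JSON, which kubectl accepts alongside YAML. The claim name, the 100 Gi size and the storage class name "hpe-standard" are assumptions chosen for illustration; the storage class actually backed by the HPE CSI driver in your cluster may be named differently.

```python
import json

# Sketch only: the claim name, size and storage class are hypothetical.


def build_pvc(name, size_gi, storage_class):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by one node.
            "accessModes": ["ReadWriteOnce"],
            # Assumed name of a storage class served by a CSI driver.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    manifest = build_pvc("bdc-master-data", 100, "hpe-standard")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl would ask the cluster to provision a volume through whichever CSI driver serves that storage class.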

Key components of the initial solution include:

  • Microsoft SQL Server 2019 Big Data Clusters
  • HPE ProLiant DL380 Gen10 servers
  • CentOS Linux—a community-driven, open source Linux distribution
  • HPE Nimble Storage arrays for the master instance to provide integrated persistent storage
  • HPE Container Storage Interface (CSI) driver
  • Kubernetes to automate deployment, scaling, and operations of containers across clusters of hosts
  • HPE Container Platform for the deployment and management of Kubernetes clusters (optional)
  • HPE MapR as an integrated, persistent data store (optional)

Why HPE Storage for Big Data Clusters

HPE Nimble Storage provides highly available persistent container storage for the BDC Master Instance

The partnership of Microsoft and HPE stretches back to the time the Hubble Space Telescope was launched, about 30 years ago. This heritage of testing and co-development has helped ensure optimal performance for Microsoft business software on HPE hardware. Other important reasons to choose HPE for your BDC deployment:

  • HPE developed a standards-compliant CSI driver for Kubernetes to simplify storage integration.
  • HPE developed the HPE Container Platform, providing the most advanced and secure Kubernetes-compatible container platform on the market.
  • HPE owns MapR, an established leading technology for big data management that is now incorporated within the HPE Data Fabric offering, and another key part of the solution that helps span data management from on-premises to the cloud.
  • Finally, HPE already offers a complete continuum of SQL Server solutions based on HPE Storage, from departmental databases to consolidated application environments, and from storage class memory-accelerated systems to the most mission-critical scale-up databases. Adding BDC provides yet another option, now for scale-out data lakes, to customers who rely on HPE as a trusted end-to-end solution partner.

Get started

The HPE Storage with Microsoft SQL Server Big Data Clusters solution is available today. An initial reference architecture delivers the benefits of scale-out SQL Server on HPE Nimble enterprise-class data storage with the newest container management capability using the HPE Container Platform.

The HPE Storage with Microsoft SQL Server Big Data Clusters solution is a safe first step for your IT team, but a giant leap forward for your organization to derive the most business value from its data estate, regardless of whether it’s relational, unstructured, on-premises or in the cloud.

Learn more about HPE Storage solutions for Microsoft and see us live at the HPE Discover Virtual Experience.

Are you struggling to manage more data, and more types of data, from across the enterprise? Start your mission to manage your entire data estate with existing SQL Server expertise. Read the new implementation guide: How to deploy Microsoft SQL Server 2019 Big Data Clusters on Kubernetes and HPE Nimble Storage.