Insights from Deploying Microsoft Exchange at Scale on Azure Stack HCI

Microsoft Azure Stack HCI has established itself as a solid hyperconverged infrastructure offering, based on Microsoft Windows Server 2019. IT staff can efficiently consolidate traditional workloads on this familiar platform, thanks to features such as compute virtualization with Hyper-V and storage virtualization with Storage Spaces Direct. The platform also supports non-volatile memory express (NVMe) SSDs and persistent memory for caching to speed up system performance.

However, with such dynamic technology in play at the OS layer, things get interesting when you add a sophisticated workload that has its own intelligent performance-enhancing features, including storage tiering, a metacache database (MCDB), and dynamic cache. In this case, the workload is Microsoft Exchange, which was recently updated with the release of Microsoft Exchange Server 2019.

One Wall Street firm was a power user of Microsoft Exchange, with over 200,000 users, many of whom had massive mailboxes ranging from dozens of gigabytes to 100 GB or more. As part of its infrastructure planning, the customer wanted to compare the performance and cost of continuing to run Exchange on physical servers with externally attached storage (JBOD) against moving to an Azure Stack HCI infrastructure.

The combination of these products and technologies required complex testing and sizing that pushed the bounds of available knowledge at the time. The work generated lessons useful to other companies that are also early in adopting demanding enterprise workloads on top of Azure Stack HCI.

Field experts share their insight

“This customer had an interest in deploying truly enterprise-scale Exchange, and eventually the latest server version, using their HCI infrastructure,” began Gary Ketchum, Sr. System Engineer in the Storage Technology Center at HPE.  “Like vSAN or any other software-defined datacenter product, choosing the hardware is very important in order to consistently achieve your technical objectives.”

This observation holds especially true when implementing Storage Spaces Direct solutions. As stated on the Microsoft Storage Spaces Direct hardware requirements page, “Systems, components, devices, and drivers must be Windows Server Certified per the Windows Server Catalog. In addition, we recommend that servers, drives, host bus adapters, and network adapters have the Software-Defined Data Center (SDDC) Standard and/or Software-Defined Data Center (SDDC) Premium additional qualifications (AQs). There are over 1,000 components with the SDDC AQs.”

A key challenge of the implementation was how to reach the targeted levels of flexibility, performance, and availability within a much more complex technology stack, one with multiple virtualization layers and potentially competing caching mechanisms.

Anthony Ciampa, Hybrid IT Solution Architect from HPE, explains key functionality of the solution. “Storage Spaces Direct allows organizing physical disks into storage pools. The pool can easily be expanded by adding disks. The Virtual Machine VHDx volumes are created from the pool capacity providing fault tolerance, scalability, and performance. The resiliency enables continuous availability protecting against hardware problems. The types of resiliency are dependent on the number of nodes in the cluster. The solution testing used a two-node cluster with two-way mirroring. With three or more servers it is recommended to use three-way mirroring for higher fault tolerance and increased performance.” HPE has published a technical whitepaper on Exchange Server 2019 on HPE Apollo Gen10, which is available online.
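To make the relationship between node count and mirroring concrete, here is a minimal Python sketch of the guidance in the quote above. The function names are hypothetical and used only for illustration. The capacity figure assumes the only overhead is the number of mirror copies (two copies for two-way mirroring, three for three-way), so it ignores reserve capacity, cache drives, and file system overhead. Treat it as a back-of-the-envelope estimate, not a sizing tool.

```python
def mirror_copies(node_count: int) -> int:
    """Return the number of data copies suggested by the quoted guidance.

    Two nodes -> two-way mirroring (2 copies).
    Three or more nodes -> three-way mirroring (3 copies).
    """
    if node_count < 2:
        # Assumption: smaller clusters are outside the scope of this sketch.
        raise ValueError("this sketch only covers clusters of two or more nodes")
    return 2 if node_count == 2 else 3


def usable_capacity_tb(raw_capacity_tb: float, node_count: int) -> float:
    """Estimate usable capacity as raw pool capacity divided by mirror copies.

    Assumption: mirroring is the only overhead considered here.
    """
    return raw_capacity_tb / mirror_copies(node_count)


if __name__ == "__main__":
    # Illustrative raw capacities only; these are not figures from the testing.
    for nodes, raw_tb in [(2, 100.0), (4, 300.0)]:
        copies = mirror_copies(nodes)
        usable = usable_capacity_tb(raw_tb, nodes)
        print(f"{nodes} nodes, {raw_tb} TB raw: {copies}-way mirror, ~{usable:.1f} TB usable")
```

The practical point is that moving from two-way to three-way mirroring trades capacity efficiency (roughly one half versus one third of raw capacity) for higher fault tolerance.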

Microsoft Azure Stack HCI on HPE Apollo 4200 Gen10 solution

At Microsoft Ignite 2019, HPE launched its solution for Microsoft's new HCI product: Azure Stack HCI on HPE Apollo 4200 Gen10. This new software-defined hyperconverged offering, built on the high-capacity yet dense Apollo storage server, delivered a new way to meet the needs of the emerging ‘Big Data HCI’ customer. A new deployment guide details solution components, installation, management and related best practices.

Exchange on Azure Stack HCI Solution Stack

The new Azure Stack HCI on HPE Apollo 4200 solution combines Microsoft Windows Server 2019 hyperconverged technology with the leading storage capacity/density data platform in its class. It serves a growing class of customers who want the benefits of a simpler on-premises infrastructure while still being able to run the most demanding Windows analytics and data-centric workloads.

Findings from the field

Notes from the deployment team captured some of the top findings from this Exchange on Azure Stack HCI testing. These findings should help others avoid problems and speed up these complex implementations with confidence.

  • More memory not required – The stated guidance for Azure Stack HCI calls for additional hardware beyond a physical JBOD deployment, specifically NVMe SSDs for the cache tier. However, HPE’s Jetstress testing showed that similar performance was also possible from just JBOD. The server hardware requirements are therefore similar between Azure Stack HCI and JBOD, and they remain very similar even if the customer plans to deploy a JBOD MCDB tier with Exchange 2019. Note that there could be other cost factors to consider, such as the overhead of additional compute and RAM within Azure Stack HCI and any additional software licensing costs for running Azure Stack HCI.
  • Size cache ahead of data growth – The cache should be sized to accommodate the working set (the data being actively read or written at any given time) of your applications and workloads. If the active working set exceeds the size of the cache, or if the active working set drifts too quickly, read cache misses will increase and writes will need to be de-staged more aggressively, hurting overall performance.
  • The more volumes, the better – Volumes in Storage Spaces Direct provide resiliency to protect against hardware problems. Microsoft recommends making the number of volumes a multiple of the number of servers in your cluster. For example, if you have 4 servers, you will experience more consistent performance with 4 total volumes than with 3 or 5. However, Jetstress testing showed better performance with 8 volumes per server than with 1 or 2 volumes per server.
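The cache and volume findings above can be turned into simple planning checks. The following Python sketch is only an illustration: the function names and the `headroom` parameter are assumptions introduced here, not part of any Microsoft or HPE tool. The per-server volume counts (1, 2, and 8) are the ones mentioned in the testing above.

```python
from typing import List, Sequence


def is_balanced_volume_count(volume_count: int, server_count: int) -> bool:
    """True if the volume count is a multiple of the server count."""
    if volume_count <= 0 or server_count <= 0:
        raise ValueError("volume_count and server_count must be positive")
    return volume_count % server_count == 0


def volume_counts_to_test(server_count: int,
                          per_server_options: Sequence[int] = (1, 2, 8)) -> List[int]:
    """Total volume counts to benchmark, each a multiple of the server count."""
    return [server_count * per_server for per_server in per_server_options]


def cache_covers_working_set(cache_tb: float, working_set_tb: float,
                             headroom: float = 0.0) -> bool:
    """True if the cache holds the working set plus optional growth headroom.

    Assumption: headroom is a hypothetical fraction (0.2 means 20% growth).
    """
    return cache_tb >= working_set_tb * (1.0 + headroom)


if __name__ == "__main__":
    servers = 4
    for volumes in (3, 4, 5):
        print(f"{volumes} volumes on {servers} servers balanced:",
              is_balanced_volume_count(volumes, servers))
    print("Totals to benchmark on a two-node cluster:", volume_counts_to_test(2))
    # Illustrative sizes only; these are not figures from the testing.
    print("Cache covers working set:", cache_covers_working_set(4.0, 3.5, headroom=0.2))
```

Because the best-performing volume count in testing differed from the simplest multiple of the server count, a helper like `volume_counts_to_test` is best used to pick candidates for a Jetstress run, not to choose a final layout.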

Where to get more info

Microsoft Azure Stack HCI on HPE Apollo 4200 Gen10 server is a new solution that addresses the growing needs of the Big Data HCI customer – those who are looking for an easy-to-deploy and affordable IT infrastructure with the right balance of capacity, density, performance, and security.  Early work with this solution, especially where it’s being combined with demanding and data intensive workloads, can create non-intuitive configuration requirements, so IT teams should seek out experienced vendors and service partners.  

A new deployment guide details solution components, installation, management and related best practices. Information in that document, along with this blog, and future sizing tools expected out from HPE, will continue to provide guidance for enterprise deployments of this new HCI offering.

The deployment guide is available online today at this link: <link to Deployment Guide>
