Azure Stack HCI - Part IV - Scale-out Azure Stack HCI Single Node Cluster

This is a series of blogs about my experiences with Azure Stack HCI

Azure Stack HCI - Part IV - Scale-out Azure Stack HCI Single Node Cluster

In this blog series I plan to blog about everything I know about Azure Stack HCI. So, it should be a very short blog series. Just kidding. Again, I tend to blog about subjects that I am currently working on or will be currently working on. So, Azure Stack HCI is fresh on my mind again these days.

This blog will be about scaling out a single-node Azure Stack HCI cluster up to a two-node cluster. My previous blog, Azure Stack HCI - Part III - Advance Deployment of A Single Node Cluster discussed my experiences deploying a single-node cluster. There is a lot of information about my environment, how I setup my lab, and more that would be good to know before continuing with this blog. However, I will try and include as much as I can within this blog to make sense of everything.

So this blog will cover the following:

Remember, I am building this solution on a Single Node cluster and I will try to explain how to do it not only on an AzureVM but try to explain how it would be done on physical hardware if that process is different. Since I do have plans to scale out this cluster, I will keep that in mind as I deploy this first node as well. Knowing that I will add a second and a third and a fourth node in the future.

Review Of Previous Blog

What did I do in my last blog? Good question. First, most of what I did I learned from a few others including Microsoft Learn documentation on Azure Stack HCI. My environment is deployed on an Azure VM using theHybrid Jump Start was created and being worked on by Matt McSpirit. The Hybrid Jump Start is built with Jaromir Kaspar MSLab project. Also, a lot of my scripts I am using comes from Jaromir’s Azure Stack HOLs Github repo. There is also a Slack Channel dedicated to Azure Stack HCI that I pull some information from as well.

I have an Azure VM running with Hyper-V. Within that AzureVM I have 6 VM’s. A domain controller, a Windows Admin Center Gateway VM, and 4 VM’s running the Azure Stack HCI OS. All the systems are joined to a demo domain. The only server, AzSHCI01 is configured and joined to my AzSHCI-Cluster that was deployed in my last blog. I am using Network ATC with only one intent configuration, which currently only uses 2 of the 4 virtual nics on that virtual machine. The intent also is only configured to use Compute and Management traffic. Since this is a single node, I didn’t need to configure the SMB nics, as well as a Witness.

So this blog will continue where we left off. We have a working single-node Azure Stack HCI cluster. However, we now want to scale that cluster from one-node to a two-node cluster.

What it looks like now

The existing environment currently looks like this. From within the Failover Cluster Manager we can see that our cluster is running with our single-node, and a single cluster network for management and compute.

Here we can see that our cluster has been registered to Azure and is being managed by Azure Arc.

Our single node is showing up from within the Windows Admin Center.

Our two volumes are up and healthy.

If I go into Azure Arc, I can see that my cluster is registered and connected.

As well as my single-node, is registered and connected.

So now that we are finished with my environment. It is time to scale our this single-node Azure Stack HCI cluster and add a second node.


The Variables

As I mentioned above, a good chunk of my scripts come from others. Mostly from Jaromir’s Azure Stack Hands On Lab Github repo. I have taken them re-configured them a little to work in my environment both on a virtual machine and moving forward something I can use for automation on bare metal deployments.

Since this is going to be focused on getting my 2nd node configured and then joined to my cluster I have focused these variables knowing this.

#Please fill in variable with node name: example: $ClusterNode = "AzSHCI1"
$ClusterNode = "AzSHCI2"

#Existing Cluster Name
$ClusterName = "AzSHCI-Cluster"

#Cluster-Aware-Updating role name
$CAURoleName = "AzSHCI-Cl-CAU"

#Deploy network using Network ATC? https://learn.microsoft.com/en-us/azure-stack/hci/manage/manage-network-atc?tabs=22H2
$NetATC = $True

#Perform Windows update? (for more info visit WU Scenario https://github.com/microsoft/WSLab/tree/dev/Scenarios/Windows%20Update)
$WindowsUpdate = "Recommended" #Can be "All","Recommended" or "None"

#Dell updates
#If running on physical Dell servers change this to $True
$DellUpdates = $False

#Witness type
$WitnessType = "FileShare" #or Cloud
$WitnessServer = "DC" #name of server where witness will be configured

#For Cloud  witness
    $CloudWitnessStorageAccountName="MyStorageAccountName"
    $CloudWitnessStorageKey="qi8QB/VSHHiA9lSvz1kEIEt0JxIucPL3l99nRHhkp+n1Lpabu4Ydi7Ih192A4VW42vccIgUnrXxxxxxxxxxxxx=="
    $CloudWitnessEndpoint="core.windows.net"
    #>

#Delete Storage Pool (like after reinstall there might be data left from old cluster)
$DeletePool = $True

Configuring the second node

So let the fun begin! I am ready to start configuring my second node that will be added to my single-node Azure Stack HCI cluster. The following steps is what we took to get the node configured and ready to join the cluster.

Run Azure Stack HCI Environment Checker

Here we will run the Azure Stack HCI Environment checker. I don’t think it is necessary in my virtualized environment but I will run it anyway. This tool will run some test on each of our servers that we will be joining to this Azure Stack HCI Cluster.

Install-PackageProvider -Name NuGet -Force
Install-Module -Name AzStackHci.EnvironmentChecker -Force -AllowClobber

$PSSessions = New-PSSession $ClusterNode
Invoke-AzStackHciConnectivityValidation -PsSession $PSSessions

Update Server

Now we will run Windows Update to make sure we have the most recent updates installed.

$RegistryPath = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\'
$ComputersInfo = Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
    Get-ItemProperty -Path $using:RegistryPath
}
$ComputersInfo | Select-Object PSComputerName, CurrentBuildNumber, UBR

#Update servers
if ($WindowsUpdate -eq "Recommended") {
    #Create virtual account to be able to run command without credssp
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        New-PSSessionConfigurationFile -RunAsVirtualAccount -Path $env:TEMP\VirtualAccount.pssc
        Register-PSSessionConfiguration -Name 'VirtualAccount' -Path $env:TEMP\VirtualAccount.pssc -Force
    } -ErrorAction Ignore
    #sleep a bit
    Start-Sleep 2
    # Run Windows Update via ComObject.
    Invoke-Command -ComputerName $ClusterNode -ConfigurationName 'VirtualAccount' {
        $Searcher = New-Object -ComObject Microsoft.Update.Searcher
        $SearchCriteriaAllUpdates = "IsInstalled=0 and DeploymentAction='Installation' or
                                    IsPresent=1 and DeploymentAction='Uninstallation' or
                                    IsInstalled=1 and DeploymentAction='Installation' and RebootRequired=1 or
                                    IsInstalled=0 and DeploymentAction='Uninstallation' and RebootRequired=1"
        $SearchResult = $Searcher.Search($SearchCriteriaAllUpdates).Updates
        if ($SearchResult.Count -gt 0) {
            $Session = New-Object -ComObject Microsoft.Update.Session
            $Downloader = $Session.CreateUpdateDownloader()
            $Downloader.Updates = $SearchResult
            $Downloader.Download()
            $Installer = New-Object -ComObject Microsoft.Update.Installer
            $Installer.Updates = $SearchResult
            $Result = $Installer.Install()
            $Result
        }
    }
    #remove temporary PSsession config
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Unregister-PSSessionConfiguration -Name 'VirtualAccount'
        Remove-Item -Path $env:TEMP\VirtualAccount.pssc
    }
}
elseif ($WindowsUpdate -eq "All") {
    # Update servers with all updates (including preview)
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        New-PSSessionConfigurationFile -RunAsVirtualAccount -Path $env:TEMP\VirtualAccount.pssc
        Register-PSSessionConfiguration -Name 'VirtualAccount' -Path $env:TEMP\VirtualAccount.pssc -Force
    } -ErrorAction Ignore
    #sleep a bit
    Start-Sleep 2
    # Run Windows Update via ComObject.
    Invoke-Command -ComputerName $ClusterNode -ConfigurationName 'VirtualAccount' {
        $Searcher = New-Object -ComObject Microsoft.Update.Searcher
        $SearchCriteriaAllUpdates = "IsInstalled=0 and DeploymentAction='Installation' or
                                    IsInstalled=0 and DeploymentAction='OptionalInstallation' or
                                    IsPresent=1 and DeploymentAction='Uninstallation' or
                                    IsInstalled=1 and DeploymentAction='Installation' and RebootRequired=1 or
                                    IsInstalled=0 and DeploymentAction='Uninstallation' and RebootRequired=1"
        $SearchResult = $Searcher.Search($SearchCriteriaAllUpdates).Updates
        if ($SearchResult.Count -gt 0) {
            $Session = New-Object -ComObject Microsoft.Update.Session
            $Downloader = $Session.CreateUpdateDownloader()
            $Downloader.Updates = $SearchResult
            $Downloader.Download()
            $Installer = New-Object -ComObject Microsoft.Update.Installer
            $Installer.Updates = $SearchResult
            $Result = $Installer.Install()
            $Result
        }
    }
    #remove temporary PSsession config
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Unregister-PSSessionConfiguration -Name 'VirtualAccount'
        Remove-Item -Path $env:TEMP\VirtualAccount.pssc
    }
}

Install Windows Features

Here we will install the Windows Features needed on our second node getting it ready to join the cluster. We will also run a script to make sure our management server has all the tools installed as well.

The first step is to make sure our management server is ready. I will install the needed Windows Features that will come into play during our deployment and management of Azure Stack HCI.

Install-WindowsFeature -Name RSAT-Clustering, RSAT-Clustering-Mgmt, RSAT-Clustering-PowerShell, RSAT-Hyper-V-Tools, RSAT-Feature-Tools-BitLocker-BdeAducExt, RSAT-Storage-Replica

Next we will install the Hyper-V role on our future nodes. Since I am using a virtual machine, I needed to enabled nested virtulization on my VM. The good thing is, the Hybird Jump Start solution does that for me.

#install roles and features on servers
#install Hyper-V using DISM if Install-WindowsFeature fails (if nested virtualization is not enabled install-windowsfeature fails)
Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
    $Result = Install-WindowsFeature -Name "Hyper-V" -ErrorAction SilentlyContinue
    if ($result.ExitCode -eq "failed") {
        Enable-WindowsOptionalFeature -FeatureName Microsoft-Hyper-V -Online -NoRestart 
    }
}

Now we need to install all the required Windows features on the node that I will be using.

#define and install other features
$features = "Failover-Clustering", "RSAT-Clustering-PowerShell", "Hyper-V-PowerShell", "NetworkATC", "NetworkHUD", "Data-Center-Bridging", "RSAT-DataCenterBridging-LLDP-Tools", "FS-SMBBW", "System-Insights", "RSAT-System-Insights"
#optional - affects perf even if not enabled on volumes as filter driver is attached (SR,Dedup) and also Bitlocker, that affects a little bit
#$features+="Storage-Replica","RSAT-Storage-Replica","FS-Data-Deduplication","BitLocker","RSAT-Feature-Tools-BitLocker"
Invoke-Command -ComputerName $ClusterNode -ScriptBlock { Install-WindowsFeature -Name $using:features }

Restart Servers

I added a restart here. We will be restarting the servers again shortly but since this is a VM it boots fast. If this was a physical box I would wait to reboot until after the Security configurations are done.

Restart-Computer $ClusterNode -Protocol WSMan -Wait -For PowerShell -Force
Start-Sleep 20 # Allow time for reboots to complete fully
Foreach ($Server in $ClusterNode) {
    do { $Test = Test-NetConnection -ComputerName $Server -CommonTCPPort WINRM }while ($test.TcpTestSucceeded -eq $False)
}

Configure OS Settings

There are a few configurations that I will do along the way for the OS.

Configure Active Memory

Here I will configure Active Memory.

Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
    Set-ItemProperty -Path HKLM:\System\CurrentControlSet\Control\CrashControl -Name CrashDumpEnabled -value 1
    Set-ItemProperty -Path HKLM:\System\CurrentControlSet\Control\CrashControl -Name FilterPages -value 1
}

Configure High Performance Plan

If this was a physical machine we would set the high performance plan. This is not needed on a VM.

Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
    if ((Get-ComputerInfo).CsSystemFamily -ne "Virtual Machine") {
        powercfg /SetActive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c
    }
}

Let’s check the settings now:

Invoke-Command -ComputerName $ClusterNode -ScriptBlock { powercfg /list }

Delete Storage Pool

So in my previous blog I discussed a little about how dangerous a script like this can be. So be careful.

if ($DeletePool) {
    #Grab pools
    $StoragePools = Get-StoragePool -CimSession $ClusterNode -IsPrimordial $False -ErrorAction Ignore
    #remove pools if any
    if ($StoragePools) {
        $StoragePools | Remove-StoragePool -Confirm:0
    }
    #Reset disks (to clear spaces metadata)
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Get-PhysicalDisk -CanPool $True | Reset-PhysicalDisk
    }
}

Configure Max Envelope Size

Now we will configure our Max Envelope size. This changes the max envelope size to be 8KB in order to copy files using PSSession.

Invoke-Command -ComputerName $ClusterNode -ScriptBlock { Set-Item -Path WSMan:\localhost\MaxEnvelopeSizekb -Value 8192 }

Configure MaxTimeOut

if ((Get-CimInstance -ClassName win32_computersystem -CimSession $ClusterNode).Manufacturer -like "*Dell Inc.") {
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock { Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Services\spaceport\Parameters -Name HwTimeout -Value 0x00002710 }
}
if ((Get-CimInstance -ClassName win32_computersystem -CimSession $ClusterNode).Model -eq "Virtual Machine") {
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock { Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Services\spaceport\Parameters -Name HwTimeout -Value 0x00007530 }
}

Rename Network Adapters

I did this for my systems. It isn’t necessary but I do it.

Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
    if ((Get-ComputerInfo).CsSystemFamily -eq "Virtual Machine") {
        Rename-NetAdapter -Name "Ethernet" -NewName "MGT01"
        Rename-NetAdapter -Name "Ethernet 2" -NewName "MGMT02"
        Rename-NetAdapter -Name "Ethernet 3" -NewName "SMB01"
        Rename-NetAdapter -Name "Ethernet 4" -NewName "SMB02"
    }
}

#Verify Network Adapters
Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
    Get-NetAdapter
}

Configure OS Security

Time to do some security configurations. In my lab environment running on a VM I wouldn’t need to run these. Some of them I would if I knew I was keeping this environment up for training or other reasons.

 Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
    #Device Guard
    #REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard" /v "Locked" /t REG_DWORD /d 1 /f 
    REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard" /v "EnableVirtualizationBasedSecurity" /t REG_DWORD /d 1 /f
    #there s different setting for VM and Bare metal
    if ((Get-CimInstance -ClassName win32_computersystem).Model -eq "Virtual Machine") {
        REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard" /v "RequirePlatformSecurityFeatures" /t REG_DWORD /d 1 /f
    }
    else {
        REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard" /v "RequirePlatformSecurityFeatures" /t REG_DWORD /d 3 /f
    }
    REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard" /v "RequireMicrosoftSignedBootChain" /t REG_DWORD /d 1 /f

    #Cred Guard
    REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\Lsa" /v "LsaCfgFlags" /t REG_DWORD /d 1 /f

    #System Guard Secure Launch (bare meta only)
    #https://docs.microsoft.com/en-us/windows/security/threat-protection/windows-defender-system-guard/system-guard-secure-launch-and-smm-protection
    REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\SystemGuard" /v "Enabled" /t REG_DWORD /d 1 /f

    #HVCI
    REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\HypervisorEnforcedCodeIntegrity" /v "Enabled" /t REG_DWORD /d 1 /f
    #REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\HypervisorEnforcedCodeIntegrity" /v "Locked" /t REG_DWORD /d 1 /f
    REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\HypervisorEnforcedCodeIntegrity" /v "HVCIMATRequired" /t REG_DWORD /d 1 /f
}

Restart Computer

Let us reboot to make sure all these changes will take and get our node ready to configure the cluster.

Restart-Computer $ClusterNode -Protocol WSMan -Wait -For PowerShell -Force
Start-Sleep 20 #Failsafe as Hyper-V needs 2 reboots and sometimes it happens, that during the first reboot the restart-computer evaluates the machine is up
#make sure computers are restarted
Foreach ($Server in $ClusterNode) {
    do { $Test = Test-NetConnection -ComputerName $Server -CommonTCPPort WINRM }while ($test.TcpTestSucceeded -eq $False)
}

Configure Networking with Network ATC

Network ATC is a tool to assist with the configuration of our Azure Stack HCI networking. Dealing with networking in the past could be complex and like me, would always be an error-prone process. Network ATC will help us with:

  • Reduce host networking deployment time, complexity, and errors
  • Deploy the latest Microsoft validated and supported best practices
  • Ensure configuration consistency across the cluster
  • Eliminate configuration drift

Install Features (If Needed)

Since I have used my management machine in the past and most likely have already had the features installed I wouldn’t necessarily need to install them again. Just in case I ran the following PowerShell script.

    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Install-WindowsFeature -Name NetworkATC, Data-Center-Bridging, RSAT-Clustering-PowerShell, RSAT-Hyper-V-Tools, FS-SMBBW
    }

Configure Management Server for ATC Management

Next we will copy over the NetworkATC modules to our management machine in order to be able to manage Network ATC from our management server.

    $session = New-PSSession -ComputerName $ClusterNode
    $items = "C:\Windows\System32\WindowsPowerShell\v1.0\Modules\NetworkATC", "C:\Windows\System32\NetworkAtc.Driver.dll", "C:\Windows\System32\Newtonsoft.Json.dll", "C:\Windows\System32\NetworkAtcFeatureStaging.dll"
    foreach ($item in $items) {
        Copy-Item -FromSession $session -Path $item -Destination $item -Recurse -Force
    }

Deploy new Intent Configurations

This section I will deploy my Storage Intent configuration. This will target the ClusterName so it will actually be running on AzsHCI01, my first node. This will get our cluster ready to have two network configurations. One for Compute/Management and then a separate switch for Storage traffic only.

So here is an example of the configuration that was deployed in my first blog. We have 4 virtual nics. We are using nic01 and nic02 for our Mgmt & Compute Team that was configured with our ConvergedIntent configuration.

What we are about to do is deploy another Intent configuration that will add a Storage Intent. The only thing that is different between my deployment and the image below is my compute and management are on one Intent configuration, which in turn is a separate virtual switch from my storage intent and virtual switch.

Here is the PowerShell that will allow me to deploy our configurations.

#if virtual environment, then skip RDMA config
    if ((Get-CimInstance -ClassName win32_computersystem -CimSession $ClusterNode).Model -eq "Virtual Machine") {
        Import-Module NetworkATC
        #virtual environment (skipping RDMA config)
        $AdapterOverride = New-NetIntentAdapterPropertyOverrides
        $AdapterOverride.NetworkDirect = 0
        Add-NetIntent -ClusterName $ClusterName -Name StorageIntent -Storage -AdapterName "SMB01", "SMB02" -AdapterPropertyOverrides $AdapterOverride -Verbose
    }
    else {
        #on real hardware you can configure RDMA
        #grab fastest adapters names (assuming that we are deploying converged intent with just Mellanox or Intel E810)
        $FastestLinkSpeed = (get-netadapter -CimSession $ClusterNode | Where-Object { $_.Status -eq "up" -and $_.HardwareInterface -eq $True }).Speed | Sort-Object -Descending | Select-Object -First 1
        #grab adapters
        $AdapterNames = (Get-NetAdapter -CimSession $ClusterNode | Where-Object { $_.Status -eq "up" -and $_.HardwareInterface -eq $True } | where-object Speed -eq $FastestLinkSpeed | Sort-Object Name).Name
        #$AdapterNames="SLOT 3 Port 1","SLOT 3 Port 2"
        Import-Module NetworkATC
        Add-NetIntent -ClusterName $ClusterName -Name StorageIntent -Storage -AdapterName "SMB01", "SMB02" -AdapterPropertyOverrides $AdapterOverride -Verbose
    }

We will add the default global intent manually since we are running this from the management server. I will be honest, I ony do this because I have seen it done with others.

    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Import-Module NetworkATC
        $overrides = New-NetIntentGlobalClusterOverrides
        #add empty intent
        Add-NetIntent -GlobalClusterOverrides $overrides
    }

    #check
    Start-Sleep 20 #let intent propagate a bit
    Write-Output "applying intent"
    do {
        $status = Invoke-Command -ComputerName $ClusterName -ScriptBlock { Get-NetIntentStatus }
        Write-Host "." -NoNewline
        Start-Sleep 5
    } while ($status.ConfigurationStatus -contains "Provisioning" -or $status.ConfigurationStatus -contains "Retrying")

Now if we go to our Failover Cluster Manager we should be able to see the three switches now. The first is our converged management network for compute and management traffic The other two are our SMB traffic on two different vlans. This may take some time. Once this is complete then I will move forward with adding this node to the cluster.


Configure Cluster

Nice, we are almost ready to join our second node to the cluster. Networking is done and all the needed OS configurations are done. The next step can be completed after the cluster is joined but I do it right before.

Configure Witness

This is where we will create our Witness for our cluster. Since before it was only a single-node cluster we did not need a witness… (Can I get a Witness!) Anyway, for this lab I will create Fileshare witness on my domain controller. This is just a lab built to write this blog so please don’t do this in any solution other than that. Either have a real file server that can be used or go with a Cloud Witness.

if ($WitnessType -eq "FileShare") {
    ##Configure Witness on WitnessServer
    #Create new directory
    $WitnessName = $Clustername + "Witness"
    Invoke-Command -ComputerName $WitnessServer -ScriptBlock { new-item -Path c:\Shares -Name $using:WitnessName -ItemType Directory -ErrorAction Ignore }
    $accounts = @()
    $accounts += "$env:userdomain\$ClusterName$"
    $accounts += "$env:userdomain\$env:USERNAME"
    #$accounts+="$env:userdomain\Domain Admins"
    New-SmbShare -Name $WitnessName -Path "c:\Shares\$WitnessName" -FullAccess $accounts -CimSession $WitnessServer
    #Set NTFS permissions
    Invoke-Command -ComputerName $WitnessServer -ScriptBlock { (Get-SmbShare $using:WitnessName).PresetPathAcl | Set-Acl }
    #Set Quorum
    Set-ClusterQuorum -Cluster $ClusterName -FileShareWitness "\\$WitnessServer\$WitnessName"
}
elseif ($WitnessType -eq $Cloud) {
    Set-ClusterQuorum -Cluster $ClusterName -CloudWitness -AccountName $CloudWitnessStorageAccountName -AccessKey $CloudWitnessStorageKey -Endpoint $CloudWitnessEndpoint
}

Add Node To Cluster

Now we can add the second node to the cluster. Very simple, the following PowerShell command is below. We could also do this from the Windows Admin Center as well.

Add-ClusterNode -Name $ClusterNode -Cluster $ClusterName

And after about 5 minutes we can start verifying that the cluster was added successfully. I am going to check first in the failover cluster manager. Here I can see my 2nd node is up and running.

Next I will go into the Windows Admin Center to check on my cluster settings and see that the node has joined successfully. As you can see below, I do have an issue but it is expected. It will take a few minutes for my newly added node to get registered with Azure Arc-enabled servers. This will go green within the next 15 minutes or so.

Then from within the Cluster I can click on Servers and see that both servers are up and running.

Here we can check that our Network intent configurations have been provisioned successfully across all our nodes.

Check Network ATC settings

We can also check Failover Cluster manager to make sure that all our cluster networks are correctly configured. This make take some time so be patient.

One of the last configurations for our cluster would be to disabled CSV cache if using SCM storage or if this is running on a virtual machine.

if (Get-PhysicalDisk -cimsession $ClusterNode | Where-Object bustype -eq SCM) {
    #disable CSV cache if SCM storage is used
        (Get-Cluster $ClusterName).BlockCacheSize = 0
}
elseif ((Invoke-Command -ComputerName $ClusterNode -ScriptBlock { (get-wmiobject win32_computersystem).Model }) -eq "Virtual Machine") {
    #disable CSV cache for virtual environments
        (Get-Cluster $ClusterName).BlockCacheSize = 0
}

We are not done yet. There are some more network settings that have to be configured, storage settings, and we also need to enabled cluster aware updating and a few other things.

Install Network HUD (NetATC)

Here we will install Network HUd on the second node. This was already installed on the first node previously.

Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Install-WindowsFeature -Name "NetworkHUD", "Hyper-V", "Hyper-V-PowerShell", "Data-Center-Bridging", "RSAT-DataCenterBridging-LLDP-Tools", "NetworkATC", "Failover-Clustering"

$Modules = "Test-NetStack", "az.stackhci.networkhud"
    Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force
    foreach ($Module in $Modules) {
        #download module to management node
        Save-Module -Name $Module -Path $env:Userprofile\downloads\
        #copy it to servers
        foreach ($Server in $ClusterNode) {
            Copy-Item -Path "$env:Userprofile\downloads\$module" -Destination "\\$Server\C$\Program Files\WindowsPowerShell\Modules\" -Recurse -Force
        }
    }

    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Restart-Service NetworkHUD
    }
    #wait a bit
    Start-Sleep 10

    #check event logs
    $events = Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Get-WinEvent -FilterHashtable @{"ProviderName" = "Microsoft-Windows-Networking-NetworkHUD"; Id = 105 }
    }
    $events | Format-Table -AutoSize

Complete NetATC Configurations

The finally steps to finish our network configuration using Network ATC. Here we will install and configure Data Center Bridging if needed. Then go through all the settings to see if they are configured correctly.

This step we will disable network adapters that we will not be using. Since this was a virtual environment and not physical I didn’t have any network adapters that needed to be disabled.

 Get-Netadapter -CimSession $ClusterNode | Where-Object Status -ne "Up" | Disable-NetAdapter -Confirm:0

Here is where I will configure Data center bridging.

    if ((Get-CimInstance -ClassName win32_computersystem -CimSession $ClusterNode).Manufacturer -like "*Dell Inc.") {
        if (Get-NetAdapter -CimSession $ClusterNode -InterfaceDescription Mellanox*) {
            Set-NetAdapterAdvancedProperty -CimSession $ClusterNode -InterfaceDescription Mellanox* -DisplayName 'Dcbxmode' -DisplayValue 'Host in charge'
        }
    }
    #configure larger receive buffers on Mellanox adapters
    if ((Get-CimInstance -ClassName win32_computersystem -CimSession $ClusterNode).Manufacturer -like "*Dell Inc.") {
        if (Get-NetAdapter -CimSession $ClusterNode -InterfaceDescription Mellanox*) {
            Set-NetAdapterAdvancedProperty -CimSession $ClusterNode -InterfaceDescription Mellanox* -DisplayName 'Receive buffers' -DisplayValue '4096'
        }
    }

Check NetATC Configurations

Live Migration settings. This is where we would see what networks were exuded from Live Migration.

    $Networks = (Get-ClusterResourceType -Cluster $clustername -Name "Virtual Machine" | Get-ClusterParameter -Name MigrationExcludeNetworks).Value -split ";"
    foreach ($Network in $Networks) { Get-ClusterNetwork -Cluster $ClusterName | Where-Object ID -Match $Network }

I do want to call out something here that I noticed is happening. For some reason my Cluster Network is being renamed during the process I am following. I am going to find out why or how or what is doing this.

Check Live Migration options. Make sure it is configured to SMB.

Get-VMHost -CimSession $ClusterNode | Select-Object *Migration*

Check SMB Bandwidth settings limit.

 Get-Cluster -Name $ClusterName | Select-Object *SMB*
Get-SmbBandwidthLimit -CimSession $ClusterNode

Check VLAN settings

Get-VMNetworkAdapterIsolation -CimSession $ClusterNode -ManagementOS

CHeck number of live migrations

get-vmhost -CimSession $ClusterNode | Select-Object Name, MaximumVirtualMachineMigrations

Check Cluster Overrides in Global Overrides

    Invoke-Command -ComputerName $ClusterName -ScriptBlock {
        Import-Module NetworkATC
        $GlobalOverrides = Get-Netintent -GlobalOverrides
        $GlobalOverrides.ClusterOverride
    }

Enable Cluster-Aware-Updating

Now we can enable cluster-aware-updating. This can be done again in the Windows Admin Center but I will do it via PowerShell.

if ($CAURoleName) {
    #Install required features on nodes.
    Invoke-Command -ComputerName $ClusterNode -ScriptBlock {
        Install-WindowsFeature -Name RSAT-Clustering-PowerShell
    }
    #add role
    Add-CauClusterRole -ClusterName $ClusterName -MaxFailedNodes 0 -RequireAllNodesOnline -EnableFirewallRules -GroupName $CAURoleName -VirtualComputerObjectName $CAURoleName -Force -CauPluginName Microsoft.WindowsUpdatePlugin -MaxRetriesPerNode 3 -CauPluginArguments @{ 'IncludeRecommendedUpdates' = 'False' } -StartDate "3/2/2017 3:00:00 AM" -DaysOfWeek 4 -WeeksOfMonth @(3) -verbose
    #disable self-updating
    Disable-CauClusterRole -ClusterName $ClusterName -Force
}
if ($KSR) {
    #list cluster parameters - as you can see, CauEnableSoftReboot does not exist
    Get-Cluster -Name $ClusterName | Get-ClusterParameter
    #let's create the value and validate
    Get-Cluster -Name $ClusterName | Set-ClusterParameter -Name CauEnableSoftReboot -Value 1 -Create
    Get-Cluster -Name $ClusterName | Get-ClusterParameter -Name CauEnableSoftReboot
    #to delete it again you can run following command
    #Get-Cluster -Name $ClusterName | Set-ClusterParameter -Name CauEnableSoftReboot -Delete
}


Configure Cluster Storage

Storage Fault Domain Changes

#configure storage pool
Set-StoragePool -CimSession $ClusterName -FriendlyName "S2D on $ClusterName" -FaultDomainAwarenessDefault StorageScaleUnit
Get-VirtualDisk -CimSession $ClusterName | Select-Object FriendlyName, FaultDomainAwareness
Invoke-Command -ComputerName $clusterName -ScriptBlock { Stop-ClusterPerformanceHistory -DeleteHistory }
#recreate performance history
Start-ClusterPerformanceHistory -CimSession $ClusterName
Get-VirtualDisk -CimSession $ClusterName | Select-Object FriendlyName, FaultDomainAwareness

Create Volumes

Get-StoragePool -FriendlyName "S2D on $ClusterName" -CimSession $ClusterName | Set-StoragePool -ProvisioningTypeDefault Thin
#create 1TB volume on each node
foreach ($Server in $ClusterNode) {
    New-Volume -StoragePoolFriendlyName  "S2D on $ClusterName" -FriendlyName $Server -Size 1TB -CimSession $ClusterName
}

Align Volumes

#align volumes ownership to with servers
foreach ($Server in $ClusterNode) {
    Move-ClusterSharedVolume -Name "Cluster Virtual Disk ($Server)" -Node $Server -Cluster $ClusterName
}

Verify and Testify?

So, at this time I have a fully functional two-node Azure Stack HCI cluster ready to have workloads deployed to it. At this point we can go through and verify everything is running. That both nodes are connected, that our networks are up and running, and that all our small configurations have been applied. We can verify that our nodes show up in Azure as Arc-enabled servers, and that our Azure Stack HCI cluster is also being managed and is registered with Azure. We should verify our storage is working and that all our volumes are good for workload placements. I will even try to do an update via the Update manager from within the Windows Admin Center to make sure everything there is working as well.

There are handful of other settings that can be configured. For instance, configuring fault domains for the new node, configuring the default location for VM’s on our volumes, etc. But for now, this is where the blog shall end.


Final Thoughts

So, I do want to stress. This deployment I just walked through is not production ready. It will be good for getting your feet wet with Azure Stack HCI. For most of use we don’t have a lot of hardware that would be supported for HCI sitting around. We may have a few older servers that we can use, even deploy this same solution on that single physical server to have a multi-node solution, or use an AzureVM which can be very expensive. This blog I hope helped someone get an idea of that they would need to do to get started. Moving forward, I will blog about scaling out this single node cluster to a two node cluster and then to a three and four node cluster after that.

I also didn’t cover how you would cable a two node cluster. THere are a few different ways that it can be done. We could go switchless for compute and storage or introduce a top of the rack switch.

Also, this blog just touches on the deployment of an Azure Stack HCI cluster. There is so much more and we can really go into depth of what Azure Stack HCI can do along with Azure Arc. I will be blogging a lot on these subjects moving forward.