Quite a while ago, a customer of mine ran into issues with their primary AAD Connect node (they had a staging server, don’t worry). The server wasn’t actually down; it was located in a datacenter that had lost Internet connectivity.
As a result, the AAD Connect server wasn’t updating Azure.
Worried that changes in AD were no longer being replicated to Azure, my customer wanted to turn off staging mode on their other AAD Connect server, but this raised the question: “How can we prevent the newly promoted sync server from exporting to Azure once the old server comes back online? Especially if that happens in the middle of the night…”
Well, there’s no good answer. At least there wasn’t. Here are some of the things I kicked around:
- Don’t turn off staging on the secondary server, and hope the outage is short-lived. This, however, pretty much runs contrary to the whole purpose of the staging server.
- Disable the service account used by the offline server’s AD connector, create a whole new (working) one in AD for the staging server, and then disable the offline server’s Sync_ServerName_GUID account in Azure. This would ensure that the dead server, when resurrected, couldn’t read from or write to AD or Azure. It’s not very elegant, it’s confusing, and quite frankly it’s a kluge that advertises the fact that there’s no good answer.
- Disable staging on the secondary server and, in the event that the primary comes back online and both are exporting to Azure, hope that someone catches it quickly. If the servers are identically configured, this is probably a non-issue, even if unsupported. That said, if a datacenter is down for a long period of time, the DC that AAD Connect uses in that location will not be current, and DC replication issues are likely. If things go sideways, it’s an unsupported configuration and there will be unhappy campers.
Obviously none of these options worked well, nor were they automatic. That got me thinking about the file share witness used by Windows Server failover clusters, and I wondered if I could do the same thing with the AAD Connect scheduler, substituting a generic account in Azure for the file share witness.
Many hours and a lot of testing later, I built the AADC Load Balancing PowerShell module…
How it works
In a nutshell, the AADC Load Balancer module contains several cmdlets that connect to Azure and read and update the value of an attribute from a generic cloud account. The attribute contains the name of the “preferred” AAD Connect server (the one that’s usually active), a timestamp set by the active server with the last time exports were started, and the currently “active” server’s name (since it might not be the preferred server if there was a fail-over, but normally they’re the same server).
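To make the idea of packing three values into one attribute concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the module's PowerShell code. The pipe-delimited layout, the `WitnessRecord` class, and the node names are all assumptions; the post does not say how the module actually serializes the values. Only the timestamp pattern (yyyyMMddTHHmmss) comes from the cmdlet descriptions later in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Python equivalent of the yyyyMMddTHHmmss pattern mentioned for Convert-StringToDate.
TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"


@dataclass
class WitnessRecord:
    """Hypothetical in-memory view of the values stored on the cloud account."""
    preferred_node: str
    last_sync: datetime
    active_node: str

    def to_field(self) -> str:
        # Assumed layout: preferred|timestamp|active (the real layout is not documented here).
        stamp = self.last_sync.strftime(TIMESTAMP_FORMAT)
        return "|".join([self.preferred_node, stamp, self.active_node])

    @classmethod
    def from_field(cls, value: str) -> "WitnessRecord":
        preferred, stamp, active = value.split("|")
        # Treat the stored value as UTC (an assumption for this sketch).
        last_sync = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
        return cls(preferred, last_sync, active)


record = WitnessRecord("NODE-A", datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc), "NODE-A")
field = record.to_field()
print(field)  # NODE-A|20240115T093000|NODE-A
```

Reading the value back with `WitnessRecord.from_field(field)` yields an equal record, which is all a node needs in order to compare its own name against the preferred and active nodes.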
The module then compares the name of the server it’s running on against the active and preferred nodes, and evaluates the timestamp to determine whether sync has been offline for a predetermined length of time. If sync is within acceptable parameters, the active server updates the timestamp and nothing else changes. However, if the timestamp indicates a failure (x hours since the last sync – a value you define), the current server assumes the active node went down and updates the cloud account, naming itself the “active” node and updating the timestamp.
When a failover occurs, the staging server takes over as active and exports to Azure until the old active server returns to life. At that point the old server checks the cloud and, seeing that the “staging” server is active, updates the timestamp and active node values, then takes over exports on its next cycle, while the staging server returns to staging mode.
That description made perfect sense to me, but let’s walk through the whole process step by step…
Normal operation – Node A is the primary active node, Node B is the staging server
- Node A performs Imports and Syncs of AD and Azure
- Node A reads the file share witness account values
- Node A is the preferred node and the active node
- Node A updates the timestamp
- Node A exports to AD and Azure
- Node B performs Imports and Syncs of AD and Azure
- Node B reads the file share witness account values
- Node B is NOT the preferred node
- Node B checks the timestamp, it’s been less than 3 hours since it was updated
- Node B does NOT update the file share witness account values
- Node B does NOT export to AD or Azure
Outage Occurs – Node A was the primary active node, Node B is the staging server, but Node A is offline
- Node A is down
- Node B performs Imports and Syncs of AD and Azure
- Node B reads the file share witness account values
- Node B is NOT the preferred node
- Node B checks the timestamp, it’s been more than 3 hours since it was updated, meaning Node A is offline
- Node B updates the file share witness account values: the preferred node value remains Node A, but the active node is set to Node B
- Node B exports to AD and Azure
Service Restored – Node A was offline, Node B is the active node, but Node A was restored
- Node A performs Imports and Syncs of AD and Azure
- Node A reads the file share witness account values
- Node A is the preferred node but NOT the active node
- Node A updates the timestamp and Active Node values but does NOT export this cycle, just in case Node B is mid-export
- Node B performs Imports and Syncs of AD and Azure
- Node B reads the file share witness account values
- Node B is neither the preferred node nor the active node
- Node B checks the timestamp, it’s been less than 3 hours since it was updated
- Node B does NOT update the file share witness account values
- Node B does NOT export to AD or Azure
- Node A performs Imports and Syncs of AD and Azure
- Node A reads the file share witness account values
- Node A is the preferred node and the active node
- Node A updates the timestamp
- Node A exports to AD and Azure
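The three scenarios above boil down to one decision made on every scheduler cycle. The sketch below is one reading of that decision in Python, not the module's actual code. The `Witness` class and `run_cycle` function are hypothetical names, the 3-hour threshold is taken from the walkthrough, and the branch where the failover node keeps exporting while it holds the active role is inferred from the description rather than stated step by step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

FAILOVER_AFTER = timedelta(hours=3)  # the "x hours" threshold; 3 matches the walkthrough


@dataclass
class Witness:
    """Hypothetical stand-in for the values held on the witness account."""
    preferred: str
    active: str
    last_sync: datetime


def run_cycle(node: str, w: Witness, now: datetime) -> bool:
    """Update the witness for one cycle; return True if `node` should export."""
    if node == w.preferred:
        was_active = w.active == node
        w.active = node
        w.last_sync = now
        # On the cycle where the preferred node reclaims the role it skips the
        # export, in case the other node is mid-export.
        return was_active
    if node == w.active:
        # Failover node still holding the active role (inferred behavior).
        w.last_sync = now
        return True
    if now - w.last_sync > FAILOVER_AFTER:
        # Timestamp is stale: assume the active node is down and take over.
        w.active = node
        w.last_sync = now
        return True
    return False  # remain in staging, leave the witness untouched


t0 = datetime(2024, 1, 15, 8, 0)
w = Witness(preferred="A", active="A", last_sync=t0)
results = [
    run_cycle("A", w, t0 + timedelta(minutes=30)),            # normal: A exports
    run_cycle("B", w, t0 + timedelta(minutes=31)),            # normal: B stays in staging
    run_cycle("B", w, t0 + timedelta(hours=4)),               # outage: B takes over
    run_cycle("A", w, t0 + timedelta(hours=4, minutes=30)),   # restored: A reclaims, no export yet
    run_cycle("B", w, t0 + timedelta(hours=4, minutes=31)),   # B back in staging
    run_cycle("A", w, t0 + timedelta(hours=5)),               # A exports again
]
print(results)  # [True, False, True, False, False, True]
```

Each entry in `results` corresponds to one cycle from the walkthrough, so the single skipped export when Node A returns shows up as the first `False` after the failover.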
A few more quick notes about the “file share witness” account in Azure.
- The file-share-witness account is named Sync_AADConnectLoadBalancer_DoNotDelete
- It is created automatically by the scheduled task script designed to use this module
- It is a cloud-only account, used strictly by the load balancer
- The password for the account is not exposed, and is updated with a new random password every 21 days by the scheduled task script designed to use this module.
- The account is a user account only, with no special roles or permissions.
- Only the Department field on the account is updated; it holds the preferred node, timestamp and active node values.
Using the AAD Connect Load Balancer Module
It’s important to note that the AAD Connect Load Balancer module is intended for use with a PowerShell script running as a scheduled task designed specifically to use the module.
That script can be found on the TechNet Gallery here, and the blog post which explains its use is located here.
The AAD Connect Load Balancer module provides the following cmdlets:
Get-LoadBalancerAccountID – Retrieves the ObjectID of the Load Balancer (file share witness) account from Azure.
Get-ActiveAADCNode – Retrieves the name of the current Active AAD Connect server from the Load Balancer account.
Get-PreferredNode – Retrieves the name of the Preferred AAD Connect server from the Load Balancer account.
Get-LastSyncTime – Retrieves the time of the last sync performed by the Active node, as a UTC-formatted string, from the Load Balancer account.
Convert-StringToDate – Converts the UTC-formatted string returned by the Get-LastSyncTime cmdlet into a friendly date. A -Format value of “yyyyMMddTHHmmss” should be used.
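For readers who want to see what that timestamp pattern amounts to, here is a small hedged sketch in Python rather than PowerShell. It parses a string in the yyyyMMddTHHmmss layout and works out how many hours have passed since it was written. The function names and the sample value are made up for illustration, and treating the stored value as UTC is an assumption based on the Get-LastSyncTime description.

```python
from datetime import datetime, timezone


def parse_sync_stamp(value: str) -> datetime:
    """Parse a yyyyMMddTHHmmss string; the literal T separates date and time."""
    return datetime.strptime(value, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)


def hours_since(value: str, now: datetime) -> float:
    """Hours elapsed between the stored timestamp and `now`."""
    return (now - parse_sync_stamp(value)).total_seconds() / 3600


# Sample value invented for this example.
now = datetime(2024, 1, 15, 13, 0, tzinfo=timezone.utc)
print(parse_sync_stamp("20240115T093000").isoformat())  # 2024-01-15T09:30:00+00:00
print(hours_since("20240115T093000", now))              # 3.5
```

A result above the failover threshold you define (3 hours in the walkthrough) is what would mark the active node as offline.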
The cmdlets described above are the primary ones provided in the AADC Load Balancing module. The module contains additional cmdlets used by the Load Balancing scheduled task; these are fully documented and can be reviewed with PowerShell help if you require further detail.
Wrapping it all up…
The detail above outlines the theory and background behind the AAD Connect Load Balancing module. To fully implement the process, you will need the following:
- The Credential Vault Module – this module is required so that the account which connects to Azure to read and write load balancer settings is secured using a certificate. The module blog can be found here.
- The Load Balancing Module – this module actually updates the load balancer settings. Without this module, the scheduler script will behave like the normal AAD Connect built-in scheduler. The module download can be found here.
- The AAD Connect Load Balancer-Aware scheduler script – this script can run whether or not the Load Balancer module is installed. If the module is not installed, the script behaves like the AAD Connect built-in scheduler, but with several added features. The script can be found here and its associated blog post here.