In many Citrix Virtual Apps and Desktops environments, there are multiple virtual machines running the Citrix Virtual Delivery Agent (VDA) and streaming from the Citrix Provisioning Service (PVS). Throughout the course of typical business operations, incidents may arise that leave one or more of these machines in an unhealthy state. It is not unusual to have one machine fail for any number of reasons, but sometimes larger issues can result in production outages. Morning health checks mitigate these risks, but performing them manually is time-consuming and prone to human error. The optimal solution is to schedule and automate these checks to save time and preserve server functionality.
Citrix PVS utilizes virtual disks, known as vDisks, that stream to target devices or VMs. Due to the non-persistent nature of Citrix PVS, machines are regularly rebooted to ensure that the write cache is cleared, resetting them to a known-stable state from the gold image. Often, these target devices are rebooted daily to ensure consistent stability.
Additional factors could also lead to availability issues with the VDAs. For example, updates to the Citrix Hypervisor could lead to VMs that don't register correctly, resulting in an “Unknown” power state. This prevents the VDAs from registering to the Delivery Controllers and ultimately leads to an outage, since Citrix sessions cannot be started on these VMs.
Despite the obvious benefits, checking the machine uptime, power and registration state, maintenance mode status, and more for a production environment is not an efficient process. Even the most attentive among us will miss something eventually. As such, below is a script that automates this rather painful sequence of checks. The results are compiled into a report and emailed for review each morning.
Let's ease the burden of daily manual health checks. Citrix Virtual Apps and Desktops has a Software Development Kit (SDK) that can be leveraged to write a PowerShell script to perform many necessary checks. You can run this script from any machine with the SDK installed, which is the default for the Citrix Delivery Controllers in the environment.
Below is an example output of the script. In this example all checks are healthy for the 10 machines being checked.
Setup
1. Add the correct snap-in for Citrix. This will allow the correct commands to be run in the Citrix environment.
Add-PSSnapin Citrix*
2. Create a list of all the machines that need to be checked.
$Machines = Get-BrokerMachine -MaxRecordCount 200| ? {($_.DesktopGroupName -notlike "*-DR*")}| Select HostedMachineName, RegistrationState, PowerState, InMaintenanceMode, LastDeregistrationTime, LoadIndex | Sort HostedMachineName
The $Machines variable will now be a collection of machines in the environment. The -MaxRecordCount parameter caps how many machines the query returns. It can be omitted in environments with up to 250 machines, since 250 is the default maximum. For larger environments, set -MaxRecordCount to a value that covers the number of VDAs being queried.
In this example, some machines are filtered out of the query (those in delivery groups with “-DR” in the name). These machines are in a powered off state unless needed. Checking them would result in a false reading that many machines in the environment are down, so they are removed accordingly. The asterisk is a wildcard character, meaning any delivery group whose name contains “-DR” will be excluded.
The information to be gathered leverages the Select statement specifically filtering in names, registration state, power state, maintenance mode state, the last time it deregistered, and the load index of the chosen machines in the environment. These values are used to tell if the machine is healthy or not.
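The filter, select, and sort pattern from step 2 can be tried without a Citrix environment. The following is a minimal sketch that assumes a few hypothetical sample objects in place of real Get-BrokerMachine output; the machine and delivery group names are invented for illustration.

```powershell
# Hypothetical sample data standing in for Get-BrokerMachine output
$SampleMachines = @(
    [pscustomobject]@{ HostedMachineName = "CTX-Seamless-01"; DesktopGroupName = "Prod-Apps" },
    [pscustomobject]@{ HostedMachineName = "CTX-Desktop-90";  DesktopGroupName = "Prod-Desktops-DR" },
    [pscustomobject]@{ HostedMachineName = "CTX-Desktop-02";  DesktopGroupName = "Prod-Desktops" }
)

# Same pattern as step 2: drop the "-DR" delivery groups, keep selected properties, sort by name
$Filtered = $SampleMachines |
    Where-Object { $_.DesktopGroupName -notlike "*-DR*" } |
    Select-Object HostedMachineName, DesktopGroupName |
    Sort-Object HostedMachineName

# Lists the two remaining machine names in sorted order
$Filtered.HostedMachineName
```

The machine in the “Prod-Desktops-DR” group is dropped, and the remaining two are returned sorted by name.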
3. Instantiate variables and arrays for the logic checks that will be done next.
The count of the machines in certain states starts at 0. The arrays are used for recording the names of all the machines that meet certain criteria. These are the ones that are possibly in an unhealthy state and may need to be addressed.
$OnCount = 0
$RegisteredCount = 0
$MaintCountFalse = 0
$MaintCountTrue = 0
$RestartCount = 0
$offArray = New-Object System.Collections.Generic.List[System.Object]
$unknownArray = New-Object System.Collections.Generic.List[System.Object]
$OtherThanRegisteredArray = New-Object System.Collections.Generic.List[System.Object]
$noRestartArray = New-Object System.Collections.Generic.List[System.Object]
$HighLoadArray = New-Object System.Collections.Generic.List[System.Object]
$DayUptimeArray = New-Object System.Collections.Generic.List[System.Object]
4. Instantiate the expected restart time for the machines.
In the example environment there are machines that supply seamless applications to users and machines that stream Desktops to users. The machines are scheduled to reboot at separate times every night. Here the values are displayed as a range of one hour. This is used to make sure that the machines reboot in that window.
#Seamless Apps vDisks Restart Window
$SeemlesAppsEarly = Get-Date -hour 01 -Minute 00 -Second 00
$SeemlesAppsLate = Get-Date -hour 02 -Minute 00 -Second 00
#Desktop Restart Window
$DesktopRestartEarly = (Get-Date -hour 23 -Minute 00 -Second 00).AddDays(-1)
$DesktopRestartLate = (Get-Date -hour 23 -Minute 59 -Second 59).AddDays(-1)
5. Create a variable with today’s date for reporting purposes and logic checks.
Additionally, we get a set time of 24 hours before this script is run. This allows us to make sure no machine has gone more than 24 hours without rebooting.
$Today = get-date
$TodayNoTime = get-date -Format "MM/dd/yyyy"
$TodayMinus1 = (Get-Date).AddHours(-23).AddMinutes(-59)
6. Prepare the color scheme for the reporting
The output is sent in HTML format with color coding.
Green -> Healthy
Yellow -> Warning
Red -> Critical
We start with all the checks assuming green and when we find an issue, we will change the color accordingly.
#The three main colors used in this Health check Green -> healthy, Yellow -> Warning, Red -> Critical
$GreenHTML ="#24ff24"
$YellowHTML = "#ffff24"
$RedHTML = "#ff2424"
#These colors are the background for the values in the Table. Default to green unless there is something to alert about
$OnPowerColor = $GreenHTML
$OffPowerColor = $GreenHTML
$UnknownPowerColor = $GreenHTML
$RegisteredColor = $GreenHTML
$UnregisteredColor = $GreenHTML
$OffMaintColor = $GreenHTML
$OnMaintColor = $GreenHTML
$restartColor = $GreenHTML
$NoRestartColor = $GreenHTML
$highLoadColor = $GreenHTML
$DayUptimeColor = $GreenHTML
Logic checks
Now the script gets to the logic checks that are done to see if there are any issues in the environment. The script steps through each machine in the Collection we retrieved earlier ($Machines). Every time an error is found it is added to the corresponding array and the color coding for the HTML output is changed.
7. Create a foreach loop to ensure that we check each machine in the collection.
foreach($Machine in $Machines){
8. Check the Registration State of the machine.
When performing upgrades and updates to the hypervisor, Citrix infrastructure, or vDisks, some VMs can fail to communicate with the Delivery Controllers, resulting in an Unregistered state. While in this state, those machines are unable to accommodate user sessions.
If the machine is indeed registered, then a count variable for registered machines is incremented.
A machine that is found to be in an unregistered state is added to the array for machines with registration issues. Because an unregistered state could cause a large problem if widespread, the HTML color is changed to red for the unregistered check.
if($Machine.RegistrationState -like "Registered"){
$RegisteredCount++
}
else{
$OtherThanRegisteredArray.add($machine.HostedMachineName)
$UnregisteredColor = $RedHTML
$RegisteredColor = $YellowHTML
}
9. Check the power state for the machine.
Availability of the VDAs could also present issues in an environment if machines are not powered on and ready for users by the start of the business day.
If the machine is showing “On,” then the count of “On” machines is incremented.
If the machine is not “On” then we check to see if it is in an “Off” state. If “Off,” then we record this machine’s name in the “Off” array and change the power state colors for HTML Output.
If the machine is not in an “On” or “Off” power state, then it is in some other power state. This is also an issue, and the entry is added to the Unknown array and the colors are changed to red and yellow.
The script will differentiate between “off” and “unknown” power states. Machines that are off may be left intentionally off or need to be started. If the power state is unknown, it means the issue is probably with the box, and it needs to be rebooted.
if($Machine.PowerState -like "on"){
$OnCount++
}
elseif($Machine.PowerState -like "off"){
$offArray.add($machine.HostedMachineName)
$OffPowerColor = $RedHTML
$OnPowerColor = $YellowHTML
}
else{
$unknownArray.add($machine.HostedMachineName)
$OnPowerColor = $YellowHTML
$UnknownPowerColor = $RedHTML
}
10. Record the number of machines in maintenance mode.
Being in maintenance mode is not usually a problem but having too many machines in maintenance mode can result in availability issues within an environment.
When upgrades are done to the Citrix environment, machines are sometimes put in maintenance mode to ensure no user connections are established on specific VDAs. The task to remove them from maintenance mode can be easily forgotten and cause a Production capacity issue at peak hours the next day when users sign in.
Since maintenance mode may be intentional and not an immediate issue, a separate counter of these machines is kept. If too many machines are in maintenance mode, a later part of the script will change the color to reflect this as a warning.
if($Machine.InMaintenanceMode -like "True"){
$MaintCountTrue++
}
else{
$MaintCountFalse++
}
11. Ensure that the machine has rebooted in the last 24 hours
If the machine has been on for more than 24 hours, the color for this check is changed to red and the machine is added to the DayUptimeArray.
Note: LastDeregistrationTime can be misleading on its own, because a machine can deregister without actually restarting. The next check (#12) ensures that the deregistration event happened during the scheduled reboot window of the servers. With those two checks combined, a misleading result is unlikely.
if($Machine.LastDeregistrationTime -lt $TodayMinus1){
$DayUptimeArray.add($machine.HostedMachineName)
$DayUptimeColor = $RedHTML
}
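To see how the comparison in step 11 behaves, here is a small sketch with fixed, hypothetical timestamps rather than live LastDeregistrationTime values. The variable names $Now, $Cutoff, $RebootedLastNight, and $StaleMachine are assumptions made for this illustration only.

```powershell
# Hypothetical "script run time", fixed so the example is repeatable
$Now = Get-Date -Year 2024 -Month 3 -Day 5 -Hour 6 -Minute 0 -Second 0
# Same offset the script uses for $TodayMinus1
$Cutoff = $Now.AddHours(-23).AddMinutes(-59)

$RebootedLastNight = $Now.AddHours(-5)   # deregistered at 01:00 this morning
$StaleMachine      = $Now.AddDays(-3)    # last deregistered three days ago

# A machine is flagged when its last deregistration is older than the cutoff
$FlagRecent = $RebootedLastNight -lt $Cutoff   # False: rebooted within the last day
$FlagStale  = $StaleMachine -lt $Cutoff        # True: would be added to DayUptimeArray
```

Only the machine whose last deregistration is older than the cutoff is flagged.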
12. Ensure that the Citrix Virtual Apps machines are restarted within the scheduled maintenance window.
As an example, machines may be scheduled to reboot every day but at different times according to their type (e.g., servers hosting seamless applications versus servers delivering an entire server desktop). To make sure that the machines from each silo were rebooted at the correct time, the naming convention of the server can be used to identify the server type. Finally, the last deregistration time of those servers can be compared to the acceptable restart window to determine appropriate timing of the deregistration event.
A wildcard “*” can be used in the query of the naming convention to ensure that the machine is in the correct group. Next, a CompareTime function is used that takes three times and reports whether the first is between the other two (this will be explained in greater detail later).
The machine's LastDeregistrationTime property is compared to the start and end times of the maintenance window. If the machine’s deregistration event occurred inside that window, then it is assumed that this is due to the scheduled reboot. The check returns true and the RestartCount variable is incremented by one. If the check fails, it means this specific machine likely did not restart in the correct window and there might be an issue. The color for reporting is changed to yellow for further investigation and this machine is added to the noRestartArray variable.
if(($machine.HostedMachineName -like "*-Desktop*") -and (CompareTime($Machine.LastDeregistrationTime, $DesktopRestartEarly, $DesktopRestartLate) )){
$RestartCount += 1
}
elseif(($machine.HostedMachineName -like "*-Seamless*") -and (CompareTime($Machine.LastDeregistrationTime, $SeemlesAppsEarly, $SeemlesAppsLate) )){
$RestartCount += 1
}
else{
$noRestartArray.add($machine.HostedMachineName)
$restartColor = $YellowHTML
$NoRestartColor = $YellowHTML
}
13. Check the load on the machines
When this script runs, there are usually few active users in the environment and the machines should not be taxed. A machine's load index is calculated from metrics such as CPU, memory, and overall user session count. A load index above 75% this early in the day is too high and may indicate an issue. The color is changed to yellow for further investigation in the HTML output and the machine is added to the HighLoadArray.
if($Machine.LoadIndex -gt 7500){
#7500 is 75% load
$HighLoadArray.add($machine.HostedMachineName)
$highLoadColor = $YellowHTML
}
} #End of the foreach loop opened in step 7
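The load index is reported on a scale from 0 to 10000, which is why the threshold is written as 7500 rather than 75. As a rough illustration, a hypothetical helper (ConvertTo-LoadPercent is not part of the original script or the Citrix SDK) can turn the raw value into a percentage for easier reading:

```powershell
# Hypothetical helper: converts a raw load index (0-10000) into a percentage
function ConvertTo-LoadPercent($LoadIndex){
    return $LoadIndex / 100
}

$ThresholdPercent = ConvertTo-LoadPercent 7500   # the script's threshold
$BusyPercent      = ConvertTo-LoadPercent 8250   # an example busy machine

"Threshold: $ThresholdPercent%  Busy machine: $BusyPercent%"
```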
14. Get a count of machines in each unhealthy array and ensure not too many machines are in maintenance mode.
In the next step, the script will get the count of the machines in each of the arrays. This gives the count of machines that are in undesired states. Additionally, a predetermined threshold of machines in maintenance mode is set and checked to see if that threshold has been reached (three in this example). If so, the color is changed to reflect the discrepancy.
$offcount = $offArray.Count
$unknowncount = $unknownArray.Count
$UnregisteredCount = $OtherThanRegisteredArray.count
$NonRestartCount = $noRestartArray.Count
$HighLoadCount = $HighLoadArray.Count
$DayUptimeCount = $DayUptimeArray.count
#this value is the max allowed in maint mode before the color turns to yellow
$MaxMaintAllowed= 3
if($MaintCountTrue -gt $MaxMaintAllowed){
$OnMaintColor = $YellowHTML
$OffMaintColor = $YellowHTML
}
HTML Output
Next, the results are output from the script to HTML and emailed to the correct recipients. This part was created with the goal of allowing more checks to be added later with no HTML changes needed. Functions standardize and handle the output of the HTML report, and this is where most of the functions in the script are used.
15. Create an array that contains all the information needed for each check to be printed to HTML (one check per entry). An Array of Arrays is created to store the values for each check. Each entry is in the form:
Index 0 (Identifier) is not shown to the user in the output. It is just for the HTML formatting.
Index 1 (Color) is used to color the number output for each cell in the table based on previous outputs.
Index 2 (Table Header) is the text that will be output in the first row of the table for each column.
Index 3 (Count) is the number of machines that were found for each check.
$ArrayOfKeys = @(
("OnPower", $OnPowerColor, "Powerstate On", $OnCount),
("OffPower", $OffPowerColor, "Powerstate Off", $offcount),
("UnknownPower", $UnknownPowerColor, "Powerstate Unknown", $unknowncount),
("Registered", $RegisteredColor, "Registered", $RegisteredCount),
("Unregistered", $UnregisteredColor, "Unregistered", $UnregisteredCount),
("OffMaint", $OffMaintColor, "Maintenance Mode Off", $MaintCountFalse),
("OnMaint", $OnMaintColor, "Maintenance Mode On", $MaintCountTrue),
("restart", $restartColor, "Restarted Successfully", $RestartCount),
("noRestart", $NoRestartColor, "Restart Error", $NonRestartCount),
("DayUptime", $DayUptimeColor, "1day+ Uptime", $DayUptimeCount ),
("highLoad", $highLoadColor, "Machines at 75% Load", $HighLoadCount )
)
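As a quick illustration of how each entry is indexed, the sketch below builds a two-entry version of the array with hypothetical hard-coded values and reads the fields back by position.

```powershell
# Hypothetical two-entry version of $ArrayOfKeys with hard-coded values
$SampleKeys = @(
    ("OnPower",  "#24ff24", "Powerstate On",  10),
    ("OffPower", "#ff2424", "Powerstate Off", 2)
)

# Index 0: Identifier, Index 1: Color, Index 2: Table Header, Index 3: Count
$Lines = foreach($entry in $SampleKeys){
    "{0}: {1} machine(s), color {2}" -f $entry[2], $entry[3], $entry[1]
}
$Lines
```

One thing to keep in mind when trimming the list: if only a single entry is left, PowerShell flattens it into a plain four-element array. Prefixing the entry with a comma, as in ,("OnPower", ...), keeps it nested.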
16. Start with the HTML headers.
First, a variable that names the Environment to use in the output is created. Then the $EmailString Variable is created to define the parameters of the email. In this variable, the entirety of the output for the script is included, starting with the basic HTML Header information, environment, and date.
$environment = "PROD:"
$EmailString ="
<head>
<h1>Citrix Health Check</h1>
<h3>$environment $Today</h3></head>
17. Add the body of the HTML output using functions
In the table, three functions are used to correctly output the data from the $ArrayOfKeys array.
The first function call is for the styling of the HTML table (we will discuss these functions later in this blog). There are a few more HTML headers before calling the other two functions for the first and second rows of the table respectively.
$(addTableStyle($ArrayOfKeys))
<table>
<tr id=`"ROW0`">
<th style=`"height:45;width:100;text-align:center;font-size:13px;`"></th>
$(addTH($ArrayOfKeys))
</tr>
<tr id=`"ROW1`">
<th style=`"height:25;text-align:center;font-size:13px;`">Count</th>
$(addTD($ArrayOfKeys))
</tr>
</table>
18. For each list of unhealthy machines, add them to the HTML output using the OutputList function.
This allows whoever is viewing the report to see the names of all the machines in that state. OutputList is another function that’s made for this output part of the script.
The format of the two inputs is: (Title, Array of Machines in that state)
$(OutputList("OFF POWERSTATE", $offArray))
$(OutputList("UNKNOWN POWERSTATE",$unknownArray ))
$(OutputList("NOT REGISTERED", $OtherThanRegisteredArray))
$(OutputList("RESTART ERROR", $noRestartArray))
$(OutputList("1Day+ Uptime", $DayUptimeArray))
"
Sending the Email
Lastly, the script sends the email to the users who need to receive the health check.
19. Use the Send-MailMessage cmdlet to send the email. Information such as the sender, recipient, and subject must be passed in so that the email is correctly addressed and labeled.
The SmtpServer value is the server responsible for sending the email, and the $BodyAsHtml variable holds the HTML output that was built earlier.
$Sender = "ScriptAccount@Email.com"
$To = "User@Email.com"
$Subject = "$environment Morning Health Check - $TodayNoTime"
$SmtpServer = "smtp.server.lab"
$BodyAsHtml = $EmailString
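Send-MailMessage -From $Sender -To $To -Subject $Subject -Body $BodyAsHtml -BodyAsHtml -SmtpServer $SmtpServer -UseSsl
The script is done except for covering the functions it calls. Note that PowerShell runs a script from top to bottom, so in the script file these function definitions must be placed above the code that calls them.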
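Send-MailMessage takes the message text through its -Body parameter, while -BodyAsHtml and -UseSsl are switches. One optional way to keep a long call readable is splatting, sketched below with the placeholder addresses from this post. The $MailParams name and the shortened body are assumptions for illustration, and the actual send is left commented out so the snippet can be run safely.

```powershell
# Hypothetical parameter table; the values are the placeholders used in this post
$MailParams = @{
    From       = "ScriptAccount@Email.com"
    To         = "User@Email.com"
    Subject    = "PROD: Morning Health Check"
    Body       = "<h1>Citrix Health Check</h1>"   # stands in for $EmailString
    BodyAsHtml = $true                            # switch: render the body as HTML
    SmtpServer = "smtp.server.lab"
    UseSsl     = $true                            # switch: use a secure connection
}

# Uncomment once the SMTP server and addresses are valid for your environment
# Send-MailMessage @MailParams
```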
Functions
First, the CompareTime function takes an array of three times and sees if the first entry is between the second and third.
function CompareTime($TestTime){
return (($TestTime[0] -gt $TestTime[1]) -and ($TestTime[0] -lt $TestTime[2]))
}
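A short usage sketch with fixed, hypothetical times shows how the function behaves. The function is repeated here so the example runs on its own. Note that the call syntax CompareTime($a, $b, $c) passes a single three-element array, which is why the function indexes into $TestTime.

```powershell
function CompareTime($TestTime){
    return (($TestTime[0] -gt $TestTime[1]) -and ($TestTime[0] -lt $TestTime[2]))
}

# Hypothetical 01:00-02:00 restart window on a fixed date
$WindowStart = Get-Date -Year 2024 -Month 3 -Day 5 -Hour 1 -Minute 0 -Second 0
$WindowEnd   = $WindowStart.AddHours(1)

$InsideWindow  = CompareTime($WindowStart.AddMinutes(30), $WindowStart, $WindowEnd)  # 01:30
$OutsideWindow = CompareTime($WindowStart.AddHours(3), $WindowStart, $WindowEnd)     # 04:00
$OnBoundary    = CompareTime($WindowStart, $WindowStart, $WindowEnd)                 # exactly 01:00

"Inside: $InsideWindow  Outside: $OutsideWindow  Boundary: $OnBoundary"
```

Because the comparison uses -gt and -lt, a time that falls exactly on the start or end of the window is treated as outside it.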
Next, the addTableStyle function creates the table style header for the html output; it is all returned as a string.
<style> HTML tags surround the strings for formatting purposes. Then, the array that was passed into this function is looped through to add one line per entry.
The array that is passed to this function is an array of arrays, and each element of $array is processed. These elements are arrays that have 4 elements each. The same array is passed to several different functions, which means not all of its values are used in this function. Here, only the first and second entries are needed. (Index 0: Identifier, Index 1: Color)
For each entry in $Array, one line will be added to the HTML output.
function addTableStyle($array){
$returnString = "<style type=`"text/css`">`ntable, th, td { border: 1px solid black;}"
foreach($element in $array){
$tempString1 = $element[0]
$tempString2 = $element[1]
$returnString = $returnString + "`ntable td#$tempString1 {border: 1px solid black; background-color:$tempString2; color:black;}"
}
$returnString = $returnString + "`n</style>"
return $returnString
}
The next function is addTH. This is for adding the Column headers for each check.
The array that is passed into the function is looped through and the third element is taken (Index 2: Table Header) which adds the correct HTML formatting information. All the entries are returned in one large string which is one HTML line per entry.
function addTH($array){
$returnString =""
foreach($element in $array){
$tempString = $element[2]
$returnString = $returnString + "`n<th style=`"height:45;width:100;text-align:center;font-size:13px;`">$tempString</th>"
}
return $returnString
}
Next, the addTD function is for printing out the correct value for each array element.
This function takes the first and fourth elements (index 0: Identifier, Index 3: Count). The first element is used to relate to the style section of the HTML output so that it is colored correctly. The fourth element is the number of machines in that state.
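The body of addTD is not shown above, so the following is only a sketch of what it might look like based on the description and on the pattern of addTH. The exact inline styling is an assumption. The id attribute is what ties each cell to the matching td#Identifier rule produced by addTableStyle.

```powershell
# Sketch of addTD, assuming it mirrors addTH; the inline style values are assumptions
function addTD($array){
    $returnString = ""
    foreach($element in $array){
        $tempString1 = $element[0]   # Index 0: Identifier, matches the td#... style rule
        $tempString2 = $element[3]   # Index 3: Count
        $returnString = $returnString + "`n<td id=`"$tempString1`" style=`"text-align:center;font-size:13px;`">$tempString2</td>"
    }
    return $returnString
}

# Hypothetical two-entry input
$SampleKeys = @(
    ("OnPower",  "#24ff24", "Powerstate On",  10),
    ("OffPower", "#24ff24", "Powerstate Off", 0)
)
$Cells = addTD($SampleKeys)
$Cells
```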
The final two functions are for outputting a list of the machines with certain states. (Off power state for example)
The first function, OutputList, requires two parameters: A title for the list and an array of machines to print. Note: the array can be empty if no machines are in that state.
The second function, OutputListBody, is called from the first function. It uses recursion to loop through the array and append each value with the proper HTML formatting to a string that is returned to the calling function. If the array is empty, it returns the string “NONE”. This can be seen in the example output of the script with the four tables that show NONE.
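The code for these two functions is not shown above, so here is one possible sketch that follows the description. The HTML tags, the $TitleAndList parameter name, and the -List and -Index parameters are assumptions. Because the script calls OutputList("TITLE", $array), PowerShell passes the title and the list together as one two-element array, so the sketch unpacks them by index.

```powershell
# Sketch only: recursive helper that appends one machine name per call
function OutputListBody($List, $Index){
    if($List.Count -eq 0){ return "NONE" }        # nothing in this state
    if($Index -ge $List.Count){ return "" }       # end of the list reached
    return "`n" + $List[$Index] + "<br>" + (OutputListBody -List $List -Index ($Index + 1))
}

# Sketch only: $TitleAndList[0] is the title, $TitleAndList[1] is the list of machines
function OutputList($TitleAndList){
    $title = $TitleAndList[0]
    $list = $TitleAndList[1]
    return "`n<h3>$title</h3>" + (OutputListBody -List $list -Index 0)
}

# Hypothetical usage with an empty list and then a populated one
$sampleArray = New-Object System.Collections.Generic.List[System.Object]
$EmptyOutput = OutputList("OFF POWERSTATE", $sampleArray)
$sampleArray.Add("VDA-01")
$sampleArray.Add("VDA-02")
$FullOutput = OutputList("OFF POWERSTATE", $sampleArray)
$FullOutput
```

For very long lists, a simple foreach loop would do the same job without relying on recursion depth.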
Conclusion
Every environment will be slightly different. When doing health checks across a large environment there will be ways to improve and automate processes. While this is a good way to preserve machine health, it does not remove the responsibility to have good operational practices. This script does not serve as an all-encompassing check. Admins will still need to be responsible and proactive to prevent downtime in a production environment. The purpose of this script is not to remove the need for vigilance, but to save time each morning allowing for the ability to assess the health of your environment at a glance.
Still have questions or want to discuss your Citrix environment? Reach out to us at CDA. We’d love to discuss how we can help you with your automation needs!