Automation is a great way to manage repeatable tasks. Scripts can perform repetitive tasks quickly and consistently. This can lead to quicker response times and shorter outage windows. Automation also frees up your team to continually innovate.
This is especially valuable if you manage mission-critical applications. Applications that receive high traffic levels each day need some form of load balancing between servers. Citrix ADC/NetScaler is one option for load balancing traffic across multiple backend servers. It does this by creating one “Virtual IP Address” that consumers use to access the site. Behind the virtual IP address, the NetScaler uses various types of logic to distribute requests across the backend servers so the workload stays balanced. This allows you to “build out” backend services, rather than “build up”.
Adding a load balancer increases complexity within the environment. You have another layer to troubleshoot when outages occur, and in a more complex environment hours can be lost searching for the root cause. Valuable time can be spent just identifying which ADC, Load Balanced vServer, or Content Switched vServer is involved in a specific outage. Many times the ADC is not the cause; it is often a change or failure on the applications/servers bound to the service groups.
After watching this occur repeatedly with many of our customers, we decided to find a more efficient way to identify possible root causes.
Built into each ADC/NetScaler is a REST API called the NITRO API. Its documentation can be found on the Documentation tab after logging in. At the top you will find materials on how to use the NITRO API, as well as a client to test and build requests. The NITRO API allows us to use automation tools to run checks or set values on the ADC/NetScaler via a script, rather than logging into the CLI or GUI.
Throughout this blog, we will discuss how we leveraged the Citrix ADC NITRO API to enumerate the ADC resources, namely:
LB vServer Names
Service Groups Names
Backend Server States
HTTP Requests for testing with a 200 “OK” response expected
We use this information to test the services end-to-end. This gives us an accurate view of what is happening in the environment for every object and allows us to identify precisely where the issue or error exists.
Development Process Overview
The first step I took to build out the automated workflow was writing some simple pseudocode. We need to identify the steps required to complete our process before we jump in and start writing our script. The following steps were used to gather all the information needed to check monitor and backend server status:
Understand Business Case
- The purpose of creating a script to check vServer status is to reduce the time to resolve and troubleshoot issues
- The output will provide detailed information about each resource to all stakeholders during an outage and can effectively centralize communication
- Get list of all Load Balanced (LB) vServers
- For each LB vServer, enumerate the Service Group
- For each Service Group, enumerate the bound servers
- Get the backend server IP, Port, Name, and Current State
- For each Service Group, enumerate the bound LB Monitors
- Get any custom HTTP Request strings
- Test each backend server to ensure the Monitor(s) bound are functioning as expected
Once the workflow/pseudocode was designed, I needed to build and configure the lab to prototype the solution for testing.
This required exploring the Citrix NITRO API, a tool that comes built into any NetScaler. It is used by third-party management tools to interface with your Citrix ADC/NetScalers. The NITRO API has many capabilities and is well documented. Checking server states only scratches the surface of what is possible.
During the exploration I used the built-in NITRO API Client to complete testing.
Now, I was ready to build the script in Python.
Once built, I needed to test the script.
Finally, I documented and posted the script.
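Before touching the API, the enumeration steps in the pseudocode can be sketched as plain nested loops. The following is only an illustration over a hand-written dictionary; the `inventory` layout and the `enumerate_checks` helper are hypothetical names invented for this sketch and do not reflect how NITRO structures its responses.

```python
# Hypothetical in-memory stand-in for the data the ADC would provide.
# Object names mirror the lab; the nesting is invented for this sketch.
inventory = {
    "VS-LB-TEST1": {
        "SVG-TEST1": {
            "servers": [
                {"name": "server01", "ip": "192.168.99.110", "port": 80, "state": "UP"},
                {"name": "server02", "ip": "192.168.99.111", "port": 80, "state": "UP"},
            ],
            "monitors": [{"name": "MON-TEST1", "httprequest": "GET /posts"}],
        }
    }
}


def enumerate_checks(inventory):
    """Walk vServer -> Service Group -> server x monitor and list the checks to run."""
    checks = []
    for vserver, groups in inventory.items():
        for group, details in groups.items():
            for server in details["servers"]:
                for monitor in details["monitors"]:
                    checks.append(
                        (vserver, group, server["name"], server["port"],
                         monitor["name"], monitor.get("httprequest"))
                    )
    return checks


for check in enumerate_checks(inventory):
    print(check)
```

Each tuple corresponds to one backend test the finished script would perform.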
This script was built and tested in my lab environment. The lab consists of a few simple components as shown in the diagram below:
Workstation: Ubuntu 20.04.1 LTS
- This will be used to load the json responses to parse for the needed data
- Form the GET requests sent to the ADC
- Parsing through arrays of data received from ADC
- Bypass any ssl certificate errors when connecting to ADC or backend servers
- Open, write, and close the csv file
- Hide password input when connecting to the ADC
NetScaler (Hosted on VMWare ESXi)
- VM: NetScaler 12.1 Build 51.19.nc
VM (Hosted on VMWare ESXi)
- JSONPlaceHolder docker image (3x)
The table below gives you a general idea of what the minimal configuration should be on the NetScaler. This can be used to set up your own lab environment. The configuration includes the following:
Enabling the Load Balancing Feature
Setting the hostname and Subnet IP Address (SNIP)
Creating some backend servers
Creating (3) load balancing vServers
Creating (3) Service Groups
Creating a monitor
Binding the monitors, backend servers, and Service Groups to the LB vServers
NetScaler Configuration (Base Config)
set ns config -IPAddress 192.168.99.50 -netmask 255.255.255.0
enable ns feature WL LB CH
set ns hostName NS
add ns ip 192.168.99.51 255.255.255.0 -vServer DISABLED
add server server01 192.168.99.110
add server server02 192.168.99.111
add server server03 192.168.99.112
add serviceGroup SVG-TEST1 HTTP -maxClient 0 -maxReq 0 -cip ENABLED X-Forwarded-For -usip NO -useproxyport YES -cltTimeout 180 -svrTimeout 360 -CKA NO -TCPB NO -CMP YES
add serviceGroup SVG-TEST2 HTTP -maxClient 0 -maxReq 0 -cip ENABLED X-Forwarded-For -usip NO -useproxyport YES -cltTimeout 180 -svrTimeout 360 -CKA NO -TCPB NO -CMP YES
add serviceGroup SVG-TEST3 HTTP -maxClient 0 -maxReq 0 -cip ENABLED X-Forwarded-For -usip NO -useproxyport YES -cltTimeout 180 -svrTimeout 360 -CKA NO -TCPB NO -CMP YES
add lb vserver VS-LB-TEST1 HTTP 192.168.99.100 80 -persistenceType NONE -cltTimeout 180
add lb vserver VS-LB-TEST2 HTTP 192.168.99.101 80 -persistenceType NONE -cltTimeout 180
add lb vserver VS-LB-TEST3 HTTP 192.168.99.102 80 -persistenceType NONE -cltTimeout 180
bind lb vserver VS-LB-TEST1 SVG-TEST1
bind lb vserver VS-LB-TEST2 SVG-TEST2
bind lb vserver VS-LB-TEST3 SVG-TEST3
add dns nameServer 192.168.99.2
add dns nameServer 22.214.171.124
add lb monitor MON-TEST1 HTTP -respCode 200 -httpRequest "GET /posts"
bind serviceGroup SVG-TEST1 server02 80
bind serviceGroup SVG-TEST1 server03 80
bind serviceGroup SVG-TEST1 server01 80
bind serviceGroup SVG-TEST1 -monitorName MON-TEST1
bind serviceGroup SVG-TEST2 server02 80
bind serviceGroup SVG-TEST2 server03 80
bind serviceGroup SVG-TEST2 server01 80
Exploring the Citrix NITRO API with Citrix Developer Docs
Before starting the script, I wanted to get an idea of what is possible with the NITRO API. Citrix’s Developer Docs are a great resource for documentation of each of the commands. The docs lay out examples that show the request verbs (GET, PUT, DELETE, etc.), syntax, and payloads.
NOTE: There were some examples on GitHub as well, but they seemed a bit too contrived for my needs.
The Developer Docs you will use might differ slightly for your environment. The link referenced above is for 12.0, which is the NetScaler build in my lab.
The documentation provides detailed explanations for each endpoint and parameter that can be sent, along with their payloads. I was able to quickly navigate through the reference guide to find my starting point for the script: enumerating all LB vServers. The URL required to complete this can be found in Figure 2 below.
Continuing with the pseudocode, we need to figure out how to find the Service Group bindings for each of the LB vServers listed above. This can be accomplished by using the “/nitro/v1/config/lbvserver_servicegroup_binding” URL. (Figure 3)
The process continues until we have all the following objects:
Load Balanced vServer
Service Group Binding
Service Group Members Servers
Service Group Monitor bindings
Testing the NITRO API with NITRO Client
As you may have noticed, all the commands we need are GETs. I find it easiest to use the native NITRO Client for this particular application. I would use Postman for testing POST requests, but for this particular case it is not necessary.
After getting to the NITRO Client, I began by checking the output of each of the commands I identified earlier. I started by displaying all LB vServers and looking for the fields that are interesting, such as “name” and “curstate”. The output, as shown in Figure 6, can be a little cumbersome. To make viewing the info slightly easier, I find it best to open a new tab in my browser and copy in the URL, as seen in Figure 7.
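To make the “name” and “curstate” fields concrete, here is a minimal sketch of pulling them out of an LB vServer listing. The payload is a trimmed, hand-written illustration (a real response carries many more keys per vServer, and the states shown are invented), and `summarize_vservers` is a hypothetical helper name.

```python
import json

# Illustrative payload only: just the two fields discussed above.
sample_response = """
{
  "lbvserver": [
    {"name": "VS-LB-TEST1", "curstate": "UP"},
    {"name": "VS-LB-TEST3", "curstate": "DOWN"}
  ]
}
"""


def summarize_vservers(raw_json):
    """Map each LB vServer name to its current state."""
    data = json.loads(raw_json)
    return {item["name"]: item["curstate"] for item in data.get("lbvserver", [])}


print(summarize_vservers(sample_response))
```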
The process continues with the following endpoints, which expose the remaining fields we need to write to the .csv file. These are the binding endpoints the script sends GET requests to:
“/nitro/v1/config/lbvserver_servicegroup_binding/<LB vServer Name>”
“/nitro/v1/config/servicegroup_lbmonitor_binding/<Service Group Name>”
“/nitro/v1/config/servicegroup_servicegroupmember_binding/<Service Group Name>”
This should be all the data we need from the NetScaler to give us an effective method for enumerating resources. Next in the blog, we will discuss the script, as well as testing the backend servers.
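The endpoints above all share the same prefix, so the URLs can be assembled by one small function. This is a sketch rather than part of the original script: `nitro_url` is a hypothetical helper, the host is the NSIP from the lab configuration, and percent-encoding the object name is my own assumption for names that contain special characters.

```python
from urllib.parse import quote


def nitro_url(host, resource, name=None):
    """Build a NITRO config URL; `name` is appended as a path segment when given."""
    url = "https://%s/nitro/v1/config/%s" % (host, resource)
    if name is not None:
        # Assumption: percent-encode the object name so it is safe in a URL path
        url += "/" + quote(name, safe="")
    return url


host = "192.168.99.50"  # NSIP from the lab base config
print(nitro_url(host, "lbvserver"))
print(nitro_url(host, "lbvserver_servicegroup_binding", "VS-LB-TEST1"))
print(nitro_url(host, "servicegroup_lbmonitor_binding", "SVG-TEST1"))
print(nitro_url(host, "servicegroup_servicegroupmember_binding", "SVG-TEST1"))
```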
Scripted Workflow Overview
At this point I have dissected and tested the NITRO API for my purposes. Now it is time to build the script, so I can automate the process. The process is simple and documented in the table below:
Get list of all Load Balancers
Get Service Group bindings for each Load Balancer
Get Service Group Members for each Service Group
Get Monitors for each Service Group
Check state of backend servers according to LB Monitor “HTTP-ECV” GET requests
Send request to backend server to verify the service is up or down on the actual host
Write the output to a .csv file. You can put the output anywhere you like, especially if you plan to incorporate this into your CI/CD pipeline tool (e.g., Jenkins)
After I had gathered all the data needed from the NetScaler, I needed a way to actually test the backend server state. By combining the HTTP ECV URL/endpoint, the server IP address, and the server port, I was able to compose a simple test URI to send with a “requests.get” call. Depending on the response code received, we can determine whether the backend server and endpoint are up and listening on the URI.
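The URI composition described here can be isolated into two small functions. This is a minimal sketch: `build_test_uri` and `backend_state` are hypothetical names, and treating any monitor whose “secure” field is not “NO” as HTTPS mirrors the prototype's logic rather than any documented rule.

```python
def build_test_uri(ip, port, http_request, secure="NO"):
    """Compose the URI a monitor would probe, e.g. 'GET /posts' -> http://ip:port/posts."""
    scheme = "http" if secure == "NO" else "https"
    # Split off the verb; if there is none, treat the whole string as the path
    verb, _, path = http_request.partition(" ")
    if not path:
        path = verb
    if not path.startswith("/"):
        path = "/" + path
    return "%s://%s:%s%s" % (scheme, ip, port, path)


def backend_state(status_code, expected=200):
    """Translate an HTTP status code into the UP/DOWN label used in the report."""
    return "UP" if status_code == expected else "DOWN"


print(build_test_uri("192.168.99.110", 80, "GET /posts"))
print(backend_state(200), backend_state(503))
```

Keeping the URI construction separate from the network call makes it easy to verify the URL without a live backend.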
Once everything is tested, we output to a .csv file. This gives the support and deployment teams a quick “bird's-eye view” of which services could be causing issues and cuts down the time to resolve. This approach takes the work of locating the offending component out of the equation and allows the correct support person to identify the root cause and resolve the issue.
If there is any interest, I may add the steps to incorporate this into Jenkins and JIRA. The goal would be to automate the testing during CI/CD pipeline runs, send the output to JIRA, and create an issue for the deployment team.
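As a rough sketch of the reporting step, the snippet below writes rows under the same column headings the script uses and then filters for failing backends. The rows are made-up sample values, `write_report` and `failing` are hypothetical helpers, and an in-memory buffer stands in for the real file.

```python
import csv
import io

HEADINGS = ['vServer Name', 'ServiceGroup Name', 'Server Name', 'Port',
            'Monitor Name', 'HTTP Request', 'LB VIP Status', 'Server Status',
            'Backend Server State']


def write_report(rows, handle):
    """Write the heading row followed by one row per tested backend."""
    writer = csv.writer(handle)
    writer.writerow(HEADINGS)
    writer.writerows(rows)


def failing(rows):
    """Return only the rows whose direct backend test came back DOWN."""
    return [row for row in rows if row[-1] == "DOWN"]


# Made-up sample rows for illustration
rows = [
    ["VS-LB-TEST1", "SVG-TEST1", "server01", "80", "MON-TEST1", "/posts", "UP", "UP", "UP"],
    ["VS-LB-TEST1", "SVG-TEST1", "server02", "80", "MON-TEST1", "/posts", "UP", "UP", "DOWN"],
]

buffer = io.StringIO()
write_report(rows, buffer)
print(buffer.getvalue())
print(failing(rows))
```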
Once I verified that all the positive-result use cases were covered, I needed to account for Load Balancers that are not configured the way we expect. Possible issues include no backend servers being bound, no monitor being bound, or a monitor that does not define a specific HTTP request. To handle these cases, I implemented “IF ELSE” logic for each loop in the script. The table below includes the content of the script I created. This is a working prototype and can be easily modified to suit any other environment.
Citrix NetScaler NITRO API Backend Service Checks
import csv
import getpass
import json
import sys

import requests
import urllib3

# Workflow:
#   1. Get the list of LB vServers
#   2. Loop through each LB vServer to get its Service Group bindings
#   3. Loop through each Service Group to get its members and monitors
#   4. Loop through each monitor to get its HTTP Request parameter
#   5. Send the request to each backend server
#   6. Write the output to .csv

# Disable SSL warnings if the cert is untrusted
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Prompt for the ADC address and credentials (password input is hidden)
NITRO_SERVER = input("Server IP: ")
NITRO_USER = input("Username: ")
try:
    NITRO_PWD = getpass.getpass()
except Exception as error:
    print('ERROR', error)
    sys.exit(1)


def nitro_get(path):
    """GET a NITRO config resource from the ADC and return the decoded JSON."""
    response = requests.get("https://%s/nitro/v1/config/%s" % (NITRO_SERVER, path),
                            auth=(NITRO_USER, NITRO_PWD), verify=False)
    return json.loads(response.text)


def probe(test_uri):
    """Return 'UP' if the backend answers 200, otherwise 'DOWN'."""
    try:
        response = requests.get(test_uri, verify=False, timeout=10)
    except requests.RequestException:
        # An unreachable backend is reported as DOWN instead of aborting the run
        return 'DOWN'
    return 'UP' if response.status_code == 200 else 'DOWN'


# Open the .csv file for the report
with open('monitor_status.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Write headings for each column
    writer.writerow(['vServer Name', 'ServiceGroup Name', 'Server Name', 'Port', 'Monitor Name',
                     'HTTP Request', 'LB VIP Status', 'Server Status', 'Backend Server State'])
    # Send the first request to the NITRO API to get the list of LB vServers
    lbvs_data = nitro_get("lbvserver")
    # For each LB vServer, get the name and current state
    for j in lbvs_data['lbvserver']:
        lbvs_name = j['name']
        lbvs_stat = j['curstate']
        # Get the Service Group bindings for this LB vServer
        svg_data = nitro_get("lbvserver_servicegroup_binding/%s" % lbvs_name)
        # If there is a Service Group binding, get the monitors and members
        if 'lbvserver_servicegroup_binding' in svg_data:
            for k in svg_data['lbvserver_servicegroup_binding']:
                svg_grpname = k["servicegroupname"]
                svgmon_data = nitro_get("servicegroup_lbmonitor_binding/%s" % svg_grpname)
                mr_data = nitro_get("servicegroup_servicegroupmember_binding/%s" % svg_grpname)
                # Tolerate a Service Group with no members bound
                members = mr_data.get('servicegroup_servicegroupmember_binding', [])
                # If there is a monitor bound, test every member against it
                if 'servicegroup_lbmonitor_binding' in svgmon_data:
                    for l in members:
                        port = str(l["port"])
                        svrname = l["servername"]
                        svrip = l["ip"]
                        svrstate = l["svrstate"]
                        # For each bound monitor, look up its definition
                        for m in svgmon_data['servicegroup_lbmonitor_binding']:
                            monname = m["monitor_name"]
                            mon_data = nitro_get("lbmonitor/%s" % monname)
                            for n in mon_data["lbmonitor"]:
                                # Check the monitor entry itself for an HTTP request field
                                if 'httprequest' in n:
                                    httpreq = n['httprequest'].replace('GET ', '')
                                    scheme = 'http' if n['secure'] == 'NO' else 'https'
                                    # Create the test URI and probe the backend directly
                                    test_uri = scheme + '://' + svrip + ':' + port + httpreq
                                    backend = probe(test_uri)
                                else:
                                    # The monitor has no HTTP request to replay
                                    httpreq = 'N/A'
                                    backend = 'N/A'
                                writer.writerow([lbvs_name, svg_grpname, svrname, port, monname,
                                                 httpreq, lbvs_stat, svrstate, backend])
                else:
                    # No monitor bound: report 'tcp' as the monitor name
                    for l in members:
                        writer.writerow([lbvs_name, svg_grpname, l["servername"], str(l["port"]),
                                         'tcp', 'N/A', lbvs_stat, l["svrstate"], 'N/A'])
        else:
            # No Service Group bound: record the vServer with placeholders
            writer.writerow([lbvs_name, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', lbvs_stat, 'N/A', 'N/A'])
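The fallback handling in the script above can also be viewed in isolation. The sketch below assumes, as the script's key checks imply, that a binding key is simply absent from the response when nothing is bound; `bindings_or_empty` and `fallback_row` are hypothetical helper names, and the empty dictionary stands in for such a response.

```python
NA = "N/A"


def bindings_or_empty(response_data, key):
    """Return the list stored under `key`, or an empty list when the key is absent."""
    return response_data.get(key, [])


def fallback_row(lbvs_name, lbvs_stat):
    """Row for a vServer with no Service Group bound, in the report's column order."""
    return [lbvs_name, NA, NA, NA, NA, NA, lbvs_stat, NA, NA]


# Stand-in for a response with no bindings (assumed shape)
svg_data = {}
bindings = bindings_or_empty(svg_data, "lbvserver_servicegroup_binding")
if not bindings:
    print(fallback_row("VS-LB-TEST3", "DOWN"))
```

Writing a placeholder row instead of skipping the vServer keeps misconfigured objects visible in the report.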
With some production-ready refinements, this prototype script can be added to any job scheduler or automation engine (such as Jenkins) to quickly check the service states of all load balancers on the NetScaler.
Maybe later I will introduce some functionality for GSLB vServers or checking for only specific Load Balancers. For now, this is merely a prototype you can use to get up and running with quick “health checks” on your NetScaler. Seconds count when there is a Production outage.
If you plan on using this script you will want to make sure you have the following:
Install and create a Python Virtual Environment
Clone the repository
Test Functions (IN A TESTING ENVIRONMENT, don’t be that guy/girl!)
Disclaimer: While this may go without saying, “Do NOT test this in your production environment”
You can access this script and supporting files at the following location. Simply “git clone” the repository and run it against your test environment.
Script Execution and Testing
cmyers@UBUNTU:~$ python3 NetScaler_Checks.py
NOTE: You will be prompted for <server IP> <username> <password>