Monitoring the status of Domain Controller Diagnostics Tool (dcdiag.exe) with Nagios XI

I was recently looking at monitoring the service health of Active Directory Domain Services using Nagios XI. The default monitoring wizards only provide the ability to monitor an LDAP server, whereas I wanted to monitor a number of services similar to those checked by the Domain Controller Diagnostics Tool (dcdiag.exe): http://technet.microsoft.com/en-us/library/cc776854(v=ws.10).aspx.

Therefore, I decided to create a PowerShell script that analyses the state of a domain controller and returns a service status through the check_nrpe command.

I wanted my PowerShell script to be flexible enough to run a test against a single service on a domain controller, so I specified a mandatory parameter for the name of the test to run.
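
The original screenshot of the parameter block is not reproduced here, so the following is only a minimal sketch of how a mandatory test-name parameter might be declared. The parameter name Test matches the -Test switch used in the NSClient++ command later in this post; the $Check script block and the message it returns are illustrative assumptions.

```powershell
# Hypothetical stand-in for the top of Check-DirectoryServerDiagnostics.ps1.
# In a real script file the param() block would sit at the top of the file.
$Check = {
    param (
        # Name of the dcdiag test to run, for example DNS.
        [Parameter(Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [string] $Test
    )
    "Requested dcdiag test: $Test"
}

# Invoke the block the same way the script would be called.
& $Check -Test 'DNS'
```

Because the parameter is mandatory, PowerShell prompts for a value (or fails in a non-interactive session) when -Test is omitted.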

Capture1

I also need to generate a timestamp string to include in the status information message, recording when the service check was last performed.
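
As a rough illustration, a timestamp string can be produced with Get-Date. The format string and the variable name $Timestamp are assumptions, not necessarily what the original script used.

```powershell
# Build a sortable timestamp string for the status information message.
# The format string is an assumption; any Get-Date format would work.
$Timestamp = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'
$Timestamp
```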

Capture2

As previously stated, I will use the dcdiag.exe utility to run the specific test named in the mandatory parameter, capturing the output so that it can be filtered.
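
One way this capture-and-filter step could look is sketched below. The function names are hypothetical, and the filter assumes the English "passed test" / "failed test" summary lines that dcdiag prints; the sample lines and the server name DC01 are made up for the demonstration.

```powershell
# Assumption: run the requested test and merge stderr into the captured lines.
# Only works on a machine where dcdiag.exe is available.
function Invoke-DcDiagTest {
    param ([Parameter(Mandatory = $true)] [string] $Test)
    & dcdiag.exe "/test:$Test" 2>&1 | ForEach-Object { "$_" }
}

# Keep only the summary lines that report a pass or fail for the named test.
function Select-DcDiagResult {
    param (
        [Parameter(Mandatory = $true)] [string] $Test,
        [string[]] $Lines = @()
    )
    $pattern = '(passed|failed) test ' + [regex]::Escape($Test)
    $Lines | Where-Object { $_ -match $pattern }
}

# Sample lines in the style dcdiag prints; DC01 is a made-up server name.
$sample = @(
    '      Starting test: DNS',
    '',
    '         ......................... DC01 passed test DNS'
)
Select-DcDiagResult -Test 'DNS' -Lines $sample
```

On a domain controller the two functions would be chained, for example Select-DcDiagResult -Test 'DNS' -Lines (Invoke-DcDiagTest -Test 'DNS').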

Capture3

Once the test has completed and the output has been captured I will use conditional logic to determine the service status, return a service state and generate a status information message.
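
The conditional logic might be sketched as follows. Nagios plugins conventionally return 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN; the function name, the message wording and the choice to treat any "failed test" line as CRITICAL are assumptions.

```powershell
# Map captured dcdiag output to a Nagios state code and a status message.
function Get-DcDiagState {
    param (
        [Parameter(Mandatory = $true)] [string] $Test,
        [string] $Output = '',
        [string] $Timestamp = (Get-Date -Format 'yyyy-MM-dd HH:mm:ss')
    )
    $name = [regex]::Escape($Test)
    if ($Output -match "failed test $name") {
        # Check failures first: one failing summary line is enough.
        $code = 2
        $text = "CRITICAL: dcdiag test $Test failed (checked $Timestamp)"
    }
    elseif ($Output -match "passed test $name") {
        $code = 0
        $text = "OK: dcdiag test $Test passed (checked $Timestamp)"
    }
    else {
        # No summary line found, so the result cannot be determined.
        $code = 3
        $text = "UNKNOWN: no result found for dcdiag test $Test (checked $Timestamp)"
    }
    New-Object PSObject -Property @{ ExitCode = $code; Message = $text }
}

# DC01 and the timestamp are made-up sample values.
$state = Get-DcDiagState -Test 'DNS' -Output '......................... DC01 passed test DNS' -Timestamp '2013-11-11 15:24:58'
$state.Message
```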

Capture4

The PowerShell session then exits, returning the exit code.
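
To illustrate how the exit code reaches the caller, the sketch below starts a child PowerShell session that prints a status line and exits with a state code, then reads that code back. The status text is a made-up example; in the script itself the last statement would simply be an exit with the chosen code.

```powershell
# The child session prints a status line and exits with 2 (CRITICAL in Nagios).
powershell.exe -NoProfile -Command "Write-Output 'CRITICAL: example status'; exit 2"

# The parent session sees the child's exit code in $LASTEXITCODE.
"Exit code seen by the caller: $LASTEXITCODE"
```

NSClient++ reads the exit code in the same way, which is why the command definition later in this post ends with exit($lastexitcode).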

Capture5

Now that we have a PowerShell script to monitor the service health, we need to add it to the external scripts section of the NSClient++ configuration file (C:\Program Files\NSClient++\NSC.ini) and copy the script to the scripts folder (C:\Program Files\NSClient++\scripts) so that the client service can invoke the command.

[External Scripts]
check_directoryservergiagnostics= cmd /c echo scripts\Check-DirectoryServerDiagnostics.ps1 -Test $ARG1$; exit($lastexitcode) | powershell.exe -command -
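
Because the command takes $ARG1$, NSClient++ also has to accept arguments. The fragment below is a sketch of the related settings, assuming the NSC.ini layout of NSClient++ 0.3.x; the timeout value is only an example and your file may already contain some of these entries.

```ini
; Sketch of related NSC.ini settings (NSClient++ 0.3.x layout assumed).
[modules]
NRPEListener.dll
CheckExternalScripts.dll

[NRPE]
; Accept arguments sent by check_nrpe -a
allow_arguments=1

[External Script]
; Pass arguments through to external scripts
allow_arguments=1
; Seconds before a script is stopped; example value for slow tests
command_timeout=600
```

The NSClient++ service needs to be restarted for changes to NSC.ini to take effect.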

The final step is to create a service. Use the check_nrpe command and set the $ARG1$ value as shown below, which in this example runs the DNS test (dcdiag /test:DNS), then save the service and apply the configuration.

check_directoryservergiagnostics -a "DNS"

Once you have applied the service to a host, or in my case to the host group created for all my domain controllers, you will receive a service status after the next service check.

Capture8

One issue I had was that the DNS test can take a couple of minutes to complete and produce its output. With the default check_nrpe command settings the check times out before the test finishes, so I used a copy of the check_nrpe command in which I had previously changed the timeout value to 600 seconds. For more information on resolving the check_nrpe socket timeout issue, see:

https://deangrant.wordpress.com/2013/08/27/nagios-xi-check_nrpe-socket-timeout-after-30-seconds/
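
For orientation, a copied command with a longer timeout might look like the sketch below. The command name check_nrpe_long is hypothetical, and the command line assumes the usual check_nrpe form with the -t option carrying the timeout in seconds; the linked post describes the actual steps.

```cfg
define command {
    command_name    check_nrpe_long    ; hypothetical name for the copied command
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 600 -c $ARG1$ $ARG2$
}
```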

While the script was created to be executed as an external script within Nagios, it can also be run standalone from Windows PowerShell.

Capture6

The full Windows PowerShell script can be downloaded from the link below:

https://app.box.com/s/iqun1h3b4r8bdp0esdfu

Nagios XI and Ubuntu returns ‘CHECK_NRPE: Received 0 bytes from daemon’.

When monitoring a remote Ubuntu server with Nagios XI, I was receiving the following unknown message for the service status:

CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.

Following investigation, I noticed the following entries in /var/log/syslog:

Nov 11 15:24:58 nrpe[7768]: Error: Request contained command arguments, but argument option is not enabled!
Nov 11 15:24:58 nrpe[7768]: Client request was invalid, bailing out...

To resolve this issue, I modified the NRPE configuration file (/etc/nagios/nrpe.cfg) on the remote server to allow command arguments:

dont_blame_nrpe = 1
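
As a sketch of how the setting fits together with a command that consumes arguments, the fragment below pairs it with a check_users definition of the kind found in sample NRPE configurations. The plugin path is an assumption and may differ on your server.

```cfg
# /etc/nagios/nrpe.cfg
# Allow check_nrpe clients to pass arguments with -a
dont_blame_nrpe=1

# Example command that uses the passed arguments (plugin path is an assumption)
command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
```

The NRPE service has to be restarted before the change takes effect; on Ubuntu the service is typically named nagios-nrpe-server.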