Report Job Status from vRanger to Nagios

I have recently been creating a number of external scripts within Nagios to bring together the monitoring of several services. I am now looking at reporting the status of vRanger backup jobs to Nagios based on the following criteria:

  • Report the last completed run of a backup job.
  • Report the job status and information to Nagios, where successful jobs are reported with the OK status and failed jobs with Critical status.

I will be able to do this by calling a number of cmdlets from the vRanger API PowerShell snap-in, which is installed with vRanger by default.

The script also needs to return the status of multiple backup jobs, and I do not want to create multiple scripts as well as multiple services within Nagios. Therefore, the script defines a mandatory parameter for the backup job name, which is passed with the JobName argument.

Param ([parameter(Mandatory = $true)][string] $JobName)

As mentioned previously, we will be using the vRanger API PowerShell snap-in to query the backup job status, so we need to add the snap-in to the current session:

if (-not (Get-PSSnapin vRanger.API.PowerShell -ErrorAction SilentlyContinue)) 
{ 
Add-PSSnapin vRanger.API.PowerShell > $null
}

Now, we need to filter the backup job name from the job template with the parameter specified.

$Template = Get-JobTemplate | Where-Object {$_.JobName -eq $JobName}

Now that we have the job template, we use the Get-Job cmdlet to return the jobs whose ParentJobTemplateId matches the template's TemplateVersionId and whose state is Completed, and then select the most recent entry from the job history.

$Job = Get-Job | Where-Object {$_.ParentJobTemplateId -eq $Template.TemplateVersionID -and $_.JobState -eq "Completed"} | Select -Last 1
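Select -Last 1 relies on Get-Job returning the job history oldest first, which is an assumption about the cmdlet's output order. A minimal sketch of making the ordering explicit by sorting on CompletedOn is below. The sample objects are hypothetical stand-ins for Get-Job output and only mimic the three properties used in this post.

```powershell
# Hypothetical stand-in objects; the real script would pipe Get-Job output instead.
$Jobs = @(
    [pscustomobject]@{ JobState = 'Completed'; JobStatus = 'Failed';  CompletedOn = [datetime]'2013-09-10T02:00:00' },
    [pscustomobject]@{ JobState = 'Completed'; JobStatus = 'Success'; CompletedOn = [datetime]'2013-09-12T02:00:00' },
    [pscustomobject]@{ JobState = 'Completed'; JobStatus = 'Success'; CompletedOn = [datetime]'2013-09-11T02:00:00' }
)

# Sort explicitly by completion time, then keep the newest entry.
$Latest = $Jobs |
    Where-Object { $_.JobState -eq 'Completed' } |
    Sort-Object -Property CompletedOn |
    Select-Object -Last 1

$Latest.CompletedOn
```

With the sample data this selects the job completed on 12 September, regardless of the order the objects were supplied in.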

Now that we have the job status, we need to generate return codes for the service status. As per the criteria above, jobs with the status Success are returned with the service status OK (0), and jobs not reported as successful are returned as Critical (2).

If ($Job.JobStatus -eq "Success")
{ 
$returncode = 0
} 

Else 
{ 
$returncode = 2
}
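If the job name is mistyped, or the job has never completed, $Job will be empty and the check above reports Critical. As a rough sketch of an alternative, the mapping could return the Nagios UNKNOWN state (3) in that case. The function name Get-JobReturnCode is hypothetical and not part of the original script, and the sample objects only imitate the JobStatus property.

```powershell
# Hypothetical helper: maps a job object to a Nagios return code.
function Get-JobReturnCode {
    param ($Job)

    if ($null -eq $Job) { return 3 }                # UNKNOWN: no completed job was found
    if ($Job.JobStatus -eq 'Success') { return 0 }  # OK
    return 2                                        # CRITICAL: any other status
}

# Usage with stand-in objects in place of Get-Job output.
$Good = [pscustomobject]@{ JobStatus = 'Success' }
$Bad  = [pscustomobject]@{ JobStatus = 'Failed' }

Get-JobReturnCode -Job $Good
Get-JobReturnCode -Job $Bad
Get-JobReturnCode -Job $null
```

Whether a missing job should be UNKNOWN or Critical is a judgement call; UNKNOWN separates a misconfigured check from a failed backup.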

Finally, I want to output the service status information for the backup job being monitored and exit the session with the return code:

"" + $Job.JobState + " with " + $Job.JobStatus + " on " + $Job.CompletedOn
exit $returncode

One issue I found was that the vRanger API PowerShell snap-in is only available in a 32-bit version, and therefore has to be loaded into the 32-bit version of Windows PowerShell. This means the check command must call the external script through the 32-bit executable:

check_vrangerbackupstatus= cmd /c echo scripts\Get-vRangerBackupStatus.ps1 -JobName "$ARG1$"; exit($lastexitcode) | %SystemRoot%\syswow64\WindowsPowerShell\v1.0\powershell.exe -command -

While the script was created to be executed as an external script within Nagios, this can be run standalone from Windows Powershell as below.

%SystemRoot%\syswow64\WindowsPowerShell\v1.0\powershell.exe -command ./Get-vRangerBackupStatus.ps1 -JobName <Job Name>

If you are looking to add external scripts such as this one to Nagios, see the link below for more information:

https://deangrant.wordpress.com/2013/09/12/creating-and-running-external-scripts-within-nagios-xi/

The full Windows Powershell script can be downloaded from the below link:

https://app.box.com/s/3sgeu21nxxvv1oi02zte


Receive extended free trial of CloudCheckr Pro

CloudCheckr provides otherwise unavailable visibility and analytics to remove the complexity from AWS usage, so that users can quickly and efficiently gain control of their deployment, reduce costs, and optimize infrastructure performance.


By signing up (no credit card required) and using the promotional code “cloud19” on signup you will receive an extended trial of CloudCheckr Pro.

For full details of the analytic features, see  http://cloudcheckr.com/home-2/aws-analytic-features/

In short, CloudCheckr addresses the following areas to optimise cloud performance:

Monitoring status of AWS EC2 Snapshots within Nagios

I recently wrote a script to automate the creation of snapshots for EBS volumes for Amazon EC2 instances (https://deangrant.wordpress.com/2013/08/06/aws-create-ec2-snapshot-based-on-metadata-tag-value/).

Following on from this I wanted to report the status of snapshots completed and return this status to Nagios. This was to be achieved by comparing the number of EBS volumes that contained a specific metadata tag value to the number of snapshots created on a particular day.

As per usual, this script is written in Windows PowerShell and imports the AWS Tools for Windows PowerShell module (http://aws.amazon.com/powershell/) into the current PowerShell session.

If (-not (Get-Module AWSPowershell -ErrorAction SilentlyContinue))
{ 
Import-Module "C:\Program Files (x86)\AWS Tools\Powershell\AWSPowershell\AWSPowershell.psd1" > $null
}

Once the module has been imported, we need to set our AWS credentials and AWS region for this session:

Set-AWSCredentials -AccessKey XXXXXXXXXXXXXXXXXXXXXX -SecretKey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Set-DefaultAWSRegion eu-west-1

As part of the script, I need to output the date in two different formats: one to match the snapshot description and one for the Nagios status information.

$Date = (Get-Date).ToString('ddMMyyyy')
$StatusDate = (Get-Date).ToString('dd/MM/yyyy')
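One thing worth knowing about the second format: in .NET custom date format strings the "/" character stands for the current culture's date separator, so on a machine with different regional settings the output might not contain a literal slash. The small sketch below uses a fixed sample date rather than Get-Date so the result is predictable. Passing InvariantCulture is an addition here rather than part of the original script.

```powershell
# Fixed sample date so the output does not depend on when this runs.
$Sample = [datetime]'2013-09-12'

# Invariant culture keeps "/" as a literal slash regardless of regional settings.
$Invariant = [System.Globalization.CultureInfo]::InvariantCulture

$Date       = $Sample.ToString('ddMMyyyy', $Invariant)    # 12092013
$StatusDate = $Sample.ToString('dd/MM/yyyy', $Invariant)  # 12/09/2013

$Date
$StatusDate
```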

Now we need to compare the number of EBS volumes which match the metadata tag value with the number of snapshots created. Firstly, we create the filter for the EBS volumes, which in this instance returns all volumes with the metadata tag value ‘EBS Snapshot: Yes’, and use the Get-EC2Volume cmdlet to return these volumes and store them in a variable.

$Filter = (New-Object Amazon.EC2.Model.Filter).WithName("tag:EBS Snapshot").WithValue("Yes")
$Volumes = Get-EC2Volume -Filter $Filter

Now that we have returned all our EBS volumes, we need to return all the snapshots created on the current date. This information is stored in the description of the snapshot upon creation in the format ‘EBS Snapshot created on ddMMyyyy’. To return these snapshots, the output of the Get-EC2Snapshot cmdlet is filtered on that description.

$Snapshots = Get-EC2Snapshot | Where-Object {$_.Description -like ("EBS Snapshot Created on " + $Date + "*")}
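To illustrate how the description filter behaves, here is a small sketch run against hypothetical stand-in objects instead of real Get-EC2Snapshot output; the descriptions and volume IDs are made up. Two details are worth noting: -like is case-insensitive, so "Created" matches "created", and wrapping the pipeline in @( ) guarantees an array, so .Count is reliable even when only one snapshot matches.

```powershell
$Date = '12092013'

# Hypothetical snapshot objects; only the Description property is imitated.
$AllSnapshots = @(
    [pscustomobject]@{ Description = 'EBS Snapshot created on 12092013 for vol-aaaa' },
    [pscustomobject]@{ Description = 'EBS Snapshot created on 11092013 for vol-aaaa' },
    [pscustomobject]@{ Description = 'EBS Snapshot created on 12092013 for vol-bbbb' },
    [pscustomobject]@{ Description = 'Manual snapshot before upgrade' }
)

# @( ) forces an array so that .Count works for zero, one or many matches.
$Snapshots = @($AllSnapshots | Where-Object { $_.Description -like ("EBS Snapshot Created on " + $Date + "*") })

$Snapshots.Count
```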

Now it is time to compare the counts returned. In this case I define only a warning threshold and derive the return codes from it.

$Warning = $Volumes.Count - 5
If ($Snapshots.Count -eq $Volumes.Count) {$returncode = 0}
ElseIf ($Snapshots.Count -lt $Volumes.Count -and $Snapshots.Count -ge $Warning) {$returncode = 1}
ElseIf ($Snapshots.Count -lt $Warning) {$returncode = 2}
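The same comparison can be sketched as a small function in which every possible pair of counts produces a return code. The function name Get-SnapshotReturnCode and the Tolerance parameter are hypothetical. The sketch also assumes that a count exactly on the warning threshold should warn, and that having more snapshots than volumes is acceptable.

```powershell
# Hypothetical helper: compares snapshot and volume counts and returns a Nagios code.
function Get-SnapshotReturnCode {
    param (
        [int] $SnapshotCount,
        [int] $VolumeCount,
        [int] $Tolerance = 5   # missing snapshots tolerated before going critical
    )

    $Warning = $VolumeCount - $Tolerance

    if ($SnapshotCount -ge $VolumeCount) { return 0 }   # OK: every volume has a snapshot
    elseif ($SnapshotCount -ge $Warning) { return 1 }   # WARNING: a few are missing
    else { return 2 }                                   # CRITICAL: more than the tolerance
}

Get-SnapshotReturnCode -SnapshotCount 137 -VolumeCount 137
Get-SnapshotReturnCode -SnapshotCount 135 -VolumeCount 137
Get-SnapshotReturnCode -SnapshotCount 100 -VolumeCount 137
```

Because the final branch is a plain else, the script can never reach exit with an unset return code.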

Now all that is left is to exit the script and return the exit code to Nagios. However, before we do so I also want to return status information giving the date and the number of snapshots performed compared to the number of EBS volumes.

"Total number of EBS Snapshots performed on " + $StatusDate + ": " + $Snapshots.Count + "/" + $Volumes.Count
exit $returncode

Below is an example of the formatted status information message generated:

Total number of EBS Snapshots performed on 12/09/2013: 137/137

There are one or two issues with the script. If an EBS volume is created during the day and no snapshot has been performed yet, the script will report that there are more volumes than snapshots: a handful of new volumes would return a warning, and six or more would be reported as critical. This can be mitigated by running the external script less frequently; in my case I run it once per day.

While the script was created to be executed as an external script within Nagios, it can also be run standalone from Windows PowerShell. If you are looking to add external scripts such as this one to Nagios, see the link below for more information:

https://deangrant.wordpress.com/2013/09/12/creating-and-running-external-scripts-within-nagios-xi/

The full Windows Powershell script can be downloaded from the below link:

https://app.box.com/s/jm88wcrtosfc7xcisbn7

Checking Free Disk Space on NTFS Volume Mount Points in Nagios

As part of creating external scripts within Nagios, I needed a script to monitor NTFS volume mount points, which are not covered by the default monitoring wizard.

As the script was to be run against a number of different volume mount points, I did not want to create multiple scripts as well as multiple services within Nagios. Therefore, the script defines a parameter for the label of the NTFS volume mount point, which is passed with the MountPoint argument.

Param ([string] $MountPoint)

I had a number of requirements when creating the script. Firstly, the Warning and Critical thresholds, as percentages of used space, would be set to 80 and 90 respectively. These can be adjusted in the script as below.

$Warning = 80
$Critical = 90

Secondly, I had to return the Label, Capacity (GB), Free Space (GB) and Used Space (GB) of the NTFS volume mount point. The NTFS volume mount point is returned using the Get-WmiObject cmdlet, filtered by the label name.

$Volume = Get-WmiObject Win32_Volume | Where-Object {$_.Label -eq $MountPoint}
$Label = $Volume.Name
$Capacity = $Volume.Capacity /1GB
$FreeSpace = $Volume.FreeSpace / 1GB 
$UsedSpace = ($Volume.Capacity - $Volume.FreeSpace) / 1GB

Finally, we need to convert the free space and used space into percentages for generating the status and returning an exit code.

$PercentFree = [Math]::Round(($Volume.FreeSpace / $Volume.Capacity) * 100)
$PercentUsed = 100 - [Math]::Round(($Volume.FreeSpace / $Volume.Capacity) * 100)

Now that we have the percentage of used space, we can generate the return code to pass to Nagios as a service state by using conditional logic within the script.

If ($PercentUsed -lt $Warning) {$returncode = 0}
ElseIf ($PercentUsed -ge $Warning -and $PercentUsed -lt $Critical) {$returncode = 1}
ElseIf ($PercentUsed -ge $Critical) {$returncode = 2}
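As a rough worked example, the percentage calculation and the threshold check can be combined into one function and tried with the figures from the sample output further down (200GB capacity, 148.84GB free). The function name Get-DiskReturnCode is hypothetical, and treating a value exactly on a threshold as having reached it is an assumption of this sketch.

```powershell
# Hypothetical helper: derives used space (%) and maps it to a Nagios return code.
function Get-DiskReturnCode {
    param (
        [double] $Capacity,    # bytes, as reported by Win32_Volume
        [double] $FreeSpace,   # bytes, as reported by Win32_Volume
        [int] $Warning = 80,
        [int] $Critical = 90
    )

    $PercentUsed = 100 - [Math]::Round(($FreeSpace / $Capacity) * 100)

    if ($PercentUsed -ge $Critical) { return 2 }      # CRITICAL
    elseif ($PercentUsed -ge $Warning) { return 1 }   # WARNING
    else { return 0 }                                 # OK
}

# 148.84 / 200 is about 74% free, so about 26% used, which is below both thresholds.
Get-DiskReturnCode -Capacity 200GB -FreeSpace 148.84GB
```

Checking the critical threshold first means the conditions do not need to be combined with -and.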

Now all that is left is to exit the script and return the exit code to Nagios. However, before we do so I also want to return status information giving the Label, Capacity (GB), Used Space (GB and %) and Free Space (GB and %).

"Disk $Label -total: " + [System.Math]::Round($Capacity, 2) + "GB -used: " + [System.Math]::Round($UsedSpace, 2) + "GB ($PercentUsed%) -free " + [System.Math]::Round($FreeSpace, 2) + "GB ($PercentFree%)"
exit $returncode

Below is an example of the formatted status information message generated:

Disk D:\Disk1\ -total: 200GB -used: 51.16GB (26%) -free 148.84GB (74%)

While the script was created to be executed as an external script within Nagios, it can also be run standalone from Windows PowerShell. If you are looking to add external scripts such as this one to Nagios, see the link below for more information:

https://deangrant.wordpress.com/2013/09/12/creating-and-running-external-scripts-within-nagios-xi/

The full Windows Powershell script can be downloaded from the below link:

https://app.box.com/s/71ouenbu6vvcs6ewe3c7