Integrating Nagios XI alerts to a Slack Channel

Recently, I have been exploring Slack for messaging and collaboration, and in particular its integration with other tools. In this post I will describe how to integrate Slack with Nagios XI so that generated alerts are sent to a channel within Slack. Firstly, you need a Slack login and must be an owner or a member of a team. You can then browse to the Slack App Directory and search for Nagios to enable and configure the integration.

To configure the plugin we need to install the required Perl modules, download the plugin to the Nagios XI server, place it in the directory ‘/usr/local/bin’ and change the access permissions on the file.

sudo yum install perl-libwww-perl
sudo yum install perl-Net-SSLeay
cd /tmp
wget https://raw.github.com/tinyspeck/services-examples/master/nagios.pl
sudo cp nagios.pl /usr/local/bin/slack_nagios.pl
sudo chmod 755 /usr/local/bin/slack_nagios.pl

We now need to edit ‘/usr/local/bin/slack_nagios.pl’ and set the $opt_domain and $opt_token variables to match your Slack configuration. In the example below, I am using the Slack team domain ‘dean.slack.com’ and the token generated for the Nagios integration is ‘BIRQpEaFMixAi6LsMMj80bcC’.

my $opt_domain = "dean.slack.com";
my $opt_token = "BIRQpEaFMixAi6LsMMj80bcC"; 

Now we will configure Nagios to define a contact and the commands to use for the plugin. In this example, I will use the Slack channel ‘nagiosalerts’ for both the host and service notification commands. Firstly, I will modify the file ‘/usr/local/nagios/etc/contacts.cfg’ to define the contact for Slack and specify the host/service notification periods, options and commands.

define contact {
      contact_name                             slack
      alias                                    Slack
      service_notification_period              24x7
      host_notification_period                 24x7
      service_notification_options             w,u,c,r
      host_notification_options                d,r
      service_notification_commands            notify-service-by-slack
      host_notification_commands               notify-host-by-slack
}

Now we will define the commands for the notification settings by modifying the file ‘/usr/local/nagiosxi/tmp/nagiosxi/subcomponents/nagioscore/mods/cfg/objects/commands.cfg’ and including the following in the notification settings section. As per my example, I am configuring the command_line to use the Slack channel ‘nagiosalerts’ for both the ‘notify-service-by-slack’ and ‘notify-host-by-slack’ commands.

define command {
      command_name     notify-service-by-slack
      command_line     /usr/local/bin/slack_nagios.pl -field slack_channel=#nagiosalerts
}

define command {
      command_name     notify-host-by-slack
      command_line     /usr/local/bin/slack_nagios.pl -field slack_channel=#nagiosalerts
}

We can also define contact group membership for the contact defined for Slack. In this example, I have modified an existing contact group for the ‘Nagios Administrators’ to include the slack contact as a member by modifying the file ‘/usr/local/nagios/etc/contactgroups.cfg’.

define contactgroup {
  contactgroup_name admins
  alias             Nagios Administrators
  members           nagiosadmin, slack
}

We also need to ensure that the file ‘/usr/local/nagiosxi/tmp/nagiosxi/subcomponents/nagioscore/mods/cfg/nagios.cfg’ contains the following configuration value to enable environment macros.

enable_environment_macros=1
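
As a quick check before restarting, a small shell helper can confirm that the setting is present. This is only a minimal sketch: the function name environment_macros_enabled is my own and not part of Nagios XI, and it simply looks for an uncommented enable_environment_macros=1 line in whichever file it is given.

```shell
# Succeeds (exit status 0) only if the given Nagios configuration file
# contains an uncommented "enable_environment_macros=1" line.
# The function name is hypothetical and not part of Nagios XI.
environment_macros_enabled() {
    grep -Eq '^[[:space:]]*enable_environment_macros[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}
```

For example, running environment_macros_enabled /usr/local/nagiosxi/tmp/nagiosxi/subcomponents/nagioscore/mods/cfg/nagios.cfg && echo "enabled" should print "enabled" once the value has been set.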

Finally, to apply the configuration and enable the plugin we need to restart Nagios. Following a restart, alerts should be sent to the Slack channel when they are generated.

sudo service nagios restart

Forcing SSL with permanent redirect for Nagios XI

This article describes the steps required to configure SSL for Nagios XI and to force SSL with a permanent redirection. I have assumed that the certificate files have already been generated, and I will be using the hostname ‘nagios.dean.local’ for my Nagios XI server throughout the configuration steps.

Nagios XI version 2011R1.6 or later is required for SSL to ensure that all data is displayed correctly in the Nagios XI interface. The installation of Nagios XI should already include the required SSL components. To verify this, run the following command on the Nagios XI server, which will install the packages if they are missing or report that they are already installed.

sudo yum install mod_ssl openssl -y 

Also, you will need to ensure that inbound connectivity on TCP service port 443 for the https protocol is permitted. If this is not the case, we can add a rule to the INPUT chain to accept connections.

sudo iptables -I INPUT -p tcp --dport 443 -j ACCEPT 
sudo service iptables save

Prior to configuring SSL for Nagios XI we will create a backup of the configuration files which we will modify as part of this process, in case we need to revert the changes. The SSL certificate files also need to be available on the Nagios XI server: place the certificate file in ‘/etc/pki/tls/certs’ and the key file in ‘/etc/pki/tls/private’.

sudo cp /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.backup
sudo cp /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.backup 
sudo cp /etc/httpd/conf.d/nagiosxi.conf /etc/httpd/conf.d/nagiosxi.conf.backup
sudo cp /usr/local/nagiosxi/html/config.inc.php /usr/local/nagiosxi/html/config.inc.php.backup 

Once the certificate and key files have been copied to these locations, we need to configure the httpd service by modifying the file ‘/etc/httpd/conf.d/ssl.conf’.

SSLCertificateFile /etc/pki/tls/certs/nagios.dean.local.crt
SSLCertificateKeyFile /etc/pki/tls/private/nagios.dean.local.key 

Also, we need to configure the virtual host to listen for requests on TCP service port 443 for the https protocol by modifying the file ‘/etc/httpd/conf/httpd.conf’.

NameVirtualHost *:443 

Once the file has been modified we will restart the ‘httpd’ service to apply the configuration. We may verify the connection to the server by browsing to the server, in this example https://nagios.dean.local.

sudo service httpd restart 

Now we need to edit ‘/usr/local/nagiosxi/html/config.inc.php’ and modify the line below, which by default should be set to ‘false’.

$cfg['use_https']=true;

Now browse to the Nagios XI web interface, go to Admin > System Settings and modify the program URL value to use the https protocol instead of http, then select ‘Update Settings’. For example, https://nagios.dean.local/nagiosxi.

Next, browse to ‘Configure > Core Config Manager > Config Manager Admin > Config Manager Settings’ and modify the Server Protocol value to be https and select ‘Save’.


Finally, we need to edit the file ‘/etc/httpd/conf.d/nagiosxi.conf’, which by default should be similar to the below:

# NameVirtualHost * :443
<VirtualHost *:80> 
<Directory "/usr/local/nagiosxi/html">
   #SSLRequireSSL
   Options None 
   AllowOverride None 
   Order allow,deny
   Allow from all 
   # Order deny,allow
   # Deny from all
   # Allow from 127.0.0.1
   # AuthName "Nagios XI"
   # AuthType Basic 
   # AuthUserFile /usr/local/nagiosxi/etc/htpasswd.users
   # Require valid-user
</Directory>
</VirtualHost> 

We will now add the following to the configuration file to enable the virtual host to listen on HTTPS and specify the path to the certificate files that have been generated.

<VirtualHost *:443>
   SSLEngine on 
   SSLCertificateFile /etc/pki/tls/certs/nagios.dean.local.crt
   SSLCertificateKeyFile /etc/pki/tls/private/nagios.dean.local.key
   <Directory "/usr/local/nagiosxi/html">
      AllowOverride All
   </Directory>
</VirtualHost>
Alias /nagiosxi "/usr/local/nagiosxi/html"

To force SSL with a permanent redirection, add the ‘Redirect permanent’ directive to the virtual host listening on TCP service port 80, and add the configuration for the rewrite engine between the ‘<VirtualHost>’ and ‘</VirtualHost>’ tags of the virtual host listening on TCP service port 443.

Redirect permanent / https://nagios.dean.local/
<IfModule mod_rewrite.c> 
   RewriteEngine On 
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteRule nagiosxi/api/v1/(.*)$ /usr/local/nagiosxi/html/api/v1/index.php?request=$1 [QSA,NC,L]
</IfModule>

Once saved, the updated configuration file ‘/etc/httpd/conf.d/nagiosxi.conf’ should look similar to the below:

# NameVirtualHost * :443
<VirtualHost *:80> 
<Directory "/usr/local/nagiosxi/html">
   #SSLRequireSSL
   Options None 
   AllowOverride None 
   Order allow,deny
   Allow from all
   Redirect permanent / https://nagios.dean.local/
   # Order deny,allow
   # Deny from all
   # Allow from 127.0.0.1
   # AuthName "Nagios XI"
   # AuthType Basic 
   # AuthUserFile /usr/local/nagiosxi/etc/htpasswd.users
   # Require valid-user
</Directory>
</VirtualHost> 

<VirtualHost *:443>
   SSLEngine on 
   SSLCertificateFile /etc/pki/tls/certs/nagios.dean.local.crt
   SSLCertificateKeyFile /etc/pki/tls/private/nagios.dean.local.key
<Directory "/usr/local/nagiosxi/html">
   AllowOverride All 
</Directory> 
<IfModule mod_rewrite.c> 
   RewriteEngine On 
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteRule nagiosxi/api/v1/(.*)$ /usr/local/nagiosxi/html/api/v1/index.php?request=$1 [QSA,NC,L]
</IfModule>
</VirtualHost>

Alias /nagiosxi "/usr/local/nagiosxi/html"

This completes the configuration steps required to force SSL with a permanent redirect for the Nagios XI server. In my example, I would verify the configuration by browsing to ‘http://nagios.dean.local/nagiosxi’, which should redirect to ‘https://nagios.dean.local/nagiosxi’. Examples of the configuration files that need to be modified can be found at the following link.
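
The same verification can be scripted rather than done in a browser. The sketch below assumes curl is installed on the machine running the check, and the function name is_permanent_https_redirect is hypothetical. It reads HTTP response headers on standard input and succeeds only if the status code is 301 and the Location header points at an https URL.

```shell
# Reads HTTP response headers on stdin. Succeeds (exit status 0) only when
# the status code is 301 and the Location header starts with https://.
# The function name is hypothetical.
is_permanent_https_redirect() {
    awk '
        { sub(/\r$/, "") }                          # strip trailing CR from each header line
        NR == 1 { status = $2 }                     # e.g. "HTTP/1.1 301 Moved Permanently"
        tolower($1) == "location:" { loc = $2 }     # header names are case-insensitive
        END { exit !(status == "301" && loc ~ /^https:\/\//) }
    '
}
```

A possible usage is: curl -sI http://nagios.dean.local/nagiosxi | is_permanent_https_redirect && echo "redirect OK". The -I option requests the headers only and -s suppresses the progress output.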

Nagios XI: Automating Host Management

I was recently looking at how to automate adding and removing managed hosts and services in Nagios XI, which can be particularly useful in cloud computing and large environments where configuration management solutions have been implemented for provisioning. In these environments we typically use configuration files based on the attributes of a server role during the provisioning and configuration cycle.

Nagios XI contains a number of scripts in the directory /usr/local/nagiosxi/scripts that allow for automated host management, as below:

| Script | Description |
| --- | --- |
| reconfigure_nagios.sh | Imports configuration files from the import directory, verifies the configuration and restarts Nagios if verification succeeds. If verification fails, the configuration will be rolled back to the last working checkpoint. This is the command invoked from the web interface when selecting ‘Apply Configuration’. |
| nagiosql_delete_host.php | Removes a host from the configuration database and removes the configuration file. |
| nagiosql_delete_service.php | Removes services from the configuration database and removes the configuration file. |

To automate adding managed hosts and services, I create a single configuration file for each host that contains the host and each of its services. The service definitions are applied only to that host and not to a host list or host group, and the configuration file is named according to the hostname. In the example below, I have created a single configuration file which defines the host and a managed service for CPU Usage and saved the configuration file as ‘server1.dean.local.cfg’.

define host {
 host_name server1.dean.local
 use xiwizard_windowsserver_host
 address server1.dean.local
 max_check_attempts 5
 check_interval 5
 retry_interval 1
 check_period xi_timeperiod_24x7
 notification_interval 60
 notification_period xi_timeperiod_24x7
 icon_image win_server.png
 statusmap_image win_server.png
 _xiwizard windowsserver
 register 1
 } 

define service {
 host_name server1.dean.local
 service_description CPU Usage
 use xiwizard_windowsserver_nsclient_service
 check_command check_xi_service_nsclient!!CPULOAD!-l 5,80,90
 max_check_attempts 5
 check_interval 5
 retry_interval 1
 check_period xi_timeperiod_24x7
 notification_interval 60
 notification_period xi_timeperiod_24x7
 _xiwizard windowsserver
 register 1
 }
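
Since each host gets its own file named after the hostname, the file can be generated from a template. The following is a minimal shell sketch of that idea: the function name render_host_cfg is hypothetical and the values are simply copied from the host definition above. A configuration management tool would normally do this with its own templating, and the service definition could be added to the template in the same way.

```shell
# Print a host definition for the hostname given as the first argument.
# The function name is hypothetical; the directive values are copied from
# the example host definition in this post.
render_host_cfg() {
    host="$1"
    cat <<EOF
define host {
    host_name               $host
    use                     xiwizard_windowsserver_host
    address                 $host
    max_check_attempts      5
    check_interval          5
    retry_interval          1
    check_period            xi_timeperiod_24x7
    notification_interval   60
    notification_period     xi_timeperiod_24x7
    icon_image              win_server.png
    statusmap_image         win_server.png
    _xiwizard               windowsserver
    register                1
}
EOF
}
```

For example, render_host_cfg server1.dean.local > server1.dean.local.cfg would write a file named after the host, ready to be copied to the import directory.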

Once the configuration file has been created, we can place it in the import directory located at ‘/usr/local/nagios/etc/import’ and invoke the script reconfigure_nagios.sh from the directory ‘/usr/local/nagiosxi/scripts’ to import the configuration file, verify the configuration and restart Nagios if successful. If verification of the configuration fails, Nagios XI will restore the configuration files to the last working checkpoint, but the imported configuration will remain in the configuration database. To detect failures, the script returns one of the following exit codes, where an exit code of ‘0’ confirms that the configuration has been successfully verified as a working configuration and Nagios has been restarted.

| Exit Code | Description |
| --- | --- |
| 0 | no problems detected |
| 1 | config verification failed |
| 2 | nagiosql login failed |
| 3 | nagiosql import failed |
| 4 | reset_config_perms failed |
| 5 | nagiosql_exportall.php failed (write configs failed) |
| 6 | /etc/init.d/nagios restart failed |
| 7 | db_connect failed |
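
In an automated workflow it helps to turn these exit codes into readable messages. The sketch below is one way of doing that in shell. The function name describe_reconfigure_exit is hypothetical, and the descriptions are taken directly from the table above.

```shell
# Map an exit code from reconfigure_nagios.sh to its description.
# The function name is hypothetical; the descriptions come from the table above.
describe_reconfigure_exit() {
    case "$1" in
        0) echo "no problems detected" ;;
        1) echo "config verification failed" ;;
        2) echo "nagiosql login failed" ;;
        3) echo "nagiosql import failed" ;;
        4) echo "reset_config_perms failed" ;;
        5) echo "nagiosql_exportall.php failed (write configs failed)" ;;
        6) echo "/etc/init.d/nagios restart failed" ;;
        7) echo "db_connect failed" ;;
        *) echo "unknown exit code: $1" ;;
    esac
}

# Example usage on the Nagios XI server (left commented out in this sketch):
#   cd /usr/local/nagiosxi/scripts && ./reconfigure_nagios.sh
#   rc=$?
#   echo "reconfigure_nagios.sh: $(describe_reconfigure_exit "$rc")"
```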

Now that we have added a managed host and services, how do we remove them from the configuration database and delete the configuration file once the host is terminated? Provided the host has no dependent relationships, we first remove the services using the configuration name, which matches the configuration file of the managed host (this is why it is important to name the configuration file according to the hostname), by invoking ‘nagiosql_delete_service.php’ from the directory ‘/usr/local/nagiosxi/scripts’ as in the example below:

./nagiosql_delete_service.php --config=server1.dean.local

After the services have been successfully deleted we can remove the host by invoking the ‘nagiosql_delete_host.php’ script:

./nagiosql_delete_host.php --host=server1.dean.local

Once the host has been successfully removed, we can apply the new configuration as previously by invoking the ‘reconfigure_nagios.sh’ script. This method can also be applied to remove an imported configuration from the configuration database if verification of the configuration has failed during an import.
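
The three removal steps can be chained so that each one only runs if the previous one succeeded. This is a minimal sketch, assuming the scripts return a non-zero exit status on failure. The function name remove_managed_host and the NAGIOSXI_SCRIPTS variable are my own additions for illustration.

```shell
# Directory containing the Nagios XI scripts (path as used in this post).
# Making it overridable through the environment is an addition for this sketch.
NAGIOSXI_SCRIPTS="${NAGIOSXI_SCRIPTS:-/usr/local/nagiosxi/scripts}"

# Remove a host's services, then the host, then apply the configuration.
# The body runs in a subshell so the caller's working directory is unchanged,
# and the chain stops at the first step that fails.
# Assumes the configuration name matches the hostname, as described above.
remove_managed_host() (
    cd "$NAGIOSXI_SCRIPTS" &&
    ./nagiosql_delete_service.php --config="$1" &&
    ./nagiosql_delete_host.php --host="$1" &&
    ./reconfigure_nagios.sh
)
```

For example, remove_managed_host server1.dean.local would run the two delete scripts for that host and then apply the configuration, returning the exit status of the first step that failed or of reconfigure_nagios.sh.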

The above describes how to automate adding and removing hosts and services using Nagios XI and can be applied to your configuration management solutions during the provisioning and configuration cycle. In my scenario, I created a number of configuration files based on the attributes of server roles, which can be used as cookbook templates in Chef, using the ‘{node['fqdn']}’ pattern to specify the host name in the definition file and the configuration file name. I have also compiled PowerShell functions to perform the above, which I will discuss in a later post.

Nagios: Starting NSClient++ service fails with the error ‘NSClient++ (x64) is not a valid Win32 application.’

I was recently investigating an issue where the Nagios monitoring agent (NSClient++) service failed to start with the following error message:

The NSClient++ (x64) service failed to start due to the following error:
NSClient++ (x64) is not a valid Win32 application.

In the first instance I attempted to resolve the issue by uninstalling and reinstalling the monitoring agent on the impacted host, but the same behaviour was experienced. On investigation, I found a knowledge base article which describes the above symptom, where the possible causes are described as below:

  • The path of a service’s executable file contains spaces.
  • There is a file or folder on your computer’s hard disk that has the same name as a file or folder in the path to the service’s executable file

The first cause above describes the condition that was causing the issue, so to resolve it I wrapped the image path in quotation marks, as follows:

1) Browse to the registry key ‘HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NSClientpp’.

2) Modify the REG_EXPAND_SZ value data for ImagePath to be "C:\Program Files\NSClient++\nsclient++.exe", including the quotation marks.

Following the modification I was successfully able to start the NSClient++ service and continue monitoring the host.

Nagios XI: Host and Service details not being displayed

Recently, I was troubleshooting an issue with Nagios XI where host and service details were not being displayed in the web management console.

On investigating the log file at ‘/var/log/messages’, there were a number of errors indicating that a table in the MySQL database was marked as crashed and needed to be repaired.

ndo2db: mysql_error: 'Table './nagios/nagios_timedeventqueue' is marked as crashed and should be repaired'

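In order to repair the table marked as crashed, I ran the below on the Nagios XI monitoring server and reconnected to the web management console, after which both host and service details were displayed as expected. Note that myisamchk should only be run while no other program is using the tables, so it is safest to stop the MySQL service first.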

myisamchk --safe-recover /var/lib/mysql/*/*.MYI

 

Monitoring multiple SSL certificates on a single host using Nagios XI

I was recently looking to monitor multiple SSL certificates on a single host for multiple network services. Nagios XI appeared to have a limitation in the ‘check_xi_service_http_cert’ check command, in that it can only monitor a single SSL certificate per monitored host.

I therefore created a PowerShell script to allow multiple certificates to be monitored based on the port number used for the network service, returning the expiry date of the SSL certificate and generating a service status.

The script is dependent on two parameters, one to retrieve the host name (where by default this retrieves the local host FQDN) and a mandatory parameter for the port number.

Param ([string] $URL= ([System.Net.Dns]::GetHostByName(($env:computerName))).HostName, [parameter(Mandatory =$true)][string] $Port) 

Once we have specified the parameters required we will initiate a client connection to the TCP network service by invoking the ‘System.Net.Sockets.TcpClient’ class.

$TCPClient = New-Object System.Net.Sockets.TcpClient($URL,$Port) 

Once the client connection has been established, we will provide a stream for client-server communication that uses the Secure Socket Layer (SSL) security protocol to authenticate the server and optionally the client.

$SSLStream = New-Object System.Net.Security.SslStream($TCPClient.GetStream())
$SSLStream.AuthenticateAsClient($URL) 

This will now allow for the expiration date of the certificate used to authenticate the remote endpoint to be retrieved, and store this as a variable to compare in the conditional logic to determine the service status.

$Certificate = $SSLStream.RemoteCertificate
$Expiry = [datetime]::Parse($Certificate.GetExpirationDateString())

Now we will use conditional logic to compare the expiry date to a date in the future from the current date and use the following criteria.

  • If the expiry date is more than 30 days in the future, report the service status as ‘OK’
  • If the expiry date is 7 days or less in the future, report the service status as ‘Critical’
  • If the expiry date is 30 days or less (but more than 7 days) in the future, report the service status as ‘Warning’

If (((Get-Date).AddDays(30)) -lt $Expiry)
    { 
    "OK - Certificate will expire on " + $Expiry.ToString("dd/MM/yyyy HH:mm") 
    $returncode = "0" 
    } 
    
ElseIf (((Get-Date).AddDays(7)) -ge $Expiry)
    { 
    "Critical - Certificate will expire on " + $Expiry.ToString("dd/MM/yyyy HH:mm") 
    $returncode = "2" 
    } 
    
ElseIf (((Get-Date).AddDays(30)) -ge $Expiry)
    { 
    "Warning - Certificate will expire on " + $Expiry.ToString("dd/MM/yyyy HH:mm") 
    $returncode = "1" 
    } 

Once the service status has been determined, the PowerShell session will exit, returning the exit code.

exit $returncode
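
The return codes used above follow the usual Nagios plugin convention of 0 for OK, 1 for Warning and 2 for Critical. Purely as an illustration of the same threshold logic outside PowerShell, here is a minimal shell sketch that classifies a whole number of days until expiry. The function name classify_days_left is hypothetical, and working in whole days is a simplification of the date comparison in the script.

```shell
# Classify a certificate by the whole number of days until it expires.
# Prints the status and returns the matching Nagios plugin exit code:
# 0 = OK (more than 30 days), 2 = Critical (7 days or less), 1 = Warning (otherwise).
# The function name is hypothetical.
classify_days_left() {
    days="$1"
    if [ "$days" -gt 30 ]; then
        echo "OK"
        return 0
    elif [ "$days" -le 7 ]; then
        echo "Critical"
        return 2
    else
        echo "Warning"
        return 1
    fi
}
```

For example, classify_days_left 45 prints "OK", while a value of 30 prints "Warning" and a value of 7 prints "Critical", matching the boundaries in the PowerShell conditions.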

Once you have configured the external script to run within Nagios (http://wp.me/p15Mdc-eC), you will be able to monitor multiple SSL certificate expirations on a single host. Alternatively, you can invoke the script from the PowerShell console as below:

./Check-SSLCertificates.ps1 -URL server.domain.local -Port 443

Monitoring vCenter privilege reassignment with Nagios XI

During a restart of the ‘VMware VirtualCenter Server’ service, if a user or group assigned to the Administrator role at the root folder level cannot be verified, the user privileges are revoked.

As part of security hardening on the vCenter server, I created a Nagios Remote Plugin Executor (NRPE) check to search for the event created in the application log and generate a service status.

Firstly, we only need to query the application log for events logged after the ‘VMware VirtualCenter Server’ service has started. We can retrieve this as a date value by using the Get-Process cmdlet to return the ‘StartTime’ value of the process ‘vpxd’.

$Start= (Get-Process vpxd).StartTime

Now that we have retrieved a date value to query the application log after, we will need to filter the application log further using the ‘Get-EventLog’ cmdlet to retrieve an event, which is similar to the below:

Log Name: Application
Source: VMware VirtualCenter Server
Date: M/DD/YYYY H:MM:SS PM
Event ID: 1000
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: [vCenter Server]
Description: Removing permission for entity "<group name>", group "DOMAIN\Account", role -1. Reason: User or group not found.

We will now create a filter to pass to the ‘Get-EventLog’ cmdlet to retrieve any results like the above and store them in a variable so that we can count them. The below will filter for the Source ‘VMware VirtualCenter Server’, the EntryType ‘Warning’ and where the message text is like ‘Removing permission*User or group not found*’. The trailing wildcard allows for the full stop at the end of the event message, and wrapping the pipeline in @( ) ensures the result is always an array with a reliable Count.

The ‘ErrorAction’ preference is required because, if the filter returns no results, an error would otherwise be written to the console output.

$Query = @(Get-EventLog -LogName Application -Source "VMware VirtualCenter Server" -EntryType "Warning" -After $Start -ErrorAction SilentlyContinue | Where-Object {$_.Message -like "Removing permission*User or group not found*"})

Conditional logic will then be used to create a service status message based on the count of results returned by the above query. If zero results are returned, the service status will be set to ‘OK’ with status information stating that no instances of privilege reassignment since the process start time have been retrieved.

If one or more results are returned, the service status will be set to ‘Critical’ with status information stating the number of instances of privilege reassignment retrieved since the process start time.

If ($Query.Count -eq "0") 
    { 
    "No instances of privilege reassignment since " + ($Start).ToString("dd/MM/yyyy HH:mm")
    $returncode="0"
    } 
ElseIf ($Query.Count -ge "1") 
    { 
    "" + $Query.Count + " instances of privilege reassignment since " + ($Start).ToString("dd/MM/yyyy HH:mm")
    $returncode = "2"
    }

The PowerShell session will now exit and return an exit code.

exit $returncode

Once you have configured the external script to run within Nagios (http://wp.me/p15Mdc-eC), for a service status of ‘OK’ you should receive something similar to the below:

[Screenshot: CountVMUPR]