Wednesday, April 23, 2014

Installing Nagios on FreeBSD 10

Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.

The fork of Nagios into Icinga is a good thing, much in the same way as Quagga was a great fork of Zebra.

# uname -a
FreeBSD bsd10.local 10.0-RELEASE

Install Apache2.2

Install PHP5.4.27

Install MySQL5.5

On the Nagios server, install Nagios:
# cd /usr/ports/net-mgmt/nagios
# make config-recursive
# make config-recursive
# make install

Note: you only need to install Nagios on the machine that is going to act as a monitoring server. You do not need to install Nagios on the clients.

Add the www user to the nagios group:
# pw groupmod nagios -m www
# grep nagios /etc/group
nagios:*:181:www

Enable nagios to start on boot:
# echo 'nagios_enable="YES"' >> /etc/rc.conf
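Appending with echo adds a duplicate line every time the step is repeated. A minimal sketch of an idempotent alternative follows; the function name enable_rc_var and its arguments are hypothetical, chosen only for illustration. Where it is available, FreeBSD's sysrc(8) utility serves the same purpose.

```shell
#!/bin/sh
# enable_rc_var FILE NAME
# Append NAME="YES" to FILE unless a line starting with NAME= already exists.
# The function name and arguments are illustrative, not part of FreeBSD.
enable_rc_var() {
    rc_file=$1
    rc_name=$2
    if grep -q "^${rc_name}=" "$rc_file" 2>/dev/null; then
        return 0    # already set; leave the existing line alone
    fi
    printf '%s="YES"\n' "$rc_name" >> "$rc_file"
}

# Example (as root):
#   enable_rc_var /etc/rc.conf nagios_enable
```

Running the example twice leaves a single nagios_enable line in the file.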

Now copy the sample files to the config files:
# cd /usr/local/etc/nagios/
# cp cgi.cfg-sample cgi.cfg
# cp nagios.cfg-sample nagios.cfg
# cp resource.cfg-sample resource.cfg

Move sample files to a sample folder:
# mkdir -p /usr/local/etc/nagios/sample
# mv /usr/local/etc/nagios/*-sample /usr/local/etc/nagios/sample

Navigate to /usr/local/etc/nagios/objects and do the same:
# cd /usr/local/etc/nagios/objects
# cp commands.cfg-sample commands.cfg
# cp contacts.cfg-sample contacts.cfg
# cp localhost.cfg-sample localhost.cfg
# cp printer.cfg-sample printer.cfg
# cp switch.cfg-sample switch.cfg
# cp templates.cfg-sample templates.cfg
# cp timeperiods.cfg-sample timeperiods.cfg

Move sample files to a sample folder:
# mkdir -p /usr/local/etc/nagios/objects/sample
# mv /usr/local/etc/nagios/objects/*-sample /usr/local/etc/nagios/objects/sample
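The copy-and-move steps above can also be done with a loop instead of one cp per file. This is only a sketch: install_samples is a hypothetical helper name, and unlike the plain cp commands it skips any config file that already exists, so local edits are not overwritten.

```shell
#!/bin/sh
# install_samples DIR
# For every FILE-sample in DIR: copy it to FILE (unless FILE already exists),
# then move the sample into DIR/sample. Helper name is illustrative.
install_samples() {
    dir=$1
    mkdir -p "$dir/sample" || return 1
    for sample in "$dir"/*-sample; do
        [ -f "$sample" ] || continue            # glob matched nothing
        target=${sample%-sample}                # strip the "-sample" suffix
        [ -e "$target" ] || cp "$sample" "$target"
        mv "$sample" "$dir/sample/"
    done
}

# Example (as root):
#   install_samples /usr/local/etc/nagios
#   install_samples /usr/local/etc/nagios/objects
```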

Note: A sample configuration file for monitoring Windows servers can be found at /usr/ports/net-mgmt/nagios/work/nagios-3.2.3/sample-config/template-object/windows.cfg

Now check your Nagios configuration for errors:
# nagios -v /usr/local/etc/nagios/nagios.cfg

Create a Nagios Admin called "nagiosadmin":
# htpasswd -c /usr/local/etc/nagios/htpasswd.users nagiosadmin

Note: the -c parameter creates the htpasswd file. If the file already exists, it is truncated and rewritten, so any existing users are lost.

Note: the admin user must be named "nagiosadmin", because that is the default admin name in the configuration files. To see where it is used, run: grep -i 'admin' /usr/local/etc/nagios/*.cfg

Change ownership and permissions:
# chown root:www /usr/local/etc/nagios/htpasswd.users
# chmod 440 /usr/local/etc/nagios/htpasswd.users

Create a Nagios user called "nagiosuser":
# htpasswd /usr/local/etc/nagios/htpasswd.users nagiosuser

Note: you do not need the -c parameter this time, since the htpasswd file already exists.

Now add the Nagios settings to your Apache configuration:
# vim /usr/local/etc/apache22/Includes/nagios.conf

### [START] nagios
<Directory /usr/local/www/nagios>
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
  Allow from 192.168.6.112
  php_flag engine on
  php_admin_value open_basedir /usr/local/www/nagios/:/var/spool/nagios/
  AuthName "Nagios Access Ya"
  AuthType Basic
  AuthUserFile /usr/local/etc/nagios/htpasswd.users
  Require valid-user
</Directory>

<Directory /usr/local/www/nagios/cgi-bin>
  Options ExecCGI
</Directory>

ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
Alias /nagios/ /usr/local/www/nagios/
### [END] nagios

Restart Apache:
# /usr/local/etc/rc.d/apache22 restart

Start Nagios:
# /usr/local/etc/rc.d/nagios start

On the Nagios Client, install nrpe2:
# cd /usr/ports/net-mgmt/nrpe
# make config-recursive
# make config-recursive
# make install

Check whether the NRPE configuration file exists:
# ls -l /usr/local/etc/nrpe.cfg

If nrpe.cfg does not exist:
# cp /usr/local/etc/nrpe.cfg.sample /usr/local/etc/nrpe.cfg

Change Permission:
# chmod 440 /usr/local/etc/nrpe.cfg

On the Nagios Client, add the Nagios Server's IP Address to allowed hosts:
# vi /usr/local/etc/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.13.3

Note: the allowed_hosts list is comma separated, with no spaces in between!
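The edit above can be scripted so that a list containing spaces is rejected before it reaches nrpe.cfg. This is a sketch under assumptions: set_allowed_hosts is a hypothetical helper, and it only rewrites an allowed_hosts line that already exists in the file.

```shell
#!/bin/sh
# set_allowed_hosts FILE LIST
# Replace the allowed_hosts line in FILE with LIST, refusing lists that
# contain spaces. Helper name is illustrative; if FILE has no allowed_hosts
# line, the file is left unchanged.
set_allowed_hosts() {
    cfg=$1
    hosts=$2
    case $hosts in
        *' '*)
            echo "allowed_hosts must not contain spaces" >&2
            return 1
            ;;
    esac
    tmp_cfg=$(mktemp) || return 1
    # Write the edited copy first, then overwrite in place so that the
    # ownership and mode of the original file are kept.
    sed "s|^allowed_hosts=.*|allowed_hosts=${hosts}|" "$cfg" > "$tmp_cfg" &&
        cat "$tmp_cfg" > "$cfg"
    status=$?
    rm -f "$tmp_cfg"
    return "$status"
}

# Example (as root, on the client):
#   set_allowed_hosts /usr/local/etc/nrpe.cfg 127.0.0.1,192.168.13.3
```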

On the Nagios Client, enable nrpe2 to start on boot:
# echo 'nrpe2_enable="YES"' >> /etc/rc.conf

On the Nagios Client, start nrpe2:
# /usr/local/etc/rc.d/nrpe2 start

On the Nagios Client, make sure nrpe2 is running:
# ps auxww | grep nrpe
nagios 46166 0.0 0.1 14392 1860 - Is 4:47AM 0:00.00 /usr/local/sbin/nrpe2 -c /usr/local/etc/nrpe.cfg -d

On the Nagios Client, make sure the nrpe2 daemon is listening on port 5666:
# netstat -a | grep 5666
tcp4 0 0 *.5666 *.* LISTEN
tcp6 0 0 *.5666 *.* LISTEN

# sockstat | grep -E 'nagios|nrpe|5666'
nagios nrpe2 99457 3 dgram -> /var/run/logpriv
nagios nrpe2 99457 4 tcp6 *:5666 *:*
nagios nrpe2 99457 5 tcp4 *:5666 *:*

On the Nagios Client, run check_nrpe2 against localhost. On success you should see the NRPE version number:
# /usr/local/libexec/nagios/check_nrpe2 -H localhost
NRPE v2.15

On the Nagios Client, you can test individual plugins and NRPE commands by running the following commands. The -c checks only work if a command with that name is defined in nrpe.cfg:
# /usr/local/libexec/nagios/check_http -H localhost
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_users
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_load
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_hda1
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_sda1
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_total_procs
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_zombie_procs

Note: plugins are stored in /usr/local/libexec/nagios.
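When testing plugins by hand, the exit status matters as much as the printed text: Nagios plugins exit with 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below wraps a check and prints the state name next to its output; plugin_state and run_check are hypothetical helper names.

```shell
#!/bin/sh
# plugin_state CODE: map a plugin exit status to its Nagios state name.
plugin_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_check COMMAND [ARGS...]: run a plugin, print "STATE: output",
# and return the plugin's own exit status.
run_check() {
    output=$("$@")
    code=$?
    printf '%s: %s\n' "$(plugin_state "$code")" "$output"
    return "$code"
}

# Example (on the client):
#   run_check /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_load
```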

At this point, you are done installing and configuring NRPE on the remote host (Nagios Client). Now it's time to install a component and make some configuration entries on your monitoring server.

On the Nagios Server, install nrpe2:
# cd /usr/ports/net-mgmt/nrpe
# make install

Make sure the check_nrpe2 plugin can talk to the NRPE daemon on the remote host. Replace "192.168.13.156" in the command below with the IP address of the remote host that has NRPE installed. Run the following command on the Nagios Server:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156
NRPE v2.15

On the Nagios Server, run the following command to test a remote check:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156 -c check_total_procs

Use a Browser to check:

http://192.168.13.2/nagios/

Edit the admin email:

# vim /usr/local/etc/nagios/nagios.cfg
admin_email=me@example.com
admin_pager=me@example.com

Note: Nagios never uses these values itself, but you can access them by using the $ADMINEMAIL$ and $ADMINPAGER$ macros in your notification commands.

Define Generic Contact Template in templates.cfg:

The Nagios installation provides a default generic-contact template that you can use as a reference when building your contacts. Please note that all the directives in the generic-contact template below are mandatory. If you decide not to use the generic-contact template in your contacts, you must define all of these mandatory directives inside each contact yourself.

The generic-contact template is already available in /usr/local/etc/nagios/objects/templates.cfg, and templates.cfg is included in nagios.cfg by default, as shown below.

Please note that any directive in templates.cfg can be overridden when you define a real contact that uses this generic-contact template.

# grep templates /usr/local/etc/nagios/nagios.cfg
cfg_file=/usr/local/etc/nagios/objects/templates.cfg

Note: generic-contact is available under /usr/local/etc/nagios/objects/templates.cfg

define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        register                        0
        }

  • name - The name of the contact template (generic-contact).
  • service_notification_period - When Nagios may send notifications about service issues (for example, Apache down). The default is the 24x7 timeperiod, which is defined in /usr/local/etc/nagios/objects/timeperiods.cfg.
  • host_notification_period - When Nagios may send notifications about host issues (for example, server crashed). The default is the 24x7 timeperiod.
  • service_notification_options - Which service notifications may be sent. The default, w,u,c,r,f,s, covers warning, unknown, critical and recovery states, flapping events, and scheduled service downtime.
  • host_notification_options - Which host notifications may be sent. The default, d,u,r,f,s, covers down, unreachable and recovery states, flapping events, and scheduled host downtime.
  • service_notification_commands - By default the contact is notified about service issues (for example, database down) via email. You can define additional commands and add them to this directive. For example, you can define your own notify-service-by-sms command.
  • host_notification_commands - By default the contact is notified about host issues (for example, host down) via email. You can define additional commands and add them to this directive. For example, you can define your own notify-host-by-sms command.

Define Individual Contacts in contacts.cfg:

Once you've confirmed that the generic-contact template is defined properly, you can start defining individual contacts for everyone in your organization who might ever receive a notification from Nagios. Please note that defining a contact does not by itself mean that the person will get notifications. You still have to associate the contact with a service or host definition, as shown in the later sections below. So feel free to define all possible contacts here (for example, Developers, DBAs, Sysadmins, IT Manager, Customer Service Manager, Top Management, etc.).

Note: Define these contacts in /usr/local/etc/nagios/objects/contacts.cfg

define contact{
        contact_name                    sgupta
        use                             generic-contact
        alias                           Sanjay Gupta (Developer)
        email                           sgupta@thegeekstuff.com
        pager                           333-333@pager.thegeekstuff.com
        }

define contact{
        contact_name                    jbourne
        use                             generic-contact
        alias                           Jason Bourne (Sysadmin)
        email                           jbourne@thegeekstuff.com
        }

Define Contact Groups with Multiple Contacts in contacts.cfg:

Once you've defined the individual contacts, you can group them so that notifications go to the appropriate people. For example, only DBAs need to be notified when the database service goes down, so a db-admins group may be required. Likewise, maybe only Unix system administrators need to be notified when Apache goes down, so a unix-admins group may be required. Feel free to define as many groups as you think are required. Later you can use these groups in the individual service and host definitions.

Note: Define contact groups in /usr/local/etc/nagios/objects/contacts.cfg

define contactgroup{
        contactgroup_name          db-admins
        alias                      Database Administrators
        members                    jsmith, jdoe, mraj
        }

define contactgroup{
        contactgroup_name          unix-admins
        alias                      Linux System Administrator
        members                    jbourne, dpatel, mshankar
        }

Attach Contact Groups or Individual Contacts to Service and Host Definitions:

Once you've defined the individual contacts and contact groups, it is time to start attaching them to a specific host or service definition as shown below.

Note: The following host is defined in /usr/local/etc/nagios/objects/servers/email-server.cfg. This can be any host definition file.

define host{
        use                     linux-server
        host_name               email-server
        alias                   Corporate Email Server
        address                 192.168.1.14
        contact_groups          unix-admins
        }

Note: The following service is defined in /usr/local/etc/nagios/objects/servers/db-server.cfg. This can be any host definition file.

define service{
        use                             generic-service
        host_name                       prod-db
        service_description             CPU Load
        contact_groups                  unix-admins
        check_command                   check_nrpe!check_load
        }

We will create a new configuration file for all FreeBSD servers on the LAN:
# touch /usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
# vi /usr/local/etc/nagios/objects/lan-freebsd-servers.cfg

Note: you can either edit the existing localhost.cfg or create the lan-freebsd-servers.cfg file.

###############################################################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
# Last Modified: 03-03-2011
#
# NOTE: This config file is intended to serve as an *extremely* simple
#       example of how you can create configuration entries to monitor
#       the local (FreeBSD) machine.
#
###############################################################################


###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the local machine
define host{
        use             freebsd-server  ; Inherit default values from a template
        host_name       test-bsd        ; The name we're giving to this host
        alias           My TEST BSD     ; A longer name associated with the host
        address         192.168.13.156 ; IP address of the host
        }

define host{
        use             freebsd-server  ; Inherit default values from a template
        host_name       dev01           ; The name we're giving to this host
        alias           dev01     ; A longer name associated with the host
        address         192.168.13.157 ; IP address of the host
        }

define host{
        use             freebsd-server  ; Inherit default values from a template
        host_name       web1           ; The name we're giving to this host
        alias           Online Web     ; A longer name associated with the host
        address         192.168.13.242 ; IP address of the host
        }

define host{
        use             freebsd-server  ; Inherit default values from a template
        host_name       bsd-sql        ; The name we're giving to this host
        alias           Online SQL     ; A longer name associated with the host
        address         192.168.13.108 ; IP address of the host
        }

define host{
        use             freebsd-server  ; Inherit default values from a template
        host_name       fw1        ; The name we're giving to this host
        alias           Firewall Server  ; A longer name associated with the host
        address         192.168.13.2 ; IP address of the host
        }

###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# Define a service to "ping" the local machine

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       test-bsd,web1,bsd-sql,fw1,dev01
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }

# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       test-bsd,web1,bsd-sql
        service_description             SSH
        check_command                   check_ssh
        notifications_enabled           0
        }

# Define a service to check HTTP.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       web1
        service_description             HTTP
        check_command                   check_http
        contact_groups                  admins
        notifications_enabled           1
        }

### A more advanced definition for monitoring the HTTP service is shown below. It checks whether the /index.php URI contains the string "html", and reports an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond.
### Use it in place of the basic HTTP service above rather than alongside it: Nagios identifies a service by its host_name and service_description, and both definitions use web1 and HTTP.
### If you are checking a virtual server that uses 'host headers', you must supply the FQDN (fully qualified domain name) as the [host_name] argument.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       web1
        service_description             HTTP
        check_command                   check_http!-u /index.php -t 5 -s "html"
        contact_groups                  admins
        notifications_enabled           1
        }

### Note: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin.
### # /usr/local/libexec/nagios/check_http --help
### # /usr/local/libexec/nagios/check_http -H localhost

# Define a service to check the number of currently logged in users.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       test-bsd,web1,bsd-sql,fw1,dev01
        service_description             Current Users
        check_command                   check_nrpe2!check_users
        }

# Define a service to check the root partition of the disk.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description             / partition
        check_command                   check_nrpe2!check_root
        }

# Define a service to check the /usr partition of the disk.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description             /usr partition
        check_command                   check_nrpe2!check_usr
        }

# Define a service to check the /var partition of the disk.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description             /var partition
        check_command                   check_nrpe2!check_var
        }

# Define a service to check the /tmp partition of the disk.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description             /tmp partition
        check_command                   check_nrpe2!check_tmp
        }

# Define a service to check the load.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       test-bsd,web1,bsd-sql,fw1,dev01
        service_description             Current Load
        check_command                   check_nrpe2!check_load
        }

# Define a service to check zombie processes.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       test-bsd,web1,bsd-sql,fw1,dev01
        service_description             Zombie Processes
        check_command                   check_nrpe2!check_zombie_procs
        }

# Define a service to check total processes.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       test-bsd,web1,bsd-sql,fw1,dev01
        service_description             Total Processes
        check_command                   check_nrpe2!check_total_procs
        }

# Define a service to check mysql uptime.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       bsd-sql
        service_description             MySQL Uptime
        check_command                   check_nrpe2!check_mysql_health_uptime
        }

# Define a service to check mysql slave io running.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       bsd-sql
        service_description             MySQL Slave IO
        check_command                   check_nrpe2!check_mysql_health_slave-io-running
        }

# Define a service to check mysql slave sql running.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       bsd-sql
        service_description             MySQL Slave SQL
        check_command                   check_nrpe2!check_mysql_health_slave-sql-running
        }

Note: the host_name lists are comma separated, with no spaces in between!
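The partition services above call NRPE commands named check_root, check_usr, check_var and check_tmp, and each client has to define a command under that name in its own nrpe.cfg before the check can succeed. A minimal sketch for adding such definitions follows. The helper name add_nrpe_command is hypothetical, and the check_disk thresholds in the example are assumptions to adjust for your hosts.

```shell
#!/bin/sh
# add_nrpe_command FILE NAME COMMANDLINE
# Append "command[NAME]=COMMANDLINE" to FILE unless NAME is already defined.
# Helper name is illustrative.
add_nrpe_command() {
    cfg=$1
    name=$2
    cmdline=$3
    if grep -q "^command\[${name}\]=" "$cfg" 2>/dev/null; then
        return 0    # keep the existing definition
    fi
    printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$cfg"
}

# Example (as root, on each client), with assumed thresholds:
#   add_nrpe_command /usr/local/etc/nrpe.cfg check_root \
#       "/usr/local/libexec/nagios/check_disk -w 20% -c 10% -p /"
#   add_nrpe_command /usr/local/etc/nrpe.cfg check_var \
#       "/usr/local/libexec/nagios/check_disk -w 20% -c 10% -p /var"
```

Restart nrpe2 on the client after changing nrpe.cfg so that the new commands are picked up.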

Add other FreeBSD hosts on the LAN to the host group member list.
# vi /usr/local/etc/nagios/objects/localhost.cfg

define hostgroup{
        hostgroup_name  freebsd-servers ; The name of the hostgroup
        alias           FreeBSD Servers ; Long name of the group
        members         localhost,test-bsd,web1,bsd-sql,fw1 ; Comma separated list of hosts that belong to this group
        }

Remember to add the host names to /etc/hosts:
# vi /etc/hosts
192.168.13.156 test-bsd
192.168.13.242 web1
192.168.13.108 bsd-sql
192.168.13.2 fw1

Define the check_nrpe2 command so that the Nagios Server can run the check_nrpe2 plugin. Add the following lines to commands.cfg:
# vi /usr/local/etc/nagios/objects/commands.cfg

# 'check_nrpe2' command definition
define command{
        command_name check_nrpe2
        command_line $USER1$/check_nrpe2 -H $HOSTADDRESS$ -c $ARG1$
        }

Note: $USERn$ macros are defined in /usr/local/etc/nagios/resource.cfg.

Note: Standard macros that are available in Nagios are listed here http://nagios.sourceforge.net/docs/3_0/macrolist.html .
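To see what the command definition above turns into at run time, the macro substitution can be imitated with sed. This is only an illustration of the idea, not how Nagios itself is implemented. expand_command is a hypothetical helper, and the value used for $USER1$ assumes the plugin directory mentioned earlier.

```shell
#!/bin/sh
# expand_command TEMPLATE USER1 HOSTADDRESS ARG1
# Substitute the three macros used by the check_nrpe2 command definition.
# Illustrative only; Nagios performs this expansion internally.
expand_command() {
    printf '%s\n' "$1" | sed \
        -e 's|\$USER1\$|'"$2"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$3"'|g' \
        -e 's|\$ARG1\$|'"$4"'|g'
}

# A service with "check_command check_nrpe2!check_load" on host test-bsd:
expand_command '$USER1$/check_nrpe2 -H $HOSTADDRESS$ -c $ARG1$' \
    /usr/local/libexec/nagios 192.168.13.156 check_load
```

This prints /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156 -c check_load, which is the same form as the manual test run from the Nagios Server earlier.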

Add the following line to nagios.cfg:
# vi /usr/local/etc/nagios/nagios.cfg
# Definitions for monitoring the freebsd servers on the lan.
cfg_file=/usr/local/etc/nagios/objects/lan-freebsd-servers.cfg

Now check your Nagios configuration for errors:
# /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg

Restart nagios if everything was okay:
# /usr/local/etc/rc.d/nagios restart
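The two steps above can be chained so that Nagios is only restarted when the configuration check passes. A sketch, assuming the check command exits with a non-zero status when it finds errors; restart_if_valid is a hypothetical helper name.

```shell
#!/bin/sh
# restart_if_valid CHECK_CMD RESTART_CMD
# Run CHECK_CMD; run RESTART_CMD only if the check succeeded.
# Both arguments are command strings passed to sh -c. Name is illustrative.
restart_if_valid() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# Example (as root, on the Nagios Server):
#   restart_if_valid \
#       "/usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg" \
#       "/usr/local/etc/rc.d/nagios restart"
```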

On the Nagios Client, install check_mysql_health plugin:
# cd /usr/ports/net-mgmt/check_mysql_health
# make install

Note: there is a plugin called "check_mysql" in nagios-plugins-1.4.15_1,1. However, check_mysql_health seems better.

Go to your MySQL server, and grant "no privileges" for a nagios user:
# mysql -u root -p
mysql> GRANT USAGE ON *.* TO 'nagios'@'localhost' IDENTIFIED BY 'nagios';
mysql> FLUSH PRIVILEGES;
mysql> exit

If you want to monitor mysql replication status as well, grant "REPLICATION CLIENT" privileges for a nagios user:
# mysql -u root -p
mysql> GRANT REPLICATION CLIENT ON *.* TO 'nagios'@'localhost' IDENTIFIED BY 'nagios';
mysql> FLUSH PRIVILEGES;
mysql> exit

Log in as the nagios user and verify the grants:
# mysql -u nagios -p
mysql> show grants;

View check_mysql_health options:
# /usr/local/libexec/nagios/check_mysql_health -h

You can test the plugin by running the following command on the Nagios Client:
# /usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode uptime --warning 2 --critical 5

Note: the command above triggers a WARNING if the MySQL uptime is greater than 2 minutes, and a CRITICAL if it is greater than 5 minutes.

Please note that the thresholds must be specified according to the Nagios plug-in development guidelines:

10 // means "Alarm, if > 10" (without colon).
90: // means "Alarm, if < 90" (with colon).
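The two threshold forms above can be expressed as a small shell test. This sketch handles only those two forms and integer values; the full range syntax in the plug-in guidelines has more cases that are not covered here. exceeds_threshold is a hypothetical helper name.

```shell
#!/bin/sh
# exceeds_threshold VALUE THRESHOLD
# Succeed (exit 0) when VALUE should raise an alarm:
#   "10"  -> alarm if VALUE > 10
#   "90:" -> alarm if VALUE < 90
# Simplified: integers only, and only the two forms shown above.
exceeds_threshold() {
    value=$1
    threshold=$2
    case $threshold in
        *:) [ "$value" -lt "${threshold%:}" ] ;;   # trailing colon: lower bound
        *)  [ "$value" -gt "$threshold" ] ;;       # plain number: upper bound
    esac
}

# Example:
#   exceeds_threshold 11 10  && echo "alarm"     # 11 > 10
#   exceeds_threshold 95 90: || echo "no alarm"  # 95 is not below 90
```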

On Nagios Client, edit nrpe.cfg:
# vi /usr/local/etc/nrpe.cfg
### MySQL - hardcoded command arguments.
command[check_mysql_health_uptime]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode uptime
command[check_mysql_health_slave-io-running]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode slave-io-running
command[check_mysql_health_slave-sql-running]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode slave-sql-running

On Nagios Client, restart nrpe2:
# /usr/local/etc/rc.d/nrpe2 restart

You can test the new NRPE command by running the following command on the Nagios Client:
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_mysql_health_uptime

You can test the new NRPE commands by running the following commands on the Nagios Server:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_uptime
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_slave-io-running
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_slave-sql-running

Check the system log if it did not work:
# tail /var/log/messages

Reference:
http://www.wonkity.com/~wblock/docs/nagios.pdf

http://www.weithenn.org/cgi-bin/wiki.pl?Nagios-%E7%B6%B2%E8%B7%AF%E7%9B%A3%E6%8E%A7%E5%8F%8A%E5%91%8A%E8%AD%A6%E7%B3%BB%E7%B5%B1

http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf

http://nagios.sourceforge.net/docs/3_0/macros.html

http://www.thegeekstuff.com/2009/06/4-steps-to-define-nagios-contacts-with-email-and-pager-notification/