Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.
The fork of Nagios to Icinga is a good thing, much in the same way as Quagga was a great fork of Zebra.
# uname -a
FreeBSD bsd10.local 10.0-RELEASE
Install Apache2.2
Install PHP5.4.27
Install MySQL5.5
On the Nagios server, install Nagios:
# cd /usr/ports/net-mgmt/nagios
# make config-recursive
# make config-recursive
# make install
Note: you only need to install Nagios on the machine that is going to act as a monitoring server. You do not need to install Nagios on the clients.
Add the www user to the nagios group:
# pw groupmod nagios -m www
# grep nagios /etc/group
nagios:*:181:www
Enable nagios to start on boot:
# echo 'nagios_enable="YES"' >> /etc/rc.conf
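Running the echo line above twice leaves a duplicate entry in /etc/rc.conf. The following is a minimal sketch of an idempotent variant; the function name enable_rc is hypothetical, and the file path is a parameter so it can be tried on a scratch copy first.

```sh
#!/bin/sh
# enable_rc is a hypothetical helper, not part of FreeBSD or Nagios.
# Append NAME="YES" to an rc.conf-style file unless a NAME= line already exists.
enable_rc() {
    name=$1
    file=${2:-/etc/rc.conf}
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        echo "$name already set in $file"
    else
        printf '%s="YES"\n' "$name" >> "$file"
        echo "$name enabled in $file"
    fi
}
```

Usage would be "enable_rc nagios_enable" as root. On FreeBSD releases that include sysrc(8), "sysrc nagios_enable=YES" should achieve the same result.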
Now copy the sample files to the config files:
# cd /usr/local/etc/nagios/
# cp cgi.cfg-sample cgi.cfg
# cp nagios.cfg-sample nagios.cfg
# cp resource.cfg-sample resource.cfg
Move sample files to a sample folder:
# mkdir -p /usr/local/etc/nagios/sample
# mv /usr/local/etc/nagios/*-sample /usr/local/etc/nagios/sample
Navigate to /usr/local/etc/nagios/objects and do the same:
# cd /usr/local/etc/nagios/objects
# cp commands.cfg-sample commands.cfg
# cp contacts.cfg-sample contacts.cfg
# cp localhost.cfg-sample localhost.cfg
# cp printer.cfg-sample printer.cfg
# cp switch.cfg-sample switch.cfg
# cp templates.cfg-sample templates.cfg
# cp timeperiods.cfg-sample timeperiods.cfg
Move sample files to a sample folder:
# mkdir -p /usr/local/etc/nagios/objects/sample
# mv /usr/local/etc/nagios/objects/*-sample /usr/local/etc/nagios/objects/sample
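The copy-then-archive steps above can also be sketched as a loop, which avoids typing one cp per file. The function name activate_samples is hypothetical. Unlike the plain cp commands, this sketch skips any live file that already exists, so it does not overwrite edited configuration.

```sh
#!/bin/sh
# activate_samples is a hypothetical helper.
# For every FILE-sample in the directory: copy it to FILE (unless FILE already
# exists), then move the sample into a "sample" subdirectory.
activate_samples() {
    dir=$1
    mkdir -p "$dir/sample" || return 1
    for sample in "$dir"/*-sample; do
        [ -f "$sample" ] || continue      # the glob matched nothing
        live=${sample%-sample}            # strip the "-sample" suffix
        [ -e "$live" ] || cp "$sample" "$live"
        mv "$sample" "$dir/sample/"
    done
}
```

It would be called once per directory, for example "activate_samples /usr/local/etc/nagios" and "activate_samples /usr/local/etc/nagios/objects".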
Note: A sample configuration file for monitoring windows servers can be found at /usr/ports/net-mgmt/nagios/work/nagios-3.2.3/sample-config/template-object/windows.cfg
Now check your Nagios configuration for errors:
# nagios -v /usr/local/etc/nagios/nagios.cfg
Create a Nagios Admin called "nagiosadmin":
# htpasswd -c /usr/local/etc/nagios/htpasswd.users nagiosadmin
Note: the -c parameter creates the htpasswd file. If the htpasswd file already exists, it is truncated and rewritten.
Note: name the admin user "nagiosadmin", because that is the default admin name in the configuration files (check with "grep -i 'admin' /usr/local/etc/nagios/*.cfg").
Change permission:
# chown root:www /usr/local/etc/nagios/htpasswd.users
# chmod 440 /usr/local/etc/nagios/htpasswd.users
Create a Nagios user called "nagiosuser":
# htpasswd /usr/local/etc/nagios/htpasswd.users nagiosuser
Note: you do not need the -c parameter this time since the htpasswd file already exists.
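Because -c truncates an existing file, it is easy to wipe earlier users by accident. As a sketch only, the wrapper below passes -c just when the password file is missing. The name add_nagios_user and the HTPASSWD override variable are hypothetical; htpasswd itself still prompts for the password.

```sh
#!/bin/sh
# add_nagios_user is a hypothetical wrapper around htpasswd.
# HTPASSWD is an illustrative override for the htpasswd binary.
add_nagios_user() {
    file=$1
    user=$2
    htpasswd_bin=${HTPASSWD:-htpasswd}
    if [ -e "$file" ]; then
        "$htpasswd_bin" "$file" "$user"       # file exists: append or update
    else
        "$htpasswd_bin" -c "$file" "$user"    # first user: create the file
    fi
}
```

For example, "add_nagios_user /usr/local/etc/nagios/htpasswd.users nagiosuser" behaves the same whether or not nagiosadmin was created first.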
Now add the Nagios settings to your Apache configuration:
# vim /usr/local/etc/apache22/Includes/nagios.conf
### [START] nagios
<Directory /usr/local/www/nagios>
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    Allow from 192.168.6.112
    php_flag engine on
    php_admin_value open_basedir /usr/local/www/nagios/:/var/spool/nagios/
    AuthName "Nagios Access Ya"
    AuthType Basic
    AuthUserFile /usr/local/etc/nagios/htpasswd.users
    Require valid-user
</Directory>
<Directory /usr/local/www/nagios/cgi-bin>
    Options ExecCGI
</Directory>
ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
Alias /nagios/ /usr/local/www/nagios/
### [END] nagios
Restart Apache:
# /usr/local/etc/rc.d/apache22 restart
Start Nagios:
# /usr/local/etc/rc.d/nagios start
On the Nagios Client, install nrpe2:
# cd /usr/ports/net-mgmt/nrpe
# make config-recursive
# make config-recursive
# make install
Check that the NRPE configuration file exists:
# ls -l /usr/local/etc/nrpe.cfg
If nrpe.cfg does not exist:
# cp /usr/local/etc/nrpe.cfg.sample /usr/local/etc/nrpe.cfg
Change Permission:
# chmod 440 /usr/local/etc/nrpe.cfg
On the Nagios Client, add the Nagios Server's IP Address to allowed hosts:
# vi /usr/local/etc/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.13.3
Note: comma separated. No Space in between!
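A stray space in allowed_hosts is easy to miss. The following sketch checks the line before the daemon is started; the function name check_allowed_hosts is hypothetical and the default path is the one used in this guide.

```sh
#!/bin/sh
# check_allowed_hosts is a hypothetical helper. It prints the first
# allowed_hosts line of an nrpe.cfg and fails if that line contains whitespace.
check_allowed_hosts() {
    file=${1:-/usr/local/etc/nrpe.cfg}
    line=$(sed -n '/^allowed_hosts=/{p;q;}' "$file")
    if [ -z "$line" ]; then
        echo "no allowed_hosts line in $file"
        return 2
    fi
    if printf '%s\n' "$line" | grep -q '[[:space:]]'; then
        echo "whitespace in: $line"
        return 1
    fi
    echo "ok: $line"
}
```

Running "check_allowed_hosts" with no argument inspects /usr/local/etc/nrpe.cfg and returns non-zero when the line is missing or contains a space or tab.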
On the Nagios Client, enable nrpe2 to start on boot:
# echo 'nrpe2_enable="YES"' >> /etc/rc.conf
On the Nagios Client, start nrpe2:
# /usr/local/etc/rc.d/nrpe2 start
On the Nagios Client, make sure nrpe2 is running:
# ps auxww | grep nrpe
nagios 46166 0.0 0.1 14392 1860 - Is 4:47AM 0:00.00 /usr/local/sbin/nrpe2 -c /usr/local/etc/nrpe.cfg -d
On the Nagios Client, make sure the nrpe2 daemon is listening on port 5666:
# netstat -a | grep 5666
tcp4 0 0 *.5666 *.* LISTEN
tcp6 0 0 *.5666 *.* LISTEN
# sockstat | grep -E 'nagios|nrpe|5666'
nagios nrpe2 99457 3 dgram -> /var/run/logpriv
nagios nrpe2 99457 4 tcp6 *:5666 *:*
nagios nrpe2 99457 5 tcp4 *:5666 *:*
On the Nagios Client, run a check_nrpe2 check. You should see the version number on success.
# /usr/local/libexec/nagios/check_nrpe2 -H localhost
NRPE v2.15
On the Nagios Client, you can test some of these by running the following commands:
# /usr/local/libexec/nagios/check_http -H localhost
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_users
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_load
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_hda1
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_sda1
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_total_procs
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_zombie_procs
Note: plugins are stored in /usr/local/libexec/nagios.
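The individual test commands above can be wrapped in a loop that prints one summary line per check. This is a sketch under assumptions: the function names and the CHECK_NRPE override are hypothetical, and the state names follow the usual plugin convention of exit status 0 for OK, 1 for WARNING, 2 for CRITICAL, and anything else treated as UNKNOWN.

```sh
#!/bin/sh
# nagios_state and run_nrpe_checks are hypothetical helpers.
# Map a plugin exit status to its conventional state name.
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage: run_nrpe_checks HOST COMMAND...
# CHECK_NRPE is an illustrative override for the plugin path.
run_nrpe_checks() {
    host=$1
    shift
    plugin=${CHECK_NRPE:-/usr/local/libexec/nagios/check_nrpe2}
    for cmd in "$@"; do
        output=$("$plugin" -H "$host" -c "$cmd" 2>&1)
        rc=$?                                  # exit status of the plugin
        printf '%-20s %-8s %s\n' "$cmd" "$(nagios_state "$rc")" "$output"
    done
}
```

For example, "run_nrpe_checks localhost check_users check_load check_total_procs" reports the state of three checks in one pass. A command name that is not defined in nrpe.cfg should show up as a non-OK line rather than stopping the loop.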
At this point, you are done installing and configuring NRPE on the remote host (Nagios Client). Now it's time to install a component and make some configuration entries on your monitoring server.
On the Nagios Server, install nrpe2:
# cd /usr/ports/net-mgmt/nrpe
# make install
Make sure the check_nrpe2 plugin can talk to the NRPE daemon on the remote host. Replace "192.168.13.156" in the command below with the IP address of the remote host that has NRPE installed. Run the following command on the Nagios Server:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156
NRPE v2.15
On the Nagios Server, run the following command for testing:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156 -c check_total_procs
Use a Browser to check:
http://192.168.13.2/nagios/
Edit the admin email:
# vim /usr/local/etc/nagios/nagios.cfg
admin_email=me@example.com
admin_pager=me@example.com
Note: Nagios never uses these values itself, but you can access them by using the $ADMINEMAIL$ and $ADMINPAGER$ macros in your notification commands.
Define Generic Contact Template in templates.cfg:
The Nagios installation provides a default generic contact template that can be used as a reference to build your contacts. Please note that all the directives mentioned in the generic-contact template below are mandatory. So, if you decide not to use the generic-contact template in your contacts, you must define all these mandatory directives inside each contact yourself.
The following generic-contact is already available under /usr/local/etc/nagios/objects/templates.cfg. Also, the templates.cfg is included in the nagios.cfg by default as shown below.
Please note that any of these directives mentioned in the templates.cfg can be overridden when you define a real contact using this generic-template.
# grep templates /usr/local/etc/nagios/nagios.cfg
cfg_file=/usr/local/etc/nagios/objects/templates.cfg
Note: generic-contact is available under /usr/local/etc/nagios/objects/templates.cfg
define contact{
name generic-contact
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
register 0
}
- name - This defines the name of the contact template (generic-contact).
- service_notification_period - This defines when Nagios can send notifications about service issues (for example, Apache down). By default this is the 24x7 timeperiod, which is defined under /usr/local/etc/nagios/objects/timeperiods.cfg
- host_notification_period - This defines when Nagios can send notifications about host issues (for example, server crashed). By default, this is the 24x7 timeperiod.
- service_notification_options - This defines the type of service notification that can be sent out. By default this defines all possible service states including flapping events. This also includes the scheduled service downtime activities.
- host_notification_options - This defines the type of host notifications that can be sent out. By default this defines all possible host states including flapping events. This also includes the scheduled host downtime activities.
- service_notification_commands - By default this defines that the contact should get notification about service issues (for example, database down) via email. You can also define additional commands and add it to this directive. For example, you can define your own notify-service-by-sms command.
- host_notification_commands - By default this defines that the contact should get notification about host issues (for example, host down) via email. You can also define additional commands and add it to this directive. For example, you can define your own notify-host-by-sms command.
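The single-letter values in service_notification_options and host_notification_options are terse. As a reading aid, the sketch below expands an option string into one line per letter. The function name describe_options is hypothetical, and the meanings follow the Nagios 3 object definition documentation as I understand it.

```sh
#!/bin/sh
# describe_options is a hypothetical helper for reading option strings.
# Usage: describe_options service|host "w,u,c,r,f,s"
describe_options() {
    kind=$1
    old_ifs=$IFS
    IFS=,
    set -- $2          # split the comma-separated list into positional parameters
    IFS=$old_ifs
    for letter in "$@"; do
        case "$kind:$letter" in
            service:w) echo "w = WARNING" ;;
            service:u) echo "u = UNKNOWN" ;;
            service:c) echo "c = CRITICAL" ;;
            host:d)    echo "d = DOWN" ;;
            host:u)    echo "u = UNREACHABLE" ;;
            *:r)       echo "r = recovery" ;;
            *:f)       echo "f = flapping start/stop" ;;
            *:s)       echo "s = scheduled downtime start/end" ;;
            *)         echo "$letter = not a known $kind option" ;;
        esac
    done
}
```

Calling "describe_options service w,u,c,r,f,s" lists the six service events from the template, and "describe_options host d,u,r,f,s" lists the five host events. Note that "u" means UNKNOWN for a service but UNREACHABLE for a host.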
Define Individual Contacts in contacts.cfg:
Once you've confirmed that the generic-contact template is defined properly, you can start defining individual contacts for all the people in your organization who would ever receive notifications from Nagios. Please note that defining a contact does not by itself mean that they'll get notifications. Later you have to associate the contact with either a service or host definition, as shown in the later sections below. So, feel free to define all possible contacts here (for example, Developers, DBAs, Sysadmins, IT-Manager, Customer Service Manager, Top Management etc.).
Note: Define these contacts in /usr/local/etc/nagios/objects/contacts.cfg
define contact{
contact_name sgupta
use generic-contact
alias Sanjay Gupta (Developer)
email sgupta@thegeekstuff.com
pager 333-333@pager.thegeekstuff.com
}
define contact{
contact_name jbourne
use generic-contact
alias Jason Bourne (Sysadmin)
email jbourne@thegeekstuff.com
}
Define Contact Groups with Multiple Contacts in contacts.cfg:
Once you've defined the individual contacts, you can also group them together to send the appropriate notifications. For example, only DBAs need to be notified when the database goes down, so a db-admins group may be required. Likewise, maybe only Unix system administrators need to be notified when Apache goes down, so a unix-admins group may be required. Feel free to define as many groups as you think are required. Later you can use these groups in the individual service and host definitions.
Note: Define contact groups in /usr/local/etc/nagios/objects/contacts.cfg
define contactgroup{
contactgroup_name db-admins
alias Database Administrators
members jsmith, jdoe, mraj
}
define contactgroup{
contactgroup_name unix-admins
alias Linux System Administrator
members jbourne, dpatel, mshankar
}
Attach Contact Groups or Individual Contacts to Service and Host Definitions:
Once you've defined the individual contacts and contact groups, it is time to start attaching them to a specific host or service definition as shown below.
Note: The following host is defined under /usr/local/etc/nagios/objects/servers/email-server.cfg. This can be any host definition file.
define host{
use linux-server
host_name email-server
alias Corporate Email Server
address 192.168.1.14
contact_groups unix-admins
}
Note: The following service is defined under /usr/local/etc/nagios/objects/servers/db-server.cfg. This can be any host definition file.
define service{
use generic-service
host_name prod-db
service_description CPU Load
contact_groups unix-admins
check_command check_nrpe!check_load
}
We will create a new configuration file for all FreeBSD servers on the LAN:
# touch /usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
# vi /usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
Note: you can either edit the existing localhost.cfg or create the lan-freebsd-servers.cfg file.
###############################################################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
# Last Modified: 03-03-2011
#
# NOTE: This config file is intended to serve as an *extremely* simple
# example of how you can create configuration entries to monitor
# the local (FreeBSD) machine.
#
###############################################################################
###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################
# Define a host for the local machine
define host{
use freebsd-server ; Inherit default values from a template
host_name test-bsd ; The name we're giving to this host
alias My TEST BSD ; A longer name associated with the host
address 192.168.13.156 ; IP address of the host
}
define host{
use freebsd-server ; Inherit default values from a template
host_name dev01 ; The name we're giving to this host
alias dev01 ; A longer name associated with the host
address 192.168.13.157 ; IP address of the host
}
define host{
use freebsd-server ; Inherit default values from a template
host_name web1 ; The name we're giving to this host
alias Online Web ; A longer name associated with the host
address 192.168.13.242 ; IP address of the host
}
define host{
use freebsd-server ; Inherit default values from a template
host_name bsd-sql ; The name we're giving to this host
alias Online SQL ; A longer name associated with the host
address 192.168.13.108 ; IP address of the host
}
define host{
use freebsd-server ; Inherit default values from a template
host_name fw1 ; The name we're giving to this host
alias Firewall Server ; A longer name associated with the host
address 192.168.13.2 ; IP address of the host
}
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################
# Define a service to "ping" the local machine
define service{
use generic-service ; Name of service template to use
host_name test-bsd,web1,bsd-sql,fw1,dev01
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use generic-service ; Name of service template to use
host_name test-bsd,web1,bsd-sql
service_description SSH
check_command check_ssh
notifications_enabled 0
}
# Define a service to check HTTP.
# Notifications are enabled for this service and go to the admins contact group.
define service{
use generic-service ; Name of service template to use
host_name web1
service_description HTTP
check_command check_http
contact_groups admins
notifications_enabled 1
}
### A more advanced definition for monitoring the HTTP service is shown below. This service definition will check to see if the /index.php URI contains the string "html". It will produce an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond.
### If you are checking a virtual server that uses 'host headers' you must supply the FQDN (fully qualified domain name) as the [host_name] argument.
define service{
use generic-service ; Name of service template to use
host_name web1
service_description HTTP
check_command check_http!-u /index.php -t 5 -s "html"
contact_groups admins
notifications_enabled 1
}
### Note: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin.
### # /usr/local/libexec/nagios/check_http --help
### # /usr/local/libexec/nagios/check_http -H localhost
# Define a service to check the number of currently logged in users.
define service{
use generic-service ; Name of service template to use
host_name test-bsd,web1,bsd-sql,fw1,dev01
service_description Current Users
check_command check_nrpe2!check_users
}
# Define a service to check the root partition of the disk.
define service{
use generic-service ; Name of service template to use
host_name localhost,test-bsd,web1,bsd-sql,fw1,dev01
service_description / partition
check_command check_nrpe2!check_root
}
# Define a service to check the /usr partition of the disk.
define service{
use generic-service ; Name of service template to use
host_name localhost,test-bsd,web1,bsd-sql,fw1,dev01
service_description /usr partition
check_command check_nrpe2!check_usr
}
# Define a service to check the /var partition of the disk.
define service{
use generic-service ; Name of service template to use
host_name localhost,test-bsd,web1,bsd-sql,fw1,dev01
service_description /var partition
check_command check_nrpe2!check_var
}
# Define a service to check the /tmp partition of the disk.
define service{
use generic-service ; Name of service template to use
host_name localhost,test-bsd,web1,bsd-sql,fw1,dev01
service_description /tmp partition
check_command check_nrpe2!check_tmp
}
# Define a service to check the load.
define service{
use generic-service ; Name of service template to use
host_name test-bsd,web1,bsd-sql,fw1,dev01
service_description Current Load
check_command check_nrpe2!check_load
}
# Define a service to check zombie processes.
define service{
use generic-service ; Name of service template to use
host_name test-bsd,web1,bsd-sql,fw1,dev01
service_description Zombie Processes
check_command check_nrpe2!check_zombie_procs
}
# Define a service to check total processes.
define service{
use generic-service ; Name of service template to use
host_name test-bsd,web1,bsd-sql,fw1,dev01
service_description Total Processes
check_command check_nrpe2!check_total_procs
}
# Define a service to check mysql uptime.
define service{
use generic-service ; Name of service template to use
host_name bsd-sql
service_description MySQL Uptime
check_command check_nrpe2!check_mysql_health_uptime
}
# Define a service to check mysql slave io running.
define service{
use generic-service ; Name of service template to use
host_name bsd-sql
service_description MySQL Slave IO
check_command check_nrpe2!check_mysql_health_slave-io-running
}
# Define a service to check mysql slave sql running.
define service{
use generic-service ; Name of service template to use
host_name bsd-sql
service_description MySQL Slave SQL
check_command check_nrpe2!check_mysql_health_slave-sql-running
}
Note: comma separated. No Space in between!
Add other FreeBSD hosts on the LAN to the host group member list.
# vi /usr/local/etc/nagios/objects/localhost.cfg
define hostgroup{
hostgroup_name freebsd-servers ; The name of the hostgroup
alias FreeBSD Servers ; Long name of the group
members localhost,test-bsd,web1,bsd-sql,fw1 ; Comma separated list of hosts that belong to this group
}
Remember to add the host names to /etc/hosts:
# vi /etc/hosts
192.168.13.156 test-bsd
192.168.13.242 web1
192.168.13.108 bsd-sql
192.168.13.2 fw1
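It is easy to define a host in Nagios and forget its /etc/hosts entry; for instance, dev01 appears in the host definitions above but not in the /etc/hosts lines listed here. The sketch below prints any name that has no entry in a hosts file. The function name missing_hosts is hypothetical.

```sh
#!/bin/sh
# missing_hosts is a hypothetical helper.
# Usage: missing_hosts HOSTS_FILE NAME...
# Prints every NAME that does not appear as a host name in HOSTS_FILE.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Drop comments, then look for the name in any column after the address.
        if ! sed 's/#.*//' "$hosts_file" |
            awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                              END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

For example, "missing_hosts /etc/hosts test-bsd web1 bsd-sql fw1 dev01" prints nothing when every name is present.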
Define the check_nrpe2 command so that the Nagios Server can run it. Add the following lines to commands.cfg:
# vi /usr/local/etc/nagios/objects/commands.cfg
# 'check_nrpe2' command definition
define command{
command_name check_nrpe2
command_line $USER1$/check_nrpe2 -H $HOSTADDRESS$ -c $ARG1$
}
Note: $USERn$ macros are defined in /usr/local/etc/nagios/resource.cfg.
Note: Standard macros that are available in Nagios are listed here http://nagios.sourceforge.net/docs/3_0/macrolist.html .
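To see what the command definition above turns into at check time, the macro substitution can be imitated in the shell. This is only an illustration, not how Nagios is implemented: the function name expand_check_command is hypothetical, it handles just the three macros used above, and the $USER1$ value in the usage note is assumed to be the plugin directory /usr/local/libexec/nagios.

```sh
#!/bin/sh
# expand_check_command is a hypothetical illustration of macro substitution,
# not Nagios code. It handles only $USER1$, $HOSTADDRESS$ and $ARG1$.
# Values must not contain "|" or "&", which sed would treat specially.
expand_check_command() {
    template=$1       # command_line from commands.cfg
    user1=$2          # value of $USER1$ from resource.cfg
    address=$3        # the "address" directive of the host
    check=$4          # the check_command of the service
    arg1=${check#*!}  # text after the first "!" becomes $ARG1$
    printf '%s\n' "$template" | sed \
        -e 's|\$USER1\$|'"$user1"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$address"'|g' \
        -e 's|\$ARG1\$|'"$arg1"'|g'
}
```

Passing the command_line above, /usr/local/libexec/nagios, 192.168.13.156 and "check_nrpe2!check_load" yields the same check_nrpe2 command line that was run by hand earlier, with "-c check_load" at the end.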
Add the following line to nagios.cfg:
# vi /usr/local/etc/nagios/nagios.cfg
# Definitions for monitoring the freebsd servers on the lan.
cfg_file=/usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
Now check your Nagios configuration for errors:
# /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg
Restart nagios if everything was okay:
# /usr/local/etc/rc.d/nagios restart
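The verify and restart steps can be chained so that a broken configuration never reaches the running daemon. The sketch below assumes that "nagios -v" exits non-zero when the pre-flight check fails. The function name verify_and_restart and the NAGIOS_BIN and NAGIOS_RC override variables are hypothetical.

```sh
#!/bin/sh
# verify_and_restart is a hypothetical wrapper. NAGIOS_BIN and NAGIOS_RC are
# illustrative overrides; the defaults are the paths used in this guide.
verify_and_restart() {
    cfg=${1:-/usr/local/etc/nagios/nagios.cfg}
    nagios_bin=${NAGIOS_BIN:-/usr/local/bin/nagios}
    rc_script=${NAGIOS_RC:-/usr/local/etc/rc.d/nagios}
    if "$nagios_bin" -v "$cfg" > /dev/null 2>&1; then
        "$rc_script" restart
    else
        echo "configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}
```

When the function reports a failure, run the "nagios -v" command by hand to read the actual error messages.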
On the Nagios Client, install check_mysql_health plugin:
# cd /usr/ports/net-mgmt/check_mysql_health
# make install
Note: there is a plugin called "check_mysql" in nagios-plugins-1.4.15_1,1. However, check_mysql_health seems better.
On your MySQL server, create a nagios user with no privileges (GRANT USAGE):
# mysql -u root -p
mysql> GRANT USAGE ON *.* TO 'nagios'@'localhost' IDENTIFIED BY 'nagios';
mysql> FLUSH PRIVILEGES;
mysql> exit
If you want to monitor mysql replication status as well, grant "REPLICATION CLIENT" privileges for a nagios user:
# mysql -u root -p
mysql> GRANT REPLICATION CLIENT ON *.* TO 'nagios'@'localhost' IDENTIFIED BY 'nagios';
mysql> FLUSH PRIVILEGES;
mysql> exit
# mysql -u nagios -p
mysql> show grants;
View check_mysql_health options:
# /usr/local/libexec/nagios/check_mysql_health -h
You can test some of these by running the following commands on Nagios Client:
# /usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode uptime --warning 2 --critical 5
Note: this command above will trigger a WARNING if mysql uptime is greater than 2 minutes; will trigger a CRITICAL if mysql uptime is greater than 5 minutes.
Please note that the thresholds must be specified according to the Nagios plug-in development guidelines.
10 // means "Alarm, if > 10" (without colon).
90: // means "Alarm, if < 90" (with colon).
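The two threshold forms above can be expressed as a small test. This sketch covers only those two forms with integer values; the full range syntax in the plug-in guidelines has more cases. The function name threshold_alarm is hypothetical.

```sh
#!/bin/sh
# threshold_alarm is a hypothetical helper covering two threshold forms only.
# Usage: threshold_alarm VALUE THRESHOLD
# Returns 0 (alarm) if VALUE violates THRESHOLD, non-zero otherwise.
threshold_alarm() {
    value=$1
    threshold=$2
    case $threshold in
        *:) [ "$value" -lt "${threshold%:}" ] ;;   # "90:" -> alarm if value < 90
        *)  [ "$value" -gt "$threshold" ] ;;       # "10"  -> alarm if value > 10
    esac
}
```

With the uptime example above, "threshold_alarm 3 2" succeeds (3 minutes exceeds the warning threshold of 2) while "threshold_alarm 3 5" does not (the critical threshold of 5 is not yet exceeded).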
On Nagios Client, edit nrpe.cfg:
# vi /usr/local/etc/nrpe.cfg
### MySQL - hardcoded command arguments.
command[check_mysql_health_uptime]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode uptime
command[check_mysql_health_slave-io-running]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode slave-io-running
command[check_mysql_health_slave-sql-running]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode slave-sql-running
On Nagios Client, restart nrpe2:
# /usr/local/etc/rc.d/nrpe2 restart
You can test some of these by running the following commands on Nagios Client:
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_mysql_health_uptime
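The name passed with -c has to match a command[...] entry in nrpe.cfg exactly. To list the names that are currently defined, a one-line sed extraction is enough; the function name list_nrpe_commands is hypothetical and the default path is the one used in this guide.

```sh
#!/bin/sh
# list_nrpe_commands is a hypothetical helper.
# Print the NAME of every "command[NAME]=..." line in an nrpe.cfg.
list_nrpe_commands() {
    file=${1:-/usr/local/etc/nrpe.cfg}
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$file"
}
```

Commented-out entries are skipped because the pattern is anchored at the start of the line. Since nrpe.cfg was set to mode 440 earlier, run this as a user that can read the file.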
You can test some of these by running the following commands on Nagios Server:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_uptime
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_slave-io-running
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_slave-sql-running
Check the system messages if it did not work:
# tail /var/log/messages
Reference:
http://www.wonkity.com/~wblock/docs/nagios.pdf
http://www.weithenn.org/cgi-bin/wiki.pl?Nagios-%E7%B6%B2%E8%B7%AF%E7%9B%A3%E6%8E%A7%E5%8F%8A%E5%91%8A%E8%AD%A6%E7%B3%BB%E7%B5%B1
http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
http://nagios.sourceforge.net/docs/3_0/macros.html
http://www.thegeekstuff.com/2009/06/4-steps-to-define-nagios-contacts-with-email-and-pager-notification/