Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.
The fork of Nagios into Icinga is a good thing, much as Quagga was a great fork of Zebra.
# uname -a
FreeBSD bsd10.local 10.0-RELEASE
Install Apache 2.2
Install PHP 5.4.27
Install MySQL 5.5
On the Nagios server, install Nagios:
# cd /usr/ports/net-mgmt/nagios
# make config-recursive
# make config-recursive
# make install
Note: you only need to install Nagios on the machine that is going to act as a monitoring server. You do not need to install Nagios on the clients.
Add the www user to the nagios group:
# pw groupmod nagios -m www
# grep nagios /etc/group
nagios:*:181:www
Enable nagios to start on boot:
# echo 'nagios_enable="YES"' >> /etc/rc.conf
Now copy the sample files to the config files:
# cd /usr/local/etc/nagios/
# cp cgi.cfg-sample cgi.cfg
# cp nagios.cfg-sample nagios.cfg
# cp resource.cfg-sample resource.cfg
Move sample files to a sample folder:
# mkdir -p /usr/local/etc/nagios/sample
# mv /usr/local/etc/nagios/*-sample /usr/local/etc/nagios/sample
Navigate to /usr/local/etc/nagios/objects and do the same:
# cd /usr/local/etc/nagios/objects
# cp commands.cfg-sample commands.cfg
# cp contacts.cfg-sample contacts.cfg
# cp localhost.cfg-sample localhost.cfg
# cp printer.cfg-sample printer.cfg
# cp switch.cfg-sample switch.cfg
# cp templates.cfg-sample templates.cfg
# cp timeperiods.cfg-sample timeperiods.cfg
Move sample files to a sample folder:
# mkdir -p /usr/local/etc/nagios/objects/sample
# mv /usr/local/etc/nagios/objects/*-sample /usr/local/etc/nagios/objects/sample
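The two copy-and-move rounds above follow the same pattern, so they can be scripted. The following is a minimal POSIX sh sketch, assuming every sample file ends in "-sample" as in the listings above; the function name activate_samples is hypothetical. Unlike a plain cp, it skips any config file that already exists, so running it again will not overwrite your edits.

```sh
# activate_samples DIR
# Hypothetical helper: for every DIR/NAME-sample, copy it to DIR/NAME
# (unless DIR/NAME already exists), then move the sample into DIR/sample.
activate_samples() {
    dir=$1
    mkdir -p "$dir/sample" || return 1
    for f in "$dir"/*-sample; do
        [ -e "$f" ] || continue            # the glob matched nothing
        target=${f%-sample}                # strip the "-sample" suffix
        if [ ! -e "$target" ]; then
            cp "$f" "$target" || return 1
        fi
        mv "$f" "$dir/sample/" || return 1
    done
}

# Usage (as root):
#   activate_samples /usr/local/etc/nagios
#   activate_samples /usr/local/etc/nagios/objects
```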
Note: A sample configuration file for monitoring Windows servers can be found at /usr/ports/net-mgmt/nagios/work/nagios-3.2.3/sample-config/template-object/windows.cfg (the nagios-3.2.3 directory name depends on the port version you built).
Now check your Nagios configuration for errors:
# nagios -v /usr/local/etc/nagios/nagios.cfg
Create a Nagios Admin called "nagiosadmin":
# htpasswd -c /usr/local/etc/nagios/htpasswd.users nagiosadmin
Note: the -c parameter creates the htpasswd file. If the htpasswd file already exists, it is overwritten (truncated).
Note: the admin must be named "nagiosadmin", because that is the default admin user in the configuration files (check with: grep -i 'admin' /usr/local/etc/nagios/*.cfg). To use a different name, change the authorized_for_* entries in cgi.cfg accordingly.
Change permissions:
# chown root:www /usr/local/etc/nagios/htpasswd.users
# chmod 440 /usr/local/etc/nagios/htpasswd.users
Create a Nagios user called "nagiosuser":
# htpasswd /usr/local/etc/nagios/htpasswd.users nagiosuser
Note: you do not need the -c parameter this time, since the htpasswd file already exists.
Now add the Nagios settings to your Apache configuration:
# vim /usr/local/etc/apache22/Includes/nagios.conf
### [START] nagios
<Directory /usr/local/www/nagios>
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    Allow from 192.168.6.112
    php_flag engine on
    php_admin_value open_basedir /usr/local/www/nagios/:/var/spool/nagios/
    AuthName "Nagios Access Ya"
    AuthType Basic
    AuthUserFile /usr/local/etc/nagios/htpasswd.users
    Require valid-user
</Directory>

<Directory /usr/local/www/nagios/cgi-bin>
    Options ExecCGI
</Directory>

ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
Alias /nagios/ /usr/local/www/nagios/
### [END] nagios
Restart Apache:
# /usr/local/etc/rc.d/apache22 restart
Start Nagios:
# /usr/local/etc/rc.d/nagios start
On the Nagios Client, install nrpe2:
# cd /usr/ports/net-mgmt/nrpe
# make config-recursive
# make config-recursive
# make install
Create the NRPE configuration file if needed. First check whether it exists:
# ls -l /usr/local/etc/nrpe.cfg
If nrpe.cfg does not exist:
# cp /usr/local/etc/nrpe.cfg.sample /usr/local/etc/nrpe.cfg
Change permissions:
# chmod 440 /usr/local/etc/nrpe.cfg
On the Nagios Client, add the Nagios Server's IP address to allowed_hosts:
# vi /usr/local/etc/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.13.3
Note: the list is comma-separated, with no spaces in between!
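If you assemble the allowed_hosts value by hand, a small helper can strip stray blanks before you paste it into nrpe.cfg. A minimal sketch; normalize_allowed_hosts is a hypothetical name, and it only removes whitespace and rejects empty entries (such as a doubled comma). It does not check that each entry is a valid address.

```sh
# normalize_allowed_hosts LIST
# Hypothetical helper: print LIST with all blanks removed, or return 1
# if the list is empty or contains an empty entry such as "a,,b".
normalize_allowed_hosts() {
    cleaned=$(printf '%s' "$1" | tr -d '[:blank:]')
    case "$cleaned" in
        ''|,*|*,|*,,*) return 1 ;;     # empty list or empty entry
    esac
    printf '%s\n' "$cleaned"
}

# Usage:
#   normalize_allowed_hosts '127.0.0.1, 192.168.13.3'
#   prints: 127.0.0.1,192.168.13.3
```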
On the Nagios Client, enable nrpe2 to start on boot:
# echo 'nrpe2_enable="YES"' >> /etc/rc.conf
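Running the echo line twice leaves a duplicate entry in /etc/rc.conf. FreeBSD also ships sysrc(8) (for example, sysrc nrpe2_enable=YES), which edits the variable in place. If you prefer plain sh, the following sketch appends the line only when no NAME_enable= line exists yet; rc_enable is a hypothetical helper, and it deliberately leaves an existing "NO" setting untouched.

```sh
# rc_enable NAME [FILE]
# Hypothetical helper: append NAME_enable="YES" to FILE (default
# /etc/rc.conf) only if no NAME_enable= line is present yet.
# An existing NAME_enable="NO" is left as it is.
rc_enable() {
    name=$1
    file=${2:-/etc/rc.conf}
    if grep -q "^${name}_enable=" "$file" 2>/dev/null; then
        return 0                         # already configured
    fi
    printf '%s_enable="YES"\n' "$name" >> "$file"
}

# Usage (as root):
#   rc_enable nagios
#   rc_enable nrpe2
```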
On the Nagios Client, start nrpe2:
# /usr/local/etc/rc.d/nrpe2 start
On the Nagios Client, make sure nrpe2 is running:
# ps auxww | grep nrpe
nagios 46166 0.0 0.1 14392 1860 - Is 4:47AM 0:00.00 /usr/local/sbin/nrpe2 -c /usr/local/etc/nrpe.cfg -d
On the Nagios Client, make sure the nrpe2 daemon is listening on port 5666:
# netstat -a | grep 5666
tcp4 0 0 *.5666 *.* LISTEN
tcp6 0 0 *.5666 *.* LISTEN
# sockstat | grep -E 'nagios|nrpe|5666'
nagios nrpe2 99457 3 dgram -> /var/run/logpriv
nagios nrpe2 99457 4 tcp6 *:5666 *:*
nagios nrpe2 99457 5 tcp4 *:5666 *:*
On the Nagios Client, run the check_nrpe2 plugin against localhost. On success, it prints the NRPE version number.
# /usr/local/libexec/nagios/check_nrpe2 -H localhost
NRPE v2.15
On the Nagios Client, you can test a plugin directly and some of the commands defined in nrpe.cfg by running the following:
# /usr/local/libexec/nagios/check_http -H localhost
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_users
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_load
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_hda1
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_sda1
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_total_procs
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_zombie_procs
Note: plugins are stored in /usr/local/libexec/nagios.
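Nagios decides a check's state from the plugin's exit status, not from the text it prints: per the Nagios plugin guidelines, 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. When testing the commands above by hand, a small helper can print that mapping. plugin_status is a hypothetical name and is only a reading aid.

```sh
# plugin_status CODE
# Hypothetical helper: map a Nagios plugin exit code to its state name
# (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
plugin_status() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "UNEXPECTED($1)" ;;
    esac
}

# Usage:
#   /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_load
#   plugin_status $?
```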
At this point, you are done installing and configuring NRPE on the remote host (Nagios Client). Now it's time to install a component and make some configuration entries on your monitoring server.
On the Nagios Server, install nrpe2:
# cd /usr/ports/net-mgmt/nrpe
# make install
Make sure the check_nrpe2 plugin can talk to the NRPE daemon on the remote host. Replace "192.168.13.156" in the command below with the IP address of the remote host that has NRPE installed, then run it on the Nagios Server:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156
NRPE v2.15
On the Nagios Server, run the following command to test a specific NRPE command:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156 -c check_total_procs
Use a browser to check:
http://192.168.13.2/nagios/
Edit the admin email:
# vim /usr/local/etc/nagios/nagios.cfg
admin_email=me@example.com
admin_pager=me@example.com
Note: Nagios never uses these values itself, but you can access them by using the $ADMINEMAIL$ and $ADMINPAGER$ macros in your notification commands.
Define Generic Contact Template in templates.cfg:
The Nagios installation provides a default generic-contact template that you can use as a reference when building your contacts. Note that all the directives in the generic-contact template below are mandatory, so if you decide not to base your contacts on generic-contact, you must define all of these directives in each contact yourself.
The following generic-contact is already available in /usr/local/etc/nagios/objects/templates.cfg, and templates.cfg is included in nagios.cfg by default, as shown below.
Any of the directives in templates.cfg can be overridden when you define a real contact that uses this template.
# grep templates /usr/local/etc/nagios/nagios.cfg
cfg_file=/usr/local/etc/nagios/objects/templates.cfg
Note: generic-contact is available under /usr/local/etc/nagios/objects/templates.cfg
define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        register                        0
        }
- name - This defines the name of the contact template (generic-contact).
- service_notification_period - This defines when Nagios can send notifications about service issues (for example, Apache down). By default this is the 24x7 timeperiod, which is defined in /usr/local/etc/nagios/objects/timeperiods.cfg.
- host_notification_period - This defines when Nagios can send notifications about host issues (for example, a crashed server). By default, this is the 24x7 timeperiod.
- service_notification_options - This defines the types of service notifications that can be sent out. By default it covers warning, unknown, critical and recovery notifications (w,u,c,r), flapping events (f), and scheduled service downtime (s).
- host_notification_options - This defines the types of host notifications that can be sent out. By default it covers down, unreachable and recovery notifications (d,u,r), flapping events (f), and scheduled host downtime (s).
- service_notification_commands - By default, the contact is notified about service issues (for example, database down) by email. You can also define additional commands and add them to this directive; for example, you can define your own notify-service-by-sms command.
- host_notification_commands - By default, the contact is notified about host issues (for example, host down) by email. You can also define additional commands and add them to this directive; for example, you can define your own notify-host-by-sms command.
- register - The value 0 marks this definition as a template rather than a real contact.
Define Individual Contacts in contacts.cfg:
Once you've confirmed that the generic-contact template is defined properly, you can start defining individual contacts for everyone in your organization who might ever receive notifications from Nagios. Note that defining a contact does not by itself mean they will get notifications: you still have to associate the contact with a service or host definition, as shown in the later sections below. So, feel free to define all possible contacts here (for example, Developers, DBAs, Sysadmins, IT Manager, Customer Service Manager, Top Management, etc.).
Note: Define these contacts in /usr/local/etc/nagios/objects/contacts.cfg
define contact{
        contact_name                    sgupta
        use                             generic-contact
        alias                           Sanjay Gupta (Developer)
        email                           sgupta@thegeekstuff.com
        pager                           333-333@pager.thegeekstuff.com
        }

define contact{
        contact_name                    jbourne
        use                             generic-contact
        alias                           Jason Bourne (Sysadmin)
        email                           jbourne@thegeekstuff.com
        }
Define Contact Groups with Multiple Contacts in contacts.cfg:
Once you've defined the individual contacts, you can group them together to send the appropriate notifications. For example, only DBAs need to be notified when the database service goes down, so a db-admins group may be required. Likewise, perhaps only Unix system administrators need to be notified when Apache goes down, so a unix-admins group may be required. Define as many groups as you need; later you can use these groups in the individual service and host definitions.
Note: Define contact groups in /usr/local/etc/nagios/objects/contacts.cfg
define contactgroup{
        contactgroup_name               db-admins
        alias                           Database Administrators
        members                         jsmith, jdoe, mraj
        }

define contactgroup{
        contactgroup_name               unix-admins
        alias                           Linux System Administrator
        members                         jbourne, dpatel, mshankar
        }
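Every name in a members list must match a contact_name defined somewhere in your configuration, otherwise nagios -v reports an error. In the examples above only sgupta and jbourne are defined, so jsmith, jdoe, mraj, dpatel and mshankar would each need their own contact definition. The following awk-based sketch lists contactgroup members that have no matching contact_name. undefined_members is a hypothetical helper, and it assumes one directive per line, as in the stock sample files.

```sh
# undefined_members FILE...
# Hypothetical helper: print each contactgroup member that has no
# matching contact_name in FILE(s). Assumes one directive per line,
# as in the stock sample configs; text after ";" is treated as a comment.
undefined_members() {
    awk '
        { sub(/;.*/, "") }                                   # strip comments
        $1 == "contact_name" { defined[$2] = 1 }
        in_group && $1 == "members" {
            line = $0
            sub(/^[ \t]*members[ \t]+/, "", line)            # keep only the list
            gsub(/[ \t]/, "", line)                          # drop blanks
            n = split(line, m, ",")
            for (i = 1; i <= n; i++) if (m[i] != "") wanted[m[i]] = 1
        }
        $1 == "define" && $2 ~ /^contactgroup/ { in_group = 1 }
        index($0, "}") { in_group = 0 }                      # end of block
        END { for (name in wanted) if (!(name in defined)) print name }
    ' "$@"
}

# Usage:
#   undefined_members /usr/local/etc/nagios/objects/contacts.cfg
```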
Attach Contact Groups or Individual Contacts to Service and Host Definitions:
Once you've defined the individual contacts and contact groups, it is time to start attaching them to a specific host or service definition as shown below.
Note: The following host is defined in /usr/local/etc/nagios/objects/servers/email-server.cfg. This can be any host definition file, as long as nagios.cfg includes it with a cfg_file= or cfg_dir= line.
define host{
        use                     linux-server
        host_name               email-server
        alias                   Corporate Email Server
        address                 192.168.1.14
        contact_groups          unix-admins
        }
Note: The following service is defined in /usr/local/etc/nagios/objects/servers/db-server.cfg. This can be any host definition file.
define service{
        use                     generic-service
        host_name               prod-db
        service_description     CPU Load
        contact_groups          unix-admins
        check_command           check_nrpe2!check_load
        }
We will create a new configuration file for all FreeBSD servers on the LAN:
# touch /usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
# vi /usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
Note: you can either edit the existing localhost.cfg or create the lan-freebsd-servers.cfg file.
###############################################################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
# Last Modified: 03-03-2011
#
# NOTE: This config file is intended to serve as an *extremely* simple
#       example of how you can create configuration entries to monitor
#       the local (FreeBSD) machine.
#
###############################################################################

###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define the FreeBSD hosts on the LAN

define host{
        use             freebsd-server          ; Inherit default values from a template
        host_name       test-bsd                ; The name we're giving to this host
        alias           My TEST BSD             ; A longer name associated with the host
        address         192.168.13.156          ; IP address of the host
        }

define host{
        use             freebsd-server          ; Inherit default values from a template
        host_name       dev01                   ; The name we're giving to this host
        alias           dev01                   ; A longer name associated with the host
        address         192.168.13.157          ; IP address of the host
        }

define host{
        use             freebsd-server          ; Inherit default values from a template
        host_name       web1                    ; The name we're giving to this host
        alias           Online Web              ; A longer name associated with the host
        address         192.168.13.242          ; IP address of the host
        }

define host{
        use             freebsd-server          ; Inherit default values from a template
        host_name       bsd-sql                 ; The name we're giving to this host
        alias           Online SQL              ; A longer name associated with the host
        address         192.168.13.108          ; IP address of the host
        }

define host{
        use             freebsd-server          ; Inherit default values from a template
        host_name       fw1                     ; The name we're giving to this host
        alias           Firewall Server         ; A longer name associated with the host
        address         192.168.13.2            ; IP address of the host
        }
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# Define a service to "ping" the hosts

define service{
        use                     generic-service         ; Name of service template to use
        host_name               test-bsd,web1,bsd-sql,fw1,dev01
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

# Define a service to check SSH.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               test-bsd,web1,bsd-sql
        service_description     SSH
        check_command           check_ssh
        notifications_enabled   0
        }

# Define a service to check HTTP.
# Notifications are enabled for this service.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               web1
        service_description     HTTP
        check_command           check_http
        contact_groups          admins
        notifications_enabled   1
        }

### A more advanced definition for monitoring the HTTP service is shown below. This service definition will check to see if the /index.php URI contains the string "html". It will produce an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond. Use it in place of the basic HTTP service above, not alongside it, since both use service_description HTTP on web1.
### If you are checking a virtual server that uses 'host headers' you must supply the FQDN (fully qualified domain name) as the [host_name] argument.
define service{
        use                     generic-service         ; Name of service template to use
        host_name               web1
        service_description     HTTP
        check_command           check_http!-u /index.php -t 5 -s "html"
        contact_groups          admins
        notifications_enabled   1
        }

### Note: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin.
### # /usr/local/libexec/nagios/check_http --help
### # /usr/local/libexec/nagios/check_http -H localhost

# Define a service to check the number of currently logged in users.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               test-bsd,web1,bsd-sql,fw1,dev01
        service_description     Current Users
        check_command           check_nrpe2!check_users
        }

# Define a service to check the root partition of the disk.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description     / partition
        check_command           check_nrpe2!check_root
        }

# Define a service to check the /usr partition of the disk.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description     /usr partition
        check_command           check_nrpe2!check_usr
        }

# Define a service to check the /var partition of the disk.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description     /var partition
        check_command           check_nrpe2!check_var
        }

# Define a service to check the /tmp partition of the disk.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               localhost,test-bsd,web1,bsd-sql,fw1,dev01
        service_description     /tmp partition
        check_command           check_nrpe2!check_tmp
        }

# Define a service to check the load.
define service{
        use                     generic-service         ; Name of service template to use
        host_name               test-bsd,web1,bsd-sql,fw1,dev01
        service_description     Current Load
        check_command           check_nrpe2!check_load
        }

# Define a service to check zombie processes.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               test-bsd,web1,bsd-sql,fw1,dev01
        service_description     Zombie Processes
        check_command           check_nrpe2!check_zombie_procs
        }

# Define a service to check total processes.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               test-bsd,web1,bsd-sql,fw1,dev01
        service_description     Total Processes
        check_command           check_nrpe2!check_total_procs
        }

# Define a service to check mysql uptime.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               bsd-sql
        service_description     MySQL Uptime
        check_command           check_nrpe2!check_mysql_health_uptime
        }

# Define a service to check mysql slave io running.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               bsd-sql
        service_description     MySQL Slave IO
        check_command           check_nrpe2!check_mysql_health_slave-io-running
        }

# Define a service to check mysql slave sql running.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               bsd-sql
        service_description     MySQL Slave SQL
        check_command           check_nrpe2!check_mysql_health_slave-sql-running
        }
Note: host_name lists are comma-separated, with no spaces in between!
Add other FreeBSD hosts on the LAN to the host group member list.
# vi /usr/local/etc/nagios/objects/localhost.cfg
define hostgroup{
hostgroup_name freebsd-servers ; The name of the hostgroup
alias FreeBSD Servers ; Long name of the group
members localhost,test-bsd,web1,bsd-sql,fw1 ; Comma separated list of hosts that belong to this group
}
Remember to add the host names to /etc/hosts:
# vi /etc/hosts
192.168.13.156 test-bsd
192.168.13.242 web1
192.168.13.108 bsd-sql
192.168.13.2 fw1
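The /etc/hosts entries above mirror the address and host_name pairs in lan-freebsd-servers.cfg; note that dev01 (192.168.13.157) is defined there but not listed. To avoid such gaps, you could generate candidate lines from the host definitions and review them before editing /etc/hosts. A minimal sketch; hosts_entries is a hypothetical helper that assumes one directive per line and ignores ";" comments.

```sh
# hosts_entries FILE...
# Hypothetical helper: for every "define host" block, print
# "address host_name" so the result can be reviewed and added to /etc/hosts.
# Assumes one directive per line; text after ";" is treated as a comment.
hosts_entries() {
    awk '
        { sub(/;.*/, "") }                                   # strip comments
        $1 == "define" && ($2 == "host" || $2 == "host{") {
            in_host = 1; name = ""; addr = ""
        }
        in_host && $1 == "host_name" { name = $2 }
        in_host && $1 == "address"   { addr = $2 }
        in_host && index($0, "}") {                          # end of block
            if (name != "" && addr != "") print addr, name
            in_host = 0
        }
    ' "$@"
}

# Usage:
#   hosts_entries /usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
```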
Define the check_nrpe2 command so that the Nagios Server can run the check_nrpe2 plugin. Add the following lines to commands.cfg:
# vi /usr/local/etc/nagios/objects/commands.cfg
# 'check_nrpe2' command definition
define command{
        command_name    check_nrpe2
        command_line    $USER1$/check_nrpe2 -H $HOSTADDRESS$ -c $ARG1$
        }
Note: $USERn$ macros are defined in /usr/local/etc/nagios/resource.cfg.
Note: The standard macros available in Nagios are listed at http://nagios.sourceforge.net/docs/3_0/macrolist.html.
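To see how these pieces fit together: a service's check_command such as check_nrpe2!check_load names the command (check_nrpe2), and each value after a "!" becomes $ARG1$, $ARG2$, and so on. Nagios substitutes those, plus $HOSTADDRESS$ and $USER1$, into the command_line above. The sketch below imitates that substitution for check_nrpe2 only, so you can see the command Nagios would run. It is an illustration, not Nagios code; expand_check is a hypothetical name, and the default USER1 value of /usr/local/libexec/nagios is an assumption based on where the plugins are installed.

```sh
# expand_check CHECK_COMMAND HOSTADDRESS [USER1]
# Hypothetical illustration of Nagios macro substitution for the
# check_nrpe2 command_line. USER1 defaults to /usr/local/libexec/nagios
# (assumed to match resource.cfg).
expand_check() {
    check=$1
    hostaddr=$2
    user1=${3:-/usr/local/libexec/nagios}
    name=${check%%!*}               # command name: text before the first "!"
    rest=${check#*!}                # everything after the first "!"
    if [ "$rest" = "$check" ]; then
        arg1=""                     # no "!" at all: $ARG1$ is empty
    else
        arg1=${rest%%!*}            # $ARG1$ stops at the next "!"
    fi
    case "$name" in
        check_nrpe2)
            # command_line $USER1$/check_nrpe2 -H $HOSTADDRESS$ -c $ARG1$
            printf '%s/check_nrpe2 -H %s -c %s\n' "$user1" "$hostaddr" "$arg1"
            ;;
        *)
            echo "expand_check: no command_line known for $name" >&2
            return 1
            ;;
    esac
}

# Usage:
#   expand_check 'check_nrpe2!check_load' 192.168.13.156
#   prints: /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.156 -c check_load
```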
Add the following lines to nagios.cfg:
# vi /usr/local/etc/nagios/nagios.cfg
# Definitions for monitoring the FreeBSD servers on the LAN.
cfg_file=/usr/local/etc/nagios/objects/lan-freebsd-servers.cfg
Now check your Nagios configuration for errors:
# /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg
If the check reports no errors, restart Nagios:
# /usr/local/etc/rc.d/nagios restart
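Restarting with a broken configuration can leave Nagios stopped, so the check and the restart above can be chained. A minimal sketch, assuming nagios -v exits with a non-zero status when it finds errors; safe_restart is a hypothetical wrapper, and the NAGIOS_BIN and NAGIOS_RC overrides exist only so it can be tried without touching a live install.

```sh
# safe_restart [CFG]
# Hypothetical wrapper: restart Nagios only if "nagios -v CFG" succeeds.
# NAGIOS_BIN and NAGIOS_RC may be overridden; the defaults are the
# paths used in this post.
safe_restart() {
    cfg=${1:-/usr/local/etc/nagios/nagios.cfg}
    bin=${NAGIOS_BIN:-/usr/local/bin/nagios}
    rc=${NAGIOS_RC:-/usr/local/etc/rc.d/nagios}
    if "$bin" -v "$cfg" >/dev/null; then
        "$rc" restart
    else
        echo "config check failed; not restarting (run: $bin -v $cfg)" >&2
        return 1
    fi
}

# Usage (as root):
#   safe_restart
```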
On the Nagios Client, install check_mysql_health plugin:
# cd /usr/ports/net-mgmt/check_mysql_health
# make install
Note: nagios-plugins-1.4.15_1,1 includes a plugin called "check_mysql". However, check_mysql_health offers many more check modes, including the uptime and replication checks used below.
On your MySQL server, create a nagios user with no privileges (USAGE only):
# mysql -u root -p
mysql> GRANT USAGE ON *.* TO 'nagios'@'localhost' IDENTIFIED BY 'nagios';
mysql> FLUSH PRIVILEGES;
mysql> exit
If you want to monitor MySQL replication status as well, grant the REPLICATION CLIENT privilege to the nagios user:
# mysql -u root -p
mysql> GRANT REPLICATION CLIENT ON *.* TO 'nagios'@'localhost' IDENTIFIED BY 'nagios';
mysql> FLUSH PRIVILEGES;
mysql> exit
Verify the grants as the nagios user:
# mysql -u nagios -p
mysql> show grants;
View check_mysql_health options:
# /usr/local/libexec/nagios/check_mysql_health -h
You can test it by running the following command on the Nagios Client:
# /usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode uptime --warning 2 --critical 5
Note: the command above triggers a WARNING if MySQL uptime is greater than 2 minutes and a CRITICAL if it is greater than 5 minutes.
Please note that the thresholds must be specified according to the Nagios plug-in development guidelines:
10 // means "Alarm, if > 10" (without colon).
90: // means "Alarm, if < 90" (with colon).
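The two lines above are the common cases of the range syntax in the Nagios plugin development guidelines. Strictly, a bare "10" means alert when the value lies outside 0..10, "~:10" means alert above 10 with no lower bound, "10:20" means alert outside 10..20, and a leading "@" inverts the test (alert inside the range, inclusive). The sketch below encodes those rules so you can check a threshold before putting it into nrpe.cfg; threshold_alert is a hypothetical helper, and it uses awk for the numeric comparisons.

```sh
# threshold_alert VALUE RANGE
# Hypothetical helper implementing the plugin-guidelines range syntax.
# Exit status 0 means "alert", 1 means "no alert".
#   10      alert if VALUE < 0 or VALUE > 10
#   10:     alert if VALUE < 10
#   ~:10    alert if VALUE > 10
#   10:20   alert if VALUE < 10 or VALUE > 20
#   @10:20  alert if 10 <= VALUE <= 20
threshold_alert() {
    awk -v v="$1" -v r="$2" 'BEGIN {
        inside = 0
        if (substr(r, 1, 1) == "@") { inside = 1; r = substr(r, 2) }
        if (index(r, ":")) {
            split(r, p, ":"); lo = p[1]; hi = p[2]
        } else {
            lo = 0; hi = r                  # "N" is shorthand for "0:N"
        }
        lo_inf = (lo == "~")                # "~" means no lower bound
        hi_inf = (hi == "")                 # empty end means no upper bound
        v += 0
        out = (!lo_inf && v < lo + 0) || (!hi_inf && v > hi + 0)
        alert = inside ? !out : out
        exit(alert ? 0 : 1)
    }'
}

# Usage:
#   threshold_alert 3 2 && echo alert   # uptime 3 min vs --warning 2
```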
On Nagios Client, edit nrpe.cfg:
# vi /usr/local/etc/nrpe.cfg
### MySQL - hardcoded command arguments.
command[check_mysql_health_uptime]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode uptime
command[check_mysql_health_slave-io-running]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode slave-io-running
command[check_mysql_health_slave-sql-running]=/usr/local/libexec/nagios/check_mysql_health --hostname localhost --username nagios --password nagios --mode slave-sql-running
On Nagios Client, restart nrpe2:
# /usr/local/etc/rc.d/nrpe2 restart
You can test one of them by running the following command on the Nagios Client:
# /usr/local/libexec/nagios/check_nrpe2 -H localhost -c check_mysql_health_uptime
You can test them by running the following commands on the Nagios Server:
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_uptime
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_slave-io-running
# /usr/local/libexec/nagios/check_nrpe2 -H 192.168.13.108 -c check_mysql_health_slave-sql-running
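When you have several NRPE commands to try, a short loop saves retyping the plugin path. A minimal sketch; run_remote_checks is a hypothetical helper that prints each command's state and output and returns the highest exit status it saw, and CHECK_NRPE is an override that lets you point it at another check_nrpe2 binary.

```sh
# run_remote_checks HOST COMMAND...
# Hypothetical helper: run check_nrpe2 -H HOST -c COMMAND for each
# COMMAND, print "COMMAND STATE: output", and return the highest exit
# status seen. CHECK_NRPE overrides the plugin path.
run_remote_checks() {
    host=$1; shift
    plugin=${CHECK_NRPE:-/usr/local/libexec/nagios/check_nrpe2}
    worst=0
    for cmd in "$@"; do
        out=$("$plugin" -H "$host" -c "$cmd")
        rc=$?
        case $rc in
            0) state=OK ;; 1) state=WARNING ;; 2) state=CRITICAL ;; *) state=UNKNOWN ;;
        esac
        printf '%s %s: %s\n' "$cmd" "$state" "$out"
        [ "$rc" -gt "$worst" ] && worst=$rc
    done
    return "$worst"
}

# Usage (on the Nagios Server):
#   run_remote_checks 192.168.13.108 check_mysql_health_uptime \
#       check_mysql_health_slave-io-running check_mysql_health_slave-sql-running
```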
If it does not work, check the system log:
# tail /var/log/messages
Reference:
http://www.wonkity.com/~wblock/docs/nagios.pdf
http://www.weithenn.org/cgi-bin/wiki.pl?Nagios-%E7%B6%B2%E8%B7%AF%E7%9B%A3%E6%8E%A7%E5%8F%8A%E5%91%8A%E8%AD%A6%E7%B3%BB%E7%B5%B1
http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
http://nagios.sourceforge.net/docs/3_0/macros.html
http://www.thegeekstuff.com/2009/06/4-steps-to-define-nagios-contacts-with-email-and-pager-notification/