Installing Nagios server and NRPE

What is Nagios?

Nagios Core – Free and open source
        nagios.org
Nagios XI – Commercial enterprise edition with paid support and extra features
        nagios.com

Nagios Server

  • Runs the host and service checks that you define, holds the configuration definitions, sends notifications (email, third-party phone calls and SMS), and serves the web interface

NRPE – Nagios Remote Plugin Executor

  • Local agent that lets the Nagios server run a command on the host you’re checking and get its result back (disk usage, CPU load)

 

Let’s Build It!

Requirements:

CentOS 6.x or 7.x minimal
Ideally 2 Linux VMs, 1 to be the Nagios server and 1 to be a host to check against
Root access
###########################################################################
* Verify that SELinux is disabled or in permissive mode
sudo setenforce 0
This sets SELinux to permissive mode without needing a reboot
* Install needed packages
sudo yum install epel-release

sudo yum install nagios nagios-plugins-all nagios-plugins-nrpe nrpe php httpd vim

 ** If you have iptables or firewalld running, you’ll want to open up port 80 (web interface, on the Nagios server) and 5666 (NRPE, on the monitored host). Depending on the image, CentOS minimal may not have these installed **
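If one of those firewalls is running, a sketch like the one below opens the ports. It assumes firewalld on CentOS 7 and plain iptables (with the iptables service) on CentOS 6; the helper name `open_ports` is hypothetical. Port 80 is only needed on the Nagios server and 5666 only on hosts running NRPE, so you can pass just the ones you need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open TCP ports with whichever firewall tool is present.
# With no arguments it opens 80 (Nagios web UI) and 5666 (NRPE).
open_ports() {
  if [ "$#" -eq 0 ]; then
    set -- 80 5666
  fi
  local port
  if command -v firewall-cmd >/dev/null 2>&1; then
    # firewalld (CentOS 7): add permanent rules, then reload so they apply now
    for port in "$@"; do
      firewall-cmd --permanent --add-port="${port}/tcp"
    done
    firewall-cmd --reload
  elif command -v iptables >/dev/null 2>&1; then
    # iptables (CentOS 6): insert ACCEPT rules at the top of the INPUT chain...
    for port in "$@"; do
      iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
    done
    # ...and save them so they survive a restart of the iptables service
    service iptables save
  else
    echo "No firewall-cmd or iptables found; nothing to do" >&2
  fi
}

# Usage (as root):
#   open_ports 80      # on the Nagios server
#   open_ports 5666    # on an NRPE host
```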

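About the SELinux step above: `setenforce 0` only lasts until the next reboot, because the boot-time mode comes from the `SELINUX=` line in `/etc/selinux/config`. A minimal sketch for making permissive mode stick, assuming the stock config file layout (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch the boot-time SELinux mode from enforcing to
# permissive. A config that already says "disabled" or "permissive" is left
# alone, since flipping "disabled" to "permissive" would turn SELinux back on.
persist_selinux_permissive() {
  local cfg="${1:-/etc/selinux/config}"   # path argument lets you try it on a copy
  sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' "$cfg"
}

# Usage (as root):
#   getenforce                    # prints Enforcing, Permissive or Disabled
#   setenforce 0                  # permissive now, until the next reboot
#   persist_selinux_permissive    # permissive after reboots as well
```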
Change admin password
sudo htpasswd /etc/nagios/passwd nagiosadmin
Enter your new password (twice, when prompted)
Enable nagios and httpd on boot
sudo chkconfig httpd on && sudo chkconfig nagios on
Fire it up!
sudo service httpd start
sudo service nagios start
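On CentOS 7, `chkconfig` and `service` are forwarded to systemd, so the commands above still work there. If you would rather call systemd directly, here is a sketch that uses `systemctl` when it exists and falls back to the SysV tools otherwise (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable services at boot and start them now,
# using systemd when available (CentOS 7) and SysV tools otherwise (CentOS 6).
enable_and_start() {
  local svc
  for svc in "$@"; do
    if command -v systemctl >/dev/null 2>&1; then
      systemctl enable "$svc"
      systemctl start "$svc"
    else
      chkconfig "$svc" on
      service "$svc" start
    fi
  done
}

# Usage (as root):
#   enable_and_start httpd nagios
```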
In a web browser:
http://<ip addr>/nagios
You’ve started Nagios! Login with:
User: nagiosadmin
Password: <the new password you just created>

Now Let’s Look at the Configuration

sudo su
cd /etc/nagios
ls
cgi.cfg conf.d/ nagios.cfg objects/ passwd private/
objects/ and nagios.cfg are the things you care most about right now
cd objects/
ls
commands.cfg hosts.cfg printer.cfg switch.cfg timeperiods.cfg
contacts.cfg localhost.cfg services.cfg templates.cfg windows.cfg

Host Checks

Is the server up? This can be a ping to an IP address, a DNS check, or a website check

Service Checks

CPU load, swap usage, disk utilization, process running etc.
The default install configures checks for localhost. Go to http://<ip addr>/nagios -> click Services on the left-hand side:
  • Current Load, Current Users, HTTP, PING, Root Partition, SSH, Swap Usage, Total Processes
Hierarchy:
Any .cfg file that nagios.cfg includes (with cfg_file or cfg_dir) is processed on its own, so localhost.cfg can stand by itself. You could use a dynamic CI/CD approach and generate a .cfg file for every one of your servers, but that would be crazy town. Or would it? << Show some of the dynamic config we have for prod and stage at Craftsy>>
You can create templates so that particular teams get alerts for particular servers, with phone calls or just emails, or no alerts at all for non-critical things.

[Image omitted; source: https://www.rittmanmead.com/blog/2012/09/an-introduction-to-monitoring-obiee-with-nagios/]

 

Let’s Define Contacts First

vim contacts.cfg
Let’s make a new contact group and add the nagiosadmin contact to it
define contactgroup {
       contactgroup_name       ops
       alias                   Ops Team
       members                 nagiosadmin
}
-> In the nagiosadmin contact, change the email to your own address
email nagios@localhost ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
Now look at templates.cfg, at the generic-contact definition near the top:
name                           The name that you call in other .cfg configuration files
service_notification_period    The time that you want *service* alerts to fire. Can configure work-hours, 24x7 etc
host_notification_period       The time you want *host* alerts to fire
service_notification_options   Warning, Unknown, Critical, Recovery, Flapping, Scheduled downtime
host_notification_options      Down, Unreachable, Recovery, Flapping, Scheduled downtime
service_notification_commands  Define email alerts, third party integrations (VictorOps, PagerDuty, OpsGenie)
host_notification_commands     Define alerts for host notifications
register                       0 = template (a partial definition), 1 = real object (the default)
         ** A note about register: set register 0 when you are making a template, i.e. a partial object definition. Other definitions can then inherit from it with use. **
So we added our contact to the new contact group, and because nagiosadmin uses generic-contact, the template’s settings apply to it
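For reference, this is roughly what the stock generic-contact template and nagiosadmin contact look like (comments and spacing vary between versions). The contact picks up every directive from the template through use, and register 0 keeps the template itself from being treated as a real contact.

```
define contact {
    name                            generic-contact
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r,f,s   ; warning, unknown, critical, recovery, flapping, scheduled downtime
    host_notification_options       d,u,r,f,s     ; down, unreachable, recovery, flapping, scheduled downtime
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    register                        0             ; template only, not a real contact
}

define contact {
    contact_name                    nagiosadmin
    use                             generic-contact   ; inherit everything above
    alias                           Nagios Admin
    email                           nagios@localhost  ; change to your address
}
```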

Let’s Define A Host To Alert On

Make a new file called hosts.cfg in /etc/nagios/objects
define host {
    host_name       sofree
    alias           Software Freedom School
    address         sofree.us
    use             sofree-host    ; The host template defined in the next step
    contact_groups  ops            ; The contact group we just made
}

Now Define What The Parameters Of That Host Check Should Be

vim templates.cfg
define host {
    name sofree-host
    use generic-host ; This grabs the notification period, notifications enabled, flap detection etc
    check_period 24x7 ; What hours this should check
    check_interval 5 ; How often to check, in minutes
    retry_interval 1 ; How often to retry when a check fails, in minutes
    max_check_attempts 10 ; How many checks must fail before it alerts. In this config, you will get an alert roughly 10 minutes after the server goes down
    check_command check-host-alive ; The command used to check the host; check-host-alive is a simple ping. You could define a different host check, e.g. for HTTP
    notification-options d,u,r ; When notifications should be sent - Down, Unreachable, Recovery
    contact_groups ops ; Who to alert: contacts takes individual contacts, contact_groups takes groups
    register 0 ; Make this a template
}
service nagios restart
!! You’ll get an error!!
 << notification-options should be notification_options; be careful of underscores vs dashes. You can use the Nagios pre-flight check to verify your syntax before you restart and leave the service offline in a bad state >> 
nagios -v /etc/nagios/nagios.cfg
 You’ll see this error:
Error: Invalid host object directive ' '.
Error: Could not add object property in file '/etc/nagios/objects/templates.cfg' on line 199.
 Error processing object config files!

This is because the notification options directive should have an underscore, not a dash

notification_options d,u,r ; When notifications should be sent - Down, Unreachable, Recovery
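To make the pre-flight check a habit, you can wrap it so Nagios is only restarted when the check passes. A minimal sketch (the helper name is hypothetical); it relies on `nagios -v` exiting non-zero when it reports errors:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the Nagios pre-flight check and only restart
# the service if the configuration verifies cleanly.
safe_restart_nagios() {
  local main_cfg="${1:-/etc/nagios/nagios.cfg}" out
  if out=$(nagios -v "$main_cfg" 2>&1); then
    service nagios restart
  else
    # Show the verification output so the offending file and line are visible
    printf '%s\n' "$out" >&2
    echo "Config check failed; Nagios was not restarted" >&2
    return 1
  fi
}

# Usage (as root):
#   safe_restart_nagios
```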

Tell the Main Config to Include Your New Config files

vim /etc/nagios/nagios.cfg
Add a line for your new file:
cfg_file=/etc/nagios/objects/hosts.cfg
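If you script your setup, editing nagios.cfg by hand gets repetitive. A small idempotent sketch (the helper name is hypothetical) that appends the cfg_file line only when it is missing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a cfg_file= line to nagios.cfg only if it is
# not already there, so running it twice does not create duplicates.
add_cfg_file() {
  local obj_file="$1" main_cfg="${2:-/etc/nagios/nagios.cfg}"
  local line="cfg_file=${obj_file}"
  # -x matches the whole line, -F treats the path as a fixed string
  grep -qxF "$line" "$main_cfg" || echo "$line" >> "$main_cfg"
}

# Usage (as root):
#   add_cfg_file /etc/nagios/objects/hosts.cfg
```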
Let’s see the new config!
service nagios restart
Now everyone else go and set up a single host check. Use the sofree.us site or another one of your favorites.

Install NRPE on a separate host

Set SELinux to permissive mode

setenforce 0
yum install epel-release wget gcc make openssl-devel
cd /tmp
wget http://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
tar -xzf nagios-plugins-2.2.1.tar.gz
cd nagios-plugins-2.2.1
./configure
make
make install
yum install xinetd
cd ..
wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.2.1/nrpe-3.2.1.tar.gz
tar -xzf nrpe-3.2.1.tar.gz
cd nrpe-nrpe-3.2.1
./configure
make all
make install-groups-users
chown -R nagios:nagios /usr/local/nagios
make install
make install-config
make install-init
service xinetd restart
chkconfig nrpe on && service nrpe start
Now allow the Nagios server to query NRPE
vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,<ip addr of server>
Keep 127.0.0.1 in the list so the local check below keeps working once NRPE is restarted.
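The same edit can be scripted. A sketch, assuming the source-install path and a single uncommented allowed_hosts= line as in the stock nrpe.cfg (the helper name and the 192.0.2.10 address are placeholders). Restart NRPE afterwards so it picks up the change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the allowed_hosts line in nrpe.cfg with
# localhost plus the Nagios server's address.
set_allowed_hosts() {
  local server_ip="$1" cfg="${2:-/usr/local/nagios/etc/nrpe.cfg}"
  sed -i "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,${server_ip}/" "$cfg"
}

# Usage (as root):
#   set_allowed_hosts 192.0.2.10     # placeholder: your Nagios server's IP
#   service nrpe restart
```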

Verify NRPE is running

/usr/local/nagios/libexec/check_nrpe -H localhost
Disable IPv6 (not necessary if compiling from source with different flags)
vim /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
sysctl -p
service nrpe restart

Let’s look through the different plugins

ls /usr/local/nagios/libexec
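Every plugin in that directory can be run by hand, and Nagios decides the service state from its exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. A small wrapper for poking at plugins interactively (hypothetical, not part of Nagios) that runs a plugin and names the state it returned:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a plugin, let it print its output, then print
# the state Nagios would assign based on the plugin's exit code.
run_plugin() {
  local rc=0
  "$@" || rc=$?
  case "$rc" in
    0) echo "-> OK" ;;
    1) echo "-> WARNING" ;;
    2) echo "-> CRITICAL" ;;
    3) echo "-> UNKNOWN" ;;
    *) echo "-> unexpected exit code $rc" ;;
  esac
  return "$rc"
}

# Usage:
#   run_plugin /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
#   run_plugin /usr/local/nagios/libexec/check_load -w 5,4,3 -c 10,8,6
```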

Install NRPE on Nagios Server

wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.2.1/nrpe-3.2.1.tar.gz
tar -xzf nrpe-3.2.1.tar.gz
cd nrpe-nrpe-3.2.1
./configure
make check_nrpe
make install-plugin
Verify that the server can run commands on your NRPE host
/usr/local/nagios/libexec/check_nrpe -H <ip addr of host> -4 -c check_load
Add services.cfg to the main Nagios config
vim /etc/nagios/nagios.cfg
cfg_file=/etc/nagios/objects/services.cfg
Create services.cfg
vim /etc/nagios/objects/services.cfg
define service {
   use sofree-service
   host_name nrpe_test
   service_description check_load
   check_command check_nrpe!check_load
}
define service {
   use sofree-service
   host_name nrpe_test
   service_description check_xvda1
   check_command check_nrpe!check_hda1
}
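The services above use a sofree-service template and a host named nrpe_test, neither of which is defined elsewhere in this walkthrough. One way they might look, as a sketch: it assumes the stock generic-service template, the sofree-host template and ops contact group from earlier, and a placeholder address you would replace with your NRPE host’s IP. The template can live in templates.cfg and the host in hosts.cfg.

```
define service {
    name              sofree-service
    use               generic-service   ; stock template: check period, intervals, notification settings
    contact_groups    ops
    register          0                 ; template only
}

define host {
    host_name         nrpe_test
    alias             NRPE test host
    address           192.0.2.10        ; placeholder: the IP of your NRPE host
    use               sofree-host
}
```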

NRPE commands need to be defined in 3 places

1) On server –> services.cfg, or other .cfg file

check_nrpe!check_load
check_nrpe!check_hda1

2) On server –> commands.cfg

define command {
 command_name check_nrpe
 command_line $USER1$/check_nrpe -u -H $HOSTADDRESS$ -c $ARG1$
}

3) On host –> /etc/nagios/nrpe.cfg (package install) or /usr/local/nagios/etc/nrpe.cfg (source install)

On the host machine, define a command whose name matches the argument you’re passing from the server

command[check_load]=/usr/local/nagios/libexec/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/xvda1

Now go and define a couple of services for your host

Logs are located at /var/log/nagios/nagios.log (a source install logs to /usr/local/nagios/var/nagios.log)

Additional material:

• nagios_email_ack

• Nagdash

• Converting epoch time:

cat /usr/local/nagios/var/nagios.log | perl -pe 's/(\d+)/localtime($1)/e'

• Nagios dynamic

• Custom commands (plugins, external scripts etc). API calls are a great use of external scripts.
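A custom plugin is just an executable that prints one status line and exits with the matching code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); anything after a | on that line is performance data. A minimal sketch of such a plugin in shell: check_dir_count is a hypothetical example that alerts when a directory (a mail or job queue, say) holds too many files.

```shell
#!/usr/bin/env bash
# check_dir_count (hypothetical example plugin)
# Usage: check_dir_count DIR WARN CRIT
# Counts regular files directly inside DIR and follows the plugin convention:
# one status line on stdout, exit 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN.

# True if $1 is a non-empty string of digits
is_uint() {
  case "$1" in ''|*[!0-9]*) return 1 ;; esac
}

check_dir_count() {
  local dir="$1" warn="$2" crit="$3" count perf
  if ! is_uint "$warn" || ! is_uint "$crit"; then
    echo "UNKNOWN - usage: check_dir_count DIR WARN CRIT"
    return 3
  fi
  if [ ! -d "$dir" ]; then
    echo "UNKNOWN - $dir is not a directory"
    return 3
  fi
  count=$(find "$dir" -maxdepth 1 -type f | wc -l)
  # Performance data (label=value;warn;crit) lets Nagios or a grapher track the value
  perf="files=${count};${warn};${crit}"
  if [ "$count" -ge "$crit" ]; then
    echo "CRITICAL - $count files in $dir | $perf"
    return 2
  elif [ "$count" -ge "$warn" ]; then
    echo "WARNING - $count files in $dir | $perf"
    return 1
  fi
  echo "OK - $count files in $dir | $perf"
  return 0
}
```

To use it, save this as /usr/local/nagios/libexec/check_dir_count, add a final line `check_dir_count "$@"`, make it executable, and wire it up on the NRPE host like the other commands, for example command[check_spool]=/usr/local/nagios/libexec/check_dir_count /var/spool/myapp 100 500 (the directory and thresholds are placeholders).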
