What is Nagios?
Nagios Core – Free and open source
nagios.org
Nagios XI – Enterprise level, paid with support features etc.
nagios.com
Nagios Server
- Runs the host and service checks that you define, holds the configuration definitions, sends notifications (email, third-party phone calls, SMS), and serves the web interface
NRPE – Nagios Remote Plugin Executor
- Local agent that lets the Nagios server run a check on the host you’re monitoring and get the result back (disk usage, CPU load)
Let’s Build It!
Requirements:
CentOS 6.x or 7.x minimal
Ideally 2 Linux VMs, 1 to be the Nagios server and 1 to be a host to check against
Root access
###########################################################################
* Verify that SELinux is disabled or in permissive mode
sudo setenforce 0
This sets SELinux to permissive mode without needing a restart. The change does not persist across reboots.
* Install needed packages
sudo yum install epel-release
sudo yum install nagios nagios-plugins-all nagios-plugins-nrpe nrpe php httpd vim
** If iptables or firewalld is running, you’ll want to open TCP port 80 (web interface) and TCP port 5666 (NRPE). CentOS minimal does not come with these installed **
Change admin password
sudo htpasswd /etc/nagios/passwd nagiosadmin
Enter your password
Enable nagios and httpd on boot
sudo chkconfig httpd on && sudo chkconfig nagios on
Fire it up!
sudo service httpd start
sudo service nagios start
In a web browser:
<ip addr>/nagios
You’ve started Nagios! Login with:
User: nagiosadmin
Password: <the new password you just created>
Now Let’s Look at configuration
sudo su
cd /etc/nagios
ls
cgi.cfg  conf.d/  nagios.cfg  objects/  passwd  private/
objects/ and nagios.cfg are the things you care most about right now
cd objects/
ls
commands.cfg  hosts.cfg      printer.cfg   switch.cfg     timeperiods.cfg
contacts.cfg  localhost.cfg  services.cfg  templates.cfg  windows.cfg
Host Checks
Is the server up? This can be a ping to an ip address, DNS check, or website
Service Checks
CPU load, swap usage, disk utilization, process running etc.
Default install will configure a template for localhost – Go to your <ip addr>/nagios —> On the left-hand side click Services
- Current Load, Current Users, HTTP, PING, Root Partition, SSH, Swap Usage, Total Processes
Hierarchy:
Any .cfg file is processed as a standalone file, so localhost.cfg can be configured by itself. You could use a dynamic CI/CD approach and make a .cfg file for every one of your servers, but that would be crazy town. Or would it? << Show some of the dynamic config we have for prod and stage at Craftsy >>
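The one-.cfg-file-per-server idea can be sketched with a couple of shell functions. This is only a minimal sketch, not the setup used at Craftsy: the function names `gen_host_cfg` and `gen_all_hosts`, the host names, and the addresses are made up for illustration, and it assumes the `linux-server` template from the sample templates.cfg is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Nagios host definition.
# Usage: gen_host_cfg <host_name> <address> [template]
gen_host_cfg() {
    local name="$1" address="$2" template="${3:-linux-server}"
    cat <<EOF
define host {
    use        ${template}
    host_name  ${name}
    alias      ${name}
    address    ${address}
}
EOF
}

# Hypothetical driver: read "name address" pairs from stdin and write
# one .cfg file per host into the directory given as the first argument.
gen_all_hosts() {
    local outdir="$1" name address
    mkdir -p "$outdir"
    while read -r name address; do
        # Skip blank lines in the input.
        if [ -z "$name" ]; then
            continue
        fi
        gen_host_cfg "$name" "$address" > "${outdir}/${name}.cfg"
    done
}
```

For example, `printf 'web01 10.0.0.11\nweb02 10.0.0.12\n' | gen_all_hosts /etc/nagios/conf.d` would write web01.cfg and web02.cfg. Nagios only reads them if nagios.cfg has a cfg_dir= (or cfg_file=) line covering that location.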
You can create templates for teams to alert on certain servers, have phone calls or just emails, non-alerting stuff etc.
Image via: https://www.rittmanmead.com/blog/2012/09/an-introduction-to-monitoring-obiee-with-nagios/
Let’s Define Contacts First
vim contacts.cfg
Let’s make a new contact group, and add a contact to that group
define contactgroup {
    contactgroup_name    ops
    alias                Ops Team
    members              nagiosadmin
}
—> Update nagiosadmin email to be your email
email nagios@localhost ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
Look back at templates.cfg, and the top one for generic-contact
name                             The name that you call in other .cfg configuration files
service_notification_period      The time that you want *service* alerts to fire. Can configure work hours, 24x7 etc.
host_notification_period         The time you want *host* alerts to fire
service_notification_options     Warning (w), Unknown (u), Critical (c), Recovery (r), Flapping (f), Scheduled downtime (s)
host_notification_options        Down (d), Unreachable (u), Recovery (r), Flapping (f), Scheduled downtime (s)
service_notification_commands    Define email alerts, third party integrations (VictorOps, PagerDuty, OpsGenie)
host_notification_commands       Define alerts for host notifications
register                         Partial definition or not (0 makes this a template)
** A note about the register portion. Use this if you are making a template that is a partial object definition. This allows inheritance within other definitions **
So we added our contact to the new contact group we made. The nagiosadmin contact uses the generic-contact template, so that configuration applies to it as well
Let’s Define A Host To Alert On
Make a new file called hosts.cfg
define host {
    host_name    sofree
    alias        Software Freedom School
    address      sofree.us
    use          sofree-host    ; The host template defined in templates.cfg in the next step
    contacts     ops            ; The contact we just made
}
Now Define What The Parameters Of That Host Check Should Be
vim templates.cfg
define host {
    name                    sofree-host
    use                     generic-host        ; This grabs the notification period, notifications enabled, flap detection etc.
    check_period            24x7                ; What hours this should check
    check_interval          5                   ; How often to check, in minutes
    retry_interval          1                   ; How often to retry when it fails, in minutes
    max_check_attempts      10                  ; How many times to check before it alerts. In this config, you will get an alert after roughly 10 minutes of the server being down
    check_command           check-host-alive    ; Another template for how to check the host, currently a simple ping. You may make a different host check for HTTP etc.
    notification-options    d,u,r               ; When should notify happen - Down, Unreachable, Recovery
    contacts                ops                 ; Who to alert, options are contacts or contact_groups
    register                0                   ; Make this a template
}
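The 10-minute figure in the max_check_attempts comment can be sanity-checked with a little shell arithmetic. This is a rough sketch with several assumptions: the default interval_length of 60 seconds (so intervals are minutes), the first failed check counting as attempt 1, and check run time being ignored. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minutes from the first failed check until the
# last attempt, i.e. (max_check_attempts - 1) retries.
# Usage: minutes_to_hard_state <retry_interval> <max_check_attempts>
minutes_to_hard_state() {
    local retry_interval="$1" max_check_attempts="$2"
    echo $(( (max_check_attempts - 1) * retry_interval ))
}

# Hypothetical helper: worst case, where the host goes down right after
# a successful check and a full check_interval passes before the first
# failure is seen.
# Usage: worst_case_minutes <check_interval> <retry_interval> <max_check_attempts>
worst_case_minutes() {
    local check_interval="$1" retry_interval="$2" max_check_attempts="$3"
    echo $(( check_interval + (max_check_attempts - 1) * retry_interval ))
}
```

With the values from the template, `minutes_to_hard_state 1 10` prints 9 and `worst_case_minutes 5 1 10` prints 14, so under these assumptions the alert lands somewhere between 9 and 14 minutes after the host actually goes down.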
service nagios restart
!! You’ll get an error!!
<< notification-options should be notification_options. Be careful of underscores vs dashes. You can use the Nagios pre-flight check to verify your syntax before you restart the service and leave it offline in a bad state >>
nagios -v /etc/nagios/nagios.cfg
You’ll see this error:
Error: Invalid host object directive ' '.
Error: Could not add object property in file '/etc/nagios/objects/templates.cfg' on line 199.
Error processing object config files!
This is because the notification options directive should have an underscore, not a dash
notification_options d,u,r ; When should notify happen - Down, Unreachable, Recovery
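To make the pre-flight check a habit, you can chain it in front of the restart. A minimal sketch, assuming `nagios -v` exits non-zero when the config has errors; the function name `check_then_run` is hypothetical, and the binary and config path are passed in so nothing is hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command only if the Nagios pre-flight check passes.
# Usage: check_then_run <nagios-binary> <main-config> <command> [args...]
check_then_run() {
    local nagios_bin="$1" main_cfg="$2"
    shift 2
    # "nagios -v <config>" verifies the configuration without starting Nagios.
    if "$nagios_bin" -v "$main_cfg"; then
        "$@"
    else
        echo "Pre-flight check failed; not running: $*" >&2
        return 1
    fi
}
```

On the server from this walkthrough that would look like `check_then_run nagios /etc/nagios/nagios.cfg service nagios restart`. If the check fails, the running Nagios is left untouched and the errors from the check stay on screen.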
Tell the Main Config to Include Your New Config files
vim /etc/nagios/nagios.cfg
cfg_file=/etc/nagios/objects/hosts.cfg
Let’s see the new config!
service nagios restart
Now everyone else go and set up a single host check. Use the sofree.us site or another one of your favorites.
Install NRPE on a separate host
Disable SELinux
setenforce 0
yum install epel-release wget gcc openssl-devel
cd /tmp
wget http://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
tar -xzf nagios-plugins-2.2.1.tar.gz
cd nagios-plugins-2.2.1
./configure
make
make install
yum install xinetd
cd ..
wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.2.1/nrpe-3.2.1.tar.gz
tar -xzf nrpe-3.2.1.tar.gz
cd nrpe-nrpe-3.2.1
./configure
make all
make install-groups-users
chown -R nagios.nagios /usr/local/nagios
make install
make install-config
make install-init
service xinetd restart
chkconfig nrpe on && service nrpe start
Now you need to allow the Nagios server to access NRPE plugins
vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=<ip addr of server>
Verify NRPE is running
/usr/local/nagios/libexec/check_nrpe -H localhost
Disable IPv6 (not necessary if compiling from source and adding different flags)
vim /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
sysctl -p
service nrpe restart
Let’s look through the different plugins
ls /usr/local/nagios/libexec
Install NRPE on Nagios Server
wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.2.1/nrpe-3.2.1.tar.gz
tar -xzf nrpe-3.2.1.tar.gz
cd nrpe-nrpe-3.2.1
./configure
make check_nrpe
make install-plugin
Verify that server can run commands against your defined host
/usr/local/nagios/libexec/check_nrpe -H <ip addr of host> -4 -c check_load
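When you run a plugin by hand like this, the exit status is what Nagios turns into a state: 0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN. Here is a small sketch that prints the state next to the plugin output. The names `state_name` and `run_check` are hypothetical helpers, not part of Nagios or NRPE.

```shell
#!/usr/bin/env bash
# Map a plugin exit status to the Nagios state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT_OF_RANGE" ;;   # outside the plugin convention
    esac
}

# Hypothetical wrapper: run any plugin command, print "<STATE>: <output>",
# and pass the plugin's exit status through.
run_check() {
    local output status
    if output="$("$@")"; then
        status=0
    else
        status=$?
    fi
    echo "$(state_name "$status"): ${output}"
    return "$status"
}
```

Usage would be `run_check /usr/local/nagios/libexec/check_nrpe -H <ip addr of host> -4 -c check_load`, which prefixes whatever check_load reports with the state name.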
Add services.cfg to the main nagios config
vim /etc/nagios/nagios.cfg
cfg_file=/etc/nagios/objects/services.cfg
Create services.cfg
vim /etc/nagios/objects/services.cfg
define service {
    use                    sofree-service
    host_name              nrpe_test
    service_description    check_load
    check_command          check_nrpe!check_load
}
define service {
    use                    sofree-service
    host_name              nrpe_test
    service_description    check_xvda1
    check_command          check_nrpe!check_hda1
}
NRPE commands need to be defined in 3 places
1) On server –> services.cfg, or other .cfg file
check_nrpe!check_load
check_nrpe!check_hda1
2) On server –> commands.cfg
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -u -H $HOSTADDRESS$ -c $ARG1$
}
3) On host –> /etc/nagios/nrpe.cfg (package install) or /usr/local/nagios/etc/nrpe.cfg (source install)
On host machine, match up the command with the argument you’re passing
command[check_load]=/usr/local/nagios/libexec/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/xvda1
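To see how the three places fit together, here is a sketch of how `check_nrpe!check_load` gets filled into the command_line from commands.cfg. This is an illustration only, not how Nagios is implemented: `expand_check_nrpe` is a made-up function, it handles only the check_nrpe command with a single argument, and the plugin directory stands in for whatever $USER1$ is set to on your server.

```shell
#!/usr/bin/env bash
# Illustration only: expand "check_nrpe!<arg1>" into the command line from
# the check_nrpe command definition.
# Usage: expand_check_nrpe <check_command> <host address> <plugin dir for $USER1$>
expand_check_nrpe() {
    local check_command="$1" host_address="$2" plugin_dir="$3"
    local sep='!'
    local command_name="${check_command%%"$sep"*}"   # text before the first "!"
    local arg1="${check_command#*"$sep"}"            # text after the first "!"
    if [ "$command_name" != "check_nrpe" ]; then
        echo "unsupported command: ${command_name}" >&2
        return 1
    fi
    # Mirrors: $USER1$/check_nrpe -u -H $HOSTADDRESS$ -c $ARG1$
    echo "${plugin_dir}/check_nrpe -u -H ${host_address} -c ${arg1}"
}
```

The `-c check_load` at the end is the name that NRPE looks up in the `command[check_load]=...` line of nrpe.cfg on the host, which is why the names have to match in all three places.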
Now you go and define a couple services on your host
Logs are located at /var/log/nagios/nagios.log
Additional material:
• nagios_email_ack
• Nagdash
• Converting epoch time:
perl -pe 's/(\d+)/localtime($1)/e' /var/log/nagios/nagios.log
• Nagios dynamic
• Custom commands (plugins, external scripts etc). API calls are a great use of external scripts.
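A custom plugin is just a script that prints one line of status and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch of a plugin body, assuming integer values where higher is worse. The names `is_uint` and `check_value` are hypothetical, and where the value comes from (an API call, a file, another command) is up to you.

```shell
#!/usr/bin/env bash
# Return success if the argument is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical plugin body: compare an integer against thresholds.
# Usage: check_value <value> <warn> <crit>
check_value() {
    local value="$1" warn="$2" crit="$3"
    if ! is_uint "$value" || ! is_uint "$warn" || ! is_uint "$crit"; then
        echo "UNKNOWN - usage: check_value <value> <warn> <crit>"
        return 3
    fi
    # Text after the "|" is performance data: label=value;warn;crit
    local perfdata="value=${value};${warn};${crit}"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is ${value} | ${perfdata}"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is ${value} | ${perfdata}"
        return 1
    fi
    echo "OK - value is ${value} | ${perfdata}"
    return 0
}
```

To use something like this as a real check, you would save it as a script in the plugin directory, end it with `check_value "$@"; exit $?`, and then wire it up the same way as check_load: a command definition on the server, or a command[...] line in nrpe.cfg on the host.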