Nagios Installation

Base environment

[root@m01 yum.repos.d]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@m01 yum.repos.d]# uname -r
2.6.32-573.el6.x86_64
[root@m01 yum.repos.d]# uname -m
x86_64

1. Prepare three servers
Management IP   Role     Notes
10.0.0.61       nagios   Nagios server
10.0.0.8        web01    Monitored client server
10.0.0.7        web02    Monitored client server

2. Configure the yum repository
Make sure the host can reach the Internet:

[root@m01 ~]# ping www.baidu.com
PING www.a.shifen.com (61.135.169.121) 56(84) bytes of data.
64 bytes from 61.135.169.121: icmp_seq=1 ttl=128 time=3.99 ms

cd /etc/yum.repos.d/
/bin/mv CentOS-Base.repo CentOS-Base.repo.oldboy.ori
wget -O /etc/yum.repos.d/CentOS-Base.repo

3. Work around Perl build problems
[root@m01 yum.repos.d]# echo 'export LC_ALL=C'>>/etc/profile
[root@m01 yum.repos.d]# tail -1 /etc/profile
export LC_ALL=C
[root@m01 yum.repos.d]# source /etc/profile
[root@m01 yum.repos.d]# echo $LC_ALL
C
[root@m01 yum.repos.d]# cd ~

4. Disable the firewall and SELinux
[root@m01 ~]# /etc/init.d/iptables stop
[root@m01 ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@m01 ~]# chkconfig iptables off
[root@m01 ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@m01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

Editing the configuration file makes the change permanent, but it only takes effect after a reboot.

[root@m01 ~]# getenforce
Disabled

5. Keep the system clock synchronized
[root@m01 ~]# crontab -l
#time sync by oldboy at 2010-2-1
*/5 * * * * /usr/sbin/ntpdate time.nist.gov >/dev/null 2>&1

6. Install the packages required by the Nagios server
yum install gcc glibc glibc-common -y
yum install gd gd-devel -y
yum install mysql-server -y
yum install httpd php php-gd -y

[root@m01 ~]# rpm -qa mysql httpd php
httpd-2.2.15-47.el6.centos.4.x86_64
php-5.3.3-46.el6_7.1.x86_64
mysql-5.1.73-5.el6_7.1.x86_64

7. Create the users and groups required by the Nagios server
[root@m01 ~]# /usr/sbin/useradd nagios
[root@m01 ~]# /usr/sbin/useradd apache -M -s /sbin/nologin
useradd: user 'apache' already exists
[root@m01 ~]# /usr/sbin/groupadd nagcmd
[root@m01 ~]# /usr/sbin/usermod -a -G nagcmd nagios
[root@m01 ~]# /usr/sbin/usermod -a -G nagcmd apache
[root@m01 ~]# id -n -G nagios
nagios nagcmd
[root@m01 ~]# id -n -G apache
apache nagcmd

8. Upload the software packages to a working directory, or download them by URL
mkdir -p /home/oldboy/tools/nagios
cd /home/oldboy/tools/nagios
rz

====================================================
Installing the Nagios server

tar xf nagios-3.5.1.tar.gz
cd nagios
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode

1. Install the Nagios web configuration file and create a login user
make install-webconf
htpasswd -bc /usr/local/nagios/etc/htpasswd.users oldboy 123456
cat /usr/local/nagios/etc/htpasswd.users
/etc/init.d/httpd reload

2. Set the email address that receives monitoring alerts
cp /usr/local/nagios/etc/objects/contacts.cfg{,.ori}
sed -i 's#nagios@localhost#976199267@qq.com#g' /usr/local/nagios/etc/objects/contacts.cfg

To send mail through a third-party mail provider, append the following line to /etc/mail.rc:

[root@m01 tools]# tail -1 /etc/mail.rc
set from=18516688992@163.com smtp=smtp.163.com smtp-auth-user=18516688992 smtp-auth-password=tian123 smtp-auth=login

3. Start Apache and enable it at boot
[root@m01 tools]# /etc/init.d/httpd start
Starting httpd:
[root@m01 tools]# /etc/init.d/httpd restart
Stopping httpd: [ OK ]
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 172.16.1.61 for ServerName
[ OK ]
[root@m01 tools]# chkconfig httpd on
[root@m01 tools]# netstat -lntup|grep httpd
tcp 0 0 :::80 :::* LISTEN 53291/httpd

Log in from a browser:

10.0.0.61/nagios

Enter the user name and password (oldboy / 123456). If the Nagios Core page appears, the web interface is working.

4. Install the Nagios plugins package
Install the base dependencies:

yum install perl-devel openssl-devel -y

Install the nagios-plugins package:

wget
[root@m01 tools]# ls nagios-plugins-1.4.16.tar.gz
nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# tar xf nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# cd nagios-plugins-1.4.16
[root@m01 nagios-plugins-1.4.16]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[root@m01 nagios-plugins-1.4.16]# make
[root@m01 nagios-plugins-1.4.16]# make install

5. Install NRPE
ls /usr/local/nagios/libexec/check_nrpe
[root@m01 nagios-plugins-1.4.16]# cd ..
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60

Check the check_nrpe plugin:

[root@m01 tools]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60

This completes the software installation on the Nagios server.

6. Configure and start the Nagios service
Enable Nagios at boot:

[root@m01 tools]# chkconfig nagios on
[root@m01 tools]# chkconfig --list nagios
nagios 0:off 1:off 2:on 3:on 4:on 5:on 6:off

A better approach is to start it from /etc/rc.local:

[root@m01 tools]# echo "/etc/init.d/nagios start">>/etc/rc.local
[root@m01 tools]# tail -1 /etc/rc.local
/etc/init.d/nagios start

Check the configuration syntax:

[root@m01 tools]# /etc/init.d/nagios checkconfig
Running configuration check... OK.

Start the Nagios service:

[root@m01 tools]# /etc/init.d/nagios start
Starting nagios: done.

Check the Nagios server process and port:

[root@m01 tools]# ps -ef |grep nagios|grep -v grep
nagios 15895 1 0 16:41 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
[root@m01 tools]# netstat -lntup|grep nagios

===============================================
Installing the Nagios client

1. Base environment

[root@m01 yum.repos.d]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@m01 yum.repos.d]# uname -r
2.6.32-573.el6.x86_64
[root@m01 yum.repos.d]# uname -m
x86_64

2. Prepare two servers
Management IP   Role     Notes
10.0.0.8        web01    Monitored client server
10.0.0.7        web02    Monitored client server

3. Work around Perl build problems
[root@m01 yum.repos.d]# echo 'export LC_ALL=C'>>/etc/profile
[root@m01 yum.repos.d]# tail -1 /etc/profile
export LC_ALL=C
[root@m01 yum.repos.d]# source /etc/profile
[root@m01 yum.repos.d]# echo $LC_ALL

4. Disable the firewall and SELinux
[root@m01 ~]# /etc/init.d/iptables stop
[root@m01 ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@m01 ~]# chkconfig iptables off
[root@m01 ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@m01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

Editing the configuration file makes the change permanent, but it only takes effect after a reboot.

[root@m01 ~]# getenforce
Disabled

5. Keep the system clock synchronized
[root@m01 ~]# crontab -l
#time sync by oldboy at 2010-2-1
*/5 * * * * /usr/sbin/ntpdate time.nist.gov >/dev/null 2>&1

=============================================
Installation

1. Install the base system packages

yum install gcc glibc glibc-common -y
yum install mysql-server -y

[root@m01 ~]# rpm -qa mysql
mysql-5.1.73-5.el6_7.1.x86_64

2. Upload the software packages to a working directory, or download them by URL

mkdir -p /home/oldboy/tools/nagios
cd /home/oldboy/tools/nagios
rz
unzip -q oldboy_training_nagios_soft.zip

3. Add the nagios user
[root@web01 nagios]# useradd nagios -M -s /sbin/nologin
[root@web01 nagios]# id nagios
uid=508(nagios) gid=508(nagios) groups=508(nagios)

4. Install the nagios-plugins package
[root@web02 nagios]# yum install perl-devel perl-CPAN openssl-devel -y
[root@web02 nagios]# tar xf nagios-plugins-1.4.16.tar.gz
[root@web02 nagios]# cd nagios-plugins-1.4.16
[root@web02 nagios-plugins-1.4.16]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[root@web02 nagios-plugins-1.4.16]# make
[root@web02 nagios-plugins-1.4.16]# make install

Check the number of installed plugins:

[root@web01 nagios]# ls /usr/local/nagios/libexec/|wc -l
61

5. Install NRPE
[root@m01 nagios-plugins-1.4.16]# cd ..
ls /usr/local/nagios/libexec/check_nrpe
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin

The original notes record that the next two commands reported errors:

make install-daemon
make install-daemon-config
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60

Check the check_nrpe plugin:

[root@m01 tools]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60

6. Install the other required plugins
[root@web01 nrpe-2.12]# cd ..
[root@web01 nagios]#
#---------- separator ----------
tar zxf Params-Validate-0.91.tar.gz
cd Params-Validate-0.91
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Class-Accessor-0.31.tar.gz
cd Class-Accessor-0.31
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Config-Tiny-2.12.tar.gz
cd Config-Tiny-2.12
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Math-Calc-Units-1.07.tar.gz
cd Math-Calc-Units-1.07
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Regexp-Common-2010010201.tar.gz
cd Regexp-Common-2010010201
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Nagios-Plugin-0.34.tar.gz
cd Nagios-Plugin-0.34
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
#yum install sysstat -y

If any of these steps reports an error, the Perl environment variable (LC_ALL=C) was not set beforehand.
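The six module installs above repeat one unpack, build, install sequence, so they can be written as a loop. This is only a sketch: it assumes the tarballs are in the current directory and that each one unpacks into a directory named after the tarball without the .tar.gz suffix. The function names module_dir and install_perl_modules are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nagios or CPAN tooling.

# Print the directory a tarball is assumed to unpack into
# (the file name with the .tar.gz suffix removed).
module_dir() {
    basename "$1" .tar.gz
}

# Unpack, build and install each tarball given as an argument, in order.
# Stops at the first failure and returns 1.
install_perl_modules() {
    for tarball in "$@"; do
        dir=$(module_dir "$tarball")
        if ! tar zxf "$tarball"; then
            echo "unpack failed: $tarball" >&2
            return 1
        fi
        # Build in a subshell so the caller's working directory is unchanged.
        if ! (cd "$dir" && perl Makefile.PL && make && make install); then
            echo "build failed: $tarball" >&2
            return 1
        fi
    done
}

# Usage, keeping the order used above:
# install_perl_modules Params-Validate-0.91.tar.gz Class-Accessor-0.31.tar.gz \
#     Config-Tiny-2.12.tar.gz Math-Calc-Units-1.07.tar.gz \
#     Regexp-Common-2010010201.tar.gz Nagios-Plugin-0.34.tar.gz
```

Stopping at the first failure keeps a broken build from being hidden by later output.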
7. Configure the memory and disk I/O monitoring plugins

yum install dos2unix -y
/bin/cp /home/oldboy/tools/nagios/check_memory.pl /usr/local/nagios/libexec/
/bin/cp /home/oldboy/tools/nagios/check_iostat /usr/local/nagios/libexec/
chmod 755 /usr/local/nagios/libexec/check_memory.pl
chmod 755 /usr/local/nagios/libexec/check_iostat
dos2unix /usr/local/nagios/libexec/check_memory.pl
dos2unix /usr/local/nagios/libexec/check_iostat

8. Configure the NRPE service on the Nagios client
cd /usr/local/nagios/etc/
[root@web02 etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1
[root@web01 etc]# sed -i 's#allowed_hosts=127.0.0.1#allowed_hosts=127.0.0.1,10.0.0.61#g' nrpe.cfg
[root@web01 etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1,10.0.0.61

9. Open nrpe.cfg in vi, press Shift+G in command mode to jump to the end of the file, and make the following changes
Step 1: comment out lines 199-203:

#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
#command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
#command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

Step 2: add the items to monitor below them:

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 10% -c 3%
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 15% -c 7% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -w 6 -c 10

10. Start the NRPE daemon on the Nagios client

[root@web02 etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Check that it started:

[root@web02 etc]# netstat -lntup|grep nrpe
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 24505/nrpe
[root@web02 etc]# ps -ef |grep nrpe |grep -v grep
nagios 24505 1 0 19:56 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Restart tip (no restart is needed at this point):
#pkill nrpe
#/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

11. Enable NRPE at boot
[root@web01 etc]# echo "#nagios nrpe process cmd by wangtian 2016-5-22">>/etc/rc.local
[root@web01 etc]# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d">>/etc/rc.local

Check:

[root@web01 etc]# tail -2 /etc/rc.local
#nagios nrpe process cmd by wangtian 2016-5-22
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

===============================================================================

Monitoring from the Nagios server

Edit the main configuration file (beginners can skip this; if you need it, see page 582 of the book):

[root@m01 tools]# cp /usr/local/nagios/etc/nagios.cfg{,.ori}
[root@m01 tools]# vim /usr/local/nagios/etc/nagios.cfg +34

Add the following host and service configuration files:

cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_dir=/usr/local/nagios/etc/objects/services/

Then comment out the following line:

# Definitions for monitoring the local (Linux) host
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

Generate hosts.cfg from the existing data:
[root@m01 tools]# cd /usr/local/nagios/etc/objects/
[root@m01 objects]# head -51 localhost.cfg >hosts.cfg
[root@m01 objects]# chown nagios.nagios /usr/local/nagios/etc/objects/hosts.cfg

Then create a new, empty services.cfg file:
[root@m01 objects]# touch services.cfg
[root@m01 objects]# chown nagios.nagios /usr/local/nagios/etc/objects/services.cfg

Finally, create the directory for service configuration files:

[root@m01 objects]# mkdir services
[root@m01 objects]# chown -R nagios.nagios /usr/local/nagios/etc/objects/services

Check:

[root@m01 objects]# ls -lrt
total 60
-rw-rw-r-- 1 nagios nagios 10812 May 22 15:14 templates.cfg
-rw-rw-r-- 1 nagios nagios  7716 May 22 15:14 commands.cfg
-rw-rw-r-- 1 nagios nagios  3208 May 22 15:14 timeperiods.cfg
-rw-rw-r-- 1 nagios nagios  5403 May 22 15:14 localhost.cfg
-rw-rw-r-- 1 nagios nagios  4019 May 22 15:14 windows.cfg
-rw-rw-r-- 1 nagios nagios  3124 May 22 15:14 printer.cfg
-rw-rw-r-- 1 nagios nagios  3293 May 22 15:14 switch.cfg
-rw-r--r-- 1 root   root    2166 May 22 15:26 contacts.cfg.ori
-rw-rw-r-- 1 nagios nagios  2166 May 22 15:28 contacts.cfg
-rw-r--r-- 1 nagios nagios  1870 May 22 20:36 hosts.cfg
-rw-r--r-- 1 nagios nagios     0 May 22 20:38 services.cfg
drwxr-xr-x 2 nagios nagios  4096 May 22 20:39 services

====================================================================
Configuring monitored items on the Nagios server
1. Define the Nagios client hosts to monitor

[root@m01 objects]# cd /usr/local/nagios/etc/objects/
[root@m01 objects]# cp hosts.cfg{,.ori}
[root@m01 objects]# egrep -v "#|^$" hosts.cfg.ori >hosts.cfg
[root@m01 objects]# vim hosts.cfg

Check:

[root@m01 objects]# cat hosts.cfg
define host{
    use linux-server
    host_name web01
    alias web01
    address 10.0.0.8
}
define host{
    use linux-server
    host_name web02
    alias web02
    address 10.0.0.7
}
define hostgroup{
    hostgroup_name linux-servers
    alias Linux Servers
    members web01,web02
}

2. Configure services.cfg to define the services to monitor
[root@m01 objects]# cp services.cfg{,.ori}
[root@m01 objects]# vim services.cfg
[root@m01 objects]# cat services.cfg
define service {
    use generic-service
    host_name web01,web02
    service_description Disk Partition
    check_command check_nrpe!check_disk
}
define service {
    use generic-service
    host_name web01,web02
    service_description Swap Usage
    check_command check_nrpe!check_swap
}
define service {
    use generic-service
    host_name web01,web02
    service_description MEM Usage
    check_command check_nrpe!check_mem
}
define service {
    use generic-service
    host_name web01,web02
    service_description Current Load
    check_command check_nrpe!check_load
}
define service {
    use generic-service
    host_name web01,web02
    service_description Disk Iostat
    check_command check_nrpe!check_iostat!5!11
}
define service {
    use generic-service
    host_name web01,web02
    service_description PING
    check_command check_ping!100.0,20%!500.0,60%
}

3. Make the hosts.cfg and services.cfg definitions work: add the check_nrpe command to commands.cfg

[root@m01 objects]# cp commands.cfg{,.ori}
[root@m01 objects]# vim commands.cfg
[root@m01 objects]# tail -5 commands.cfg
# 'check_nrpe' command definition
define command{
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

4. Check the syntax
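/etc/init.d/nagios checkconfig

If the check reports OK, start the service:

/etc/init.d/nagios start

If Nagios is already running, run /etc/init.d/nagios reload instead.

Open http://<server IP>/nagios in a browser to see the results.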
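The check-then-start-or-reload rule above can be wrapped in one function. A rough sketch only: it assumes the init script also accepts a status argument (not shown in these notes) whose exit code is 0 when Nagios is running. The function name apply_nagios_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the Nagios init script.
# $1: init script to call (defaults to /etc/init.d/nagios).
apply_nagios_config() {
    init=${1:-/etc/init.d/nagios}

    # Never touch the running service when the configuration is invalid.
    if ! "$init" checkconfig; then
        echo "configuration check failed; service left untouched" >&2
        return 1
    fi

    # Assumption: "status" exits 0 when Nagios is already running.
    if "$init" status >/dev/null 2>&1; then
        "$init" reload
    else
        "$init" start
    fi
}

# Usage:
# apply_nagios_config
```

Running checkconfig first means a typo in hosts.cfg or services.cfg never takes down a monitor that is already running.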
=====================================================================================
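Configuring alerts (email alerting was already set up above; extend to other channels as needed)

Alerting is configured in the contacts.cfg file. All of the company's operations staff can be added to this file, and split into groups if needed. The steps are: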
(1) Add the contacts and contact groups to contacts.cfg:

define contact{
    contact_name oldboy-pager
    use generic-contact
    alias Nagios users
    pager 18901398229
}

(2) Add the alert commands to commands.cfg:

define command {
    command_name notify-host-by-pager
    command_line $USER1$/sms_send "$HOSTSTATE$ alert for $HOSTNAME$" $CONTACTPAGER$
}
define command {
    command_name notify-service-by-pager
    command_line $USER1$/sms_send "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTPAGER$
}

(3) Adjust the contact template to use the alert commands defined in commands.cfg:

define contact{
    name generic-contact
    service_notification_period 24x7
    host_notification_period 24x7
    service_notification_options w,u,c,r,f,s
    host_notification_options d,u,r,f,s
    service_notification_commands notify-service-by-email,notify-service-by-pager
    host_notification_commands notify-host-by-email,notify-host-by-pager
    register 0
}

(4) Add the alert contacts and contact groups to hosts.cfg and services.cfg, or add them in the templates:

contact_groups admins,group1,group2,user1
Troubleshooting tips

(1) The client fails to return a value:
[root@client1 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.0.0.2 -c check_disk
CHECK_NRPE: Error - Could not complete SSL handshake. # handshake failed
# To narrow this down, run the same check against the loopback address:
[root@client1 ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk
# If this returns a value, the network address has not been added: edit the allowed_hosts=127.0.0.1 line in nrpe.cfg
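Editing allowed_hosts by hand is easy to get wrong or to repeat, so here is a small sketch that appends an address only when it is missing. It assumes the file has exactly one uncommented allowed_hosts= line with no spaces around the commas. The function name add_allowed_host is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add_allowed_host CFG IP
# Appends IP to the allowed_hosts line of an nrpe.cfg unless it is already listed.
add_allowed_host() {
    cfg=$1
    ip=$2

    # Fail early if the file has no allowed_hosts line to extend.
    grep -q '^allowed_hosts=' "$cfg" || return 1

    # Escape the dots so 10.0.0.61 does not also match e.g. 10x0y0z61.
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')

    # Already present as a complete comma-separated entry: nothing to do.
    if grep -Eq "^allowed_hosts=(.*,)?${ip_re}(,.*)?\$" "$cfg"; then
        return 0
    fi

    # Keep a backup, then rewrite the line with the new address appended.
    cp "$cfg" "$cfg.bak" &&
    sed "s#^allowed_hosts=.*#&,${ip}#" "$cfg.bak" > "$cfg"
}

# Usage:
# add_allowed_host /usr/local/nagios/etc/nrpe.cfg 10.0.0.61
```

After the file changes, nrpe has to be restarted (pkill nrpe, then start it again with the same -c ... -d command) before the new address is accepted.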
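(2) The status is CRITICAL
# This means the connection failed: either the service is not running or the firewall has not been turned off. Run the check by hand first:
/usr/local/nagios/libexec/check_nrpe -H 10.0.0.2 -c check_disk
# The IP and the command name can be changed as needed. This command reveals the cause, because it is exactly how Nagios fetches the monitored value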
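(3) The command line returns a value, but the web interface does not.
define service {
    use generic-service
    host_name 02-client1,01-nagios
    service_description Disk Partition
    check_command check_nrpe!check_disk # this argument is almost certainly wrong; the name after "!" must match a command[...] entry in the client's nrpe.cfg
}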
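(4) Unable to read output
# This happens when the plugin that fetches the value lacks execute permission, or the plugin itself is broken. Either way, the plugin is at fault.
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 6% -c 3% # check_memory.pl is the plugin
[root@nagios libexec]# chmod +x check_memory.pl # run this command; if it still fails, the plugin itself is the problem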
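Before blaming the plugin itself, two cheap checks cover the usual causes: the execute bit, and Windows line endings on a script (the reason dos2unix was run on check_memory.pl and check_iostat earlier). A sketch; the function name plugin_status and its status words are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: plugin_status PATH
# Prints one of: missing, not-executable, crlf-line-endings, ok
plugin_status() {
    plugin=$1
    cr=$(printf '\r')   # a literal carriage return

    if [ ! -e "$plugin" ]; then
        echo "missing"
    elif [ ! -x "$plugin" ]; then
        echo "not-executable"        # fix with: chmod +x "$plugin"
    elif head -n 1 "$plugin" | grep -q "^#!.*${cr}\$"; then
        echo "crlf-line-endings"     # fix with: dos2unix "$plugin"
    else
        echo "ok"
    fi
}

# Usage:
# plugin_status /usr/local/nagios/libexec/check_memory.pl
```

Only the first line is inspected, and only when it starts with #!, so compiled plugins are not flagged by accident.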
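Summary: when the web interface shows a problem, check in this order:
(1) Nagios itself and its configuration files;
(2) On the server, run:
/usr/local/nagios/libexec/check_nrpe -H <monitored host address> -c <check command name>
(3) On the client, run locally:
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c <check command name>
(4) Run the underlying command from the nrpe.cfg configuration file:
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 8% -p / # run the part after "="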
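For step (4), the command line behind a check name can be pulled out of nrpe.cfg rather than copied by hand. A minimal sketch, assuming check names contain only letters, digits and underscores; the function name nrpe_command_for is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: nrpe_command_for CFG NAME
# Prints the command line defined as command[NAME]=... in CFG.
# Commented-out definitions (lines starting with #) are ignored.
nrpe_command_for() {
    cfg=$1
    name=$2
    sed -n "s/^command\[${name}\]=//p" "$cfg"
}

# Usage on the client: print the command, then run it by hand.
# nrpe_command_for /usr/local/nagios/etc/nrpe.cfg check_disk
```

If this prints nothing, the check name is not defined on the client, which is the situation described in item (3).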