How to make Nagios configuration easier

Environment: Red Hat Enterprise Linux 5 or CentOS 5
Objective: Some tips to make Nagios configuration easier


Steps:

1. Set up the monitoring host (localhost) first.

2. Follow the NRPE PDF document on the Nagios website to configure each monitored host.
note:
Install the Nagios plugins and NRPE, then add the monitoring host's IP address to /etc/xinetd.d/nrpe as follows:
       only_from       = 127.0.0.1 10.0.0.20
Use 'chkconfig xinetd on' so that xinetd starts automatically at boot, then run 'service xinetd restart'.

3. Public services monitoring
From localhost, which is the Nagios core monitoring host, you can add some public services for monitoring.

It's better to use a separate configuration file for each monitored host and for the public services, because later you can use vi to globally replace the server name and IP address easily.

In /usr/local/nagios/etc/nagios.cfg, add the following to the cfg_file section:
cfg_file=/usr/local/nagios/etc/objects/server1.cfg
cfg_file=/usr/local/nagios/etc/objects/server2.cfg
cfg_file=/usr/local/nagios/etc/objects/publicservices.cfg

For publicservices.cfg, here is an example:
define service{
        use                   generic-service
        max_check_attempts    2
        host_name             localhost
        service_description   tomcat on app server 1
        check_command         check_tomcat_app1
}


define command{
        command_name   check_tomcat_app1
        command_line   $USER1$/check_http -I ipaddress -u /url/jsp/index.jsp -p 8443 -S -s "Having trouble"
}


define service{
        use                   generic-service
        max_check_attempts    1
        host_name             localhost
        service_description   real user login test
        check_command         check_www
}


define command{
        command_name   check_www
        command_line   $USER1$/nagios.sh ipaddress
}

note: the nagios.sh script must exit with a Nagios return code: 0 means OK, 1 means WARNING, 2 means CRITICAL (and 3 means UNKNOWN).
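The exit-code convention is easiest to see in a small plugin. The sketch below is a hypothetical nagios.sh, not the original script: it assumes the "real user login test" can be approximated by timing one HTTP request with curl to a made-up /login.jsp page, and the function names and the 2000/5000 ms thresholds are illustrative only.

```shell
#!/bin/sh
# nagios.sh - sketch of a Nagios plugin (the check itself is hypothetical).
# Nagios uses the first line of output and the exit code:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN

WARN_MS=2000   # hypothetical warning threshold in milliseconds
CRIT_MS=5000   # hypothetical critical threshold in milliseconds

# nagios_status ELAPSED_MS WARN_MS CRIT_MS
# Print a one-line status and return the matching Nagios code.
nagios_status() {
    if [ -z "$1" ]; then
        echo "UNKNOWN - no measurement"
        return 3
    elif [ "$1" -ge "$3" ]; then
        echo "CRITICAL - login page took $1 ms"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - login page took $1 ms"
        return 1
    fi
    echo "OK - login page took $1 ms"
    return 0
}

# main IPADDRESS: time one HTTP request to a hypothetical /login.jsp page
main() {
    if [ $# -lt 1 ]; then
        echo "UNKNOWN - usage: $0 ipaddress"
        return 3
    fi
    start=$(date +%s%N)   # GNU date: nanoseconds since the epoch
    if ! curl -s -f -o /dev/null --max-time 10 "http://$1/login.jsp"; then
        echo "CRITICAL - cannot fetch login page from $1"
        return 2
    fi
    end=$(date +%s%N)
    nagios_status $(( (end - start) / 1000000 )) "$WARN_MS" "$CRIT_MS"
}

# Run the check only when executed as nagios.sh, so the functions
# can also be sourced and tested on their own.
case "$0" in
    *nagios.sh) main "$@"; exit $? ;;
esac
```

Run it by hand first, e.g. './nagios.sh 10.0.0.20; echo $?', to confirm the status line and exit code before Nagios calls it through $USER1$/nagios.sh.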


4. Individual server configuration
For server1.cfg:

define host{
        use                   generic-host
        host_name             server1.domain.com
        max_check_attempts    2
        address               1.2.3.4
}



define service{
        use                   generic-service
        host_name             server1.domain.com
        service_description   Current Load
        check_command         check_nrpe!check_load
}


define service{
        use                   generic-service
        host_name             server1.domain.com
        service_description   usrlocal disk usage
        check_command         check_nrpe!check_usrlocal
}

In nrpe.cfg on server1, you need to define check_usrlocal; check_load is already defined by default.

For server2, you can copy server1.cfg, then use vi to globally replace server1 with server2 and change the IP address.
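The vi global replace can also be done non-interactively with sed. This is a minimal sketch; the function name clone_host_cfg is hypothetical, and it assumes each host's file is named after its short host name (server1.cfg), as in the nagios.cfg lines above.

```shell
#!/bin/sh
# clone_host_cfg OLD_NAME NEW_NAME OLD_IP NEW_IP DIR
# Copy DIR/OLD_NAME.cfg to DIR/NEW_NAME.cfg, replacing the host name and
# IP address everywhere (the sed equivalent of vi's :%s/old/new/g).
clone_host_cfg() {
    old_name=$1 new_name=$2 old_ip=$3 new_ip=$4 dir=$5
    # Escape the dots so that 1.2.3.4 only matches that literal address.
    old_ip_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    sed -e "s/$old_name/$new_name/g" -e "s/$old_ip_re/$new_ip/g" \
        "$dir/$old_name.cfg" > "$dir/$new_name.cfg"
}
```

For example, 'clone_host_cfg server1 server2 1.2.3.4 1.2.3.5 /usr/local/nagios/etc/objects', then add the new cfg_file line to nagios.cfg and check the configuration with 'nagios -v'.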

5. Windows monitored host
Install NSClient++. During installation, type '127.0.0.1,1.2.3.4' in 'allowed hosts' (1.2.3.4 is your monitoring host), and tick 'enable common check plugins' and 'enable NRPE server'.
After installation, uncomment the CheckExternalScripts.dll line in c:\program files\nsclient++\nsc.ini so that you can use the alias section.

# windows1.cfg
define host{
   use generic-host
  host_name windows1.domain.com
  address 2.3.4.5
}


define service{
        use                   generic-service
        host_name             windows1.domain.com
        service_description   CPU Load
        check_command         check_nrpe!alias_cpu
}

note: other aliases include alias_disk, alias_up, alias_service and alias_mem.


6. FAQ
a. Build error:

check_http.c:807: undefined reference to `np_net_ssl_write`.
Solution: run 'make clean' first, then run 'make' again.

b. How to check the main Nagios configuration file for syntax errors?
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

c. How to check a remote server without check-host-alive (e.g. when it does not answer ping)?
Use the check_ssh command as the host check instead:

     check_command           check_ssh

d. ./configure hangs for nagios-plugins on RHEL4
If the configure script appears to hang on this line:

checking for redhat spopen problem...


use './configure --enable-redhat-pthread-workaround' instead.

Linux bash scripting common usage guide

Jephe Wu - http://linuxtechres.blogspot.com

Objective: when you write a bash script, it helps to follow certain steps. I've summarized some tips that I've been using.



Parameters and printing script usage:
The first thing after the #!/bin/sh line should be the usage check for your script, especially when your script needs parameters:

if  [ $# -ne 1 ];then echo "Usage: $0 username";exit 1;fi

note: the script requires a username as its parameter, so the line above prints the usage and exits unless exactly one parameter is given.

Path definition:
You should export the PATH variable first, like this:
export PATH=/usr/bin:/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin

Common tips in the script:

  • Make the history command show the date and time
# export HISTTIMEFORMAT="%h/%d - %H:%M:%S "
# history | less
  • check a command's return code
command
if [ $? -ne 0 ];then
commands
exit 1
fi

  • substitute part of a variable (parameter expansion)
 A="DATABASE/script.sh";B=${A/DATABASE/orcl};echo $B
it will print: orcl/script.sh
  • while loop to read each line from /path/to/file and process it
while read -r line
do
echo "$line"
done < /path/to/file

or, to split each line on '|' into two fields:
while IFS='|' read -r left right
do
echo "$left $right"
done < /path/to/filename

# another method (slower, because it runs sed once per line)
i=1
LINES=`wc -l < /path/to/file`
while [ $i -le $LINES ]
do
iLINE=`sed -n "${i}p" /path/to/file`
echo "$iLINE"
i=$((i+1))
done

  • specify the return value (use return inside a function and exit in the main script; nothing after return or exit is executed)
if [ $? -eq 0 ];then
return 0
else
return 2
fi
  • seq command usage in a for loop (-w pads the numbers to equal width)
 for i in `seq -w 1 10`;do echo $i;done
prints 01, 02, 03 ... 09, 10, one number per line

  • Always use "bash -n scriptname" to check your script for syntax errors.
  • check if the file size is 0
if [ ! -s /path/to/filename ];then commands ;fi
note: -s is true when the file exists and is not empty, so the commands run when /path/to/filename is empty or does not exist.

  • use mktemp and mktemp -d
Use mktemp to create a temporary file and mktemp -d to create a temporary directory, and store their names in variables:
# TMPFILE1=`mktemp`
# TMPDIR1=`mktemp -d`
  • how to make ls behave like cd
alias ls=cd

or
ls () { builtin cd "$@"; }

note: to make ccd run cd, pwd and foo in one go:

alias foo='echo testing builtin'
ccd () { builtin cd "$1";pwd;foo; }

so 'ccd /tmp' changes to /tmp, prints the current working directory, then prints the string 'testing builtin'.
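The tips above can be combined into one small script. This is a sketch under assumptions: the script name userinfo.sh and the functions die and show_entry are hypothetical, and getent is used to look up the user given on the command line. It shows the usage check, a return-code check, mktemp with a trap for cleanup, and a while/IFS read loop.

```shell
#!/bin/bash
# userinfo.sh - hypothetical script that puts the tips above together.
export PATH=/usr/bin:/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin

# die MESSAGE...: print an error on stderr and stop with status 1
die() {
    echo "ERROR: $*" >&2
    exit 1
}

# show_entry: read passwd-style lines on stdin and split each one on ':'
show_entry() {
    while IFS=':' read -r name _ uid gid _ home shell
    do
        echo "$name uid=$uid gid=$gid home=$home shell=$shell"
    done
}

main() {
    # usage check: exactly one parameter is required
    if [ $# -ne 1 ]; then
        echo "Usage: $0 username" >&2
        exit 1
    fi
    # temporary file, removed automatically when the script exits
    TMPFILE=$(mktemp) || die "mktemp failed"
    trap 'rm -f "$TMPFILE"' EXIT
    # check the return code before carrying on
    getent passwd "$1" > "$TMPFILE" || die "no such user: $1"
    show_entry < "$TMPFILE"
}

# run main only when executed as userinfo.sh (the functions can be sourced for testing)
case "$0" in
    *userinfo.sh) main "$@" ;;
esac
```

Running './userinfo.sh root' prints one line such as 'root uid=0 gid=0 home=/root shell=/bin/bash'; running it without a parameter prints the usage and exits with status 1.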


Common tips:
  1. batch rename files
rename 1.txt and 2.txt to 1.bat and 2.bat:

ls *.txt | sed 's#\(.*\)\.txt$#mv \1.txt \1.bat#' | sh
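Parsing ls output breaks on file names that contain spaces. A sketch of a safer alternative, looping over a glob and using ${f%.txt}-style suffix removal; rename_ext is a hypothetical helper name.

```shell
#!/bin/sh
# rename_ext OLD NEW [DIR]: rename every *.OLD file in DIR (default .) to *.NEW
# A glob is used instead of parsing ls, so names with spaces are safe.
rename_ext() {
    old=$1 new=$2 dir=${3:-.}
    for f in "$dir"/*."$old"; do
        [ -e "$f" ] || continue          # the glob matched nothing
        mv -- "$f" "${f%.$old}.$new"     # strip the old suffix, add the new one
    done
}
```

'rename_ext txt bat' renames 1.txt and 2.txt in the current directory to 1.bat and 2.bat.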
     
  2. use bc to calculate from the command line

# echo "scale=2;34359730176/1024/1024/1024" | bc
31.99
# echo "ibase=10;obase=16;10" | bc
A
# echo "ibase=16;obase=A;A" | bc
10
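For integer-only work, printf and shell arithmetic can do the same conversions without bc; bc is still needed for fractional results such as 31.99 above. The helper names below are hypothetical.

```shell
#!/bin/sh
to_hex() { printf '%X\n' "$1"; }               # decimal -> hex, e.g. 10 -> A
to_dec() { printf '%d\n' "0x$1"; }             # hex -> decimal, e.g. A -> 10
to_gib() { echo $(( $1 / 1024 / 1024 / 1024 )); }  # bytes -> whole GiB
```

Shell arithmetic truncates like bc with scale=0, so 34359730176 bytes gives 31 GiB rather than 31.99.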

  3. allow only the root user to run the script
if [ "$(id -u)" != "0" ]; then
echo "Only root can run this script"
exit 1
fi


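4. vmstat with timestamp


# more vmstat.sh
#!/bin/sh
# print every vmstat line followed by the current date and time
eg () {
  while read -r line
  do
    printf '%s' "$line"
    date '+ %m-%d-%Y %H:%M:%S'
  done
}

# delete logs older than 15 days, then log one vmstat sample every 10 seconds
find /var/adm/logs/vmstat.log* -type f -mtime +15 -exec rm -f {} \;
vmstat 10 | eg >> /var/adm/logs/vmstat.log.`date +%Y%m%d`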

in crontab (restart the script every midnight so that each day gets its own log file):

0 0 * * * sleep 1;kill `ps -ef | grep -v  grep  |grep vmstat.sh | awk '{print $2}'`;sleep 3;nohup /home/oracle/bin/vmstat.sh &


5. variables in a for loop (arrays need bash, not plain sh)


servers=("root@server1" "root@server2")
for server in "${servers[@]}"
do
    ssh -n "$server" .....
    # more commands
done

or


Files=(
"primary.xml.gz"
)
for file in "${Files[@]}"
do
    echo "$file"   # commands that use "$file"
done

6. create a sparse file and detect it


[root@oratest ~]# dd if=/dev/zero of=filea bs=1M count=0 seek=5
0+0 records in
0+0 records out
0 bytes (0 B) copied, 3.8969e-05 seconds, 0.0 kB/s
[root@oratest ~]# ls -l filea
-rw-r--r-- 1 root root 5242880 Jul 19 12:25 filea
[root@oratest ~]# du -sm filea
0 filea
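[root@oratest ~]# ls -lh filea -s
0 -rw-r--r-- 1 root root 5.0M Jul 19 12:25 filea
note: ls -l shows the apparent size (5 MB), but du and ls -s show that no blocks are allocated, which means the file is sparse.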
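The same check can be scripted by comparing the apparent size with the allocated space. A minimal sketch, assuming GNU coreutils stat (the %s, %b and %B format sequences); is_sparse is a hypothetical name.

```shell
#!/bin/sh
# is_sparse FILE: succeed if FILE has fewer allocated bytes than its
# apparent size, i.e. it contains holes. Needs GNU stat.
is_sparse() {
    # %s = apparent size in bytes, %b = allocated blocks, %B = size of each block
    set -- $(stat -c '%s %b %B' "$1")
    [ $(( $2 * $3 )) -lt "$1" ]
}
```

For the example above, 'is_sparse filea && echo sparse' prints 'sparse', since filea is 5242880 bytes long but has 0 blocks allocated.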


How to remotely get Cisco router serial number?

Environment: Cisco 3845 router
Objective: get serial number of router remotely

Steps:

You can use one of the following commands to get the Cisco processor board ID / chassis serial number:

show snmp (Chassis)
show ver  (Processor board ID)
show diag | include Chassis
show inventory (Chassis SN which should be the first one)

note: related commands:

snmpget -v1 -c communitystring IP mib-2.47.1.1.1.1.11.1   (entPhysicalSerialNum from the ENTITY-MIB)
snmp-server chassis-id string   (in configuration mode, i.e. after 'conf t')
 

How to create a global readonly user in Oracle

Jephe Wu - http://linuxtechres.blogspot.com


Objective: create an Oracle user that has read-only access to all schemas in the database.


Steps:

  • create the user and assign its tablespaces first.
sqlplus / as sysdba
create user jephe identified by password default tablespace users temporary tablespace temp;
  • create a role named 'readonly', grant privileges to it and assign it to users
sqlplus / as sysdba
drop role readonly;
create role readonly;
grant create session to readonly;
grant select any table to readonly;
grant select any sequence to readonly;
grant select any dictionary to readonly;
grant execute any type to readonly;
grant readonly to jephe;
exit;
  • You can switch the current schema after logging in as jephe:
select sys_context('USERENV','SESSION_USER') from dual;
select sys_context('USERENV','SESSION_SCHEMA') from dual;
alter session set current_schema=schemaname;
note: in DB2, use the 'set schema = ABC' command, but tables you create will still be owned by the connecting user.

note: you can check your privileges after logging in:

select * from session_privs;
select * from session_roles;

How to clean up disk space for Oracle 11g database server

Jephe Wu - http://linuxtechres.blogspot.com

Environment: OEL5 64bit OS and Oracle 11g 64bit database server
Objective: the temp and undo tablespaces occupy too much disk space and need to be cleaned up to save space.


Steps:
 1. For shrinking the undo tablespace:
$ sqlplus / as sysdba
select name from v$datafile;

create undo tablespace UNDOTBS2 datafile '/path/to/undotbs2.dbf' size 100m autoextend on next 20m flashback off;
alter system set undo_tablespace=undotbs2;



Then go to Enterprise Manager to find the old undo tablespace name, e.g. undotbs1, and drop it:
drop tablespace undotbs1 including contents and datafiles;

Most likely, you have to bounce (restart) the database to give the disk space back to the OS, so do this:


sqlplus / as sysdba
alter system checkpoint;
shutdown immediate;
startup;




note: how to find the tablespace name, owner and datafile name:


select distinct s.owner,s.tablespace_name,d.file_name from dba_segments s,dba_data_files d where s.tablespace_name = d.tablespace_name;


Reference:
Metalink: How to Shrink the datafile of Undo Tablespace [ID 268870.1]
 
  


2. For shrinking the temp tablespace:
Oracle 11g can shrink a temporary tablespace online; just run this:


alter tablespace temp shrink space;

to shrink the temp tablespace to the minimum possible size.
Before that, you can use the commands below to find the temp files and their free space:

select file_name,bytes,blocks from dba_temp_files;
SELECT * FROM dba_temp_free_space;


Once you know the minimum possible size, you can also use the commands below:
alter tablespace temp shrink space keep 10m;
alter tablespace temp shrink tempfile '/path/to/file.dbf' [keep 20m];

To check which temporary tablespace a user is assigned to:
select username,temporary_tablespace from dba_users where username = 'SCHEMA_NAME';
 
For shrinking the temp tablespace in database 10g, refer to
Metalink: How to Shrink the datafile of Temporary Tablespace [ID 273276.1]
SQL> create temporary tablespace TEMP1 tempfile 'c:\temp01.dbf' size 100M extent management local uniform size 128K;
SQL> alter database default temporary tablespace TEMP1;
SQL> alter user <username> temporary tablespace TEMP1;   -- optional
SQL> DROP TABLESPACE temp INCLUDING CONTENTS AND DATAFILES;
 
and

Metalink: Resizing (or Recreating) the Temporary Tablespace [ID 409183.1]

SQL> ALTER DATABASE TEMPFILE '/u02/oradata/TESTDB/temp01.dbf' DROP INCLUDING DATAFILES;

Database altered.

SQL> ALTER TABLESPACE temp ADD TEMPFILE '/u02/oradata/TESTDB/temp01.dbf' SIZE 512m
  2  AUTOEXTEND ON NEXT 250m MAXSIZE UNLIMITED;

Tablespace altered.

On some platforms (i.e. Windows 2000), it is possible for the tempfile to be deleted from DBA_TEMP_FILES but not from the hard drive of the server. If this occurs, simply delete the file using regular O/S commands.

SQL> SELECT tablespace_name, file_name, bytes
  2  FROM dba_temp_files WHERE tablespace_name = 'TEMP';

TABLESPACE_NAME   FILE_NAME                        BYTES
----------------- -------------------------------- --------------
TEMP              /u02/oradata/TESTDB/temp01.dbf      536,870,912

If users are currently accessing the tempfile that you are attempting to drop, you may receive the following error:

SQL> ALTER DATABASE TEMPFILE '/u02/oradata/TESTDB/temp01.dbf' DROP INCLUDING DATAFILES;
ALTER DATABASE TEMPFILE '/u02/oradata/TESTDB/temp01.dbf' DROP INCLUDING DATAFILES
*
ERROR at line 1:
ORA-25152: TEMPFILE cannot be dropped at this time
 
How to generate AWR, ADDM and ASH reports from the command line?
(in SQL*Plus, ? is shorthand for $ORACLE_HOME)
@$ORACLE_HOME/rdbms/admin/awrrpt.sql
@$ORACLE_HOME/rdbms/admin/addmrpt.sql
@?/rdbms/admin/ashrpt.sql
 
How to create temporary tablespace
create temporary tablespace temp2 tempfile '/data/tb_temp/temp2.dbf' size 100m autoextend on next 10m flashback off; 
ALTER USER user1 TEMPORARY TABLESPACE temp2;

How to send email from the Linux or Windows command line

Environment: RHEL servers
Objective: sending email directly from Linux CLI


Methods:

1. use Sendmail command:
(echo "From: jephe.wu@domain.com";echo "To: recipient@domain.com";echo "Subject: testing";echo "";cat filename) | /usr/sbin/sendmail -v jephe.wu@domain.com
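The subshell above simply prints the headers, a blank line and the body. A sketch of the same idea as a reusable function; mail_file is a hypothetical name, and it pipes into sendmail -t, which reads the recipients from the To: header instead of the command line.

```shell
#!/bin/sh
# mail_file FROM TO SUBJECT FILE: print a message (headers, blank line,
# then the body taken from FILE) on stdout, ready to pipe into sendmail.
mail_file() {
    printf 'From: %s\nTo: %s\nSubject: %s\n\n' "$1" "$2" "$3"
    cat "$4"
}
```

Usage: 'mail_file jephe.wu@domain.com recipient@domain.com testing filename | /usr/sbin/sendmail -t'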


2. Use the mutt command:
[jephe@app1 ~]$ more .muttrc
set from="app1@domain.com"
set envelope_from=yes

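note: or put the above .muttrc settings in /etc/Muttrc for all users.

Edit /home/jephe/.muttrc as above, then send like this:
$ mutt -a /etc/hosts jephe.wu@domain.com < /dev/null
(with mutt 1.5.15 and later, put -- between the attachment and the recipient: mutt -a /etc/hosts -- jephe.wu@domain.com)

note: for all the muttrc options, read the man page:
$ export LANG=en_US
$ man muttrc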

3. use email (email.cleancode.org)
or ssmtp (install EPEL < http://fedoraproject.org/wiki/EPEL > then yum install ssmtp)
or msmtp (http://msmtp.sourceforge.net/) (msmtp is recommended),
or mailsend (http://www.muquit.com/muquit/software/mailsend/mailsend.html )
or http://untroubled.org/nullmailer/

4. Windows command line email client
http://www.blat.net/


How to resize LVM2 partitions online

Jephe Wu  - http://linuxtechres.blogspot.com

Environment: Fedora Core 3 or RHEL servers with LVM partitions
Objective: increase partition sizes online, including the /, /usr and /var partitions


Steps:

1. Log in as root and run 'vgdisplay' to find the number and size of the free physical extents (PE):

  Total PE              2039
  Alloc PE / Size       1745 / 54.53 GB
  Free  PE / Size       294 / 9.19 GB

2. Run 'vgdisplay -v' to find the name of the logical volume (LV) you want to increase; let's say it's /dev/VolGroup00/LogVol06.

3. Increase the LV by 5 physical extents (PE):
lvm lvresize -l +5 /dev/VolGroup00/LogVol06

4. Grow the file system online with ext2online (RHEL 4 and Fedora; it was removed starting from Fedora 6) or resize2fs (RHEL 5):

ext2online /dev/VolGroup00/LogVol06 [newsize]
resize2fs  /dev/VolGroup00/LogVol06
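Steps 3 and 4 can be wrapped in one helper. This is a minimal sketch: grow_lv and run are hypothetical names, it assumes an ext3 file system that resize2fs can grow online (RHEL 5), and setting DRY_RUN=1 only prints the commands so you can review them first.

```shell
#!/bin/sh
# run CMD...: execute CMD, or just print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grow_lv LV EXTENTS: add EXTENTS physical extents to the logical volume LV,
# then grow the file system on it to fill the new space
grow_lv() {
    run lvm lvresize -l "+$2" "$1" || return 1
    run resize2fs "$1"
}
```

'DRY_RUN=1 grow_lv /dev/VolGroup00/LogVol06 5' prints the two commands. With the vgdisplay output in step 1 (294 free PE = 9.19 GB), each PE is 32 MB, so 5 PEs add 160 MB.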

Reduce the /data partition and increase the / partition
Objective: all LVM physical extents are already allocated and none are free, so /data has to be reduced before / can be increased. /data must be unmounted while it is shrunk; / is grown online.
Environment: CentOS 5.4; reduce /data from 100G to 50G and increase / from 15G to 30G, which still leaves 35G free for future use
Steps:
umount /data
e2fsck -f /dev/VolGroup01/LogVol02
resize2fs /dev/VolGroup01/LogVol02 50G
lvreduce -L 50G /dev/VolGroup01/LogVol02
e2fsck -f /dev/VolGroup01/LogVol02
mount /data

# grow / online from 15G to 30G, then grow the file system to use the full 30G
lvresize -L 30G /dev/VolGroup01/LogVol00
resize2fs /dev/VolGroup01/LogVol00


vgdisplay -v

Use RIP Linux to shrink the LVM root partition of a default CentOS 5.3 installation

Environment: default installation of CentOS 5.3 with a very big / (root) LVM partition
Objective: reduce the root LVM partition to a 20G file system

Steps:

1. Boot from a RIP Linux CD.
2. Run 'lvm vgchange -a y VolGroup00' to activate the volume group and its logical volumes.
3. Run e2fsck -f /dev/VolGroup00/LogVol00, then resize2fs /dev/VolGroup00/LogVol00 20G (resize2fs refuses to shrink a file system that has not just been checked).
4. lvm lvresize --size 20G /dev/VolGroup00/LogVol00
note: to be extra safe, you might resize the logical volume to 21G instead of 20G, but according to the Red Hat article below, it is not required.
5. Reboot.



How do I reduce the size of the root file system after installation Red Hat Enterprise Linux 5?

I have attached the above article from Red Hat below:

Release Found: Red Hat Enterprise Linux 5


The default file system layout from the Red Hat Enterprise Linux 5 installation process includes a separate space for /boot and swap space, then gives all remaining space to one logical volume and uses that logical volume as the root / volume.

Integrating all data files and system files in one file system is not always an ideal choice for production systems. If the system cannot be reinstalled, it is possible to reduce the size of the root file system and the logical volume on which it resides.

Reducing the logical volume on the root / volume must be done in rescue mode.

First, boot the system from Red Hat Enterprise Linux 5 Disc 1, and at the prompt, type linux rescue and press enter. When prompted for language, and keyboard, provide the pertinent information for the system. When prompted to enable the network devices on the system, select "No." Finally, select "Skip" when prompted to allow the rescue environment to mount Red Hat Enterprise Linux installation under the /mnt/sysimage directory. The filesystems MUST NOT be mounted to carry out the following steps.

Next, run the following command to scan all disks for LVM2 volume groups:

# lvm.static vgscan

Next, activate the logical volume to reduce. In this example, /dev/VolGroup00/LogVol00 was made available with the following command:


# lvm.static lvchange -ay /dev/VolGroup00/LogVol00

Next, reduce the size of the file system and logical volume on /dev/VolGroup00/LogVol00. Please make sure there is enough space left on the root / file system and that the logical volume is large enough to contain all the data that was previously present. If the file system is close to being full, for example, this may not work. Before resizing the file system, run e2fsck to check it first.


# e2fsck -f /dev/VolGroup00/LogVol00
# resize2fs /dev/VolGroup00/LogVol00 3000M
# lvm.static lvreduce -L 3000M /dev/VolGroup00/LogVol00

Please note that this is done on /dev/VolGroup00/LogVol00. The number at the end is the final size of the file system, not the amount it is reduced by.

Finally, verify the modification then reboot the system.


# lvm.static vgdisplay VolGroup00
# exit

WARNING: Resizing an active logical volume can cause catastrophic data loss if carried out incorrectly. Plan and act accordingly. ALWAYS create backups!

 References

http://www.redhat.com/magazine/009jul05/features/lvm2/ 

 How to remove an LVM partition forcibly?

# kpartx -d xxxx 

General Linux server network performance guide

Environment: a Linux web server serving browser clients, and a database server serving an application server, all on the same LAN
Objective: maximize network performance, which is one of the four performance bottlenecks (CPU, memory, storage and network I/O)



Parameters:

1. net.core.wmem_default (/proc/sys/net/core/wmem_default) and net.core.rmem_default (/proc/sys/net/core/rmem_default) (the following settings are also recommended by the Oracle 11gR1 installation guide)

net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 4194304
 
For Oracle databases, configuring net.ipv4.tcp_rmem and net.ipv4.tcp_wmem is not recommended, as stated on Metalink.

2. net.core.netdev_max_backlog (/proc/sys/net/core/netdev_max_backlog), default 1000 on the RHEL 5 2.6 kernel
net.core.netdev_max_backlog = 30000

sets the maximum number of incoming packets queued on the input side when the interface receives packets faster than the kernel can process them.

3. net.core.somaxconn (/proc/sys/net/core/somaxconn), default 128
the maximum accept-queue backlog that can be specified in the listen() system call, i.e. the number of pending connection requests.

net.core.somaxconn = 1536

4. net.core.optmem_max (/proc/sys/net/core/optmem_max)
the maximum ancillary buffer size allowed per socket, in bytes.

Increase this value.
 
TCP options:
5. net.ipv4.tcp_window_scaling (/proc/sys/net/ipv4/tcp_window_scaling): enable it
6. disable net.ipv4.tcp_sack (/proc/sys/net/ipv4/tcp_sack)
on a LAN, disabling tcp_sack can actually improve performance.
when tcp_sack is disabled, you should also disable 7 and 8:
7. net.ipv4.tcp_dsack (/proc/sys/net/ipv4/tcp_dsack)
8. net.ipv4.tcp_fack (/proc/sys/net/ipv4/tcp_fack)
9. net.ipv4.tcp_max_syn_backlog (/proc/sys/net/ipv4/tcp_max_syn_backlog)
controls the length of the SYN queue (connection requests not yet acknowledged by the client) for each listening socket. If clients experience failures connecting to busy servers, increase this value.
10. net.ipv4.tcp_synack_retries (/proc/sys/net/ipv4/tcp_synack_retries), set to 3
controls how many times the kernel retransmits the SYN/ACK reply to an incoming SYN.
11. net.ipv4.tcp_retries2 (/proc/sys/net/ipv4/tcp_retries2), set to 5
controls how many times the kernel retransmits data to a remote host over an established connection before giving up.
12. net.ipv4.tcp_max_tw_buckets (/proc/sys/net/ipv4/tcp_max_tw_buckets)
double its value.
13. net.ipv4.tcp_orphan_retries (/proc/sys/net/ipv4/tcp_orphan_retries), set to 0
14. net.ipv4.tcp_fin_timeout, set to 30
15. TCP keepalive settings
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 30
16. net.ipv4.ip_local_port_range (/proc/sys/net/ipv4/ip_local_port_range)
net.ipv4.ip_local_port_range = 1024 65000
17. net.ipv4.tcp_window_scaling = 1
18. net.ipv4.tcp_timestamps = 1 
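To compare the current values with the ones above, note that each sysctl name maps directly to a file under /proc/sys. A minimal sketch with hypothetical helper names; to make new values permanent, put 'name = value' lines in /etc/sysctl.conf and run 'sysctl -p'.

```shell
#!/bin/sh
# sysctl_path KEY: map a sysctl name to its /proc/sys file,
# e.g. net.core.somaxconn -> /proc/sys/net/core/somaxconn
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# show_sysctl KEY...: print the current value of each key (Linux only)
show_sysctl() {
    for key in "$@"; do
        printf '%s = %s\n' "$key" "$(cat "$(sysctl_path "$key")")"
    done
}
```

For example, 'show_sysctl net.core.somaxconn net.ipv4.tcp_fin_timeout'.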
 
Partitions and file system performance:
1. Database RAID:
For an Oracle database server, use RAID 1 for the redo logs, archive logs (including archive logs in the flash recovery area) and the temporary tablespace.
Use RAID 1+0 for the database files.

2. For partitions on an individual hard disk: the first partition is for /boot, the second for swap, the third for /var, the fourth for /usr, and the remaining ones for /home, / and other partitions. The first partition sits on the outer edge of the disk, which is faster than the inner part, so outer partitions can be accessed faster than inner ones.
3. Add 'noatime' to the mount options of frequently accessed partitions in /etc/fstab.
4. Switch to another I/O scheduler, e.g. deadline, by adding elevator=deadline to the kernel line in /boot/grub/grub.conf:
root (hd0,0)
kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/sda2 elevator=deadline
5. swap partition size:
  RAM               Swap Space
  --------------------------------------------
  1 GB - 2 GB       1.5 times the size of RAM
  2 GB - 8 GB       Equal to the size of RAM
  more than 8GB     0.75 times the size of RAM
6. Use huge pages and ramfs to improve performance.
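Items 4 and 5 can be checked from the shell. A sketch with hypothetical helper names: swap_mb applies the swap table (treating exactly 2 GB as the 1.5x row and RAM below 1 GB like the 1 GB - 2 GB row, both assumptions since the table does not say), and active_scheduler prints the scheduler shown in brackets in /sys/block/<device>/queue/scheduler, which is the one in use.

```shell
#!/bin/sh
# swap_mb RAM_MB: suggested swap size in MB, following the table above
swap_mb() {
    if [ "$1" -le 2048 ]; then
        echo $(( $1 * 3 / 2 ))    # up to 2 GB: 1.5 times RAM
    elif [ "$1" -le 8192 ]; then
        echo "$1"                 # 2 GB - 8 GB: equal to RAM
    else
        echo $(( $1 * 3 / 4 ))    # more than 8 GB: 0.75 times RAM
    fi
}

# active_scheduler: read the scheduler list on stdin and print the
# entry in [brackets], i.e. the scheduler currently in use
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```

For example, 'active_scheduler < /sys/block/sda/queue/scheduler'. As root, 'echo deadline > /sys/block/sda/queue/scheduler' switches sda to deadline until the next reboot without editing grub.conf.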