Preparing Oracle RAC system

Jephe Wu - http://linuxtechres.blogspot.com

Environment: openfiler as iscsi server, node1 and node2 are OL5.7, all of them are running under VirtualBox VM, Oracle 10gR2 10.2.0.1 clusterware

IP address assignment:
node1: eth0(public): 192.168.1.100, eth1(priv):172.16.1.100, vip:(going to be eth0:0) 192.168.1.102
node2: eth0(public): 192.168.1.101, eth1(priv):172.16.1.101, vip:(going to be eth0:0) 192.168.1.103
openfiler: 172.16.1.200 web login: https://172.16.1.200 (login as : openfiler/password)


Part I: OS installation
a. oracle user and groups
# groupadd oinstall
# groupadd dba
# groupadd oper
# useradd -u 200 -g oinstall -G dba[,oper] oracle

note: make sure all cluseter nodes has the same user and group id, otherwise ,it will fail to install clusterware.

b. verify nobody user exists
# id nobody

c. configure ssh on all nodes
make sure you can ssh into all nodes, public names and priv interconnect names, not vip ip address

d. check hardware requirements
memory size
swap size

e. NFS (http://download.oracle.com/docs/cd/B19306_01/install.102/b14203/prelinux.htm)
If you are using NFS for your shared storage, then you must set the values for the NFS buffer size parameters

rsize and wsize to at least 16384. Oracle recommends that you use the value 32768.

For example, if you decide to use rsize and wsize buffer settings with the value 16384, then update the /etc/fstab

file on each node with an entry similar to the following:

clusternode:/vol/DATA/oradata  /home/oracle/netapp     nfs    

rw,bg,vers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,actimeo=0  1 2


f. NTP for both nodes

g. /etc/hosts and dns
For each node, register one virtual host name and IP address in DNS.
For each private interface on every node, add a line similar to the following to the /etc/hosts file on all nodes,

specifying the private IP address and associated private host name:

h. /etc/sysctl.conf
kernel.shmall = 2097152

kernel.shmmax = 2147483648

kernel.shmmni = 4096

kernel.sem = 250 32000 100 128

fs.file-max = 65536

net.ipv4.ip_local_port_range = 1024 65000

net.core.rmem_default = 262144

net.core.rmem_max = 1048576

net.core.wmem_default = 262144

net.core.wmem_max = 1048576

run sysctl -p

i. Add the following lines to the /etc/security/limits.conf file:


oracle              soft    nproc   2047

oracle               hard    nproc   16384

oracle               soft    nofile  1024

oracle               hard    nofile  65536

j. Add or edit the following line in the /etc/pam.d/login file, if it does not already exist:


session    required     /lib/security/pam_limits.so
note: should not add above, according to redhat KB, otherwise you cannot login from console
Why the system text console cannot be login? - Article ID: 52661

the following should be added to /etc/profile:

if [ $USER = "oracle" ]; then

        if [ $SHELL = "/bin/ksh" ]; then

              ulimit -p 16384

              ulimit -n 65536

        else

              ulimit -u 16384 -n 65536

        fi

fi


k. identify oracle base  and home directories
more /etc/oraInst.loc  - Inventory directory
more /etc/oratab - Oracle home directory

l. create oracle base and clusterware home directory
mkdir -p /u01/app/oracle/
chown -R oracle:oinstall /u01/app/oracle
chmod -R 775 /u01/app/oracle 
(/u01 is the mount point)

mkdir -p /u01/crs/oracle/product/10/crs
chown root:oinstall /u01/crs/
chmod -R 775 /u01/crs/oracle


if mount point is /u01/, then the recommended Oracle clusterware home directory is
/u01/crs/oracle/product/10.2.1/crs


m. Verifying Hangcheck-timer Module on Kernel 2.6

For Red Hat Linux 4.0 and SUSE 9 systems, to verify that the hangcheck-timer module is running on every node:

    Enter the following command on each node to determine which kernel modules are loaded:

    # /sbin/lsmod

    If the hangcheck-timer module is not listed for any node, then enter a command similar to the following to

start the module located in the directories of the current kernel version:

    # insmod /lib/modules/kernel_version/kernel/drivers/char/hangcheck-timer.ko  hangcheck_tick=1

hangcheck_margin=10

    In the preceding command example, the variable kernel_version is the kernel version running on your system.

    To confirm that the hangcheck module is loaded, enter the following command:

    # lsmod | grep hang

    The output should be similar to the following:

    hangcheck_timer         3289  0

    To ensure that the module is loaded every time the system restarts, verify that the local system startup file

contains the command shown in the previous step, or add it if necessary:

        Red Hat:

        On Red Hat Enterprise Linux systems, add the command to the /etc/rc.d/rc.local file.

        SUSE:

        On SUSE systems, add the command to the /etc/init.d/boot.local file.


configure .bash_profile
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_HOME/product/10.2.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.2.0/crs
export PATH=$PATH:$ORACLE_HOME/bin:$ORA_CRS_HOME/bin

2. clusterware installation

Please refer to http://oracleinstance.blogspot.com/2010/03/oracle-10g-installation-in-linux-5.html for complete

screenshot example
and also
http://space.itpub.net/21162451/viewspace-696413 - RHEL5.4+Oracle10gR2 RAC+OCFS2

2.1 os requirements

run as oracle for ./runInstaller to install 

Prerequisite Checks Fail When Installing 10.2 On Red Hat 5 (RHEL5)
Checking operating system version: must be redhat-3, SuSE-9, redhat-4, UnitedLinux-1.0, asianux-1 or asianux-2
                                      Failed <<<<

-> Prerequisite Checks Fail When Installing 10.2 On Red Hat 5 (RHEL5) [ID 456634.1]
If you are installing 10.2 from DVD, copy the <path>/database/install/oraparam.ini to a temporary directory (for

example, /tmp).

If you are installing 10.2 from an OTN download or have copied the 10.2 media to disk, take a backup of

<path>/database/install/oraparam.ini

Now edit oraparam.ini and change the appropriate line:

Original

[Certified Versions]
Linux=redhat-3,SuSE-9,redhat-4,UnitedLinux-1.0,asianux-1,asianux-2


New

[Certified Versions]
Linux=redhat-3,SuSE-9,redhat-4,UnitedLinux-1.0,asianux-1,asianux-2,redhat-5


note:
a. do not use ifconfig to configure vip before installing clusterware, just configure them in /etc/hosts
for example:

#cat /etc/hosts

127.0.0.1       localhost.localdomain localhost
192.168.1.100   jephe1.jephe.com jephe1
192.168.1.101   jephe2.jephe.com jephe2

172.16.1.100    jephe1-priv.jephe.com jephe1-priv
192.168.1.102   jephe1-vip.jephe.com jephe1-vip

172.16.1.101    jephe2-priv.jephe.com jephe2-priv
192.168.1.103   jephe2-vip.jephe.com jephe2-vip


OCR configration:
normal redundancy

specify OCR Location: /dev/raw/raw1
specify OCR Mirror Location: /dev/raw/raw2

When installing database software, you can choose 'configure ASM', then choose external for redundancy, then choose part of the raw disk for 'DATA' group, later we will run asmca to configure another disk group RECOVERY.

2.2 How To Setup UDEV Rules For RAC OCR And Voting Devices On SLES10, RHEL5, OEL5, OL5
- refer to http://www.held.org.il/blog/2007/11/setting-a-raw-device-in-redhatcentos-5/

configure /etc/sysconfig/rawdevices first, then service rawdevices restart to see those raw devices appear under /dev/raw/*

The following part are optional:
================
Add the required raw device ownership and permissions, for example:
a. Add to /etc/udev/rules.d/60-raw.rules:
ACTION==”add”, KERNEL==”sdb1″, RUN+=”/bin/raw /dev/raw/raw1 %N”
ACTION=="add", KERNEL=="sdc1", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sdd1", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="sde1", RUN+="/bin/raw /dev/raw/raw4 %N"
ACTION=="add", KERNEL=="sdf1", RUN+="/bin/raw /dev/raw/raw5 %N"

==================
b. To set permission (optional, but required for Oracle RAC!), create a new /etc/udev/rules.d/99-raw-perms.rules

containing lines such as:

    KERNEL==”raw[1-5]“, MODE=”0640″, GROUP=”oinstall”, OWNER=”oracle”

Notice this:

    The raw-perms.rules file name has to begin with the number 99, which defines its order during rules apply, so that it will be used after all other rules take place. Using lower numbers might cause permissions to be incorrect.
    The following permissions have to apply:

OCR Device(s): root:oinstall , mode 0640
Voting device(s): oracle:oinstall, mode 0666
need to test the following commands:
# /sbin/udevcontrol reload_rules
# /sbin/start_udev


Or to add the following to 50-udev.rules
KERNEL=="vcsa[0-9]*",        NAME="%k", OWNER="vcsa", GROUP="tty", OPTIONS="last_rule"
KERNEL=="vcc/*",        NAME="%k", OWNER="vcsa", GROUP="tty", OPTIONS="last_rule"
KERNEL=="raw[1-9]",        OWNER="oracle", GROUP="oinstall", MODE="0640"
KERNEL=="raw10",        OWNER="oracle", GROUP="oinstall", MODE="0640"


# memory devices
KERNEL=="random",        MODE="0666", OPTIONS="last_rule"
KERNEL=="urandom",        MODE="0444", OPTIONS="last_rule"
KERNEL=="mem",            GROUP="kmem", MODE="0640", OPTIONS="last_rule"



permission is very important, otherwise, you might get error like this:
 when run 'crs_stat -t', node2 is offline and when run 'ps -efH' on node2, you will find 'startcheck'  like this:

root      99  416  0 1:23 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     16  67  0 1:46 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     1055  418  0 2:47 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck


check /var/log/messages, you will find this:
Feb  5 12:30:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3178.
Feb  5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3353.
Feb  5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3193.
Feb  5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3178.
Feb  5 12:32:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3353.
Feb  5 12:32:18 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3193.


[root@node2 ~]# more /tmp/crsctl.3178
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]

check crs to confirm error:
/u01/oracle/product/crs/bin/crsctl check  crs
References:
http://surachartopun.com/2009/04/why-my-oracle-cluster-could-not-start.html

If one node is down, when running 'crs_stat -t', it will show this:

[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....en1.gsd application    ONLINE    OFFLINE              
ora....en1.ons application    ONLINE    OFFLINE              
ora....en1.vip application    ONLINE    ONLINE    node2   
ora....en2.gsd application    ONLINE    ONLINE    node2   
ora....en2.ons application    ONLINE    ONLINE    node2   
ora....en2.vip application    ONLINE    ONLINE    node2 

Finally, if you encounter errors like this:

an error occurred during the interview for this component. oracle clusterware unable to retrieve voting disk information

then you need to 'rm -fr /u01/app/oracle/*' on node1 , then start ./runInstaller again.

2.3. at the end of clusterware installation, you will be prompted to run 2 programs which need root access permission.

permission,follow by the sequences below:
first program on first node
first program on second node

/home/oracle/oraInventory/orainstRoot.sh
ssh jephe2 -l root
/home/oracle/oraInventory/orainstRoot.sh
exit
/home/oracle/oracle/product/10.2.0/crs/root.sh
ssh jephe2 -l root
/home/oracle/oracle/product/10.2.0/crs/root.sh

second program on first node
second program on second node
basically, you need to finish the number 1 script on all nodes first before going to the next script.
also, run it under GUI interface, aka, X windows in vnc on node2 as root user, not oracle user, otherwise vipca will fail, vipca will use GUI

When you run the second /u01/app/oracle/product/10.2.0/crs/root.sh, there might be error like this:
 [root@oratest1 oratest1]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Failed to upgrade Oracle Cluster Registry configuration
=>check log for detail: /u01/app/oracle/product/10.2.0/crs/log/oratest1/alertoratest1.log
see metalink Executing root.sh errors with "Failed To Upgrade Oracle Cluster Registry Configuration" [ID 466673.1]
 => solution is 
a) replace the biary according to above metalink
b) dd if=/dev/zero of=/dev/raw/raw1 bs=1024, (no need to finish, ctrl -c to cancel it)
run above root.sh again on node1


c. FAQ
c.1 some error at the end of running second root program on the second node:
Running vipca(silent) for configuring nodeapps
/home/oracle/oracle/product/10.2.0/crs/jdk/jre//bin/java: error while loading shared libraries: libpthread.so.0:

cannot open shared object file: No such file or directory

-> this is a bug, due to newer version of glibc(RHEL5) is incompatible with Java, need to modify vipca script as follows:

=> you can also modify it first before running root.sh on second node to avoid this error:

vi /home/oracle/oracle/product/10.2.0/crs/bin/vipca
change
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL  
     
to:              

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL                                          
unset LD_ASSUME_KERNEL


 #add this line to uncomment variable LD_ASSUME_KERNEL

if 'srvctl start XXX' command  reports same error, you can vi srvctl script to uncomment LD_ASSUME_KERNEL too, (which srvctl to find the path of the file)

Now, login xwindows on second node, run /home/oracle/oracle/product/10.2.0/crs/bin/vipca again to get errors below: (this error is unavoidable if you are using private ip range for public interface)

Error 0(Native: listNetInterfaces:[3])

[Error 0(Native: listNetInterfaces:[3])]


solution:

$ oifcfg iflist

eth0 192.168.1.0
eth1  172.16.1.0

$ oifcfg setif -global eth0/192.168.1.0:public
$ oifcfg setif -global eth1/172.16.1.0:cluster_interconnect
$ oifcfg getif


eth0 192.168.1.0  global  public
eth1 172.16.1.0  global  cluster_interconnect

# vipca  (run on the second node with root user login at X windows, don't run it with oracle user then su - as root)
just choose eth0 as vip interface, do not choose eth0:0, otherwise, after finishing installation, the vip will become eth0:0:1 for 192.168.1.102 for node1

You will be asked for entering ip alias name and ip address for both codegen1 and codegen2. Firstly, enter vip address for codegen1: 192.168.1.102, then the rest will come out automatically.
node1-vip.jephe.com and node2-vip.jephe.com are ip alias name.

after finishing configure vipca, back to CRS installation GUI, click ok to finish installation of clusterware

If environment variable
$ORA_CRS_HOME/cfgtoollogs/configToolFailedCommands.sh

If you got error: PRKR-1062 : Failed to find configuration for node codegen1,
then your /etc/hosts might be missing domain name such as
192.168.1.100 jephe1
not
192.168.1.100 jephe1.jephe.com jephe1

==============
cronjob for checking RAC configuration:

---------

#!/bin/sh
. /home/oracle/.bash_profile

DATE=`date +%Y%m%d`
TMPFILE=`mktemp`

echo "running ocrconfig -showbackup" > $TMPFILE
ocrconfig -showbackup >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running ocrcheck" >> $TMPFILE
ocrcheck >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "rnning olsnodes -n -p -i" >> $TMPFILE
olsnodes -n -p -i >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running crsctl query css votedisk" >> $TMPFILE
crsctl query css votedisk >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running crsctl check crs" >> $TMPFILE
crsctl check crs >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running oifcfg iflist -p -n" >> $TMPFILE
oifcfg iflist -p -n >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running oifcfg getif" >> $TMPFILE
oifcfg getif >> $TMPFILE 2>&1


echo "" >> $TMPFILE
echo "running srvctl config nodeapps -n oradb-01 -a -g -s -l" >> $TMPFILE
srvctl config nodeapps -n oradb-01 -a -g -s -l >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running srvctl config nodeapps -n oradb-02 -a -g -s -l" >> $TMPFILE
srvctl config nodeapps -n oradb-02 -a -g -s -l >> $TMPFILE 2>&1



echo "" >> $TMPFILE
echo "cat /etc/oracle/ocr.loc" >> $TMPFILE
cat /etc/oracle/ocr.loc >> $TMPFILE

echo "" >> $TMPFILE
echo "crs_stat -t -v" >> $TMPFILE
crs_stat -t -v >> $TMPFILE


tar cpzf /tmp/crs.tar.gz /u01/crs/oracle/product/10.2.0/crs/cdata/crs

mutt -a /tmp/crs.tar.gz -s "crs/ocr status and backup on $DATE"  jwu@domain.com < $TMPFILE

rm -f $TMPFILE
---------------------


Part II: openfiler iscsi:

How to Dynamically Add and Remove SCSI Devices on Linux [ID 603868.1]


iscsiadm -m discovery -t sendtargets -p 192.168.1.5
 chkconfig iscsid on
chkconfig iscsi on

service iscsi resart
cd /var/lib/iscsi;ls


Reference: http://www.cyberciti.biz/tips/rhel-centos-fedora-linux-iscsi-howto.html



yum install lsscsi

lsscsi
cat /proc/scsi/scsi
grep host /etc/modprobe.conf
ls -ld /sys/class/scsi_host/host*
dmsetup ls | sort
multipath -d -ll
raw -qa
ls -l /dev/mapper/
ls -l /dev/raw/
ocrcheck
crsctl query css votedisk
crsctl check crs
ocrconfig -showbackup
cat /etc/oracle/ocr.loc

where is the setting for RAC service preference nodes?

nodes applications contain listener, gsnd etc?


Firstly, Oracle checks /etc/oracle/ocr.loc to find out where is the ocr (raw1 and raw2), then from ocr, to find out where are the voting disk devices.

[root@codegen1 bin]# lsscsi
[0:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda
[2:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0
[3:0:0:0]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdb
[3:0:0:1]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdc
[3:0:0:2]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdd
[3:0:0:3]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sde
[3:0:0:4]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdf
[3:0:0:5]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdg
[3:0:0:6]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdh
[3:0:0:7]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdi
[3:0:0:8]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdj
[3:0:0:9]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdk

5.1. Resizing an Online Multipath Device
If you need to resize an online multipath device, use the following procedure.

    Resize your physical device.
    Use the following command to find the paths to the LUN:

    # multipath -l
[ -ll] [-d -ll]

    Resize your paths. For SCSI devices, writing a 1 to the rescan file for the device causes the SCSI driver to rescan, as in the following command:

    # echo 1 > /sys/block/device_name/device/rescan

    Resize your multipath device by running the multipathd resize command:

    # multipathd -k'resize map mpath0'

    Resize the filesystem (assuming no LVM or DOS partitions are used):

    # resize2fs /dev/mapper/mpath0

For further information on resizing an online LUN, see the Online Storage Reconfiguration Guide.

echo "scsi add-single-device 3 0 0 0" > /proc/scsi/scsi


Reference:
1. http://space.itpub.net/21162451/viewspace-696413 - RHEL5.4+Oracle10gR2 RAC+OCFS2
2. http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/10g/r2/rac_r2_work/02_01_crs_install/02_01_crs_inst

all_viewlet_swf.html - Install 10g clusterware livedemo from Oracle
3. http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/11g/r1/clusterware/installation_of_oracle_clusterware/installation_of_oracle_clusterware_viewlet_swf.html - Install clusterware on Oracle 11gR1
4. install oracle 11gR2 database - http://st-curriculum.oracle.com/obe/db/11g/r2/2day_dba/install/install.htm
5. http://oracleinstance.blogspot.com/2010/03/oracle-10g-installation-in-linux-5.html

How to check or disable SELinux

Jephe Wu  - http://linuxtechres.blogspot.com

Environment: CentOS 5 or 6
Objective: check selinux status

Methods:
1. check selinux status:
getenforce 
or
selinuxenabled;echo $?  (return 1 means disabled, 0 means enabled, man selinuxenabled)
note: use setenforce  to switch between permissive or enforce mode runtime.

Accoding to Redhat knowledge base:  Only if current selinux status is not disabled. you can use setenforce to set Current Enforcing Mode which can be switched between "Enforcing" and "Permissive". This setting is effective immediately but is not persistent - the system will revert to the "System Default Enforcing Mode" setting when it is restarted.

2.  how to set selinux permanently
a. GUI tool: system-config-securitylevel
b. vi /etc/selinux/config or /etc/sysconfig/selinux , change SELINUX=Disabled
c. vi grub.conf,

  kernel /vmlinuz-xxxx ro root=/dev/VolGroup00/LogVol00 rhgb quiet selinux=0
d. at install time, type 'linux selinux=0' to boot prompt. 
 

Set up mysql proxy with mmm2 and mysql master-slave replication environment for zabbix read/write splitting

Jephe Wu - http://linuxtechres.blogspot.com

Environment
: 2 mysql database configured with master-slave replication. mysql-proxy 0.8.1 from EPEL running on CentOS 6 x86_64, as well as mmm_monitor. Zabbix server connecting to mysql-proxy ip address as database. separated web server for Zabbix running on another machine connecting to mysql-proxy ip address


Diagram:

 Zabbix Web -> mysql-proxy 0.8.1/mmm2  (CentOS 6 64bit)
 Zabbix Server -> mysql-proxy 0.8.1/mmm2 -> writer mysql db
                                                                        -> reader mysql db


Objective
: Make Zabbix to send non-transactional queries to reader mysql db only, and DML to writer mysql db, in short, read/write splitting.

Steps
: We are only talking about mysql-proxy configuration part in this article, you can refer other blogs on this website for Zabbix and master-slave database replication.

1. use mysql-proxy from EPEL on CentOS 6 64bit
configuring EPEL first, then install mysql-proxy

2. use read-write splitting script from 0.8.2 alpha source code instead of tutorial-keep-alive.lua from 0.8.1 yum repository which is not working.

3. configuration file:

[root@mysql-proxy ~]# more /etc/sysconfig/mysql-proxy
# Options for mysql-proxy
#LUA_PATH=/usr/share/doc/mysql-proxy-0.8.1/examples/?.lua
ADMIN_USER="admin"
ADMIN_PASSWORD="password"
ADMIN_LUA_SCRIPT="/usr/lib64/mysql-proxy/lua/admin.lua"
PROXY_USER="mysql-proxy"
PROXY_OPTIONS="--daemon --keepalive  --proxy-backend-addresses=192.168.12.233:3306 --proxy-read-only-backend-addresses=192.168.12.234:3306 --proxy-address=:3306 --proxy-lua-script=/usr/share/doc/mysql-proxy-0.8.1/examples/rw-splitting-got-from-0.8.2alpha-source-tar-file.lua --log-level=message --log-file=/var/log/mysql-proxy.log"



netstat -ntpl  , you will find mysql proxy admin port listening at 4041, you can connect to it from another host to admin port:
mysql -uadmin -ppassword -P4041 -h mysql-proxy-host
===============
or with the following configuration:
root@mon01:/etc/mysql-proxy/ # more /etc/sysconfig/mysql-proxy
# Options to mysql-proxy
# do not remove --daemon
LUA_PATH="/srv/mysql-proxy/lib/?.lua"
PROXY_OPTIONS="--daemon --defaults-file=/etc/mysql-proxy/my.cnf --keepalive"

---------
root@mon01:/etc/mysql-proxy/ # more my.cnf
[mysql-proxy]
admin-username=root
admin-password=password
proxy-address=0.0.0.0:3306

proxy-read-only-backend-addresses=172.17.1.6:3306
proxy-read-only-backend-addresses=172.17.1.7:3306
proxy-read-only-backend-addresses=172.17.1.8:3306
proxy-backend-addresses=172.17.1.5:3306

proxy-pool-no-change-user=true
proxy-skip-profiling=true

log-file=/srv/mysql-proxy/log/mysql-proxy.log
#log-level=warning
#log-level=message
#log-level=debug
user=mysql
admin-lua-script=/srv/mysql-proxy/lib/mysql-proxy/lua/admin.lua
pid-file=/srv/mysql-proxy/run/mysql-proxy.pid
proxy-lua-script=/srv/mysql-proxy/share/doc/mysql-proxy/rw-splitting.lua  (this rw-splitting.lua is from mysql-proxy 0.8.2 tar file)


4. /etc/init.d/mysql-proxy restart

5. monitor slave mysql db logfile -> tail -f /var/lib/mysql/mysql_query.log (see /etc/my.cnf)

6. you can also see some DML statements on slave database, that's because master-slave replication, slave gets syncing with master operations.

7. once you finished configuring above, when clicking zabbix web monitoring - dashboard, most of select statements will go to slave database.


FAQ:

1.  Zabbix web interface gives error "Lock wait timeout exceeded; try restarting transaction"
=> Uncommen innodb_lock_wait_timeout = 50, change it to 500 seconds. restart mysql

2. Error in query: Deadlock found when trying to get lock; try restarting transaction
=> Just try again and refer to http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html  - how to cope with deadlock

3. For read/write splitting, tutorial-keep-alive.lua which comes with EPEL mysql-proxy rpm doesn't work.

4. how to check if rw-spliting is working?
while true ;do mysql -uroot -ppassword -e 'show processlist' ;sleep 1;done
to check if you  can find any select statement.




References:

1. Mysql master-slave and mysql-proxy read/write splitting - http://www.linuxso.com/linuxrumen/10236.html
2. http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html - how to cope with Deadlocks

Commands:
1.  show slave status\G
2.  show create table history;  to display DDL statement for table creation
3. show table status
4. show processlist;
5. show full processlist; to find out slow queries.


==============
Use mysql proxy to balance load between  2 readonly mysql database.


root@mon02:/etc/mysql-proxy/ # more my.cnf
[mysql-proxy]
admin-username=root
admin-password=password
proxy-address=0.0.0.0:3306
#proxy-read-only-backend-addresses=172.16.2.1:3306
#proxy-read-only-backend-addresses=172.16.2.2:3306
proxy-backend-addresses=172.16.2.2:3306,172.16.2.2:3306
log-file=/srv/mysql-proxy/log/mysql-proxy.log
log-level=message
user=mysql
admin-lua-script=/srv/mysql-proxy/lib/mysql-proxy/lua/admin.lua
pid-file=/srv/mysql-proxy/run/mysql-proxy.pid


root@mon02:/etc/mysql-proxy/ # more /etc/sysconfig/mysql-proxy
# Options to mysql-proxy
# do not remove --daemon
PROXY_OPTIONS="--daemon --defaults-file=/etc/mysql-proxy/my.cnf --keepalive"

Note: if you separate proxy-backend-addresse to two lines, it might not work, it will only add one server as backend, check log file /srv/mysql-proxy/log/mysql-proxy.log to confirm.

Install PSP on CentOS 6 x86_64

Jephe Wu - http://linuxtechres.blogspot.com

Environment: HP ProLiant BL460c G6, installed CentOS 6 x86_64
Objective: install proliant support package (PSP)


Steps:

Note: do not use psp for CentOS 6 one, it's older version and it might not work. Use Redhat Enterprise Linux 6 instead.

1. change /etc/centos-release

cp /etc/centos-release /etc/centos-release.orig -va
vi /etc/centos-release as follows:
Change CentOS to
Red Hat Enterprise Linux Server release 6.0 (Final)


otherwise, some psp packages will fail to install and hpsmh will fail to install as well , it will report zlib version 1.1.4 or above is missing.

2. install hp-snmp-agents first from command line, otherwise, smhd installation will fail from ./hpsum (software update manager)


[root@db03 psp]# yum localinstall hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm
Loaded plugins: fastestmirror
Setting up Local Package Process
Examining hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm: hp-snmp-agents-8.7.0.23-17.x86_64
Marking hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm to be installed
Loading mirror speeds from cached hostfile
 * base: mirror.optus.net
 * epel: mirror.optus.net
 * extras: mirror.optus.net
 * updates: mirror.optus.net
Resolving Dependencies
--> Running transaction check
---> Package hp-snmp-agents.x86_64 0:8.7.0.23-17 set to be updated
--> Processing Dependency: hp-health for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libcpqci.so.2 for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libhpasmintrfc.so.3 for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libhpev.so.1 for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libcpqci64.so.2()(64bit) for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libhpasmintrfc64.so.3()(64bit) for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Finished Dependency Resolution
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libhpasmintrfc64.so.3()(64bit)
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libcpqci64.so.2()(64bit)
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libhpev.so.1
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: hp-health
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libhpasmintrfc.so.3
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libcpqci.so.2
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest


[root@db03 psp]# yum localinstall hp-health-8.7.0.22-17.rhel6.x86_64.rpm  --nogpgcheck
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.optus.net
 * epel: mirror.optus.net
 * extras: mirror.optus.net
 * updates: mirror.optus.net
Setting up Install Process
Examining hp-health-8.7.0.22-17.rhel6.x86_64.rpm: hp-health-8.7.0.22-17.x86_64
Marking hp-health-8.7.0.22-17.rhel6.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package hp-health.x86_64 0:8.7.0.22-17 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================

==========================================================================================================================
 Package                                                    Arch                                                    Version                         

                              Repository                                                                            Size
====================================================================================================================================================

==========================================================================================================================
Installing:
 hp-health                                                  x86_64                                                  8.7.0.22-17                     

                              /hp-health-8.7.0.22-17.rhel6.x86_64                                                  1.6 M

Transaction Summary
====================================================================================================================================================

==========================================================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total size: 1.6 M
Installed size: 1.6 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : hp-health-8.7.0.22-17.x86_64                                                                                                     

                                                                                                                     1/1
Please read the Licence Agreement for this software at

         /opt/hp/hp-health/hp-health.license

By not removing this package, you are accepting the terms
of the "HP Proliant Essentials Software End User License Agreement".
==============================================================================
NOTE: To activate the software contained in this package, you must type:
      /etc/init.d/hp-health start
      /etc/init.d/hp-asrd start
as 'root' user.
==============================================================================
The hp-health RPM has installed successfully.

Installed:
  hp-health.x86_64 0:8.7.0.22-17                                                                                                                    
Complete!

[root@db03 psp]# yum localinstall hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm --nogpgcheck
Loaded plugins: fastestmirror
Setting up Local Package Process
Examining hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm: hp-snmp-agents-8.7.0.23-17.x86_64
Marking hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm to be installed
Loading mirror speeds from cached hostfile
 * base: mirror.optus.net
 * epel: mirror.optus.net
 * extras: mirror.optus.net
 * updates: mirror.optus.net
Resolving Dependencies
--> Running transaction check
---> Package hp-snmp-agents.x86_64 0:8.7.0.23-17 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================

==========================================================================================================================
 Package                                                       Arch                                                  Version                        

                           Repository                                                                               Size
====================================================================================================================================================

==========================================================================================================================
Installing:
 hp-snmp-agents                                                x86_64                                                8.7.0.23-17                    

                           /hp-snmp-agents-8.7.0.23-17.rhel6.x86_64                                                5.9 M

Transaction Summary
====================================================================================================================================================

==========================================================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total size: 5.9 M
Installed size: 5.9 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : hp-snmp-agents-8.7.0.23-17.x86_64                                                                                                

                                                                                                                     1/1
Please read the Licence Agreement for this software at

         /opt/hp/hp-snmp-agents/hp-snmp-agents.license

By not removing this package, you are accepting the terms
of the "HP Proliant Essentials Software End User License Agreement".
Installing /opt/hp/hp-snmp-agents/nic/etc/HPcmanic.pp SELinux policy module
==============================================================================
NOTE: In order to activate the software contained in this package, you must
      type '/sbin/hpsnmpconfig' as 'root' user.
      Once configuration is completed start the agents by typing
      /etc/init.d/hp-snmp-agents start
==============================================================================

Installed:
  hp-snmp-agents.x86_64 0:8.7.0.23-17                                                                                                               

                                                                                                                        
3. run /sbin/hpsnmpconfig to configure /etc/snmp/snmpd.conf
configure it to add the following part on the top of the /etc/snmp/snmpd.conf

dlmod cmaX /usr/lib64/libcmaX64.so
rwcommunity public 127.0.0.1
trapcommunity public
rocommunity public 127.0.0.1
trapsink 192.168.0.1 public
syscontact Jephe Wu <jwu@domain.com>
syslocation someplace



4. install tigervnc* and xterm for GUI installation environment for hpsum

yum install tigervnc* xterm

5. install psp by hpsum
cd /root/
download psp for RHEL6 x86_64
mkdir psp
cd psp
tar xvpzf ../psp-xxxxxx.tar.gz

vncserver
./hpsum

6. hpsum installation

a. check only 'software update'
b. do not rebuild kernel module for certain NIC
c. install them

7. check and reboot
psp create a new set of initrd file as initramfs blabla

8. commands and references:
yum localinstall *.rpm
yum whatprovides filename


9. modify  /opt/hp/hp-snmp-agents/cma.conf 
change 
trapemail /bin/mail -s 'db01 HP Insight Management Agents Trap Alarm' root
to 
trapemail /bin/mail -s 'db1 zdb01 HP Insight Management Agents Trap Alarm' root

Set up the third mysql server to be slave of the running master-master mysql database

Jephe Wu - http://linuxtechres.blogspot.com

Environment: master(db1)-master(db2) mysql 5.5 database (replicate each other) running Oracle Linux 5.6 64bit
Objective: setup a third slave only database(db3, mysql 5.1 on CentOS 6) to sync with one of the master(db2) which the zabbix application is connecting to


Concept:

a. stop slave on db1 and db2 (mysql -uroot -ppass; sql> stop salve)
b. use mysqldump to make snapshot on db1
c. import data into db3
d. make db3 to sync with db2


Steps:

1. stop slave on both db1 and db2
mysql -uroot -ppass
sql> stop slave;
sql> show slave status\G


3. setup db3 on CentOS 6 64bit, use /srv/mysql/data as data folder

sh /usr/bin/mysql_install_db --user=mysql --datadir=/srv/mysql/data

copy back /etc/my.cnf mysqld part from db2 or db1

/etc/init.d/mysqld restart
chkconfig mysqld on
chkconfig --list mysqld


Note:
[root@db03 mysql]# tail -f /srv/mysql/data/db03.domainnameofserver.err
/usr/libexec/mysqld: Can't create/write to file '/tmpfs/ibJw77ig' (Errcode: 2)
111207 14:19:01  InnoDB: Error: unable to create temporary file; errno: 2
111207 14:19:01 [ERROR] Plugin 'InnoDB' init function returned error.
111207 14:19:01 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
111207 14:19:01 [ERROR] Unknown/unsupported table type: innodb
111207 14:19:01 [ERROR] Aborting

111207 14:19:01 [Note] /usr/libexec/mysqld: Shutdown complete

111207 14:19:01 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended


solution=> change /etc/my.cnf /tmpfs to /dev/shm (according to df -h)

4. use mysqldump to make snapshot

on db01,do:
stop slave;
show slave status\G

note: record down about output for master_log_file and position so that db3 will use it to sync with db2 again after importing data

mysqldump -u root -ppass zabbix [--lock-all-tables] | gzip -c | ssh db03 'cat > /srv/mysql/mysqldump_on_db01.gz'


please refer to online doc at http://dev.mysql.com/doc/refman/5.0/en/replication-howto-mysqldump.html

5. import data into db3
zcat mysqldump_on_db01.gz | mysql -uroot -ppass zabbix

6. configuring db3 as slave for db2
/etc/init.d/mysqld start
sql> show slave status\G
empty record
sql> change master to master_host='192.168.0.1', master_user='replication', master_password='password',master_log_file='mysql-bin.001080',master_log_pos=50862743;
sql> show slave status\G


7. FAQ
After downgrading to 5.1 and make mysql 5.1 as master database,  if you try to create partition, you might encounter the following error:
ERROR 1548 (HY000) at line 1: Cannot load from mysql.proc. The table is probably corrupted


Solution: on mysql 5.1, run "mysql_upgrade -uroot -ppassword", please refer to http://dev.mysql.com/doc/refman/5.1/en/mysql-upgrade.html

8. References

a. database partitioning:http://www.slideshare.net/datacharmer/partitions-performance-with-mysql-51-and-55
b. mysql 5.5 online manual for replication at http://dev.mysql.com/doc/refman/5.5/en/replication.html

Set up bonding in RHEL6

Jephe Wu - http://linuxtechres.blogspot.com

Environment: CentOS 6 x86_64 on HP Blade Proliant BL460c G6, 2 builtin broadcom NIC and a NC364m Quad Port 1Gb NIC for c-class BladeSystem
Objective: after using builtin NIC to install OS , then configure bonding for 2 of quad port NIC.

Steps:

1. OS installation
You can download http://mirror.optus.net/centos/6/isos/x86_64/CentOS-6.0-x86_64-netinstall.iso and install OS through http URL from Internet.

2. bonding configuration

Firstly, disable network manager if NM is running.
service NetworkManager stop
chkconfig NetworkManager off


a. /etc/udev/rules.d/70-persistent-net.ruels

make sure it contains all interfaces you have in the system and don't configure bond0 inside.

b. /etc/sysconfig/network-scripts/

configuring ifcfg-bond0 as follows: (assuming ip address is 192.168.0.1)
       
[root@db03 network-scripts]# more ifcfg-bond0
DEVICE="bond0"
BROADCAST="192.168.0.255"
GATEWAY="192.168.0.254"
IPADDR="192.168.0.1"
NETMASK="255.255.255.0"
ONBOOT="yes"
USERCTL=no
BONDING_OPTS="mode=0 miimon=100 downdelay=300 updelay=300 max_bonds=4"


note: you must use capital letter for left side variables, especially BONDING_OPTS, otherwise, you might not be able to see slave devices from command output of 'ifconfig'.
Do not configure BONDING_OPTS in /etc/modprobe.d/bonding.conf as suggested by Redhat.

c. /etc/sysconfig/network-scripts/ifcfg-ethX

[root@db03 network-scripts]# more ifcfg-eth*
::::::::::::::
ifcfg-eth0
::::::::::::::
DEVICE=eth0
BOOTPROTO=none
MASTER=bond0
HWADDR=00:01:02:03:04:00
ONBOOT=yes
SLAVE=yes
USERCTL=no
::::::::::::::
ifcfg-eth1
::::::::::::::
DEVICE=eth1
BOOTPROTO=none
HWADDR=00:01:02:03:04:01
ONBOOT=no
USERCTL=no
::::::::::::::
ifcfg-eth2
::::::::::::::
DEVICE=eth2
BOOTPROTO=none
MASTER=bond0
HWADDR=00:01:02:03:04:02
ONBOOT=yes
SLAVE=yes
USERCTL=no
::::::::::::::
ifcfg-eth3
::::::::::::::
DEVICE=eth3
BOOTPROTO=none
HWADDR=00:01:02:03:04:03
ONBOOT=no
USERCTL=no
::::::::::::::
ifcfg-eth4
::::::::::::::
DEVICE="eth4"
HWADDR="00:00:00:00:00:04"
NM_CONTROLLED="no"
ONBOOT="no"
::::::::::::::
ifcfg-eth5
::::::::::::::
DEVICE="eth5"
HWADDR="00:00:00:00:00:05"
NM_CONTROLLED="no"
ONBOOT="no"

note: we connected network cables for eth0 and eth2 only, so only use these 2 ports for bonding.

d. /etc/modprobe/bonding.conf
[root@db03 network-scripts]# more /etc/modprobe.d/bonding.conf
alias eth0 e1000e
alias eth1 e1000e
alias eth2 e1000e
alias eth3 e1000e
alias eth4 bnx2x
alias eth5 bnx2x
alias scsi_hostadapter cciss
alias scsi_hostadapter1 lpfc
alias bond0 bonding


3. Testing and debug
a. more /proc/net/bonding/bond0
b. ethtool eth0



4. References
a. RHEL6 deployment guide
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-networkscripts-interfaces-chan.html

b. Redhat knowledge base DOC-48159 for 'How do I configure the bonding device on Red Hat Enteprise Linux 6" or search bonding in access.redhat.com knowledge base.