Preparing Oracle RAC system

Jephe Wu - http://linuxtechres.blogspot.com

Environment: openfiler as iscsi server, node1 and node2 are OL5.7, all of them are running under VirtualBox VM, Oracle 10gR2 10.2.0.1 clusterware

IP address assignment:
node1: eth0(public): 192.168.1.100, eth1(priv):172.16.1.100, vip:(going to be eth0:0) 192.168.1.102
node2: eth0(public): 192.168.1.101, eth1(priv):172.16.1.101, vip:(going to be eth0:0) 192.168.1.103
openfiler: 172.16.1.200 web login: https://172.16.1.200 (login as : openfiler/password)


Part I: OS installation
a. oracle user and groups
# groupadd oinstall
# groupadd dba
# groupadd oper
# useradd -u 200 -g oinstall -G dba[,oper] oracle

note: make sure all cluseter nodes has the same user and group id, otherwise ,it will fail to install clusterware.

b. verify nobody user exists
# id nobody

c. configure ssh on all nodes
make sure you can ssh into all nodes, public names and priv interconnect names, not vip ip address

d. check hardware requirements
memory size
swap size

e. NFS (http://download.oracle.com/docs/cd/B19306_01/install.102/b14203/prelinux.htm)
If you are using NFS for your shared storage, then you must set the values for the NFS buffer size parameters

rsize and wsize to at least 16384. Oracle recommends that you use the value 32768.

For example, if you decide to use rsize and wsize buffer settings with the value 16384, then update the /etc/fstab

file on each node with an entry similar to the following:

clusternode:/vol/DATA/oradata  /home/oracle/netapp     nfs    

rw,bg,vers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,actimeo=0  1 2


f. NTP for both nodes

g. /etc/hosts and dns
For each node, register one virtual host name and IP address in DNS.
For each private interface on every node, add a line similar to the following to the /etc/hosts file on all nodes,

specifying the private IP address and associated private host name:

h. /etc/sysctl.conf
kernel.shmall = 2097152

kernel.shmmax = 2147483648

kernel.shmmni = 4096

kernel.sem = 250 32000 100 128

fs.file-max = 65536

net.ipv4.ip_local_port_range = 1024 65000

net.core.rmem_default = 262144

net.core.rmem_max = 1048576

net.core.wmem_default = 262144

net.core.wmem_max = 1048576

run sysctl -p

i. Add the following lines to the /etc/security/limits.conf file:


oracle              soft    nproc   2047

oracle               hard    nproc   16384

oracle               soft    nofile  1024

oracle               hard    nofile  65536

j. Add or edit the following line in the /etc/pam.d/login file, if it does not already exist:


session    required     /lib/security/pam_limits.so
note: should not add above, according to redhat KB, otherwise you cannot login from console
Why the system text console cannot be login? - Article ID: 52661

the following should be added to /etc/profile:

if [ $USER = "oracle" ]; then

        if [ $SHELL = "/bin/ksh" ]; then

              ulimit -p 16384

              ulimit -n 65536

        else

              ulimit -u 16384 -n 65536

        fi

fi


k. identify oracle base  and home directories
more /etc/oraInst.loc  - Inventory directory
more /etc/oratab - Oracle home directory

l. create oracle base and clusterware home directory
mkdir -p /u01/app/oracle/
chown -R oracle:oinstall /u01/app/oracle
chmod -R 775 /u01/app/oracle 
(/u01 is the mount point)

mkdir -p /u01/crs/oracle/product/10/crs
chown root:oinstall /u01/crs/
chmod -R 775 /u01/crs/oracle


if mount point is /u01/, then the recommended Oracle clusterware home directory is
/u01/crs/oracle/product/10.2.1/crs


m. Verifying Hangcheck-timer Module on Kernel 2.6

For Red Hat Linux 4.0 and SUSE 9 systems, to verify that the hangcheck-timer module is running on every node:

    Enter the following command on each node to determine which kernel modules are loaded:

    # /sbin/lsmod

    If the hangcheck-timer module is not listed for any node, then enter a command similar to the following to

start the module located in the directories of the current kernel version:

    # insmod /lib/modules/kernel_version/kernel/drivers/char/hangcheck-timer.ko  hangcheck_tick=1

hangcheck_margin=10

    In the preceding command example, the variable kernel_version is the kernel version running on your system.

    To confirm that the hangcheck module is loaded, enter the following command:

    # lsmod | grep hang

    The output should be similar to the following:

    hangcheck_timer         3289  0

    To ensure that the module is loaded every time the system restarts, verify that the local system startup file

contains the command shown in the previous step, or add it if necessary:

        Red Hat:

        On Red Hat Enterprise Linux systems, add the command to the /etc/rc.d/rc.local file.

        SUSE:

        On SUSE systems, add the command to the /etc/init.d/boot.local file.


configure .bash_profile
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_HOME/product/10.2.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.2.0/crs
export PATH=$PATH:$ORACLE_HOME/bin:$ORA_CRS_HOME/bin

2. clusterware installation

Please refer to http://oracleinstance.blogspot.com/2010/03/oracle-10g-installation-in-linux-5.html for complete

screenshot example
and also
http://space.itpub.net/21162451/viewspace-696413 - RHEL5.4+Oracle10gR2 RAC+OCFS2

2.1 os requirements

run as oracle for ./runInstaller to install 

Prerequisite Checks Fail When Installing 10.2 On Red Hat 5 (RHEL5)
Checking operating system version: must be redhat-3, SuSE-9, redhat-4, UnitedLinux-1.0, asianux-1 or asianux-2
                                      Failed <<<<

-> Prerequisite Checks Fail When Installing 10.2 On Red Hat 5 (RHEL5) [ID 456634.1]
If you are installing 10.2 from DVD, copy the <path>/database/install/oraparam.ini to a temporary directory (for

example, /tmp).

If you are installing 10.2 from an OTN download or have copied the 10.2 media to disk, take a backup of

<path>/database/install/oraparam.ini

Now edit oraparam.ini and change the appropriate line:

Original

[Certified Versions]
Linux=redhat-3,SuSE-9,redhat-4,UnitedLinux-1.0,asianux-1,asianux-2


New

[Certified Versions]
Linux=redhat-3,SuSE-9,redhat-4,UnitedLinux-1.0,asianux-1,asianux-2,redhat-5


note:
a. do not use ifconfig to configure vip before installing clusterware, just configure them in /etc/hosts
for example:

#cat /etc/hosts

127.0.0.1       localhost.localdomain localhost
192.168.1.100   jephe1.jephe.com jephe1
192.168.1.101   jephe2.jephe.com jephe2

172.16.1.100    jephe1-priv.jephe.com jephe1-priv
192.168.1.102   jephe1-vip.jephe.com jephe1-vip

172.16.1.101    jephe2-priv.jephe.com jephe2-priv
192.168.1.103   jephe2-vip.jephe.com jephe2-vip


OCR configration:
normal redundancy

specify OCR Location: /dev/raw/raw1
specify OCR Mirror Location: /dev/raw/raw2

When installing database software, you can choose 'configure ASM', then choose external for redundancy, then choose part of the raw disk for 'DATA' group, later we will run asmca to configure another disk group RECOVERY.

2.2 How To Setup UDEV Rules For RAC OCR And Voting Devices On SLES10, RHEL5, OEL5, OL5
- refer to http://www.held.org.il/blog/2007/11/setting-a-raw-device-in-redhatcentos-5/

configure /etc/sysconfig/rawdevices first, then service rawdevices restart to see those raw devices appear under /dev/raw/*

The following part are optional:
================
Add the required raw device ownership and permissions, for example:
a. Add to /etc/udev/rules.d/60-raw.rules:
ACTION==”add”, KERNEL==”sdb1″, RUN+=”/bin/raw /dev/raw/raw1 %N”
ACTION=="add", KERNEL=="sdc1", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sdd1", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="sde1", RUN+="/bin/raw /dev/raw/raw4 %N"
ACTION=="add", KERNEL=="sdf1", RUN+="/bin/raw /dev/raw/raw5 %N"

==================
b. To set permission (optional, but required for Oracle RAC!), create a new /etc/udev/rules.d/99-raw-perms.rules

containing lines such as:

    KERNEL==”raw[1-5]“, MODE=”0640″, GROUP=”oinstall”, OWNER=”oracle”

Notice this:

    The raw-perms.rules file name has to begin with the number 99, which defines its order during rules apply, so that it will be used after all other rules take place. Using lower numbers might cause permissions to be incorrect.
    The following permissions have to apply:

OCR Device(s): root:oinstall , mode 0640
Voting device(s): oracle:oinstall, mode 0666
need to test the following commands:
# /sbin/udevcontrol reload_rules
# /sbin/start_udev


Or to add the following to 50-udev.rules
KERNEL=="vcsa[0-9]*",        NAME="%k", OWNER="vcsa", GROUP="tty", OPTIONS="last_rule"
KERNEL=="vcc/*",        NAME="%k", OWNER="vcsa", GROUP="tty", OPTIONS="last_rule"
KERNEL=="raw[1-9]",        OWNER="oracle", GROUP="oinstall", MODE="0640"
KERNEL=="raw10",        OWNER="oracle", GROUP="oinstall", MODE="0640"


# memory devices
KERNEL=="random",        MODE="0666", OPTIONS="last_rule"
KERNEL=="urandom",        MODE="0444", OPTIONS="last_rule"
KERNEL=="mem",            GROUP="kmem", MODE="0640", OPTIONS="last_rule"



permission is very important, otherwise, you might get error like this:
 when run 'crs_stat -t', node2 is offline and when run 'ps -efH' on node2, you will find 'startcheck'  like this:

root      99  416  0 1:23 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     16  67  0 1:46 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     1055  418  0 2:47 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck


check /var/log/messages, you will find this:
Feb  5 12:30:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3178.
Feb  5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3353.
Feb  5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3193.
Feb  5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3178.
Feb  5 12:32:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3353.
Feb  5 12:32:18 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3193.


[root@node2 ~]# more /tmp/crsctl.3178
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]

check crs to confirm error:
/u01/oracle/product/crs/bin/crsctl check  crs
References:
http://surachartopun.com/2009/04/why-my-oracle-cluster-could-not-start.html

If one node is down, when running 'crs_stat -t', it will show this:

[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....en1.gsd application    ONLINE    OFFLINE              
ora....en1.ons application    ONLINE    OFFLINE              
ora....en1.vip application    ONLINE    ONLINE    node2   
ora....en2.gsd application    ONLINE    ONLINE    node2   
ora....en2.ons application    ONLINE    ONLINE    node2   
ora....en2.vip application    ONLINE    ONLINE    node2 

Finally, if you encounter errors like this:

an error occurred during the interview for this component. oracle clusterware unable to retrieve voting disk information

then you need to 'rm -fr /u01/app/oracle/*' on node1 , then start ./runInstaller again.

2.3. at the end of clusterware installation, you will be prompted to run 2 programs which need root access permission.

permission,follow by the sequences below:
first program on first node
first program on second node

/home/oracle/oraInventory/orainstRoot.sh
ssh jephe2 -l root
/home/oracle/oraInventory/orainstRoot.sh
exit
/home/oracle/oracle/product/10.2.0/crs/root.sh
ssh jephe2 -l root
/home/oracle/oracle/product/10.2.0/crs/root.sh

second program on first node
second program on second node
basically, you need to finish the number 1 script on all nodes first before going to the next script.
also, run it under GUI interface, aka, X windows in vnc on node2 as root user, not oracle user, otherwise vipca will fail, vipca will use GUI

When you run the second /u01/app/oracle/product/10.2.0/crs/root.sh, there might be error like this:
 [root@oratest1 oratest1]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Failed to upgrade Oracle Cluster Registry configuration
=>check log for detail: /u01/app/oracle/product/10.2.0/crs/log/oratest1/alertoratest1.log
see metalink Executing root.sh errors with "Failed To Upgrade Oracle Cluster Registry Configuration" [ID 466673.1]
 => solution is 
a) replace the biary according to above metalink
b) dd if=/dev/zero of=/dev/raw/raw1 bs=1024, (no need to finish, ctrl -c to cancel it)
run above root.sh again on node1


c. FAQ
c.1 some error at the end of running second root program on the second node:
Running vipca(silent) for configuring nodeapps
/home/oracle/oracle/product/10.2.0/crs/jdk/jre//bin/java: error while loading shared libraries: libpthread.so.0:

cannot open shared object file: No such file or directory

-> this is a bug, due to newer version of glibc(RHEL5) is incompatible with Java, need to modify vipca script as follows:

=> you can also modify it first before running root.sh on second node to avoid this error:

vi /home/oracle/oracle/product/10.2.0/crs/bin/vipca
change
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL  
     
to:              

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL                                          
unset LD_ASSUME_KERNEL


 #add this line to uncomment variable LD_ASSUME_KERNEL

if 'srvctl start XXX' command  reports same error, you can vi srvctl script to uncomment LD_ASSUME_KERNEL too, (which srvctl to find the path of the file)

Now, login xwindows on second node, run /home/oracle/oracle/product/10.2.0/crs/bin/vipca again to get errors below: (this error is unavoidable if you are using private ip range for public interface)

Error 0(Native: listNetInterfaces:[3])

[Error 0(Native: listNetInterfaces:[3])]


solution:

$ oifcfg iflist

eth0 192.168.1.0
eth1  172.16.1.0

$ oifcfg setif -global eth0/192.168.1.0:public
$ oifcfg setif -global eth1/172.16.1.0:cluster_interconnect
$ oifcfg getif


eth0 192.168.1.0  global  public
eth1 172.16.1.0  global  cluster_interconnect

# vipca  (run on the second node with root user login at X windows, don't run it with oracle user then su - as root)
just choose eth0 as vip interface, do not choose eth0:0, otherwise, after finishing installation, the vip will become eth0:0:1 for 192.168.1.102 for node1

You will be asked for entering ip alias name and ip address for both codegen1 and codegen2. Firstly, enter vip address for codegen1: 192.168.1.102, then the rest will come out automatically.
node1-vip.jephe.com and node2-vip.jephe.com are ip alias name.

after finishing configure vipca, back to CRS installation GUI, click ok to finish installation of clusterware

If environment variable
$ORA_CRS_HOME/cfgtoollogs/configToolFailedCommands.sh

If you got error: PRKR-1062 : Failed to find configuration for node codegen1,
then your /etc/hosts might be missing domain name such as
192.168.1.100 jephe1
not
192.168.1.100 jephe1.jephe.com jephe1

==============
cronjob for checking RAC configuration:

---------

#!/bin/sh
. /home/oracle/.bash_profile

DATE=`date +%Y%m%d`
TMPFILE=`mktemp`

echo "running ocrconfig -showbackup" > $TMPFILE
ocrconfig -showbackup >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running ocrcheck" >> $TMPFILE
ocrcheck >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "rnning olsnodes -n -p -i" >> $TMPFILE
olsnodes -n -p -i >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running crsctl query css votedisk" >> $TMPFILE
crsctl query css votedisk >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running crsctl check crs" >> $TMPFILE
crsctl check crs >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running oifcfg iflist -p -n" >> $TMPFILE
oifcfg iflist -p -n >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running oifcfg getif" >> $TMPFILE
oifcfg getif >> $TMPFILE 2>&1


echo "" >> $TMPFILE
echo "running srvctl config nodeapps -n oradb-01 -a -g -s -l" >> $TMPFILE
srvctl config nodeapps -n oradb-01 -a -g -s -l >> $TMPFILE 2>&1

echo "" >> $TMPFILE
echo "running srvctl config nodeapps -n oradb-02 -a -g -s -l" >> $TMPFILE
srvctl config nodeapps -n oradb-02 -a -g -s -l >> $TMPFILE 2>&1



echo "" >> $TMPFILE
echo "cat /etc/oracle/ocr.loc" >> $TMPFILE
cat /etc/oracle/ocr.loc >> $TMPFILE

echo "" >> $TMPFILE
echo "crs_stat -t -v" >> $TMPFILE
crs_stat -t -v >> $TMPFILE


tar cpzf /tmp/crs.tar.gz /u01/crs/oracle/product/10.2.0/crs/cdata/crs

mutt -a /tmp/crs.tar.gz -s "crs/ocr status and backup on $DATE"  jwu@domain.com < $TMPFILE

rm -f $TMPFILE
---------------------


Part II: openfiler iscsi:

How to Dynamically Add and Remove SCSI Devices on Linux [ID 603868.1]


iscsiadm -m discovery -t sendtargets -p 192.168.1.5
 chkconfig iscsid on
chkconfig iscsi on

service iscsi resart
cd /var/lib/iscsi;ls


Reference: http://www.cyberciti.biz/tips/rhel-centos-fedora-linux-iscsi-howto.html



yum install lsscsi

lsscsi
cat /proc/scsi/scsi
grep host /etc/modprobe.conf
ls -ld /sys/class/scsi_host/host*
dmsetup ls | sort
multipath -d -ll
raw -qa
ls -l /dev/mapper/
ls -l /dev/raw/
ocrcheck
crsctl query css votedisk
crsctl check crs
ocrconfig -showbackup
cat /etc/oracle/ocr.loc

where is the setting for RAC service preference nodes?

nodes applications contain listener, gsnd etc?


Firstly, Oracle checks /etc/oracle/ocr.loc to find out where is the ocr (raw1 and raw2), then from ocr, to find out where are the voting disk devices.

[root@codegen1 bin]# lsscsi
[0:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda
[2:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0
[3:0:0:0]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdb
[3:0:0:1]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdc
[3:0:0:2]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdd
[3:0:0:3]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sde
[3:0:0:4]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdf
[3:0:0:5]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdg
[3:0:0:6]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdh
[3:0:0:7]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdi
[3:0:0:8]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdj
[3:0:0:9]    disk    OPNFILER VIRTUAL-DISK     0     /dev/sdk

5.1. Resizing an Online Multipath Device
If you need to resize an online multipath device, use the following procedure.

    Resize your physical device.
    Use the following command to find the paths to the LUN:

    # multipath -l
[ -ll] [-d -ll]

    Resize your paths. For SCSI devices, writing a 1 to the rescan file for the device causes the SCSI driver to rescan, as in the following command:

    # echo 1 > /sys/block/device_name/device/rescan

    Resize your multipath device by running the multipathd resize command:

    # multipathd -k'resize map mpath0'

    Resize the filesystem (assuming no LVM or DOS partitions are used):

    # resize2fs /dev/mapper/mpath0

For further information on resizing an online LUN, see the Online Storage Reconfiguration Guide.

echo "scsi add-single-device 3 0 0 0" > /proc/scsi/scsi


Reference:
1. http://space.itpub.net/21162451/viewspace-696413 - RHEL5.4+Oracle10gR2 RAC+OCFS2
2. http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/10g/r2/rac_r2_work/02_01_crs_install/02_01_crs_inst

all_viewlet_swf.html - Install 10g clusterware livedemo from Oracle
3. http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/11g/r1/clusterware/installation_of_oracle_clusterware/installation_of_oracle_clusterware_viewlet_swf.html - Install clusterware on Oracle 11gR1
4. install oracle 11gR2 database - http://st-curriculum.oracle.com/obe/db/11g/r2/2day_dba/install/install.htm
5. http://oracleinstance.blogspot.com/2010/03/oracle-10g-installation-in-linux-5.html

How to check or disable SELinux

Jephe Wu  - http://linuxtechres.blogspot.com

Environment: CentOS 5 or 6
Objective: check selinux status

Methods:
1. check selinux status:
getenforce 
or
selinuxenabled;echo $?  (return 1 means disabled, 0 means enabled, man selinuxenabled)
note: use setenforce  to switch between permissive or enforce mode runtime.

Accoding to Redhat knowledge base:  Only if current selinux status is not disabled. you can use setenforce to set Current Enforcing Mode which can be switched between "Enforcing" and "Permissive". This setting is effective immediately but is not persistent - the system will revert to the "System Default Enforcing Mode" setting when it is restarted.

2.  how to set selinux permanently
a. GUI tool: system-config-securitylevel
b. vi /etc/selinux/config or /etc/sysconfig/selinux , change SELINUX=Disabled
c. vi grub.conf,

  kernel /vmlinuz-xxxx ro root=/dev/VolGroup00/LogVol00 rhgb quiet selinux=0
d. at install time, type 'linux selinux=0' to boot prompt. 
 

Set up mysql proxy with mmm2 and mysql master-slave replication environment for zabbix read/write splitting

Jephe Wu - http://linuxtechres.blogspot.com

Environment
: 2 mysql database configured with master-slave replication. mysql-proxy 0.8.1 from EPEL running on CentOS 6 x86_64, as well as mmm_monitor. Zabbix server connecting to mysql-proxy ip address as database. separated web server for Zabbix running on another machine connecting to mysql-proxy ip address


Diagram:

 Zabbix Web -> mysql-proxy 0.8.1/mmm2  (CentOS 6 64bit)
 Zabbix Server -> mysql-proxy 0.8.1/mmm2 -> writer mysql db
                                                                        -> reader mysql db


Objective
: Make Zabbix to send non-transactional queries to reader mysql db only, and DML to writer mysql db, in short, read/write splitting.

Steps
: We are only talking about mysql-proxy configuration part in this article, you can refer other blogs on this website for Zabbix and master-slave database replication.

1. use mysql-proxy from EPEL on CentOS 6 64bit
configuring EPEL first, then install mysql-proxy

2. use read-write splitting script from 0.8.2 alpha source code instead of tutorial-keep-alive.lua from 0.8.1 yum repository which is not working.

3. configuration file:

[root@mysql-proxy ~]# more /etc/sysconfig/mysql-proxy
# Options for mysql-proxy
#LUA_PATH=/usr/share/doc/mysql-proxy-0.8.1/examples/?.lua
ADMIN_USER="admin"
ADMIN_PASSWORD="password"
ADMIN_LUA_SCRIPT="/usr/lib64/mysql-proxy/lua/admin.lua"
PROXY_USER="mysql-proxy"
PROXY_OPTIONS="--daemon --keepalive  --proxy-backend-addresses=192.168.12.233:3306 --proxy-read-only-backend-addresses=192.168.12.234:3306 --proxy-address=:3306 --proxy-lua-script=/usr/share/doc/mysql-proxy-0.8.1/examples/rw-splitting-got-from-0.8.2alpha-source-tar-file.lua --log-level=message --log-file=/var/log/mysql-proxy.log"



netstat -ntpl  , you will find mysql proxy admin port listening at 4041, you can connect to it from another host to admin port:
mysql -uadmin -ppassword -P4041 -h mysql-proxy-host
===============
or with the following configuration:
root@mon01:/etc/mysql-proxy/ # more /etc/sysconfig/mysql-proxy
# Options to mysql-proxy
# do not remove --daemon
LUA_PATH="/srv/mysql-proxy/lib/?.lua"
PROXY_OPTIONS="--daemon --defaults-file=/etc/mysql-proxy/my.cnf --keepalive"

---------
root@mon01:/etc/mysql-proxy/ # more my.cnf
[mysql-proxy]
admin-username=root
admin-password=password
proxy-address=0.0.0.0:3306

proxy-read-only-backend-addresses=172.17.1.6:3306
proxy-read-only-backend-addresses=172.17.1.7:3306
proxy-read-only-backend-addresses=172.17.1.8:3306
proxy-backend-addresses=172.17.1.5:3306

proxy-pool-no-change-user=true
proxy-skip-profiling=true

log-file=/srv/mysql-proxy/log/mysql-proxy.log
#log-level=warning
#log-level=message
#log-level=debug
user=mysql
admin-lua-script=/srv/mysql-proxy/lib/mysql-proxy/lua/admin.lua
pid-file=/srv/mysql-proxy/run/mysql-proxy.pid
proxy-lua-script=/srv/mysql-proxy/share/doc/mysql-proxy/rw-splitting.lua  (this rw-splitting.lua is from mysql-proxy 0.8.2 tar file)


4. /etc/init.d/mysql-proxy restart

5. monitor slave mysql db logfile -> tail -f /var/lib/mysql/mysql_query.log (see /etc/my.cnf)

6. you can also see some DML statements on slave database, that's because master-slave replication, slave gets syncing with master operations.

7. once you finished configuring above, when clicking zabbix web monitoring - dashboard, most of select statements will go to slave database.


FAQ:

1.  Zabbix web interface gives error "Lock wait timeout exceeded; try restarting transaction"
=> Uncommen innodb_lock_wait_timeout = 50, change it to 500 seconds. restart mysql

2. Error in query: Deadlock found when trying to get lock; try restarting transaction
=> Just try again and refer to http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html  - how to cope with deadlock

3. For read/write splitting, tutorial-keep-alive.lua which comes with EPEL mysql-proxy rpm doesn't work.

4. how to check if rw-spliting is working?
while true ;do mysql -uroot -ppassword -e 'show processlist' ;sleep 1;done
to check if you  can find any select statement.




References:

1. Mysql master-slave and mysql-proxy read/write splitting - http://www.linuxso.com/linuxrumen/10236.html
2. http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html - how to cope with Deadlocks

Commands:
1.  show slave status\G
2.  show create table history;  to display DDL statement for table creation
3. show table status
4. show processlist;
5. show full processlist; to find out slow queries.


==============
Use mysql proxy to balance load between  2 readonly mysql database.


root@mon02:/etc/mysql-proxy/ # more my.cnf
[mysql-proxy]
admin-username=root
admin-password=password
proxy-address=0.0.0.0:3306
#proxy-read-only-backend-addresses=172.16.2.1:3306
#proxy-read-only-backend-addresses=172.16.2.2:3306
proxy-backend-addresses=172.16.2.2:3306,172.16.2.2:3306
log-file=/srv/mysql-proxy/log/mysql-proxy.log
log-level=message
user=mysql
admin-lua-script=/srv/mysql-proxy/lib/mysql-proxy/lua/admin.lua
pid-file=/srv/mysql-proxy/run/mysql-proxy.pid


root@mon02:/etc/mysql-proxy/ # more /etc/sysconfig/mysql-proxy
# Options to mysql-proxy
# do not remove --daemon
PROXY_OPTIONS="--daemon --defaults-file=/etc/mysql-proxy/my.cnf --keepalive"

Note: if you separate proxy-backend-addresse to two lines, it might not work, it will only add one server as backend, check log file /srv/mysql-proxy/log/mysql-proxy.log to confirm.

Install PSP on CentOS 6 x86_64

Jephe Wu - http://linuxtechres.blogspot.com

Environment: HP ProLiant BL460c G6, installed CentOS 6 x86_64
Objective: install proliant support package (PSP)


Steps:

Note: do not use psp for CentOS 6 one, it's older version and it might not work. Use Redhat Enterprise Linux 6 instead.

1. change /etc/centos-release

cp /etc/centos-release /etc/centos-release.orig -va
vi /etc/centos-release as follows:
Change CentOS to
Red Hat Enterprise Linux Server release 6.0 (Final)


otherwise, some psp packages will fail to install and hpsmh will fail to install as well , it will report zlib version 1.1.4 or above is missing.

2. install hp-snmp-agents first from command line, otherwise, smhd installation will fail from ./hpsum (software update manager)


[root@db03 psp]# yum localinstall hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm
Loaded plugins: fastestmirror
Setting up Local Package Process
Examining hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm: hp-snmp-agents-8.7.0.23-17.x86_64
Marking hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm to be installed
Loading mirror speeds from cached hostfile
 * base: mirror.optus.net
 * epel: mirror.optus.net
 * extras: mirror.optus.net
 * updates: mirror.optus.net
Resolving Dependencies
--> Running transaction check
---> Package hp-snmp-agents.x86_64 0:8.7.0.23-17 set to be updated
--> Processing Dependency: hp-health for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libcpqci.so.2 for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libhpasmintrfc.so.3 for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libhpev.so.1 for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libcpqci64.so.2()(64bit) for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Processing Dependency: libhpasmintrfc64.so.3()(64bit) for package: hp-snmp-agents-8.7.0.23-17.x86_64
--> Finished Dependency Resolution
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libhpasmintrfc64.so.3()(64bit)
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libcpqci64.so.2()(64bit)
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libhpev.so.1
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: hp-health
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libhpasmintrfc.so.3
Error: Package: hp-snmp-agents-8.7.0.23-17.x86_64 (/hp-snmp-agents-8.7.0.23-17.rhel6.x86_64)
           Requires: libcpqci.so.2
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest


[root@db03 psp]# yum localinstall hp-health-8.7.0.22-17.rhel6.x86_64.rpm  --nogpgcheck
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.optus.net
 * epel: mirror.optus.net
 * extras: mirror.optus.net
 * updates: mirror.optus.net
Setting up Install Process
Examining hp-health-8.7.0.22-17.rhel6.x86_64.rpm: hp-health-8.7.0.22-17.x86_64
Marking hp-health-8.7.0.22-17.rhel6.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package hp-health.x86_64 0:8.7.0.22-17 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================

==========================================================================================================================
 Package                                                    Arch                                                    Version                         

                              Repository                                                                            Size
====================================================================================================================================================

==========================================================================================================================
Installing:
 hp-health                                                  x86_64                                                  8.7.0.22-17                     

                              /hp-health-8.7.0.22-17.rhel6.x86_64                                                  1.6 M

Transaction Summary
====================================================================================================================================================

==========================================================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total size: 1.6 M
Installed size: 1.6 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : hp-health-8.7.0.22-17.x86_64                                                                                                     

                                                                                                                     1/1
Please read the Licence Agreement for this software at

         /opt/hp/hp-health/hp-health.license

By not removing this package, you are accepting the terms
of the "HP Proliant Essentials Software End User License Agreement".
==============================================================================
NOTE: To activate the software contained in this package, you must type:
      /etc/init.d/hp-health start
      /etc/init.d/hp-asrd start
as 'root' user.
==============================================================================
The hp-health RPM has installed successfully.

Installed:
  hp-health.x86_64 0:8.7.0.22-17                                                                                                                    
Complete!

[root@db03 psp]# yum localinstall hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm --nogpgcheck
Loaded plugins: fastestmirror
Setting up Local Package Process
Examining hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm: hp-snmp-agents-8.7.0.23-17.x86_64
Marking hp-snmp-agents-8.7.0.23-17.rhel6.x86_64.rpm to be installed
Loading mirror speeds from cached hostfile
 * base: mirror.optus.net
 * epel: mirror.optus.net
 * extras: mirror.optus.net
 * updates: mirror.optus.net
Resolving Dependencies
--> Running transaction check
---> Package hp-snmp-agents.x86_64 0:8.7.0.23-17 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================

==========================================================================================================================
 Package                                                       Arch                                                  Version                        

                           Repository                                                                               Size
====================================================================================================================================================

==========================================================================================================================
Installing:
 hp-snmp-agents                                                x86_64                                                8.7.0.23-17                    

                           /hp-snmp-agents-8.7.0.23-17.rhel6.x86_64                                                5.9 M

Transaction Summary
====================================================================================================================================================

==========================================================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total size: 5.9 M
Installed size: 5.9 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : hp-snmp-agents-8.7.0.23-17.x86_64                                                                                                

                                                                                                                     1/1
Please read the Licence Agreement for this software at

         /opt/hp/hp-snmp-agents/hp-snmp-agents.license

By not removing this package, you are accepting the terms
of the "HP Proliant Essentials Software End User License Agreement".
Installing /opt/hp/hp-snmp-agents/nic/etc/HPcmanic.pp SELinux policy module
==============================================================================
NOTE: In order to activate the software contained in this package, you must
      type '/sbin/hpsnmpconfig' as 'root' user.
      Once configuration is completed start the agents by typing
      /etc/init.d/hp-snmp-agents start
==============================================================================

Installed:
  hp-snmp-agents.x86_64 0:8.7.0.23-17                                                                                                               

                                                                                                                        
3. run /sbin/hpsnmpconfig to configure /etc/snmp/snmpd.conf
configure it to add the following part on the top of the /etc/snmp/snmpd.conf

dlmod cmaX /usr/lib64/libcmaX64.so
rwcommunity public 127.0.0.1
trapcommunity public
rocommunity public 127.0.0.1
trapsink 192.168.0.1 public
syscontact Jephe Wu <jwu@domain.com>
syslocation someplace



4. install tigervnc* and xterm for GUI installation environment for hpsum

yum install tigervnc* xterm

5. install psp by hpsum
cd /root/
download psp for RHEL6 x86_64
mkdir psp
cd psp
tar xvpzf ../psp-xxxxxx.tar.gz

vncserver
./hpsum

6. hpsum installation

a. check only 'software update'
b. do not rebuild kernel module for certain NIC
c. install them

7. check and reboot
psp create a new set of initrd file as initramfs blabla

8. commands and references:
yum localinstall *.rpm
yum whatprovides filename


9. modify  /opt/hp/hp-snmp-agents/cma.conf 
change 
trapemail /bin/mail -s 'db01 HP Insight Management Agents Trap Alarm' root
to 
trapemail /bin/mail -s 'db1 zdb01 HP Insight Management Agents Trap Alarm' root

Set up the third mysql server to be slave of the running master-master mysql database

Jephe Wu - http://linuxtechres.blogspot.com

Environment: master(db1)-master(db2) mysql 5.5 database (replicate each other) running Oracle Linux 5.6 64bit
Objective: setup a third slave only database(db3, mysql 5.1 on CentOS 6) to sync with one of the master(db2) which the zabbix application is connecting to


Concept:

a. stop slave on db1 and db2 (mysql -uroot -ppass; sql> stop salve)
b. use mysqldump to make snapshot on db1
c. import data into db3
d. make db3 to sync with db2


Steps:

1. stop slave on both db1 and db2
mysql -uroot -ppass
sql> stop slave;
sql> show slave status\G


3. setup db3 on CentOS 6 64bit, use /srv/mysql/data as data folder

sh /usr/bin/mysql_install_db --user=mysql --datadir=/srv/mysql/data

copy back /etc/my.cnf mysqld part from db2 or db1

/etc/init.d/mysqld restart
chkconfig mysqld on
chkconfig --list mysqld


Note:
[root@db03 mysql]# tail -f /srv/mysql/data/db03.domainnameofserver.err
/usr/libexec/mysqld: Can't create/write to file '/tmpfs/ibJw77ig' (Errcode: 2)
111207 14:19:01  InnoDB: Error: unable to create temporary file; errno: 2
111207 14:19:01 [ERROR] Plugin 'InnoDB' init function returned error.
111207 14:19:01 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
111207 14:19:01 [ERROR] Unknown/unsupported table type: innodb
111207 14:19:01 [ERROR] Aborting

111207 14:19:01 [Note] /usr/libexec/mysqld: Shutdown complete

111207 14:19:01 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended


solution=> change /etc/my.cnf /tmpfs to /dev/shm (according to df -h)

4. use mysqldump to make snapshot

on db01,do:
stop slave;
show slave status\G

note: record down about output for master_log_file and position so that db3 will use it to sync with db2 again after importing data

mysqldump -u root -ppass zabbix [--lock-all-tables] | gzip -c | ssh db03 'cat > /srv/mysql/mysqldump_on_db01.gz'


please refer to online doc at http://dev.mysql.com/doc/refman/5.0/en/replication-howto-mysqldump.html

5. import data into db3
zcat mysqldump_on_db01.gz | mysql -uroot -ppass zabbix

6. configuring db3 as slave for db2
/etc/init.d/mysqld start
sql> show slave status\G
empty record
sql> change master to master_host='192.168.0.1', master_user='replication', master_password='password',master_log_file='mysql-bin.001080',master_log_pos=50862743;
sql> show slave status\G


7. FAQ
After downgrading to 5.1 and make mysql 5.1 as master database,  if you try to create partition, you might encounter the following error:
ERROR 1548 (HY000) at line 1: Cannot load from mysql.proc. The table is probably corrupted


Solution: on mysql 5.1, run "mysql_upgrade -uroot -ppassword", please refer to http://dev.mysql.com/doc/refman/5.1/en/mysql-upgrade.html

8. References

a. database partitioning:http://www.slideshare.net/datacharmer/partitions-performance-with-mysql-51-and-55
b. mysql 5.5 online manual for replication at http://dev.mysql.com/doc/refman/5.5/en/replication.html

Set up bonding in RHEL6

Jephe Wu - http://linuxtechres.blogspot.com

Environment: CentOS 6 x86_64 on HP Blade Proliant BL460c G6, 2 builtin broadcom NIC and a NC364m Quad Port 1Gb NIC for c-class BladeSystem
Objective: after using builtin NIC to install OS , then configure bonding for 2 of quad port NIC.

Steps:

1. OS installation
You can download http://mirror.optus.net/centos/6/isos/x86_64/CentOS-6.0-x86_64-netinstall.iso and install OS through http URL from Internet.

2. bonding configuration

Firstly, disable network manager if NM is running.
service NetworkManager stop
chkconfig NetworkManager off


a. /etc/udev/rules.d/70-persistent-net.ruels

make sure it contains all interfaces you have in the system and don't configure bond0 inside.

b. /etc/sysconfig/network-scripts/

configuring ifcfg-bond0 as follows: (assuming ip address is 192.168.0.1)
       
[root@db03 network-scripts]# more ifcfg-bond0
DEVICE="bond0"
BROADCAST="192.168.0.255"
GATEWAY="192.168.0.254"
IPADDR="192.168.0.1"
NETMASK="255.255.255.0"
ONBOOT="yes"
USERCTL=no
BONDING_OPTS="mode=0 miimon=100 downdelay=300 updelay=300 max_bonds=4"


note: you must use capital letter for left side variables, especially BONDING_OPTS, otherwise, you might not be able to see slave devices from command output of 'ifconfig'.
Do not configure BONDING_OPTS in /etc/modprobe.d/bonding.conf as suggested by Redhat.

c. /etc/sysconfig/network-scripts/ifcfg-ethX

[root@db03 network-scripts]# more ifcfg-eth*
::::::::::::::
ifcfg-eth0
::::::::::::::
DEVICE=eth0
BOOTPROTO=none
MASTER=bond0
HWADDR=00:01:02:03:04:00
ONBOOT=yes
SLAVE=yes
USERCTL=no
::::::::::::::
ifcfg-eth1
::::::::::::::
DEVICE=eth1
BOOTPROTO=none
HWADDR=00:01:02:03:04:01
ONBOOT=no
USERCTL=no
::::::::::::::
ifcfg-eth2
::::::::::::::
DEVICE=eth2
BOOTPROTO=none
MASTER=bond0
HWADDR=00:01:02:03:04:02
ONBOOT=yes
SLAVE=yes
USERCTL=no
::::::::::::::
ifcfg-eth3
::::::::::::::
DEVICE=eth3
BOOTPROTO=none
HWADDR=00:01:02:03:04:03
ONBOOT=no
USERCTL=no
::::::::::::::
ifcfg-eth4
::::::::::::::
DEVICE="eth4"
HWADDR="00:00:00:00:00:04"
NM_CONTROLLED="no"
ONBOOT="no"
::::::::::::::
ifcfg-eth5
::::::::::::::
DEVICE="eth5"
HWADDR="00:00:00:00:00:05"
NM_CONTROLLED="no"
ONBOOT="no"

note: we connected network cables for eth0 and eth2 only, so only use these 2 ports for bonding.

d. /etc/modprobe/bonding.conf
[root@db03 network-scripts]# more /etc/modprobe.d/bonding.conf
alias eth0 e1000e
alias eth1 e1000e
alias eth2 e1000e
alias eth3 e1000e
alias eth4 bnx2x
alias eth5 bnx2x
alias scsi_hostadapter cciss
alias scsi_hostadapter1 lpfc
alias bond0 bonding


3. Testing and debug
a. more /proc/net/bonding/bond0
b. ethtool eth0



4. References
a. RHEL6 deployment guide
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-networkscripts-interfaces-chan.html

b. Redhat knowledge base DOC-48159 for 'How do I configure the bonding device on Red Hat Enteprise Linux 6" or search bonding in access.redhat.com knowledge base.

Clone Oracle virtual machine to physical server

Jephe Wu - http://linuxtechres.blogspot.com

Environment: Oracle virtual machine image file .img (shutdown VM to make image), use hda as hard disk only, no any scsi driver in initrd image file. Oracle Enterprise Linux 4.5, grub-0.95-3.8.0.1 rpm which is the latest in OL4.9
Objective: Clone it to a dell PowerEdge 1950 (megaraid scsi driver)


Steps:

1.  boot from RIP CD(Recovery Is Possible) non-X version, 64bit kernel

login as root without password

vi /etc/hosts.deny to comment out the last line to allow remote ssh

vi /etc/ssh/sshd_config to allow direct root login

dhcpcd eth0 or use ifconfig to manually set up ip address

/usr/sbin/sshd

passwd root

2. transfer System.img which is Oracle VM image file over somewhere on the network which you can mount it

# file System.img

# mkdir iso

# mount System.img iso -o offset=$((512*63))

=========some examples for using offset to mount Oracle VM system image.

[root@oravmdr01 xen]# file dr2.img
dr2.img: x86 boot sector; partition 1: ID=0x83, active, starthead 0, startsector 2048, 208896 sectors; partition 2: ID=0x83, starthead 0, startsector 210944, 24954880 sectors, code offset 0x48

noet: first partition is /boot, the second one is /.

[root@oravmdr01 xen]# mount dr2.img -o offset=$((512*2048)) /tmp/a
[root@oravmdr01 xen]# ls /tmp/a
config-2.6.18-308.20.1.0.1.el5xen  grub                                   initrd-2.6.39-300.17.2.el5uek.img      symvers-2.6.32-300.21.1.el5uek.gz      System.map-2.6.32-300.21.1.el5uek   vmlinuz-2.6.32-300.21.1.el5uek  xen-syms-2.6.18-308.20.1.0.1.el5
config-2.6.18-308.4.1.0.1.el5xen   initrd-2.6.18-308.20.1.0.1.el5xen.img  lost+found                             symvers-2.6.39-300.17.2.el5uek.gz      System.map-2.6.39-300.17.2.el5uek   vmlinuz-2.6.39-300.17.2.el5uek  xen-syms-2.6.18-308.4.1.0.1.el5
config-2.6.32-300.21.1.el5uek      initrd-2.6.18-308.4.1.0.1.el5xen.img   symvers-2.6.18-308.20.1.0.1.el5xen.gz  System.map-2.6.18-308.20.1.0.1.el5xen  vmlinuz-2.6.18-308.20.1.0.1.el5xen  xen.gz-2.6.18-308.20.1.0.1.el5
config-2.6.39-300.17.2.el5uek      initrd-2.6.32-300.21.1.el5uek.img      symvers-2.6.18-308.4.1.0.1.el5xen.gz   System.map-2.6.18-308.4.1.0.1.el5xen   vmlinuz-2.6.18-308.4.1.0.1.el5xen   xen.gz-2.6.18-308.4.1.0.1.el5
[root@oravmdr01 xen]# umount /tmp/a

[root@oravmdr01 xen]# mount dr2.img -o offset=$((512*210944)) /tmp/a
[root@oravmdr01 xen]# ls /tmp/a
bin  boot  dev  etc  home  lib  lib64  lost+found  media  misc  mnt  net  opt  proc  root  sbin  selinux  srv  stage  sys  tftpboot  tmp  u01  usr  var
============================


3. transfer file system over physical machine

make partition for the physical server hard disk - sda(2x72G raid1)
fdisk /dev/sda
cd /mnt
mkdir sda
mount /dev/sda1 sda
cd sda
ssh remoteserver 'cd /path/to/iso; tar cvpf - . | tar xvpf -'


4. generate mkinitrd and modify some configuration

initrd:

vi /etc/modprobe.conf

alias scsi_hostadapter megaraid_sas
alias scsi_hostadapter1 ata_piix


cd /boot

mkinitrd -f /boot/initrd.img.new 2.6.9xxxxsmp (check folder name under /lib/modules)

vi /boot/grub/grub.conf to modify accordingly to use above generated initrd.img.new

configurations:

cd /mnt/sda
chroot .
modify /etc/sysconfig/network-scritps/ifcfg-eth0 and ifcfg-eth1 if necessary
modify /etc/sysconfig/network
modify /etc/fstab
modify /etc/grub.conf
cp /etc/blkid /etc/blkid.bak
rm -f /etc/blkid/*
cp /etc/mtab /etc/mtab.bak
vi /etc/mtab (to correct the / and /boot line to have the correct partition name)
vi /boot/grub/device.map to something like '(hd0)  /dev/sda'
cd /dev
./MAKEDEV sda


After that, create root, sys and proc directory for cloned virtual server.

cd /mnt/sda
mkdir proc sys root


5. generate MBR

(install it again everytime you modify file /etc/grub.conf)

# grub-install hd0

Or after reboot, at grub prompt:\

root (hd0,0)  -> hd0 means the first hard disk, 0 means /boot directory is sitting on the first partition, if /dev/sda3 is /boot partition, this should be 'root (hd0,2)'

find /grub/stage1
find /grub/stage2
setup (hd0)

reboot

Note: if you cannot use grub-install to install MBR for grub boot loader, you can use RIP CD to boot GRUB from cdrom or usb option to boot the OS on the first hard disk.

then you can fix it from actual os itself.

------
/etc/grub.conf
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-100.26.2.el5uek)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-100.26.2.el5 ro root=/dev/sda3
    initrd /initrd-2.6.32-100.26.2.el5.img 


note:
a. splashimage=(hd0,0)/grub/splash.xpm.gz
means /boot partition is the first partition on first hard disk hd0, it's /grub, not /boot/grub
b. kernel /vmlinuz-2.6.32-100.26.2.el5 ro root=/dev/sda3
means /vmlinuz-2.6.32-100.26.2.el5, not /boot/vmlinuz-2.6.32-100.26.2.el5 since it's already specified root (hd0,0)
c. initrd /initrd-2.6.32-100.26.2.el5.img
means /initrd-2.6.32-100.26.2.el5.img, not /boot/initrd-2.6.32-100.26.2.el5.img if /boot is a separated partition

6. case study

/boot is not separated partition, after clone, it becomes seperated one, you need to change grub configuration as follows:

previous: (no /boot partition, only / one partition)
#boot=/dev/hda
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-100.26.2.el5)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.32-100.26.2.el5 ro root=LABEL=/
        initrd /boot/initrd-2.6.32-100.26.2.el5.img

after: (/boot is at /dev/sda3)
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,2)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-100.26.2.el5)
        root (hd0,2)
        kernel /vmlinuz-2.6.32-100.26.2.el5 ro root=/dev/sda1
        initrd /initrd-2.6.32-100.26.2.el5.img.new


7. possible solution
a. error: /boot/grub/stage1 not read correctly when I tried to install GRUB by grub-install /dev/sdc to the new drive.


Grub consults your /etc/fstab and /etc/mtab files to determine which partition/drive the '/boot' directory is actually on, so check those files are correct. (See the next section for more details about this.)

solution 1: to create a separated /boot partition after boot from RIP CD boot loader, then modify /etc/grub.conf and install grub again:

grub-install /dev/sda

or

find /boot/grub/stage1
root (hd0,0)
find /boot/grub/stage1
setup hd0

solution 2:
Problem might be that the old legacy GRUB can not handle GPT partition tables. You can simply fix this with the tool gptsync:

gptsync /dev/sdc ( not tested)

How to make mysql slave db to sync with master again

Jephe Wu - http://linuxtechres.blogspot.com

Environment: mysql 5.5, CentOS 5.6 64bit for both master(192.168.0.1) and slave(192.168.0.2)
Objective: to make slave db to sync with master again

You can use manual method as stated on http://mysql-mmm.org/mmm2:guide for configuring mysql replication.


Steps:

1. [slave] stop slave mysql database
/etc/init.d/mysql stop
rm -fr /srv/mysql/data /srv/mysql/log


2. [master]
use mylvmbackup(http://www.lenzg.net/mylvmbackup/) to create snapshot for LVM volume where mysql db data and log file resides.
modify /etc/mylvmbackup.conf to keep_snapshot=1 and keep_mount=1

# copy mounted snapshot data and log files to slave database
scp -pr /snap/backup/data /snap/log slavedb:/srv/mysql/

# record down the master log file and position
root@db02:/srv/mysql-db2-snap/backup-pos/ # more backup-20111129_110147_mysql.pos
Master:File=mysql-bin.000046
Master:Position=25449360

Note: log folder on master database is not useful if there's no slave database which will use it, you can rm -fr log/* after shutting down master database, then make code backup for data folder only.

3. [slave]
cd /srv/mysql
chown mysql:mysql -R data log
/etc/init.d/mysql start
mysql -uroot -ppassword
mysql> show slave status\G

mysql> stop slave;
mysql> change master to MASTER_HOST='192.168.0.1', MASTER_USER='slave1_user', MASTER_PASSWORD='slave1_password', MASTER_LOG_FILE='mysql-bin.000046', MASTER_LOG_POS=25449360;

or after checking 'show slave status\G', the master host is correct, you can just use:
mysql> change master to master_log_file='mysql-bin.000046', MASTER_LOG_POS=25449360;

mysql> start slave;
mysql> show slave status;  make sure it shows:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

monitoring  Seconds_Behind_Master: 7770 column to make sure it's reducing until 0 which means it finishs syncing with master again.

4. [slave] make master to sync with slave too
on slavedb, run
mysql> show master status\G;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000055 | 17584079 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

5. [master] make it sync with slave
mysql> stop slave;
if not, maybe
mysql> reset slave;
mysql> change master to master_log_file='mysql-bin.000055', MASTER_LOG_POS=17584079;
mysql> start slave;
mysql> show slave status\G

note: you have to run 'show master status' first on slavedb, then record down master_log_file and master_log_pos, then use it on master db to make it sync with slave also.

6. [master] use pt-table-checksum to compare 2 databases
show tables status;
create database pt;
pt-table-checksum -uroot -ppassword  --create-replicate --replicate pt.checksums --databases zabbix --empty-replicate-table --chunk-size=5000 localhost

pt-table-checksum -uroot -ppassword  --create-replicate --replicate pt.checksums --databases zabbix --empty-replicate-table --chunk-size=5000 localhost --replicate-check=1

echo $?
0
----------------------------
or skip some huge tables
----------------------
pt-table-checksum -uroot -ppassword  --create-replicate --replicate pt.checksums --databases zabbix --empty-replicate-table --chunk-size=30000 --chunk-size-limit 50  localhost  --ignore-tables events,history,history_uint,trends,trends_uint --replicate-check 1

note:use phpmyadmin or show table status to get rough size of tables which might be skipped for pt-table-checksum.
 --------------------------
If it's different between master and slaves, it will show by paramter --replicate-check 1

pt-table-checksum -uroot -ppassword  --create-replicate --replicate pt.checksums --databases mysql --empty-replicate-table --chunk-size=30000 --chunk-size-limit 50  localhost --replicate-check 1
Differences on P=3306,h=db03
DB    TBL           CHUNK CNT_DIFF CRC_DIFF BOUNDARIES
mysql db                0        1        1 1=1
mysql help_keyword      0       -1        1 1=1
mysql help_relation     0        1        1 1=1
mysql help_topic        0        1        1 1=1
mysql proc              0        0        1 1=1

echo $?
1
------------------
You can make mysql database itself sync between master and slaves by the following method:

on master:
mysqldump --add-drop-table mysql -uroot -ppassword | ssh db03 'mysql -uroot -ppassword mysql'

mysql -uroot -ppassword -h db03
sql> flush privileges;

FAQ:
root@db02:/etc/cron.d/ # pt-table-checksum -uroot -ppassword  --create-replicate --replicate pt.checksums --databases zabbix --empty-replicate-table --chunk-size=5000 localhost
You do not have the PROCESS privilege at /usr/bin/pt-table-checksum line 3761.
root@db02:/etc/cron.d/ # mysql -uroot -ppassword -h slavedb zabbix
ERROR 1044 (42000): Access denied for user 'root'@'192.168.7.2' to database 'zabbix'

[root@db03 sysconfig]# mysql -uroot -ppassword
mysql> grant all on zabbix.* to root@'192.168.7.22' identified by 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql>  exit

 try again.
or

The following method will definitely work when using pt-table-checksum:

mysql> grant all privileges on *.* to root@'172.17.7.22' identified by 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)



otherwise, you might encounter error like 'You do not have the PROCESS privilege at /usr/bin/pt-table-checksum line 3761.'


References:
1. how to completely delete snapshot logical volume
df -h
umount /mountpoint_for_snapshot_volume
lvdisplay -v
lvremove /dev/VolGroup02/snap02
cd /dev/mapper
dmsetup remove snap02
or
kpartx -d xxxxx


2. purge master log - http://dev.mysql.com/doc/refman/5.0/en/purge-binary-logs.html
This statement is safe to run while slaves are replicating. You need not stop them, it will Purge all log files up to but not including the target file.
example:
PURGE BINARY LOGS TO 'mysql-bin.010';
PURGE BINARY LOGS BEFORE '2008-04-02 22:46:26';

You can also set the expire_logs_days system variable to expire binary log files automatically after a given number of days (see Section 5.1.3, 揝erver System Variables?. If you are using replication, you should set the variable no lower than the maximum number of days your slaves might lag behind the master.

mysql> show master logs;


#!/bin/bash
CURRENT_LOGFILE=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | awk '$1 == "Master_Log_File:" {print $2}')
/usr/bin/mysql -h MASTER -e "PURGE MASTER LOGS TO '${CURRENT_LOGFILE}'"
exit $?


note: cronjob script to purge master log on slave db:


3. master/slave /etc/my.cnf configuration:
 note: server_id must be different. If not, change it, restart mysqld service. 
sql> reset slave;
sql> change master to ....
sql> start slave;


slave:

root@db01:/srv/mysql/log/ # cat /etc/my.cnf | grep -v ^# | grep -v ^$
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
default-storage-engine=innodb
innodb_file_per_table
innodb_lock_wait_timeout = 150
innodb_rollback_on_timeout
innodb_buffer_pool_size=20000M
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=4M
innodb_thread_concurrency=8
transaction-isolation=READ-COMMITTED
log-queries-not-using-indexes
key_buffer_size = 64M
user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /srv/mysql/data
tmpdir          = /tmpfs
skip-external-locking
bind-address            = 0.0.0.0
key_buffer              = 128M
max_allowed_packet      = 128M
thread_stack            = 192K
thread_cache_size       = 8
myisam-recover         = BACKUP
max_connections=512
tmp_table_size=256M
max_heap_table_size=256M
table_cache=1024
thread_cache=16
read_buffer_size        = 2M
read_rnd_buffer_size    = 2M  
sort_buffer_size        = 2M   
thread_stack            = 256K
join_buffer_size        = 8M   
log-queries-not-using-indexes
thread_concurrency     = 32
query_cache_type = 1
query_cache_size = 256M
query_cache_limit = 16M
log_error                = /srv/mysql/log/error.log
log_slow_queries        = /srv/mysql/log/mysql-slow.log
long_query_time = 2
server_id=1
log_bin             = /srv/mysql/log/mysql-bin.log
log_bin_index       = /srv/mysql/log/mysql-bin.log.index
relay_log           = /srv/mysql/log/mysql-relay-bin
relay_log_index     = /srv/mysql/log/mysql-relay-bin.index
expire_logs_days    = 2
max_binlog_size     = 100M
log_slave_updates   = 1
binlog-format=MIXED
sync_binlog=1

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]
[isamchk]
key_buffer              = 16M
------------------------------------
root@db02:/srv # cat /etc/my.cnf | grep -v ^# | grep -v ^$
[client]
port        = 3306
socket        = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket        = /var/run/mysqld/mysqld.sock
nice        = 0

[mysqld]
default-storage-engine=innodb
innodb_file_per_table
innodb_lock_wait_timeout = 150
innodb_rollback_on_timeout
innodb_buffer_pool_size=20000M
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=4M
innodb_thread_concurrency=8
transaction-isolation=READ-COMMITTED
log-queries-not-using-indexes
key_buffer_size = 64M
user        = mysql
socket        = /var/run/mysqld/mysqld.sock
port        = 3306
basedir        = /usr
datadir        = /srv/mysql/data
tmpdir        = /tmpfs
skip-external-locking
bind-address        = 0.0.0.0
key_buffer        = 128M
max_allowed_packet    = 128M
thread_stack        = 192K
thread_cache_size       = 8
myisam-recover         = BACKUP
max_connections=512
tmp_table_size=256M
max_heap_table_size=256M
table_cache=1024
thread_cache=16
read_buffer_size        = 2M
read_rnd_buffer_size    = 2M   
sort_buffer_size        = 2M  
thread_stack            = 256K 
join_buffer_size        = 8M  
log-queries-not-using-indexes
thread_concurrency     = 32
query_cache_type = 1
query_cache_size = 256M
query_cache_limit = 16M
log_error                = /srv/mysql/log/error.log
log_slow_queries    = /srv/mysql/log/mysql-slow.log
long_query_time = 2
server_id=2
log_bin             = /srv/mysql/log/mysql-bin.log
log_bin_index       = /srv/mysql/log/mysql-bin.log.index
relay_log           = /srv/mysql/log/mysql-relay-bin
relay_log_index     = /srv/mysql/log/mysql-relay-bin.index
expire_logs_days    = 2
max_binlog_size     = 100M
log_slave_updates   = 1
binlog-format=MIXED


#added by Jephe to ignore replication error, use all or comma separated error codes,such as 1507 for dropping non-existed partitions.
#slave-skip-errors=all

[mysqldump]
quick
quote-names
max_allowed_packet    = 16M

[mysql]
[isamchk]
key_buffer        = 16M

4. commands
a. grant pt.checksums table to my pc ip 192.168.0.100 with login id root and password as 'password'

mysql> grant all privileges on pt.* to root@'192.168.0.100' identified by 'password';
mysql> flush privileges;
mysql> show grants;


5. documents
http://www.howtoforge.com/mysql5_master_master_replication_debian_etch


6. References

http://dev.mysql.com/doc/refman/5.5/en/replication.html
http://mysql-mmm.org/mmm2:guide  - 2 masters/2 slaves mysql db and mmm2 setup guide 


7. how to ignore db for replication setup ( http://www.packtpub.com/article/install-manage-use-mmm-for-mysql-high-availability )
Now configure the mysqld section /etc/my.cnf on both nodes with the following steps:

  1. Prevent the server from modifying its data until told to do so by MMM. Note that this does not apply to users with SUPER privilege (that is, probably you at the command line!):
    read-only
  2. Prevent the server from modifying its mysql database as a result of a replicated query it receives as a slave:
    replicate-ignore-db = mysql
  3. Prevent this server from logging changes to its mysql database:
    binlog-ignore-db = mysql

how to access IBM p5 ASMI through firefox

Jephe Wu - http://linuxrechres.blogspot.com

Environment: IBM P5 ASMI IP address 192.168.0.254.
Problem: firefox and IE are not able to access https://192.168.0.254



Steps: 

1. use default ip address (192.168.2.147/24 and 192.168.3.147), laptop and crossover cable to configure ASMI ip address to 192.168.0.254.

2. use elinks to access it , then you can only see login page.

3. enable security.ssl3.rsa_rc4_40_md5 in firefox about:config to access https://192.168.0.254

References:
Front cover Integrated Virtualization Manager on IBM System p5 - http://www.redbooks.ibm.com/redpapers/pdfs/redp4061.pdf