Jephe Wu - http://linuxtechres.blogspot.com
Environment: openfiler as iscsi server, node1 and node2 are OL5.7, all of them are running under VirtualBox VM, Oracle 10gR2 10.2.0.1 clusterware
IP address assignment:
node1: eth0(public): 192.168.1.100, eth1(priv):172.16.1.100, vip:(going to be eth0:0) 192.168.1.102
node2: eth0(public): 192.168.1.101, eth1(priv):172.16.1.101, vip:(going to be eth0:0) 192.168.1.103
openfiler: 172.16.1.200 web login: https://172.16.1.200 (login as : openfiler/password)
Part I: OS installation
a. oracle user and groups
# groupadd oinstall
# groupadd dba
# groupadd oper
# useradd -u 200 -g oinstall -G dba[,oper] oracle
note: make sure all cluster nodes have the same user and group IDs; otherwise the clusterware installation will fail.
b. verify nobody user exists
# id nobody
c. configure ssh on all nodes
make sure you can ssh into all nodes by both their public names and their private interconnect names (not the vip addresses)
d. check hardware requirements
memory size
swap size
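A minimal sketch for reading both values from /proc/meminfo. The helper name meminfo_kb is made up for this post, and the optional file argument exists only so the function can be tried against a sample file.

```shell
# meminfo_kb KEY [FILE]: print the value (in kB) of KEY from /proc/meminfo.
# FILE defaults to /proc/meminfo; it is a parameter only to ease testing.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Usage:
#   echo "RAM:  $(meminfo_kb MemTotal) kB"
#   echo "Swap: $(meminfo_kb SwapTotal) kB"
```

Run it on every node and compare the numbers with the requirements in the Oracle installation guide.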
e. NFS (http://download.oracle.com/docs/cd/B19306_01/install.102/b14203/prelinux.htm)
If you are using NFS for your shared storage, then you must set the values for the NFS buffer size parameters
rsize and wsize to at least 16384. Oracle recommends that you use the value 32768.
For example, if you decide to use rsize and wsize buffer settings with the value 32768, then update the /etc/fstab
file on each node with an entry similar to the following (a single line):
clusternode:/vol/DATA/oradata /home/oracle/netapp nfs rw,bg,vers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,actimeo=0 1 2
f. NTP for both nodes
g. /etc/hosts and dns
For each node, register one virtual host name and IP address in DNS.
For each private interface on every node, add a line similar to the following to the /etc/hosts file on all nodes,
specifying the private IP address and associated private host name:
172.16.1.100 jephe1-priv.jephe.com jephe1-priv
h. /etc/sysctl.conf
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 262144
net.core.rmem_max = 1048576
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
run sysctl -p
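After sysctl -p, the running values can be compared with the minimums listed above. This is only a sketch: check_min is a hypothetical helper, and it handles single-valued parameters only.

```shell
# check_min NAME REQUIRED ACTUAL: report whether ACTUAL meets the REQUIRED minimum.
# Returns non-zero when the value is too low.
check_min() {
    if [ "$3" -ge "$2" ]; then
        echo "OK   $1 = $3"
    else
        echo "LOW  $1 = $3 (need >= $2)"
        return 1
    fi
}

# Usage (kernel.sem and net.ipv4.ip_local_port_range hold several
# numbers per line and need separate handling):
#   check_min kernel.shmmax 2147483648 "$(sysctl -n kernel.shmmax)"
#   check_min fs.file-max 65536 "$(sysctl -n fs.file-max)"
#   check_min net.core.rmem_max 1048576 "$(sysctl -n net.core.rmem_max)"
```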
i. Add the following lines to the /etc/security/limits.conf file:
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
j. Add or edit the following line in the /etc/pam.d/login file, if it does not already exist:
session required /lib/security/pam_limits.so
note: according to a Red Hat KB article, the line above should not be added; otherwise you cannot log in from the console.
(Red Hat KB: "Why the system text console cannot be login?" - Article ID: 52661)
the following should be added to /etc/profile:
if [ "$USER" = "oracle" ]; then
    if [ "$SHELL" = "/bin/ksh" ]; then
        ulimit -p 16384
        ulimit -n 65536
    else
        ulimit -u 16384 -n 65536
    fi
fi
k. identify oracle base and home directories
more /etc/oraInst.loc - Inventory directory
more /etc/oratab - Oracle home directory
l. create oracle base and clusterware home directory
mkdir -p /u01/app/oracle/
chown -R oracle:oinstall /u01/app/oracle
chmod -R 775 /u01/app/oracle (/u01 is the mount point)
mkdir -p /u01/crs/oracle/product/10/crs
chown root:oinstall /u01/crs/
chmod -R 775 /u01/crs/oracle
if mount point is /u01/, then the recommended Oracle clusterware home directory is
/u01/crs/oracle/product/10.2.1/crs
m. Verifying Hangcheck-timer Module on Kernel 2.6
For Red Hat Linux 4.0 and SUSE 9 systems, to verify that the hangcheck-timer module is running on every node:
Enter the following command on each node to determine which kernel modules are loaded:
# /sbin/lsmod
If the hangcheck-timer module is not listed for any node, then enter a command similar to the following to
start the module located in the directories of the current kernel version:
# insmod /lib/modules/kernel_version/kernel/drivers/char/hangcheck-timer.ko hangcheck_tick=1 hangcheck_margin=10
In the preceding command example, the variable kernel_version is the kernel version running on your system.
To confirm that the hangcheck module is loaded, enter the following command:
# lsmod | grep hang
The output should be similar to the following:
hangcheck_timer 3289 0
To ensure that the module is loaded every time the system restarts, verify that the local system startup file
contains the command shown in the previous step, or add it if necessary:
Red Hat:
On Red Hat Enterprise Linux systems, add the command to the /etc/rc.d/rc.local file.
SUSE:
On SUSE systems, add the command to the /etc/init.d/boot.local file.
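To avoid adding the command twice when the step is repeated, something like the following could be used. ensure_line is a hypothetical helper name, and the module path simply follows the insmod example above with the running kernel version filled in by uname -r.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (as root, Red Hat example):
#   ensure_line /etc/rc.d/rc.local \
#     "/sbin/insmod /lib/modules/$(uname -r)/kernel/drivers/char/hangcheck-timer.ko hangcheck_tick=1 hangcheck_margin=10"
```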
configure .bash_profile
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.2.0/crs
export PATH=$PATH:$ORACLE_HOME/bin:$ORA_CRS_HOME/bin
2. clusterware installation
Please refer to http://oracleinstance.blogspot.com/2010/03/oracle-10g-installation-in-linux-5.html for complete
screenshot example
and also
http://space.itpub.net/21162451/viewspace-696413 - RHEL5.4+Oracle10gR2 RAC+OCFS2
2.1 os requirements
Run ./runInstaller as the oracle user to install.
Prerequisite Checks Fail When Installing 10.2 On Red Hat 5 (RHEL5)
Checking operating system version: must be redhat-3, SuSE-9, redhat-4, UnitedLinux-1.0, asianux-1 or asianux-2
Failed <<<<
-> Prerequisite Checks Fail When Installing 10.2 On Red Hat 5 (RHEL5) [ID 456634.1]
If you are installing 10.2 from DVD, copy the <path>/database/install/oraparam.ini to a temporary directory (for
example, /tmp).
If you are installing 10.2 from an OTN download or have copied the 10.2 media to disk, take a backup of
<path>/database/install/oraparam.ini
Now edit oraparam.ini and change the appropriate line:
Original
[Certified Versions]
Linux=redhat-3,SuSE-9,redhat-4,UnitedLinux-1.0,asianux-1,asianux-2
New
[Certified Versions]
Linux=redhat-3,SuSE-9,redhat-4,UnitedLinux-1.0,asianux-1,asianux-2,redhat-5
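The same edit can be scripted. This is a sketch assuming GNU sed (for -i); add_redhat5 is a made-up function name, and it keeps one untouched copy as FILE.orig.

```shell
# add_redhat5 FILE: append ",redhat-5" to the "Linux=" line of FILE unless it
# is already there. A pristine copy is kept once as FILE.orig.
add_redhat5() {
    [ -f "$1.orig" ] || cp -p "$1" "$1.orig"
    sed -i '/^Linux=/{/redhat-5/!s/$/,redhat-5/;}' "$1"
}

# Usage (for the DVD case, on the copy placed in /tmp):
#   add_redhat5 /tmp/oraparam.ini
```

Running it a second time leaves the file unchanged.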
note:
a. do not use ifconfig to configure vip before installing clusterware, just configure them in /etc/hosts
for example:
#cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.100 jephe1.jephe.com jephe1
192.168.1.101 jephe2.jephe.com jephe2
172.16.1.100 jephe1-priv.jephe.com jephe1-priv
192.168.1.102 jephe1-vip.jephe.com jephe1-vip
172.16.1.101 jephe2-priv.jephe.com jephe2-priv
192.168.1.103 jephe2-vip.jephe.com jephe2-vip
OCR configuration:
normal redundancy
specify OCR Location: /dev/raw/raw1
specify OCR Mirror Location: /dev/raw/raw2
When installing the database software, you can choose 'configure ASM', select external redundancy, and then assign part of the raw disks to the 'DATA' disk group. Later we will run asmca to configure another disk group, RECOVERY.
2.2 How To Setup UDEV Rules For RAC OCR And Voting Devices On SLES10, RHEL5, OEL5, OL5
- refer to http://www.held.org.il/blog/2007/11/setting-a-raw-device-in-redhatcentos-5/
configure /etc/sysconfig/rawdevices first, then service rawdevices restart to see those raw devices appear under /dev/raw/*
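As an illustration, the binding lines for /etc/sysconfig/rawdevices can be generated rather than typed. raw_bindings is a hypothetical helper; the partition names sdb1..sdf1 are the ones used in the udev example in this post and will differ on other systems.

```shell
# raw_bindings PART...: print one /etc/sysconfig/rawdevices line per partition,
# numbering the raw devices from 1 in the order given.
raw_bindings() {
    n=1
    for part in "$@"; do
        echo "/dev/raw/raw$n /dev/$part"
        n=$((n + 1))
    done
}

# Usage (as root):
#   raw_bindings sdb1 sdc1 sdd1 sde1 sdf1 >> /etc/sysconfig/rawdevices
#   service rawdevices restart
#   raw -qa
```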
The following part is optional:
================
Add the required raw device ownership and permissions, for example:
a. Add to /etc/udev/rules.d/60-raw.rules:
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdc1", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sdd1", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="sde1", RUN+="/bin/raw /dev/raw/raw4 %N"
ACTION=="add", KERNEL=="sdf1", RUN+="/bin/raw /dev/raw/raw5 %N"
==================
b. To set permission (optional, but required for Oracle RAC!), create a new /etc/udev/rules.d/99-raw-perms.rules
containing lines such as:
KERNEL=="raw[1-5]", MODE="0640", GROUP="oinstall", OWNER="oracle"
Notice this:
The raw-perms.rules file name has to begin with the number 99, which determines the order in which rules are applied, so that it is processed after all other rules. Using a lower number might leave the permissions incorrect.
The following permissions have to apply:
OCR Device(s): root:oinstall , mode 0640
Voting device(s): oracle:oinstall, mode 0666
need to test the following commands:
# /sbin/udevcontrol reload_rules
# /sbin/start_udev
Alternatively, add the following to 50-udev.rules:
KERNEL=="vcsa[0-9]*", NAME="%k", OWNER="vcsa", GROUP="tty", OPTIONS="last_rule"
KERNEL=="vcc/*", NAME="%k", OWNER="vcsa", GROUP="tty", OPTIONS="last_rule"
KERNEL=="raw[1-9]", OWNER="oracle", GROUP="oinstall", MODE="0640"
KERNEL=="raw10", OWNER="oracle", GROUP="oinstall", MODE="0640"
# memory devices
KERNEL=="random", MODE="0666", OPTIONS="last_rule"
KERNEL=="urandom", MODE="0444", OPTIONS="last_rule"
KERNEL=="mem", GROUP="kmem", MODE="0640", OPTIONS="last_rule"
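Whichever rule file is used, the resulting ownership and mode can be checked on each node before starting the installer. A small sketch, assuming GNU stat; owner_mode is a made-up helper name.

```shell
# owner_mode PATH: print "owner:group mode" for PATH (GNU stat syntax).
owner_mode() {
    stat -c '%U:%G %a' "$1"
}

# Usage: list every bound raw device with its ownership and mode.
#   for dev in /dev/raw/raw[0-9]*; do
#       echo "$dev $(owner_mode "$dev")"
#   done
```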
Permissions are very important; otherwise you might get an error like this:
'crs_stat -t' shows node2 as offline, and 'ps -efH' on node2 shows 'startcheck' processes.
Check /var/log/messages and you will find this:
Feb 5 12:30:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3178.
Feb 5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3353.
Feb 5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3193.
Feb 5 12:31:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3178.
Feb 5 12:32:17 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3353.
Feb 5 12:32:18 node2 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.3193.
[root@node2 ~]# more /tmp/crsctl.3178
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]
check crs to confirm error:
/u01/oracle/product/crs/bin/crsctl check crs
References:
http://surachartopun.com/2009/04/why-my-oracle-cluster-could-not-start.html
If one node is down, when running 'crs_stat -t', it will show this:
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....en1.gsd application    ONLINE    OFFLINE
ora....en1.ons application    ONLINE    OFFLINE
ora....en1.vip application    ONLINE    ONLINE    node2
ora....en2.gsd application    ONLINE    ONLINE    node2
ora....en2.ons application    ONLINE    ONLINE    node2
ora....en2.vip application    ONLINE    ONLINE    node2
Finally, if you encounter errors like this:
an error occurred during the interview for this component. oracle clusterware unable to retrieve voting disk information
then run 'rm -fr /u01/app/oracle/*' on node1 and start ./runInstaller again.
2.3. at the end of the clusterware installation, you will be prompted to run 2 programs that need root permission.
Run them in the sequence below:
first program on first node:
/home/oracle/oraInventory/orainstRoot.sh
first program on second node:
ssh jephe2 -l root
/home/oracle/oraInventory/orainstRoot.sh
exit
second program on first node:
/home/oracle/oracle/product/10.2.0/crs/root.sh
second program on second node:
ssh jephe2 -l root
/home/oracle/oracle/product/10.2.0/crs/root.sh
basically, you need to finish the number 1 script on all nodes first before going to the next script.
Also, run the second script on node2 as the root user inside a GUI session (X Windows, for example through vnc), not as the oracle user; otherwise vipca will fail, because vipca uses a GUI.
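The ordering rule can be written down as a small checklist generator. This only prints the order and runs nothing; root_script_plan and the two variable names are made up, and the paths and host names are the ones used in this post.

```shell
# Paths of the two root scripts as used in this post.
INVENTORY_SCRIPT=/home/oracle/oraInventory/orainstRoot.sh
CRS_ROOT_SCRIPT=/home/oracle/oracle/product/10.2.0/crs/root.sh

# root_script_plan NODE...: print the run order: script 1 on every node,
# then script 2 on every node.
root_script_plan() {
    for script in "$INVENTORY_SCRIPT" "$CRS_ROOT_SCRIPT"; do
        for node in "$@"; do
            echo "$node: $script"
        done
    done
}

root_script_plan jephe1 jephe2
```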
When you run the second /u01/app/oracle/product/10.2.0/crs/root.sh, there might be error like this:
[root@oratest1 oratest1]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Failed to upgrade Oracle Cluster Registry configuration
=>check log for detail: /u01/app/oracle/product/10.2.0/crs/log/oratest1/alertoratest1.log
see metalink Executing root.sh errors with "Failed To Upgrade Oracle Cluster Registry Configuration" [ID 466673.1]
=> solution is
a) replace the binary according to the metalink note above
b) dd if=/dev/zero of=/dev/raw/raw1 bs=1024 (no need to let it finish; press Ctrl-C to cancel it)
then run the above root.sh again on node1
c. FAQ
c.1 some error at the end of running second root program on the second node:
Running vipca(silent) for configuring nodeapps
/home/oracle/oracle/product/10.2.0/crs/jdk/jre//bin/java: error while loading shared libraries: libpthread.so.0:
cannot open shared object file: No such file or directory
-> this is a bug: the newer glibc version in RHEL5 is incompatible with this Java, so the vipca script needs to be modified as follows:
vi /home/oracle/oracle/product/10.2.0/crs/bin/vipca
change
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
to:
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
unset LD_ASSUME_KERNEL
# add this line to unset the variable LD_ASSUME_KERNEL
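If you prefer not to edit the file by hand, the change could be scripted roughly as below. patch_vipca is a hypothetical helper, not something shipped with clusterware; it keeps the original as FILE.orig and does nothing if the unset line is already present.

```shell
# patch_vipca FILE: insert "unset LD_ASSUME_KERNEL" after each
# "export LD_ASSUME_KERNEL" line, keeping a copy in FILE.orig.
patch_vipca() {
    grep -q 'unset LD_ASSUME_KERNEL' "$1" && return 0
    cp -p "$1" "$1.orig" || return 1
    # Redirecting into the existing file keeps its permissions.
    awk '{ print }
         /export LD_ASSUME_KERNEL/ { print "unset LD_ASSUME_KERNEL" }' \
        "$1.orig" > "$1"
}

# Usage:
#   patch_vipca /home/oracle/oracle/product/10.2.0/crs/bin/vipca
```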
Now log in to X Windows on the second node and run /home/oracle/oracle/product/10.2.0/crs/bin/vipca again. It fails with the errors below (this error is unavoidable if you are using a private IP range for the public interface):
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
solution:
$ oifcfg iflist
eth0 192.168.1.0
eth1 172.16.1.0
$ oifcfg setif -global eth0/192.168.1.0:public
$ oifcfg setif -global eth1/172.16.1.0:cluster_interconnect
$ oifcfg getif
eth0 192.168.1.0 global public
eth1 172.16.1.0 global cluster_interconnect
# vipca (run on the second node with root user login at X windows, don't run it with oracle user then su - as root)
Just choose eth0 as the vip interface; do not choose eth0:0, otherwise after the installation the vip for node1 (192.168.1.102) will end up on eth0:0:1.
You will be asked to enter the ip alias name and ip address for both codegen1 and codegen2. First enter the vip address for codegen1: 192.168.1.102, then the rest will be filled in automatically.
node1-vip.jephe.com and node2-vip.jephe.com are the ip alias names.
After vipca finishes, go back to the CRS installation GUI and click ok to finish the clusterware installation.
If environment variable
$ORA_CRS_HOME/cfgtoollogs/configToolFailedCommands.sh
If you get the error "PRKR-1062 : Failed to find configuration for node codegen1",
then your /etc/hosts entry might be missing the domain name, for example
192.168.1.100 jephe1
instead of
192.168.1.100 jephe1.jephe.com jephe1
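A quick way to spot such entries, as a sketch only: hosts_without_fqdn is a made-up helper that prints every non-comment, non-loopback line whose first host name contains no dot.

```shell
# hosts_without_fqdn [FILE]: print entries whose first host name has no dot,
# i.e. the fully qualified name is missing. FILE defaults to /etc/hosts.
hosts_without_fqdn() {
    awk 'NF >= 2 && $1 !~ /^#/ && $1 !~ /^127\./ && $1 != "::1" && $2 !~ /\./ { print }' \
        "${1:-/etc/hosts}"
}

# Usage:
#   hosts_without_fqdn            # checks /etc/hosts
#   hosts_without_fqdn /tmp/hosts.test
```

No output means every entry starts with a fully qualified name.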
==============
cronjob for checking RAC configuration:
---------
#!/bin/sh
. /home/oracle/.bash_profile
DATE=`date +%Y%m%d`
TMPFILE=`mktemp`
echo "running ocrconfig -showbackup" > $TMPFILE
ocrconfig -showbackup >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running ocrcheck" >> $TMPFILE
ocrcheck >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running olsnodes -n -p -i" >> $TMPFILE
olsnodes -n -p -i >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running crsctl query css votedisk" >> $TMPFILE
crsctl query css votedisk >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running crsctl check crs" >> $TMPFILE
crsctl check crs >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running oifcfg iflist -p -n" >> $TMPFILE
oifcfg iflist -p -n >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running oifcfg getif" >> $TMPFILE
oifcfg getif >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running srvctl config nodeapps -n oradb-01 -a -g -s -l" >> $TMPFILE
srvctl config nodeapps -n oradb-01 -a -g -s -l >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "running srvctl config nodeapps -n oradb-02 -a -g -s -l" >> $TMPFILE
srvctl config nodeapps -n oradb-02 -a -g -s -l >> $TMPFILE 2>&1
echo "" >> $TMPFILE
echo "cat /etc/oracle/ocr.loc" >> $TMPFILE
cat /etc/oracle/ocr.loc >> $TMPFILE
echo "" >> $TMPFILE
echo "crs_stat -t -v" >> $TMPFILE
crs_stat -t -v >> $TMPFILE
tar cpzf /tmp/crs.tar.gz /u01/crs/oracle/product/10.2.0/crs/cdata/crs
mutt -a /tmp/crs.tar.gz -s "crs/ocr status and backup on $DATE" jwu@domain.com < $TMPFILE
rm -f $TMPFILE
---------------------
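The repeated echo/command/echo pattern in the script above could be folded into one helper. This is just one possible refactoring, not part of the original script; section is a made-up name.

```shell
# section CMD [ARG...]: print a "running ..." header, run the command with
# stderr merged into stdout, then print a blank line.
section() {
    echo "running $*"
    "$@" 2>&1
    echo ""
}

# Usage inside the report script:
#   {
#       section ocrconfig -showbackup
#       section ocrcheck
#       section crsctl check crs
#       section oifcfg getif
#   } > "$TMPFILE"
```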
Part II: openfiler iscsi:
How to Dynamically Add and Remove SCSI Devices on Linux [ID 603868.1]
iscsiadm -m discovery -t sendtargets -p 192.168.1.5
chkconfig iscsid on
chkconfig iscsi on
service iscsi restart
cd /var/lib/iscsi;ls
Reference: http://www.cyberciti.biz/tips/rhel-centos-fedora-linux-iscsi-howto.html
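After discovery, each target still has to be logged in to. The sketch below turns the discovery output (one "IP:PORT,TPGT IQN" record per line) into login commands and only prints them; login_cmds is a hypothetical helper name.

```shell
# login_cmds: read "iscsiadm -m discovery" output on stdin and print one
# "iscsiadm -m node ... --login" command per discovered target.
login_cmds() {
    while read -r portal iqn; do
        [ -n "$iqn" ] || continue
        # Strip the ",TPGT" suffix from the portal field.
        echo "iscsiadm -m node -T $iqn -p ${portal%%,*} --login"
    done
}

# Usage:
#   iscsiadm -m discovery -t sendtargets -p 192.168.1.5 | login_cmds
#   iscsiadm -m discovery -t sendtargets -p 192.168.1.5 | login_cmds | sh
```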
yum install lsscsi
lsscsi
cat /proc/scsi/scsi
grep host /etc/modprobe.conf
ls -ld /sys/class/scsi_host/host*
dmsetup ls | sort
multipath -d -ll
raw -qa
ls -l /dev/mapper/
ls -l /dev/raw/
ocrcheck
crsctl query css votedisk
crsctl check crs
ocrconfig -showbackup
cat /etc/oracle/ocr.loc
where is the setting for RAC service preference nodes?
do node applications contain listener, gsd, etc.?
First, Oracle checks /etc/oracle/ocr.loc to find out where the OCR is (raw1 and raw2); then it reads the OCR to find out where the voting disk devices are.
[root@codegen1 bin]# lsscsi
[0:0:0:0] disk ATA VBOX HARDDISK 1.0 /dev/sda
[2:0:0:0] cd/dvd VBOX CD-ROM 1.0 /dev/sr0
[3:0:0:0] disk OPNFILER VIRTUAL-DISK 0 /dev/sdb
[3:0:0:1] disk OPNFILER VIRTUAL-DISK 0 /dev/sdc
[3:0:0:2] disk OPNFILER VIRTUAL-DISK 0 /dev/sdd
[3:0:0:3] disk OPNFILER VIRTUAL-DISK 0 /dev/sde
[3:0:0:4] disk OPNFILER VIRTUAL-DISK 0 /dev/sdf
[3:0:0:5] disk OPNFILER VIRTUAL-DISK 0 /dev/sdg
[3:0:0:6] disk OPNFILER VIRTUAL-DISK 0 /dev/sdh
[3:0:0:7] disk OPNFILER VIRTUAL-DISK 0 /dev/sdi
[3:0:0:8] disk OPNFILER VIRTUAL-DISK 0 /dev/sdj
[3:0:0:9] disk OPNFILER VIRTUAL-DISK 0 /dev/sdk
5.1. Resizing an Online Multipath Device
If you need to resize an online multipath device, use the following procedure.
Resize your physical device.
Use the following command to find the paths to the LUN:
# multipath -l [ -ll] [-d -ll]
Resize your paths. For SCSI devices, writing a 1 to the rescan file for the device causes the SCSI driver to rescan, as in the following command:
# echo 1 > /sys/block/device_name/device/rescan
Resize your multipath device by running the multipathd resize command:
# multipathd -k'resize map mpath0'
Resize the filesystem (assuming no LVM or DOS partitions are used):
# resize2fs /dev/mapper/mpath0
For further information on resizing an online LUN, see the Online Storage Reconfiguration Guide.
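When a LUN has several paths, the rescan step has to be repeated for each path device. A sketch under assumptions: rescan_paths is a made-up helper, sdb and sdc stand for whatever path devices multipath -ll reports, and the sysfs root is a parameter only so the function can be tried against a scratch directory.

```shell
# rescan_paths SYSFS_ROOT DEV...: write 1 to each device's rescan file so the
# SCSI layer re-reads the device size. SYSFS_ROOT is normally /sys/block.
rescan_paths() {
    root=$1
    shift
    for dev in "$@"; do
        echo 1 > "$root/$dev/device/rescan" || return 1
    done
}

# Usage (as root; mpath0 is the map name from the example above):
#   rescan_paths /sys/block sdb sdc
#   multipathd -k'resize map mpath0'
```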
echo "scsi add-single-device 3 0 0 0" > /proc/scsi/scsi
Reference:
1. http://space.itpub.net/21162451/viewspace-696413 - RHEL5.4+Oracle10gR2 RAC+OCFS2
2. http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/10g/r2/rac_r2_work/02_01_crs_install/02_01_crs_install_viewlet_swf.html - Install 10g clusterware livedemo from Oracle
3. http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/11g/r1/clusterware/installation_of_oracle_clusterware/installation_of_oracle_clusterware_viewlet_swf.html - Install clusterware on Oracle 11gR1
4. install oracle 11gR2 database - http://st-curriculum.oracle.com/obe/db/11g/r2/2day_dba/install/install.htm
5. http://oracleinstance.blogspot.com/2010/03/oracle-10g-installation-in-linux-5.html