To build an OPS environment on the two Model 140s, the first step is to get a working concurrent environment. Using the same hardware setup as above, HACMP was configured as follows; the software level is HACMP ESCRM 4.4.1.
1. The service network adapters
For a concurrent environment, the traditional setup uses three resource groups: two cascading RGs, each containing only a service IP address, and a third concurrent RG containing the concurrent VG. In fact, Oracle's current OPS and RAC HACMP configuration recommendations are much simpler: the adapter is defined directly with the service address, with no standby or boot adapters configured at all. So the built-in adapters of the two 140s were set to nodea_svc and nodeb_svc, and only those two adapters were defined in the HACMP topology.
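For this stripped-down topology to work, the two service labels have to resolve consistently on each node; a hypothetical /etc/hosts fragment (the addresses are made-up assumptions, not from the original setup):

```
# /etc/hosts on both nodes (addresses are illustrative only)
192.168.10.1   nodea_svc
192.168.10.2   nodeb_svc
```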
2. The concurrent VG
The manual says that in concurrent mode the concurrent VG must be made up of SSA disks or RAID disks, while here the shared disk is just a single plain SCSI disk. Could that possibly work? With that question in mind, testing continued. First, sharevg was created. For SSA disks the VG can be created concurrent capable; for other (RAID) disks concurrent capable cannot be set to yes, because concurrent sharing of RAID disks is implemented by HACMP itself. Accordingly, sharevg was created without the concurrent-capable flag.
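As a sketch, the distinction at VG-creation time might look like this (AIX admin commands; the disk name hdisk1 comes from the trace below, and the exact spelling of the concurrent-capable flag may vary with the AIX level):

```shell
# Plain SCSI shared disk: create sharevg WITHOUT concurrent capable;
# HACMP itself (cl_raid_vg/convaryonvg) would handle concurrent access.
mkvg -y sharevg hdisk1
# On SSA disks only, the VG could be created concurrent capable instead:
#   mkvg -c -y sharevg hdiskN
```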
3. The concurrent RG
With sharevg configured and both nodes synchronized, a concurrent-mode RG containing only sharevg was created in HACMP. No application server was configured yet; the point was to get the concurrent environment working first, and the app can always be added after Oracle is installed.
4. The critical part
Here is the critical part of the whole debugging exercise. After the HA definitions synchronized cleanly, HA was started on both nodes. Since the adapters were defined with the service addresses from the start, there was no boot-to-service address swap to observe. lsvg -o showed that the shared VG had not been varied on, and hacmp.out contained the following error:
...
cl_raid_vg[97] cl_raid_vg[97] lsdev -Cc disk -l hdisk1 -F type
DEVTYPE=scsd
cl_raid_vg[103] grep -qw scsd /usr/es/sbin/cluster/diag/clconraid.dat
cl_raid_vg[106] THISTYPE=disk
cl_raid_vg[106] [[ -z ]]
cl_raid_vg[116] FIRSTTYPE=disk
cl_raid_vg[123] [[ disk = array ]]
cl_raid_vg[128] exit 1
cl_mode3[166] cl_log 485 cl_mode3: Failed concurrent varyon of sharevg\n
cl_log[50] version=1.9
cl_log[92] SYSLOG_FILE=/usr/es/adm/cluster.log
*******
Aug 1 2003 17:42:24 !!!!!!!!!! ERROR !!!!!!!!!!
*******
Aug 1 2003 17:42:24 cl_mode3: Failed concurrent varyon of sharevg because it is not made up of known RAID devices.
cl_mode3[168] STATUS=1
cl_mode3[217] exit 1
...
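The exit 1 comes from cl_raid_vg: as the trace shows, it reads each hdisk's device type with lsdev and greps it against the known-RAID list in /usr/es/sbin/cluster/diag/clconraid.dat. A portable sketch of that decision (the list contents here are made-up stand-ins; a plain SCSI disk reports type scsd):

```shell
#!/bin/sh
# Stand-in for the clconraid.dat known-RAID list (contents assumed):
KNOWN_RAID_TYPES="array
scarray"
# What `lsdev -Cc disk -l hdisk1 -F type` returned in the trace above:
DEVTYPE=scsd

# cl_raid_vg's grep -qw membership test, reduced to its essentials:
if echo "$KNOWN_RAID_TYPES" | grep -qw "$DEVTYPE"; then
    echo "known RAID device: concurrent varyon allowed"
else
    echo "not a known RAID device: cl_raid_vg exits 1"
fi
```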
So the manual wasn't lying. Still, no reason to give up: HACMP is, after all, implemented through scripts and events, so the scripts themselves can be tampered with.
The HACMP directory .../utils holds many of the runtime scripts; the one relevant to this problem is cl_mode3. The full script follows (posted here so everyone can take a look):
#!/bin/ksh
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
#
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 1990,2001
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
# IBM_PROLOG_END_TAG
# @(#)27 1.9 src/43haes/usr/sbin/cluster/events/utils/cl_mode3.sh, hacmp.events, 43haes_rmo2, rmo2s01b 5/31/01 16:36:46
###################
#
# COMPONENT_NAME: EVENTUTILS
#
# FUNCTIONS: none
#
###################
###################
#
# Name: cl_mode3
#
# Returns:
# 0 - All of the volume groups are successfully varied on/changed mode
# 1 - varyonvg/mode change of at least one volume group failed
# 2 - Zero arguments were passed
#
# This function will place the volume groups passed in as arguments in
# the designated mode .
#
# Arguments: -s Varyon volume group in mode 3 with sync
# -n Varyon volume group in mode 3 without sync
#
# Environment: VERBOSE_LOGGING, PATH
#
###################
PROGNAME=$(basename ${0})
export PATH="$($(dirname ${0})/../../utilities/cl_get_path all)"
[[ "$VERBOSE_LOGGING" = "high" ]] && set -x
[[ "$VERBOSE_LOGGING" = "high" ]] && version='1.9'
HA_DIR="$(cl_get_path)"
if (( $# < 2 )) ; then
# Caller used incorrect syntax
cl_echo 204 "usage: $PROGNAME [-n | -s] volume_groups_to_varyon" $PROGNAME
exit 2
fi
if [[ $1 = "-n" ]] ; then # sync or no sync
SYNCFLAG="-n"
else
SYNCFLAG="" # LVM default is "sync"
fi
if [[ -z ${EMULATE} ]] ; then
EMULATE="REAL"
fi
STATUS=0
set -u
# Get volume groups, past the sync|nosync flag
shift
for vg in $*
do
VGID=$(/usr/sbin/getlvodm -v $vg)
# Check to see if this volume group is already vary'd on
if lsvg -o | fgrep -s -x "$vg" ; then
# Note this and keep going. This could happen legitimately on a
# node up after a forced down.
# Find out if its vary'd on in concurrent mode
if [[ 0 = $(lqueryvg -g $VGID -C) ]] ; then
# No, its not. Now, find out if its defined as concurrent capable
if [[ 0 = $(lqueryvg -g $VGID -X) ]] ; then
# We get here in the case where the volume group is
# vary'd on, but not in concurrent mode, and is not
# concurrent capable. This would be the case for a SCSI
# RAID disk used in concurrent mode.
if ! cl_raid_vg $vg ; then
# This volume group is not made up of known RAID devices
cl_log 485 "$PROGNAME: Failed concurrent varyon of $vg\n\
because it is not made up of known RAID devices." $PROGNAME $vg
STATUS=1
fi
continue
else
# For some obscure reason, the volume group that
# we want to vary on in concurrent mode is
# already vary'd on, in non-concurrent mode.
cl_echo 200 "$PROGNAME: Volume Group "$vg" in non-concurrent mode." $PROGNAME $vg
# Try to recover by varying it off, to be vary'd on in
# concurrent mode below.
if [[ $EMULATE = 'REAL' ]] ; then
if ! varyoffvg $vg
then
# Unable to vary off the volume group - probably because
# its in use. Note error and keep going
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
STATUS=1
continue
fi
else
cl_echo 3020 "NOTICE The following command was not executed \n"
echo "varyoffvg $vg"
fi
# At this point, the volume group was vary'd off. The
# flow takes over below, and vary's on the volume group
# in concurrent mode.
fi
else
# Since the volume group is already vary'd on in
# concurrent mode, there is really nothing more to do
# with it. Go on to the next one.
continue
fi
fi
# Find out whether LVM thinks this volume group is concurrent
# capable. Note that since the volume group is not vary'd on at this
# point in time, we have to look directly at the VGDA on the
# hdisks in the volume group.
export MODE
for HDISK in $(/usr/sbin/getlvodm -w $VGID | cut -d' ' -f2) ; do
# Check each of the hdisks for a valid mode value. Stop at the
# first one we find.
if MODE=$(lqueryvg -p $HDISK -X) ; then
break
fi
done
if [[ -z $MODE ]] ; then
# If we couldn't pull a valid mode indicator off of any disk in
# the volume group, there is no chance whatsoever that LVM
# will be able to vary it on. Give up on this one.
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
STATUS=1
elif [[ $MODE = "0" ]] ; then
# LVM thinks that this is not a concurrent capable
# volume group. This is the expected result if this is
# a RAID device treated as a concurrent device
# Check to make sure that this is a known RAID device
if cl_raid_vg $vg ; then
# If this is a known RAID device, attempt to vary it on
# with no reserve, to simulate concurrent mode
if ! convaryonvg $vg ; then
# It was not possible to vary on this volume
# group. Note error and keep going.
STATUS=1
fi
else
# This volume group is not made up of known RAID devices
cl_log 485 "$PROGNAME: Failed concurrent varyon of $vg\n\
because it is not made up of known RAID devices." $PROGNAME $vg
STATUS=1
fi
elif [[ $MODE = "32" ]] ; then
# LVM thinks that this volume group is defined as concurrent
# capable, for the group services based concurrent mode
# try to varyon in concurrent with appropriate sync option
if [[ $EMULATE = "REAL" ]] ; then
if ! varyonvg $SYNCFLAG -c $vg ; then
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
# note error and keep going
STATUS=1
fi
else
cl_echo 3020 "NOTICE The following command was not executed \n"
echo "varyonvg $SYNCFLAG -c $vg"
fi
else
# Anything else ("1" or "16", depending on the level of LVM)
# indicates that LVM thinks this volume group is
# defined as concurrent capable, for the covert channel based
# concurrent mode.
if cl_raid_vg $vg ; then
# SCSI attached RAID devices are reported as concurrent capable.
# If that is what we have here, try the appropriate varyon
if ! convaryonvg $vg ; then
# It was not possible to vary on this volume
# group. Note error and keep going.
STATUS=1
fi
else
# Its not a concurrent capable RAID device. The only remaining
# supported choice is covert channel based concurrent mode.
if [[ $EMULATE = "REAL" ]] ; then
if ! varyonvg $SYNCFLAG -c $vg ; then
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
# note error and keep going
STATUS=1
fi
else
cl_echo 3020 "NOTICE The following command was not executed \n"
echo "varyonvg $SYNCFLAG -c $vg"
fi
fi
fi
done
exit $STATUS
Reading the script makes the key point clear: the shared disk is not accepted as a RAID disk (cl_raid_vg fails), so the script returns 1. The fix is a simple change to the end of the script, forcing the final exit status to 0 in place of the real one:
# add for 140 ha escrm
STATUS=0
exit $STATUS
Hopefully that fools HA. Note that the same script must be modified on both nodes.
5. Restarting HA
To my great delight, HA started successfully, and lsvg -l sharevg showed identical output on both nodes.
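The cross-check described above, run on each node (AIX commands; the expected behavior, not a captured transcript):

```
# run on both nodea and nodeb after HA start
lsvg -o              # sharevg should now appear as varied on
lsvg -l sharevg      # the LV listing should match on both nodes
```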