"Super Simple" overview HPUX Service Guard (Cluster).

Date : 11/01/2014
Version: 0.3
Remarks: Very simplified overview of the main HP-UX Serviceguard commands.



1. Main configuration files:

  1. /etc/cmcluster.conf - contains binary & library paths, and path to main "rc startup" file
  2. /etc/cmcluster/cmclnodelist – Contains the list of nodes in the cluster
  3. /etc/cmcluster/cluster.ascii - cluster configuration file (the exact name varies per site). Edit it, then compile and apply it. It also lists the cluster VG's.
  4. /etc/cmcluster/package_name/package.conf - package configuration file. Edit it, then compile and apply it.
  5. /etc/cmcluster/package_name/package.cntl - package control script. This script activates VG's, mounts filesystems, and assigns the package IP.
  6. /etc/cmcluster/package_name/pkg_control_script.log - package control script log

In "/etc/cmcluster/package_name/" you will also usually find the package "stop" and "start" scripts.

=> A few examples of config files:

# cat /etc/cmcluster.conf

SGCONF=/etc/cmcluster
SGSBIN=/usr/sbin
SGLBIN=/usr/lbin
SGLIB=/usr/lib
SGRUN=/var/adm/cmcluster
SGAUTOSTART=/etc/rc.config.d/cmcluster
SGFFLOC=/opt/cmcluster/cmff
CMSNMPD_LOG_FILE=/var/adm/SGsnmpsuba.log

# cat /etc/rc.config.d/cmcluster

AUTOSTART_CMCLD=1
NODE_TOC_BEHAVIOR="reboot"

# cat /etc/lvmrc

AUTO_VG_ACTIVATE=0
RESYNC="SERIAL"

{
...the rest of the file contains the custom VG activation and "sync" routines...
}

# cat /etc/cmcluster/cmclnodelist

pri-node root
pri-node.company.com root
sec-node root
sec-node.company.com root

# cat /etc/cmcluster/cluster.ascii

# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to *******
# ***** set them, consult the Serviceguard manual. *********************
# **********************************************************************

CLUSTER_NAME MYCLUSTER

QS_HOST 162.17.16.3
QS_POLLING_INTERVAL 120000000
QS_TIMEOUT_EXTENSION 2000000

NODE_NAME pri-node
NETWORK_INTERFACE lan0
STATIONARY_IP 162.17.16.2
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan2
NETWORK_INTERFACE lan3
HEARTBEAT_IP 10.10.120.1

NODE_NAME sec-node
NETWORK_INTERFACE lan0
STATIONARY_IP 162.17.16.82
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan2
NETWORK_INTERFACE lan3
HEARTBEAT_IP 10.10.120.2

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 6000000

other parameters not listed...

VOLUME_GROUP /dev/cluvg01
VOLUME_GROUP /dev/cluvg02

# cat /etc/cmcluster/package_name/package.conf

# **********************************************************************
# ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******
# **********************************************************************
# ******* Note: This file MUST be edited before it can be used. ********
# * For complete details about package parameters and how to set them, *
# * consult the Serviceguard manual.
# **********************************************************************

PACKAGE_NAME mypackage
PACKAGE_TYPE FAILOVER

NODE_NAME pri-node
NODE_NAME sec-node

AUTO_RUN YES
NODE_FAIL_FAST_ENABLED NO

RUN_SCRIPT /etc/cmcluster/mypackage/start_my_package.sh
HALT_SCRIPT /etc/cmcluster/mypackage/stop_my_package.sh
RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT_TIMEOUT NO_TIMEOUT
other parameters not listed...

After editing those conf files, check and compile them:

# cmcheckconf -C /etc/cmcluster/cluster.ascii # check cluster conf file
# cmapplyconf -C /etc/cmcluster/cluster.ascii # apply cluster conf file

# cmcheckconf -P /etc/cmcluster/package_name/package.conf # check package conf file
# cmapplyconf -P /etc/cmcluster/package_name/package.conf # apply package conf file
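
To verify what actually went into the compiled binary configuration, you can extract it again with cmgetconf (a sketch, using the example names from this section):

# cmgetconf -c MYCLUSTER /tmp/cluster_check.ascii # dump the running cluster configuration
# cmgetconf -p mypackage /tmp/package_check.ascii # dump the package configuration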


2. Main Service Guard commands:

=> Viewing cluster and package status:

# cmviewcl -v

Shows you the detailed status of the cluster, nodes, packages and services.
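
Illustrative (abbreviated) output, using the example names from section 1:

CLUSTER        STATUS
MYCLUSTER      up

  NODE           STATUS       STATE
  pri-node       up           running
  sec-node       up           running

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    mypackage      up           running      enabled      pri-node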

=> start the cluster:

# cmruncl -v # start entire cluster
# cmruncl -v -n nodename # if only one node is available

=> start cluster on one node:

# cmrunnode -v
# cmrunnode -v othernode # start a single node

This command will start the specified node to join an already running cluster.

=> Running a package

# cmrunpkg [ -n nodename ] packagename

This will run the package on the current node, or on the node specified with -n.
Logs will be written to the package control script log in /etc/cmcluster/packagename/ (see section 4 for the log file names).
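
For example, using the example package from section 1 (remember to re-enable switching afterwards, see section 3):

# cmrunpkg -n sec-node mypackage
# cmmodpkg -e mypackage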

=> halt the cluster:

# cmhaltcl -v
# cmhaltcl -f

With "-f", this forces the packages to halt, and after that it halts ServiceGuard operations on all nodes
which are currently running in the cluster.

=> stop cluster on one node:

# cmhaltnode -v
# cmhaltnode -v othernode

This command will halt ServiceGuard operations on the specified node. If any packages are running
on that node, the node will not be halted.

# cmhaltnode -f nodename

Force the node to halt even if there are packages or group members running on it.

=> Halting a package:

# cmhaltpkg packagename

This will halt the package, Logs will be written in /etc/cmcluster/packagename/.log.

=> enable or disable switching attributes for a package:

# cmmodpkg -e packagename # enable package switching
# cmmodpkg -d packagename # disable package switching

=> Enabling a package to run on a particular node:

After a package has failed on one node, that node is disabled for that package. This means the package
will not be able to run on that node. The following command re-enables the package to run on the specified node:

# cmmodpkg -e -n nodename packagename

=> Disabling a package from running on a particular node:

# cmmodpkg -d -n nodename packagename

This command will disable the package from running on the specified node.


3. Failover: Move a Package:

A "package" is the unit to handle for Service Guard, like for example with a failover operation.

The package has a name to identify it, and a "virtual IP address" which can be owned by one of the nodes.
In DNS, the package name is registered with its "virtual IP", so that clients can always access the application,
no matter which node the package runs on.
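
For example, an /etc/hosts (or DNS) entry could look like this (the address is a hypothetical "virtual" package IP):

162.17.16.10   mypackage.company.com   mypackage   # floats with the package, not tied to a node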

So, suppose we have the nodes "black" and "white", and the package "pkg1".
On both nodes, configuration files are present, like those examples shown in section 1.

So, the configuration files list the associated Volume Group(s), the package name, the nodes, timing variables,
and the "start" and "stop" scripts which start and stop the application associated with this package.

So, suppose the package currently runs on "white". Now, let's do a failover to "black":

-- Step 1. Halt the package at "white".
-- You halt a Serviceguard package when you want to stop the package, but you want the node to continue running in the cluster.
-- Then you must manually start it at "black".

# cmhaltpkg pkg1

-- Step 2. Start the package at "black".
-- After starting the package using "cmrunpkg", you then must also enable package switching.
-- This is necessary when a package has previously been halted on some node, since halting the package disables switching.

# cmrunpkg -n black pkg1
# cmmodpkg -e pkg1
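
You can then check that the package is now up on "black" (the -p selector may not exist on older Serviceguard versions; plain "cmviewcl -v" always works):

# cmviewcl -v -p pkg1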

However, in some cases, the above sequence is not enough. At many sites, the correct Volume Group operations have not been
implemented in the package stop and start scripts.
So, in our example we might be forced to perform the following extra steps:

on white: "vgchange -a n vgname" after step 1, halting the package on "white"
on black: "vgchange -a y vgname" before step 2, starting the package on "black"

However, many HP articles say that Service Guard expects a VG to be activated in "exclusive mode". In this case,
the appropriate command would be:

# vgchange -a e vgname

It depends a bit on how the VG was created. All disks in the VG have LVM "metadata", which include volume group activation mode bits.
The most common ones are:

- 00=standard activation mode (-a y). This default setting is normal for a VG in a non-clustered setting.
- 01=exclusive activation mode (-a e). This is the value that Serviceguard usually uses for operation.

So the "-a e" activation mode is the correct one. Nevertheless, at many sites the default "-a y" is used.

See also section 5.


4. Serviceguard daemons:

The main Cluster Management Daemon is a process called "cmcld".
One of its main duties is to send and receive heartbeat packets across all designated heartbeat networks.
Other tasks involve the management of packages, node membership, coordinating the other cluster daemons, etc.

It is up to Serviceguard to activate a cluster Volume Group on a node that needs access to the data. In order to disable volume
group activation at boot time, we need to modify the startup script "/etc/lvmrc".
The first part of this process is as follows:

AUTO_VG_ACTIVATE=1
Changed to …
AUTO_VG_ACTIVATE=0

Make sure every node has the "/etc/cmcluster/cmclnodelist" file in place.
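
For example, to copy it to the other node (a sketch using rcp; scp works just as well if configured):

# rcp /etc/cmcluster/cmclnodelist sec-node:/etc/cmcluster/cmclnodelist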

Here are the OS MC/ServiceGuard components:
  1. /usr/lbin/cmclconfd --ServiceGuard Configuration Daemon (gathers cluster info, i.e. network and volume group info; started from /etc/inetd.conf)
  2. /usr/lbin/cmcld --ServiceGuard Cluster Daemon (determines cluster membership. Package Mgr, Cluster Mgr, and Network Mgr run as parts of cmcld.)
  3. /usr/lbin/cmlogd --ServiceGuard Syslog Log Daemon (used by cmcld to write syslog messages.)
  4. /usr/lbin/cmlvmd --Cluster Logical Volume Manager Daemon (keeps track of volume group info.)
  5. /usr/lbin/cmomd --Cluster Object Manager Daemon - logs to /var/opt/cmom/cmomd.log (provides info to clients about the cluster; started from /etc/inetd.conf.)
  6. /usr/lbin/cmsnmpd --Cluster SNMP subagent (optionally running) (produces the MIB for snmp)
  7. /usr/lbin/cmsrvassistd --ServiceGuard Service Assistant Daemon (forks and execs scripts for the cluster.)
  8. /usr/lbin/cmtaped --ServiceGuard Shared Tape Daemon (keeps track of shared tape devices.)
Each of these daemons also logs to the /var/adm/syslog/syslog.log file.
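
A quick, illustrative check that the main daemons are up:

# ps -ef | grep cmcld
# ps -ef | grep cmlvmd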

Information about the starting and halting of each package is found in the package’s
control script log. This log provides the history of the operation of the package control script.
It is found at /etc/cmcluster/pkgname/pkgname.cntl.log or /etc/cmcluster/package_name/control_script.log.

You can also find messages in /var/adm/syslog/syslog.log which indicate what has occurred, and whether or not
the package has halted or started.
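
For example, to follow a package start or halt as it happens (using the example log names from above):

# tail -f /etc/cmcluster/mypackage/mypackage.cntl.log
# tail -f /var/adm/syslog/syslog.log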


5. VG operations with Service Guard:

=> Quick recipe for adding a VG:
  1. Scan for new disks, if necessary (ioscan, or reboot etc..)
  2. Create PV's
  3. Create new vgs & lvs
  4. Export vg to map file
  5. Import vg at failover node (steps 4 and 5 are sketched below)
  6. Deactivate vg (vgchange -a n vgname)
  7. Make vg cluster aware (vgchange -c y vgname)
  8. Activate vg exclusively (vgchange -a e vgname)
  9. Mount the new lvs manually with the mount command
  10. Take a copy of the /etc/cmcluster/pkg/pkg.cntl file
  11. Edit /etc/cmcluster/pkg/pkg.cntl & add the new vg and lv details
  12. Copy the pkg control files to all failover nodes
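
A sketch of steps 4 and 5 (assuming the example VG "cluvg01", and a free minor number 0x010000 for the group file on the failover node):

on pri-node:
# vgexport -p -s -m /tmp/cluvg01.map /dev/cluvg01 # -p previews only (keeps the VG), -s writes the VGID into the map file
# rcp /tmp/cluvg01.map sec-node:/tmp/

on sec-node:
# mkdir /dev/cluvg01
# mknod /dev/cluvg01/group c 64 0x010000 # LVM group file: major 64, site-unique minor number
# vgimport -s -m /tmp/cluvg01.map /dev/cluvg01 # -s scans the disks for the VGID from the map file
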
=> VG operations:

-> Marking a Volume Group for Serviceguard:

Marking a VG for Serviceguard: # vgchange -c y VGName
Marking a VG as non-Serviceguard: # vgchange -c n VGName

The "vgchange -c y VGName" command marks a volume groups as part of a cluster.
The "vgchange -a n VGName" deactivates a VG in the usual way.

The "marking for SG" is applied automatically by the "cmapplyconf" command when the volume group
is listed in the cluster-wide ASCII file.

-> VG Activation options:
  1. Standard Volume Group Activation: # vgchange -a y VGName
  2. Standard Volume Group Deactivation: # vgchange -a n VGName
  3. Exclusive Volume Group Activation: # vgchange -a e VGName
  4. Exclusive Volume Group Deactivation: # vgchange -a n VGName
  5. Shared mode Volume Group activation: # vgchange -a s VGName
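
To check how a VG is currently activated, look at the "VG Status" line of vgdisplay (illustrative output for an exclusively activated VG):

# vgdisplay /dev/cluvg01 | grep -i "VG Status"
VG Status                   available, exclusive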