The Cluster Monitor provides a graphical view of the Production Server cluster or Available Server configuration based on event reports from the connection manager and available server environment (ASE) daemons. Use the Cluster Monitor to track service availability and connectivity among member systems. You can also use it to manage services and to start storage management applications.
The Cluster Monitor:
Displays the status of each member of an Available Server configuration or Production Server cluster.
Reports errors on the primary network interface, member system failures, ASE service failures, and hard and soft disk errors.
Displays the configuration of the Available Server configuration or Production Server cluster, including all ASEs, member systems and their services, storage devices, and network interfaces.
Displays the devices on a member system's private SCSI buses.
Displays the shared storage reserved by a service.
Starts, stops, restarts, and relocates services.
Launches the asemgr utility as an external tool.
Launches dxadvfs, dxterm, dxlsm, cnxshow (in a Production Server configuration), and Performance Manager as external tools. (Note that some of these tools require the installation of specific product licenses and subsets. See the TruCluster Software Products Software Installation manual for additional information.)
The Cluster Monitor does not display the following:
Devices associated with a shared tape service.
Problems that occur with network interconnects other than the one defined as the primary network interconnect. (In a Production Server configuration, this is always the MEMORY CHANNEL interconnect.)
The Cluster Monitor displays an updated snapshot of the state of a Production Server cluster or Available Server configuration each time an event or change occurs.
Note
The Cluster Monitor groups all Production Server cluster members that are not in an ASE into a single pseudodomain named 9999. You can display the device view for this domain, but not an ASE view.
To set up the Cluster Monitor, follow these steps:
Make sure all systems and devices are properly connected.
Make sure the base operating system subsets upon which the Cluster Monitor depends are installed (see the TruCluster Software Products Software Installation manual).
Make sure the Cluster Monitor subset (TCRCMS150) is installed on every member.
On each member system, set up the /.rhosts file to allow root access for the rsh command between any two members. Be sure to use the systems' member names in the /.rhosts file.
Note
To maintain a secure network, modify the ifaccess.conf file as explained in the cluster_map_create(8) reference page.
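For illustration only, the /.rhosts file on each member of a two-member configuration with the hypothetical member names clu13 and clu14 might contain the following entries:
clu13 root
clu14 root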
Make sure the /etc/hosts file on each system in the cluster lists the Internet Protocol (IP) name and IP address and the MC (MEMORY CHANNEL) IP name and IP address of every member system, and the cluster_cnx IP name and 10.0.0.42 IP address.
Note
The IP name and IP address must be inserted manually in each cluster member's /etc/hosts file. Each member's own MC IP name and address, and those of the cluster_cnx, are entered automatically during the installation. The other members' MC IP names and addresses must be entered manually in each member's /etc/hosts file.
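As a sketch, the /etc/hosts file on each member of a hypothetical two-member cluster might include entries such as the following (the member names, MC names, and all addresses except 10.0.0.42 are invented for this illustration):
16.140.112.238  clu13
16.140.112.239  clu14
10.0.0.1        clu13-mc
10.0.0.2        clu14-mc
10.0.0.42       cluster_cnx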
If you have not already done so, configure each available server environment (ASE) by running the asemgr utility on one member in each ASE and entering the complete member list for that ASE. (See Chapter 2 for instructions.)
Check that all members are up by running the asemgr utility in each ASE and displaying member status. (See Chapter 2 for more information.)
Create the cluster configuration map on one cluster member (see Section 11.1.1).
After the cluster configuration map is successfully created, the value of the CMS_CONF variable in the /etc/rc.config file is set to on, and the tractd and submon daemons are automatically started on all cluster members. (These daemons also start automatically on subsequent reboots.)
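As a quick check, you can confirm the variable setting and the running daemons on each member (a minimal sketch using the ps ag options mentioned later in this chapter):
# grep CMS_CONF /etc/rc.config
# ps ag | egrep 'tractd|submon'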
To enable remote ASE functions between ASEs in a cluster, you must make each cluster member an authorized host; you can use the xhost command or add all cluster members to the X access list using the Host Manager utility. Be sure to use the MEMORY CHANNEL IP name for each cluster member.
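For example, to authorize two members whose hypothetical MEMORY CHANNEL IP names are clu13-mc and clu14-mc:
# xhost +clu13-mc +clu14-mc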
Invoke the Cluster Monitor from the command line to verify the cluster installation and configuration (see Section 11.1.2). The output of the monitor can be directed to any display device that is compatible with the X Window System protocol.
You must create a cluster configuration map:
After the TruCluster software is installed on all member systems and before using the Cluster Monitor for the first time
Whenever you add a new member system or delete an existing member system
Whenever the hardware configuration changes
The cluster configuration map contains a record of the entire hardware configuration, including systems, interconnects, and devices. A copy of the cluster configuration map resides on each member. The cluster configuration map file, /etc/CCM, must be identical and up to date on each member system so that the Cluster Monitor can properly display configuration information.
To update or re-create the cluster configuration map, select a single cluster member on which to run the cluster_map_create utility. You can use the following flags:
Use the -full flag to force each cluster member to reconstruct its current cluster configuration map by invoking the SCSI CAM utility (scu) to derive its SCSI bus and device configurations. When the cluster_map_create utility is used with the -full flag, it includes in the map only those members and devices it can access; if it cannot access a component, it omits it from the map. Use the -full flag only when you are certain that all member systems are up and operational. Use the cnxshow utility to determine the status of cluster members.
Use the -append flag to force each cluster member to append new hardware components to its current cluster configuration map. Use this flag to add hardware components to an existing cluster configuration map when a full rebuild of the map would cause hardware that is temporarily down and unavailable to be removed from the map.
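For illustration, an -append run takes the same form as the -full example shown later in this section (cluster1 is a hypothetical cluster name):
# /usr/sbin/cluster_map_create cluster1 -append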
If a cluster configuration map file (/etc/CCM) does not already exist in the cluster, the cluster_map_create utility creates it, distributes a copy to each cluster member, and starts the submon process and trigger-action daemon (tractd) on each cluster member. If a cluster configuration map already exists, the utility does nothing. However, when the -full flag is specified, the cluster_map_create utility always rebuilds and redistributes the cluster configuration map. (Note that the cluster_map_create utility starts the submon process and tractd daemon only if they are not currently running.)
You must perform the following tasks before entering the cluster_map_create command:
You must configure all cluster members, ASEs, and shared storage in the cluster.
You must add the names of all members' cluster interconnect interfaces to each member's /.rhosts file. This gives the cluster_map_create utility root access to all cluster members from any member. (To protect your system against distributed security attacks, remove these names from the /.rhosts files after the cluster_map_create utility completes. However, if you do so, you will not be able to run external tools, such as the asemgr utility, on individual members by dragging and dropping a tool icon on a member icon.)
When the cluster_map_create utility completes successfully, it displays a series of messages. The utility displays one or more dots (.) to indicate that work is in progress. The length of time required for the cluster_map_create utility to complete depends on the number of shared devices in the cluster. In the following example, the cluster_map_create utility completed successfully on a two-system cluster:
# /usr/sbin/cluster_map_create cluster1 -full
Members running are ( clu13, clu14 )
Doing device table scans ...
Doing symmetry checks ...
Processing map input file
Calling makeclmap to create /etc/CCM
Distributing cluster map to all members
Processing member clu13
Processing member clu14
Successful cluster map creation and distribution.
The cluster_map_create utility collects, checks, and merges the configuration information into the cluster configuration map file, /etc/CCM, which is distributed to the kernels of all member systems. If configuration errors (such as missing devices or asymmetry) are discovered, or if any cluster members go down (with a return status of DOWN) during this process, the cluster_map_create utility displays error messages in the terminal window from which you invoked it.
Appendix A lists the error messages generated by the cluster_map_create utility.
If the cluster configuration map is not created, see Section 11.2.1 for troubleshooting information.
For more information on this utility, see cluster_map_create(8). The syntax of the /etc/CCM file is described in CCM(5).
Before starting the Cluster Monitor from a remote client display, make sure you have set up security to allow the member that will be running the Cluster Monitor to access the client display.
Note
You must have root privilege to configure clients and servers using this application.
To start the Cluster Monitor from the Common Desktop Environment (CDE), follow these steps:
Click on the Application Manager icon on the CDE front panel.
Double click on the System Admin application group icon.
Double click on the TruCluster Tools application group icon.
Double click on the Cluster Monitor application icon.
To start the Cluster Monitor from the command line, follow these steps:
Log in to the member system as root.
If you logged in to the member system from a remote workstation, set the DISPLAY variable to reference the remote workstation.
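For example, if you use the C shell and your workstation's hypothetical name is mywks:
# setenv DISPLAY mywks:0.0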
Run the cmon program by entering the following command:
# nohup /usr/bin/X11/cmon &
The nohup utility allows you to log out of the member system without exiting the Cluster Monitor. After you log out, the Cluster Monitor continues to be displayed on the remote workstation.
To use the high availability feature of the Cluster Monitor, you must set up a shared Network File System (NFS) area, as described in Section 11.3, and invoke the cmon command with the -ha flag.
See cmon(8) for information about invoking the Cluster Monitor from the command line, including its command-line options. See X(1X) for details on the X toolkit command-line options that you can use with the Cluster Monitor.
For information about using the Cluster Monitor, see the online help.
This section provides troubleshooting information for the following situations:
Cluster configuration map not created (Section 11.2.1)
Cluster Monitor does not start (Section 11.2.2)
Cluster Monitor does not show a member (Section 11.2.3)
Errors reported by the cluster_map_create utility (Section 11.2.4)
Errors reported by the cmon utility (Section 11.2.5)
Other problems (Section 11.2.6)
If the cluster configuration map cannot be created, the following are likely reasons:
An incorrect cluster_map_create command was issued.
The cluster_map_create utility does not recognize a cluster member or is experiencing difficulty communicating with a cluster member.
There is an inconsistent SCSI bus configuration within the cluster.
If the cluster configuration map is not created when you run the cluster_map_create utility, check the following:
Verify that all shared SCSI buses and devices on the shared buses are symmetrically configured on all members to which they are connected (see the TruCluster Software Products Software Installation manual).
Use the ping command to determine whether all ASE members are reachable from one another over the MEMORY CHANNEL interconnect (see the sketch after this list).
Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL IP name).
Run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system does not display in the cnxshow utility:
Ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.
Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).
Ensure that all member systems are recognized and running by using the asemgr utility. If a member system does not display in the asemgr utility, then add it.
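As a sketch, the following commands run the connectivity checks described in this list from one member (clu14-mc is a hypothetical MEMORY CHANNEL IP name):
# ping clu14-mc
# cnxshow
# sysconfig -q rm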
See Appendix A for specific errors generated by the cluster_map_create utility.
If you installed the Cluster Monitor subset and the cluster configuration map has been successfully created, use the ps command with the ag options to verify that the tractd and submon daemons are running on each cluster member.
If the daemons are not running, check that the value of the CMS_CONF variable in the /etc/rc.config file is set to on. Then, manually start the daemons on all systems in the cluster. Start the tractd daemon first, then start the submon daemon.
The tractd daemon must perform some extra initialization tasks when started for the first time in a cluster. To allow this initialization to complete, wait a few seconds before starting the submon daemon on the first system.
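A minimal sketch of this sequence, assuming both daemons are installed in /usr/sbin (verify the daemon paths on your system before using them):
# /usr/sbin/tractd
# sleep 10
# /usr/sbin/submon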
If the tractd and submon daemons are running, inspect the daemon.log file and analyze messages from the cmon utility to determine the cause of the problem.
If the system warns you that it is unable to open the display, use the setenv DISPLAY command to set the display to your workstation before entering the cmon command.
If the Cluster Monitor does not show a cluster member system in its display, run the cnxshow utility on the system that is not shown as a member, and compare the output with that produced by running the cnxshow utility on another cluster member. If there is an inconsistency in the output, reboot the members that are inconsistent.
If the output from the cnxshow utility is consistent on all cluster members, re-create the cluster configuration map (using the -full option) and restart the Cluster Monitor. For information on re-creating the cluster configuration map, see Section 11.1.1.
The following errors are reported by the cluster_map_create utility:
You entered the cluster_map_create command with the -append option, but no cluster configuration map file (/etc/CCM) exists to which to append. Reenter the cluster_map_create command with the -full option (without the -append option) to create a cluster configuration map file.
The cluster_map_create utility has detected that there are no other member systems in the cluster. Member systems are either incorrectly configured for the Production Server, or the cluster communications subsystem is not operating properly.
To correct this problem, follow these steps:
Run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:
Ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.
Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).
Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.
Reenter the cluster_map_create command.
The cluster_map_create utility could not contact a member system. The member system is either incorrectly configured for the Production Server, or the cluster communication subsystem is not operating properly.
To correct this problem, follow these steps:
Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL Internet Protocol (IP) name).
Run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:
Ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.
Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).
Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.
Reenter the cluster_map_create command.
The cluster_map_create utility has found an inconsistency in the number of SCSI buses between member systems. To correct this problem, follow these steps:
Run the scu utility and ensure that the same number of buses are configured between member systems (see the TruCluster Software Products Software Installation manual).
Reenter the cluster_map_create command.
You entered the cluster_map_create command with the -append option, but the cluster_map_create command cannot append to the original cluster configuration map file (/etc/CCM). Reenter the cluster_map_create command without the -append option.
The cluster_map_create utility could not read the contents of an input file it created to generate the cluster configuration map.
To correct this problem, follow these steps:
Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL IP name).
Run the scu utility to ensure that the number of SCSI bus and shared SCSI device identifiers is consistent between member systems (see the TruCluster Software Products Software Installation manual).
Reenter the cluster_map_create command.
You entered the cluster_map_create command with the -append option, but no cluster configuration map file exists to which to append. Reenter the cluster_map_create command without the -append option to create the cluster configuration map file (/etc/CCM).
A cluster configuration map file (/etc/CCM) has been created for the cluster; however, the cluster_map_create utility cannot distribute it to the member system identified in the message. The member system is either incorrectly configured for the Production Server, or the cluster communication subsystem is not operating properly. To correct this problem, follow these steps:
Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL IP name).
Run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:
Ensure that you have correctly configured the MEMORY CHANNEL interconnect in a Production Server cluster. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.
Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).
Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.
Reenter the cluster_map_create command.
The member system identified in the message did not receive a copy of the cluster configuration map. The member system is either incorrectly configured, or the cluster communication subsystem is not operating properly. To correct this problem, follow these steps:
Ensure that each member system's /.rhosts file contains an entry for each member system.
In a Production Server cluster, run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:
Ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.
Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).
Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.
Reenter the cluster_map_create command.
The cluster_map_create utility could not start the tractd daemon and submon process on cluster members; therefore, it could not distribute the cluster configuration map. Member systems are either incorrectly configured for the Production Server, or the cluster communication subsystem is not operating properly. To correct this problem, follow these steps:
Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL IP name).
In a Production Server cluster, run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:
In a Production Server cluster, ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.
Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).
Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.
Reenter the cluster_map_create command.
One of the base operating system reference pages subsets required by the Cluster Monitor online help is not installed. Check the dependencies listed in the TruCluster Software Products Software Installation manual.
If the X color map is fully allocated or becomes fully allocated when you invoke the Cluster Monitor, the Cluster Monitor may use other colors in place of those it cannot allocate. Typically this happens when you are using other color-intensive or graphic-intensive applications at the same time as you are using the Cluster Monitor. Many of these applications have mechanisms that allow you to run them with a reduced color map. This may allow you to run the Cluster Monitor with its full color palette.
If you are in the middle of an available server environment (ASE) operation, such as a service relocation, the Cluster Monitor may temporarily display an erroneous message. If there is no problem in the ASE, the message disappears from the display at the next reporting cycle (approximately 20 seconds).
You can set up a Network File System (NFS) service to make the Cluster Monitor graphical user interface highly available. To enable this feature, invoke the Cluster Monitor with the cmon -ha command. You must also run the Cluster Monitor on a member system with its display set to a workstation or another member system.
If a member system running a highly available Cluster Monitor fails, the Cluster Monitor is restarted on the member system currently running the NFS service. The Cluster Monitor will display on the same workstation as before the failure.
If the member system running the NFS service fails, the Cluster Monitor will also restart, but there will be a short delay while the service relocates to another member. After the service relocates, it restarts the Cluster Monitor on the local system.
After the service relocates, there is a short delay while the NFS locking mechanism restarts. The Cluster Monitor acquires a lock on the run list file, adds an entry so that it is registered in case of another failure, and restarts its display. To minimize the effect of the NFS locking delay, you can set up the Automatic Service Placement (ASP) policy for the NFS service so that the service runs only on members that rarely fail.
To set up a highly available Cluster Monitor, follow these steps:
Set up the Cluster Monitor as described in the previous sections.
Determine the name of the NFS service, for example, cmonha, and assign an Internet Protocol (IP) address to it, specifying the service name and address in the /etc/hosts file on all member systems. See Chapter 4 for detailed information about preparing disks for a service.
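For illustration, the /etc/hosts entry might look like the following (the address is a hypothetical placeholder; use an address appropriate to your network):
192.0.2.10  cmonha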
Use the asemgr utility to set up the NFS service. When you set up the NFS service, you must specify:
The name of the service, which is also an IP host name
The shared disk specification for the /var/cmon directory
The /var/cmon directory mount point
Root write permission for the /var/cmon mount point
The service's ASP (the b option, for Balance Services)
After you add the service, it is started on a member system.
Using the asemgr utility, modify the ASE exports file for the Cluster Monitor service so that the phrase -root=0 is at the end of the exports line. This preserves root ID mapping for all clients of the service.
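As a sketch, the modified exports line for the /var/cmon directory might look like the following (any other export options on the line depend on your configuration):
/var/cmon -root=0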
Because the member systems will act as both clients and servers of the NFS service, you must run nfssetup on each member system and NFS-mount /var/cmon from the NFS service using its IP host name.
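For example, using the hypothetical service name cmonha, the mount command on each member might look like:
# mount cmonha:/var/cmon /var/cmon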
On one member system, create a /var/cmon/run directory. NFS exports this directory on the service's shared disk to each member system.
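For example:
# mkdir /var/cmon/run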
Use the asemgr utility to modify the NFS service and set up user-defined start and stop action scripts. See Chapter 4 for information about action scripts. See Chapter 10 for information about modifying services.
Add the following commands at the bottom of the user-defined start action script, before the exit command:
# If the cmonsvc daemon is not installed, exit with an error status.
if [ ! -f /usr/sbin/cmonsvc ]
then
    exit 2
fi

# Start the daemon that monitors members for the Cluster Monitor service.
/usr/sbin/cmonsvc
The cmonsvc daemon uses the host status monitor (HSM) daemon to monitor the member systems in the available server environment (ASE). If a member fails, the cmonsvc daemon checks the /var/cmon/run/runList file to determine if the member had been running the cmon -ha command. If so, the Cluster Monitor is restarted on the member that is running the NFS service.
Add the following commands at the bottom of the user-defined stop action script, before the exit command:
# If the sendTrig program is not installed, exit with an error status.
if [ ! -f /usr/sbin/sendTrig ]
then
    exit 2
fi

# Tell the cmonsvc daemon on the serving member to stop the service.
/usr/sbin/sendTrig -e cmonsvcStop
The sendTrig program sends a cmonsvcStop trigger message to the cmonsvc daemon on the member system running the service. This stops the service.
Invoke the Cluster Monitor from a remote system as described in Section 11.1.2, but use the following cmon command:
# nohup /usr/bin/X11/cmon -ha -d hostname:0.0 &
The -d option is not needed if the DISPLAY environment variable is set on the member system.
When the Cluster Monitor is invoked with the previous command line, it writes a line to the /var/cmon/run/runList file. The line contains the name of the member system, the cmon command's process ID, and the name of the system that is displaying the Cluster Monitor. That line is then used to restart the Cluster Monitor, if necessary.