[Contents] [Prev. Chapter] [Next Section] [Next Chapter] [Index] [Help]

11    Using the Cluster Monitor

The Cluster Monitor provides a graphical view of the Production Server cluster or Available Server configuration based on event reports from the connection manager and available server environment (ASE) daemons. Use the Cluster Monitor to track service availability and connectivity among member systems. You can also use it to manage services and to start storage management applications.

The Cluster Monitor:

The Cluster Monitor does not display the following:

The Cluster Monitor displays an updated snapshot of the state of a Production Server cluster or Available Server configuration each time an event or change occurs.

Note

The Cluster Monitor groups all Production Server cluster members that are not in an ASE into a single pseudodomain named 9999. You can display the device for this domain, but not an ASE view.



11.1    Setting Up the Cluster Monitor

To set up the Cluster Monitor, follow these steps:

  1. Make sure all systems and devices are properly connected.

  2. Make sure the base operating system subsets upon which the Cluster Monitor depends are installed (see the TruCluster Software Products Software Installation manual).

  3. Make sure the Cluster Monitor subset (TCRCMS150) is installed on every member.

  4. On each member system, set up the /.rhosts file to allow root access for the rsh command between any two members. Be sure to use the systems' member names in the /.rhosts file.

    Note

    To maintain a secure network, modify the ifaccess.conf file as explained in the cluster_map_create(8) reference page.
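For example, using the hypothetical member names clu13 and clu14 (the same names that appear in the sample output later in this chapter), each member's /.rhosts file might contain entries like the following:

```
clu13 root
clu14 root
```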

  5. Make sure the /etc/hosts file on each system in the cluster lists the Internet Protocol (IP) name and address and the MC (MEMORY CHANNEL) IP name and address of every member system, as well as the cluster_cnx IP name and its IP address, 10.0.0.42.

    Note

    Each member's IP name and IP address must be inserted manually in each cluster member's /etc/hosts file. A member's own MC IP name and address, and those of cluster_cnx, are entered automatically during installation; however, the other members' MC IP names and addresses must be entered manually in each member's /etc/hosts file.
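As a sketch, assuming hypothetical member names (clu13, clu14), hypothetical MC IP names (clu13-mc, clu14-mc), and hypothetical addresses for everything except cluster_cnx, each member's /etc/hosts file might contain entries like these:

```
16.140.10.13   clu13        # member IP names and addresses (manual entries)
16.140.10.14   clu14
10.0.0.13      clu13-mc     # MC IP names; other members' entries are manual
10.0.0.14      clu14-mc
10.0.0.42      cluster_cnx
```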

  6. If you have not already done so, configure each available server environment (ASE) by running the asemgr utility on one member in each ASE and entering the complete member list for that ASE. (See Chapter 2 for instructions.)

  7. Check that all members are up by running the asemgr utility in each ASE and displaying member status. (See Chapter 2 for more information.)

  8. Create the cluster configuration map on one cluster member (see Section 11.1.1).

    After the cluster configuration map is successfully created, the value of the CMS_CONF variable in the /etc/rc.config file is set to on and the tractd and submon daemons are automatically started on all cluster members. (These daemons also automatically start on subsequent reboots.)

  9. To enable remote ASE functions between ASEs in a cluster, you must make each cluster member an authorized host; you can use the xhost command or add all cluster members to the X access list using the Host Manager utility. Be sure to use the MEMORY CHANNEL IP name for each cluster member.
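For example, using the hypothetical MEMORY CHANNEL IP names clu13-mc and clu14-mc, you might enter the following on the system whose display the Cluster Monitor will use:

```
# xhost +clu13-mc
# xhost +clu14-mc
```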

  10. Invoke the Cluster Monitor from the command line to verify the cluster installation and configuration (see Section 11.1.2). The output of the monitor can be directed to any display device that is compatible with the X Window System protocol.



11.1.1    Creating the Cluster Configuration Map

You must create a cluster configuration map:

The cluster configuration map contains a record of the entire hardware configuration, including systems, interconnects, and devices. A copy of the cluster configuration map resides on each member. The cluster configuration map file, /etc/CCM, must be identical and up to date on each member system so that the Cluster Monitor can properly display configuration information.

To update or re-create the cluster configuration map, select a single cluster member on which to run the cluster_map_create utility. You can use the following flags:

If a cluster configuration map file (/etc/CCM) does not already exist in the cluster, the cluster_map_create utility creates it, distributes a copy to each cluster member, and starts the submon process and trigger-action daemon (tractd) on each cluster member.

If a cluster configuration map already exists, the utility does nothing. However, when the -full flag is specified, the cluster_map_create utility always rebuilds and redistributes the cluster configuration map. (Note that the cluster_map_create utility only starts the submon process and tractd daemon if they are not currently running.)

You must perform the following tasks before entering the cluster_map_create command:

When the cluster_map_create utility completes successfully, it displays a series of messages. The utility displays one or more dots ( . ) to indicate that work is in progress. The length of time required for the cluster_map_create utility to complete depends on the number of shared devices in the cluster. In the following example, the cluster_map_create utility completed successfully on a two-system cluster:

# /usr/sbin/cluster_map_create cluster1 -full
Members running are ( clu13, clu14 )
Doing device table scans
...
Doing symmetry checks
...
Processing map input file
Calling makeclmap to create /etc/CCM
Distributing cluster map to all members
Processing member clu13
Processing member clu14
Successful cluster map creation and distribution.

The cluster_map_create utility collects, checks, and merges the configuration information into the cluster configuration map file, /etc/CCM, which is distributed to the kernels of all member systems. If configuration errors (such as missing devices or asymmetry) are discovered or if any cluster members go down (with a return status of DOWN) during this process, the cluster_map_create utility displays error messages in the terminal window from which you invoked it. Appendix A lists the error messages generated by the cluster_map_create utility.

If the cluster configuration map is not created, see Section 11.2.1 for troubleshooting information.

For more information on this utility, see cluster_map_create(8). The syntax of the /etc/CCM file is described in CCM(5).



11.1.2    Starting the Cluster Monitor

Before starting the Cluster Monitor from a remote client display, make sure you have set up security to allow the member that will be running the Cluster Monitor to access the client display.

Note

You must have root privilege to configure clients and servers using this application.

To start the Cluster Monitor from the Common Desktop Environment (CDE), follow these steps:

  1. Click on the Application Manager icon on the CDE front panel.

  2. Double click on the System Admin application group icon.

  3. Double click on the TruCluster Tools application group icon.

  4. Double click on the Cluster Monitor application icon.

To start the Cluster Monitor from the command line, follow these steps:

  1. Log in to the member system as root.

  2. If you logged in to the member system from a remote workstation, set the DISPLAY variable to reference the remote workstation.

  3. Run the cmon program by entering the following command:

    # nohup /usr/bin/X11/cmon &
    

    The nohup utility allows you to log out of the member system without exiting the Cluster Monitor. After you log out, the Cluster Monitor continues to be displayed on the remote workstation. To use the high availability feature of the Cluster Monitor, you must set up a shared Network File System (NFS) area, as described in Section 11.3, and invoke the cmon command with the -ha flag.

See cmon(8) for information about invoking the Cluster Monitor from the command line, including its command-line options. See X(1X) for details on the X toolkit command-line options that you can use with the Cluster Monitor. For information about using the Cluster Monitor, see the online help.



11.2    Troubleshooting the Cluster Monitor

This section provides troubleshooting information for the following situations:



11.2.1    Cluster Configuration Map Not Created

If the cluster configuration map cannot be created, the following are likely reasons:

If the cluster configuration map is not created when you run the cluster_map_create utility, check the following:

See Appendix A for specific errors generated by the cluster_map_create utility.



11.2.2    Cluster Monitor Does Not Start

If you installed the Cluster Monitor subset, and the cluster configuration map has been successfully created, use the ps command with the ag options to verify that the tractd and submon daemons are running on each cluster member.
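For example, a quick check on each member might look like the following (the exact process listing varies by system; this is illustrative only):

```
# ps ag | egrep 'tractd|submon'
```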

If the daemons are not running, check that the value of the CMS_CONF variable in the /etc/rc.config file is set to on. Then, manually start the daemons on all systems in the cluster. Start the tractd daemon first, then start the submon daemon. The tractd daemon must perform some extra initialization tasks when started for the first time in a cluster. To allow this initialization to be completed, wait a few seconds before starting the submon daemon on the first system.

If the tractd and submon daemons are running, inspect the daemon.log file and analyze messages from the cmon utility to determine the cause of the problem.

If the system warns you that it is unable to open the display, use the setenv DISPLAY command to set the display to your workstation before entering the cmon command.
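For example, assuming a hypothetical workstation display name of myws, a C shell user would enter setenv DISPLAY myws:0.0; the Bourne-shell equivalent is sketched below:

```shell
# Set the X display to a hypothetical workstation "myws" before running cmon.
# (C shell users would instead enter: setenv DISPLAY myws:0.0)
DISPLAY=myws:0.0
export DISPLAY
echo "$DISPLAY"
```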



11.2.3    Cluster Monitor Does Not Show a Member

If the Cluster Monitor does not show a cluster member system in its display, run the cnxshow utility on the system that is not shown as a member, and compare the output with that produced by running the cnxshow utility on another cluster member. If there is an inconsistency in the output, reboot the members that are inconsistent.

If the output from the cnxshow utility is consistent on all cluster members, re-create the cluster configuration map (using the -full option) and restart the Cluster Monitor. For information on re-creating the cluster configuration map, see Section 11.1.1.



11.2.4    Errors Reported by the cluster_map_create Utility

The following errors are reported by the cluster_map_create utility:

No previous cluster map.

You entered the cluster_map_create command with the -append option, but no cluster configuration map file (/etc/CCM) exists to which to append. Reenter the cluster_map_create command with the -full option (without the -append option) to create a cluster configuration map file.

No members found!

The cluster_map_create utility has detected that there are no other member systems in the cluster. Member systems are either incorrectly configured for the Production Server or the cluster communications subsystem is not operating properly. To correct this problem, follow these steps:

  1. Run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:

    1. Ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.

    2. Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).

  2. Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.

  3. Reenter the cluster_map_create command.

Cluster or ASE member hostname is either unreachable or improperly configured.

The cluster_map_create utility could not contact a member system. The member system is either incorrectly configured for the Production Server or the cluster communication subsystem is not operating properly. To correct this problem, follow these steps:

  1. Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL Internet Protocol (IP) name).

  2. Run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:

    1. Ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.

    2. Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).

  3. Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.

  4. Reenter the cluster_map_create command.

Error: Asymmetric shared SCSIs in ASE ASE_ID between members hostname and hostname

The cluster_map_create utility has found an inconsistency in the number of SCSI buses between member systems. To correct this problem, follow these steps:

  1. Run the scu utility and ensure that the same number of buses are configured between member systems (see the TruCluster Software Products Software Installation manual).

  2. Reenter the cluster_map_create command.

Badly formatted filename file.

You entered the cluster_map_create command with the -append option, but the cluster_map_create command cannot append to the original cluster configuration map file (/etc/CCM). Reenter the cluster_map_create command without the -append option.

Error: failed to make the cluster map.

The cluster_map_create utility could not read the contents of an input file it created to generate the cluster configuration map. To correct this problem, follow these steps:

  1. Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL IP name).

  2. Run the scu utility to ensure that the SCSI bus numbers and shared SCSI device identifiers are consistent between member systems (see the TruCluster Software Products Software Installation manual).

  3. Reenter the cluster_map_create command.

Error: No saved map input file filename to append to.

You entered the cluster_map_create command with the -append option, but no cluster configuration map file exists to which to append. Reenter the cluster_map_create command without the -append option to create the cluster configuration map file (/etc/CCM).

Member hostname is unreachable. Failure to distribute new cluster map.

A cluster configuration map file (/etc/CCM) has been created for the cluster; however, the cluster_map_create utility cannot distribute it to the member system identified in the message. The member system is either incorrectly configured for Production Server or the cluster communication subsystem is not operating properly. To correct this problem, follow these steps:

  1. Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL IP name).

  2. Run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:

    1. Ensure that you have correctly configured the MEMORY CHANNEL interconnect in a Production Server cluster. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.

    2. Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).

  3. Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.

  4. Reenter the cluster_map_create command.

Failure to load the new cluster map on member hostname.

The member system identified in the message did not receive a copy of the cluster configuration map. The member system is either incorrectly configured, or the cluster communication subsystem is not operating properly. To correct this problem, follow these steps:

  1. Ensure that each member system's /.rhosts file contains an entry for each member system.

  2. In a Production Server cluster, run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:

    1. Ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.

    2. Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).

  3. Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.

  4. Reenter the cluster_map_create command.

Failure to load new map onto all cluster members.

The cluster_map_create utility could not start the tractd daemon and submon process on cluster members; therefore, it could not distribute the cluster configuration map. Member systems are either incorrectly configured for Production Server or the cluster communication subsystem is not operating properly. To correct this problem, follow these steps:

  1. Ensure that each member system's /.rhosts file contains an entry for each member system (using the MEMORY CHANNEL IP name).

  2. In a Production Server cluster, run the cnxshow utility and ensure that each member system is recognized and running in the cluster. If a member system is not displayed by the cnxshow utility:

    1. In a Production Server cluster, ensure that you have correctly configured the MEMORY CHANNEL interconnect. See the TruCluster Software Products Hardware Configuration manual and the MEMORY CHANNEL User's Guide for configuration information.

    2. Enter the sysconfig -q rm command and view the MEMORY CHANNEL kernel attributes (see Appendix B).

  3. Run the asemgr utility and ensure that the member system is recognized and running. If a member system is not recognized by the asemgr utility, add it.

  4. Reenter the cluster_map_create command.



11.2.5    Error Reported by the cmon Utility

Man Page could not be formatted. The requested Man Page is either not present, or corrupt.

One of the base operating system reference pages subsets required by the Cluster Monitor online help is not installed. Check the dependencies listed in the TruCluster Software Products Software Installation manual.



11.2.6    Other Problems

If the X color map is fully allocated or becomes fully allocated when you invoke the Cluster Monitor, the Cluster Monitor may use other colors in place of those it cannot allocate. Typically this happens when you are using other color-intensive or graphic-intensive applications at the same time as you are using the Cluster Monitor. Many of these applications have mechanisms that allow you to run them with a reduced color map. This may allow you to run the Cluster Monitor with its full color palette.

If you are in the middle of an available server environment (ASE) operation, such as a service relocation, the Cluster Monitor may temporarily display an erroneous message. If there is no problem in the ASE, the message disappears from the display at the next reporting cycle (approximately 20 seconds).



11.3    Setting Up a Highly Available Cluster Monitor Service

You can set up a Network File System (NFS) service to make the Cluster Monitor graphical user interface highly available. To do so, invoke the Cluster Monitor with the cmon -ha command, and run it on a member system with its display set to a workstation or another member system.

If a member system running a highly available Cluster Monitor fails, the Cluster Monitor is restarted on the member system currently running the NFS service. The Cluster Monitor will display on the same workstation as before the failure.

If the member system running the NFS service fails, the Cluster Monitor will also restart, but there will be a short delay while the service relocates to another member. After the service relocates, it restarts the Cluster Monitor on the local system.

After the service relocates, there is a short delay while the NFS locking mechanism restarts. The Cluster Monitor acquires a lock on the run list file, adds an entry so that it is registered in case of another failure, and restarts its display. To minimize the effect of the NFS locking delay, you can set up the Automatic Service Placement (ASP) policy for the NFS service so that the service runs only on members that rarely fail.

To set up a highly available Cluster Monitor, follow these steps:

  1. Set up the Cluster Monitor as described in the previous sections.

  2. Determine the name of the NFS service, for example cmonha, and assign an Internet Protocol (IP) address to it, specifying the service name and address in the /etc/hosts file on all member systems. See Chapter 4 for detailed information about preparing disks for a service.
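For example, using the hypothetical service name cmonha and a hypothetical IP address, you would add a line like the following to the /etc/hosts file on every member system:

```
16.140.10.50   cmonha
```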

  3. Use the asemgr utility to set up the NFS service. When you set up the NFS service, you must specify:

    After you add the service, it is started on a member system.

  4. Using the asemgr utility, modify the ASE exports file for the Cluster Monitor service so that the phrase -root=0 is at the end of the exports line. This preserves root ID mapping for all clients of the service.
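For example, if the service exports the /var/cmon directory, the modified exports line might end as follows (shown as a minimal sketch; any other export options on the line are unchanged):

```
/var/cmon -root=0
```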

  5. Because the member systems will act as both clients and servers of the NFS service, you must run nfssetup on each member system and NFS-mount /var/cmon from the NFS service using its IP host name.
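For example, with the hypothetical service name cmonha, each member might mount the service's directory as follows:

```
# mount cmonha:/var/cmon /var/cmon
```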

  6. On one member system, create a /var/cmon/run directory. NFS exports this directory on the service's shared disk to each member system.

  7. Use the asemgr utility to modify the NFS service and set up user-defined start and stop action scripts. See Chapter 4 for information about action scripts. See Chapter 10 for information about modifying services. Add the following commands at the bottom of the user-defined start action script, before the exit command:

    if [ ! -f /usr/sbin/cmonsvc ]
    then
            exit 2
    fi
    /usr/sbin/cmonsvc

    The cmonsvc daemon uses the host status monitor (HSM) daemon to monitor the member systems in the available server environment (ASE). If a member fails, the cmonsvc daemon checks the /var/cmon/run/runList file to determine if the member had been running the cmon -ha command. If so, the Cluster Monitor is restarted on the member that is running the NFS service. Add the following commands at the bottom of the user-defined stop action script, before the exit command:

    if [ ! -f /usr/sbin/sendTrig ]
    then
            exit 2
    fi
    /usr/sbin/sendTrig -e cmonsvcStop

    The sendTrig program sends a cmonsvcStop trigger message to the cmonsvc daemon on the member system running the service. This stops the service.

  8. Invoke the Cluster Monitor from a remote system as described in Section 11.1.2, but use the following cmon command:

    # nohup /usr/bin/X11/cmon -ha -d hostname:0.0 &
    

    The -d option is not needed if the DISPLAY environment variable is set on the member system. When the Cluster Monitor is invoked with the previous command line, it writes a line to the /var/cmon/run/runList file. The line contains the name of the member system, the cmon command's process ID, and the name of the system that is displaying the Cluster Monitor. That line is then used to restart the Cluster Monitor, if necessary.
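The runList entry can be pictured as a single whitespace-separated line. The following sketch, using hypothetical values, shows how the three fields could be separated; the actual on-disk format is not documented here, so treat this as illustrative only:

```shell
# Illustrative sketch: split a hypothetical runList entry into its three
# assumed fields: member name, cmon process ID, and displaying system.
line="clu13 1234 myws:0.0"
member=$(echo "$line" | awk '{print $1}')
pid=$(echo "$line" | awk '{print $2}')
display=$(echo "$line" | awk '{print $3}')
echo "restart cmon from $member (pid $pid) on display $display"
```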

