[Contents] [Prev. Chapter] [Next Section] [Next Chapter] [Index] [Help]

12    Troubleshooting



12.1    Using ASE Event Logging

The available server environment (ASE) logger daemon (aselogger) tracks the ASE messages generated by all the member systems. A logger daemon can be run on one or more member systems. If you have more than one member system running a logger daemon, you will have virtually duplicate logs on the member systems. Messages appear in the log files in the order that they were logged, not necessarily in the order that they occurred.

During installation, the TruCluster software installation procedure prompts you to specify whether to run the ASE logger daemon each time the system is booted. If you chose not to run the logger daemon when you installed the TruCluster software, you can enter the following command and then reboot the system so that the logger daemon starts each time the system is booted:

# rcmgr set ASELOGGER 1

Note

A temporary stop in a network or a high network load may cause the aselogger daemon to overflow its message queue, resulting in the loss of some log messages on the system running the daemon. To avoid losing messages, run the aselogger daemon on each member system.

The ASE logger daemon logs messages generated by the asemgr utility, the director daemon, the agent daemon, and the logger daemon. Messages generated by the host status monitor (HSM) daemon and the availability manager (AM) driver are logged to the local system. In addition, if the ASE logger daemon stops, all daemon messages are logged only to the system on which they occurred. Note that when the TruCluster software first starts, the initial messages that are generated may be logged only to the local system.

The logger daemon uses the DIGITAL UNIX event logging facility, syslog, to collect messages that are logged by the various kernel, command, utility, and application processes. Messages are either logged to a local file or forwarded to a remote system, as specified in the /etc/syslog.conf file on each member system running the logger daemon.

The /etc/syslog.conf event logging configuration file specifies how a member system logs messages. If you use the default logging configuration, all asemgr utility and ASE daemon messages are logged to the /var/adm/syslog.dated/date/daemon.log file, where date is the name of a dated subdirectory. The AM driver messages are logged to the kern.log file in the same directory.

In addition, you can set the severity level of ASE error logging by using the asemgr utility. This allows you to limit the ASE messages that are logged. See Section 12.1.3 for more information.

To examine the ASE messages generated by the asemgr utility and the logger, director, and agent daemons, check the event logging files of any member system that is running a logger daemon. To examine the ASE messages generated by the HSM daemon and the AM driver on a particular member system, check that system's event logging files. Appendix A contains a partial list of important event messages and their descriptions.

The following example shows a remote message and a local message on member system gideontc, which is running a logger daemon:

Jan 27 11:22:27  gideontc  ASE: pigeon Agent Error: HSM reported state
           change of zen.tst.com, a non-member host

Jan 27 11:22:29  gideontc  ASE: local AseLogger Notice: connected to Agent
     [1]             [2]      [3]       [4]       [5]       [6]        [7]

The ASE messages are logged in a specific format and include the following information:

  1. Date and timestamp.

  2. Local system name.

  3. Identifier (not used in messages from the AM driver).

  4. System that generated the message--Note that local is specified if the message was logged locally or was not logged using the logger daemon. This information is not specified in messages from the AM driver.

  5. Source of the message--The following components can generate an ASE message:

    AseMgr        The asemgr utility
    Director      The ASE director daemon
    Agent         The ASE agent daemon
    HSM           The HSM daemon
    AseLogger     The logger daemon
    AM            The AM driver
    vmunix        The kernel
    AseUtility    A process or daemon unrelated to ASE

  6. Severity of the message--The severity level is not included in messages from the AM driver. Messages can have the following severity levels:

    info       A low-level informational message
    notice     A high-level informational message about significant activity in the ASE
    warning    A message about activity in the ASE that may indicate an error condition
    error      A message about an error that was detected
    alert      A message about a critical condition that requires immediate attention

  7. Message text.
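The seven fields described above can be pulled apart with a short awk filter. The following is a minimal sketch, not part of the TruCluster software: the function name parse_ase_line and the output labels are made up for illustration, and it assumes that each message occupies one line laid out as in the example. Messages from the AM driver, which omit fields 3, 4, and 6, are skipped.

```shell
# Split one ASE log line into labelled fields (hypothetical helper).
# Reads log lines on standard input; lines without the "ASE:" identifier
# (for example, AM driver messages) are ignored.
parse_ase_line() {
    awk '$5 == "ASE:" {
        sev = $8
        sub(/:$/, "", sev)                  # "Notice:" becomes "Notice"
        msg = $9
        for (i = 10; i <= NF; i++) msg = msg " " $i
        printf "time=%s %s %s\n", $1, $2, $3
        printf "host=%s\n", $4
        printf "origin=%s\n", $6
        printf "source=%s\n", $7
        printf "severity=%s\n", sev
        printf "text=%s\n", msg
    }'
}

# Usage, with the second message from the example:
# echo 'Jan 27 11:22:29  gideontc  ASE: local AseLogger Notice: connected to Agent' | parse_ase_line
```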

The ASE action scripts capture any output from the commands that they execute. If the action script fails, the command output is logged as errors and the source of the message is specified in the log files as AseUtility.
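To find the output of a failed action script, it can help to list only the messages whose source field is AseUtility. The following sketch assumes the log line layout shown earlier in this section; the function name ase_by_source is hypothetical.

```shell
# Print ASE log lines whose source field (field 5 in the list above)
# matches the given component name.
# $1 = source, for example AseUtility; remaining arguments = log files.
# Reads standard input if no file is given.
ase_by_source() {
    src=$1
    shift
    awk -v src="$src" '$5 == "ASE:" && $7 == src' "$@"
}

# Usage (replace "date" with the name of the dated log directory):
# ase_by_source AseUtility /var/adm/syslog.dated/date/daemon.log
```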



12.1.1    Configure Mail for the ASE Logger Daemon

By default, the ASE logger daemon logs alert messages in the daemon.log file in the /var/adm/syslog.dated/date directory and sends notification to root on the local system. You can use the mailsetup program to configure mail so that the superuser receives alert messages from the ASE logger daemon. (You can use the Mail option on the setup utility menu to run this program.) See mailsetup(8) for more information. For information on setting up mail to fail over, see Section 5.5.



12.1.2    Displaying the Members Running a Logger Daemon

To determine which member systems are running a logger daemon, choose the "Obtaining ASE Status" item from the ASE Main Menu and then choose the "Display the location(s) of the logger" item.



12.1.3    Setting and Displaying the Message Logging Severity Level

ASE message logging uses the DIGITAL UNIX syslog function and syslogd daemon. However, you can use the asemgr utility to set a logging level, which restricts the ASE logger daemon to logging only messages at or above a given severity.

There are five possible severity levels that can be logged, as described in Section 12.1. The following table describes the logging levels that you can set and the message severity levels that each one logs:

Message Type                          Description
Informational                         Logs messages of all severity levels. This is the default.
Notice, warning, and error logging    Logs messages with the notice, warning, error, and alert severity levels.
Warning and error logging             Logs messages with the warning, error, and alert severity levels.
Error logging only                    Logs messages with the error and alert severity levels.

To set the severity level for message logging, choose the "Set the logging level" item from the Managing the ASE menu. Example 12-1 shows how to set the severity level for message logging.

Example 12-1:  Setting the Logging Severity Level

Enter the logging level for the ASE:

    i)  Informational (log everything)
    n)  Notice, warning, and error logging
    w)  Warning and error logging
    e)  Error logging only

    x)  Exit to Managing the ASE

Enter your choice [i]: n

You can display the severity level of the messages being logged by choosing the "Display the level of logging" item from the Obtaining ASE Status menu.
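The relationship between the logging level chosen in Example 12-1 and the severity levels that are actually logged can be written as a small lookup. This is an illustrative sketch only; the function name ase_is_logged is hypothetical and is not part of the asemgr utility. It encodes the table in this section, using the menu letters from Example 12-1.

```shell
# Succeeds (exit status 0) if a message of severity $2 is logged when the
# logging level is $1 (i, n, w, or e, as in the Example 12-1 menu).
ase_is_logged() {
    case "$1:$2" in
        i:info|i:notice|i:warning|i:error|i:alert) return 0 ;;
        n:notice|n:warning|n:error|n:alert)        return 0 ;;
        w:warning|w:error|w:alert)                 return 0 ;;
        e:error|e:alert)                           return 0 ;;
        *)                                         return 1 ;;
    esac
}

# Usage:
# if ase_is_logged n info; then echo logged; else echo dropped; fi
```

Note that alert messages are logged at every level, so lowering the amount of logging never suppresses them.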



12.1.4    Disabling ASE Event Logging

To disable ASE event logging on a member system, you must stop ASE services (which stops the enabled logger daemon), reset the ASELOGGER parameter to zero, and then restart ASE services for the change to take effect.

Enter the following commands to disable ASE event logging on a member system:

# /sbin/init.d/asemember stop
# rcmgr set ASELOGGER 0
# /sbin/init.d/asemember start



12.1.5    Editing and Testing the Error Alert Script

TruCluster software provides you with a script that executes a specified task when an error with the alert severity level occurs. You use the asemgr utility to edit and test the script.

The default error alert script sends mail to users that you specify in the script. You can edit the script to specify which users will receive the mail, and you can specify some other action to take when a severe error occurs.

To edit the error alert script, choose the "Edit the error alert script" item from the Managing the ASE menu. The asemgr utility invokes the vi editor or the editor defined by the EDITOR environment variable, and you can specify the users to which you want mail sent or you can make other changes to the script.

Example 12-2 shows how to edit the error alert script.

Example 12-2:  Editing the Error Alert Script

#  Define ADMIN on next line to get mail for critical ASE errors

ADMIN=root

PATH=/sbin:/usr/sbin:/usr/bin
export PATH

ERR_FILE=/var/ase/tmp/alertMsg
TIME=`date +"%D %T"`
HSM_STATUS=`awk -F: '{print $2}' ${ERR_FILE} | sed 's/ //g'`

case    "${HSM_STATUS}" in
                HSM_NI_STATUS)
                        awk -f /var/ase/lib/ni_status_awk ${ERR_FILE}
                        ;;
                HSM_PATH_STATUS)
                        awk -f /var/ase/lib/path_status_awk ${ERR_FILE}
                        ;;
esac

if [ -n "${ADMIN}" ]; then
        if [ ! -f "${ERR_FILE}" ]; then
                echo "Critical ASE error detected on `date`" > ${ERR_FILE}
        fi

        mailx -s "***Critical ASE error - ${TIME}" ${ADMIN} < ${ERR_FILE}
fi

rm -f ${ERR_FILE}
:wq
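The script in Example 12-2 decides which awk program to run by taking the second colon-separated field of the alert message and removing its spaces. The following sketch isolates that pipeline so that you can see what it extracts. The function name hsm_status is hypothetical, and the sample message is made up for illustration; the text of real alert messages may differ.

```shell
# Same pipeline as the HSM_STATUS assignment in Example 12-2, but reading
# the alert message from standard input instead of ${ERR_FILE}.
hsm_status() {
    awk -F: '{print $2}' | sed 's/ //g'
}

# Usage, with an invented message:
# echo 'HSM: HSM_NI_STATUS: sample detail' | hsm_status
```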

To test the error alert script, choose the "Test the error alert script" item from the Managing the ASE menu. ASE sends a test alert message to the logger daemon and invokes the error alert script.



12.2    Resetting the ASE Daemons

You can reset the available server environment (ASE) daemons on a member system if problems occur in the ASE. Resetting the ASE daemons stops the ASE director, logger, and host status monitor (HSM) daemons and initializes the ASE agent daemons on a system. The agent daemons then restart all the daemons to make the ASE fully operational. If resetting the ASE daemons does not fix the problem, you can initialize or reboot the member system.

To reset the ASE daemons on a member system, use the following command:

# /sbin/init.d/asemember restart



12.3    Controlling the Priority of the ASE Daemons

You must ensure that the available server environment (ASE) daemons do not time out because other system processes have a higher scheduling priority. The ASE daemons must have a scheduling priority that is higher than that of normal system processes so that they can respond to administrative commands and other events in the ASE. The daemons' high priority enables the ASE to operate even when the member systems are busy. See the DIGITAL UNIX System Administration manual for information about scheduling processes.

If there are processes other than those generated by the ASE with a scheduling priority that is higher than the priority of the ASE daemons, the daemons could time out while waiting to run. If this occurs, messages such as the following are written to the log file, indicating that operations are timing out:

Mar 8 13:09:28 surry ASE: surry AseMgr Error: ASE timeout -
              Unable to stop service.

The ASE agent daemon (aseagent) and logger daemon (aselogger) are started in the /sbin/init.d/asemember script with a "nice" value of -5, which raises the priority of the daemons. The processes that descend from the ASE daemons inherit the raised scheduling priority. For example, the director daemon (asedirector) and any programs or scripts started by the ASE daemons have the same raised priority as the agent and logger daemons.

You can raise the scheduling priority of the ASE daemons by changing the "nice" value specified in the lines in the /sbin/init.d/asemember file that start the aseagent and aselogger daemons. See nice(1) for more information about scheduling priorities.
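Before changing the script, you may want to see the "nice" values that the running daemons currently have. The following sketch assumes that the ps command on the member system accepts the -o nice,comm format specifiers; check ps(1) before relying on it. The function name ase_nice_values is hypothetical. It reports the "nice" value only, not the effective scheduling priority at a given moment.

```shell
# Read "ps -e -o nice,comm"-style output on standard input and print the
# nice value and command name for the ASE daemons named in this section.
ase_nice_values() {
    awk '$2 ~ /(^|\/)(aseagent|aselogger|asedirector)$/ { print $1, $2 }'
}

# Usage (assumes ps supports these format specifiers):
# ps -e -o nice,comm | ase_nice_values
```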

Note that ASE daemons started with a "nice" priority will not always stay at that priority. Over time, if the member systems do not reboot, the daemons' priority may return to the average run priority. When the member systems reboot, the daemons' priority is raised again according to the "nice" value in the /sbin/init.d/asemember script.

Therefore, the default /sbin/init.d/asemember script contains the following command, which supersedes the "nice" value for the asehsm daemon and runs the daemon with a fixed high priority that does not degrade over time:

aseagent -p hsm

If you do not want the fixed high priority for the asehsm daemon, remove this command from the /sbin/init.d/asemember script.

You can also raise and fix the priority of the aseagent, asedirector, and asehsm daemons by including the following command in the /sbin/init.d/asemember script:

aseagent -p all



12.4    Connection Manager Removed System from the Cluster (PS)

The connection manager's monitor daemon, cnxmond, is started on all cluster members by the /sbin/init.d/clumember script. The cnxmond daemon has two options, -p and -D, whose values, when multiplied, specify the longest duration that communications can be inoperative between a system and the connection manager monitor daemon.

For example:

cnxmond -p 10 -D 6

This command results in a 60-second timeout interval.

When started by the clumember script, the cnxmond daemon searches the /etc/rc.config file to determine the values for the -p and -D options. The value for the -p option is obtained from the CNX_INTERVAL variable; the value for the -D option is obtained from the CNX_WAVES variable.

When the cnxmond daemon detects an interruption in communications with a system (that is, no ping is received during the timeout interval), the connection manager removes the system from the cluster. Investigate the source of the communications problem and, if necessary, use the rcmgr set command to increase the value of the CNX_WAVES variable. For example, to change the value of CNX_WAVES to 10, enter the following command:

# rcmgr set CNX_WAVES 10
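To see the effect of a new value before you set it, you can compute the resulting timeout interval as the product of the two variables. This is a minimal sketch; the function name cnx_timeout is hypothetical, and it assumes, as in the cnxmond example above, that the product is expressed in seconds.

```shell
# Timeout interval = CNX_INTERVAL (-p value) multiplied by CNX_WAVES (-D value).
# $1 = value for -p, $2 = value for -D.
cnx_timeout() {
    expr "$1" \* "$2"
}

# Usage:
# cnx_timeout 10 6     # the cnxmond -p 10 -D 6 example
# cnx_timeout 10 10    # CNX_WAVES raised to 10, assuming CNX_INTERVAL is still 10
```

With CNX_INTERVAL at 10, raising CNX_WAVES from 6 to 10 lengthens the timeout interval from 60 to 100 seconds.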

