This chapter describes the following problems, which you might encounter during installation, and suggests corrective actions:
Setting logging levels (PS, AS, MC)
Kernel build fails (PS, AS, MC)
ASE validation fails (PS, AS)
For Production Server and Available Server, you can set the asemgr logging level to Informational, which increases the number of messages that are logged.
For Production Server and MEMORY CHANNEL, you can use the mchan_debug attribute in the /etc/sysconfigtab file to generate verbose MEMORY CHANNEL error messages.
Set the attribute as shown in the following example:
mchan:
        mchan_debug=1
You must reboot the system in order for the mchan_debug change to take effect. The additional debug information, when included in a problem report, can help your DIGITAL service representative diagnose problems.
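Before rebooting, it can be useful to confirm that the stanza was entered as intended. The following Python sketch reads stanza-style text and reports the value of one attribute. The function name and the sample text are illustrative only, and the parser assumes a simple layout (a subsystem name followed by a colon, then name=value lines, with # starting a comment); it is not a complete /etc/sysconfigtab parser.

```python
def get_sysconfigtab_attr(text, subsystem, attribute):
    """Return the value of `attribute` in the `subsystem` stanza, or None.

    Assumes a simplified stanza layout; this is an illustration,
    not a full parser for the real file format.
    """
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # A stanza header is a bare name followed by a colon.
        if line.endswith(":") and "=" not in line:
            current = line[:-1].strip()
            continue
        if current == subsystem and "=" in line:
            name, _, value = line.partition("=")
            if name.strip() == attribute:
                return value.strip()
    return None


# Sample text; the first stanza is a made-up placeholder.
sample = """\
example_subsys:
        example_attr = 5

mchan:
        mchan_debug=1
"""

print(get_sysconfigtab_attr(sample, "mchan", "mchan_debug"))
```

In practice the text would come from reading /etc/sysconfigtab on the member being checked.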
After prompting for configuration options, the installation procedure attempts to build a new kernel using the doconfig utility. If the newly configured kernel cannot be built, the installation procedure displays the following message:
*** WARNING *** An error has occurred during system configuration. A partial listing of the error log file (./errs) follows:
.
.
.
*** NOTE *** The customized kernel for this machine could not be successfully created. One possible problem could be kernel layered products that might be incompatible with the operating system. This script will now automatically attempt to build a kernel using the operating system only. Is this ok? (y/n) [y]:
If the rebuild is still unsuccessful, the installation procedure displays the following message:
*** NOTE ***
A new kernel for this machine could not be successfully created.
Unable to build the new kernel. Please perform the following actions:
o Run "doconfig" to build a good kernel.
o Move the new kernel to /.
o Before rebooting make sure that the MEMORY CHANNEL IP
addresses for all cluster members are recorded in each member's
/etc/hosts file.
o Reboot the system.
For information on building, tuning, and debugging kernels, see the DIGITAL UNIX System Administration, System Configuration and Tuning, and Kernel Debugging manuals.
The primary network for Production Server is the MEMORY CHANNEL subnet; the primary network for Available Server is the network attached to the interface specified during installation. (Section 2.1 has a description of each product's primary network interface.)
If a member system does not respond to the ping command, do the following:
Check that each member's primary interface is configured UP and that the appropriate interface-related entries are present in that member's /etc/rc.config and /etc/hosts files.
In the following example, host clu14's primary network interface is mc0:
# ifconfig mc0
mc0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX>
inet 10.0.0.2 netmask ffffff00 broadcast 10.0.0.255 ipmtu 8008
The interface is configured UP and has the following NETDEV_n and IFCONFIG_n entries in the member's /etc/rc.config file:
# egrep "mc0|10.0.0.2" /etc/rc.config
NETDEV_1="mc0"
IFCONFIG_1="10.0.0.2 netmask 255.255.255.0"
The interface's host entry in /etc/hosts associates the IP address assigned in the IFCONFIG entry with the IP name assigned to the CLUSTER_NET entry:
# rcmgr get CLUSTER_NET
mcclu14
# grep mcclu14 /etc/hosts
10.0.0.2 mclu14.abc.def.com mcclu14
Make sure that the following entries are in each member system's /etc/hosts file:
An entry for each member system's IP name and IP address on the cluster's primary network.
The IP host addresses used by critical network services such as BIND, NIS, and NTP.
For Production Server, the MEMORY CHANNEL IP address of the connection manager service (cluster_cnx), which must be host number 42 on the MEMORY CHANNEL subnet. (The clu_ivp utility checks for the presence of the cluster_cnx service but does not verify its IP address.)
For systems with more than one network interface, the IP host names and addresses used to communicate with cluster members and clients through those network interfaces. For example, a Production Server cluster has a conventional Ethernet or FDDI network in addition to its MEMORY CHANNEL subnet; an Available Server ASE often has a secondary network as a backup.
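Because clu_ivp does not verify the cluster_cnx address, a small script can cover that gap. The following Python sketch checks hosts-format text for required names and for the host number of cluster_cnx. The function names are illustrative, the sample host mcclu15 is hypothetical, and the sketch assumes a netmask of 255.255.255.0 (as in the ifconfig example earlier), so that the host number is the last octet of the address. The cluster_cnx check applies to Production Server only.

```python
def parse_hosts(text):
    """Map every name and alias in hosts-format text to its IP address."""
    names = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        ip, aliases = fields[0], fields[1:]
        for name in aliases:
            names.setdefault(name, ip)  # first entry wins
    return names


def check_hosts(text, required_names, cnx_name="cluster_cnx"):
    """Return a list of problems found; an empty list means none."""
    names = parse_hosts(text)
    problems = [f"missing entry: {n}" for n in required_names if n not in names]
    cnx_ip = names.get(cnx_name)
    if cnx_ip is None:
        problems.append(f"missing entry: {cnx_name}")
    elif cnx_ip.split(".")[-1] != "42":
        # Assumes a 255.255.255.0 netmask, so the last octet is the host number.
        problems.append(f"{cnx_name} is {cnx_ip}; host number must be 42")
    return problems


# Sample text; mcclu15 and the cluster_cnx address are made up.
sample = """\
10.0.0.2   mclu14.abc.def.com mcclu14
10.0.0.3   mcclu15
10.0.0.41  cluster_cnx
"""

for problem in check_hosts(sample, ["mcclu14", "mcclu15"]):
    print(problem)
```

Running the same check against each member's /etc/hosts file, with the same list of required names, also shows whether the files agree with one another.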
Each system in a failover-capable cluster must have identically configured MEMORY CHANNEL adapters.
For a physical hub configuration, if the primary adapter is plugged, for example, into the primary hub's linecard in slot 3, the alternate adapter must be plugged into the alternate hub's linecard in slot 3. (The slot location determines the adapter's node ID, and the node IDs must be identical among all cluster members.)
If the MEMORY CHANNEL adapters are not connected properly, the system can panic with the following message:
rm_check_cables: cables are crossed
In a two-system, virtual-hub cluster, the jumper settings determine the node IDs. A system's primary and alternate adapters must be jumpered identically (either as VH0 or VH1). See the TruCluster Software Products Hardware Configuration manual for information on configuring MEMORY CHANNEL adapters. See Section 3.14 for information on setting up a tie-breaker disk for virtual-hub clusters.
If cnxshow indicates that a system is unable to join the cluster, perform the following checks:
Use the ps ag command to verify that the portmap and cfgmgr processes are running. These processes, while not specific to clusters, must be running in order for the cluster to operate. For example:
# ps ag | egrep "portmap|cfgmgr" | grep -v egrep
224 ?? I 0:04.77 /usr/sbin/portmap
244 ?? I 0:00.01 /sbin/cfgmgr
Check initialization and error messages (for example, in the daemon.log and kern.log files, and with the uerf utility). See Appendix A for examples of startup, cluster formation, and cluster recovery messages.
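The process check above can be scripted when several members have to be examined. The following Python sketch takes captured ps output and reports which required processes are absent. The function name is illustrative, and the parser assumes the five-column layout shown in the example (process ID, terminal, state, CPU time, command).

```python
import os


def missing_daemons(ps_output, required=("portmap", "cfgmgr")):
    """Return the required process names that do not appear in ps output."""
    running = set()
    for line in ps_output.splitlines():
        fields = line.split()
        # Expect: PID, TTY, state, CPU time, then the command path.
        if len(fields) >= 5 and fields[0].isdigit():
            running.add(os.path.basename(fields[4]))
    return [name for name in required if name not in running]


sample = """\
224 ?? I 0:04.77 /usr/sbin/portmap
244 ?? I 0:00.01 /sbin/cfgmgr
"""

print(missing_daemons(sample))
```

An empty list means both processes were found; any name returned identifies a process to investigate before looking at cluster-specific causes.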
If either the clu_ivp utility or the drd_ivp utility (Production Server only) reports that the available server environment (ASE) validation checks failed, run the asemgr utility with the -d and -h options on one system in each ASE to ensure that all ASE member systems are up and running. For example:
# asemgr -dh
Member Status
Member: Host Status: Agent Status:
mcclu6 UP RUNNING
mcclu7 UP RUNNING
See asemgr(8) for more information on these options.
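When an ASE has many members, the status listing can be checked mechanically. The following Python sketch scans captured asemgr -dh output and returns the members whose host status is not UP or whose agent status is not RUNNING. The function name is illustrative, the parser assumes the three-column layout shown in the example, and the DOWN and UNKNOWN values in the second sample are hypothetical stand-ins for whatever a failing member actually reports.

```python
def members_not_ready(asemgr_output):
    """Return hosts whose status is not UP / RUNNING."""
    not_ready = []
    for line in asemgr_output.splitlines():
        fields = line.split()
        # Member rows have exactly three fields; skip titles and headers.
        if len(fields) != 3 or fields[0].endswith(":"):
            continue
        host, host_status, agent_status = fields
        if host_status != "UP" or agent_status != "RUNNING":
            not_ready.append(host)
    return not_ready


healthy = """\
Member Status
Member: Host Status: Agent Status:
mcclu6 UP RUNNING
mcclu7 UP RUNNING
"""

# Hypothetical failing output; the status strings are assumptions.
degraded = healthy.replace("mcclu7 UP RUNNING", "mcclu7 DOWN UNKNOWN")

print(members_not_ready(healthy))
print(members_not_ready(degraded))
```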
Because each Available Server installation consists of a single ASE, the following applies only to Production Server installations.
All members in an ASE must have the same ASE ID. You can use the rcmgr get ASE_ID command to check the ASE identifier (ASE_ID) of each system. For example:
# rcmgr get ASE_ID
1
To change a system's ASE_ID, follow these steps:
If DRD services are configured, delete all services on the system.
Shut the system down to single-user mode.
Set the ASE_ID value.
In the following example, the rcmgr command is used to set the ASE_ID value to 2 to match the ASE_ID assigned to the other members in the ASE:
# rcmgr set ASE_ID 2
Halt and reboot the system.
Add any DRD services that were deleted.
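To decide which system needs the procedure above, the values returned by rcmgr get ASE_ID on each member of one ASE can be compared. The following Python sketch takes those values as a dictionary and reports the members that differ from the most common value. The function name and the host mcclu8 are hypothetical, and treating the most common value as the intended one is an assumption; confirm the intended ASE_ID before changing any system.

```python
from collections import Counter


def mismatched_ase_ids(ase_ids):
    """Given {host: ASE_ID} for one ASE, return (most_common_id, outliers).

    `outliers` maps each host whose ASE_ID differs to its current value.
    """
    if not ase_ids:
        return None, {}
    most_common_id, _ = Counter(ase_ids.values()).most_common(1)[0]
    outliers = {h: v for h, v in ase_ids.items() if v != most_common_id}
    return most_common_id, outliers


# Values as collected by hand from each member; mcclu8 is made up.
collected = {"mcclu6": "2", "mcclu7": "2", "mcclu8": "1"}

expected, outliers = mismatched_ase_ids(collected)
print(expected, outliers)
```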
If the drd_ivp utility is run (either manually or as part of the clu_ivp utility) prior to defining the available server environment (ASE) member list, it can report that it is unable to determine ASE membership. For example:
# drd_ivp
Cluster Configuration Information
Hostname   ASE_ID   BSSD   BSSD   DRD    Lic
                    Reg    Resp   Conf   Reg
----------------------------------------------------------
mcclu6     0        Yes    Yes    Yes    Yes
mcclu7     0        Yes    Yes    Yes    Yes
DRD configuration validation tests succeeded.
Unable to determine which nodes are in the same ASE
as node mcclu6. Verify that node mcclu6 is up and that it
has the ASE_ID parameter in its '/etc/rc.config' file.
Verify that mcclu6 is registered as a member of an ASE.
Unable to determine which nodes are in the same ASE
as node mcclu7. Verify that node mcclu7 is up and that it
has the ASE_ID parameter in its '/etc/rc.config' file.
Verify that mcclu7 is registered as a member of an ASE.
Failed to validate ASE_ID values.
Use the asemgr utility to populate the ASE member list. Then rerun either the clu_ivp utility or the drd_ivp utility to check that the systems are registered as members of the ASE.
The TruCluster Software Products Administration manual provides more information on troubleshooting the DRD subsystem.
If the member systems connected to a shared SCSI bus have inconsistent views of the devices on the bus (all ASE members must have identical numbers for shared buses and devices), do the following:
Make sure that all shared SCSI cables are connected and terminated as described in the TruCluster Software Products Hardware Configuration manual.
For systems that support the bus_probe_algorithm console variable, check that its value is set to new (see Section 2.3).
Verify that the shared SCSI buses are numbered equivalently on each system. As mentioned in Chapter 5, you can run the clu_ivp utility on each system and compare the output to check whether all systems have the same view of shared SCSI buses and devices.
If you discover an inconsistency, do the following on the affected system or systems:
Run the /var/ase/sbin/ase_fix_config utility, described in Section 3.7, and adjust the bus numbering.
Build a new kernel using the doconfig -c HOSTNAME command.
Move the new kernel to /vmunix.
Reboot the system.
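Comparing clu_ivp output from two systems by eye is error-prone, so the comparison described above can be done with a line-by-line diff of the captured output. The following Python sketch uses the standard difflib module. The function name is illustrative, and the sample lines are placeholders, not real clu_ivp output; host names and other member-specific lines will legitimately differ, so only differences in the shared bus and device lines are significant.

```python
import difflib


def diff_views(name_a, output_a, name_b, output_b):
    """Return unified-diff lines between two captured outputs.

    An empty list means the two outputs are identical.
    """
    return list(difflib.unified_diff(
        output_a.splitlines(), output_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm="", n=0))


# Placeholder text standing in for the bus and device portion of the output.
view_a = "bus 1 target 0\nbus 2 target 1\n"
view_b = "bus 1 target 0\nbus 3 target 1\n"

for line in diff_views("mcclu6", view_a, "mcclu7", view_b):
    print(line)
```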