An available server environment (ASE) manages a collection of systems and the shared SCSI buses to which they are connected, and provides an environment in which services can be started, stopped, and automatically relocated in response to a software or hardware failure. ASEs provide high availability, increase storage capacity, reduce performance bottlenecks, and permit a wider range of configurations.
An Available Server configuration contains only one ASE. A Production Server configuration contains one or more ASEs. As discussed in the TruCluster Software Products Software Installation manual, when you install the Production Server Software, you identify each ASE that is to exist within the cluster by assigning an ASE identifier (ASE_ID) to those members that are to participate in an ASE.
The ASE_ID is a value from 0 to 63 that cluster software uses to uniquely identify the ASE in which the system resides within the cluster. Each ASE has a unique ASE_ID; all systems in the same ASE share the same ASE_ID. (In an Available Server configuration, all systems have an ASE_ID of 0. You are not prompted to supply an ASE_ID during software installation.)
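The ASE_ID rules above (a value from 0 to 63, shared by every system in the same ASE) can be checked before installation. The following is a minimal sketch, not part of the TruCluster software; the function name and the input mapping of member names to ASE_IDs are hypothetical.

```python
from collections import defaultdict

ASE_ID_MIN, ASE_ID_MAX = 0, 63  # range stated in the text


def group_members_by_ase(assignments):
    """Group member names by ASE_ID, rejecting out-of-range values.

    `assignments` is a hypothetical planning table that maps a member
    name to the ASE_ID you intend to give it.
    """
    groups = defaultdict(list)
    for member, ase_id in assignments.items():
        if not isinstance(ase_id, int) or not ASE_ID_MIN <= ase_id <= ASE_ID_MAX:
            raise ValueError(f"{member}: ASE_ID {ase_id!r} is outside 0-63")
        groups[ase_id].append(member)
    return dict(groups)
```

Each key of the result is one ASE, and its value lists the systems that would share that ASE_ID, which makes an unintended split of members across ASEs easy to spot.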
The following sections describe how to manage the membership of ASEs and clusters. They discuss the following tasks:
Adding members to a cluster (Production Server cluster only) (Section 2.1)
Adding members to an ASE (Section 2.2)
Enabling ASE on an existing cluster member (Production Server cluster only) (Section 2.3)
Deleting members from an ASE (Section 2.4)
Displaying the status of ASE members (Section 2.5)
Initializing ASE member systems (Section 2.6)
Stopping and restarting ASE activity (Section 2.7)
Shutting down a member (Section 2.8)
To add a new system to an existing Production Server configuration, follow these steps:
Follow the instructions in the TruCluster Software Products Hardware Configuration manual to physically connect the new system to existing shared storage, the MEMORY CHANNEL interconnect, and external networks.
Connect the new system to the MEMORY CHANNEL subnet, turn on the power, and boot the system.
Install the same versions of the DIGITAL UNIX operating system and TruCluster software on the new system as you installed on existing cluster member systems. Decide which available server environment (ASE), if any, the new system will belong to and specify the correct ASE_ID during cluster base subset configuration.
Ensure that the settings of the following /etc/sysconfigtab attributes are the same on the new system as on all current member systems:
dochecksum--Enables or disables Transmission Control Protocol/Internet Protocol (TCP/IP) checksums. (See Appendix B for more information.)
rx_mapping_enabled--Enables or disables copy avoidance. (See Appendix B for more information.)
rm_rail_style--Configures the reliability style of the MEMORY CHANNEL interconnects on a cluster member. (See Appendix B and the TruCluster Production Server Software MEMORY CHANNEL Application Programming Interfaces guide for more information.)
enable_extended_uids--Enables or disables extended UIDs in the base operating system. (See the DIGITAL UNIX Release Notes for more information.)
Warning
Failure to match the current configuration can cause one or more systems to panic when you attempt to add the new system to the cluster.
Modify the /etc/hosts file on the new system to add entries for the IP address and hostname associated with the MEMORY CHANNEL subnet for each existing member system and for the new system itself.
Modify the /etc/hosts file on existing cluster member systems to include an entry for the new system's MEMORY CHANNEL IP address and hostname.
If the new cluster member is to belong to an ASE, run the asemgr utility on an existing member of the ASE to which the new system will belong and add the new member. The updated ASE database will be propagated to all ASE members when the new system is rebooted. See Section 2.2 for instructions on adding a new member to an ASE.
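The attribute check in the steps above can be done by comparing copies of /etc/sysconfigtab from the new system and from a current member. The following is a minimal sketch under stated assumptions: it assumes each setting appears on its own line in the form `name = value`, it ignores which subsystem stanza a setting belongs to, and the function names are hypothetical.

```python
CHECKED_ATTRIBUTES = (
    "dochecksum",
    "rx_mapping_enabled",
    "rm_rail_style",
    "enable_extended_uids",
)


def read_attributes(sysconfigtab_text, names=CHECKED_ATTRIBUTES):
    """Return {name: value} for the named attributes found in the text.

    Assumption: one `name = value` setting per line. Stanza headers and
    any other lines without '=' are skipped.
    """
    found = {}
    for line in sysconfigtab_text.splitlines():
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name in names:
            found[name] = value
    return found


def mismatched_attributes(new_text, member_text):
    """List the checked attributes whose values differ between two systems."""
    new = read_attributes(new_text)
    member = read_attributes(member_text)
    return sorted(n for n in CHECKED_ATTRIBUTES if new.get(n) != member.get(n))
```

An empty result means the four attributes agree (an attribute absent from both files counts as agreeing); any name returned should be corrected on the new system before it joins the cluster.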
The following requirements pertain to adding member systems to an available server environment (ASE):
The host name and IP address for each member system must be included in all the member systems' local /etc/hosts files. For Available Server configurations, this host name can correspond to the network interface you specify in the HOSTNAME configuration variable in the local /etc/rc.config file. For a Production Server configuration, it must correspond with the cluster interconnect name that the cluster installation script automatically adds to the /etc/rc.config file in the CLUSTER_NET configuration variable.
You must include network interface names in each member system's local /etc/hosts file, and they must be configured on the member system. See your software installation manual and the DIGITAL UNIX Network Administration manual for more information.
After you set up the cluster hardware configuration and install the TruCluster software, immediately use the asemgr utility to configure the cluster's ASEs, and add the member systems to each ASE. Add all the member systems from the same system. See Section 2.3 for more information.
If you want to change the name of a member system, you must use the asemgr utility to delete the member system from the ASE, change the name of the system, and then add the renamed member system to the ASE. Make sure you also make the appropriate changes to the members' /etc/hosts files. See Section 2.4 for more information.
You must ensure that the ASE daemons do not time out because other system processes have a higher scheduling priority. The ASE daemons should have a scheduling priority that is higher than normal system processes, because the daemons must be able to respond to administrative commands and other events in the ASE. The daemons' high priority enables the ASE to operate even when the member systems are busy. See Section 12.3 for more information.
Each member system that participates in an ASE must reside in the same Berkeley Internet Name Domain (BIND) domain.
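The /etc/hosts requirement above (every member's host name present in every member's local file) can be verified from copies of those files. The following is a minimal sketch; the function names and the idea of collecting each member's file contents into a dictionary are assumptions for illustration.

```python
def hosts_names(hosts_text):
    """Return every host name and alias listed in /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        names.update(fields[1:])  # fields[0] is the IP address
    return names


def missing_hosts_entries(member_names, hosts_by_member):
    """Map each member to the member names absent from its hosts file.

    `hosts_by_member` maps a member name to the text of that member's
    /etc/hosts file (gathered by whatever means you prefer).
    """
    missing = {}
    for member, text in hosts_by_member.items():
        absent = sorted(set(member_names) - hosts_names(text))
        if absent:
            missing[member] = absent
    return missing
```

The check looks only at names, not addresses, so it confirms that an entry exists but not that the address is the one you intended.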
Use the asemgr utility to add one member system at a time to the ASE. The system on which you run the asemgr utility for the first time is your first member system. You must add at least one other member system to your ASE. Add all the member systems from the same system. Do not run the asemgr utility on one system and add one member system, and then run the asemgr utility on another system and add a different member system.
In an Available Server configuration, after you enter the names of all the member systems, you are prompted for additional network interfaces for each member system. Before you add a member system, all network interfaces must be configured on the system. See Section 3.3.3 for information about using multiple networks in an Available Server configuration.
If you want to add member systems to an existing ASE, choose the "Add a member" item from the Managing the ASE menu. A list of the current member systems is displayed and you can add additional member systems, one at a time. You are then prompted for additional network interfaces for the new member systems. Example 2-1 shows how to add member systems to the ASE.
Managing the ASE
a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
C) Display the configuration of the ASE database
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script
) Enable ASE V1.5 functionality
q) Quit (back to the Main Menu)
x) Exit to the Main Menu ?) Help
Enter your choice [x]: a
Member List: tototc, gideontc
Enter a new member: daffytc
Member List: tototc, gideontc, daffytc
Is this correct (y/n) [y]: y
Would you like to define any other network interfaces to daffytc
for ASE use (y/n)? [n]: n
ASE Network Configuration
Member Name    Interface Name    Monitor
___________    ______________    _______
tototc         tototc            Yes
gideontc       gideontc          Yes
daffytc        daffytc           Yes
Is this configuration correct (y|n)? [y]: y
Note
After an ASE member system has been deleted from an ASE, attempts to add it back into the ASE may fail. To resolve this problem, perform one of the following actions before trying to add the affected system back into the ASE:
Reboot the member system.
Enter the following command:
% /sbin/init.d/asemember restart
When you install the TruCluster Production Server software on a member system, the installation procedure asks if you intend to run the available server environment (ASE) on that system. If you answer "no" at that time, and later decide to run ASE on that system, you must perform the following steps to enable ASE:
Select a value for the member system's ASE_ID. The member's ASE_ID must be the same as the ASE_IDs of the other member systems in the ASE it is joining.
Determine whether this system should run the ASE logger.
Enter the following commands, specifying the selected ASE_ID for <var> where appropriate:
# rcmgr set ASE on
# rcmgr set ASE_ID <var>
# rcmgr set ASELOGGER 1    (if you wish to run the logging daemon)
Rebuild the kernel using the doconfig program. See the DIGITAL UNIX System Administration manual for instructions on running the doconfig program.
Reboot the system.
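The rcmgr portion of the steps above can be assembled from the two decisions you make first (the ASE_ID and whether to run the logger). The following is a minimal sketch that only builds the command lines shown in the text and executes nothing; the function name and its parameters are hypothetical.

```python
def enable_ase_commands(ase_id, run_logger=False):
    """Build the rcmgr command lines from the steps above.

    Only the three rcmgr invocations shown in the text are produced;
    rebuilding the kernel and rebooting remain manual steps.
    """
    if not 0 <= ase_id <= 63:  # ASE_ID range given earlier in this chapter
        raise ValueError(f"ASE_ID {ase_id} is outside 0-63")
    commands = ["rcmgr set ASE on", f"rcmgr set ASE_ID {ase_id}"]
    if run_logger:
        commands.append("rcmgr set ASELOGGER 1")
    return commands
```

For example, `enable_ase_commands(3, run_logger=True)` yields the three lines to enter as root on the member before you run doconfig.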
To delete member systems from the available server environment (ASE), choose the "Delete a member" item from the Managing the ASE menu. You then specify the number associated with the member system you want to delete. If a member system is running a service, ASE relocates the service to another member system.
You cannot delete the member system on which you are running the asemgr utility. To delete the last member system in an ASE, you must delete the TruCluster software subsets from that system.
To display the status of the member systems in an available server environment (ASE), choose the "Display the status of the members" item from the Managing the ASE menu. The status of each member system and the agent daemon running on the system are displayed. See Section 1.2 for a description of the ASE daemons.
Example 2-2 shows an example of member system status in the ASE.
Member Status
Member:    Host Status:    Agent Status:
tototc     UP              RUNNING
daffytc    UP              RUNNING
The director daemon obtains system status from the host status monitor (HSM) daemons running on all the member systems. The following table describes the information for the Host Status field:
| Host Status | Description |
| UP | The member system is up and can be accessed by the member system that is running the ASE director daemon using the cluster interconnect. The member system can be queried over the cluster interconnect, and can add, delete, start, and stop services. |
| DOWN | The member system cannot be accessed by the member system that is running the director daemon using any network or shared SCSI bus. The member system does not answer queries over the cluster interconnect or SCSI bus, and cannot start or stop services. |
| DISCONNECTED | The member system is disconnected from all monitored networks. Any services running on the member system are stopped, and no services can be added, deleted, or started on the member system. |
| NETPAR | There is a network partition between the member system and the member system running the director daemon, although the member systems can communicate using SCSI bus queries. Services that are currently running on the member system remain running, but the member system cannot start or stop any service until it leaves this state. |
The director daemon determines the status of the agent daemons running on the member systems. The following table describes the information in the Agent Status field:
| Agent Status | Description |
| RUNNING | The ASE agent daemon is running on the member system. |
| DOWN | The ASE agent daemon is not running on the member system. |
| INITIALIZING | The ASE agent daemon that is running on the member system is in its initialization phase and will be running soon. |
| UNKNOWN | The ASE director daemon cannot determine the state of the agent daemon on the member system. |
| INVALID | The ASE director daemon reports an invalid state for the agent daemon on the member system. |
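If you capture the member status display as text, the two tables above suggest a simple health check: any member whose host status is not UP, or whose agent status is not RUNNING, deserves a closer look. The following is a minimal sketch; it assumes whitespace-separated columns as in Example 2-2, and the function names are hypothetical.

```python
def parse_member_status(display_text):
    """Parse 'member host-status agent-status' rows into tuples.

    Assumption: three whitespace-separated columns per member row.
    Title and header lines (fewer fields, or containing a colon) are skipped.
    """
    rows = []
    for line in display_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and ":" not in line:
            rows.append(tuple(fields))
    return rows


def members_needing_attention(display_text):
    """Return members whose host is not UP or whose agent is not RUNNING."""
    return [
        member
        for member, host, agent in parse_member_status(display_text)
        if host != "UP" or agent != "RUNNING"
    ]
```

Treating every state other than UP and RUNNING as needing attention is a deliberately conservative choice; an agent in the INITIALIZING state, for instance, may simply need a little more time.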
If an available server environment (ASE) does not work correctly, make sure that you have adhered to the requirements in this manual and in the TruCluster Software Products Hardware Configuration manual. You should also read the TruCluster Software Products Release Notes. Running the clu_ivp utility may reveal the cause of errors. If you cannot fix the problem, you can initialize one or all of the member systems in an ASE.
Initializing a system stops any running ASE daemons and removes any member system and service information from the ASE database on the system. After you initialize a system, it can be added to an existing ASE or used in a new ASE.
You may want to initialize a system if you cannot add it to an ASE, or if the ASE database is corrupted on the member system. However, you may need to initialize all the ASE member systems to solve the problem.
The following sections describe how to initialize one or all of the ASE systems.
To initialize one system, follow these steps:
If the system is already a member system, use the asemgr utility to delete the member system from the ASE. If you cannot delete the member system, you cannot initialize this member alone.
If the system is not an ASE member system, delete the /usr/var/ase/config/asecdb ASE database file, if it exists, from the system.
Invoke the /usr/sbin/asesetup command on the system.
Run the asemgr utility on an existing member system and add the initialized system to the ASE.
Initializing all the member systems returns the ASE to a state that includes no member systems or services. After you do this, you must add the member systems and set up your services again.
To initialize all the member systems in an ASE, follow these steps:
If possible, use the asemgr utility to display the status of the member systems, network, and services in the ASE. This information will help you to re-create your ASE.
If possible, use the asemgr utility to delete all the services from the ASE. This allows you to save any Logical Storage Manager (LSM) or Advanced File System (AdvFS) disk configurations on a specific system.
Delete the /usr/var/ase/config/asecdb ASE database file from all the systems.
Invoke the /usr/sbin/asesetup command on each system.
Run the asemgr utility on a system, add the other initialized systems to the ASE, one at a time, and set up your services.
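The two non-interactive steps above (deleting the database file and running asesetup on every system) can be written out per member as a checklist. The following is a minimal sketch that only builds the command strings; the function name is hypothetical, and the use of `rm -f` to delete the file is an assumption, since the text does not name a command for that step.

```python
ASE_DATABASE = "/usr/var/ase/config/asecdb"  # path given in the text
ASE_SETUP = "/usr/sbin/asesetup"             # command given in the text


def initialization_commands(members):
    """Return, per member, the commands for the delete-and-setup steps.

    The asemgr steps (recording status, deleting services, re-adding
    members) are interactive and are not represented here.
    """
    return {member: [f"rm -f {ASE_DATABASE}", ASE_SETUP] for member in members}
```

Printing the result gives one short command list per system, which helps confirm that no member was skipped before you re-create the ASE with asemgr.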
To change your available server environment (ASE) hardware configuration or perform maintenance, you may have to stop all activity in the ASE.
To stop all ASE activity, follow these steps:
Use the asemgr utility to place each ASE service off line, stopping the services.
Invoke the /sbin/init.d/asemember stop command on all the member systems.
After you stop ASE activity, you can perform the desired maintenance.
To restart ASE activity, follow these steps:
Invoke the /sbin/init.d/asemember start command on all the member systems.
Use the asemgr utility to place the ASE services on line.
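The ordering in the two procedures above is symmetric: services go off line before the daemons stop, and the daemons start before services go back on line. The following is a minimal sketch of that ordering as data; the function names are hypothetical, and the asemgr steps are recorded as descriptions because asemgr is used interactively.

```python
ASEMEMBER = "/sbin/init.d/asemember"  # script path given in the text


def stop_sequence(members):
    """Ordered steps to stop ASE activity: services first, then daemons."""
    steps = [("asemgr", "place each ASE service off line")]
    steps += [(member, f"{ASEMEMBER} stop") for member in members]
    return steps


def restart_sequence(members):
    """Ordered steps to restart ASE activity: daemons first, then services."""
    steps = [(member, f"{ASEMEMBER} start") for member in members]
    steps.append(("asemgr", "place the ASE services on line"))
    return steps
```

Each tuple pairs where the step is performed (a member name, or asemgr) with what to do there, so the list can be printed as a maintenance checklist.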
Because each cluster member must maintain a kernel state regarding the clusterwide activities of the connection manager and distributed lock manager (DLM), you cannot shut a cluster member down to single-user mode and then bring it back up to multiuser mode. A full halt or complete reboot is required.
All normal methods of shutting down and rebooting a single system work for a cluster member. That is, the shutdown -h and shutdown -r commands (and the halt and reboot console operations) work normally for systems running TruCluster Production Server Software, with one exception.
On multiprocessing systems that are cluster members, pressing the halt button (or typing Ctrl/P at an AlphaServer 8200/8400 system console) does not cause a full halt of the member. To bring a multiprocessing system in a cluster to a full halt, enter one of the following console commands immediately after pressing the halt button:
Initialize the console by using the console's init command. This stops all CPUs and resets all buses and is the quickest and surest way to bring the system to a full halt.
Halt each CPU by using the console's halt command. If you wish to halt all the CPUs in order to examine hardware registers or memory locations, type halt 1, halt 2, ..., one command per CPU. This prevents corruption of system data and guarantees that the MEMORY CHANNEL hardware will time out so that other cluster members will realize the member is down.
To force a crash dump, use the appropriate console command. (This will safely halt all CPUs and generate a crash dump at the next boot.)
Reboot the system by using the console's boot command.