[Contents] [Prev. Chapter] [Next Section] [Next Chapter] [Index] [Help]

1    Understanding Available Server Environments and Clusters

The TruCluster software products suite consists of three separately licensed products:

TruCluster MEMORY CHANNEL Software supplies an application programming interface (API) library that lets applications perform high-speed data transfers between systems connected to the MEMORY CHANNEL interconnect. (This API library is also included in the Production Server Software.) Unlike TruCluster Available Server Software and TruCluster Production Server Software, TruCluster MEMORY CHANNEL Software provides neither shared storage nor application failover capabilities. Consequently, managing a MEMORY CHANNEL Software configuration is largely a matter of setting up the appropriate hardware, installing the software, and understanding the MEMORY CHANNEL API library. These tasks are described in the Hardware Configuration, Software Installation, and MEMORY CHANNEL Application Programming Interfaces manuals. The remainder of this manual therefore focuses exclusively on managing Available Server and Production Server configurations.

This chapter provides an overview of available server environments (ASEs), describes the additional components of a Production Server cluster, and explains how to use the asemgr utility.



1.1    Using Storage Availability Domains in an Available Server Configuration or a Production Server Cluster

TruCluster Available Server Software and TruCluster Production Server Software let you configure a highly integrated organization of member systems, services, and storage devices. From a client's perspective, this configuration appears to be a powerful single-server system, providing greater application availability than is possible with a single system, and scalability beyond the limits of a single symmetric multiprocessing system.

A key component of the TruCluster Available Server Software and TruCluster Production Server Software is the storage availability domain. A storage availability domain is a collection of nodes that can access commonly shared storage devices in an available server environment (ASE). These nodes are considered to be ASE members.

Because all members in a given ASE can access the same shared storage, an application that requires that storage can run on any member. Both Production Server Software and Available Server Software let you configure such an application so that it runs on a single ASE member and, upon a failure of that member, restarts on another. This application could be a service that exports Network File System (NFS) file systems to clients, a disk-based application like a database engine or mail service, a tape-based service, or a nondisk-based application, such as a remote login service.
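The placement behavior described above can be sketched conceptually. The following is a hypothetical illustration, not the actual ASE implementation (which runs as kernel and daemon code); the service and member names are invented for the example:

```python
# Conceptual sketch of single-active-member failover (hypothetical;
# the real ASE software implements this with daemons and a kernel
# driver that monitor members over the network and shared SCSI bus).

def place_service(service, members, preferred):
    """Pick a member to run the service: the preferred member if it
    is up, otherwise the first surviving member that shares access
    to the service's storage."""
    up = [m for m in members if m["up"]]
    if not up:
        raise RuntimeError("no surviving member can run %s" % service)
    for m in up:
        if m["name"] == preferred:
            return m["name"]
    return up[0]["name"]

members = [{"name": "alpha1", "up": True}, {"name": "alpha2", "up": True}]
print(place_service("nfs_service", members, "alpha1"))  # alpha1

members[0]["up"] = False                                # alpha1 fails
print(place_service("nfs_service", members, "alpha1"))  # alpha2
```

The key point the sketch captures is that any member with access to the shared storage is an eligible restart target, so the service survives the loss of the member originally running it.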

The most significant difference between Production Server Software and Available Server Software is that Production Server Software lets you develop and deploy an application whose components run concurrently, with equal access to raw disk data, on any node in the Production Server configuration. A Production Server cluster provides an ideal environment for applications that require high availability and performance, such as highly parallelized databases and transaction processing systems. The means by which raw disk data is provided to the components of applications distributed throughout the cluster involves a special type of ASE service (provided only with Production Server Software) known as distributed raw disk (DRD). Use of a distributed lock manager (DLM) ensures synchronized access to the data provided clusterwide by DRD services.

Because a Production Server cluster and an Available Server configuration both employ ASE technology, the administrator of either fundamentally manages ASE membership, ASE services, and service storage. However, the distributed nature of services within a Production Server cluster makes configuring and managing ASEs within the cluster somewhat different from managing the single ASE in an Available Server configuration. This manual makes these distinctions where appropriate. To begin, keep the following configuration rules in mind when dealing with a TruCluster configuration:



1.2    Components of an Available Server Environment

An available server environment (ASE) is a multinode configuration in which member systems and highly available storage are connected to shared SCSI buses. Software running on each ASE member monitors the health of ASE member systems and shared storage. In case of a failure, the ASE software causes services to fail over to surviving systems in the ASE that share access to the associated storage. Scripts associated with each service control failover.

An Available Server configuration contains a single ASE. A Production Server cluster can contain one or more nonoverlapping ASEs. A given cluster member can be a member of at most one ASE. However, a cluster member does not have to be a member of an ASE.

ASE members run the ASE daemons and driver, which monitor the network interconnects and the status of the systems, disks, and shared SCSI buses in the ASE. The ASE daemons and driver are as follows:

The following sections describe the ASE daemons and the AM driver.



1.2.1    The ASE Director Daemon

The ASE director daemon (asedirector) controls an entire ASE. It coordinates most of the activities that occur during ASE setup and operation and has a global view of the ASE. The ASE director daemon maintains information about ASE members and services, including which member system is running which service. It decides what actions to take when a change in the environment occurs and coordinates these actions in the ASE.

The ASE director daemon runs on only one member system in the ASE. If an ASE director daemon is not running on one of the members, the agent daemons on the members choose an ASE member to run the daemon.

The ASE director daemon ensures that all the services are always configured on all the member systems, using the ASE agent daemon running on each member to implement its decisions. It also maintains such information as the current state of services and member systems.

For example, I/O events, such as a device going off line or a disk reservation failure, are detected by the availability manager (AM) driver and reported to the director daemon by the agent daemon. Member and network events, such as a member system going down or a network partition, are detected by the host status monitor (HSM) daemon and then reported to the director daemon.

In addition, the ASE director daemon handles all requests from the asemgr utility, such as configuring a service or displaying status.



1.2.2    The ASE Agent Daemon

An ASE agent daemon (aseagent) controls ASE operations on each member of an ASE and has a local view of the ASE. An ASE agent daemon synchronizes access to shared resources, using the AM driver interfaces to reserve disks and to receive notification of lost reservations and device connectivity losses.

Each ASE agent daemon reports local events (such as disk failures) to the ASE director daemon and also performs local ASE management tasks as requested by the director daemon. An ASE agent daemon invokes the commands to configure, start, and stop a service at the request of the director daemon.

An ASE agent daemon runs on each member of an ASE. On each member, the ASE agent daemon initializes the ASE, starts the HSM daemon, and starts the director daemon if necessary. For example, if the ASE director daemon terminates unexpectedly, the ASE agent daemons on the ASE members choose a member on which to run the ASE director daemon, and the ASE agent daemon on that member system starts the ASE director daemon.
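The manual does not specify how the agent daemons choose the member that runs the director daemon. The sketch below uses lowest member name purely as a stand-in policy; both the policy and the names are assumptions for illustration:

```python
# Hypothetical election of a member to restart the ASE director
# daemon (asedirector).  The actual selection policy used by the
# aseagent daemons is not documented here; lowest member name is
# shown only to make the idea concrete.

def choose_director(agents_up):
    """Return the member that should start asedirector, given the
    members whose agent daemons are running; None if there are none."""
    return min(agents_up) if agents_up else None

print(choose_director(["alpha2", "alpha1", "alpha3"]))  # alpha1
print(choose_director(["alpha3"]))                      # alpha3
```

Whatever the real policy, the important property is that the surviving agents converge on exactly one member, and the agent on that member starts the director.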



1.2.3    The Host Status Monitor Daemon

A host status monitor (HSM) daemon (asehsm) runs on each member in an ASE and monitors member system status. It detects any breaks (partitions) in the network connections between member systems. The HSM daemon uses the availability manager (AM) driver to query systems over the SCSI bus. It uses network interfaces to query systems over the network.

In addition to providing the interface that can query hosts, the AM driver provides the HSM daemon running on a member system with the ability to transfer data when the network is not working.

The HSM daemon is started by the ASE agent daemon and reports to both the ASE director daemon (if it is running locally) and the ASE agent daemon. For example, if a member system goes down, the AM driver notifies the HSM daemon that the SCSI member system query has timed out or that it has noticed a break in the network connection.



1.2.4    The Availability Manager Driver

The availability manager (AM) driver is a kernel-level device driver that provides device reservations (locking), monitors remote hosts on the SCSI bus, and provides error and event notifications. Changes in the hardware run-time status are detected by the AM driver and reported to the host status monitor (HSM) daemon and the ASE agent daemon running on the member system.

The AM driver interfaces reserve disks and ensure that only one ASE member has access to a shared device at one time. They allow the agent daemon to query devices and the HSM daemon to query members.
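The exclusive-reservation rule can be modeled in a few lines. This is a conceptual sketch only; the AM driver enforces reservations in the kernel using the SCSI bus, and the class and member names below are invented:

```python
# Conceptual model of device reservation: at most one ASE member may
# hold a reservation on a shared device at a time (hypothetical code;
# the real mechanism is kernel-level SCSI device reservation).

class Device:
    def __init__(self, name):
        self.name = name
        self.holder = None   # member currently holding the reservation

    def reserve(self, member):
        """Grant the reservation only if the device is free or already
        held by the requesting member."""
        if self.holder not in (None, member):
            return False     # another member holds the reservation
        self.holder = member
        return True

    def release(self, member):
        if self.holder == member:
            self.holder = None

rz8 = Device("rz8")
print(rz8.reserve("alpha1"))  # True
print(rz8.reserve("alpha2"))  # False: alpha1 holds the reservation
rz8.release("alpha1")
print(rz8.reserve("alpha2"))  # True
```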

If an I/O bus partition occurs (for example, the SCSI bus cable is disconnected from the member system), the AM driver notifies the HSM daemon that the system query failed. If a device is powered off, the AM driver notifies the ASE agent daemon that a device path failure has occurred, or that an I/O bus partition has occurred such that a system no longer has connectivity to a device.



1.2.5    The Logger Daemon

The logger daemon (aselogger) tracks all the ASE messages that are generated by all the members of an ASE. When you install the TruCluster software on a system, you are prompted to determine if you want a logger daemon running on the system. A logger daemon can be run on more than one member system in an ASE.

The logger daemon uses the DIGITAL UNIX event logging facility, syslog, which collects messages that are logged by the various kernel, command, utility, and application programs. Messages are logged to a local file or forwarded to a remote system, as specified in the local system's /etc/syslog.conf file.
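For example, a local /etc/syslog.conf might contain entries like the following. These entries are illustrative assumptions only; the facility selectors, file paths, and remote host name must be adapted to your site, and the fields must be separated by tabs:

```
# Illustrative /etc/syslog.conf entries (fields are tab-separated).
# Log daemon-facility messages to a local file and also forward
# them to a remote logging host named "loghost".
daemon.debug	/var/adm/syslog.dated/daemon.log
daemon.debug	@loghost
```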

The logger daemon collects messages generated by the asemgr utility, the ASE director daemon, the ASE agent daemon, and the logger daemon. Messages generated by the host status monitor (HSM) daemon and the availability manager (AM) driver are logged only to the local system. If all the logger daemons in the ASE stop, daemon messages continue to be logged, but only locally.

See the DIGITAL UNIX System Administration manual, syslog(3), and syslogd(8) for information on system event logging. See Appendix A for a description of some ASE error messages.



1.3    Additional Components of a Production Server Cluster

Although the ASE components discussed in Section 1.2 are fundamental to a Production Server cluster's ability to allow database system elements to fail over from member to member without disrupting access to data, there are several other technologies used in the cluster that are critical to the operation of highly available, large database systems:

Figure 1-1 shows the relationship of these components. The remainder of this chapter provides additional details on the operation of these components.

Figure 1-1:  Overview of Production Server Software Subsystems



1.3.1    Distributed Raw Disk

Distributed raw disk (DRD) services allow a disk-based, user-level application to run within a cluster, regardless of where in the cluster the physical storage on which it depends is located. A DRD service allows an application, such as a distributed database system or transaction processing (TP) monitor, parallel access to storage media from multiple cluster members. Applications that perform I/O involving sets of large data files, random access to records within these files, and concurrent read/write data sharing can benefit from using the features of DRD. As deployed within an ASE, a DRD service can survive failures of both the server system and any mirrored disk participating in the service.
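Conceptually, a DRD service hides where the physical storage lives: a request from any member is satisfied locally if that member serves the device, or shipped to the serving member otherwise. The sketch below models only this routing idea; the real DRD is a kernel subsystem, and the service and member names are invented:

```python
# Conceptual sketch of distributed raw disk (DRD) request routing
# (hypothetical model; not the actual kernel implementation).

# Hypothetical map of DRD service name -> member currently serving it.
SERVICE_MAP = {"drd1": "alpha1", "drd2": "alpha2"}

def route_read(service, block, local_member):
    """Describe how a raw read would be satisfied: locally if this
    member serves the device, otherwise by shipping the request to
    the serving member over the cluster interconnect."""
    server = SERVICE_MAP[service]
    if server == local_member:
        return "local read of block %d" % block
    return "remote read of block %d via %s" % (block, server)

print(route_read("drd1", 42, "alpha1"))  # local read of block 42
print(route_read("drd2", 42, "alpha1"))  # remote read of block 42 via alpha2
```

Because the application sees the same raw device name on every member, the ASE can fail the service over (changing the serving member) without the application changing how it addresses its storage.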

The DRD subsystem, shown in Figure 1-2, consists of four primary components:

The DRD subsystem, in conjunction with ASE services, is designed to provide applications with uninterrupted access to storage devices. Depending upon the hardware configuration of the cluster, DRD can withstand member failures, controller failures, and disk failures.

Figure 1-2:  Distributed Raw Disk



1.3.2    Distributed Lock Manager

The distributed lock manager (DLM), shown in Figure 1-3, synchronizes access to the resources that are shared among cooperating processes throughout the cluster. For example, a distributed database application uses lock manager services to coordinate access to the shared disks participating in the database.

Figure 1-3:  Distributed Lock Manager

An application secures a lock on a named shared resource. Resource names can be single-dimensional or tree-structured. A resource tree allows you to create a hierarchy of locks and sublocks that reflect the structure of a shared resource. The DLM:

The DLM distributes lock management across the cluster; it does not replicate lock information on each cluster member. Rather, the cluster member that masters a lock tree maintains all information about that tree, and a member that holds a given lock is aware only of its own contribution to that resource. Any member system can serve as the master for any lock tree, which distributes the overall lock management load.

The DLM uses a distributed directory service to quickly locate the directory node for a resource tree. A directory table associates a root resource name with the cluster member that is the manager of the resource. This directory table is identical on all cluster members.
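The directory-table idea can be sketched as a deterministic mapping from a root resource name to a member. This is a hypothetical illustration of the lookup property, not the DLM's actual data structure or hash function:

```python
# Sketch of a DLM-style directory lookup (hypothetical).  Because the
# directory table is identical on all cluster members, any member can
# find the directory node for a resource tree with one local lookup.

MEMBERS = ["alpha1", "alpha2", "alpha3"]  # example cluster members

def directory_member(root_resource):
    """Map a root resource name to the member that can locate the
    master of its lock tree; the mapping is the same on every member."""
    return MEMBERS[hash(root_resource) % len(MEMBERS)]

# Every member computing the lookup gets the same answer, so no
# cluster-wide search or message exchange is needed to find the tree:
assert directory_member("payroll_db") == directory_member("payroll_db")
print(directory_member("payroll_db") in MEMBERS)  # True
```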

The DLM is designed to handle member failures. If a lock holder fails, its locks are released. If a member system fails, a new lock master for locks previously mastered on that member is chosen and provided with all pertinent lock information.

The DLM also maintains a communications service that the connection manager uses to establish a communications channel between member systems.



1.3.3    Connection Manager

Systems in a Production Server cluster configuration share data and system resources, such as access to data and files. To achieve the coordination required to maintain data integrity, the systems must maintain a clear sense of cluster membership. The connection manager ensures that the clustered systems communicate with one another, and it enforces the rules of cluster membership.

The connection manager is a set of daemons that creates a cluster when the first member is booted, and reconfigures the cluster when other systems join or leave it. The overall responsibilities of the connection manager are to:

Figure 1-4 shows the components of the connection manager.

Figure 1-4:  Connection Manager

The connection manager consists of a kernel component that maintains the configuration information and, as shown in Figure 1-4, the following daemons that control and distribute configuration information:

The TruCluster software installation procedure adds or modifies system startup scripts to automatically start these daemons each time the system boots.



1.3.4    MEMORY CHANNEL

In a Production Server configuration, all cluster members must have a direct connection to all other members to facilitate communications among members and provide a fast and reliable transport for passing messages throughout the cluster. This version of the TruCluster software product supports the MEMORY CHANNEL interconnect, a specialized interconnect designed specifically for the needs of clusters.

The MEMORY CHANNEL interconnect is based on the peripheral component interconnect (PCI), which cluster members use to communicate among themselves on a private subnet. (See the TruCluster Software Products Hardware Configuration and Software Installation manuals for instructions on how to set up the MEMORY CHANNEL subnet.) Each cluster system has a MEMORY CHANNEL interface card that connects to a MEMORY CHANNEL hub. The MEMORY CHANNEL hub provides both broadcast and point-to-point connections between cluster members. In most two-member cluster configurations, a physical MEMORY CHANNEL hub is not used; instead, the members use the virtual hub mode of the MEMORY CHANNEL interface card.

The Production Server configuration fails over from one MEMORY CHANNEL interconnect to another if a configured and available secondary MEMORY CHANNEL interconnect exists on all member systems, and one of the following situations occurs in the primary interconnect:

After the failover completes, the secondary MEMORY CHANNEL interconnect becomes the primary interconnect. Another interconnect failover cannot occur until you fix the problem with the interconnect that was originally the primary.

If more than ten MEMORY CHANNEL errors occur on any member system within a one-minute interval, the MEMORY CHANNEL error recovery code attempts to determine if a secondary MEMORY CHANNEL interconnect has been configured on the member as follows:

The MEMORY CHANNEL interconnect:

Figure 1-5 shows the general flow of a MEMORY CHANNEL transfer.

Figure 1-5:  MEMORY CHANNEL Transfer

You need at least one MEMORY CHANNEL adapter installed in a PCI slot in each member system and a link cable to connect the adapters. If you have more than two members in your cluster, link cables are used to connect the MEMORY CHANNEL adapters to a MEMORY CHANNEL hub.

A redundant MEMORY CHANNEL configuration can further improve reliability and availability. In this case, you need a second MEMORY CHANNEL hub, a second MEMORY CHANNEL adapter in each cluster member, and link cables to connect the second MEMORY CHANNEL adapters to the second hub.

See the TruCluster Software Products Hardware Configuration manual for information on how to configure the MEMORY CHANNEL interconnect in a cluster.



1.4    Using the asemgr Utility

The asemgr utility allows you to administer the available server environment (ASE) and configure and manage services. The asemgr utility has an interactive mode and a command-line interface. If you enter the asemgr command with no options, the utility displays menus and task items and prompts you for information about the task you want to perform.

You can use the command-line interface for the asemgr utility if you want to include the asemgr command in shell scripts. The syntax for the command is as follows:

/usr/sbin/asemgr [options]

The options are as follows:

-d [-h member]|[-v service]|[-l]

Displays the status of all the member systems (-h) and services (-v) or specific member systems and services. Also displays the member systems that are running the logger daemon (-l).

-d [-C [database]]|[-c service]

Displays the contents of the current or specified ASE database (-C [database]) or the contents of the specified service (-c service).

-m service member

Relocates the specified service to the specified member system. When you relocate a service, you stop the service on the member system currently running the service and start the service on another member system.

-r service

Restarts a service.

-s service [member]

Starts the specified service and places it on line, making it available to clients. When the member parameter is specified, the service is started on that member, regardless of the service's current automatic service placement (ASP) policy.

-x service

Stops the specified service and places it off line, making it unavailable to clients.

Some ASE administrative tasks can lock the ASE. If you try to run the asemgr utility and the ASE is locked, the following message is displayed:

ASE is locked by `hostname`

This message indicates that the task cannot be performed because another member system is running the asemgr utility.
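When driving asemgr from a script, it can help to build the argument vectors in one place. The following sketch uses only the options documented above; the service and member names are examples, and running the commands is shown only as a comment because it requires a live ASE member:

```python
# Building asemgr command lines for use from a management script.
# The options are those documented in this section; "nfs_service"
# and "alpha2" are example names.

ASEMGR = "/usr/sbin/asemgr"

def asemgr_cmd(*options):
    """Return the argument vector for an asemgr invocation."""
    return [ASEMGR] + list(options)

relocate = asemgr_cmd("-m", "nfs_service", "alpha2")  # relocate a service
status = asemgr_cmd("-d", "-h")                       # member status

print(" ".join(relocate))  # /usr/sbin/asemgr -m nfs_service alpha2
print(" ".join(status))    # /usr/sbin/asemgr -d -h

# On a live ASE member you could then run, for example:
#   import subprocess
#   subprocess.run(relocate, check=True)
```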

