
Exploring High Availability Issues with BEA Tuxedo and Third Party High Availability Software
Improving Tuxedo by Combining it with a High Availability Package
The purpose of this paper is to discuss the shortcomings of Tuxedo in the area of High Availability, and how it can be improved when combined with a High Availability package. This paper briefly touches on the HACMP package from IBM, which is specific to AIX. Similar High Availability packages from other vendors should work just as well.
Short Description of Tuxedo
BEA Tuxedo is a software bundle that allows integrators and developers to rapidly develop robust client/server applications.
Tuxedo provides application programmers an interface called ATMI that implements most of the code necessary for clients and servers to communicate. Application programmers divide the business logic amongst Tuxedo controlled programs or processes called servers. Other programs, called clients, can independently call functions within these servers that implement specific tasks. These tasks are called services.
An instance of a Tuxedo system is often called an application. Tuxedo provides a runtime graphical administration tool to configure and monitor applications. Tuxedo applications can span multiple machines, even those with very different internal architecture. Logical applications or domains can be configured to facilitate inter-application communications.
High Availability Cluster Multi-Processing (HACMP) for AIX is a high availability application that can link up over a dozen RS/6000 nodes into highly available clusters.
Clustering servers or nodes enables parallel access to their resources, which can eliminate down time in a fault situation. Clustering is one step in providing the redundancy and fault resiliency required for business critical applications. Another step is closely monitoring the system for all types of failures, and executing takeover scripts. HACMP attempts to remove single points of failure and to address all fault situations. The HACMP package includes user-friendly GUI tools to help install, configure, and manage the clusters.
Opposites Attract
Tuxedo and HACMP make a good team because they complement each other. Even though both products are software, each emphasizes a different aspect of high availability (HA). Tuxedo monitors server processes as well as network accessibility.
Tuxedo restarts dead server processes and partitions unavailable nodes from a multi-node application. When multiple networks are defined for a node, Tuxedo will attempt communication through each network before labeling the node as partitioned and unreachable. These features are administered through the Tuxedo configuration file or the GUI administrative interface. The administrator does not require root privileges to configure these features.
HACMP, on the other hand, monitors system resources at a much lower level. HACMP monitors clusters for system failures. When a node fails within a cluster, an alternate node can take over its role. The failed node's network address can be made available on the alternate node. If disk resources were not shared between the failed and the alternate node, they can be logically moved to the alternate node. The takeover scripts that are executed when a monitored event occurs often require root privileges. In short, HACMP works at the level of machines, network addresses, and disks, while Tuxedo works at the level of application processes.
There are several ways to configure Tuxedo. The most basic is through the ASCII text file, commonly referred to as the UBBCONFIG file.
The UBBCONFIG contains several sections. The complexity of the file depends on the type of application being configured (single processor or multiple processor). The text file is compiled into a binary format for syntax checking and to allow for runtime administration. This method of configuration requires an inactive application. If inter-application communication will be performed, a domains configuration file, commonly referred to as the DMCONFIG, also needs to be configured. This text file is also compiled into a binary format.
A second method of configuring Tuxedo is to use the non-graphical interface tool called tmconfig. A third method is to use the web graphical user administrative interface. Last, the MIB (Management Information Base) interface can be used to dynamically reconfigure Tuxedo. Unlike the other methods, the MIB requires an active Tuxedo system to reconfigure the application.
The most difficult part of configuring Tuxedo in an HACMP environment is deciding to what extent Tuxedo will defer to HACMP, and vice versa. Some of the features that Tuxedo and HACMP provide complement each other, while others hamper their collaboration. A balance should be struck so that neither system intrudes too deeply into the other's territory and specialty. A successful implementation is a configuration that consistently fails over and fails back with little or no human intervention.
The BEA Systems, Inc. web site at www.beasys.com contains a wealth of information on the BEA Tuxedo middleware.
The IBM web site at www.austin.ibm.com/software/Apps/hacmp contains some information on their HACMP software.
Tuxedo, mature as it is, does not provide a subsystem that deals exclusively with high availability. Over the years many enhancements have been made, but none of them fully addresses all of the issues. Barring operating system and hardware failures, Tuxedo does have a robust architecture that keeps its processes stable and highly available.
However, it falls short of providing a coordinated way to recover from hard failures with little or no human intervention.
Tuxedo does have a number of features that attempt to address some of the HA issues. In a multi-processor (MP) application, an alternate machine can be specified as the Backup-Master. If the Master becomes unavailable, the Backup-Master can take the role of the Master, which is to monitor critical Tuxedo system resources (e.g., the Bulletin Board Liaison).
Tuxedo also allows for specification of a primary and an alternate machine location for its servers. During the boot procedure, if a server group fails to boot due to a system resource (e.g., a non-existent queue device), Tuxedo dynamically reconfigures the server's group to boot on the alternate machine. Once a server's group has been started, Tuxedo does not perform any automatic reconfiguration. Any migration between the primary and the alternate location needs to be done explicitly by the administrator. If a node or a machine becomes unreachable, it is simply partitioned from the application. Partitioned means that the node is no longer part of the application and that all of the services offered by that node will no longer be handled. If the partition was caused by a transient network outage, Tuxedo may be able to reconnect when it issues its periodic reconnect.
Tuxedo provides a feature that can automatically spawn and decay server processes based on the number of outstanding messages or their total load. This relieves the administrator of having to constantly monitor the application just to start and stop Tuxedo servers.
As one can see, Tuxedo has many features that address high availability. Although not discussed here, Tuxedo also supplies an SNMP agent for controlling software that uses that protocol.
The Tuxedo 6.4 release contains many enhancements; the following are the ones that affect high availability.
Improved performance of Domains gateway processes.
Better performance between domains allows more flexible applications to be designed. Related subsystems can be placed in the same domain, and unrelated subsystems can be placed in different domains. If they need to communicate, they can do so efficiently. A failure in one domain does not necessarily affect an unrelated domain. Thus, the ability to modularize applications while maintaining performance greatly improves high availability.
Improved bridge processing using multiple network addresses.
In a multiple processor (MP) application, multiple network addresses can be specified for a node. Once a Tuxedo application has booted all configured nodes, the only available communication between the nodes is through the bridge processes. In pre-6.4 releases, the bridge process running on each node handled only one network address. In the 6.4 release, a bridge can utilize multiple network addresses. In addition, the addresses are utilized in order of priority. That is, if multiple addresses are specified for a node, the highest available priority address will be used. If a lower priority address is being used and a higher priority becomes available, the higher priority address will now be used. If the current priority has more than one address, concurrent addresses will be used whenever the load exceeds a pre-configured value.
The ability to specify multiple network addresses lessens the likelihood that an application will become partitioned due to a network outage. It also alleviates the possible bridge bottleneck that can occur between nodes that communicate intensively.
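The priority rule described above can be illustrated with a short Python sketch. It is only a model of the selection logic, not Tuxedo code: the Address type, the select_addresses function, the host names, and the convention that a larger number means a higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    naddr: str     # network address string (hypothetical value)
    priority: int  # assumption: a larger number means a higher priority
    up: bool       # whether the address is currently reachable


def select_addresses(addresses):
    """Return every reachable address that shares the highest reachable priority."""
    reachable = [a for a in addresses if a.up]
    if not reachable:
        return []  # nothing left to try: the node would be partitioned
    best = max(a.priority for a in reachable)
    return [a for a in reachable if a.priority == best]


links = [
    Address("//nodeb-fast:5723", 200, True),
    Address("//nodeb-slow:5724", 100, True),
]
print([a.naddr for a in select_addresses(links)])
```

Marking the first entry as down and calling select_addresses again returns the lower priority address; marking it as up again returns the higher priority one, which mirrors the switch-back behavior described in the text.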
Improved system availability through a configurable mechanism for detecting lost connections between /Workstation clients and servers.
In pre-6.4 releases, the loss of a network connection sometimes caused the client application to hang indefinitely. In the 6.4 release, you can configure the amount of time a client will wait for a response before timing out. When the client times out, additional information is provided so that the appropriate action can be taken.
Additionally, a keep-alive option has been added to the Workstation extension. This feature improves detection of network failures between clients and servers. This option is only available on platforms on which Tuxedo uses sockets for TCP/IP.
Improved handling of runaway processes and reporting of service time out errors.
The Tuxedo administrator can now set a maximum time for a service to process a request. This time should be set to two to three times the amount of time a normal request takes to process. Any service that exceeds its maximum processing time will be deemed a runaway process and will be terminated by Tuxedo.
Application programmers can now get additional information when service errors occur so that they can programmatically take action. Because several types of errors can cause a service timeout, the improved error reporting will greatly simplify error handling.
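The sizing guideline for the maximum service time can be turned into a small calculation. The Python sketch below is an illustration only: the function name is hypothetical, the median is an assumed stand-in for what a "normal" request takes, and the result is just a number that the administrator would still have to enter into the Tuxedo configuration.

```python
import math
import statistics


def suggest_service_timeout(samples_seconds, factor=3.0):
    """Suggest a maximum service time, in whole seconds, from observed request times."""
    if not samples_seconds:
        raise ValueError("at least one timing sample is required")
    if not 2.0 <= factor <= 3.0:
        raise ValueError("factor should stay within the 2x to 3x guideline")
    # Assumption: the median represents a normal request and ignores rare outliers.
    normal = statistics.median(samples_seconds)
    return math.ceil(normal * factor)


# Hypothetical timings; the last sample is an outlier that the median ignores.
print(suggest_service_timeout([1.8, 2.0, 2.1, 2.4, 9.5]))
```

Using the median rather than the mean keeps one slow request from inflating the limit, which matters because an inflated limit lets a genuine runaway process survive longer.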
All of these improvements help provide a more stable application environment, which contributes directly to high availability.
Usage of Multiple Bridges
There are often unavoidable situations in which a Tuxedo application must span more than one physical machine. Additionally, logical units of work may require services from multiple nodes.
The physical separation may be due to reasons such as design requirements, integration issues, or hardware limitations. Whatever the reason, a bridge's queue can quickly become saturated if servers are queuing messages faster than it can route messages to the remote node. A solution to this problem could be either allowing the specification of multiple bridge processes per node or allowing a single bridge process to multiplex. The latter is the approach taken in the 6.4 release. The bridge process can now utilize multiple endpoints to route information to a remote node.
As alluded to in the previous section, the BEA Tuxedo 6.4 release allows the specification of multiple network addresses. These addresses may all be on the same network or on different networks. It is not necessary to have more than one physical network to specify multiple addresses.
However, if different networks are specified for a node, multiple physical network addresses must exist for the given node. When multiple network addresses are specified for a node, each address behaves as a logical bridge. Tuxedo per se does not create a separate bridge process for each address. Configuring two or more addresses with the same network priority specifies parallel network addresses.
Parallel network addresses can alleviate bottlenecks at the bridge in applications that make intensive use of the network. When the amount of data flowing across a logical bridge becomes heavy, the bridge is often unable to route data quickly enough to keep up. This can cause a backlog of data waiting to be sent. When the number of bytes of backlog data exceeds a limit, the bridge will begin using another address on the parallel network. Because of the overhead of maintaining parallel networks, they are used only when network activity on the bridge is heavy. Tuxedo provides a configurable parameter (MAXPENDINGBYTES) that determines when parallel networks should be used.
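The spill-over behavior governed by MAXPENDINGBYTES can be pictured with a toy model in Python. This is a sketch under stated assumptions, not the bridge's actual algorithm: the function name, the per-address backlog list, and the fallback used when every address is over the limit are all invented for illustration.

```python
def route_message(pending, size, max_pending_bytes):
    """Choose which parallel address carries the next message.

    pending is a list of backlog byte counts, one per parallel address,
    in configured order. The chosen entry is increased by size.
    """
    for index, backlog in enumerate(pending):
        # An address keeps being used until its backlog exceeds the limit.
        if backlog <= max_pending_bytes:
            pending[index] += size
            return index
    # Assumption: if every address is over the limit, use the least loaded one.
    index = min(range(len(pending)), key=pending.__getitem__)
    pending[index] += size
    return index


backlog = [0, 0]  # two parallel addresses, both idle
choices = [route_message(backlog, 60, max_pending_bytes=100) for _ in range(3)]
print(choices, backlog)
```

In this model the second address stays idle until the first one has more than 100 bytes waiting, which reflects the point made above that parallel networks are used only when traffic on the bridge is heavy.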
Multiple physical network addresses are a powerful way of maintaining high availability. When one network goes down, the system automatically switches to the next available network. When the higher priority network returns, the system automatically switches back.
Multiple bridges are configured by specifying two or more network addresses (NADDR) for a logical machine identifier (LMID) in the NETWORKS section of the Tuxedo configuration file.
A network listener address (NLSADDR) may be specified for each NADDR entry; however, the system defaults to the entry with the DEFAULTNET network group. Additionally, a network group (NETGROUP) parameter must be specified for each NADDR entry to denote the physical network and the priority of the address in the network. Addresses with the same network group specify parallel networks. Every machine in the configuration must have one entry in the DEFAULTNET network group.
The network groups are managed under the NETGROUPS section of the Tuxedo configuration file. A network group number (NETGRPNO) and a network priority number (NETPRIO) are specified for each network group entry. The DEFAULTNET network group entry is required and must have a NETGRPNO equal to zero. For any network group, including DEFAULTNET, the NETPRIO defaults to 100 if it is not specified, and it may be changed. Once the NETGROUPS and the NETWORKS sections have been configured, Tuxedo completely manages the networks to detect outages and to route data over backup and parallel networks.
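The rules above lend themselves to a small checker that validates a planned layout and prints the two sections. The following Python sketch is an assumption-laden illustration: the function name, the machine identifiers SITE1 and SITE2, the FASTNET group, and the host names are hypothetical, and the line layout is written from memory of the UBBCONFIG format, so it should be checked against the Tuxedo reference before use. NLSADDR is left out for brevity.

```python
def render_network_sections(netgroups, networks):
    """Validate the rules described in the text and return the two sections as text.

    netgroups: {name: {"NETGRPNO": int, "NETPRIO": int (optional)}}
    networks:  list of (lmid, netgroup, naddr) tuples
    """
    default = netgroups.get("DEFAULTNET")
    if default is None or default.get("NETGRPNO") != 0:
        raise ValueError("DEFAULTNET is required and must have NETGRPNO=0")
    for lmid, group, _naddr in networks:
        if group not in netgroups:
            raise ValueError(f"{lmid}: unknown network group {group}")
    for lmid in sorted({entry[0] for entry in networks}):
        in_default = [g for m, g, _a in networks if m == lmid and g == "DEFAULTNET"]
        if len(in_default) != 1:
            raise ValueError(f"{lmid}: needs exactly one DEFAULTNET entry")

    lines = ["*NETGROUPS"]
    for name, attrs in netgroups.items():
        prio = attrs.get("NETPRIO", 100)  # NETPRIO defaults to 100 when omitted
        lines.append(f"{name} NETGRPNO={attrs['NETGRPNO']} NETPRIO={prio}")
    lines.append("*NETWORKS")
    for lmid, group, naddr in networks:
        lines.append(f'{lmid} NETGROUP={group} NADDR="{naddr}"')
    return "\n".join(lines)


groups = {"DEFAULTNET": {"NETGRPNO": 0}, "FASTNET": {"NETGRPNO": 1, "NETPRIO": 200}}
entries = [
    ("SITE1", "DEFAULTNET", "//site1:5723"),
    ("SITE1", "FASTNET", "//site1-fast:5724"),
    ("SITE2", "DEFAULTNET", "//site2:5723"),
    ("SITE2", "FASTNET", "//site2-fast:5724"),
]
print(render_network_sections(groups, entries))
```

Running the checks before the configuration file is compiled catches a missing DEFAULTNET entry early, when it is still cheap to fix.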
Description of Dynamic Reconfiguration
Dynamic reconfiguration involves making changes that affect the behavior of a running system. The changes are classified as dynamic if the system is active while the changes are made.
The ability to make dynamic changes is essential in a business critical application. Tuxedo provides three ways of dynamically reconfiguring the system: the tmconfig utility tool, the web GUI interface, and the MIB application interface.
Dynamic reconfiguration is often needed in a failover and failback situation. In a failover situation, the administrator has to reconfigure the application so that the Tuxedo servers on a failed node will be started on an alternate node. A similar situation occurs for the failback. Tuxedo administrators often try to automate this task by writing scripts that call the tmconfig interactive tool.
Like the GUI interface, the tmconfig tool is fine for interactive usage; however, it is not ideally suited for scripting.
The tmconfig tool requires an editor such as the UNIX ed or vi editor to perform updates. As yet, I have not found a way of writing a script in which tmconfig successfully calls the ed or vi editor. Since tmconfig calls the ed and vi editors successfully when it is run outside of a here-document script (a script in which standard input is redirected), this appears to be a UNIX issue in which too many levels of redirection are occurring.
The MIB interface allows integrators and administrators to have total control over Tuxedo applications. The interface is more involved than the tmconfig interface in that it requires programming to implement the administrative code.
However, it is currently the only method that provides the flexibility needed to fully automate most administrative tasks. The MIB interface is powerful because it is implemented with the same APIs that Tuxedo developers use to write business critical client/server applications.
Tuxedo defines a MIB interface to administer each of its components and extensions. There are interfaces to administer the access control list, disk-based queues, events, core Tuxedo, and the workstation extension. The following are the corresponding MIB component names: ACL_MIB, APPQ_MIB, EVENT_MIB, TM_MIB, and WS_MIB. Through the MIB interface, administrators control the application by programmatically querying the Tuxedo bulletin board (BB) for the current state of MIB objects, and then effecting administrative changes by either setting and resetting specific MIB values or creating new MIB objects.
This level of control may seem excessive; however, it may come in handy in a failover and failback situation. Since the success or failure of administrative changes made through the MIB interface is unambiguous, error handling is straightforward and less tortuous. The MIB programming interface is the only way to handle all the possible complications that can occur in a failover situation. During a failover, scripts can be used to execute client MIB programs that perform specific tasks such as shutting down and migrating server groups, and verifying the state of the application. A controlling program such as HACMP can execute these scripts when triggered, to provide a completely automated high availability solution.
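The query, change, and verify pattern that such a client MIB program follows can be outlined in Python. This is a sketch of the control flow only, and several things in it are assumptions: call_mib is a hypothetical stand-in for whatever actually sends a request to the MIB (a real program would use the Tuxedo APIs), and the attribute names and values (TA_OPERATION, TA_CLASS, T_GROUP, TA_SRVGRP, TA_STATE, TA_CURLMID, and a negative TA_ERROR meaning failure) are written from memory of the TM_MIB reference and should be verified there.

```python
def migrate_group(call_mib, group):
    """Ask for a server group to be migrated and confirm that it moved.

    call_mib takes a request dictionary and returns a reply dictionary.
    Returns True if the group's current machine changed.
    """
    query = {"TA_OPERATION": "GET", "TA_CLASS": "T_GROUP", "TA_SRVGRP": group}
    before = call_mib(query)
    if before["TA_ERROR"] < 0:
        raise RuntimeError(f"cannot read the state of {group}")

    reply = call_mib(dict(query, TA_OPERATION="SET", TA_STATE="MIG"))
    if reply["TA_ERROR"] < 0:
        raise RuntimeError(f"migration of {group} was rejected")

    after = call_mib(query)
    if after["TA_ERROR"] < 0:
        raise RuntimeError(f"cannot verify the state of {group}")
    return after["TA_CURLMID"] != before["TA_CURLMID"]


# In-memory stand-in for a running application, used only to exercise the flow.
location = {"GROUP1": "SITE1"}


def fake_mib(request):
    group = request["TA_SRVGRP"]
    if request["TA_OPERATION"] == "SET":
        location[group] = "SITE2"
        return {"TA_ERROR": 1}
    return {"TA_ERROR": 0, "TA_CURLMID": location[group]}


print(migrate_group(fake_mib, "GROUP1"))
```

Because every step returns an explicit status, a takeover script can stop at the first failure and report exactly which step went wrong, which is the property that makes the MIB approach easier to automate than an interactive tool.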
BEA Tuxedo and a High Availability package make for a powerful combination. The High Availability option ensures almost 24 by 7 operation of business critical applications. The failover and takeover scripts greatly simplify the role of the Tuxedo administrator. If the Tuxedo and HA software are configured optimally, all users will have a more pleasing and less eventful experience.
BEA Tuxedo 6.4 Online Documentation
BEA Tuxedo Release 6.4 Addendum
BEA Tuxedo 6.4 Release Notes
IBM HACMP For AIX 4.2.1 Documentation (www.austin.ibm.com)
~end of Aurora Information Systems White Paper Series #3
"Exploring High Availability Issues with BEA Tuxedo and Third Party High Availability Software"~