White Paper

RAID and Cluster Solutions and SAN Applications

ABSTRACT
With ever-increasing requirements for continuous streams of information and zero down time, SAN has become the focus of much attention. However, the cost of implementing a SAN can be prohibitive. RAID (Redundant Array of Independent Disks), failover server clustering, or a combination of the two can make entry into the SAN storage arena much more feasible.

Fibre Channel Arbitrated Loop promises to be a very powerful standard for the SAN environment. It provides outstanding data transfer rates and supports many devices and extremely long cable lengths. Host-based Fibre Channel RAID offers an affordable entry-level SAN solution, providing interoperability and scalability.

SOME FACTS
With all the recent hype concerning Storage Area Networks, one would think it was the "Second Coming" of storage. However, according to a survey by Enterprise Management Associates reported in Information Week, nearly half (46%) of the 187 IT professionals queried had no plans to implement any kind of SAN. The major reasons cited were high implementation costs and difficulty confirming a need for a SAN.

This is understandable considering the average cost of implementing a SAN can run anywhere from $200,000 to over $4 million. This is a sizable investment when the industry itself seems to have no clear-cut, generally accepted definition of what constitutes a SAN.

In addition, interoperability and scalability are very real concerns when deploying a SAN. It is important to find compatible equipment that will not be obsolete in a year or so. It is equally important that, when more storage is needed, even aging equipment can be scaled within the existing infrastructure.

The Storage Networking Industry Association defines a SAN rather vaguely as "a network whose primary purpose is the transfer of data between computer systems and storage elements, and among storage elements". In other words, a SAN is like a LAN designed for storage. According to this definition, it could simply be a server on a LAN with a backup device.

SINGLE VERSUS DUAL LOOP
That being said, there are ways of setting up basic Fibre Channel storage solutions using existing, interoperable technologies such as RAID (Redundant Array of Independent Disks) and high availability cluster solutions. Clustering also requires additional software products, such as Microsoft Cluster Server for NT or APPTIME's Watchdog for Linux.

RAID is probably the easiest and least expensive to set up. Simply install a single port intelligent FC RAID controller into a server and then connect a hub that has a connection to a disk enclosure. Now attach this setup to a LAN via the server and you have a very basic SAN island. The hub gives this "SAN island" an easy way to scale into a larger SAN environment when resources permit. For example, adding more storage is as simple as adding another FC disk enclosure to the existing hub. The intelligent (I2O) RAID controller should be able to configure this new storage on the fly. A dual port FC controller and another hub (or split hub) can carry this one step further by adding cable redundancy and a failover plan. The dual loop configuration of the controller can also add increased bandwidth.

Single Loop
This topology is ideal for systems where the speed and cable length of FC are important. A simple physical setup would include a single port Fibre Channel intelligent RAID controller located in a host server with FC hard disks in a separate enclosure (figure 1). This setup allows up to 125 hard disks to be connected to the server with one FC controller. Using inexpensive copper cables, the enclosure can be placed up to 30 m (roughly 98 feet) away from the server. This makes it possible to have the server and enclosure in separate rooms.

Dual Loop
The drawback of single loop topology is that there is only one connection between the server and the hard disks. This connection is not redundant, so if it fails, the server can no longer access the data. To avoid this situation, a dual loop configuration can be implemented using a dual port Fibre Channel RAID controller and parallel cables between the two-channel controller and the hard disks (figure 2). If one loop has problems in such a configuration, the system simply routes all I/Os to the other loop. This configuration also improves the data transfer rate: when both loops are up and running, the bandwidth is doubled to 200 MB/sec, increasing overall performance. The extra redundancy makes the dual loop configuration a good choice for high security, high performance systems. Hard disks connected to the Fibre controller may be configured to form one large RAID 5 array, or to form several smaller arrays depending on specific needs (e.g. RAID 1 for the operating system, RAID 5 for critical user data and RAID 0 for high performance/non-redundant needs).
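The routing behavior described above can be illustrated with a short Python sketch. It is not the controller's actual firmware logic. The names Loop and DualLoopController are hypothetical, the per-loop rate of 100 MB/sec is inferred from the 200 MB/sec dual loop figure, and the round-robin distribution of I/Os across healthy loops is an assumption made for illustration.

```python
from dataclasses import dataclass

# Assumption: each loop carries half of the 200 MB/sec quoted for two loops.
LOOP_BANDWIDTH_MB_S = 100


@dataclass
class Loop:
    name: str
    up: bool = True


class DualLoopController:
    """Hypothetical model of a dual port FC RAID controller (not a vendor API)."""

    def __init__(self, loops):
        self.loops = list(loops)
        self._next = 0  # counter used to alternate between healthy loops

    def healthy_loops(self):
        return [loop for loop in self.loops if loop.up]

    def aggregate_bandwidth(self):
        # Bandwidth scales with the number of loops that are up.
        return LOOP_BANDWIDTH_MB_S * len(self.healthy_loops())

    def route_io(self):
        """Pick a loop for the next I/O; a failed loop is simply skipped."""
        healthy = self.healthy_loops()
        if not healthy:
            raise RuntimeError("no loop available: storage is unreachable")
        loop = healthy[self._next % len(healthy)]
        self._next += 1
        return loop.name
```

With both loops up, I/Os alternate between them and the model reports 200 MB/sec. Marking one loop as down sends every I/O over the remaining loop at 100 MB/sec, which is the failover case described in the text.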

HIGH AVAILABILITY CLUSTERING
Another attractive application of Fibre Channel in the mass storage sector is server clustering. In a small cluster, two servers share one redundant mass storage system. If one server goes down, all resources and tasks are switched over to the remaining one. Since the mass storage system must be accessible to both servers, the mass storage interface must support very high performance and long, secure connections. The redundant dual loop topology of FC makes this possible. Using two active loops, there is a bandwidth of 200 MB/sec available for accessing the hard disks. Using standard twisted pair copper cables, the distance between the mass storage system and the servers may be as much as 30 meters (roughly 98 feet), which is enough for each of the three systems (two servers and the mass storage) to be in different rooms.

How Clusters Work
A failover server cluster works essentially through polling. Each server in the cluster continually polls every other server to ensure that all servers are still operational. This polling relies on each server being connected not only to the network and to a common mass storage device, but also to the other servers through some sort of interconnect device, usually a secondary network card (figure 3). This interconnection, sometimes called a private network, is merely a heartbeat connection.

For a failover to take place, all three connections are involved. If a server no longer responds over the heartbeat connection, the other servers notice its absence and poll it again over the LAN. If there is still no reply, the polling server takes over the failed server's task of connecting the users to the common data. One downside to this type of redundancy is that the heartbeat connection requires another cable run. This is usually not a problem if the servers are located in the same room.

There are different ways of setting up a high availability cluster. The simplest is to begin with a two-node cluster. Storage and servers can be added as needed or as budget permits. Single channel FC RAID controllers can be used in such a simple configuration.
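The polling sequence described above (heartbeat first, then the LAN, then takeover) can be sketched in Python as follows. This is a minimal illustration, not the logic of any particular cluster product. The function names and the heartbeat_alive and lan_alive callables are hypothetical stand-ins for real probes, and treating a peer that still answers over the LAN as a heartbeat link fault rather than a server failure is an assumption.

```python
def check_peer(peer, heartbeat_alive, lan_alive):
    """Decide what to do about one peer during a polling cycle.

    heartbeat_alive and lan_alive are hypothetical callables that take a
    peer name and return True if the peer answered on that connection.
    """
    if heartbeat_alive(peer):
        return "ok"
    # No heartbeat: poll again over the LAN before declaring a failure.
    if lan_alive(peer):
        # Assumption: the peer is running and only the private link is suspect.
        return "heartbeat-link-fault"
    # No reply on either connection: take over the peer's workload.
    return "take-over"


def poll_cluster(self_name, peers, heartbeat_alive, lan_alive):
    """Run one polling cycle against every other server in the cluster."""
    return {
        peer: check_peer(peer, heartbeat_alive, lan_alive)
        for peer in peers
        if peer != self_name
    }
```

In a real cluster this cycle would run continuously on every node, and the take-over step would involve claiming the shared storage and the failed server's network identity, which is left out here.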

Utilizing Fibre Channel technology, two dual channel FC-AL controllers can be used along with redundant Fibre hubs in a dual loop (redundant cable) configuration. In this situation, the Fibre hubs allow you to disconnect one node from the cluster and still maintain a certain level of redundancy. For maximum security, the same configuration could use redundant RAID enclosures (figure 4).

HOST-BASED RAID
It is commonly thought that implementing such solutions in the SAN environment requires that external RAID controllers be used with the storage device itself and then routed back to the server through high-speed switches via a host bus adapter. While this is certainly a viable solution, it is clearly not the only one.

By offloading the RAID control from the storage device and mounting it in the server, throughput increases because the bottleneck of having all data pass through one single point in the SAN is eliminated. Using the dual port RAID controller in such a scenario not only increases bandwidth, but also eliminates a single point of failure. Certainly more servers can be added to the mix, but it is much simpler and more cost effective to begin with the dual port controller, because it is like having two servers in one box. Dual ports also increase the bandwidth at the server and allow less expensive hubs to be used instead of switches.

These so-called SAN islands can be configured together into a larger SAN environment (figure 5) through Fibre Channel hubs or even switch technology, if segmentation is needed.

SUMMARY
While the estimated cost of implementing a SAN has scared off many IT professionals, there are creative ways of using less expensive, intelligent, host-based Fibre Channel Arbitrated Loop RAID technology to create safe and effective SAN islands. The scalability of basic RAID and failover cluster solutions makes them a perfect choice for entry-level answers to growing storage needs.

Frank W. Poole, ICP vortex Corporation