White Paper

The Case For Storage Virtualization Using Intelligent Routers

Abstract

What is storage virtualization? What value does it hold for IT operations? Does the virtualization of storage offer a new way of organizing and managing the relationship among hosts, storage devices and a network? Where and how should it be deployed?

This white paper suggests that storage virtualization represents a major advance in IT architecture. The paper explores the opportunities, challenges and limits of storage virtualization, both as a conceptual architecture and as a real technology to be implemented. The discussion offers a conceptual definition of storage virtualization and suggests that, from a deployment standpoint, network-based virtualization provides flexibility and heterogeneity. The paper also proposes that a scalable, router-based implementation offers several attractive benefits over alternative storage virtualization choices.

Introduction

For readers familiar with mainframe operations, storage virtualization should not be an entirely new concept. The idea of pooling storage was developed by IBM as part of its MVS operating system well over ten years ago. Called "System Managed Storage," the concept involved both networking storage devices and enabling the operating system to view all attached storage devices of a similar type as a common pool.

Mid-range and open systems, however, tightly coupled storage to the computer and, in some cases, to the operating system. Adding, changing or rearranging volumes or files with tools other than those provided within the operating system could cause havoc.

Storage networks and especially Fibre Channel based Storage Area Networks (SANs) took the first step in decoupling storage from the computer. SANs allowed storage devices to break the distance limits imposed by the SCSI channel protocol. Devices could be co-located in a data center. Communication between servers and the storage devices was over high-capacity fiber. Fibre Channel loops, hubs, routers and switches created a block network; performance exceeded SCSI and continues to increase, sometimes outpacing the ability of storage systems to read or write the data to disk.

Nevertheless, storage devices remained "tethered" to their respective operating systems. While some RAID systems offered the possibility of placing volumes from different O/S's in one cabinet, these often resulted in vendor lock-in. Using physical storage resources as an independent utility remained a challenge.

Storage virtualization addresses this challenge. By abstracting storage from its physical limitations-much in the same way that a PC operates virtual memory-storage can be added, reassigned, migrated and reconfigured without affecting the host application. Instead of the host server managing the placement of data volumes, a virtualization system takes over, representing itself to the host operating system as a physical device. Beneath the virtualization system, volumes are spread across multiple disks, moved and migrated without the host's knowledge. Pools of storage are accessed and managed by an organization's policies rather than an operating system.
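The translation the virtualization system performs can be pictured as a simple address map. The sketch below is purely illustrative (the class and names are assumptions, not any vendor's implementation): a virtual volume presents one contiguous block device to the host while its extents actually live on several physical disks.

```python
class VirtualVolume:
    """Presents a contiguous block device to the host while the
    extents actually live on several physical disks."""

    def __init__(self, name):
        self.name = name
        self.extents = []  # list of (length_blocks, device, phys_start)

    def add_extent(self, length, device, phys_start):
        self.extents.append((length, device, phys_start))

    def translate(self, virtual_lba):
        """Map a virtual block address to (device, physical block)."""
        offset = virtual_lba
        for length, device, phys_start in self.extents:
            if offset < length:
                return device, phys_start + offset
            offset -= length
        raise ValueError("address beyond end of volume")


vol = VirtualVolume("vol0")
vol.add_extent(100, "diskA", 0)    # first 100 blocks live on diskA
vol.add_extent(200, "diskB", 500)  # next 200 blocks live on diskB

print(vol.translate(50))   # ('diskA', 50)
print(vol.translate(150))  # ('diskB', 550)
```

Because the host sees only the virtual addresses, extents can be moved or added behind this map without the host's knowledge, which is the essence of the abstraction described above.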

Storage virtualization can be deployed in a variety of ways. Users should consider their options carefully as the choice of implementation architecture will have lasting effects on their operations.

The Case for Storage Virtualization Using Intelligent Routers

"The new economy is driven by ideas, not by demand."
George Gilder

Fundamentals
Understanding and implementing storage virtualization depends on where one starts. Storage, server and network vendors tend to propose definitions based on their respective views of how storage virtualization should be implemented.

Discussing storage virtualization and the different schools of thought about its meaning leads to a better understanding of what it is, what it does and how it can be implemented.

The word "virtual" means "being such in essence or effect, though not in actual fact." "Virtual reality," for example, is a technology-based representation or depiction of things, people and actions that aren't (and may never have been) real. In storage virtualization, "virtualization" is the process by which physical storage devices are made virtual.

Recently, The Aberdeen Group and The Evaluator Group, Inc., jointly published a paper with perhaps the most succinct definition of storage virtualization thus far:

Virtualization separates the representation of storage to the server operating system from actual physical storage.

The paper's discussion of the definition concludes with: "Thus, in short, virtualization is the abstraction of storage."

As a conceptual definition, this is entirely adequate. However, while each of the key elements needed for storage virtualization is mentioned (representation of storage, server operating system, physical storage), one element is omitted from the definition: a network.

Figure 1: In virtualized storage, the host servers "think" they are connected to physical storage devices; however, the virtualization process manages the placement of volumes on the physical devices.

Both conceptually and practically, a network that is external to and independent of the internal I/O of storage devices and servers is an essential part of storage virtualization. A complete definition of storage virtualization should not overlook reference to an external network as an essential component. Therefore, Crossroads would slightly amend the above definition to read:

Virtualization separates the representation of storage to the server operating system from actual physical storage connected over an external network.

In Crossroads' view, the Aberdeen/Evaluator definition, with the added inclusion of a reference to networks, gets to the heart of what storage virtualization is all about: a uniform view of storage-a single, common disk available to and accessible across a network by servers without modification to operating systems or applications.

What Storage Virtualization Is Not
There are distinctions between storage virtualization and other storage technologies: Storage Area Networks (SANs), Redundant Arrays of Independent Disks (RAID) and Network Attached Storage (NAS).

SANs

From time to time the term "SAN Virtualization" is used in place of storage virtualization. This can lead to confusion of storage virtualization with the storage network.

SANs physically decouple storage from servers and place that storage on a network. However, without virtualization, the SAN attached servers-even those running the same operating system-are assigned to and manage specific storage resources. Storage can be grouped (or pooled) into zones (through LUN zoning, masking and management); but the storage devices are still pre-assigned by the storage administrator to a specific server. By themselves, SANs do not do virtualization.

RAID

RAID aggregates disks into a logical pool by striping data across several disk drives. Depending on the RAID level, data is either striped via static maps or mirrored to a second set of disks.

RAID is virtualization-but at the disk level. It is not storage virtualization at a storage system level involving heterogeneous disks, RAID devices from multiple vendors or combinations of storage media (e.g., RAID and JBOD).
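The static maps used for striping amount to fixed address arithmetic. The following sketch shows RAID-0 style address translation; the function name and parameters are illustrative assumptions, not a particular controller's design.

```python
def stripe_map(lba, num_disks, stripe_blocks):
    """Return (disk_index, block_on_disk) for a logical block address,
    striping fixed-size units round-robin across the disks."""
    stripe_num = lba // stripe_blocks   # which stripe unit the block is in
    disk = stripe_num % num_disks       # units rotate across the disks
    row = stripe_num // num_disks       # which stripe row on that disk
    return disk, row * stripe_blocks + lba % stripe_blocks


# With 4 disks and 8-block stripe units, consecutive units land on
# disks 0, 1, 2, 3, then wrap back to 0.
print(stripe_map(0, 4, 8))   # (0, 0)
print(stripe_map(8, 4, 8))   # (1, 0)
print(stripe_map(33, 4, 8))  # (0, 9)
```

Because this map is static, a RAID set virtualizes only its own member disks; it cannot, by itself, pool heterogeneous devices the way a storage-level virtualization layer can.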

Network Attached Storage (NAS)

NAS systems are file sharing systems that work from a common storage pool. They provide an effective means to share files among host servers, in some cases even across operating systems (e.g., UNIX and Microsoft NT/2000). However, while NAS systems aggregate storage, not all NAS systems virtualize it.

Why Storage Virtualization is Important

With storage virtualization conceptually defined as the separation of the representation of storage to the host from the networked physical devices, what makes it something that should command attention and investment? What set of problems does it address, and what is its corresponding value proposition?

Storage virtualization is important because it is a remedy for the management burden of storage growth. Storage is growing in response to data proliferation and to organizational and business expansion. As storage devices are added, the complexities of arranging, connecting and managing them, as well as the data within them, increase.

Of course, inherent in managing storage is the critical requirement that all operations avoid adverse impacts on the availability of data to the business. The challenge of arranging, connecting and managing storage would be easier if the world had evolved with one operating system. IBM (and others) solved the problems of effective storage management in the mainframe arena long ago. But, large IT operations have an average of four different operating systems, each responsible for some part of the business and each with its own unique requirements in reading and writing to storage.

Storage Virtualization's Value Propositions
In this environment, the cost of adding storage capacity is often less than the cost of the manpower and time needed to manage it. For example, customers of one major computer manufacturer estimated that 32% of server downtime was storage related: adding capacity, upgrading devices and the like. Of course, when storage devices are being reconfigured and the server is down, the applications are down, and with applications down, users cannot conduct business. What has been missing is the means to reconfigure storage without affecting host operations; the notion of storage as a utility.

Storage networks address part of the storage management challenge. They provide fast (high bandwidth) connections and switching between hosts and storage devices. But networks themselves are relatively unintelligent in the area of storage management and administration. Fibre Channel fabrics (SANs) have lacked the software to manage the organization of data across and among attached storage devices. Storage virtualization provides the missing piece.

This leads to storage virtualization's most powerful value proposition:

Storage virtualization permits the non-disruptive management and administration of storage.

This benefit is the core reason that storage virtualization is being adopted.

However, storage virtualization also has another important and so far overlooked benefit: it allows IT planners to manage a changing infrastructure of host systems and applications without disrupting storage operations.

This is the corresponding benefit to non-disruptive storage management. However fast data may be collected and storage expanded, the use of that data by employees over the Internet, intranets and extranets is the central issue; to accommodate this growth, server capacity is also expanding.

Storage and hosts need to be reconfigured, balanced and expanded. Storage virtualization offers more than managing storage; it helps IT managers adjust to processing demands while keeping storage intact. It allows hosts to be managed (capacity added, applications upgraded or migrated, etc.) independently of the storage to which they would otherwise be directly attached.

With virtualization, neither users nor host applications care about the location or the type of devices on which data is stored. Similarly, neither the data nor the storage devices care about the host operating systems or the demands of ravenous users. Hosts and disks have become "virtual" to each other.

A Caveat
Just as hosts are configured and optimized for specific applications, so is storage. Storage devices, and especially disks, have different characteristics developed and optimized for specific types of data. For example, some disks are optimized for transactional operations (such as financial applications); others are optimized for data streaming (video and rich media applications), while yet others may be optimized for complex query processing (data warehousing). There is no "one size fits all" approach to any aspect of IT.

Implementation Approaches
Storage virtualization approaches can be grouped into three generalized categories:
  1. Host-based - virtualization software resides on the application server (or host);
  2. Storage-based - the storage devices provide virtualization functions, usually in the controller;
  3. Network-centric - the virtualization is done by a device that is part of the storage network.

Host-based virtualization
In a host-based architecture (Figure 2 on the next page), storage virtualization software resides on the application server (host). The virtualization software causes the host's operating system to behave as if it were in direct communication with a storage device.

Figure 2: Host-based virtualization.

This approach works best if all the hosts are the same. Centralized management of the system is critical to preclude an action being taken by one host that could affect the integrity of all the storage connected through the SAN.

Storage-based virtualization
Somewhat like host-based virtualization, storage-based virtualization works best in a uniform environment, often where the storage is provided by a single vendor. From the standpoints of simplicity and performance, this approach may be optimal, given a vendor's control over the development and tuning of the system.

Figure 3: Storage virtualization at the RAID level.

Of course, with the benefits of simplicity and performance may come limitations in flexibility and price. The trade-offs between these benefits and vendor lock-in must be weighed carefully.

Network-centric virtualization
By far, the most prevalent approaches to deploying storage virtualization are centered in and around the SAN itself. Within a network-centric approach, there are at least three possible architectures:

  • Appliance-based - using a "commodity" server to provide the virtualization intelligence;
  • Switch-based - placing the virtualization function within an intelligent switch;
  • Router-based - adding the virtualization intelligence to protocol translating and data path routing devices.

Overlaying these three approaches is the distinction between symmetric and asymmetric architectures.

Symmetric

A symmetric architecture is defined as one in which the metadata and maps are in the data path. Requests for data by the host operating system are made directly to the storage virtualization platform, which satisfies the requests by accessing the data located on the physical disks.

The hosts, storage devices and platform all may be connected directly to a SAN fabric through a switch (the illustration shows storage and virtualization platform connected to the SAN). However, the important point is that data movement and metadata access are concentrated through the virtualization platform.

By definition, symmetric architectures are always in-band, meaning that the same network is used for both data movement and metadata access.

Figure 4: Data and metadata access flows in a symmetric virtualization system.

This architecture can be easy to implement. Some vendors offer a domain controller or a "SAN in a Box" that may include switching and virtualization on one platform.

Symmetric Example - Domain Controller

The domain controller provides a common point of entry and exit to the network. Often containing a fabric switch, this architecture requires that all connections, data and command, be made to it.

Figure 5: An example of symmetric virtualization using a domain controller.

The concerns about this approach have to do with the virtualization platform becoming an operational bottleneck. There are also questions about the ability to economically scale performance as more hosts and storage are added. Some vendors add large cache memory to help with this.

Asymmetric

An asymmetric architecture removes the metadata from the path between host and storage. In so doing, it separates the data path from the path used to access the metadata and maps. Often, this metadata path is over an out-of-band connection (e.g., Ethernet) between the hosts and the metadata server.

However, an asymmetric architecture is not required to use out-of-band connections. Accessing the metadata server could occur through the SAN. The distinction in this case between symmetric and asymmetric architectures is the location of the metadata. If it is in the path of the data, it is a symmetric model. If it is not, it is asymmetric.

Virtualization is achieved by placing a software agent within each host. The agent has the maps and addresses of the storage assigned to it. When it needs to access a data volume on a storage device to which it is not mapped, it queries the metadata server for the address and directs the I/O accordingly. The metadata server retains a global view of the storage pool and of which server is assigned to which devices.
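The agent's query-on-miss behavior can be sketched as follows. This is a minimal illustration of the asymmetric model only; the class names and map layout are assumptions, not any vendor's product.

```python
class MetadataServer:
    """Global view of the pool: volume name -> (device, start block)."""

    def __init__(self):
        self.global_map = {}

    def register(self, volume, location):
        self.global_map[volume] = location

    def lookup(self, volume):
        return self.global_map[volume]


class HostAgent:
    """Redirects I/O using a local map; queries the server on a miss."""

    def __init__(self, server):
        self.server = server
        self.local_map = {}

    def resolve(self, volume):
        if volume not in self.local_map:
            # out-of-band query to the metadata server, then cache locally
            self.local_map[volume] = self.server.lookup(volume)
        # the data itself then flows directly between host and storage
        return self.local_map[volume]


server = MetadataServer()
server.register("payroll", ("raid1", 0))
agent = HostAgent(server)
print(agent.resolve("payroll"))  # ('raid1', 0)
```

Note that only the map lookup touches the metadata server; once the address is cached, data movement bypasses it entirely, which is what separates this model from the symmetric one.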

The asymmetric model attempts to address the issues of performance and scalability at the expense of complexity.

Figure 6: An asymmetric storage virtualization architecture using an out-of-band network to access the metadata server.

Asymmetric Example - Metadata Server

There are several companies with asymmetric storage virtualization products on the market. In this model, as discussed above, the metadata sits on a Metadata Server and data flows directly between the host and the pooled storage devices.

Figure 7: Using a Metadata Server for asymmetric storage virtualization. The out-of-band connections between the metadata server and hosts involve a specialized HBA where a virtualization software agent may be resident.

An agent is required on each server to redirect the I/O. Depending on the vendor, the agent either redirects the request to the Metadata Server or, as in the above illustration, resides on an HBA or other device within the host.

Hybrid

As can be inferred, at an architectural level the dividing issue between symmetric and asymmetric approaches is the extension of the command path across multiple hosts. Both implementations rely on some amount of metadata or volume maps being within the data path. In one instance, they are centralized on a single device; in the other, they are partitioned and distributed to each server. In a hybrid approach, the virtualization platform remains in the data path.

Figure 8: A hybrid storage virtualization architecture. Metadata and routing maps are located on both of the virtualization platforms.

Hybrid Example - Router-based Virtualization

Routers are specialized, intelligent devices, designed from a network perspective and therefore optimized for I/O operations. Routers bridge and translate protocols and intelligently route data over different data paths. With only a slight increase in CPU and memory, routers can provide a cost-effective, scalable convergence of the best features of both the symmetric and asymmetric models.

Although data and commands share a common path, the unique capabilities of routers offer significant advantages for storage virtualization.

First, each host has access to all metadata maps. This can make scaling easier as all hosts are assured of finding a connection to all physical storage devices. A Metadata Server is no longer the single repository of knowledge about the system (metadata and maps); all virtualization routers share the knowledge.

Second, fail-over and recovery options for the metadata can be made more flexible. Connected either in-band via the SAN or, optionally, via Ethernet, each router need only send "heartbeat" messages and periodic updates on its activities to the others. When new storage devices are added to the SAN, their presence is made known immediately to each router.
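The peer-update scheme described above can be sketched in a few lines. This is an illustrative model of routers replicating map updates to one another, not Crossroads' implementation; all names are assumptions.

```python
class VirtualizationRouter:
    """Each router keeps a full copy of the maps and forwards
    any change it learns about to its peers."""

    def __init__(self, name):
        self.name = name
        self.maps = {}   # volume -> physical location
        self.peers = []

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def add_mapping(self, volume, location):
        """Record a new mapping locally, then propagate it to peers."""
        self.maps[volume] = location
        for peer in self.peers:
            peer.receive_update(volume, location)

    def receive_update(self, volume, location):
        self.maps[volume] = location  # apply a peer's update


r1 = VirtualizationRouter("router1")
r2 = VirtualizationRouter("router2")
r1.connect(r2)
r1.add_mapping("vol7", ("diskC", 4096))  # new device seen by router1
print(r2.maps["vol7"])  # ('diskC', 4096): known to the peer as well
```

Because every router holds the full map, the loss of any one router does not take the system's knowledge of the pool with it, which is the fail-over flexibility claimed above.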

In addition, protocol bridging and translation can facilitate the use of legacy storage devices to make them part of a storage virtualization pool. Migration from direct attached storage to networked and virtualized storage is made easier-with the additional benefit of including the legacy devices as resources within the virtualized storage pool.

Finally, the router-based approach offers the possibility of bringing not only SAN-connected hosts but hosts connected via Ethernet into the storage virtualization pool. There is also the potential to include hosts with proprietary channel protocols (e.g., ESCON) in the virtual storage pool.

Figure 9: Hybrid storage virtualization using intelligent routers.

With its protocol translation and intelligent routing features, router-based storage virtualization enables virtualization to be implemented and managed over high-speed networks other than Fibre Channel. Connections to remote storage could be made through SAN-to-SAN links over ATM, IP, or other WAN or MAN technologies/protocols.

Storage management, the need to easily create and assign storage pools based on policies, is an important aspect of storage virtualization. This management function is practical with a router-based solution.

Additional Functions
Whatever method is used to deploy storage virtualization, there are data volume and storage functions that are often included in or as options for virtualization software. These include:

  • Data mirroring and replication both locally and remotely;
  • Data snapshots allowing point-in-time replication of online data;
  • LAN- and server-free data backup.
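Of the functions listed, the point-in-time snapshot is worth a brief illustration. The copy-on-write sketch below shows one common way virtualization layers implement it; this is purely illustrative, under the assumption of a simple block store, not any product's design.

```python
class Volume:
    """A toy block store with copy-on-write snapshot support."""

    def __init__(self):
        self.blocks = {}      # block number -> live data
        self.snapshot = None  # preserved old data after a snapshot

    def take_snapshot(self):
        self.snapshot = {}    # start preserving overwritten blocks

    def write(self, block, data):
        if self.snapshot is not None and block not in self.snapshot:
            # copy-on-write: save the point-in-time copy first
            self.snapshot[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, block):
        """Read the volume as it looked when the snapshot was taken."""
        if self.snapshot is not None and block in self.snapshot:
            return self.snapshot[block]
        return self.blocks.get(block)


v = Volume()
v.write(0, "old")
v.take_snapshot()
v.write(0, "new")
print(v.blocks[0])         # 'new': live data keeps changing
print(v.read_snapshot(0))  # 'old': the point-in-time view is preserved
```

Only blocks overwritten after the snapshot are copied, which is why such snapshots are near-instant to create and cheap in space while the online data remains fully writable.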

With the addition of these applications, router-based storage virtualization moves from being a feature of storage networks to the central position within the network.

Virtualization's Future

Storage virtualization's value proposition leads to a second, more profound, and long-term reason as to why it is one of the most important developments in IT. Over ten years ago, consultant and futurist Stan Davis anticipated an era in which information is available at any time, in any place and in any form. Davis's world is quickly becoming a reality. We see examples every day: email can be read and composed on a notebook computer, PDA or cell phone and transmitted to anyone via an office LAN, a cellular network or the seatback air-phone during a flight. The method employed to move the data is based on which network is more conveniently available at the time. The information is available wherever and whenever the user wants it, in a form that adapts to the medium.

The continuous availability of information, its near-universal accessibility and its ready adaptability are the central paradigm shifts of 21st Century IT. From a technical standpoint, they are driven by the proliferation of high-speed networks. This is the critical technology available to move the paradigm forward.

Bandwidth and connectivity are fast reaching infinite capacity and becoming almost free. The speed of I/O operations within a computer is being matched and surpassed by external networks. These networks, including Fibre Channel, Gigabit Ethernet and 10 Gigabit Ethernet, are part of and fuel for the continued march toward the "hollowed out computer" predicted over five years ago by Eric Schmidt, now CEO of Novell. Data storage and application processing will all be virtualized-relocated to network oriented "appliances" that perform specialized functions.

Of course, storage virtualization is only one element in this paradigm shift. Soon, virtualization technology will allow storage and data to be seen as a single, globally networked asset. Ultimately, storage virtualization will play a key role in enabling a "data-tone": a network-based utility that people will use to connect to data and to each other at any time, in any place and in any form.

Summary

Rather than just a new feature for storage networks or a vendor-driven technology, Storage Virtualization represents a major reordering of the IT infrastructure.

The notion of abstracting storage from the physical devices moves storage closer to being a utility. Where SANs have physically decoupled storage devices from host servers, virtualization breaks the operating system dependencies. Storage takes over the central role in the data center once held by the computer.

Importantly, storage virtualization not only improves the management of storage but potentially eases the management of host servers, as well.

Implementation and deployment strategies vary from host-based to network-centric. While each has benefits and trade-offs, an emerging consensus seems to be building for a network-centric approach. And, here, two models, symmetric and asymmetric, have emerged relating to the degree that metadata exists within the data path.

From Crossroads' vantage point, a network-centric, router-based deployment of storage virtualization offers several benefits. Scalability, attachment of legacy storage devices, protocol transparency and superior performance are just a few.

Credits

This white paper was written by Peter LaPorte, Director of Product Marketing, Crossroads Systems and Manager of Crossroads Oregon. Contributors included:

Crossroads Systems: Matt Carr, Director of Strategy and Planning; and, Burke Chess, Manager, S/390 Software Engineering

The Aberdeen Group: Dave Hill and Dan Tanner
