White Paper

Extending SRM with Active Archiving to Manage the Data Life Cycle
Submitted by Jim Lee, Vice President of Product Marketing, Princeton Softech

Data is at the heart of every organization, and as companies accumulate increasing volumes of data, managing and storing enterprise data is fast becoming one of the most critical IT challenges. IT executives are looking for more cost-effective ways to improve data and storage management, while reducing costs to maximize their storage investment. As a result, Storage Resource Management (SRM) is rapidly taking a lead market position in helping companies meet these goals. As the amount and value of corporate data increases, as computing environments become more complex, and as storage management costs skyrocket, the importance of SRM is increasing exponentially.

The data explosion has resulted in increasing enterprise storage requirements, creating the need for companies to deploy a variety of storage technologies including storage area networks (SANs), network attached storage (NAS), hierarchical storage management (HSM) and direct attached storage. Considering the breadth of enterprise storage, a comprehensive SRM solution is needed to provide a global view of an organization's data and storage resources, including monitoring their status, managing them cost-effectively, ensuring availability and supporting future growth.

Today, active archiving is recognized as a proven and cost-effective strategy for managing fast-growing, complex relational databases by controlling excessive database growth for the long term. Active archiving works within the framework of various storage technologies and SRM to offer a 'best practices' approach for managing storage resources and reducing operational costs. Combining active archiving with SRM provides organizations with the ability to best meet the challenge of managing increasing data volumes effectively. The right combination and deployment of these technologies ensures that organizations can meet their data management, data retention and storage requirements at the lowest cost.

Managing Enterprise Data throughout its Life Cycle
Appropriate data and storage management requires the realization that data has a life cycle. Typically, the data life cycle begins with a business need: data is acquired and then referenced on a regular basis during day-to-day business operations. Over time, this data is accessed less often and gradually loses its business value, until it is finally disposed of. However, through most of the life cycle, this data is retained online.

This simple but critical principle, that all data moves through life cycle stages, is the key to improving data management. By understanding how the data is used and how long it must be retained, companies can develop a strategy to map usage patterns to the optimal storage media, minimizing the total cost of storing the data over its life cycle.

The same principles apply when the data is stored in a relational database; however, the challenge of managing and storing relational data is compounded because of the complexities inherent in the data relationships. Relational databases are a major consumer of storage and are also among the most difficult to manage because they are accessed on a regular basis. Without the ability to manage relational data effectively, relative to its use and storage requirements, runaway database growth will result in increased operational costs, poor performance and limited availability for the applications that rely on these databases. The ideal solution is to manage data stored in relational databases as part of an overall enterprise SRM solution.

Impact of Relational Database Growth
Accelerating database growth across industries and applications, combined with the dramatic increase in graphic, audio and video media, has created a growing demand for better ways to manage the data and for more efficient and less expensive storage solutions throughout the data life cycle. Within the data storage marketplace, the most difficult challenge is managing the growth of relational databases that drive mission-critical applications - the backbone for today's corporate decision-making and competitive advantage.

The impact of database growth extends well beyond increasing storage costs and is critical to business continuity and disaster recovery plans. Larger databases take significantly more time to rebuild and restore. In addition, overloaded relational databases also degrade performance and limit the availability of critical applications. Database tuning and expensive hardware, software, and storage upgrades offer diminishing returns. Lastly, to comply with data retention policies, companies retain much of their historical data online for audit and legal reasons, even though much of it is rarely accessed. Storing data online degrades the performance and limits the availability of critical applications, not to mention increasing operational and storage costs.

Even though managing the data life cycle is critical to the enterprise, few standards exist today to assist companies in formulating and implementing long-term data retention strategies. Based on regulatory and legislative requirements, IT organizations must develop a plan for managing enterprise data in a complex relational database environment. So, how can companies implement the best methodology for managing this critical data throughout its life cycle?

Active Archiving is Key for Managing the Data Life Cycle
Active archiving is essential for managing the data life cycle efficiently, complying with data retention requirements, and reducing costs. Active archiving is the only way that rarely accessed data can be safely archived and removed from an online relational database and transitioned to other storage media, while retaining easy access to the archived data in its business context. However, before developing an active archiving strategy, an organization must first identify all the types of enterprise data to ensure a comprehensive understanding of what the data is and how it is used, and to identify the data retention and appropriate storage requirements. Typical corporate data includes all transactional data from enterprise business applications and the associated databases, such as payroll, customer information systems, and purchasing systems.

This analysis ensures the best mix of what data must remain online and what data should be archived to ensure a cost-effective balance throughout the data life cycle. This process also ensures that enterprise application databases are maintained at a manageable size that improves performance and availability of critical systems. The goal of effective data life cycle management is to keep historical data as long as required, but not any longer. Consider this approach "Just-in-Time" data accessibility. Data retained past its retention requirement can increase costs associated with storing and managing the data after it is no longer needed.

What is Active Archiving?
Going well beyond the traditional definition of archiving, active archiving is a proven technology that safely archives and removes precise subsets of rarely used data from complex relational databases with 100 percent accuracy. Companies can store archived data and keep it "active" for easy access when needed. The referential integrity and business context of the data are preserved. Users may even access and restore archived data selectively and referentially intact, eliminating the need to restore all archived data for the sake of just a few rows.
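The archive, remove, and selective restore steps described above can be illustrated with a minimal sketch. It is not Princeton Softech's implementation: the two-table schema (orders and order_items), the function names, and the date-based selection rule are all hypothetical, and Python's built-in sqlite3 module stands in for a production relational database.

```python
import sqlite3

# Hypothetical schema: each order owns its line items (parent/child).
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku TEXT,
    qty INTEGER
);
"""

def make_db():
    """Create an in-memory database with foreign key enforcement enabled."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db

def archive_orders(prod, arch, cutoff):
    """Move orders dated before `cutoff` (ISO date string), together with
    their line items, from `prod` to `arch`. Returns the number of orders moved."""
    ids = [row[0] for row in prod.execute(
        "SELECT id FROM orders WHERE order_date < ?", (cutoff,))]
    if not ids:
        return 0
    marks = ",".join("?" * len(ids))
    orders = prod.execute(
        f"SELECT id, customer, order_date FROM orders WHERE id IN ({marks})",
        ids).fetchall()
    items = prod.execute(
        f"SELECT id, order_id, sku, qty FROM order_items WHERE order_id IN ({marks})",
        ids).fetchall()
    with arch:  # copy parents first, then children, in one transaction
        arch.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
        arch.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", items)
    with prod:  # delete children first so no line item is left orphaned
        prod.execute(f"DELETE FROM order_items WHERE order_id IN ({marks})", ids)
        prod.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
    return len(ids)

def restore_order(prod, arch, order_id):
    """Copy one archived order and its line items back into `prod`."""
    order = arch.execute(
        "SELECT id, customer, order_date FROM orders WHERE id = ?",
        (order_id,)).fetchone()
    if order is None:
        raise KeyError(order_id)
    items = arch.execute(
        "SELECT id, order_id, sku, qty FROM order_items WHERE order_id = ?",
        (order_id,)).fetchall()
    with prod:
        prod.execute("INSERT INTO orders VALUES (?, ?, ?)", order)
        prod.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", items)

# Made-up sample data for illustration only.
prod, arch = make_db(), make_db()
with prod:
    prod.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (1, "Acme", "1999-03-01"),
        (2, "Acme", "2003-06-15"),
        (3, "Globex", "1998-11-20"),
    ])
    prod.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", [
        (10, 1, "A-1", 2), (11, 1, "B-7", 1), (12, 2, "A-1", 5), (13, 3, "C-3", 4),
    ])

moved = archive_orders(prod, arch, "2000-01-01")
print(moved)  # 2 orders (ids 1 and 3) leave the production database
restore_order(prod, arch, 3)  # bring back a single order, not the whole archive
```

In this sketch the parent row and its child rows always travel together, which is one simple way to keep an archived subset referentially intact. A real application database would involve many more related tables and business rules for selecting what to archive.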

These capabilities dramatically reduce database overload, allowing companies to reduce storage requirements, improve application response time and reallocate current capacity to support more users and transactions. Active archiving allows IT organizations to maximize the benefits of existing SANs, NAS and HSM storage solutions because active archiving complements these technologies, especially HSM systems, to enable a best-practice "staged" approach to managing historical relational data that can be an integral part of enterprise SRM.

Active Archiving and HSM
It's true that active archiving and HSM both address the problem of explosive data growth by moving data to more cost-effective storage devices. However, active archiving is designed for relational data, while HSM is best suited for other types of data such as document files, bit maps, and video clips. Although HSM is ideal for managing these types of data, the technology is poorly matched for managing relational database tables, which can be very large.

Active archiving handles relational data at the row or record level, while HSM handles relational data at the table or dataset level (a relational database table is physically stored as a file). HSM performs the migration function based on the last time a particular database table or file was accessed. It is likely that users may need to access a small part of the database at least once during the period when the storage system administrator has designated that the data be kept at the highest level. For this reason, it is most likely that the entire relational database file will continue to reside at the highest storage level.

For example, a customer database table will probably be accessed on a regular basis, keeping it at Level One. Only a subset of this data remains "hot" (that is, current customers). Even so, the entire dataset must be kept on the server because HSM cannot distinguish relational data at the row or record level.
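The difference between the two levels of granularity can be shown with a small sketch. The customer rows, the per-row last_access field, and the one-year idle threshold are hypothetical simplifications. An actual active archiving product would select rows by business rules rather than by a single access date.

```python
from datetime import date

def hsm_can_migrate(rows, today, idle_days):
    """File-level rule: the whole table migrates only if its most recent
    access, across all rows, is older than idle_days."""
    last_access = max(r["last_access"] for r in rows)
    return (today - last_access).days > idle_days

def rows_to_archive(rows, today, idle_days):
    """Row-level rule: select each row that has been idle longer than idle_days."""
    return [r for r in rows if (today - r["last_access"]).days > idle_days]

# Made-up customer table: one current customer, two long-inactive ones.
customers = [
    {"id": 1, "last_access": date(2004, 5, 30)},
    {"id": 2, "last_access": date(2001, 2, 11)},
    {"id": 3, "last_access": date(2000, 8, 3)},
]
today = date(2004, 6, 1)

print(hsm_can_migrate(customers, today, 365))                     # False
print([r["id"] for r in rows_to_archive(customers, today, 365)])  # [2, 3]
```

One recently accessed row is enough to keep the whole table at the highest storage level under the file-level rule, while the row-level rule still identifies the two inactive customers as candidates for archiving.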

Companies that have already deployed HSM will understand the benefits of "staged" data management. With active archiving, companies achieve similar benefits for relational data. Although HSM can migrate relational database tables up and down the HSM hierarchy, the size of the database does not change. In contrast, active archiving streamlines relational databases by archiving and removing referentially intact subsets of related data. This capability provides the best of both worlds, combining active archiving for the relational databases, while applying HSM rules to manage the archived data.

Selecting the Best Practice Archiving Methodology
Princeton Softech has customers that have safely archived and removed 65% of their database in just their first production archive and delete. This capability frees tremendous processing power to improve performance and availability and implement new applications, without upgrading capacity. In addition, a large amount of disk capacity is made available for other uses. Regularly scheduled active archiving continues to free significant disk space, saving millions in hardware and software upgrades. Because active archiving is an effective long-term solution to the problem of explosive database growth, it is critical to an enterprise data storage strategy.

A comprehensive enterprise active archiving methodology must provide the capability to archive data from a variety of relational databases and platforms. The ideal active archiving solution must also guarantee to retain the referential integrity and business context of the archived data and provide for easy access. In addition, there must be a capability for managing and storing archived data on the most cost-effective storage medium (online in an archive database, near-line on a file server, optical devices, or offline to tape). Integrating active archiving with SRM offers a best-practices approach to ensuring that data and storage resources are well managed throughout the data life cycle.
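As an illustration of matching archived data to a storage medium, the sketch below assigns a medium by the age of the data and flags data that has passed its retention period. The four media come from the list above. The age thresholds, the retention period, and the function name are hypothetical, and a real policy would be derived from an organization's own access patterns and retention requirements.

```python
# Hypothetical upper age limits in years for each medium, ordered from the
# most accessible medium to the least.
TIERS = [
    (2, "archive database (online)"),
    (5, "file server (near-line)"),
    (7, "optical device"),
]
FALLBACK = "tape (offline)"

def choose_medium(age_years, retention_years):
    """Return the storage medium for archived data of the given age, or
    flag the data once its retention period has been reached."""
    if age_years >= retention_years:
        return "eligible for disposal"
    for max_age, medium in TIERS:
        if age_years < max_age:
            return medium
    return FALLBACK

# Assumed 10-year retention requirement, for illustration only.
for age in (1, 3, 6, 9, 12):
    print(age, choose_medium(age, retention_years=10))
```

Keeping the thresholds in a single table makes the staged policy easy to review and change without touching the selection logic.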

Summary
Effective Storage Resource Management enables companies to reduce storage costs, improve data management, and keep data accessible throughout its life cycle. Along with the leading storage technologies, active archiving must be an integral part of any SRM initiative. Companies can remove rarely accessed historical data from overloaded databases and store it on the most cost-effective medium. The best and most comprehensive solution for meeting the challenges of the data explosion and managing data throughout its life cycle requires the overall view provided by SRM combined with the refined and proven approach provided by active archiving.