Data Management Software Market Overview

What is Data Management?
Software in the data management market falls into three categories, in order of increasing complexity: Device Control, Backup and HSM (Hierarchical Storage Management). The distinctions can become blurred, as the less complex products market themselves as HSM solutions.

Device control software provides basic access to storage devices. Operating systems without native support for certain peripherals require third-party software to allow the devices to operate. This market is largest for optical storage devices, which often have no operating system support because of the complexity required to make them work and their relatively small market size.

Backup software is aimed at data copying rather than device control. Often the backup software has no device control built in and relies on the storage peripherals being accessible via the operating system. A user who chooses a storage peripheral with no operating system support has to install a device control package before the backup software will work. This reliance on the operating system to control the devices also restricts the data formats available, sometimes forcing a significant compromise in performance or capability.

HSM is the most complex class of data management product. It encompasses device control software in some products, the data copying functions found in Backup, and additional data movement functions. Data movement, or migration, physically transfers data from one storage location to another, normally between different devices, so that rarely used data files are placed on cheaper storage devices, reducing the cost of storing data.

What are the technologies?
Because third-party device control software is often designed to support the most complex storage devices, the underlying technology that gives users transparent access is far more complicated than it appears.

Native File Systems are the least complex method: the operating system is simply pointed at the device and relies on its standard support. While this can work for mainstream storage devices, less popular device types can suffer from restricted function, unclear manufacturer support or inefficient protocols.

Vendor unique File Systems are written in answer to these operating system shortcomings. Most device control vendors have their own format, tested and supported for all modes of operation and often optimised for the device capabilities. The major drawback of these formats is that they are proprietary, so data interchange, upgrade protection and long term support can be an issue.

Filters are a compromise between operating system support and vendor unique file systems. Data being sent to the device passes directly through the filter, while device control commands such as robotic control are 'filtered' off through a third-party control program. This reduces the complexity of the software but provides more device control than the operating system alone is capable of. However, all the format problems discussed above still apply in this scheme.

Cheyenne, Filejockey, takes this approach. Time to market is fast, but any device without operating system drivers (such as 5.2GB drives in NT or WORM drives on most operating systems) will not work: control is there, but data cannot be read or written past the filter.

Device support can also be hard-coded in applications, such as LINDI in Lotus Notes. Purchase the application and no third-party software is required to drive the storage devices. Often the device control within applications has been purchased from one of the specialist third-party vendors.
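The routing idea behind the filter scheme can be illustrated with a small sketch. It is not taken from any vendor's product: the Command and Filter names, and the two log lists standing in for the operating system driver and the control program, are hypothetical and exist only to show which path each kind of command takes.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    kind: str      # "data" (read/write) or "control" (e.g. a robotic move)
    payload: str


@dataclass
class Filter:
    """Hypothetical filter: data passes through, control is diverted."""
    os_driver_log: list = field(default_factory=list)
    control_program_log: list = field(default_factory=list)

    def dispatch(self, cmd: Command) -> str:
        if cmd.kind == "control":
            # Device control commands are 'filtered' off to the
            # third-party control program.
            self.control_program_log.append(cmd.payload)
            return "control_program"
        # Data goes untouched to the operating system driver, so the
        # on-media format is whatever the operating system provides.
        self.os_driver_log.append(cmd.payload)
        return "os_driver"


f = Filter()
f.dispatch(Command("data", "write block 0"))
f.dispatch(Command("control", "load cartridge 3"))
```

Because the data path still ends at the operating system driver, a device with no driver gets control commands but no readable or writable data, which is the limitation described above.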

What is Backup?
Backup is essentially data copying. The complexity is centered on the flexible scheduling of the copy commands and on organizing which devices can be commanded to copy data off and which can receive the copied data.

Image backup simply takes all the data in one storage location and copies it to another; for example, a hard disk drive is copied exactly onto a tape. This lets users restore data from a snapshot taken at some point in the past in the event of a crash. This restores data and function to the user, but all changes made between the time the image backup was taken and the time of the crash are lost. Examples of this type of program are NT Backup, Seagate Backup Exec and Snapback.

Incremental backup is the next step up and fills the gap left by the first scheme. An image backup is taken at set times, and during the intervening periods copies of the files that have changed are made and added to the original image copy. In the event of a crash the user restores the original image and overlays it with the additional files copied as increments. This gives the user everything lost in the crash except those files changed since the last incremental backup, a much smaller loss than with an image-only restore, but one more class of files is still not copied. Cheyenne Arcserve, Legato Networker and Netvault use these schemes.
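The image-plus-increments restore can be sketched in a few lines. This is a minimal illustration, assuming a disk is modelled as a dictionary of file name to contents and that changes are detected by comparing contents; real products typically rely on timestamps or archive flags, and the function names here are hypothetical. Deleted files are not handled.

```python
def image_backup(disk: dict) -> dict:
    """Copy every file: a full snapshot of the disk."""
    return dict(disk)


def incremental_backup(disk: dict, last_state: dict) -> dict:
    """Copy only files that are new or changed since the last backup."""
    return {name: data for name, data in disk.items()
            if name not in last_state or last_state[name] != data}


def restore(image: dict, increments: list) -> dict:
    """Restore the image, then overlay each increment, oldest first."""
    restored = dict(image)
    for inc in increments:
        restored.update(inc)
    return restored


disk = {"a.txt": "v1", "b.txt": "v1"}
image = image_backup(disk)

disk["a.txt"] = "v2"                      # changed after the image
inc1 = incremental_backup(disk, image)

disk["b.txt"] = "v2"                      # changed after the increment
recovered = restore(image, [inc1])        # crash happens here
```

After the restore, a.txt comes back at its newer version from the increment, while the later change to b.txt is lost because it was made after the last incremental backup.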

Open file backup is the final step, allowing a backup program to copy files that are in use at the time the copy command is attempted. In a large database a backup program will encounter many files that are in use and therefore cannot be copied; without open file backup these files will not be copied at the same time as the database tables or indexes. On restore the index will not correspond to the data and the database will crash.

Once these three schemes are implemented, the next level is providing links between all the devices on a network that need to be backed up and all the devices that can be used to hold the backup copies.

Client backup is a local scheme in which a single user copies files onto a tape drive, Jaz drive or some other removable media device. The user normally initiates the backup and waits until it completes to remove the backup copy. It is a basic scheme that protects an individual from a crash.

Client/Server backup allows users to back up their data, and servers to be backed up, to central repositories. This scheme allows backups to be automated, timing the data copying for when resources are free and ensuring backups are done in a controlled manner. The key to controlling all the backup schemes on a client/server network is keeping track of the backup copies, reconciling linked files within the backup copies that may have to be copied at different times, and policing multiple files with the same name.
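One way to picture the tracking problem is a catalog that records every backup copy by client and path, so that files with the same name on different machines stay distinct. The sketch below is an assumption about how such a catalog could be organized, not a description of any product; BackupCatalog, its methods and the media labels are all hypothetical.

```python
from collections import defaultdict


class BackupCatalog:
    """Hypothetical catalog of backup copies on a client/server network."""

    def __init__(self):
        # (client, path) -> list of (backup_time, media_label), oldest first
        self._copies = defaultdict(list)

    def record(self, client, path, backup_time, media_label):
        """Note that a copy of client:path was written to a given medium."""
        entries = self._copies[(client, path)]
        entries.append((backup_time, media_label))
        entries.sort()

    def latest(self, client, path):
        """Return the most recent (backup_time, media_label), or None."""
        entries = self._copies.get((client, path))
        return entries[-1] if entries else None

    def clients_with(self, filename):
        """List clients holding a file with this name, whatever its folder."""
        def base(p):
            return p.replace("\\", "/").rsplit("/", 1)[-1]
        return sorted({c for (c, p) in self._copies if base(p) == filename})


catalog = BackupCatalog()
catalog.record("pc1", "C:/docs/report.doc", 1, "TAPE01")
catalog.record("pc2", "D:/work/report.doc", 2, "TAPE01")
catalog.record("pc1", "C:/docs/report.doc", 5, "TAPE02")
```

Keying on the client as well as the path is the design choice that keeps two users' report.doc files apart, while the per-file list of copies records which medium holds each version.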

What is HSM?
Hierarchical Storage Management adds Migration and Archiving to backup. HSM is used to remove unused data from expensive storage devices and place it onto cheaper storage media such as tape.

Migrating data removes it from the local device, freeing space for new data. Links, or hooks, remain on the local device pointing to the data's new location, so the user can still see the files. If the user attempts to access a file, the hook is used to request the migrated file, which is copied from its remote location back onto the user's workstation.

Archiving allows the system to migrate data physically offsite if necessary. The data is moved onto tape and the tape is removed to an offsite vault for disaster recovery. The HSM system preserves the hooks from the user's workstation to the HSM directory, so if a request is made for archived data, the system restores it from the archive if it is available online; otherwise the administrator is prompted to manually retrieve tapes that are physically inaccessible to the system.
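The migrate, recall and archive behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: storage tiers are plain dictionaries, the Stub class stands in for the link left on the local device, and the Hsm and OperatorRequired names are hypothetical. Whether a real product keeps the tape copy after a recall is not stated in the text; the sketch keeps it.

```python
from dataclasses import dataclass


@dataclass
class Stub:
    """Link left on the local device in place of migrated data."""
    location: str  # "tape" (online) or "vault" (offsite)


class OperatorRequired(Exception):
    """The tape is offsite and must be retrieved manually."""


class Hsm:
    def __init__(self, local):
        self.local = dict(local)  # path -> file data or Stub
        self.tape = {}            # cheaper storage, online
        self.vault = {}           # offsite, unreachable by the system

    def migrate(self, path):
        """Move data to tape and leave a stub, freeing local space."""
        data = self.local[path]
        if not isinstance(data, Stub):
            self.tape[path] = data
            self.local[path] = Stub("tape")

    def archive(self, path):
        """Move an already migrated file's tape copy offsite."""
        self.vault[path] = self.tape.pop(path)
        self.local[path] = Stub("vault")

    def read(self, path):
        """Return file data, recalling it through the stub if needed."""
        entry = self.local[path]
        if not isinstance(entry, Stub):
            return entry
        if entry.location == "vault":
            raise OperatorRequired(path)
        self.local[path] = self.tape[path]  # copy back to the workstation
        return self.local[path]


hsm = Hsm({"old.dat": "rarely used", "new.dat": "in use"})
hsm.migrate("old.dat")
recalled = hsm.read("old.dat")
```

The user-visible directory never changes: the path stays present in the local listing whether it holds data or a stub, which is what makes the migration transparent.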

Who are the players?

  • EMASS
  • Cheyenne
  • ADSM
  • Seagate
  • Legato

Where does Manager fit in?
