From The Editor | November 23, 2009

Maximum Data Reduction (PART TWO): Beyond Deduplication

By Dipesh Patel, Senior Product Marketing Manager, CommVault

Following on from the last post, maximum data reduction requires a three-part modular strategy: identify and categorize your data based on demonstrated usage; archive the "stale" portion; and deduplicate across both archive and backup data. This combined approach delivers greater benefits than either archiving or deduplication on its own.

What does this mean for you? Well, we'll start with a greatly simplified example and walk through the various permutations.

First, let's establish a baseline for comparison. Say you've got 10TB of data and need 30 days of full backups on hand to facilitate rapid recovery. In a typical setup that means 10TB on primary (Tier1) storage and 300TB on Tier2 (30 backups * 10TB per backup). At a cost of $10K per usable TB for production storage and $2K per usable TB for Tier2, you'd be spending about $700K for this level of protection/retention: $100K for Tier1 (10TB * $10K/TB) plus $600K for Tier2 (300TB * $2K/TB).
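If you want to rerun that arithmetic with your own figures, here's a minimal back-of-the-envelope sketch in Python. The quantities and prices are just the hypothetical ones from this example, and the variable names are mine for illustration, not anything from Simpana.

    # Baseline: 30 retained full backups of 10TB, no archiving or dedupe.
    primary_tb = 10          # active data on Tier1
    retained_fulls = 30      # number of full backup copies kept
    tier1_price = 10_000     # $ per usable TB, production storage
    tier2_price = 2_000      # $ per usable TB, backup storage

    backup_tb = primary_tb * retained_fulls            # 300 TB on Tier2
    total_cost = primary_tb * tier1_price + backup_tb * tier2_price
    print(f"Tier2 capacity: {backup_tb} TB, total cost: ${total_cost:,}")
    # -> Tier2 capacity: 300 TB, total cost: $700,000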

But we know that, in general, about 50% of data is stale, meaning it hasn't been accessed in over a year. If you archived the stale data, you'd move 5TB from Tier1 to Tier2 storage. You'd also end up with only 150TB of backups on Tier2, since you're now backing up just the 5TB of actively used data on Tier1. Not only do you save capacity across the 30 full backup jobs, you also no longer spend time and processing power backing up data that isn't actively being used. That leaves you with 5TB of active data on Tier1 and 155TB on Tier2 (5TB of archived stale data + 150TB of backup data).

In this scenario your costs drop to $360K: $50K for 5TB of Tier1 storage plus $310K for 155TB on Tier2. Of course, you've already paid for the 5TB freed up on Tier1, but the assigned cost to protect/retain the data has been almost halved. The freed capacity can absorb future data growth or hold archived data that needs to be brought back to the active storage tier (for instance, an old sales contract that has to be reviewed for a current renewal opportunity).
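Extending the same sketch with a stale-data fraction shows exactly where the savings come from. Again, the 50% figure is the rule of thumb quoted above, not a measured value; plug in whatever your own reporting tells you.

    # Same model, but ~50% of primary data is archived off to Tier2
    # and only the remaining active data gets backed up.
    primary_tb = 10
    retained_fulls = 30
    tier1_price = 10_000     # $ per usable TB
    tier2_price = 2_000      # $ per usable TB
    stale_fraction = 0.5     # rule of thumb: ~50% untouched for a year

    active_tb = primary_tb * (1 - stale_fraction)   # 5 TB stays on Tier1
    archived_tb = primary_tb * stale_fraction       # 5 TB moves to Tier2
    backup_tb = active_tb * retained_fulls          # 150 TB of backups
    tier2_tb = archived_tb + backup_tb              # 155 TB total on Tier2

    total_cost = active_tb * tier1_price + tier2_tb * tier2_price
    print(f"Tier2 capacity: {tier2_tb:.0f} TB, total cost: ${total_cost:,.0f}")
    # -> Tier2 capacity: 155 TB, total cost: $360,000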

So even though we haven't deduplicated anything yet, we've already halved the overall Tier1/Tier2 costs PLUS effectively freed up half the capacity on our most expensive (Tier1) storage. Not bad for such a "simple" trick.

Why haven't more customers taken the reporting/resource management/archiving route? Good question. I would love to hear why you think this approach hasn't gained as much buzz and mindshare as deduplication. My personal opinion is that most customers ignored the issue as long as hardware kept getting cheaper, an approach that no longer cuts it as data growth keeps accelerating. Secondly, I think it was simply too complicated, not to mention expensive, to cleanly implement both the resource/reporting pieces and the archive technology stacks. With Simpana software, it's all one platform, which makes it a lot easier for our customers to do both quickly, cleanly, and in a coordinated fashion.

In the next post, I'll conclude this three-part blog series by discussing the role that deduplication plays in maximizing data reduction.

SOURCE: CommVault Systems