From The Editor | December 18, 2009

Bigger Is Not Always Better

By Dipesh Patel, Senior Product Marketing Manager, CommVault

Whenever EMC/Data Domain rolls out new models of its storage appliances, folks tend to ask me what that means for CommVault.

Well, regardless of the offering, the answer so far has been the same: new EMC/Data Domain models don't change the fundamental differences between traditional deduplication and CommVault's approach to it.

All of the dedupe appliance vendors continue to update their models to respond to rapid data growth. That's to be expected, for two reasons. First, a data growth rate of even 25% per year means the total amount of data that needs to be backed up doubles in roughly three years. Second, when you factor in ever-tighter SLAs that demand more and more backup (versus retention) data be kept on disk, it's easy to see why every appliance vendor needs to scale to keep up with bigger workloads.

(For example: if you have a 10TB dataset and back it up 5 days every week, that's 50TB of backup data, stored on 5TB of usable capacity given a 10:1 dedupe ratio. Now if that production dataset doubles over time to 20TB, and you decide to keep 20 backups on disk instead of the original 5, that's 400TB of backup data. Deduplicated 10:1, it now requires 40TB. That 8X increase is driven by both the growth of the original dataset and the extension of your restore-from-disk window from 5 days to 20 days.)
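To make that arithmetic easy to check, here is a minimal Python sketch of the example above. The dataset sizes, retention counts, and 10:1 dedupe ratio are the assumptions from the example itself, not measurements from any particular appliance; the last line simply verifies the 25%-growth doubling claim.

```python
import math

# Illustrative arithmetic only: sizes, retention counts, and the 10:1
# dedupe ratio are the assumptions from the example above.
def usable_capacity_tb(dataset_tb, backups_kept, dedupe_ratio):
    """Disk needed to hold `backups_kept` full backups after deduplication."""
    return dataset_tb * backups_kept / dedupe_ratio

before = usable_capacity_tb(dataset_tb=10, backups_kept=5, dedupe_ratio=10)
after = usable_capacity_tb(dataset_tb=20, backups_kept=20, dedupe_ratio=10)
print(before, after, after / before)  # 5.0 40.0 8.0 -> the 8X increase

# Sanity check on the growth claim: at 25% annual growth, data doubles
# in log(2)/log(1.25), about 3.1 years.
print(math.log(2) / math.log(1.25))
```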

So the boxes may have become bigger to handle the increased capacity requirements, but not necessarily better in the ways that matter for scaling performance. Granted, each new model is almost always an improvement on its predecessors in capacity or performance, but what about improving performance outside the dedupe appliance? Don't count on it. Bigger and better dedupe appliances improve performance neither before the backup data hits the appliance nor afterwards, as you move data to JBOD or to tape.

Touting improvements at the very end of the backup/archive process is fine, but it still doesn't address how to reduce data end-to-end. First, a bigger appliance at the end of the chain does nothing to relieve the backup/archive network, the media agents/servers, or the speed of the backup processes running on the data-source servers.

Second, as I mentioned in the previous blog series, you can actually reduce data (and thereby avoid performance bottlenecks) before it ever requires an expensive appliance upgrade. Accurate reporting, then archiving for space management, and then deduplicating across both backup and archive data often delivers better data reduction. That in turn minimizes the need for ever-bigger backup storage capacity. And there is no proprietary hardware involved.
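As a rough illustration of that claim, here is a hedged back-of-the-envelope model. The 40% archived fraction and 10:1 dedupe ratio below are hypothetical placeholders, not CommVault figures or benchmarks; the point is only that pulling stale data out of the backup set before deduplication shrinks what the backup target has to hold.

```python
# Back-of-the-envelope model; the 40% archived fraction and 10:1 dedupe
# ratio are hypothetical placeholders, not vendor benchmarks.
def end_to_end_tb(dataset_tb, archived_fraction, backups_kept, dedupe_ratio):
    """Capacity needed when stale data is archived out of the backup set,
    then backups and the archive are deduplicated together."""
    active_tb = dataset_tb * (1 - archived_fraction)  # data left in each backup
    archive_tb = dataset_tb * archived_fraction       # archived copy, kept once
    return (active_tb * backups_kept + archive_tb) / dedupe_ratio

# Appliance-only: every backup carries the full 20TB dataset.
print(end_to_end_tb(20, archived_fraction=0.0, backups_kept=20, dedupe_ratio=10))  # 40.0
# Archive 40% of stale data first, then dedupe backups and archive together.
print(end_to_end_tb(20, archived_fraction=0.4, backups_kept=20, dedupe_ratio=10))  # 24.8
```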

We can take this a couple of steps further. Throw in data compression and incremental backups on the front-end source servers, and you've dramatically reduced the processing and network loads right from the get-go. And here's another advantage of a software-based approach: you can enable compression and incremental backups, encrypt, and still dedupe the data, since it's all part of a broader, robust backup/archive data management flow.
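In the same spirit, a quick sketch of the front-end effect. The 5% daily change rate and 2:1 compression ratio are assumed figures for illustration, not product specifications; they just show how incrementals and source-side compression cut the data that ever crosses the network.

```python
# Illustrative only: the 5% daily change rate and 2:1 compression ratio
# are assumed figures, not product specifications.
def weekly_network_tb(dataset_tb, daily_change_rate, compression_ratio,
                      incrementals=True):
    """Data crossing the network per 5-day backup week, compressed at the source."""
    if incrementals:
        # One weekly full plus four incrementals carrying only changed data.
        moved = dataset_tb + 4 * dataset_tb * daily_change_rate
    else:
        moved = 5 * dataset_tb  # a full backup every day
    return moved / compression_ratio

print(weekly_network_tb(20, 0.05, 1.0, incrementals=False))  # 100.0 TB moved
print(weekly_network_tb(20, 0.05, 2.0, incrementals=True))   # 12.0 TB moved
```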

So what's the "moral" of this post? Pretty simple: BIGGER boxes are only bigger, not always better, AND they're probably not the BEST choice when it comes to delivering better end-to-end performance.

SOURCE: CommVault Systems