dcVAST Practice Brief - Data De-Duplication

 

The Business Situation

The volume of data that corporations have to deal with today is increasing at exponential rates. The causes of the rapid increase in volume are many, but a few key factors are:

  • Transaction Processing Applications, with large amounts of collected data
  • Use of email communications, often with large attachments
  • Viral marketing techniques where many files get shared
  • The addition of pictures and videos in emails
  • Collaborative work processes where file sharing is usually accomplished via replicated files
  • Government and industry regulations that require data to be retained for longer periods of time

This larger volume of data needs to be stored, properly managed, and protected, both with backup and recovery techniques and with replication to facilitate disaster recovery efforts. To accommodate this growing requirement, companies are investing in disk-based archiving systems or large tape libraries. Replicating this data to support disaster recovery requirements is also becoming a challenge. If online replication is the desired solution, the bandwidth required to support data transfer is huge, growing, and expensive. If tape-based disaster recovery is the solution, the logistics and expense of duplicating, relocating, and cataloging the tapes are almost unmanageable.

The IT Challenge

The growing volume of data has significantly affected internal IT staff. Budgets strain to keep up with the constant need for additional disks, and the ever-increasing demand on IT staff to manage growing storage volumes has hindered the IT department’s ability to deal with other changes in IT infrastructure and IT support. To keep backups within the allowable window of time, most companies have moved from Disk-to-Tape to Disk-to-Disk backup methods; tape is now used only when offline copies are required. Backup processes need to be monitored to ensure that backups run completely and accurately the first time, since there is no time to repeat them. Consideration must also be given to the storage network so that it can support faster backup processes. With all of these considerations, the challenge remains: how to back up an increasing volume of corporate data in a shrinking or quickly vanishing backup window.

New capabilities now exist to eliminate the redundant data created. This capability is De-Duplication: the ability to eliminate redundant data from backups, which significantly reduces the amount of data that needs to be backed up. It also reduces the amount of data that needs to be transmitted over the network for copies supporting disaster recovery operations.

The dcVAST Solution

Data De-Duplication is the most impactful recent development in managing ever-growing volumes of data, and it can now be delivered in a cost-effective and useful manner. In a Data De-Duplication program, the data is broken into small blocks, and each block is analyzed and assigned a unique fingerprint. The fingerprints for all blocks of backup data are maintained in a specialized database. Each time a fingerprint is calculated for a block of data, the database is checked to see whether that block already exists. If it is found, a reference to the existing data is kept and the redundant data itself is discarded. This process often results in a 10-20x reduction in the amount of data that must be backed up. The reduction also translates into savings in disk space for maintaining the backup, as well as savings in the network bandwidth needed to replicate the data to a disaster recovery location.
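
The fingerprint-and-lookup process described above can be illustrated with a short Python sketch. It is a simplified model, not the method used by any particular product: the fixed 4 KB block size, the SHA-256 fingerprint, the in-memory dictionary standing in for the fingerprint database, and the names DedupStore, backup, and restore are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


class DedupStore:
    """Toy block store: keeps one copy of each unique block."""

    def __init__(self):
        # Stand-in for the fingerprint database: fingerprint -> block bytes
        self.blocks = {}

    def backup(self, data):
        """Store data and return its recipe (ordered list of fingerprints)."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:
                # New block: keep the data itself
                self.blocks[fingerprint] = block
            # Known or new, the backup only records a reference
            recipe.append(fingerprint)
        return recipe

    def restore(self, recipe):
        """Rebuild the original data from its list of references."""
        return b"".join(self.blocks[fp] for fp in recipe)

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


# Two nightly backups that share most of their content
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096

store = DedupStore()
monday_recipe = store.backup(monday)
tuesday_recipe = store.backup(tuesday)

logical_bytes = len(monday) + len(tuesday)
reduction = logical_bytes / store.stored_bytes()
print(f"logical={logical_bytes} stored={store.stored_bytes()} reduction={reduction:.1f}x")
```

In this toy case six blocks are backed up but only three unique blocks are kept, so the store holds half of the logical data. The reduction seen in practice depends on how much of the data repeats from one backup to the next.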

The Data De-Duplication operation can be applied flexibly: at the client or data source, at the backup media server, or at the backup target. If client De-Duplication is chosen, savings in LAN bandwidth can also be realized. The load that De-Duplication places on the client or media server is normally considered to be 10-15% of the server’s computing capacity. In all cases, the data is reconstituted if it must be written to tape as part of the long-term Backup and Recovery process.

dcVAST engineers can assist you at each of the following stages:

Assess: dcVAST will work with your organization to understand the impact that adding Data De-Duplication will have on your backup operation. dcVAST takes the time to understand actual usage trends, sources of growth, and rates of data change. dcVAST also takes the time to understand the needs of the business as they relate to your disaster recovery solution, and the impact of Data De-Duplication on that requirement and on the business as a whole.

Design: The assessment is then used to forecast future demand and identify the "right" improvements. dcVAST architects have a broad range of technologies to choose from to deliver solutions based on industry best practices. These technologies include disk-based and appliance-based De-Duplication options.

Proof-of-Concept: dcVAST can assist in carrying out a proof-of-concept step that determines how applicable De-Duplication is to your environment.

Build: dcVAST Customer Engineers are available to implement the solution or to work with your personnel and assist in the implementation.

Support: dcVAST will support your personnel and all of the technologies involved in the chosen De-Duplication solution. We will keep this solution current and operational.

Manage: dcVAST will fully manage the operation of your backup infrastructure, including the De-Duplication function, at your location or off-site based on your preference.

Now more than ever, companies need to take aggressive action to fine-tune their backup and recovery operations. dcVAST understands the implications of introducing Data De-Duplication into your backup and recovery system and will work with your organization to develop the right solution. After all, there is no such thing as "one size fits all" when it comes to your data.

 

 

 
 
