Thursday, December 24, 2009

Secure Storage of cached and de-duplication data

WAN optimization devices store data (content) on their own hard drives to serve future requests. Some of this data can be confidential. The more places data is stored in the clear, the more opportunities there are for it to leak. In the case of a WAN optimization device, if the device is stolen, thieves can easily retrieve confidential data. To protect the privacy of the data, the devices need to store the content in encrypted form. One must ensure, however, that DRE (Data Redundancy Elimination) efficiency does not go down when encryption is applied. As you know, encryption algorithms used in chained modes such as CBC, much like compression algorithms, create dependencies across the data in a file. That is, some portion of the previously encrypted block is used to encrypt the next block. This reduces de-duplication efficiency dramatically: every time a user application modifies the file, the encrypted content changes from the point of the modification onward, even though the change made to the clear file was small.
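
To make the chaining problem concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `toy_block_encrypt` is a hypothetical stand-in for a real block cipher such as AES (it is a keyed hash, so it cannot be decrypted), and the key, IV and block size are made up for the demonstration. It encrypts a file in a CBC-like chain, flips one byte of clear data, and lists which encrypted blocks differ.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (assumption for this demo)

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for a real block cipher: a keyed pseudo-random
    # function. It is NOT invertible; it only illustrates the chaining.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def cbc_like_encrypt(key: bytes, iv: bytes, data: bytes) -> list[bytes]:
    # Each clear block is XORed with the previous encrypted block
    # (the IV for the first block) before it is encrypted.
    assert len(data) % BLOCK == 0
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return out

def changed_blocks(old: list[bytes], new: list[bytes]) -> list[int]:
    # Indexes of encrypted blocks that differ between the two versions.
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]

key, iv = b"demo-key", bytes(BLOCK)
clear_old = bytes(range(128))        # 8 blocks of clear data
clear_new = bytearray(clear_old)
clear_new[2 * BLOCK] ^= 0xFF         # one-byte edit inside block 2

diff = changed_blocks(cbc_like_encrypt(key, iv, clear_old),
                      cbc_like_encrypt(key, iv, bytes(clear_new)))
print(diff)
```

With this toy setup the one-byte edit in block 2 changes every encrypted block from 2 through 7, so a de-duplication engine working on the encrypted data finds nothing to reuse after the edit point.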

The 'rsyncrypto' utility encrypts a file in such a way that a small change in the clear file causes only a small, local change in the encrypted file. Typically, the IV (Initialization Vector) for a block is taken from the previous encrypted block and used along with the key to encrypt the new block. 'rsyncrypto' breaks this chain at points within the file by going back to the file's initial IV instead of using the previous block, so a change affects only the encrypted blocks up to the next such point. Though this reduces security somewhat, it still provides enough security for this purpose.
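
The idea of limiting the dependency between encrypted blocks can be sketched as follows. This is not rsyncrypto's actual algorithm, only a simplified illustration: the chain is restarted from the IV every `SEGMENT` blocks (a fixed interval chosen for the demo), and `toy_block_encrypt` is a hypothetical, non-invertible stand-in for a real block cipher.

```python
import hashlib
import hmac

BLOCK = 16    # block size in bytes (assumption for this demo)
SEGMENT = 4   # hypothetical: restart the chain every 4 blocks

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for a real block cipher (keyed hash, not invertible).
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def segmented_encrypt(key: bytes, iv: bytes, data: bytes) -> list[bytes]:
    # CBC-like chaining, except that the chain goes back to the IV at the
    # start of every segment, so later segments do not depend on earlier ones.
    assert len(data) % BLOCK == 0
    prev, out = iv, []
    for n, i in enumerate(range(0, len(data), BLOCK)):
        if n % SEGMENT == 0:
            prev = iv                # break the dependency on earlier blocks
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return out

key, iv = b"demo-key", bytes(BLOCK)
clear_old = bytes(range(256))        # 16 blocks of clear data
clear_new = bytearray(clear_old)
clear_new[5 * BLOCK] ^= 0xFF         # one-byte edit inside block 5

old = segmented_encrypt(key, iv, clear_old)
new = segmented_encrypt(key, iv, bytes(clear_new))
changed = [i for i in range(len(old)) if old[i] != new[i]]
print(changed)
```

Here only blocks 5, 6 and 7 change; block 8 starts a new segment and encrypts exactly as before. The price is that identical clear segments encrypt to identical data, which is the kind of security trade-off mentioned above.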

The backup market can certainly use this feature to secure data at rest while maintaining de-duplication efficiency. It is particularly useful when external backup storage providers are used to back up the data. The users must have control over the keys used to encrypt the files, and the backup storage providers must not have access to these keys at any time, including while applying delta changes. This requirement mandates that the delta is computed between the old and new encrypted files. So the 'rsyncrypto' utility is really useful here. When the data needs to be retrieved from the backup storage provider, the keys known to the user are used to decrypt the files.
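
A rough sketch of how a delta between the old and new encrypted files could be computed and applied without any keys is given below. It is a simplification: real tools such as rsync use rolling checksums so that they also cope with inserted data, while this version only compares fixed-size blocks. The function names and the block size are made up for illustration.

```python
import hashlib

def block_digests(data: bytes, block_size: int) -> list[bytes]:
    # Digests of the encrypted blocks the storage provider already holds.
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def make_delta(old_digests: list[bytes], new_data: bytes,
               block_size: int) -> dict[int, bytes]:
    # User side: keep only the encrypted blocks whose digest differs
    # from what the provider already has.
    delta = {}
    for n, i in enumerate(range(0, len(new_data), block_size)):
        block = new_data[i:i + block_size]
        if n >= len(old_digests) or hashlib.sha256(block).digest() != old_digests[n]:
            delta[n] = block
    return delta

def apply_delta(old_data: bytes, delta: dict[int, bytes],
                new_len: int, block_size: int) -> bytes:
    # Provider side: works purely on encrypted bytes, no key is involved.
    blocks = [old_data[i:i + block_size] for i in range(0, new_len, block_size)]
    for n, block in delta.items():
        blocks[n] = block
    return b"".join(blocks)[:new_len]

BLOCK_SIZE = 4096
old_enc = hashlib.sha256(b"stand-in for encrypted data").digest() * 512  # 16 KiB
new_enc = bytearray(old_enc)
new_enc[5000] ^= 0xFF                # a local change, inside block 1

delta = make_delta(block_digests(old_enc, BLOCK_SIZE), bytes(new_enc), BLOCK_SIZE)
rebuilt = apply_delta(old_enc, delta, len(new_enc), BLOCK_SIZE)
print(sorted(delta), rebuilt == bytes(new_enc))
```

Only one block out of four has to be sent to the provider in this example. That saving is possible only because the change in the encrypted file stayed local, which is the property 'rsyncrypto' is meant to provide.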

I am not sure whether this technique is applicable to the WAN de-duplication market. WAN optimization devices need to serve content locally without downloading the data from the central WAN optimization device. That is, every WAN optimization device must be able to get hold of the clear data. Hence, I feel that WAN optimization devices would use 'crypto file systems'. These file systems transparently encrypt all files in the file system, and the WAN optimization feature in the device needs no knowledge of this. This kind of secure storage appears to be fine, as these WAN devices are typically administered by the same entity that serves the confidential data.
