If your cloud provider can't guarantee the availability, confidentiality and integrity of your data, this means trouble. At least that's the view of Andy Dancer, CTO, Encryption, of California-based security company Trend Micro.
Speaking at the Infosecurity Europe conference in London in April, Dancer said that cloud providers like Amazon and Google solve the availability issue in a way that can compromise confidentiality. Amazon's cloud storage makes multiple copies of your data and spreads them across data centers within an "availability zone" to provide high availability, while Google spreads your data across distributed file systems on multiple devices, he said.
Amazon's recent high-profile AWS crash is a good example of what can happen if you don't architect your cloud presence correctly.
"Highly available cloud data is highly dangerous because multiple copies of your data creates multiple headaches," said Dancer. One of the key problems is that when you delete files you no longer want, the effective destruction of remnant data becomes next to impossible. "When you delete your data, is the link to it removed, or is it overwritten with 1s and 0s?" asked Dancer. "You don't know, and this loss of control increases the risk of a data breach."
Ensuring the integrity of your data, so that you get back the same data you stored in the cloud, can be tackled in a number of ways, said Dancer. Creating and storing hashes for chunks of data can tell you when a chunk has changed, perhaps due to an I/O error, but not how it has changed or how to repair it. For that you need redundancy, such as checksums and parity data, but this can significantly increase the volume of data stored. In any case, neither of these measures protects against malicious modification of your data, because a hacker could simply recompute the hash or checksum to match the modified data.
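The limitation Dancer describes is easy to demonstrate in a few lines of code: a plain hash detects accidental corruption, but an attacker who can modify the data can simply recompute the matching hash. A minimal sketch using Python's standard hashlib (the chunk contents here are purely illustrative):

```python
import hashlib

def store(chunk: bytes) -> tuple[bytes, str]:
    # Store the chunk together with its SHA-256 hash.
    return chunk, hashlib.sha256(chunk).hexdigest()

def verify(chunk: bytes, stored_hash: str) -> bool:
    # A mismatch means the chunk no longer matches what was stored.
    return hashlib.sha256(chunk).hexdigest() == stored_hash

chunk, digest = store(b"quarterly sales figures")
assert verify(chunk, digest)

# An I/O error flips some bytes: the mismatch is detected...
assert not verify(b"quarterly sales figuras", digest)

# ...but an attacker who tampers with the data can recompute the
# hash to match, and verification passes as if nothing happened.
tampered, forged_digest = store(b"falsified sales figures")
assert verify(tampered, forged_digest)
```

Closing that gap requires keeping something out of the attacker's reach, which is why Dancer's advice later in the article stresses keeping encryption keys separate from the data.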
The obvious solution to this integrity problem -- and one which also provides confidentiality -- is to encrypt any data stored in the cloud. This makes it far harder for your data to be maliciously modified without detection, deters curious administrators or hackers from prying into your data, and reduces the risk that cloud storage devices could be sold or reused while they still contain confidential company information, he said. Make sure encryption keys are kept secure and separate from the data. "Encrypting a volume won't stop a hacker if the encryption key is also easily available in the cloud."
Seattle-based Symform provides a cloud storage service, and therefore faces the challenge of providing availability, confidentiality and integrity outlined by Dancer. But its cloud storage system works in a completely different way to Amazon's or Google's, so the company addressed this challenge in a very unusual way.
Symform actually offers a kind of storage exchange: customers store their data in a storage cloud made up of other Symform customers' storage resources, and in return they must contribute a similar amount of disk space to Symform's storage cloud. The service was designed in response to the fact that while a one-terabyte (1TB) hard drive can be bought for under $100, typical cloud services charge as much as $1000 per month to store 1TB, said Praerit Garg, Symform's president and co-founder.
When a customer has data to upload to the cloud (or, more precisely, to storage resources provided by other Symform customers), the data is first broken into 64 megabyte (MB) chunks. To provide confidentiality, each chunk is then passed through a hashing function, and the resulting digest is used as a unique 256-bit key to encrypt that chunk with the AES-256 cipher. Symform stores the key for each chunk in a geo-distributed database the company calls Cloud Control. For additional confidentiality, customers can also pre-encrypt each chunk and manage the key for that round of encryption themselves, protecting against the possibility that the keys in Cloud Control are compromised.
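The scheme Garg describes resembles what is commonly called convergent encryption: because each key is derived from the chunk's own content, identical chunks always yield identical keys. A minimal sketch of the chunking and key-derivation steps using hashlib -- the function names and the 64-byte demo chunk size are illustrative assumptions, since Symform's actual implementation is not public, and the AES-256 encryption step itself is only indicated in a comment:

```python
import hashlib

CHUNK_SIZE = 64  # Symform uses 64MB chunks; 64 bytes here for the demo

def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    # Break the upload into fixed-size chunks.
    return [data[i:i + size] for i in range(0, len(data), size)]

def derive_key(chunk: bytes) -> bytes:
    # Hash the chunk to obtain a 256-bit (32-byte) key unique to its
    # content. The chunk would then be encrypted with AES-256 under
    # this key, and the key recorded in the Cloud Control database.
    return hashlib.sha256(chunk).digest()

data = bytes(range(200))
keys = [derive_key(c) for c in split_chunks(data)]

assert all(len(k) == 32 for k in keys)   # 256-bit keys
assert derive_key(data[:64]) == keys[0]  # same content, same key
```

A side effect of content-derived keys is that the same chunk stored twice produces the same ciphertext, while a customer-managed pre-encryption pass, as the article notes, keeps even that from revealing anything if Cloud Control's keys leak.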
Each encrypted 64MB chunk is then divided into 64 1MB fragments, and 32 1MB parity fragments are generated from them using a RAID-style algorithm. Although this increases the total amount of data to be stored by half, it introduces redundancy: any 64 of the resulting 96 fragments can be used to reconstruct the entire chunk. Each fragment is then randomly distributed to a different customer's storage resources, and the location of each chunk's 96 fragments is recorded in Cloud Control. "That means that 33 different devices would have to go down at 33 different customer sites before you would lose a block," said Garg.
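The numbers Garg quotes follow directly from the fragment scheme: with 64 data fragments and 32 parity fragments, the storage overhead is 96/64 = 1.5x, and up to 32 fragments can be lost before a chunk becomes unrecoverable, so the 33rd loss is the fatal one. A quick arithmetic check:

```python
DATA_FRAGMENTS = 64    # 1MB data fragments per 64MB chunk
PARITY_FRAGMENTS = 32  # 1MB parity fragments added by the erasure code

total = DATA_FRAGMENTS + PARITY_FRAGMENTS
overhead = total / DATA_FRAGMENTS          # storage blow-up factor
tolerated_losses = total - DATA_FRAGMENTS  # any 64 fragments suffice

assert total == 96
assert overhead == 1.5                     # matches the 1.5x contribution
assert tolerated_losses == 32              # the 33rd loss is fatal
```

The same 1.5x factor reappears in Symform's pricing, which requires customers to contribute 1.5 times the volume of data they store.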
Symform's Cloud Control also monitors the availability of each fragment. If for any reason a fragment disappears then it can be regenerated by Symform Cloud Control and stored elsewhere. This provides a high level of availability.
That leaves integrity, and this is provided by storing a hash of each fragment both with the fragment itself and in Cloud Control. If a fragment becomes corrupted, this can be detected automatically by Cloud Control, or by software running at the customer site, by comparing the fragment with its hash. The corrupted fragment can then be regenerated and stored either at its original location or a new one.
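That detect-and-repair loop can be sketched as follows. The structure is an assumption based on the article's description -- a fragment is checked against the hash recorded at storage time, and a mismatch marks it for regeneration from the chunk's other fragments (the regeneration itself, which would use the parity data, is left as a comment):

```python
import hashlib

def fingerprint(fragment: bytes) -> str:
    return hashlib.sha256(fragment).hexdigest()

def check_fragment(fragment: bytes, recorded_hash: str) -> bool:
    # Cloud Control, or software at the customer site, compares the
    # fragment against the hash recorded when it was stored.
    return fingerprint(fragment) == recorded_hash

good = b"fragment payload"
recorded = fingerprint(good)

assert check_fragment(good, recorded)  # intact fragment passes
assert not check_fragment(b"fragment paylaod", recorded)  # corruption caught
# A failed check would trigger regeneration from the chunk's remaining
# fragments and re-storage at the same or a new location.
```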
An obvious worry
Even if data is secured by encryption in the cloud, an obvious worry is that the data a customer stores on their own resources for other Symform customers may contain malicious code that could infect the corporate network. Because of that, customers isolate the storage they make available to others on a server placed in a DMZ on the network, Garg said.
"Also, the service that receives fragments is in a low privilege account with only access to the fragment folder, so malicious code could only damage that folder," he said.
Symform's approach to providing cloud storage with availability, confidentiality and integrity is certainly unorthodox, but the company claims its service is attractive because it complies with HIPAA, Sarbanes-Oxley, Gramm-Leach-Bliley and other regulations. It is also undergoing the SAS-70 certification process, which it anticipates completing in the next few months.
But the biggest attraction of its unusual architecture may well be its pricing. A company with two servers and up to 50 PCs and laptops to back up would pay $1800 per year for an unlimited amount of data storage -- subject to contributing 1.5x the volume of its data to Symform's cloud. Based on 250GB per server and 25GB per end-user machine, Symform said this is around half the cost of Amazon's S3 storage and less than 20 percent of the cost of cloud backup services such as MozyPro or Carbonite Pro.
Paul Rubens has written about business IT as a staff and freelance journalist for over twenty years. In that time he has written for leading UK and international publications including The Economist, The Times, Financial Times, the BBC, Computing and ServerWatch.