When warning about the risks of website attacks like SQL injection and remote file inclusion, we often talk about how these breaches can reveal “sensitive data.” What kind of sensitive data? Well, lots of kinds, such as personal data about account holders or privileged information like internal business documents. But the kind of sensitive data most often targeted in website attacks is user account credentials. In short: passwords.
Hashing It Out
Surely any organization worth its salt wouldn’t store user passwords on its servers in unencrypted form. If only that were true for the 450,000 users whose Yahoo Voices credentials were lifted earlier this year. In this hack, SQL injection was used to dump the password list from a database table.
This particular breach owes its success to three critical security failures at Yahoo:
- Being vulnerable to an SQLi attack in the first place. Thoroughly sanitizing user input, or better yet passing it to the database only through parameterized queries, might have prevented this.
- Storing passwords on the server in plaintext format – an invitation for trouble.
- Storing passwords on the server at all.
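To make the first failure concrete, here is a minimal sketch that contrasts a query built by string concatenation with a parameterized one, using Python's built-in sqlite3 module. The table, column names and sample rows are hypothetical and are not meant to reflect Yahoo's actual system.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory table standing in for a real user database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password_hash TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "hash-a"), ("bob", "hash-b")],
    )
    return conn

def lookup_unsafe(conn, name):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = "SELECT name, password_hash FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_safe(conn, name):
    # Safer: the value travels separately from the SQL text via a placeholder.
    return conn.execute(
        "SELECT name, password_hash FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = make_db()
attack = "x' OR '1'='1"
leaked = lookup_unsafe(conn, attack)  # the injected OR clause matches every row
blocked = lookup_safe(conn, attack)   # treated as a literal name, matches nothing
```

In the unsafe version the attacker's input rewrites the query itself, which is how a whole table can be dumped in one request.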
Although we talk about “password encryption,” in practice it is never secure to store user passwords in any recoverable form, be it plaintext or encrypted by a fancy algorithm. When a user loses a password, your website should not be able to retrieve it, simply because it does not exist in retrievable form.
When an account password is created, either by the user or the system, the secure practice is to store on the server only a hash of the password. A hash is a value calculated from the content of the password, which can only be produced in one direction. A hash can be generated from a password, but a password cannot be generated from a hash.
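As a rough sketch of that one-way idea, the snippet below stores only a hash and checks a login attempt by hashing the candidate and comparing the results. SHA-256 and the sample password are assumptions chosen for illustration, and a bare hash like this is only the starting point.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    # One-way: easy to compute from the password, no function to reverse it.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def verify(candidate: str, stored_hash: str) -> bool:
    # Login check: hash the attempt and compare it to what the server stored.
    return hmac.compare_digest(hash_password(candidate), stored_hash)

# Only this value would be written to the database, never the password itself.
stored = hash_password("correct horse")
```

Notice that the server never needs the original password again: a lost password is reset, not retrieved.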
That’s not the end of the story, however, because hackers know that not all hashes are created equal.
Obfuscation Is Not Security
Suppose a hacker has managed to lift a table of password hashes from your database. It is tempting to think “no harm, no foul” because the hashes cannot be reverse engineered into the passwords themselves. Unfortunately, the hacker does not need to reverse anything.
Many password security systems are designed to use either the MD5 or SHA-1 algorithm to calculate the hashes. A hacker wanting to compromise these hashes has a couple of options:
Brute force attack. In a brute force attack, the hacker uses code to crack the passwords by trial and error. For example, if the hacker has identified that the hashes are generated using SHA-1, the hacker runs a series of candidate passwords through an SHA-1 hash and compares the results to the stolen data. If there is a match, the hacker has just figured out one password.
Brute force attacks can rely on “dictionary” files to generate candidate passwords, since most users build their passwords using common words. Combining these words with symbols and numbers can improve the success of a brute force attack. Long and complex passwords are more difficult to expose by brute force and may require more computational power (and patience) than the hacker may want to use.
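The trial-and-error loop described above can be sketched in a few lines. The word list, suffixes and “stolen” hashes here are all made up for the example; a real attack would use far larger dictionaries.

```python
import hashlib
from itertools import product

def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def dictionary_attack(stolen_hashes, words, suffixes):
    """Return {hash: password} for every stolen hash a candidate matches."""
    stolen = set(stolen_hashes)
    cracked = {}
    # Combine each dictionary word with each number/symbol suffix.
    for word, suffix in product(words, suffixes):
        candidate = word + suffix
        digest = sha1_hex(candidate)
        if digest in stolen:
            cracked[digest] = candidate
    return cracked

# Hypothetical stolen data: two weak passwords and one long, complex one.
stolen = [sha1_hex("monkey1"), sha1_hex("dragon!"), sha1_hex("T7#qLz9@vW")]
words = ["password", "monkey", "dragon"]
suffixes = ["", "1", "123", "!"]
cracked = dictionary_attack(stolen, words, suffixes)
```

The two dictionary-based passwords fall immediately, while the long random one survives this particular word list.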
Rainbow tables. Hackers with an interest in efficiency have embraced a tool called the rainbow table. This table is a pre-computed dataset for a given hash algorithm (such as MD5), covering millions of candidate passwords. Although a rainbow table can be very large in data size, it can also significantly reduce the computing time needed to find a string of characters which maps to a given hash. In the wrong hands, rainbow tables are a dangerously effective tool.
There are numerous resources online a hacker can use to look up hashes in pre-existing rainbow tables. In short, simple hashing of passwords is not secure enough on its own.
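The trade of storage for speed can be illustrated with a plain pre-computed lookup table. This is a simplification: a true rainbow table compresses its data using chains of hashes, which this sketch does not attempt. The candidate list is hypothetical.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def build_lookup(candidates):
    # Pay the hashing cost once, up front, for every candidate password.
    return {md5_hex(c): c for c in candidates}

table = build_lookup(["password", "1234", "letmein", "qwerty"])

def crack(stolen_hash):
    # Afterwards each stolen hash is a single dictionary lookup.
    return table.get(stolen_hash)
```

Once the table exists it can be reused against any database that uses the same unsalted algorithm.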
Dash of Salt
One weakness of simple hashes is that they are predictable. For example, the MD5 hash of the password “mydogFido” is 0529ee331f4491e7e61894f32edb1112. If your system has a dozen users who all used this same password, they will have identical hashes in the database. A hacker who cracks this password by brute force or rainbow tables has now compromised all dozen accounts.
In the real world this problem is even worse because too many users choose simple phrases like “password” and “1234” even when advised to create longer and more complex passwords. This is exactly the problem exposed by the breach that struck LinkedIn in June of this year.
A “salt” is data added to the password before it is hashed. Using salts, your system can generate unique hashes for every user regardless of how weak their chosen passwords actually are.
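Your salt does not need to follow a formula that is secret from the hackers; it is normally stored in the database right alongside the hash. What it needs to be is unique to each user and hard to predict. A long random value generated when the account is created works well. Salts derived from known data, such as the username repeated twice or the Unix time of account creation, are weaker because an attacker can guess the formula and pre-compute against it.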
There are countless ways to generate a strong salt, but the point is that once a good salt is added to the user’s chosen password and then hashed, that hash will be unique. Because each hash is the result of a different salt, even identical passwords will have different hashes. This makes a rainbow table attack against your password database impractical, because the attacker would need a separate pre-computed table for every salt.
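Here is a minimal sketch of salting, in which two users who pick the same password end up with different stored hashes. Drawing the salt from the operating system's random source and using SHA-256 are assumptions of this example, not the only options.

```python
import hashlib
import hmac
import os

def new_record(password):
    # A fresh random salt per account, stored next to the hash.
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex(), digest

def verify(password, salt_hex, digest):
    # Re-create the hash with the stored salt and compare.
    salt = bytes.fromhex(salt_hex)
    candidate = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)

# Two hypothetical users choosing the same password.
alice = new_record("mydogFido")
bob = new_record("mydogFido")
```

Cracking one of these records tells the attacker nothing about the other, since each must be attacked with its own salt.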
Rise of the GPU
For years, salting hashes has been the best practice to defend against password cracking attacks. But this is changing because of graphics cards.
Even a salted hash can be cracked given enough time and processing power to brute force every possible combination. Modern graphics cards, built to run enormous numbers of mathematical operations in parallel to produce sophisticated visual effects in video games, feature extremely powerful processors called GPUs, and that same parallelism is well suited to computing hashes.
Consider this sobering metric: Cracking a password hash that would take 90 minutes on a modern CPU could take as little as 4 seconds on a modern GPU, a speedup of roughly 1,350 times. Sophisticated hackers can amass a cluster of GPUs working in parallel. They can now burn through even the best salted hashes in a fraction of the time compared to just a couple of years ago.
Today’s best defense is to increase the “cost” of processing time. The best practice to date is to adopt the algorithm called bcrypt to generate salted hashes in place of the traditional algorithms like MD5, SHA-1 and SHA-2.
Bcrypt allows you to create very costly hashes; that is, hashes that take a long time to calculate, with the cost adjustable through a work factor. Even if that cost is an extra few hundred milliseconds, it is not significant when a user creates an account or logs in, because these events do not occur millions of times a second (unless you are Google). A brute force attack using a GPU, however, will be significantly slower, forcing the hackers to spend far more time and resources to compromise your database.
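The effect of an adjustable cost can be sketched with Python's standard library. Note the substitution: bcrypt is not part of the standard library, so this example uses PBKDF2, a different algorithm that shares the idea of a tunable work factor. The iteration counts are arbitrary values picked for the demonstration, not recommendations.

```python
import hashlib
import os
import time

def slow_hash(password, salt, iterations):
    # PBKDF2 repeats the underlying hash `iterations` times;
    # raising the count raises the cost of every single guess.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )

def time_hash(iterations):
    # Measure how long one hash takes at a given cost setting.
    salt = os.urandom(16)
    start = time.perf_counter()
    slow_hash("mydogFido", salt, iterations)
    return time.perf_counter() - start

cheap = time_hash(1_000)      # low cost: fast for you, fast for the attacker
costly = time_hash(200_000)   # high cost: still tolerable for a single login
print(f"1,000 iterations: {cheap:.4f}s  200,000 iterations: {costly:.4f}s")
```

A legitimate login pays the higher cost once, while an attacker pays it again for every candidate password tried.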
While it is never possible to guarantee against a password breach, every security measure reduces your vulnerability and increases the demands on the attacker.
Aaron Weiss is a technology writer and frequent contributor to eSecurity Planet and Wi-Fi Planet.