Sunday, November 25, 2018

Hashing

Hashing converts an input, such as a string of characters, into a different string of characters by passing it through a hashing function (also called a hashing algorithm).


The resulting string is called the hash value (or simply the hash), and it represents the original input. For a given algorithm the hash value has a fixed length regardless of how long the input is, so it is usually much shorter than the data it represents. While the original string may be meaningful and human-readable, the hash value has no meaning of its own, as it is just the sequence of characters returned by the hashing function.
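As a small illustration of the fixed-length property, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. The helper name `sha256_hex` and the sample strings are my own choices for the example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("hi")
long_hash = sha256_hex("a much longer, human-readable sentence " * 100)

print(short_hash)
print(long_hash)
# Both hashes are 64 hexadecimal characters, whatever the input length.
print(len(short_hash), len(long_hash))
```

Running it twice prints the same values, since the same input always gives the same hash.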

Hashing is one-way. It cannot be reversed, which means there is no practical way to compute the original input from its hash value.

The same input always produces the same output. Two different inputs should, for all practical purposes, never produce the same output. Because the output has a fixed length while the number of possible inputs is unlimited, such pairs must exist in theory, but with a secure hashing algorithm it is computationally infeasible to find one. A Hash Collision is the scenario where a hashing algorithm generates the same output for two different inputs. If a practical way of producing collisions is found for a hashing algorithm, we must stop using that algorithm for security purposes, because applications that rely on it will no longer behave as intended and can be exposed to security issues.

MD5, SHA-1, SHA-256, and SHA-512 are some of the well-known hashing algorithms. MD5 has been known to suffer from Hash Collisions for a long time, and more recent research demonstrated a practical Hash Collision for SHA-1 as well. Therefore, SHA-256 is the hashing algorithm most commonly recommended now.
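To compare the algorithms named above, the following sketch hashes the same message with each of them and reports the size of the result. It assumes a Python build where `hashlib` provides all four algorithms (some restricted builds disable MD5). The sample message is arbitrary.

```python
import hashlib

message = b"hello world"  # arbitrary sample input
digest_bits = {}

for name in ("md5", "sha1", "sha256", "sha512"):
    hasher = hashlib.new(name, message)
    # digest_size is in bytes, so multiply by 8 to get bits.
    digest_bits[name] = hasher.digest_size * 8
    print(f"{name:>6}: {digest_bits[name]} bits -> {hasher.hexdigest()}")
```

The output sizes are 128, 160, 256, and 512 bits respectively.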

Some Use Cases of Hashing
  • Password Verification - When a user registers on a website, the password is not stored directly as plain text. Instead, the hash value of the password is stored in the database, so that the real passwords are not exposed to people who have access to the database. Whenever the user logs in again, the hash value of the entered password is generated and compared against the hash value stored in the database for that user, to verify whether the entered password is correct.
  • Integrity Protection - The integrity of a message is said to be protected if the receiver gets exactly the message the sender sent, without any modification made in transit by an intruder.
    • Hashes are used to verify whether the message received is the exact message sent by the sender. The picture below depicts this verification process.
    • A similar process, with a few modifications, takes place when verifying SSL digital certificates.
    • The message can also be a file. Since we usually cannot inspect the contents ourselves, we can use the hash published by the sender to verify the file. For example, when downloading software such as IntelliJ IDEA, you are given a hash to verify the file against, as in the picture below. In this case, as can be seen in the picture, they provide the SHA-256 hash of the file.
  • Indexing - When searching for data, if the original string value is too long, a shorter hashed representation of the value can be used as an index. Looking up the short index instead of the full original value makes searching the database much faster.
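The file check described under Integrity Protection can be sketched roughly as follows. The function names, the file name `download.bin`, and the file contents are hypothetical and exist only for this example. A real download page would publish the expected hash for you to paste in.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the hash published by the sender."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())


# Usage: simulate a download and the hash published next to it.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "download.bin")
    with open(path, "wb") as f:
        f.write(b"installer bytes")
    published = hashlib.sha256(b"installer bytes").hexdigest()
    intact = matches_published_hash(path, published)

    with open(path, "ab") as f:
        f.write(b"!")  # simulate a modification made in transit
    tampered_ok = matches_published_hash(path, published)

print(intact, tampered_ok)  # True False
```

Even a one-byte change to the file produces a completely different hash, so the second check fails.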

In all of the above use cases, the same hashing algorithm must be used when storing or transmitting the data and later when retrieving or verifying it, because different hashing algorithms produce different hash values for the same input.
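Here is a minimal sketch of the plain (unsalted) Password Verification flow, with the algorithm fixed in one place so that registration and login cannot disagree. The `users` dictionary is a hypothetical stand-in for the database, and the function names are my own.

```python
import hashlib
import hmac

HASH_ALGORITHM = "sha256"  # must be the same at registration and at login

users = {}  # hypothetical stand-in for the database: username -> stored hash


def hash_password(password: str) -> str:
    return hashlib.new(HASH_ALGORITHM, password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Only the hash value is stored, never the password itself.
    users[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_password(password))


register("sam", "correct horse")
print(login("sam", "correct horse"))  # True
print(login("sam", "wrong guess"))    # False

# The right password hashed with a different algorithm still does not match.
other_algorithm_hash = hashlib.sha512(b"correct horse").hexdigest()
print(other_algorithm_hash == users["sam"])  # False
```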

Although hashing cannot be reversed, this does not mean that hashed passwords stored in a database are safe from attackers. There are precomputed tables, called Rainbow Tables, that map hash values back to the plain text that produced them for the possible combinations of characters up to a certain length, and attackers can use them to look up stolen hashes when cracking passwords. This is one of the reasons why we must choose a password of considerable length that we can still remember. I intend to cover this area in more detail in a future post on password attacks and multi-factor authentication.
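The idea of looking a hash up in a precomputed table can be sketched as below. Note that this is a simplified lookup table, not a true Rainbow Table: real Rainbow Tables use chains of hashes to save space, but the effect for the attacker is similar. The candidate list is a made-up example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The attacker precomputes hashes for candidate passwords once...
candidates = ["123456", "password", "qwerty", "letmein"]
lookup = {sha256_hex(p): p for p in candidates}

# ...and then "reverses" a stolen hash with a simple lookup.
leaked_hash = sha256_hex("qwerty")
print(lookup.get(leaked_hash))  # qwerty

# A long password that is not in the table cannot be found this way.
strong_hash = sha256_hex("a long passphrase nobody precomputed")
print(lookup.get(strong_hash))  # None
```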

Salted Hashing

Salted Hashing was introduced to improve the security of hashed passwords.

A salt is a randomly generated string that is unique for each password.

What kind of situation does Salted Hashing protect against?
Let's say that Sam has access to the database where the hashed passwords are stored. If Sam's hashed password appears as xbc124 in the database and John's hashed password is also stored as xbc124, then Sam knows that John's password is the same as his own, because the same plain text password always produces the same hash.
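Sam's observation can be reproduced with a few lines. The usernames and passwords below are invented for the example, and the `stored` dictionary stands in for the database table.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical table of unsalted password hashes.
stored = {
    "sam": sha256_hex("sunshine1"),
    "john": sha256_hex("sunshine1"),
    "ann": sha256_hex("tr0ub4dor"),
}

# Sam only needs to look for rows whose hash equals his own.
same_as_sam = [
    user for user, value in stored.items()
    if user != "sam" and value == stored["sam"]
]
print(same_as_sam)  # ['john']
```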

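Salted Hashing adds extra security by adding the salt to either the front or the back of the plain text password before hashing it, and then storing the resulting hash. Because each user has a different salt, identical passwords no longer produce identical hashes. The salt is stored alongside the hash so that the same salt can be applied again when the password is verified.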
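A minimal sketch of Salted Hashing, assuming the salt is placed in front of the password and SHA-256 is the hashing algorithm. The function names and the 16-byte salt length are my own choices for the example, not a fixed rule.

```python
import hashlib
import hmac
import secrets


def hash_with_salt(password: str, salt: bytes) -> str:
    # The salt is placed in front of the password before hashing.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def create_record(password: str):
    """Return the (salt, hash) pair that would be stored for a new user."""
    salt = secrets.token_bytes(16)  # random and different for every user
    return salt, hash_with_salt(password, salt)


def verify(password: str, salt: bytes, stored_hash: str) -> bool:
    # Re-apply the stored salt to the entered password, then compare.
    return hmac.compare_digest(hash_with_salt(password, salt), stored_hash)


sam_salt, sam_hash = create_record("sunshine1")
john_salt, john_hash = create_record("sunshine1")

# Same password, different salts, so the stored hashes no longer match.
print(sam_hash == john_hash)
print(verify("sunshine1", john_salt, john_hash))  # True
print(verify("sunshine2", john_salt, john_hash))  # False
```

For real password storage, a deliberately slow function such as `hashlib.pbkdf2_hmac` is usually preferred over a single round of SHA-256, but the role of the salt is the same.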

The figure below illustrates the Password Verification process when Salted Hashing is involved.