# What is Hashing

Hashing is the encryption process by which a computer generates a value or values from a string of text using a mathematical function.

In cooking terminology, ‘hashing’ means to chop, mix, and produce a new mixture. Like cooking, hashing or a hash function is where a computer takes an input of any length and content (letters, numbers, symbols, etc.) and uses a mathematical formula to chop it, mix it, and produce an output of a specific length. The output is known as a hash value or hash.

The purpose of hashing is to enable security during the process of sending a message, when the message is intended for a particular recipient only--in other words, cryptography.

## Secure Hashing Algorithms (SHA-256)

In transactions involving cryptocurrencies like bitcoin, the transactions are taken as input and run through a secure hashing algorithm (“SHA”). SHA is comprised of four SHA algorithms: SHA-0, SHA-1, SHA-2, and SHA-3.

SHA-1 is the most widely used of the current SHA hash functions, utilized in many applications and protocols, including Secure Socket Layer (SSL) security.

SHA-2 is the other most common, comprised of SHA-224, SHA-256, SHA-384, and SHA-512, depending upon the number of bits in a hash value.

For example, bitcoin uses SHA-256, which gives it a hash value of a fixed length. No matter the size or length of the input, the output will always have a fixed 256-bits length. This is so you don’t have to remember the input data, which could be huge--all you have to do is remember the hash and keep track. For more information on SHA functions, please click here.

## Cryptographic Hash Functions

Depending upon its cryptographic characteristics, hash functions can be applied in two different ways: password storage and data integrity.

## Password Storage

Instead of storing the password out in the open, all logon processes store the hash values of passwords in the file itself. The password file consists of a table of pairs, which are in the form of (user id, h(P)).

The process is depicted in the below graphic:

In the event an intruder comes across the file, they can only see the hashes of passwords, even if they accessed the password itself. They would be unable to login using hash nor can they derive the password from hash value, since hash function possesses the property of pre-image resistance.

## Data Integrity

This is the most common application of the hash functions. It is used to generate the checksums on data files, providing assurance to the user that the data is accurate.

In the graphic above, you can see the integrity check assisting the user in detecting any changes made to the original file.

The caveat with data integrity checks is that this is only valuable if you believe the file is in fact the original file.

Example: An intruder comes in and instead of modifying file data, they change the entire file and compute an entirely new hash, and then send it to the receiver. How would you know? You wouldn’t.

So, the integrity check is only useful if the user is certain as to the originality of the file.

## Cryptographic Characteristics/Properties

For a hash function to be considered a “secure” and an effective cryptographic tool, it must have certain characteristics or properties.

## Deterministic

Under this property, no matter how many times an individual goes through a particular input through a hash function, you will always get the same result. This is to keep it easy when keeping track of any particular input.

## Quick Computation

Under this property, any hash function would need to return the hash of the input quickly.

## Pre-Image Resistance

The “pre-image resistance” property means that it should be computationally difficult to reverse a hash function.

Example: If a hash function (h) produced a hash value (z), then it should be a difficult process to find any input value (x) that hashes to (z). The link in the chain is very difficult to find. This ensures against any potential hacker who only has a hash value and is trying to find the input (link in the chain).

## Second Pre-Image Resistance

This property means that it should be hard to find a different input with the same hash.

Example: If a hash function (h) for an input (x) produces a hash value h(x), then it should be difficult to find any other input value (y) so that h(y) = h(x).

This protects against any threat who has an input value and its hash, and wants to substitute a different value h(x) as legitimate value in place of that original input value.

## Collision Resistance

This property makes it difficult to find two different inputs of any length that result from the same hash.

Example: For a hash function, h, it should be difficult to find any two different outputs, x and y to where the hacker would be able to put together h(x) = h(y).

While hash functions essentially compress functions with a fixed hash length, it’s impossible for a hash function not to have a collision. By having the property of “collision-free”, only makes it all the more difficult for an attacker to find two input values with the same hash.

## Collision Resistance

If you put “x” number of people in a room, the odds of any two people sharing the exact same birthday, increases, pursuant to the rules of probability.

Taking that analogy and applying the birthday paradox to hashing, specifically a 128-bit hash, you have an “x” percentage of a chance to break the collision resistance at the sqrt(2^128) = 2^64th instance.

## Puzzle Friendly

For every output, “y”, if “K” is chosen from a distribution with “high min-entropy”, it is very difficult to find an input “x” such that H(k|x) = Y.

A “high min-entropy” means that the value being chosen is so vastly distributed over a range of values, that the probability of choosing the correct value is very unlikely.

Remember the game “choose a number between 1 and 100?” That’s high min-entropy.

The | means join--in other words, k|x means kx.

## Data Structure

When we talk about data structure properties, we refer to pointers and linked lists.

Pointers are variables that store the address of another variable in programming, pointing to the location of other variables.

A linked list is a sequence of blocks that each contains a set of data that is linked to the next block via a pointer.

Inside each block you will see a pointer, which contains the address to the next block. The first block where you see pointer, is called the Genesis Block

## The Merkle Tree

When we talk about data structure properties, we refer to pointers and linked lists.

Pointers are variables that store the address of another variable in programming, pointing to the location of other variables.

A linked list is a sequence of blocks that each contains a set of data that is linked to the next block via a pointer.

Whenever looking at a Merkle Tree, it’s best to start at the very bottom with the leaf nodes (L1, L2, L3, L4). Moving up, you will see non-leaf nodes, which serve as the hash of the values (hash(L1)), (hash(L2)), (hash(L3)), and (hash(L4)) of their child nodes (Hash 0-0, 0-1, 1-0, 1-1).

A “child node” are the nodes feeding into the hash. For Hash 0, the child nodes are Hash 0-0 and Hash 0-1. For Hash 1, the child nodes are Hash 1-0 and Hash 1-1.

Moving up the chart to the highest tier, labeled “Top Hash”, this is the root node.

So you are wondering what the purpose is of the Merkle Tree? Sorting through any particular block is not an easy task, however, when using a Merkle tree, you will save time in looking for a particular transaction to determine whether or not it belongs in that particular block or not.

When you are looking at a transaction, you want to make sure that the data contained belongs in the appropriate blocks. By using the Merkle tree, you are able to quickly trace the data by following the trail of hashes.

## Applying Hashing to the Mining Process

When a new block arrives, the entire contents of those blocks are hashed. If the hash is less than the difficulty target, it’s then added to the blockchain for the community to acknowledge.

Very rarely do you get a new block ready to be added to the chain, just like that. Which is why nonce, an arbitrary string of data is added to the hash of the block. Once added, the string is hashed again and compared to the difficulty level.

If it is higher than the difficulty level, the nonce is changed and it continues repeating over and over until the difficulty level requirements are met. Only when those requirements are met, is the block finally added to the blockchain.

## The Hash Rate

The hash rate describes how fast hashing operations take during the mining process.

A high hash rate means there are more parties involved in the mining process, providing for a smooth operation. However, if a hash rate is too fast, the difficulty level is increased.

If the hash rate becomes too slow, the difficulty level is decreased. The idea is to always ensure the system runs smoothly, so providing every means of accurate hashing operations is essential.

So, to sum up:

Once you receive a block of data, the hash of the contents of that block is combined with a nonce, or random string of data.

That new string of data (hash + nonce) is then hashed again, compared to the difficulty level. Depending on whether it meets the requirements, does the new string either get hashed again or added to the blockchain.

Once it’s added to the blockchain, the community is informed

Miners responsible for this process are rewarded with bitcoins.