THM – Hashing – Part 14
This is a continuing series where I document my path through different TryHackMe courses. I recommend that everyone who wants to learn cyber security subscribe to tryhackme.com and take the courses there.
Key Terms
Plaintext – Data before encryption or hashing. It is often text, but not always: it could be a photograph or another file instead.
Encoding – This is NOT a form of encryption, just a form of data representation like base64 or hexadecimal. It is immediately reversible.
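To show why encoding is not encryption, here is a minimal sketch using Python's standard library: the same bytes are encoded as base64 and as hexadecimal, and both are reversed without any key.

```python
import base64

data = b"hello"

# Encoding only changes how the bytes are represented; no key or secret is involved.
b64 = base64.b64encode(data)  # b'aGVsbG8='
hexed = data.hex()            # '68656c6c6f'

# Anyone can reverse both encodings immediately.
assert base64.b64decode(b64) == data
assert bytes.fromhex(hexed) == data
print(b64.decode(), hexed)
```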
Hash – A hash is the output of a hash function. Hashing can also be used as a verb, “to hash”, meaning to produce the hash value of some data.
Brute force – Attacking cryptography by trying every different password or every different key.
Cryptanalysis – Attacking cryptography by finding a weakness in the underlying maths.
What’s a hash function?
Hash functions are quite different from encryption. There is no key, and it’s meant to be impossible to go from the output back to the input.
A hash function takes input data of any size and creates a summary or “digest” of that data. The output is a fixed size. It’s hard to predict what the output will be for any input, and vice versa. Good hashing algorithms will be fast to compute and slow to reverse (that is, to go from the output back to the input). Any small change in the input data (even a single bit) should cause a large change in the output.
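The fixed output size and the “small change in, large change out” behaviour can be checked with a short sketch using SHA-256 from Python's hashlib. The two sample inputs are arbitrary choices for illustration: "password" and "passwore" differ by exactly one bit, because 'd' is 0x64 and 'e' is 0x65.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

small = b"a"
big = b"a" * 1_000_000

# Whatever the input size, SHA-256 always outputs 32 bytes (256 bits).
print(len(hashlib.sha256(small).digest()), len(hashlib.sha256(big).digest()))  # 32 32

# The two inputs differ by a single bit...
m1, m2 = b"password", b"passwore"
d1, d2 = hashlib.sha256(m1).digest(), hashlib.sha256(m2).digest()
print(differing_bits(m1, m2), "input bit differs")
# ...but typically around half of the 256 output bits change.
print(differing_bits(d1, d2), "of 256 output bits differ")
```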
The output of a hash function is normally raw bytes, which are then encoded. Common encodings for this are base64 and hexadecimal. Decoding these won’t give you anything useful.
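A small sketch of the point above, using MD5 from Python's hashlib: the hash is 16 raw bytes, and hex or base64 are just two ways of writing those bytes down. Decoding the hex only returns the raw digest, never the original input.

```python
import base64
import hashlib

digest = hashlib.md5(b"hello").digest()   # raw bytes: 16 of them for MD5
print(digest.hex())                       # hexadecimal encoding of the digest
print(base64.b64encode(digest).decode())  # base64 encoding of the same bytes

# Decoding the hex gives back the raw digest bytes, not b"hello".
assert bytes.fromhex(digest.hex()) == digest
```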
Why should I care?
Hashing is used very often in cyber security. When I logged into TryHackMe, it used hashing to verify my password. When I logged into my computer, that also used hashing to verify my password.
What’s a hash collision?
A hash collision is when two different inputs give the same output. Hash functions are designed to avoid this as best they can, but due to the pigeonhole effect, collisions are not avoidable. The pigeonhole effect is basically this: a hash function has a set number of possible output values, but you can give it input of any size. As there are more possible inputs than output values, some of the inputs must give the same output. If you have 128 pigeons and 96 pigeonholes, some of the pigeons are going to have to share.
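The pigeonhole effect is easy to see if we artificially shrink the output space. This sketch truncates SHA-256 to 2 bytes (a hypothetical toy hash with only 65,536 possible outputs), so hashing 65,537 distinct inputs must produce a collision; in practice one typically shows up after only a few hundred inputs.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """SHA-256 cut down to n_bytes, so there are only 256**n_bytes possible outputs."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash distinct inputs until two of them share a truncated hash.

    With 256**n_bytes possible outputs, 256**n_bytes + 1 distinct inputs
    guarantee a collision (the pigeonhole effect).
    """
    seen = {}
    for i in range(256 ** n_bytes + 1):
        msg = str(i).encode()
        h = truncated_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    raise AssertionError("unreachable: the pigeonhole effect guarantees a collision")

a, b = find_collision()
print(a, b, truncated_hash(a).hex())
```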
MD5 and SHA1 have been attacked and made technically insecure by engineered hash collisions. However, no attack has yet produced a collision in both algorithms at the same time, so if you compare both the MD5 hash and the SHA1 hash, you will see that they’re different.
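A minimal sketch of the comparison described above: two inputs are only treated as identical if their MD5 hashes AND their SHA1 hashes both match. The function name is hypothetical, and the approach relies on the claim above that no joint MD5/SHA1 collision is known.

```python
import hashlib

def same_under_md5_and_sha1(a: bytes, b: bytes) -> bool:
    """Treat two inputs as identical only if BOTH their MD5 and SHA1 hashes match."""
    return (hashlib.md5(a).digest() == hashlib.md5(b).digest()
            and hashlib.sha1(a).digest() == hashlib.sha1(b).digest())

print(same_under_md5_and_sha1(b"file contents", b"file contents"))   # True
print(same_under_md5_and_sha1(b"file contents", b"file contents!"))  # False
```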
Uses for hashing
What can we do with hashing?
Hashing is used for two main purposes in cyber security: verifying the integrity of data and verifying passwords.
Hashing for password verification
Most web apps need to verify users’ passwords, and storing these passwords in plaintext would be bad.
It’s bad practice to encrypt passwords, as the key has to be stored somewhere. If someone gets the key, they can just decrypt the passwords.
This is where hashing comes in. Instead of storing the password, you store the hash of the password. This means that you never have to store the user’s password, and if the database was leaked, the attacker would have to crack each password to find out what it was.
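The idea of storing only the hash can be sketched with an in-memory dictionary standing in for a hypothetical user database. This deliberately uses a plain, unsalted SHA-256 just to show the store-and-compare flow; it is not how real systems should store passwords, which should use salted, slow hashes like bcrypt or sha512crypt.

```python
import hashlib
import hmac

# Hypothetical "database": username -> password hash (never the password itself).
users = {}

def register(username: str, password: str) -> None:
    users[username] = hashlib.sha256(password.encode()).hexdigest()

def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    candidate = hashlib.sha256(password.encode()).hexdigest()
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(candidate, stored)

register("alice", "hunter2")
print(login("alice", "hunter2"), login("alice", "wrong"))  # True False
```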
A rainbow table is a lookup table of hashes to plaintexts, so you can quickly find out what password a user had just from the hash.
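Here is a sketch of the lookup-table idea as defined above, using a tiny hypothetical list of common passwords and unsalted MD5. Real rainbow tables save space by storing hash chains built with reduction functions rather than every hash/plaintext pair, but the effect on unsalted hashes is the same.

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

# Precompute hash -> plaintext for a (tiny, hypothetical) list of common passwords.
common_passwords = ["123456", "password", "letmein", "qwerty"]
lookup = {md5_hex(p): p for p in common_passwords}

# An unsalted hash leaked from a database is reversed with a single dict lookup.
leaked = md5_hex("letmein")
print(lookup.get(leaked))  # letmein
```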
Protection against rainbow tables
To protect against rainbow tables, we add a salt to the passwords.
The salt is randomly generated and stored in the database, unique to each user.
The salt is added to either the start or the end of the password before it’s hashed, which means that every user will have a different password hash even if they have the same password. Hash functions like bcrypt and sha512crypt handle this automatically. Salts don’t need to be kept private.
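A minimal sketch of salting, assuming a hypothetical scheme where a random 16-byte salt is prepended to the password and the result is hashed with SHA-256. It shows two users with the same password getting different hashes. SHA-256 is used only for illustration; it is fast, so in practice bcrypt or sha512crypt (which handle the salt for you) are the better choice.

```python
import hashlib
import hmac
import os
from typing import Tuple

def new_record(password: str) -> Tuple[bytes, str]:
    """Create a (salt, hash) pair to store for a user."""
    salt = os.urandom(16)  # random and unique per user; stored next to the hash
    digest = hashlib.sha256(salt + password.encode()).hexdigest()  # salt prepended
    return salt, digest

def check(password: str, salt: bytes, digest: str) -> bool:
    """Re-hash the attempt with the stored salt and compare."""
    candidate = hashlib.sha256(salt + password.encode()).hexdigest()
    return hmac.compare_digest(candidate, digest)

alice = new_record("hunter2")
bob = new_record("hunter2")
print(alice[1] != bob[1])          # True: same password, different hashes
print(check("hunter2", *alice))    # True
```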
Recognizing password hashes
Use a healthy combination of context and tools. If you found the hash in a web application database, it’s more likely to be md5 than NTLM. Automated hash recognition tools often get these hash types mixed up, which is why it’s important to learn to recognize them yourself.
Unix style password hashes are very easy to recognize, as they have a prefix. The prefix tells you the hashing algorithm used to generate the hash. The standard format is $format$rounds$salt$hash.
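To make the format concrete, here is a sketch that splits a crypt-style string into its fields and names the algorithm from its prefix, using the prefix table later in this post. The function name and the sample strings are hypothetical. The rounds field is treated as optional (it is often omitted), and bcrypt is rejected because it packs the salt and hash together after the cost instead of separating them with `$`.

```python
# Prefix -> algorithm, from the prefix table in this post.
PREFIXES = {
    "1": "md5crypt",
    "2": "bcrypt", "2a": "bcrypt", "2b": "bcrypt", "2x": "bcrypt", "2y": "bcrypt",
    "6": "sha512crypt",
}

def split_unix_hash(h: str) -> dict:
    """Split $id$[rounds=N$]salt$hash into its fields (md5crypt/sha512crypt layout)."""
    if not h.startswith("$"):
        raise ValueError("not a Unix-style hash")
    parts = h.split("$")[1:]  # the leading '$' produces an empty first element
    ident = parts[0]
    if ident.startswith("2"):
        raise ValueError("bcrypt packs salt and hash together; not split here")
    rounds = None
    if len(parts) == 4 and parts[1].startswith("rounds="):
        rounds = int(parts[1][len("rounds="):])
        parts = [ident] + parts[2:]
    if len(parts) != 3:
        raise ValueError("unexpected number of fields")
    return {"algorithm": PREFIXES.get(ident, "unknown"), "rounds": rounds,
            "salt": parts[1], "hash": parts[2]}

print(split_unix_hash("$6$rounds=5000$saltsalt$abcdef"))
```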
Windows passwords are hashed using NTLM, which is a variant of md4. They’re visually identical to md4 and md5 hashes, so it’s important to use context to work out the hash type.
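This sketch shows why the format alone is not enough: a hypothetical helper that only looks at length and character set can narrow a 32-character hex string down to md4, md5 or NTLM, but no further. Picking between them is where context comes in.

```python
import string

def candidate_types(h: str) -> list:
    """Rough guess from length and charset alone (hypothetical helper)."""
    h = h.strip()
    is_hex = len(h) > 0 and all(c in string.hexdigits for c in h)
    if is_hex and len(h) == 32:
        # A 128-bit value written as hex: the format cannot separate these.
        return ["MD4", "MD5", "NTLM"]
    return []

print(candidate_types("5d41402abc4b2a76b9719d911017c592"))  # ['MD4', 'MD5', 'NTLM']
```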
On Linux, password hashes are stored in /etc/shadow. This file is normally only readable by root. They used to be stored in /etc/passwd, which is readable by everyone.
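Each line of /etc/shadow is a colon-separated record, with the username first and the password hash second. A minimal sketch of pulling the hash out of one line follows; the sample line is hypothetical, not a real account.

```python
def user_and_hash(shadow_line: str) -> tuple:
    """Return (username, hash field) from one /etc/shadow line.

    Fields: name:hash:lastchg:min:max:warn:inactive:expire:reserved.
    A hash field of '!' or '*' (or starting with '!') means no usable password.
    """
    fields = shadow_line.rstrip("\n").split(":")
    if len(fields) != 9:
        raise ValueError("expected 9 colon-separated fields")
    return fields[0], fields[1]

line = "alice:$6$saltsalt$abcdef:19000:0:99999:7:::"  # hypothetical entry
print(user_and_hash(line))  # ('alice', '$6$saltsalt$abcdef')
```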
On Windows, password hashes are stored in the SAM. Windows tries to prevent normal users from dumping them, but tools like mimikatz exist for this. Hashes found there are split into NT hashes and LM hashes.
Table of the most common Unix style password prefixes:
| Prefix | Algorithm |
|---|---|
| $1$ | md5crypt, used in Cisco stuff and older Linux/Unix systems |
| $2$, $2a$, $2b$, $2x$, $2y$ | Bcrypt (popular for web applications) |
| $6$ | sha512crypt (default for most Linux/Unix systems) |
Hash formats and password prefixes example page:
https://hashcat.net/wiki/doku.php?id=example_hashes
Password Cracking
You can’t decrypt password hashes; they are not encrypted. You have to crack the hashes by hashing a large number of different inputs, adding the salt if there is one, and comparing each result to the target hash. Tools like hashcat and John the Ripper are normally used for this.
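The cracking loop described above can be sketched as a tiny dictionary attack. It assumes a hypothetical scheme of SHA-256(salt + password), with the salt known to the attacker; tools like hashcat and John the Ripper do the same thing at vastly greater speed and with many real formats.

```python
import hashlib
from typing import Iterable, Optional

def crack(target_hex: str, salt: bytes, wordlist: Iterable[str]) -> Optional[str]:
    """Hash salt + each candidate word and compare with the target hash."""
    for word in wordlist:
        if hashlib.sha256(salt + word.encode()).hexdigest() == target_hex:
            return word
    return None

salt = b"\x01\x02\x03\x04"                              # hypothetical salt stored with the hash
target = hashlib.sha256(salt + b"dragon").hexdigest()  # stand-in for a leaked hash
print(crack(target, salt, ["123456", "password", "dragon"]))  # dragon
```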
Why crack on GPUs?
GPUs have thousands of cores, and they are very good at some of the maths involved in hash functions.
NEVER use --force for hashcat. It can lead to false positives and false negatives.
Hashing for integrity checking
Integrity Checking
Hashing can be used to check that files haven’t been changed. If you put the same data in, you always get the same hash out. If even a single bit changes, the hash will change a lot.
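A sketch of a file integrity check with SHA-256 from Python's hashlib. The file is read in chunks so large files don't need to fit in memory; the demo uses a temporary file and appends one byte to show the hash changing. In practice you would compare the result against a checksum published by whoever distributed the file.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"original contents")
    path = tmp.name

before = file_sha256(path)
with open(path, "ab") as f:
    f.write(b"!")  # a tiny modification
after = file_sha256(path)
os.remove(path)

print(before == after)  # False: the file was changed
```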
HMACs
HMAC is a method of using a cryptographic hashing function to verify the authenticity and integrity of data. An HMAC can be used to ensure that the person who created the HMAC is who they say they are (authenticity), since only someone holding the secret key could have produced it, and that the message has not been modified or corrupted (integrity). It uses a secret key and a hashing algorithm to produce a hash.
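A minimal sketch using Python's standard hmac module with SHA-256. The key and messages are hypothetical. Verification only succeeds when both the message and the key match, which covers the integrity and authenticity properties described above.

```python
import hashlib
import hmac

key = b"shared-secret-key"        # hypothetical key known only to sender and receiver
message = b"transfer 100 to bob"

# The sender computes the HMAC tag and sends it along with the message.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

print(verify(key, message, tag))                 # True
print(verify(key, b"transfer 900 to bob", tag))  # False: message was modified
print(verify(b"wrong-key", message, tag))        # False: wrong key
```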