Overview - Hashing algorithms (SHA, MD5)

What is it?

Hashing algorithms are functions that turn any input, such as a password or a file, into a fixed-size string of characters called a hash. The hash looks random but always has the same length regardless of input size. SHA and MD5 are two widely used families of hashing algorithms. Both compute hashes quickly, but MD5 is no longer considered secure, while SHA-2 and SHA-3 are still used where security matters. Hashes help verify data integrity and let systems check sensitive values such as passwords without storing the originals.

Why it matters

Hashing algorithms exist to detect whether data has been changed or tampered with and to let systems handle passwords and other sensitive data without keeping them in readable form. Without hashing, a stolen password database would expose every password directly, and a modified file would be hard to tell apart from the original. For example, websites store password hashes rather than the passwords themselves, so attackers who gain access to the database cannot simply read the actual passwords. This keeps online accounts safer.

Where it fits

Before learning hashing algorithms, you should understand basic data security concepts like encryption and data integrity. After mastering hashing, you can explore digital signatures, cryptographic protocols, and password management techniques. Hashing is a foundational tool in cybersecurity and data protection.

Mental Model

Core Idea

A hashing algorithm transforms any input into a fixed-size string that acts like a digital fingerprint. For practical purposes the fingerprint is unique to the input, which makes it easy to check data integrity without revealing the original content.

Think of it like...

It's like pressing a leaf onto a piece of paper to create a leaf print; the print is unique to that leaf and always the same size, but you can't recreate the leaf from the print alone.

Input Data ──▶ [Hashing Algorithm] ──▶ Fixed-size Hash String

┌───────────────┐      ┌─────────────────────┐      ┌────────────────┐
│ Any size data │─────▶│ SHA or MD5 function │─────▶│ 128 or 256-bit │
│ (password,    │      │ (processes input)   │      │ hash output    │
│ file, message)│      └─────────────────────┘      └────────────────┘
└───────────────┘

Build-Up - 7 Steps

1

Foundation: What is a Hash Function?

Concept: Introduces the basic idea of a hash function as a tool that converts data into a fixed-size string.

A hash function takes any input data and produces a short, fixed-length string called a hash. This hash looks random but is always the same length no matter how big or small the input is. For example, the input 'password' becomes '5f4dcc3b5aa765d61d8327deb882cf99' using MD5.

Result

You get a fixed-size string that, for practical purposes, uniquely represents the original data.

Understanding that hashing creates a consistent digital fingerprint helps you see why it’s useful for verifying data without storing the original.
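The fixed-length property can be checked with a short sketch using Python's standard hashlib module. The helper name hex_digest and the sample inputs are illustrative choices, not part of the original lesson.

```python
import hashlib

def hex_digest(algorithm: str, text: str) -> str:
    """Return the hex digest of `text` (UTF-8 encoded) under the named algorithm."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()

# Inputs of very different sizes (illustrative values).
for text in ["a", "password", "x" * 10_000]:
    md5_hex = hex_digest("md5", text)
    sha256_hex = hex_digest("sha256", text)
    # MD5 yields 32 hex characters (128 bits), SHA-256 yields 64 (256 bits).
    print(len(text), len(md5_hex), len(sha256_hex))
```

Whatever the input length, the digest lengths stay the same, and hashing the same input again gives the same digest.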

2

Foundation: Properties of Good Hash Functions

3

Intermediate: Understanding MD5 Hash Algorithm

4

Intermediate: Exploring SHA Family Algorithms

5

Intermediate: How Hashing Ensures Data Integrity

6

Advanced: Salting Hashes for Password Security

7

Expert: Collision Attacks and Their Impact

Under the Hood

Hashing algorithms process input data in fixed-size blocks through multiple rounds of mathematical operations like bitwise shifts, modular additions, and logical functions. These steps mix and scramble the input bits to produce a fixed-length output that appears random. The design ensures that even a tiny change in input drastically changes the output hash, making it infeasible to reverse or find collisions easily.
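The "tiny change, drastically different output" behaviour (often called the avalanche effect) can be observed directly. The sketch below uses hashlib's SHA-256; the helper bit_difference and the two sample messages are assumptions made for illustration.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# The two inputs differ in a single bit: 'd' is 0x64 and 'e' is 0x65.
d1 = hashlib.sha256(b"hello world").digest()
d2 = hashlib.sha256(b"hello worle").digest()

changed = bit_difference(d1, d2)
print(f"{changed} of {len(d1) * 8} output bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to flip, so the two digests look unrelated.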

Why designed this way?

Hash functions were designed to be fast and deterministic while resisting reverse engineering and collisions. Early algorithms like MD5 prioritized speed but later showed weaknesses. Newer designs like SHA-2 balance speed with stronger security by using more complex operations and longer hash sizes. The tradeoff is between performance and security, evolving as attackers find new vulnerabilities.

Input Data ──▶ [Block Processing] ──▶ [Rounds of Mixing]
   │                  │                     │
   ▼                  ▼                     ▼
┌─────────┐      ┌─────────────┐      ┌──────────────┐
│ Split   │─────▶│ Bitwise and │─────▶│ Final Hash   │
│ into    │      │ modular ops │      │ Output       │
│ blocks  │      └─────────────┘      └──────────────┘
└─────────┘

Myth Busters - 4 Common Misconceptions

Quick: Is MD5 still safe for storing passwords? Commit to yes or no.

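Common Belief: MD5 is secure enough for password storage because it produces a unique hash.

Reality: MD5 is not safe for password storage. It is very fast to compute and has known collision weaknesses; passwords should be stored with a salted, deliberately slow algorithm such as bcrypt or Argon2.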

Quick: Does hashing encrypt data so it can be decrypted later? Commit to yes or no.

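Common Belief: Hashing encrypts data and can be reversed to get the original input.

Reality: Hashing is one-way. There is no key that turns a hash back into the original input; use encryption when the data must be recovered later.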

Quick: Can two different inputs never produce the same hash? Commit to yes or no.

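Common Belief: Hash functions always produce unique hashes for different inputs.

Reality: Because the output has a fixed size and the inputs do not, different inputs can produce the same hash (a collision). A good hash function only makes such collisions infeasible to find.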

Quick: Is salting unnecessary if you use a strong hash like SHA-256? Commit to yes or no.

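Common Belief: Strong hashes alone are enough; salting is optional.

Reality: Salting is still needed. Without a unique salt per password, identical passwords produce identical hashes, and precomputed tables work against any fast hash, including SHA-256.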

Expert Zone

1

Some hashing algorithms like SHA-3 use completely different internal structures (sponge construction) compared to SHA-2, offering resistance to different attack types.

2

Performance of hashing algorithms can vary greatly depending on hardware; some are optimized for CPUs, others for GPUs or specialized chips, affecting their practical security.

3

Hash length impacts security: longer hashes reduce collision chances but increase storage and computation, so choosing the right balance is critical.
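To see why output length matters, the sketch below deliberately weakens SHA-256 by keeping only its first two bytes (16 bits) and then searches for two inputs with the same truncated digest. The truncation, the counter-based inputs, and the function names are assumptions chosen for the demonstration; this is not an attack on full SHA-256.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a deliberately weakened hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int) -> tuple[bytes, bytes, int]:
    """Hash successive counter strings until two of them share a truncated digest."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        data = str(i).encode("ascii")
        digest = truncated_hash(data, n_bytes)
        if digest in seen:
            # Two different inputs map to the same short digest.
            return seen[digest], data, i + 1
        seen[digest] = data

first, second, attempts = find_collision(2)  # 16-bit output
print(first, second, attempts)
```

With a 16-bit output a collision is guaranteed within 65,537 inputs and is usually found far sooner, since the expected effort grows with roughly the square root of the number of possible outputs. Each extra output bit therefore raises the cost, which is why a full 256-bit digest puts this kind of brute-force search out of practical reach.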

When NOT to use

Hashing is not suitable when you need to recover original data; encryption should be used instead. Also, for password storage, use specialized algorithms like bcrypt or Argon2 that include salting and are designed to be slow to resist brute-force attacks.
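As a rough illustration of salted, deliberately slow password hashing, the sketch below uses PBKDF2-HMAC-SHA256 from Python's standard library. PBKDF2 is a stand-in chosen because it needs no third-party package; it is not bcrypt or Argon2, which are the algorithms named above. The iteration count and function names are assumptions for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for illustration; tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for a new password, using a fresh random salt."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(derived, expected)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

The salt is stored next to the derived key. The high iteration count is what makes each guess expensive for an attacker, while costing a legitimate login only a fraction of a second.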

Production Patterns

In real systems, hashes are combined with salts and stored securely for passwords. Digital signatures use hashes to verify message integrity. File verification tools use hashes to detect corruption. Blockchain technology relies heavily on hashing to link blocks securely.
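The file-verification pattern can be sketched as follows: hash the file in chunks and compare the result with a published checksum. The function names, the chunk size, and the temporary demo file are hypothetical choices for this example.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA-256 with an expected hex checksum, ignoring case."""
    return file_sha256(path) == expected_hex.strip().lower()

# Demo on a throwaway file: any modification changes the checksum.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"original contents")
    published = file_sha256(sample)
    print(matches_checksum(sample, published))
    sample.write_bytes(b"original contents!")
    print(matches_checksum(sample, published))
```

A matching checksum detects accidental corruption. It protects against deliberate tampering only if the expected checksum comes from a trusted source.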

Connections

Encryption

Complementary security techniques

Understanding hashing alongside encryption clarifies when to use one-way data verification versus reversible data protection.

Digital Signatures

Builds on hashing for integrity checks

Knowing how hashes create unique fingerprints helps grasp how digital signatures verify authenticity and prevent tampering.

Biometrics

Similar concept of unique identifiers

Hashing’s idea of unique fixed-size outputs parallels how biometric systems use unique physical traits to identify individuals securely.

Common Pitfalls

#1 Using MD5 to store user passwords.

Wrong approach: hashed_password = md5(user_password)

Correct approach: hashed_password = bcrypt.hashpw(user_password.encode(), bcrypt.gensalt()), using the Python bcrypt package, which generates the salt and embeds it in the resulting hash.

Root cause: Believing MD5 is secure enough without understanding its vulnerabilities and the need for salting and slow hashing.

#2 Assuming hashing encrypts data and can be reversed.

Wrong approach: original_data = decrypt(hash_value)

Correct approach: Store original data securely or use encryption if reversibility is needed; hashing is one-way.

Root cause: Confusing hashing with encryption due to similar goals of data protection.

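#3 Not using salt when hashing passwords.

Wrong approach: hashed_password = sha256(user_password)

Correct approach: hashed_password = sha256(user_password + unique_salt), with unique_salt generated randomly per user and stored alongside the hash. For real password storage, prefer bcrypt or Argon2, which handle salting for you and are deliberately slow.

Root cause: Underestimating the risk of rainbow table attacks and the importance of unique salts.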
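The effect of a unique salt can be shown in a few lines. In this sketch the password, the function names, and the 16-byte salt length are illustrative assumptions. Plain SHA-256 is used only to make the salting effect visible, not as a recommended way to store passwords.

```python
import hashlib
import os

def unsalted(password: str) -> str:
    """Hash the password alone: identical passwords give identical hashes."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def salted(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password: the salt changes the result."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Two users who happen to pick the same password.
alice_salt, bob_salt = os.urandom(16), os.urandom(16)

# Unsalted: both users end up with the same stored hash.
print(unsalted("letmein") == unsalted("letmein"))

# Salted: the stored hashes differ because the random salts differ.
print(salted("letmein", alice_salt) == salted("letmein", bob_salt))
```

Because each user's hash depends on a different salt, a table of precomputed hashes for common passwords no longer matches any stored value, and an attacker has to attack each account separately.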

Key Takeaways

Hashing algorithms convert data into fixed-size strings that act like digital fingerprints, ensuring data integrity and security.

MD5 is outdated and insecure. Modern systems use stronger algorithms such as SHA-256 for integrity checks, and salted, deliberately slow algorithms such as bcrypt or Argon2 for password storage.

Hashing is one-way and cannot be reversed, unlike encryption which is designed to be reversible.

Salting hashes is essential to prevent attackers from using precomputed tables to crack passwords.

Understanding collisions and their risks is critical to choosing the right hashing algorithm for security.