Security Concerns
The use of insecure hashing functions in Python represents a critical cryptographic weakness. While the hashlib library provides robust cryptographic tools, it still allows the use of fundamentally broken algorithms like MD5 and SHA-1, which should be avoided in security-sensitive contexts.
Why MD5 and SHA-1 Are Insecure
Both MD5 and SHA-1 are considered cryptographically broken for security purposes:
Collision Attacks: An attacker can find two different inputs that produce the same hash digest. For MD5, collisions can be generated in seconds on consumer hardware. For SHA-1, practical collision attacks (SHAttered) have been demonstrated at a cost of approximately $110,000 in cloud computing time.
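Weakened Preimage Resistance: No practical preimage attack is known against either algorithm. MD5 has a theoretical preimage attack that is only marginally faster than brute force, and published SHA-1 preimage results cover reduced-round variants. Collision attacks, not preimage attacks, are the practical threat today.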
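Length Extension Attacks: Both MD5 and SHA-1 are vulnerable to length extension attacks, where an attacker who knows hash(secret + message) and the length of the secret can compute a valid hash for secret + message + padding + extra data without knowing the secret. This is a property of the Merkle-Damgård construction, so SHA-256 and SHA-512 share it; the standard defense is HMAC rather than a bare hash over secret + message.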
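The construction at risk from length extension is the secret-prefix MAC, hash(secret + message). The sketch below contrasts it with HMAC, which is not affected. The function names secret_prefix_mac and hmac_tag and the sample key are illustrative assumptions; the code only places the two constructions side by side and does not carry out the attack itself.

```python
import hashlib
import hmac


def secret_prefix_mac(secret: bytes, message: bytes) -> str:
    # VULNERABLE construction: hash(secret || message).
    # Anyone holding this tag can extend the message without the secret.
    return hashlib.sha1(secret + message).hexdigest()


def hmac_tag(secret: bytes, message: bytes) -> str:
    # HMAC mixes the key in twice (inner and outer hash), so knowing
    # one tag does not let an attacker extend the message.
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


key = b"illustrative-key"          # hypothetical key, for demonstration only
msg = b"amount=100&to=alice"

print(secret_prefix_mac(key, msg))  # 40 hex characters (SHA-1)
print(hmac_tag(key, msg))           # 64 hex characters (HMAC-SHA-256)
```

Switching the secret-prefix version to SHA-256 would not help, because the weakness lies in the construction rather than in the specific hash.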
Real-World Exploits and Risks
The dangers of using these algorithms are not theoretical:
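Certificate Forgery: MD5 collisions have been used to forge certificates. In 2008, researchers created a rogue CA certificate that browsers trusted, and in 2012 the Flame malware used an MD5 collision to forge a Microsoft code-signing certificate. Certificate authorities have since stopped issuing SHA-1 certificates as well, and in 2020 a SHA-1 chosen-prefix collision was used to demonstrate PGP key impersonation.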
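File Integrity Compromise: An attacker who controls both files can craft a benign file and a malicious file that share the same MD5 or SHA-1 digest, get the benign one approved, and later swap in the malicious one without failing the integrity check.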
Password Cracking: MD5 and SHA-1 are fast algorithms, making them susceptible to brute-force and dictionary attacks. Modern GPU clusters can perform billions of MD5 hashes per second.
Data Breaches: Leaked passwords hashed with MD5 or SHA-1 can be quickly cracked, exposing user credentials.
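To make the password-cracking risk concrete, here is a minimal sketch of a dictionary attack against an unsalted MD5 digest. The function name crack_unsalted_md5, the "leaked" digest, and the four-word list are all hypothetical; a real attack would run the same loop over billions of candidates on GPUs.

```python
import hashlib


def crack_unsalted_md5(target_hex, candidates):
    """Return the candidate whose MD5 digest equals target_hex, or None."""
    for word in candidates:
        # usedforsecurity=False (Python 3.9+) marks this as a demonstration,
        # not a security use of MD5.
        digest = hashlib.md5(word.encode("utf-8"), usedforsecurity=False).hexdigest()
        if digest == target_hex:
            return word
    return None


# Hypothetical leaked digest and a tiny illustrative wordlist
leaked = hashlib.md5(b"letmein", usedforsecurity=False).hexdigest()
wordlist = ["123456", "password", "letmein", "qwerty"]

print(crack_unsalted_md5(leaked, wordlist))  # prints "letmein"
```

Because the digest is unsalted and cheap to compute, every candidate costs a single hash. Dedicated password hashing functions make each guess deliberately expensive.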
When Insecure Hashing Is Used
Despite their known weaknesses, MD5 and SHA-1 appear in codebases for several reasons:
Legacy Compatibility: Older systems or protocols that require these algorithms
Non-Security Contexts: Checksums for duplicate detection or data integrity (not cryptographic)
Convenience: The algorithms are fast and readily available
Python’s usedforsecurity Parameter
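From Python 3.9 onward, hashlib constructors accept a keyword-only usedforsecurity parameter that defaults to True. On ordinary builds, MD5 and SHA-1 work either way. The flag matters in restricted environments such as FIPS-enabled systems, where these algorithms are blocked unless the caller passes usedforsecurity=False to declare a non-security use: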

    import hashlib

    # Succeeds on standard CPython builds; raises ValueError on
    # FIPS-restricted builds, where MD5 is blocked for security use
    md5_hash = hashlib.md5(b"data")

    # Explicit opt-out: declares a non-security use (Python 3.9+)
    md5_hash = hashlib.md5(b"data", usedforsecurity=False)

Critical Warning: Setting usedforsecurity=False explicitly acknowledges you are using the algorithm outside a security context (e.g., as a non-cryptographic checksum). This should never be done for password hashing, digital signatures, or any security-critical operation.
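To see how the flag behaves on a particular interpreter, a small probe like the following can help. The helper name md5_requires_opt_out is hypothetical, and the sketch assumes Python 3.9+ and that a restricted build signals the block with ValueError, as described above.

```python
import hashlib


def md5_requires_opt_out() -> bool:
    """Return True if this interpreter refuses MD5 without usedforsecurity=False."""
    try:
        hashlib.md5(b"probe")
    except ValueError:
        # Restricted (e.g., FIPS-enabled) build: MD5 is blocked by default
        return True
    return False


# The explicit opt-out is expected to work on both kinds of build
checksum = hashlib.md5(b"probe", usedforsecurity=False).hexdigest()

print("MD5 blocked by default:", md5_requires_opt_out())
print("Non-security checksum:", checksum)
```

On a typical desktop or CI installation the probe reports False, which is why code that omits the flag can pass local tests and then fail when deployed to a FIPS-enabled host.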
Preventive Measures
1. Use Secure Hashing Algorithms
Replace MD5 and SHA-1 with cryptographically strong alternatives:
| Use Case | Recommended Algorithm |
|---|---|
| Password hashing | Argon2id, bcrypt, scrypt, or PBKDF2 |
| File integrity | SHA-256, SHA-384, or SHA-512 |
| Digital signatures | SHA-256, SHA-384, or SHA-512 |
| Checksums (non-security) | MD5, SHA-1 (with usedforsecurity=False) |
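One way to enforce the table's integrity and signature rows in code is to route digest calls through a small allowlist wrapper. This is a sketch only: the names APPROVED_ALGORITHMS and secure_digest are hypothetical, and the allowlist simply mirrors the SHA-2 entries above.

```python
import hashlib

# Hypothetical allowlist mirroring the SHA-2 rows of the table
APPROVED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def secure_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hash data with an approved algorithm, rejecting anything else."""
    if algorithm not in APPROVED_ALGORITHMS:
        raise ValueError(f"{algorithm!r} is not approved for security use")
    return hashlib.new(algorithm, data).hexdigest()


print(secure_digest(b"release-1.0.tar.gz contents"))

try:
    secure_digest(b"data", algorithm="md5")
except ValueError as exc:
    print("Rejected:", exc)
```

A wrapper like this does not cover password hashing, which needs a dedicated function rather than any plain digest.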
2. Implement Password Hashing Correctly
For password storage, never use plain hashing algorithms like SHA-256 directly. Instead, use dedicated password hashing functions:

    import hashlib
    import os

    # passlib needs an Argon2 backend: pip install passlib argon2-cffi
    from passlib.hash import argon2

    # GOOD: Using a password hashing algorithm
    # Argon2id is the current gold standard
    password_hash = argon2.hash("user_password")

    # Or using hashlib's PBKDF2
    salt = os.urandom(32)  # Store the salt alongside the derived key
    key = hashlib.pbkdf2_hmac(
        'sha256',     # Hash algorithm
        b'password',  # Password
        salt,         # Salt
        600000,       # Iterations (OWASP minimum for PBKDF2-HMAC-SHA256)
        dklen=32      # Desired key length
    )

3. Use HMAC for Message Authentication
For message authentication, use HMAC with a secure hash algorithm:
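
    import hmac
    import hashlib

    # GOOD: HMAC with SHA-256
    secret_key = b'supersecretkey'  # Placeholder; load real keys from secure storage
    message = b'Important data'
    signature = hmac.new(secret_key, message, hashlib.sha256).hexdigest()

    # Verify with a constant-time comparison to avoid timing leaks
    expected = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    is_valid = hmac.compare_digest(signature, expected)

4. Explicitly Mark Non-Security Uses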
If you must use MD5 or SHA-1 for non-security purposes (e.g., checksums for duplicate detection), explicitly mark them as such:

    import hashlib

    import xxhash  # Third-party: pip install xxhash

    # ACCEPTABLE: Explicitly marking as non-security use
    # Used only for deduplication, not security-critical
    checksum = hashlib.md5(b"file_content", usedforsecurity=False).hexdigest()

    # BETTER: Consider using xxHash or other non-cryptographic hashes
    # for performance-critical checksums
    fast_checksum = xxhash.xxh64(b"file_content").hexdigest()

Example
Vulnerable Implementation

    import hashlib
    import hmac

    # VULNERABLE: Using MD5 for password hashing
    def store_password_md5(password):
        hash_obj = hashlib.md5(password.encode())
        return hash_obj.hexdigest()  # Broken: fast, unsalted digest

    # VULNERABLE: Using SHA-1 for integrity checks
    def verify_integrity_sha1(filename, expected_hash):
        with open(filename, 'rb') as f:
            content = f.read()
        actual_hash = hashlib.sha1(content).hexdigest()
        return actual_hash == expected_hash  # Weak: collisions are practical

    # VULNERABLE: Using MD5 for HMAC
    def create_hmac_md5(secret, message):
        return hmac.new(secret, message, hashlib.md5).hexdigest()  # Deprecated: use SHA-256

Secure Implementation

    import hashlib
    import hmac

    # passlib needs an Argon2 backend: pip install passlib argon2-cffi
    from passlib.hash import argon2

    # GOOD: Password hashing with Argon2
    def store_password_secure(password):
        # Argon2 won the Password Hashing Competition (2015);
        # the returned string embeds the salt and parameters
        return argon2.hash(password)

    # GOOD: Integrity checks with SHA-256
    def verify_integrity_secure(filename, expected_hash):
        BUF_SIZE = 65536  # Read in 64 KB chunks to bound memory use
        sha256 = hashlib.sha256()
        with open(filename, 'rb') as f:
            while chunk := f.read(BUF_SIZE):
                sha256.update(chunk)
        return sha256.hexdigest() == expected_hash

    # GOOD: HMAC with SHA-256
    def create_hmac_secure(secret, message):
        return hmac.new(secret, message, hashlib.sha256).hexdigest()

    # ACCEPTABLE: Non-security checksum (explicitly marked)
    def deduplicate_checksum(content):
        # Used only for duplicate detection, not security-critical
        return hashlib.md5(content, usedforsecurity=False).hexdigest()

Discussion
When Is It OK to Use MD5 or SHA-1?
The short answer is: almost never for security purposes. However, there are limited legitimate use cases:
Non-cryptographic checksums for duplicate detection (e.g., file deduplication)
Detection of accidental corruption in non-security-critical contexts (e.g., verifying a download against a checksum obtained over a trusted channel), where no attacker is assumed
Compatibility with legacy systems that cannot be updated
Educational or research purposes
Even in these cases:
Explicitly mark with usedforsecurity=False
Document why the insecure algorithm is being used
Consider using faster non-cryptographic hashes like xxHash or CityHash
The usedforsecurity Parameter Caveat
While usedforsecurity=False provides a way to use insecure algorithms, it creates a risk:
False Sense of Security: Developers might incorrectly believe that setting this flag somehow makes the algorithm secure
Accidental Misuse: Code might be copied into security-critical contexts without updating the algorithm
Maintenance Overhead: Future developers might not understand why the flag is set
Recommendation: When you see usedforsecurity=False in a code review, demand:
Justification for using an insecure algorithm
Documentation explaining the non-security use case
Consideration of whether a non-cryptographic hash would be more appropriate
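To support that kind of review, the calls in question can be located mechanically. The following is a minimal sketch using the standard ast module; the function name find_weak_hash_calls is hypothetical, and the scan only matches direct calls such as hashlib.md5(...) or sha1(...). It does not catch hashlib.new("md5") or a digest passed by reference (for example to hmac.new), so it supplements a manual review rather than replacing one.

```python
import ast

WEAK_NAMES = {"md5", "sha1"}


def find_weak_hash_calls(source: str):
    """Return sorted (line, algorithm, opted_out) tuples for md5/sha1 calls.

    opted_out is True when the call passes usedforsecurity=False literally.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr in WEAK_NAMES:
            name = func.attr      # e.g. hashlib.md5(...)
        elif isinstance(func, ast.Name) and func.id in WEAK_NAMES:
            name = func.id        # e.g. from hashlib import md5; md5(...)
        else:
            continue
        opted_out = any(
            kw.arg == "usedforsecurity"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is False
            for kw in node.keywords
        )
        findings.append((node.lineno, name, opted_out))
    return sorted(findings)


sample = (
    "import hashlib\n"
    "a = hashlib.md5(b'x')\n"
    "b = hashlib.sha1(b'y', usedforsecurity=False)\n"
    "c = hashlib.sha256(b'z')\n"
)
for line, algorithm, opted_out in find_weak_hash_calls(sample):
    print(f"line {line}: {algorithm} (usedforsecurity=False: {opted_out})")
```

Calls reported with opted_out set to True are the ones that need a documented justification; the others are candidates for replacement with SHA-256 or a dedicated password hashing function.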
