Table of Contents
- Pitfall 1: Mismanaging Time Zones with datetime
- Pitfall 2: Neglecting pathlib in Favor of os.path
- Pitfall 3: Regex Blunders in re (Greedy Matching & Raw Strings)
- Pitfall 4: Unhandled Non-Serializable Types in json
- Pitfall 5: Orphaned Temporary Files with tempfile
- Pitfall 6: Botched Logging Configuration
- Pitfall 7: Unsafe subprocess Usage
- Pitfall 8: urllib Timeouts and SSL Risks
- Pitfall 9: Itertools Iterator Exhaustion
- Conclusion
- References
Pitfall 1: Mismanaging Time Zones with datetime
The datetime module is essential for working with dates and times, but its handling of time zones is notoriously error-prone. A common mistake is using “naive” datetime objects (without time zone info) in applications that require time zone awareness, leading to bugs like incorrect comparisons or daylight saving time (DST) errors.
The Problem: Naive vs. Aware Datetime Objects
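A “naive” datetime object (e.g., datetime(2024, 3, 10, 2, 30)) has no concept of time zones or DST. Mixing it with a time zone-aware object fails in different ways depending on the operation. Ordering comparisons (<, >) and subtraction raise TypeError. Equality (==) silently returns False. Calling astimezone() on a naive object quietly assumes it is in the machine’s local time zone.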
Example: Comparing Naive and Aware Datetimes
from datetime import datetime
from zoneinfo import ZoneInfo # Python 3.9+; use `pytz` for older versions
# Naive datetime (no time zone)
naive_dt = datetime(2024, 3, 10, 2, 30)
# Aware datetime (New York time, which observes DST)
ny_tz = ZoneInfo("America/New_York")
aware_dt = datetime(2024, 3, 10, 2, 30, tzinfo=ny_tz)
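# Equality does not raise; it silently returns False, which can hide the bug
print(naive_dt == aware_dt)  # Output: False
# Ordering comparisons (and subtraction) raise TypeError
print(naive_dt < aware_dt)
# Output: TypeError: can't compare offset-naive and offset-aware datetimes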
The Fix: Use Time Zone-Aware Objects
Always use time zone-aware datetime objects when working across time zones. Python 3.9+ includes the zoneinfo module (backed by the system’s time zone data), and pytz is a popular third-party alternative for older versions.
Example: Creating and Comparing Aware Datetimes
from datetime import datetime
from zoneinfo import ZoneInfo
# Create aware datetimes for New York and UTC
ny_tz = ZoneInfo("America/New_York")
utc_tz = ZoneInfo("UTC")
ny_dt = datetime(2024, 1, 15, 9, 30, tzinfo=ny_tz)
utc_dt = datetime(2024, 1, 15, 14, 30, tzinfo=utc_tz)  # NY is UTC-5 in January (EST)
# Convert NY time to UTC so the comparison is explicit (aware datetimes in
# different zones also compare by absolute instant, so ny_dt == utc_dt is True too)
ny_dt_utc = ny_dt.astimezone(utc_tz)
print(ny_dt_utc == utc_dt)  # Output: True
Best Practices:
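- Always attach time zone info to datetime objects (use zoneinfo or pytz).
- Store times in UTC internally; convert to local time only for display.
- Watch for DST transitions. Some local times never occur (clocks spring forward) and others occur twice (clocks fall back). A time zone database supplies the correct offsets, and the fold attribute selects which occurrence of a repeated time you mean.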
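The DST caveats can be made concrete. This is a minimal sketch, assuming Python 3.9+ and that zoneinfo can find time zone data (the system database or the tzdata package). It shows a New York wall-clock time that occurs twice, disambiguated with fold, and one that never occurs at all. is_nonexistent is a hypothetical helper written for this illustration, not a standard-library function.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

def is_nonexistent(dt: datetime) -> bool:
    """Hypothetical helper: True if an aware local time falls in a DST gap.

    A real wall-clock time survives a round trip through UTC unchanged;
    a skipped time (e.g. 02:30 on a spring-forward day) comes back shifted.
    """
    roundtrip = dt.astimezone(timezone.utc).astimezone(dt.tzinfo)
    return roundtrip.replace(tzinfo=None) != dt.replace(tzinfo=None)

# 01:30 on 2024-11-03 happens twice in New York (clocks fall back at 02:00 EDT)
first = datetime(2024, 11, 3, 1, 30, tzinfo=NY)           # fold=0: earlier occurrence
second = datetime(2024, 11, 3, 1, 30, fold=1, tzinfo=NY)  # fold=1: later occurrence
print(first.utcoffset() == timedelta(hours=-4))   # True (EDT)
print(second.utcoffset() == timedelta(hours=-5))  # True (EST)

# 02:30 on 2024-03-10 never happens in New York (clocks spring forward at 02:00 EST)
print(is_nonexistent(datetime(2024, 3, 10, 2, 30, tzinfo=NY)))  # True
print(is_nonexistent(datetime(2024, 1, 15, 9, 30, tzinfo=NY)))  # False

# Store in UTC; convert to local time only for display
stored = datetime(2024, 7, 4, 16, 0, tzinfo=timezone.utc)
print(stored.astimezone(NY))  # 2024-07-04 12:00:00-04:00
```

The round-trip check works because a real wall-clock time converts to UTC and back unchanged. A skipped time comes back shifted by the size of the gap.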
Pitfall 2: Neglecting pathlib in Favor of os.path
File path handling is a core task, but many developers still rely on os.path functions (e.g., os.path.join, os.path.exists) instead of the modern pathlib module. While os.path works, it uses string-based paths that are error-prone and harder to read.
The Problem: String-Based Paths Are Fragile
os.path operates on plain strings, so nothing stops code from building paths with + or hard-coded separators. That invites errors like missing slashes or OS-specific separator issues (e.g., / vs. \).
Example: Error-Prone os.path Usage
import os
# Risky: manual string concatenation forgets the separator
data_dir = "/data"
file_path = data_dir + "output.txt"  # Oops! Becomes "/dataoutput.txt"
# Better, but still string-based:
file_path = os.path.join(data_dir, "output.txt")  # Correct: "/data/output.txt" on POSIX
# Checking if the file exists (another string-based call)
if os.path.exists(file_path):
    with open(file_path, "r") as f:
        ...
The Fix: Use pathlib for Object-Oriented Paths
pathlib (introduced in Python 3.4) wraps paths in objects, enabling method chaining, OS-agnostic handling, and cleaner code.
Example: Clean pathlib Usage
from pathlib import Path
data_dir = Path("/data")
file_path = data_dir / "output.txt" # Intuitive: uses OS-specific separators
# Check existence with a method, not a function
if file_path.exists():
    with file_path.open("r") as f:  # Open directly from the Path object
        ...
# Bonus: Easily get parent directory, file name, or suffix
print(file_path.parent)  # Output: /data (on POSIX)
print(file_path.name) # Output: output.txt
print(file_path.suffix) # Output: .txt
Best Practices:
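- Use pathlib.Path for path operations in new code.
- Build paths with the / operator (Path("/data") / "subdir" / "file.txt") instead of string concatenation.
- Prefer Path.open() over open(path) when you already hold a Path object, for consistency (open() accepts Path objects too).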
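Several other os.path and os chores have direct Path equivalents. The sketch below runs inside a throwaway directory, so it is safe to execute. It creates nested folders, writes and reads a file, swaps an extension, and filters with glob. save_report and the data/reports layout are hypothetical names chosen for the example.

```python
import tempfile
from pathlib import Path

def save_report(base: Path, text: str) -> Path:
    """Write text to base/data/reports/output.txt, creating folders as needed."""
    reports = base / "data" / "reports"          # Join with the / operator
    reports.mkdir(parents=True, exist_ok=True)   # Like os.makedirs(path, exist_ok=True)
    out = reports / "output.txt"
    out.write_text(text, encoding="utf-8")       # Open, write, and close in one call
    out.with_suffix(".bak").write_text(text, encoding="utf-8")  # Swap the extension
    return out

with tempfile.TemporaryDirectory() as tmp:
    out = save_report(Path(tmp), "hello\n")
    print(out.read_text(encoding="utf-8"), end="")           # hello
    print(sorted(p.name for p in out.parent.glob("*.txt")))  # ['output.txt']
    print(sorted(p.name for p in out.parent.iterdir()))      # ['output.bak', 'output.txt']
```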
Pitfall 3: Regex Blunders in re (Greedy Matching & Raw Strings)
The re module for regular expressions is powerful, but two common mistakes trip up developers: greedy quantifiers and forgetting raw strings.
Pitfall 3.1: Greedy vs. Non-Greedy Quantifiers
Regex quantifiers like * (match 0+ times) and + (match 1+ times) are “greedy” by default—they match as much as possible. This can lead to over-matching.
Example: Greedy Matching Gone Wrong
import re
text = "<div>First</div><div>Second</div>"
# Greedy: matches from the first '<' to the last '>'
greedy_match = re.search(r"<div>.*</div>", text)
print(greedy_match.group()) # Output: <div>First</div><div>Second</div> (too much!)
# Fix: Use non-greedy quantifier '.*?' (add '?')
non_greedy_match = re.search(r"<div>.*?</div>", text)
print(non_greedy_match.group()) # Output: <div>First</div> (correct)
Pitfall 3.2: Forgetting Raw Strings
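Regex patterns often contain backslashes (e.g., \d for digits, \b for a word boundary). In a normal string literal, Python processes backslash escapes before re ever sees the pattern. Unrecognized escapes like \d are kept but trigger a warning. Recognized ones like \b (backspace) silently change what the pattern means.
Example: Missing Raw Strings
import re
# '\d' is not a valid string escape: Python keeps the backslash but warns
# (DeprecationWarning; SyntaxWarning since 3.12). It works only by accident.
pattern = "\d{3}-\d{2}-\d{4}"
# '\b' IS a valid string escape (backspace), so this silently never matches a word
print(re.search("\bcat\b", "a cat sat"))  # Output: None
# With raw strings, backslashes reach the regex engine unchanged
pattern = r"\d{3}-\d{2}-\d{4}"  # Correct: matches SSN-like patterns (e.g., 123-45-6789)
print(re.search(r"\bcat\b", "a cat sat"))  # Output: <re.Match object; span=(2, 5), match='cat'>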
Best Practices:
- Use non-greedy quantifiers (*?, +?) when matching minimal text.
- Always write regex patterns as raw strings (r"...") to avoid escape character issues.
- Compile patterns with re.compile() for repeated use. This names the pattern and skips a cache lookup, though the speedup is usually modest because re already caches recently used patterns.
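Putting these practices together, a sketch with module-level compiled patterns might look like the following. The pattern names are illustrative. The negated character class [^<]* is shown as an alternative to a lazy quantifier, because it cannot run past a '<'. For real-world HTML, a proper parser (such as the standard library's html.parser) is more robust than any regex.

```python
import re

# Compiled once at import time and reused on every call
DIV_BODY = re.compile(r"<div>(.*?)</div>")            # Lazy: stops at the first </div>
DIV_BODY_STRICT = re.compile(r"<div>([^<]*)</div>")   # Body may not contain '<' at all
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")       # Raw string keeps \b a word boundary

def div_bodies(html: str) -> list:
    """Return the text of each <div>...</div> (illustrative, not an HTML parser)."""
    return DIV_BODY.findall(html)  # With one group, findall returns that group's text

text = "<div>First</div><div>Second</div>"
print(div_bodies(text))               # ['First', 'Second']
print(DIV_BODY_STRICT.findall(text))  # ['First', 'Second']
print(SSN_LIKE.findall("id 123-45-6789, ref 9123-45-67890"))  # ['123-45-6789']
```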
Pitfall 4: Unhandled Non-Serializable Types in json
The json module serializes Python objects to JSON, but out of the box it only supports dict, list, tuple, str, int, float, bool, and None (tuples are encoded as JSON arrays and decode back as lists). Trying to serialize other objects (e.g., datetime, set, custom classes) raises a TypeError.
The Problem: datetime and Other Non-Serializable Types
A common example is serializing a datetime object, which json.dumps cannot handle by default.
Example: Serialization Failure
import json
from datetime import datetime
data = {
"event": "login",
"timestamp": datetime(2024, 1, 1, 12, 0, 0) # Non-serializable!
}
json.dumps(data) # Raises TypeError: Object of type datetime is not JSON serializable
The Fix: Use the default Parameter
The json.dumps function accepts a default argument to handle non-serializable types. Define a custom serializer for unsupported objects.
Example: Serializing datetime with default
import json
from datetime import datetime
def serialize(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()  # Convert datetime to ISO 8601 string
    raise TypeError(f"Type {type(obj)} not serializable")
data = {"event": "login", "timestamp": datetime(2024, 1, 1, 12, 0, 0)}
json_str = json.dumps(data, default=serialize)
print(json_str)
# Output: {"event": "login", "timestamp": "2024-01-01T12:00:00"}
Best Practices:
- Use default to serialize custom/non-serializable types.
- Document serialized formats (e.g., ISO 8601 for datetimes).
- Use json.loads with object_hook to deserialize back to Python objects.
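A round trip using both hooks could look like this minimal sketch. It assumes, purely as a convention for this example, that any key named "timestamp" holds an ISO 8601 string. revive is a hypothetical helper, and encoding sets as sorted lists is one choice among several.

```python
import json
from datetime import datetime

def serialize(obj):
    """default= hook: called only for objects json can't encode natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)  # JSON has no set type; sorting keeps output stable
    raise TypeError(f"Type {type(obj).__name__} not serializable")

def revive(d):
    """object_hook: called with every decoded JSON object (a dict).

    Assumption for this sketch: a key named "timestamp" holds an ISO 8601 string.
    """
    if "timestamp" in d:
        d["timestamp"] = datetime.fromisoformat(d["timestamp"])
    return d

data = {"event": "login", "timestamp": datetime(2024, 1, 1, 12, 0, 0), "tags": {"web", "admin"}}
text = json.dumps(data, default=serialize)
print(text)  # {"event": "login", "timestamp": "2024-01-01T12:00:00", "tags": ["admin", "web"]}
restored = json.loads(text, object_hook=revive)
print(restored["timestamp"] == data["timestamp"])  # True
```

The set comes back as a list. JSON has no set type, so object_hook would need its own convention (for example, a known key name) to restore it.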
Pitfall 5: Orphaned Temporary Files with tempfile
The tempfile module creates temporary files/directories, but improper usage can leave files orphaned, wasting disk space or causing security risks.
The Problem: Not Cleaning Up Temporary Files
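NamedTemporaryFile deletes its file when the file object is closed (delete=True is the default). If an exception skips the close() call, deletion is left to garbage collection. The file can then linger on disk for an unpredictable time, and it is never removed if the process is killed first.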
Example: Orphaned Temp File
import tempfile
# Risky: file may not close if an error occurs
temp_file = tempfile.NamedTemporaryFile(mode="w+")
temp_file.write("sensitive data")
# ... if an exception is raised here, temp_file is not closed ...
temp_file.close() # Manual close (error-prone)
The Fix: Use Context Managers
The with statement ensures temporary files are closed and deleted automatically, even if an error occurs.
Example: Safe Temp File Handling
import tempfile
with tempfile.NamedTemporaryFile(mode="w+", delete=True) as temp_file: # delete=True is default
temp_file.write("sensitive data")
temp_file.seek(0)
print(temp_file.read()) # Read back data
# File is automatically closed and deleted here
Best Practices:
- Always use with statements for temporary files/directories.
- Avoid delete=False unless you explicitly need the file to persist.
- Use tempfile.TemporaryDirectory for temporary directories (also context-manager-friendly).
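TemporaryDirectory removes the folder and everything in it when the with block exits, even if an exception was raised inside it. Here is a minimal sketch. process_in_scratch_dir is a hypothetical function, and it returns the (by then deleted) path only so the cleanup can be observed.

```python
import tempfile
from pathlib import Path

def process_in_scratch_dir(payload: str) -> tuple[str, Path]:
    """Do work in a scratch directory that is removed, with its contents, on exit."""
    with tempfile.TemporaryDirectory(prefix="job-") as tmp:
        scratch = Path(tmp)
        (scratch / "input.txt").write_text(payload, encoding="utf-8")
        (scratch / "nested").mkdir()
        (scratch / "nested" / "part.txt").write_text("partial", encoding="utf-8")
        result = (scratch / "input.txt").read_text(encoding="utf-8").upper()
    # Leaving the with block deleted tmp recursively, nested files included
    return result, scratch

result, scratch = process_in_scratch_dir("sensitive data")
print(result)            # SENSITIVE DATA
print(scratch.exists())  # False
```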
Pitfall 6: Botched Logging Configuration
The logging module is critical for debugging, but misconfiguration is rampant. Common issues include missing logs, duplicate output, or unhandled exceptions.
The Problem: Default Log Level and Misplaced basicConfig
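By default, the root logger only shows messages of level WARNING or higher, so DEBUG and INFO messages silently disappear unless you set a level. In addition, logging.basicConfig does nothing if the root logger already has handlers. Module-level calls such as logging.warning() add a default handler automatically when none exists, so a basicConfig call placed after the first log message is silently ignored (unless you pass force=True, available since Python 3.8).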
Example: Missing Logs Due to Default Level
import logging
logging.debug("Debug message") # Not shown (default level is WARNING)
logging.info("Info message") # Not shown
logging.warning("Warning message") # Shown
The Fix: Configure Logging Early
Set the log level explicitly with basicConfig before logging messages.
Example: Proper Logging Setup
import logging
# Configure logging FIRST (level=DEBUG to show all messages)
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)s - %(levelname)s - %(message)s"
)
logging.debug("Debug message") # Now shown
logging.info("Info message") # Now shown
Best Practices:
- Call logging.basicConfig at the start of your application.
- Use logging.getLogger(__name__) for module-specific logging.
- Avoid print() for debugging; use logging.debug() instead.
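These practices combine into a small setup sketch. It assumes Python 3.8+ for force=True and logs to an in-memory buffer only so the output can be inspected. It also uses a hypothetical parse_port function to show logger.exception, which logs at ERROR level and appends the current traceback.

```python
import io
import logging

logger = logging.getLogger(__name__)  # One named logger per module

def configure_logging(level=logging.DEBUG, stream=None):
    """Configure the root logger once, at application start-up.

    force=True (Python 3.8+) removes handlers already attached to the root
    logger, so this still takes effect if an earlier logging.warning() call
    triggered an implicit basicConfig().
    """
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        stream=stream,  # None means sys.stderr
        force=True,
    )

def parse_port(value: str) -> int:
    """Hypothetical example function."""
    try:
        return int(value)
    except ValueError:
        # logger.exception logs at ERROR level and includes the traceback
        logger.exception("Invalid port %r", value)
        return 8080  # Hypothetical fallback for this sketch

buffer = io.StringIO()
configure_logging(stream=buffer)
logger.debug("Debug message")
parse_port("http")
print(buffer.getvalue())
```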
Pitfall 7: Unsafe subprocess Usage
The subprocess module runs external commands, but risky practices like shell=True or unvalidated input can lead to security vulnerabilities or deadlocks.
The Problem: shell=True and Shell Injection
Using shell=True spawns a shell to run the command, which is convenient but dangerous with untrusted input (e.g., user-provided arguments).
Example: Shell Injection Risk
import subprocess
user_input = "; rm -rf /" # Malicious input
subprocess.run(f"ls {user_input}", shell=True)  # Shell runs "ls ; rm -rf /" (disaster!)
The Fix: Use shell=False and Argument Lists
Avoid shell=True unless necessary. Pass commands as lists to prevent shell injection.
Example: Safe Subprocess Call
import subprocess
# Safe: pass args as a list (no shell parsing)
subprocess.run(["ls", "/tmp"], shell=False) # No risk of injection
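# Use check=True to raise an error if the command exits with a non-zero status
try:
    subprocess.run(["ls", "/nonexistent-dir"], check=True, shell=False)
except subprocess.CalledProcessError as e:
    print(f"Command failed: {e}")
except FileNotFoundError:
    # A missing executable raises FileNotFoundError, not CalledProcessError
    print("Command not found")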
Best Practices:
- Use shell=False and pass arguments as a list.
- Set timeout to prevent hanging commands.
- Use check=True to catch failed commands early.
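A helper that applies all three practices might look like the sketch below. run_tool is a hypothetical name. sys.executable (the running Python interpreter) stands in for an arbitrary external program so the example runs on any platform. When timeout expires, subprocess.run kills the child process and raises TimeoutExpired.

```python
import subprocess
import sys

def run_tool(args: list, timeout: float = 5.0) -> str:
    """Hypothetical helper: run a command without a shell and return its stdout."""
    result = subprocess.run(
        args,
        capture_output=True,  # Collect stdout/stderr instead of inheriting them
        text=True,            # Decode output bytes to str
        check=True,           # Non-zero exit status -> CalledProcessError
        timeout=timeout,      # Child is killed and TimeoutExpired raised after this
    )
    return result.stdout

# A child program that just prints its first argument
echo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Shell metacharacters inside a list element are passed through as plain text
print(run_tool(echo + ["; rm -rf /"]), end="")  # ; rm -rf /

try:
    run_tool([sys.executable, "-c", "import time; time.sleep(10)"], timeout=0.5)
except subprocess.TimeoutExpired as e:
    print(f"Timed out after {e.timeout} seconds")  # Timed out after 0.5 seconds
```

Because the arguments are passed as a list, "; rm -rf /" reaches the child as one harmless argument and is never interpreted by a shell.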
Pitfall 8: urllib Timeouts and SSL Risks
The urllib module handles HTTP requests, but missing timeouts or disabled SSL verification can lead to hanging processes or security breaches.
The Problem: No Timeout and Disabled SSL Verification
Without a timeout, urllib.request.urlopen can hang indefinitely. Disabling SSL verification (e.g., to bypass self-signed certificates) exposes you to man-in-the-middle attacks.
Example: Risky urllib Usage
from urllib.request import urlopen
import ssl
# No timeout: request may hang forever
response = urlopen("https://slow-api.example.com")
# Disabling SSL verification (unsafe!)
context = ssl._create_unverified_context() # Bypasses certificate checks
response = urlopen("https://untrusted-site.example.com", context=context)
The Fix: Enforce Timeouts and Validate SSL
Always set a timeout and never disable SSL verification in production.
Example: Safe urllib Usage
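import socket
from urllib.error import URLError
from urllib.request import urlopen
# timeout (in seconds) applies to each blocking socket operation (connect, read),
# not to the request as a whole
try:
    with urlopen("https://api.example.com", timeout=10) as response:
        print(response.read())
except socket.timeout:  # Same class as TimeoutError on Python 3.10+
    print("Request timed out")
except URLError as e:  # Connection errors; a connect timeout arrives wrapped here
    print(f"Request failed: {e.reason}")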
Best Practices:
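- Always include timeout for network requests.
- Keep certificate verification on. urlopen verifies certificates by default, and any custom context should start from ssl.create_default_context().
- For self-signed certificates (development only), use a custom context that trusts your CA certificate instead of disabling verification.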
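Here is a minimal sketch of the development-CA approach. It assumes you have a PEM file for the CA that signed your self-signed certificate; the URL and the dev-ca.pem path below are hypothetical. When cafile is given, ssl.create_default_context trusts only that bundle instead of the system store, while certificate and hostname checks stay switched on.

```python
import ssl
from typing import Optional
from urllib.request import urlopen

def make_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying TLS context.

    With cafile=None the system CA store is used; with a cafile, only the
    CAs in that file are trusted. Verification itself is never disabled.
    """
    return ssl.create_default_context(cafile=cafile)

def fetch(url: str, cafile: Optional[str] = None, timeout: float = 10.0) -> bytes:
    """Hypothetical helper combining a timeout with a verifying context."""
    with urlopen(url, timeout=timeout, context=make_context(cafile)) as response:
        return response.read()

ctx = make_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
# fetch("https://dev.internal.example/health", cafile="dev-ca.pem")  # hypothetical
```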
Pitfall 9: Itertools Iterator Exhaustion
The itertools module provides efficient iterators, but many developers forget that iterators are exhausted after one use, leading to unexpected empty results.
The Problem: Iterators Are Single-Pass
Functions like itertools.chain, itertools.product, or itertools.combinations return iterators, which are consumed after the first iteration.
Example: Exhausted Iterator
import itertools
# Chain two lists into an iterator
numbers = itertools.chain([1, 2, 3], [4, 5, 6])
# First iteration works
print(list(numbers)) # Output: [1, 2, 3, 4, 5, 6]
# Second iteration: iterator is exhausted
print(list(numbers)) # Output: []
The Fix: Convert to List for Multiple Passes
If you need to iterate multiple times, convert the iterator to a list first.
Example: Preserving Data with Lists
import itertools
# Convert iterator to list for multiple passes
numbers = list(itertools.chain([1, 2, 3], [4, 5, 6]))
print(numbers)  # Output: [1, 2, 3, 4, 5, 6]
print(numbers)  # Output: [1, 2, 3, 4, 5, 6] (still works)
Best Practices:
- Remember that iterators are single-pass.
- Convert to a list if you need to reuse the data.
- Use itertools.tee to create multiple independent iterators from one.
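The classic pairwise pattern shows where tee fits: it walks one iterator while a second copy stays one step ahead (Python 3.10+ also ships this as itertools.pairwise). The itertools documentation also says the original iterator should not be used after calling tee; this sketch shows why.

```python
import itertools

def pairwise(iterable):
    """Yield overlapping pairs (s0, s1), (s1, s2), ... in a single pass."""
    a, b = itertools.tee(iterable)  # Two independent iterators over one source
    next(b, None)                   # Advance the second copy by one item
    return zip(a, b)

readings = (x * x for x in range(5))  # A generator: single-pass
print(list(pairwise(readings)))       # [(0, 1), (1, 4), (4, 9), (9, 16)]
print(list(readings))                 # [] because tee already drained the source
```

tee buffers every item one copy has seen and the other has not, so it suits copies that advance roughly together. If one copy will consume most of the data before the other starts, a list is simpler and usually faster.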
Conclusion
Python’s standard library is a powerful tool, but its depth means even experienced developers can stumble over hidden behaviors. By avoiding these pitfalls—whether mismanaging time zones, neglecting pathlib, or misconfiguring logging—you’ll write more reliable, secure, and efficient code.
Always consult the Python Standard Library Documentation for module-specific details, and test edge cases rigorously. With careful usage, the standard library will remain your most trusted ally in Python development.
References
- Python Standard Library Documentation
- datetime — Basic date and time types
- pathlib — Object-oriented filesystem paths
- re — Regular expression operations
- json — JSON encoder and decoder
- tempfile — Generate temporary files and directories
- logging — Logging facility for Python
- subprocess — Subprocess management
- urllib — URL handling modules
- itertools — Functions creating iterators for efficient looping