Table of Contents#
- Understanding urllib2 and URLError
- What is urllib2?
- What is URLError?
- HTTPError as a Subclass of URLError
- Basic urllib2 Request and Exception Handling
- Example: Making a Request
- Catching URLError: A Basic Approach
- Retrieving HTTP Status Code from URLError
- Checking for HTTPError Within URLError
- Extracting the Status Code (code attribute)
- Handling Non-HTTP URLErrors
- Advanced Examples and Scenarios
- Example 1: Handling 404 Not Found
- Example 2: Handling 500 Internal Server Error
- Example 3: Network Failure (No Status Code)
- Best Practices for Error Handling in urllib2
- Conclusion
- References
Understanding urllib2 and URLError#
What is urllib2?#
urllib2 is a Python 2 standard library module used to open URLs (e.g., HTTP, HTTPS, FTP). It provides a high-level interface for making requests, handling cookies, and managing authentication. While Python 3 replaced urllib2 with urllib.request and urllib.error, urllib2 remains relevant for legacy Python 2 codebases.
What is URLError?#
URLError is an exception raised by urllib2 when a request fails to reach the server or encounters a generic network/URL issue. Common causes include:
- Network failures (e.g., no internet connection).
- Invalid URLs (e.g., malformed domain names).
- DNS resolution errors (e.g., "could not resolve host").
HTTPError as a Subclass of URLError#
HTTPError is a subclass of URLError that specifically occurs when the server responds with an HTTP error status code (e.g., 404, 500). Unlike generic URLErrors, HTTPError carries detailed information about the server’s response, including:
- code: The HTTP status code (e.g., 404, 500).
- reason: A human-readable status message (e.g., "Not Found", "Internal Server Error").
- headers: The HTTP response headers.
This distinction is critical: only HTTPError instances contain an HTTP status code. Generic URLErrors (e.g., network issues) do not.
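The relationship can be checked without any network access by building the exceptions by hand. This is a minimal sketch: the URL and the `describe` helper are illustrative names, and the `try`/`except ImportError` import is only there so the same snippet also runs on Python 3, where these classes live in `urllib.error`.

```python
try:
    import urllib2 as urlerr           # Python 2
except ImportError:
    import urllib.error as urlerr      # Python 3 fallback (same class names)


def describe(exc):
    """Return (status_code_or_None, reason) for a URLError or HTTPError."""
    # Only HTTPError carries .code; a plain URLError has .reason only.
    return getattr(exc, "code", None), exc.reason


# Hand-built exceptions (hypothetical URL; no request is sent).
http_exc = urlerr.HTTPError("http://example.com/missing", 404, "Not Found", {}, None)
net_exc = urlerr.URLError("connection refused")

print(issubclass(urlerr.HTTPError, urlerr.URLError))  # True
print(describe(http_exc))                             # (404, 'Not Found')
print(describe(net_exc))                              # (None, 'connection refused')
```

Using `getattr` with a default is one way to read the status code safely; an explicit `isinstance` check expresses the same distinction.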
Basic urllib2 Request and Exception Handling#
Example: Making a Request#
To use urllib2, you typically call urllib2.urlopen(url) to send a GET request. For successful responses (2xx status codes), this returns a response object, from which you can retrieve the status code with response.getcode().
Catching URLError: A Basic Approach#
If the request fails, urllib2.urlopen() raises an exception. A basic error-handling pattern might catch URLError to handle all request-related issues:
    import urllib2

    url = "http://example.com/invalid-page"  # This URL may return 404
    try:
        response = urllib2.urlopen(url)
        # Reached only for successful (2xx) responses
        print("Request succeeded! Status code: {}".format(response.getcode()))
    except urllib2.URLError as e:
        print("Request failed with URLError: {}".format(e))

Output (for a 404 response):

    Request failed with URLError: HTTP Error 404: Not Found
While this catches the error, it does not extract the HTTP status code (e.g., 404). To get the code, we need to check if the URLError is actually an HTTPError.
Retrieving HTTP Status Code from URLError#
Checking for HTTPError Within URLError#
Since HTTPError is a subclass of URLError, any HTTPError raised will also be caught by a URLError handler. To retrieve the status code, we first check if the exception is an instance of HTTPError using isinstance(e, urllib2.HTTPError).
Extracting the Status Code (code attribute)#
If the exception is an HTTPError, we access its code attribute to get the status code:
    import urllib2

    url = "http://example.com/invalid-page"
    try:
        response = urllib2.urlopen(url)
        print("Success! Status code: {}".format(response.getcode()))
    except urllib2.URLError as e:
        if isinstance(e, urllib2.HTTPError):
            # It's an HTTPError: extract status code and reason
            print("HTTP Error {}: {}".format(e.code, e.reason))
        else:
            # Generic URLError (no status code)
            print("Non-HTTP URLError: {}".format(e.reason))

Output (for 404):

    HTTP Error 404: Not Found
Handling Non-HTTP URLErrors#
Not all URLErrors are HTTPErrors. For example, a network failure (e.g., no internet) will raise a generic URLError with no code attribute. In such cases, we handle the error using the reason attribute (e.g., "[Errno 11001] getaddrinfo failed"):

    import urllib2

    url = "http://invalid-domain.xyz"  # Invalid domain (DNS failure)
    try:
        response = urllib2.urlopen(url)
    except urllib2.URLError as e:
        if isinstance(e, urllib2.HTTPError):
            print("HTTP Error {}: {}".format(e.code, e.reason))
        else:
            print("Non-HTTP URLError: {}".format(e.reason))  # No status code

Output (on Windows; the message text varies by platform):

    Non-HTTP URLError: [Errno 11001] getaddrinfo failed
Advanced Examples and Scenarios#
Example 1: Handling 404 Not Found#
A 404 error occurs when the requested resource does not exist. Here’s how to extract its status code:
    import urllib2

    url = "http://httpstat.us/404"  # Test URL that returns 404
    try:
        urllib2.urlopen(url)
    except urllib2.URLError as e:
        if isinstance(e, urllib2.HTTPError):
            print("Caught 404! Status code: {}, Reason: {}".format(e.code, e.reason))
        else:
            print("Other error: {}".format(e.reason))

Output:

    Caught 404! Status code: 404, Reason: Not Found
Example 2: Handling 500 Internal Server Error#
A 500 error indicates a server-side issue. The same pattern applies:
    import urllib2

    url = "http://httpstat.us/500"  # Test URL that returns 500
    try:
        urllib2.urlopen(url)
    except urllib2.URLError as e:
        if isinstance(e, urllib2.HTTPError):
            print("Server error! Status code: {}, Reason: {}".format(e.code, e.reason))
        else:
            print("Other error: {}".format(e.reason))

Output:

    Server error! Status code: 500, Reason: Internal Server Error
Example 3: Network Failure (No Status Code)#
If the request fails due to a network issue (e.g., no Wi-Fi), no HTTP status code exists. The URLError will not be an HTTPError:
    import urllib2

    url = "http://example.com"  # Valid URL, but disconnect your internet first!
    try:
        urllib2.urlopen(url)
    except urllib2.URLError as e:
        if isinstance(e, urllib2.HTTPError):
            print("HTTP Error: {}".format(e.code))
        else:
            print("No status code available. Error: {}".format(e.reason))

Example output (the exact message depends on the platform and the kind of failure):

    No status code available. Error: [Errno 10061] No connection could be made because the target machine actively refused it
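The three examples share one pattern, which could be folded into a small helper. The sketch below is illustrative only: `fetch_status`, its `opener` parameter, and the `refuse` stand-in are hypothetical names introduced here so the behavior can be exercised without a network connection. The import fallback lets the snippet also run on Python 3.

```python
try:
    import urllib2 as request_mod            # Python 2
    from urllib2 import HTTPError, URLError
except ImportError:
    import urllib.request as request_mod     # Python 3 fallback
    from urllib.error import HTTPError, URLError


def fetch_status(url, opener=None):
    """Return (status_code, error_reason).

    status_code is None when no HTTP response arrived;
    error_reason is None when the request succeeded.
    """
    opener = opener or request_mod.urlopen
    try:
        response = opener(url)
        return response.getcode(), None
    except HTTPError as e:    # the server answered with an error status
        return e.code, e.reason
    except URLError as e:     # the request never produced an HTTP response
        return None, e.reason


def refuse(url):
    # Stand-in opener (hypothetical) that simulates a network failure.
    raise URLError("connection refused")


print(fetch_status("http://example.com", opener=refuse))  # (None, 'connection refused')
```

Returning `None` for the status code keeps the "no HTTP response" case distinct from any real status, so callers do not need to inspect the exception type themselves.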
Best Practices for Error Handling in urllib2#
- Catch Specific Exceptions First: Since HTTPError is a subclass of URLError, catch HTTPError before URLError if you want separate handling for HTTP errors and generic issues:

      try:
          urllib2.urlopen(url)
      except urllib2.HTTPError as e:  # Catch HTTP errors first
          print("HTTP Error {}: {}".format(e.code, e.reason))
      except urllib2.URLError as e:  # Then catch generic URLErrors
          print("URL Error: {}".format(e.reason))

- Log Details: Always log the code, reason, and headers (for HTTPError) to debug issues later.
- Handle Retries for 5xx Errors: For server-side errors (5xx), consider retrying the request with a backoff strategy.
- Validate URLs Before Requests: Check for malformed URLs before sending a request. A URL with no scheme (e.g., missing http://) makes urlopen raise ValueError rather than URLError, so an except URLError handler will not catch it.
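As a rough illustration of the retry advice, the sketch below retries only 5xx responses and doubles the delay after each failed attempt. `open_with_retry` and its parameters (`attempts`, `base_delay`, `opener`, `sleep`) are hypothetical names chosen for this example, and the delay values are arbitrary assumptions. The import fallback lets the snippet also run on Python 3.

```python
import time

try:
    import urllib2 as request_mod            # Python 2
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as request_mod     # Python 3 fallback
    from urllib.error import HTTPError


def open_with_retry(url, attempts=3, base_delay=1.0, opener=None, sleep=time.sleep):
    """Open url, retrying only on 5xx responses with exponential backoff.

    attempts is assumed to be at least 1.
    """
    opener = opener or request_mod.urlopen
    for attempt in range(attempts):
        try:
            return opener(url)
        except HTTPError as e:
            # Give up on client errors (4xx) and on the final attempt.
            if e.code < 500 or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... with the defaults
```

A plain URLError is deliberately not caught here, so network failures propagate to the caller unchanged; whether those should also be retried depends on the application.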
Conclusion#
Retrieving the HTTP status code from urllib2.URLError exceptions requires distinguishing between generic URLErrors and HTTPError subclasses. By checking if the exception is an instance of HTTPError, you can access its code attribute to get the status code. For non-HTTP URLErrors (e.g., network issues), no status code exists, and you should handle the error using the reason attribute.
This approach ensures your application gracefully handles both HTTP-specific errors and generic network issues, making it more robust and user-friendly.
References#
- Python 2.7 urllib2 Documentation: https://docs.python.org/2/library/urllib2.html
- URLError Class: https://docs.python.org/2/library/urllib2.html#urllib2.URLError
- HTTPError Class: https://docs.python.org/2/library/urllib2.html#urllib2.HTTPError
- HTTP Status Codes: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status