py4u blog

How to Retrieve HTTP Status Code When Handling urllib2 URLError Exceptions in Python

When working with HTTP requests in Python, handling errors gracefully is critical for building robust applications. The urllib2 library (a staple in Python 2 for making HTTP requests) provides tools to send requests and handle responses, but exceptions like URLError can be tricky to parse—especially when you need to extract specific details like the HTTP status code (e.g., 404 for "Not Found" or 500 for "Internal Server Error").

While URLError is a broad exception that covers various issues (e.g., network failures, invalid URLs), HTTP-specific errors (like 4xx or 5xx responses) are encapsulated in a subclass called HTTPError. The key challenge is distinguishing between generic URLErrors and HTTPErrors to retrieve the status code.

This post walks through urllib2's exceptions, shows how to tell HTTPError apart from other URLErrors, and demonstrates how to extract the HTTP status code.

2026-01

Table of Contents#

  1. Understanding urllib2 and URLError
    • What is urllib2?
    • What is URLError?
    • HTTPError as a Subclass of URLError
  2. Basic urllib2 Request and Exception Handling
    • Example: Making a Request
    • Catching URLError: A Basic Approach
  3. Retrieving HTTP Status Code from URLError
    • Checking for HTTPError Within URLError
    • Extracting the Status Code (code attribute)
    • Handling Non-HTTP URLErrors
  4. Advanced Examples and Scenarios
    • Example 1: Handling 404 Not Found
    • Example 2: Handling 500 Internal Server Error
    • Example 3: Network Failure (No Status Code)
  5. Best Practices for Error Handling in urllib2
  6. Conclusion
  7. References

Understanding urllib2 and URLError#

What is urllib2?#

urllib2 is a Python 2 standard library module used to open URLs (e.g., HTTP, HTTPS, FTP). It provides a high-level interface for making requests, handling cookies, and managing authentication. Python 3 split urllib2 into urllib.request and urllib.error, and Python 2 itself reached end of life in January 2020, but urllib2 remains relevant for legacy Python 2 codebases.

What is URLError?#

URLError is an exception raised by urllib2 when a request fails to reach the server or encounters a generic network/URL issue. Common causes include:

  • Network failures (e.g., no internet connection).
  • Invalid URLs (e.g., malformed domain names). Note that a URL with no scheme at all raises ValueError, not URLError.
  • DNS resolution errors (e.g., "could not resolve host").

HTTPError as a Subclass of URLError#

HTTPError is a subclass of URLError that is raised when the server responds with an HTTP error status code (e.g., 404, 500). Unlike generic URLErrors, HTTPError carries detailed information about the server’s response, including:

  • code: The HTTP status code (e.g., 404, 500).
  • reason: A human-readable status message (e.g., "Not Found", "Internal Server Error"), also available as msg.
  • headers: The HTTP response headers.

This distinction is critical: only HTTPError instances contain an HTTP status code. Generic URLErrors (e.g., network issues) do not.
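The relationship can be checked without touching the network by building the exceptions by hand. This is only an illustrative sketch: the URL and the hand-constructed exception objects are made up for the demonstration, and the fallback import from urllib.error is an addition so the snippet also runs on Python 3.

```python
import io

try:
    import urllib2  # Python 2
    HTTPError, URLError = urllib2.HTTPError, urllib2.URLError
except ImportError:  # Python 3: the same classes live in urllib.error
    from urllib.error import HTTPError, URLError

# Hand-built exceptions (hypothetical URL, no request is sent).
http_err = HTTPError("http://example.com/missing", 404, "Not Found", {}, io.BytesIO(b""))
url_err = URLError("connection refused")

print(issubclass(HTTPError, URLError))  # True: every HTTPError is a URLError
print(isinstance(http_err, URLError))   # True: "except URLError" catches it
print(http_err.code)                    # 404
print(hasattr(url_err, "code"))         # False: a generic URLError has no status code
```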

Basic urllib2 Request and Exception Handling#

Example: Making a Request#

To use urllib2, you typically call urllib2.urlopen(url) to send a GET request. For successful responses (2xx status codes), this returns a response object, from which you can retrieve the status code with response.getcode().
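As a rough sketch of the success path, the helper below opens a URL and returns the status code. The name fetch_status and its opener parameter are hypothetical; opener defaults to urlopen and exists only so the function can be tried without a network connection. The fallback import is an addition for Python 3.

```python
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent


def fetch_status(url, opener=urlopen):
    """Open url and return the HTTP status code of the response.

    Hypothetical helper: opener is a parameter only so that a stand-in
    can replace urlopen when no network is available.
    """
    response = opener(url)
    try:
        return response.getcode()
    finally:
        response.close()  # release the underlying connection
```

Calling fetch_status("http://example.com") should return 200 for a reachable page. Error responses never reach the return statement, because urlopen raises an exception instead.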

Catching URLError: A Basic Approach#

If the request fails, urllib2.urlopen() raises an exception. A basic error-handling pattern might catch URLError to handle all request-related issues:

import urllib2
 
url = "http://example.com/invalid-page"  # This URL may return 404
 
try:
    response = urllib2.urlopen(url)
    # .format() keeps the output identical on Python 2, where print is a statement
    print("Request succeeded! Status code: {}".format(response.getcode()))  # Works for 2xx codes
except urllib2.URLError as e:
    print("Request failed with URLError: {}".format(e))

Output (for a 404 response):

Request failed with URLError: HTTP Error 404: Not Found

While this catches the error, it does not extract the HTTP status code (e.g., 404). To get the code, we need to check if the URLError is actually an HTTPError.

Retrieving HTTP Status Code from URLError#

Checking for HTTPError Within URLError#

Since HTTPError is a subclass of URLError, any HTTPError raised will also be caught by a URLError handler. To retrieve the status code, we first check if the exception is an instance of HTTPError using isinstance(e, urllib2.HTTPError).

Extracting the Status Code (code attribute)#

If the exception is an HTTPError, we access its code attribute to get the status code:

import urllib2
 
url = "http://example.com/invalid-page"
 
try:
    response = urllib2.urlopen(url)
    print("Success! Status code: {}".format(response.getcode()))
except urllib2.URLError as e:
    if isinstance(e, urllib2.HTTPError):
        # It's an HTTPError: extract status code and reason
        # (str.format is used because Python 2 has no f-strings)
        print("HTTP Error {}: {}".format(e.code, e.reason))
    else:
        # Generic URLError (no status code)
        print("Non-HTTP URLError: {}".format(e.reason))

Output (for 404):

HTTP Error 404: Not Found
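If the same check is needed in several places, it could be wrapped in a small helper. The sketch below is one possible shape, not a urllib2 API: status_code_of is a hypothetical name, the exceptions are built by hand so no request is sent, and the fallback import is an addition for Python 3.

```python
import io

try:
    from urllib2 import HTTPError, URLError  # Python 2
except ImportError:
    from urllib.error import HTTPError, URLError  # Python 3 equivalent


def status_code_of(exc):
    """Return the HTTP status code carried by exc, or None if it has none."""
    if isinstance(exc, HTTPError):
        return exc.code
    return None


# Hand-built exceptions stand in for real failures (no request is sent).
not_found = HTTPError("http://example.com/invalid-page", 404, "Not Found", {}, io.BytesIO(b""))
dns_failure = URLError("getaddrinfo failed")

print(status_code_of(not_found))    # 404
print(status_code_of(dns_failure))  # None
```

Returning None for the non-HTTP case lets callers write a single `if code is not None` test instead of repeating the isinstance check.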

Handling Non-HTTP URLErrors#

Not all URLErrors are HTTPErrors. For example, a network failure (e.g., no internet) raises a generic URLError with no code attribute. In such cases, handle the error through the reason attribute, which is usually the underlying socket error and prints as something like "[Errno 11001] getaddrinfo failed":

import urllib2
 
url = "http://invalid-domain.xyz"  # Invalid domain (DNS failure)
 
try:
    response = urllib2.urlopen(url)
except urllib2.URLError as e:
    if isinstance(e, urllib2.HTTPError):
        print("HTTP Error {}: {}".format(e.code, e.reason))
    else:
        print("Non-HTTP URLError: {}".format(e.reason))  # No status code

Output (on Windows; the exact message varies by platform):

Non-HTTP URLError: [Errno 11001] getaddrinfo failed

Advanced Examples and Scenarios#

Example 1: Handling 404 Not Found#

A 404 error occurs when the requested resource does not exist. Here’s how to extract its status code:

import urllib2
 
url = "http://httpstat.us/404"  # Test URL that returns 404
 
try:
    urllib2.urlopen(url)
except urllib2.URLError as e:
    if isinstance(e, urllib2.HTTPError):
        print("Caught 404! Status code: {}, Reason: {}".format(e.code, e.reason))
    else:
        print("Other error: {}".format(e.reason))

Output:

Caught 404! Status code: 404, Reason: Not Found

Example 2: Handling 500 Internal Server Error#

A 500 error indicates a server-side issue. The same pattern applies:

import urllib2
 
url = "http://httpstat.us/500"  # Test URL that returns 500
 
try:
    urllib2.urlopen(url)
except urllib2.URLError as e:
    if isinstance(e, urllib2.HTTPError):
        print("Server error! Status code: {}, Reason: {}".format(e.code, e.reason))
    else:
        print("Other error: {}".format(e.reason))

Output:

Server error! Status code: 500, Reason: Internal Server Error

Example 3: Network Failure (No Status Code)#

If the request fails due to a network issue (e.g., no Wi-Fi), no HTTP status code exists. The URLError will not be an HTTPError:

import urllib2
 
url = "http://example.com"  # Valid URL, but disconnect your internet first!
 
try:
    urllib2.urlopen(url)
except urllib2.URLError as e:
    if isinstance(e, urllib2.HTTPError):
        print("HTTP Error: {}".format(e.code))
    else:
        print("No status code available. Error: {}".format(e.reason))

Output (example from Windows; the exact errno and message depend on the platform and on how the connection fails):

No status code available. Error: [Errno 10061] No connection could be made because the target machine actively refused it

Best Practices for Error Handling in urllib2#

  1. Catch Specific Exceptions First: Since HTTPError is a subclass of URLError, catch HTTPError before URLError if you want separate handling for HTTP errors and generic issues:

    try:
        urllib2.urlopen(url)
    except urllib2.HTTPError as e:  # Catch HTTP errors first
        print("HTTP Error {}: {}".format(e.code, e.reason))
    except urllib2.URLError as e:  # Then catch generic URLErrors
        print("URL Error: {}".format(e.reason))
  2. Log Details: Always log the code, reason, and headers (for HTTPError) to debug issues later.

  3. Handle Retries for 5xx Errors: For server-side errors (5xx), consider retrying the request with a backoff strategy.

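  4. Validate URLs Before Requests: Check for malformed URLs before calling urlopen(). A URL with no scheme (e.g., missing http://) raises ValueError rather than URLError, so an except urllib2.URLError block will not catch it.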
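The retry recommendation for 5xx errors could look roughly like the sketch below. It is an assumption-laden illustration rather than a urllib2 feature: open_with_retry, attempts, base_delay, opener and sleep are all hypothetical names, and the fallback imports are an addition for Python 3.

```python
import time

try:
    from urllib2 import HTTPError, urlopen  # Python 2
except ImportError:
    from urllib.error import HTTPError  # Python 3 equivalents
    from urllib.request import urlopen


def open_with_retry(url, attempts=3, base_delay=1.0, opener=urlopen, sleep=time.sleep):
    """Open url, retrying 5xx responses with exponential backoff.

    Hypothetical helper: attempts, base_delay, opener and sleep are
    illustrative parameters, not part of urllib2.
    """
    for attempt in range(attempts):
        try:
            return opener(url)
        except HTTPError as e:
            retryable = 500 <= e.code < 600
            if not retryable or attempt == attempts - 1:
                raise  # 4xx errors and the final failure go to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

Only HTTPError is caught here, so generic URLErrors such as DNS failures propagate immediately. Whether those should also be retried is a design choice that depends on the application.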

Conclusion#

Retrieving the HTTP status code from urllib2.URLError exceptions requires distinguishing between generic URLErrors and HTTPError subclasses. By checking if the exception is an instance of HTTPError, you can access its code attribute to get the status code. For non-HTTP URLErrors (e.g., network issues), no status code exists, and you should handle the error using the reason attribute.

This approach ensures your application gracefully handles both HTTP-specific errors and generic network issues, making it more robust and user-friendly.

References#