Python 3 urllib examples
This article is the missing manual for Python 3’s urllib. It shows you how to do basic things that are not clearly described in the official documentation.
The requests library is much easier to use than urllib. Only use urllib if you want to avoid external dependencies.
Request a URL, read response content
To make an HTTP request and download a page with urllib, call urllib.request.urlopen().
import urllib.request
response = urllib.request.urlopen('https://nicolasbouliane.com')
response_content = response.read()
print(response_content)
# b'<!doctype html>\n<html...'
A few notes:
- urlopen() returns an http.client.HTTPResponse object
- urlopen() automatically follows redirects
- urlopen() raises an HTTPError when the server returns an error response like HTTP 404 or 500.
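response.read() returns bytes, not str, so the content usually needs decoding before you can treat it as text. Below is a minimal sketch of one way to do that. The helper name fetch_text and its fallback_charset parameter are hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption, not something urllib does for you.

```python
import urllib.request


def fetch_text(url, fallback_charset="utf-8"):
    """Download url and return its body as str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # response.headers is an email.message.Message subclass;
        # get_content_charset() reads the charset from Content-Type,
        # and returns None when the header does not declare one.
        charset = response.headers.get_content_charset() or fallback_charset
        return response.read().decode(charset)
```

For example, fetch_text('https://nicolasbouliane.com') should return the page as a str instead of bytes.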
Get response status code
The status code is in response.status for a successful request, and in error.code when urlopen() raises an HTTPError.
from urllib.error import HTTPError
import urllib.request
try:
    response = urllib.request.urlopen('https://nicolasbouliane.com')
    response_status = response.status  # 200, 301, etc
except HTTPError as error:
    response_status = error.code  # 404, 500, etc
A few notes:
- urlopen() automatically follows redirects. You will see the status code of the destination URL.
- urlopen() raises an HTTPError when the server returns an error response like HTTP 404 or 500.
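The status code logic above can be wrapped in a small function. This sketch also handles urllib.error.URLError, which urlopen() raises when no HTTP response is received at all (for example, when the host cannot be reached). The name get_status and the choice to return None in that case are assumptions made for illustration.

```python
from urllib.error import HTTPError, URLError
import urllib.request


def get_status(url):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except HTTPError as error:
        # HTTPError is a subclass of URLError, so it must be caught first.
        return error.code
    except URLError:
        # No HTTP response at all, so there is no status code to report.
        return None
```

With this helper, get_status('https://nicolasbouliane.com') should return an int such as 200 or 404, without the caller needing its own try/except.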
Get response headers
urllib.request.urlopen() returns an http.client.HTTPResponse object. You get headers by calling response.getheaders() or response.getheader(header_name).
import urllib.request
response = urllib.request.urlopen('https://nicolasbouliane.com')
headers = response.getheaders()
content_type = response.getheader('Content-Type')
print(headers)
# [('Content-Type', 'text/html; charset=utf-8'), ('Transfer-Encoding', 'chunked'), ...]
print(content_type)
# "text/html; charset=utf-8"
A few notes:
- getheader() is not case-sensitive. getheader('Date') and getheader('date') will return the same value.
- getheaders() returns a list of two-tuples, not a dict.
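If you would rather look headers up by name, you can convert the list of two-tuples yourself. The sketch below is one possible approach, not part of urllib: the function name group_headers is hypothetical, and lowercasing names and keeping every value in a list (so that repeated headers such as Set-Cookie are not lost) are design choices made for this example.

```python
from collections import defaultdict


def group_headers(header_list):
    """Group (name, value) tuples into a dict of lowercase name -> list of values."""
    grouped = defaultdict(list)
    for name, value in header_list:
        # Header names are case-insensitive, so normalize them to lowercase.
        grouped[name.lower()].append(value)
    return dict(grouped)
```

You would call it as group_headers(response.getheaders()) and then read, for example, the 'content-type' key. A plain dict(response.getheaders()) also works, but it keeps only the last value when a header appears more than once.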