Learn to work with the Python httplib2 module.
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.
The Python httplib2 module provides methods for accessing web resources via HTTP. It supports many features, such as HTTP and HTTPS, authentication, caching, redirects, and compression.
    $ service nginx status
     * nginx is running
We run the nginx web server on localhost. Some of our examples connect to PHP scripts served by this local nginx server.
Table of Contents
- Check httplib2 Library Version
- Use httplib2 to Read Web Page
- Send HTTP HEAD Request
- Send HTTP GET Request
- Send HTTP POST Request
- Send User Agent Information
- Add Username/Password to Request
Check httplib2 Library Version
The first program prints the version of the library, its copyright, and the documentation string.
    #!/usr/bin/python3

    import httplib2

    print(httplib2.__version__)
    print(httplib2.__copyright__)
    print(httplib2.__doc__)

The httplib2.__version__ attribute gives the version of the httplib2 library, httplib2.__copyright__ gives its copyright, and httplib2.__doc__ gives its documentation string.

    $ ./version.py
    0.8
    Copyright 2006, Joe Gregorio
    httplib2
    A caching http interface that supports ETags and gzip
    to conserve bandwidth.
    Requires Python 3.0 or later
    Changelog:
    2009-05-28, Pilgrim: ported to Python 3
    2007-08-18, Rick: Modified so it's able to use a socks proxy if needed.
This is a sample output of the example.
Use httplib2 to Read Web Page
In the following example we show how to grab HTML content from a website called www.something.com.
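    #!/usr/bin/python3

    import httplib2

    http = httplib2.Http()

    # request() returns a (response, content) tuple; index 1 is the body as bytes
    content = http.request("http://www.something.com")[1]

    print(content.decode())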
An HTTP client is created with httplib2.Http(). A new HTTP request is created with the request() method; by default, it is a GET request. The return value is a tuple of response and content.
    $ ./get_content.py
    <html><head><title>Something.</title></head>
    <body>Something.</body>
    </html>
This is the output of the example.
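Calling content.decode() without arguments assumes the body is UTF-8. When a server declares another encoding, the charset is normally announced in the content-type header. The following is a minimal sketch of picking the charset from such a header value using only the standard library; the helper name charset_from_content_type and the sample header values are assumptions made for illustration.

```python
from email.message import Message


def charset_from_content_type(value, default="utf-8"):
    """Return the charset named in a Content-Type header value.

    Hypothetical helper: falls back to `default` when no charset is given.
    """
    msg = Message()
    msg["content-type"] = value
    return msg.get_content_charset() or default


# Sample header values, assumed for illustration
print(charset_from_content_type("text/html"))
print(charset_from_content_type("text/html; charset=ISO-8859-1"))

# Decoding bytes with the detected charset
raw = b"caf\xe9"
text = raw.decode(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(text)
```

With httplib2, the header value would come from the response part of the tuple, e.g. resp['content-type'], if the server sends that header.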
Stripping HTML tags
The following program gets a small web page and strips its HTML tags.
    #!/usr/bin/python3

    import httplib2
    import re

    http = httplib2.Http()
    content = http.request("http://www.something.com")[1]

    # remove anything that looks like an HTML tag
    stripped = re.sub('<[^<]+?>', '', content.decode())
    print(stripped)
A simple regular expression is used to strip the HTML tags. Note that we are only stripping the data, not sanitizing it; these are two different things.
    $ ./strip_tags.py
    Something.
    Something.
The script prints the web page’s title and content.
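The tag-stripping step can be tried without a network connection. The sketch below applies the same regular expression to a literal copy of the sample page and, for comparison, extracts the text with the standard library's HTMLParser, which may cope better with irregular markup. The names SAMPLE, strip_with_regex, TextExtractor, and strip_with_parser are made up for this illustration.

```python
import re
from html.parser import HTMLParser

# Literal copy of the sample page, so no request is needed
SAMPLE = ("<html><head><title>Something.</title></head>\n"
          "<body>Something.</body>\n"
          "</html>")


def strip_with_regex(markup):
    """Remove anything that looks like a tag, as in the script above."""
    return re.sub('<[^<]+?>', '', markup)


class TextExtractor(HTMLParser):
    """Collect the text nodes of a document and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_with_parser(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return ''.join(parser.parts)


print(strip_with_regex(SAMPLE))
print(strip_with_parser(SAMPLE))
```

Neither approach sanitizes the data; both only remove markup from the text.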
Check Response Status
The response object contains a status property, which gives the status code of the response.
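    #!/usr/bin/python3

    import httplib2

    http = httplib2.Http()

    # index 0 of the returned tuple is the response object
    resp = http.request("http://www.something.com")[0]
    print(resp.status)

    # the second URL is a placeholder for a page that does not exist on the server
    resp = http.request("http://www.something.com/news/")[0]
    print(resp.status)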
We perform two HTTP requests with the request() method and check the returned status.

    $ ./get_status.py
    200
    404
200 is the standard response for successful HTTP requests, and 404 means that the requested resource could not be found.
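Numeric status codes can be turned into their standard reason phrases with the http.HTTPStatus enumeration from the standard library. The following is a small sketch; the helper names describe and is_success are assumptions made for this illustration and are not part of httplib2.

```python
from http import HTTPStatus


def describe(code):
    """Return the code together with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def is_success(code):
    """Codes in the 2xx range indicate success."""
    return 200 <= code < 300


for code in (200, 404):
    print(describe(code), is_success(code))
```

With httplib2, the value passed in would be resp.status.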
Send HTTP HEAD Request
The HTTP HEAD method retrieves document headers. The header consists of fields, including date, server, content type, or last modification time.
    #!/usr/bin/python3

    import httplib2

    http = httplib2.Http()

    # a HEAD request returns only the headers; index 0 is the response object
    resp = http.request("http://www.something.com", "HEAD")[0]

    print("Server: " + resp['server'])
    print("Last modified: " + resp['last-modified'])
    print("Content type: " + resp['content-type'])
    print("Content length: " + resp['content-length'])
The example prints the server, last modification time, content type, and content length of the www.something.com web page.

    $ ./do_head.py
    Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141
    Last modified: Mon, 25 Oct 1999 15:36:02 GMT
    Content type: text/html
    Content length: 72
This is the output of the program. From the output we can see that the web page is served by the Apache web server running on FreeBSD. The document was last modified in 1999. The web page is an HTML document whose length is 72 bytes.
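Header values arrive as plain strings. If they are needed as typed values, the standard library can parse them. The sketch below uses a hand-written dictionary that copies the values from the sample output, so it runs without a network connection; in a real script the values would come from the response object.

```python
from email.utils import parsedate_to_datetime

# Header values copied from the sample output; the dictionary itself is a
# stand-in for the response object and is assumed for illustration.
headers = {
    'last-modified': 'Mon, 25 Oct 1999 15:36:02 GMT',
    'content-type': 'text/html',
    'content-length': '72',
}

# HTTP dates use the same format as e-mail dates
last_modified = parsedate_to_datetime(headers['last-modified'])
length = int(headers['content-length'])

print(last_modified.isoformat())
print(length)
```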
Send HTTP GET Request
The HTTP GET method requests a representation of the specified resource. For this example, we are also going to use the greet.php script:

    <?php
    echo "Hello " . htmlspecialchars($_GET['name']);
    ?>
Inside the /usr/share/nginx/html/ directory, we have this greet.php file. The script returns the value of the name variable, which was retrieved from the client.
The htmlspecialchars() function converts special characters to HTML entities; e.g. & to &amp;.
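For readers more familiar with Python than PHP, the html.escape() function from the standard library performs a broadly similar conversion. The sketch below mirrors what greet.php does on the server; it is only an analogue, the two functions are not guaranteed to escape exactly the same set of characters, and the greet helper is made up for this illustration.

```python
import html


def greet(name):
    """Python analogue of greet.php: escape the value before echoing it."""
    return "Hello " + html.escape(name)


print(greet("Peter"))
print(greet("Tom & <b>Jerry</b>"))
```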
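    #!/usr/bin/python3

    import httplib2

    http = httplib2.Http()

    # the name variable is passed in the query string of the URL
    content = http.request("http://localhost/greet.php?name=Peter",
                           method="GET")[1]

    print(content.decode())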
The script sends a variable with a value to the PHP script on the server. The variable is specified directly in the URL.
    $ ./mget.py
    Hello Peter

This is the output of the example.

    $ tail -1 /var/log/nginx/access.log
    127.0.0.1 - - [21/Aug/2016:17:32:31 +0200] "GET /greet.php?name=Peter HTTP/1.1" 200 42 "-" "Python-httplib2/0.8 (gzip)"
We examine the nginx access log.
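Writing the variable directly into the URL works for simple values such as Peter. Values containing spaces or characters like & need to be encoded first. The following sketch builds the query string with urllib.parse.urlencode(); the build_url helper is a hypothetical name used only here.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return base + "?" + urlencode(params)


print(build_url("http://localhost/greet.php", {"name": "Peter"}))
print(build_url("http://localhost/greet.php", {"name": "Peter & Paul"}))
```

The resulting string could then be passed as the first argument of the request() method.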
Send HTTP POST Request
The POST request method requests that a web server accept and store the data enclosed in the body of the request message. It is often used when uploading a file or submitting a completed web form.
    <?php
    echo "Hello " . htmlspecialchars($_POST['name']);
    ?>

On our local web server, we have this target.php file. It simply prints the posted value back to the client.
    #!/usr/bin/python3

    import httplib2
    import urllib.parse

    http = httplib2.Http()

    body = {'name': 'Peter'}

    # the form data is URL-encoded and sent in the request body
    content = http.request("http://localhost/target.php",
                           method="POST",
                           headers={'Content-type': 'application/x-www-form-urlencoded'},
                           body=urllib.parse.urlencode(body))[1]

    print(content.decode())
The script sends a request with a name key having the value Peter. The data is encoded with the urllib.parse.urlencode() function and sent in the body of the request.
    $ ./mpost.py
    Hello Peter

This is the output of the mpost.py script.

    $ tail -1 /var/log/nginx/access.log
    127.0.0.1 - - [23/Aug/2016:12:21:07 +0200] "POST /target.php HTTP/1.1" 200 37 "-" "Python-httplib2/0.8 (gzip)"

With the POST method, the value is not sent in the request URL.
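To see what actually travels in the request body, the encoding step can be run on its own. The sketch below encodes a dictionary and decodes it again with parse_qs(), which is roughly what the server side does when it fills its form variables. The city field is an extra value added here only to show how spaces are encoded.

```python
from urllib.parse import urlencode, parse_qs

# 'city' is not part of the original example; it is added for illustration
body = {'name': 'Peter', 'city': 'New York'}

encoded = urlencode(body)
print(encoded)

# parse_qs returns a list of values for each key
decoded = parse_qs(encoded)
print(decoded)
```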
Send User Agent Information
In this section, we specify the name of the user agent.
    <?php
    echo $_SERVER['HTTP_USER_AGENT'];
    ?>

Inside the nginx document root, we have the agent.php file. It returns the name of the user agent.
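    #!/usr/bin/python3

    import httplib2

    http = httplib2.Http()

    # a custom user-agent header replaces the library's default one
    content = http.request("http://localhost/agent.php", method="GET",
                           headers={'user-agent': 'Python script'})[1]

    print(content.decode())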
This script creates a simple GET request to the agent.php script. In the headers dictionary, we specify the user agent. This is read by the PHP script and returned to the client.

    $ ./user_agent.py
    Python script
The server responded with the name of the agent that we have sent with the request.
Add Username/Password to Request
The client’s add_credentials() method sets the name and password to be used for a realm. A security realm is a mechanism used for protecting web application resources.
    $ sudo apt-get install apache2-utils
    $ sudo htpasswd -c /etc/nginx/.htpasswd user7
    New password:
    Re-type new password:
    Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.
    location /secure {
        auth_basic "Restricted Area";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }

Inside the nginx /etc/nginx/sites-available/default configuration file, we create a secured page. The name of the realm is “Restricted Area”.
    <!DOCTYPE html>
    <html lang="en">
    <head>
    <title>Secure page</title>
    </head>
    <body>
    <p>
    This is a secure page.
    </p>
    </body>
    </html>

Inside the /usr/share/nginx/html/secure directory, we have the above HTML file.
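    #!/usr/bin/python3

    import httplib2

    user = 'user7'
    passwd = '7user'

    http = httplib2.Http()

    # register the credentials before sending the request
    http.add_credentials(user, passwd)
    content = http.request("http://localhost/secure/")[1]

    print(content.decode())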
The script connects to the secure webpage; it provides the user name and the password necessary to access the page.
    $ ./credentials.py
    <!DOCTYPE html>
    <html lang="en">
    <head>
    <title>Secure page</title>
    </head>
    <body>
    <p>
    This is a secure page.
    </p>
    </body>
    </html>
With the right credentials, the script returns the secured page.
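With basic HTTP authentication, the user name and password travel in an Authorization header as the base64 encoding of user:password. The sketch below builds such a header by hand to show what the scheme looks like on the wire. It is an illustration of the Basic scheme only, not a description of how httplib2 works internally, and the basic_auth_header helper is a made-up name.

```python
import base64


def basic_auth_header(user, passwd):
    """Build an Authorization header for the Basic scheme (hypothetical helper)."""
    credentials = f"{user}:{passwd}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}


header = basic_auth_header("user7", "7user")
print(header)

# base64 is an encoding, not encryption: the credentials decode trivially
scheme, token = header["Authorization"].split(" ", 1)
print(base64.b64decode(token).decode("utf-8"))
```

Because the token can be decoded by anyone who sees it, basic authentication is normally combined with HTTPS.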
In this tutorial, we have explored the Python httplib2 module.
The tutorial was written by Jan Bodnar who runs zetcode.com, which specializes in programming tutorials.