Chapter 8_Network Programming


1. Socket (Low-Level)

A socket allows direct communication between two computers over a network.

With sockets, we manually build the HTTP request:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('data.pr4e.org', 80)) # 80 -> port

cmd = 'GET /romeo.txt HTTP/1.0\r\nHost: data.pr4e.org\r\n\r\n'
sock.send(cmd.encode())

Receiving data:

while True:
    data = sock.recv(512)
    if not data:
        break
    print(data.decode())

# Full control, Complex and error-prone

2. HTTP Request

An HTTP request is a text message sent to a server.

Main parts:

  • Method (GET)
  • Path
  • Headers
  • Blank line

Example:

GET /romeo.txt HTTP/1.0
Host: data.pr4e.org

3. urllib (Built-in Library)

urllib is a built-in Python library that hides sockets and HTTP details.

from urllib.request import urlopen

f = urlopen('http://data.pr4e.org/romeo.txt')
for line in f:
    print(line.decode().strip())


# Safer than sockets

4. requests (Third-Party Library)

requests is a popular but NOT built-in library.

import requests

r = requests.get('http://data.pr4e.org/romeo.txt')
print(r.text)

# Must be installed with pip
# Very easy to use

5. BeautifulSoup (bs4)

BeautifulSoup extracts data from HTML after it is loaded.

For HTML parsing.

from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen('http://www.dr-chuck.com/page1.htm')
soup = BeautifulSoup(html, 'html.parser')

for tag in soup('a'):
    print(tag.get('href', None))