BA372 — HyperText Transfer Protocol (HTTP)
- HTTP is a protocol; i.e., a convention used for data exchange between web clients and web servers.
- If it's not HTTP, it's not 'web.'
- Understanding of HTTP is a must(!) for a basic understanding of how the web works.
- Keep in mind: Raja's (BA479) five-layer IP stack model explains how the Internet works (Internet ≠ web).
- Problem: Where in the five-layer model can we find HTTP?
- Some HTTP is "filtered back" to users through (error) messages in
browsers; e.g., 404 Error: File or Directory Not Found.
- Created by Tim Berners-Lee as part of his
invention of the 'World-Wide Web':
- HTML, URI, HTTP, web (HTTP) client, web (HTTP) server.
- ...and the rest is history.
- Web standards such as HTML are governed by the W3C; the HTTP specifications are published by the IETF.
- Problem: What is Berners-Lee warning us about in his Scientific American article Long Live the Web?
- Problem: Explain the article's illustration:
The web is dead:
The web is very much alive?
- Problem: Which sort of validity problem do we have here?
- Web (HTTP) client — web (HTTP) server model:
Web stack animated and explained (video)
- Version used here: HTTP 1.1 (HTTP/2 and HTTP/3 have since been standardized).
- Three parts to each HTTP client request and each
HTTP server response:
- Request or response line: specifies the contents
of the request or the status of the response, respectively.
- Request or response headers (optional): specify configurations, acceptable
formats, and lots of other things the browser and the server want to
tell each other; e.g., the
date and time the file was last modified; whether the file may be
cached or not, etc.
- Request or response body (optional): additional data; e.g., the actual data passed back
from the HTTP server or any additional data the server must know in order to
execute a request.
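The three-part structure can be made concrete with a minimal sketch, assuming Python. The function name `split_http_message` and the sample response text are hypothetical, made up for illustration only.

```python
def split_http_message(raw: str):
    """Split a raw HTTP message into (first line, headers dict, body)."""
    # A blank line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    first_line = lines[0]              # request line or response line
    headers = {}
    for line in lines[1:]:             # zero or more "Name: value" headers
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return first_line, headers, body


# Hypothetical response, made up for illustration.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Cache-Control: no-cache\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)

status_line, headers, body = split_http_message(sample)
print(status_line)
print(headers["Content-Type"])
print(body)
```

The same function works on a request, because requests and responses share the line / headers / body layout.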
- Example of an HTTP transaction:
- HTTP client initializes transaction: http://classes.bus.oregonstate.edu/ba372/index.htm
HTTP server responds:
- Request line: <method> <document address> <HTTP version number>
GET classes.bus.oregonstate.edu/ba372/index.htm HTTP/1.1
- GET: request method (GET, HEAD, POST, PUT, DELETE, OPTIONS, TRACE, CONNECT).
- classes.bus.oregonstate.edu: (sub)domain name of oregonstate.edu (IP address: 18.104.22.168) (ping it!).
- Problem: Which protocol / service relates domain names to IP addresses?
- /ba372/index.htm: off of the server's document root folder, get me the index.htm file which resides in the ba372 folder.
- HTTP/1.1: I, browser, speak HTTP 1.1. Please do not send me anything back more recent than HTTP 1.1.
- Request headers:
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
- Request body: nothing (body content is optional).
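As a rough sketch of what such a request looks like as text, assuming Python: on the wire, an HTTP/1.1 client normally puts only the path in the request line and names the host in a separate Host request header. The function name `build_get_request` and the user-agent string are hypothetical.

```python
def build_get_request(host: str, path: str, user_agent: str) -> str:
    """Compose the text of a body-less HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",        # request line: method, path, version
        f"Host: {host}",               # request header naming the server
        f"User-Agent: {user_agent}",   # request header identifying the client
        "",                            # blank line: end of the headers
        "",                            # empty body (GET carries none here)
    ]
    return "\r\n".join(lines)


request = build_get_request(
    "classes.bus.oregonstate.edu",
    "/ba372/index.htm",
    "ExampleBrowser/1.0",              # hypothetical user agent
)
print(request)
```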
A word on HTTPS (SSL/TLS):
- Response line: <HTTP version> <status code> <status description>
100 - 199: Informational
200 - 299: Client request successful
300 - 399: Client request redirected, action necessary
400 - 499: Client request incomplete, file not found, request not allowed; e.g.,
http://www.google.com/foo.html (404), http://cob-te-web/admin (401)
500 - 599: Server errors
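The status code ranges can be expressed as a small lookup; a minimal sketch, assuming Python, with a hypothetical function name `status_class`:

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to the range it falls in."""
    if 100 <= code <= 199:
        return "informational"
    if 200 <= code <= 299:
        return "client request successful"
    if 300 <= code <= 399:
        return "client request redirected"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    raise ValueError(f"not an HTTP status code: {code}")


for code in (200, 301, 401, 404):
    print(code, status_class(code))
```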
- Response headers: http://www.teachengineering.org:
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Date: Fri, 23 Mar 2018 17:37:33 GMT
Location: https://www.teachengineering.org/ [following]
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Date: Fri, 23 Mar 2018 17:37:34 GMT
Length: 34589 (34K) [text/html]
- Let's look at some response headers sent by HTTP servers.
- Response body: any content; most often HTML, XML or ordinary text.
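The 301 response shown above tells the client where to go next through its Location header. A minimal sketch of that decision, assuming Python; the function name `redirect_target` is hypothetical, and the sketch assumes the header name is spelled exactly "Location" (real header names are case-insensitive).

```python
from urllib.parse import urljoin


def redirect_target(request_url: str, status: int, headers: dict):
    """Return the URL to request next, or None if this is not a redirect."""
    if 300 <= status <= 399 and "Location" in headers:
        # Location may be relative, so resolve it against the requested URL.
        return urljoin(request_url, headers["Location"])
    return None


first = redirect_target(
    "http://www.teachengineering.org",
    301,
    {"Location": "https://www.teachengineering.org/"},
)
print(first)
```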
- SSL/TLS technology establishes safe; i.e., encrypted, data exchange.
- HTTPS: HTTP with SSL/TLS; i.e., encrypted HTTP.
- Although anyone can implement SSL/TLS technology, certified SSL/TLS requires an SSL/TLS certificate issued by a so-called
Certificate Authority (CA), which is audited by the American Institute of CPAs (AICPA).
- Certificate 'guarantees' that the accessed domain/IP address is under the control of the certificate owner.
- The certificate's public key is used to set up the encryption.
- Certificate contains info about the certificate owner (click on 'lock' icon in URL bar)
- Different levels/degrees of certificates:
- Self-signed: not issued by CA (mostly used for internal use only).
- Domain Validated: CA issued: domain/IP address certified.
- Fully Authenticated: CA issued: required background checks; e.g., is the business operating as a business?
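To see what the 'lock' icon is based on, the certificate can also be read programmatically. A minimal sketch, assuming Python's standard `ssl` module; the function name `fetch_certificate` is hypothetical, and calling it requires network access.

```python
import socket
import ssl

# The default context verifies the CA chain and checks that the
# certificate matches the host name we asked for.
context = ssl.create_default_context()


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Open a TLS connection and return the server's certificate as a dict."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# Example use (needs network access):
#   cert = fetch_certificate("www.teachengineering.org")
#   print(cert["subject"], cert["issuer"], cert["notAfter"])
```

With this context, a self-signed certificate makes the connection fail with a verification error, which is why self-signed certificates are mostly limited to internal use.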
At this point we have enough background knowledge to
complete our first assignment. Have a go at it!
HTTP is a so-called stateless protocol: each transaction is independent
of the previous ones; i.e., the
process has no memory!!
Problem: Can you
devise one or more (conceptual!) workarounds? Keep in mind that you cannot(!!) change HTTP to maintain state.
For each workaround, think of advantages and disadvantages:
1. Client issues request for information about book X.
2. Server replies with price and availability.
3. Client issues request to put the book in the shopping cart.
But!!! HTTP is stateless -> step 3 is independent of steps 1 and 2
-> In step 3, the server has no memory about steps 1 and 2, including which
book is involved, who you are, etc.!
- 'Memory-on-the-go' or 'Memory-in-transit': The
server passes all the information back to the client and forgets about the client. But at the next request, the
client passes everything it received from the server (back) to the server again, plus any new information it wants to submit.
- Rather than endowing either the client or the server with memory, we accumulate memory as we carry it back and forth between
client and server.
- The memory is always in transit between server and client.
- How to pass this information back and forth?
- From client to server: depends on which HTTP method we use:
- GET method passes parameter=value combinations in the URL;
- POST method passes parameter=value combinations in the body of the HTTP client request.
- Note that browse-edgar, maps and search are references to programs (Application Server lane in the activity diagram above) that are invoked by the web server.
- It is these programs' task to retrieve or generate the information passed in through
GET or POST and
process that information accordingly.
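The difference between GET and POST is only where the parameter=value combinations travel. A minimal sketch, assuming Python's `urllib.parse`; the URL, program name (`cart`) and parameter names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical parameters the client wants to pass to the server.
params = {"isbn": "12345", "qty": "1"}
encoded = urlencode(params)

# GET: the combinations are appended to the URL after a "?".
get_url = "http://www.example.com/cart?" + encoded

# POST: the same combinations travel in the body of the request.
post_body = encoded

# The program invoked by the web server retrieves them again.
received = parse_qs(urlsplit(get_url).query)

print(get_url)
print(post_body)
print(received)
```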
- !!! Problem: Can you draw a schematic of this basic web 'architecture?'
- GET and POST in HTML forms:
- From server to client:
- The server can pass information in <input
type="hidden"> tags in an HTML form.
- These input fields, although part of the HTML page, will (should) not be rendered on the client.
- But because they are <input>
tags, their values are sent back to the server on submission of
the form as parameter=value pairs.
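A server-side program could generate such hidden fields as follows; a minimal sketch, assuming Python. The function name `render_cart_form`, the form action `/cart` and the field names are hypothetical.

```python
from html import escape


def render_cart_form(state: dict) -> str:
    """Render a form that carries `state` back to the server as hidden fields."""
    hidden = "\n".join(
        f'  <input type="hidden" name="{escape(name)}" value="{escape(value)}">'
        for name, value in state.items()
    )
    return (
        '<form method="post" action="/cart">\n'
        f"{hidden}\n"
        '  <input type="submit" value="Add to cart">\n'
        "</form>"
    )


# State accumulated so far; the server keeps no copy of it.
page = render_cart_form({"book": "X", "customer": "jane"})
print(page)
```

On submission, the browser sends book=X and customer=jane back in the request body, so the memory stays in transit.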
- 'Somebody Else's Problem (SEP)': ask
something/someone else to do the memorizing:
- Store memory client-side: cookies:
- Server includes information to be remembered in the
Set-Cookie HTTP response header:
- If the client is cookie enabled, the cookie is stored on the client by the browser.
- At next client request to the same IP/domain name, cookie information is included as (an) HTTP Cookie request header(s):
- Let's see who's giving us cookies:
- cmd —> curl -I www.ebay.com
- cmd —> curl -I www.bing.com (cookie-monster www.bing.com)
- Problem: what
are some advantages/disadvantages of the cookie model?
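The cookie round trip can be sketched with Python's standard `http.cookies` module. This is an illustration under assumptions: the cookie name `cart` and its value are hypothetical, and the client side is simulated with a plain dictionary rather than a real browser.

```python
from http.cookies import SimpleCookie

# Server side: put the state to be remembered in a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["cart"] = "bookX"             # hypothetical cookie name and value
outgoing["cart"]["path"] = "/"
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Client side: a cookie-enabled client stores the pair and sends it back
# in a Cookie request header on its next request to the same domain.
stored = {name: morsel.value for name, morsel in outgoing.items()}
cookie_header = "Cookie: " + "; ".join(f"{k}={v}" for k, v in stored.items())
print(cookie_header)

# Server side, next request: parse the Cookie header to restore the state.
incoming = SimpleCookie()
incoming.load(cookie_header.split(": ", 1)[1])
print(incoming["cart"].value)
```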
- Store memory server-side:
- The program called by the server (passed in the URL) can organize server-side storage:
- file system.
- memory; e.g., IIS-ASP.NET.
- Follow-up program calls would first do a lookup to 'restore' memory.
- Problem: what
are some advantages/disadvantages of the server-side model?
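Server-side storage can be sketched as a lookup table keyed by a session identifier that the client passes along with every request (for example in a cookie or in the URL). A minimal sketch, assuming Python and in-memory storage; the names `sessions`, `start_session` and `add_to_cart` are hypothetical.

```python
import secrets

# In-memory storage: fast, but lost when the server process restarts.
sessions = {}


def start_session() -> str:
    """Create empty server-side memory and return the id the client must pass back."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"cart": []}
    return session_id


def add_to_cart(session_id: str, book: str) -> list:
    """A follow-up call first looks up the session to 'restore' memory."""
    state = sessions[session_id]
    state["cart"].append(book)
    return state["cart"]


sid = start_session()
add_to_cart(sid, "book X")
print(add_to_cart(sid, "book Y"))
```

Only the identifier travels between client and server; the cart itself never leaves the server.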
- Hybrid models; e.g., sensitive or permanent info memorized on server; transient memory on client or on-the-go
Some quiz-like questions you can use to see if you got all of this:
- Which 'actor' in the Web stack diagram generates HTTP 404 errors?
- Between which actors and in which direction is SQL passed?
- What happens when an HTTP client requests a page which exists but to which the server will not provide access?
- Where are cookies stored?
- How does the 'memory-on-the-go' model work (in concept)?
- In your mind, reconstruct one complete Web stack round-trip, starting with the HTTP client, which represents what happens when
a browser requests information on flights between two cities on a particular date.
- In the 4-tier web stack, where is the HTML generated?
- In the 4-tier web stack, where is the XML generated?