BA372 — HyperText Transfer Protocol (HTTP)
- HTTP is a protocol; i.e., a convention used for data exchange between Web clients and Web servers.
- If it's not HTTP, it's not 'Web.'
- Web client — Web server model:
- Understanding of HTTP is a must(!) for a basic understanding of how the Web works.
- Keep in mind: Raja's (BA479) five-layer IP stack model explains how the Internet works (Internet ≠ Web).
- Problem: Where in the five-layer model can we find HTTP?
- Some HTTP is "filtered back" to users through (error) messages in
browsers; e.g., 404 Error: File or Directory Not Found.
- Created by Tim Berners-Lee as part of his
invention of the 'World-Wide Web':
- HTML, URI, HTTP, Web (HTTP) client, Web (HTTP) server.
- ...and the rest is history.
- Standardized by the IETF (published as RFCs), originally in cooperation with the W3C.
- Problem: What is Berners-Lee warning us about in his
- Problem: Explain the article's illustration:
The Web is dead:
The Web is very much alive?
- Problem: which sort of validity problem do we have here?
- Version covered here: HTTP/1.1 (HTTP/2 and HTTP/3 have been standardized since).
- Three parts to each HTTP client request and each HTTP server response:
- Request or response line: specifies the contents of the request or the status of the response, respectively.
- Request or response headers (optional): specify configurations, acceptable formats, and many other things the browser and the server want to tell each other; e.g., the date and time the file was last modified, whether the file may be cached, etc.
- Request or response body (optional): additional data; e.g., the actual data passed back from the HTTP server, or any additional data the server must know in order to execute a request.
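The three-part structure described above can be sketched in a few lines of Python. This is only an illustration: the function name split_http_message and the sample response are made up for this example, and real HTTP libraries handle many details (repeated headers, encodings, chunked bodies) that are ignored here.

```python
def split_http_message(raw):
    """Split a raw HTTP/1.1 message into (start_line, headers, body)."""
    # The header section ends at the first empty line; the rest is the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]          # request line or response line
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical response, made up for illustration.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Last-Modified: Mon, 14 Mar 2016 21:45:03 GMT\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)

start_line, headers, body = split_http_message(raw_response)
print(start_line)
print(headers)
print(body)
```

Header names are lower-cased here because HTTP header names are case-insensitive.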
- Example of an HTTP transaction:
- HTTP client initializes transaction: http://classes.bus.oregonstate.edu/ba372/index.htm
HTTP server responds:
- Request line: <method> <document address> <HTTP version number>
GET classes.bus.oregonstate.edu/ba372/index.htm HTTP/1.1
(On the wire, HTTP/1.1 puts only the path, /ba372/index.htm, in the request line and sends the domain name in a separate Host request header.)
- GET: request method (GET, HEAD, POST, PUT, DELETE, TRACE, CONNECT).
- classes.bus.oregonstate.edu: domain name of the OSU COB web server (IP address: 220.127.116.11).
- Problem: Which protocol / service relates domain names to IP addresses?
- /ba372/index.htm: off of the server's document root folder, get me the index.htm file which is in the ba372 folder.
- HTTP/1.1: I, browser, speak HTTP 1.1. Please do not send me anything back more recent than HTTP 1.1.
- Request headers:
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
- Request body: nothing (body content is optional).
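As a rough sketch, the request described above could be assembled as text like this. The function name build_get_request is hypothetical; the URL and the User-Agent string are the ones from these notes. Note how the domain name ends up in the Host header and only the path appears in the request line.

```python
from urllib.parse import urlsplit


def build_get_request(url, user_agent):
    """Return the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",          # request line
        f"Host: {parts.netloc}",         # request headers
        f"User-Agent: {user_agent}",
        "",                              # empty line ends the headers
        "",                              # no body for this GET request
    ]
    return "\r\n".join(lines)


request = build_get_request(
    "http://classes.bus.oregonstate.edu/ba372/index.htm",
    "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0",
)
print(request)
```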
A word on HTTPS / SSL:
- Response line: <HTTP version> <status code> <status description>
- Response headers: http://www.teachengineering.org:
HTTP/1.1 301 Moved Permanently
Date: Mon, 14 Mar 2016 21:45:03 GMT
Location: https://www.teachengineering.org/ [following]
--2016-03-14 14:45:03-- https://www.teachengineering.org/
Connecting to www.teachengineering.org (www.teachengineering.org)|18.104.22.168|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Mon, 14 Mar 2016 21:45:03 GMT
Keep-Alive: timeout=2, max=100
look at some response headers sent by HTTP servers.
- Response body: any content; most often HTML, XML or ordinary text.
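The response line format and the 301-then-200 exchange shown above can be read programmatically along these lines. This is a minimal sketch; the helper names (parse_status_line, status_class, redirect_target) are made up for illustration and are not part of any library.

```python
def parse_status_line(line):
    """Split '<HTTP version> <status code> <status description>'."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Map a status code to its general class, based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


def redirect_target(code, headers):
    """Return the Location header for redirects, otherwise None."""
    if status_class(code) == "redirection":
        return headers.get("Location")
    return None


# The two response lines from the sample output in these notes.
first = parse_status_line("HTTP/1.1 301 Moved Permanently")
second = parse_status_line("HTTP/1.1 200 OK")
target = redirect_target(first[1], {"Location": "https://www.teachengineering.org/"})
print(first, status_class(first[1]), target)
print(second, status_class(second[1]))
```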
- SSL (Secure Sockets Layer), since superseded by TLS (Transport Layer Security), establishes safe, i.e., encrypted, data exchange.
- HTTPS: HTTP over SSL/TLS; i.e., encrypted HTTP.
- Although anyone can implement SSL technology, certified SSL requires an SSL certificate issued by a so-called Certificate Authority (CA), audited by the American Institute of CPAs (AICPA).
- Certificate 'guarantees' that the accessed domain/IP address is under the control of the certificate owner.
- Certificate is used to set up the encryption (it carries the owner's public key).
- Certificate contains info about the certificate owner (click on the 'lock' icon in the URL bar).
- Different levels/degrees of certificates:
- Self-signed: not issued by CA (mostly used for internal use only).
- Domain Validated: CA issued: domain/IP address certified.
- Fully Authenticated: CA issued: required background checks; e.g., is the business operating as a business?
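To see what 'CA issued' means for a client program, here is a small sketch using Python's standard ssl module. By default the client insists on a certificate that chains to a trusted CA and that matches the host name, so a self-signed certificate would be rejected unless it is trusted explicitly. The function fetch_certificate is a hypothetical helper for this example; it needs network access and is defined but not called here.

```python
import socket
import ssl

# Default client settings: verify the server's certificate against trusted CAs
# and check that the certificate matches the requested host name.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)


def fetch_certificate(hostname, port=443):
    """Connect over TLS and return the server certificate as a dict."""
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            # The dict includes fields such as 'subject', 'issuer', 'notAfter'.
            return tls.getpeercert()
```

Calling fetch_certificate("www.teachengineering.org") would return roughly the same owner and issuer information the browser shows behind the 'lock' icon.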
At this point we have enough background knowledge to
complete our first assignment. Have a go at it!
HTTP is a so-called stateless protocol: each transaction is independent of the previous ones; i.e., the process has no memory!!
Problem: Can you devise one or more (conceptual!) workarounds? Keep in mind that you cannot(!!) change HTTP to maintain state. For each workaround, think of advantages and disadvantages:
1. Client issues request for information about book X.
2. Server replies with price and availability.
3. Client issues request to put the book in the shopping cart.
But!!! HTTP is stateless → step 3 is independent of steps 1 and 2 → in step 3, the server has no memory of steps 1 and 2, including which book is involved, who you are, etc.!
- 'Memory-on-the-go' or 'Memory-in-transit': The
server passes all the information back to the client and forgets about the client. But at the next request, the
client passes everything it received from the server (back) to the server again, plus any new information it wants to submit.
- Rather than endowing either the client or the server with memory, we accumulate memory as we carry it back and forth between
client and server.
- The memory is always in transit between server and client.
- How to pass this information back and forth?
- From client to server: depends on which HTTP method we use:
- GET method passes parameter=value combinations in the URL;
- POST method passes parameter=value combinations in the body of the HTTP client request.
- Note that browse-edgar,
are programs(!!) that are invoked by the Web server.
- It is these programs' task to retrieve or generate the information passed in through
GET or POST and
process that information accordingly.
- !!! Problem: Draw a schematic of this basic Web 'architecture' !!!
- GET and POST in HTML forms:
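The difference between the two methods can be sketched as follows. The same parameter=value pairs are encoded once and then placed either in the URL (GET) or in the request body (POST). The program path /cgi-bin/bookstore, the host www.example.com and the parameter names are invented for this illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical parameters for the bookstore example.
params = {"title": "Moby Dick", "action": "info"}
encoded = urlencode(params)     # parameter=value pairs joined with '&'

# GET: the pairs travel in the URL, after the '?'.
get_request_line = f"GET /cgi-bin/bookstore?{encoded} HTTP/1.1"

# POST: the pairs travel in the body, after the empty line.
post_request = "\r\n".join([
    "POST /cgi-bin/bookstore HTTP/1.1",
    "Host: www.example.com",
    "Content-Type: application/x-www-form-urlencoded",
    f"Content-Length: {len(encoded)}",
    "",
    encoded,
])

print(get_request_line)
print(post_request)
# The receiving program decodes the pairs the same way in both cases.
print(parse_qs(encoded))
```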
- From server to client:
- The server can pass information in <input
type="hidden"> tags in an HTML form.
- These input fields, although part of the HTML page, will (should) not be rendered on the client.
- But because they are <input> tags, their values are sent back to the server on submission of the form as parameter=value pairs.
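A minimal sketch of 'memory-in-transit' with hidden fields might look like this. The server-side program writes everything it wants to remember into hidden inputs, and reads it back from the submitted form on the next request. The function names render_form and read_state, the action /cgi-bin/cart and the field names are assumptions made for this example.

```python
from html import escape
from urllib.parse import parse_qs


def render_form(state, action):
    """Return an HTML form that carries state in hidden input fields."""
    hidden = "\n".join(
        f'  <input type="hidden" name="{escape(k)}" value="{escape(v)}">'
        for k, v in state.items()
    )
    return (
        f'<form method="post" action="{escape(action)}">\n'
        f"{hidden}\n"
        '  <input type="submit" value="Add to cart">\n'
        "</form>"
    )


def read_state(form_body):
    """Recover the parameter=value pairs from a submitted form body."""
    return {k: v[0] for k, v in parse_qs(form_body).items()}


# Server to client: the state travels inside the page.
page = render_form({"book": "X", "price": "12.50"}, "/cgi-bin/cart")
print(page)

# Client to server: the browser sends the hidden values back, plus new input.
restored = read_state("book=X&price=12.50&qty=1")
print(restored)
```

Since the client can edit hidden fields before submitting, a real application would not trust values such as the price without checking them server-side.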
- 'Somebody Else's Problem (SEP)': ask
something/someone else to do the memorizing:
- Store memory client-side: cookies:
- Server includes information to be remembered in the
Set-Cookie HTTP response header:
- If client is cookie enabled, cookie is stored in
- At next client request to the same IP/domain name, cookie information is included as (an) HTTP Cookie request header(s):
- Let's see who's giving us cookies:
- Problem: what
are some advantages/disadvantages of the cookie model?
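The Set-Cookie / Cookie exchange described above can be sketched with Python's standard http.cookies module. The cookie names and values (book, customer) are made up for this illustration.

```python
from http.cookies import SimpleCookie

# Server side: put the information to be remembered in a Set-Cookie header.
outgoing = SimpleCookie()
outgoing["book"] = "X"
outgoing["book"]["path"] = "/"
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Next request: the client sends the stored cookies in a Cookie header,
# and the server-side program parses them to 'restore' its memory.
incoming = SimpleCookie()
incoming.load("book=X; customer=42")
remembered = {name: morsel.value for name, morsel in incoming.items()}
print(remembered)
```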
- Store memory server-side:
- The program called by the server (passed in the URL) can organize server-side storage:
- file system.
- memory; e.g., IIS-ASP.NET.
- Follow-up program calls would first do a lookup to 'restore' memory.
- Problem: what
are some advantages/disadvantages of the server-side model?
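A minimal sketch of the server-side model, assuming in-memory storage: the program hands the client a random session identifier and keeps the actual memory on the server, looked up by that identifier at each follow-up call. The names SESSIONS and handle_request are hypothetical; a file system or other storage could stand in for the dictionary.

```python
import secrets

# Server-side memory: session id -> remembered data. Lost when the program stops.
SESSIONS = {}


def handle_request(session_id, action, item=None):
    """Look up (or create) the session, apply the action, return id and cart."""
    if session_id not in SESSIONS:
        # Unknown or missing id: start a new session with a hard-to-guess id.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"cart": []}
    memory = SESSIONS[session_id]       # the lookup that 'restores' memory
    if action == "add" and item is not None:
        memory["cart"].append(item)
    return session_id, list(memory["cart"])


# First request carries no session id; the server creates one.
sid, cart = handle_request(None, "view")
# Follow-up requests pass the id back (e.g., in a cookie or a hidden field).
sid, cart = handle_request(sid, "add", "book X")
print(sid, cart)
```

The client still has to carry the identifier back and forth, so this is close to the hybrid model: only the key travels, the memory itself stays on the server.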
- Hybrid models; e.g., sensitive or permanent info memorized on server; transient memory on client or on-the-go.
A few loose ends: HTTP inside the larger Web application
Homework 1: Program a simple HTTP image browser.
- /cgi-bin/ in
- CGI or Common Gateway
Interface: oldest (and fundamental) way to call a secondary program through an HTTP server.
- (CGI) Program can be written in any programming language.
- CGI program is an independent executable (independent of the Web server).
- HTTP server calls the program and passes it all input information:
- GET: parameter=value pairs are stored in the QUERY_STRING environment variable.
- POST: parameter=value pairs are in the body of the HTTP request; body is passed to CGI program on stdin.
- HTTP server 'listens' to CGI program's stdout.
- CGI program outputs:
- A Content-Type: text/html HTTP response header.
- An empty line to terminate the header(s).
- Whatever must be passed in the body of the HTTP response; e.g., HTML.
- Server's configuration file contains a directive to execute all
requests for /cgi-bin/...
programs as CGI.
- Problem: what
are some advantages/disadvantages of the CGI model?
- No CGI privileges on ONID Web server.
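The CGI mechanics described above can be sketched as a small Python program. It reads its input from QUERY_STRING (GET) or from stdin (POST), and writes a header, an empty line and an HTML body to stdout. The function names read_parameters and build_response are made up for this example; REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH are the standard CGI environment variables.

```python
#!/usr/bin/env python3
import os
import sys
from html import escape
from urllib.parse import parse_qs


def read_parameters(environ, stdin):
    """Collect parameter=value pairs the way the HTTP server passes them in."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: pairs are in the request body, passed to the program on stdin.
        # Reading from a text stream counts characters; fine for ASCII form data.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: pairs are in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")
    return {k: v[0] for k, v in parse_qs(raw).items()}


def build_response(params):
    """Return header, empty line and HTML body, as written to stdout."""
    rows = "".join(
        f"<li>{escape(k)} = {escape(v)}</li>" for k, v in sorted(params.items())
    )
    body = f"<html><body><ul>{rows}</ul></body></html>\n"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The HTTP server 'listens' to stdout and relays it to the client.
    sys.stdout.write(build_response(read_parameters(os.environ, sys.stdin)))
```

The two functions take the environment and stdin as arguments so they can be tried out without a Web server, e.g., read_parameters({"QUERY_STRING": "book=X"}, sys.stdin).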