HTTP Fundamentals

Introduction

Today, the majority of the applications we use constantly interact with the internet, both web and mobile applications.
Most internet communications are made with web requests through the HTTP protocol.
HTTP is an application-level protocol used to access the World Wide Web resources. The term hypertext stands for text containing links to other resources and text that the readers can easily interpret.
HTTP communication consists of a client and a server, where the client requests the server for a resource. The server processes the requests and returns the requested resource.
The default port for HTTP communication is port 80, though this can be changed to any other port, depending on the web server configuration.
The same requests are utilized when we use the internet to visit different websites. We enter a Fully Qualified Domain Name (FQDN) as a Uniform Resource Locator (URL) to reach the desired website, like www.hackthebox.com.

URL

Resources over HTTP are accessed via a URL, which offers many more specifications than simply specifying a website we want to visit.
Structure of a URL:

Scheme

http:// https://

This is used to identify the protocol being accessed by the client, and ends with a colon and a double slash (://)

User Info

admin:password@

This is an optional component that contains the credentials (separated by a colon :) used to authenticate to the host, and is separated from the host with an at sign (@)

Host

inlanefreight.com

The host signifies the resource location. This can be a hostname or an IP address

Port

:80

The Port is separated from the Host by a colon (:). If no port is specified, http schemes default to port 80 and https default to port 443

Path

/dashboard.php

This points to the resource being accessed, which can be a file or a folder. If there is no path specified, the server returns the default index (e.g. index.html).

Query String

?login=true

The query string starts with a question mark (?), and consists of a parameter (e.g. login) and a value (e.g. true). Multiple parameters can be separated by an ampersand (&).

Fragments

#status

Fragments are processed by the browsers on the client-side to locate sections within the primary resource (e.g. a header or section on the page).

Not all components are required to access a resource. The main mandatory fields are the scheme and the host, without which the request would have no resource to request.

HTTP Flow

At a very high level, the first time a user enters a URL into the browser, it sends a request to a DNS (Domain Name Resolution) server to resolve the domain and get its IP.
The DNS server looks up the IP address for the URL and returns it.
All domain names need to be resolved this way, as a server can't communicate without an IP address.
Our browsers usually first look up records in the local '/etc/hosts' file, and if the requested domain does not exist within it, then they would contact other DNS servers. We can use the '/etc/hosts' to manually add records to for DNS resolution, by adding the IP followed by the domain name.
Once the browser gets the IP address linked to the requested domain, it sends a GET request to the default HTTP port (e.g. 80), asking for the root (/)path. Then, the web server receives the request and processes it.
By default, servers are configured to return an index file when a request for / is received.

cURL

cURL (client URL) is a command-line tool and library that primarily supports HTTP along with many other protocols.
This makes it a good candidate for scripts as well as automation, making it essential for sending various types of web requests from the command line, which is necessary for many types of web penetration tests.
cURL does not render the HTML/JavaScript/CSS code, unlike a web browser, but prints it in its raw format. However, as penetration testers, we are mainly interested in the request and response context, which usually becomes much faster and more convenient than a web browser.
We may also use cURL to download a page or a file and output the content into a file using the -O flag (-o to provide a different name for the file)
cURL should automatically handle all HTTPS communication standards and perform a secure handshake and then encrypt and decrypt data automatically. However, if we ever contact a website with an invalid SSL certificate or an outdated one, then cURL by default would not proceed with the communication to protect against the different MITM attacks.
We may face such an issue when testing a local web application or with a web application hosted for practice purposes, as such web applications may not yet have implemented a valid SSL certificate. To skip the certificate check with cURL, we can use the -k flag
To view the full HTTP request and response, we can simply add the -v verbose flag
If we were only interested in seeing the response headers, then we can use the -I flag to send a HEAD request and only display the response headers.
Furthermore, we can use the -i flag to display both the headers and the response body (e.g. HTML code).
In addition to viewing headers, cURL also allows us to set request headers with the -H flag.
Some headers, like the User-Agent or Cookie headers, have their own flags. For example, we can use the -A to set our User-Agent.

Browser DevTools

Most modern web browsers come with built-in developer tools (DevTools), which are mainly intended for developers to test their web applications. However, as web penetration testers, these tools can be a vital asset in any web assessment we perform, as a browser (and its DevTools) are among the assets we are most likely to have in every web assessment exercise.
Whenever we visit any website or access any web application, our browser sends multiple web requests and handles multiple HTTP responses to render the final view we see in the browser window.
The network field can show us the requests and responses that our browser sends and receives.
In the first Headers tab, we see both the HTTP request and HTTP response headers.
The devtools automatically arrange the headers into sections, but we can click on the Raw button to view their details in their raw format.
Furthermore, we can check the Cookies tab to see any cookies used by the request.

Hypertext Transfer Protocol Secure (HTTPS)

One of the significant drawbacks of HTTP is that all data is transferred in clear-text. This means that anyone between the source and destination can perform a Man-in-the-middle (MiTM) attack to view the transferred data.
To counter this issue, the HTTPS (HTTP Secure) protocol was created, in which all communications are transferred in an encrypted format, so even if a third party does intercept the request, they would not be able to extract the data out of it. For this reason, HTTPS has become the mainstream scheme for websites on the internet, and HTTP is being phased out, and soon most web browsers will not allow visiting HTTP websites.
Websites that enforce HTTPS can be identified through https:// in their URL (e.g. https://www.google.com), as well as the lock icon in the address bar of the web browser, to the left of the URL.
Note that although the data transferred through the HTTPS protocol may be encrypted, the request may still reveal the visited URL if it contacted a clear-text DNS server. For this reason, it is recommended to utilize encrypted DNS servers (e.g. 8.8.8.8 or 1.1.1.1), or utilize a VPN service to ensure all traffic is properly encrypted.

HTTPS Workflow

If we type http:// instead of https:// to visit a website that enforces HTTPS, the browser attempts to resolve the domain and redirects the user to the webserver hosting the target website.
A request is sent to port 80 first, which is the unencrypted HTTP protocol.
The server detects this and redirects the client to secure HTTPS port 443 instead.
This is done via the 301 Moved Permanently response code.
Next, the client (web browser) sends a "client hello" packet, giving information about itself.
After this, the server replies with "server hello", followed by a key exchange to exchange SSL certificates.
The client verifies the key/certificate and sends one of its own.
After this, an encrypted handshake is initiated to confirm whether the encryption and transfer are working correctly.
Once the handshake completes successfully, normal HTTP communication is continued, which is encrypted after that.
This is a very high-level overview of the key exchange.

HTTP Requests and Responses

Introduction

HTTP communications mainly consist of an HTTP request and an HTTP response.
An HTTP request is made by the client (e.g. cURL/browser), and is processed by the server (e.g. web server)
The requests contain all of the details we require from the server, including the resource (e.g. URL, path, parameters), any request data, headers or options we specify, and many other options.
Once the server receives the HTTP request, it processes it and responds by sending the HTTP response, which contains the response code and may contain the resource data if the requester has access to it.

HTTP Requests

The first line of any HTTP request contains three main fields 'separated by spaces':

Method

GET

The HTTP method or verb, which specifies the type of action to perform.

Path

/users/login.html

The path to the resource being accessed. This field can also be suffixed with a query string (e.g. ?username=user).

Version

HTTP/1.1

The third and final field is used to denote the HTTP version.

The next set of lines contain HTTP header value pairs, like Host, User-Agent, Cookie, and many other possible headers.
These headers are used to specify various attributes of a request.
The headers are terminated with a new line, which is necessary for the server to validate the request.
Finally, a request may end with the request body and data.

HTTP version 1.X sends requests as clear-text, and uses a new-line character to separate different fields and different requests. HTTP version 2.X, on the other hand, sends requests as binary data in a dictionary form.

HTTP Responses

The first line of an HTTP response contains two fields separated by spaces.
- The first being the HTTP version (e.g. HTTP/1.1)
- The second denotes the HTTP response code (e.g. 200 OK).
Response codes are used to determine the request's status.
After the first line, the response lists its headers, similar to an HTTP request.
Finally, the response may end with a response body, which is separated by a new line after the headers.
The response body is usually defined as HTML code. However, it can also respond with other code types such as JSON, website resources such as images, style sheets or scripts, or even a document such as a PDF document hosted on the webserver.

HTTP Headers

Introduction

Some headers are only used with either requests or responses, while some other general headers are common to both.
Headers can have one or multiple values, appended after the header name and separated by a colon. We can divide headers into the following categories:
- General Headers
- Entity Headers
- Request Headers
- Response Headers
- Security Headers

The following sections only mention a small subset of commonly seen HTTP headers.

There are many other contextual headers that can be used in HTTP communications.

It's also possible for applications to define custom headers based on their requirements.

General Headers

General headers are used in both HTTP requests and responses.
They are contextual and are used to describe the message rather than its contents.

Date

Date: Wed, 16 Feb 2022 10:38:44 GMT

Holds the date and time at which the message originated.

Connection

Connection: close

Dictates if the current network connection should stay alive after the request finishes. Two commonly used values for this header are close and keep-alive. The close value from either the client or server means that they would like to terminate the connection, while the keep-alive header indicates that the connection should remain open to receive more data and input.

Entity Headers

Similar to general headers, Entity Headers can be common to both the request and response.
These headers are used to describe the content (entity) transferred by a message.
They are usually found in responses and POST or PUT requests.

Content-Type

Content-Type: text/html

Used to describe the type of resource being transferred. The value is automatically added by the browsers on the client-side and returned in the server response. The charset field denotes the encoding standard, such as UTF-8.

Media-Type

Media-Type: application/pdf

The media-type is similar to Content-Type, and describes the data being transferred. This header can play a crucial role in making the server interpret our input. The charset field may also be used with this header.

Boundary

boundary="b4e4fbd93540"

Acts as a marker to separate content when there is more than one in the same message. For example, within a form data, this boundary gets used as --b4e4fbd93540 to separate different parts of the form.

Content-Length

Content-Length: 385

Holds the size of the entity being passed. This header is necessary as the server uses it to read data from the message body, and is automatically generated by the browser and tools like cURL.

Content-Encoding

Content-Encoding: gzip

Data can undergo multiple transformations before being passed. For example, large amounts of data can be compressed to reduce the message size. The type of encoding being used should be specified using the Content-Encoding header.

Request Headers

The client sends Request Headers in an HTTP transaction.
These headers are used in an HTTP request and do not relate to the content of the message.

Host

Host: www.inlanefreight.com

Used to specify the host being queried for the resource. This can be a domain name or an IP address. HTTP servers can be configured to host different websites, which are revealed based on the hostname. This makes the host header an important enumeration target, as it can indicate the existence of other hosts on the target server.

User-Agent

User-Agent: curl/7.77.0

The User-Agent header is used to describe the client requesting resources. This header can reveal a lot about the client, such as the browser, its version, and the operating system.

Referer

Referer: http://www.inlanefreight.com/

Denotes where the current request is coming from. For example, clicking a link from Google search results would make https://google.com the referer. Trusting this header can be dangerous as it can be easily manipulated, leading to unintended consequences.

Accept: */*

The Accept header describes which media types the client can understand. It can contain multiple media types separated by commas. The */* value signifies that all media types are accepted.

Cookie: PHPSESSID=b4e4fbd93540

Contains cookie-value pairs in the format name=value. A cookie is a piece of data stored on the client-side and on the server, which acts as an identifier. These are passed to the server per request, thus maintaining the client's access. Cookies can also serve other purposes, such as saving user preferences or session tracking. There can be multiple cookies in a single header separated by a semi-colon.

Authorization

Authorization: BASIC cGFzc3dvcmQK

Another method for the server to identify clients. After successful authentication, the server returns a token unique to the client. Unlike cookies, tokens are stored only on the client-side and retrieved by the server per request. There are multiple types of authentication types based on the webserver and application type used.

Response Headers

Response Headers can be used in an HTTP response and do not relate to the content.
Certain response headers such as Age, Location, and Server are used to provide more context about the response.

Server

Server: Apache/2.2.14 (Win32)

Contains information about the HTTP server, which processed the request. It can be used to gain information about the server, such as its version, and enumerate it further.

Set-Cookie

Set-Cookie: PHPSESSID=b4e4fbd93540

Contains the cookies needed for client identification. Browsers parse the cookies and store them for future requests. This header follows the same format as the Cookie request header.

WWW-Authenticate

WWW-Authenticate: BASIC realm="localhost"

Notifies the client about the type of authentication required to access the requested resource.

Security Headers

With the increase in the variety of browsers and web-based attacks, defining certain headers that enhanced security was necessary.
HTTP Security headers are a class of response headers used to specify certain rules and policies to be followed by the browser while accessing the website.

Content-Security-Policy

Content-Security-Policy: script-src 'self'

Dictates the website's policy towards externally injected resources. This could be JavaScript code as well as script resources. This header instructs the browser to accept resources only from certain trusted domains, hence preventing attacks such as Cross-site scripting (XSS).

Strict-Transport-Security

Strict-Transport-Security: max-age=31536000

Prevents the browser from accessing the website over the plaintext HTTP protocol, and forces all communication to be carried over the secure HTTPS protocol. This prevents attackers from sniffing web traffic and accessing protected information such as passwords or other sensitive data.

Referrer-Policy

Referrer-Policy: origin

Dictates whether the browser should include the value specified via the Referer header or not. It can help in avoiding disclosing sensitive URLs and information while browsing the website.

PreviousWeb Requests NextGetting Started

Last updated 20 days ago