Introduction

Information gathering is the first step in any penetration test in which we simulate an external attacker who has no inside knowledge of the target organization.
This phase helps us understand the attack surface and the technologies in use. In some cases, it also uncovers development environments or forgotten, unmaintained infrastructure, which is usually less protected and monitored and can therefore give us a path to internal network access.
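Information gathering is typically an iterative process: each finding, such as a new subdomain or IP range, becomes an input for another round of enumeration.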

Areas of Search

Often, we are given a single domain or perhaps a list of domains and subdomains that belong to an organization.
Many organizations do not have an accurate asset inventory and may have forgotten both domains and subdomains exposed externally.
We may come across various subdomains that map back to in-scope IP addresses, increasing the overall attack surface of our engagement.
Hidden and forgotten subdomains may host outdated, vulnerable versions of applications or development versions that expose additional functionality (a Python debugging console, for example).
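As a rough illustration of mapping discovered subdomains back to in-scope IP addresses, the sketch below resolves each candidate hostname and keeps only those that land inside the agreed scope. The function names (`resolve_ipv4`, `in_scope_hosts`), the hostnames, and the CIDR ranges are hypothetical and chosen for this example; DNS lookups should only be run against targets we are authorized to test.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses; return [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def in_scope_hosts(
    hostnames: Iterable[str],
    scope_cidrs: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Dict[str, List[str]]:
    """Return {hostname: [ips]} for hostnames with at least one IP inside the scope."""
    networks = [ipaddress.ip_network(cidr) for cidr in scope_cidrs]
    result: Dict[str, List[str]] = {}
    for name in hostnames:
        hits = [
            ip
            for ip in resolver(name)
            if any(ipaddress.ip_address(ip) in net for net in networks)
        ]
        if hits:
            result[name] = hits
    return result
```

The resolver is passed in as a parameter so the scope check can be exercised offline with canned data. In an engagement we would call something like `in_scope_hosts(candidates, ["192.0.2.0/24"])`, where the range comes from the scope document.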

Unless we are constrained to a very specific scope, we want to find out as much about our target as possible.
Finding additional IP ranges owned by our target may lead us to other domains and subdomains and widen our attack surface even further.

We need to know what technology stacks our target is using. Are their applications all ASP.NET? Do they use Django, PHP, Flask, etc.?
What type(s) of APIs/web services are in use? Are they using Content Management Systems (CMS) such as WordPress, Joomla, Drupal, or DotNetNuke, which have their own types of vulnerabilities and misconfigurations that we may encounter?
We also care about the web servers in use, such as IIS, Nginx, or Apache, and their version numbers.
If our target is running outdated frameworks or web servers, we want to dig deeper into the associated web applications.
We are also interested in the types of back-end databases in use (MSSQL, MySQL, PostgreSQL, SQLite, Oracle, etc.) as this will give us an indication of the types of attacks we may be able to perform.
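One low-effort way to get a first hint about the web server and technology stack is to inspect HTTP response headers that we have already collected. The sketch below is a minimal example under that assumption; the function name `fingerprint_headers` and the small cookie-name table are illustrative choices, not a complete fingerprinting method. Headers can be stripped or spoofed, so the output is a lead to verify and not a confirmed finding.

```python
import re
from typing import Dict, List, Optional

# Default session/CSRF cookie names commonly associated with a technology.
# This table is a small illustrative sample, not an exhaustive list.
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "ASP.NET_SessionId": "ASP.NET",
    "JSESSIONID": "Java servlet container",
    "csrftoken": "Django (default CSRF cookie name)",
    "laravel_session": "Laravel",
}

# Headers that frequently disclose the framework or CMS when present.
DISCLOSURE_HEADERS = ("X-Powered-By", "X-AspNet-Version", "X-Generator")


def fingerprint_headers(headers: Dict[str, str]) -> Dict[str, object]:
    """Extract server product, version, and technology hints from response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    server: Optional[str] = None
    version: Optional[str] = None
    hints: List[str] = []

    # "Apache/2.4.41 (Ubuntu)" -> product "Apache", version "2.4.41".
    match = re.match(r"([^/\s]+)(?:/([\w.\-]+))?", lowered.get("server", ""))
    if match:
        server, version = match.group(1), match.group(2)

    for name in DISCLOSURE_HEADERS:
        if name.lower() in lowered:
            hints.append(f"{name}: {lowered[name.lower()]}")

    cookies = lowered.get("set-cookie", "").lower()
    for cookie_name, tech in COOKIE_HINTS.items():
        if cookie_name.lower() + "=" in cookies:
            hints.append(f"cookie {cookie_name} suggests {tech}")

    return {"server": server, "server_version": version, "hints": hints}
```

An outdated version number in the result is the kind of signal that tells us to dig deeper into the associated application.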

Lastly, we want to enumerate virtual hosts (vhosts). These resemble subdomains, but they indicate that an organization is hosting multiple applications on the same web server, which selects the application to serve based on the hostname in the request.
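A common way to enumerate vhosts is to send requests to the same IP address with different `Host` headers and flag any response that differs from the one returned for a hostname that should not exist. The sketch below assumes plain HTTP on port 80 and compares only status code and body length; the function names and the baseline hostname are hypothetical. This is an active technique, so it requires authorization.

```python
import http.client
from typing import Callable, Iterable, List, Tuple

Signature = Tuple[int, int]  # (HTTP status code, response body length)


def fetch_signature(ip: str, host: str, port: int = 80, timeout: float = 5.0) -> Signature:
    """Request "/" from `ip` while presenting `host` in the Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Supplying our own Host header stops http.client from adding one for `ip`.
        conn.request("GET", "/", headers={"Host": host})
        response = conn.getresponse()
        body = response.read()
        return response.status, len(body)
    finally:
        conn.close()


def find_vhosts(
    candidates: Iterable[str],
    fetch: Callable[[str], Signature],
    baseline_host: str = "nonexistent-baseline.invalid",
) -> List[str]:
    """Return candidates whose response differs from the default (baseline) response."""
    baseline = fetch(baseline_host)
    return [host for host in candidates if fetch(host) != baseline]
```

In practice `fetch` could be `lambda host: fetch_signature("192.0.2.10", host)`, with the IP taken from the scope. Comparing only status and length is a deliberate simplification: pages with dynamic content may need a looser comparison to avoid false positives.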

Information gathering falls into two main categories:

Passive information gathering: We do not interact directly with the target at this stage. Instead, we collect publicly available information using search engines, whois, certificate information, etc. The goal is to obtain as much information as possible to use as input for the active information gathering phase.
Active information gathering: We directly interact with the target at this stage. Before performing active information gathering, we need to ensure we have the required authorization to test. Otherwise, we are likely engaging in illegal activity. Techniques used at this stage include port scanning, DNS enumeration, directory brute-forcing, virtual host enumeration, and web application crawling/spidering.
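To make the passive use of certificate information concrete, the sketch below collects subdomains from certificate transparency records without touching the target. It assumes the public crt.sh service and its JSON output, in which each entry carries a `name_value` field with newline-separated hostnames; that format, along with the function names, is an assumption of this example and should be checked against the live service.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable, List


def fetch_ct_entries(domain: str, timeout: float = 30.0) -> List[Dict[str, object]]:
    """Query crt.sh for certificates issued for any name under `domain` (assumed API)."""
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    with urllib.request.urlopen(f"https://crt.sh/?{query}", timeout=timeout) as response:
        return json.load(response)


def extract_subdomains(entries: Iterable[Dict[str, object]], domain: str) -> List[str]:
    """Return the sorted, de-duplicated hostnames under `domain` found in CT entries."""
    domain = domain.lower().rstrip(".")
    found = set()
    for entry in entries:
        for name in str(entry.get("name_value", "")).splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # treat a wildcard entry as its parent name
            # Require an exact match or a dot boundary so "evil-example.com" is excluded.
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)
```

The resulting list, for example from `extract_subdomains(fetch_ct_entries("example.com"), "example.com")`, can then feed the active phase, such as DNS resolution or vhost enumeration, once authorization is in place.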
