Passive Information Gathering

Passive information gathering is collecting information about a target without interacting with it directly.

Whois

  • The Internet Corporation for Assigned Names and Numbers (ICANN) requires that accredited registrars enter the holder's contact information, the domain's creation and expiration dates, and other information in the Whois database immediately after registering a domain. In simple terms, the Whois database is a searchable list of all domains currently registered worldwide.

  • It is a TCP-based transaction-oriented query/response protocol listening on TCP port 43 by default.

  • We can use it to query databases containing domain names, IP addresses, or autonomous systems, and to provide information services to Internet users.

  • WHOIS lookups were initially performed using command-line tools. Nowadays, many web-based (example: https://whois.domaintools.com/) tools exist, but command-line options often give us the most control over our queries and help filter and sort the resultant output.

  • Commands:

    • whois <Domain Name>
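
      • Example (facebook.com is used throughout this section purely for illustration; any registered domain works): whois facebook.com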

Information Gathered

  • Organisation

  • Locations

  • Domain Email address

  • Registrar Email address

  • Phone number

  • Language

  • Registrar

  • New Domain

  • DNSSEC

  • Name servers

DNS

  • DNS converts domain names to IP addresses, allowing browsers to access resources on the Internet.

  • Each Internet-connected device has a unique IP address that other machines use to locate it.

  • DNS servers minimize the need for people to memorize IP addresses such as 104.17.42.72 (IPv4) or longer alphanumeric addresses such as 2606:4700::6811:2b48 (IPv6). When a user types www.facebook.com into their web browser, a translation must occur between what the user types and the IP address required to reach the www.facebook.com webpage.

Why DNS

  • It allows names to be used instead of numbers to identify hosts.

  • It is a lot easier to remember a name than it is to recall a number.

  • A server can change its numeric address without having to notify everyone on the Internet; the name is simply re-pointed to the new address.

  • A single name might refer to several hosts, splitting the workload between different servers.
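
  • As a quick illustration, a single query for a large site can return one or more addresses that may change between lookups (output varies by resolver and location):

    • dig +short facebook.com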

DNS Hierarchy

  • There is a hierarchy of names in the DNS structure.

  • The top level is the root, which is designated by a dot (.)

  • The Top-Level Domain (TLD) is the next level down and is the last portion of a hostname (in www.facebook.com, the TLD is com).

  • After the top-level domain comes the domain of the organization (e.g., facebook, google).
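
  • For example, reading www.facebook.com. from right to left: the trailing dot is the root, com is the TLD, facebook is the organization's domain, and www is a host within that domain.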

Resource Records

  • Resource Records are the results of DNS queries and have the following structure:

Resource Record

A domain name, usually a fully qualified domain name, is the first part of a Resource Record. If you don't use a fully qualified domain name, the zone's name where the record is located will be appended to the end of the name.

TTL

The Time-To-Live (TTL), in seconds, defaults to the minimum value specified in the SOA record.

Record Class

The record class: Internet (IN), Hesiod (HS), or Chaos (CH).

Start Of Authority (SOA)

It should be first in a zone file because it indicates the start of a zone. Each zone can only have one SOA record, and additionally, it contains the zone's values, such as a serial number and multiple expiration timeouts.

Name Servers (NS)

The distributed database is bound together by NS records. They specify a zone's authoritative name server and delegate the authority for a child zone to a name server.

IPv4 Addresses (A)

The A record is simply a mapping between a hostname and an IPv4 address. 'Forward' zones are those with A records.

Pointer (PTR)

The PTR record is a mapping between an IP address and a hostname. 'Reverse' zones are those that have PTR records.

Canonical Name (CNAME)

An alias hostname is mapped to an A record hostname using the CNAME record.

Mail Exchange (MX)

The MX record identifies a host that will accept emails for a specific host. Each listed host is assigned a priority value, and multiple MX records can exist for the same host, forming a prioritized list.
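
Putting the fields together, a record returned by a DNS query takes the form: name, TTL, class, record type, and type-specific data. An MX record, for instance, might look like the following (the exact values are illustrative only and change over time):

facebook.com.    300    IN    MX    10 smtpin.vvv.facebook.com.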

Nslookup and DIG

  • With Nslookup and DIG, we can search for domain name servers on the Internet and ask them for information about hosts and domains.

  • Nslookup has two modes, interactive and non-interactive.

  • We can query A records by just submitting a domain name (e.g., nslookup google.com), but we can also use the -query parameter to search for specific resource records.

  • We can also specify a nameserver if needed by adding @<Name Server/IP Address of Name Server> to the command (e.g., dig facebook.com @1.1.1.1)

  • Commands:

    • We can specify a particular record type or use ANY to request all available data: nslookup -query=<A/PTR/ANY/TXT/MX/...> <Domain Name>

      • Example: nslookup -query=PTR 31.13.92.36

      • Example: nslookup -query=ANY google.com

    • The same goes for DIG: dig <a/txt/mx/any/...> <Domain Name>

      • Example: dig mx facebook.com

      • Example: dig any cloudflare.com

  • Organizations are given IP addresses on the Internet, but they aren't always their owners. They might rely on ISPs and hosting providers that lease smaller netblocks to them.

  • We can combine some of the results gathered via nslookup with the whois database to determine if our target organization uses hosting providers.
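
    • Example: nslookup -query=A facebook.com returns one or more A records (such as 31.13.92.36; actual addresses vary), and whois 31.13.92.36 then reveals the netblock owner.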

Passive Subdomain Enumeration

  • Subdomain enumeration refers to mapping all available subdomains within a domain name.

Virus Total

  • VirusTotal maintains its own DNS replication service, which it builds by preserving the DNS resolutions made when users submit URLs to it.

  • To receive information about a domain, type the domain name into the search bar and click on the "Relations" tab.

Certificates

  • Another interesting source of information we can use to extract subdomains is SSL/TLS certificates.

  • The main reason is Certificate Transparency (CT), a project that requires every SSL/TLS certificate issued by a Certificate Authority (CA) to be published in a publicly accessible log.

  • We can search these CT logs using public services such as https://crt.sh.

  • We can use this command to get the results in a more organized way: curl -s "https://crt.sh/?q=<Domain Name>&output=json" | jq -r '.[] | "\(.name_value)\n\(.common_name)"' | sort -u > "<Domain Name>_crt.sh.txt"
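
    • Example: curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[] | "\(.name_value)\n\(.common_name)"' | sort -u > "facebook.com_crt.sh.txt"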

  • We can also do the enumeration manually using this command: openssl s_client -ign_eof 2>/dev/null <<<$'HEAD / HTTP/1.0\r\n\r' -connect "<Domain>:<Port Number (443)>" | openssl x509 -noout -text -in - | grep 'DNS' | sed -e 's|DNS:|\n|g' -e 's|^\*.*||g' | tr -d ',' | sort -u
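
    • A simpler variant (shown here against facebook.com on port 443; the exact SAN entries depend on the certificate served) prints just the certificate's DNS names: openssl s_client -connect "facebook.com:443" </dev/null 2>/dev/null | openssl x509 -noout -text | grep 'DNS:'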

Automating Passive Subdomain Enumeration

TheHarvester

  • theHarvester is a simple-to-use yet powerful and effective tool for early-stage penetration testing and red team engagements.

  • The tool collects emails, names, subdomains, IP addresses, and URLs from various public data sources for passive information gathering.

  • There are many modules that can be used with the tool. To automate the process of running several of them, we can use this command: cat sources.txt | while read source; do theHarvester -d "<Domain>" -b $source -f "${source}_<Domain>"; done

    • Example: cat sources.txt | while read source; do theHarvester -d "facebook.com" -b $source -f "${source}_facebook.com";done

    • Here, sources.txt contains the module names, one per line.
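
    • As an illustration, sources.txt could contain module names such as crtsh, virustotal, otx, and urlscan (the available modules vary by theHarvester version, so check the tool's help output for the current list).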

Passive Infrastructure Identification

Netcraft

  • Netcraft can offer us information about the servers without even interacting with them, and this is something valuable from a passive information gathering point of view.

  • We can use the service by visiting https://sitereport.netcraft.com and entering the target domain.

  • Some interesting details we can observe from the report are:

    • Background: General information about the domain, including the date it was first seen by Netcraft crawlers.

    • Network: Information about the netblock owner, hosting company, nameservers, etc.

    • Hosting history: Latest IPs used, webserver, and target OS.

  • We need to pay special attention to the latest IPs used. Sometimes we can spot the actual IP address of the webserver before it was placed behind a load balancer, web application firewall, or IDS, allowing us to connect directly to it if the configuration allows it.

Wayback Machine

  • The Internet Archive is an American digital library that provides free public access to digitized materials, including websites, collected automatically via its web crawlers.

  • We can use the Wayback Machine to access several saved versions of a website at different points in time and look for old versions that may contain interesting comments in the source code or files that should not be there.

  • We can also use the tool waybackurls to inspect URLs saved by Wayback Machine and look for specific keywords.

  • We can install the tool using Go: go install github.com/tomnomnom/waybackurls@latest
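
    • Example (the -dates flag prefixes each URL with the date it was crawled; facebook.com is again just an illustration): waybackurls -dates facebook.com > facebook.com_waybackurls.txt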
