Open Source Intelligence (OSINT) encompasses the systematic collection and analysis of publicly available information to produce actionable intelligence. Unlike penetration testing or red teaming, OSINT does not require exploiting vulnerabilities or circumventing access controls. It requires knowing where to look, how to correlate findings, and when to stop. This guide covers the practical toolkit, established methodologies, and ethical framework that professional OSINT practitioners operate within.
The distinction between OSINT and general internet research is methodology. A Google search is not OSINT. A structured investigation using defined requirements, multiple corroborating sources, documented collection procedures, and rigorous analysis -- that is OSINT. The MITRE ATT&CK framework catalogs adversary reconnaissance techniques under the Reconnaissance tactic (TA0043), including Search Open Websites/Domains (T1593), Search Open Technical Databases (T1596), and Gather Victim Network Information (T1590). Understanding these techniques from the defender's perspective is what makes OSINT valuable to security teams.
The OSINT Methodology Stack
Professional OSINT follows a layered methodology. Each layer builds on the previous one, moving from broad collection to focused analysis.
Layer 1: Footprinting
Footprinting establishes the scope of a target's digital presence. For an organization, this means identifying all associated domains, IP ranges, subdomains, email addresses, and public-facing infrastructure. The goal is to answer: "What does this target look like from the outside?"
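```bash
# Domain footprinting: DNS, WHOIS, subdomains
whois example.com
# Many servers answer ANY queries minimally (RFC 8482), so ask for record types explicitly
for t in A AAAA MX NS TXT SOA; do dig example.com "$t" +noall +answer; done
# "%25" is the URL-encoded "%" wildcard that crt.sh expects
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u
# IP footprinting: ASN, geolocation, reverse DNS (ip-api's free endpoint is HTTP-only)
curl -s "http://ip-api.com/json/93.184.216.34" | jq '.'
curl -s "https://internetdb.shodan.io/93.184.216.34"
```
Layer 2: Fingerprinting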
Once you know what assets exist, fingerprinting determines what they are running. Technology stack identification reveals web servers, frameworks, CMS platforms, JavaScript libraries, CDN providers, and cloud hosting. This information directly maps to potential vulnerabilities.
```bash
# HTTP header fingerprinting
curl -sI https://example.com | grep -iE "^(server|x-powered|x-aspnet|x-generator)"
# TLS certificate inspection (</dev/null ends the session instead of waiting for input)
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```
Layer 3: Vulnerability Correlation
With fingerprints in hand, cross-reference against known vulnerability databases. An Apache 2.4.49 server has a known path traversal vulnerability (CVE-2021-41773). A WordPress 5.8 installation has a different set of known issues. This correlation does not require active scanning -- it is a lookup operation against public CVE databases.
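The lookup itself can be scripted. The sketch below is a minimal example under stated assumptions: the fingerprint is an Apache `Server` header that exposes a full version (for example `Apache/2.4.49 (Unix)`), which is converted into a CPE 2.3 name and then into a query URL for the NVD CVE API 2.0 (`cpeName` parameter). The function names are hypothetical, and only Apache httpd is mapped; other products need their own NVD vendor/product pair.

```bash
# Hypothetical helper: map an Apache "Server" header to a CPE 2.3 name.
# Only Apache httpd (NVD vendor "apache", product "http_server") is handled.
server_header_to_cpe() {
  local version
  # Extract a dotted version such as 2.4.49 that follows "Apache/"
  version=$(printf '%s\n' "$1" | sed -nE 's#.*Apache/([0-9]+\.[0-9]+\.[0-9]+).*#\1#p')
  # A bare "Apache" header (e.g. ServerTokens Prod) carries no version to look up
  [ -n "$version" ] || return 1
  # CPE 2.3 formatted string: the seven trailing fields are wildcards
  printf 'cpe:2.3:a:apache:http_server:%s:*:*:*:*:*:*:*\n' "$version"
}

# Hypothetical helper: build an NVD CVE API 2.0 query URL for a CPE name.
nvd_cve_url() {
  printf 'https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=%s\n' "$1"
}
```

For example, `curl -s "$(nvd_cve_url "$(server_header_to_cpe 'Apache/2.4.49 (Unix)')")" | jq '.totalResults'` asks NVD how many CVEs it associates with that CPE. NVD rate-limits clients without an API key, so throttle bulk lookups. Treat banner versions with caution: Linux distributions often backport security fixes without changing the reported version, so a match is a lead to verify rather than a confirmed vulnerability.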
Layer 4: Threat Intelligence Enrichment
Check every discovered IP, domain, and hash against threat intelligence feeds. Has this IP been reported for malicious activity? Has this domain appeared in phishing campaigns? Is this server a known command-and-control node? Sources include AlienVault OTX, ThreatFox, URLhaus, VirusTotal, and AbuseIPDB.
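Once a feed has been exported locally, checking every discovered indicator against it is a set-membership test. This minimal sketch assumes a hypothetical file layout: a feed file and an asset file, each with one IP or domain per line in the first field, where feed lines starting with `#` are comments. `match_iocs` is an illustrative name, not part of any of the services listed above.

```bash
# Hypothetical helper: print assets whose indicator also appears in a feed export.
# $1 = feed file, $2 = asset file; the first field of each line is the indicator.
match_iocs() {
  local feed_file="$1" assets_file="$2"
  awk '
    # First file (the feed): remember every indicator, skipping comments and blanks
    FILENAME == ARGV[1] {
      if ($0 !~ /^#/ && NF) bad[tolower($1)] = 1
      next
    }
    # Second file (the assets): print lines whose indicator was in the feed
    {
      key = tolower($1)
      if (key in bad) print
    }' "$feed_file" "$assets_file"
}
```

Hits are a prompt for investigation rather than a verdict: shared hosting and CDN addresses can appear in feeds because of unrelated tenants.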
Layer 5: Analysis and Correlation
The most valuable layer. Raw data from layers 1-4 becomes intelligence through analysis. A domain registered last week, hosted on a bulletproof provider, with a self-signed certificate, and an IP that appears in three threat feeds -- that pattern tells a story. Entity resolution (connecting related data points across sources) reveals relationships invisible in any single dataset.
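Entity resolution can start as a join on a shared attribute. The sketch below assumes two hypothetical files: one with `<domain> <ip>` pairs from DNS resolution, and one with `<ip> <feed-name>` pairs from threat intelligence enrichment. Joining them on the IP shows which domains sit on flagged infrastructure and how many feeds agree about it; the function name is illustrative.

```bash
# Hypothetical helper: relate domains to the feeds that list the IPs they resolve to.
# $1 = "<domain> <ip>" file, $2 = "<ip> <feed-name>" file.
# Output: "<ip> <domain> <feed1,feed2,...>" for each domain on a listed IP.
correlate_by_ip() {
  local domain_ip_file="$1" ip_feed_file="$2"
  # The feed file is read first (ARGV[1]) so lookups are ready for the domain file
  awk '
    FILENAME == ARGV[1] {
      if (NF < 2) next
      # Aggregate every feed that lists an IP into one comma-separated field
      if ($1 in feeds) { feeds[$1] = feeds[$1] "," $2 } else { feeds[$1] = $2 }
      next
    }
    ($2 in feeds) { print $2, $1, feeds[$2] }' "$ip_feed_file" "$domain_ip_file" \
    | LC_ALL=C sort
}
```

Domains that share an IP with several feed listings are natural starting points for the pattern analysis described above.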
Essential OSINT Tools by Category
Domain and DNS Intelligence
| Tool | Purpose | Pricing | Best For |
|---|---|---|---|
| MAGO | Full domain intelligence reports | Free tier + paid | One-click reports, non-technical users |
| subfinder | Passive subdomain enumeration | Open source | Automated recon pipelines |
| amass | Comprehensive DNS enumeration | Open source | Deep subdomain discovery |
| SecurityTrails | Historical DNS/WHOIS data | Free tier + API | Domain history research |
| DNSdumpster | DNS recon with visualization | Free | Quick visual overview |
Network and Infrastructure
| Tool | Purpose | Pricing | Best For |
|---|---|---|---|
| Shodan | Internet-wide device search | Free + paid plans | IoT/device discovery |
| Censys | Internet asset discovery | Free community + enterprise | Certificate and host search |
| GreyNoise | Internet background noise classification | Free community + paid | Distinguishing targeted vs mass scanning |
| BGP.tools | BGP routing intelligence | Free | ASN and routing analysis |
Threat Intelligence
| Tool | Purpose | Pricing | Best For |
|---|---|---|---|
| AlienVault OTX | Collaborative threat intel | Free | IOC enrichment |
| VirusTotal | File/URL/domain reputation | Free + enterprise | Malware and phishing checks |
| ThreatFox | IOC sharing platform | Free | C2 and malware IOCs |
| URLhaus | Malicious URL database | Free | URL reputation checks |
| AbuseIPDB | IP abuse reporting | Free + API | IP reputation scoring |
Entity Analysis and Visualization
| Tool | Purpose | Pricing | Best For |
|---|---|---|---|
| Maltego | Entity relationship graphing | Premium licensing | Complex investigations |
| SpiderFoot | Automated OSINT recon | Open source + HX | Broad automated collection |
| theHarvester | Email, subdomain, name gathering | Open source | Quick personnel recon |
OSINT Techniques in Practice
Technique 1: Certificate Transparency Mining
Since April 2018, Chrome has required publicly trusted TLS certificates to be logged in public Certificate Transparency (CT) logs, and Safari enforces a similar policy. This makes CT one of the most reliable OSINT sources: logging is effectively mandatory, because these browsers reject publicly trusted certificates that lack valid CT proofs. Certificates issued by private or internal CAs, however, are not logged. Querying crt.sh reveals not just current subdomains but historical ones, wildcard patterns, and certificate lifecycle information.
Advanced CT analysis goes beyond subdomain enumeration. Certificate issuance patterns reveal organizational behavior: which CAs they use, how frequently they rotate certificates, and whether they use wildcard certificates (common in environments with many subdomains). Other security practices, such as HSTS, are not recorded in CT logs; they have to be checked in HTTP response headers.
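Wildcard usage and the set of distinct hostnames can be pulled straight out of crt.sh results. This sketch assumes names arrive one per line on standard input, for example from the `jq -r '.[].name_value'` pipeline in Layer 1; the function name `ct_classify_names` is hypothetical. It lowercases names, tags wildcard entries, keeps only names at or under the target domain, and deduplicates.

```bash
# Hypothetical helper: classify CT hostnames (one per line on stdin) under domain $1.
# Prints "wildcard <base>" for "*.<base>" entries and "host <name>" otherwise.
ct_classify_names() {
  local domain="$1"
  tr '[:upper:]' '[:lower:]' \
    | awk -v d="$domain" '
        BEGIN { d = tolower(d) }
        {
          kind = "host"; name = $0
          # "*.example.com" becomes kind=wildcard, name=example.com
          if (substr(name, 1, 2) == "*.") { kind = "wildcard"; name = substr(name, 3) }
          # Keep the domain itself and anything ending in ".<domain>"
          if (name == d || substr(name, length(name) - length(d)) == "." d)
            print kind, name
        }' \
    | LC_ALL=C sort -u
}
```

For example: `curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | ct_classify_names example.com`. The number of `wildcard` lines gives a rough sense of how heavily the organization relies on wildcard certificates.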
Technique 2: Passive DNS Correlation
Passive DNS databases record DNS query/response pairs observed by distributed sensors. Unlike CT (which only covers hostnames that appear in publicly trusted TLS certificates), passive DNS can capture any name whose resolution a sensor observed, including HTTP-only services, internal redirects, and ephemeral infrastructure. Coverage depends on where sensors are deployed, so a name missing from passive DNS may still exist. Cross-referencing passive DNS with CT data produces a more complete inventory than either source alone.
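A minimal way to perform that cross-reference, assuming you already have two plain-text hostname lists (one name per line, one exported from CT and one from a passive DNS service), is to tag each name by the source or sources that reported it. The function name is hypothetical.

```bash
# Hypothetical helper: merge a CT hostname list ($1) with a passive DNS list ($2).
# Prints "both <name>", "ct_only <name>", or "pdns_only <name>", sorted.
merge_inventory() {
  local ct_file="$1" pdns_file="$2"
  awk '
    FILENAME == ARGV[1] { if (NF) ct[tolower($1)] = 1; next }
    NF { pdns[tolower($1)] = 1 }
    END {
      for (h in ct) {
        tag = (h in pdns) ? "both" : "ct_only"
        print tag, h
      }
      for (h in pdns) if (!(h in ct)) print "pdns_only", h
    }' "$ct_file" "$pdns_file" | LC_ALL=C sort
}
```

Names tagged `pdns_only` are often the interesting ones: hosts that never received a publicly trusted certificate, such as HTTP-only services.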
Technique 3: Google Dorking for Exposed Assets
```text
# Find exposed admin panels
site:example.com intitle:"dashboard" OR intitle:"admin" OR intitle:"login"
# Find exposed documents
site:example.com filetype:pdf OR filetype:xlsx OR filetype:docx
# Find exposed configuration files
site:example.com ext:env OR ext:yml OR ext:conf OR ext:ini
# Find exposed API documentation
site:example.com inurl:swagger OR inurl:api-docs OR inurl:graphql
```
Technique 4: Code Repository Mining
Public code repositories on GitHub, GitLab, and Bitbucket frequently contain sensitive information committed by mistake: API keys, database credentials, internal hostnames, network diagrams, and infrastructure configurations. GitHub's advanced search operators enable targeted discovery:
```text
# Search for accidentally committed secrets
org:example-corp "api_key" OR "password" OR "secret"
org:example-corp filename:.env
org:example-corp filename:docker-compose.yml
```
Technique 5: WHOIS History and Domain Profiling
Current WHOIS records show when a domain was registered and, where the data is not redacted (as is now common after GDPR), who registered it. Historical WHOIS data (available through services like DomainTools, WHOXY, and SecurityTrails) reveals ownership changes, registrar transfers, and contact information updates. A domain that changed registrars three times in a year and used privacy protection on every iteration warrants more scrutiny than a stable corporate domain.
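Comparing snapshots is easier once each record is reduced to the fields that matter. This sketch assumes WHOIS text on standard input in the common gTLD `Key: Value` layout; field names vary across registries and registrars, so the field list and the function name `whois_fields` are illustrative.

```bash
# Hypothetical helper: extract comparison-friendly fields from WHOIS text on stdin.
whois_fields() {
  # Exact field names only: "Registrar URL:" or "Registrar IANA ID:" do not match
  grep -iE '^[[:space:]]*(Registrar|Registrant Organization|Creation Date|Updated Date|Name Server):' \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' \
    | LC_ALL=C sort -u
}
```

In bash, two saved snapshots (hypothetical file names) can then be compared with `diff <(whois_fields < snapshot_2023.txt) <(whois_fields < snapshot_2024.txt)`, which surfaces registrar transfers and name server changes directly.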
Building an OSINT Workflow
An effective OSINT workflow moves from broad to specific, passive to active, and automated to manual:
1. Define scope and requirements -- What do you need to know? What are the boundaries?
2. Automated passive collection -- Run tools like MAGO, subfinder, and theHarvester to gather baseline data from public sources without touching the target.
3. Manual enrichment -- Review automated results. Investigate anomalies. Follow leads that automation missed. Check code repositories, social media, and forums.
4. Threat correlation -- Cross-reference every IP, domain, and hash against threat intel feeds.
5. Analysis and synthesis -- Connect the dots. Identify patterns. Assess risk. Prioritize findings.
6. Reporting -- Structure findings for the intended audience. An executive needs a risk summary. A SOC analyst needs IOCs and detection rules. A legal team needs evidence with chain of custody.
Platforms like MAGO automate steps 2-4 for domain intelligence investigations. Enter a domain, receive a structured report with DNS analysis, subdomain enumeration, security header assessment, technology fingerprinting, and threat intelligence correlation -- all from passive sources, delivered in seconds.
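The reporting step's chain-of-custody requirement is easier to meet if evidence is captured as it is collected. A minimal sketch, assuming a hypothetical evidence directory layout: each collection run's output is saved to a UTC-timestamped file, and its SHA-256 digest is appended to a manifest so later edits are detectable. It uses `sha256sum` where available and falls back to `shasum -a 256` (macOS). The helper names are illustrative.

```bash
# Hypothetical helper: print the SHA-256 hex digest of a file.
hash_file() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Hypothetical helper: save stdin as an evidence file under $1 (label $2)
# and append "<utc-timestamp>  <sha256>  <path>" to $1/manifest.txt.
record_evidence() {
  local dir="$1" label="$2" ts file
  mkdir -p "$dir"
  ts=$(date -u +%Y%m%dT%H%M%SZ)
  # mktemp adds a random suffix so two captures in the same second never collide
  file=$(mktemp "$dir/${ts}_${label}.XXXXXX")
  cat > "$file"
  printf '%s  %s  %s\n' "$ts" "$(hash_file "$file")" "$file" >> "$dir/manifest.txt"
}
```

For example, `subfinder -d example.com -silent | record_evidence evidence/example.com subfinder` stores the raw output alongside a timestamped hash that can be re-checked when the report is challenged.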
The Ethics of OSINT
OSINT operates within a framework of legal permissions and ethical obligations. The fact that data is publicly accessible does not mean every use of that data is ethical or legal.
Legal Framework
In the United States, the Computer Fraud and Abuse Act (CFAA) criminalizes "unauthorized access" to computer systems. Passive OSINT -- querying public APIs, reading web pages, checking DNS records -- generally does not constitute unauthorized access. Active scanning (port scanning, vulnerability probing) enters a gray area depending on jurisdiction and the specific activity.
In the European Union, the General Data Protection Regulation (GDPR) regulates the processing of personal data, including personal data that is publicly available. Collecting and storing personal information about people in the EU requires a lawful basis (legitimate interest, consent, legal obligation, etc.). OSINT practitioners operating in the EU, or collecting personal data about people there, must comply.
Ethical Principles
- Necessity. Collect only information required for the stated purpose. Mass collection without purpose is surveillance, not intelligence.
- Proportionality. The investigative methods must be proportional to the objective. A routine vendor assessment does not justify months of deep-dive investigation.
- Accuracy. Corroborate findings from multiple sources. Single-source intelligence is unreliable and potentially misleading.
- Accountability. Document methodology. If your conclusions are challenged, your process should withstand scrutiny.
- Minimization. Retain data only as long as necessary. Securely dispose of personal information when the engagement concludes.
- No harm. Do not publish or distribute information that could endanger individuals. De-identify personal data in reports when possible.
Where OSINT Crosses the Line
These activities are NOT OSINT, regardless of how they are labeled:
- Creating fake social media profiles to connect with targets (social engineering)
- Accessing systems using default or guessed credentials (unauthorized access)
- Exploiting vulnerabilities discovered during reconnaissance (penetration testing, requires authorization)
- Purchasing stolen data from dark web marketplaces (receiving stolen property)
- Intercepting network traffic (wiretapping)
- Doxing individuals (harassment, potentially illegal)
OSINT for Organizational Security
The most impactful application of OSINT is turning it inward -- using OSINT techniques to discover what an attacker would find when investigating your own organization. The Verizon 2025 DBIR reports that vulnerability exploitation accounts for 20% of initial access vectors. Assets the organization does not know about are especially exposed, because they are unlikely to be patched or monitored.
An attack surface management program is essentially continuous OSINT against your own infrastructure. Regular subdomain enumeration, security header auditing, certificate monitoring, and technology fingerprinting create visibility into the assets an attacker would target first.
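Continuous monitoring largely comes down to noticing what is new. The sketch below assumes a hypothetical baseline file of previously seen hostnames and a fresh enumeration result, both one hostname per line; it prints only never-before-seen names and then folds them into the baseline, so each new asset is reported once.

```bash
# Hypothetical helper: report hostnames in $2 not yet in baseline $1, then update the baseline.
report_new_assets() {
  local baseline="$1" current="$2"
  # First run: start from an empty baseline
  [ -f "$baseline" ] || : > "$baseline"
  awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
       NF && !($0 in seen)' "$baseline" "$current" | LC_ALL=C sort -u
  # The baseline becomes the union of old and current names (sort -o may overwrite an input)
  LC_ALL=C sort -u -o "$baseline" "$baseline" "$current"
}
```

Run on a schedule, for example `subfinder -d example.com -silent > today.txt && report_new_assets baseline.txt today.txt`, this turns a one-off enumeration into an alert stream of newly exposed hosts.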
The IBM Cost of a Data Breach 2025 report found that organizations that used security AI and automation extensively saved $1.9M per breach on average and shortened the breach lifecycle by 80 days. Automated OSINT monitoring is one form of such automation: by continuously watching for new exposures, it can help defenders find them before adversaries do.
Sources
- MITRE ATT&CK: TA0043 Reconnaissance; techniques T1593, T1596, T1590.
- Verizon 2025 Data Breach Investigations Report (DBIR): 22,000+ incidents analyzed; vulnerability exploitation at 20% of initial access vectors.
- IBM Cost of a Data Breach Report 2025: $4.44M global average breach cost; security AI and automation associated with $1.9M in savings.
- NIST SP 800-150: Guide to Cyber Threat Information Sharing.
- OWASP Testing Guide v4.2: Section 4.1, Information Gathering.