What is OSINT? A Complete Beginner's Guide

Open Source Intelligence (OSINT) is the collection, analysis, and application of information gathered from publicly available sources. It is not hacking. It is not surveillance. It is the disciplined practice of finding what is already visible -- and making sense of it. Governments, corporations, journalists, law enforcement, and security researchers all use OSINT daily. If you have ever searched for someone on LinkedIn, checked a company's WHOIS record, or looked up an IP address, you have performed OSINT.

The practice dates to military intelligence work in the 1940s, when analysts found that a large share of the intelligence needed for strategic decisions could be derived from newspapers, radio broadcasts, and public records; the term OSINT itself came into use later. A frequently cited estimate puts that share at 80-90%, and comparable figures are still quoted today. U.S. intelligence leaders, including the Director of National Intelligence, have described open sources as the foundation for much of finished intelligence. What has changed is the volume of available data and the tools to process it.

OSINT Defined: What Qualifies as Open Source

A source qualifies as "open" when it meets two criteria: it is publicly accessible, and accessing it does not require breaching any legal, contractual, or technical access control. This includes:

Public websites and web applications -- any content indexable by search engines or accessible via a standard HTTP request
Government records -- court filings, corporate registrations, patent databases, property records, regulatory filings (SEC EDGAR, Companies House, etc.)
Social media -- public profiles, posts, comments, metadata on platforms like LinkedIn, Twitter/X, Facebook, Instagram, Reddit
Technical infrastructure data -- DNS records, WHOIS registrations, SSL/TLS certificates, BGP routing tables, Shodan/Censys scan data
Academic and research publications -- journals, preprints, conference proceedings, dissertations
Media -- news articles, press releases, podcasts, videos, satellite imagery (Google Earth, Sentinel Hub)
Dark web monitoring -- publicly accessible .onion sites, paste sites, breach notification databases

What does not qualify: anything behind a login wall that requires credentials you are not authorized to use, data obtained by exploiting vulnerabilities, intercepted communications, or information obtained through social engineering under false pretenses.

The OSINT Intelligence Cycle

Raw data is not intelligence. A list of 500 subdomains is data. Identifying which of those subdomains runs an unpatched Apache server exposed to the internet -- that is intelligence. OSINT follows the same intelligence cycle used by government agencies:

1. Requirements Definition

Before collecting anything, define what you need to know. "Tell me everything about example.com" is not a requirement. "Identify all internet-facing assets belonging to Example Corp, assess their security posture, and determine if any are running software with known critical vulnerabilities" -- that is a requirement. Clear requirements prevent scope creep and focus collection effort where it matters.

2. Collection

Systematic gathering of data from relevant sources. For a domain investigation, this might involve querying DNS records, enumerating subdomains via Certificate Transparency, pulling WHOIS history, fingerprinting web technologies, and checking threat intelligence feeds. The key word is systematic -- ad hoc Googling is not collection.

3. Processing

Raw collected data needs normalization, deduplication, and structuring. When you query five different subdomain sources, you get overlapping results in different formats. Processing merges them into a single, clean dataset. This is where automation becomes essential -- manual processing does not scale beyond trivial investigations.
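
The merge step described above can be sketched in a few lines of shell. This is a minimal sketch, assuming each source has already been reduced to one hostname per line; the function name normalize_hosts is made up for illustration and does not come from any particular tool.

```shell
#!/bin/sh
# normalize_hosts: read hostnames on stdin (one per line, possibly messy)
# and print a clean, deduplicated, sorted list.
# Hypothetical helper for illustration only.
normalize_hosts() {
  # 1. drop carriage returns and lowercase (DNS names are case-insensitive)
  # 2. trim whitespace, strip a leading "*." wildcard and a trailing root dot
  # 3. drop empty lines, then sort and deduplicate
  tr -d '\r' | tr 'A-Z' 'a-z' |
    sed -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^\*\.//' \
        -e 's/\.$//' |
    grep -v '^$' |
    LC_ALL=C sort -u
}

# Example: overlapping results from two sources in slightly different formats
printf '%s\n' 'WWW.Example.com' '*.example.com' 'mail.example.com.' \
  'www.example.com' ' api.example.com ' | normalize_hosts
```

Stripping the wildcard prefix treats "*.example.com" as evidence that "example.com" exists, which is a simplification; an investigation that cares about wildcard certificates would keep that information in a separate field.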

4. Analysis

The step where data becomes intelligence. Analysis identifies patterns, anomalies, relationships, and risk indicators. A domain resolving to an IP address in a different country than the organization's headquarters is an anomaly worth investigating. A cluster of subdomains all running the same outdated WordPress version is a pattern indicating systemic patch management failure.
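
The "cluster of subdomains on the same outdated version" pattern is easy to surface once the data is structured. The sketch below assumes processing has produced lines of the form "host software-version"; that input format and the name version_clusters are assumptions for illustration.

```shell
#!/bin/sh
# version_clusters MIN: read "host software-version" pairs on stdin and print
# every version seen on at least MIN hosts, with its host count, largest first.
# Hypothetical helper for illustration only.
version_clusters() {
  awk -v min="$1" '
    NF >= 2 { count[$2]++ }
    END { for (v in count) if (count[v] >= min) print count[v], v }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Example with made-up fingerprinting results
printf '%s\n' \
  'blog.example.com wordpress-5.8' \
  'shop.example.com wordpress-5.8' \
  'news.example.com wordpress-5.8' \
  'www.example.com wordpress-6.5' | version_clusters 3
```

Deciding whether a flagged version is actually outdated still requires checking it against vendor release notes or a vulnerability database; the script only finds the cluster.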

5. Dissemination

Intelligence is useless unless it reaches the right decision-maker in a format they can act on. A 200-page raw data dump is not a deliverable. A structured report with an executive summary, prioritized findings, and remediation recommendations is. This is where tools like MAGO's domain intelligence reports provide value -- they automate the entire cycle from collection to formatted output.
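
The difference between a data dump and a deliverable can be shown with a small sketch that orders findings by severity and puts a summary line first. It assumes findings are stored as "severity|asset|finding" lines with no "|" inside the finding text; the format and the name prioritized_report are invented for this example and do not describe how any particular product builds its reports.

```shell
#!/bin/sh
# prioritized_report: read "severity|asset|finding" lines on stdin and print a
# one-line summary followed by findings ordered critical > high > medium > low.
# Lines with an unknown severity are skipped. Hypothetical helper.
prioritized_report() {
  awk -F'|' '
    BEGIN { rank["critical"]=1; rank["high"]=2; rank["medium"]=3; rank["low"]=4 }
    ($1 in rank) { print rank[$1] "|" $0 }
  ' | LC_ALL=C sort -t'|' -k1,1n -k3,3 |
  awk -F'|' '
    { n[$2]++; lines[NR] = sprintf("[%s] %s: %s", toupper($2), $3, $4) }
    END {
      printf "SUMMARY: %d critical, %d high, %d medium, %d low\n",
             n["critical"], n["high"], n["medium"], n["low"]
      for (i = 1; i <= NR; i++) print lines[i]
    }'
}

# Example with made-up findings
printf '%s\n' \
  'low|www.example.com|Missing X-Frame-Options header' \
  'critical|vpn.example.com|Unpatched remote access gateway' \
  'high|mail.example.com|No DMARC record' | prioritized_report
```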

Categories of OSINT

OSINT practitioners typically organize their work into categories based on the type of source material:

Category	Description	Example Sources
SIGINT-adjacent	Technical signals from internet infrastructure	DNS, BGP, TLS certs, Shodan, Censys
SOCMINT	Social media intelligence	LinkedIn, Twitter/X, Reddit, forums
GEOINT	Geospatial intelligence from open imagery	Google Earth, Sentinel Hub, Mapillary
FININT	Financial intelligence from public filings	SEC EDGAR, OpenCorporates, sanctions lists
TECHINT	Technical intelligence from code and configs	GitHub, GitLab, Pastebin, npm registry
HUMINT-adjacent	Human intelligence from public interactions	Forums, Q&A sites, conference talks

For cybersecurity professionals, the technical categories (SIGINT-adjacent and TECHINT) are most relevant. An attack surface management program relies heavily on automated collection from DNS, certificate, and scan databases.

OSINT Tools and Techniques

Passive Reconnaissance

Passive techniques collect data without sending any traffic to the target. The target has no way to detect that an investigation is occurring. This is the safest and most common starting point.

# Certificate Transparency lookup via crt.sh -- no traffic to target
# (%25 is the URL-encoded "%" wildcard)
curl -s "https://crt.sh/?q=%25.example.com&output=json" \
  | jq -r '.[].name_value' | sort -u

# WHOIS record -- public registry query
whois example.com

# Check threat intelligence feeds
curl -s "https://otx.alienvault.com/api/v1/indicators/domain/example.com/general"

Semi-Active Reconnaissance

Semi-active techniques send standard requests that blend with normal internet traffic. A single HTTP GET request to check a website's headers is semi-active -- the target's web server logs the request, but it looks like ordinary visitor traffic, apart from details such as the client's User-Agent string.

# Check HTTP security headers
curl -sI https://example.com | grep -iE "^(strict|content-security|x-frame|x-content)"

# Technology fingerprinting via HTTP response
curl -sI https://example.com | grep -iE "^(server|x-powered-by|x-aspnet)"
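
The header check above shows which security headers are present; it is often more useful to list the ones that are missing. The following is a minimal sketch that works on a saved or piped header dump. The function name and the choice of four headers are assumptions for illustration, mirroring the grep pattern above.

```shell
#!/bin/sh
# missing_security_headers: read raw HTTP response headers on stdin and print
# the name of each expected security header that is absent.
# Hypothetical helper for illustration only.
missing_security_headers() {
  # Header names are case-insensitive, so compare in lowercase.
  headers=$(tr -d '\r' | tr 'A-Z' 'a-z')
  for h in strict-transport-security content-security-policy \
           x-frame-options x-content-type-options; do
    printf '%s\n' "$headers" | grep -q "^$h:" || printf '%s\n' "$h"
  done
}

# Example with a made-up response (live use: curl -sI https://example.com | missing_security_headers)
printf 'HTTP/2 200\r\nServer: nginx\r\nStrict-Transport-Security: max-age=31536000\r\nX-Content-Type-Options: nosniff\r\n\r\n' \
  | missing_security_headers
```

For the sample response this prints content-security-policy and x-frame-options, one per line.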

Active Reconnaissance

Active techniques -- port scanning, vulnerability scanning, brute force enumeration -- generate detectable traffic and may trigger security alerts. These require explicit authorization from the asset owner. Running Nmap against a target without permission is not OSINT; it is unauthorized scanning and potentially illegal.

Real-World OSINT Applications

Cybersecurity and Threat Intelligence

Security operations centers (SOCs) use OSINT to enrich indicators of compromise (IOCs), investigate phishing campaigns, and map attacker infrastructure. When a phishing email contains a suspicious domain, OSINT reveals who registered it, when, what IP it resolves to, what other domains share that IP, and whether any threat intelligence feeds have flagged it. The Verizon 2025 DBIR found that credential abuse (22%) and vulnerability exploitation (20%) are the top initial access vectors. Exposure to both can often be spotted through OSINT before it results in a breach: leaked credentials surface in breach datasets and paste sites, and unpatched internet-facing services show up in scan databases.
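
Enrichment usually starts by pulling clean indicators out of a report or email, where URLs are often "defanged" (hxxp, [.]) so they cannot be clicked by accident. The sketch below shows that first step only. The function names are made up, it assumes a grep that supports -o and -E (GNU, BSD, and BusyBox do), and the regular expression is deliberately simple rather than a complete URL parser.

```shell
#!/bin/sh
# refang: undo common defanging (hxxp, [.], [:]) so indicators can be queried.
refang() {
  sed -e 's/h[xX][xX]p/http/g' -e 's/\[\.\]/./g' -e 's/\[:\]/:/g'
}

# extract_domains: print the unique hostnames of all http(s) URLs on stdin.
# Hypothetical helper for illustration only.
extract_domains() {
  refang |
    grep -oE 'https?://[A-Za-z0-9.-]+' |
    sed -E 's#^https?://##' |
    tr 'A-Z' 'a-z' |
    LC_ALL=C sort -u
}

# Example with made-up phishing text
printf '%s\n' \
  'Click hxxps://login.example-bank[.]test/reset to keep your account' \
  'Assets loaded from hxxp://cdn.example-bank[.]test/a.js' | extract_domains
```

Each hostname this prints can then be fed to the WHOIS, Certificate Transparency, and threat feed lookups shown earlier.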

Corporate Due Diligence

Before acquisitions, partnerships, or vendor agreements, organizations use OSINT to assess the target's digital footprint. How many internet-facing assets do they have? Are any running outdated software? Do they have proper security headers? Is their email infrastructure configured to prevent spoofing? A domain intelligence scan reveals more about an organization's security maturity than a questionnaire.
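
The email spoofing question can be answered from public DNS alone, because a domain's DMARC policy is published as a TXT record at _dmarc.<domain>. A minimal sketch of reading that policy follows. The function name dmarc_policy is an assumption, and the parsing covers only the p= tag, not the full record syntax.

```shell
#!/bin/sh
# dmarc_policy: read TXT record strings on stdin (as printed by a DNS lookup)
# and print the DMARC policy: none, quarantine, reject, or "missing" when no
# DMARC record with a p= tag is found. Hypothetical helper for illustration only.
dmarc_policy() {
  policy=$(tr -d '"' | tr 'A-Z' 'a-z' | grep '^v=dmarc1' | head -n 1 |
           sed -n 's/.*;[[:space:]]*p=\([a-z]*\).*/\1/p')
  printf '%s\n' "${policy:-missing}"
}

# Example with a made-up record
# (live use: dig +short TXT _dmarc.example.com | dmarc_policy)
printf '%s\n' '"v=DMARC1; p=quarantine; sp=none; rua=mailto:dmarc@example.com"' \
  | dmarc_policy
```

A result of "none" or "missing" means receiving mail servers are not being asked to reject spoofed mail for that domain, which is the kind of signal a questionnaire rarely surfaces.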

Journalism and Investigations

Investigative journalists use OSINT to verify sources, trace financial flows through corporate registries, geolocate images and videos, and uncover connections between entities. Bellingcat's investigations into the MH17 shootdown and the Salisbury poisoning are landmark examples of OSINT used for accountability.

Law Enforcement

Law enforcement agencies use OSINT for locating suspects, tracing digital evidence, investigating fraud, and monitoring for threats. Public social media posts, domain registrations, cryptocurrency transactions, and dark web marketplace listings all fall within the scope of lawful OSINT collection.

Competitive Intelligence

Businesses use OSINT to monitor competitors -- tracking their technology stack changes, new product launches (visible through subdomain creation and job postings), patent filings, and hiring patterns. A competitor suddenly registering subdomains like ai-platform.competitor.com and posting ML engineer positions signals a strategic pivot.
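
Spotting newly created subdomains comes down to comparing two snapshots of the same enumeration taken at different times. The sketch below assumes each snapshot is a text file with one hostname per line; the file names and the function name new_hosts are hypothetical.

```shell
#!/bin/sh
# new_hosts OLD NEW: print hostnames that appear in snapshot file NEW but not
# in snapshot file OLD (one hostname per line in each file).
# Hypothetical helper for illustration only.
new_hosts() {
  old_sorted=$(mktemp) || return 1
  new_sorted=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$old_sorted"
  LC_ALL=C sort -u "$2" > "$new_sorted"
  # comm -13 hides lines unique to OLD (column 1) and lines common to both
  # (column 3), leaving only lines unique to NEW.
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# Example with two made-up snapshots taken a month apart
snap_dir=$(mktemp -d)
printf '%s\n' www.competitor.com shop.competitor.com > "$snap_dir/january.txt"
printf '%s\n' www.competitor.com shop.competitor.com ai-platform.competitor.com \
  > "$snap_dir/february.txt"
new_hosts "$snap_dir/january.txt" "$snap_dir/february.txt"
rm -rf "$snap_dir"
```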

Legal and Ethical Boundaries

OSINT operates in a legal gray area that varies by jurisdiction. The general principle: if information is publicly accessible and you access it through normal means, it is legal to collect and analyze. But "publicly accessible" and "legal to use" are not synonyms.

Key Legal Considerations

Computer Fraud and Abuse Act (CFAA) -- US: Accessing a computer system "without authorization" is a federal crime. If a website requires login credentials, accessing it without authorization is illegal regardless of how easy the credentials are to guess.
GDPR -- EU: Collecting personal data of EU residents requires a lawful basis. OSINT on individuals must comply with data protection regulations, even if the data is technically public.
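Terms of Service: Scraping data from platforms that prohibit it in their ToS may constitute breach of contract. Whether it also violates the CFAA remains contested: in hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the CFAA, yet a district court later found that hiQ had breached LinkedIn's user agreement.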
Wiretapping laws: Intercepting communications in transit -- even if technically feasible -- is illegal in most jurisdictions.
Local regulations: Many countries have specific laws governing surveillance, data collection, and privacy that apply to OSINT activities.

Ethical Guidelines

Legal does not mean ethical. Responsible OSINT practice follows these principles:

Proportionality -- Collect only what is necessary for the stated purpose. Do not vacuum up everything because you can.
Minimization -- Retain data only as long as needed. Dispose of personal information when the investigation concludes.
Accuracy -- Verify findings through multiple sources before acting on them. Single-source intelligence is unreliable.
Transparency -- Document your methodology. If your findings are challenged, you should be able to explain exactly how you obtained each piece of information.
No deception -- Do not create fake profiles, impersonate others, or use social engineering to obtain information. That crosses into human intelligence (HUMINT), not OSINT.
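
The accuracy principle can be partly enforced in tooling by keeping track of which source reported each finding. A minimal sketch, assuming findings are recorded as "source value" pairs (both the format and the name corroborated are invented here):

```shell
#!/bin/sh
# corroborated [MIN]: read "source value" pairs on stdin and print each value
# reported by at least MIN distinct sources (default 2).
# Hypothetical helper for illustration only.
corroborated() {
  awk -v min="${1:-2}" '
    # count each (source, value) pair once, however often it repeats
    NF >= 2 && !seen[$1 SUBSEP $2]++ { sources[$2]++ }
    END { for (v in sources) if (sources[v] >= min) print v }
  ' | LC_ALL=C sort
}

# Example with made-up results: only api.example.com has two distinct sources
printf '%s\n' \
  'crtsh api.example.com' \
  'dns api.example.com' \
  'crtsh old.example.com' \
  'crtsh api.example.com' | corroborated 2
```

Two sources only count as corroboration when they are independent; two feeds that both republish the same upstream data are effectively one source.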

MAGO's Position

MAGO Intelligence limits itself to passive collection and standard HTTP requests to publicly accessible endpoints, which correspond to the passive and semi-active techniques described above. No active exploitation, no credential brute forcing, no vulnerability scanning. Every data point in a MAGO report comes from publicly available sources: DNS records, WHOIS registries, Certificate Transparency logs, and HTTP response headers.

Getting Started with OSINT

If you are new to OSINT, start with a structured approach rather than downloading every tool you find on GitHub:

Learn the fundamentals. Understand how DNS works, how WHOIS registries function, what Certificate Transparency is, and how HTTP headers reveal information. These building blocks underpin every OSINT tool.
Practice on authorized targets. Use your own domains and infrastructure. Many OSINT training platforms (like SANS Cyber Ranges, TryHackMe, and Hack The Box) provide legal practice environments.
Master search operators. Google dorking alone can reveal an extraordinary amount of information. Learn the site:, filetype:, intitle:, inurl:, and intext: operators. (The cache: operator appears in many older guides, but Google has retired it.)
Use automation sparingly at first. Running tools without understanding what they do leads to noise, not intelligence. Manually perform each technique before automating it.
Document everything. Maintain a record of what you searched, when, what you found, and from which source. This is essential for reproducibility and legal defensibility.
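
Two of the habits above, search operators and documentation, combine naturally: build the query in a repeatable way, then log what was run and what came back. The sketch below is one way to do that. The function names, the tab-separated log format, and the sample result text are all assumptions for illustration.

```shell
#!/bin/sh
# dork DOMAIN FILETYPE: build a search query restricted to one site and one
# file type, for pasting into a search engine. Hypothetical helper.
dork() {
  printf 'site:%s filetype:%s\n' "$1" "$2"
}

# log_step LOGFILE SOURCE QUERY NOTE: append one tab-separated record with a
# UTC timestamp, so every step can be reconstructed later. Hypothetical helper.
log_step() {
  printf '%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" "$4" >> "$1"
}

# Example: record a search against your own domain (the note is made up)
log_file=$(mktemp)
query=$(dork example.com pdf)
log_step "$log_file" "google" "$query" "3 public PDFs found, none sensitive"
cat "$log_file"
rm -f "$log_file"
```

An append-only text log is the simplest option; for investigations that may end up in court, a format with integrity protection (for example, hashed entries) would be a better fit.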

For domain and IP investigations specifically, platforms like MAGO automate the collection and processing phases, letting you focus on analysis. Enter a domain, and within seconds you have subdomain enumeration, DNS analysis, security header assessment, technology fingerprinting, and threat intelligence correlation -- structured into an actionable report.

The Future of OSINT

Three trends are reshaping OSINT in 2026:

AI-assisted analysis. Large language models are being integrated into OSINT workflows for entity extraction, relationship mapping, sentiment analysis, and report generation. The IBM Cost of a Data Breach 2025 report found that organizations using AI and automation extensively in their security operations cut breach lifecycle by 80 days and saved $1.9M on average. AI does not replace analysts -- it handles the volume problem that makes purely manual analysis impractical.

Attack surface management convergence. OSINT and ASM are merging. The ASM market, valued at $1.5B in 2025, is projected to reach between $5B and $12B by 2030, depending on the estimate. Gartner projects that 60% of organizations will have formal ASM programs by 2026. Every ASM platform is, at its core, an automated OSINT engine focused on an organization's own digital footprint.

Democratization. Tools that once required deep technical expertise are becoming accessible to non-technical users. Modern OSINT platforms abstract away the complexity of querying dozens of data sources and correlating results, making intelligence accessible to legal teams, compliance officers, journalists, and executives.

References

Verizon 2025 DBIR -- 22,000+ incidents; credential abuse (22%) and vulnerability exploitation (20%) as top vectors.
IBM Cost of a Data Breach 2025 -- $4.44M global average; AI reducing lifecycle by 80 days.
MITRE ATT&CK -- T1593 (Search Open Websites/Domains), T1596 (Search Open Technical Databases).
NIST SP 800-150 -- Guide to Cyber Threat Information Sharing.