Google Dorks For OSINT

Google dorks for OSINT — an in-depth guide

Google “dorking” (also known as Google dorks or Google hacking) is the disciplined use of Google’s advanced search operators to find specific, often hard-to-locate information that the search engine has indexed. Many people already use simple dorks to get better results from their everyday searches. For OSINT investigators, the technique is indispensable: used carefully, it speeds discovery, maps a target’s public surface, and reveals publicly indexed documents, directories, and footprint artefacts you’d otherwise miss. Used carelessly, it can expose private data or cross legal and ethical lines, so ethical guardrails and an understanding of what Google actually supports are essential.

Below, I will cover:

1) Quick operator cheat-sheet (most useful operators)
2) How to build complex Google dorks — a step-by-step recipe
3) Practical OSINT workflows using Google dorks
4) Ethics, law, and safety — non-negotiables
5) Useful references & getting better
Final tips (practical and safe)

1) Most useful operators

Below are the most common operators, many of which you may already have seen and used in searches. As an OSINT investigator, you will use these and others, combining them the way a chef combines ingredients: the individual operators are your ingredients, and the finished dork is your masterpiece.

– site: — restrict results to a domain, subdomain or URL prefix.
– filetype: — restrict by file extension (pdf, xls, docx, rtf, etc.). Useful for finding reports, spreadsheets, and slides.
– intitle: / allintitle: — search words within page title(s).
– inurl: / allinurl: — search words that appear in the URL.
– intext: / allintext: — search words that appear in the page body text.
– “” (quotes) — exact phrase match.
– – (minus) — exclude terms.
– OR — logical OR (capitalised).
– related: — find pages related to a URL.
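– cache: — view Google’s cached copy of a page (Google retired cached pages in 2024, so this operator no longer works; use a web archive such as the Wayback Machine instead).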

It is worth mentioning the “*” here. It is often misunderstood and misused, and the result is that you don’t find what you are looking for.

In Google web search, the asterisk * is a whole-word wildcard that only works reliably inside quoted phrases, where it stands for one or more whole words (not partial words or characters). It does not work as a generic wildcard inside most operators (e.g. site:facebook.* or inside a URL), and it will not match part of a single word. Where * won’t do the job, use OR, inurl:, intitle: or explicit domain lists instead.

Below is a practical breakdown with examples, gotchas, and alternatives.

I will spend more time explaining the wildcard “*” because it is an operator that many people are familiar with and that appears to offer numerous benefits. The problem is that it only works in certain situations: Google no longer offers it with all the functionality it once had. Most other operators can be followed with the brief explanations in the downloadable Google Dork Cheat Sheet. However, unless you understand “*”, you will not get the results you are looking for.

Where * does work (reliable behaviour)

Inside quoted phrases, as a placeholder for one or more whole words. Example:

"Google * my life"

Will match phrases like “Google changed my life”, “Google runs my life”, etc. Each * stands for at least one whole word. This is the classic, documented use.
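
To make the whole-word behaviour concrete, here is a minimal local sketch in Python. It does not query Google, and it is not how Google implements the operator; the function name phrase_to_regex is hypothetical and only emulates the rule described above (each * stands for one or more whole words), so you can sanity-check which phrases a quoted pattern is meant to cover.

```python
import re


def phrase_to_regex(phrase: str) -> re.Pattern:
    """Emulate a quoted phrase in which each * stands for one or more whole words.

    Illustration only: this mimics the rule described in the text,
    not Google's actual matching engine.
    """
    parts = []
    for token in phrase.split():
        if token == "*":
            # One or more whole words separated by whitespace.
            parts.append(r"\w+(?:\s+\w+)*")
        else:
            # Everything else is matched literally.
            parts.append(re.escape(token))
    # The lookarounds stop the phrase from starting or ending mid-word.
    return re.compile(r"(?<!\w)" + r"\s+".join(parts) + r"(?!\w)", re.IGNORECASE)


pattern = phrase_to_regex("Google * my life")
print(bool(pattern.search("Google changed my life")))       # a whole word fills the gap
print(bool(pattern.search("Google my life")))               # nothing for * to stand for
print(bool(phrase_to_regex("compu*").search("computer")))   # * is not a suffix wildcard
```

Real results will still vary, so treat this as a way to reason about a pattern before you test it in the search box.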

As a proximity-ish helper inside quotes. You can use one or more * to allow for varying distances between two words in an exact-order phrase:

"house * dog"   → matches "house big dog", "house the brown dog", etc.

Where `*` does not work (common mistakes)

Not a character wildcard. It does not replace characters inside a word: compu* will not reliably match computer, computing, etc. Google doesn’t support single-character or suffix/prefix wildcards in general web search the way file systems or SQL LIKE patterns do. For stemming-like matching, you must rely on Google’s internal synonym/stemming behaviour or try multiple explicit terms.
Not supported inside site: for wildcard TLDs or arbitrary domain patterns. site:facebook.* is not a documented, supported way to match all Facebook country TLDs, and it is unreliable. Use explicit OR lists (e.g. site:facebook.com OR site:facebook.co.uk) or other enumeration techniques.
Not a wildcard inside URLs or operators. You cannot use * to wildcard part of an inurl: or intitle: token in the way you might expect; inurl:admin* won’t reliably return all adminXYZ paths. Use inurl:admin (it matches the term wherever it appears in the URL) or explicit patterns. (Google tokenises and normalises URLs/terms; wildcard semantics are limited.)
Not supported everywhere (UI vs APIs vs CSE). Some Google products differ: for example, Custom Search Engine (CSE) has different wildcard/option behaviour. Don’t assume a wildcard will behave the same across UI, APIs, or CSE.

Practical examples and what you should use instead

Goal: find pages titled “annual report 2024”, “annual report 2023”, etc.

Using * (works inside quotes):

"annual report * 2024"

But that’s awkward. Better:

intitle:"annual report" 2024

Goal: match many TLDs for facebook (e.g., .com, .co.uk, .fr)

Don’t use: site:facebook.* (unreliable)
Do: explicitly list what you need:

(site:facebook.com OR site:facebook.co.uk OR site:facebook.fr)
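
Typing long OR lists by hand is error-prone, so a small helper can generate them. The sketch below is only an illustration: the function name site_or_group is hypothetical, and the TLD list is whatever you decide to enumerate yourself.

```python
def site_or_group(name: str, tlds: list[str]) -> str:
    """Build a parenthesised OR group of site: filters, one per explicit TLD."""
    if not tlds:
        raise ValueError("at least one TLD is required")
    # Tolerate input such as ".com" as well as "com".
    clauses = [f"site:{name}.{tld.lstrip('.')}" for tld in tlds]
    return "(" + " OR ".join(clauses) + ")"


print(site_or_group("facebook", ["com", "co.uk", "fr"]))
# prints: (site:facebook.com OR site:facebook.co.uk OR site:facebook.fr)
```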

Goal: match different words between two known words (proximity)

Use * inside quotes:

"privacy * policy"

This will match phrases such as “privacy and cookie policy”. Because * normally stands for at least one word, test whether it also returns plain “privacy policy”. Alternatively, use Google’s AROUND(n) for flexible proximity:

privacy AROUND(3) policy

AROUND(n) is often better when you want a maximum word distance regardless of order. It is not an officially documented operator, so verify that it behaves as expected.

Testing & verification tips

Always test queries interactively. Results can vary by region, personalisation, and Google’s index updates.
If a wildcard attempt returns odd results, rewrite using explicit tokens or phrase matching.
For URL/domain wildcard needs, use domain enumeration tools or passive DNS (OSINT toolset) instead of relying on site:*.example.com. Those tools are built for wildcard/subdomain discovery.

Notes: Not every historical operator is still active; Google’s support may change periodically. If something seems flaky, check Google’s documentation.

2) How to build complex Google dorks for OSINT— a step-by-step recipe

If you haven’t already downloaded the free Google Dork Cheat Sheet, I suggest you do so now. As I mentioned earlier, building complex Dorks is akin to formulating a recipe. Many people compare it to writing short pieces of code. Think of a dork as a mini program, and you follow the process of:

Define the objective → Pick the operators → Combine with boolean logic → Test and iterate.

1. Define the objective (be precise).

Consider what you are seeking and clarify your goals before you write anything.
Example objectives:
– “Find public PDF reports published by example.org that mention supplier invoices.”
– “Find pages on subdomains of company.com that expose directories or backup files.”

2. Pick high-value operators for the objective.

Once you have defined your objectives, you know what you are looking for and can choose the operators that are likely to provide the best results.

– If you want PDFs:

filetype:pdf

– If you want a specific domain:

site:company.com

– If you want the term in the filename or URL, for example:

inurl:invoice
inurl:backup

3. Start simple, then refine.

Once you have your basic dork, you can start refining it. Review your results and ask yourself what you can add to the query to reduce the number of results and focus on your goal.

Start:

site:example.org filetype:pdf invoice

If too broad, add intitle: or inurl:, or exclude noise:

site:example.org filetype:pdf intitle:invoice -intitle:newsletter

4. Combine operators and booleans in parentheses to control logic.

Operators on their own will only get you so far in developing effective dorks; to supercharge them, you need to incorporate Booleans in parentheses to control the logic and achieve the best results possible.

Example

site:example.org filetype:pdf (intitle:invoice OR intitle:billing)

5. Use negative filters to remove noise.

Sometimes results are contaminated with pages that match your criteria but aren’t relevant; this is known as “noise”. Negative filters remove it.

Example:

site:example.org filetype:xls inurl:finance -inurl:blog
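
If you think of a dork as a mini program, steps 2 to 5 can be sketched as a small query builder. This is a minimal illustration, not a standard tool: the function build_dork and its parameter names are hypothetical, and it only assembles a string for you to paste into the search box.

```python
def build_dork(site=None, filetype=None, any_of=(), phrases=(), exclude=()):
    """Assemble a dork from operators, an OR group, exact phrases and negative filters."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if any_of:
        group = " OR ".join(any_of)
        # Parentheses keep the OR from swallowing neighbouring terms.
        parts.append(f"({group})" if len(any_of) > 1 else group)
    parts.extend(f'"{phrase}"' for phrase in phrases)
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


print(build_dork(site="example.org", filetype="pdf",
                 any_of=["intitle:invoice", "intitle:billing"]))
# prints: site:example.org filetype:pdf (intitle:invoice OR intitle:billing)

print(build_dork(site="example.org", filetype="xls",
                 any_of=["inurl:finance"], exclude=["inurl:blog"]))
# prints: site:example.org filetype:xls inurl:finance -inurl:blog
```

Keeping the pieces as separate arguments makes it easy to change one ingredient at a time while you iterate.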

6. Leverage date and site constraints where available.

Often, you may be working within a known time period, so it is advantageous to keep your search results within the time frame you are examining. There is no point in having information relating to people born before 2000 if you know that your target was born in 2000. You can reduce it further, depending on the level of detail you want regarding the individual.

The date filters in Google’s Tools menu and advanced search page, or the before: and after: operators (which take dates in YYYY-MM-DD form), can help in some contexts. Behaviour may change, so verify.
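
As a small illustration of a date window, the sketch below formats the before: and after: operators from Python dates. The helper name date_window is hypothetical, and because these operators can change, check the output against live results.

```python
from datetime import date


def date_window(after=None, before=None) -> str:
    """Return after:/before: filters in YYYY-MM-DD form for the given dates."""
    if after and before and after >= before:
        raise ValueError("'after' must be earlier than 'before'")
    parts = []
    if after is not None:
        parts.append(f"after:{after.isoformat()}")
    if before is not None:
        parts.append(f"before:{before.isoformat()}")
    return " ".join(parts)


query = "site:example.org filetype:pdf invoice " + date_window(date(2023, 1, 1), date(2024, 1, 1))
print(query)
# prints: site:example.org filetype:pdf invoice after:2023-01-01 before:2024-01-01
```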

7. Iterate and document your successful dorks.

As with all areas of OSINT, keeping accurate and detailed records can be beneficial. This is particularly useful when working with dorks. Recording what search terms were used and to what extent the search was successful can help you develop your skills and make your work more efficient.
Save templates and the exact queries that worked; small changes to a target’s site can make a dork stop working overnight.
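
One lightweight way to keep such records is a JSON Lines log, one entry per query. The sketch below is an assumption about how you might structure it rather than a prescribed format: the function log_query and the field names are hypothetical.

```python
import io
import json
from datetime import datetime, timezone


def log_query(stream, query: str, worked: bool, note: str = "") -> dict:
    """Append one JSON line recording the exact query, when it ran and how it went."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "worked": worked,
        "note": note,
    }
    stream.write(json.dumps(entry) + "\n")
    return entry


log = io.StringIO()  # stand-in for a real file opened in append mode
log_query(log, "site:example.org filetype:pdf invoice", False, "too broad")
log_query(log, "site:example.org filetype:pdf intitle:invoice", True, "narrowed with intitle:")
print(log.getvalue(), end="")
```

For real use, pass a file opened with open("dork_log.jsonl", "a", encoding="utf-8") (the file name is only an example) so every session appends to the same log.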

Example complex dork (benign example for public research):

site:gov.uk filetype:pdf (intitle:"annual report" OR intitle:"annual accounts") "climate change" -site:gov.uk/news
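
A dork this long contains quotes, colons and parentheses, so if you want to store it as a clickable link in your notes it needs URL encoding. The sketch below uses Python's standard urllib for that. The helper name search_url is hypothetical, and the link is meant for opening manually in a browser, not for scraping.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(query: str) -> str:
    """Return a Google search link with the dork safely URL-encoded in the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


dork = ('site:gov.uk filetype:pdf (intitle:"annual report" OR intitle:"annual accounts") '
        '"climate change" -site:gov.uk/news')
url = search_url(dork)
print(url)

# Round-trip check: decoding the link gives back the exact dork.
print(parse_qs(urlsplit(url).query)["q"][0] == dork)
```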

3) Practical OSINT workflows using Google dorks

When conducting any OSINT investigation, workflow should be a priority. I have outlined a simple workflow below that will assist in developing your search terms; you can then alter it to fit your methods. Bear in mind that, as we become more proficient with OSINT and other intelligence-gathering techniques, we may be inclined to jump to conclusions. For example, you can write an advanced dork that returns a concise set of data. That is acceptable, but you may lose information if you don’t follow a workflow that lets you see additional information as you progress through your research.

– Target footprinting: start with site: + company domain, then surface subdomains by excluding the ones you already know (e.g. site:company.com -site:www.company.com) and use inurl: for known service paths. Google does not reliably support site:*.company.com, so use dedicated enumeration tools for thorough subdomain discovery. Combine with filetype: to search for exposed documents (such as resumes and spreadsheets).

– Document discovery: search for filetype:pdf, filetype:xls, filetype:docx combined with department/name keywords to find public reports, CVs or published spreadsheets.

– Public directory discovery: queries like intitle:"index of" inurl:/uploads will show public directory listings. Use them only for legitimate public-facing content.

– Cross-platform chaining: use an initial dork result (a leaked PDF, say) to extract internal strings, then search those strings across site:linkedin.com, site:twitter.com, or news domains to build a richer profile.

– Automation with care: many OSINT pros store dork templates and run queries through safe tooling (e.g., APIs or custom scrapers) that respect robots.txt, rate limits, and terms of service. If you automate, ensure that you don’t overload services or bypass authentication.
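
The cross-platform chaining step can be sketched as a simple template expansion. In this minimal example the function chain_queries is hypothetical, and the strings and sites are made-up placeholders. It only generates queries to run manually or through tooling that respects the limits described above.

```python
from itertools import product


def chain_queries(strings, sites):
    """Pair every extracted string (as an exact phrase) with every site: filter."""
    # Drop empty strings and stray quotes that would break the exact-phrase match.
    cleaned = [s.strip().replace('"', "") for s in strings if s.strip()]
    return [f'site:{site} "{text}"' for text, site in product(cleaned, sites)]


# Hypothetical strings pulled from a public PDF found with an earlier dork.
extracted = ["Project Falcon", "Jane Example"]
for query in chain_queries(extracted, ["linkedin.com", "twitter.com"]):
    print(query)
```

Each generated query is a candidate for your query log, so you can record which pairings produced useful leads.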

4) Ethics, law, and safety — non-negotiables

Ethics is a crucial element of OSINT that can often be overlooked, especially when everything is going well and rapid progress and a significant amount of intel are being gathered. Google Dorking is frequently viewed as a straightforward technique, and its associated ethics can be easily overlooked. Let’s face it, we do Google searches every day. Google Dorking is dual-use. The method can reveal sensitive but unintentionally public data, such as usernames, internal documents, and database dumps. It can also reveal real vulnerabilities that attackers might exploit. Therefore:

– Only access information that is publicly indexed and you have legitimate authority to research. Do not attempt to bypass authentication or access private content.
– If you’re performing security testing, get written permission (scope, targets, allowed methods). For corporate security assessments, this is a standard practice.
– Be aware of local laws (e.g., CFAA in the U.S., GDPR in Europe) — discovering data is not the same as exploiting it, but handling or publishing personal data can have legal consequences. If your work could handle personal data, consult legal counsel or your organisation’s DPO.
– Don’t publish dorks that explicitly point to credentials, private databases or ways to exploit systems. Public education is fine; posting lists that directly lower the bar for attackers is irresponsible.

5) Useful references & getting better

Google dorking changes over time as Google updates how each operator behaves, so we must stay up to date. Combine that with regular practice, and keeping our skills and knowledge current will make us better and more efficient OSINT investigators.

– Google Search Central: the official list and behaviour of search operators. An excellent starting point.
– Security vendor explainers (e.g., Recorded Future, Imperva) for examples, risk framing and common pitfalls.
– OSINT community writeups and blogs for practical patterns and case studies — learn templates, then adapt ethically.

Final tips (practical and safe)

Google dorks can be a highly efficient and effective method of gathering data and intelligence that can aid in investigations. Some tools appear more “trendy” and create an image of an investigator that some people may strive to obtain. But we must not forget that we have a goal to achieve in our investigations, and if we can save time and effort by using a technique like Google dorking, we should. Personally, I would rather be known as an individual who was known for helping people rather than have an image of being a “Secretive hacker” gathering covert intelligence. OSINT can be a tedious and boring activity, but it can also be the most exciting. So enjoy your work and thrive on the success you have.

Here are a few pointers to help you do this.

– Start with benign objectives (public reports, research papers, marketing materials) to practice combining operators.
– Keep a query log and versioned templates. Small changes in site structure break dorks.
– Respect rate limits and robots.txt when automating; prefer APIs or manual queries when investigating sensitive targets.
– When you find sensitive exposures, follow responsible disclosure: contact the site owner and avoid publishing the data.