
Commit a6194f8

Add curated public notes batch
1 parent 5132371 commit a6194f8

18 files changed: 9990 additions & 32 deletions

TryHackMe/00-foundations/sql-fundamentals.md

Lines changed: 816 additions & 0 deletions
Large diffs are not rendered by default.

TryHackMe/00-foundations/writing-pentest-reports.md

Lines changed: 714 additions & 0 deletions
Large diffs are not rendered by default.

TryHackMe/10-web/gobuster-the-basics.md

Lines changed: 443 additions & 0 deletions
Large diffs are not rendered by default.

TryHackMe/00-foundations/google-dorking.md renamed to TryHackMe/10-web/google-dorking.md

Lines changed: 38 additions & 32 deletions
@@ -1,28 +1,34 @@
 ---
-
+type: resource-note
+status: done
+created: 2026-02-28
+updated: 2026-03-12
+tags: [security-writeup, tryhackme, osint, google-dorking]
+source: TryHackMe - Google Hacking
 platform: tryhackme
 room: Google Hacking
 slug: google-hacking
-path: notes/00-foundations/google-dorking.md
+path: TryHackMe/10-web/google-dorking.md
 topic: 10-web
-domain: [osint, web-recon]
-skills: [search-engines, crawling-indexing, seo-basics, robots-sitemaps, google-dorking]
-artifacts: [concept-notes, pattern-cards, cookbook]
-status: done
-date: 2026-02-28
+domain: [osint, web]
+skills: [search-engines, crawling-indexing, web-enum, google-dorking]
+artifacts: [concept-notes, pattern-card, cookbook]
+sanitized: true
 ---
 
-0. Summary
+# Google Hacking
+
+## Summary
 
 * Search engines are *public, large-scale indexes* built by crawlers/spiders that fetch URLs, parse content, and store signals for retrieval.
 * “Google dorking” is precision querying with operators (`site:`, `filetype:`, `intitle:`…) to shrink search space and surface exposed content.
 * `robots.txt` controls *crawling behavior* (advisory), not access control; blocked URLs can still appear as “URL-only” results. Don’t treat robots as a secrecy mechanism.
 * `sitemap.xml` accelerates discovery by listing canonical URLs; it has hard size/URL limits and supports sitemap index files.
 * Defensive takeaway: periodically “dork your own org” to find exposures before others do.
 
-1. Key Concepts (plain language)
+## Key Concepts
 
-1.1 Crawl → index → rank → serve (the pipeline)
+### 1.1 Crawl → index → rank → serve (the pipeline)
 
 * Crawling: fetch pages and discover new URLs.
 * Indexing: extract content + metadata and store it in an index.
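As a sketch of the crawl/discovery step only (not from the room; `TARGET_DOMAIN` is the note's usual placeholder), a naive fetch-and-extract pass looks like this:

```bash
# Fetch one page and list candidate URLs for the crawl queue.
# Illustration only: a real crawler also honors robots.txt, resolves
# relative links, deduplicates, and schedules fetches politely.
curl -s "https://TARGET_DOMAIN/" \
  | grep -oE 'href="[^"]+"' \
  | sed -e 's/^href="//' -e 's/"$//' \
  | sort -u
```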
@@ -47,7 +53,7 @@ Key vocabulary
 * Index: database mapping terms/signals → documents.
 * SERP: Search Engine Results Page.
 
-1.2 Google operators: what works and what drifts
+### 1.2 Google operators: what works and what drifts
 
 Reality check:
 
@@ -77,7 +83,7 @@ Important nuance for `filetype:`
 
 * It filters by file type/extension and indexable formats. If Google doesn’t index a format, `filetype:` won’t help.
 
-1.3 robots.txt (Robots Exclusion Protocol; advisory)
+### 1.3 robots.txt (Robots Exclusion Protocol; advisory)
 
 What it is
 
@@ -104,7 +110,7 @@ Practical OSINT heuristic
 
 * Treat `Disallow:` entries as *high-signal leads* (admin panels, backups, staging, old paths). Verify carefully and ethically.
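For illustration, a hypothetical robots.txt (placeholder paths, no real target) showing why `Disallow:` lines read like a map of interesting paths:

```text
# Advisory crawl policy — NOT access control
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /staging/
Sitemap: https://TARGET_DOMAIN/sitemap.xml
```

Each `Disallow:` entry above is exactly the kind of lead the heuristic flags; the paths stay reachable to anyone who requests them directly.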
 
-1.4 Meta robots and X-Robots-Tag (index control)
+### 1.4 Meta robots and X-Robots-Tag (index control)
 
 Crawling vs indexing
 
@@ -116,7 +122,7 @@ Operational consequence
 * If you block a page via robots.txt, Googlebot won’t crawl it and therefore won’t read `noindex` on the page.
 * If you allow crawling but set `noindex`, Google can crawl and then drop it from results.
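Concretely, the two standard index-control mechanisms look like this (illustrative values):

```text
<!-- In the page <head>: the page may be crawled, but is dropped from the index -->
<meta name="robots" content="noindex">

# Equivalent HTTP response header — also covers non-HTML files such as PDFs
X-Robots-Tag: noindex
```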
 
-1.5 sitemap.xml (Sitemaps Protocol)
+### 1.5 sitemap.xml (Sitemaps Protocol)
 
 What it is
 
@@ -131,15 +137,15 @@ Why it matters
 
 * Sitemaps reduce discovery cost for crawlers and help with crawl efficiency.
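A minimal `urlset` in the Sitemaps Protocol format (the spec caps each sitemap file at 50,000 URLs and 50 MB uncompressed, which the sitemap index mechanism works around); the URL is a placeholder:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://TARGET_DOMAIN/docs/page-1</loc>
    <lastmod>2026-02-28</lastmod>
  </url>
</urlset>
```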
 
-1.6 Ethical boundary (OSINT vs intrusion)
+### 1.6 Ethical boundary (OSINT vs intrusion)
 
 * OSINT (including dorking) uses publicly reachable information.
 * Crossing the line typically happens when you attempt access to restricted resources, mass-download sensitive data, or exploit what you find.
 * In public notes: do not publish real targets or sensitive URLs; use placeholders.
 
-2. Pattern Cards (generalizable)
+## Pattern Cards
 
-2.1 Query design card (minimize → sharpen)
+### 2.1 Query design card (minimize → sharpen)
 
 * Step 1: reduce scope
 
@@ -154,7 +160,7 @@ Why it matters
 
 * `"confidential"`, `"password"`, `"backup"`, `"api key"`
 
-2.2 “robots + sitemap first” card
+### 2.2 “robots + sitemap first” card
 
 * Check early:
 
@@ -164,14 +170,14 @@ Why it matters
 
 * `site:TARGET_DOMAIN inurl:<path>`
 
-2.3 Sensitive filetype shortlist (defensive awareness)
+### 2.3 Sensitive filetype shortlist (defensive awareness)
 
 * Config/secrets: `env`, `ini`, `conf`, `yml`, `yaml`, `properties`
 * Data dumps: `sql`, `bak`, `db`, `sqlite`, `csv`, `json`
 * Keys/certs: `pem`, `key`, `pfx`, `p12`, `crt`
 * “internal docs”: `pdf`, `docx`, `xlsx`, `pptx`
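The shortlist composes directly with `site:` scoping; hypothetical templates in the cookbook's placeholder style:

```text
site:TARGET_DOMAIN (filetype:env OR filetype:ini OR filetype:yml)
site:TARGET_DOMAIN (filetype:pem OR filetype:key OR filetype:pfx)
site:TARGET_DOMAIN (filetype:xlsx OR filetype:docx) "internal"
```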
 
-2.4 Defensive remediation mapping
+### 2.4 Defensive remediation mapping
 
 * If it’s publicly accessible, fix at source:
 
@@ -180,9 +186,9 @@ Why it matters
 * add `noindex`/`X-Robots-Tag` where appropriate
 * remove/rotate exposed credentials
 
-3. Command Cookbook (placeholders only)
+## Command Cookbook
 
-3.1 Operator templates
+### 3.1 Operator templates
 
 ```text
 # Domain scoping
@@ -210,23 +216,23 @@ site:TARGET_DOMAIN inurl:/admin/
 site:TARGET_DOMAIN "incident report"
 ```
 
-3.2 robots + sitemap retrieval
+### 3.2 robots + sitemap retrieval
 
 ```bash
 curl -s https://TARGET_DOMAIN/robots.txt | sed -n '1,200p'
 curl -s https://TARGET_DOMAIN/sitemap.xml | sed -n '1,200p'
 ```
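As a follow-up sketch for the 2.2 card (assuming the common `Disallow: /path` layout; authorized targets only), the retrieved robots.txt can be reduced to candidate paths for `inurl:` queries:

```bash
# Print unique Disallow paths as recon leads.
curl -s "https://TARGET_DOMAIN/robots.txt" \
  | awk -F': *' 'tolower($1) == "disallow" && $2 != "" { print $2 }' \
  | sort -u
```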
 
-3.3 Defensive self-audit (run on your own assets)
+### 3.3 Defensive self-audit (run on your own assets)
 
 ```text
-site:YOUR_DOMAIN filetype:env
-site:YOUR_DOMAIN (filetype:sql OR filetype:bak)
-site:YOUR_DOMAIN intitle:"index of" "backup"
-site:YOUR_DOMAIN "BEGIN PRIVATE KEY"
+site:TARGET_DOMAIN filetype:env
+site:TARGET_DOMAIN (filetype:sql OR filetype:bak)
+site:TARGET_DOMAIN intitle:"index of" "backup"
+site:TARGET_DOMAIN "BEGIN PRIVATE KEY"
 ```
 
-4. Evidence (sanitized; assets/)
+## Evidence
 
 * This note was expanded from a walkthrough transcript provided by the user.
 * If you later add screenshots, store under `assets/` and redact:
@@ -235,14 +241,14 @@ site:YOUR_DOMAIN "BEGIN PRIVATE KEY"
 * user identifiers
 * unique query outputs that expose sensitive paths
 
-5. Takeaways
+## Takeaways
 
 * Indexing turns “unknown paths” into “search queries.” Attackers can recon at scale with no scanning.
 * The strongest dorks are not complicated; they are *well scoped*.
 * robots.txt is not a lock; it is public metadata and often a recon hint.
 * Defensive action item: schedule periodic “self-dorking” and treat findings like vuln reports.
 
-6. References (official/docs-first; list titles in public notes)
+## References
 
 * Google Search: Advanced Search page
 * Google Search Central: File types Google can index (mentions `filetype:` operator)
@@ -253,7 +259,7 @@ site:YOUR_DOMAIN "BEGIN PRIVATE KEY"
 * Google Search Central: Build and submit a sitemap + sitemap index files
 * Google Search Central: Control what you share on Search (noindex, robots meta, X-Robots-Tag)
 
-CN–EN Glossary (mini)
+## CN–EN Glossary (mini)
 
 * Search engine: 搜索引擎
 * Crawler / spider: 爬虫/蜘蛛
