Google Dorking



Date: 15 January 2021

Author: Dhilip Sanjay S


Click Here to go to the TryHackMe room.

Crawlers

  • Crawlers discover content through various means:

    • Pure Discovery - a URL is visited by the crawler, and information regarding the content type of the website is returned to the Search Engine.

    • Following URLs found on previously crawled sites.


Name the key term of what a "Crawler" is used to do

  • Answer: index

What is the name of the technique that "Search Engines" use to retrieve this information about websites?

  • Answer: Crawling

What is an example of the type of contents that could be gathered from a website?

  • Answer: Keywords


Search Engine Optimisation

  • SEO ranking - Search Engines will prioritise domains that are easier to index.

  • Ranking depends on many factors:

    • How responsive your website is to different browser types.

    • How easy it is to crawl your website.

    • What kind of keywords your website has.

Robots.txt

  • This text file defines the permissions the Crawler has on the website.

  • It can specify which files and directories we do or don't want to be indexed by the Crawler (such as an admin panel).

| Keyword | Function |
| --- | --- |
| User-agent | Specify the type of "Crawler" that can index your site (the asterisk being a wildcard, allowing all "User-agents") |
| Allow | Specify the directories or file(s) that the "Crawler" can index |
| Disallow | Specify the directories or file(s) that the "Crawler" cannot index |
| Sitemap | Provide a reference to where the sitemap is located (improves SEO as previously discussed; we'll come to sitemaps in the next task) |

  • You can use regular expressions (such as the * wildcard) in Allow/Disallow rules to control which contents are indexed by Crawlers.

  • Generally, the Sitemap is located at /sitemap.xml.
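Putting the keywords above together, a robots.txt covering the examples from this task might look like the sketch below. The paths are illustrative; note that the `*` and `$` wildcards in rule paths are extensions honoured by major crawlers such as Googlebot rather than part of the original standard.

```
User-agent: *              # applies to every crawler
Allow: /blog/              # the blog may be indexed
Disallow: /dont-index-me/  # this directory must not be indexed
Disallow: /*.conf$         # hide configuration files from crawlers
Sitemap: https://ablog.com/sitemap.xml
```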


Where would "robots.txt" be located on the domain "ablog.com"?

  • Answer: ablog.com/robots.txt

If a website was to have a sitemap, where would that be located?

  • Answer: /sitemap.xml

How would we only allow "Bingbot" to index the website?

  • Answer: User-Agent: Bingbot

  • Other bots, such as Googlebot and msnbot, will not be allowed to index the site.

How would we prevent a "Crawler" from indexing the directory "/dont-index-me/"?

  • Answer: Disallow: /dont-index-me/

What is the extension of a Unix/Linux system configuration file that we might want to hide from "Crawlers"?

  • Answer: .conf
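The Disallow behaviour from the questions above can be verified with Python's standard-library robots.txt parser, `urllib.robotparser` (a minimal sketch; the file paths are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks every crawler from /dont-index-me/
rules = """User-agent: *
Disallow: /dont-index-me/""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved crawler must skip anything under /dont-index-me/ ...
print(rp.can_fetch("Bingbot", "/dont-index-me/secret.conf"))  # False
# ... but may index everything else.
print(rp.can_fetch("Bingbot", "/blog/post-1"))                # True
```

Keep in mind that robots.txt is only a polite request: a malicious crawler is free to ignore it, which is exactly why listing an admin panel there can leak its location.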


Sitemaps

  • Sitemaps are indicative resources that are helpful for crawlers - they specify the necessary routes to find content on the domain.

  • They are written in XML format.

  • They show the routes to nested content.

  • Sitemaps are favourable for search engines because all the necessary routes to content are provided in one file.
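A minimal sitemap following the sitemaps.org XML protocol looks like the sketch below, reusing the ablog.com domain from earlier in this task (the URLs themselves are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://ablog.com/</loc>
    <lastmod>2021-01-15</lastmod>
  </url>
  <url>
    <loc>https://ablog.com/blog/post-1</loc>
  </url>
</urlset>
```

Each `<url>` entry gives a route to one piece of content; nested pages simply appear as further `<loc>` entries, which is what makes crawling the domain so cheap for a search engine.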

What is the typical file structure of a "Sitemap"?

  • Answer: XML

What real life example can "Sitemaps" be compared to?

  • Answer: Map

Name the keyword for the path taken for content on a website

  • Answer: route


Google Dorking

  • Google Dorking uses Google's advanced search operators to narrow queries and surface content that ordinary searches would miss.
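A few common operators, with illustrative queries (the search terms are examples, not required syntax):

```
site:bbc.co.uk flood defences    # restrict results to a single domain
filetype:pdf cyber security      # search for a specific file type
intitle:login                    # find pages with "login" in the title
inurl:admin                      # find URLs containing "admin"
```

Operators can also be combined, e.g. `site:ablog.com filetype:conf` to hunt for exposed configuration files on one domain.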


What would be the format used to query the site bbc.co.uk about flood defences?

  • Answer: site:bbc.co.uk flood defences

What term would you use to search by file type?

  • Answer: filetype:

What term can we use to look for login pages?

  • Answer: intitle:login


Refer

  • Click Here
  • Google Site Analyser
  • Google Hacking on Wikipedia