What Is robots.txt?
robots.txt is a plain-text file placed at your site's root (e.g., yoursite.com/robots.txt) that tells search engine crawlers which pages they are and are not allowed to visit. It is the first thing most crawlers check before exploring a site.
Plain-English Definition
The robots.txt file uses a simple set of rules to communicate with web crawlers. A basic file might say: allow all crawlers to access everything. A more configured one might block crawlers from accessing admin areas, staging environments, or duplicate parameter-based URLs that you don't want indexed.
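As a sketch, the two files described above might look like this (the blocked paths are illustrative, not prescriptive):

```text
# File 1 — allow every crawler to access everything.
# An empty Disallow value permits the whole site.
User-agent: *
Disallow:

# File 2 — block admin areas, a staging path, and duplicate
# parameter-based URLs. The * wildcard in paths is supported
# by Googlebot and most major crawlers.
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /*?sessionid=
```

Each file is one complete robots.txt on its own; a real site serves exactly one such file at its root.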
It is important to understand that robots.txt is a request, not a wall. Well-behaved crawlers like Googlebot respect it. Bad actors do not. It also does not prevent a blocked page from appearing in search results if other sites link to it — for that you need a noindex meta tag.
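The noindex meta tag mentioned above goes in the HTML head of the page you want kept out of results, for example:

```html
<!-- Tells compliant search engines not to index this page -->
<meta name="robots" content="noindex">
```

Note the interaction with robots.txt: Googlebot can only see this tag if it is allowed to crawl the page, so a URL that is both disallowed in robots.txt and marked noindex may still appear in results, because the crawler never reads the tag.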
Why It Matters
Every website has a crawl budget — the amount of time and resources Googlebot allocates to crawling your site. A misconfigured robots.txt can waste that budget on pages that don't need to be indexed (like login pages or internal search results), leaving important pages crawled less frequently.
A correctly configured robots.txt focuses Googlebot's attention on the pages that matter — your service pages, blog posts, and location pages — and steers it away from admin paths and duplicate parameter URLs.
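A minimal sketch of such a file for a small business site, assuming hypothetical paths and sitemap URL:

```text
# Keep crawlers out of low-value areas, leave everything else open
User-agent: *
Disallow: /admin/
Disallow: /search/
Disallow: /*?ref=

# Point crawlers at the canonical list of pages worth indexing
Sitemap: https://yoursite.com/sitemap.xml
```

The Sitemap line is optional but commonly included, since robots.txt is one of the first files crawlers fetch.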
Common Mistake to Avoid
The most damaging robots.txt error is accidentally blocking Googlebot from the entire site with a Disallow: / rule. This causes the site to disappear from search results. It happens more often than you'd think — frequently when a developer blocks crawlers during site builds and forgets to update the file before launch.
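The rule in question is only two lines, which is part of why it slips into production unnoticed:

```text
# Blocks every compliant crawler from the entire site —
# useful on a staging server, disastrous on a live one
User-agent: *
Disallow: /
```

Checking the live robots.txt file as part of a launch checklist is the simplest guard against this.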
Need a technical SEO audit? Book a free call.