What Is robots.txt?
robots.txt is a plain-text file placed at your site's root (e.g., yoursite.com/robots.txt) that tells search engine crawlers which pages they are and are not allowed to visit. It is the first thing most crawlers check before exploring a site.
Plain-English Definition
The robots.txt file uses a simple set of rules to communicate with web crawlers. A basic file might say: allow all crawlers to access everything. A more configured one might block crawlers from accessing admin areas, staging environments, or duplicate parameter-based URLs that you don't want indexed.
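As a sketch, the two files described above might look like this (the blocked paths are illustrative, not prescriptive):

```text
# File 1 — allow every crawler to access everything.
# An empty Disallow value permits the whole site.
User-agent: *
Disallow:

# File 2 — block admin areas, a staging path, and duplicate
# parameter-based URLs. The * wildcard in paths is supported
# by Googlebot and most major crawlers.
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /*?sessionid=
```

Each file is one complete robots.txt on its own; a real site serves exactly one such file at its root.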
It is important to understand that robots.txt is a request, not a wall. Well-behaved crawlers like Googlebot respect it. Bad actors do not. It also does not prevent a blocked page from appearing in search results if other sites link to it — for that you need a noindex meta tag.
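The noindex meta tag mentioned above goes in the HTML head of the page you want kept out of results, for example:

```html
<!-- Tells compliant search engines not to index this page -->
<meta name="robots" content="noindex">
```

Note the interaction with robots.txt: Googlebot can only see this tag if it is allowed to crawl the page, so a URL that is both disallowed in robots.txt and marked noindex may still appear in results, because the crawler never reads the tag.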
Why It Matters
Every website has a crawl budget — the amount of time and resources Googlebot allocates to crawling your site. A misconfigured robots.txt can waste that budget on pages that don't need to be indexed (like login pages or internal search results), leaving important pages crawled less frequently.
A correctly configured robots.txt focuses Googlebot's attention on the pages that matter — your service pages, blog posts, and location pages — and steers it away from admin paths and duplicate parameter URLs.
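A minimal sketch of such a file for a small business site, assuming hypothetical paths and sitemap URL:

```text
# Keep crawlers out of low-value areas, leave everything else open
User-agent: *
Disallow: /admin/
Disallow: /search/
Disallow: /*?ref=

# Point crawlers at the canonical list of pages worth indexing
Sitemap: https://yoursite.com/sitemap.xml
```

The Sitemap line is optional but commonly included, since robots.txt is one of the first files crawlers fetch.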
Common Mistake to Avoid
The most damaging robots.txt error is accidentally blocking Googlebot from the entire site with a Disallow: / rule. This causes the site to disappear from search results. It happens more often than you'd think — frequently when a developer blocks crawlers during site builds and forgets to update the file before launch.
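The rule in question is only two lines, which is part of why it slips into production unnoticed:

```text
# Blocks every compliant crawler from the entire site —
# useful on a staging server, disastrous on a live one
User-agent: *
Disallow: /
```

Checking the live robots.txt file as part of a launch checklist is the simplest guard against this.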
Need a technical SEO audit? Book a free call.