Blog

What is: Robots.txt

Troy

Robots.txt is a text file that allows a website to provide instructions to web crawling bots.

Search engines like Google use these web crawlers, sometimes called web robots, to archive and categorize websites. Most bots are configured to look for a robots.txt file on the server before they read any other file from the website. They do this to see whether the website’s owner has any special instructions on how to crawl and index the site.

The robots.txt file contains a set of instructions that ask the bot to ignore specific files or directories. This may be for the purpose of privacy, or because the website owner believes that the contents of those files and directories are irrelevant to the categorization of the website in search engines.
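To make this concrete, here is a minimal sketch of how a well-behaved bot might read those instructions, using Python's standard `urllib.robotparser` module. The robots.txt contents, the paths, the `ExampleBot` name, and the `example.com` URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A polite bot asks before fetching each URL.
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
print(blocked, allowed)
```

Here the first check comes back `False` because the path falls under a `Disallow` rule, and the second comes back `True` because no rule matches it. Nothing enforces the answer; it is up to the bot to respect it.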

If a website has more than one subdomain, each subdomain must have its own robots.txt file. It is important to note that not all bots will honor a robots.txt file. Some malicious bots will even read the robots.txt file to find which files and directories they should target first. Also, even if a robots.txt file instructs bots to ignore specific pages on the site, those pages may still appear in search results if they are linked to by other pages that are crawled.
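Because the file is looked up per subdomain, a bot has to work out the robots.txt address separately for each host it visits. A rough sketch of that lookup is below; the helper name `robots_url_for` and the `example.com` hosts are assumptions for illustration, not part of any standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url.

    Hypothetical helper: keeps the scheme and host, drops the rest,
    and points at /robots.txt at the root of that host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Two subdomains of the same site map to two different robots.txt files.
print(robots_url_for("https://www.example.com/about/team.html"))
print(robots_url_for("https://blog.example.com/posts/1?ref=home"))
```

The rules published for `www.example.com` tell a bot nothing about `blog.example.com`, which is why each subdomain needs its own file.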

This post was originally published in the wpbeginner glossary.

Additional Reading

I spent 5 hours fixing one issue

The day that Cloudflare crashed.

We’ve been doing a lot of planning lately.

Free Donation Resources

Opportunities in Mobile Commerce
