Block Search Engines From Accessing Files

Notebook // November 2012

The robots.txt file tells search engines not to crawl live pages that you don't want showing up in search results. If you don't want to restrict any files from search engines, you don't need a robots.txt file at all, not even an empty one.

Using Robots.txt

What types of pages could you restrict? The possibilities are endless, and it's really up to you. To get started, create a new plain-text file named robots.txt and place it in your site's root directory, so that it is served from the top level of your domain (for example, http://www.awmcreative.com/robots.txt).
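If you'd rather script this step than create the file by hand, here is a minimal Python sketch that writes a robots.txt file into a site's root directory. The `write_robots_txt` helper and the `site_root` path are hypothetical; point it at whatever directory your host serves as the top level of your domain.

```python
from pathlib import Path
from typing import List


def write_robots_txt(site_root: str, lines: List[str]) -> Path:
    """Write the given directives to <site_root>/robots.txt and return its path.

    `site_root` is a hypothetical path to the directory your host serves at "/".
    """
    # Crawlers only look for the file at the top level of the domain (/robots.txt),
    # and the name must be exactly "robots.txt", all lowercase.
    path = Path(site_root) / "robots.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

For example, `write_robots_txt("/var/www/html", ["User-agent: *", "Disallow: /foldername/"])`, where /var/www/html stands in for your own host's web root.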

The first line you'll want to place in your new file is the following:

  • User-agent: *

The asterisk (*) in the User-agent line applies the rules that follow to all search engines. You can also target a specific search engine by naming its crawler, like:

  • User-agent: Googlebot
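To see how a named User-agent group affects only that crawler, Python's standard-library `urllib.robotparser` module can evaluate robots.txt rules offline. This sketch assumes a hypothetical file that keeps only Googlebot out of /foldername/; `example.com` and `SomeOtherBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only Googlebot is kept out of /foldername/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /foldername/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/foldername/page.html"
# Googlebot matches the named group, so its Disallow rule applies.
print(parser.can_fetch("Googlebot", url))     # False
# No group names this crawler and there is no "User-agent: *" group,
# so nothing restricts it.
print(parser.can_fetch("SomeOtherBot", url))  # True
```

Adding a `User-agent: *` group to the same file would give every other crawler its own set of rules.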

Beneath the User-agent line, you can block files, directories, and more with Disallow lines like these:

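  • Blocks the entire site, Disallow: /
  • Blocks a directory and all of its contents, Disallow: /foldername/
  • Blocks a specific file name, Disallow: /pagename.html
  • Blocks a specific file type (here, every .gif), Disallow: /*.gif$ (major crawlers such as Googlebot understand the * and $ wildcards, but not every robot does)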
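Before uploading the file, you can check which URLs your rules actually block. This sketch again uses Python's `urllib.robotparser` with the folder and page rules from the list above; `example.com` and `MyCrawler` are placeholders. The standard-library parser only compares URL paths by prefix and does not understand the `*` and `$` wildcards, so the file-type rule is left out here.

```python
from urllib.robotparser import RobotFileParser

# Folder and page rules from the list above (wildcard rule omitted:
# this parser only does plain prefix matching).
ROBOTS_TXT = """\
User-agent: *
Disallow: /foldername/
Disallow: /pagename.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/foldername/report.pdf", "/pagename.html", "/about.html"]:
    url = "https://www.example.com" + path
    allowed = parser.can_fetch("MyCrawler", url)
    print(path, "allowed" if allowed else "blocked")
# /foldername/report.pdf blocked
# /pagename.html blocked
# /about.html allowed
```

Because matching is by prefix, `Disallow: /foldername/` (with the trailing slash) blocks everything inside the folder but not the bare path /foldername itself.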

Restricting Robots.txt From Public View

One thing to remember is that the robots.txt file is public, so anyone can read it. I don't know about you, but if I'm blocking search engines from certain links, I certainly don't want anyone else finding them either. One trick I like to use is adding a redirect to my .htaccess* file. Here's what it looks like:

  • Redirect 301 /robots.txt http://www.awmcreative.com/

This directive sends anyone who requests http://www.awmcreative.com/robots.txt to the AWM Creative home page instead, which keeps curious visitors and hackers from seeing the list of blocked files and pages.
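To confirm the redirect is live after uploading .htaccess, you can request the file without following redirects and look at the status code and Location header. Here is a minimal sketch using Python's standard-library `http.client`, which never follows redirects on its own. The `check_redirect` helper is hypothetical, and the small local server is only a stand-in that mimics the Redirect line so the example runs without touching a real site. Keep in mind that Apache sends this 301 to every client that requests the file, search engine crawlers included.

```python
from __future__ import annotations

import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_redirect(host: str, port: int, path: str = "/robots.txt") -> tuple[int, str | None]:
    """Request `path` once, without following redirects.

    Returns the HTTP status code and the Location header (None if absent).
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


class FakeRedirectHandler(BaseHTTPRequestHandler):
    """Local stand-in for `Redirect 301 /robots.txt http://www.awmcreative.com/`."""

    def do_GET(self):
        if self.path == "/robots.txt":
            self.send_response(301)
            self.send_header("Location", "http://www.awmcreative.com/")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_fake_server() -> HTTPServer:
    """Serve FakeRedirectHandler on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), FakeRedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_fake_server()
print(check_redirect("127.0.0.1", server.server_address[1]))
# (301, 'http://www.awmcreative.com/')
server.shutdown()
server.server_close()
```

Against your own site you would call something like `check_redirect("www.yourdomain.com", 80)` (a placeholder host) and expect a 301 status with your chosen Location if the Redirect line is active.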

*The .htaccess file is read by the Apache web server, which is standard on Linux-based hosting. If you're with a Windows-based hosting provider running IIS, it will not work. Also note that .htaccess is a hidden file, so you may need to change your settings to show hidden files.
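The leading dot only hides .htaccess from directory listings and file managers; a script can still open it by name. As an illustration, this hypothetical `add_redirect_line` helper appends a directive to .htaccess in a given root directory, creating the file if needed, and skips the write if that exact line is already present, so running it twice won't duplicate the rule. The `site_root` path is an assumption.

```python
from pathlib import Path


def add_redirect_line(site_root: str, directive: str) -> bool:
    """Append `directive` to <site_root>/.htaccess unless that exact line exists.

    Returns True if the file was changed. `site_root` is a hypothetical web-root path.
    """
    htaccess = Path(site_root) / ".htaccess"  # hidden in listings, but not to code
    existing = htaccess.read_text(encoding="utf-8") if htaccess.exists() else ""
    if directive in existing.splitlines():
        return False  # rule already in place; nothing to do
    # Make sure the new directive starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with htaccess.open("a", encoding="utf-8") as f:
        f.write(prefix + directive + "\n")
    return True
```

For example, `add_redirect_line("/var/www/html", "Redirect 301 /robots.txt http://www.awmcreative.com/")`, with /var/www/html standing in for your host's web root.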

Learn More About Robots.txt

If you'd like to read more about the robots.txt file, check out robotstxt.org.
