
SEO Robots.txt for Magento 1.x

It seems like every week I get an email asking what a robots.txt file is and how to set one up for Magento 1.x, so I put together this little guide to get you started.

First, remember that a robots.txt is only obeyed by well-behaved robots and crawlers, not malicious ones. So don’t treat a robots.txt file as a security measure. Instead, think of it as putting signs on doors inside a building saying “Employees Only” or “Keep Out”: you still need to actually lock the doors to keep dishonest people out. In short, a robots.txt tells search engines where they are allowed to look and what they should ignore, for search engine optimization purposes only!

We want to let search engines crawl images, CSS, JavaScript and content, but ignore customer-specific URLs such as wishlists and product comparisons. We also want search engines to index only the search engine friendly URLs, such as domain.com/red-velvet-hat.html, and not products and categories retrieved by ID through the index.php front controller.
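
To make that goal concrete, here is a minimal Python sketch of how a rule-abiding crawler might evaluate a handful of the plain-prefix rules from this guide. It uses the standard library's `urllib.robotparser`; the helper name `is_allowed` and the sample paths are my own assumptions for illustration, and the standard-library parser only does prefix matching, so the wildcard rules further down are left out here.

```python
from urllib.robotparser import RobotFileParser

# A small subset of the rules from this guide (plain prefixes only).
rules = """\
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product/view/
Disallow: /wishlist/
Disallow: /checkout/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse the rules from memory instead of fetching a URL


def is_allowed(path, agent="*"):
    """Return True if a crawler that honours robots.txt may fetch the path."""
    return parser.can_fetch(agent, path)


# Hypothetical sample paths: one friendly URL, two that should be ignored.
for path in ("/red-velvet-hat.html",
             "/catalog/product/view/id/42",
             "/wishlist/index/add/"):
    print(path, is_allowed(path))
```

Only the friendly URL should come back as allowed. A malicious crawler simply never runs a check like this, which is why the file is not a security measure.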

The robots.txt file lives in the root of your Magento 1.x store. If it isn’t there, you can create the file yourself.

# Crawlers
User-agent: *

# Directories
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /shell/
Disallow: /var/

# Non-SEF and customer-specific URLs
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Miscellaneous files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /composer.json
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /mage
Disallow: /scheduler_cron.sh
Disallow: /*.php$

# Remove Filtered URLs
Disallow: /*?tx_indexedsearch
Disallow: /*?min*
Disallow: /*?max*
Disallow: /*?q*
Disallow: /*?cat*
Disallow: /*?dir*
Disallow: /*?limit=all
Disallow: /*?mode*

# Disallow session IDs
Disallow: /*?SID=
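
The rules with `*` and `$` rely on the wildcard extensions that the major search engines support. As a rough sketch of how such patterns match, the Python below translates a pattern into a regular expression, assuming `*` means "any run of characters" and a trailing `$` means "end of URL". The function names `rule_to_regex` and `is_blocked` are hypothetical helpers, and Allow rules and rule precedence are deliberately ignored.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end.
    Everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the path (no Allow handling)."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)


# A few of the patterns from the file above.
patterns = ["/*.php$", "/*?SID=", "/*?limit=all", "/checkout/"]

# Hypothetical sample paths to try the patterns against.
for path in ("/cron.php",
             "/red-velvet-hat.html?SID=abc",
             "/red-velvet-hat.html"):
    print(path, is_blocked(path, patterns))
```

Under these assumptions a pattern like `/*?limit=all` only matches when `limit=all` comes directly after the `?`, so it is worth testing your own filtered URLs against the rules before relying on them.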

Once you have created or updated your robots.txt, you will also want to add Disallow rules for any frontend Ajax controllers that come with third-party extensions.

You should also never include the URL of your store’s administration panel in your robots.txt. The file is publicly readable, so listing the admin path there would only advertise it.

Hans-Eirik Hanifl

Hans-Eirik Hanifl is a forward thinking e-commerce and marketing consultant. As an advocate for the free exchange of knowledge, he founded E-Commerce Gorilla as a place where like-minded individuals can ask questions and share their expertise on practical solutions in the area of e-commerce and marketing. He is the owner of TRM Marketing and an avid supporter of the open source community.
