SEO Robots.txt for Magento 1.x
It seems like every week I get an email asking what a robots.txt file is and how to set one up for Magento 1.x, so I put together this little guide to get you started.
First, remember that a robots.txt file is only obeyed by well-behaved robots/crawlers, not malicious ones. So don’t treat a robots.txt file as a security measure. Think of it instead as putting signs such as “Employees Only” or “Keep Out” on doors inside a building: you still need to actually lock the doors to keep dishonest people out. In short, a robots.txt tells search engines where they are allowed to look and which areas they should ignore, and it is for search engine optimization purposes only!
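To see why robots.txt is only advisory, look at how a compliant crawler uses it. The minimal sketch below uses Python’s standard-library urllib.robotparser with a two-rule excerpt of the file; ExampleBot and www.example.com are placeholders. The crawler has to choose to ask can_fetch() before requesting a URL, and a malicious client simply skips that step. Note that urllib.robotparser only does plain prefix matching, so wildcard rules are left out of the excerpt.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules using plain prefixes only; urllib.robotparser does not
# understand wildcard patterns such as /*.php$.
RULES = """\
User-agent: *
Disallow: /customer/
Disallow: /wishlist/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def polite_crawler_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved crawler would request this URL.

    Nothing enforces this check: a client that never calls it can still
    request any URL on the server.
    """
    return parser.can_fetch(agent, url)


print(polite_crawler_may_fetch("https://www.example.com/red-velvet-hat.html"))
print(polite_crawler_may_fetch("https://www.example.com/customer/account/"))
```

The server still answers a request for /customer/account/ from anyone who asks; only the customer login in front of it actually protects that page.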
We want search engines to crawl images, CSS, JavaScript and content while ignoring customer-specific URLs such as wishlists, product comparisons, etc. We also want to ensure that search engines index only the search-engine-friendly URLs, such as domain.com/red-velvet-hat.html, and not the ID-based product and category URLs served by the index.php front controller (for example /catalog/product/view/id/123).
The robots.txt file is located in the root directory of your Magento 1.x store (the same directory as index.php). If it isn’t there, you can create the file yourself.
# Crawlers
User-agent: *
# Directories
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /shell/
Disallow: /var/
# SEF and customer URLs
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/
# Miscellaneous files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /composer.json
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /mage
Disallow: /scheduler_cron.sh
Disallow: /*.php$
# Remove filtered URLs
Disallow: /*?tx_indexedsearch
Disallow: /*?min*
Disallow: /*?max*
Disallow: /*?q*
Disallow: /*?cat*
Disallow: /*?dir*
Disallow: /*?limit=all
Disallow: /*?mode*
# Disallow session IDs
Disallow: /*?SID=
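Before uploading the file, it can help to run a few sample paths against the rules. Python’s urllib.robotparser only does plain prefix matching, so this sketch hand-rolls the wildcard semantics that Google and Bing support and RFC 9309 standardizes: `*` matches any run of characters, and a trailing `$` anchors the pattern to the end of the URL. It assumes a single `User-agent: *` group with no Allow lines (as in the file above), ignores percent-encoding, and uses only an excerpt of the rules; the sample paths are made up.

```python
import re


def compile_rule(pattern: str) -> re.Pattern:
    """Turn a robots.txt path pattern into a regex anchored at the path start.

    '*' becomes '.*'; a trailing '$' pins the match to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def load_disallow_rules(robots_txt: str) -> list[re.Pattern]:
    """Collect Disallow patterns, skipping comments and empty values."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            rules.append(compile_rule(value.strip()))
    return rules


def is_blocked(path_and_query: str, rules: list[re.Pattern]) -> bool:
    # re.match anchors only at the start, so a rule without '$' acts as a prefix.
    return any(rule.match(path_and_query) for rule in rules)


EXCERPT = """\
User-agent: *
Disallow: /catalog/product/view/
Disallow: /wishlist/
Disallow: /*.php$
Disallow: /*?dir*
Disallow: /*?SID=
"""

rules = load_disallow_rules(EXCERPT)
for path in ("/red-velvet-hat.html", "/catalog/product/view/id/123",
             "/cron.php", "/cron.php?x=1", "/hats.html?dir=asc"):
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```

Pasting the full file into load_disallow_rules works the same way. One thing such a check makes visible: a pattern like `/*?dir*` only matches when `dir` is the first query parameter, so `/hats.html?limit=24&dir=asc` is not caught.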
Once you have created or updated your robots.txt, you will also want to add Disallow rules for any frontend Ajax controllers that third-party extensions add to your store.
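Magento 1.x modules register frontend routes in their etc/config.xml under `frontend/routers/<router>/args/frontName`, and that front name becomes the first segment of the controller’s URLs. Assuming your extensions follow that convention, this Python sketch lists the front names declared in the community and local code pools so you can pick out the Ajax endpoints. It cannot tell an Ajax controller from a normal page, so review its output rather than blocking everything it returns; the install path you pass in is your own.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Third-party code in Magento 1.x lives in these pools; "core" is skipped.
THIRD_PARTY_POOLS = ("community", "local")


def third_party_front_names(magento_root: Path) -> set[str]:
    """Collect frontend frontName values declared by non-core modules."""
    names: set[str] = set()
    for pool in THIRD_PARTY_POOLS:
        pool_dir = magento_root / "app" / "code" / pool
        if not pool_dir.is_dir():
            continue
        # Module configs sit at app/code/<pool>/<Vendor>/<Module>/etc/config.xml
        for config in pool_dir.glob("*/*/etc/config.xml"):
            root = ET.parse(config).getroot()
            for node in root.findall("frontend/routers/*/args/frontName"):
                if node.text and node.text.strip():
                    names.add(node.text.strip())
    return names


def disallow_lines(front_names: set[str]) -> list[str]:
    """Format reviewed front names as robots.txt Disallow lines."""
    return [f"Disallow: /{name}/" for name in sorted(front_names)]
```

For example, `disallow_lines(third_party_front_names(Path("/path/to/magento")))` returns candidate lines; keep only the ones that belong to Ajax controllers.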
For security reasons, you should also never include the URL of your store’s administration panel in your robots.txt in any way: the file is publicly readable, so listing the admin path would advertise it to anyone who looks.
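Because anyone can request /robots.txt, a quick check before each deployment can catch an admin path that slipped in. In Magento 1.x the admin front name is normally set in app/etc/local.xml under `admin/routers/adminhtml/args/frontName`. This sketch assumes that location, falls back to "admin" (the usual default) when the node is missing, and flags any Allow or Disallow rule whose path contains the front name as a segment.

```python
import xml.etree.ElementTree as ET


def admin_front_name(local_xml: str) -> str:
    """Read the admin front name from the text of app/etc/local.xml."""
    node = ET.fromstring(local_xml).find("admin/routers/adminhtml/args/frontName")
    if node is None or not (node.text or "").strip():
        return "admin"  # assumed default when local.xml does not override it
    return node.text.strip()


def robots_mentions_admin(robots_txt: str, local_xml: str) -> bool:
    """True if any Allow/Disallow path contains the admin front name as a segment."""
    front = admin_front_name(local_xml)
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0]
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            segments = value.strip().strip("/").split("/")
            if front in segments:
                return True
    return False
```

Matching whole path segments rather than substrings avoids false alarms such as a front name of "admin" matching an unrelated "/adminhelp/" rule.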