When doing page analysis getting - Error: 403 Forbidden by robots.txt (12 posts)

  1. spabizgal
    Member
    Posted 5 years ago #

    I have a robots.txt file uploaded to my root directory.

    This is what is in there.

    User-agent: *
    Disallow: /

    So why is it BLOCKING any search engines?

    This makes me a bit nervous, as it is my WP site:
    http://www.spaparazzi.com

  2. whooami
    Member
    Posted 5 years ago #

    do you understand what a robots.txt is for?

    it's for "exclusion"

    exclusion == blocking

    what you have there will block ANY robot that respects robots.txt files from spidering ANY of your site. It's doing exactly what it's intended to do.
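
As a rough illustration of the point above, here is a minimal sketch using Python's standard-library `urllib.robotparser` to evaluate the two rules from the first post. The helper name `is_allowed` and the example page path are assumptions made for this sketch; real crawlers may apply their own parsing.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the first post, one entry per line.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def is_allowed(rules, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines.

    Hypothetical helper for this sketch, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)


# Every path starts with "/", so every URL on the site is excluded.
print(is_allowed(BLOCK_ALL, "http://www.spaparazzi.com/"))  # False
print(is_allowed(BLOCK_ALL, "http://www.spaparazzi.com/any/page/", agent="Googlebot"))  # False
```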

    This is precisely why people need to read before they do things, and then re-read.

    robots.txt files are NOT for SEO; they are there to prevent indexing.

    What exactly are you trying to accomplish by using a robots.txt? I ask because it clearly looks like you don't know what it's for, much less how to use one.

  3. spabizgal
    Member
    Posted 5 years ago #

    User-agent: *
    Disallow:

    Corrected

    I have a few other sites, and it's been a while since I added this robots.txt file.

    Now it's searchable.

    Gosh thanks for letting me know

  4. whooami
    Member
    Posted 5 years ago #

    if you are going to do that, you may as well remove the file.

    Now you are disallowing nothing, which is the same as the file not being there at all.
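
The equivalence described above can be checked with a short sketch, again assuming Python's standard-library `urllib.robotparser` as the parser. The helper `allowed` and the page path are made up for illustration, and an empty rule list stands in for a robots.txt that has no rules at all.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, url):
    """Return True if any robot ("*") may fetch `url` under `rules`.

    Hypothetical helper for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch("*", url)


EMPTY_DISALLOW = ["User-agent: *", "Disallow:"]  # the "corrected" file
NO_RULES = []  # assumption: stands in for a file with no rules

url = "http://www.spaparazzi.com/some/page/"
print(allowed(EMPTY_DISALLOW, url))  # True
print(allowed(NO_RULES, url))        # True
```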

  5. spabizgal
    Member
    Posted 5 years ago #

    User-agent: *
    Disallow: /cgi-bin/

    Is that where you are going with this or what would you suggest?
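
For reference, a small sketch of how the rule above would be interpreted by Python's standard-library `urllib.robotparser`. The two example paths (`/cgi-bin/form.cgi` and `/about/`) are hypothetical and only serve to show one blocked and one permitted URL.

```python
from urllib.robotparser import RobotFileParser

# Only the /cgi-bin/ directory is excluded; everything else stays crawlable.
CGI_ONLY = ["User-agent: *", "Disallow: /cgi-bin/"]

parser = RobotFileParser()
parser.parse(CGI_ONLY)

# Hypothetical example URLs, not taken from the actual site.
blocked = parser.can_fetch("*", "http://www.spaparazzi.com/cgi-bin/form.cgi")
permitted = parser.can_fetch("*", "http://www.spaparazzi.com/about/")

print(blocked)    # False
print(permitted)  # True
```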

  6. Otto
    Tech Ninja
    Posted 5 years ago #

    spabizgal: I guess we're confused why you have a robots.txt file at all. The whole point of a robots.txt file is to tell systems not to index your site.

    If you want them to index your site, just delete the robots.txt file entirely. You don't need it except to *stop* them from indexing.

  7. DianeV
    Member
    Posted 5 years ago #

    I've found that sometimes search engines request the robots.txt file and, if they don't find it, don't spider the pages.

    spabizgal, you're right: excluding the cgi-bin from being spidered would be a good use of robots.txt.

  8. spabizgal
    Member
    Posted 5 years ago #

    Hi Diane,

    Yes, that's what I am reading as well. It's better to err on the side of caution and have it in there. In fact, Google is sort of weird about it if you don't have one, and you are absolutely right: some spiders won't even go near your site without it.

    Ironically, I did have robots.txt added correctly for my 3 other sites; it's just been a long time since I first put one in, so my mind was drawing a blank.

    Glad you caught that, whooami.

    Wouldn't have been good to have it the other way for sure!

  9. whooami
    Member
    Posted 5 years ago #

    >I've found that sometimes search engines request the robots.txt file and, if they don't find it, don't spider the pages.

    that doesn't even make sense, and defies the standard set up for the file.

    what you are more than likely seeing, if in fact they are legitimate search bots that respect its usage, is them coming and checking, leaving, then coming back again. They don't necessarily read the file and then sweep your site.

    and yes, spabizgal, that's more in line with what you want to accomplish, like Diane said.

  10. Otto
    Tech Ninja
    Posted 5 years ago #

    Yeah, that's completely silly and the exact *opposite* of what the robots.txt file is there for.

    No robots.txt = spider all you like. That's the whole point. That's how all spiders work. There's absolutely no reason to have one if you're not blocking anybody from spidering. Google works just fine without it being there.

    I refer you to http://www.robotstxt.org.
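
One way to picture the "no robots.txt = spider all you like" rule, and how it differs from the 403 in the thread title, is to look at how a client might react to the HTTP status of the robots.txt request. The sketch below mirrors the convention that Python's standard-library `urllib.robotparser` applies when it reads a file (401 or 403 means stay out, other 4xx such as 404 means no restrictions). The function name and the returned labels are assumptions for illustration, and other crawlers may treat these codes differently.

```python
def policy_for_robots_status(status):
    """Map the HTTP status of a robots.txt request to a crawl policy.

    Hypothetical helper: the labels are invented for this sketch.
    """
    if status in (401, 403):
        # Access to robots.txt itself is refused: treat the site as off limits.
        return "disallow_all"
    if 400 <= status < 500:
        # robots.txt is missing (for example 404): nothing is excluded.
        return "allow_all"
    if 200 <= status < 300:
        # The file exists: read it and apply its rules.
        return "parse_rules"
    # Assumption: anything else (redirects, server errors) is left undecided here.
    return "undecided"


for code in (200, 403, 404):
    print(code, policy_for_robots_status(code))
```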

  11. cycad
    Member
    Posted 5 years ago #

    I've been working on search engine optimisation for quite a lot of different websites for 8-9 years now. I've never seen a site that wasn't spidered because it didn't have a robots.txt.

    It's much, much more likely that the site that started this myth (I'm sure there was one) didn't have a robots.txt and, coincidentally, wasn't spidered for some other reason, probably lack of inbound links.

    I rarely use robots.txt on a site because it's not really reliable as an exclusion (sometimes engines miss it - I have seen this happen, though not often). If you do want stuff spidered, you don't need it. If you don't want stuff spidered, put it behind a password.

  12. gmakmel
    Member
    Posted 5 years ago #

    Hi,

    I am facing the same problem for http://www.yourmobile.nl.
    When I run the diagnostic in the Google sitemap admin, it gives me a 403 error. I added the following to the .htaccess file:

    ErrorDocument 404 /index.php?error=404

    and added to 404.php the header that was recommended in the WordPress help pages.

    Still, my site is not listed in Google. I checked on google.com with:

    site:www.yourmobile.nl

    Help is very welcome.

Topic Closed

This topic has been closed to new replies.
