robots.txt set to disallow, can't change (35 posts)

  1. mazondo
    Member
    Posted 3 years ago #

    I'm working with WordPress Multisite, and have verified that the primary blog is set to allow crawlers in the privacy settings. Unfortunately, the generated robots.txt file still shows disallow for all the sites. Any ideas on why this would be the case and how to fix it?

  2. WordPress doesn't naturally make the robots.txt file. Edit it manually.

  3. mazondo
    Member
    Posted 3 years ago #

    The thing is, I don't have a manual robots.txt file. It's definitely being generated by WordPress and added to by the Google sitemap XML for multisite plugin. I deactivated all plugins, but the file is still generated.

    When I first started the multisite I had it set to private. I don't recall ever changing this, but now it's set to public and the robots.txt file is still being generated to disallow.

  4. That's contradictory. Either you have a file called robots.txt or you don't. If you do, you edit it. If you don't, then perhaps you're referring to something else? Like meta data perhaps?

  5. Go look at your site via FTP. Is there a robots.txt file in there? Your first post says there is; your next post says it was generated by a plugin.

    You can still edit it. Manually.

  6. faceonline
    Member
    Posted 3 years ago #

    Hi, funnily enough I'm having the same problem: I've just set a site to 'search engines allowed' and uploaded a robots.txt to domain/wp-content/themes/theme, but when I go to domain.co.uk/robots.txt, my uploaded file isn't there?

  7. Because you uploaded it to the theme folder, not the root.

    and uploaded a robots.txt to domain/wp-content/themes/theme

    Upload it to /domain/

  8. faceonline
    Member
    Posted 3 years ago #

    Of course! Sorry, excuse my stupidity!

  9. mazondo
    Member
    Posted 3 years ago #

    There isn't a physical robots.txt file in the WordPress root, and it isn't being generated by a plugin. It's the virtual robots file generated by WordPress when your blog is set to private. Unfortunately, my blog is now set to public but it's still generating the wrong robots file.

    I've deactivated all plugins and confirmed that the file is still being created, which leads me to believe I must have done something wrong with the multisite install somehow. I can't create my own robots file to override it, because I'm depending on the Google XML sitemaps plugin, which requires you to let WordPress generate the robots file. Any ideas?

  10. mazondo
    Member
    Posted 3 years ago #

    Sorry, I guess I can see the issue with my first post. I meant the generated robots file, as in the one created on the fly by WordPress. There is no physical robots file in my root directory.

  11. Can you get to it at domain.com/robots.txt? If indeed it IS virtual, it should still show up.

    I'm really skeptical that it's actually a robots.txt file, virtual or otherwise, and strongly suspect you mean something else, but since that's the familiar term, that's what you're using. If there's no physical file, how do you know it's there? You understand why I'm a skeptic here? I have manually created robots.txt files for my servers, and I've used XML sitemap plugins before by simply adding the XML info to the file.

    Now. How are you determining that your robots.txt file is there?

  12. And if you give us the URL of the site, we can solve it in maybe five minutes.

  13. mazondo
    Member
    Posted 3 years ago #

    Yes, it is available at http://brandedlocal.com/robots.txt.

    You can tell it's generated by WordPress and not a physical file because of the Sitemap line. Every blog in the network gets its own robots file with its own Sitemap line. That wouldn't be the case with a static file.

  14. Okay, that is most likely created by a plugin (I'd put money down on your sitemap plugin), not WordPress. WordPress doesn't make one, and it sure as heck doesn't make one with the sitemap bit at the end.

    The disallow line is blank, and IIRC it has to be
    Disallow: /
    to actually disallow all.

    The blank is allow all, basically. Read this: http://www.robotstxt.org/robotstxt.html
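
    Spelled out (these two snippets are just an illustration of that rule, following the robotstxt.org page, not something taken from this thread):

    # Block all crawlers from the entire site:
    User-agent: *
    Disallow: /

    # Allow crawlers everywhere (an empty Disallow blocks nothing):
    User-agent: *
    Disallow: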

  15. David Sader
    Member
    Posted 3 years ago #

    http://codex.wordpress.org/Settings_Privacy_SubPanel

    WordPress does make the robots.txt, virtually. And any plugin can hook into it to add its own rules via do_robotstxt or do_robots. do_robots() is located in wp-includes/functions.php.
    http://codex.wordpress.org/Function_Reference/do_robots

    If you actually have a file there, WordPress doesn't generate robots.txt.
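
    As a quick illustration of that hook (a minimal sketch only, not code from any actual sitemap plugin; the function name and sitemap filename are placeholders), a plugin could append its own line to the virtual file through the robots_txt filter:

    function example_add_sitemap_line( $output, $public ) {
    	// Only advertise the sitemap when the blog is public ('0' means private).
    	if ( '0' != $public )
    		$output .= "\nSitemap: " . home_url( '/sitemap.xml' ) . "\n";
    	return $output;
    }
    add_filter( 'robots_txt', 'example_add_sitemap_line', 10, 2 );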

  16. Learn something new... I've always had my own!

    Either way, though, a blank disallow line is the same as saying allow all.

  17. mazondo
    Member
    Posted 3 years ago #

    Really, ipstenu? I didn't realize that was set to allow all. Google Webmaster Tools is saying I have crawling blocked. Can you think of anything else that would be causing that?

  18. mazondo
    Member
    Posted 3 years ago #

    I just checked Google Webmaster Tools again and it seems like everything is working OK now. There may have been a lag of more than a few days between when I turned off privacy and when Google showed it as such. I have no idea why.

    For anyone looking into this in the future: WordPress DOES generate its own robots.txt file when you have privacy on, but you do have the option of creating your own and adding it to the root directory to override the generated one.

    Thanks everyone for all your help!!!! I learned a lot about robots.txt files from that link, ipstenu, really appreciate it.

  19. ellp
    Member
    Posted 3 years ago #

    I found the problem: The function that creates the virtual robots.txt file is wrong.

    In the wp-includes/functions.php file, the do_robots() function starts at line 1779:

    function do_robots() {
    	header( 'Content-Type: text/plain; charset=utf-8' );

    	do_action( 'do_robotstxt' );

    	$output = '';
    	$public = get_option( 'blog_public' );
    	if ( '0' == $public ) {
    		$output .= "User-agent: *\n";
    		$output .= "Disallow: /\n";
    	} else {
    		$output .= "User-agent: *\n";
    		$output .= "Disallow: \n";
    	}

    	echo apply_filters( 'robots_txt', $output, $public );
    }

    Change line 1788 to:

    $output .= "Allow: /\n";

    Now the virtual robots.txt file will work correctly.
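
    Just to illustrate the effect of that edit: on a public blog, the virtual robots.txt would then read

    User-agent: *
    Allow: /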

    I'm trying to figure out how to send this bug report to the folks at wordpress.org, but am having no success :/

  20. Report bugs to
    http://core.trac.wordpress.org/
    Log in with your forum credentials from here, then pick "file ticket" from the nav bar on the right. Fill in as many details as possible.

  21. Andrew Nacin
    Lead Developer
    Posted 3 years ago #

    @ellp:

    If not public, then Disallow: / (disallow everything, allow nothing)
    If public, then Disallow: (disallow nothing, allow everything)

    I don't see a bug here. That's proper.

  22. ellp
    Member
    Posted 3 years ago #

    If not public, then Disallow: / (disallow everything, allow nothing)
    If public, then Disallow: (disallow nothing, allow everything)

    That's syntactically correct, but for some reason Google doesn't find the sitemap.xml file, for example. After changing the parameter to "Allow", Google was able to find sitemap.xml and, consequently, the entire contents of the blog.

  23. Andrew Nacin
    Lead Developer
    Posted 3 years ago #

    I doubt that was the issue.

    Google finds my blog fine.

  24. ellp
    Member
    Posted 3 years ago #

    Question: have you added your blog's sitemap in Google Webmaster Tools? The error I had was related to that: the robots.txt prevented the sitemap file from being read.

  25. It may just be because Google's a freakin' dink. I read through their webmaster whoopla, and it LOOKS like they're giving weighted preference to allow vs disallow. So while both are, technically, correct, they won't always scan a Disallow: (nothing).

    I'm playing around with their webmaster tools, and seeing different results with 'fake' robots.txt files when I set it as disallow nothing or allow everything.

  26. sergeletko
    Member
    Posted 3 years ago #

    Hey, there's a simpler way: just add this to your theme's functions.php

    function custom_robots( $output ) {
    	$public = get_option( 'blog_public' );
    	// Only rewrite the rule when the blog is public ('0' means private).
    	if ( '0' != $public )
    		$output = str_replace( 'Disallow:', 'Allow: /', $output );
    	return $output; // Always return $output so a private blog's rules aren't wiped out.
    }
    add_filter( 'robots_txt', 'custom_robots' );

    That will preserve your options and avoid hacking the core.

  27. rickfish182
    Member
    Posted 3 years ago #

    Hello all - I'm having this issue as well. Similar to the original poster, I had my site set to "private" while I loaded up all the content and modified the theme. I am using the XML sitemap plugin.

    A few days ago, I set my site to "public" in the privacy settings through WordPress so the site could get indexed properly. However, Google still sees the robots.txt file as set to disallow. Here's what it looks like:

    User-agent: *
    Disallow:
    
    Sitemap: http://www.howdoistoppanicattacks.com/sitemap.xml.gz

    I saw some solutions posted that involve editing the functions.php file, but I wasn't sure whether they would work for me.

    Thanks all...

  28. rickfish182
    Member
    Posted 3 years ago #

    @ellp - just tried your solution with my functions.php file and got this error:

    Parse error: syntax error, unexpected T_STRING, expecting T_VARIABLE or '$' in /home3/xtractor/public_html/howdoistoppanicattacks/wp-includes/functions.php on line 1788

    Please help?

    Thanks guys...

  29. Like I mentioned here, it's Google being dumb:

    It may just be because Google's a freakin' dink. I read through their webmaster whoopla, and it LOOKS like they're giving weighted preference to allow vs disallow. So while both are, technically, correct, they won't always scan a Disallow: (nothing).

    I'm playing around with their webmaster tools, and seeing different results with 'fake' robots.txt files when I set it as disallow nothing or allow everything.

    The even longer version is that once Google's cached you with Disallow: / (which is 'don't allow anything!'), it DOES NOT cleanly flip back when you re-set to Disallow: (i.e. follow everything). Sometimes.

    I would manually make a robots.txt and force-set it to allow.

    Once that's been re-cached by Google, kill the robots.txt and see if it can correctly pick up the auto-generated one.
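
    For example, a hand-made robots.txt dropped in the site root could contain just this (an explicit allow-all, shown only as an illustration; add your Sitemap line back in if you need it):

    User-agent: *
    Allow: /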

Topic Closed

This topic has been closed to new replies.
