WordPress SEO by Yoast
Google Webmaster Tools cannot parse sitemap.xml (20 posts)

  1. RichardWantsToKnow
    Member
    Posted 2 years ago #

    WP v.3.4
    WPSEO v 1.2.3
    After installing WPSEO, I checked my Google Webmaster Tools account. According to Google, my sitemap.xml cannot be properly parsed. So I checked the WPSEO settings and found I had nine sitemap.xml files. The relevant ones (all beginning with http://www.mysite.com/) are: post-sitemap.xml, page-sitemap.xml, post_tag-sitemap.xml. I added one of them to Google Webmaster Tools and tested it, resulting in the error: "Sitemap contains urls which are blocked by robots.txt." All the examples listed were from the directory: http://mysite.com/wp-content/uploads/.
    In checking my robots.txt file I found that this directory was disallowed. Fine; I removed it from the robots.txt file and retested. Still the same error.
    I'm coming to the conclusion that WPSEO is not on the same page as Google with respect to Webmaster Tools.
    Anyone have a fix or ideas???
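
    For anyone who wants to reproduce this check outside Webmaster Tools, here is a minimal sketch in Python. It lists which sitemap URLs a given robots.txt would block. The sample sitemap, the file name photo.jpg and the function name blocked_urls are made up for illustration, and Python's standard robots.txt parser only approximates how Google reads the file.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser


def blocked_urls(sitemap_xml, robots_txt, agent="Googlebot"):
    """Return the sitemap <loc> URLs that robots_txt disallows for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    # Collect every <loc> element, whatever XML namespace it lives in.
    locs = [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
    return [url for url in locs if not parser.can_fetch(agent, url)]


# Illustrative data only: a tiny sitemap with one post and one uploaded file.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.mysite.com/hello-world/</loc></url>
  <url><loc>http://www.mysite.com/wp-content/uploads/photo.jpg</loc></url>
</urlset>
"""

# robots.txt before and after removing the uploads rule (assumed contents).
ROBOTS_BEFORE = "User-agent: *\nDisallow: /wp-content/uploads/\n"
ROBOTS_AFTER = "User-agent: *\nDisallow: /wp-admin/\n"

before = blocked_urls(SAMPLE_SITEMAP, ROBOTS_BEFORE)
after = blocked_urls(SAMPLE_SITEMAP, ROBOTS_AFTER)
print("blocked before:", before)
print("blocked after:", after)
```

    If the second list comes back empty for the real robots.txt and sitemap, the rules themselves are no longer the cause of the error.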

    http://wordpress.org/extend/plugins/wordpress-seo/

  2. RichardWantsToKnow
    Member
    Posted 2 years ago #

    Add-on bit just discovered: If I correct the post-sitemap.xml by resubmitting it in Webmaster Tools, it tests fine. I then move on to the next xml: page-sitemap.xml, and do the same correction. Tools then reports that sitemap as having no errors; however, the first sitemap (post-sitemap.xml) returns to reporting the same original error that the robots.txt file is blocking the directory mentioned in my original post above.
    How curious. Both sitemaps refer to the same robots.txt file, and that file has had the offending disallow clause removed, yet the two sitemaps conflict: fixing one causes Google Tools to report errors for the other.
    And why do I need nine sitemap files???
    Anyone? I'm open to ideas.

  3. RichardWantsToKnow
    Member
    Posted 2 years ago #

    And can anyone point me to documentation for the WP SEO plugin that addresses this issue, so users can avoid hours of research?

  4. RichardWantsToKnow
    Member
    Posted 2 years ago #

    Curious: Am I supposed to list only one sitemap file in Google Tools for the site? If so, and let's say I list page-sitemap.xml, does that cover only the pages, leaving all my posts, categories, et al. unknown to Google's search?

    ... Really would have been nice to have had some documentation....

  5. RichardWantsToKnow
    Member
    Posted 2 years ago #

    If I go back to a single sitemap.xml, which is what Google originally saw before we started using WP SEO 1.2.3, Tools gives us the error: "We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting."
    Well, no wonder! We no longer have a sitemap.xml in the root. How did that happen?

  6. afshinmokhtari
    Member
    Posted 2 years ago #

    I am experiencing what I believe is the same problem. I'm using the latest WordPress version and the latest WPSEO. Google Webmaster Tools is telling me that it can't parse my sitemap_index.xml because 'URL restricted by robots.txt'.

    robots.txt is generated by WP, no? When I go to my http://site/robots.txt file, I see:
    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    ... which looks OK to my untrained eye. It's not allowing robots to crawl anything under the admin and include directories, right? So anything under the root of the site, including the XML file, should be accessible, no?
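
    One way to test that reading of the rules is a short Python sketch using the standard library's robots.txt parser. The helper name is_allowed and the wp-admin URL are made up for illustration, and this parser is only an approximation of Googlebot's behaviour, so treat the result as a sanity check rather than proof.

```python
from urllib.robotparser import RobotFileParser

# The three lines quoted above, exactly as served at http://site/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def is_allowed(robots_txt, url, agent="Googlebot"):
    """Return True if agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


sitemap_ok = is_allowed(ROBOTS_TXT, "http://site/sitemap_index.xml")
admin_ok = is_allowed(ROBOTS_TXT, "http://site/wp-admin/options.php")
print("sitemap allowed:", sitemap_ok)
print("wp-admin allowed:", admin_ok)
```

    Under these three lines the sitemap URL is not matched by either Disallow rule, so if Google still reports it as restricted, the file Google fetched may differ from the one shown in the browser.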

    thanks,

  7. RichardWantsToKnow
    Member
    Posted 2 years ago #

    It's my understanding that WordPress creates the original robots.txt file, and that various add-ons such as themes, plugins, etc. either modify it, append to its content or simply overwrite it.
    Correct. What you indicated as being in the robots.txt file would stop search engines from crawling those directories, though that seems like too short a list. I would not want them crawling:
    /wp-includes, /wp-content/plugins, /error_log, /cgi-bin, /wp-content/themes, /wp-content/install.php, /wp-admin, /wp-content/uploads, and /.htaccess. Your site might need more depending on what else you have installed.
    And yes, if it's not on the disallow list, I would say it's open for the world to see.
    Moving forward, I'm not seeing any way to determine what the WPSEO plugin has changed or added, what its requirements are, or why they are requirements. There simply isn't any documentation I can find that explains it. The best I can do is scour the forums and other discussion areas and hope someone else has already had the same problem. Even then, the bugfixes/revisions are coming out so frequently that it's tough to decipher whether a previously reported issue is from a current version or a previous one.
    (I've spent 20+ years in IT [hardware, software, network and database administration] and I've never seen such 'shoot from the hip' methodology. I find it irresponsible. Even a simple set-up of Bugzilla would save a lot of potential customers a lot of frustration.)

  8. afshinmokhtari
    Member
    Posted 2 years ago #

    Richard, good point about not wanting search engines to crawl those other directories you mentioned.

    I believe my robots.txt is the default that either WP generated, or that WP generated and the WPSEO plugin modified. In either case, it's not allowing access to my sitemap (which the plugin also generated) according to Google Webmaster Tools... so I don't understand at all what is going on.

    I didn't have this problem with previous versions of WP in combo with this plugin... and I'm assuming this is all related to the issues you bring up in this thread and what you are experiencing... am I off?

    Just as reference to robots.txt issues, here is one article from the author himself from earlier this year:
    http://yoast.com/example-robots-txt-wordpress/

  9. RichardWantsToKnow
    Member
    Posted 2 years ago #

    In my opinion, .... a typical article from him; and more importantly, I'm sure it makes total sense to him, but apparently Google sees it otherwise and it is Google that determines my rankings and analytics.
    Thousands of the best in the world vs the opinion of one.
    I spent the day reviewing Google's approach, and I'm thinking that using their method as stated on this page ( https://support.google.com/analytics/bin/answer.py?hl=en&answer=1009686&topic=1009685&parent=1726910&rd=1 ) is an alternative. I think I'll set up one site according to Google's method, run it for a couple of days and see what the raw logs and what Google Analytics look like.
    I'm frustrated, and since there are 71 pages (as of this morning) of questions and problem postings in this forum alone for this plugin, I don't think I'm alone.

  10. PowerBird
    Member
    Posted 2 years ago #

    [Translated from German] In Webmaster Tools, an error with the following wording keeps appearing: yoast-ga/outbound-article/http://www.zeitungsgenerator.info. How can I fix this error? Does it have something to do with the settings? Thanks to anyone who can help me.

  11. esmi
    Forum Moderator
    Posted 2 years ago #

    These are English language forums. Please use English.

  12. RichardWantsToKnow
    Member
    Posted 2 years ago #

    I knew I should have taken more languages in college....

  13. trixienolix
    Member
    Posted 1 year ago #

    Hi, just in case this helps anyone...
    Just had the same problem (installed Yoast, submitted sitemap to Google Webmaster Tools, had error message "URL restricted by robots.txt").
    Took ages to figure out... realised that I'd installed WordPress in its own directory and so my robots.txt said this:

    User-Agent: *
    Disallow: /wp-content/plugins/

    when it should have said this:

    User-Agent: *
    Disallow: /wordpress/wp-content/plugins/

    Then I re-submitted my sitemap_index.xml in Google Webmaster Tools and I have no errors...
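
    The two versions above behave differently because robots.txt rules are matched as prefixes of the URL path, counted from the site root. A minimal Python sketch of that difference follows. The plugin path example-plugin/readme.txt and the helper name can_crawl are hypothetical, and Python's parser is only a stand-in for Google's.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, path, agent="Googlebot"):
    """Return True if agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The two rule sets quoted above.
ROOT_STYLE = "User-Agent: *\nDisallow: /wp-content/plugins/\n"
SUBDIR_STYLE = "User-Agent: *\nDisallow: /wordpress/wp-content/plugins/\n"

# Hypothetical plugin file on a site with WordPress installed under /wordpress/.
plugin_path = "/wordpress/wp-content/plugins/example-plugin/readme.txt"

results = {
    "root-style rule": can_crawl(ROOT_STYLE, plugin_path),
    "subdirectory rule": can_crawl(SUBDIR_STYLE, plugin_path),
}
print(results)
```

    With a subdirectory install, only the second rule set matches the plugin files; the first one does not apply to anything under /wordpress/.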

  14. satguy01
    Member
    Posted 1 year ago #

    I have had the same problem. I got sick of waiting, so I turned off XML sitemaps in Yoast and now use Google Sitemap Generator. Works fine. Actually, I like it better, as you are able to add pages to the sitemap manually (when they are created outside of WordPress).

    Anyhow, that is my take on this.

  15. RichardWantsToKnow
    Member
    Posted 1 year ago #

    satguy01,

    Thanks for confirming. I've done the same thing, and seems to be working fine here.

    We know the guidelines that the plugin suggests (which appear to be no different from many others); they are part of our normal practices and also part of Google's suggestions. We continue to take the time to follow Google more closely and use their suggestions. We do the same for Bing.

  16. RichardWantsToKnow
    Member
    Posted 1 year ago #

    afshinmokhtari,
    I don't think you are 'off' if robots.txt is inaccessible after the install. If it were my site, I would be concerned as well.

    Using the Google Sitemap XML plugin may correct the issue. Let us know if you try it and get any positive changes in your rankings.

  17. Mike Imken
    Member
    Posted 1 year ago #

    Just to add to this, because I've had the same issue.

    Latest install of WordPress 3.4.2 and SEO by Yoast 1.2.8.5.

    Webmaster Tools states "Sitemap contains urls which are blocked by robots.txt." and then lists the sitemaps "category-sitemap.xml", "page-sitemap.xml" and "post-sitemap.xml". It reports 4 errors but doesn't list "tags-sitemap.xml". Not sure why.

    So I looked and there was no robots.txt file in my directories anywhere! I don't know a lot about this, so I created one with no restrictions.

    Re-tested the main sitemap_index.xml and it still shows the errors.

    So I've deleted the robots.txt file, deactivated Yoast SEO sitemaps and installed and activated Google XML Sitemaps and resubmitted to Webmaster tools.

    I tested this new sitemap and it still reports errors. So I don't know if that's just a delay or what's going on, but I'll monitor it and hopefully it is fixed.

  18. LLP
    Member
    Posted 1 year ago #

    I am having the same issue...cannot find my robots.txt file in my root and I have searched everywhere.

    Google WMT shows so many errors and says 13 links are blocked by my robots.txt...the one that doesn't exist.

    I have uninstalled google site-maps and reinstalled it but I am totally lost here...I am not the most internal tech savvy person so please understand that I do not know what I did at all.

    Any help would be appreciated. Thank you.

  19. LLP
    Member
    Posted 1 year ago #

    Also...when I check my robots.txt in WMT I get the following message:

    Allowed
    Detected as a directory; specific files may have different restrictions

    If that helps someone understand my issue better. Thanks in advance.

  20. Mike Imken
    Member
    Posted 1 year ago #

    @LLP

    I'm still learning this as I go as well. WordPress creates a "virtual" robots.txt file.

    So if you look in WMT under Health > Blocked URLs you should see this virtual robots.txt file as it appears to Google.

    You can then see if there are specific files or directories that are blocked.
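
    The same check can be sketched outside WMT in Python: download robots.txt the way a crawler would (this works whether the file is virtual or physical) and list its Disallow rules. The function names here are made up for illustration, and http://www.mysite.com is a placeholder for your own site.

```python
from urllib.request import urlopen


def fetch_robots_txt(site_url, timeout=10):
    """Download robots.txt as the server actually serves it."""
    with urlopen(site_url.rstrip("/") + "/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Offline example using the virtual robots.txt quoted earlier in this thread.
SAMPLE = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /wp-includes/\n"
print(disallowed_paths(SAMPLE))

# Against a live site (placeholder URL, needs network access):
# print(disallowed_paths(fetch_robots_txt("http://www.mysite.com")))
```

    This simple version ignores which User-agent group each rule belongs to, so it is only a quick overview of what is being blocked.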

    Hope that helps. For what it's worth, I turned off Yoast sitemaps and I'm using Google XML Sitemaps and it seems to have fixed it. Yoast SEO sitemaps was also generating errors when trying to create the Post sitemap, so I had to go another route.

Topic Closed

This topic has been closed to new replies.
