WordPress.org

[resolved] "Fetch as Googlebot" returns 404 for working pages. (4 posts)

  1. ProfessionalGun
    Member
    Posted 4 years ago #

    Apologies in advance if this issue has been discussed somewhere. I searched multiple strings and couldn't find this problem.

    In Google Webmaster Tools, our site is giving conflicting "Fetch as Googlebot" results - not to mention that the root URL is the ONLY indexed URL (out of 40) from our sitemap.

    When our custom permalink structure is set up as /%postname%/, Googlebot only reports success for URLs that do not have a trailing slash.

    So when I tell Googlebot to fetch

    http://www.example.com/about-us

    Googlebot reports "Success!" with a 301 redirect to http://www.example.com/about-us/

    BUT - when I tell Googlebot to fetch http://www.example.com/about-us/ ...Googlebot returns a 404 Not Found!

    Both URLs open the appropriate webpage when loaded in a browser. . . but my concern is that Google can't index our site!

    I had a lot of trouble getting our permalinks to work initially as /%postname%/. To finally get it working, I had to edit our .htaccess file and insert the following:

    # BEGIN WordPress
    
    <IfModule mod_rewrite.c>
    ErrorDocument 404 /index.php?error=404
    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    
    # END WordPress

    I don't know what any of that code means - but it seemed to work. Now, I'm wondering if it's responsible for preventing Google from seeing any pages deeper than the root. Does anyone know what's happening here?

  2. ProfessionalGun
    Member
    Posted 4 years ago #

    I'm still working on this issue - but I may be close to identifying the problem. In addition to the portion of my .htaccess file included above, I have several working 301 redirects for pages from a legacy site that predates our move to WordPress. If I remove those redirects from the .htaccess file, "Fetch as Googlebot" quickly reports success on both URLs (with or without the trailing slash).

    This is exciting, because I'm starting to feel more confident now that I know Google can see our pages. But now that the 301 redirects for our legacy pages have been removed, those links still go to our old website. Is my only option to replace each page with an HTML redirect?

    . . . Probably not a question for the WordPress forums - but perhaps someone has an idea. Thanks!

  3. ProfessionalGun
    Member
    Posted 4 years ago #

    Final update: (I believe in making sure an answer to my question exists on record, even if I'm the one answering it!)

    . . . It seems my problems were solved by simply moving all my legacy site 301 redirects below the WordPress .htaccess code. (In other words, below # END WordPress.) My legacy pages redirect just fine now - and Googlebot properly reports success for URLs with and without the trailing slash.
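
    For anyone who wants to see the arrangement, here is a rough sketch of the layout that worked. The WordPress block is the one from my first post; the legacy path is a made-up placeholder, and the use of a plain Redirect directive is only an assumption for illustration, so substitute whatever form your own redirects take.

    ```apache
    # BEGIN WordPress

    <IfModule mod_rewrite.c>
    ErrorDocument 404 /index.php?error=404
    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>

    # END WordPress

    # Legacy-site 301 redirects now sit BELOW the WordPress block.
    # /old-about.html is a hypothetical legacy path, not a real URL from our site.
    Redirect 301 /old-about.html http://www.example.com/about-us/
    ```

    I can't say for certain why the order mattered in my case, so treat this as a starting point and re-test both URL forms with "Fetch as Googlebot" after any change.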

    Mission accomplished! I had no idea the .htaccess file was so sensitive to arrangement. Hopefully this helps in the event that anyone else hits a wall on this issue.

  4. Wow. That's a pretty awesome catch :)

    And yes, .htaccess is a top-down file. That means whatever's at the top goes first. Sometimes you want your redirects above the WP code, and sometimes you don't. It sounds like the redirects from your old site were conflicting with WP, which is ... interesting. I wonder if you had a weird regex in there.

This topic has been closed to new replies.