Support » Plugin: All In One WP Security & Firewall » Blocking Real GoogleBot Fetches and Indexing!

  • Resolved presswizards

    (@presswizards)


    I am opening my own thread; I had previously commented on a related but different thread about this same issue.

    The Block Fake Googlebots option ended up blocking real Googlebots from indexing a large client site, causing major SEO damage. After eliminating firewall and proxy servers as possible causes, we investigated each plugin one by one, and finally identified and disabled this option in your plugin. Google was able to index the sitemaps and URLs again immediately after disabling it.

    I assume the real Googlebot IPs weren’t kept up-to-date, as the last update was 2 months ago, so Google was blocked sporadically at first and eventually 100% of the time.

    This is obviously very dangerous to any site’s rankings and SEO indexing.

    I will be leaving this option off on all client sites going forward, and may look at using other security plugins now, as this security plugin is not being kept as up-to-date as it should be.

    Rob Marlbrough
    https://presswizards.com/
    https://wpsitedr.com/

Viewing 13 replies - 1 through 13 (of 13 total)
  • Plugin Contributor mbrsolution

    (@mbrsolution)

    Hi, I have had this feature enabled on all my sites for a few years and so far I have never had this issue. Google has never blocked the sitemap on any of my sites. Did you also check to make sure it was not a cache issue? Do you use a CDN on your site? Does your server have a cache system?

    I have also submitted a message to the plugin developers to investigate this issue further.

    Thank you.

    Plugin Author wpsolutions

    (@wpsolutions)

    Hi,
    This plugin uses the technique recommended by Google, whereby it does a reverse DNS lookup and also a forward DNS->IP lookup to identify a genuine “Googlebot”, i.e., based on the technique here:
    https://webmasters.googleblog.com/2006/09/how-to-verify-googlebot.html
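
The reverse-then-forward DNS check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's actual PHP code. The function name `is_verified_googlebot`, the injectable `reverse_lookup` and `forward_lookup` parameters, and the accepted hostname suffixes are assumptions made for the example, and the default forward lookup only handles IPv4.

```python
import socket

# Assumption: hostnames of genuine Googlebot crawlers end in one of these
# suffixes. Check Google's current documentation before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if `ip` passes the reverse and forward DNS checks.

    `reverse_lookup` maps an IP to a hostname and `forward_lookup` maps a
    hostname to a list of IPs. Both default to the system resolver and can
    be replaced with stubs for testing.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS, IP -> hostname.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the hostname must belong to a Google crawler domain.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward DNS, hostname -> IPs, must include the original IP.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Note that the check is only as good as the IP passed in. If the address handed to it is a proxy's address rather than the crawler's, step 2 fails and a real Googlebot request is treated as fake.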

    This plugin’s first check is to see whether the visiting bot uses the string “Googlebot” in its useragent string. Directly quoting from Google:
    “Google’s main crawler is called Googlebot.”
    As you can see on their web crawlers page, all of the indexing bots use “Googlebot” in the useragent string which is what this plugin first checks for.
    https://support.google.com/webmasters/answer/1061943?hl=en

    Maybe in your case the IP address obtained by this plugin is something other than the real visiting bot's IP address. This can happen if your hosting environment is returning a proxy IP address for $_SERVER['REMOTE_ADDR'] instead of the actual visitor address.
    You should check with your host's support people whether $_SERVER['REMOTE_ADDR'] returns the actual visitor address. If it doesn't, find out which global contains the real address, then add some code to your wp-config.php file to make sure that the correct IP is populated inside $_SERVER['REMOTE_ADDR'] (see my reply in this thread for an example):

    https://wordpress.org/support/topic/wp-security-doesnt-recognize-external-ip-addresses/
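
The idea of recovering the real visitor address from a proxy header can be sketched as below. This is a Python illustration of the logic only, not the wp-config.php code from the linked reply. `client_ip`, `TRUSTED_PROXIES`, and the placeholder network 198.51.100.0/24 are hypothetical; the placeholder would need replacing with your proxy's published ranges. The sketch assumes the proxy passes the visitor address in the `HTTP_CF_CONNECTING_IP` server variable, as Cloudflare does.

```python
import ipaddress

# Placeholder range reserved for documentation. Replace with the address
# ranges actually published by your CDN or load balancer.
TRUSTED_PROXIES = [ipaddress.ip_network("198.51.100.0/24")]


def client_ip(server_vars, header="HTTP_CF_CONNECTING_IP",
              trusted=TRUSTED_PROXIES):
    """Pick the visitor IP from a dict shaped like PHP's $_SERVER.

    The forwarded header is honoured only when the direct peer
    (REMOTE_ADDR) is a trusted proxy; otherwise anyone could spoof it.
    """
    remote = server_vars.get("REMOTE_ADDR", "")
    try:
        remote_addr = ipaddress.ip_address(remote)
    except ValueError:
        return remote  # Not a parseable IP: leave it untouched.

    # Direct connection, not via a known proxy: ignore forwarded headers.
    if not any(remote_addr in net for net in trusted):
        return remote

    candidate = server_vars.get(header, "").strip()
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return remote  # Header missing or malformed: fall back.
    return candidate
```

Restricting the header to trusted peers is a design choice in this sketch. Without it, a fake bot could send the header itself and claim a genuine Googlebot address.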

    Do you have some examples of IP addresses of any Googlebots which you think were blocked?

    The site is behind Cloudflare’s reverse proxy, so this would make sense, but we had it configured to return the visitor IP.

    It was strange because it would often work just fine, for months even. Recently it got worse: our Search Console errors increased, and in the last 5 days the site was essentially delisted completely, with a ton of search fetch errors. It stayed that way, so we dug into it more.

    Perhaps there’s an issue with the visitor IP configuration, which helped cause this issue, but I think at this point that if the site is behind a CDN/proxy, it’s best to leave this off rather than rely on the real IP being worked out successfully by some other means.

    I appreciate your thorough response. Perhaps there’s some opportunity to improve proxy detection and deactivate this option if so.

    It was also strange because other things besides Fetch as Google worked fine, including robots.txt testing. Sitemap indexing often worked but gave an error at other times. Fetch (and Render) as Google never worked at all, yet rendering the public view worked fine.

    Plugin Author wpsolutions

    (@wpsolutions)

    Hi again,
    I noticed that you marked this as resolved. I was wondering – what action resolved it for you?
    Are you still able to consistently reproduce this issue when using the “Fetch as Google” feature in the Google webmaster tools?

    Hi there,

    It was because the site was on Cloudflare, and the proxy was not consistently returning the correct remote IP address. Once I turned it off, Google is consistently able to fetch the site, and our indexing issues were resolved. I also found out that Cloudflare offers the same fake Googlebot blocking, so we didn’t need the plugin checking the same thing anyway.

    Thanks,
    Rob

    Plugin Author wpsolutions

    (@wpsolutions)

    Ok thanks for the info.

    Hello Rob,

    I am interested in your response, as I have a site that uses Cloudflare and I am getting the same issue as you.

    In your last response you state “Once I turned it off”. What did you turn off, please? Did you stop using Cloudflare?

    I use Cloudflare on a free account to supply SSL certs etc., and to stop using them would mean a rethink on converting from HTTP to HTTPS.

    Thanks

    John W

    Hi guys,

    I also have this problem. Is there any workaround without disabling Cloudflare or AIWPS?

    Thanks,
    Kostas

    Plugin Contributor mbrsolution

    (@mbrsolution)

    Hi @sokostas, please read the following thread. This should help you.

    Kind regards

    presswizards

    (@presswizards)

    I simply turned off the Block Fake Googlebots option in this plugin, and the problem was solved.

    silviuclg

    (@silviuclg)

    This is the solution:

    wpsolutions (@wpsolutions)
    1 month, 2 weeks ago
    Hi @svayam,
    Did you try changing the IP address settings to see if that makes a difference?
    I’m referring to the following settings:
    Settings >> Advanced Settings >> IP Retrieval Settings

    Sometimes people have a hosting setup where there might be load-balancers or proxies and retrieving the IP address via $_SERVER['REMOTE_ADDR'] gives the proxy address instead of the real visitor IP address.
    Try selecting “HTTP_X_FORWARDED_FOR” and see if that improves the situation.
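
One thing to keep in mind with `HTTP_X_FORWARDED_FOR` is that the header can carry a comma-separated chain of addresses (visitor first, then each proxy), so the raw value is not always a single IP. The Python sketch below shows one way to pull out the first valid address. It is illustrative only and makes no claim about how the plugin itself parses the header; `first_forwarded_ip` is a hypothetical helper name.

```python
import ipaddress


def first_forwarded_ip(header_value):
    """Return the first syntactically valid IP in an X-Forwarded-For value.

    Returns None when no entry parses as an IP address. The leftmost entry
    is supplied by the client and can be spoofed, so treat the result as a
    hint unless the request came through a proxy you trust.
    """
    for part in header_value.split(","):
        candidate = part.strip()
        try:
            # Normalise and validate in one step.
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # Skip tokens such as "unknown" or empty strings.
    return None
```

If a DNS-based Googlebot check receives the whole comma-separated string instead of one address, the reverse lookup cannot succeed, which is one possible way a real crawler could end up rejected.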

    svayam

    (@svayam)

    Hi,

    Well I use WP Rocket which already gives me the real IPs. But yes I tried that setting also…unfortunately pages started getting unlisted.

    I use Cloudflare on my site https://rahulmukherjee.com, so what is HTTP_CF_CONNECTING_IP? Is that for Cloudflare?

    Using this HTTP_X_FORWARDED_FOR gets submitted but pages are de-indexed.

    Really can’t understand.

  • The topic ‘Blocking Real GoogleBot Fetches and Indexing!’ is closed to new replies.