• Resolved Lee

    (@leecollings)


    Apologies if this has nothing to do with this plugin, but I feel it’s related, and I could really do with the help or advice.

    Since around the start of this month (around when WP 6.9 was released), I’ve started getting a bunch of 404 links showing up in Google Search Console:

    /wp-*.php
    /*
    /wp-content/*
    /wp-content/plugins/*
    /wp-content/themes/[theme]/*
    /wp-content/uploads/

    Obviously, I don’t have any links on any of my pages pointing to these, and searching the page source for the first example turned it up in a block of speculation rules (which seems to have been added in 6.8).

    But something must have changed in 6.9 that is causing these patterns to be fetched as actual URLs, resulting in GSC reporting them as 404 errors, which is wrong.

    Can anyone tell me how this can be rectified, if it’s been acknowledged internally, and if there’s anything that can be done?

    I’m not sure if this plugin will exactly help this issue, so as I said at the start, apologies if it’s unrelated – but I have a feeling it’s not.

    Can anyone help me?

Viewing 4 replies - 1 through 4 (of 4 total)
  • Plugin Support tunetheweb

    (@tunetheweb)

    I don’t think this is related to 6.9, as I’ve heard of this happening before. I followed up with the Search team and they confirmed they will attempt to fetch anything that looks like a URL, including the patterns in Speculation Rules JSON syntax. They have no plans to change this that I’m aware of, and they say the 404s can be ignored.

    I know of one plugin that, because of this, moved from inline Speculation Rules in the HTML to a JSON file referenced with an HTTP header. See here: https://developer.chrome.com/docs/web-platform/prerender-pages#speculation-rules-http-header
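    For reference, per that doc the header-based approach points browsers at an external rules file instead of inline JSON. An illustrative Apache sketch (the file name and path here are assumptions, not from the plugin in question):

    ```apache
    # Illustrative sketch only — file name/path are assumptions.
    # Reference an external rules file via the Speculation-Rules header:
    Header add Speculation-Rules "\"/speculationrules.json\""

    # The rules file must be served with this MIME type:
    <Files "speculationrules.json">
        Header set Content-Type "application/speculationrules+json"
    </Files>
    ```

    Since the URL patterns then live in a separate file rather than in the page HTML, a scanner that only reads the HTML wouldn’t see them.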

    Maybe that’s something this plugin and/or WordPress core should consider?

    Plugin Author Weston Ruter

    (@westonruter)

    @tunetheweb Interesting. So given this:

    <script type="speculationrules">
    {
        "prerender": [
            {
                "source": "document",
                "where": {
                    "and": [
                        { "href_matches": "/*" },
                        {
                            "not": {
                                "href_matches": [
                                    "/wp-*.php",
                                    "/wp-admin/*",
                                    "/wp-content/uploads/*",
                                    "/wp-content/*",
                                    "/wp-content/plugins/*",
                                    "/wp-content/themes/twentytwentyfive/*",
                                    "/*\\?(.+)"
                                ]
                            }
                        },
                        { "not": { "selector_matches": "a[rel~=\"nofollow\"]" } },
                        { "not": { "selector_matches": ".no-prerender, .no-prerender a" } },
                        { "not": { "selector_matches": ".no-prefetch, .no-prefetch a" } }
                    ]
                },
                "eagerness": "moderate"
            }
        ]
    }
    </script>

    You’re saying that Search may discover what look like URLs inside the SCRIPT tag and try to crawl them (e.g. /wp-content/themes/twentytwentyfive/*), even though they aren’t actually links? That’s surprising.

    Plugin Support tunetheweb

    (@tunetheweb)

    Correct. It’s like a very primitive preload scanner.

    This is easily shown, btw. Add something like the above (maybe something even more unique) and within a few hours you’ll see GoogleBot crawling it in your web server logs.

    Apparently it causes no problems as failures are almost expected with this. But it does create noise in GSC.

    Plugin Author Weston Ruter

    (@westonruter)

    What if the slashes were escaped? Would that stop Googlebot from seeing them as URLs to crawl? So instead of the above, something like this:

    \/wp-content\/themes\/twentytwentyfive\/*

    Still, it seems like Googlebot could be a tiny bit smarter about how it discovers possible URLs to crawl, like automatically excluding speculationrules scripts.
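    For what it’s worth, `\/` is a valid JSON escape for `/`, and both forms decode to exactly the same string, so browsers would parse the rules identically either way. Whether the escaping actually hides the pattern from Googlebot’s scanner would need testing. A quick Python check of the equivalence:

    ```python
    import json

    # "\/" is a legal JSON escape for "/" — both spellings decode to the
    # identical string, so escaping changes only the raw bytes a naive
    # URL scanner sees, not what the speculation-rules parser receives.
    escaped = json.loads('"\\/wp-content\\/themes\\/twentytwentyfive\\/*"')
    plain = json.loads('"/wp-content/themes/twentytwentyfive/*"')

    assert escaped == plain
    print(escaped)  # → /wp-content/themes/twentytwentyfive/*
    ```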

