• Resolved djwilko12

    (@djwilko12)

    Hello,

    I have noticed an issue with URLs related to LiteSpeed Cache being crawled by Google in Google Search Console. These URLs point to JavaScript files within the /wp-content/litespeed/js/ directory, and there are hundreds of them appearing in the indexation report.

    This behavior is unexpected, as such URLs are not meant to be indexed by search engines.

      It seems that these URLs are being crawled despite the fact that they should ideally be excluded from indexation by default.

      Could you please investigate whether this behavior might be caused by the plugin’s configuration or how it generates these resource URLs? If it’s an issue within the plugin, I would appreciate a fix or guidance on how to resolve it.

      Thank you in advance for your assistance.

    Viewing 15 replies - 1 through 15 (of 15 total)
    • Plugin Support litetim

      (@litetim)

      Thank you for posting the issue. We will fix this issue in the next major version of the plugin (version 7).

      Thread Starter djwilko12

      (@djwilko12)

      Ok, thanks. When do you plan to release it? Have you identified the issue? Is it about file permissions?

      Plugin Support litetim

      (@litetim)

      It's not a permission issue. We added a robots.txt in wp-content/litespeed with the content:
      User-agent: *
      Disallow: /

      See here: https://github.com/litespeedtech/lscache_wp/commit/a0cd0dc097cab64687d6eb1405c69b4182db30f2#diff-6bc3ed3c014016bd11f372afc394425a561e41bd48eb307569b06fd325cd68e8R156

      Thread Starter djwilko12

      (@djwilko12)

      Hi,

      When will you release this?

      Plugin Support litetim

      (@litetim)

      In February, as fast as possible 🙂

      Plugin Support litetim

      (@litetim)

      @djwilko12 hey! great news 🙂
      We fixed the issue in V7; you can update it from your admin area.

      Thank you for waiting 🙂

      I don’t think disallowing via robots.txt is the correct solution. That is just a directive that bots can choose to ignore. It does not keep pages out of search results, especially if they have already been indexed. Should the directory not instead be blocked from indexing with “noindex”, e.g. with an X-Robots-Tag HTTP header?
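      As an illustrative sketch of the X-Robots-Tag suggestion (this is not the plugin’s actual code; it assumes an Apache server with mod_headers enabled, and the file placement is hypothetical): a .htaccess file inside wp-content/litespeed/ could attach the header to every resource served from that directory.

      ```apache
      # Hypothetical .htaccess for wp-content/litespeed/
      # Requires Apache with mod_headers enabled
      <IfModule mod_headers.c>
          # Tell crawlers not to index any resource served from this directory
          Header set X-Robots-Tag "noindex"
      </IfModule>
      ```

      Unlike a robots.txt Disallow, this header removes already-indexed URLs from search results once they are recrawled, since the crawler is allowed to fetch the file and sees the noindex instruction.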

      Plugin Support litetim

      (@litetim)

      @davidpeake yes, bots can ignore it, but all the big bots will follow the rules from that file and will stop crawling.
      I will forward the idea to the devs, and if other improvements come up, they will be added to the plugin.

      Thread Starter djwilko12

      (@djwilko12)

      Thank you @litetim for letting me know.

      I have updated to the latest version, and when I check the robots.txt it shows the same as before. Could you please check? I cannot find any new directive added…

      Plugin Support litetim

      (@litetim)

      @djwilko12
      There should be a robots.txt in /public_html/wp-content/litespeed. Please confirm you have it.
      Also, please tell me how you are testing.

      Thank you!

      Thread Starter djwilko12

      (@djwilko12)

      Yes, I know, but it is not showing up. I just go to mywebsite.com/robots.txt. I see the same information as I had before; there is no new line related to LiteSpeed.

      Plugin Support litetim

      (@litetim)

      @djwilko12 I do not fully understand your message.
      You have the robots.txt, and its content will stop all bots from crawling that folder and its subfolders.

      Thread Starter djwilko12

      (@djwilko12)

      Yes, of course I have robots.txt content. But what I’m trying to explain is that after updating to your latest version, the robots.txt rules that I have are the same as before. This means your update didn’t add any new lines related to blocking “/public_html/wp-content/litespeed”.

      Plugin Support litetim

      (@litetim)

      @djwilko12
      Ok, got it.
      The thing is that LSC did not have a robots.txt before. That’s why we considered it an improvement.

      Thread Starter djwilko12

      (@djwilko12)

      Hi @litetim ,

      Thanks for your response, but I honestly don’t see the point of adding a robots.txt file inside the /wp-content/litespeed/ subfolder.

      Search engines like Google only check the robots.txt file located at the root of the site — for example, https://example.com/robots.txt.
      Any robots.txt placed in a subdirectory is simply ignored and not part of the crawling standard.

      So, this current solution doesn’t really prevent bots from crawling or indexing those files.

      If the goal is to prevent Google from indexing anything under /wp-content/litespeed/, then I believe there are only two proper ways to handle it — and I’d strongly suggest implementing either of these at the plugin level:

      1. Add a Disallow: /wp-content/litespeed/ directive to the main robots.txt file, or
      2. Serve an X-Robots-Tag: noindex HTTP header for those resources.
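      As an illustrative sketch of option 1 (using Python’s standard urllib.robotparser; the domain and file name below are made up), this is how a compliant crawler would interpret a Disallow rule in the root robots.txt:

      ```python
      from urllib import robotparser

      # Sketch: the rules that option 1 would add to the ROOT robots.txt.
      # They are fed to the parser directly instead of fetched from a live site.
      rules = """\
      User-agent: *
      Disallow: /wp-content/litespeed/
      """

      rp = robotparser.RobotFileParser()
      rp.parse(rules.splitlines())

      # A generated cache asset is now off-limits to compliant crawlers...
      print(rp.can_fetch("*", "https://example.com/wp-content/litespeed/js/abc123.js"))  # False
      # ...while normal pages remain crawlable.
      print(rp.can_fetch("*", "https://example.com/some-page/"))  # True
      ```

      Note this only stops compliant crawlers from fetching the files; option 2 (the noindex header) is what actually gets already-indexed URLs dropped from search results.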

      I appreciate your efforts, but I think this needs to be revisited — the current approach doesn’t effectively solve the issue.
