Plugin: LiteSpeed Cache: crawler blacklists all URLs

  • Resolved chipro

    (@chipro)


    Hello!
    The crawler blacklists all URLs (red status).
    How can I debug / see what the reason is?
    I did not find anything useful in the logs.


Viewing 11 replies - 1 through 11 (of 11 total)
  • chipro

    (@chipro)

    10/23/20 13:05:47.999 [37.120.131.40:8860 1 psI] 🐞 Init
    10/23/20 13:05:47.999 [37.120.131.40:8860 1 psI] [Router] parsed type: start
    10/23/20 13:05:47.999 [37.120.131.40:8860 1 psI] 🐞 ……crawler manually ran……
    10/23/20 13:05:47.999 [37.120.131.40:8860 1 psI] 🐞 ……crawler started……
    10/23/20 13:05:48.020 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /transalpina-cea-mai-inalta-sosea-din-romania/
    10/23/20 13:05:48.021 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /muntele-basarab/
    10/23/20 13:05:48.022 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /manastirea-cisterciana-carta/
    10/23/20 13:05:48.023 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /biserica-fortificata-din-biertan/
    10/23/20 13:05:48.024 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /valea-frumoasei-pe-bicicleta/
    10/23/20 13:05:48.025 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /mohoru-uriasul-domol-din-parang-2337m/
    10/23/20 13:05:48.026 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /podul-lui-dumnezeu/
    10/23/20 13:05:48.027 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /piatra-closanilor-necunoscutul-din-muntii-mehedinti/
    10/23/20 13:05:48.028 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /podul-natural-valja-prerast/
    10/23/20 13:05:48.029 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /majdanpek-si-varful-starica-768m-serbia/
    10/23/20 13:05:48.030 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /cetatea-golubac-serbia/
    10/23/20 13:05:48.031 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /trescovat-neckul-vulcanic-de-la-dunare/
    10/23/20 13:05:48.031 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /biserica-fortificata-din-hosman/
    10/23/20 13:05:48.032 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /toamna-in-dosul-builei-buila-vanturarita/
    10/23/20 13:05:48.033 [37.120.131.40:8860 1 psI] 🐞 Crawling [url] /satul-olari-horezu/

    chipro

    (@chipro)

    Report: LHRJDGXE

    Thank you!

    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    Please edit the file /litespeed-cache/src/crawler.cls.php.

    At line 586 you will see private function _status_parse( $header, $code ) {

    Insert these 2 lines BELOW it:

    error_log($code);
    error_log($header);

    Then, at line 439, you will see if ( empty( $this->_summary[ 'crawler_stats'

    Add this line BEFORE line 439:

    error_log($row['url']);

    Then clear your blacklist and run the crawler again.

    This will log each response to your PHP error log file.

    Now check the blacklist, look up the blacklisted URLs in the error log, and post what it contains.
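
The line numbers 586 and 439 belong to one plugin release and may differ in other versions. A minimal shell sketch for finding the two spots by their text instead; the function name locate_patch_points is made up for illustration, and the file path has to be adjusted to wherever the plugin is installed:

```shell
#!/bin/sh
# locate_patch_points FILE
# Print "line-number:text" for the two places named in the reply above.
# -F matches fixed strings, so brackets and quotes are taken literally.
# Hypothetical helper, not part of LiteSpeed Cache.
locate_patch_points() {
    grep -n -F \
        -e 'function _status_parse(' \
        -e "_summary[ 'crawler_stats'" \
        "$1"
}
```

Usage would look like: locate_patch_points /path/to/litespeed-cache/src/crawler.cls.php. If the second pattern matches more than one line, the one inside the crawl loop is the one meant here.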

    Best regards,

    chipro

    (@chipro)

    Hello! Here’s the output:

    2020-10-26 09:44:32.737884 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.737893 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/tuturdanu/
    2020-10-26 09:44:32.738629 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.738663 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.738671 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/tyulenovo/
    2020-10-26 09:44:32.739412 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.739426 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.739431 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/urdele/
    2020-10-26 09:44:32.740187 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.740201 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.740206 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/ursache/
    2020-10-26 09:44:32.740947 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.740962 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.740967 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/vaideeni/
    2020-10-26 09:44:32.741766 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.741787 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.741793 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/valcea/
    2020-10-26 09:44:32.742532 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.742566 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.742576 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/valea-frumoasei/
    2020-10-26 09:44:32.743305 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.743320 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.743326 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/valea-lazaurlui/
    2020-10-26 09:44:32.744143 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.744163 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.744171 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/valea-mare/
    2020-10-26 09:44:32.744925 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.744944 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.744953 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/valea-marii/
    2020-10-26 09:44:32.745697 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.745710 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.745718 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/valea-oltului/
    2020-10-26 09:44:32.746533 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.746560 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.746569 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/valja-prerast/
    2020-10-26 09:44:32.747372 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.747388 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.747396 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/vanturarita/
    2020-10-26 09:44:32.748145 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.748161 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.748167 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/vizita/
    2020-10-26 09:44:32.748870 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.748884 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.748895 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/vladeasa/
    2020-10-26 09:44:32.749676 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.749690 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.749695 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/vulcanic/
    2020-10-26 09:44:32.750438 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 0
    2020-10-26 09:44:32.750484 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] 
    2020-10-26 09:44:32.750493 [NOTICE] [45.132.244.92:58898:HTTP2-1#doihoinari.ro] [STDERR] /tag/zanoaga/
    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    OK, so for whatever reason

    [STDERR] /tag/tuturdanu/
    [STDERR] 0

    it returns “0”.

    This is how it is supposed to look:

    [STDERR] /about-us
    [STDERR] 200
    [STDERR] HTTP/1.1 200 OK
    content-type: text/html; charset=UTF-8
    x-litespeed-cache: hit
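
To see at a glance which URLs came back with code 0, the logged lines can be paired up per URL. A rough sketch with awk, assuming the log layout shown in this thread (a URL path after [STDERR], then a line holding only the numeric code); summarize_crawl_log is a hypothetical name:

```shell
#!/bin/sh
# summarize_crawl_log [FILE...]
# Print "<code> <url>" for every URL that is followed by a numeric code.
# Reads stdin when no file is given. Assumes URLs are logged as paths
# starting with "/", as in the log excerpt in this thread.
summarize_crawl_log() {
    awk '
        {
            # Keep only the text after the "[STDERR]" tag; skip other lines.
            idx = index($0, "[STDERR]")
            if (idx == 0) next
            val = substr($0, idx + 8)
            gsub(/^[ \t]+|[ \t\r]+$/, "", val)
        }
        # A value starting with "/" is the crawled URL path.
        val ~ /^\// { url = val; next }
        # A purely numeric value is the status code for the last URL seen.
        val ~ /^[0-9]+$/ && url != "" { print val, url; url = "" }
    ' "$@"
}
```

For example, summarize_crawl_log /path/to/error.log | grep '^0 ' would list only the URLs that were logged with code 0.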

    Please double-check that you have put the code in the right location.

    Also, run this in your server terminal:

    curl -I -XGET https://your-domain.com/tag/tuturdanu/

    and

    curl -I -XGET --resolve your-domain.com:443:123.123.123.123 https://your-domain.com/tag/tuturdanu/

    * Replace your-domain.com with your actual domain, and 123.123.123.123 with your server IP.

    Then see what it returns.
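
The two checks can also be wrapped so that they print only the status code. This is a sketch under a few assumptions: the helper names are invented here, and port 443 is assumed for --resolve. The curl options used (-s, -o, -w '%{http_code}', --resolve) are standard. curl prints 000 when it received no HTTP response at all, which may be the same situation as the 0 in the log, though that link is an assumption.

```shell
#!/bin/sh
# url_host URL: print the host part of an http(s) URL.
url_host() {
    printf '%s\n' "$1" | sed -E 's#^[a-zA-Z]+://([^/:]+).*#\1#'
}

# describe_status CODE: turn a status code into a short label.
describe_status() {
    case $1 in
        0|000) echo "no HTTP response" ;;
        2??)   echo "ok" ;;
        3??)   echo "redirect" ;;
        *)     echo "error" ;;
    esac
}

# probe URL [IP]: print the status code, a label, and the URL.
# With IP given, the URL's host is pinned to that address (port 443 assumed).
probe() {
    if [ -n "${2:-}" ]; then
        code=$(curl -s -o /dev/null -w '%{http_code}' \
            --resolve "$(url_host "$1"):443:$2" "$1")
    else
        code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    fi
    echo "$code $(describe_status "$code") $1"
}
```

Usage would be probe https://your-domain.com/tag/tuturdanu/ and then probe https://your-domain.com/tag/tuturdanu/ 123.123.123.123, with the same placeholders as in the commands above. A "redirect" result suggests the URL being requested is not the final one.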

    Best regards,

    chipro

    (@chipro)

    Hello, I double-checked, and the code seems to be placed correctly:

    https://pasteboard.co/Jxyg5EZ.png
    https://pasteboard.co/Jxygkva.png

    Also, here’s the output of the 2 curl queries:
    https://pasteboard.co/Jxyguyu.png
    https://pasteboard.co/JxygA9J.png

    As you can see, the site redirects to add www to the URLs (and also redirects from http to https).

    Thanks a lot!

    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    Please check whether all links in your sitemap use https://.

    If you didn’t set a custom sitemap, please make sure that in WP Settings -> General both the site URL and the home URL are set to https://.
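
One way to run that sitemap check from a terminal is to list every <loc> entry that does not start with https://. A minimal sketch, assuming each <loc> value sits on a single line; the function names are illustrative only:

```shell
#!/bin/sh
# sitemap_locs: read sitemap XML on stdin, print one <loc> value per line.
# Splitting on "<" leaves each value on a line that starts with "loc>".
sitemap_locs() {
    tr '<' '\n' | sed -n 's/^loc>[[:space:]]*//p' | sed 's/[[:space:]]*$//'
}

# non_https_locs: print only the entries that do not use https://.
non_https_locs() {
    sitemap_locs | awk '!/^https:\/\//'
}
```

It could be fed with something like curl -s <sitemap URL> | non_https_locs, where the sitemap URL is whatever the site actually serves. Empty output means every entry uses https://. A sitemap index only lists child sitemaps, so each child would need the same check.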

    Best regards,

    Plugin Support qtwrk

    (@qtwrk)

    No, sorry.

    In my curl command example, replace your-domain.com with your site's domain. If you use www as the main domain, include it as well.

    chipro

    (@chipro)

    Hmmm, since you brought up the sitemap issue, I added the sitemap address (generated by Yoast SEO) to Custom Sitemap in the LiteSpeed Cache plugin, and now the crawler works.

    I have noticed that the crawler map now shows the full URL, not just /tag/tuturdanu/. Before I added the custom sitemap, the crawler map did not show the full URL.

    Shall I mark this as solved, or do you need more information?

    Thank you!

    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    That’s OK, we always recommend using a custom sitemap.

    Best regards,

    chipro

    (@chipro)

    Thank you for your support!
