• Resolved generosus

    (@generosus)


    *** PUBLIC SERVICE ANNOUNCEMENT ***

    I have thoroughly investigated the bots, Bytedance and Bytespider. As a result, I highly recommend blocking them. Main reasons:

    1. They do not respect robots.txt rules.
    2. They use well-known hosting services to bypass the usual blocking methods. For example, they crawl websites through Amazon AWS, which masks (or re-routes) their origin IPs. Details: https://prnt.sc/7Zst7eUZaxZ3
    3. Their crawling rates are extremely high.
    4. Blocking Bytedance and Bytespider via AS138699 and AS396986 does not help much.
    5. Typically, the bots’ origin IPs geolocate to China. Once the bots’ User Agents are blocked, the origin IPs’ geolocation changes to Singapore (another haven for malicious bots or bad actors).

    In short, I recommend blocking the User Agents, Bytedance and Bytespider, for best results.

    Hope this helps a bit.

    Cheers 🙂

Viewing 15 replies - 1 through 15 (of 16 total)
  • Creato

    (@creatomatic)

    Hello, I am intrigued that you posted this 2 days ago, because it came to my attention around the same time. Do we know more about it?

    It generated 1.4 million hits and 14 GB of traffic on a customer’s website yesterday, mostly originating from the 110.249.x.x and 111.225.x.x networks.

    It does not appear to honour any caching, keep an internal cache, or implement any rate limiting. A shoddy bot by every measure.

    Full UA:

    “Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)”
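    To see where this traffic comes from on your own server, the access log can be tallied by User Agent keyword and address block. The following is a minimal Python sketch, not a definitive tool: it assumes the Apache/nginx "combined" log format (client IP first, User-Agent as the last quoted field), and the function name `tally_bot_hits` is made up for illustration.

```python
import re
from collections import Counter

# Keywords taken from this thread; matching is case-insensitive.
BOT_PATTERN = re.compile(r"bytedance|bytespider", re.IGNORECASE)

# Assumption: "combined" log format, where the client IP is the first field
# and the User-Agent is the last double-quoted field on the line.
LINE_PATTERN = re.compile(r'^(?P<ip>\S+) .* "(?P<ua>[^"]*)"\s*$')


def tally_bot_hits(lines):
    """Count matching bot requests per /16 prefix (e.g. '110.249.x.x')."""
    hits = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line)
        if not match or not BOT_PATTERN.search(match.group("ua")):
            continue  # unparseable line, or not one of the bots
        octets = match.group("ip").split(".")
        if len(octets) == 4:  # IPv4 only; other address forms are skipped
            hits[f"{octets[0]}.{octets[1]}.x.x"] += 1
    return hits
```

    Passing an open log file, for example `tally_bot_hits(open("access.log"))` (the path is hypothetical), returns a `Counter` whose `most_common(10)` lists the busiest prefixes.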

    Best regards


    Generosus, thank you for the warning. I have also had problems with this bot due to its excessive use of resources. I am trying to block it with the following code in the .htaccess file:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(Bytedance|Bytespider).*$ [NC]
    RewriteRule .* - [F,L]

    Do you think this is enough to stop it?
    Thanks in advance
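
    Before deploying a User Agent pattern, it can help to test it against the full UA string quoted earlier in this thread. The sketch below uses Python's `re` module as a stand-in for Apache's regex engine (an assumption; for patterns this simple the two should behave the same), and the helper `matches` is hypothetical. It illustrates that an unanchored keyword match is sufficient, while a pattern anchored with `^` and `$` needs `.*` (not a single `.`) on both sides of the keyword group.

```python
import re

# Full User-Agent string as reported earlier in this thread.
UA = (
    "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
)


def matches(pattern, user_agent=UA):
    """Case-insensitive, unanchored search (similar to RewriteCond with [NC])."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# Keyword only: matches wherever the keyword appears in the UA.
unanchored = matches(r"(Bytedance|Bytespider)")

# A single '.' on each side requires exactly one character before and one
# after the keyword, so a real (long) UA string never matches.
single_dot = matches(r"^.(Bytedance|Bytespider).$")

# '.*' on each side allows any amount of surrounding text.
dot_star = matches(r"^.*(Bytedance|Bytespider).*$")

print(unanchored, single_dot, dot_star)  # True False True
```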

    Thread Starter generosus

    (@generosus)

    Hey @luisdesousa,

    There are several ways you can block the bots. We’re using the following methods:

    1. Cloudflare: Create a WAF custom rule to block the noted User Agents with the keywords “Bytedance” and “Bytespider”
    2. 7G Firewall: Download the 7G Firewall code and add it to your .htaccess file, then add the keywords “bytedance” and “bytespider” to the 7G:[USER AGENT] section of the code. In the future, you can add any other bot you wish to block in the same way.

    Note: It sure would be nice if Wordfence offered the same bot-blocking capability as either Cloudflare or 7G Firewall. Currently, it appears there’s no way to custom-block a User Agent via Wordfence.

    Hope this helps a bit.

    Cheers 🙂

    Thread Starter generosus

    (@generosus)

    Update:

    Please disregard the “Note” above. User Agents, etc. can be custom-blocked using Wordfence. Details: https://prnt.sc/_Gw9phgNU_sW

    Cheers 🙂

    I’ve also been hit by this bot and banned it, in this case via Cloudflare. I can confirm it ignores robots.txt as it aggressively hoovers up content. Hopefully it doesn’t do away with the user agent, otherwise it’ll be whack-a-mole for a while.

    I have repeatedly tried to block Bytespider. I have tried a Wordfence custom block and .htaccess as well, and it is still hitting the website.

    User Agent: Bytespider (in Wordfence)

    I notice that when it visits, it shows “Bytespider;”. Should I also be putting the “;” in?

    I tried using 7G in .htaccess too, but it seems to block only the PHP pages; all PNGs are still going through.

    Any ideas how to stop this? I do not have Cloudflare and am not sure how to deal with this.

    It hit a website with at least 50,000 requests a day, only for images, JS and CSS. I read that this is related to new GPT training.

    I blocked it with Cloudflare.

    Hi,

    How do I block Bytedance in Cloudflare? I have tried many times from the WAF settings, but nothing changed.

    They change their IP addresses very often, so I think it is impossible to block them by IP address?

    I blocked that user agent from Singapore.

    Very useful information in this thread. Thanks @generosus !

    Thread Starter generosus

    (@generosus)

    All,

    Via Cloudflare, it’s easy to block these User Agents. Simply create a new WAF Custom Rule as noted here, then choose the action: Block.

    Cheers 🙂

    Thomas Jarvis

    (@thomasjarvisdesign)

    @generosus

    Thank you for posting this up.

    Imunify360 on Plesk automatically blocks Bytedance if you enable server-wide Captcha.

    But it creates thousands of entries.

    The Bytedance/Bytespider crawler seems to have thousands of IP addresses to attack from, so I also recommend blocking it by User Agent in Wordfence. If they do use a USA server, you won’t really want to block traffic from the USA. But you could arguably do a country-wide block on China/Singapore if you don’t operate in those countries.

    Yes, I already blocked it some time ago with a WAF rule in Cloudflare. I also block this and other bots with an nginx rule on my server.

    Thread Starter generosus

    (@generosus)

    All,

    Here’s another sneaky one from Bytedance: TIKTOK-AS-AP TIKTOK PTE. LTD (AS138699)

    The malicious bot (Bytedance) is also routing its queries through Malaysia.

    Highly recommend blocking the above ASN as well.

    Cheers!
