• Resolved TheSmartOne

    (@thesmartone)


    Hi,

    Your plugin is what I needed after some days of searching. It looks and works awesome.
    Great support, guys like you rock.

    <Sharing my issue with others so it can help.>

    In short:
    Can you help me make a working expression to 404 all URLs ending in .jp?
    Can you help me make a working expression to 404 all URLs with the keyword u_b in the URL?

    Story (interesting)

    Due to a vulnerability in an old Joomla version, our Chinese friends were able to gain access to my Webmaster Tools console as owner by dropping a verification file in the root. With that access they submitted thousands of spam links to be indexed, which were copied to my server too.

    After dropping the bad, bad idea of trying to upgrade Joomla (years behind, paid plugins), I migrated successfully to WordPress (should have done it ages ago).
    Webmaster Tools successfully indexed 4000 links, and Google search and cache are just full of this Chinese garbage.

    I cleaned everything and resubmitted a sitemap, but Google does not allow you to unregister spam links in BULK. Great. It only does it one by one.

    More annoying: WordPress seems to throw a 200 (OK) or a 301 (redirect) > 200 (OK) for any of these spam links, and apparently for almost anything I add to the URL. Even with plugins disabled, I was unable to generate a single 404 at all; everything ends up at the search > not found page (but with 200 (OK)). So Google thinks it's still valid. Empty, but valid. Damned.

    For the crawling, I excluded the spam links in robots.txt like this:
    Disallow: *.jp* #block access to crawlers for URLS containing .jp

    Ideally I would have them crawled and given a noindex, but I am not sure if this is possible, since that is valid for migrating pages.

    Next step: make WordPress bloody throw a 404 for all spam URLs except my normal pages and posts.

    I edited my 404.php to be completely empty and added via notepad and plain text
    404 – page not found
    , so no template, archive, author or tag links etc. are found. The oldest 404 page on the internet, to make sure Google understands the links are not here anymore.

    Spam link Type 1:
    Send similar URLs to 404. Adding the whole line worked like a charm:
    http://www.example.com/?bqp-kd6014c/u_b-jbc/u_b-jClsQa-a2b7b.jp
    so I sent /?bqp-kd6014c/u_b-jbc/u_b-jClsQa-a2b7b.jp to 404 (server response!)

    Tested with http://tools.seobook.com/server-header-checker/ Bam, it throws a 404 (Not Found), opens the 404.php in my theme folder, and shows in the browser as http://www.example.com/404.php with plain text:
    404 – page not found.
    Success!

    Now my issue: I could not get the .jp ending working with any of: *.jp / *.jp*, / ?*jp / …
    – Can you help me make a working expression to 404 all URLs ending in .jp?
    – Can you help me make a working expression to 404 all URLs with the keyword u_b in the URL?

    Spam link Type 2 (found in Google search exactly as below, 2 domains in 1 URL; how can Google index this?):

    http://www.example.com/http:/101.99.65.250:8086/wp-content/themes/bookstore/tgu87e1/bnfn7me.php?5wrDqpE_f_WEe6014c/u_b-j5c/u_b-jrHuFvTmdEeinAheKfeE98e94.jp

    Note: URLs ending in .jp will cover both spam links, but the keyword can be useful for me, and for others later.

    Thinking about donating :))

    • This topic was modified 5 years, 8 months ago by TheSmartOne.
Viewing 5 replies - 1 through 5 (of 5 total)
  • Thread Starter TheSmartOne

    (@thesmartone)

    So, does anyone understand regex well enough to exclude URLs containing the word u_b?

    You could try the following:

    For URLs ending in .jp:
    Source: /.*\.jp$

    For URLs including u_b:
    Source: /.*u_b.*

    When matched: Error (404) with HTTP code: 404/403… (take your pick)
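
For readers who want to try the two suggested expressions before saving a rule, here is a minimal Python sketch. It is only an approximation: it assumes the plugin applies the expression as an unanchored search against the path plus query string, and the `.jpg` sample path and the variant without the trailing `$` are hypothetical additions for comparison, not part of the suggestion above.

```python
import re

# The two expressions quoted in the reply above.
ENDS_IN_JP = re.compile(r"/.*\.jp$")
CONTAINS_U_B = re.compile(r"/.*u_b.*")

# Hypothetical variant WITHOUT the trailing "$", for comparison only.
JP_UNANCHORED = re.compile(r"/.*\.jp")

SPAM_TYPE_1 = "/?bqp-kd6014c/u_b-jbc/u_b-jClsQa-a2b7b.jp"
SPAM_TYPE_2 = (
    "/http:/101.99.65.250:8086/wp-content/themes/bookstore/tgu87e1/"
    "bnfn7me.php?5wrDqpE_f_WEe6014c/u_b-j5c/u_b-jrHuFvTmdEeinAheKfeE98e94.jp"
)
# Hypothetical legitimate URLs, not taken from the thread.
IMAGE = "/wp-content/uploads/photo.jpg"
PAGE = "/about/"


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    # Assumption: an unanchored search approximates how the rule is applied.
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in (SPAM_TYPE_1, SPAM_TYPE_2, IMAGE, PAGE):
        print(
            url[:40].ljust(40),
            "jp$:", matches(ENDS_IN_JP, url),
            "u_b:", matches(CONTAINS_U_B, url),
            "jp (no $):", matches(JP_UNANCHORED, url),
        )
```

In this sketch the `$` is what keeps image URLs ending in `.jpg` out of the `.jp` rule; dropping it makes the expression match any URL that merely contains `.jp`.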

    Thread Starter TheSmartOne

    (@thesmartone)

    Hi @cbrandt

    Working like a charm mister!!
    I used the u_b since the .jp was excluding .jpg and .jpeg

    As a reference for others I chose:
    SOURCE URL: /.*u_b.*
    TITLE: (optional) My China sucks redirect
    MATCH: URL only
    WHEN MATCHED: Error (404) WITH HTTP CODE: 410 – Gone
    GROUP: Redirections (or any other group name you chose/created) POSITION 0

    I had already added a separate redirection for each spam URL (about 300), and just noticed that this slowed my site down to the point of timing out. So I disabled those, keeping just this 1 rule.

    Result: see server response check below

    I chose 410 – Gone since 404 can be a temporary error, and 403 might mean the page exists but Google has no access. With 410 I’m indicating this spammy link is Gone – forever.

    I have removed the *.u_b* from my robots.txt so Google can crawl all spam links, see they don’t exist, and remove them forever from index and cache.

    I’ll donate for this kick-ass plug-in. Thanks guys.

    Server response check using http://tools.seobook.com/server-header-checker

    > GET http://www.mysite.com/?jsdfjfdfu_bdsdd.jp HTTP/1.1
    > Host:
    > User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246
    Response
    < HTTP/1.1 410 Gone
    < Date: Tue, 05 Feb 2019 18:05:38 GMT
    < Server: Apache/2
    < Pragma: no-cache
    < Expires: Wed, 11 Jan 1984 05:00:00 GMT
    < Cache-Control: no-cache, must-revalidate, max-age=0
    < X-Redirect-Agent: redirection
    < Link: ; rel=shortlink
    < Set-Cookie: PHPSESSID=e3ptr3b73a84deo4qbpvv2h6m6; path=/
    < Vary: Accept-Encoding,User-Agent
    < Transfer-Encoding: chunked
    < Content-Type: text/html; charset=UTF-8

    My other pages nicely show a 200 OK response

    Final response
    < HTTP/1.1 200 OK
    < Date: Tue, 05 Feb 2019 18:05:38 GMT
    < Server: Apache/2
    < Expires: Thu, 19 Nov 1981 08:52:00 GMT
    < Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    < Pragma: no-cache
    < X-Pingback:
    < Link: ; rel=shortlink
    < Set-Cookie: PHPSESSID=kp1nrjh7d7ehap936b5md6nl34; path=/
    < Vary: Accept-Encoding,User-Agent
    < Transfer-Encoding: chunked
    < Content-Type: text/html; charset=UTF-8
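
The same kind of check can be done without an online tool. Below is a minimal sketch using only the Python standard library; the function names `status_of` and `reason` and the User-Agent string are hypothetical, and the URL in the usage line is the placeholder from the check above. Note that `urllib` follows redirects, so the value returned is the final status, comparable to the "Final response" shown above.

```python
import urllib.error
import urllib.request
from http import HTTPStatus


def status_of(url, timeout=10.0):
    """Return the final HTTP status code of a GET request to url."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-check/0.1"}  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still the server's answer.
        return err.code


def reason(code):
    """Return the standard reason phrase for a status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown"


if __name__ == "__main__":
    # Placeholder URL from the check above; replace it with your own site.
    url = "http://www.mysite.com/?jsdfjfdfu_bdsdd.jp"
    code = status_of(url)
    print(code, reason(code))
```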

    • This reply was modified 5 years, 8 months ago by TheSmartOne.

    I’m glad it worked. And curious about

    used the u_b since the .jp was excluding .jpg and .jpeg

    Are you sure you didn’t forget the dollar sign ($) right after “jp” in the suggested regex?

    Thread Starter TheSmartOne

    (@thesmartone)

    Correct, you are right. I was talking about the exclusion I made in robots.txt before.
    Since I was lucky to have 2 patterns in the spam links, I used *u_b* instead of .jp* there.

    So for the redirection rule, I also opted to just use your regex for u_b, which works flawlessly.

    I’m pretty sure your solution of jp will work too, just haven’t tested it yet.
    Thinking about writing an article about it 🙂
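
To illustrate why the robots.txt exclusion with .jp* also caught .jpg and .jpeg files, here is a rough Python sketch. It assumes only the two common robots.txt wildcard rules (`*` matches any run of characters, a trailing `$` anchors the end of the path) and is not a full robots.txt parser; the function names and the image path are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: "*" matches any sequence of characters and a trailing
    "$" anchors the end of the path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if the Disallow pattern would cover the given path."""
    return robots_pattern_to_regex(pattern).search(path) is not None


if __name__ == "__main__":
    spam = "/?bqp-kd6014c/u_b-jbc/u_b-jClsQa-a2b7b.jp"
    image = "/wp-content/uploads/photo.jpg"  # hypothetical legitimate file
    for rule in ("*.jp*", "*u_b*", "*.jp$"):
        print(rule.ljust(8), "spam:", is_blocked(rule, spam),
              "image:", is_blocked(rule, image))
```

Under these assumptions, `*.jp*` covers both the spam link and the image, while `*u_b*` and `*.jp$` cover only the spam link.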

  • The topic ‘Redirect chinese spam links to 404 endig in *.jp’ is closed to new replies.