• PJ Brunet

    (@knowingart_com)


    I could simply rename the file to index2.php, change the lines in .htaccess, but will that break the rest of the code?

    I have bad listings in Google from the previous owner (a cybersquatter) of the domain (who also used an index.php) and I need to make sure that this:

    http://example/index.php?a=cat&cc=20.html

    results in a 404 error. Right now it defaults to the homepage somehow. Is this simply an .htaccess issue? I need real 404 errors for nonexistent pages, otherwise Google will think the previous owner’s pages still exist. Eeek!

Viewing 10 replies - 1 through 10 (of 10 total)
  • I shouldn’t worry about it. I have just tried appending index.php?a=cat&cc=20.html to several WordPress blogs, and all return index.php on its own.
    Google lists your pages fine and in time the pages on your site not linked to will evaporate, it should not be a problem.
    Google does not edit listings down if it thinks they are bad; it only applies its algorithm. Of course it does ban some sites it sees as bad, but since some of your pages are listed, this is not the case.
    Keep up the quality content and your ranking will increase.

    Thread Starter PJ Brunet

    (@knowingart_com)

    I thought the same, nearly a year ago. But, this cybersquatter’s porn links are still in the domain’s supplemental listings, almost a year later! I need to have these links removed ASAP but I can’t do that until these invalid pages return 404.

    “Google lists your pages fine and in time the pages on your site not linked to will evaporate, it should not be a problem.”

    You’re wrong, they do not just evaporate. Like I said, it’s been almost a year and the porn content still comes up when you do a “site:” search.

    The other issue is that this is a doctor’s website and these supplemental results can scare people if they do a “site:” or related search. Google will probably never recrawl these pages either because nothing links to them.

    Thread Starter PJ Brunet

    (@knowingart_com)

    Well, it sounds like this is not an .htaccess issue, because I looked at my .htaccess and I didn’t see anything. And like you said, every WordPress blog is doing the same thing, so this is probably an index.php issue. So somehow I need to get index.php to return 404s for invalid parameters? Wonderful. If I get desperate I might just take the whole site down and 404 everything, and hope that I can get these URLs removed quickly.

    Did you try to have a 404 page/file in your theme?

    The 404 works fine, identical to other WordPress blogs, but when spurious arguments are passed to index.php it just ignores them, and KnowingArt_com wants a 404 to be returned.
    I did a site: search before and I have done one again and I do not see any porn links in google.
    If you were thinking about taking the whole site down, you could avoid this by moving wordpress to a subdirectory and then creating and then editing your 404 page to explain to visitors where the site had gone. This would then break all the links.

    Thread Starter PJ Brunet

    (@knowingart_com)

    Yes, dje, it sounds like I would have to edit index.php myself to return 404, and that sounds like a lot of trouble. Google says this process to remove a URL takes “3-5 business days”, so I’ll probably rename index.php to index2.php and put up something like a temporary index.html and hope that she doesn’t lose her great position in MSN in those 3-5 days.

    Thread Starter PJ Brunet

    (@knowingart_com)

    Or maybe I can figure out how to get .htaccess to force 404 errors on just the 14 particular index.php?whatever URLs that I’m having a problem with.
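
    One way to sketch that idea, assuming Apache with mod_rewrite enabled and a version that accepts a non-redirect status code in the R flag (which I believe is the behavior documented for Apache 2.2 and later): match each unwanted query string exactly and leave every other request alone. The first query string is the one from the original post; the second is a hypothetical placeholder standing in for another entry on the list.

    ```apache
    # Sketch only: return 404 for specific index.php query strings.
    RewriteEngine On

    # Query string from the original post (index.php?a=cat&cc=20.html).
    RewriteCond %{QUERY_STRING} ^a=cat&cc=20\.html$ [OR]
    # Hypothetical second entry; replace it with a real URL from your own list.
    RewriteCond %{QUERY_STRING} ^a=cat&cc=21\.html$
    # "-" means "no substitution"; R=404 sets the response status.
    RewriteRule ^index\.php$ - [R=404,L]
    ```

    Anchoring each pattern with ^ and $ keeps the rule from catching legitimate WordPress query strings that merely contain the same text.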

    Thread Starter PJ Brunet

    (@knowingart_com)

    Ok, many moons later I have an .htaccess solution that seems to work:

    RewriteCond %{query_string} a=(cat)
    RewriteRule ^(.*) [R=404]

    All the problem URLs in Google had “a=cat” in common. IMHO .htaccess documentation is terrible. Anyway, what I think this does is it matches index.php?whatever with: ^(.*) Now I think that typically “RewriteRule” is used to make an ugly URL look pretty by looking for pretty URLs to make ugly. In this case we are not looking for a pretty URL to make ugly. We’re actually looking for an ugly URL to do nothing with.

    The “php?blah” problem with RewriteRule and Redirect is that “blah” is ignored, so if you’re a WordPress person that’s probably a bad thing. So you need this “query_string” test to catch the bad URL. The crazy thing about .htaccess is that it looks for a RewriteCond to be above a RewriteRule. It does the first part of the RewriteRule match test, then I think it actually goes up to test the query_string, and if that matches too, then it comes back down to RewriteRule and changes the URL to something else.

    In this case I couldn’t get RewriteRule to change my bad index.php?a=cat to “bad.html” for instance. I would always get error 500. So my temporary? solution to this problem is to just leave out the parameter for the new filename, so when .htaccess sees index.php?a=cat it says:

    The requested URL /[R=404] was not found on this server.

    Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

    I wish it would say “index.php?a=cat” was not found on this server, but at least I get the 404 error I was looking for. And, the original index.php works as it should, so I’m happy about that. And, in case you are wondering, R=404 tells .htaccess to “redirect” with a 404 error. But, since we are 404’ing there’s probably no reason to redirect anywhere, so I’m guessing that only the regexp match is necessary and the second parameter in this case should? be left out. This is not exactly clear in the .htaccess documentation so I’m a little upset about that. If you know of a more elegant solution I would love to hear it.
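
    A possible explanation for the “/[R=404]” in that message: RewriteRule expects a pattern, then a substitution, then optional flags. With only two arguments, “[R=404]” is read as the substitution rather than as a flag, so the request is rewritten to a file literally named [R=404], and the 404 comes from that file not existing. A minimal sketch of a tidier version follows, assuming a mod_rewrite version that accepts non-redirect status codes in the R flag; the (^|&) and (&|$) anchors are my addition so that only a whole “a=cat” parameter matches.

    ```apache
    RewriteEngine On

    # RewriteCond lines apply to the next RewriteRule. Apache tests the
    # rule's pattern first, then the conditions, then applies the rule.
    RewriteCond %{QUERY_STRING} (^|&)a=cat(&|$)

    # "^" matches any path, "-" leaves the URL unchanged,
    # and R=404 asks Apache to answer with 404 Not Found.
    RewriteRule ^ - [R=404,L]
    ```

    With the dash in place of a new filename, the error page should name the URL that was actually requested. Where R=404 is not accepted, the G flag (410 Gone) is a documented alternative, though nothing in this thread confirms that Google’s removal tool treats a 410 the same way as a 404.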

    In any case, I have the bad URLs pending removal in Google’s system. It takes 3-5 days to clear supposedly. I used the “Remove an outdated link.” option inside the Remove URL tool. Here is the link you need to remove your URL:

    http://services.google.com:8882/urlconsole/controller

    First you need to verify your email address. I’m not sure if the email address needs to match the domain of the website you are removing URLs from. We’ll see. Then you log in and submit the URLs you want removed. When you are all done it has an option to show you a list of the URLs that are pending removal. I sent this pretty list to the owner of the domain and I’m sure she will appreciate it.

    So, last week this website (which has been registered for years probably) had a PR0 and was nowhere to be found in the SERPs. That seemed crazy to me and that PR0 is why I have worked so hard to get this domain ranked. It could be a coincidence, or not, but as of last night she has a PR4 and is ranked #11 for the keywords that she wants. I discovered this when last night I used the Google Sitemaps tool. I verified my “ownership” of this domain and that’s when I saw that she’s ranked #11. After that I did a check and noticed that she’s PR4. Did the “remove tool” fix this domain, was it that I “verified” the domain in Sitemaps, or was it simply random luck that the Googlebot *finally* got to her page after all this time? God knows. And no, I did not generate or use a “sitemap” for this domain. But I encourage everyone to try the Google Sitemaps tool because it is very cool and will provide you w/ *very* useful information about your domain, such as a list of all the keywords that you are ranked for in Google’s index. Did my “claiming” this domain in Sitemaps w/ a meta tag get her from PR0 to PR4? I have no idea. Anyhow, you get my drift. All the work has finally paid off. I’m just recording most of what I did for posterity so that maybe someone else that has had a cybersquatter problem can repair the damage without going through what I had to go through.

    Thread Starter PJ Brunet

    (@knowingart_com)

    “I did a site: search before and I have done one again and I do not see any **** links in google.
    If you were thinking about taking the whole site down, you could avoid this by moving wordpress to a subdirectory and then creating and then editing your 404 page to explain to visitors where the site had gone. This would then break all the links.”

    For the record, this problem had nothing to do with KnowingArt.com so a “site:KnowingArt.com” wouldn’t tell you anything. I am deliberately not mentioning her website’s URL because she’s a doctor and she probably wants to distance herself from this cybersquatter episode as much as possible.

    And, I decided not to rename or move her URLs because she’s ranked #3 in MSN for the keywords she wanted. I couldn’t risk a bot finding her page down (or changed) in those 3-5 business days that Google requires to remove URLs. I was even considering some kind of “Under Construction” page, or even a “non-interactive” version of her WordPress PHP pages, but again, I thought it was too risky to drastically change the pages’ contents.

    Thread Starter PJ Brunet

    (@knowingart_com)

    Update: Big mistake. I added my .htaccess code inside the # BEGIN WordPress block and my changes were erased by WordPress, and so my application to remove the URLs was denied. Of course I tested my code first, but I didn’t anticipate that WordPress would remove my changes later in the week. Fortunately, it looks like Google is giving me another chance, so I had to submit all 13 URLs again.

    Read this carefully…

    http://codex.wordpress.org/Using_Permalinks

    A few notes about creating and editing your .htaccess file:

    * WordPress will play nice with an existing .htaccess and will not delete your existing rules
    * If you have other mod_rewrite rules, they should go before WordPress’ rules
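
    As an illustration of those two notes, a layout along these lines keeps a custom rule outside the markers that WordPress manages. This is only a sketch: the rule is the “a=cat” example from this thread written with an explicit “-” substitution, and the section between the markers is whatever your own install generated, which varies by version and permalink settings.

    ```apache
    # Custom rules sit above the WordPress block, so they run first and
    # are not touched when WordPress rewrites its own section.
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{QUERY_STRING} (^|&)a=cat(&|$)
    RewriteRule ^ - [R=404,L]
    </IfModule>

    # BEGIN WordPress
    # (rules generated by WordPress stay here, unedited)
    # END WordPress
    ```

    Anything placed between the BEGIN and END markers should be treated as disposable, since that is the part WordPress regenerates.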

  • The topic ‘How can I get “index.php?a=cat&cc=20.html” to result in a 404 error.’ is closed to new replies.