• While checking the log file for my WP blog (CVS version), I came across something strange.
    I have the SE friendly feature activated via the mod_rewrite system that WP has – a great feature BTW.
    Everything is working like it should, i.e. all of the topic and monthly archive links look like a non-PHP path to a subdirectory, and googlebot has come by and chewed down deep on all of the links.
    The weird thing is that the log shows googlebot was also requesting the SE Friendly links BUT with “index.php” appended after the trailing slash (/) of the SE Friendly directory.
    Pasted below is a log example of both an SE Friendly link that did get indexed by googlebot and a link with the mysterious addition of “index.php” that got a Failed, 404 – Not Found error and was not indexed.
    Where is the addition of the “index.php” coming from? Does mod_rewrite go weird sometimes? Is googlebot itself adding the “index.php” to see if we are fooling it with an SE Friendly feature? Is googlebot thinking that there are TOO many “directories” with big long hyphenated names at this site? Or is this just a fluky glitch?
    Address: 64.68.82.137
    Browser: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    Protocol: HTTP/1.0
    Date: Fri Jan 02, 2004
    —————————————————————————-
    07:57:44 GET 15.12K /archives/2003/11/23/how-to-read-a-report/
    07:58:56 GET 12.32K /archives/2003/12/30/newfoundlanders-go-in-online-venture/
    —————————————————————————-
    Address: 64.68.82.142
    Browser: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    Protocol: HTTP/1.0
    Date: Fri Jan 02, 2004
    —————————————————————————-
    08:07:35 GET 0.0K /archives/2003/10/28/school-cooks-share-secret-recipies/index.php
    Failed, 404 – Not Found
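    On the question of where the extra “index.php” could come from: one possibility, which nothing in the log above confirms, is a relative link somewhere in the page template. The sketch below only shows how standard URL resolution treats such a link. The host `example.com` and the `href` value are assumptions made for illustration; only the path is taken from the log.

```python
from urllib.parse import urljoin

# Placeholder host (assumption); the path is the one from the failed request.
page_url = "http://example.com/archives/2003/10/28/school-cooks-share-secret-recipies/"

# Assumed template markup: a relative link such as <a href="index.php">.
relative_href = "index.php"

# A crawler resolves the relative href against the page's own "directory",
# so the permalink's trailing slash makes index.php land underneath it.
resolved = urljoin(page_url, relative_href)
print(resolved)

# A root-relative href resolves against the site root instead.
root_resolved = urljoin(page_url, "/index.php")
print(root_resolved)
```

    If the template really did contain a link like that, every permalink page would advertise its own “/index.php” child, which would match the pattern in the log. Checking the page source for a bare `href="index.php"` would confirm or rule this out.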

Viewing 3 replies - 1 through 3 (of 3 total)
  • Thread Starter Tons

    (@tons)

    Sushubh – WP’s SE Friendly URLs should make google love your site!
    I had a good look at my log file and saw that googlebot did a deep crawl and indexed all of the subject links and all of the archive links. It then went through them again with “index.php” mysteriously added to every link ending in /, and got File Not Found errors for all of those.
    Is googlebot figuring out that this feature is trying to mask the former inedible “php?” etc. links, or am I just being SE Paranoid? 🙂
    I guess I’ll know in a few days when google does the usual directory rebuild and I see if the pages are listed or if the site is banned.
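    The pattern described here (a permalink fetched fine, then requested again with “index.php” appended and answered with a 404) can be checked across a whole log with a few lines of script. This is a minimal sketch, assuming the log has already been parsed into `(path, status)` pairs; the function name `find_index_php_misses` and the sample entries are hypothetical, and treating a successful fetch as status 200 is an assumption.

```python
def find_index_php_misses(requests):
    """Find 404s for '<permalink>/index.php' and note whether the permalink itself was fetched OK."""
    suffix = "index.php"
    # Permalink-style paths (ending in a slash) that were served successfully.
    ok_permalinks = {path for path, status in requests
                     if status == 200 and path.endswith("/")}
    misses = []
    for path, status in requests:
        if status == 404 and path.endswith("/" + suffix):
            base = path[:-len(suffix)]  # strip "index.php", keep the trailing slash
            misses.append((base, path, base in ok_permalinks))
    return misses


# Illustrative entries only; the paths follow the ones quoted in the log excerpt.
sample_requests = [
    ("/archives/2003/11/23/how-to-read-a-report/", 200),
    ("/archives/2003/10/28/school-cooks-share-secret-recipies/", 200),
    ("/archives/2003/10/28/school-cooks-share-secret-recipies/index.php", 404),
]

for base, failed, base_was_ok in find_index_php_misses(sample_requests):
    print(base, failed, base_was_ok)
```

    A `True` in the last column means the permalink itself was crawled without trouble, so the 404 on the “index.php” variant does not by itself show the real page failed to be fetched.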

    u can’t trust google… they are crazy people… they can come up with anything…
    let’s hope SE friendly URLs are not blocked by google.
    they are pretty easy to catch though…

    I can’t wait for 1.0 to be released! So many yummy new features! 🙂

  • The topic ‘mysterious “index.php”’ is closed to new replies.