• I’m starting a blog over at http://www.djp.ch/blog/

    It displays perfectly in a regular browser.

    When I fetch this URL with lynx or wget, I am redirected to http://www.robotstxt.org/

    Can someone explain what’s happening? (This does not happen with other pages on the same website, and there is nothing special in robots.txt)

    Here is a transcript from the wget command:

    gl@malvoisie:~/tmp2$ wget http://www.djp.ch/blog
    --18:49:26-- http://www.djp.ch/blog
    => `blog'
    Resolving www.djp.ch... 84.16.81.39
    Connecting to www.djp.ch|84.16.81.39|:80... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: http://www.robotstxt.org/ [following]
    --18:49:26-- http://www.robotstxt.org/
    => `index.html'
    Resolving www.robotstxt.org... 216.129.106.114
    Connecting to www.robotstxt.org|216.129.106.114|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1,659 (1.6K) [text/html]

    100%[====================================>] 1,659 --.--K/s

    18:49:26 (25.11 MB/s) - `index.html' saved [1659/1659]

Viewing 9 replies - 1 through 9 (of 9 total)
  • Is there any chance your .htaccess has a redirect to robotstxt.org based on the user agent?

    Thread Starter amigne

    (@amigne)

    My .htaccess was generated by WordPress (see below). Even if I remove the file from the server, I still get the redirect to robotstxt.org. I also get the redirect when I request index.php explicitly with wget http://www.djp.ch/blog/index.php

    Content of .htaccess:

    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /blog/
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /blog/index.php [L]
    </IfModule>

    And there’s no .htaccess in your site’s root?

    EDIT: The reason I’m harping on .htaccess is that redirecting to robotstxt.org is an old method that sites have used to lock out agents like wget.
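
    One way to check whether a redirect depends on the user agent is to send the same request with different User-Agent headers and look at the status and Location header without following the redirect. The following is a minimal Python sketch using only the standard library; the function name `probe` and the example user-agent strings are assumptions for illustration, not something from this thread.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, user_agent, timeout=10):
    """Send one GET request without following redirects.

    Returns a tuple (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client never follows redirects, so a 302 is reported as-is.
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

    Calling `probe("http://www.djp.ch/blog/", "Wget/1.10")` and then the same URL with a browser-like string such as `"Mozilla/5.0"` would show whether the two agents get different answers. If they do, the redirect is being decided on the user agent somewhere on the server side.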

    Thread Starter amigne

    (@amigne)

    I have a file /.passwd
    I have a file /.rewrites (owned by root, I don’t have read permission)
    No other dot-files in /

    Then all my website is in /web which is mapped to http://www.djp.ch

    I have a file /web/.htaccess (listed above)
    I have a file /web/robots.txt mapped to http://www.djp.ch/robots.txt
    No other dot-files in /web/

    Then wordpress is installed in /web/blog which is mapped to http://www.djp.ch/blog
    I have no dot-files in /web/blog/

    Any suggestion?

    If you’re not managing the server, contact your host and ask whether they have something set up that redirects certain user agents (e.g. lynx and wget).

    Thread Starter amigne

    (@amigne)

    Ok, thank you for the help Kafkaesqui! I will post an update here.

    I’ve checked your host’s website and some of the sites that share your virtual host’s IP address, and they all work fine with lynx. From what I can see, it’s just your site.

    Thread Starter amigne

    (@amigne)

    My website http://www.djp.ch works fine with lynx, it’s just the http://www.djp.ch/blog url that does not work with lynx.

    Something I just noticed:

    http://www.djp.ch/asdlkjalsdfjl (or any nonexistent url)

    shows the content of the wordpress blog in firefox, but redirects to robotstxt.org in lynx and wget.

    Is this some kind of 404 redirection? I don’t have a custom 404 page. All 404 errors seem to be redirected to the blog.

    At some point in the past, the wordpress blog was located in a different directory. I changed the url in the wordpress option tab and the url http://www.djp.ch/blog was working even though I did not have a blog directory. Does wordpress do some 404 tricks?

    I still don’t understand why http://www.djp.ch/blog/index.php would raise a 404 at all, because that file exists.
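
    On the 404 question: the rules in the .htaccess quoted earlier in the thread send any request that is neither an existing file nor an existing directory to /blog/index.php. If that file sits in the site's document root (an assumption here), that could account for nonexistent URLs showing the blog. The Python sketch below is a simplified model of those two conditions only, not of mod_rewrite itself; the function name and the example sets of files and directories are made up for illustration.

```python
def rewrite(request_path, files, dirs):
    """Model the two RewriteCond lines and the RewriteRule, very roughly.

    files and dirs are sets of URL paths assumed to exist on disk.
    Returns the path that would end up being served.
    """
    # The pattern "." needs at least one character after the leading slash.
    relative = request_path.lstrip("/")
    is_file = request_path in files              # RewriteCond ... !-f
    is_dir = request_path.rstrip("/") in dirs    # RewriteCond ... !-d
    if relative and not is_file and not is_dir:
        return "/blog/index.php"                 # RewriteRule . /blog/index.php [L]
    return request_path


# Hypothetical layout, loosely based on the files listed in this thread.
files = {"/robots.txt", "/blog/index.php"}
dirs = {"/blog"}

print(rewrite("/asdlkjalsdfjl", files, dirs))
print(rewrite("/robots.txt", files, dirs))
```

    Under these assumptions the first call returns `/blog/index.php` and the second returns `/robots.txt` unchanged. The model also leaves `/blog/index.php` alone because the file exists, so it does not explain a redirect on that URL.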

    Thread Starter amigne

    (@amigne)

    My host is blocking some user agents from some subdirectories (like /blog), which is why everything is accessible except WordPress. The problem can be circumvented by using the lynx -useragent command line option (or by renaming blog to something else).

    There is still a 404 redirect to the wordpress directory (/blog) — I still have no idea how this came to be.
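
    The host's actual rule was not posted, so the following Python sketch is only a guess at its shape: a redirect that fires when a blocked user agent asks for a path under a blocked directory. The agent tokens, the directory list, and the function name are all hypothetical. It does illustrate why both workarounds (changing the user agent, or renaming the directory) would avoid the redirect.

```python
# All three values below are assumptions for illustration, not the host's config.
BLOCKED_AGENT_TOKENS = ("wget", "lynx")
BLOCKED_PREFIXES = ("/blog",)
REDIRECT_TARGET = "http://www.robotstxt.org/"


def redirect_for(path, user_agent):
    """Return the redirect target, or None if the request is served normally."""
    agent = user_agent.lower()
    agent_blocked = any(token in agent for token in BLOCKED_AGENT_TOKENS)
    path_blocked = any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in BLOCKED_PREFIXES
    )
    if agent_blocked and path_blocked:
        return REDIRECT_TARGET
    return None
```

    With these assumed values, `redirect_for("/blog/index.php", "Wget/1.10")` returns the robotstxt.org URL, while the same path with a browser-like user agent, or a renamed directory such as `/journal/` with wget, returns `None`.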

  • The topic ‘lynx and wget are redirected from blog to robotstxt.org’ is closed to new replies.