lynx and wget are redirected from blog to robotstxt.org (10 posts)

  1. amigne
    Posted 9 years ago #

    I'm starting a blog over at http://www.djp.ch/blog/

    Display is perfect in a regular browser.

    When I get this url with lynx or wget, I'm being redirected to http://www.robotstxt.org/

    Can someone explain what's happening? (This does not happen with other pages on the same website, and there is nothing special in robots.txt)

    Here is a transcript from the wget command:

    gl@malvoisie:~/tmp2$ wget http://www.djp.ch/blog
    --18:49:26-- http://www.djp.ch/blog
    => `blog'
    Resolving www.djp.ch...
    Connecting to www.djp.ch||:80... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: http://www.robotstxt.org/ [following]
    --18:49:26-- http://www.robotstxt.org/
    Resolving http://www.robotstxt.org...
    Connecting to http://www.robotstxt.org||:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1,659 (1.6K) [text/html]

    100%[====================================>] 1,659 --.--K/s

    18:49:26 (25.11 MB/s) - `index.html' saved [1659/1659]

  2. Kafkaesqui
    Posted 9 years ago #

    No chance your .htaccess has a redirect occurring to robotstxt.org based on the user agent?
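
One quick way to test the user-agent theory is to request the same URL with two different User-Agent headers and compare the status line and the Location header. Here is a minimal sketch, assuming curl is installed. The function names and the agent strings in the usage lines are made up for illustration, not taken from the original poster's setup.

```shell
# Print the status line and any Location header from raw HTTP
# response headers read on stdin.
summarize_headers() {
    tr -d '\r' | awk 'NR == 1 { print } tolower($1) == "location:" { print }'
}

# Fetch only the response headers for URL ($1) while sending User-Agent ($2).
# -s: quiet, -o /dev/null: discard the body, -D -: dump headers to stdout.
# Redirects are not followed, so the first response is what gets shown.
probe() {
    curl -s -o /dev/null -D - -A "$2" "$1" | summarize_headers
}

# Usage (needs network access):
#   probe http://www.djp.ch/blog/ 'Wget/1.10'
#   probe http://www.djp.ch/blog/ 'Mozilla/5.0 (X11; Linux x86_64)'
```

If the first call reports a 302 with a Location of robotstxt.org and the second does not, the redirect depends on the user agent.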

  3. amigne
    Posted 9 years ago #

    My .htaccess was generated by wordpress (see below). Even if I remove the file from the server, I still get the redirect to robotstxt.org. I also get the redirect when I specify index.php in the command wget http://www.djp.ch/blog/index.php

    Content of .htaccess:

    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /blog/
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /blog/index.php [L]
    </IfModule>

  4. Kafkaesqui
    Posted 9 years ago #

    And there's no .htaccess in your site's root?

    EDIT: The reason I'm harping on .htaccess is that redirecting to robotstxt.org is an old method that sites have used to lock out agents like wget.
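
One common form of such a lock-out is a mod_rewrite RewriteCond that tests %{HTTP_USER_AGENT}, followed by a RewriteRule pointing at robotstxt.org. A rough way to look for one in the files you are allowed to read, assuming shell access to the host (the function name is hypothetical, and the directory argument is whatever your document root happens to be):

```shell
# Search every .htaccess under DIR ($1) and print the lines that test the
# user agent or mention robotstxt.org, prefixed with file name and line number.
# Errors from unreadable files or directories are discarded.
find_ua_rules() {
    find "$1" -name .htaccess -type f \
        -exec grep -H -n -i -E 'HTTP_USER_AGENT|robotstxt' {} + 2>/dev/null
}

# Usage: find_ua_rules /path/to/docroot
```

No output only means nothing was found in the files visible to your account. A rule in the main server configuration would not show up here.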

  5. amigne
    Posted 9 years ago #

    I have a file /.passwd
    I have a file /.rewrites (owned by root, I don't have read permission)
    No other dot-files in /

    Then all my website is in /web which is mapped to http://www.djp.ch

    I have a file /web/.htaccess (listed above)
    I have a file /web/robots.txt mapped to http://www.djp.ch/robots.txt
    No other dot-files in /web/

    Then wordpress is installed in /web/blog which is mapped to http://www.djp.ch/blog
    I have no dot-files in /web/blog/

    Any suggestion?

  6. Kafkaesqui
    Posted 9 years ago #

    If you're not managing the server, contact your host and ask if they may have something set up that's redirecting certain user agents (i.e. lynx and wget).

  7. amigne
    Posted 9 years ago #

    Ok, thank you for the help Kafkaesqui! I will post an update here.

    Posted 9 years ago #

    I've checked your host's website and some of the sites that share your virtual host's IP addy, they all work fine with lynx. It's just you, from what I can see.

  9. amigne
    Posted 9 years ago #

    My website http://www.djp.ch works fine with lynx, it's just the http://www.djp.ch/blog url that does not work with lynx.

    Something I just noticed:

    http://www.djp.ch/asdlkjalsdfjl (or any nonexistent url)

    shows the content of the wordpress blog in firefox, but redirects to robotstxt.org in lynx and wget.

    Is this some kind of 404 redirection? I don't have a custom 404 page. All 404 errors seem to be redirected to the blog.

    At some point in the past, the wordpress blog was located in a different directory. I changed the url in the wordpress option tab and the url http://www.djp.ch/blog was working even though I did not have a blog directory. Does wordpress do some 404 tricks?

    I still don't understand why http://www.djp.ch/blog/index.php would raise a 404 at all, because that file exists.

  10. amigne
    Posted 9 years ago #

    My host is blocking some user-agents for some subdirectories (like /blog), which is why everything is accessible except wordpress. The problem can be circumvented by using the lynx -useragent command line option (or by renaming blog to something else).
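
For anyone landing here with the same problem, the user-agent workaround could look roughly like this. It is only a sketch: the agent string and function names are placeholders, and whether a given string gets past the block depends on the host's rules.

```shell
# Hypothetical agent string; any value the host does not block should do.
UA='Mozilla/5.0 (compatible; manual-check)'

# Fetch URL ($1) with wget, overriding its default "Wget/<version>" agent.
# -O - writes the page to stdout instead of saving a file.
fetch_wget() { wget --user-agent="$UA" -O - "$1"; }

# Same request with lynx, dumping the rendered page as plain text.
fetch_lynx() { lynx -useragent="$UA" -dump "$1"; }

# Usage (needs network access):
#   fetch_wget http://www.djp.ch/blog/
#   fetch_lynx http://www.djp.ch/blog/
```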

    There is still a 404 redirect to the wordpress directory (/blog), and I still have no idea how this came to be.

Topic Closed

This topic has been closed to new replies.
