lynx and wget are redirected from blog to robotstxt.org

amigne
(@amigne)

18 years, 1 month ago

I’m starting a blog over at http://www.djp.ch/blog/

The display is perfect in a regular browser.

When I fetch this URL with lynx or wget, I am redirected to http://www.robotstxt.org/

Can someone explain what’s happening? (This does not happen with other pages on the same website, and there is nothing special in robots.txt)

Here is a transcript from the wget command:

gl@malvoisie:~/tmp2$ wget http://www.djp.ch/blog
--18:49:26-- http://www.djp.ch/blog
=> `blog'
Resolving www.djp.ch... 84.16.81.39
Connecting to www.djp.ch|84.16.81.39|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.robotstxt.org/ [following]
--18:49:26-- http://www.robotstxt.org/
=> `index.html'
Resolving www.robotstxt.org... 216.129.106.114
Connecting to www.robotstxt.org|216.129.106.114|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,659 (1.6K) [text/html]

100%[====================================>] 1,659 --.--K/s

18:49:26 (25.11 MB/s) - `index.html' saved [1659/1659]


Kafkaesqui
(@kafkaesqui)

18 years, 1 month ago

No chance your .htaccess has a redirect occurring to robotstxt.org based on the user agent?

Thread Starter amigne
(@amigne)

18 years, 1 month ago

My .htaccess was generated by WordPress (see below). Even if I remove the file from the server, I still get the redirect to robotstxt.org. I also get the redirect when I specify index.php in the command: wget http://www.djp.ch/blog/index.php

Content of .htaccess:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>
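
For readers unfamiliar with these rules, here is a rough Python model of what they do: any request that does not match an existing file or directory is handed to /blog/index.php. It is only a sketch. The function name `wordpress_rewrite` and the `docroot` parameter are made up for illustration, the .htaccess is assumed to sit in the directory that `docroot` points at, and query strings and other mod_rewrite details are ignored.

```python
import os


def wordpress_rewrite(url_path: str, docroot: str) -> str:
    """Simplified model of the WordPress rewrite rules shown above.

    Returns the URL path that would be served for url_path.
    """
    relative = url_path.lstrip("/")
    if not relative:
        # The pattern "." needs at least one character to match,
        # so a request for the directory itself is left alone.
        return url_path

    # Stands in for %{REQUEST_FILENAME}: the request mapped to the filesystem.
    filename = os.path.join(docroot, relative)

    # RewriteCond ... !-f and !-d: existing files and directories are served as-is.
    if os.path.isfile(filename) or os.path.isdir(filename):
        return url_path

    # RewriteRule . /blog/index.php [L]
    return "/blog/index.php"
```

Under this model, a request for an existing file such as /blog/index.php is never rewritten, while any path that does not exist on disk ends up at the blog's index.php.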

Kafkaesqui
(@kafkaesqui)

18 years, 1 month ago

And there’s no .htaccess in your site’s root?

EDIT: The reason I’m harping on .htaccess is that redirecting to robotstxt.org is an old method that sites have used to lock out agents like wget.
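
To make the mechanism concrete, here is a toy Python server that behaves the way such a lockout would: it answers 302 with a Location of robotstxt.org when the User-Agent looks like wget or lynx, and serves the page otherwise. This is a sketch, not the host's actual configuration. The pattern in `BLOCKED_AGENTS`, the class name `AgentLockoutHandler`, and the helper `serve` are all assumptions for illustration; on a real Apache host this would normally be done in the server configuration or an .htaccess file.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed pattern: which user agents get locked out.
BLOCKED_AGENTS = re.compile(r"wget|lynx", re.IGNORECASE)


class AgentLockoutHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if BLOCKED_AGENTS.search(agent):
            # Same response the wget transcript shows: 302 plus a Location header.
            self.send_response(302)
            self.send_header("Location", "http://www.robotstxt.org/")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return

        body = b"blog content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def serve(port: int = 8000) -> None:
    """Run the toy server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), AgentLockoutHandler).serve_forever()
```

Calling `serve()` and pointing wget and a regular browser at http://127.0.0.1:8000/ should reproduce the difference in behavior described in this thread.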

Thread Starter amigne
(@amigne)

18 years, 1 month ago

I have a file /.passwd
I have a file /.rewrites (owned by root, I don’t have read permission)
No other dot-files in /

My whole website is in /web, which is mapped to http://www.djp.ch

I have a file /web/.htaccess (listed above)
I have a file /web/robots.txt mapped to http://www.djp.ch/robots.txt
No other dot-files in /web/

WordPress is installed in /web/blog, which is mapped to http://www.djp.ch/blog
I have no dot-files in /web/blog/

Any suggestion?

Kafkaesqui
(@kafkaesqui)

18 years, 1 month ago

If you’re not managing the server, contact your host and ask if they may have something set up that’s redirecting certain user agents (i.e. lynx and wget).

Thread Starter amigne
(@amigne)

18 years, 1 month ago

Ok, thank you for the help Kafkaesqui! I will post an update here.

prjg
(@iiiiiiiv)

18 years, 1 month ago

I’ve checked your host’s website and some of the sites that share your virtual host’s IP address, and they all work fine with lynx. It’s just you, from what I can see.

Thread Starter amigne
(@amigne)

18 years, 1 month ago

My website http://www.djp.ch works fine with lynx; it’s just the http://www.djp.ch/blog URL that does not work with lynx.

Something I just noticed:

http://www.djp.ch/asdlkjalsdfjl (or any nonexistent URL)

shows the content of the WordPress blog in Firefox, but redirects to robotstxt.org in lynx and wget.

Is this some kind of 404 redirection? I don’t have a custom 404 page. All 404 errors seem to be redirected to the blog.

At some point in the past, the WordPress blog was located in a different directory. I changed the URL in the WordPress options tab, and the URL http://www.djp.ch/blog was working even though I did not have a blog directory. Does WordPress do some 404 tricks?

I still don’t understand why http://www.djp.ch/blog/index.php would raise a 404 at all, because that file exists.

Thread Starter amigne
(@amigne)

18 years ago

My host is blocking some user agents from some subdirectories (like /blog), which is why everything is accessible except WordPress. The problem can be circumvented by using the lynx -useragent command-line option (or by renaming blog to something else).
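
For anyone who wants to check whether their own host does the same thing, the comparison can be scripted. The Python sketch below requests one URL with several User-Agent strings and reports the status code and Location header for each, without following redirects. The function names `probe` and `compare_agents` and the agent strings in the usage note are assumptions for illustration, and the sketch only handles plain http:// URLs.

```python
import http.client
from urllib.parse import urlsplit


def probe(url: str, user_agent: str):
    """Fetch url once with the given User-Agent; do not follow redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
    try:
        conn.request("GET", parts.path or "/", headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def compare_agents(url: str, agents):
    """Map each User-Agent string to the (status, location) it receives."""
    return {agent: probe(url, agent) for agent in agents}
```

A call such as `compare_agents("http://www.djp.ch/blog", ["Lynx/2.8.5", "Mozilla/5.0"])` would show at a glance whether the redirect depends on the user agent. If only some agents receive a 302 pointing at robotstxt.org, the block is based on the User-Agent header.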

There is still a 404 redirect to the WordPress directory (/blog); I still have no idea how this came to be.
