Viewing 13 replies - 1 through 13 (of 13 total)
  • do you want to know what it's for?

    http://www.robotstxt.org/

    or what ppl typically put in it?

    It's a browsable file, just look at some on the net:

    http://www.google.com/robots.txt

    Thread Starter h8dk97

    (@h8dk97)

    No, sorry I should’ve explained it better. What do you usually put in your robots.txt for your wordpress installation, what directories / pages do you usually exclude? Apart from wordpress I’ve also got gallery and I don’t want to make a mistake excluding directories and / or pages that may actually be needed.
    Thanks.

    look at mine.. 🙂 the easiest way to answer that question is just to check out some, like I already suggested.

    Note that the only sites that are going to have them are ones where the site op is paying attention to their logs, or is actually cognizant of what a robots.txt is for and its effects.

    Be careful with what you put in your Robots.txt file though. For instance, whooami, I don’t know if you realized it or not, but you exposed your downloads URL in your Robots.txt file:

    http://www.village-idiot.org/file-downloads/

    Too bad you didn’t have any nice porn:(

    umm yeah, whats your point? click it..its no more ‘exposed’ than it is by being linked off every page on my site.

    Thats the entire point OF a robots.txt, to exclude directories from being spidered .. how do you think you do THAT without listing them.. I DONT want that directory spidered.

    And guess what, google doesnt spider it:

    http://www.google.com/search?hl=en&q=file-downloads+village-idiot&btnG=Google+Search

    too bad you dont understand that, and/or dont read links that provide explanations. (tech support, lemme guess, level one)

    .. and just so folks that are clicking that link (i see the traffic) understand .. the redirect to my front page is intentional. That directory cannot be linked to either off any domains other than those that Ive expressly allowed, wordpress.org isnt one of them, for obvious reasons.

    Also, I apologize for being ‘rude’ to the guy up there. However its frustrating as hell to see someone pipe into a thread with completely misleading remarks and/or info.

    “too bad you dont understand that, and/or dont read links that provide explanations. (tech support, lemme guess, level one)”

    Damn whooami! Chill out! Flying off the handle like that is one of the main reasons why people (like myself) choose NOT to contribute to free tech support. Ridiculed when we’re only trying to help? If I was incorrect with what I had said, then my bad, but that gives you no reason, nor right, to react and comment the way you did! Seriously, how many forums do you actively participate in?

    That all said, I was merely offering valid input to people that are either new to web development or could possibly (for some reason) not realize something very important. Correct me if I’m wrong, but what I’ve read about the robots.txt file is that unless access to your robots.txt is password protected, when you list directories in your robots.txt file, you are exposing these directories to the public. In other words, if you do not want your average person finding out about a directory, then be sure to take extra precautions to prevent access to these directories.

    FYI, I’m not a web developer, nor am I “level 1” tech support, so keep guessing cuz I find it quite humorous:)

    Moderator Samuel Wood (Otto)

    (@otto42)

    WordPress.org Admin

    …unless access to your robots.txt is password protected…

    Uhhh.. What would be the point of that? Having robots.txt password protected completely nullifies the reason for having it in the first place.

    …you are exposing these directories to the public…

    Unless you take positive action to deny public access, you’re always exposing things to the public on a website. Just having a hidden directory name that is not linked or listed anywhere is not sufficient protection.

    The point of a robots.txt is to tell what directories should not be crawled or indexed by spiders. This isn’t telling them where the directories actually are, because they already know that from spidering links around the rest of your site. A robots.txt tells a good spider to not even *try* to follow those particular links.
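
    A rough sketch of what a well-behaved spider does with those rules, using Python's standard urllib.robotparser. The domain, the "ExampleBot" user agent, the links and the links_to_follow helper are all made up for illustration; only the /file-downloads/ directory name comes from this thread. The spider already knows every link from crawling the site, and the robots.txt only decides which of them it will request:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at /robots.txt on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /file-downloads/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def links_to_follow(parser: RobotFileParser, user_agent: str, links: list[str]) -> list[str]:
    """Keep only the links a polite crawler is allowed to request."""
    return [url for url in links if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)

# Links the spider already found while crawling other pages of the site.
found_links = [
    "https://example.com/about/",
    "https://example.com/file-downloads/",
    "https://example.com/file-downloads/tool.zip",
    "https://example.com/2007/12/hello-world/",
]

print(links_to_follow(parser, "ExampleBot", found_links))
```

    Nothing here stops a person, or a spider that ignores robots.txt, from requesting the disallowed URLs. That takes real access control on the server.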

    That all said, I was merely offering valid input to people that are either new to web development or could possibly (for some reason) not realize something very important. Correct me if I’m wrong, but what I’ve read about the robots.txt file is that unless access to your robots.txt is password protected, when you list directories in your robots.txt file, you are exposing these

    Quite honestly, given your obvious lack of knowledge on this topic, anyone reading is better off without you posting. Again, my first reply in this thread provided a link that would happily correct if youre wrong.. but youve obviously not checked it … yet.

    So lets..

    valid input

    Its not valid, because youre wrong. Hence my responses. And you just keep repeating stuff thats not valid too.

    not realize something very important..

    Its not important because youre wrong in your premise. And thats precisely why I provided a link that explained its purpose more clearly. SO that someone reading this thread would understand its NOT to be used for “hiding” a directory.

    ..password protect..

    Where have you read this? Please feel free to share a link that suggests password protecting the robots.txt file.

    Then lets apply some common sense: how is it that you think a robot is going to “read” the robots.txt once youve password protected it?

    Thats patently false.

    if you do not want your average person finding out about a directory,

    Thats not the point of a robots.txt file. Otto explained, the link I provided explained (had you read it..)

    Again, patently false, given the context of this thread.

    What you dont seem to be grasping is that the point of a robots.txt is NOT to hide the directory, its to prevent spidering, which is entirely different.

    Here is a scenario that might bring this down to something more understandable for you:

    The link you posted .. the one on my domain. Its linked off all but very few (2-5 pages) of my site. Its no secret. I take no issue with anyone looking at it.

    BUT …

    Given the dynamic nature of its content, and that of any subdirectory, I dont want Google to index it.

    Why? Because theres a very good chance someone will follow an outdated link.

    Because I want the user to have to see the directory in its entirety rather than through the lens of Google’s truncated listing.

    Correct me if I’m wrong …

    Consider yourself corrected.

    God whooami. I guess netiquette or even just simple common courtesy isn’t part of your vocabulary. Merry Christmas to you! I do feel sorry for those that have to be around you on a daily basis.

    That aside, Otto42, thank you for being kind with your constructive criticism:) I am very well aware of the purpose of the robots.txt file, my apologies if I wasn’t more clear about this in the first place.

    I was in fact mistaken however with what I had said regarding password protecting the robots.txt file. Where my confusion came from was when I had originally uploaded my robots.txt file to my host’s server and tried to access it via the URL, it asked me for a username/password. I thought just as you had mentioned, how in the hell does a crawler/robot see it then? When I logged into Google, however, and entered the URL for my robots.txt file, I was able to see it in the Google Webmaster’s Tools panel. I was pretty confused about the whole thing, but I still contacted my host and told them about it, and requested they publicly expose the file, because I didn’t want to take any chances.

    RoboPower,

    This thread isnt about me or you, it’s about the correct usage of a robots.txt file.

    That said, what does your version of netiquette tell you about posting a url on a public forum that you “thought I was trying to hide”? I guess you missed the fact that I opted to NOT question why you would do such a thing?

    As for my courtesy, my responses to you were no less courteous than Ottos:

    Uhhh.. What would be the point of that?

    In fact, the only personal remarks Ive made regarding you questioned your lack of reading, as evidenced by your replies.

    And I even went so far as to apologize. And yet you continue, apparently still w/o reading.

    On the other hand.. we have your remarks.

    …I celebrate christmas with my family, we have a great time. My mom is intelligent and articulate.

    As am I.

    I’ll be sure to tell her you said hello.

    Man, you are just the little kid in the playground that always has to have the last word huh? As you had stated, “This thread isnt about me or you, it’s about the correct usage of a robots.txt file.”

    The user posted in this thread because he/she had a question about what they should put in a robots.txt file. So let’s drop our egos and help this person out, ok? You seem to be the expert on robots.txt whooami, so how about some help? You provided those links, but it doesn’t really give too much info on what to/not to put in there. I can’t help much with the answers, but I can with the questions, because I’ve seen the same questions go unanswered in many forums.

    I believe the users original question was,

    “What do you usually put in your robots.txt?”

    I also believe that they were concerned about what they should/shouldn’t block because of, and I’m taking a shot out in the dark here, to not hinder, but improve SEO? So, if this is the case, what types of files/folders should we disallow to improve SEO? Is it better or worse for SEO if we allow all directories/pages to be read?

    This is the only decent article that I’ve found on this topic, but it’s a few years old and doesn’t say too much about it:
    http://www.seotoday.com/browse.php/category/articles/id/230/index.php

    Let’s try another question, say I disallow an images folder. Does this cause the robot to break/screw up when it tries to read a page with images from that folder?
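
    For what it's worth, here is a small sketch of how the rule matching itself treats that case, again with Python's standard urllib.robotparser. The /images/ path, the example.com URLs and the "ExampleBot" name are assumptions for illustration. A Disallow line is matched against the path of each URL a robot is about to request, so the page that embeds the images is still allowed and only the image URLs are skipped. How a given search engine then treats a page whose images it could not fetch is a separate question this does not answer:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that disallow only an images folder.
IMAGES_RULES = """\
User-agent: *
Disallow: /images/
"""

rules = RobotFileParser()
rules.parse(IMAGES_RULES.splitlines())

# The page itself is outside /images/, so a polite robot may fetch it.
page_allowed = rules.can_fetch("ExampleBot", "https://example.com/2007/12/some-post/")

# An image referenced by that page sits under /images/, so it is skipped.
image_allowed = rules.can_fetch("ExampleBot", "https://example.com/images/photo.jpg")

print("page allowed:", page_allowed)
print("image allowed:", image_allowed)
```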

    Should I disallow all of my folders containing my php files? What would be the result of me not doing so?

    I can go on and on with these types of questions, but I’m sure this will keep anyone who wants to share their expert advice busy:)

    As this really isn’t about robots.txt anymore, I’m closing it.

  • The topic ‘robots.txt’ is closed to new replies.