Hi,
I noticed that the /robots.txt that WP generates automatically (to allow or disallow site crawling by search engines) is only generated when there are published posts. On any install that contains only pages and no posts (or only drafts), visiting /robots.txt results in a WP-generated 404 error page.
Is this a bug?
Allard
are you using some plugin? because wordpress does not update robots.txt natively
hmmm.. that is weird.
i am not aware of any of my plugins doing such a thing! let me check...
no, with all plugins switched off, i still see
User-agent: *
Disallow:
when i visit my domain root appended with /robots.txt in a browser. if i change the public status of my blog (under Privacy) to 'Block search engines...' it changes to
User-agent: *
Disallow: /
(which seems logical)
however, if i change the status of all posts to 'Draft', visiting myblogs.url/robots.txt results in a 404 page.
ahhh..ok...my apologies - it does in this case
admin - settings - privacy
go here and set your blog for:
"I would like my blog to be visible to everyone, including search engines (like Google, Sphere, Technorati) and archivers"
well, yes. that's what i have selected normally.
but my point is: when there are NO posts with status "published" on the blog (only pages) there is NO robots.txt content generated, no matter which option is selected on Settings > Privacy.
what's the problem, you'd ask? i am working on a small sitemap plugin that automatically adds the sitemap url to the generated robots.txt content. however, if someone uses WP as a CMS with only pages (and no posts, which happens quite often i am sure) there is no auto-generated robots.txt available.
pretty sure it's a bug. or am i mistaken?
you need to create your own robots.txt file
wordpress only creates it when you have it set not to be visited by search engines
adding posts and such will never update a robots.txt
http://codex.wordpress.org/Search_Engine_Optimization_for_Wordpress#Robots.txt_Optimization
strange... in my experience WP always generates a robots.txt whether visibility is set to exclude search engines or not! it just changes the content from
User-agent: *
Disallow:
to
User-agent: *
Disallow: /
except... except when i have no posts (just pages) on the blog. in that case, there is no robots.txt generated but a 404 shown.
blovett
I can confirm this behavior. The virtual robots.txt file is not generated until you have posts. This is independent of your privacy settings.
As a workaround, you can create a placeholder post that is privately published. Or, skip the virtual file altogether by creating a real robots.txt file.
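For the real-file route, here is a minimal sketch in Python of what that file could contain. It only reproduces the two variants quoted earlier in this thread; the function name `build_robots_txt` and the optional `Sitemap:` line are assumptions for illustration and are not something WordPress itself provides.

```python
from typing import Optional


def build_robots_txt(public: bool, sitemap_url: Optional[str] = None) -> str:
    """Return robots.txt content matching the two variants quoted in this thread."""
    lines = ["User-agent: *"]
    # An empty Disallow allows everything; "Disallow: /" blocks the whole site.
    lines.append("Disallow:" if public else "Disallow: /")
    if sitemap_url:
        # Assumption: the sitemap is advertised with a "Sitemap:" line,
        # separated from the rules above by a blank line.
        lines.append("")
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"
```

Saving the returned text as robots.txt in the web root should make the server deliver that file directly, so the missing virtual file no longer matters. The sitemap URL passed in would be whatever your own site uses.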
ahhhh... finally! thanks blovett, for confirming. i was thinking i was mad ;)
that private post solution is a good tip.