It’s in your database.
All your posts, pages, etc. go into your database; no actual physical files exist. You can find that on the GoDaddy interface too… under Databases > MySQL.
Umm. WordPress pages and directories are virtual, not actual physical items. When you visit a post, page, or archive, WordPress loads the appropriate template from your theme and then gets the content from your database. No actual directories or physical pages are ever created.
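To make that concrete, here is a minimal sketch of what “virtual” pages mean. It is written in Python purely for illustration (WordPress itself does this in PHP against MySQL), and the post data and template below are made up: the URL is just a lookup key, and the HTML is assembled from database content on every request, so no .html file ever exists on disk.

```python
# Hypothetical "database" of posts, keyed by slug (illustrative only;
# WordPress stores posts in MySQL tables, not a Python dict).
POSTS = {
    "hello-world": {"title": "Hello World", "body": "My first post."},
}

# Stand-in for a theme template.
TEMPLATE = "<html><head><title>{title}</title></head><body>{body}</body></html>"

def serve(path: str) -> str:
    """Turn a request path like /hello-world/ into a rendered page."""
    slug = path.strip("/")
    post = POSTS.get(slug)
    if post is None:
        return "<html><body>404 Not Found</body></html>"
    # No file is read from disk; the page exists only in this response.
    return TEMPLATE.format(**post)

print(serve("/hello-world/"))
```

Every visitor, human or crawler, triggers the same lookup-and-render step; the “page” only exists for the duration of the response.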
Then how is Teleport Pro Webcrawler finding them? And how is Google supposed to list my blog posts in its search engine if there is no .html file for the post…
Google and other search engines see your site just like anyone else does. They visit your URL, your server puts the page together and serves it to them, and they index it.
Even though they are virtual, search engines see them no differently than actual physical pages and directories.
ohhh I see now.
While on the subject of Google: is there a way to “Disallow all” in robots.txt with Googlebot as an exception? Allow “Googlebot” first and then “Disallow all” after it, or something like that?
User-agent: Adsbot-Google
Disallow:

User-agent: Googlebot
Disallow:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /
OR
User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Mediapartners-Google
Allow: /

User-agent: *
Disallow: /
Yeah you can. Something like:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
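If you want to sanity-check rules like these before uploading anything, Python’s standard library ships a robots.txt parser. A small sketch (the rules string is just the snippet from above; the bot names in the checks are arbitrary examples):

```python
from urllib.robotparser import RobotFileParser

# Block everyone, but let Googlebot in.
rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The "User-agent: *" record is only a fallback for agents with no
# record of their own, so Googlebot gets its Allow regardless of order.
print(rp.can_fetch("Googlebot", "/some-post/"))     # prints True
print(rp.can_fetch("SomeOtherBot", "/some-post/"))  # prints False
```

Note this only tells you how one spec-following parser reads the file; each search engine applies its own matching rules.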
Will this work:
User-agent: Adsbot-Google
Allow: /
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$

User-agent: Googlebot
Allow: /
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$

User-agent: Mediapartners-Google
Allow: /
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$

User-agent: *
Disallow: /
Will that allow everything except the Disallow’d paths for Google’s bots?
Or do the Disallows interfere with the Allow: /?
If you want to test your robots.txt rules, sign up for Google Webmaster Tools and use its robots.txt testing feature to check them from the bot’s perspective.
The last record (the User-agent: * one) should come first, though.
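You can also sanity-check a trimmed-down version of the file locally with Python’s standard-library parser. Two caveats on this sketch: the stdlib parser follows the original robots.txt spec, so it treats * and $ in paths literally and cannot validate Google’s wildcard lines (those are omitted below), and it applies the first matching rule within a record, whereas Google is documented to apply the most specific (longest) matching rule. To keep the local test meaningful, the specific Disallows are therefore listed before the blanket Allow: /. The bot and path names are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Trimmed version of the file above. Wildcard rules like
# "Disallow: /*.php$" are left out: urllib.robotparser treats
# "*" and "$" as literal characters, so it cannot test them.
rules = """\
User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
Allow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for agent in ("Googlebot", "RandomBot"):
    for path in ("/a-blog-post/", "/wp-admin/", "/wp-login.php"):
        print(agent, path, rp.can_fetch(agent, path))
```

With this ordering, Googlebot is allowed everywhere except the Disallow’d paths, and every other agent falls through to the blanket block, which matches the intent of the original file. For the wildcard lines themselves, the Webmaster Tools tester mentioned above is the reliable check.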