• Hey!

    When I started using your plugin I simply loved it, but I wanted to hide it, along with anything else that reveals this is a WordPress site, from ordinary visitors browsing with one of the common browsers.
    My first idea was to do it via .htaccess, but that would have meant putting a real robots.txt file in the root folder of the site so that Apache’s mod_rewrite could redirect to a custom 404 PHP page. So I came up with the idea of building this feature directly into your plugin.

    It checks the user agent to decide whether the visitor is a bot or a regular user. If it is a bot, the plugin serves robots.txt as usual; if it is a regular user, it sends a 404 header and redirects the user to the homepage. The user agent lists are based on this site: http://www.useragentstring.com/pages/useragentstring.php
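
As a standalone illustration of the check described above, here is a minimal sketch. The function name `looks_like_browser` is hypothetical and not part of the plugin, and the token lists are shortened samples of the ones in the diff. It assumes that known crawler tokens should win over browser tokens, since many crawlers also send a Mozilla-compatible string.

```php
<?php
// Hypothetical helper for illustration only; it is not part of the PC Robots.txt plugin.
// The token lists are shortened samples of the arrays used in the diff.
function looks_like_browser($userAgent) {
    $browserTokens = array('Mozilla', 'Opera', 'Links', 'Lynx', 'w3m');
    $botTokens     = array('Googlebot', 'Yahoo! Slurp', 'Ask Jeeves', 'Twiceler');

    // Known crawlers win, because many of them also send "Mozilla/5.0 (compatible; ...)".
    foreach ($botTokens as $token) {
        if (stripos($userAgent, $token) !== false) {
            return false;
        }
    }
    // stripos() returns 0 for a match at the start of the string, so compare with !== false.
    foreach ($browserTokens as $token) {
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    // Unknown or empty user agents are treated as bots, so they still receive robots.txt.
    return false;
}

var_dump(looks_like_browser('Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0')); // bool(true)
var_dump(looks_like_browser('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')); // bool(false)
```

Treating unknown agents as bots is a design choice in this sketch: a crawler that is missing from the list still gets robots.txt instead of being turned away.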

    Here is the DIFF:

    --- pc-robotstxt.php
    +++ pc-robotstxt.php
    @@ -74,24 +74,45 @@
    
     		if ( is_robots() ) {
    
    -			$options = $this->get_options();
    -
    -			$output = "# This virtual robots.txt file was created by the PC Robots.txt WordPress plugin.\n";
    -			$output .= "# For more info visit: http://petercoughlin.com/robotstxt-wordpress-plugin/\n\n";
    -
    -			if ( '' != $options['user_agents'] )
    -				$output .= stripslashes($options['user_agents']);
    -
    -			// if there's an existing sitemap file or we're using pc-xml-sitemap plugin add a reference..
    -			if ( file_exists($_SERVER['DOCUMENT_ROOT'].'/sitemap.xml.gz') )
    -				$output .= "\n\n".'Sitemap: http://'.$_SERVER['HTTP_HOST'].'/sitemap.xml.gz';
    -			elseif ( class_exists('pc_xml_sitemap') || file_exists($_SERVER['DOCUMENT_ROOT'].'/sitemap.xml') )
    -				$output .= "\n\n".'Sitemap: http://'.$_SERVER['HTTP_HOST'].'/sitemap.xml';
    -
    -			header('Status: 200 OK', true, 200);
    -			header('Content-type: text/plain; charset='.get_bloginfo('charset'));
    -			echo $output;
    -			exit;
    +			$rawData = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    +			$trueBrowsers = array('Mozilla','Opera','Links','Lynx','Nokia','Samsung','MOT','SonyEricsson','Doris','HTC','Bunjalloo','PSP','wii','Amiga','ELinks','Cyberdog','Dillo','Dooble','Enigma','Galaxy','HotJava','IBM','LeechCraft','NCSA','NetSurf','retawq','Surf','Webkit','Uzbl','Vimprobable','w3m','WorldWideweb');
    +			$mozBots = array('Googlebot','Yahoo! Slurp','Ask Jeeves', 'Twiceler','Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)');
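    +			$isBrowser = FALSE; // default: unknown agents are treated as bots
    +			foreach($trueBrowsers as $search_browser) {
    +				// stripos() returns FALSE only when the token is absent; offset 0 is still a match
    +				if(stripos($rawData, $search_browser) !== FALSE) {
    +					$isBrowser = TRUE;
    +					break;
    +				}
    +			}
    +			// Crawlers often send a Mozilla-compatible string too, so a known bot overrides the browser match
    +			foreach($mozBots as $search_moz_bot) {
    +				if(stripos($rawData, $search_moz_bot) !== FALSE) {
    +					$isBrowser = FALSE;
    +				}
    +			}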
    +			if ($isBrowser == FALSE) {
    +				$options = $this->get_options();
    +				$output = ''; // initialise before the .= appends below
    +				if ( '' != $options['user_agents'] )
    +					$output .= stripslashes($options['user_agents']);
    +
    +				// if there's an existing sitemap file or we're using pc-xml-sitemap plugin add a reference..
    +				if ( file_exists($_SERVER['DOCUMENT_ROOT'].'/sitemap.xml.gz') )
    +					$output .= "\n\n".'Sitemap: http://'.$_SERVER['HTTP_HOST'].'/sitemap.xml.gz';
    +				elseif ( class_exists('pc_xml_sitemap') || file_exists($_SERVER['DOCUMENT_ROOT'].'/sitemap.xml') )
    +					$output .= "\n\n".'Sitemap: http://'.$_SERVER['HTTP_HOST'].'/sitemap.xml';
    +
    +				header('Status: 200 OK', true, 200);
    +				header('Content-type: text/plain; charset='.get_bloginfo('charset'));
    +				echo $output;
    +				exit;
    +			}
    +			else {
    +				header('HTTP/1.0 404 Not Found', true, 404);
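    +				header('Location: '.site_url()); // note: PHP switches the status to 302 when Location is sent, so this redirects rather than returning a 404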
    +				exit();
    +			}
    
     		}// end if
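
A note on the else branch of the diff: a 404 status and a Location header do not combine. Browsers only follow Location on a 3xx response, and PHP's header() changes the status to 302 when a Location header is sent after a non-3xx status. The sketch below spells out the choices as a pure function that sends nothing. The name `plan_robots_response` and its parameters are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper for illustration only; it describes a response and sends nothing.
function plan_robots_response($isBrowser, $homeUrl, $robotsBody, $redirect = true) {
    if (!$isBrowser) {
        // Bots get the virtual robots.txt as plain text.
        return array(
            'status'  => 200,
            'headers' => array('Content-Type: text/plain'),
            'body'    => $robotsBody,
        );
    }
    if ($redirect) {
        // Sending the visitor to the homepage needs a 3xx status.
        return array(
            'status'  => 302,
            'headers' => array('Location: ' . $homeUrl),
            'body'    => '',
        );
    }
    // Alternatively, answer with a plain 404 and no Location header.
    return array('status' => 404, 'headers' => array(), 'body' => '');
}

print_r(plan_robots_response(true, 'http://example.com/', ''));
```

Inside WordPress, the redirect case would presumably be sent with wp_redirect() and the plain 404 case with status_header(404), but that wiring is an assumption and is not shown in the diff.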

    http://wordpress.org/extend/plugins/pc-robotstxt/

  • The topic ‘[Plugin: PC Robots.txt] Secure robots.txt’ is closed to new replies.