• We’ve noticed something weird in the search statistics under “Relevanssi User Searches”. More than half of the top searches contain the string “no-results:”, and often not just once but several times in a row. Really weird.

    See a screenshot of User Searches here:
    http://hhwsmit.home.xs4all.nl/misc/noresult.jpg

    I’m not sure, but I think this didn’t happen before Relevanssi was installed.

    I did a little googling, and it seems more people have seen this problem. In fact, I found many search results for many websites that had stacking “no-result:” in the URL.

    Check out this search on Google:
    http://www.google.nl/search?q=boomoren

    The first few results are all links to our website. But the 4th result is actually a “no-results:” link!
    http://www.aziatische-ingredienten.nl/?s=no-results:no-results%3Ano-results%3Aboomoren&cat=no-results

    Something seems wrong. I’m not sure whether it has to do with Relevanssi. But it certainly seems to have to do with WordPress, searching, and Google indexing.

    As a workaround, I have added a line to our robots.txt file.
    Disallow: /?s=
    I hope that will prevent the Google crawlers from recursively following search links on our site. But it feels like a hack.

    Does anyone know what exactly is going on? I’m a WordPress noob. If the workaround in the robots.txt file works, my problem is solved. But I think it would be nice to find the root cause and fix it for other WordPress users out there.

    Any suggestions where to look ? Any more information I can supply ?
    Thanks in advance,
    Gryz.

    http://wordpress.org/extend/plugins/relevanssi/

Viewing 10 replies - 1 through 10 (of 10 total)
  • Plugin Author Mikko Saari

    (@msaari)

    Well, I can tell that this isn’t caused by Relevanssi. Are you sure it’s not some other plugin?

    Not allowing Google to index search results is probably a good idea. I remember reading in Google’s guidelines that having search results pages indexed by Google is a bad thing.

    Thread Starter Gryz

    (@gryz)

    After posting here, I realized that this support forum isn’t about Relevanssi, but about plugins in general. Sorry about that. But if Relevanssi isn’t the cause, then maybe this is the right place to ask?

    So what is causing this ?
    The WordPress setup we have is nothing special. Only a handful of plugins. And our site is clearly not the only site that has this problem.

    We have had the line:
    Disallow: /search
    in our robots.txt file for a long time. I guess that isn’t enough, and we need /?s= in there as well. But I would rather see the root cause disappear than have every WordPress user in the world change their robots.txt file manually.

    Thread Starter Gryz

    (@gryz)

    Is there a way to see where searches are coming from (IP address or domain name)? When I reset the Relevanssi logs, I get log entries with no-results: in the query string within minutes. I can’t believe Google’s web crawler is that quick to crawl my website. However, if it’s not Google, then how does Google pick up those bogus URLs?

    Also, when I disable Relevanssi, is there a way for me to see the query-strings that get processed by the default WP-search engine ? That would allow me to prove to myself that the bogus searches also happen when Relevanssi is disabled.

    Thread Starter Gryz

    (@gryz)

    I figured out how Google could pick up weird search URLs. We are running Google Analytics. When some broken (or weird) site generates those searches with the nested no-results: queries, the resulting page triggers Google Analytics, and Google is notified about the existence of the no-results: page. Maybe Google uses that information in its page-ranking algorithms? I’m not sure this is what happens, but it could explain one part of the puzzle.

    Plugin Author Mikko Saari

    (@msaari)

    I don’t know. Maybe you could build a filter function that hooks into the_posts and saves the queries? That’s the hook where Relevanssi is inserted.

    Thread Starter Gryz

    (@gryz)

    Thanks for the suggestion. I’m an old C programmer who used to write C code for networking devices. I have no knowledge of PHP, and I’m not sure I want to go through all the WP code to see how it hangs together. I was hoping for a logging function in WP, where I could just go through all HTTP requests. Maybe I’ll see if I can write some code.

    I’ve grepped through all the php-code. The only place where I could find the exact string “no-results:” was in the google analytics code.
    From googleanalytics.php:

    } else if ($wp_query->is_search) {
        $pushstr = "'_trackPageview','".get_bloginfo('url')."/?s=";
        if ($wp_query->found_posts == 0) {
            $push[] = $pushstr."no-results:".rawurlencode($wp_query->query_vars['s'])."&cat=no-results'";
        } else

    It looks like the string “no-results:” is prepended to the search string. This seems like a place where the repeated no-results: prefixes could be added.
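
To make the suspected stacking concrete, here is a minimal Python sketch of the URL-building step in the snippet above. It rests on an assumption that this thread does not confirm: that a crawler re-requests the tracked URL and the search again finds nothing. The function names `tracked_url` and `follow` are made up for illustration; only the `no-results:` prefix, the `&cat=no-results` suffix and the use of `rawurlencode` come from the plugin code.

```python
from urllib.parse import parse_qs, quote, urlsplit

PREFIX = "no-results:"  # literal prefix from the plugin snippet above


def tracked_url(search_term: str) -> str:
    """Build the path the snippet reports when a search finds nothing."""
    # quote(..., safe="") percent-encodes ":" the way PHP's rawurlencode() does.
    return "/?s=" + PREFIX + quote(search_term, safe="") + "&cat=no-results"


def follow(search_term: str, rounds: int) -> str:
    """Assumed feedback loop: re-request the tracked URL, find nothing again."""
    url = tracked_url(search_term)
    for _ in range(rounds - 1):
        # The server decodes ?s=... back into a plain search term.
        decoded = parse_qs(urlsplit(url).query)["s"][0]
        url = tracked_url(decoded)
    return url


for n in (1, 2, 3):
    print(follow("boomoren", n))
```

Under that assumption, the third round yields `/?s=no-results:no-results%3Ano-results%3Aboomoren&cat=no-results`, which has the same path and query string as the indexed URL quoted in the first post.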

    I disabled google analytics for a few minutes on our website, and I still saw new searches with the mangled query. 🙁 It’s very weird. I’ll look into it again this weekend.

    The problem is happening at many sites.
    When searching on Google for “no-results:no-results:” I get an estimated 15.8 million results! Google only actually showed me 355 of them, but it still doesn’t look good. I’m surprised nobody has looked at this before.

    http://www.google.nl/search?complete=0&q=%22no-results%3Ano-results%3A%22

    Thread Starter Gryz

    (@gryz)

    It turns out our webhost keeps a logfile with all HTTP requests.
    I can see that many of the “no-results” queries are from Googlebot.
    I now also understand why the
    Disallow: /?s=
    line in robots.txt didn’t work. It turns out Google does queries for
    /page/3/?s=no-results:no-results:<etc>
    /page/8/?s=no-results:<etc>
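
For anyone who wants to do the same check, here is a rough Python sketch that counts the stacked searches in an access log per requester. The log layout is an assumption (Apache "combined" format), since the thread does not say what format the webhost uses. The sample lines are made up for illustration, with IP addresses from the reserved documentation ranges.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" format. Adjust the pattern for other hosts.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_stacked_searches(lines):
    """Count requests whose search term starts with no-results, per requester."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "s=no-results" not in match.group("path"):
            continue
        agent = match.group("agent")
        # Group everything that claims to be Googlebot; otherwise key by IP.
        counts["Googlebot" if "Googlebot" in agent else match.group("ip")] += 1
    return counts


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
STAMP = "[01/Jan/2012:10:00:00 +0100]"  # made-up timestamp
SAMPLE = [  # made-up log lines, not taken from the real logfile
    f'192.0.2.10 - - {STAMP} "GET /page/3/?s=no-results:no-results%3Aboomoren&cat=no-results HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'192.0.2.11 - - {STAMP} "GET /?s=no-results:boomoren&cat=no-results HTTP/1.1" 200 4980 "-" "{GOOGLEBOT}"',
    f'198.51.100.7 - - {STAMP} "GET /?s=boomoren HTTP/1.1" 200 6200 "-" "Mozilla/5.0"',
    f'203.0.113.5 - - {STAMP} "GET /?s=no-results:tofu&cat=no-results HTTP/1.1" 200 4800 "-" "Mozilla/5.0"',
]

print(count_stacked_searches(SAMPLE))
```

Keep in mind that the user-agent string is supplied by the client, so a "Googlebot" count from this sketch only shows what the requests claim to be.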

    So I added another line to robots.txt.
    Disallow: /page/*/?s=
    I hope the ? and = characters are not special characters, like * is.
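
On the special-character question: as far as Google's documented robots.txt matching goes, only `*` (any run of characters) and a trailing `$` (end of URL) are special, and a rule is compared against the start of the path plus query string. The Python sketch below imitates that behaviour to show why `/?s=` alone missed the paginated URLs. The helper `rule_matches` is hypothetical and not part of any library, and other crawlers may interpret rules differently.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Imitate Google-style Disallow matching: '*' wildcard, trailing '$' anchor."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Everything except '*' is literal, so '?' and '=' get escaped here.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, url_path) is not None


URLS = [
    "/?s=no-results:boomoren&cat=no-results",
    "/page/3/?s=no-results:no-results%3Aboomoren&cat=no-results",
]
for rule in ("/search", "/?s=", "/page/*/?s="):
    print(rule, [rule_matches(rule, url) for url in URLS])
```

If that reading is right, `?` and `=` are matched literally, and a single rule such as `Disallow: /*?s=` would cover both URL shapes. Treat that as a suggestion to check against Google's robots.txt documentation, not as something confirmed in this thread.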

    Msaari, if you are still reading this.
    I have a small suggestion.
    Maybe you can include a line:
    <meta name="robots" content="noindex">
    in all result pages from searches?
    I don’t think people want dynamic search results indexed in search engines anyway. So if WordPress/Relevanssi included the “noindex” tag in all search results, that could prevent problems?

    Thread Starter Gryz

    (@gryz)

    One more update.

    I couldn’t believe something was wrong in the Google Analytics code. Google isn’t that sloppy. But then I realized that the code does not come from Google. It is part of the “Google Analytics for WordPress” plugin, and it is written by a volunteer from the WP community, not by Google.

    When I looked at a lot of the websites that had the same problem, I noticed they were all using the “Google Analytics for WordPress” plugin. So this plugin might very well be the cause of the problems.

    I disabled it, and replaced it with another plugin.
    Ultimate Google Analytics
    Let’s hope this fixes the root of the problem.
    We should know in a few days. I’ll update this post.

    So there are three parts to the solution.
    1) Use a different GA plugin.
    2) Add rules to robots.txt to prevent Googlebot from crawling the old malformed search URLs.
    3) Wait for the old search URLs to drop out of Google’s index.

    If this turns out to fix the problem, I wonder if I should notify parties involved. Would Google remove the bogus 18 million entries from their database ? Should I contact the author of the GA for WordPress plugin ?

    Anyway, Relevanssi had nothing to do with this. It was a very useful tool to warn us that something was wrong. Thanks for all your effort, msaari !

    Plugin Author Mikko Saari

    (@msaari)

    Relevanssi doesn’t make any changes to the search results template; so far I’ve left that under user control. Many people also use all sorts of SEO plugins, and a good SEO plugin will cover that. But yeah, maybe I could add the meta noindex tag as an option.

    Gryz, did you happen to solve this problem?

    Sander

  • The topic ‘[Plugin: Relevanssi – A Better Search] Stacking no-result:no-result:no-result … queries by Google’ is closed to new replies.