WordPress.org

Forums

Relevanssi - A Better Search
Stacking no-result:no-result:no-result ... queries by Google (11 posts)

  1. Gryz
    Member
    Posted 2 years ago #

    We've noticed something weird in the search statistics under "Relevanssi User Searches". More than half of the top searches contain the string "no-results:", and not just once but several times in a row. Really weird.

    See a screenshot of User Searches here:
    http://hhwsmit.home.xs4all.nl/misc/noresult.jpg

    I'm not sure, but I think this didn't happen before Relevanssi was installed.

    I did a little googling, and it seems other people have seen this problem too. In fact, I found Google results for many websites that had stacked "no-results:" strings in the URL.

    Check out this search on Google:
    http://www.google.nl/search?q=boomoren

    The first few results are all links to our website. But the 4th result is actually a "no-results:" link!
    http://www.aziatische-ingredienten.nl/?s=no-results:no-results%3Ano-results%3Aboomoren&cat=no-results

    Something seems wrong. I'm not sure whether it has to do with Relevanssi. But it certainly seems to have to do with WordPress, searching, and Google indexing.

    As a workaround, I have added a line to our robots.txt file.
    Disallow: /?s=
    I hope that will prevent the Google crawlers from recursively following search links on our site. But it feels like a hack.

    Does anyone know what exactly is going on? I'm a WordPress noob. If the workaround in the robots.txt file works, my problem is solved. But I think it would be nice to find the root cause and fix it for other WordPress users out there.

    Any suggestions where to look ? Any more information I can supply ?
    Thanks in advance,
    Gryz.

    http://wordpress.org/extend/plugins/relevanssi/

  2. Mikko Saari
    Member
    Plugin Author

    Posted 2 years ago #

    Well, I can tell that this isn't caused by Relevanssi. Are you sure it's not some other plugin?

    Not allowing Google to index search results is probably a good idea. I remember reading in Google's guidelines that having search result pages indexed by Google is a bad thing.

  3. Gryz
    Member
    Posted 2 years ago #

    After posting here, I realized that this support-forum isn't about Relevanssi, but about plugins in general. Sorry about that. But if Relevanssi isn't the cause, then this is maybe a correct place to ask ?

    So what is causing this ?
    The WordPress setup we have is nothing special. Only a handful of plugins. And our site is clearly not the only site that has this problem.

    We have had the line:
    Disallow: /search
    in our robots.txt file for a long time. I guess that isn't enough, and we need /?s= in there as well. But I'd rather see the root cause fixed than have every WordPress user in the world change their robots.txt file manually.

  4. Gryz
    Member
    Posted 2 years ago #

    Is there a way to see where searches are coming from (IP address or domain name)? When I reset the Relevanssi logs, I get log entries with no-results: in the query string within minutes. I can't believe Google's web crawler is that quick to crawl my website. However, if it's not Google, then how does Google pick up those bogus URLs?

    Also, when I disable Relevanssi, is there a way for me to see the query-strings that get processed by the default WP-search engine ? That would allow me to prove to myself that the bogus searches also happen when Relevanssi is disabled.
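
    One way to check where these searches come from, whether or not Relevanssi is active, is the web server's access log, if the host provides one. The sketch below is in Python rather than PHP so that it can run outside WordPress. It assumes the log uses the Apache "combined" format. The sample lines, their IP addresses and dates, and the ExampleBrowser agent are made up for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<target>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def stacked_searches(lines):
    """Yield (host, agent, term) for requests whose ?s= value contains 'no-results:'."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        query = parse_qs(urlsplit(match.group("target")).query)
        for term in query.get("s", []):
            if "no-results:" in term:
                yield match.group("host"), match.group("agent"), term


def count_by_agent(lines):
    """Count prefixed searches, split into 'Googlebot' and 'other' user agents."""
    counts = Counter()
    for _host, agent, _term in stacked_searches(lines):
        counts["Googlebot" if "Googlebot" in agent else "other"] += 1
    return counts


# Made-up sample lines (documentation IP ranges, invented timestamps).
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2012:10:00:00 +0100] '
    '"GET /page/3/?s=no-results:no-results%3Aboomoren&cat=no-results HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [01/Jan/2012:10:00:05 +0100] '
    '"GET /?s=boomoren HTTP/1.1" 200 4096 "-" "ExampleBrowser/1.0"',
    '198.51.100.8 - - [01/Jan/2012:10:00:09 +0100] '
    '"GET /?s=no-results:boomoren&cat=no-results HTTP/1.1" 200 4096 "-" "ExampleBrowser/1.0"',
]

if __name__ == "__main__":
    print(dict(count_by_agent(SAMPLE)))
```

    A user-agent string can be faked, so a "Googlebot" count from a script like this is only a first indication of who is sending the requests.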

  5. Gryz
    Member
    Posted 2 years ago #

    I figured out how Google could pick up weird search URLs. We are running Google Analytics. When some broken (or weird) site generates those searches with the nested no-results: queries, the resulting page will trigger Google Analytics, and Google will be notified of the existence of the no-results: page. Maybe Google uses that information in its page-ranking algorithms? I'm not sure this is what happens, but it could explain one part of the puzzle.

  6. Mikko Saari
    Member
    Plugin Author

    Posted 2 years ago #

    I don't know. Maybe you could build a filter function that hooks into the_posts and saves the queries? That's the hook where Relevanssi is inserted.

  7. Gryz
    Member
    Posted 2 years ago #

    Thanks for the suggestion. I'm an old C-programmer who used to write C-code for networking devices. I have no knowledge about php, and I'm not sure I wanna check out all WP code to see how it hangs together. I was hoping for a log-function of WP, where I can just go through all http-requests. Maybe I'll see if I can write some code.

    I've grepped through all the php-code. The only place where I could find the exact string "no-results:" was in the google analytics code.
    From googleanalytics.php:

    } else if ($wp_query->is_search) {
        $pushstr = "'_trackPageview','".get_bloginfo('url')."/?s=";
        if ($wp_query->found_posts == 0) {
            $push[] = $pushstr."no-results:".rawurlencode($wp_query->query_vars['s'])."&cat=no-results'";
        } else

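    It looks like the string "no-results:" is prepended to the search string whenever a search finds no posts. If the resulting URL is later requested again as a search, the prefix would be prepended once more, so this seems like a place where the stacked no-results: prefixes could come from.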
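
    The suspected mechanism can be sketched in a few lines of Python (this is not the plugin's own code). The sketch assumes three things the thread does not confirm: that something requests the tracked URL as if it were a normal link, that the prefixed search again finds no posts, and that Python's quote(..., safe="") behaves like PHP's rawurlencode() for these strings. The site name and function names are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit


def no_results_url(site, query):
    """Build the URL the quoted PHP reports when a search finds no posts."""
    # quote(..., safe="") stands in for PHP's rawurlencode() (assumption).
    return f"{site}/?s=no-results:{quote(query, safe='')}&cat=no-results"


def search_term_from(url):
    """Return the decoded ?s= value a visitor or crawler would submit via `url`."""
    return parse_qs(urlsplit(url).query)["s"][0]


def simulate(site, query, rounds):
    """Repeat 'report URL, then have it requested again' and collect the URLs."""
    urls = []
    for _ in range(rounds):
        url = no_results_url(site, query)  # zero hits, so the prefix is added
        urls.append(url)
        query = search_term_from(url)      # the prefixed term becomes the next query
    return urls


if __name__ == "__main__":
    for url in simulate("http://www.example.com", "boomoren", 3):
        print(url)
```

    Under those assumptions, the third round yields a query string of s=no-results:no-results%3Ano-results%3Aboomoren&cat=no-results, the same pattern as the URL found on Google in the first post.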

    I disabled google analytics for a few minutes on our website, and I still saw new searches with the mangled query. :( It's very weird. I'll look into it again this weekend.

    The problem is happening at many sites.
    When searching on Google for "no-results:no-results:", Google reports 15.8 million results, although it only actually showed me 355 of them. It still doesn't look good. I'm surprised nobody has looked at this before.

    http://www.google.nl/search?complete=0&q=%22no-results%3Ano-results%3A%22

  8. Gryz
    Member
    Posted 2 years ago #

    It turns out our webhost keeps a logfile with all HTTP requests.
    I can see that many of the "no-results" queries are from Googlebot.
    I now also understand why the
    Disallow: /?s=
    line in robots.txt didn't work. It turns out Google does queries for
    /page/3/?s=no-results:no-results:<etc>
    /page/8/?s=no-results:<etc>

    So I added another line to robots.txt.
    Disallow: /page/*/?s=
    I hope the ? and = characters are not special characters, like * is.
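
    On the question of special characters: in the pattern matching that Google documents for robots.txt, only * (any sequence of characters) and $ (end of the URL) are special, so ? and = are matched literally. The Python sketch below imitates that matching for Disallow rules only. It ignores Allow rules, User-agent groups and rule precedence, and its function names are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate one Disallow pattern into a regex: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Everything except '*' is escaped, so '?' and '=' stay literal characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path, rules):
    """Return True if any rule matches the start of the path plus query string."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    paged = "/page/3/?s=no-results:no-results:boomoren"
    print(is_disallowed(paged, ["/search", "/?s="]))                 # old rules
    print(is_disallowed(paged, ["/search", "/?s=", "/page/*/?s="]))  # with the new rule
```

    With only "/search" and "/?s=" the paged URL is not matched, because both rules are compared against the start of the path; adding "/page/*/?s=" covers it.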

    Msaari, if you are still reading this.
    I have a small suggestion.
    Maybe you can include a line:
    <meta name="robots" content="noindex">
    in all result-pages from searches ?
    I don't think people want dynamic search results indexed in search engines anyway. So if WordPress or Relevanssi included the "noindex" tag in all search result pages, that could prevent problems?
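
    Whether a given search result page already carries such a tag (for example from a theme or an SEO plugin) can be checked from outside WordPress. The following Python sketch parses a page's HTML with the standard library and looks for a robots meta tag containing noindex. The class and function names are hypothetical, and fetching the page is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
    print(has_noindex(page))
```

    Note that a crawler only sees this tag on pages it is allowed to fetch, so a noindex tag has no effect on URLs that robots.txt already blocks.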

  9. Gryz
    Member
    Posted 2 years ago #

    One more update.

    I couldn't believe something was wrong in the Google Analytics code. Google isn't that sloppy. But then I realized that this code does not come from Google. It is part of the "Google Analytics for WordPress" plugin, and the code is written by a volunteer from the WP community, not by Google.

    When I looked at a lot of the websites that had the same problem, I noticed they were all using the "Google Analytics for WordPress" plugin. So this plugin might very well be the cause of the problems.

    I disabled it, and replaced it with another plugin.
    Ultimate Google Analytics
    Let's hope this fixes the root of the problem.
    We should know in a few days. I'll update this post.

    So there are three parts to the solution.
    1) Use a different GA plugin.
    2) Add rules to robots.txt to keep Googlebot from crawling the old malformed search URLs.
    3) Wait for the old search URLs to drop out of Google's index.

    If this turns out to fix the problem, I wonder if I should notify parties involved. Would Google remove the bogus 18 million entries from their database ? Should I contact the author of the GA for WordPress plugin ?

    Anyway, Relevanssi had nothing to do with this. It was a very useful tool to warn us that something was wrong. Thanks for all your effort, msaari !

  10. Mikko Saari
    Member
    Plugin Author

    Posted 2 years ago #

    Relevanssi doesn't make any changes to the search results template; so far I've left that under user control. Also, many people use all sorts of SEO plugins, and a good SEO plugin will cover that. But yeah, maybe I could add the meta noindex tag as an option.

  11. aVirulence
    Member
    Posted 2 years ago #

    Gryz, did you happen to solve this problem?

    Sander

Topic Closed

This topic has been closed to new replies.
