Plugin: Relevanssi - A Better Search » Some posts included, then excluded, during index

  • Resolved cedmonds

    (@cedmonds)


    First off, thanks for the plugin!

    I’ve noticed that, when building the index, some posts seem to be getting indexed, and then later within the same index process, are somehow getting de-indexed.

    To explain further, I’ll start the “Build the index” process, and then will start reviewing my post index page. Each time I refresh the index page as the Relevanssi index runs, it will show an increasing number of posts, and will initially show all posts as expected. But then later during that same index process (about 2/3 of the way through, though I don’t know if there is any relevance to the timing), some posts that were initially present on the index page are no longer displayed. I’ve also verified this behavior within the Relevanssi “Admin Search” tool in the WordPress admin and am seeing the same thing there – some posts will be present in the list at first, and then will be absent from the list later during the Relevanssi indexing process.

    I have the “Expand shortcodes when indexing” option disabled. The posts in question are from a custom post type, if that makes any difference.

    Any suggestions you have about what might be going on, or how I might try to troubleshoot this issue, would be greatly appreciated! Thank you.

  • Plugin Author Mikko Saari

    (@msaari)

    That’s really weird. I’ve seen different issues with indexing, but this is a new one for me, having already indexed posts disappear from the index.

    Can it be another plugin that’s interfering with the indexing?

    add_filter( 'relevanssi_post_to_index', function( $post ) { error_log( "Indexing $post->ID" ); return $post; } );

    If you add this and build the index, it should print out all the post IDs that are indexed to the error log. That’s a start.

    cedmonds

    (@cedmonds)

    Hi Mikko, thanks for the response. Is the “relevanssi_post_to_index” filter a premium-only feature? I’m looking at this page, and it seems like it might be:

    https://www.relevanssi.com/release-notes/premium-1-13/

    We’re just using the free version of the plugin at this time. If that filter is a premium-only feature, is there any other way to monitor / collect the details of the indexing process? Do you think that the data that the plugin writes to the browser console contains enough information to possibly see what might be going on here?

    Plugin Author Mikko Saari

    (@msaari)

    No, it’s not, it’s in the Relevanssi core. Trust me – I’m the guy who wrote all this.

    No, console output is likely not enough, because it doesn’t tell which posts Relevanssi is indexing.

    cedmonds

    (@cedmonds)

    Thank you – I must’ve made an error when trying to print to the error log the first time around, because it worked as expected this time. Here is the data:

    https://pastebin.com/FSuLxFdA

    Please let me know if you have any thoughts on what might be going on.

    Plugin Author Mikko Saari

    (@msaari)

    There’s definitely something weird going on in that log. You can see it yourself: Relevanssi should index the posts in the descending order of post ID, so the numbers should run in descending order. They mostly do so, but not always:

    [26-Nov-2019 20:57:44 UTC] Indexing 1365
    [26-Nov-2019 20:57:44 UTC] Indexing 1364
    [26-Nov-2019 20:57:47 UTC] Indexing 32395
    [26-Nov-2019 20:57:47 UTC] Indexing 32709
    [26-Nov-2019 20:57:47 UTC] Indexing 1363
    [26-Nov-2019 20:57:47 UTC] Indexing 1362
    [26-Nov-2019 20:57:47 UTC] Indexing 1361

    So, after indexing post 1364, Relevanssi jumps to index 32395 and 32709, both of which were already indexed. In fact, 32709 gets indexed eight times. Is it one of the problem posts?
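    The pattern described above (IDs that stop descending, and IDs that are logged more than once) can also be checked mechanically rather than by eye. What follows is a minimal sketch, not part of Relevanssi: the function name summarize_index_log is hypothetical, and it assumes the log lines end in "Indexing <ID>" exactly as in the excerpt above.

```php
<?php
/**
 * Scan "Indexing <ID>" lines from a PHP error log and report
 * (a) IDs that were logged more than once and
 * (b) places where the ID goes up instead of down.
 * summarize_index_log is a hypothetical helper name.
 */
function summarize_index_log( array $lines ): array {
    $counts = array(); // post ID => number of times it was logged
    $jumps  = array(); // pairs of ( previous ID, larger ID that followed it )
    $prev   = null;

    foreach ( $lines as $line ) {
        // Skip anything that is not one of our "Indexing <ID>" lines.
        if ( ! preg_match( '/Indexing (\d+)\s*$/', $line, $m ) ) {
            continue;
        }
        $id            = (int) $m[1];
        $counts[ $id ] = ( $counts[ $id ] ?? 0 ) + 1;

        // Descending order is expected, so a larger ID is a jump.
        if ( null !== $prev && $id > $prev ) {
            $jumps[] = array( $prev, $id );
        }
        $prev = $id;
    }

    $repeated = array_filter(
        $counts,
        function ( $n ) {
            return $n > 1;
        }
    );

    return array(
        'repeated' => $repeated,
        'jumps'    => $jumps,
    );
}

// The seven lines quoted in the thread.
$sample = array(
    '[26-Nov-2019 20:57:44 UTC] Indexing 1365',
    '[26-Nov-2019 20:57:44 UTC] Indexing 1364',
    '[26-Nov-2019 20:57:47 UTC] Indexing 32395',
    '[26-Nov-2019 20:57:47 UTC] Indexing 32709',
    '[26-Nov-2019 20:57:47 UTC] Indexing 1363',
    '[26-Nov-2019 20:57:47 UTC] Indexing 1362',
    '[26-Nov-2019 20:57:47 UTC] Indexing 1361',
);
$report = summarize_index_log( $sample );
```

    For the seven sample lines, the jumps are 1364 to 32395 and 32395 to 32709; nothing repeats within that short excerpt. To run it against a full log, the lines could be loaded with file( $path, FILE_IGNORE_NEW_LINES ), where $path is wherever your PHP error log is written.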

    Now, without knowing anything about the posts, it’s pretty hard to say what’s going on here.

    cedmonds

    (@cedmonds)

    Hi Mikko,

    Thanks for your continued suggestions. The posts you mentioned which seem to be getting indexed out of order and multiple times are actually recurring events which were posted using the “The Events Calendar” plugin. I’m not sure why they would be getting indexed out of order, but I’m guessing the fact that they’re recurring events is why they’re being indexed multiple times. That said, none of those are the posts with issues.

    I tried temporarily disabling the “The Events Calendar” plugin and did a Relevanssi re-index, just to see if that particular plugin was causing any conflicts, and the original (index, then de-index) issue which I describe above is still happening with “The Events Calendar” disabled.

    An example of a post that does seem to have the issue I’m describing has an id of 35728 – I see that that post is indexed as expected in the log.

    Plugin Author Mikko Saari

    (@msaari)

    You can check the database directly. See if there’s anything in the wp_relevanssi table with the doc 35728.
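    If running SQL by hand is inconvenient, the same check can be sketched in PHP. This is an illustration only: my_count_index_rows is a hypothetical helper, and it assumes the index table is the site's table prefix followed by "relevanssi" (wp_relevanssi on a default install) with a doc column holding the post ID, as described above. The $db argument is expected to behave like WordPress's global $wpdb.

```php
<?php
/**
 * Count the index rows stored for one post.
 *
 * $db is expected to behave like WordPress's global $wpdb:
 * a "prefix" property plus prepare() and get_var() methods.
 * my_count_index_rows is a hypothetical helper name.
 */
function my_count_index_rows( $db, int $post_id ): int {
    // Assumption: the index table is "<prefix>relevanssi".
    $table = $db->prefix . 'relevanssi';

    // prepare() fills the %d placeholder with the post ID.
    $sql = $db->prepare(
        "SELECT COUNT(*) FROM {$table} WHERE doc = %d",
        $post_id
    );

    return (int) $db->get_var( $sql );
}

// Hypothetical usage inside WordPress:
// global $wpdb;
// error_log( 'Index rows for 35728: ' . my_count_index_rows( $wpdb, 35728 ) );
```

    A result of zero after indexing has finished would suggest the post really is missing from the index; a positive count would point away from indexing and toward how the results are being queried.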

    The thing is, I’m not really sure what could even be the mechanism here. Relevanssi removes posts from the index in a few cases:

    – When a delete_post action is fired, which should not happen here.
    – When a post is updated and is edited in a way that tells Relevanssi not to index the post. Very unlikely to happen here.
    – When the parent of the post is deleted or made private, the children will also be removed from the index. Fairly unlikely to happen here.
    – When a post is indexed, it will be first removed from the index. This is the most likely explanation.

    However, for this to be the case, it would require that the problem posts are indexed twice, and for some reason on the second go, Relevanssi would consider the posts private and thus not indexable. And I’m not really sure how that could happen.

    Any other plugins that might have something to do with this?

    For further debugging, there are no easy hooks to use, but if you’re fine with editing plugin files, here are some suggestions you could do in lib/indexing.php.

    Line ~476 is this: relevanssi_remove_doc( $post->ID, true ); – before that, add error_log( "removing post $post->ID" );.

    Line ~427 is this: $post = isset( $post->ID ) ? get_post( $post->ID ) : null; // phpcs:ignore WordPress.WP.GlobalVariablesOverride.Prohibited – after that, add error_log( "attempting to index $post->ID" );

    These would shed some light on whether Relevanssi is removing those posts or not.

    cedmonds

    (@cedmonds)

    Hey Mikko,

    Thanks again for sticking with me through this issue. After a lot more experimenting on my staging site, I’ve discovered that the issue is (as you suspected) not related to the indexing at all – it actually seems to be related to the way I’m running the Relevanssi query. When I run a standard WP query on the page, all posts are returned, as expected. When I run the query through Relevanssi, some posts are mysteriously (and seemingly randomly) excluded.

    The page in question includes a search form, and oddly enough, if a search is being performed (i.e. the “query” parameter is passed through the query string with a value), the posts that are excluded on the default (non-search) view of the page are not excluded.

    Please see the code below, and let me know if you see any issues with how I’m handling the query:

    $sBaseUrl = explode ( "/", $_SERVER["REQUEST_URI"] )[1];
    
    // Gather pagination data from query string, if available
    $nPaged = ( get_query_var ( 'paged' ) ) ? get_query_var ( 'paged' ) : 1;
    
    $aoThisPagePostTypeMeta = get_post_type_object ( get_post_type() );
    
    $sThisPagePostType = $aoThisPagePostTypeMeta -> name;
    
    $aoThisPageQueriedObject = get_queried_object();
    
    // Set up arguments for search query
    $aSearchQueryArgs['posts_per_page']     = $_REQUEST['results-per-page'] ?? 20;
    $aSearchQueryArgs['post_type']          = $sThisPagePostType;
    $aSearchQueryArgs['ignore_custom_sort'] = true;
    
    if ( "news" == $sThisPagePostType || "news" == $sBaseUrl ) {
    	$aSearchQueryArgs['order']   = $_REQUEST['sort-order'] ?? "DESC";
    	$aSearchQueryArgs['orderby'] = 'date';
    }
    
    else {
    	$aSearchQueryArgs['order']   = $_REQUEST['sort-order'] ?? "ASC";
    	$aSearchQueryArgs['orderby'] = 'title';
    }
    
    $aSearchQueryArgs['paged'] = $nPaged;
    $aSearchQueryArgs['s']     = ( isset ( $_REQUEST['query'] ) ) ? $_REQUEST['query'] : "";
    
    // If the Relevanssi plugin is active, run the search through that. Otherwise, just do a standard WP_Query
    if ( function_exists ( 'relevanssi_do_query' ) ) {
    	$aoPosts = new WP_Query();
    	$aoPosts -> parse_query ( $aSearchQueryArgs );
    	relevanssi_do_query ( $aoPosts );
    }
    
    else
    {	$aoPosts = new WP_Query ( $aSearchQueryArgs ); }

    Please let me know if you’d prefer that I submit a new, separate topic for this, since this issue has now gone in a different direction than originally suspected.

    • This reply was modified 1 month, 1 week ago by cedmonds.
    Plugin Author Mikko Saari

    (@msaari)

    Well, for starters you’re passing the query through Relevanssi whether there’s a search involved or not. Don’t do that. Use Relevanssi only when s is set. If you run the query through Relevanssi when there’s no search term, you’re going to get unexpected results.

    So, change

    if ( function_exists ( 'relevanssi_do_query' ) ) {

    to:

    if ( function_exists ( 'relevanssi_do_query' ) && ! empty( $aSearchQueryArgs['s'] ) ) {

    That could help.
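    Put together with the original query code, the change might look like the sketch below. The function names my_has_search_term and my_run_listing_query are hypothetical, and the check is deliberately a little stricter than ! empty(): it treats a whitespace-only term as "no search" and treats the string "0" as a real term. Treat that difference as an assumption to adjust, not as Relevanssi's own rule.

```php
<?php
/**
 * True when the query args carry a non-blank search term.
 * my_has_search_term is a hypothetical helper name.
 */
function my_has_search_term( array $args ): bool {
    return isset( $args['s'] ) && '' !== trim( (string) $args['s'] );
}

/**
 * Run the listing query, handing it to Relevanssi only when a
 * search term is present. Call this inside WordPress, where
 * WP_Query (and, if the plugin is active, relevanssi_do_query)
 * is defined.
 */
function my_run_listing_query( array $args ) {
    if ( function_exists( 'relevanssi_do_query' ) && my_has_search_term( $args ) ) {
        // A search is being performed: let Relevanssi fill the query.
        $query = new WP_Query();
        $query->parse_query( $args );
        relevanssi_do_query( $query );
        return $query;
    }

    // No search term (or no Relevanssi): plain WordPress query.
    return new WP_Query( $args );
}
```

    With the variables from the earlier code, usage would be $aoPosts = my_run_listing_query( $aSearchQueryArgs );, so the default (non-search) view of the page goes through a standard WP_Query.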

    Starting a new topic is a good idea, because I don’t always look this far down the list of threads; I’ll notice updates to a new thread better.

    cedmonds

    (@cedmonds)

    Arrrgh, that fixed it! That was a bone-headed oversight on my part, and I apologize that you had to take the time through all of this to point out something that ended up being so obvious. But again, thank you so much for doing so – very much appreciated!
