• Resolved ngreeves

    (@ngreeves)


    I have used Relevanssi ever since I moved my Chemistry web site to WordPress. I have noticed that it is unable to find pages when the title has HTML tags in it. For example, https://www.chemtube3d.com/ch4/ has the title Methane – CH<sub>4</sub>, as it is a chemical formula. Searching for “CH4” finds no pages.

    Is it possible to ignore HTML tags such as <sub> and <sup>, which are crucial for Chemistry, when indexing the pages?

    My users have searched for CH4, BF3, H2O, etc. and found no hits when such pages do exist on the web site.

Viewing 6 replies - 1 through 6 (of 6 total)
  • Plugin Author Mikko Saari

    (@msaari)

    Relevanssi does ignore the HTML tags. The problem is that Relevanssi does not remove the tags but replaces them with spaces. Thus CH<sub>4</sub> is indexed as CH 4 and not CH4.
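
    To make the difference concrete, here is a minimal sketch in plain PHP. It is not Relevanssi's actual code: both helper functions are hypothetical and only mimic the two approaches (tags replaced with spaces versus tags removed outright).

```php
<?php
// Hypothetical helpers, NOT part of Relevanssi; they only imitate the two behaviours.

// Replace every HTML tag with a space, roughly what the reply describes.
function tags_to_spaces( $text ) {
    return preg_replace( '/<[^>]*>/', ' ', $text );
}

// Remove only <sub> and <sup> tags, leaving no gap behind.
function strip_sub_sup( $text ) {
    return str_replace( array( '<sub>', '</sub>', '<sup>', '</sup>' ), '', $text );
}

$title = 'Methane – CH<sub>4</sub>';

echo tags_to_spaces( $title ), "\n"; // the formula comes out as "CH 4"
echo strip_sub_sup( $title ), "\n";  // the formula comes out as "CH4"
```

    With the first approach a search for “CH4” has nothing to match, because the index only contains the separate pieces “CH” and “4”.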

    Add this function to your site and rebuild the index:

    add_filter( 'relevanssi_post_title_before_tokenize', function( $title ) {
        // Remove the tags outright so that CH<sub>4</sub> is indexed as CH4.
        $title = str_replace( array( '<sub>', '</sub>', '<sup>', '</sup>' ), '', $title );
        return $title;
    } );

    This function removes all <sub> and <sup> tags from the titles before they are indexed.
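
    If some titles use uppercase tags or tags with attributes (an assumption; nothing in this thread says they do), a plain str_replace() would miss them. A possible variant using a regular expression might look like the sketch below. The function name chem_strip_sub_sup_tags is made up for illustration; the filter hook is the same one as above.

```php
<?php
// Hypothetical function name. The pattern matches opening and closing
// <sub>/<sup> tags, case-insensitively, with or without attributes.
function chem_strip_sub_sup_tags( $title ) {
    return preg_replace( '/<\/?su[bp]\b[^>]*>/i', '', $title );
}

// add_filter() only exists inside WordPress, so guard the call to keep
// this file loadable on its own (for example, when testing the function).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_post_title_before_tokenize', 'chem_strip_sub_sup_tags' );
}
```

    As with the original function, the index would need to be rebuilt before the change has any effect on search results.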

    Thread Starter ngreeves

    (@ngreeves)

    Thank you, that sounds ideal. Just to be clear, are you suggesting I add this to functions.php?

    Plugin Author Mikko Saari

    (@msaari)

    Yes, that’s a good place for it.

    Thread Starter ngreeves

    (@ngreeves)

    Many thanks, that is working as intended now.

    Use the <meta name="robots" content="noindex"> tag in the HTML head section to instruct search engines to ignore HTML tags when indexing.

    Plugin Author Mikko Saari

    (@msaari)

    That’s both incorrect and irrelevant here. The noindex directive tells search engines to ignore the whole page.

    Relevanssi ignores the robots tags, because it’s not a robot.

    • This reply was modified 2 years, 2 months ago by Mikko Saari.

The topic ‘Ignore HTML tags when indexing?’ is closed to new replies.