W3TC and Google XML Sitemaps – XML declaration error

  • Resolved Fritex

    (@fritex)


    Hello,

    Recently, after switching to PHP 7.4 (Nginx, php-fpm) on WordPress 5.5.3 with the latest versions of the W3 Total Cache and Google XML Sitemaps plugins, I encountered this error:

    This page contains the following errors:
    error on line 2 at column 6: XML declaration allowed only at the start of the document
    Below is a rendering of the page up to the first error.

    The problem was that the sitemap had somehow ended up in the page cache, where it was regenerated each time a new post was published.
    The cached version of sitemap.xml had an empty first line, with the <?xml …> declaration on the second line.
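
To check for this symptom without a browser, a minimal PHP sketch can test whether a response body starts with the XML declaration. The helper name `starts_with_xml_declaration` and the sample strings are made up for illustration; nothing here is W3TC or WordPress API.

```php
<?php
// Hypothetical helper (not part of W3TC or WordPress): reports whether a
// response body begins with the XML declaration, with nothing before it.
function starts_with_xml_declaration(string $body): bool
{
    return strncmp($body, '<?xml', 5) === 0;
}

// Sample bodies: one with a stray leading newline, one with it trimmed.
$cached = "\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset></urlset>";
$clean  = ltrim($cached);

var_dump(starts_with_xml_declaration($cached)); // leading newline: false
var_dump(starts_with_xml_declaration($clean));  // declaration first: true
```

A body that fails this check is the kind that produces the "XML declaration allowed only at the start of the document" error quoted above.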

    The options I had enabled were:
    – pgcache_accept_uri (it contained the sitemap entry)
    – pgcache__cache__nginx_handle_xml (was and still is checked)

    I searched through and modified a few .php files, but nothing changed and the error persisted.

    Later, I noticed the W3 Total Cache HTML comment at the end of the sitemap and asked myself whether the cache was the problem.

    So I went back to W3 Total Cache and added the following expressions:

    [a-z0-9_\-]*sitemap[a-z0-9_\-]*\.(xml|xsl|html)(\.gz)?
    ([a-z0-9_\-]*?)sitemap([a-z0-9_\-]*)?\.xml

    I added them to these Page Cache and Browser Cache fields:
    – pgcache_reject_uri
    – browsercache_no404wp_exceptions

    So, in the end:
    – pgcache_reject_uri and browsercache_no404wp_exceptions (both now contain `[a-z0-9_\-]*sitemap[a-z0-9_\-]*\.(xml|xsl|html)(\.gz)?` and `([a-z0-9_\-]*?)sitemap([a-z0-9_\-]*)?\.xml`, one per line)
    – pgcache_accept_uri (now empty)
    – pgcache__cache__nginx_handle_xml (checked)
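
A minimal sketch for trying the two expressions against sample request URIs. It assumes the field entries are applied as unanchored, case-insensitive regular expressions, which is an assumption and not confirmed in this thread; `uri_is_rejected` and the sample URIs are hypothetical.

```php
<?php
// The two expressions from the post, one per line as entered in the field.
$patterns = [
    '[a-z0-9_\-]*sitemap[a-z0-9_\-]*\.(xml|xsl|html)(\.gz)?',
    '([a-z0-9_\-]*?)sitemap([a-z0-9_\-]*)?\.xml',
];

// Hypothetical helper: true when any pattern matches the request URI.
// Assumption: unanchored, case-insensitive matching with "~" as delimiter.
function uri_is_rejected(string $uri, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match('~' . $pattern . '~i', $uri) === 1) {
            return true;
        }
    }
    return false;
}

foreach (['/sitemap.xml', '/sitemap-misc.xml.gz', '/googlenewsfeed/', '/feed/'] as $uri) {
    echo $uri, ' => ', uri_is_rejected($uri, $patterns) ? 'match' : 'no match', "\n";
}
```

Under that assumption, the sitemap URIs match, while /googlenewsfeed/ and /feed/ do not, so those URLs would need their own entries.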

    The sitemap now loads correctly with no error, although the cache HTML comment is still visible on the last line.

    The first line of the sitemap document is now the <?xml …> declaration; before, it was empty, which triggered the error.

    The sitemap was not indexed by Google News for the 5 days since the switch to PHP 7.4.

    I have all PHP extensions installed and enabled, so maybe the cause is W3TC or Google XML Sitemaps?

    In the meantime, before changing the W3TC options, I had also gone through the default WordPress “wp-*.php” files in the /web/ folder and, just in case, re-checked them for stray white space and blank lines, even though they were already clean.
    I restarted PHP 2-3 times and cleared the Page Cache, but the error was still there.

    Only after that did I notice the cache comment at the bottom of the sitemap, go to the W3TC options, and make the changes above to exclude the sitemap from caching.

    The error still persists on my custom page for Google News Publisher:
    https://www.racunalo.com/googlenewsfeed/

    I have also added “googlenewsfeed/” (without quotes) to the same fields, but the error remains on https://www.racunalo.com/googlenewsfeed/, which is my PHP script that generates a feed following the Google News Publisher guidelines.

    It no longer occurs on https://www.racunalo.com/sitemap.xml, which is generated by the Google XML Sitemaps plugin.

    There is also no error on https://www.racunalo.com/feed/, the default WordPress RSS feed.


Viewing 12 replies - 1 through 12 (of 12 total)
  • Thread Starter Fritex

    (@fritex)

    The code that generates and displays “googlenewsfeed” on the page “Google News Feed”, with the “Page template” selected, is here:
    https://pastebin.com/7ahHDmLN

    The code for google-news-sitemap.xsl is here:
    https://pastebin.com/urm7qFUN

    The new “Google News RSS Feed” for Google News Publisher is registered in functions.php with:

    // Register a new Google News RSS feed, because the guidelines need the
    // "full" description for each post and the content in the body.
    add_action('init', 'customRSS');
    function customRSS() {
        // Registers the feed name and the callback that renders it.
        add_feed('googlenewsfeed', 'customRSSFunc');
    }
    function customRSSFunc() {
        // Loads the theme template rss-googlenewsfeed.php, falling back to rss.php.
        get_template_part('rss', 'googlenewsfeed');
    }

    The code for the “rss.php” file, which generates and displays the “googlenewsfeed” RSS feed, is here:
    https://pastebin.com/8nm2MWLw

    Maybe I have to change the code, or add the “page” or the “feed” to a “do not cache” setting somewhere?

    Why this occurs on PHP 7.4 is the bigger question, because it was working fine on PHP 7.2 until 5 days ago, and it still works on another website that has been running PHP 7.3 for 2-3 months.

    The W3TC configuration (cache, redis, page cache, php, opcache, nginx …) and the general Linux and LEMP web server setup are exactly the same.

    • This reply was modified 1 year ago by Fritex. Reason: rss.php file

  • Thread Starter Fritex

    (@fritex)

    Also, under “pgcache__purge__feed__types” the following are checked:
    – rdf, rss, rss2, atom, googlenewsfeed (the new RSS feed registered in functions.php)

  • Thread Starter Fritex

    (@fritex)

    How can I exclude the URL https://www.racunalo.com/google-news-sitemap/ from the Page Cache and/or anywhere else it is cached?

    The “page_enhanced” folder contains “googlenewsfeed”, so is the feed being cached regardless of the value “googlenewsfeed/” in the “browsercache_no404wp_exceptions” and “pgcache_reject_uri” fields, and even though “googlenewsfeed” is checked for purging under “pgcache__purge__feed__types”?

    The 1st line is still empty when not logged in:
    view-source:https://www.racunalo.com/googlenewsfeed/

    While logged in, the 1st line starts correctly with <?xml and is not empty.

    Then again, this is a newly registered feed in functions.php, not a “page” …

    Or maybe my code for the new feed “google-news-sitemap/” is bad.

    Or maybe it is not the feed code itself: perhaps functions.php in my theme includes some file that has stray white space, and that file is loaded before the code that registers the new RSS feed “google-news-sitemap”?

    Not sure about it …

    • This reply was modified 1 year ago by Fritex. Reason: explanation
    • This reply was modified 1 year ago by Fritex. Reason: empty 1st line for public page cache, but not when logged in (1st line starts with <?xml)

  • Thread Starter Fritex

    (@fritex)

    Also, I have to mention that the site is behind the Cloudflare Pro package …

  • Thread Starter Fritex

    (@fritex)

    So, when not logged in, with all the options above:
    https://imgur.com/a/eSjPCuc

    When logged in to WordPress:
    https://imgur.com/a/zDGeDTa

    Difference:
    Not logged in to WordPress: an error because of the empty first line in the cached version served to all visitors, bots, and Google?
    VS
    Logged in to WordPress as user/admin: no error?

  • Thread Starter Fritex

    (@fritex)

    Also, the WordPress index.php already contains this code:

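    // Removes leading whitespace from $input when the response is text or XML
    // (or when no Content-Type header has been set at all).
    function ___wejns_wp_whitespace_fix($input) {
        $allowed = false; // Content-Type is text/*, XML, XHTML, Atom, or RSS
        $found = false;   // some Content-Type header is present
        foreach (headers_list() as $header) {
            if (preg_match("/^content-type:\\s+(text\\/|application\\/((xhtml|atom|rss)\\+xml|xml))/i", $header)) {
                $allowed = true;
            }
            if (preg_match("/^content-type:\\s+/i", $header)) {
                $found = true;
            }
        }
        if ($allowed || !$found) {
            // \A anchors at the very start, so only leading whitespace is stripped.
            return preg_replace("/\\A\\s*/m", "", $input);
        } else {
            return $input;
        }
    }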
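
A filter like the one above only takes effect if the page output actually passes through it. One common way to arrange that is an output buffer callback via ob_start(); the thread does not show how the function is wired in, so the sketch below is an assumption. `strip_leading_whitespace` is a simplified, hypothetical stand-in that skips the Content-Type check.

```php
<?php
// Simplified, hypothetical stand-in for the filter above: it always strips
// leading whitespace and does not inspect the Content-Type header.
function strip_leading_whitespace(string $buffer): string
{
    // Fall back to the untouched buffer if the regex engine reports an error.
    return preg_replace('/\A\s*/', '', $buffer) ?? $buffer;
}

// Buffer everything echoed from here on; PHP hands the buffered output
// to the callback when the buffer is flushed.
ob_start('strip_leading_whitespace');

echo "\n\n"; // stray blank lines, e.g. from an included file with trailing newlines
echo '<?xml version="1.0" encoding="UTF-8"?>', "\n", '<rss version="2.0"></rss>';

ob_end_flush(); // the emitted document now begins with the XML declaration
```

Note that a page cache serving a previously stored copy can bypass PHP entirely, in which case a buffer callback like this never runs for that request.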

  • Thread Starter Fritex

    (@fritex)

    Or maybe I should change something in the rss.php file:
    https://pastebin.com/8nm2MWLw

    The code extracted from the above link:

    <?php
    /**
    * Template Name: Google News Feed RSS
    */
    //$postCount = 50;
    //$posts = query_posts('category__not_in=1,46&showposts=' . $postCount);
    $args_gnr = array(
        'post_type'        => 'post',
        'post_status'      => 'publish',
        'category__not_in' => array(1),
        'posts_per_page'   => 50, // 'showposts' is a deprecated alias
        'orderby'          => 'date',
        'order'            => 'DESC',
    );
    $query_gnr = new WP_Query( $args_gnr );
    header('Content-Type: '.feed_content_type('rss-http').'; charset='.get_option('blog_charset'), true);
    echo '<?xml version="1.0" encoding="UTF-8"?>';
    ?>

    I moved the query code above a few lines further down …
    And added:

    define('DONOTCACHEPAGE', true);
    define('DONOTCACHEDB', true);
    define('DONOTCACHEOBJECT', true);

    So, the final modified rss.php for my custom “googlenewsfeed” looks like this:

    <?php
    /**
    * Template Name: Google News Feed RSS
    */
    define('DONOTCACHEPAGE', true);
    define('DONOTCACHEDB', true);
    define('DONOTCACHEOBJECT', true);
    header('Content-Type: '.feed_content_type('rss-http').'; charset='.get_option('blog_charset'), true);
    echo '<?xml version="1.0" encoding="UTF-8"?>';
    ?>
    <rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:media="http://search.yahoo.com/mrss/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    <?php do_action('rss2_ns'); ?>>
    ...
    ...
    <?php
    $args_gnr = array(
        'post_type'        => 'post',
        'post_status'      => 'publish',
        'category__not_in' => array(1),
        'posts_per_page'   => 50, // 'showposts' is a deprecated alias
        'orderby'          => 'date',
        'order'            => 'DESC',
    );
    $query_gnr = new WP_Query( $args_gnr );
    if($query_gnr->have_posts()): while($query_gnr->have_posts()): $query_gnr->the_post(); ?>
    <item>
    <title><?php the_title_rss(); ?></title>
    ...
    ...
    <?php rss_enclosure(); ?>
    <?php do_action('rss2_item'); ?>
    </item>
    <?php endwhile; endif; wp_reset_postdata(); ?>
    </channel>
    </rss>
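
If more than one template or plugin sets these flags, defining them unconditionally can trigger an "already defined" notice or warning. A minimal sketch of a guarded variant follows; `define_once` is a hypothetical helper, and the object-cache flag is spelled DONOTCACHEOBJECT here on the assumption that this is the name the cache checks.

```php
<?php
// Hypothetical helper: define a constant as true unless it already exists,
// so a second definition elsewhere does not raise a notice or warning.
function define_once(string $name): void
{
    if (!defined($name)) {
        define($name, true);
    }
}

// Flags asking the cache layers to skip this request (names as assumed above).
foreach (['DONOTCACHEPAGE', 'DONOTCACHEDB', 'DONOTCACHEOBJECT'] as $flag) {
    define_once($flag);
}
```

In a template, these calls would need to run before any output is sent, in the same position as the define() lines above.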
    • This reply was modified 1 year ago by Fritex. Reason: Working after disabled page cache with define() php

  • Thread Starter Fritex

    (@fritex)

    And after that modification, it works perfectly and validates:
    https://imgur.com/a/t5X4kvg

    But, as I said, the default sitemap.xml from the Google XML Sitemaps plugin had the same problem.
    After I changed the values in the fields a few replies above, the error disappeared.

    Kind of a strange situation … some bug or due to the hm …

    Never mind, maybe someone will find it helpful to disable the page cache and/or browser cache for sitemap.xml when using that plugin with W3TC.

  • Thread Starter Fritex

    (@fritex)

    However, I do not know whether the RSS is valid, but the default WordPress feed still has an empty 1st line:
    https://www.racunalo.com/feed/

    So I suppose I have to somehow modify the rss.php of the default WordPress?
    Or how can I exclude sitemaps completely from the Page, Database, Object, and Browser caches?

    • This reply was modified 1 year ago by Fritex.

  • Thread Starter Fritex

    (@fritex)

    Okay, I added “feed/” (without quotes) to the stated fields to exclude “feed/” from caching …

    Maybe /feed/ is not “registered” the way the sitemap.xml URL is, so W3TC does not know about it?

    Also, for https://www.racunalo.com/feed/, Firefox offers to download it as a file because the Content-Type HTTP header returned is “application/rss+xml; charset=UTF-8”.

  • Plugin Contributor Marko Vasiljevic

    (@vmarko)

    Hello @fritex

    First of all, thank you for all the details provided; I am happy to assist you with this.
    I hope I got it all, so please let me know if I missed something.
    When the option “Cache feeds: site, categories, tags, comments” is enabled in Performance>Page Cache>General, feeds and, in some cases, sitemaps are cached. You are not seeing the issue when logged in, most probably because the option “Don’t cache pages for logged in users” is enabled.
    That being said, once you excluded the feed and the sitemap from being cached, the issue went away.
    As for the application/rss+xml; charset=UTF-8, try enabling the option “Handle XML mime type” in Performance>Page Cache (the last option at the bottom of the page), which should return the correct Content-Type header for XML files (e.g., feeds and sitemaps).
    I hope this helps!
    Thanks!

  • Thread Starter Fritex

    (@fritex)

    Got it, thanks!

  • The topic ‘W3TC and Google XML Sitemaps – XML declaration error’ is closed to new replies.