• Resolved stofa

    (@stofa)


    Hi There,

    The following query :

    SELECT p.ID AS id, p.post_title AS title, p.post_author AS uid, p.post_date AS date FROM wp_popularpostssummary v LEFT JOIN wp_posts p ON v.postid = p.ID WHERE 1 = 1 AND p.post_type IN(‘post’) AND v.view_datetime > DATE_SUB(‘2018-06-11 16:06:05’, INTERVAL 6 DAY) AND p.post_password = ” AND p.post_status = ‘publish’ GROUP BY v.postid ORDER BY SUM(v.pageviews) DESC LIMIT 5 OFFSET 0

    When executed by a high number ( 100 ) concurrent connections creates a lot of problem.

    My suggestions are :

    1. Create one single record by postid ad date. For each a new visit, only un’update on pageviews column in done. In this way the group by and sum are not required.
    2. Enable caching on client side as default
    3. Enable sampling on client cache as default
    4. Hold only 1 month of data in the table as default.

    Thanks

    Stofa.

Viewing 12 replies - 1 through 12 (of 12 total)
  • Plugin Author Hector Cabrera

    (@hcabrera)

    Hi Stofa,

    Thanks for the suggestions.

    1. Create one single record by postid ad date.

    The plugin does this already. You can inspect the source code if you’re insterested.

    2. Enable caching on client side as default.

    3. Enable sampling on client cache as default.

    When executed by a high number ( 100 ) concurrent connections creates a lot of problem.

    The installation instructions explicitly recommend enabling Data Sampling and/or Caching on large / high-traffic sites.

    4. Hold only 1 month of data in the table as default.

    I can agree with this one. I’ll think about it.

    Thanks for your comments!

    Plugin Author Hector Cabrera

    (@hcabrera)

    The plugin does this already. You can inspect the source code if you’re insterested.

    I’ll correct myself: the plugin used to do this. Since version 4.0.0 however WPP stores each view separately to allow users selecting & displaying custom time ranges (eg. most viewed from the last hour). Reverting back to the old behavior (storing views count per post per day) won’t happen for obvious reasons.

    You can use an older version of the plugin if you prefer the old behavior for performance reasons, however know that I only provide support to people using the latest version of the plugin.

    Thread Starter stofa

    (@stofa)

    Hi Hector,

    Let me reopen this case to suggest a rewrite of the query :

    This is the current version :

    SELECT p.ID AS id, p.post_title AS title, p.post_author AS uid, SUM(v.pageviews) AS pageviews FROM wp_popularpostssummary v LEFT JOIN wp_posts p ON v.postid = p.ID WHERE 1 = 1 AND p.post_type IN(‘post’) AND v.view_datetime > DATE_SUB(‘2018-08-28 10:00:04’, INTERVAL 29 DAY) AND p.post_password = ” AND p.post_status = ‘publish’ GROUP BY v.postid ORDER BY pageviews DESC LIMIT 5 OFFSET 0;

    We can improve it by using a subquery in this mode :

    select p.ID AS id, p.post_title AS title, p.post_author AS uid, v.pageviews from wp_posts p,(select sum(pageviews) pageviews,postid from wp_popularpostssummary where view_datetime > DATE_SUB( ‘2018-08-28 10:00:04’, INTERVAL 29 DAY) group by postid) v where v.postid=p.ID AND p.post_type IN(‘post’) and p.post_password = ” AND p.post_status = ‘publish’ order by v.pageviews desc limit 5;

    Moreover, is it possible to remove the time part in the WHERE condition and use only the date part ?

    I mean, something like this :
    DATE_SUB( ‘2018-08-28 00:00:00’, INTERVAL 29 DAY)

    In this way, if the query cache is enabled on mysql server side, the result is cached and the query will be very fast.

    By using the condition whith the time part, the query is always different and then the cache is never used.

    Thanks

    Stofa.

    Plugin Author Hector Cabrera

    (@hcabrera)

    Hi Stofa, it’s been a while.

    We can improve it by using a subquery (…)

    I’m no DB expert but as far as I understand subqueries have been historically slower than using explicit JOINs, so unless there are reliable performance tests that indicate that this change is more performant than using a join then I may consider implementing it.

    Moreover, is it possible to remove the time part in the WHERE condition and use only the date part ?

    I’m sorry, but I’m going to have to decline this one. If you’re worried about performance, WordPress Popular Posts can cache query results (and if you have access to memcached on your server you’ll get even better results by installing the Memcached Object Cache plugin).

    Thread Starter stofa

    (@stofa)

    Hi Hector,

    Basically it is not a real subquery but a join between a table and inline query.
    I agree with you about the bad performance of a subquery against a join but this is true if for instance use a query like this :

    select …. from table1 where col1 is in ( select col2 from table2 );

    The two versions of the query executed on my live instance gave me the following response time :

    Current version :5 seconds

    Modified version : 900 ms.

    As the query takes a lot, many of our customers are disintalling the plugin in order to make wordpress working.

    Thanks

    Stofa.

    Plugin Author Hector Cabrera

    (@hcabrera)

    Out of curiosity, I ran these two queries on PHPMyAdmin 15 times each to see what the results were like and this is what I got:

    Original version:

    SELECT p.ID AS id, p.post_title AS title, p.post_author AS uid, SUM(v.pageviews) AS pageviews FROM wp_popularpostssummary v LEFT JOIN wp_posts p ON v.postid = p.ID WHERE 1 = 1 AND p.post_type IN('post') AND v.view_datetime > DATE_SUB('2018-08-28 10:00:04', INTERVAL 29 DAY) AND p.post_password = '' AND p.post_status = 'publish' GROUP BY v.postid ORDER BY pageviews DESC LIMIT 5 OFFSET 0

    Avg. response time: 0.0246 seconds.

    Your version:

    select p.ID AS id, p.post_title AS title, p.post_author AS uid, v.pageviews from wp_posts p,(select sum(pageviews) pageviews,postid from wp_popularpostssummary where view_datetime > DATE_SUB( '2018-08-28 10:00:04', INTERVAL 29 DAY) group by postid) v where v.postid=p.ID AND p.post_type IN('post') and p.post_password = '' AND p.post_status = 'publish' order by v.pageviews desc limit 5

    Avg. response time: 0.0245 seconds.

    • PHP 7.1.16
    • Database: 10.1.31-MariaDB
    • Server: Apache 2.4.33

    While this can’t be taken as a professional/reliable performance test, as you can see both queries are nearly identical in terms of performance.

    Thread Starter stofa

    (@stofa)

    Hi Hector,

    I think you shoukld disable query cache by putting SQL_NO_CACHE just after SELECT in the query.

    Believe me, i re-run the test on 5.5 and 5.6 Percona mysql server.
    This time, i did the test on a separate instances on a separate server used only for test purpose.

    The times are as i stated at the begin.

    5 seconds for the original query
    900 ms for the modified one.

    It’s up to you take the final decision but believe me with the new version plugin users will be very happy.

    Thanks.

    Stofa

    Plugin Author Hector Cabrera

    (@hcabrera)

    I think you shoukld disable query cache by putting SQL_NO_CACHE just after SELECT in the query.

    You’re right, good catch! I forgot that MySQL caches subsequent queries if the query is exactly the same as the first one (and of course, since WPP adds a datetime string to each query it’s basically the same as running it with SQL_NO_CACHE as it will never be cached by MySQL).

    Alright, after running these two queries with SQL_NO_CACHE on three different servers I can confirm that what you said is true: the modified query ran up to 50% faster than the original one. That’s quite the performance boost!

    Live website, low traffic (<100 pageviews per day):

    PHP 5.6.30
    Database: MySQL 5.6.39-cll-lve
    Server: cpsrvd 11.74.0.4

    • Original query: 0.0505 seconds
    • Modified query: 0.0261 seconds (~48% faster)

    localhost, no traffic (except by me):

    PHP 7.1.16
    Database: MariaDB 10.1.31
    Server: Apache 2.4.33

    • Original query: 0.0273 seconds
    • Modified query: 0.0260 seconds (~5% faster)

    Live website, low traffic (~1000 pageviews per day):

    PHP 7.0.31
    Database: 10.1.31-MariaDB-cll-lve

    • Original query: 0.0139 seconds
    • Modified query: 0.0021 seconds (~51% faster)

    Oddly enough, results were nearly identical on localhost while on the live sites the second query was always noticeable faster. Also, note that both queries ran in less than a second in all of my tests. I find it a bit weird that on your server the original query takes ~5 seconds while the modified one less than a second.

    Since it’s obvious that the modified query does run faster than the original one -and though I still want to run some more tests on different environments just to be safe- there’s a big chance I’ll implement this change.

    Thanks for contributing, Stofa!

    Plugin Author Hector Cabrera

    (@hcabrera)

    Hey @stofa,

    Would you mind helping test out this query?

    WPP can potentially need data from the comments table as well if the user enables the option to display both views count and the comments count within a given time range (eg. last 60 days). With some help from the good folks at Stack Overflow I got here:

    SELECT SQL_NO_CACHE id, post_title AS title, post_author AS uid, 
        (SELECT SUM(pageviews) 
         FROM wp_popularpostssummary v 
         WHERE v.postid = p.id AND 
               v.view_datetime > DATE_SUB('2018-08-16 18:34:46', INTERVAL 59 DAY)
        ) AS pageviews,
        (SELECT COUNT(comment_post_ID) 
         FROM wp_comments c 
         WHERE c.comment_post_ID = p.id AND
               c.comment_date_gmt > DATE_SUB('2018-08-16 18:34:46', INTERVAL 59 DAY) AND 
               c.comment_approved = 1
        ) AS comment_count
    FROM wp_posts p
    WHERE 1 = 1 AND p.post_type IN('post') AND p.post_password = '' AND p.post_status = 'publish'
    ORDER BY pageviews DESC 
    LIMIT 5 OFFSET 0;

    I tried including the comments in your proposed query but couldn’t do so due mainly to my lack of knowledge/experience on this field. Since it seems you know a bit more I figure I would ask.

    Thread Starter stofa

    (@stofa)

    Hi Hector,

    The query suggested by Stack Overflow folks is valid. To avoid some warnings, this filter in the WHERE condition : c.comment_approved = 1 should by changed in c.comment_approved = ‘1’ as the column comment_approved in wp_comments is declared as varchar.

    To include the comments in my original query, just add another subquery as follows :

    SELECT id, post_title AS title, post_author AS uid, v.pageviews, c.comment_count from wp_posts p, (SELECT SUM(pageviews) pageviews, postid FROM wp_popularpostssummary v WHERE v.view_datetime > DATE_SUB(‘2018-08-28 10:00:04’, INTERVAL 59 DAY) group by postid) v, (SELECT COUNT(comment_post_ID) comment_count, comment_post_ID FROM wp_comments c where c.comment_date_gmt > DATE_SUB(‘2018-08-28 10:00:04’, INTERVAL 59 DAY) AND c.comment_approved = ‘1’ group by comment_post_ID) c WHERE 1 = 1 AND p.post_type IN(‘post’) AND p.post_password = ” AND p.post_status = ‘publish’ and v.postid=p.id and p.id=c.comment_post_ID ORDER BY pageviews DESC LIMIT 5 OFFSET 0;

    Testing the two versions in my environment, it seems that my version is a little bit faster. But the first version ( Stack Overflow folks) is acceptable.

    Stofa.

    Plugin Author Hector Cabrera

    (@hcabrera)

    Hey Stofa,

    Thanks for the input! I actually rewrote your query exactly the same way you did above but the results returned by it were different from what the original query returns.

    The problem is that the above query excludes posts that don’t have any comments within the selected time range. The original query and the one from SO don’t have this problem, posts without comments aren’t excluded and their comments count is 0.

    But of course, right after writing these paragraphs just now I had sudden moment of clarity and rewrote the above query as below:

    SELECT ID AS id, post_title AS title, post_author AS uid, v.pageviews, IFNULL(c.comment_count, 0) AS comment_count 
    FROM wp_posts p INNER JOIN (SELECT SUM(pageviews) pageviews, postid FROM wp_popularpostssummary v WHERE v.view_datetime > DATE_SUB('2018-08-28 10:00:04', INTERVAL 59 DAY) GROUP BY postid) v ON p.ID = v.postid LEFT JOIN (SELECT COUNT(comment_post_ID) AS comment_count, comment_post_ID FROM wp_comments c WHERE c.comment_date_gmt > DATE_SUB('2018-08-28 10:00:04', INTERVAL 59 DAY) AND c.comment_approved = '1' GROUP BY comment_post_ID) c ON p.ID = c.comment_post_ID 
    WHERE 1 = 1 AND p.post_type IN('post') AND p.post_password = '' AND p.post_status = 'publish' ORDER BY pageviews DESC LIMIT 5 OFFSET 0;

    By using LEFT JOIN on the comments table, posts with views that don’t have comments are no longer being excluded from the results. And the query seems to be faster than the one from SO as well which is nice!

    Plugin Author Hector Cabrera

    (@hcabrera)

    Hey Stofa,

    Thought that you might want to know: #195.

    Also, you might be interested in this: Caching pageviews.

Viewing 12 replies - 1 through 12 (of 12 total)
  • The topic ‘Query performance’ is closed to new replies.