• I’m researching a plan to optimize WordPress SQL/database indexes for highly-trafficked sites, especially ones that have a lot of comments.

    The WP code and SQL has already been worked over and are great, of course. But I’m wondering if at HIGH traffic (300+ comments a day, 100K+ pageviews a day) it might not be worth squeezing even more efficient queries — even if it means more PHP code. It might not, I know. And if I can’t do it in a plugin, I will deem any efforts unworthy.

    If anyone has advice of things I should look at as I do this, let me know. I’ve search through this forum and the codex (might have missed something).

    =========

    So far, I’m trying out an index on “post_date” and “post_date_gmt” on wp_posts right now. I think that might be worth it.

    Also, in the interest of being flexible, classes.php builds queries that are slightly inefficient. Like the front page uses this query to get posts:


    SELECT DISTINCT * FROM wp_posts WHERE 1=1 AND post_date_gmt <= '2006-07-07 17:50:59' AND (post_status = "publish") AND post_status != "attachment" GROUP BY wp_posts.ID ORDER BY post_date DESC LIMIT 0, 10

    So I’ve hacked classes.php so it uses this:


    SELECT * FROM wp_posts WHERE 1=1 AND post_date_gmt <= '2006-07-07 17:47:59' AND (post_status = "publish") AND post_status != "attachment" GROUP BY post_date_gmt DESC ORDER BY post_date_gmt DESC LIMIT 0, 10

    It’s a bit faster. I’ll see if it’s enough to make it worth it. Ultimately I don’t want to hack classes.php, of course.

Viewing 3 replies - 1 through 3 (of 3 total)
  • if you don’t use future publishing, you can get rid of the date check (and actually, I thought they were going to add a new post status for future posts, so that could be eliminated completely…). I also don’t understand checking =”publish” AND !=”attachment” — if it’s publish, it’s inherently NOT attachment.

    I’d think that if the post ID is unique, I’m not sure what the DISTINCT >AND< the GROUP BY are needed for. Neither should be needed — and I’d probably stick with ID (vs gmt) if needed as it’s the primary key and I’d assume primaries are ‘more optimized’ (though no proof of that!).

    lastly, it’s been shown that query efficiency is less of an impact than caching systems — WP-Cache/Staticize make an order of magnitude (I think) impact on performance. And a next-gen cache-to-HTML system (yes, moving ‘backwards’ in minor ways) would be even better for huge sites to never touch PHP.

    Thread Starter willhines

    (@willhines)

    All good changes for the query. The trick is making them in a portable way so you don’t mess with WP’s WP_Query object, which builds these queries. I tried hacking WP_QUERY so it dropped the GROUP BY totally when it got posts in the loop, but that broke pagination (which re-uses the SQL string and substitutes in new SQL where the GROUP BY had been).

    I guess what I’m trying to do is optimize WP_Query via a plugin that would benefit someone who had put indexes in their posts and comments table.

    I have 700 categories and 40,000 comments. The things I’m working on might be worth it at this scale. Maybe not though!

    I agree WP-Cache makes a bigger difference than query optimizing. But I think whatever improvments I can make might be worth it anyway!

    right, well the comments table is a completely separate issue. 😉

    If the changes make sense, and don’t break things, you might want to try to get them committed to the core. Though they usually have reasons for things, the queries have been overhauled, and pulled apart, and put back together, across versions by multiple people — so there can be inefficiencies.

    -d

Viewing 3 replies - 1 through 3 (of 3 total)
  • The topic ‘optimizing SQL for high traffic; lots of comments’ is closed to new replies.