• The problem: Once a blog gets large enough, with a big enough archive, WordPress becomes a massive performance pig. The culprits are some of the queries used, particularly for the sidebar calendar and for building archive pages.

    There are queries that are written in such a way that they cannot use an index and require every row in the posts table to be processed. So if you have a blog with 10,000 records, for example, you have to process all 10,000 records. There are cases where queries like “COUNT_NUM_ROWS” are used, as well as cases that query the year and month as separate items against the date field, rather than setting a top and bottom date for a monthly list.
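    To make the "top and bottom" idea concrete, here is a minimal sketch. It is only an illustration: it uses Python with an in-memory SQLite database as a stand-in for MySQL, and the `posts` table, the `idx_post_date` index and the `month_bounds` helper are all made-up names, not WordPress code. SQLite's `strftime()` plays the role that `YEAR()` and `MONTH()` play in MySQL.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def month_bounds(year, month):
    """Return (start, end) strings for a month; end is the first instant of the next month."""
    start = datetime(year, month, 1)
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start.strftime(FMT), end.strftime(FMT)

# Hypothetical stand-in table, not the real wp_posts schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_date TEXT)")
conn.execute("CREATE INDEX idx_post_date ON posts (post_date)")

# 1,000 sample posts, one every 12 hours starting 2008-01-01.
first = datetime(2008, 1, 1)
conn.executemany(
    "INSERT INTO posts (post_date) VALUES (?)",
    [((first + timedelta(hours=12 * i)).strftime(FMT),) for i in range(1000)],
)

# Year and month as separate items: the column is wrapped in a function,
# so the index on post_date cannot narrow the search.
BY_PARTS = ("SELECT id FROM posts WHERE strftime('%Y', post_date) = ? "
            "AND strftime('%m', post_date) = ?")
# Top and bottom: a plain range on the indexed column.
BY_RANGE = "SELECT id FROM posts WHERE post_date >= ? AND post_date < ?"

def plan(sql, params):
    """Return SQLite's query plan description as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

parts_args = ("2008", "03")
range_args = month_bounds(2008, 3)

ids_by_parts = sorted(r[0] for r in conn.execute(BY_PARTS, parts_args))
ids_by_range = sorted(r[0] for r in conn.execute(BY_RANGE, range_args))

print(plan(BY_PARTS, parts_args))  # a full scan
print(plan(BY_RANGE, range_args))  # an index search
```

    Both forms return the same posts, but only the range form lets the database jump straight to the month in the index. How MySQL itself plans these queries would need to be checked with its own EXPLAIN on a real install.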

    Where this becomes a particular problem is on an older blog, say with 10,000 posts and 2 or 3 years of archives. Googlebot comes calling, does a near simultaneous request for 36 archive pages (1st page for each month for 3 years) and suddenly there are 36 queries lined up that have to treat every single row in a 10,000 record db to produce only 10 items on a page. It is a huge load on the system, and appears to all but lock everyone else out while it is happening. That means that a backlog of other queries piles up, and the server pretty much comes to a stop. 36 requests like that at the same time can drive server load well past 30 or 40.

    Worse, get someone with a caching remote fetch bot that gets all of the archive pages, and essentially you get server shutdown.

    Why have there been 5 or 6 complete interface redos, and yet the basic queries are still horrible?

Viewing 4 replies - 1 through 4 (of 4 total)
  • There is work being done on the archives queries at WordPress’s bug tracker.

    I’m not sure what queries you’re referring to with “COUNT_NUM_ROWS”, but trac would be the best place to post examples of slow queries and other performance issues, as it’s really a development/bug issue, not support.

    Also, if there’s really a search engine bot requesting every archive page simultaneously, that’s going to be a problem for almost any platform (see Cuil in mid-2008) and might need to be addressed separately.

    Thread Starter rawalex

    (@rawalex)

    I will check to see what I can find in the bug tracker.

    The answer “others have a problem too” isn’t really a good answer, especially when things can be fixed and made better. 🙂

    If you have particular optimisations to suggest then please do so as a new enhancement on the trac that filosofo has linked to. Some of the queries have been improved, but others I think get overlooked, and could do with an overhaul.

    There does tend to be an over-use of DISTINCT where it isn’t necessary, for one thing.

    Thread Starter rawalex

    (@rawalex)

    I found one trac ticket on this… check out this query, as an example:

    SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts WHERE 1=1 AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private') ORDER BY wp_posts.post_date DESC LIMIT 0, 6

    SQL_CALC_FOUND_ROWS makes for a horrible query: no matter what limit you put on it, MySQL has to consider every matching record in the DB to work out the total.
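    One commonly suggested alternative is to split the work in two: a plain LIMIT query for the page, plus a separate COUNT(*) for the total. The sketch below only illustrates that pattern. It runs against an in-memory SQLite database (which has no SQL_CALC_FOUND_ROWS at all), the cut-down `wp_posts` table, the `type_status_date` index and the `fetch_page` helper are assumptions for the example, and whether the split is actually faster on a given MySQL install depends on the indexes available.

```python
import sqlite3

# Cut-down stand-in for wp_posts with only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_type TEXT, "
    "post_status TEXT, post_date TEXT)"
)
# Hypothetical index covering the filter and the sort column.
conn.execute(
    "CREATE INDEX type_status_date ON wp_posts (post_type, post_status, post_date)"
)

# 30 sample posts, one per day, cycling publish / private / draft.
statuses = ["publish", "private", "draft"]
conn.executemany(
    "INSERT INTO wp_posts (ID, post_type, post_status, post_date) VALUES (?, ?, ?, ?)",
    [(i + 1, "post", statuses[i % 3], "2008-01-%02d 00:00:00" % (i + 1))
     for i in range(30)],
)

WHERE = "post_type = 'post' AND post_status IN ('publish', 'private')"

def fetch_page(offset, per_page):
    """Return (ids on this page, total matching posts) using two separate queries."""
    page = conn.execute(
        f"SELECT ID FROM wp_posts WHERE {WHERE} "
        "ORDER BY post_date DESC LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    total = conn.execute(f"SELECT COUNT(*) FROM wp_posts WHERE {WHERE}").fetchone()[0]
    return [row[0] for row in page], total

ids, total = fetch_page(0, 6)
print(ids, total)
```

    The page query can stop as soon as it has its 6 rows, and the count can be cached between requests, since it only changes when a post is published or removed.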

    Read it all here: http://trac.wordpress.org/ticket/7415

    The problem is that this has been floating around since July. WordPress has had two complete revisions of the admin panel in that time, but the underlying code keeps getting bumped and milestoned off to the future. This was milestoned for 2.7, now 2.8, and is likely going to end up as 3.0 soon enough.

    Testing it all out isn’t hard. Load up a sample WP install with 10,000 posts spread over a period of time. Log the execution time of every MySQL query. Click around, visit the archives, do searches, etc. The slow queries will show up very quickly indeed.
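    As a rough illustration of that kind of test, here is a small sketch that loads 10,000 sample rows and logs how long each query takes. It is not a WordPress or MySQL benchmark: it uses Python with an in-memory SQLite database, and the `posts` table, the `timed_query` helper and the three sample queries are all invented for the example.

```python
import sqlite3
import time
from datetime import datetime, timedelta

# Hypothetical stand-in table with 10,000 posts, one every 2 hours from 2006-01-01.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, post_date TEXT)")
start = datetime(2006, 1, 1)
conn.executemany(
    "INSERT INTO posts (title, post_date) VALUES (?, ?)",
    [(f"Post {i}", (start + timedelta(hours=2 * i)).strftime("%Y-%m-%d %H:%M:%S"))
     for i in range(10000)],
)

query_log = []  # (seconds, sql) for every query that goes through timed_query

def timed_query(sql, params=()):
    """Run a query, record its wall-clock time, and return all rows."""
    t0 = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    query_log.append((time.perf_counter() - t0, sql))
    return rows

# Stand-ins for "click around, visit the archives".
front_page = timed_query("SELECT id, title FROM posts ORDER BY post_date DESC LIMIT 10")
june_2007 = timed_query(
    "SELECT id FROM posts WHERE strftime('%Y', post_date) = ? "
    "AND strftime('%m', post_date) = ?",
    ("2007", "06"),
)
total = timed_query("SELECT COUNT(*) FROM posts")

# Slowest queries first.
for seconds, sql in sorted(query_log, reverse=True):
    print(f"{seconds * 1000:8.3f} ms  {sql}")
```

    On a real install the same idea applies: record the time for every query the pages issue, then sort by duration and start with the worst offenders.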

  • The topic ‘WordPress 2.x just doesn’t scale.’ is closed to new replies.