WordPress.org

Ready to get started?Download WordPress

Forums

is "SELECT DISTINCT *" ever a good idea? (9 posts)

  1. mykle
    Member
    Posted 7 years ago #

    Hi,

    we've been using the mysql slow-query log to analyze WP performance on our site, and we've figured out that some of our regular slow queries include this sort of SQL construct:

    SELECT DISTINCT * from (table, join, etc.) ...

    today i did some empirical tests and found that that DISTINCT clause was slowing us down significantly, while not changing the results we get. right now we're running without, in a "watch carefully" mode.

    now, i try not to understand things until i have to, or to any greater depth than necessary. =) but today i did the research, and it seems to me that to put the three strings together 'SELECT DISTINCT *' and hand them to SQL is probably a mistake.

    if i understand SQL right (and maybe i don't), if you're getting every single column back from every table in your possibly multi-table query (as specified by SELECT *), your results will never include duplicate rows. at least not in MySQL, in the way it's documented to behave. at least when you're selecting on tables that contain unique keys, as all the wp_tables do.

    so with no duplicate rows, the DISTINCT clause, at best, does nothing. but at worst -- in mysql at least -- it seems to cause the query to be kicked into a 'temporary table' aka a file, aka a massive slowdown.

    at least that's what we were seeing with the get-recent-comments plugin. we took out the DISTINCT and our performance problems literally vanished. so far we've seen no duplicate comments displayed. (but i'll let you know if we get bitten.)

    i would like the opinion of more wizened wp/sql hackers about this, because it appears that another use of 'SELECT DISTINCT *' is sitting right in the get_post function. is this a performance improvement waiting to be seized, or a dumb n00b idea? please advise.

    -mykle-

  2. TechGnome
    Moderator
    Posted 7 years ago #

    Ugh.... are you serious.... there's SELECT DISTINCT * code in there? If so, I may have to go spellunking though the code and eradicate them. In general, it's not a good idea to use DISTINCT anything.... unless absolutely necessary. As a matter of practice, I make sure that my where rules out as many dupes as possible.... in those rare caseswhere it's not sufficient... I find Group Bys to me more effective.

    -tg

  3. mykle
    Member
    Posted 7 years ago #

    i have a wp 2.2 install where i found it in wp-includes/functions.php ... i'm not sure how current that is. i did a find/exec/grep through every .php file looking for it, and that's the only one i found. but in the get-recent-comments plugin (in a different, 2.0 install) it's like night and day!

    having thought about it more, i can see how if you have duplicate rows in your table or one of the tables in your join, SELECT DISTINCT * could be useful. so i guess the answer to my question in the subject line is "occasionally". but for the cases where you join between tables that all have primary not-null keys, it should never be necessary.

    we are continuing to enjoy the significant performance boost that we got from removing that DISTINCT clause in just two spots in that plugin. we are using it on our front page, so it gets hit constantly.

  4. Samuel Wood (Otto)
    Tech Ninja
    Posted 7 years ago #

    There is only one use of SELECT DISTINCT * code in WordPress, and that is in the get_posts() function, where it's selecting the posts from the posts table.

    In that case, you could probably remove the DISTINCT without any issues.

  5. mrmist
    Forum Janitor
    Posted 6 years ago #

    DISTINCT within SQL is quite often mis-used, and will slow down SELECT jobs as it forces sorts in order to resolve duplicates.

    If you remove the DISTINCT from whichever bit of code you're looking at and suddenly experience problems with duplicate results or similar, you could always put it back again - but I'd say any piece of SQL that is relying on a DISTINCT to bring the correct row back is skating on thin ice anyway.

  6. Samuel Wood (Otto)
    Tech Ninja
    Posted 6 years ago #

    I removed the "DISTINCT" from my own wp-includes/post.php file last week, in the get_posts() function, and have not noticed any significant differences. The results all seem the same, and I can't see any obvious measurable speed increase, without further testing. Note that it's also using a GROUP BY the posts->ID, which is the primary key for that table. I'm not sure that's strictly needed either.

    There are valid uses of DISTINCT in WordPress to eliminate duplicate results, such as in the wp_get_archives() function, to get the various years/months that older posts were made on. These are limited cases though. The one in the get_posts() function does seem unnecessary, although there may be unusual cases that I'm missing.

    I put in a trac ticket for this one, to get that query optimized some more: http://trac.wordpress.org/ticket/4665

  7. Root
    Member
    Posted 6 years ago #

    I would be very interested to hear what Matt says about this. Apparently the mysql guy himself has been through all the db interactions in WP.

  8. sabxin
    Member
    Posted 6 years ago #

    Can some one help me out to create new user and database in MYSQL server

    Thanks,
    Saurabh

  9. Root
    Member
    Posted 6 years ago #

    @sabxin: You need to search. If that fails start a new thread please.

Topic Closed

This topic has been closed to new replies.

About this Topic