is “SELECT DISTINCT *” ever a good idea?
we’ve been using the mysql slow-query log to analyze WP performance on our site, and we’ve figured out that some of our regular slow queries include this sort of SQL construct:
SELECT DISTINCT * from (table, join, etc.) …
today i did some empirical tests and found that that DISTINCT clause was slowing us down significantly, while not changing the results we get. right now we’re running without, in a “watch carefully” mode.
now, i try not to understand things until i have to, or to any greater depth than necessary. =) but today i did the research, and it seems to me that to put the three strings together ‘SELECT DISTINCT *’ and hand them to SQL is probably a mistake.
if i understand SQL right (and maybe i don’t), if you’re getting every single column back from every table in your possibly multi-table query (as specified by SELECT *), your results will never include duplicate rows. at least not in MySQL, in the way it’s documented to behave. at least when you’re selecting on tables that contain unique keys, as all the wp_tables do.
so with no duplicate rows, the DISTINCT clause, at best, does nothing. but at worst — in mysql at least — it seems to cause the query to be kicked into a ‘temporary table’ aka a file, aka a massive slowdown.
at least that’s what we were seeing with the get-recent-comments plugin. we took out the DISTINCT and our performance problems literally vanished. so far we’ve seen no duplicate comments displayed. (but i’ll let you know if we get bitten.)
i would like the opinion of more wizened wp/sql hackers about this, because it appears that another use of ‘SELECT DISTINCT *’ is sitting right in the get_post function. is this a performance improvement waiting to be seized, or a dumb n00b idea? please advise.
- The topic ‘is “SELECT DISTINCT *” ever a good idea?’ is closed to new replies.