• substitute

    (@substitute)


    Hi

    I have my aggregator up and running but many duplicates or better said similar posts not real duplicates.
    I ‘d like to remove them.
    now i guess I should run SQL. Thing is I am having doubts:
    those posts are not real duplicates as all the rows are not identical,sometimes one post is reposted by the same author at different dates, hence the date-filed is not identical.
    Anyway, the only field that shows that posts are smilar is the the “post_exerpt”.
    SO I’d like to remove all the rows where post_exerpt are similar which is a not an SQL way of putting things I should be tracking DISTINCT in fact. Ok then select all rows where post_expert is distcinct and then delete the others?
    I’ve read the wp codex on databases and I do remember some of my SQL training 10 years ago…

    I would benefit some help here I gues….

Viewing 2 replies - 1 through 2 (of 2 total)
  • vtxyzzy

    (@vtxyzzy)

    Unless the excerpt fields are identical, I don’t think you will be able to do this. And, even if you can delete the posts, you will leave a lot or orphan records in other tables (postmeta for example). However, assuming the excerpt fields are identical, try this query:

    SELECT posts1.*
    FROM wp_posts posts1
    WHERE posts1.post_type = 'post'
    AND posts1.post_status = 'publish'
    AND posts1.post_excerpt IS NOT NULL
    AND posts1.post_excerpt <> ''
    AND posts1.ID <
       (SELECT max(posts2.ID)
       FROM wp_posts posts2
       WHERE posts2.post_excerpt = posts1.post_excerpt)

    Then if this selects the correct posts, and you still want to, change the query to a DELETE. BE SURE TO BACK UP FIRST!!!

    Thread Starter substitute

    (@substitute)

    Thanks!! But then if it creates orphan records in other databases how do you clean your db?
    Post_expert are identical indeed that’s what makes then similar records, the title can change, but the exerpt or even pots_content are the same most of the time. Sometimes though I get different titles and similar content, that’s a tricky one… such as those posts form the California Digital Library announcing trainings on different dates but with the same exerpt on APril 3rd….
    http://ace.dpwatch.com/2010/04

Viewing 2 replies - 1 through 2 (of 2 total)
  • The topic ‘SQL to remove similar posts’ is closed to new replies.