WordPress.org

Delete similar posts (13 posts)

  1. Gabismo
    Member
    Posted 7 years ago #

    Hi. Is there a plugin or SQL query to auto-delete duplicate or similar posts? How could I do it? Thank you.

  2. Gabismo
    Member
    Posted 7 years ago #

    I'd like to compare the post titles and delete posts whose titles share a certain number of similar words. Can anybody help me?
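
One way to make "a number of similar words" concrete is word overlap between titles. The sketch below is only an illustration in Python, not something posted in this thread: the function names, the 0.6 threshold, and the sample titles are all assumptions. It scores each pair of titles by the share of words they have in common (Jaccard similarity) and treats the lower ID as the original.

```python
import re


def title_words(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def similarity(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    wa, wb = title_words(a), title_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def find_similar(posts, threshold=0.6):
    """Return (kept_id, duplicate_id) pairs; the lower ID counts as the original."""
    pairs = []
    ordered = sorted(posts)  # (ID, title) tuples sort by ID first
    for i, (id_a, title_a) in enumerate(ordered):
        for id_b, title_b in ordered[i + 1:]:
            if similarity(title_a, title_b) >= threshold:
                pairs.append((id_a, id_b))
    return pairs


# Hypothetical sample data: (ID, post_title)
posts = [
    (1, "Local council approves new bridge"),
    (2, "Council approves new bridge downtown"),
    (3, "Weather forecast for the weekend"),
]
print(find_similar(posts))  # [(1, 2)]
```

Comparing every pair gets slow with thousands of posts, and a threshold that is too low will flag unrelated posts, so it is safer to review the pairs before deleting anything.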

  3. vkaryl
    Member
    Posted 7 years ago #

    There's probably a query you can run in sql. I however am not a programmer and have no facility with that at all. If no one can help you here within the next couple of days, you might try one of the evolt lists (probably http://lists.evolt.org/mailman/listinfo/thelist) or similar to see if someone there can help you write a query.

  4. h8dk97
    Member
    Posted 7 years ago #

    You can set up a cron job on Unix/Linux, or schedule a task on Windows, to regularly check your database for duplicates in the title, the message body, or both. The query to find duplicates for MySQL can be found here; just put it in a shell script and schedule it to run at regular intervals. How often will you have duplicate posts anyway?
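
The linked query is not preserved in this thread. A common pattern for finding exact duplicates is GROUP BY with HAVING COUNT(*) > 1, sketched below. This is an assumption about what such a query might look like, not the linked one: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, the wp_posts table is trimmed to three columns, and the rows are made up.

```python
import sqlite3

# In-memory stand-in for the WordPress database (assumption: trimmed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_content TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts (ID, post_title, post_content) VALUES (?, ?, ?)",
    [
        (1, "Bridge opens", "The bridge opened today."),
        (2, "Bridge opens", "The bridge opened today."),
        (3, "Market day", "Stalls open at nine."),
        (4, "Bridge opens", "The bridge opened today."),
    ],
)

# Group rows with identical title and content; keep groups with more than one row.
FIND_DUPLICATES = """
    SELECT post_title, MIN(ID) AS keep_id, COUNT(*) AS copies
    FROM wp_posts
    GROUP BY post_title, post_content
    HAVING COUNT(*) > 1
"""
duplicates = conn.execute(FIND_DUPLICATES).fetchall()
print(duplicates)  # [('Bridge opens', 1, 3)]
```

This only reports duplicates. Grouping on post_title alone would check titles only, and grouping on post_content alone would check bodies only.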

  5. Kafkaesqui

    Posted 7 years ago #

    I think what you're looking for is a lot more than a couple SQL clauses.

    The issue is not so much the query as the deleting part, which would have to include more than just the posts table. You also want to take into account the tables post2cat (for categories assigned to a post), postmeta (custom field and other 'meta' information about a post), and possibly comments.

    Additionally (as if you needed more!), if you've uploaded images or other files when editing the posts, there's a concern for attachments, which are special records in the posts table holding information about the files 'attached' to that post.
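
To illustrate why the delete touches more than one table, here is a minimal sketch that removes a post together with its post2cat, postmeta, comments, and attachment rows. It is an assumption-heavy illustration, not WordPress's own code: it runs against an in-memory SQLite database through Python's sqlite3, the schema is trimmed to the columns needed, delete_post_and_related is a hypothetical helper, and attachments are assumed to be rows in wp_posts whose post_parent points at the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_parent INTEGER DEFAULT 0);
    CREATE TABLE wp_post2cat (post_id INTEGER, category_id INTEGER);
    CREATE TABLE wp_postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
    CREATE TABLE wp_comments (comment_ID INTEGER PRIMARY KEY, comment_post_ID INTEGER);
    INSERT INTO wp_posts VALUES (1, 'Original', 0), (2, 'Duplicate', 0), (3, 'photo.jpg', 2);
    INSERT INTO wp_post2cat VALUES (1, 5), (2, 5);
    INSERT INTO wp_postmeta VALUES (2, 'source', 'feed');
    INSERT INTO wp_comments VALUES (1, 2), (2, 1);
""")


def delete_post_and_related(conn, post_id):
    """Delete one post plus its attachments, categories, meta, and comments."""
    with conn:  # one transaction: commit on success, roll back on error
        # Attachments are child rows in wp_posts; handle them with the parent.
        ids = [post_id] + [
            row[0]
            for row in conn.execute(
                "SELECT ID FROM wp_posts WHERE post_parent = ?", (post_id,)
            )
        ]
        for pid in ids:
            conn.execute("DELETE FROM wp_post2cat WHERE post_id = ?", (pid,))
            conn.execute("DELETE FROM wp_postmeta WHERE post_id = ?", (pid,))
            conn.execute("DELETE FROM wp_comments WHERE comment_post_ID = ?", (pid,))
            conn.execute("DELETE FROM wp_posts WHERE ID = ?", (pid,))


delete_post_and_related(conn, 2)
remaining = conn.execute("SELECT ID FROM wp_posts ORDER BY ID").fetchall()
print(remaining)  # [(1,)]
```

The sketch only removes database rows. Any uploaded files on disk would still need to be removed separately.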

  6. Chris_K
    Member
    Posted 7 years ago #

    ... and Kaf brings us 'round to the real question: Why?

    In other words, what's the problem statement? What are we trying to solve?

  7. interrupt
    Member
    Posted 7 years ago #

    Hey, maybe Gabismo is like me and is aggregating multiple local newspaper RSS sources. They may or may not include articles from Reuters and/or other larger news corps, and FeedWP is coming up with duplicates quite regularly. As for deleting posts and doing it cleanly, how about digging into one of the various codebases/trunks and searching for wp_delete_post() and the like? If you see a thing you want to do and WordPress already does it (e.g. in the admin panel), then the code is already written, so track it down, starting with the HTML page where you see the thing happening. I'm too 'lazy' to do this for anyone but myself.

    Duplicate posts can be systematically destroyed by doing fun things like [in robotron voice] CREATE TEMPORARY TABLE dupetable AS (SELECT * FROM mypostsTable )... DELETE * FROM mypostsTable WHERE GOD SQL DEEMS myUser WORTHY OF noMoreDupes... or something like that anyway

    The delete function may not be optimized for speed, but I use it for thousands of posts at a time. I wrote a plugin that can cut down 12,000-15,000 posts, with about 500 processed before the exaggeratedly large timeout that I set on my dev server for that task, averaging about 60-90 duplicates out of those thousands. /* what? */ Not sure if wp_delete_post() takes care of the attachments, but it will kill the post2cat entries nicely. I am sure you could delete from the attachments table where the ID is nonexistent without too big of a parallel SQL stain on your pants.
    FYI:
    I racked up all these RSS articles over four months and had to cut them down for relevance to certain search criteria. I had the duplicate search and destroy working, but then I made the rest of my SQL a little cleaner (I think), then the project lost priority and I didn't finish the dupe part.
    I would be happy to help more but as HandySolo has said that Kaf brought up, why do YOU want to delete duplicate posts today????
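
The temporary-table idea joked about above can be made concrete. The sketch below is one possible reading, not interrupt's plugin: it records the lowest ID of each (title, content) group in a temporary table, then deletes every post whose ID is not in that table. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL, and the table layout, the keep_ids name, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_content TEXT);
    INSERT INTO wp_posts VALUES
        (1, 'Bridge opens', 'The bridge opened today.'),
        (2, 'Bridge opens', 'The bridge opened today.'),
        (3, 'Market day', 'Stalls open at nine.'),
        (4, 'Bridge opens', 'The bridge opened today.');
""")

with conn:  # commit on success, roll back on error
    # Remember the oldest copy (lowest ID) of every distinct title/content pair.
    conn.execute("""
        CREATE TEMPORARY TABLE keep_ids AS
        SELECT MIN(ID) AS ID FROM wp_posts GROUP BY post_title, post_content
    """)
    # Delete every post that is not the remembered copy of its group.
    deleted = conn.execute(
        "DELETE FROM wp_posts WHERE ID NOT IN (SELECT ID FROM keep_ids)"
    ).rowcount
    conn.execute("DROP TABLE keep_ids")

survivors = [row[0] for row in conn.execute("SELECT ID FROM wp_posts ORDER BY ID")]
print(deleted, survivors)  # 2 [1, 3]
```

This touches only the posts table, so rows in related tables such as post2cat would still need cleaning up afterwards. Backing up the database before running any bulk delete is a sensible precaution.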

  8. Chris Burgess
    Member
    Posted 7 years ago #

    I'll confess my WHY: I was using mail filters, Postie (an email-to-post plugin), and WP 1.5 to track jobs, then reading the RSS feed in Lilina.

    And I got about 400 dupes, which I resolved with a couple of SQL queries as touched on above.

    I wrote up how I did this so that others who suffered the touch of post duplication might be saved.

    HOWTO remove / delete duplicate posts in wordpress 1.5

    I guess I'll have to update it for 2.0 and 2.1 now. If you modify the query for these, please paste it back here, and I can update the page.

  9. pizdin_dim
    Member
    Posted 7 years ago #

    I found this blog post on finding duplicate records, which looks simple. See if that helps.

  10. nolageek
    Member
    Posted 6 years ago #

    @xurizaemon

    Any idea if that works? :) I'm a little spooked to try it. :)

  11. whooami
    Member
    Posted 6 years ago #

    There's nothing wrong with that SQL query.

  12. nolageek
    Member
    Posted 6 years ago #

    I looked through the table and added a few columns that were missing from the query:

    post_type, post_mime_type, comment_count

    ----------------------
    CREATE TABLE wp_posts_NODUPLICATES
    SELECT DISTINCT
    MIN(ID) AS ID, post_author, MIN(post_date) AS post_date,
    MIN(post_date_gmt) AS post_date_gmt,
    post_content, post_title, post_category, post_excerpt,
    post_status, comment_status, ping_status,
    post_password, post_name, to_ping, pinged,
    MIN(post_modified) AS post_modified, MIN(post_modified_gmt) AS post_modified_gmt,
    post_content_filtered, post_parent, guid, menu_order, post_type, post_mime_type, comment_count
    FROM wp_posts
    GROUP BY post_content, post_title;
    ----------------------

    I ran it in phpMyAdmin and got no errors, except that afterwards it showed:

    No index defined!

    My stats from running FeedWordPress for a few days (and importing a LOT of content):

    I went from:

    Space usage
    Data 3,387 KiB
    Index 256,000 B
    Total 3,637 KiB

    Row Statistics
    Format dynamic
    Collation utf8_general_ci
    Rows 3,083
    Row length ø 1,124
    Row size ø 1,208 B
    Next Autoindex 3,194

    to:

    Space usage
    Data 2,176 KiB
    Index 1,024 B
    Total 2,177 KiB

    Row Statistics
    Format dynamic
    Collation latin1_swedish_ci
    Rows 2,367
    Row length ø 941
    Row size ø 942 B

    Is there a way to re-sync wp_posts with post2cat?
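
One reading of "re-sync" is to remove post2cat rows that point at posts which no longer exist. The sketch below shows that idea only, under several assumptions: the de-duplicated table has already replaced wp_posts, the post2cat table is named wp_post2cat with rel_id, post_id, and category_id columns, and an in-memory SQLite database (via Python's sqlite3) stands in for MySQL with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
    CREATE TABLE wp_post2cat (rel_id INTEGER PRIMARY KEY, post_id INTEGER, category_id INTEGER);
    INSERT INTO wp_posts VALUES (1, 'Kept post');
    INSERT INTO wp_post2cat VALUES (1, 1, 5), (2, 2, 5), (3, 7, 3);
""")

with conn:  # commit on success, roll back on error
    # Remove category links whose post_id has no matching row in wp_posts.
    removed = conn.execute(
        "DELETE FROM wp_post2cat WHERE post_id NOT IN (SELECT ID FROM wp_posts)"
    ).rowcount

left = conn.execute("SELECT post_id, category_id FROM wp_post2cat").fetchall()
print(removed, left)  # 2 [(1, 5)]
```

Running the same statement as a SELECT first shows which rows would be removed before anything is deleted.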

  13. nolageek
    Member
    Posted 6 years ago #

    bump

Topic Closed

This topic has been closed to new replies.
