Delete similar posts
I want to compare post titles and delete posts whose titles share a certain number of similar words. Can anybody help me?
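For what it's worth, the "number of similar words" test can be sketched outside the database. This is only a rough illustration, not a WordPress feature: the function names, the `min_shared` threshold of 3, and the sample titles are all made up for the example, and it assumes you have already pulled (ID, title) pairs out of your posts table.

```python
import re

def title_words(title):
    """Lowercase a title and return its set of words (letters and digits only)."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def find_similar_titles(posts, min_shared=3):
    """Return (keep_id, duplicate_id) pairs for posts whose titles share
    at least `min_shared` words. `posts` is a list of (id, title) tuples;
    the lower ID in each pair is treated as the one to keep."""
    pairs = []
    ordered = sorted(posts)
    for i, (id_a, title_a) in enumerate(ordered):
        words_a = title_words(title_a)
        for id_b, title_b in ordered[i + 1:]:
            if len(words_a & title_words(title_b)) >= min_shared:
                pairs.append((id_a, id_b))
    return pairs

# Hypothetical sample data: posts 1 and 2 share "city", "council", "new", "budget".
posts = [
    (1, "City council approves new budget"),
    (2, "New budget approved by city council"),
    (3, "Local team wins championship"),
]
print(find_similar_titles(posts))
```

This compares every pair of titles, so it gets slow with many thousands of posts, and common words like "the" count toward the threshold unless you filter them out first. Review the pairs before deleting anything.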
There’s probably a query you can run in SQL. I, however, am not a programmer and have no facility with that at all. If no one can help you here within the next couple of days, you might try one of the evolt lists (probably http://lists.evolt.org/mailman/listinfo/thelist) or similar to see if someone there can help you write a query.
You can set up a cron job on Unix/Linux or schedule a task on Windows to regularly check your database for duplicates, either in title or message body, or both. The query to find duplicates for MySQL can be found here; just put it in a shell script and schedule it to run at regular intervals. How often will you have duplicate posts anyway?
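As a rough idea of what such a duplicate-finding query looks like, here is a sketch. It is not the query linked above; it is the generic GROUP BY / HAVING pattern, run against a throwaway in-memory SQLite table so it can be tried safely. The cut-down `wp_posts` schema and the sample rows are assumptions for illustration, and the SQL syntax may need adjusting for your MySQL version.

```python
import sqlite3

# In-memory database with a simplified, assumed wp_posts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_content TEXT)")
conn.executemany(
    "INSERT INTO wp_posts (ID, post_title, post_content) VALUES (?, ?, ?)",
    [
        (1, "Storm warning", "Heavy rain expected."),
        (2, "Storm warning", "Heavy rain expected."),
        (3, "Council meets", "Agenda attached."),
        (4, "Storm warning", "Heavy rain expected."),
    ],
)

# Group identical title/content pairs and report only groups with more than one row.
FIND_DUPLICATES = """
    SELECT post_title, COUNT(*) AS copies, MIN(ID) AS keep_id
    FROM wp_posts
    GROUP BY post_title, post_content
    HAVING COUNT(*) > 1
"""
duplicates = conn.execute(FIND_DUPLICATES).fetchall()
print(duplicates)
```

This only reports duplicates; it deletes nothing, which makes it a reasonable first thing to schedule.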
I think what you’re looking for is a lot more than a couple SQL clauses.
The issue is not so much the query as the deleting part, which would have to include more than just the posts table. You also want to take into account the tables post2cat (for categories assigned to a post), postmeta (custom field and other ‘meta’ information about a post), and possibly comments.
Additionally (as if you needed more!), if you’ve uploaded images or other files when editing the posts, there’s a concern for attachments, which are special records in the posts table holding information about the files ‘attached’ to that post.
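To make the point concrete, here is a sketch of what a "clean" delete has to touch. It runs against an in-memory SQLite database with a heavily simplified schema; the column names (`post_id`, `comment_post_ID`, `post_parent`) follow my understanding of the WordPress tables of this era and should be checked against your own install. The helper `delete_post_cleanly` is a made-up name, not a WordPress function.

```python
import sqlite3

def delete_post_cleanly(conn, post_id):
    """Delete one post plus the rows that reference it, in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM wp_post2cat WHERE post_id = ?", (post_id,))
        conn.execute("DELETE FROM wp_postmeta WHERE post_id = ?", (post_id,))
        conn.execute("DELETE FROM wp_comments WHERE comment_post_ID = ?", (post_id,))
        # Attachments are rows in wp_posts whose post_parent points at the post.
        conn.execute("DELETE FROM wp_posts WHERE post_parent = ?", (post_id,))
        conn.execute("DELETE FROM wp_posts WHERE ID = ?", (post_id,))

# Assumed, simplified schema and sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_parent INTEGER DEFAULT 0);
    CREATE TABLE wp_post2cat (post_id INTEGER, category_id INTEGER);
    CREATE TABLE wp_postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
    CREATE TABLE wp_comments (comment_ID INTEGER PRIMARY KEY, comment_post_ID INTEGER);
    INSERT INTO wp_posts VALUES (1, 'Original', 0), (2, 'Duplicate', 0), (3, 'photo.jpg', 2);
    INSERT INTO wp_post2cat VALUES (1, 5), (2, 5);
    INSERT INTO wp_postmeta VALUES (2, 'source', 'feed');
    INSERT INTO wp_comments VALUES (10, 2);
""")

delete_post_cleanly(conn, 2)
remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("wp_posts", "wp_post2cat", "wp_postmeta", "wp_comments")
}
print(remaining)
```

Note that this only removes database rows. Any uploaded files on disk would still need to be removed separately.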
… and Kaf brings us ’round to the real question: Why?
In other words, what’s the problem statement? What are we trying to solve?
Hey, maybe Gabismo is like me and is aggregating multiple local newspaper RSS sources. They may or may not include articles from Reuters and/or other larger news corps, and FeedWP is coming up with duplicates quite regularly. As for deleting posts and doing it cleanly, how about breaking into one of the various codebases/trunks and searching for wp_delete_post() and the like? If you see a thing you want to do and WordPress already does it (e.g. in the admin panel), then there is code already written for it, so track it down, starting with the HTML page where you see the thing happening. I’m too ‘lazy’ to do this for anyone but myself.
Duplicate posts can be systematically destroyed by doing fun things like [in robotron voice] CREATE TEMPORARY TABLE dupetable AS (SELECT * FROM mypostsTable )… DELETE * FROM mypostsTable WHERE GOD SQL DEEMS myUser WORTHY OF noMoreDupes… or something like that anyway.
The delete function may not be optimized for speed, but I use it for thousands of posts at a time. I wrote a plugin that can cut down 12,000-15,000 posts, with about 500 processed before the exaggeratedly large timeout that I set on my dev server for that task, averaging about 60-90 duplicates out of those thousands. /* what? */ Not sure if wp_delete_post() takes care of the attachments, but it will kill the post2cat entries nicely. I am sure you could delete from the attachments table where the ID is nonexistent without too big of a parallel SQL stain on your pants.
I racked up all these rss articles over four months and had to cut them down for relevance to a certain search criteria. I had the duplicate search and destroy working but then I made the rest of my SQL a little cleaner (I think), then the project lost priority and I didn’t finish the dupe part.
I would be happy to help more but, as HandySolo said Kaf brought up, why do YOU want to delete duplicate posts today?
I’ll confess my WHY: I was using mail filters, Postie (email-to-post plugin) and WP 1.5 to track jobs, then reading the RSS feed in Lilina.
And I got about 400 dupes, which I resolved with a couple of SQL queries as touched on above.
I wrote up how I did this so that others who suffered the touch of post duplication might be saved.
I guess I’ll have to update it for 2.0 and 2.1 now. If you modify the query for these, please paste it back here, and I can update the page.
I found this blog post on finding duplicate records, which looks simple. See if that helps.
Any idea if that works? 🙂 I’m a little spooked to try it. 🙂
I looked through the tables and I added a couple that were missing:
----------------------
CREATE TABLE wp_posts_NODUPLICATES
SELECT DISTINCT
  MIN(ID) AS ID,
  post_author,
  MIN(post_date) AS post_date,
  MIN(post_date_gmt) AS post_date_gmt,
  post_content,
  post_title,
  post_category,
  post_excerpt,
  post_status,
  comment_status,
  ping_status,
  post_password,
  post_name,
  to_ping,
  pinged,
  MIN(post_modified) AS post_modified,
  MIN(post_modified_gmt) AS post_modified_gmt,
  post_content_filtered,
  post_parent,
  guid,
  menu_order,
  post_type,
  post_mime_type,
  comment_count
FROM wp_posts
GROUP BY post_content, post_title;
----------------------
I ran it in PHPMyAdmin and I got no errors – except afterwards:
No index defined!
My stats from running FeedWordPress for a few days (and importing a LOT of content):
I went from:
Data 3,387 KiB
Index 256,000 B
Total 3,637 KiB
Row length ø 1,124
Row size ø 1,208 B
Next Autoindex 3,194
to:
Data 2,176 KiB
Index 1,024 B
Total 2,177 KiB
Row length ø 941
Row size ø 942 B
Is there a way to re-sync wp_posts with post2cat?
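One possible approach, offered as a sketch rather than a tested fix: once the de-duplicated table has replaced wp_posts, delete the post2cat rows whose post no longer exists. The example below runs the idea on an in-memory SQLite database; the `wp_post2cat` table name and its `rel_id`, `post_id` and `category_id` columns are assumptions to verify against your own schema, and you should back up the real database before trying the equivalent statement in MySQL.

```python
import sqlite3

# Assumed, simplified tables: one surviving post and three category links,
# two of which point at posts that were removed as duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
    CREATE TABLE wp_post2cat (rel_id INTEGER PRIMARY KEY, post_id INTEGER, category_id INTEGER);
    INSERT INTO wp_posts VALUES (1, 'Kept post');
    INSERT INTO wp_post2cat VALUES (1, 1, 5), (2, 2, 5), (3, 7, 3);
""")

# Remove category links whose post_id has no matching row in wp_posts.
REMOVE_ORPHANS = """
    DELETE FROM wp_post2cat
    WHERE post_id NOT IN (SELECT ID FROM wp_posts)
"""
with conn:
    removed = conn.execute(REMOVE_ORPHANS).rowcount
rows = conn.execute("SELECT post_id, category_id FROM wp_post2cat").fetchall()
print(removed, rows)
```

The same pattern would apply to postmeta and comments if those also reference deleted posts.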