I need something that will scan all posts and find exact (or near-exact) duplicates.
Matching could be done on the title or on the post contents.
I would like to actually delete them. Right now I download the data, import it into MS Access, put it in a table that doesn't allow duplicates, and then re-upload it, which is labor intensive.
I have approximately 10% duplicate content due to the way my system collects data.
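
To illustrate the kind of check meant here, below is a minimal Python sketch. It assumes the posts can be exported as (id, content) pairs; the function names, the normalization rules, and the 0.9 similarity threshold are all made up for illustration and are not part of any existing tool. Exact duplicates are caught with a hash of the normalized text, and near duplicates with the standard library's difflib.SequenceMatcher.

```python
import hashlib
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(posts, threshold=0.9):
    """Return (duplicate_id, original_id) pairs.

    posts: iterable of (post_id, content) tuples; the first occurrence
    of each piece of content is treated as the original to keep.
    threshold: assumed similarity cutoff (0..1) for near duplicates.
    """
    seen_hashes = {}  # hash of normalized content -> id of the kept post
    kept = []         # (post_id, normalized content) of posts kept so far
    duplicates = []

    for post_id, content in posts:
        norm = normalize(content)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()

        # Exact duplicate (after normalization): cheap hash lookup.
        if digest in seen_hashes:
            duplicates.append((post_id, seen_hashes[digest]))
            continue

        # Near duplicate: compare against every post kept so far.
        match = None
        for kept_id, kept_norm in kept:
            if SequenceMatcher(None, norm, kept_norm).ratio() >= threshold:
                match = kept_id
                break

        if match is not None:
            duplicates.append((post_id, match))
        else:
            seen_hashes[digest] = post_id
            kept.append((post_id, norm))

    return duplicates


posts = [
    (1, "Hello world, this is a post."),
    (2, "hello   world, this is a post."),   # same text after normalization
    (3, "Hello world, this is a post!"),     # differs by one character
    (4, "Something entirely different."),
]
print(find_duplicates(posts))  # [(2, 1), (3, 1)]
```

The near-duplicate pass compares each post against every post kept so far, so it would get slow on a large number of posts. The same function could be run on titles instead of contents. The actual delete step is left out because it depends on how the posts are stored.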