This has been bugging me for a while, and it has finally moved me to write. I mainly notice it because I link back to my older posts a lot, and when I do, the link is often in the first sentence of the new post: "Remember the thing-I-wrote-about? Well, here's what's new..." The excerpt WordPress chooses when it receives a pingback works OK if the link is in the middle of a paragraph, but it's terrible if the link is at the beginning of the post, particularly on Kubrick or Kubrick-derived themes.
All too often I get something like "[...] Previous Post Title This Post Title Half a sentence including a link and then a couple of wor [...]"
More importantly, it seems to break by character without regard for entities, so I often end up with things like "[...] 212;as you know from reading this—it's an example [...]" where an entity has been split in half.
In either case, I end up going in and editing the comment.
So I would suggest the following:
- Break the excerpt at word boundaries.
- Avoid breaking in the middle of a character entity.
- Try to figure out where the actual content begins, in case the link is near the edge. Maybe by looking for large areas of whitespace after stripping HTML, or looking for h1...h6 blocks near the link.
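
To make the first two suggestions concrete, here is a minimal sketch in Python (not WordPress's actual code, which I haven't checked). The function name `trim_excerpt` and its `start`/`end` parameters are made up for illustration: they stand for the character window that would otherwise be cut blindly. It assumes the HTML tags have already been stripped. Since an entity like `&#8212;` contains no whitespace, snapping both ends of the window to whitespace also keeps entities whole.

```python
def trim_excerpt(text, start, end):
    """Return roughly text[start:end], shrunk so that it begins and ends
    on whitespace boundaries. Hypothetical helper, for illustration only."""
    # If the window starts mid-token, skip forward to the next whitespace.
    if start > 0 and not text[start - 1].isspace():
        while start < end and not text[start].isspace():
            start += 1
    # If the window ends mid-token, back up to the previous whitespace.
    if end < len(text) and not text[end].isspace():
        while end > start and not text[end - 1].isspace():
            end -= 1
    return text[start:end].strip()


text = "alpha beta&#8212;gamma delta epsilon"
naive = text[13:32]                    # blind character cut splits the entity
trimmed = trim_excerpt(text, 13, 32)   # same window, snapped to whole words
```

The trade-off is that the excerpt comes out a little shorter than the requested window, and a window that falls entirely inside one very long token would come back empty, so a real implementation would probably want a fallback for that case.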