WordPress.org

Looking for a program that could scan number of comments, number of words, etc. (6 posts)

  1. baal666
    Member
    Posted 5 years ago #

    Hi,

    I'm looking for a way to "scan" blogs (any blogs, whatever the format) and get results for each entry for number of comments, number of words and number of words in comments. Can that be done? If yes, is it complicated?

    Thanks

  2. Samuel Wood (Otto)
    Tech Ninja
    Posted 5 years ago #

    I would say that it's pretty much impossible.

    You're asking for a generalized solution to "scan" any given randomly designed site, determine what is a "post" and what is a "comment", and then count them up. Counting is easy; computers are good at it. Determining the type of a piece of content is very difficult; computers are very bad at it.

    Computers count. They don't make judgment calls.

  3. baal666
    Member
    Posted 5 years ago #

    Hi Otto42,

    Thanks for your reply!

    What you mean, if I understand correctly, is that if someone wants to judge blogs based on the number of comments and the length of them, he would have to do it manually. Is this right? What a long job it must be!

  4. s_ha_dum
    Member
    Posted 5 years ago #

    Best case, you would have to customize the scan for each of the blog engines you wish to incorporate... and hope that there is something reliable that you can search for to ID the various content types.

  5. ardgedee
    Member
    Posted 5 years ago #

    It's something that could be written relatively easily for a subset of blogs: WordPress blog templates frequently have similar markup and selectors (posts are usually contained within a DIV with class "post", comments are usually contained within an OL with class "commentlist"), so your script could scan the contents of the post area being reasonably certain what the boundaries of that post area are.

    But there are many WordPress blog templates that don't follow the common markup, and the script would fail on them. And almost any Movable Type, Blogger, Drupal, etc. website could be counted on not to share that markup either.

    At best, without making a career of it, you could write something that could scan some subset of sites. But you couldn't scan any arbitrary site without investigating the site first.
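
    The WordPress-only case described above can be sketched roughly as follows. This is a minimal illustration, not a finished scanner: it assumes the theme really does wrap posts in a DIV with class "post" and comments in an OL with class "commentlist", and it only counts top-level comments (LI elements directly inside that OL). The names BlogCounter, count_blog_page and SAMPLE are made up for this example, and it uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogCounter(HTMLParser):
    """Counts words in posts and comments, assuming common WordPress markup."""

    def __init__(self):
        super().__init__()
        self.stack = []          # open elements as (tag, region label or None)
        self.post_words = 0
        self.comment_words = 0
        self.comment_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        label = None
        if tag == "div" and "post" in classes:
            label = "post"        # assumption: posts live in <div class="post">
        elif tag == "ol" and "commentlist" in classes:
            label = "comments"    # assumption: <ol class="commentlist">
        elif tag == "li" and self.stack and self.stack[-1] == ("ol", "comments"):
            self.comment_count += 1   # one top-level comment
        self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        words = len(data.split())
        # The innermost labelled ancestor decides where the text is counted.
        region = next((lab for _, lab in reversed(self.stack) if lab), None)
        if region == "post":
            self.post_words += words
        elif region == "comments":
            self.comment_words += words


def count_blog_page(html):
    """Return word and comment counts for one page of HTML."""
    parser = BlogCounter()
    parser.feed(html)
    parser.close()
    return {"post_words": parser.post_words,
            "comments": parser.comment_count,
            "comment_words": parser.comment_words}


SAMPLE = """
<div class="post"><h2>Hello world</h2><p>This is my first post.</p></div>
<ol class="commentlist">
  <li><p>Nice post!</p></li>
  <li><p>Thanks for sharing this.</p></li>
</ol>
"""

if __name__ == "__main__":
    print(count_blog_page(SAMPLE))
```

    On a page that does not use those class names, every count simply comes back as zero, which is the failure described above: the script has no way to tell that it missed the content. Supporting another theme or blog engine would mean inspecting its markup first and adding matching rules for it.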

  6. baal666
    Member
    Posted 5 years ago #

    Thanks for sharing your thoughts. I see now that this is complex and difficult to do.

Topic Closed

This topic has been closed to new replies.
