WordPress.org


Forums

FEATURE REQUEST: 410 Deleted message instead of 404 Not Found (23 posts)

  1. maerk
    Member
    Posted 8 years ago

    I recently deleted a lot of posts, and it would have been really cool if, when these posts are requested, WordPress could return a "Post Deleted" message instead of "Not Found" (with the right HTTP status code sent, naturally). That way people would know what's happening, and search engines would be better able to update their indexes.

  2. pizdin_dim
    Member
    Posted 8 years ago

    According to RFC2616, an HTTP return status of "410 Gone" means:

    "This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead."

    Going by that definition, isn't the "404 Not Found" actually correct? In order to determine whether a post can be displayed (whether it exists or not) WP uses the function get_posts() to determine the various "criteria", of which one rule is:

    Only those posts with a status of "publish" are contenders, and all others should rightfully be discarded.

    Q: So then what's the difference between (1) a post that doesn't actually exist in the database, (2) one that does but is not yet published and (3) one that has been published in the past but then had its published status revoked?

    A: Nothing.

    I think that's why "404 Not found" is actually correct as opposed to "410 Gone". The former is temporary while the latter is permanent.
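The RFC 2616 rule quoted above can be written as a small decision function. A minimal Python sketch; the two flags are hypothetical inputs, since WordPress itself does not record whether a missing post is permanently gone:

```python
from http import HTTPStatus


def status_for_request(exists: bool, known_permanently_gone: bool) -> HTTPStatus:
    """Pick a status code following the 410 rule in RFC 2616.

    exists: the resource can be served right now (hypothetical flag).
    known_permanently_gone: the server knows the resource was removed
        for good (hypothetical flag; WordPress does not track this).
    """
    if exists:
        return HTTPStatus.OK
    if known_permanently_gone:
        # The server knows the condition is permanent, so 410 is allowed.
        return HTTPStatus.GONE
    # The server cannot tell whether the condition is permanent,
    # so the RFC says 404 SHOULD be used instead.
    return HTTPStatus.NOT_FOUND


print(int(status_for_request(exists=False, known_permanently_gone=False)))  # 404
```

The whole disagreement is about the second flag: without some record of deletions, it is always false and the function can only ever return 200 or 404.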

  3. maerk
    Member
    Posted 8 years ago

    Deleting a post is a permanent removal of a resource.

    You wrote:

    Q: So then what's the difference between (1) a post that doesn't actually exist in the database, (2) one that does but is not yet published and (3) one that has been published in the past but then had its published status revoked?

    A: Nothing.

    Your answer is wrong on two counts. Firstly, from the point of view of a user requesting a post, there is clearly a difference between not being able to find a post because it doesn't exist, and not being able to find a post because it once existed but has now been deleted.

    Secondly, the standard you quoted does not intend 404 to be used for situations where a resource is temporarily missing. It's to be used when a resource can't be found, whether that's temporary or not. 410 is intended for use when the server knows that the resource has been deleted.

    Also, why would WP's current mechanism for retrieving posts have anything to do with which status code is returned? Just because get_posts() currently only operates on posts which are in the database doesn't mean that it couldn't be extended to check whether, for instance, posts have a label of "deleted". It already checks whether they exist; why couldn't it check whether they had existed in the past?

  4. pizdin_dim
    Member
    Posted 8 years ago

    "Your answer is wrong on two counts. Firstly, from the point of view of a user requesting a post, there is clearly a difference between not being able to find a post because it doesn't exist, and not being able to find a post because it once existed but has now been deleted."

    Not at all. That's just your interpretation. My answer is no more wrong than your suggestion.

    Can you explain just how there is a difference between the two? The way I see it, the two are identical because either way the post doesn't exist at this point in time. That doesn't mean that it didn't exist in the past, and it doesn't mean that it won't exist in the future.

    "Secondly, the standard you quoted does not intend 404 to be used for situations where a resource is temporarily missing. It's to be used when a resource can't be found, whether that's temporary or not. 410 is intended for use when the server knows that the resource has been deleted."

    Not quite. Read this bit again:

    "If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead."

    There are three scenarios from WP's point of view:

    1. the post never existed
    2. it still exists but its status isn't "published", or it isn't marked for "public" viewing
    3. it once existed but has been deleted

    No matter which scenario you pick the conclusion is exactly the same: the condition is not permanent because it can be rectified by an authorised human with a few minutes to spare.

    Good luck getting WP changed to suit your way of thinking though. Something tells me you'll have to hack it if you want it to be the way you like.

    And that's your choice.

  5. maerk
    Member
    Posted 8 years ago

    I still think you're wrong, and I don't think it's a question of interpretation.

    Can you explain just how there is a difference between the two? The way I see it, the two are identical because either way the post doesn't exist at this point in time.

    It's different because of why the post doesn't exist. If you accessed a page that you had been on before, and you got a 404, you might try looking for it, thinking the address had changed or something. If you got a 410, you would know that the page had been deleted. While a 404 would technically be valid, a 410 would be more useful, and more appropriate.

    ... the condition is not permanent because it can be rectified by an authorised human with a few minutes to spare.

    While that's a conceivable scenario, it seems unlikely. If I deleted a post, why would I bring it back? If I wanted to make edits to it, I would set its status to draft, and then a 404 would be correct, but if I purposefully delete it, a 410 would be much more appropriate.

  6. IIIIIIIV
    Member
    Posted 8 years ago

    410 is usually used where someone has had stuff hosted, such as a student on a university account, and they have moved and taken it with them and the host has no information on where they've gone.

    It's basically a "return to sender, address unknown" code.

    If *you* delete something from your system, *your* system mind you, not somebody else's, that is, a domain you control as opposed to just a /~ account, then it's a 404, because it isn't technically "gone": you *know* where it went.

    I think that's why "404 Not found" is actually correct as opposed to "410 Gone". The former is temporary while the latter is permanent.

    They are both permanent conditions; in fact, you could argue 410 is the more temporary, as it's expected the host will remove the code sooner rather than later, once search engines and friends of the "gone missing" get the idea that the resource is gone. It'd then become a 404 if anyone or anything tried accessing the resource.

    So, 410 makes little to no sense on a website/domain controlled by one person/entity who knows where stuff has gone.

  7. maerk
    Member
    Posted 8 years ago

    Check out the W3C HTTP 1.1 status code definitions, quoted below:

    The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.

    To summarise: the 410 code marks the resource as intentionally deleted. It's an optional code, but it makes perfect sense to use it on your own domain since it gives important information to search engines and the like.

  8. IIIIIIIV
    Member
    Posted 8 years ago

    OK then, you're still stuck with the logistical problem of how to send a 410 header for something the server no longer has a record of. 410s need to be explicitly set, and how are you going to do that with a deleted, dynamically generated URL?

    If you use "Redirect gone /blog/03/26/whatever", it won't work unless you manually mirror each and every post you intend to trash into real, canonical filenames, because technically they do not exist until the page is requested and PHP and MySQL do their thing.

    Definitely in the too hard basket, if you ask me.
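If the deleted permalinks are known, the per-post "Redirect gone" lines at least don't have to be typed by hand. A rough Python sketch that emits one Apache mod_alias directive per path; the paths are hypothetical examples, and it assumes your permalinks are plain absolute URL paths without spaces:

```python
def redirect_gone_lines(deleted_paths):
    """Return one Apache mod_alias 'Redirect gone' line per deleted path.

    Paths are assumed to be absolute URL paths such as '/blog/03/26/whatever'.
    Duplicates are dropped and the output is sorted so the file diffs cleanly.
    """
    lines = []
    for path in sorted(set(deleted_paths)):
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute URL path, got {path!r}")
        # With the 'gone' status, mod_alias takes only the URL path, no target.
        lines.append(f"Redirect gone {path}")
    return lines


# Hypothetical permalinks of posts that were deleted on purpose.
deleted = ["/blog/03/26/whatever", "/blog/04/02/old-announcement"]
print("\n".join(redirect_gone_lines(deleted)))
```

This still needs someone, or something, to keep the list of deleted paths, which is exactly the bookkeeping in question.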

  9. pizdin_dim
    Member
    Posted 8 years ago

    "I still think you're wrong, and I don't think it's a question of interpretation."

    I think you're wrong so that makes us about even, at least from someone else's point of view.

    Seriously though, it's not about what's likely or not, it's about how stuff happens. People delete and reinstate content all the time. Why would you want to obey any rule that says you can't bring something back once it's been deleted? That just seems a ridiculous imposition to make.

    Read RFC2616 again, especially the opening paragraph:

    http://rfc.net/rfc2616.html#p6

    "The requested resource is no longer available at the server and no forwarding address is known."

    See what it says? No longer available. Which means it had to have been there in the first place. So how do you propose WP is to know that exactly? How can it tell the difference between "once there" and "never there"?

    It can't.

  10. maerk
    Member
    Posted 8 years ago

    WP doesn't currently keep a log of deleted posts, as far as I know, but that doesn't mean it couldn't do it. In fact, that would be really easy.

    Before Akismet, when you marked a comment as spam, it appeared to have been deleted forever, but in actuality it was moved into another table that wasn't accessible without a plugin of some kind.

    It wouldn't even be that hard to do something similar for deleted posts. You'd only need to store the page slug and ID in a separate table.
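A minimal sketch of that separate table, using Python's built-in sqlite3 rather than WordPress's MySQL schema; the table and column names (`posts`, `deleted_posts`, `post_id`, `slug`) and the helper functions are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, status TEXT);
    -- Hypothetical tombstone table: one row per deliberately deleted post.
    CREATE TABLE deleted_posts (post_id INTEGER PRIMARY KEY, slug TEXT UNIQUE);
    """
)


def delete_post(post_id: int) -> None:
    """Delete a post, but remember its ID and slug first."""
    with conn:  # one transaction: both statements succeed or neither does
        conn.execute(
            "INSERT OR REPLACE INTO deleted_posts (post_id, slug) "
            "SELECT id, slug FROM posts WHERE id = ?",
            (post_id,),
        )
        conn.execute("DELETE FROM posts WHERE id = ?", (post_id,))


def status_for_slug(slug: str) -> int:
    """200 for a published post, 410 for a remembered deletion, else 404."""
    row = conn.execute("SELECT status FROM posts WHERE slug = ?", (slug,)).fetchone()
    if row is not None and row[0] == "publish":
        return 200
    if conn.execute("SELECT 1 FROM deleted_posts WHERE slug = ?", (slug,)).fetchone():
        return 410
    return 404


conn.executemany(
    "INSERT INTO posts (id, slug, status) VALUES (?, ?, ?)",
    [(1, "hello-world", "publish"), (2, "old-news", "publish")],
)
delete_post(2)
print(status_for_slug("hello-world"), status_for_slug("old-news"), status_for_slug("never-existed"))
# 200 410 404
```

Checking the live table first means a slug reused by a new post is served normally, even if an older post with the same slug was deleted.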

  11. IIIIIIIV
    Member
    Posted 8 years ago

    Why bother though? Seems to me you're going to the ends of the earth in the name of semantics and minutiae. Of course, you're free to do that...

  12. maerk
    Member
    Posted 8 years ago

    Fair point, just seems like a sensible idea. I don't know if search engines react differently to 404s and 410s, but it seems that they might.

  13. pizdin_dim
    Member
    Posted 8 years ago

    "WP doesn't currently keep a log of deleted posts, as far as I know, but that doesn't mean it couldn't do it. In fact, that would be really easy."

    Easy? No, not really. It's complicated by the fact that you would also need to allow for any posts where you change the "slug", assuming you're using that as your permalink identifier. So your extra table, where you keep a record of posts which have been deleted, is insufficient to determine whether it's the equivalent of a 404 or a 410 condition.

    The counter-argument I see is that one should never change the "slug", but if you're gonna say that, you must also insist that the slug textbox in admin always remain "readonly" for existing posts.

    Which kinda brings us back to what I was saying in the first place. How can you clearly differentiate between the various possibilities:
    1. does it exist?
    2. did it once exist?
    3. does it exist but the "slug" has changed?
    4. is it private?
    5. is it draft?
    6. is it published at a future date?
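To show what handling slug changes would involve, here is a rough Python sketch that works through the cases in that list. It assumes two hypothetical records WordPress did not keep at the time: a history mapping old slugs to post IDs, and a set of slugs of deleted posts. An old slug is answered with a 301 pointing at the current one:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Set, Tuple


@dataclass
class Post:
    id: int
    slug: str
    status: str  # e.g. "publish", "draft", "private"
    published_at: datetime


def resolve(
    slug: str,
    posts: Dict[int, Post],
    old_slugs: Dict[str, int],  # hypothetical: former slug -> post ID
    deleted_slugs: Set[str],    # hypothetical: slugs of deleted posts
    now: datetime,
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect slug or None) for a requested slug."""
    post = next((p for p in posts.values() if p.slug == slug), None)
    if post is not None:
        # 1. It exists; 4-6. but private, draft and future posts stay hidden.
        if post.status == "publish" and post.published_at <= now:
            return 200, None
        return 404, None
    moved_to = old_slugs.get(slug)
    if moved_to in posts:
        # 3. It exists, but under a new slug: redirect permanently.
        return 301, posts[moved_to].slug
    if slug in deleted_slugs:
        # 2. It once existed and was deleted on purpose.
        return 410, None
    return 404, None


now = datetime(2006, 1, 1)
posts = {
    1: Post(1, "new-title", "publish", datetime(2005, 6, 1)),
    2: Post(2, "coming-soon", "publish", datetime(2007, 1, 1)),
}
old_slugs = {"old-title": 1}
deleted_slugs = {"removed-post"}
for s in ["new-title", "old-title", "coming-soon", "removed-post", "never-was"]:
    print(s, resolve(s, posts, old_slugs, deleted_slugs, now))
```

Even in this sketch, every slug edit has to be written into the history for the 301 and 410 answers to stay correct, which is the extra bookkeeping being debated.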

    Like IIIIIIIV says, why bother though? The real question should be:

    Even if you hack parts of WP so it incorporates the concept of trying to differentiate between a 404 and a 410, it will never work reliably for all installations, so where's the improvement? How do the changes benefit the end user?

    They probably don't.

  14. maerk
    Member
    Posted 8 years ago

    Just because something seems difficult to do, doesn't mean you shouldn't do it.

    Another possibility that's just occurred to me is that you could change its status. Currently, WordPress allows six statuses for posts:

    publish
    draft
    private
    static
    object
    attachment

    You could add a seventh -- deleted -- for deleted posts. That would allow you to differentiate.
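As a sketch of that idea, here is a Python function mapping a post's status, including the proposed (hypothetical) "deleted" value, to the HTTP code a visitor requesting its permalink would get. Treating only "publish" as publicly visible is a simplification made here, not how WordPress handles every status:

```python
from http import HTTPStatus

# Hypothetical seventh status, as proposed above.
DELETED = "deleted"


def status_for_post(post_status):
    """Map a post's status (or None if no row exists) to an HTTP status code.

    Simplification: only "publish" is treated as publicly visible; every
    other existing status (draft, private, ...) is answered with 404.
    """
    if post_status == "publish":
        return HTTPStatus.OK
    if post_status == DELETED:
        # The row is kept but flagged, so the server knows it was removed.
        return HTTPStatus.GONE
    return HTTPStatus.NOT_FOUND


for s in ["publish", "draft", "deleted", None]:
    print(s, int(status_for_post(s)))
```

Because the row is kept, "undeleting" a post is just a status change back to "publish".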

    But that's irrelevant to the discussion at this stage. All that matters is that you can imagine the idea working -- deleted posts, when requested, return the 410 status code. If it really turns out to be impossible or impractical then that's too bad, but I don't think it is.

    To answer your last question, it benefits the end user because they know what's happened to the post (and here I'm assuming you can include visitors as end users). If you get a 404 you have no idea what's happened to it, but if you get a 410, you know it's deleted. It might be a pity, depending on how interesting the post was, but it's useful information.

  15. IIIIIIIV
    Member
    Posted 8 years ago

    It'd be fair to say most end users wouldn't know what a 410 Gone code was, if you paid them.

  16. pizdin_dim
    Member
    Posted 8 years ago

    "Just because something seems difficult to do, doesn't mean you shouldn't do it."

    Fair enough. I didn't mean to suggest that, instead I was trying to convey that it's not worth doing because (1) there is no benefit for the end user of the website and (2) the mechanism to implement it places unreasonable restrictions on the webmaster.

    But go ahead and hack it to suit yourself. That's the whole idea with open source software, after all.

  17. maerk
    Member
    Posted 8 years ago

    IIIIIIIV: Obviously, you'd have a more user friendly message than just "410: Gone".

    pizdin_dim: Personally I disagree. If I run up against a Not Found error, I might check back later, but I wouldn't if I got a Page Deleted message.

    And what restrictions does it place on the webmaster?

  18. ladydelaluna
    Member
    Posted 8 years ago

    Why not just edit the 404 page to say that you have deleted some posts and the one they're looking for may have moved, or is no longer available? Then give them a list of category archives or something... I've seen many news sites say "this post has either been moved or deleted, please use the search utility on the (top/right/left) to search the site, or simply navigate to the category you're looking for..."

    I personally have never heard of a 410 and I've been in this field of design and development for over 10 years. I've never used one anywhere.

  19. maerk
    Member
    Posted 8 years ago

    No, it's quite rare to see a 410, I think, but it's a perfectly valid (and some might say underused!) status code, and I'm quite fussy about my HTTP status codes :)

    If you've ever used a "Redirect gone" directive in your .htaccess, you were configuring your server to return a 410.

  20. ladydelaluna
    Member
    Posted 8 years ago

    i've used redirects... but not a 410. trust me on that.
    again, i'm not saying a 410 is invalid, nor am i saying it's not underused. i don't know a damn thing about them, that's why i don't use them... and i've not ever heard of them before this thread, so that tells me i'm not in the minority there.

    either way, i think that status codes are irrelevant when it comes to getting the right result for the end user. apparently i'm not alone in that, but it seems you're pretty much alone in your desire to use a 410 in wordpress.

    i hate to say i think you're arguing for argument's sake here, but it appears that way.

  21. IIIIIIIV
    Member
    Posted 8 years ago

    Like I wrote earlier, the main time I've seen a 410 used is when someone on a shared host, like a Uni or work account has deleted their stuff and the webmaster doesn't know where they've gone.

    They're rare, but not unheard of. Google for "410 Gone" and you'll find a bunch.

  22. maerk
    Member
    Posted 8 years ago

    Yeah, 410 is for when you know something has been deleted and you know it's not coming back.

    The actual 410 code is sent in the HTTP header, so the user never sees it (unless they can check the headers or it appears on an error page). There are a whole bunch of status codes in use that nobody knows about. For example, if you can access a page with no problems, you get a 200. If the page has been moved permanently, you'll get a 301. If its contents haven't changed since the last time you viewed it, you'll get a 304, and so on.

    They act mostly in the background and are there largely for the software a user is using, but that software often behaves differently when it sees certain codes. For instance, IE displays its friendly Not Found page on receipt of a 404, and all browsers change the address when they see a 301.
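The codes mentioned here can be looked up in Python's standard library, which also gives the reason phrase sent next to each number in the response's status line:

```python
from http import HTTPStatus

for code in (200, 301, 304, 404, 410):
    status = HTTPStatus(code)
    # Status line as it would appear at the top of an HTTP/1.1 response.
    print(f"HTTP/1.1 {status.value} {status.phrase}")
```

The last line printed is "HTTP/1.1 410 Gone", the status line a browser or crawler would receive for a deleted post under the proposal.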

    The benefits wouldn't be direct. For instance, search engines might update their indexes more accurately, and as I've mentioned before, visitors requesting deleted posts would know what had happened to them.

    Adding a 410 feature to wordpress would be beneficial for the visitors to the blog, not necessarily the owner, and since a fundamental part of blogging is that it gets read, it seems a good idea to make things better for your visitors. But I will concede that it only makes things slightly better :)

    It's probably more appropriate to put this in a plugin. Maybe I'll write one when I know a bit more about the WordPress API.

    Anyways, I found another enquiry about returning a 410. I'm not completely alone!

    http://wordpress.org/support/topic/41258

  23. pizdin_dim
    Member
    Posted 8 years ago

    "If I run up against a Not Found error, I might check back later, but I wouldn't if I got a Page Deleted message."

    Really? I think that might apply in theory but probably not in practice for most others.

    "And what restrictions does it place on the webmaster?"

    Please read what I already said above about the required changes to the admin interface and the potential confusion they might cause.

    Like ladydelaluna said, it looks like you're (almost) on your own here.

Topic Closed

This topic has been closed to new replies.
