WordPress.org



Attachments
[resolved] UTF-8 with ANSI issue (6 posts)

  1. sebastian.friedrich
    Member
    Posted 1 year ago

    Hey,

    I found another small problem. It may be limited to my PHP setup, but I'm posting it here because it may be relevant to others as well.

    I am running PHP with UTF-8 character encoding, which I suspect is relevant here. Files uploaded earlier via Attachments 1.6.x were apparently not handled as UTF-8, so there are now files whose names contain two-character substitutions instead of the original character, for instance "Ã¼" instead of "ü", and so on. Before the update this worked, because the Attachments plugin converted the names back the same way, so the wrongly converted filenames existed only on the server side.

    The new Attachments 3.3.2 seems to handle character encoding properly, or at least uses UTF-8 (through its JSON storage). As a result, it returns the wrongly converted filenames as-is.

    So technically Attachments 3 works better, but it operates on the wrong filenames created by the old version. The problem only occurs when migrating old data.

    For me it was not very important, because only a few attachments were affected. With a large number of attachments whose filenames contain special characters, it could become a mess.

    Greets

    http://wordpress.org/extend/plugins/attachments/

  2. Jonathan Christopher
    Member
    Plugin Author

    Posted 1 year ago

    Thanks so much for the detailed explanation. Character encoding was definitely an issue both before and after the rewrite. Attachments 1.x base64-encoded its stored data, which got around issues like these but also prevented things that should not be prevented (like searching metadata), so I'm actively working on improving how those strings are handled.
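    A minimal Python sketch of the trade-off described above (illustrative only; it is not the plugin's storage code, and the sample title is hypothetical): base64 turns any string into plain ASCII, which no ANSI/UTF-8 mix-up can corrupt, but the original text no longer appears in the stored value, so a plain text search cannot match it.

```python
import base64

# Hypothetical metadata value containing a non-ASCII character.
title = "Übersicht"

# Base64-encode the UTF-8 bytes: the stored value is pure ASCII,
# so it survives any ANSI/UTF-8 confusion in the storage layer.
stored = base64.b64encode(title.encode("utf-8")).decode("ascii")

print(stored)                 # ASCII-only stored value
print(stored.isascii())       # True
print("Übersicht" in stored)  # False: a plain text search cannot match it

# Decoding restores the original string exactly.
print(base64.b64decode(stored).decode("utf-8"))  # Übersicht
```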

    If possible, would you mind posting a sample of the problematic string? That way I can do some more testing and better handle future users who run into the same issue. Thanks!

  3. sebastian.friedrich
    Member
    Posted 1 year ago

    Hello again,

    The filenames were UTF-8 encoded but were interpreted as ANSI. To reverse the effect, they now have to be converted back to ANSI bytes and those bytes read as UTF-8. For German, there are 7 characters where the effect is visible:

    UTF-8 read as ANSI (Windows-1252)    Intended UTF-8
    Ã„                                   Ä
    Ã¤                                   ä
    Ãœ                                   Ü
    Ã¼                                   ü
    Ã–                                   Ö
    Ã¶                                   ö
    ÃŸ                                   ß
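    The table can be reproduced with a short Python sketch. It assumes "ANSI" here means Windows-1252 (Python's `cp1252` codec), the usual ANSI code page on Western European Windows; the œ in "Ãœ" fits that assumption. The hypothetical helper `to_mojibake` simulates the original mistake and `repair` reverses it as described above: encode back to ANSI bytes, then decode those bytes as UTF-8.

```python
# The German characters from the table above.
GERMAN = "ÄäÜüÖöß"


def to_mojibake(text: str) -> str:
    """Simulate the bug: UTF-8 bytes wrongly decoded as ANSI (assumed cp1252)."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Reverse it: encode back to cp1252 bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


for ch in GERMAN:
    bad = to_mojibake(ch)
    print(f"{bad}\t{ch}")       # one table row: malformed, intended
    assert repair(bad) == ch    # the round trip restores the character

print(repair("Ãœbersicht.pdf"))  # Übersicht.pdf
```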

    I've never converted between these character maps myself, but PHP seems to already provide a way to detect the encoding:
    http://www.php.net/manual/en/function.mb-detect-encoding.php

    I suspect that when Attachments 1.6 runs in a PHP environment with ISO-8859-1 encoding, the effect probably does not occur and nothing needs to be converted (just a guess; I haven't tested it). So it will be necessary to detect which encoding was (mistakenly) used for the filenames.
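    `mb_detect_encoding` guesses the encoding of a whole string; a narrower check for this specific mistake is to try reversing it and see whether that succeeds. The following Python sketch of such a round-trip heuristic is an assumption, not what PHP or the plugin does, and it treats "ANSI" as Windows-1252 (`cp1252`). A name in plain ISO-8859-1 such as "Übersicht.pdf" fails the round trip and is left alone.

```python
def looks_like_mojibake(name: str) -> bool:
    """Heuristic (assumption, not mb_detect_encoding): a name is probably
    UTF-8 misread as cp1252 if reversing that mistake succeeds and changes it."""
    try:
        fixed = name.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside cp1252, or bytes that are not valid UTF-8:
        # the name was not produced by this particular mistake.
        return False
    return fixed != name


print(looks_like_mojibake("Ãœbersicht.pdf"))  # True
print(looks_like_mojibake("Übersicht.pdf"))   # False
print(looks_like_mojibake("report.pdf"))      # False
```

    The heuristic can misfire on a filename that legitimately contains a sequence like "Ã©", so its results should be reviewed before anything is renamed.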

    And remember, the error is carried only by the attachment's filename. The JSON strings are migrated correctly. After migration, Attachments 3.3.2 tries to address the file by its exact name (e.g. ./Übersicht.pdf), but the file only exists under the malformed name (./Ãœbersicht.pdf). As a result, the server returns a 404.
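    For the 404 case, a dry-run list of which files on disk would need renaming could look like this Python sketch. The helper names `plan_renames` and `apply_renames` are hypothetical, it assumes "ANSI" means `cp1252`, and it only touches files on disk: any paths WordPress has stored in its database are not updated, so back up and check those before renaming anything.

```python
from __future__ import annotations

from pathlib import Path


def plan_renames(upload_dir: str) -> list[tuple[Path, Path]]:
    """Return (malformed, repaired) pairs for files under upload_dir whose
    names look like UTF-8 misread as ANSI (assumed cp1252).
    Nothing is renamed here (dry run)."""
    plans = []
    for path in sorted(Path(upload_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            fixed = path.name.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # name was not produced by this mistake; leave it alone
        if fixed != path.name:
            plans.append((path, path.with_name(fixed)))
    return plans


def apply_renames(plans: list[tuple[Path, Path]]) -> None:
    """Rename files on disk only; database references are NOT updated."""
    for old, new in plans:
        if new.exists():
            print(f"skip, target already exists: {new}")
            continue
        old.rename(new)
```

    Printing `plan_renames(...)` first and only then calling `apply_renames` keeps the destructive step separate from the check.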

  4. Jonathan Christopher
    Member
    Plugin Author

    Posted 1 year ago

    Thank you for the detailed follow-up; it's really helpful. To shed some more light: Attachments only stores the ID of the attachment, not its filename, so the problem must lie in the data handling on some other layer. I'll test with the characters you've provided and hopefully get a fix up shortly. In the meantime, if you come up with a patch I'd be more than happy to review and integrate a pull request!

  5. sebastian.friedrich
    Member
    Posted 1 year ago

    Hey Jonathan,

    I kept on testing, and you're totally right. I decoded your base64 strings and see what you mean. The problem is not related to the Attachments plugin, so you're done!

    It makes no difference whether a Windows or a Linux machine runs Apache/WordPress; the problem comes down to WordPress's inability to handle special (non-ASCII) characters at all. The error occurs even when I upload the file via the media library, and nothing changed from WP 3.4 to WP 3.5; I think I just hadn't noticed it before.

    Thanks for the support. My solution is to avoid these characters in filenames in general.
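    For anyone taking the same route, a hypothetical Python helper that rewrites names to ASCII before upload might look like this. The German transliterations (ä → ae, ß → ss) and the allowed character set are assumptions about the desired result; it is not WordPress's own filename sanitization.

```python
import re
import unicodedata

# German-specific transliterations (an assumption about the desired style).
GERMAN_MAP = str.maketrans({
    "Ä": "Ae", "ä": "ae", "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue", "ß": "ss",
})


def ascii_filename(name: str) -> str:
    """Hypothetical pre-upload helper: rewrite a filename to plain ASCII."""
    name = name.translate(GERMAN_MAP)
    # Strip remaining accents (é -> e): decompose, then drop non-ASCII marks.
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Replace each run of characters outside a conservative safe set with "-".
    return re.sub(r"[^A-Za-z0-9._-]+", "-", name)


print(ascii_filename("Übersicht.pdf"))     # Uebersicht.pdf
print(ascii_filename("Größe & Maße.pdf"))  # Groesse-Masse.pdf
print(ascii_filename("café menu.txt"))     # cafe-menu.txt
```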

  6. Jonathan Christopher
    Member
    Plugin Author

    Posted 1 year ago

    Thanks a ton for following up, much appreciated!

Topic Closed

This topic has been closed to new replies.
