sebknzl

Bug: Returned JSON Escapes UTF-8 In Messages


Hi!

 

Localized strings are returned with wrong escaping in JSON, e.g.:

 

{ "message": "Der ausgew\u00c3\u00a4hlte Ordner wurde bereits zu BitTorrent Sync hinzugef\u00c3\u00bcgt.", "result": 200 }

 

 

These are escaped UTF-8 bytes, which is wrong. Either don't escape at all, since UTF-8 is the default encoding for JSON anyway, or, if you must escape, follow section 2.5 of RFC 4627.
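To illustrate the two acceptable outputs, here is a minimal Python sketch. It uses the standard `json` module only to show what a correct serializer produces; it says nothing about how the Sync API is implemented, and the shortened `message` value is just sample data.

```python
import json

message = "Der ausgewählte Ordner"

# Option 1: escape each non-ASCII code point as \uXXXX (the json.dumps default).
escaped = json.dumps({"message": message})

# Option 2: do not escape; the characters are then sent as UTF-8 on the wire.
raw = json.dumps({"message": message}, ensure_ascii=False)

print(escaped)  # ä appears as the single escape \u00e4
print(raw)      # ä appears literally

# Both forms decode to exactly the same string.
assert json.loads(escaped) == json.loads(raw)
```

Either form is valid JSON. What the API currently returns is neither of them.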

 

You should also send a charset=utf-8 along with Content-Type.

 

Regards

 


I stumbled upon the same problem while writing a Python API wrapper. A fix would be nice, as it also affects directory names if they contain umlauts or other non-ASCII characters.
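Until this is fixed, a wrapper could repair the strings on the client side. The following is only a sketch of one possible workaround: `repair_mojibake` is a hypothetical helper name, and it assumes every affected string consists of UTF-8 bytes that were each escaped as a separate code point, as in the response quoted in the first post.

```python
import json

def repair_mojibake(text: str) -> str:
    # Map each character back to the single byte it came from (Latin-1),
    # then decode those bytes as the UTF-8 they originally were.
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The string was not damaged in this way, so return it unchanged.
        return text

# Response body as quoted in the first post.
response_body = (
    r'{ "message": "Der ausgew\u00c3\u00a4hlte Ordner wurde bereits zu '
    r'BitTorrent Sync hinzugef\u00c3\u00bcgt.", "result": 200 }'
)

data = json.loads(response_body)
fixed = repair_mojibake(data["message"])
print(fixed)
```

This is a heuristic: a correctly encoded string that happens to form valid UTF-8 when read as Latin-1 bytes would be altered as well, so it should be removed once the API is fixed.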


@sebknzl, @scus,

 

I'll check why we escape characters and whether this can be changed. However, RFC 4627 does allow character escaping. I'm citing:

 

 

 

2.5. Strings

The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".

Alternatively, there are two-character sequence escape
representations of some popular characters. So, for example, a
string containing only a single reverse solidus character may be
represented more compactly as "\\".


Thanks for your reply. If it is necessary to escape those characters, they should be escaped as \uXXXX and not as \u00XX\u00XX. The escaped characters above represent Ã¤ and not ä, as they should.
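The difference can be checked with a few lines of Python. This is just a sketch using the standard `json` module to decode both variants of the escape:

```python
import json

# "ä" is the single code point U+00E4; in UTF-8 it is the two bytes C3 A4.
utf8_bytes = "ä".encode("utf-8")

# What the API sends: one \u escape per UTF-8 byte.
wrong = json.loads(r'"\u00c3\u00a4"')

# What it should send if it escapes: one \u escape per code point.
right = json.loads(r'"\u00e4"')

print(utf8_bytes, repr(wrong), repr(right))
```

The first variant decodes to the two characters U+00C3 and U+00A4, the second to the single character ä.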


Ah, now I see what you are saying. They are escaped in a pretty strange manner: split by 7 bits. I'm still checking why we do so and how it can be changed. Thanks for the explanations.


This seems to have caused a very odd regression around UTF in the 1.4.x series. My Linux and Mac computers sync fine, but my Synology NAS cannot figure out any file or folder structure with extended characters. While most were already synced, the NAS is convinced that they are not, and is constantly trying to sync them.

All devices are running 1.4.93. I can provide logs if that would be of any use.

