Bug: Returned JSON Escapes UTF-8 in Messages

sebknzl · December 28, 2013

Hi!

Localized strings are returned with wrong escaping in JSON, e.g.:

{ "message": "Der ausgew\u00c3\u00a4hlte Ordner wurde bereits zu BitTorrent Sync hinzugef\u00c3\u00bcgt.", "result": 200 }

These are escaped UTF-8 bytes, which is wrong. Either don't escape at all, since UTF-8 is the default encoding for JSON anyway, or, if you must escape, follow section 2.5 of RFC 4627.

You should also send charset=utf-8 along with the Content-Type header.
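
To make the problem concrete, here is a minimal Python sketch that parses a shortened copy of the response above with the standard json module. The variable names are only illustrative. It shows that a conforming parser sees two separate code points (U+00C3 and U+00A4) where a single U+00E4 was intended, and that those two values are exactly the UTF-8 bytes of that character.

```python
import json

# Shortened copy of the response from this report. The raw string keeps the
# backslash escapes exactly as they arrive on the wire.
raw = r'{ "message": "Der ausgew\u00c3\u00a4hlte Ordner", "result": 200 }'

message = json.loads(raw)["message"]

# A conforming parser turns each \uXXXX escape into one code point, so the
# message contains U+00C3 followed by U+00A4 instead of U+00E4.
code_points = [hex(ord(ch)) for ch in message[10:12]]

# The two escaped values are the UTF-8 bytes of U+00E4 (a with diaeresis).
utf8_bytes = "\u00e4".encode("utf-8")

print(code_points)
print(utf8_bytes)
```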

Regards

scus · April 14, 2014

I stumbled upon the same problem while writing a Python API wrapper. A fix would be nice, as it also affects directory names that contain umlauts or other non-ASCII characters.

RomanZ · April 15, 2014

@sebknzl, @scus,

I'll check why we escape characters and whether this can be changed. However, RFC 4627 does allow character escaping. I'm citing:

2.5. Strings

The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".

Alternatively, there are two-character sequence escape
representations of some popular characters. So, for example, a
string containing only a single reverse solidus character may be
represented more compactly as "\\".
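
For comparison, this is what escaping per the quoted section looks like in practice. The sketch below uses Python's standard json module, not the BitTorrent Sync code; the sample word is taken from the message in the first post. Escaping the character by its code point and leaving it unescaped are both valid, and both parse back to the same string.

```python
import json

# The word from the first post, written with its code point U+00E4.
text = "ausgew\u00e4hlte"

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped by
# code point as a single six-character sequence.
escaped = json.dumps(text)

# With ensure_ascii=False the character is emitted unescaped, and the caller
# encodes the resulting document as UTF-8.
unescaped = json.dumps(text, ensure_ascii=False)

# Both representations decode to the same string.
round_trip_ok = json.loads(escaped) == json.loads(unescaped) == text

print(escaped)
print(round_trip_ok)
```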

scus · April 15, 2014

Thanks for your reply. If it is necessary to escape those characters, they should be escaped as \uXXXX (one escape per code point) and not as \u00XX\u00XX (one escape per UTF-8 byte). The escaped characters above represent Ã¤ and not ä as they should.
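
Until a fix is released, a client can undo the damage after parsing. The following is only a sketch of one possible workaround; repair_double_encoded is a hypothetical helper name, not part of any API. It assumes every affected character arrived as one code point per UTF-8 byte, as in the example above. A string that is already correct is normally returned unchanged, but one that happens to look like such a byte sequence would be altered, so apply it only to responses known to be affected.

```python
def repair_double_encoded(text: str) -> str:
    """Reinterpret code points U+0000..U+00FF as bytes and decode as UTF-8.

    Hypothetical helper: returns the input unchanged when it cannot be a
    byte-per-code-point rendering of valid UTF-8.
    """
    try:
        # latin-1 maps code points 0..255 one-to-one onto byte values.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a code point above U+00FF is present, or the bytes are not
        # valid UTF-8, so the string was most likely fine already.
        return text


broken = "hinzugef\u00c3\u00bcgt"          # as parsed from the faulty JSON
fixed = repair_double_encoded(broken)       # U+00FC restored
untouched = repair_double_encoded("hinzugef\u00fcgt")

print(fixed == "hinzugef\u00fcgt", untouched == "hinzugef\u00fcgt")
```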

tuxpoldo · April 19, 2014

Believe me! This makes it really hard to handle the strings correctly. Look at this ugly code and these even uglier ways to handle it...

RomanZ · May 8, 2014

Ah, now I see what you are saying. They are escaped in a pretty strange manner - split by 7 bits. I'm still checking why we do so and how it can be changed. Thanks for the explanations.

tuxpoldo · May 9, 2014

It would be really nice, since currently the code is not really consistent...

RomanZ · May 12, 2014

Hi all,

Thanks for your feedback. The issue is fixed, and the fix will be available in one of the 1.3 release builds soon.

joncamfield · November 2, 2014

This seems to have caused a very odd regression around UTF in the 1.4.x series. My Linux and Mac computers sync fine, but my Synology NAS cannot figure out any file or folder structure with extended characters. While most were already synced, the NAS is convinced that they are not, and is constantly trying to sync them.

All devices are running 1.4.93. I can provide logs if that would be of any use.

GreatMarko · November 2, 2014

@joncamfield, please update to 1.4.99, as it contains a fix for non-ASCII character issues.
