unicode support in rejson #35

Closed
wgjak47 opened this issue Sep 11, 2017 · 13 comments

Comments

@wgjak47

wgjak47 commented Sep 11, 2017

I'm trying to store JSON containing some Chinese characters in Redis with ReJSON, like this:

127.0.0.1:6379> JSON.SET test . '{"key":"测试"}'
OK
127.0.0.1:6379> JSON.GET test .key
"\u00e6\u00b5\u008b\u00e8\u00af\u0095"
127.0.0.1:6379> SET test2 "测试"
OK
127.0.0.1:6379> GET test2 
测试
127.0.0.1:6379> 

As you can see, the value "测试" is converted to "\u00e6\u00b5\u008b\u00e8\u00af\u0095" when using JSON.GET... It looks like ReJSON doesn't support Unicode yet...

@dvirsky
Contributor

dvirsky commented Sep 15, 2017

@wgjak47 the data should be encoded as UTF-8 IIRC.

@wgjak47
Author

wgjak47 commented Sep 16, 2017

python
>>> '\u00e6\u00b5\u008b\u00e8\u00af\u0095'
'æµ\x8bè¯\x95'
>>> '\u00e6\u00b5\u008b\u00e8\u00af\u0095'.encode()
b'\xc3\xa6\xc2\xb5\xc2\x8b\xc3\xa8\xc2\xaf\xc2\x95'
>>> '\u00e6\u00b5\u008b\u00e8\u00af\u0095'.encode('ISO-8859-1')
b'\xe6\xb5\x8b\xe8\xaf\x95'
>>> _.decode('utf-8')
'测试'

It looks like the data is encoded as 'ISO-8859-1', not UTF-8...

I have also tried to encode the data as UTF-8, but it doesn't work:

>>> result = client.execute_command("JSON.GET", "test", ".", encoding="ISO-8895-1")
>>> result
b'{"key":"\\u00e6\\u00b5\\u008b\\u00e8\\u00af\\u0095"}'
>>> result = client.execute_command("JSON.SET", "test", ".", a, encoding="ISO-8895-1")
>>> result = client.execute_command("JSON.GET", "test", ".", encoding="ISO-8895-1")
>>> result
b'{"key":"\\u00e6\\u00b5\\u008b\\u00e8\\u00af\\u0095"}'
>>> result.decode()
'{"key":"\\u00e6\\u00b5\\u008b\\u00e8\\u00af\\u0095"}'

Any example?

@thienit5

I have the same problem, can you show me how to decode?
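
One possible client-side workaround, based on the ISO-8859-1 round trip shown in the Python session above: parse the reply as JSON, then turn each U+00XX code point back into the byte XX and decode the bytes as UTF-8. This is only a sketch. The helper names `fix_mojibake` and `repair` are made up for illustration, and the approach assumes every string in the document went through the per-byte escaping; text that was genuinely Latin-1 and happens to form valid UTF-8 would be altered.

```python
import json


def fix_mojibake(text):
    """Map each code point U+00XX back to byte XX, then decode as UTF-8."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 representable, or not valid UTF-8: leave it unchanged.
        return text


def repair(value):
    """Recursively apply fix_mojibake to every string in a parsed JSON value."""
    if isinstance(value, str):
        return fix_mojibake(value)
    if isinstance(value, list):
        return [repair(item) for item in value]
    if isinstance(value, dict):
        return {repair(key): repair(item) for key, item in value.items()}
    return value


# The reply bytes are copied from the JSON.GET output earlier in this thread.
reply = b'{"key":"\\u00e6\\u00b5\\u008b\\u00e8\\u00af\\u0095"}'
doc = repair(json.loads(reply))
print(doc)  # {'key': '测试'}
```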

@dvirsky
Contributor

dvirsky commented Sep 19, 2017

@wgjak47 @thienit5

>>> u'\u00e6\u00b5\u008b\u00e8\u00af\u0095'.encode('utf-8')
'\xc3\xa6\xc2\xb5\xc2\x8b\xc3\xa8\xc2\xaf\xc2\x95'
>>> '\xc3\xa6\xc2\xb5\xc2\x8b\xc3\xa8\xc2\xaf\xc2\x95'.decode('utf-8')
u'\xe6\xb5\x8b\xe8\xaf\x95' # ====> which is the same as the input string.
>>>

@wgjak47
Author

wgjak47 commented Oct 2, 2017

127.0.0.1:6379> JSON.SET a . {"test": "\xe6\xb5\x8b\xe8\xaf\x95"}
Invalid argument(s)
127.0.0.1:6379> JSON.SET a . '{"test": "\xe6\xb5\x8b\xe8\xaf\x95"}'
ERR JSON lexer error ESCAPE_INVALID at position 7

Looks like it doesn't work.

@wgjak47
Author

wgjak47 commented Oct 6, 2017

I found the reason: in json_object.c/_JSONSerialize_StringValue(Node *n, void *ctx), every byte that is not printable ASCII is simply written out as a \u hex escape:

            default:
                if ((unsigned char)*p > 31 && isprint(*p))
                    b->buf = sdscatprintf(b->buf, "%c", *p);
                else
                    b->buf = sdscatprintf(b->buf, "\\u%04x", (unsigned char)*p);
                break;
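
To see why this branch produces the output reported at the top of the issue, here is a rough Python emulation of it. It is a sketch, not the module's code: the function name `serialize_string_bytewise` is hypothetical, it assumes isprint() behaves as in the C locale (true only for bytes 32 to 126), and it ignores the other switch cases (quotes, backslashes and so on) that are not quoted here.

```python
def serialize_string_bytewise(text):
    """Emulate the quoted default branch, one UTF-8 byte at a time."""
    out = []
    for byte in text.encode("utf-8"):
        if 32 <= byte < 127:
            # Printable ASCII is copied through unchanged.
            out.append(chr(byte))
        else:
            # Every other byte becomes its own \u00XX escape, so a
            # multi-byte UTF-8 character turns into several escapes.
            out.append("\\u%04x" % byte)
    return "".join(out)


print(serialize_string_bytewise("测试"))
# \u00e6\u00b5\u008b\u00e8\u00af\u0095
```

The two characters are six bytes in UTF-8, which is why six escapes come back instead of two.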

@mnunberg
Contributor

mnunberg commented Oct 19, 2017

I've just fixed this issue in master. Can you tell me if this works for you?

@wgjak47
Author

wgjak47 commented Oct 20, 2017

127.0.0.1:6379>  JSON.SET a . '{"test":"测试"}'
OK
127.0.0.1:6379> JSON.GET a
{"test":"\u00e6\u00b5\u008b\u00e8\u00af\u0095"}
127.0.0.1:6379> JSON.GET NOESCAPE a

127.0.0.1:6379> JSON.GET a NOESCAPE
{"test":"测试"}
127.0.0.1:6379> JSON.GET a.test NOESCAPE

It works! Thank you. @mnunberg

@nelsonlarocca

Guys, I'm asking for a recommendation here.
I've found that the Node.js ReJSON clients ("redis-rejson" and "iorejson") always retrieve JSON data with escaped Unicode, and since they wrap the redis and ioredis modules, there's no way to pass the NOESCAPE argument as an option.

Do you know of any Node.js module or trick that could handle this without much overhead?
Thanks!

@xtianus79

@mnunberg thank you jesus... This was a memory leak until this fix was put into place. Why such a drastic difference for something such as NOESCAPE?

Here is what I did using ioredis:

this.client.send_command('JSON.GET', escapedKey, 'NOESCAPE');
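
The same raw-command idea can be sketched in Python for clients that expose a generic `execute_command(*args)` method, as redis-py does. The helper name `json_get_noescape` is hypothetical. Putting NOESCAPE right after the key follows the redis-cli session earlier in this thread, where `JSON.GET a NOESCAPE` worked; appending the path after NOESCAPE is an assumption to check against your module version.

```python
import json


def json_get_noescape(client, key, path="."):
    """Fetch a JSON value with NOESCAPE via a generic command interface."""
    # Assumed argument order: key, then NOESCAPE, then the path.
    raw = client.execute_command("JSON.GET", key, "NOESCAPE", path)
    if raw is None:
        return None
    if isinstance(raw, bytes):
        # With NOESCAPE the reply is expected to be raw UTF-8.
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

With redis-py this might be called as `json_get_noescape(redis.Redis(), "a")`, given a running server with the module loaded.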

@xtianus79

Just to show you guys how this was causing a memory leak: I was reading an internal state object, and that value would be written back down the line. Here was the result. YIKES!!!

conversationData\":{\"response\":\"{\\\"result\\\":{\\\"rows\\\":[{\\\"type\\\":\\\"text\\\",\\\"content\\\":{\\\"text\\\":\\\"Chops Grille\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u00a2\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0084\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u00a0 is on Deck 8, Midship.\\\"},\\\"metadata\\\":{\\\"productIds\\\":[\\\"100000003203782959\\\",\\\"100000003258955125\\\",\\\"100000002149401504\\\"],\\\"categoryIds\\\":[\\\"dining\\\"],\\\"locationCode\\\":\\\"CHOP\\\",\\\"responseType\\\":\\\"venue\\\",\\\"source\\\":{\\\"sourceName\\\":\\\"VcKg\\\",\\\"sourceId\\\":\\\"64ae09bc-2bcd-43f1-a1a1-a3ed8e62295e\\\"}}}]}}\",\"intentDetails\":{\"topIntent\":\"Venue.Location\",\"entities\":[{\"type\":\"Venue\",\"entity\":\"Chops Grille\"}],\"userUtterance\":\"where 
is chops\",\"contextOptions\":{\"contextOptions\":{\"restartMsg\":\"ENDING\"},\"contextActivatedFlag\":false,\"conversationData\":{\"response\":\"{\\\"result\\\":{\\\"rows\\\":[{\\\"type\\\":\\\"text\\\",\\\"content\\\":{\\\"text\\\":\\\"Chops Grille\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083
\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u00a2\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0084\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u00
82\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u0

So, needless to say, thanks for the fix.
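
The growth in the dump above is consistent with a repeated read-then-write cycle under the pre-fix escaping: each non-ASCII byte comes back as a \u00XX escape, the client parses that as a two-byte UTF-8 character, and the next read escapes both bytes again. The following is a small simulation of that cycle, assuming the per-byte escaping described earlier in this thread; the names `escape_bytewise` and `read_then_write` are made up for illustration, and the input is kept free of quotes and backslashes so the hand-built JSON string stays valid.

```python
import json


def escape_bytewise(text):
    """Escape every byte outside printable ASCII as \\u00XX (pre-fix behaviour)."""
    return "".join(
        chr(b) if 32 <= b < 127 else "\\u%04x" % b
        for b in text.encode("utf-8")
    )


def read_then_write(text):
    """One cycle: the server escapes per byte, the client parses the JSON string."""
    return json.loads('"' + escape_bytewise(text) + '"')


value = "é"  # 2 bytes in UTF-8
sizes = []
for _ in range(4):
    value = read_then_write(value)
    sizes.append(len(value.encode("utf-8")))

print(sizes)  # [4, 8, 16, 32]
```

Under these assumptions the non-ASCII part of a value doubles in size on every cycle, which would look like a leak in any state object that is read and stored back repeatedly.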

@ghost

ghost commented Apr 13, 2020

Here is the Java (JReJSON) solution:

JReJSON redisClient = new JReJSON(redisHost, redisPort);
redisClient.get(key, new Path("NOESCAPE"));

@gkorland
Contributor

This should work now, see:

$ redis-cli --raw
127.0.0.1:6379> SET test2 "测试"
OK
127.0.0.1:6379> GET test2 
测试
127.0.0.1:6379>  JSON.SET test . '{"key":"测试"}'
OK
127.0.0.1:6379> JSON.GET test .key
"测试"
