unicode support in rejson #35

Closed

Description

@wgjak47

I'm trying to store JSON containing some Chinese characters in Redis with ReJSON, like this:

127.0.0.1:6379> JSON.SET test . '{"key":"测试"}'
OK
127.0.0.1:6379> JSON.GET test .key
"\u00e6\u00b5\u008b\u00e8\u00af\u0095"
127.0.0.1:6379> SET test2 "测试"
OK
127.0.0.1:6379> GET test2 
测试
127.0.0.1:6379> 

As you can see, the value "测试" is converted to "\u00e6\u00b5\u008b\u00e8\u00af\u0095" when using JSON.GET. It looks like ReJSON doesn't support Unicode yet.

Activity

dvirsky

dvirsky commented on Sep 15, 2017

@dvirsky
Contributor

@wgjak47 the data should be encoded as UTF-8 IIRC.

wgjak47

wgjak47 commented on Sep 16, 2017

@wgjak47
Author
python
>>> '\u00e6\u00b5\u008b\u00e8\u00af\u0095'
'æµ\x8bè¯\x95'
>>> '\u00e6\u00b5\u008b\u00e8\u00af\u0095'.encode()
b'\xc3\xa6\xc2\xb5\xc2\x8b\xc3\xa8\xc2\xaf\xc2\x95'
>>> '\u00e6\u00b5\u008b\u00e8\u00af\u0095'.encode('ISO-8859-1')
b'\xe6\xb5\x8b\xe8\xaf\x95'
>>> _.decode('utf-8')
'测试'

It looks like the data is encoded as 'ISO-8859-1', not UTF-8.

I also tried to encode the data as UTF-8, but it doesn't work:

>>> result = client.execute_command("JSON.GET", "test", ".", encoding="ISO-8859-1")
>>> result
b'{"key":"\\u00e6\\u00b5\\u008b\\u00e8\\u00af\\u0095"}'
>>> result = client.execute_command("JSON.SET", "test", ".", a, encoding="ISO-8859-1")
>>> result = client.execute_command("JSON.GET", "test", ".", encoding="ISO-8859-1")
>>> result
b'{"key":"\\u00e6\\u00b5\\u008b\\u00e8\\u00af\\u0095"}'
>>> result.decode()
'{"key":"\\u00e6\\u00b5\\u008b\\u00e8\\u00af\\u0095"}'

Any example?

thienit5

thienit5 commented on Sep 18, 2017

@thienit5

I have the same problem, can you show me how to decode?

dvirsky

dvirsky commented on Sep 19, 2017

@dvirsky
Contributor

@wgjak47 @thienit5

>>> u'\u00e6\u00b5\u008b\u00e8\u00af\u0095'.encode('utf-8')
'\xc3\xa6\xc2\xb5\xc2\x8b\xc3\xa8\xc2\xaf\xc2\x95'
>>> '\xc3\xa6\xc2\xb5\xc2\x8b\xc3\xa8\xc2\xaf\xc2\x95'.decode('utf-8')
u'\xe6\xb5\x8b\xe8\xaf\x95' # ====> which is the same as the input string.
>>>
wgjak47

wgjak47 commented on Oct 2, 2017

@wgjak47
Author
127.0.0.1:6379> JSON.SET a . {"test": "\xe6\xb5\x8b\xe8\xaf\x95"}
Invalid argument(s)
127.0.0.1:6379> JSON.SET a . '{"test": "\xe6\xb5\x8b\xe8\xaf\x95"}'
ERR JSON lexer error ESCAPE_INVALID at position 7

It looks like that doesn't work.

wgjak47

wgjak47 commented on Oct 6, 2017

@wgjak47
Author

I found the reason. In json_object.c, _JSONSerialize_StringValue(Node *n, void *ctx) writes every non-ASCII char as a hex escape:

            default:
                if ((unsigned char)*p > 31 && isprint(*p))
                    b->buf = sdscatprintf(b->buf, "%c", *p);
                else
                    b->buf = sdscatprintf(b->buf, "\\u%04x", (unsigned char)*p);
                break;
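
The effect of that branch can be reproduced outside the server. The following is a minimal Python sketch, not the module's code: it assumes that only printable ASCII bytes pass the `isprint` check, and it ignores the other `switch` cases (quotes, backslashes and so on) that are not shown above. The names `escape_like_serializer` and `repair` are hypothetical.

```python
import json


def escape_like_serializer(text: str) -> str:
    """Mimic the quoted branch: each byte of the UTF-8 encoding is handled
    on its own, so a multi-byte character becomes several \\u00XX escapes."""
    out = []
    for byte in text.encode("utf-8"):
        if 0x20 <= byte < 0x7F:
            # Printable ASCII is copied through unchanged.
            out.append(chr(byte))
        else:
            # Everything else is written as \u00XX, one escape per byte.
            out.append("\\u%04x" % byte)
    return "".join(out)


def repair(escaped: str) -> str:
    """Undo the per-byte escaping on the client side.

    Parsing the escapes yields one code point per original byte; encoding
    those code points as Latin-1 restores the bytes, which are then decoded
    as UTF-8."""
    per_byte = json.loads('"' + escaped + '"')
    return per_byte.encode("latin-1").decode("utf-8")


escaped = escape_like_serializer("测试")
print(escaped)
print(repair(escaped))
```

This is the same Latin-1 then UTF-8 step shown in the Python session earlier in the thread, wrapped in a function.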

mnunberg

mnunberg commented on Oct 19, 2017

@mnunberg
Contributor

I've just fixed this issue in master. Can you tell me if this works for you?

wgjak47

wgjak47 commented on Oct 20, 2017

@wgjak47
Author
127.0.0.1:6379>  JSON.SET a . '{"test":"测试"}'
OK
127.0.0.1:6379> JSON.GET a
{"test":"\u00e6\u00b5\u008b\u00e8\u00af\u0095"}
127.0.0.1:6379> JSON.GET NOESCAPE a

127.0.0.1:6379> JSON.GET a NOESCAPE
{"test":"测试"}
127.0.0.1:6379> JSON.GET a.test NOESCAPE

It works! Thank you. @mnunberg
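
For Python clients, the same option can be passed through a generic command call. This is a sketch under assumptions: `client` is any object with a redis-py style `execute_command(*args)` method, the reply is UTF-8 JSON text returned as `bytes` or `str`, and `json_get_noescape` is a hypothetical helper name. The argument order follows the `JSON.GET a NOESCAPE` form that worked above.

```python
import json


def json_get_noescape(client, key):
    """Fetch a whole JSON document with NOESCAPE and parse it."""
    # Same argument order as the redis-cli call that worked: key first, then NOESCAPE.
    raw = client.execute_command("JSON.GET", key, "NOESCAPE")
    if raw is None:
        # Missing key.
        return None
    if isinstance(raw, bytes):
        # Without decode_responses, redis-py style clients return bytes.
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

With a redis-py connection, this would be called as `json_get_noescape(redis.Redis(), "a")`.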

nelsonlarocca

nelsonlarocca commented on Nov 14, 2019

@nelsonlarocca

Guys, I'm here asking for a recommendation.
I've found that the Node.js ReJSON clients ("redis-rejson" and "iorejson") always retrieve JSON data with escaped Unicode, and since they wrap the redis and ioredis modules, there's no way to pass the NOESCAPE argument as an option.

Do you know of any Node.js module or trick that could handle this without much overhead?
Thanks!

xtianus79

xtianus79 commented on Dec 24, 2019

@xtianus79

@mnunberg thank you jesus... This was a memory leak until this fix was put into place. Why such a drastic difference for something such as NOESCAPE?

here is what I did using IOREDIS

this.client.send_command('JSON.GET', escapedKey, 'NOESCAPE');
xtianus79

xtianus79 commented on Dec 24, 2019

@xtianus79

Just to show you guys how this was causing a memory leak: I was reading an internal state object that would later be written back. Here was the result. YIKES!!!

conversationData\":{\"response\":\"{\\\"result\\\":{\\\"rows\\\":[{\\\"type\\\":\\\"text\\\",\\\"content\\\":{\\\"text\\\":\\\"Chops Grille\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u00a2\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0084\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u00a0 is on Deck 8, Midship.\\\"},\\\"metadata\\\":{\\\"productIds\\\":[\\\"100000003203782959\\\",\\\"100000003258955125\\\",\\\"100000002149401504\\\"],\\\"categoryIds\\\":[\\\"dining\\\"],\\\"locationCode\\\":\\\"CHOP\\\",\\\"responseType\\\":\\\"venue\\\",\\\"source\\\":{\\\"sourceName\\\":\\\"VcKg\\\",\\\"sourceId\\\":\\\"64ae09bc-2bcd-43f1-a1a1-a3ed8e62295e\\\"}}}]}}\",\"intentDetails\":{\"topIntent\":\"Venue.Location\",\"entities\":[{\"type\":\"Venue\",\"entity\":\"Chops Grille\"}],\"userUtterance\":\"where 
is chops\",\"contextOptions\":{\"contextOptions\":{\"restartMsg\":\"ENDING\"},\"contextActivatedFlag\":false,\"conversationData\":{\"response\":\"{\\\"result\\\":{\\\"rows\\\":[{\\\"type\\\":\\\"text\\\",\\\"content\\\":{\\\"text\\\":\\\"Chops Grille\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083
\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u00a2\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0084\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u00
82\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0082\\u00c3\\u0082\\u00c2\\u0082\\u00c3\\u0083\\u00c2\\u0083\\u00c3\\u0082\\u00c2\\u0083\\u00c3\\u0083\\u00c2\\u0082\\u0

So, needless to say, thanks for the fix.
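
One plausible reading of the dump above is that every read-then-write cycle re-encodes the already escaped text, so each non-ASCII character doubles in size per cycle. The sketch below is an assumption-laden model of that cycle, not a trace of the real application: it treats one cycle as "encode to UTF-8, then read each byte back as its own code point", which is what per-byte `\u00XX` escaping followed by a JSON parse amounts to. `round_trip` is a hypothetical name.

```python
def round_trip(text: str) -> str:
    """Model one write-then-read cycle through the per-byte escaping.

    Write: the client sends the string as UTF-8 bytes.
    Read: each non-ASCII byte comes back as \\u00XX, which a JSON parser
    turns into one code point per byte (the same as decoding as Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


text = "é"
lengths = []
for _ in range(4):
    lengths.append(len(text))
    text = round_trip(text)

# Character count before each of the four cycles.
print(lengths)
```

Under this model a single accented character grows from 1 to 2, 4 and 8 characters over successive cycles, which matches the repeating `\u00c3\u0083\u00c2\u0083` pattern in the dump.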

ghost

ghost commented on Apr 13, 2020

@ghost

Here is the Java (JReJSON) solution:

JReJSON redisClient = new JReJSON(redisHost, redisPort);
redisClient.get(key, new Path("NOESCAPE"));

gkorland

gkorland commented on Dec 19, 2021

@gkorland
Contributor

This should work now; see:

$ redis-cli --raw
127.0.0.1:6379> SET test2 "测试"
OK
127.0.0.1:6379> GET test2 
测试
127.0.0.1:6379>  JSON.SET test . '{"key":"测试"}'
OK
127.0.0.1:6379> JSON.GET test .key
"测试"