Unzip not correct with CJK filename #45
Can you please attach the zip file?
The issue is with the tool that was used to create the zip file. For non-ASCII file names, the UTF-8 flag has to be set in the zip file header, and the file name also has to be encoded with the UTF-8 charset. This flag was not set in this case, and therefore zip4j uses the default charset. I tried extracting the zip file with another zip tool, and I see the same behaviour (random characters in the file name). Zip4j uses UTF-8 by default, so this zip file was (most likely) not created by zip4j. If it was indeed created with zip4j, please post the code used to create the zip file. If not, I am afraid I cannot help you much, because the issue is with the tool that created the zip file.
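For illustration, here is a minimal Java sketch (not zip4j code) of how a reader can tell whether an entry declares a UTF-8 name. It assumes the flag described above is bit 11 of the general purpose bit flag, stored little-endian at offset 6 of the local file header. The class `Utf8FlagCheck` and its helpers are hypothetical; the sample archives are built with the JDK's `java.util.zip.ZipOutputStream`, which sets bit 11 only when its name charset is UTF-8. `GBK` is an assumed stand-in for a non-UTF-8 tool setting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Utf8FlagCheck {
    // Local file header signature ("PK", 3, 4) read as a little-endian int.
    static final int LOCAL_FILE_HEADER_SIG = 0x04034b50;
    // Bit 11 of the general purpose bit flag: the "language encoding flag".
    static final int LANGUAGE_ENCODING_FLAG = 1 << 11;

    /** True if the local file header at offset 0 of {@code zip} has bit 11 set. */
    static boolean hasUtf8Flag(byte[] zip) {
        if (zip.length < 30) {
            throw new IllegalArgumentException("too short for a local file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_FILE_HEADER_SIG) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        int flags = Short.toUnsignedInt(buf.getShort(6)); // general purpose bit flag
        return (flags & LANGUAGE_ENCODING_FLAG) != 0;
    }

    /** Builds a one-entry zip in memory whose entry name is encoded with {@code nameCharset}. */
    static byte[] sampleZip(String entryName, Charset nameCharset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String name = "fff - \u526F\u672C.txt"; // the file name from this issue
        System.out.println(hasUtf8Flag(sampleZip(name, StandardCharsets.UTF_8))); // true
        System.out.println(hasUtf8Flag(sampleZip(name, Charset.forName("GBK")))); // false
    }
}
```

The Unicode escapes keep the source file pure ASCII, so it compiles the same regardless of the platform's default source encoding.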
It was created by 7-Zip. Does zip4j only support unzipping files that it zipped itself?
It is not a question of whether zip4j created the zip file or not, but rather of whether the UTF-8 flag was set in the zip headers. Zip4j checks whether this flag is set: if yes, it uses UTF-8, and if not, it uses CP437. Please note that the file name also has to be encoded with UTF-8; this is according to the zip specification. I tried extracting your zip file with the default compression tool on Mac, The Unarchiver, and Keka, and all three of them had the same trouble extracting it. I really wonder why 7-Zip is not using the UTF-8 flag. Do you use any custom charset when zipping files in 7-Zip? Can you please check those settings? I would have presumed 7-Zip uses UTF-8 by default when necessary.
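The rule described above (UTF-8 when the flag is set, CP437 otherwise) can be sketched in Java as follows. `ZipNameCharset` is a hypothetical helper, not zip4j's internal code, and it assumes the JDK's extended charsets (`IBM437`, `GBK`) are installed, as they are in standard JDK builds. Decoding the GBK bytes of "副本" with CP437 reproduces the "╕▒▒╛" reported in this issue.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ZipNameCharset {
    // Bit 11 of the general purpose bit flag ("language encoding flag").
    static final int LANGUAGE_ENCODING_FLAG = 1 << 11;
    // IBM Code Page 437, provided by the JDK's extended charsets.
    static final Charset CP437 = Charset.forName("IBM437");

    /** UTF-8 if bit 11 is set, otherwise CP437. */
    static Charset nameCharset(int generalPurposeFlag) {
        return (generalPurposeFlag & LANGUAGE_ENCODING_FLAG) != 0
                ? StandardCharsets.UTF_8
                : CP437;
    }

    /** Decodes raw file-name bytes from a zip header using the rule above. */
    static String decodeName(byte[] rawName, int generalPurposeFlag) {
        return new String(rawName, nameCharset(generalPurposeFlag));
    }

    public static void main(String[] args) {
        String name = "\u526F\u672C"; // the two CJK characters from this issue
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] gbkBytes = name.getBytes(Charset.forName("GBK"));

        // Flag set and UTF-8 bytes: the name round-trips.
        System.out.println(decodeName(utf8Bytes, LANGUAGE_ENCODING_FLAG).equals(name)); // true
        // Flag clear but GBK bytes: CP437 turns them into four box-drawing characters.
        System.out.println(decodeName(gbkBytes, 0).equals("\u2555\u2592\u2592\u255B")); // true
    }
}
```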
It looks like the file name was encoded using the charset … One way to work around this issue is with this code:
Basically, what we do in the above code is override the default charset that zip4j uses with the charset that is used in your case.
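The zip4j snippet from this comment did not survive, so the sketch below shows the same idea with the JDK's own `java.util.zip` instead of zip4j's API (newer zip4j releases expose a comparable charset setting on `ZipFile`; check the API of the version you use). It assumes the archive's names are GBK, which is only a guess for this case. `ListWithCharset` and `gbkZip` are hypothetical; `gbkZip` merely simulates an archive like the one in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ListWithCharset {
    /**
     * Lists entry names. Entries with bit 11 set are still decoded as UTF-8 by
     * ZipInputStream; {@code fallback} is used only for entries without it.
     */
    static List<String> entryNames(byte[] zip, Charset fallback) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), fallback)) {
            for (ZipEntry e = zis.getNextEntry(); e != null; e = zis.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Simulates an archive like the one in this issue: GBK names, no UTF-8 flag. */
    static byte[] gbkZip(String entryName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out, Charset.forName("GBK"))) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String name = "fff - \u526F\u672C.txt";
        List<String> names = entryNames(gbkZip(name), Charset.forName("GBK"));
        System.out.println(names.get(0).equals(name)); // true
    }
}
```

For an archive on disk, `new java.util.zip.ZipFile(file, charset)` takes the same fallback-charset argument.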
Does it depend on the OS language?
I cannot specify the zip file's encoding in advance: it may be Chinese, Korean, Japanese, or English.
Hey guys. When I tried to reproduce this problem with srikanth-lingala's code, I found that it seems the …
As I have mentioned above, the zip specification only allows for the CP437 or UTF-8 charsets. Any other charset used by a zip tool is not compliant with the zip specification and may not be supported by other zip tools. This is not a bug in zip4j; zip4j just sticks to the zip specification. If you have custom charsets, you can use the above code and just replace the charset with your custom one.
I just tested and found you are right, @srikanth-lingala. Sorry that I didn't notice the file name is already encoded with …
I think srikanth-lingala is right: we should follow the zip specification. It is the only official set of rules.
I tested this on Windows and it works, but I cannot guarantee it works on other operating systems or with other languages. Maybe you can give it a try and let me know, @dousee163.
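@dousee163 I think what you need is to first determine the encoding used in this zip archive and then set the charset accordingly; that way you will get the correct names. The example they provided above corresponds to our Chinese encoding. @srikanth-lingala Can we use the zip4j API to detect a zip file's encoding?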
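Nothing in this thread shows zip4j offering encoding detection, so here is a hedged, heuristic sketch in plain Java of "determine the encoding first": try a hypothetical list of candidate charsets against a name's raw bytes and keep the first one that decodes without errors. The candidate list and its order are assumptions, and the result is only a guess.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class GuessNameCharset {
    // Hypothetical candidate list; the order matters because several
    // CJK code pages accept many of the same byte sequences.
    static final List<Charset> CANDIDATES = Arrays.asList(
            StandardCharsets.UTF_8,
            Charset.forName("GBK"),       // Simplified Chinese (code page 936)
            Charset.forName("Shift_JIS"), // Japanese
            Charset.forName("EUC-KR"));   // Korean

    /** First candidate that decodes {@code rawName} with no malformed or unmappable input. */
    static Optional<Charset> guess(byte[] rawName) {
        for (Charset cs : CANDIDATES) {
            try {
                cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(rawName));
                return Optional.of(cs);
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] gbkName = "\u526F\u672C".getBytes(Charset.forName("GBK"));
        byte[] utf8Name = "\u526F\u672C".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(gbkName).get());  // GBK
        System.out.println(guess(utf8Name).get()); // UTF-8
    }
}
```

Strict UTF-8 rejects most legacy CJK byte sequences, so checking it first is fairly reliable. Among the legacy code pages, however, EUC-KR names usually also decode as GBK without error, so with this order a Korean name would be misread as Chinese. A real tool would need more context, such as the user's locale.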
@LeeYoung624 @azhao-2019
With a Chinese folder:
Read here for more info on this issue on the 7-Zip side. I am really surprised that 7-Zip does not use UTF-8 by default. I cannot change zip4j to use any other charset, because that would not comply with the zip specification. A workaround is to enter "cu" (without quotes) as a parameter in 7-Zip's settings; try it and see if that works. It goes in the Parameters text box of 7-Zip's "Add to Archive" dialog.
@srikanth-lingala But it didn't work with the Windows built-in zip feature.
@dousee163 @azhao-2019 Well, I think you guys, like me, are not familiar with the ZIP File Format Specification. You can check out this chapter:
As srikanth-lingala has already said, zip file names only support two charsets: CP437 and UTF-8.
It's 7-Zip's problem, not zip4j's. 7-Zip performs some other operations that are not required by the ZIP File Format Specification.
Thank you!
I usually try the following command-line parameter: -mcp=936. For example: … (Refer to the discussion at https://sourceforge.net/p/sevenzip/bugs/2198/.) The characters "副本" appear correctly in the extracted file's name. Regarding the encoding within the file, try … BTW, the other software …
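@Erich-Chen Buddy, where did you find the -mcp=936 option? It really works, thanks!
Glad to see it works for you. I have been using it this way for a long time and honestly can't remember where I first found it. For a relatively early discussion on the Internet, you can refer to this:
@Erich-Chen 😃 I later found it in the documentation: -mcp is actually the cp (code page) sub-option of the -m switch for the zip format.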
fff - ╕▒▒╛.txt
The correct name is:
fff - 副本.txt
How can I resolve it?
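If the files are already extracted, one hedged option is to undo the mis-decoding in Java: re-encode the garbled name with CP437 to get the original bytes back, then decode them with the charset the archiving tool really used. This assumes that charset is GBK (code page 936, matching the -mcp=936 tip in this thread) and that the name was decoded as CP437 with nothing lost. `RepairMojibake` is a hypothetical helper, not part of zip4j.

```java
import java.nio.charset.Charset;

public class RepairMojibake {
    /**
     * Re-encodes text that was decoded with the wrong charset and decodes it
     * again with the right one. This is lossless only when {@code wrong} maps
     * each byte value to a distinct character, which holds for CP437.
     */
    static String repair(String misdecoded, Charset wrong, Charset actual) {
        return new String(misdecoded.getBytes(wrong), actual);
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437");
        Charset gbk = Charset.forName("GBK"); // assumed original encoding (code page 936)
        String garbled = "fff - \u2555\u2592\u2592\u255B.txt"; // the name as extracted
        String fixed = repair(garbled, cp437, gbk);
        System.out.println(fixed.equals("fff - \u526F\u672C.txt")); // true
    }
}
```

Decoding with the right charset while extracting avoids this round trip entirely; the repair is mainly useful for names that were already written to disk.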