Garbled chinese character #73

Closed
ldbing opened this issue Sep 28, 2019 · 3 comments

Comments

@ldbing

ldbing commented Sep 28, 2019

I am getting garbled Chinese characters. Can anyone help me? All directory and file names are garbled.

@joorei
Contributor

joorei commented Sep 28, 2019

Assuming you know how to get to the FileHeaders:
You can try to get the entry names from the UPath extra data record. If you're lucky, it exists in each FileHeader whose name contains non-ASCII characters. The following function just shows the necessary steps and is not clean code (in my opinion); please adjust it to your code style if it works for you.

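import java.nio.charset.StandardCharsets;
import java.util.List;

import net.lingala.zip4j.model.ExtraDataRecord;
import net.lingala.zip4j.model.FileHeader;

/**
 * Returns the UTF-8 encoded entry name from the UPath extra data record or,
 * if that record was not found, the file name directly from the given FileHeader.
 *
 * @param fileHeader the header of the entry whose name should be resolved
 * @return the name from the UPath record if present, otherwise fileHeader.getFileName()
 */
public static String getFileNameFromExtraData(FileHeader fileHeader) {
	List<ExtraDataRecord> extraDataRecords = fileHeader.getExtraDataRecords();
	if (extraDataRecords == null) {
		return fileHeader.getFileName();
	}
	for (ExtraDataRecord extraDataRecord : extraDataRecords) {
		// 0x7075 is the header ID of the Unicode Path ("UPath") extra field
		if (extraDataRecord.getHeader() != 0x7075) {
			continue;
		}
		byte[] bytes = extraDataRecord.getData();
		// Layout: 1 byte version, 4 bytes CRC-32 of the header file name, then the UTF-8 name.
		// The CRC is skipped here, not verified.
		if (bytes != null && bytes.length > 5 && bytes[0] == 1) {
			return new String(bytes, 5, bytes.length - 5, StandardCharsets.UTF_8);
		}
	}
	return fileHeader.getFileName();
}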
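The function above skips the CRC-32 stored in the UPath record. As far as I understand the field, that checksum covers the raw file name bytes from the header, so a reader can use it to tell whether the Unicode name still belongs to the entry. The following is a minimal sketch using only the JDK. The class `UnicodePathField`, its `encode`/`decode` helpers and the `rawHeaderName` parameter are hypothetical names for illustration, not part of zip4j, and it assumes you have the undecoded name bytes at hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class UnicodePathField {

    /**
     * Decodes the payload of a 0x7075 extra field (the bytes after the ID/size header).
     * Returns null if the payload is too short, has an unknown version, or its
     * CRC-32 does not match the raw header name (hypothetical parameter).
     */
    static String decode(byte[] data, byte[] rawHeaderName) {
        if (data == null || data.length <= 5) {
            return null;
        }
        // Multi-byte values in zip structures are little-endian
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buffer.get() != 1) {
            return null;
        }
        long storedCrc = buffer.getInt() & 0xFFFFFFFFL;
        CRC32 crc = new CRC32();
        crc.update(rawHeaderName);
        if (crc.getValue() != storedCrc) {
            return null; // header name changed after the field was written
        }
        return new String(data, 5, data.length - 5, StandardCharsets.UTF_8);
    }

    /** Builds a payload for demonstration purposes only. */
    static byte[] encode(String unicodeName, byte[] rawHeaderName) {
        byte[] utf8 = unicodeName.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(rawHeaderName);
        return ByteBuffer.allocate(5 + utf8.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 1)
                .putInt((int) crc.getValue())
                .put(utf8)
                .array();
    }

    public static void main(String[] args) {
        byte[] rawName = "name.txt".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = encode("\u4e2d\u6587.txt", rawName); // "中文.txt"
        System.out.println(decode(payload, rawName));
    }
}
```

If `decode` returns null, falling back to the header name (as the function above does) seems like the reasonable choice.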

@LeeYoung624
Contributor

The Unicode Path Extra Field (0x7075) is optional, so you may not be able to get the UTF-8 encoded file name from it.
You can refer to issue #45.

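import java.io.UnsupportedEncodingException;

import net.lingala.zip4j.ZipFile;
import net.lingala.zip4j.exception.ZipException;
import net.lingala.zip4j.model.FileHeader;

public void test() throws ZipException, UnsupportedEncodingException {
  ZipFile zipFile = new ZipFile("/Users/someuser/Downloads/a.zip");
  for (FileHeader fileHeader : zipFile.getFileHeaders()) {
    // Extract each entry under its repaired name instead of the garbled one
    zipFile.extractFile(fileHeader, "/Users/someuser/Downloads/extract", getGbkEncodedFileName(fileHeader.getFileName()));
  }
}

// Assumes the name was stored as GBK but decoded as Cp437: recover the bytes, then decode them as GBK
private String getGbkEncodedFileName(String fileName) throws UnsupportedEncodingException {
  return new String(fileName.getBytes("Cp437"), "GBK");
}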
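To see why `getGbkEncodedFileName` can recover the name, here is a small sketch that reproduces the garbling and the repair with the JDK only. The class `MojibakeDemo` and its `garble`/`repair` methods are hypothetical names. It assumes the name was written as GBK and read as Cp437, and that your JDK ships both charsets (standard JDK builds normally do).

```java
import java.nio.charset.Charset;

class MojibakeDemo {
    static final Charset CP437 = Charset.forName("IBM437");
    static final Charset GBK = Charset.forName("GBK");

    /** Simulates a reader that interprets GBK name bytes as Cp437. */
    static String garble(String original) {
        return new String(original.getBytes(GBK), CP437);
    }

    /** Reverses the wrong decoding: back to the raw bytes, then decode as GBK. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP437), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587.txt"; // "中文.txt"
        String garbled = garble(original);
        System.out.println("garbled:  " + garbled);
        System.out.println("repaired: " + repair(garbled));
    }
}
```

The repair only works when the wrong decoding kept every byte, and when GBK really was the charset used to write the archive. For a zip created with a different charset, the second charset has to change accordingly.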

@srikanth-lingala
Owner

A similar issue is opened almost once a week. I will try to pin an issue here so that users can see the root cause of this problem and how to work around it.

Basically, this is not an issue with zip4j. It is an issue with the tool that created the zip file. The issue that @LeeYoung624 referenced has more information.

To work around this problem: with version 2.2.1, released today, you can call setCharset(Charset) on the ZipFile or, if you are using streams, pass the Charset to the constructor of the ZipInputStream. Use the charset that was used when the zip file was created. This is most likely the default charset of the system the zip file was created on.
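The effect of choosing the right charset can be shown without zip4j. The sketch below uses the JDK's own `java.util.zip` classes as a stand-in, not the zip4j API described above. It writes an archive whose entry name is stored as GBK, then reads it back with two different charsets. The class `ZipCharsetDemo` and its helper methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipCharsetDemo {

    /** Builds an in-memory zip with one empty entry whose name is encoded in the given charset. */
    static byte[] createZip(String entryName, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, charset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the name of the first entry, decoding it with the given charset. */
    static String readFirstEntryName(byte[] zip, Charset charset) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip), charset)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] zip = createZip("\u4e2d\u6587.txt", gbk); // "中文.txt"
        // Wrong charset: the name comes back garbled
        System.out.println(readFirstEntryName(zip, Charset.forName("IBM437")));
        // Charset the archive was created with: the name comes back intact
        System.out.println(readFirstEntryName(zip, gbk));
    }
}
```

With zip4j 2.2.1 or later, the equivalent step would be the `setCharset` call or the `ZipInputStream` constructor argument mentioned in the comment above.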
