Garbled Chinese characters #73

Closed

I am getting garbled Chinese characters. Can anyone help me? All directory and file names are garbled.

Contributor

Assuming you know how to get to the FileHeaders:
You can try to get the entry names from the UPath extra data record. If you're lucky, it exists in each FileHeader whose name contains non-ASCII characters. The following function only shows the necessary steps and is not clean code (in my opinion); please adjust it to your code style if it works for you.

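import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

import net.lingala.zip4j.model.ExtraDataRecord;
import net.lingala.zip4j.model.FileHeader;

/**
 * Returns the UTF-8 encoded entry name from the UPath (0x7075) extra data
 * record or, if no such record is found, the file name taken directly from
 * the given FileHeader.
 *
 * @param fileHeader the header of the zip entry to inspect
 * @return the name from the UPath record, or fileHeader.getFileName() as a fallback
 */
public static String getFileNameFromExtraData(FileHeader fileHeader) {
	List<ExtraDataRecord> extraDataRecords = fileHeader.getExtraDataRecords();
	if (extraDataRecords == null) {
		// Defensive guard: entries without extra data may have no record list.
		return fileHeader.getFileName();
	}
	for (ExtraDataRecord extraDataRecord : extraDataRecords) {
		long identifier = extraDataRecord.getHeader();
		if (identifier == 0x7075) {
			byte[] bytes = extraDataRecord.getData();
			ByteBuffer buffer = ByteBuffer.wrap(bytes);
			// Layout: 1 byte version, 4 bytes CRC-32 of the header name, then the UTF-8 name.
			byte version = buffer.get();
			assert (version == 1);
			// Read only to skip past the CRC; it is not verified here.
			int crc32 = buffer.getInt();
			return new String(bytes, 5, buffer.remaining(), StandardCharsets.UTF_8);
		}
	}
	return fileHeader.getFileName();
}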
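
The function above skips the CRC-32 stored in the UPath record. As a rough sketch of how that field could be checked, the following stand-alone example uses only the JDK and no zip4j types. The class and method names (`UnicodePathField`, `decode`, `encode`) are made up for illustration, and it assumes you have the raw, undecoded header name bytes available, which the function above does not. The CRC-32 is read as little-endian and compared against the CRC-32 of the raw header name; on a mismatch the record is ignored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not part of zip4j.
class UnicodePathField {

    // Decodes a 0x7075 payload: version (1 byte), CRC-32 (4 bytes, little-endian), UTF-8 name.
    // Returns null if the payload is too short, the version is not 1,
    // or the stored CRC-32 does not match the raw header name bytes.
    static String decode(byte[] payload, byte[] rawHeaderName) {
        if (payload == null || payload.length < 5) {
            return null;
        }
        ByteBuffer buffer = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        if (buffer.get() != 1) {
            return null;
        }
        long storedCrc = buffer.getInt() & 0xFFFFFFFFL;
        CRC32 crc = new CRC32();
        crc.update(rawHeaderName);
        if (crc.getValue() != storedCrc) {
            return null;
        }
        return new String(payload, 5, payload.length - 5, StandardCharsets.UTF_8);
    }

    // Builds a payload in the same layout; only used here to produce sample data.
    static byte[] encode(String unicodeName, byte[] rawHeaderName) {
        byte[] utf8 = unicodeName.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(rawHeaderName);
        return ByteBuffer.allocate(5 + utf8.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 1)
                .putInt((int) crc.getValue())
                .put(utf8)
                .array();
    }

    public static void main(String[] args) {
        byte[] rawHeaderName = "__.txt".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = encode("\u4e2d\u6587.txt", rawHeaderName);
        System.out.println(decode(payload, rawHeaderName));
    }
}
```

If the raw header bytes are not available in your setup, skipping the check as the function above does is the only option.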

Contributor

Because the Unicode Path Extra Field (0x7075) is optional, you may not be able to get the UTF-8 encoded file name from it.
You can refer to issue #45.

public void test() throws ZipException, UnsupportedEncodingException {
  ZipFile zipFile = new ZipFile("/Users/someuser/Downloads/a.zip");
  for (FileHeader fileHeader : zipFile.getFileHeaders()) {
    zipFile.extractFile(fileHeader, "/Users/someuser/Downloads/extract", getGbkEncodedFileName(fileHeader.getFileName()));
  }
}

private String getGbkEncodedFileName(String fileName) throws UnsupportedEncodingException {
  return new String(fileName.getBytes("Cp437"), "GBK");
}
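
To see why the `getBytes("Cp437")` / `"GBK"` round trip in the snippet above can recover a name, here is a small JDK-only sketch. It assumes the name was written as GBK bytes and then decoded as Cp437 when the archive was read. The class and method names (`NameRecoding`, `garble`, `recover`) are made up for illustration. It relies on the JDK's Cp437 charset mapping each byte to a distinct character, so the original bytes can be restored.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of zip4j.
class NameRecoding {
    static final Charset CP437 = Charset.forName("Cp437");
    static final Charset GBK = Charset.forName("GBK");

    // Simulates the problem: GBK bytes interpreted as Cp437 text.
    static String garble(String original) {
        return new String(original.getBytes(GBK), CP437);
    }

    // Reverses it: turn the Cp437 text back into bytes, then decode them as GBK.
    static String recover(String garbled) {
        return new String(garbled.getBytes(CP437), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587.txt";
        String garbled = garble(original);
        System.out.println(garbled.equals(original));          // false
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

This only works when the archive really used GBK; for a different source charset, the second charset has to be swapped accordingly.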

srikanth-lingala

Owner

A similar issue is opened almost once a week. I will try to pin an issue here so that users know the root cause and how to work around it.

Basically, this is not an issue with zip4j. It is an issue with the tool that created the zip file. See the issue that @LeeYoung624 referenced for more information.

To work around this problem with version 2.2.1 (released today), you can call ZipFile.setCharset(Charset) or, if you are using streams, pass the Charset to the ZipInputStream constructor. Use the charset that was used when the zip file was created. This is most likely the default charset of the system the zip file was created on.
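
The idea of reading an archive with the charset it was written in can be tried out without zip4j. The sketch below uses the JDK's own `java.util.zip` classes as a stand-in, not the zip4j API named above. It writes a small in-memory zip whose entry name is encoded as GBK and reads the name back with the same charset. The class and method names (`CharsetAwareRead`, `writeZip`, `firstEntryName`) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class using java.util.zip, not zip4j.
class CharsetAwareRead {
    static final Charset GBK = Charset.forName("GBK");

    // Writes a one-entry zip whose entry name is encoded with the given charset.
    static byte[] writeZip(String entryName, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out, charset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads the first entry name, decoding it with the given charset.
    static String firstEntryName(byte[] zipBytes, Charset charset) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes), charset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587.txt";
        byte[] zipBytes = writeZip(name, GBK);
        System.out.println(name.equals(firstEntryName(zipBytes, GBK)));
    }
}
```

With zip4j the equivalent step would be the setCharset call or the ZipInputStream constructor argument described above, with the charset chosen to match the tool that produced the archive.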

srikanth-lingala closed this as completed on Sep 29, 2019
