Description
When adding an image, it would be useful to detect whether the image is a duplicate and, instead of adding it again, return a reference to the existing media.
In some large workbooks we add the same header to each sheet, and this substantially increases the file size. If we open the file in Excel and save it again, Excel detects the duplicate images and consolidates them on save, reducing the file size.
I think it would be possible to change addMedia in picture.go to handle this behavior. AddPictureFromBytes would change slightly as well:
- addMedia steps through all the saved media and looks for byte slices with the same length as the file being saved. For each candidate, it computes the hash of both files (the hash of the file being saved can be reused for later comparisons). If the hashes match, it returns the media path of the existing image. If no match is found, it saves the new media and returns its path.
- AddPictureFromBytes calls addMedia before calling addDrawingRelationships, and passes the media path returned by addMedia to addDrawingRelationships.
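The two steps above can be sketched roughly as follows. This is a minimal, self-contained illustration, not excelize's actual code: mediaStore, mediaEntry, and addPicture are hypothetical stand-ins, the relationship step is reduced to appending to a slice, and the xl/media/imageN naming is assumed. Each stored file keeps a cached SHA-256 digest, so it is hashed only once, and the incoming file is hashed at most once, and only when a same-length candidate exists.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// mediaEntry is a hypothetical record of one file stored under xl/media/.
type mediaEntry struct {
	path string
	data []byte
	sum  [sha256.Size]byte // cached so each stored file is hashed only once
}

// mediaStore is a hypothetical stand-in for the media held by a workbook.
type mediaStore struct {
	entries []mediaEntry
}

// addMedia returns the path of an identical stored file if one exists;
// otherwise it stores data under a new path and returns that path.
func (m *mediaStore) addMedia(data []byte, ext string) string {
	var sum [sha256.Size]byte
	hashed := false
	for _, e := range m.entries {
		// Cheap filter: files of different length cannot be identical.
		if len(e.data) != len(data) {
			continue
		}
		// Hash the new file at most once and reuse it for later comparisons.
		if !hashed {
			sum = sha256.Sum256(data)
			hashed = true
		}
		if e.sum == sum {
			return e.path
		}
	}
	if !hashed {
		sum = sha256.Sum256(data)
	}
	path := fmt.Sprintf("xl/media/image%d%s", len(m.entries)+1, ext)
	m.entries = append(m.entries, mediaEntry{path: path, data: data, sum: sum})
	return path
}

// addPicture mirrors the proposed AddPictureFromBytes flow: resolve the
// media path first, then hand it to the relationship step (here just a
// slice of relationship targets).
func addPicture(m *mediaStore, rels *[]string, data []byte, ext string) {
	path := m.addMedia(data, ext)
	*rels = append(*rels, path)
}

func main() {
	store := &mediaStore{}
	var rels []string
	header := []byte("header-image-bytes")
	for sheet := 0; sheet < 3; sheet++ {
		addPicture(store, &rels, header, ".png")
	}
	addPicture(store, &rels, []byte("other-image"), ".png")
	// Prints 2 stored files; the three header relationships share one path.
	fmt.Println(len(store.entries), rels)
}
```

Matching on the digest alone treats a SHA-256 collision as impossible in practice; a cautious implementation could confirm a digest match with a byte comparison before reusing the path.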
Checking the length of a slice is a constant-time operation, and very few slices should have exactly the same length without being the same media, so hashing should add little overhead. However, it would be worth adding a benchmark to make sure this doesn't cause a regression for #274. It would also be useful to add a benchmark for saving the xlsx file, since the cost of this check is likely to be offset by writing fewer files.
I will wait for feedback on this one: while I think it may be useful, I understand not wanting to risk the performance impact.
Activity
xuri commented on Mar 20, 2019
Thanks for the insight @mlh758. That's a useful feature, and I'll certainly accept a patch if somebody implements it. We can store the hash of each image in the File object to reduce the performance impact.
mlh758 commented on Mar 20, 2019
A hash may actually be overkill: the bytes package has an Equal function that already handles equality and would probably be more efficient.
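A rough sketch of the bytes.Equal variant, again using hypothetical names (media, addMedia) and an assumed xl/media/imageN naming rather than excelize's real internals. No digests are kept; each new image is compared directly against the stored files, in insertion order so the generated names stay deterministic.

```go
package main

import (
	"bytes"
	"fmt"
)

// media is a hypothetical store mapping each media path to its contents.
type media struct {
	paths []string // insertion order, so naming stays deterministic
	files map[string][]byte
}

// addMedia reuses an existing path when identical bytes are already stored.
func (m *media) addMedia(data []byte, ext string) string {
	for _, p := range m.paths {
		// bytes.Equal requires the same length and the same bytes;
		// slices of different lengths are rejected without a content scan.
		if bytes.Equal(m.files[p], data) {
			return p
		}
	}
	if m.files == nil {
		m.files = make(map[string][]byte)
	}
	p := fmt.Sprintf("xl/media/image%d%s", len(m.paths)+1, ext)
	m.files[p] = data
	m.paths = append(m.paths, p)
	return p
}

func main() {
	m := &media{}
	a := m.addMedia([]byte("logo"), ".png")
	b := m.addMedia([]byte("logo"), ".png")
	c := m.addMedia([]byte("icon"), ".png")
	fmt.Println(a == b, a == c, len(m.paths)) // true false 2
}
```

The trade-off versus cached hashes: this version keeps no extra state, but every equal-length candidate is scanned byte by byte on each insert, whereas a stored digest is computed once per file.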
resolve #359, optimize for saving duplicate images
Do not save duplicate images
resolve qax-os#359, optimize for saving duplicate images