Image deletion not really deleting image blobs? #1183
Comments
@falzm The deletes implemented for manifests are soft deletes. Please see the release notes. We are still working on proper garbage collection. If you are using the filesystem driver, there are scripts that leverage soft deletes to clean up images for you. |
@stevvooe could you point to one of those scripts please? I can't seem to find the ones that leverage soft deletes (only unsafe ones). |
Registry 2.2 does pruning in the background as a scheduled process, so you don't need additional scripts. |
@sergeyfd is it documented somewhere? Possible to know when the process is triggered? |
Great, thank you! |
The maintenance doc covers "upload purging" and "read-only mode". Upload purging does not perform garbage collection. Read-only mode helps implement garbage collection, but distribution 2.2 does not remove unreferenced images when in read-only mode. See #462. The Docker Trusted Registry has a garbage collection feature. |
I might be wrong but I think that upload purging is the garbage collection. It's supposed to remove orphaned blobs. I don't know what else can be orphaned there. |
@sergeyfd This is purging orphaned uploads. Orphaned blobs need to have a full sweep to ensure they are unreferenced. |
@travisgroth In the future, please avoid commenting on closed issues. The best information on GC is from the ROADMAP. It describes the issues with GC. We are actively working on this for an upcoming release. To provide a little background, the limiting factor of adding GC to the registry is having a transactional store to ensure a consistent data set during the GC cycle. The open source registry currently lacks this facility. The GC implementation in DTR landed first because DTR has a much more controlled deployment scenario, allowing us to ensure consistency of the registry dataset. We can understand this better by reviewing the recommended GC procedure:
This describes what is actually implemented in DTR. What is interesting is that the actual GC code is only 100 lines or so. The complexity lies in implementing the surrounding coordination. The core GC code in DTR, or a version of it, will land in the open source project. There are scripts floating around that do this as well (I won't outright recommend any, as we haven't fully vetted them). For many deployment scenarios, this coordination can be handled as a matter of internal operations procedure. A simple mark-sweep script can be written in an afternoon. Many already do this (and we are collecting feedback to ensure quality in the solution we release). Before making this widely available, we must ensure that users have the tools to use it correctly and safely, without losing critical data. The problem lies in finding a solution that works across the wide range of scenarios in which the open source registry finds itself deployed. I hope this answers your questions. |
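For illustration, the mark-sweep idea mentioned above can be sketched roughly as follows. This is a minimal in-memory sketch, not the DTR or registry implementation: the function names and the `manifests` mapping (manifest digest to the blob digests it references) are hypothetical, and it assumes the registry is read-only while it runs so that the data set stays consistent.

```python
from typing import Dict, Iterable, Set


def mark(manifests: Dict[str, Iterable[str]]) -> Set[str]:
    """Mark phase: collect every digest still referenced by a live manifest.

    `manifests` is a hypothetical mapping of manifest digest -> referenced
    blob digests; a real script would build it by walking the storage backend.
    """
    marked: Set[str] = set()
    for manifest_digest, blob_digests in manifests.items():
        marked.add(manifest_digest)  # the manifest itself is stored as a blob
        marked.update(blob_digests)
    return marked


def sweep(all_blobs: Iterable[str], marked: Set[str]) -> Set[str]:
    """Sweep phase: return the blobs that no live manifest references."""
    return set(all_blobs) - marked


if __name__ == "__main__":
    live = {"sha256:m1": ["sha256:a", "sha256:b"]}
    stored = {"sha256:m1", "sha256:a", "sha256:b", "sha256:c"}
    # Only the returned digests would be candidates for deletion.
    print(sorted(sweep(stored, mark(live))))
```

The sketch only computes deletion candidates. Actually removing them, and guaranteeing that no push happens between the mark and the sweep, is the coordination problem described in this comment.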
Sorry. I glommed on to this issue because (a) it looks like it was closed due to misinformation and (b) it mentions the DTR support for GC. If you want, I can open an issue for the discussion. I’m aware of the roadmap and concerns, but I’d love to have an idea of timelines and reasonable interim scripts that do GC. The 2.3 todo list is fairly long and I didn’t see a target date (though I may have missed it). DTR appears to be supported in the same configurations where I’ve seen eventual consistency raised as a concern, which is why it seems odd that it got the code first. Even if it only works with a subset of backends (POSIX FS + S3 + whatever), there’s certainly been enough attention and need that I’d expect early versions of it to be available if the solution is safe enough for commercial support. I’m pretty sure anyone running the registry is happy to put their frontend in read-only mode manually and hit the API via a cron job off hours if it means not spending a day figuring out how to safely script garbage cleanup. Speaking for the community here, I’m not sure why we’re stuck implementing this ourselves. If you’re still working on the feature and the community hasn’t produced a good enough script, why can’t Docker publish a blessed script for common backends (I’d put money on S3 + POSIX FS covering 90% of your user base)? Requiring a registry admin to guarantee the registry is read-only is a very reasonable trade-off while the perfect built-in GC approach is hashed out or ported to the open source edition. This would mean registries that are deployed right now can be maintained easily until 2.3 is out.
|
@travisgroth distribution and DTR are two different projects, with different requirements, different roadmaps, different use cases and different teams. If you ask me, the OSS registry and DTR are even two very different products. We don't merge just anything into distribution simply because it's in DTR. Conversely, DTR picks or rewrites whatever makes sense for their product. Speaking for the community here (as well): I strongly believe we want something that covers all cases that the open source registry supports in a satisfactory manner, and we will not merge into mainline unless the maintainers are happy about it (@sday specifically, who designed and wrote most of the code you are using here...). Also, roadmaps for open source are indicative, not a commitment (unlike for DTR), so, no, there is no date for this feature right now. GC will land eventually, and we all want it, agreed, but see the above. Hope that clarifies. |
I'm trying to get the hang of the image deletion HTTP API call described in the documentation, but I can't seem to understand how it works behind the scenes:
The main reason I'd like to delete images is to avoid consuming too much disk space – the API responds that it correctly deleted the image, however the blobs are not deleted. Am I missing something?
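For readers following along, here is a minimal sketch of the manifest deletion request being discussed, assuming the Registry HTTP API V2 endpoint `DELETE /v2/<name>/manifests/<digest>`. The helper name, host and repository below are hypothetical, and the code only builds the request without sending it. As the comments in this thread explain, the registry treats this as a soft delete, so the layer blobs stay on disk afterwards.

```python
import urllib.request


def build_manifest_delete(registry_url: str, repository: str,
                          digest: str) -> urllib.request.Request:
    """Build (but do not send) a manifest DELETE request.

    The API expects the manifest digest (e.g. "sha256:..."), not a tag.
    """
    url = f"{registry_url.rstrip('/')}/v2/{repository}/manifests/{digest}"
    return urllib.request.Request(url, method="DELETE")


if __name__ == "__main__":
    # Hypothetical registry address, repository and digest.
    req = build_manifest_delete("http://localhost:5000", "myimage", "sha256:abc")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req)
```

A success response here only means the manifest reference was removed. Disk space is reclaimed only once a garbage collection pass sweeps the blobs that are no longer referenced.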