
Image deletion not really deleting image blobs? #1183

Closed
falzm opened this issue Nov 12, 2015 · 13 comments

Comments

@falzm

falzm commented Nov 12, 2015

I'm trying to get the hang of the image deletion HTTP API call described in the documentation, but I can't seem to understand how it works behind the scenes:

curl -i -X DELETE docker.example.net/v2/myimage/manifests/sha256:1204013f5200c49d999ec9b29e3b3eb0c6fb9e120cd18608fd0088a5a721d69b
HTTP/1.1 202 Accepted
Server: nginx/1.6.2
Date: Thu, 12 Nov 2015 17:03:20 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 0
Connection: keep-alive
Docker-Distribution-Api-Version: registry/2.0
X-Content-Type-Options: nosniff

The main reason I'd like to delete images is to avoid consuming too much disk space – the API responds that it correctly deleted the image, however the blobs are not deleted. Am I missing something?

@stevvooe
Collaborator

@falzm The deletes implemented for manifests are soft deletes. Please see the release notes.

We are still working on proper garbage collection. If you are using the filesystem driver, there are scripts that can clean up images for you by leveraging soft deletes.
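For reference, the manifest DELETE endpoint takes a digest rather than a tag, and deletes must first be enabled in the registry config (`storage.delete.enabled: true`). One common pattern is to HEAD the manifest with the v2 media type and read the `Docker-Content-Digest` response header. A minimal sketch (the `digest_from_headers` helper name is made up for illustration):

```shell
# Extract the digest from `curl -I` output. Hypothetical helper name;
# header names are case-insensitive per HTTP, hence tolower().
digest_from_headers() {
    awk 'tolower($1) == "docker-content-digest:" { gsub(/\r/, "", $2); print $2 }'
}

# Against a live registry (not run here):
#   curl -sI -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
#       docker.example.net/v2/myimage/manifests/latest | digest_from_headers
#   curl -X DELETE docker.example.net/v2/myimage/manifests/<digest>
```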

@falzm
Author

falzm commented Nov 13, 2015

@stevvooe could you point to one of those scripts please? I can't seem to find the ones that leverage soft deletes (only unsafe ones).

@sergeyfd
Contributor

Registry 2.2 does pruning in the background as a scheduled process, so you don't need additional scripts.

@falzm
Author

falzm commented Nov 19, 2015

@sergeyfd is it documented somewhere? Is it possible to know when the process is triggered?

@sergeyfd
Contributor

@falzm
Author

falzm commented Nov 20, 2015

Great, thank you!

@falzm falzm closed this as completed Nov 20, 2015
@bwb

bwb commented Nov 20, 2015

The maintenance doc covers "upload purging" and "read-only mode". Upload purging does not perform garbage collection. Read-only mode helps implement garbage collection, but distribution 2.2 does not remove unreferenced images when in read-only mode.

See #462.

The Docker Trusted Registry has a garbage collection feature.
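For what it's worth, both features described above are driven by the `storage.maintenance` section of the registry configuration. An illustrative fragment (the values are examples, not recommendations):

```yaml
storage:
  maintenance:
    uploadpurging:
      enabled: true     # purge abandoned upload directories, not unreferenced blobs
      age: 168h
      interval: 24h
      dryrun: false
    readonly:
      enabled: false    # set to true before running any external GC sweep
```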

@sergeyfd
Contributor

I might be wrong but I think that upload purging is the garbage collection. It's supposed to remove orphaned blobs. I don't know what else can be orphaned there.

@stevvooe
Collaborator

@sergeyfd This is purging orphaned uploads. Orphaned blobs need to have a full sweep to ensure they are unreferenced.

@travisgroth

@bwb @stevvooe when will this be implemented in the private (non-paid) registry? Seems like the work has been done if GC is in DTR. Is there a script we can run to purge orphaned blobs? Is there a release timeline for the GC API endpoint?

@stevvooe
Collaborator

stevvooe commented Dec 2, 2015

@travisgroth In the future, please avoid commenting on closed issues.

The best information on GC is from the ROADMAP. It describes the issues with GC. We are actively working on this for an upcoming release.

To provide a little background, the limiting factor of adding GC to the registry is having a transactional store to ensure a consistent data set during the GC cycle. The open source registry currently lacks this facility. The GC implementation in DTR landed first because DTR has a much more controlled deployment scenario, allowing us to ensure consistency of the registry dataset.

We can understand this better by reviewing the recommended GC procedure:

  1. Put registry instances into read-only mode.
  2. Walk registry metadata, creating a set of reachable layers.
  3. Delete all unreachable layers.
  4. Return registry to read-write mode.

This describes what is actually implemented in DTR. What is interesting is that the actual GC code is only about 100 lines; the complexity is in implementing the surrounding coordination. The core GC code in DTR, or a version of it, will land in the open source project. There are scripts floating around that do this as well (I won't outright recommend any, as we haven't fully vetted them).

For many deployment scenarios, adding this coordination can be done as a matter of internal operations procedure. A simple mark-sweep script can be written in an afternoon, and many already do this (we are collecting feedback to ensure quality in the solution we release). Before making this widely available, we must ensure that users have the tools to use it correctly and safely, without losing critical data. The problem lies in finding a solution that works across the wide range of scenarios in which the open source registry finds itself deployed.
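For the filesystem driver, the mark-sweep described above can be sketched in a few lines of shell. This is an unvetted illustration rather than one of the circulating scripts: `registry_gc` is a made-up name, the paths assume the v2 storage layout, and it must only run while the registry is in read-only mode.

```shell
# Naive mark-and-sweep for the filesystem driver (illustrative only).
# Run ONLY while the registry is in read-only mode.
registry_gc() {
    root=$1    # e.g. /var/lib/registry/docker/registry/v2

    # Mark: collect every digest still referenced by a repository link file.
    marked=$(find "$root/repositories" -name link -exec cat {} + | sort -u)

    # Sweep: remove blob directories whose digest is not in the marked set.
    for blob in "$root"/blobs/sha256/*/*; do
        [ -d "$blob" ] || continue
        digest="sha256:$(basename "$blob")"
        case "$marked" in
            *"$digest"*) ;;                      # still referenced: keep
            *) echo "removing unreferenced $digest"
               rm -rf "$blob" ;;
        esac
    done
}
```

The substring match on `$marked` is safe enough here because real digests are fixed-length hex strings, but a production script would want an exact set lookup.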

I hope this answers your questions.

@travisgroth

Sorry. I glommed on to this issue because (a) it looks like it was closed due to misinformation and (b) it mentions the DTR support for GC. If you want, I can open an issue for the discussion.

I’m aware of the roadmap and concerns but I’d love to have an idea of timelines and reasonable interim scripts that do GC. The 2.3 todo list is fairly long and I didn’t see a target date (though I may have missed it).

DTR appears to be supported in the configurations that I've seen worry about eventual consistency, which is why it seems odd that it got the code first. Even if it only works with a subset of backends (POSIX FS + S3 + whatever), there's certainly been enough attention/need that I'd expect early versions of it to be available if the solution is safe enough for commercial support. I'm pretty sure anyone running the registry is happy to coordinate putting their frontend into read-only mode manually and hitting the API via an off-hours cron job if it means not spending a day figuring out how to safely script garbage cleanup.

Speaking for the community here, I'm not sure why we're stuck implementing this ourselves. If you're still working on the feature and the community hasn't produced a good enough script, why can't Docker publish a blessed script for common backends (I'd put money on S3 + POSIX FS covering 90% of your user base)? Requiring a registry admin to guarantee the registry is read-only is a very reasonable trade-off while the perfect built-in GC approach is hashed out or ported to the open source edition. This would mean registries that are deployed right now can be maintained easily until 2.3 is out.


@dmp42
Contributor

dmp42 commented Dec 3, 2015

@travisgroth distribution and DTR are two different projects, with different requirements, different roadmaps, different use-cases and different teams.

If you ask me, OSS registry and DTR are even two very different products.

We don't merge just about anything into distribution simply because it's in DTR. Conversely, DTR picks and rewrites whatever makes sense for its product.

Speaking for the community here (as well): I strongly believe we want something that covers, in a satisfactory manner, all the cases the open source registry supports, and we will not merge into mainstream unless the maintainers are happy with it (@sday specifically, who designed and wrote most of the code you are using here...).

Also, the roadmap for open source is indicative, not a commitment (unlike DTR's), so, no, there is no date for this feature right now.

GC will land, eventually, we all want it, agreed, but ^.

Hope that clarifies.
