container registry pruning #11512

Open
opened 2023-09-11 18:50:50 +00:00 by kevin · 14 comments
Owner

Our candidate and production container registries are currently growing without much bound.

See:

https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=oci-registry01.iad2.fedoraproject.org;plugin=df;plugin_instance=srv-registry;type=df_complex;begin=-9022400

and

https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=oci-candidate-registry01.iad2.fedoraproject.org;plugin=df;plugin_instance=srv-registry;type=df_complex;begin=-11622400

We have a manual script in playbooks/manual/oci-registry-prune.yml, but I'm not convinced it understands the new ostree containers and flatpak containers. It seems to operate on tags and delete everything older than 30 days, which isn't likely what we want anyhow.

So, we need to:

  • Come up with a retention plan. What do we keep and for how long?

  • Implement something that enforces that. cron job or whatever, just not anything manual. ;)

Note that even if we move to quay.io, we likely want to prune some things anyhow, not keep everything forever.

Any input welcome!

CC: @cverna @otaylor @dustymabe @siosm


The current playbook is for the candidate registry. We had the discussion in the container SIG (https://pagure.io/ContainerSIG/container-sig/issue/33) and thought that 1 month was good enough there.

I think the ansible module should work for both ostree and flatpak containers, since it uses the registry APIs and nothing specific to the container image itself. Folks can take a look at it here: https://pagure.io/fedora-infra/ansible/blob/main/f/library/delete_old_oci_images.py

For the production registry, I think we should look at whether a Fedora version is EOL or not. Something like: we keep container images for up to 2 EOL versions (for example, right now we would have 35, 36, 37, 38, 39, 40), and when 37 goes EOL we would prune the images based on F35.
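To make the "keep up to 2 EOL versions" idea concrete, here is a minimal sketch of the selection logic only. The function name and its inputs are made up for illustration; it assumes we already have the list of releases present in the registry and the set of EOL releases from somewhere else, and it does not talk to the registry at all.

```python
def releases_to_prune(present, eol, keep_eol=2):
    """Return the release numbers whose images could be pruned.

    present  -- release numbers that currently have images in the registry
    eol      -- release numbers that have reached end of life
    keep_eol -- how many of the most recent EOL releases to retain
    (All three inputs are hypothetical; nothing here queries a registry.)
    """
    # Only EOL releases are ever candidates; supported releases are kept.
    eol_present = sorted(r for r in present if r in eol)
    if keep_eol <= 0:
        return eol_present
    # Drop the newest `keep_eol` EOL releases from the candidate list.
    return eol_present[:-keep_eol]


# The example from the comment: once 37 goes EOL, only F35 becomes prunable.
print(releases_to_prune([35, 36, 37, 38, 39, 40], {35, 36, 37}))
```

With 35 and 36 as the only EOL releases, the same call returns an empty list, so nothing is pruned until a third release goes EOL.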

Member

Metadata Update from @phsmoura:

  • Issue priority set to: Waiting on Assignee (was: Needs Review)
  • Issue tagged with: medium-gain, medium-trouble, ops

Please keep at least the latest base image for every Fedora release. All other container images for EOL'ed releases can be pruned.


For Quay.io, we should enforce a default lifetime for all images: https://docs.projectquay.io/use_quay.html#tag-expiration

We could set this to 2 years for example for all images and override it on a per image basis.
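As a rough illustration of "default lifetime plus per-image override", the policy could be written down like this. This is only the policy arithmetic, not Quay configuration; Quay would do the actual enforcement through its tag expiration feature. The override entry below is a made-up example, not a proposal.

```python
from datetime import datetime, timedelta, timezone

# Default lifetime for every image: roughly 2 years, as suggested above.
DEFAULT_LIFETIME = timedelta(days=730)

# Hypothetical per-image overrides, purely for illustration.
LIFETIME_OVERRIDES = {
    "fedora-toolbox": timedelta(days=90),
}


def expiry_for(repository, pushed_at,
               overrides=LIFETIME_OVERRIDES, default=DEFAULT_LIFETIME):
    """Return when an image pushed at `pushed_at` would expire."""
    return pushed_at + overrides.get(repository, default)


pushed = datetime(2023, 9, 11, tzinfo=timezone.utc)
print(expiry_for("fedora", pushed))          # falls back to the default
print(expiry_for("fedora-toolbox", pushed))  # uses the override
```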

Author
Owner

Just to go over what we have/produce... each day we build, for each of rawhide, f39, f38-updates, f37-updates, f39-updates-testing, f38-updates-testing and f37-updates-testing:

  • a fedora-base image
  • a fedora-base-minimal image
  • a fedora-toolbox image

For rawhide and 39 we also do the 4 ostree desktop variants, right?

Are all of those useful all the time? Can they be good for rolling back/bisecting problems?

It's unclear to me how much space things are using (is there any way to get that sort of information from a registry?) or how often people want to use older images for whatever reason.
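On the space question: image manifests (Docker v2 schema 2 and OCI) list a `size` for the config blob and for each layer, so one way to estimate usage is to fetch the manifests through the registry API (`GET /v2/<name>/manifests/<reference>`) and add up the sizes. The sketch below assumes the manifests have already been fetched and parsed into dicts; the function name is made up. Blobs are counted once per digest, since layers shared between images within a repository are stored only once. Manifest lists/indexes have no `layers` key and would need to be resolved to their per-arch manifests first.

```python
def unique_blob_bytes(manifests):
    """Sum the size of every distinct blob referenced by the manifests.

    manifests -- iterable of parsed image manifests (dicts) with the
                 usual "config" and "layers" entries
    """
    sizes = {}
    for manifest in manifests:
        blobs = list(manifest.get("layers", []))
        if "config" in manifest:
            blobs.append(manifest["config"])
        for blob in blobs:
            # Keyed by digest so a shared layer is only counted once.
            sizes[blob["digest"]] = blob["size"]
    return sum(sizes.values())


# Two toy manifests that share one layer (digests shortened for readability).
base = {"config": {"digest": "sha256:c1", "size": 10},
        "layers": [{"digest": "sha256:l1", "size": 1000}]}
toolbox = {"config": {"digest": "sha256:c2", "size": 20},
           "layers": [{"digest": "sha256:l1", "size": 1000},
                      {"digest": "sha256:l2", "size": 500}]}
print(unique_blob_bytes([base, toolbox]))
```

This only measures what is still referenced by a manifest. Blobs left behind by deleted tags would not show up here, so the number on disk can be larger.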

I would think it might be useful, for EOL releases, to keep the initial image from release time and the last updated one, for historical reasons?
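A small sketch of that "initial plus last" rule for a single EOL release, assuming we know each image's build date and the release date. The function name and the input shape are made up for illustration. "Initial" is taken here to mean the newest image built on or before the release date, which is one possible reading.

```python
from datetime import date


def historical_keepers(images, release_date):
    """Return the digests to keep for one EOL release.

    images       -- iterable of (build_date, digest) pairs
    release_date -- the date the release went GA
    """
    ordered = sorted(images)
    if not ordered:
        return set()
    # Always keep the last updated image.
    keep = {ordered[-1][1]}
    # Keep the newest image that existed at release time, if there is one.
    at_release = [digest for built, digest in ordered if built <= release_date]
    if at_release:
        keep.add(at_release[-1])
    return keep


images = [
    (date(2023, 4, 10), "sha256:aaa"),
    (date(2023, 4, 17), "sha256:bbb"),
    (date(2023, 9, 1), "sha256:ccc"),
    (date(2024, 5, 20), "sha256:ddd"),
]
print(sorted(historical_keepers(images, date(2023, 4, 18))))
```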

Owner

[backlog refinement]
The discussion for this ticket is still open.

Owner

With most of the container artefacts being published to quay.io, this is not such a pressing issue anymore.

Once flatpaks move as well, this issue will no longer be relevant. Closing this.

Owner

Metadata Update from @zlopez:

  • Issue close_status updated to: Will Not/Can Not fix
  • Issue status updated to: Closed (was: Open)
Author
Owner

So... we still aren't fully moved and the volume has grown to... 15TB.

Author
Owner

Metadata Update from @kevin:

  • Issue status updated to: Open (was: Closed)

Time-based pruning won't work for flatpaks because some apps have no non-runtime dependencies and the app itself doesn't get updates for a whole stable release. I'm pretty sure we need to look at what's tagged stable or f42 etc. There could be other intricacies wrt the flatpak manifest of which I am not aware.

Member

We talked about this during today's infra weekly Matrix meeting and it's still an issue.
There's a chance this will be made moot by switching to quay.io (another ongoing ticket [1]), but we may still need a solution there too.
One approach could be to list all the images that exist and then exclude from pruning any that were ever tagged in a stable release.

[1] https://forge.fedoraproject.org/infra/tickets/issues/11543
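A sketch of that approach, keyed by image digest. It assumes we can get, from somewhere, a record of which tags have ever pointed at each digest; the registry API only reports current tags, so that history is an assumption here, as are the function name and the example tag names.

```python
def prunable_digests(all_digests, tag_history, protected_tags):
    """Return digests that were never given any protected tag.

    all_digests    -- every image digest found in the registry
    tag_history    -- mapping of digest -> set of tags that ever pointed
                      at it (hypothetical data source)
    protected_tags -- tags that mark an image as stable, e.g. "stable"
    """
    protected = set(protected_tags)
    return sorted(
        digest for digest in all_digests
        # An untagged or unknown digest has an empty history and is prunable.
        if not (tag_history.get(digest, set()) & protected)
    )


history = {
    "sha256:aaa": {"f42", "latest"},
    "sha256:bbb": {"testing"},
    "sha256:ccc": {"stable"},
}
digests = ["sha256:aaa", "sha256:bbb", "sha256:ccc", "sha256:ddd"]
print(prunable_digests(digests, history, {"stable", "f42"}))
```

Unlike the time-based rule, this never touches an app that was tagged stable once and then not rebuilt for a whole release, which is the flatpak case raised above.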

Member

This is definitely still a problem, and we might need a solution: we're at 27TB now. @zlopez will the quay.io migration help?

Owner

@gwmngilfen wrote in #11512 (comment):

This is definitely still a problem, and we might need a solution: we're at 27TB now. @zlopez will the quay.io migration help?

It depends on how long we want to keep uploading to both the registry and quay.io. Right now we are uploading to both, and I still need to find time to write the automation script for making the repositories public.
