container registry pruning #11512
Our candidate and production container registries are currently growing without much bound.
See:
https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=oci-registry01.iad2.fedoraproject.org;plugin=df;plugin_instance=srv-registry;type=df_complex;begin=-9022400
and
https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=oci-candidate-registry01.iad2.fedoraproject.org;plugin=df;plugin_instance=srv-registry;type=df_complex;begin=-11622400
We have a manual script in playbooks/manual/oci-registry-prune.yml, but I'm not convinced it understands the new ostree containers and flatpak containers. It seems to operate on tags and deletes everything older than 30 days, which likely isn't what we want anyhow.
So, we need to:
Come up with a retention plan. What do we keep and for how long?
Implement something that enforces that. cron job or whatever, just not anything manual. ;)
Note that even if we move to quay.io, we likely want to prune some things anyhow, not keep everything forever.
Any input welcome!
CC: @cverna @otaylor @dustymabe @siosm
The current playbook we have is for the candidate registry. We had the discussion in the container SIG (https://pagure.io/ContainerSIG/container-sig/issue/33) and thought that 1 month was good enough there.
I think that the ansible module should work for both ostree and flatpak containers since it is using the registry APIs and nothing specific to the actual container image itself. But folks could take a look at it here https://pagure.io/fedora-infra/ansible/blob/main/f/library/delete_old_oci_images.py
For the production registry, I think we should look at whether a Fedora version is EOL or not. Something like: we keep container images for up to 2 EOL versions (for example, right now we would have 35, 36, 37, 38, 39, 40), and when 37 goes EOL we would prune the images based on F35.
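To make the "keep 2 EOL versions" idea concrete, here is a minimal sketch in Python. The function name and the `keep_eol` parameter are made up for illustration. It only decides which release numbers fall out of the retention window and does not talk to the registry.

```python
def releases_to_prune(eol_releases, keep_eol=2):
    """Return the EOL release numbers whose images would be pruned.

    eol_releases: iterable of release numbers that are already EOL.
    keep_eol: how many of the most recent EOL releases to keep (assumed policy).
    """
    ordered = sorted(set(eol_releases))
    if keep_eol <= 0:
        # Nothing is retained, so every EOL release is a prune candidate.
        return ordered
    # Drop the newest `keep_eol` entries; whatever is left is older and prunable.
    return ordered[:-keep_eol]


if __name__ == "__main__":
    # Today: 35 and 36 are EOL, both still inside the window.
    print(releases_to_prune([35, 36]))
    # Once 37 goes EOL, 35 falls out of the window.
    print(releases_to_prune([35, 36, 37]))
```

Non-EOL releases never appear in the input, so they are never candidates.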
Metadata Update from @phsmoura:
Please keep at least the latest base image for every Fedora release. All other container images for EOL'ed releases can be pruned.
For Quay.io, we should enforce a default lifetime for all images: https://docs.projectquay.io/use_quay.html#tag-expiration
We could set this to 2 years for example for all images and override it on a per image basis.
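As a rough model of that policy (a 2-year default with per-image overrides), something like the following could be used to reason about when a tag would expire. This is only a sketch of the policy logic, not Quay's API; `DEFAULT_LIFETIME`, `tag_expiry`, and the repository names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed default: roughly 2 years, as suggested above.
DEFAULT_LIFETIME = timedelta(days=2 * 365)


def tag_expiry(repo, pushed_at, overrides=None):
    """Return when a tag pushed at `pushed_at` would expire, or None for 'never'.

    overrides maps a repository name to a timedelta, or to None to keep forever.
    """
    overrides = overrides or {}
    lifetime = overrides.get(repo, DEFAULT_LIFETIME)
    if lifetime is None:
        return None
    return pushed_at + lifetime


if __name__ == "__main__":
    pushed = datetime(2024, 1, 1, tzinfo=timezone.utc)
    policy = {"fedora/fedora": None, "fedora/scratch": timedelta(days=30)}
    for name in ("fedora/fedora", "fedora/scratch", "fedora/fedora-toolbox"):
        print(name, tag_expiry(name, pushed, policy))
```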
Just to go over what we have/produce... each day we:
for each release (rawhide, f39, f38-updates, f37-updates, f39-updates-testing, f38-updates-testing, f37-updates-testing):
a fedora-base image
a fedora-base-minimal image
a fedora-toolbox image
for rawhide and 39 we also do the 4 ostree desktop variants right?
Are all of those useful all the time? Can they be good for rolling back/bisecting problems?
It's unclear to me how much space things are using (is there any way to get that sort of information from a registry?) or how often people want to use older images for whatever reason.
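On the size question: as far as I know, an image manifest fetched through the registry HTTP API (`GET /v2/<name>/manifests/<reference>`) lists the config blob and each layer with its compressed size, so usage can be added up on the client side. A rough sketch, assuming the manifests have already been fetched and parsed into dicts. The function names are hypothetical, and manifest lists/indexes would first need to be resolved to per-architecture manifests.

```python
def _blobs(manifest):
    """Config blob plus layer blobs of one parsed image manifest."""
    return [manifest.get("config", {})] + list(manifest.get("layers", []))


def manifest_bytes(manifest):
    """Compressed size of one image: config blob plus all layers."""
    return sum(blob.get("size", 0) for blob in _blobs(manifest))


def unique_blob_bytes(manifests):
    """Size of a set of images with shared blobs counted only once.

    Registries store blobs by digest, so layers shared between images
    should not be counted twice when estimating disk usage.
    """
    sizes = {}
    for manifest in manifests:
        for blob in _blobs(manifest):
            digest = blob.get("digest")
            if digest is not None:
                sizes[digest] = blob.get("size", 0)
    return sum(sizes.values())
```

The deduplicated number is the one that matters for disk usage, since two daily builds that share most layers cost far less than the sum of their individual sizes.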
I would think it might be useful, for EOL releases, to keep the initial image from release and the last updated one for historical reasons?
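A small sketch of that "first and last" rule, assuming we have a list of (identifier, build date) pairs for one EOL release and that the earliest entry is the image shipped at release. The function name and the sample identifiers are made up for illustration.

```python
from datetime import date


def keep_first_and_last(builds):
    """Return the identifiers to keep for one EOL release.

    builds: list of (identifier, build_date) tuples.
    Keeps the earliest and the most recent build; everything else is prunable.
    """
    if not builds:
        return set()
    ordered = sorted(builds, key=lambda build: build[1])
    return {ordered[0][0], ordered[-1][0]}


if __name__ == "__main__":
    f37 = [
        ("f37-ga", date(2022, 11, 15)),
        ("f37-mid", date(2023, 6, 1)),
        ("f37-final", date(2023, 12, 5)),
    ]
    print(sorted(keep_first_and_last(f37)))
```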
[backlog refinement]
The discussion for this ticket is still open.
With most of the container artefacts being published to quay.io, this is not that pressing an issue anymore.
Once flatpaks move as well, this issue will no longer be relevant. Closing this.
Metadata Update from @zlopez:
So... we still aren't fully moved and the volume has grown to... 15TB.
Metadata Update from @kevin:
Time-based pruning won't work for flatpaks because some apps have no non-runtime dependencies and the app itself doesn't get updates for a whole stable release. I'm pretty sure we need to look at what's tagged stable or f42 etc. There could be other intricacies wrt the flatpak manifest of which I am not aware.
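One way to express that is an age cutoff that is skipped for anything carrying a protected tag. This is only a sketch under assumptions: the image records (`digest`, `created`, `tags`) are a made-up shape, the protected tag names are examples, and it ignores any flatpak manifest intricacies.

```python
from datetime import datetime, timedelta, timezone


def prunable(images, now, max_age, protected_tags):
    """Return digests that are older than max_age and carry no protected tag.

    images: list of dicts with 'digest' (str), 'created' (datetime), 'tags' (set of str).
    protected_tags: set of tag names that exempt an image from pruning.
    """
    result = []
    for image in images:
        if image["tags"] & protected_tags:
            # Still referenced by a stable/release tag: keep regardless of age.
            continue
        if now - image["created"] > max_age:
            result.append(image["digest"])
    return result


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    images = [
        {"digest": "sha256:old-app", "created": now - timedelta(days=300), "tags": {"stable", "f42"}},
        {"digest": "sha256:old-test", "created": now - timedelta(days=300), "tags": {"testing"}},
        {"digest": "sha256:new-test", "created": now - timedelta(days=3), "tags": {"testing"}},
    ]
    print(prunable(images, now, timedelta(days=30), {"stable", "f42"}))
```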
We talked about this during today's infra weekly Matrix meeting and it's still an issue.
There's a chance this will be made moot by switching to quay.io (another ongoing ticket [1]) but we still may need a solution there too.
One approach could be to get all of them that exist and then exclude any that are ever tagged in stable releases.
[1] #11543
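The "get everything, then exclude anything ever tagged stable" approach boils down to a set difference. A minimal sketch, assuming we can build a history of which digests each stable tag has pointed to; where that history would come from (registry, Bodhi, or elsewhere) is left open, and the names here are hypothetical.

```python
def prune_candidates(all_digests, stable_tag_history):
    """Return digests that were never referenced by any stable tag.

    all_digests: iterable of every manifest digest present in the registry.
    stable_tag_history: dict mapping a stable tag name to every digest
        that tag has ever pointed to.
    """
    protected = set()
    for digests in stable_tag_history.values():
        protected.update(digests)
    return set(all_digests) - protected


if __name__ == "__main__":
    everything = ["sha256:a", "sha256:b", "sha256:c", "sha256:d"]
    history = {"stable": ["sha256:a", "sha256:c"], "f42": ["sha256:c"]}
    print(sorted(prune_candidates(everything, history)))
```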
This is definitely still a problem and we might need a solution: we're at 27TB now. @zlopez will the quay.io migration help?
@gwmngilfen wrote in #11512 (comment):
It depends on how long we want to keep uploading to both the registry and quay.io. Right now we are uploading to both, and I still need to find time to write the automation script for making the repositories public.