Add compose action runner for the Atomic Desktops organization #364
Depends on
#359 Enable infrastructure for aarch64 runners
This is an issue to request a "dedicated"/"beefier" runner for CI for the Atomic Desktops repo.
For the Atomic Desktops CI, we would like to be able to do full composes (build of the images) on PRs to validate changes.
For now, we'll limit the checks to manifest validation and dependency resolution.
But like we had on pagure.io with the Zuul CI, we would like to be able to compose at least the smallest "base" image. Ideally we would compose all our supported images.
To avoid hogging the shared runners for everyone, it would be great if we could have a dedicated runner instance.
See initial discussion in: #356
Thanks!
We'll also need this runner to let us create containers that have more privileges, so it is all the more important that it's a distinct one.
I initially thought that we would be able to do dependency resolutions with those privileges but rpm-ostree wants the full thing:
github.com/coreos/rpm-ostree@953c4e8d05
As pagure.io is now expected to be decommissioned around mid-June (https://discussion.fedoraproject.org/t/decommissioning-of-pagure-io-anticipated-by-flock-2026/181997), please help us move forward here so that we can migrate the Atomic Desktops source repo to the new forge. Thanks
Hi @siosm,
This reply got lost in my browser tabs on multiple occasions. We are discussing with the team how to implement your request. All of our runners are unprivileged containers.
So we were talking about having a dedicated VM for each architecture for your organization with privileged containers. The work is going to land in the next forge sprint.
Thanks a lot! Note that we only need x86_64 & aarch64 for the Atomic Desktops.
And we can also start with x86_64 only for the migration, that covers 99% of our users.
An aarch64 runner has been added for experimenting on Forge Staging, in the playground org:
https://forge.stg.fedoraproject.org/playground
It's an unprivileged runner.
Would it be ok for you to test on staging first whether this setup would work for your needs?
I added you to the staging playground organization. You might need to log out and log in again to forge-staging for it to take effect. You can create a repo there (or use any of those already created).
The runner has a label aarch64-1.
In case you'd prefer a differently named org on staging - atomic perhaps - let me know. Or if the staging environment is not suitable and you'd prefer to go directly to production.
About the privileged runners: do you need both architectures privileged as well?
I've done some testing on the aarch64-1 unprivileged runner in the playground org, but just like with the fedora-* runners, I was unable to build a container image.
We would like to request two organizations in the staging environment - bootc and atomic - with privileged runners. I'm not sure if a dedicated VM for each architecture with privileged runner is possible in the staging environment. If not, we would like to test just privileged runners for both organizations and architectures. What is the proper way to request these organizations? Should we open a ticket? Thanks!
cc @siosm
We'll discuss the option of the privileged runners in the team and let you know soon.
I opened a ticket for the staging orgs: #546
Ok, there is a privileged runner on Forge Staging Playground org available to test, label aarch64-priv.
Please let me know if it works @hricky @siosm
I did some tests and it looks good so far, but it seems that the allocated disk space for the runner is only 5G. See:
https://forge.stg.fedoraproject.org/playground/fedora-runner-experimental/actions/runs/146/jobs/0/attempt/1
https://forge.stg.fedoraproject.org/playground/fedora-runner-experimental/actions/runs/169/jobs/0/attempt/1
Can it be increased? Thanks!
Yes
And yes, we need much more disk space, at least 10GB if I remember correctly but maybe even more.
I'll get a larger VM in that case. I will let you know about the progress, but since I have to find a person to spin up those VMs for me, it might not be available until next week.
It looks like Zuul CI is not working anymore on pagure.io (example: https://pagure.io/workstation-ostree-config/pull-request/756, I don't know if it's intentional or a bug). Thus this is becoming more pressing now as we need to get the CI up and running again here.
workstation-ostree-config to forge.fedoraproject.org #13222
@siosm wrote in #364 (comment):
Oh, was not made aware of these roadblocks, I only considered
Could help with trying to spin up a testing-farm job for those and run on the smaller workers. Can still prioritize and rush that if it still helps right now.
Sure, I don't know what would be needed / what would a testing farm job do but I will take any help I can get to have CI here. Thanks
It's basically just a different CI runner from Forgejo Actions, with the runners managed by testing-farm instead of fedora-infra. It does require a tmt setup, but that can be configured such that it is a thin wrapper around whatever workflow you prefer; I see that in this case it is just. I will make a PR for that repo with a fork on the ci organization to show it in action.
PR up, and failing, but will figure out the issues there: atomic-desktops/config#757
There is a new arm64 VM with 20GB of storage and I just enabled two aarch64 runners there, privileged and unprivileged, available from the playground org (the atomic-desktops and bootc orgs will be available soon on staging I suppose and when they are, I will move those runners under those orgs).
I kept the labels the same, so no need to change anything when you test it: unprivileged: aarch64-1 and privileged: aarch64-priv.
Please let me know if it works!
@siosm
Here is the link to the playground org: https://forge.stg.fedoraproject.org/playground
I see you're a member there, so you should be able to create or use any repos there and run a test action on those runners.
When you confirm the runners are suitable for you, I will get them for you in production Forge.
Ok, we have atomic-desktops and bootc orgs on Forge Staging, so I created an aarch64 privileged runner for those two orgs.
Please test them and let me know if they are sufficient for your purposes!
The label for both is aarch64-privileged.
We tested the aarch64-privileged runner with the Fedora Atomic Desktops config repo. All validations and verifications are successful. The base image compose is also successful. For the current setup of this particular repo, the aarch64-privileged runner should be sufficient:
https://forge.stg.fedoraproject.org/atomic-desktops/config-ci-test/actions/runs/20/jobs/0/attempt/1
We will next try to compose all supported images. We will also continue to test the aarch64-privileged runner with other repos in both orgs.
Is it possible to create a similar runner for the x86_64 arch in both orgs? Thanks!
As @hricky said, this setup works for us, so I think we should make the runners available in the main Forgejo instance.
Excellent! I'll make the runners for both architectures available on prod under. To summarize, that would be:
atomic-desktops org:
Do you want to keep the unprivileged runner for running repo events and such?
Also, I cannot see any bootc org on production. Do you want to create one?
Yes
Yes, please, that will be useful to run smaller jobs in parallel and not block the rest of Fedora on our jobs (notably if we end up using testing-farm/tmt for testing as this locks a runner for the entire duration of the job: https://forge.fedoraproject.org/ci/testing-farm).
I think we should do that independently and in a separate ticket. Discussion was in progress in https://gitlab.com/fedora/bootc/tracker/-/work_items/75.
@lenkaseg Re the blocking resources for the rest in ☝️, I have a suggestion in #568, would appreciate some thoughts on it
We tested building all images sequentially in one job, but disk and system resources of the aarch64-privileged runner seem to become exhausted. Here is the size of the ./cache directory after building the images locally:
We are already cleaning between builds with a custom action, but the issue is probably that we are hitting the disk size limit during the build process itself. With 17 GB total disk space (13 GB usable) and 6 images building sequentially, we likely run out of resources:
See: https://forge.stg.fedoraproject.org/atomic-desktops/config-ci-test/actions/runs/61/jobs/0/attempt/2.
We switched the workflow to only run a single compose per job sequentially (with cleanup in between), but we still get the following error on some composes:
See: https://forge.stg.fedoraproject.org/atomic-desktops/config-ci-test/actions/runs/65/jobs/1/attempt/1.
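A pre-flight check along the following lines might help tell apart composes that can never fit on the runner from ones that fail for other reasons. This is only a minimal sketch: the image names and peak sizes are hypothetical placeholders rather than measured values, the 1 GiB reserve is an arbitrary assumption, and it assumes the cleanup between builds returns the disk to its starting free space.

```python
import shutil

GIB = 1024 ** 3


def fits_on_disk(free_bytes, peak_bytes, reserve_bytes=GIB):
    """Return True if a build with the given peak usage fits while keeping a reserve free."""
    return free_bytes - reserve_bytes >= peak_bytes


def plan_composes(free_bytes, peak_by_image, reserve_bytes=GIB):
    """Split images into those that fit on a freshly cleaned disk and those that do not.

    Assumes cleanup between builds restores the disk to `free_bytes`.
    """
    runnable, too_large = [], []
    for name, peak in peak_by_image.items():
        if fits_on_disk(free_bytes, peak, reserve_bytes):
            runnable.append(name)
        else:
            too_large.append(name)
    return runnable, too_large


if __name__ == "__main__":
    # Free space on the filesystem holding the current working directory.
    free = shutil.disk_usage(".").free
    # Hypothetical peak sizes, for illustration only (not taken from real composes).
    peaks = {"base": 8 * GIB, "desktop": 14 * GIB}
    runnable, too_large = plan_composes(free, peaks)
    print("fits:", runnable)
    print("needs a larger runner:", too_large)
```

Running such a check as the first step of each compose job would fail fast with a clear message instead of an error partway through the build.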
Can you give me an estimate of storage requirements? Would 30 GB be sufficient? Or 50?
We'll need up to 100 GB. We realize that's a large amount of disk space. We plan to build image artifacts, and insufficient space may require cleaning between builds, as we're doing currently. Thanks!
Now we are getting into resource requirements outside the scope of Forge. If you require full composes for CI, we need to find a way to use the testing-farm as @lecris is proposing.
We are giving this a try in atomic-desktops/config#757 and the experience is quite bad. We would need #568 for the testing-farm support to make sense as we could do the builds in parallel and not block current forge runners.
What are your limits right now? Can we make this work for now and we'll migrate to something else once this is ready? We also have work in progress to get Konflux to do the composes on PRs so that could be another option in the future but we really need something that works today in the meantime.
If we reduce the scope to only the base atomic image composes (what we were doing on Zuul CI before) we can make do with the runner as is. It's not great but that will cover the basics until we have something more powerful.
@siosm wrote in #364 (comment):
This was put on this team's sprint with high priority.
The current storage limit is set to 20G, and the capacity was set to the default of 1 for all runners. That will change in a moment, as @lenkaseg is working on it. But the privileged runner in question is running on a very small VM in AWS, so the capacity will be 2.
For the unprivileged runners, the only constraint is the cluster HW that we are running on.
Ok, the aarch64-priv in the atomic-desktops org on staging has capacity set to 2 concurrent jobs. More capacity probably does not make sense. The VM has 4 cores, 15 GB RAM and 13GB of free space.
We tested the runner with smaller images, for which the 13GB of usable space is sufficient. The runner with the new setup with 2 concurrent jobs is approximately 27% faster. We tested it by increasing the zstd compression level to the max. However, I am not entirely sure it matters, but I think it is faster anyway.
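To reason about what a runner capacity of 2 can buy in the best case, a rough model could look like the sketch below. It is an idealized assumption, not how the runner actually schedules work: each job is handed to whichever slot frees up first, and jobs are assumed not to slow each other down. On a VM where concurrent jobs share 4 cores and one disk, the real gain would be expected to be smaller than this upper bound. The durations in the example are made up.

```python
import heapq


def makespan(durations, capacity):
    """Wall-clock time to run jobs in order on `capacity` slots (idealized, no contention)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # Each entry is the time at which a slot becomes free.
    slots = [0.0] * capacity
    heapq.heapify(slots)
    for duration in durations:
        start = heapq.heappop(slots)  # earliest free slot
        heapq.heappush(slots, start + duration)
    return max(slots)


def percent_faster(before, after):
    """Reduction in wall-clock time, as a percentage of the original time."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical job durations in minutes, for illustration only.
    jobs = [30, 10, 10, 10]
    serial = makespan(jobs, capacity=1)
    parallel = makespan(jobs, capacity=2)
    print(serial, parallel, percent_faster(serial, parallel))
```

Comparing the modeled upper bound with the measured improvement gives a hint of how much the concurrent jobs contend for CPU and disk.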