docker-slim type of forgejo runner #568

Open
opened 2026-05-11 14:33:56 +00:00 by lecris · 37 comments

Summary

Provide a lightweight type of Forgejo runner.

Details

The main goal here is to have a very minimal runner type with almost no dependencies (only Docker or Node.js), for workflows that just wrap around testing-farm (https://forge.fedoraproject.org/ci/testing-farm) and equivalent actions, which consume little CPU and only make occasional curl requests.
Author

One particular issue I noticed is that right now the workflows do not parallelize well. For example, in this run (https://forge.fedoraproject.org/ci/_fedora-kiwi-descriptions/actions/runs/1/jobs/0/attempt/1) using `docker`, the jobs were basically sequential.
ryanlerch added this to the Backlog project 2026-05-14 05:44:31 +00:00

That sounds like a very good idea and would make using the testing-farm/tmt action significantly more interesting.

Member

How small would the ideal image be?
`node:20-alpine` (~180 MB), or rather `alpine:latest` (~7 MB) without node, curl or docker (those would be added in the workflow)?
Author

The minimum would be `node`, because that is needed for Forgejo actions right now. Docker on top would also be good, but if it complicates deployment and sharing, I would rather have it `node`-only.

> One particular usage I noticed is that right now the workflows do not parallelize well, e.g. in this run (https://forge.fedoraproject.org/ci/_fedora-kiwi-descriptions/actions/runs/1/jobs/0/attempt/1) using `docker` the jobs were basically sequential

Indeed, we should set a high limit on parallel jobs for this runner (maybe with small CPU and memory limits), so that we can run a lot of testing-farm jobs in parallel.
Member

OK, there is a `node:22-alpine` minimal image under the label `docker-slim` available for your org.
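As an illustration of how a workflow would use this, a job selects the runner by its label in `runs-on`. This is a minimal sketch, assuming the `docker-slim` label maps to `node:22-alpine` as described above. Because that image ships without curl (extra tools were to be added in the workflow, as discussed earlier), the job installs it with `apk` first. The file name and the URL are placeholders.

```yaml
# Hypothetical file: .forgejo/workflows/lightweight.yml
name: lightweight-job
on: [push]
jobs:
  ping:
    # Only runners registered with the docker-slim label pick this job up
    runs-on: docker-slim
    defaults:
      run:
        # Alpine has no bash, so run steps with the busybox sh
        shell: sh
    steps:
      # node:22-alpine does not include curl; add it in the job itself
      - run: apk add --no-cache curl
      # Placeholder endpoint standing in for an occasional API call
      - run: curl -fsS https://example.org/
```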
Author

Can you also enable it for `ci`, so that I can experiment with it too?
humaton modified the project from Backlog to Sprint 21 2026-05-21 10:12:52 +00:00
Member

`docker-slim` is available for the `ci` org as well now.
Author

@lenkaseg wrote in https://forge.fedoraproject.org/forge/forge/issues/568#issuecomment-721854:

> `docker-slim` is there for `ci` org as well now

This doesn't seem to do what the main proposal here is aiming for: the jobs are still stuck waiting for a runner in https://forge.fedoraproject.org/ci/_fedora-kiwi-descriptions/actions/runs/2/jobs/0/attempt/1. It seems there is no coordination that would let these jobs run in a more "oversubscribed" way.

(It has been 1h and the job has not started.)
Member

Yeah, I noticed when testing on staging that the runner definition was misformatted.

Member

Ok, I fixed the multiple labels problem for staging, but there's something wrong with the runner configs on production. Unfortunately I don't have access to the place where the runners are defined and we're in the evening before a PTO.
I decided to create another runner for you manually, with the fixed labels, so you're not blocked on testing the composes. I see that it picks jobs correctly. Since it is a manual thing, it will be removed next time someone runs the runner automation, but hopefully you can do some testing before that happens.

Thanks. This appears to be working now in https://forge.fedoraproject.org/atomic-desktops/config/pulls/757 (runs in https://forge.fedoraproject.org/ci/_atomic-desktops-config/actions).

OK, this no longer works, as I have a stalled job: https://forge.fedoraproject.org/atomic-desktops/config/actions/runs/78/jobs/0/attempt/1
Member

I know, we did some maintenance and the labels got messed up again. I have a fix, but need someone to merge it: https://forge.fedoraproject.org/infra/ansible/pulls/3377
Member

Ok, fixed.

Member

Is there something more to be done on this issue?

Author

@lenkaseg wrote in https://forge.fedoraproject.org/forge/forge/issues/568#issuecomment-777452:

> Is there something more to be done on this issue?

Yes: adding more nodes, or oversubscription for these labels. Jobs can end up in quite a long queue waiting for a node.
Member

Currently you have two nodes for 'docker-slim' under the 'ci' org.
A pull request is under way to give you a capacity of 20 concurrent jobs on the ci-1 runner: https://forge.fedoraproject.org/infra/ansible/pulls/3381/files#diff-1a074b609f6bfd901327dae7b88d1ac5c6c894cc
Is your request for the ci org only, or for atomic-desktops as well?
Author

I was hoping it could be more generic, for other users who turn up with a similar use case. But it can also be handled case by case as they request it. Could you maybe make an issue that I can link in https://forge.fedoraproject.org/ci/testing-farm/, so that it is easier to request similar machines when the time comes?

But yeah, it should be enabled for both orgs, for now at least.
Member

Both orgs have the docker-slim option, and the capacity increase will happen when the PR is merged.

By 'make an issue', do you mean documenting the runner options/flavours we offer, e.g. in the documentation or in a template?
We have an issue template for requesting runners: https://forge.fedoraproject.org/forge/forge/issues/new?template=.forgejo%2fissue_template%2fnew_runner.yml
Author

@lenkaseg wrote in https://forge.fedoraproject.org/forge/forge/issues/568#issuecomment-789897:

> By 'make an issue' you mean document the runner options/flavours we offer? Like in a documentation or in a template?
> We have an issue template for requesting runners: https://forge.fedoraproject.org/forge/forge/issues/new?template=.forgejo%2fissue_template%2fnew_runner.yml

The template approach is good too, e.g. adding a new type under `Resource Requirements`, a concurrency option, or similar. Any approach that I can document in the repo there works.

> Both of the orgs have docker-slim option and the capacity increase will happen when the PR is merged.

Cool, I will try to do some stress-testing of it. Do you have an idea of what to expect if the concurrency is too high or the action is not optimized well enough?
humaton modified the project from Sprint 21 to Sprint 22 2026-06-01 10:17:41 +00:00

I don't know where the issue is, but the `docker-slim` jobs are still throttled in the atomic-desktops repo (https://forge.fedoraproject.org/atomic-desktops/config/actions), i.e. I cannot get multiple parallel runs for different PRs.

I think there is some confusion about what we would like to do here.

We would like to have a runner with the "docker-slim" label (or maybe "testing-farm" would be more explicit) that is available to all orgs by default and that has a very high concurrency setting.

Reading https://code.forgejo.org/forgejo/runner/src/branch/main/internal/pkg/config/config.example.yaml, I'm afraid that we cannot set resource restrictions on jobs (compare the GitLab runner: https://docs.gitlab.com/runner/configuration/advanced-configuration/#the-runnersdocker-section), which means it would unfortunately not be a good idea to make such a runner global.

From https://forge.fedoraproject.org/infra/ansible/src/branch/main/roles/openshift-apps/forgejo/runners/production/atomic-desktops-1.yml, I see that a single runner has both labels set, whereas these should be two distinct runners.

So what I think we can do instead:

  • Set up a distinct "testing-farm" runner per org that needs it, with a high capacity limit (20 sounds OK).
  • As we cannot set limits, we need to make it clear to the org members that this runner should be used for testing-farm jobs only.
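A sketch of what such a per-org runner could look like in a forgejo-runner config file, assuming the key names from the upstream `config.example.yaml` linked above (`runner.capacity`, `runner.labels`, `container.privileged`). The point is that the `testing-farm` label lives on its own runner, so it is never shared with a regular runner that has a different capacity. How the Fedora ansible role actually templates this is not shown here.

```yaml
# Config for a dedicated testing-farm runner (one per org that needs it)
runner:
  # Jobs mostly wait on Testing Farm, so allow many at once
  capacity: 20
  labels:
    # label:docker://image -> jobs with runs-on: testing-farm run in this image
    - "testing-farm:docker://node:22-alpine"
container:
  # Job containers are not started in privileged mode
  privileged: false
```

The regular runner would keep its own config file listing only its own label (e.g. `docker`), instead of one runner carrying both labels.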

I've opened https://forge.fedoraproject.org/infra/ansible/pulls/3385
Author

I've tried it on https://forge.fedoraproject.org/ci/_fedora-kiwi-descriptions/actions/runs/2/jobs/0/attempt/3, but I only got a concurrency of 3 there, and there are no other runners running in the whole org. It seems that `capacity` is not being picked up at all.
Member

I see one problem, and that is the shared labels. I will change the labels of the ci-2 runner so that it is not random which runner picks up a job, since they have different configs.
ci-1 is currently at capacity: 1 and ci-2 at capacity: 2, so 3 parallel jobs is exactly what is expected.

I will fix the labels. For the concurrency, there will be delays, as we have to perform heavy load tests before we allow a capacity of more than 4, probably. Sorry for that, we're still in the process of establishing the runner policy. I'll keep you updated about the progress and decisions.
Member

@siosm wrote in https://forge.fedoraproject.org/forge/forge/issues/568#issuecomment-794523:

> I don't know where the issue is but the `docker-slim` jobs are still throttled in the atomic-desktops repo (https://forge.fedoraproject.org/atomic-desktops/config/actions), i.e. I can not get multiple parallel runs for different PRs.

Yes, that is correct. The capacity of 20 has not been applied yet, so for the moment it stays at 1. Sorry for that. I'll get back to you once I have more info.
Member

@siosm wrote in https://forge.fedoraproject.org/forge/forge/issues/568#issuecomment-794608:

> So what I think we can do instead:
>
> * Set up a distinct "testing-farm" runner per org that needs it with a high capacity limit (20 sounds OK).
> * As we can not set limits, we need to make it clear to the org members that this runner should be used for testing-farm jobs only.

That sounds pretty reasonable to me.
About the limits, I think there's a way of specifying CPU and memory use per runner: https://code.forgejo.org/forgejo/runner/issues/551. Let's see if it works.
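If the approach from forgejo/runner#551 pans out, one plausible shape is to pass Docker resource flags through `container.options`, which the runner applies when it starts job containers. This is an untested sketch: whether `--cpus` and `--memory` are honoured there depends on the runner version, and the limit values are hypothetical.

```yaml
container:
  privileged: false
  # Extra docker-run style flags for every job container this runner starts.
  # Hypothetical per-job caps: half a CPU and 512 MiB of RAM.
  options: "--cpus=0.5 --memory=512m"
```

If honoured, these caps with a capacity of 20 would bound the whole runner at roughly 20 × 0.5 = 10 CPUs and 20 × 512 MiB = 10 GiB of memory, which is what would make a high concurrency setting safer to offer.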
Member

The `atomic-desktops` and `ci` orgs now have a new runner available, designed for the testing-farm jobs, with the following specs:

  • label: `testing-farm`
  • unprivileged
  • default image: node:22-alpine
  • concurrency: 20 jobs
  • org scoped

The current policy is to provide no global runners, but there will be an option in the runner issue template to request the testing-farm runner.

Also, the `docker-slim` label was dropped from the `atomic-desktops-1` runner. @lecris I suppose you don't need the `docker-slim` label on a regular runner either now, since there's a dedicated testing-farm runner.

Note: regular runners (labels `docker`, `docker-slim`) have been bumped to 4 concurrent jobs.

Please let me know if this works.
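A quick way to confirm that the 20-job concurrency is actually picked up is a throwaway workflow that fans a matrix out onto the new label. A sketch, assuming a manual `workflow_dispatch` trigger and a dummy `sleep` in place of a real Testing Farm request: if the capacity is applied, all ten jobs should start at about the same time instead of queueing.

```yaml
name: testing-farm-concurrency-check
on: [workflow_dispatch]
jobs:
  probe:
    runs-on: testing-farm
    strategy:
      matrix:
        # Ten independent jobs, fewer than the runner's capacity of 20
        n: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    defaults:
      run:
        shell: sh
    steps:
      # Stand-in for a mostly idle wait on a Testing Farm request
      - run: echo "job ${{ matrix.n }} started at $(date)"; sleep 120
```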
Author

@lenkaseg wrote in https://forge.fedoraproject.org/forge/forge/issues/568#issuecomment-816486:

> Also, the `docker-slim` label was dropped from `atomic-desktops-1` runner. @lecris I suppose you don't need the `docker-slim` label on a regular runner either now, since there's a dedicated testing-farm runner.

Yes, it would not be needed.

OK, the concurrency seems to be working: https://forge.fedoraproject.org/ci/_fedora-kiwi-descriptions/actions/runs/3/jobs/0/attempt/1. We seem to be hitting Docker rate limits there, but that is mainly an issue of how the action is implemented, with the CI running the setup step in all jobs at the same time. I suspect it is a warm-up issue; I will check on it later.
Author

> I suspect it is a warming up issue, will check on it later

Still consistently rate limited. I re-ran one of the other jobs on the other runner, and it goes through that step almost instantly, as if it already has that image cached. @lenkaseg any idea how this works? The issue is with the first image it tries to pull, `node:22-alpine`.

I also do not know how the whole Forgejo action machinery for container actions works. Judging by how fast it goes, it seems to cache the image, but the logs do not tell much about that.
Member

I guess having node installed is kind of essential, right?
In that case... checking how to cache the image or mirror it to quay, ...
Member

OK, please try now! I mirrored the image from Docker Hub to quay.
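In runner-config terms, the fix boils down to pointing the label's image at the mirror instead of Docker Hub. A sketch only: the quay.io path is a placeholder, since the actual mirror location isn't given in this thread. Setting `force_pull: false` additionally lets the runner reuse an image already present on the host instead of pulling it for every job, which matters when many jobs start at once.

```yaml
runner:
  labels:
    # Placeholder: substitute the real quay.io repository of the mirror
    - "testing-farm:docker://quay.io/<mirror-namespace>/node:22-alpine"
container:
  # Only pull when the image is missing locally
  force_pull: false
```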
Author

Awesome, works like a charm. The only thing left is to document this in https://forge.fedoraproject.org/ci/testing-farm/pulls/9, which is waiting on the template update at https://forge.fedoraproject.org/forge/forge/src/commit/10f3412844902e88a4f33fc117039a2ce56ab7a9/.forgejo/issue_template/new_runner.yml#L52-L54:

```yaml
options:
- Default
- Large (more CPU, memory, and disk space)
```
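One way the template update could look, as a sketch: a new option appended to the existing dropdown. Only the two existing options and the `Resource Requirements` name come from this thread; the surrounding `type`/`id`/`attributes` keys follow the standard issue-form layout, and the `id` and the new option's wording are hypothetical.

```yaml
- type: dropdown
  id: resources  # hypothetical id
  attributes:
    label: Resource Requirements
    options:
      - Default
      - Large (more CPU, memory, and disk space)
      # New: lightweight, high-concurrency runner for Testing Farm jobs
      - Testing Farm (unprivileged, node:22-alpine, high concurrency)
```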

Neat work, folks! I was not following this, but it looks very nice. Do not forget to add `--skip-guest-setup` to get the VM up and running in about 1.5 minutes for the best experience (if you do not need artifact installation and the Fedora CI environment). This will later become the default :)

Also, bare-metal hosts are now a lot more scalable; for those, use `--hardware virtualization.is-virtualized=false`.
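Put together, a job on the new runner might pass both options to a Testing Farm request. A sketch only: it assumes the `testing-farm` CLI (from the tft-cli package) is already present in the job image, which `node:22-alpine` does not provide, and that a `TESTING_FARM_API_TOKEN` secret exists. The compose name is a placeholder. If the testing-farm action is used instead, the flags would have to go through whichever input it offers for extra arguments.

```yaml
jobs:
  tests:
    runs-on: testing-farm
    defaults:
      run:
        shell: sh
    steps:
      - name: Request a Testing Farm run
        env:
          # Assumed secret name; the CLI reads the API token from this variable
          TESTING_FARM_API_TOKEN: ${{ secrets.TESTING_FARM_API_TOKEN }}
        # Folded scalar: the lines below are joined into one command
        run: >
          testing-farm request
          --git-url "${{ github.server_url }}/${{ github.repository }}"
          --compose Fedora-Rawhide
          --skip-guest-setup
          --hardware virtualization.is-virtualized=false
```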