Add jgroman to sysadmin-qa and hook sysadmin-qa up to appropriate things #13297

Closed
opened 2026-04-28 18:33:46 +00:00 by adamwill · 14 comments
Member

Description of request

Right now, I'm the only person on the Quality team who can run ansible stuff. This seems like a bad desert island factor.

We'd like to add @jgroman to sysadmin-qa, and make sure sysadmin-qa is hooked up to the right things:

  • openQA
  • blockerbugs
  • testdays
  • ??? am I forgetting anything?
zlopez self-assigned this 2026-04-29 07:08:49 +00:00
Owner

I added @jgroman to sysadmin-qa, and according to the rbac rules these playbooks can be run by the group:

  • groups/taskotron.yml
  • groups/resultsdb.yml
  • openshift-apps/resultsdb.yml
  • openshift-apps/resultsdb-frontend.yml
  • openshift-apps/resultsdb-ci-listener.yml
  • groups/taskotron-client-hosts.yml
  • groups/qa.yml
  • groups/beaker.yml
  • update_grokmirror_repos.yml
  • openshift-apps/testdays.yml
  • openshift-apps/oraculum.yml
  • groups/openqa-onebox-test.yml
  • groups/openqa.yml
  • groups/openqa-workers.yml
  • manual/openqa-restart-workers.yml
  • groups/blockerbugs.yml
  • openshift-apps/blockerbugs.yml
  • openshift-apps/waiverdb.yml
  • openshift-apps/greenwave.yml
  • openshift-apps/kanban.yml

So I think everything should be in order. Let us know if something is not working as expected.

Author
Member

Awesome, thank you.

@jgroman you will need 2FA configured on your FAS account, if you haven't done that already. Once you've done that, you should follow [this SOP](https://docs.fedoraproject.org/en-US/infra/sysadmin_sops/sshaccess/) to set up ssh for infra, then you *should* be able to do `ssh batcave01.rdu3.fedoraproject.org` and get in. Then to test running a playbook, you can do:

```
sudo rbac-playbook openshift-apps/blockerbugs.yml -l staging
```

which should prompt for your password and second factor, then run the blockerbugs playbook on staging only (it shouldn't change anything). Can you give that a shot and let us know how it goes? Thanks!

Looks like I am _almost_ there: I can ssh to `bastion` but I am getting "Permission denied" when trying to connect directly to `batcave01`. I'll do some more troubleshooting.
Author
Member

`ssh -v` gives a lot of detailed info. You should see it going to `bastion.fedoraproject.org` first, then something like this:

```
Authenticated to bastion.fedoraproject.org ([38.145.32.11]:22) using "publickey".
debug1: pkcs11_del_provider: called, provider_id = (null)
debug1: channel_connect_stdio_fwd: batcave01.rdu3.fedoraproject.org:22
debug1: channel 0: new stdio-forward [stdio-forward] (inactive timeout: 0)
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug1: pledge: fork
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: Remote: /usr/bin/sss_ssh_authorizedkeys:2: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: Remote: /usr/bin/sss_ssh_authorizedkeys:2: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.7
debug1: compat_banner: match: OpenSSH_8.7 pat OpenSSH* compat 0x04000000
debug1: Authenticating to batcave01.rdu3.fedoraproject.org:22 as 'adamwill'
```

OK, I was missing IdentityFile for batcave01 itself.
I can connect to it now and run the suggested command just fine.
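
For anyone hitting the same "Permission denied" after bastion succeeds, a client config along these lines may help. This is only a sketch: the key path `~/.ssh/id_fedora_infra`, the `your_fas_username` placeholder, and the use of `ProxyJump` are assumptions for illustration, not taken from this ticket, and the SOP linked above remains the reference. The snippet writes the stanzas to a scratch file rather than touching `~/.ssh/config`.

```shell
# Write a candidate SSH client config to a scratch file for review.
# Assumptions (not from this thread): the key path, the username
# placeholder, and the use of ProxyJump. The SOP has the real settings.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
Host bastion.fedoraproject.org
    User your_fas_username
    IdentityFile ~/.ssh/id_fedora_infra

# The jump host and the target authenticate separately, so when the key
# is not one of ssh's default key files the target needs IdentityFile too.
Host batcave01.rdu3.fedoraproject.org
    User your_fas_username
    IdentityFile ~/.ssh/id_fedora_infra
    ProxyJump bastion.fedoraproject.org
EOF

# Print the stanzas so they can be merged into ~/.ssh/config by hand.
cat "$cfg"
```

If your OpenSSH supports it, `ssh -G -F <file> batcave01.rdu3.fedoraproject.org` prints the options ssh would resolve for that host without actually connecting.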

While `sysadmin-qa` seems to grant the rights to run the appropriate playbooks, it does not seem to be sufficient to perform admin tasks in the OpenShift web UI. For example, it's not possible to edit a Deployment for the blockerbugs project ([link](https://console-openshift-console.apps.ocp.fedoraproject.org/k8s/ns/blockerbugs/deployments?page=1&perPage=50)). Interestingly enough, this is allowed in OpenShift staging. Are we still missing some groups for these tasks? Or is this perhaps intentionally restricted in OpenShift production, for everyone?
Author
Member

Technically you should not need to do things in the web UI much. You're *supposed* to do things indirectly via ansible; the deployment definition is [in ansible](https://forge.fedoraproject.org/infra/ansible/src/branch/main/roles/openshift-apps/blockerbugs/templates/deployment.yml.j2), so if you want to change it, change the file in ansible and run the playbook.

It is sometimes kinda useful/necessary to do things directly, though... removing things from openshift via ansible is, I think, tricky or impossible, for example.
Owner

Permissions there are controlled via the appowners setting in the playbook. Users listed have some permissions for the application.

In staging those permissions are much wider, but in prod they are deliberately restricted so you cannot manually edit things and get them out of sync with ansible. ansible is the source of truth.

The idea is that in staging you may need to test something out before committing it, but you should never do this in production: test it in staging, and when it's all working as you like, commit it to ansible.

Does that make sense?


> Permissions there are controlled via the appowners setting in the playbook. Users listed have some permissions for the application.

That seems correct:

project_app: blockerbugs
project_description: Blockerbugs
project_appowners:
- adamwill
- kparal
- jgroman

> In staging those permissions are much wider, but in prod they are deliberately restricted so you cannot manually edit things and get them out of sync with ansible. ansible is the source of truth.

OK.

What I wanted to do yesterday was to stop the Blockerbugs app temporarily in production, because I was adjusting some services it uses (a Forge repo). That can be done through OpenShift UI by configuring Scaling to be 0 pods. I'm not sure if it can be performed in some other way?

The action turned out to be unnecessary, the migration is over and everything works OK. But it was the reason why I added my previous comment. No further permission changes are needed, then.
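
On the "some other way" question: outside the web UI, scaling is normally done with the `oc` client. The sketch below is untested against this cluster and rests on assumptions: the Deployment name `blockerbugs` is a guess (only the namespace appears in the console URL above), and whether appowners are allowed to scale in production is exactly what is unclear in this thread. The `run` helper is a hypothetical dry-run wrapper that only prints the commands unless `APPLY=1` is set.

```shell
# Namespace as seen in the console URL; the Deployment name is an assumption.
ns="blockerbugs"
deploy="blockerbugs"

# Print each command by default; set APPLY=1 to really execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Ask the API server whether the current user may update the scale subresource.
run oc auth can-i update deployments --subresource=scale -n "$ns"

# If that answers "yes", stop the app by scaling the Deployment to zero pods.
run oc scale "deployment/$deploy" --replicas=0 -n "$ns"
```

A manual change like this can drift from what the ansible template declares, so the next playbook run may restore the replica count from the template.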

Owner

Huh, I thought we did allow scaling... these permissions are set up in `roles/openshift/project/templates/role-appowners.yml.j2` and we can of course adjust them.

Should we close this then? or keep it open to track scaling or ?
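
If scaling does get tracked separately, the adjustment would presumably be a rule on the Deployment `scale` subresource. The fragment below is a hypothetical illustration of such a Kubernetes RBAC rule, not the actual contents of `role-appowners.yml.j2`, which may already include something equivalent. It is wrapped in a shell heredoc only so it can be printed and reviewed.

```shell
# Hypothetical RBAC rule; NOT the actual contents of role-appowners.yml.j2.
# It uses the standard Kubernetes "scale" subresource of Deployments.
rule="$(cat <<'EOF'
- apiGroups: ["apps"]
  resources: ["deployments/scale"]
  verbs: ["get", "update", "patch"]
EOF
)"

# Print the fragment for review before proposing any change in ansible.
printf '%s\n' "$rule"
```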

Author
Member

I think we should close it and open new issues to track things like scaling if necessary. @kparal @jgroman are you OK with that?


Sure. Most probably the scaling problem was caused by our inexperience with the OpenShift UI. We will ping Infra and ask if we need it in the future. Let's close this, thank you.


Yes, we'll track it separately if needed. Thanks!

Owner

Great.

kevin closed this issue 2026-05-06 14:43:00 +00:00
Reference
infra/tickets#13297