disk auto-decryption problem on vmhost-x86-copr04 #13273

Closed
opened 2026-04-16 21:17:33 +00:00 by praiskup · 6 comments
Member

Description of request

I am able to get the console output, but I fail to understand the error.
It seems some disks are successfully decrypted, but some are not.

Author
Member

I just evacuated the host; it's OK to reboot it anytime you need (in case debugging is needed).

Right now we face a weird FD-leak issue in virtqemud, which causes headaches in Copr (and causes other FD leaks on the Copr side). After some time, SSH on the affected host dies :-/ and I need to reboot over iDRAC/IPMI (so it would be nice if the machine came back online automatically).

Owner

Oh. I thought something was wrong with the hardware and it was rebooting. :(

I kept entering the LUKS passphrase and it booted, then it rebooted, so I assume that was you?

I do need to fix LUKS not unlocking on boot, for sure.

Owner

So, do the other ones all unlock properly?

I can't see what's different about 04 offhand...

Author
Member

> Oh. I thought something was wrong with the hardware and it was rebooting. :(

We were probably on it at the same time; sorry for the inconvenience. :(

> I kept entering the LUKS passphrase and it booted, then it rebooted, so I assume that was you?

I bet. Sorry, I was a bit sleepy already. 🤦

> I do need to fix LUKS not unlocking on boot, for sure.
> So, do the other ones all unlock properly?
> I can't see what's different about 04 offhand...

If you can fix that, it would be awesome.

No, there aren't any differences, FWICT. There are some significant problems with libvirt *on all of those machines*, but I haven't tried to reboot all of them. I just "started experimenting with 04", and it failed to reboot (after sshd stopped working).

kevin self-assigned this 2026-04-17 17:57:29 +00:00
Owner

The problem was that there were two more encrypted volumes: /dev/md3 and /dev/md4.

We needed to bind them in Ansible so that tang would be used to unlock them.

I have done that and I think this should be all fixed now.

Let us know if you still see any problems.
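
For anyone who wants to double-check other hosts for the same gap, here is a minimal Python sketch (not the actual Ansible change) that reports which LUKS devices have no tang binding. It assumes the `clevis` CLI is installed and that `clevis luks list -d DEV` prints one line per bound slot in the form `SLOT: PIN 'CONFIG'`. The function names and the `/dev/md2` device in the usage note are hypothetical; only /dev/md3 and /dev/md4 come from this ticket.

```python
import subprocess


def bound_pins(list_output: str) -> list[str]:
    """Return the pin name of every bound slot found in `clevis luks list` output."""
    pins = []
    for line in list_output.splitlines():
        # Assumed line shape: 1: tang '{"url":"http://tang.example"}'
        slot, sep, rest = line.partition(":")
        if sep and slot.strip().isdigit() and rest.strip():
            pins.append(rest.split()[0])
    return pins


def devices_missing_tang(outputs: dict[str, str]) -> list[str]:
    """Given {device: list output}, return devices with no direct tang pin.

    A tang server nested inside an sss pin is reported as "sss" and is
    therefore NOT counted as a tang binding by this simple check.
    """
    return sorted(dev for dev, out in outputs.items() if "tang" not in bound_pins(out))


def query_bindings(devices: list[str]) -> dict[str, str]:
    """Run `clevis luks list -d DEV` for each device (usually needs root)."""
    outputs = {}
    for dev in devices:
        result = subprocess.run(
            ["clevis", "luks", "list", "-d", dev],
            capture_output=True,
            text=True,
            check=False,  # an error leaves stdout empty, i.e. "no binding found"
        )
        outputs[dev] = result.stdout
    return outputs
```

On a host, something like `devices_missing_tang(query_bindings(["/dev/md2", "/dev/md3", "/dev/md4"]))` would list the devices that still need binding; an empty list means every device passed in has a tang pin.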

kevin closed this issue 2026-04-17 18:14:44 +00:00
Owner

For some reason it was down today, and the management machines weren't reachable from noc01.

I started it again and it booted without issue.

Reference
infra/tickets#13273