disk auto-decryption problem on vmhost-x86-copr04 #13273
Reference
infra/tickets#13273
Description of request
I am able to get the console output, but I fail to understand the error.
It seems some disks are successfully decrypted, but some are not.
I just evacuated the host; it's OK to reboot it anytime you need (in case debugging is needed).
Right now we are facing a weird FD-leak issue in virtqemud, which causes headaches in Copr (and other FD leaks on the Copr side). After some time, SSH on the affected host dies :-/ and I need to reboot over iDRAC/IPMI, so it would be nice if the machine came back online automatically.
Oh. I thought something was wrong with the hardware and it was rebooting. :(
I kept entering the LUKS passphrase and it booted, then it rebooted, so I assume that was you?
I definitely need to fix LUKS not unlocking on boot.
So, do the other ones all unlock properly?
I can't see what's different about 04 offhand...
We were on it at the same time probably, sorry for the inconvenience. :(
I bet. Sorry, I was a bit sleepy already. 🤦
If you can fix that, it would be awesome.
No, there aren't any differences, FWICT. There are some significant problems with libvirt on all of those machines, but I haven't tried to reboot all of them -- I just "started experimenting with 04", and it failed to reboot (after sshd stopped working).
The problem was that there were two more encrypted volumes: /dev/md3 and /dev/md4.
They needed to be bound in Ansible so that Tang is used to unlock them.
I have done that and I think this should be all fixed now.
Let us know if you still see any problems.
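For anyone who wants to check a similar host by hand, here is a rough sketch of what binding those two volumes to Tang looks like with the `clevis` CLI. It only builds and prints the commands; it does not run them. The Tang URL is a placeholder (the real server is not named in this ticket), the helper names are made up for illustration, and the actual fix was applied through Ansible, not through this script.

```python
import json
import shlex

# Placeholder Tang server URL: an assumption, the ticket does not name the real one.
TANG_URL = "http://tang.example.org"

# The two volumes mentioned in this ticket as not being bound yet.
DEVICES = ("/dev/md3", "/dev/md4")


def clevis_bind_command(device, tang_url=TANG_URL):
    """Return a shell-quoted `clevis luks bind` command for one LUKS device."""
    pin_config = json.dumps({"url": tang_url})  # JSON config for the tang pin
    return shlex.join(["clevis", "luks", "bind", "-d", device, "tang", pin_config])


def clevis_list_command(device):
    """Return a command that lists the clevis bindings present on a device."""
    return shlex.join(["clevis", "luks", "list", "-d", device])


if __name__ == "__main__":
    for dev in DEVICES:
        print(clevis_bind_command(dev))  # prompts for an existing LUKS passphrase when run
        print(clevis_list_command(dev))  # run afterwards to confirm the binding exists
```

Running `clevis luks list -d` on each encrypted device is a quick way to spot a volume that was missed, which is what happened here with 04.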
For some reason it was down today and the management machines weren't reachable from noc01.
I started it again and it booted without issue.