Master Mirror Issues #1531
Reference
infra/tickets#1531
We have been having issues getting a copy of the last Fedora updates push off of the netapp. Red Hat IT was notified and they are still trying to determine the cause. They did report a steep increase in load on the master mirror machines:
http://mmcgrath.fedorapeople.org/server1-loadavg30min-2week.png
Currently, we are attempting to get a copy of the Fedora 10 and 11 updates off of the netapp to manually sync to Tier 0 mirrors; however, this is occurring extremely slowly due to the above issues.
Unfortunately, we've canceled the manual sync, as it was not going to finish in a reasonable amount of time. The issue is being escalated in Red Hat IT though.
Some mirror admins are currently looking into the possibility of manually copying some commonly requested updated packages to mirrors for the time being, but this would just be a temporary course of action.
FWIW, tummy.com is up to date. nirik was clever enough to get the missing packages directly from our build system :)
I've encouraged the other mirrors to do a one-time sync off of it until master is back up.
http://mirrors.tummy.com/pub/fedora.redhat.com/fedora/linux/updates/11/i386/
Hi, unfortunately, we have not received any significant updates from RH IT since last night. We have resumed the manual sync, and hope that we will be able to get at least some portion of the latest updates out to some main mirrors soon.
We have now gotten a full copy of F11 updates off of the netapp. We are currently speaking to some main mirrors and asking them to manually sync this. This should bring them up to date for Fedora 11. The other public mirrors should eventually pick up these changes from the main mirrors as well.
We are working on similar actions for F10 updates, and eventually, if the netapp issues aren't fixed by then, F10 and F11 updates-testing.
Whoops, I accidentally removed somebody from the CC list, my mistake.
The latest F11 and F10 updates are both off the netapp now. The F11 updates have been pushed out onto a fast Tier 0 mirror and are in the process of being synced down to other tier mirrors. The F10 updates are in the process of syncing to a faster mirror from which they can be widely distributed downstream.
The situation with getting updates out has improved - we are still seeing significant slowness in building filelists when syncing from the master mirrors, but some mirrors have done successful syncs against the master mirrors.
We are currently working on pushing the full set of current updates to another alternate fast location for mirrors to sync from. That sync will hopefully be complete in a few hours.
Another update - we just noticed a configuration mistake that was causing all Fedora 10/11 and EPEL updates to change timestamps on all files. This could have caused large amounts of extra checksumming on the master mirrors and accounted for some of the slowness in getting updates out. We've fixed the configuration mistake now, and timestamps should be fixed in the next push. Hopefully things should start to settle down at that point.
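To illustrate why changed timestamps alone can create extra work: by default rsync skips a file only when its size and modification time both match on the two sides. The sketch below models that quick check in Python. The `FileMeta` type, the function names, and the sample paths and values are hypothetical and are not taken from the actual push configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    size: int   # file size in bytes
    mtime: int  # modification time, seconds since the epoch


def fails_quick_check(master: FileMeta, mirror: FileMeta) -> bool:
    """True when a size+mtime comparison would NOT let the file be skipped."""
    return master.size != mirror.size or master.mtime != mirror.mtime


def count_rechecked(master_tree: dict, mirror_tree: dict) -> int:
    """Count files that are new on the master or fail the quick check."""
    return sum(
        1
        for path, meta in master_tree.items()
        if path not in mirror_tree or fails_quick_check(meta, mirror_tree[path])
    )


# Hypothetical mirror state: three packages already synced.
mirror = {
    "updates/11/i386/a.rpm": FileMeta(size=1000, mtime=100),
    "updates/11/i386/b.rpm": FileMeta(size=2000, mtime=100),
    "updates/11/i386/c.rpm": FileMeta(size=3000, mtime=100),
}

# Same content on the master, but a push rewrote every timestamp.
touched_master = {path: FileMeta(meta.size, 200) for path, meta in mirror.items()}

rechecked_after_touch = count_rechecked(touched_master, mirror)
rechecked_when_stable = count_rechecked(dict(mirror), mirror)
```

In this toy case every file is re-examined after the timestamp rewrite even though no content changed, while none are re-examined when timestamps are stable.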
Sorry there's not much of an update today, we're still waiting for the current updates push to finish so that timestamps will be sane on the mirrors again.
The updates push with the fixed timestamps has completed. These fixes are now being synced to our various mirrors.
One thing that I forgot to mention in this ticket. We originally had three netapps in different locations, but most unfortunately, two of them were down at the beginning of this issue.
Throughout all of the other stuff we've been doing, people in Red Hat have been trying to get those back up as well so that we wouldn't be at a single point of failure like this.
Update: Mike did some tests with the actimeo nfs option, and achieved enormous improvements in file generation time. We are now testing these options out on the master mirror rsync servers. Hopefully it will help to improve mirror syncing abilities and allow us to bump up the connection limits further without killing these machines.
The actimeo NFS option (along with noatime, nosuid, and nodev) has been added to all 4 rsync servers. Initial testing shows that filelist generation times have dropped to less than half of what we were seeing before the changes were made. Connection limits are now set at 7/server for a total of about 28 slots.
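As a rough way to confirm that a mount entry carries these options, the following Python sketch parses an fstab-style line and reports anything missing. The host name, export path, mount point, and the `actimeo=600` value are all hypothetical; the ticket does not say which value was used. (`actimeo` takes a number of seconds and sets the NFS attribute cache timeouts to that one value.)

```python
# Flags named in the ticket; actimeo is checked separately because it needs a value.
REQUIRED_FLAGS = {"noatime", "nosuid", "nodev"}

# Hypothetical fstab-style entry: device, mount point, type, options, dump, pass.
EXAMPLE_LINE = "netapp.example:/vol/fedora /pub nfs ro,noatime,nosuid,nodev,actimeo=600 0 0"


def parse_options(fstab_line: str) -> dict:
    """Return the mount options of an fstab-style line as a dict."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    options = {}
    for item in fields[3].split(","):
        key, _, value = item.partition("=")
        options[key] = value or None  # bare flags map to None
    return options


def missing_options(fstab_line: str) -> list:
    """List the expected options that are absent from the line."""
    options = parse_options(fstab_line)
    missing = sorted(REQUIRED_FLAGS - set(options))
    if options.get("actimeo") is None:
        missing.append("actimeo")
    return missing


example_missing = missing_options(EXAMPLE_LINE)
```

An empty list from `missing_options` means all four options are present on that line.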
One more update - connection limits are now being raised to 12 connections x 4 servers.
Connection limits are now 12 connections x 5 servers. We're starting to ask mirrors if they are seeing pulls from the master go back to normal.
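For reference, the slot totals behind the three limits mentioned in this ticket work out as below. This is only arithmetic on the figures quoted above; the stage labels are my own, and the ticket does not say how the limit is enforced on the servers.

```python
def total_slots(per_server: int, servers: int) -> int:
    """Total simultaneous connections across all rsync servers."""
    return per_server * servers


# (label, connections per server, number of servers), as quoted in the ticket.
STAGES = [
    ("after mount option change", 7, 4),
    ("first raise", 12, 4),
    ("fifth server added", 12, 5),
]

totals = {label: total_slots(per, n) for label, per, n in STAGES}

for label, slots in totals.items():
    print(f"{label}: {slots} slots")
```

That is 28 slots initially, then 48, then 60 once the fifth server is included.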
Things from the end user perspective seem to have returned to normal in the past few days, and users should hopefully be receiving updates normally. We will be doing some more testing before we send out an email that the issue has ended.
Sorry I haven't had an update on this in a while - as you've seen, things have pretty much gone back to normal with updates in the past few weeks. Right now, we are working on getting the Internet2-connected master back up so that we won't have a single point of failure with the master mirrors.
I2 mirror is back up now, and seems to be working as expected (routing to I2 and NLR working).
I'm going to close this now. Open a new ticket if the problems recur.