proxies-reverseproxy: set keepalive=on ttl=10 for koji #3173
3 participants
Reference: infra/ansible!3173
Branch: victorkoycheff/ansible:issue-12913-koji-502
This PR addresses the intermittent 502 Bad Gateway errors users experience during long-running Koji connections (e.g., `watch-task`, `watch-logs`, or `chain-build`).

The Problem:
Currently, the Apache proxy has a `KeepAliveTimeout` of 15s and the Koji backend has a `KeepAliveTimeout` of 16s. Because these two keep-alive layers are configured independently, a race condition can occur: `mod_proxy` occasionally tries to reuse an idle connection from its backend pool at the exact moment the Koji backend is closing it, and the request fails with a 502.

The Fix:
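By adding `keepalive=on ttl=10` to the `proxyopts` for the Koji proxies, we make Apache stop reusing any pooled backend connection once it has been idle for 10 seconds (`ttl=10`); such connections are closed rather than handed to new requests. `keepalive=on` additionally enables TCP keep-alive probes on the backend connections. With 6 seconds of headroom below the backend's 16-second `KeepAliveTimeout`, the proxy should never reuse a connection the backend is about to close, which removes this race.

Reference: Apache mod_proxy documentation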
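To make the timing argument concrete, here is a minimal Python sketch of it. It assumes a pooled connection is reused only while its idle time is below `ttl`, and it treats a small, made-up `margin` around the 16-second backend timeout as the window in which a reused connection could hit a socket the backend is closing. The names `proxy_would_reuse`, `reuse_races_backend_close`, and `margin` are hypothetical; this is a toy model of the reasoning, not Apache's implementation.

```python
# Toy model of the keep-alive race described above. This is NOT how Apache
# implements connection reuse; it only encodes the timing argument.
from typing import Optional

BACKEND_KEEPALIVE_TIMEOUT = 16.0  # Koji backend KeepAliveTimeout (seconds), per the PR


def proxy_would_reuse(idle: float, ttl: Optional[float]) -> bool:
    """Assumed rule: a pooled connection is reusable only while idle < ttl.

    ttl=None models the proxyopts before this PR (no ttl set).
    """
    return ttl is None or idle < ttl


def reuse_races_backend_close(idle: float, ttl: Optional[float],
                              backend_timeout: float = BACKEND_KEEPALIVE_TIMEOUT,
                              margin: float = 0.05) -> bool:
    """True if the proxy would reuse a connection inside the race window.

    The window [backend_timeout - margin, backend_timeout + margin) is a
    hypothetical stand-in for "the backend is closing the socket but the
    proxy has not noticed yet"; margin is an assumed value, not measured.
    """
    in_window = backend_timeout - margin <= idle < backend_timeout + margin
    return in_window and proxy_would_reuse(idle, ttl)


if __name__ == "__main__":
    # Sweep idle times from 0 to 30 s in 1 ms steps and count risky reuses.
    idle_times = [ms / 1000 for ms in range(30_000)]
    for ttl in (None, 10.0):
        hits = sum(reuse_races_backend_close(t, ttl) for t in idle_times)
        print(f"ttl={ttl}: {hits} idle times fall in the race window")
```

The design point is the headroom: with `ttl=10` the model never reuses a connection anywhere near the 16-second close, whatever small `margin` is assumed. A `ttl` close to 16 would shrink that headroom again.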
Fixes infra/tickets#12913
I'm not fully convinced that this is the place where the keepalive is mismatched. It's a pretty complex path:
request -> proxy httpd -> anubis -> proxy httpd -> koji httpd
But it's possible at least. :)
Since we are in beta freeze this will need to wait until we are unfrozen, but we could try it then.
Or if we can duplicate it in staging we could try there.
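The concern here is that the mismatch could sit at any hop in the request path above, not only the last one, so the same check would need to be applied to every hop. The sketch below assumes each hop can be described by a client-side idle reuse limit and a server-side idle timeout. `Hop`, `risky_hops`, and `headroom` are hypothetical names, and only the Koji hop's numbers come from this PR; the Anubis hops would need their real timeouts filled in before this says anything about them.

```python
# Sketch: flag hops on a multi-hop proxy path where a client-side connection
# pool might reuse a connection the next server is about to close.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    name: str                   # label for the hop, e.g. "proxy httpd -> koji httpd"
    pool_ttl: Optional[float]   # idle reuse limit on the client side (None = unset)
    server_idle_timeout: float  # idle timeout on the server side of the hop, seconds


def risky_hops(hops: List[Hop], headroom: float = 1.0) -> List[str]:
    """Return names of hops whose pool may reuse connections too close to the
    server's idle timeout. headroom is an assumed safety gap in seconds."""
    return [
        hop.name
        for hop in hops
        if hop.pool_ttl is None or hop.pool_ttl > hop.server_idle_timeout - headroom
    ]


if __name__ == "__main__":
    # Only this hop's numbers come from the PR (16 s backend timeout,
    # ttl unset before the change and 10 s after it).
    before = [Hop("proxy httpd -> koji httpd", None, 16.0)]
    after = [Hop("proxy httpd -> koji httpd", 10.0, 16.0)]
    print(risky_hops(before))  # ['proxy httpd -> koji httpd']
    print(risky_hops(after))   # []
```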
(force-push: d1912a75f4 to 947cfd0b2c)

Yeah, it's definitely a pretty complex path. :)
My thinking was that the race condition happens right at that very last hop to the koji backend, so dropping the connection at 10s there might just do the trick... Empirically, I've seen this exact tweak clear up similar 502s where the backend and proxy timeouts were fighting each other.
I also just force-pushed a fix for a silly yaml syntax error that was failing the linter checks.
We can definitely just leave this until the freeze is lifted and try to duplicate it in staging then.
AI Code Review
📋 MR Summary
Adds proxy keepalive and TTL options to Koji reverse proxy configurations to resolve intermittent 502 Bad Gateway errors.
- `proxyopts: "keepalive=on ttl=10"` added to the Koji production balancer configuration.
- `proxyopts: "keepalive=on ttl=10"` added to the Koji staging balancer configuration.
- Indentation fix for the `varnish_url` variable.

Detailed Code Review
The implementation aligns perfectly with the intent described in the PR and previous discussions. Setting `keepalive=on ttl=10` is the standard and correct approach to prevent `mod_proxy` from reusing connections that the backend is about to close. The indentation fix for `varnish_url` is a nice minor correction.

📂 File Reviews
📄 `playbooks/include/proxies-reverseproxy.yml` - Updated proxy configuration to include keepalive options for Koji and fixed syntax indentation for variables.
- Changing `- varnish_url` to `varnish_url` corrects the Ansible variable dictionary structure, which is good practice.

✅ Summary
🤖 AI Code Review | Generated with ai-code-review | Model: gemini-3.1-pro-preview
⚠️ AI-generated suggestions may be incorrect. Verify before applying. Not a replacement for human review.
ok, let's try (staging first)...