Nagios - FAS/OpenID error on https://admin.fedoraproject.org/nagios/ #4479
Labels
No labels
announcement
anubis
authentication
aws
backlog
blocked
bodhi
ci
cloud
communishift
copr
database
day-to-day
dc-move
deprecated
dev
discourse
dns
downloads
easyfix
epel
firmitas
forgejo_migration
Gain
High
Gain
Low
Gain
Medium
gitlab
greenwave
hardware
help wanted
high-trouble
koji
koschei
lists
low-trouble
medium-trouble
mirrorlists
monitoring
Needs investigation
odcs
OpenShift
ops
outage
packager_workflow_blocker
pagure
permissions
Priority
Needs Review
Priority
Next Meeting
Priority
🔥 URGENT 🔥
Priority
Waiting on Assignee
Priority
Waiting on External
Priority
Waiting on Reporter
rabbitmq
release-monitoring
releng
request-for-resources
s390x
security
SMTP
sprint-0
sprint-1
src.fp.o
staging
unfreeze
waiverdb
websites-general
wiki
Backlog Status
Needs Review
Backlog Status
Ready
chore
documentation
points
01
points
02
points
03
points
05
points
08
points
13
Priority
High
Priority
Low
Priority
Medium
Sprint Status
Blocked
Sprint Status
Done
Sprint Status
In Progress
Sprint Status
Review
Sprint Status
To Do
Technical Debt
Work Item
Bug
Work Item
Epic
Work Item
Spike
Work Item
Task
Work Item
User Story
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
infra/tickets#4479
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
= bug description =
After going to https://admin.fedoraproject.org/nagios/ and doing the FAS/openid login, I receive a page with the error message: There has been an error while attempting to authenticate.
I have attached screen shots.
= bug analysis =
FAS/OpenID works for me on other sides, such as this one, just fine.
Using Firefox 31 on Mac OS X 10.8.5
= fix recommendation =
Good luck?
screen shot 1
Screen Shot 2014-08-06 at 8.10.57 PM.png
screen shot 2
Screen Shot 2014-08-06 at 8.11.12 PM.png
FYI, I don't mind working this as a newbie to infrastructure, but some pointers might be helpful :)
Also tested with Firefox 31 on Fedora 20...same results. I'm assuming its not browser or OS related, but thought it wouldn't hurt to test.
Will attempt to work this ticket with puiterwijk
Understanding of the problem so far:
(04:35:23 PM) puiterwijk: the problem is that mod_auth_openid is sending a nonce in the authopenid.nonce GET field with the request, but when the request comes back it tries to get openid.modauthopenid.nonce
(04:35:47 PM) puiterwijk: so try to find where in mod_auth_openid it sets and reads those values and figure out why sometimes the reading is prepended with openid.
ERROR LOG:
[Thu Aug 07 11:48:22.804235 2014] [core:error] [pid 17632] [client 192.168.1.15:40445] Error in authentication: openid.modauthopenid.nonce: no such field, referer: https://admin.fedoraproject.org/nagios?&modauthopenid.nonce=Ee7yhRvS1U&openid.assoc_handle=%7BHMAC-SHA1%7D%7B53e366a3%7D%7B3%2BH8sQ%3D%3D%7D&openid.claimed_id=http%3A%2F%2Flanica.id.fedoraproject.org%2F&openid.identity=http%3A%2F%2Flanica.id.fedoraproject.org%2F&openid.invalidate_handle=%7BHMAC-SHA256%7D%7B53d53eea%7D%7Bn5LJzA%3D%3D%7D&openid.mode=id_res&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&openid.ns.pape=http%3A%2F%2Fspecs.openid.net%2Fextensions%2Fpape%2F1.0&openid.ns.sreg=http%3A%2F%2Fopenid.net%2Fextensions%2Fsreg%2F1.1&openid.op_endpoint=https%3A%2F%2Fid.fedoraproject.org%2Fopenid%2F&openid.pape.auth_level.nist=2&openid.pape.auth_level.ns.nist=http%3A%2F%2Fcsrc.nist.gov%2Fpublications%2Fnistpubs%2F800-63%2FSP800-63V1_0_2.pdf&openid.pape.auth_policies=http%3A%2F%2Fschemas.openid.net%2Fpape%2Fpolicies%2F2007%2F06%2Fnone&openid.pape.auth_time=2014-08-07T11%3A44%3A35Z&openid.response_nonce=2014-08-07T11%3A44%3A35ZOLXs1g&openid.return_to=https%3A%2F%2Fadmin.fedoraproject.org%2Fnagios%3F%26modauthopenid.nonce%3DEe7yhRvS1U&openid.sig=13idmVFVHhE%2BnnFiEgBm1OQ4NI8%3D&openid.signed=assoc_handle%2Cclaimed_id%2Cidentity%2Cinvalidate_handle%2Cmode%2Cns%2Cns.pape%2Cns.sreg%2Cop_endpoint%2Cpape.auth_level.nist%2Cpape.auth_level.ns.nist%2Cpape.auth_policies%2Cpape.auth_time%2Cresponse_nonce%2Creturn_to%2Csigned
[Thu Aug 07 15:04:37.860927 2014] [core:error] [pid 11886] [client 10.5.126.52:52827] Error in authentication: openid.modauthopenid.nonce: no such field, referer: https://id.fedoraproject.org/
ACCESS LOG:
10.5.126.52 - - [07/Aug/2014:07:35:29 +0000] "GET /nagios?&modauthopenid.nonce=txDwhr5zQX&openid.assoc_handle=%7BHMAC-SHA1%7D%7B53e32b0a%7D%7BnfFccA%3D%3D%7D&openid.claimed_id=http%3A%2F%2Fpuiterwijk.id.fedoraproject.org%2F&openid.identity=http%3A%2F%2Fpuiterwijk.id.fedoraproject.org%2F&openid.invalidate_handle=%7BHMAC-SHA256%7D%7B53d53eea%7D%7Bn5LJzA%3D%3D%7D&openid.mode=id_res&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&openid.ns.pape=http%3A%2F%2Fspecs.openid.net%2Fextensions%2Fpape%2F1.0&openid.ns.sreg=http%3A%2F%2Fopenid.net%2Fextensions%2Fsreg%2F1.1&openid.op_endpoint=https%3A%2F%2Fid.fedoraproject.org%2Fopenid%2F&openid.pape.auth_level.nist=2&openid.pape.auth_level.ns.nist=http%3A%2F%2Fcsrc.nist.gov%2Fpublications%2Fnistpubs%2F800-63%2FSP800-63V1_0_2.pdf&openid.pape.auth_policies=http%3A%2F%2Fschemas.openid.net%2Fpape%2Fpolicies%2F2007%2F06%2Fnone&openid.pape.auth_time=2014-08-07T07%3A30%3A14Z&openid.response_nonce=2014-08-07T07%3A30%3A18ZNhqot0&openid.return_to=https%3A%2F%2Fadmin.fedoraproject.org%2Fnagios%3F%26modauthopenid.nonce%3DtxDwhr5zQX&openid.sig=tU%2FZ7kj3xUzAZgO5%2F9KrmBgY5rY%3D&openid.signed=assoc_handle%2Cclaimed_id%2Cidentity%2Cinvalidate_handle%2Cmode%2Cns%2Cns.pape%2Cns.sreg%2Cop_endpoint%2Cpape.auth_level.nist%2Cpape.auth_level.ns.nist%2Cpape.auth_policies%2Cpape.auth_time%2Cresponse_nonce%2Creturn_to%2Csigned HTTP/1.0" 401 1734 "https://id.fedoraproject.org/openid/?transaction=86a5a7bd4f584b648c2003bab697f1d3" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
Adding sysadmin-noc-members back as one of the owners.
Have a lab box set up like noc01 in my lab, with openid currently functional. Waiting for the issue to occur to troubleshoot.
The issue has been recreated in my lab. I am accessing my lab box over the same network, using http. When it reaches out to the FAS openid, it uses https, of course.
The http process has been running since Aug 11th @ 11a and I recreated the issue at 2p today (Aug 18th). The most recent test prior to that was Aug 13th @ 7:10p. That's a pretty large gap, but the important part is it has been recreated.
Here is the error message from /var/log/httpd/error_log:
[Sat Aug 16 14:07:19.890389 2014] [core:error] [pid 1228] [client 192.168.192.236:52811] Error in authentication: openid.modauthopenid.nonce: no such field
I would attach a sosreport, but the attachment limit is 256KB...
(All times are in Central)
Upon apache restart the mod_auth_openid sqlite database is removed. It is created/updated as users attempt to authenticate. I could recreate a similar issue by moving one database from an apache server to another, however that might not be a fair test. Emptying each table one by one revealed that the fix is in emptying the "associations" table. Something must be getting corrupt and causing that table to not be updated appropriately. Here's what the table looks like (and yes, it normally only has one entry):
CREATE TABLE associations (server VARCHAR(255), handle VARCHAR(100), encryption_type VARCHAR(50), secret VARCHAR(30), expires_on INT);
INSERT INTO "associations" VALUES('https://id.fedoraproject.org/openid/','{HMAC-SHA256}{53eff2b6}{44qTIA==}','HMAC-SHA256','9xOPRq4qgQJm26b0YhUVPdSXGfe2IpKmpQMMSmHn7PU=',1409443766);
The "expires_on" field still has almost 10 days left, so it has yet to expire.
This sounds a lot like a bug mentioned 2010 at this link:
https://github.com/bmuller/mod_auth_openid/issues/4
Note that the sqlite DB is at /tmp/systemd-private-XXXXXX/tmp/mod_auth_openid.db (where XXXXXX is a random character sequence, determined on httpd start). The sqlite DB is not created until the openid mod is actually invoked.
Being that I can instantly create the error by copying the "bad" sqlite DB in place, I think this might be easier to solve. I'll take a look at the source and see if I can find anything.
After digging in a bit, it is definitely the same issue. That is, a "server"'s association handle becomes invalid prior to the expiration and mod_auth_openid fails to validate it. Looks like this has to do with the different params interfaces/classes where mod_auth_openid adds "openid." prefix for get_param and has_param. The superclass has different logic and that seems to be conflicting. Talking with puiterwijk and a possible corresponding patch has not been merged into master. Below are the steps to recreate. Of course, its somewhat artificial, as the real test would be to invalidate a handle on the server side.
How to recreate mod_auth_openid issue ("openid.modauthopenid.nonce: no such field" error in logs that prevents anyone from logging in)
{{{
--systemd based OS (RHEL7, for example)
sqlite3 /tmp/systemd-private-*/tmp/mod_auth_openid.db "UPDATE ASSOCIATIONS SET handle='invalidhandle' WHERE server='https://id.fedoraproject.org/openid/';"
--sysvinit based OS (RHEL5 and RHEL5, for example)
sqlite3 /tmp/mod_auth_openid.db "UPDATE ASSOCIATIONS SET handle='invalidhandle' WHERE server='https://id.fedoraproject.org/openid/';"
(if this doesn't work, try removing the DB file, then starting over on step 1)
}}}
With the patch here:
github.com/bmuller/mod_auth_openid@bb1a59121bthe problem seems to get "better" in the fact of we do not getting a missing field error. We now receive this error:
[Mon Aug 18 09:57:38.556657 2014] [core:error] [pid 14551] [client 192.168.192.234:55141] Error in authentication: failed to check_authentication()
https://github.com/hacker/libopkele/blob/master/lib/basic_rp.cc -- line 252 is where it throws the error. line 311 is the original function throwing the error...if it were to get to line 319, for invalidate_assoc then I would see some messages in the logs...it would tell me it was trying to invalidate it
Its possible that recreating the issue artificially is causing part of libopkele to not function as expected. The handle would not exist at all, so therefore it might not be able to locate it. I will let the server/service run for awhile and see if the issue still remains.
Set up my own instance of fedoauth and I receive the "Error in authentication: failed to check_authentication()" error message. Need to find out why libopkele is not attempting to run invalidate_assoc
Its either a problem with libopeke or FedOAuth. The result data from running direct_request within basic_RP:check_authentication has some funkyness going on. I created a function to display all of the fields that res contains, and its showing html rather than field names as it should... Again to reference this can be found at https://github.com/hacker/libopkele/blob/master/lib/basic_rp.cc ... line 315. That's where the direct_request is, and after that, res does not have valid fields...
Here's an example of what the field print looks like for the same data type at a different location in the libopkele code:
{{{
field print #0 assoc_type
field print #1 dh_consumer_public
field print #2 dh_gen
field print #3 dh_modulus
field print #4 mode
field print #5 ns
field print #6 session_type
}}}
Those are indeed sane field names.
Here's what it looks like directly after line 315 of basic_rp.cc
{{{
field print #0
field print #1
field print #2
field print #3
This is the Federated Open Authentication provider homepage.
field print #4 <link rel="openid2.provider" href="http
<html> <head>field print #5 <!doctype html>
}}}
Ok, this appears to be a combination between mod_auth_openid and libopkele both having some a bug. mod_auth_openid has better handling for get_field and has_field functions with the patch:
github.com/bmuller/mod_auth_openid@bb1a59121bFor libopkele, when an association is invalid, invalidate_handle is passed, but libopkele is supposed to confirm back with the openid server that it should invalidate it. However, the process that it goes through keeps the "openid." prefix on all of the URL data, but "openid." is prefixed again (this is normal for the "direct_request" function of opkele, but it has an option to change the prefix) when it sends another request. This is a very basic fix (read: hack), the real fix may require much deeper changes to the code and/or restructuring. It would make more sense for the developers of libopkele to come out with a final fix. I have attached the patch "libopkele-invalid-association.patch" file for this basic fix. As far as I can tell if an association handle becomes invalid (other than for expiration), libopkele will not handle it appropriately. Note that the fix will present an error once to a user, but the association handle will become invalidated on the next attempt it will work just fine -- invalidated association handles should be a rare occasion so this might be appropriate.
libopekele-invalid-assocation.patch
libopkele-invalid-association.patch
FYI, opened a libopkele ticket at https://github.com/hacker/libopkele/issues/6
Lets just wait for the upstream fix to land. ;)
Thanks so much for all the great detective work here and patches!