dist-git lookaside cache files served with wrong HTTP headers #12812
It looks like .tar.gz / .crate files that are served by the lookaside cache have the following http headers set:
This causes some clients (those without workarounds) to uncompress .tar.gz files on-the-fly while downloading, which results in plain tarfiles being stored on disk (but with .tar.gz file extensions).
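A minimal Python sketch of the effect described above, assuming a stand-in payload rather than a real lookaside file. The helper `body_as_saved` and the sample bytes are hypothetical; only the two header values come from this ticket.

```python
import gzip
import hashlib


def body_as_saved(body: bytes, headers: dict, honor_content_encoding: bool) -> bytes:
    """Return the bytes a client would write to disk for this response."""
    encoding = headers.get("content-encoding", "").lower()
    if honor_content_encoding and encoding in ("gzip", "x-gzip"):
        # The client treats the gzip layer as a transfer wrapper and removes it.
        return gzip.decompress(body)
    return body


# Hypothetical stand-in for a .tar.gz / .crate file stored in the lookaside cache.
tarball = gzip.compress(b"pretend this is a tar archive" * 100)
expected = hashlib.sha512(tarball).hexdigest()

# Header values as reported in this ticket.
headers = {"content-type": "application/x-tar", "content-encoding": "x-gzip"}

decoding_client = body_as_saved(tarball, headers, honor_content_encoding=True)
raw_client = body_as_saved(tarball, headers, honor_content_encoding=False)

# Only the client that ignores the header keeps a file matching the checksum.
checksum_ok_raw = hashlib.sha512(raw_client).hexdigest() == expected
checksum_ok_decoding = hashlib.sha512(decoding_client).hexdigest() == expected
```

A client that honors the header ends up with the inner tar bytes under the original file name, which is why the checksum recorded for the compressed file no longer matches.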
It looks like `fedpkg sources` already works despite that, similar to how `spectool -g` also handles this case (due to a workaround that's present in its download handling code that suppresses on-the-fly decompression of gzipped files).
An indicator for when this breaks is checksum mismatches, like the one that happened in https://koji.fedoraproject.org/koji/taskinfo?taskID=137346319
Also see https://discussion.fedoraproject.org/t/verifying-the-authenticity-of-files-uploaded-to-the-lookaside-cache/134196 for a discussion thread where the "clients uncompress stuff on-the-fly even if you really don't want that to happen" was also relevant.
In general, it looks like the lookaside cache should not guess the file's MIME type or encoding, but always set `content-type: application/octet-stream` (which is the default for unknown / binary data): https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types/Common_types and send responses with no `content-encoding` header present at all (since the data is supposed to be used as-is, without decoding first)?
The headers that are sent currently (`content-type: application/x-tar`, `content-encoding: x-gzip`) basically tell clients to "decompress this with gzip to get the .tar file you ordered", which is wrong and explains why clients that don't specifically override this behaviour get un-gzipped plain "tar" files (with ".tar.gz" file endings) or un-gzipped ".crate" files (which are just .tar.gz in disguise).
Additionally, for an as of yet unknown reason, anubis-enabled src.fedoraproject.org seems to have caused `fedpkg sources` from fedpkg-minimal to break with this - or a very similar - failure mode, causing checksum mismatches during koji builds. I don't see a difference in the HTTP headers that would explain that difference in behaviour though. :(
To note: I have disabled anubis for src.fedoraproject.org for now, but we should figure out a workaround so we can re-enable it.
It should be relatively easy to work around this issue in clients like fedpkg-minimal - spectool does two things for this case:
- `Accept-Encoding: identity` for GET requests
- `decode_content=False` on the stream
It appears that both are needed to cover all cases (i.e. don't break downloading plain-text files and don't decompress gzipped files sent by confused servers).
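The two-part workaround can be sketched with the Python standard library. This is not spectool's code: `build_request` and `download_raw` are hypothetical helpers, and `urllib.request` stands in for the second part because it does not decode `Content-Encoding` on its own.

```python
import hashlib
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server not to apply any content coding."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


def download_raw(url: str, dest: str) -> str:
    """Save the response body exactly as received and return its sha512 hex digest."""
    digest = hashlib.sha512()
    with urllib.request.urlopen(build_request(url)) as resp, open(dest, "wb") as out:
        # urllib.request hands back the body bytes untouched, whatever the
        # Content-Encoding header claims, so nothing is decompressed here.
        while chunk := resp.read(64 * 1024):
            digest.update(chunk)
            out.write(chunk)
    return digest.hexdigest()
```

The returned digest can then be compared with the one recorded in the package's `sources` file before the download is accepted.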
Metadata Update from @zlopez:
Yeah, although with that we don't fix it for external users? or do we... does 'fedpkg' (non minimal) do the right thing?
It appears that non-minimal `fedpkg sources` downloads the files correctly, but I'm not sure what's the difference - `fedpkg` uses `pycurl` and `fedpkg minimal` uses the `curl` CLI, and it looks like they use the same settings (i.e. don't appear to set any headers at all, other than `Pragma:`). So, I'm quite confused. :)
so... can you duplicate this with src.stg.fedoraproject.org?
(we may need to upload/sync some lookaside files there)
if so, I can reenable anubis there and I can try some fixes on the server end...
Is that what you meant to try?
I can't seem to find any packages that have uploaded sources to the stg lookaside cache. Do you have an example?
The intent was that stg would have a read only copy of the prod lookaside.
It did, but there was a messed up link to it. ;)
Can you try again now?
Which are the same headers as for the non-stg variant.
Running `wget` for both URLs (with and without .stg domain) gives me plain uncompressed .tar archives:
... which is exactly what the HTTP headers sent by the server tell wget to do.
Coincidentally, `curl`ing the same URLs gives me:
So behaviour of what's saved to disk seems to depend on the tool that's used. FWIW I don't think "curl" is doing the correct thing here. :)
So I am a bit confused here.
stg is behaving the same way as prod? Or it's not?
The reason I wanted to check that is so that I could try and change things in staging to find a 'fix' on the server side and then we could deploy it on prod.
I just noticed that I had disabled anubis in staging src too. I am turning it back on so we can see if it has the behavior...
As far as I can tell, they're giving HTTP responses with identical headers, yes - but if you disabled anubis on stg.src.fp.o too that is kind of expected :)
I'll try again later.
anubis is now enabled on src.stg again.
I tried again, looks like `curl` is still giving me (mostly) the same HTTP headers for both:
@music can you try reproducing the issue again? because I can't seem to see any differences here.
(The headers are still wrong IMO - they should tell clients to use the data as-is, not that it needs to be gzip-decompressed on receipt - but they're not different between stg and prod as far as I can tell.)
Looks like someone else hit this in #12842
I have switched prod anubis off on src for now.
Rough and dirty pr... I applied it in staging already, and it changes the content-type... but it still ungzips it.
https://pagure.io/fedora-infra/ansible/pull-request/2904
ok. I had to put it back... things were just too bad without anubis. ;(
That said, the internal proxies that are used by builders don't have it enabled, so I think real builds should work fine.
Hmm, I wouldn’t expect changing just the `Content-Type` to affect the `Content-Encoding` (unless it happened to interact with a pre-existing MIME-based rule for content encoding), so at least that result is explicable.
I tried `fedpkg sources` for `rust-jod-thread`, and it’s failing again:
I tried `fedpkg scratch-build` and that failed, too:
https://koji.fedoraproject.org/koji/taskinfo?taskID=138160053
I think a (dist-git based, not `--srpm`) scratch build should be the same as a real build, right? If so, then I expect real builds to fail as well.
I haven’t taken the time to dig through the Ansible configurations and understand how they actually fit together, but is the lookaside cache governed by the `mod_deflate` configuration in https://pagure.io/fedora-infra/ansible/blob/main/f/roles/fedora-web/main/files/deflate.conf ? Might something like this work?
I’m not sure if you would need to mess around with the `Vary` header as well or not, or if something like
would be processed in the right order with respect to the `mod_deflate` configuration to have the desired effect.
I think the "content-encoding: x-gzip" part is coming from the proxies. But while it's not useful for lookaside data, clients should deal with that.
The clients do deal with it, by removing the declared content-encoding to obtain what the server is claiming should be the actual payload. The problem is that the content-encoding header is lying. For it to be accurate, the HTTP response would have to be gzipped (again, on top of the gzip encoding that happens to be the outermost layer in the `.crate` file format).
OOF this is now hitting me too ...
Is there anything we can do, at least as a temporary workaround? Maybe patch fedpkg-minimal (or whatever downloads the sources in the buildSRPMfromSCM task in koji) to avoid decompressing stuff on-the-fly?
It appears that adding `-H "Accept-Encoding: identity"` to the `curl` CLI call in fedpkg-minimal works around this issue. I'll file a PR with that ...
https://src.fedoraproject.org/rpms/fedpkg-minimal/pull-request/2
I think that's fine, but it's going to take a while to get that to where it can help. I was hoping we could do something on the server side to get things working faster. ;(
Sigh, I found that dns was wrong and internal stuff was using proxy01/10 instead of proxy101/110.
I have changed it over now, please retry your scratch build from scm and I hope it will work.
Ha, it's always DNS 😆 I can confirm that a build that failed earlier this evening due to a checksum issue now passed. 🎈
sccache also started building for me
Okay, I'm going to close this as it looks like it was all DNS :(
Metadata Update from @james:
I wouldn’t consider this fixed. This is no longer breaking koji builds from dist-git, because the DNS issue causing koji to use the external proxies was fixed. However, things are still utterly broken for anything that accesses the lookaside cache externally, including COPR builds from dist-git (e.g. https://copr.fedorainfracloud.org/coprs/music/tungstenite/build/9704757/) and things like:
Trying to find, patch, and redeploy every piece of code that accesses the lookaside cache to set `Accept-Encoding: identity` à la https://src.fedoraproject.org/rpms/fedpkg-minimal/pull-request/2 is a plausible workaround, but the root problem is that the server and/or proxy are confused and are lying about what they are serving by tacking on a `Content-Encoding: gzip` header but not actually adding another layer of gzip compression.
Yeah. We should still try and fix the server end if we can, or failing that at least fix fedpkg/fedpkg-minimal.
Metadata Update from @kevin:
Ah, so `fedpkg sources` really is affected too? That explains why I couldn't find a reason why it shouldn't be :(
Does anybody want to submit a PR to fedpkg equivalent to the one I sent for fedpkg-minimal, or should I do that?
ok, I updated anubis today... can you see if it changed anything? There's a bunch of changes, but it's hard to tell if any would affect this issue.
Looks the same to me.
Filed a PR for `rpkg` too: https://pagure.io/rpkg/pull-request/758 , PTAL.
I just encountered this problem in a real Koji build again.
https://koji.fedoraproject.org/koji/taskinfo?taskID=139122958
Just to clarify the impact, the fact that this is happening in koji builds from dist-git again means that a large number of Rust packages will fail to build. No external workaround exists, and much Rust packaging work will be impossible until some kind of fix is implemented.
ok, I dug into this and I think I see what the problem is.
I did make changes to varnish to cache src, and I think it's doing the same thing anubis did...
I'll look at a commit to fix this.
ok, please try now. I tested and it seems to be working in that testing.
Yes, it looks like it’s working again!
https://koji.fedoraproject.org/koji/taskinfo?taskID=139139537
Thank you for investigating.
Sorry for the trouble... ;( I knew varnish would break something there, but it's really helping the scraper load...
We certainly need all the help we can get! Thanks for looking at it. I’m glad it wasn’t too difficult to get things back to the status quo.
I think I'm seeing this in copr, where I'm trying to build a rust package:
https://download.copr.fedorainfracloud.org/results/gordonmessmer/nodejs-electron/fedora-rawhide-x86_64/10071219-rust-dialoguer/builder-live.log.gz
99310bece51622968cf251bf047c0277 is the check-sum of the file after un-compressing it.
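The check described in this comment (hash of the saved file versus the hash embedded in the download URL) can be sketched as below. It assumes the URL ends in `<hashtype>/<hash>/<filename>`, as in the COPR dist-git URL quoted later in this thread; `expected_hash_from_url` and `diagnose` are hypothetical helper names, and the "decompressed" verdict is only a heuristic based on the missing gzip magic bytes.

```python
import gzip
import hashlib
from urllib.parse import urlparse

GZIP_MAGIC = b"\x1f\x8b"
GZIPPED_SUFFIXES = (".crate", ".tar.gz", ".tgz")


def expected_hash_from_url(url: str) -> tuple:
    """Return (hash_type, hex_digest) from a URL ending in <hashtype>/<hash>/<filename>."""
    parts = urlparse(url).path.rstrip("/").split("/")
    return parts[-3], parts[-2]


def diagnose(url: str, saved: bytes) -> str:
    """Classify a downloaded file as 'ok', 'decompressed' or 'mismatch'."""
    hash_type, want = expected_hash_from_url(url)
    if hashlib.new(hash_type, saved).hexdigest() == want:
        return "ok"
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    if filename.endswith(GZIPPED_SUFFIXES) and saved[:2] != GZIP_MAGIC:
        # The name says gzip, the bytes do not: most likely something
        # removed the gzip layer between the server's disk and ours.
        return "decompressed"
    return "mismatch"
```

Running `diagnose` on a file fetched by a client that honors the bad `content-encoding` header would be expected to report "decompressed" rather than a generic mismatch.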
Hi @kevin, @decathorpe, and @music. I spent some time digging into this.
It looks like the root cause wasn't the backend configuration, but rather the frontend proxies configured via roles/fedora-web/main/files/deflate.conf. The mod_deflate rules were intercepting everything that wasn't an image, applying gzip compression, and tacking on the header.
I've submitted a PR (infra/ansible#3170) to add .crate, .tar.gz, .rpm, and other common archives to the SetEnvIfNoCase exclusion list so the proxies pass them through natively.
Let me know what you think and if further refinements are needed
Thanks for tracking it down!
The bug that @decathorpe originally described appears to be that the proxies were NOT applying gzip compression, but tacking on the header anyway. That is very probably a bug in httpd. It might be a bug that's specific to reverse proxying.
I found a discussion that probably describes the same bug, which mentions an entry in the upgrading document, "mod_deflate will now skip compression if it knows that the size overhead added by the compression is larger than the data to be compressed. " which could fit the behavior we see. Already-compressed data will probably be larger if compressed again, so mod_deflate doesn't compress the data stream, but it does add the header (which is bad).
Personally, I would strongly recommend throwing out the entire legacy configuration and using the sample configuration in the mod_deflate docs:
The legacy configuration is: "don't compress data for some specific ancient versions of Mozilla", "but do compress data if it is an ancient MSIE pretending to be an ancient Mozilla", "but for some specific file types, don't compress data AND don't tell the client that its user-agent affected that decision". None of that seems necessary, and keeping it seems like a good way to keep running into Heisenbugs.
Well, it's a nice theory, but as I noted in the pr... this deflate config is only for fedoraproject.org, not src.fedoraproject.org or any other subdomain sites. Those are all using default DEFLATE settings as far as I can tell.
Quick update: Kevin was right, `deflate.conf` wasn't being applied to the `src` subdomains. I've completely pivoted the linked PR (infra/ansible#3170). The fix now targets the actual `distgit` role (lookaside.conf) and cleanly disables the global DEFLATE filter for the cache directory. We should be good to test it in staging once the freeze lifts
Well, staging isn't frozen, we can test in staging anytime. ;)
I'll comment on the pr. Thanks again for working on it.
ok, merged and live in staging. Can anyone test? A quick test here seems promising...
I can reproduce the problem using "https://copr-dist-git.fedorainfracloud.org/repo/pkgs/gordonmessmer/nodejs-electron/rust-dialoguer/dialoguer-0.12.0.crate/md5/778577241284848d0611892e61701e5b/dialoguer-0.12.0.crate"
Is there a path to that file that runs through staging systems?
The copr thing is completely separate as far as I know.
The fix should be live for src.stg.fedoraproject.org
I think I was seeing this here before, but now it seems to be working:
I don't think it's working:
When I get that file, the sha512sum doesn't match the one in the URL, and the file that's saved to disk isn't compressed.
Ah yes, I somehow thought they were not supposed to be compressed, but re-reading they are.
So, yeah, not working. ;(
@decathorpe wrote that "adding -H "Accept-Encoding: identity" to the curl CLI" worked around the problem, but:
I get the correct file when I request it this way, but the server's response still suggests that it's doing on-the-fly compression, which isn't supposed to happen.
Whereas, if I do this instead:
... I get the wrong (uncompressed) file as a result, but the response headers no longer indicate on-the-fly compression.
So I'm going to ask a very stupid question.
Has anyone looked at these files on disk to make sure they weren't mistakenly uncompressed after upload?
Note: As far as I can tell, curl and wget behave differently here, so that complicates things.
Ok, I dug into this quite some more.
So, it turns out the deflate config was a bit of a red herring. src doesn't even have gzip enabled, so it never applied the DEFLATE filter to anything. ;(
The real problem is the backend apache on pkgs01. It has `mod_mime_magic` enabled, which sniffs the `.crate` files, guesses they are tarballs, and tacks on `Content-Encoding: x-gzip` before sending it to the proxy.
This also explains why `wget` and `curl` were behaving differently. `wget` sees the false `x-gzip` header and helpfully decompresses it on the fly, saving a plain tar archive. `curl` ignores it unless you pass `--compressed`, but the pycurl bindings `fedpkg` uses try to be smart and decompress it too.
I have pushed a new fix that forces octet stream and unsets the encoding so it stops guessing. I tested locally and it seems to be working in that testing.
Once it's merged we can test in staging again...
Check the PR here and let me know what you think: infra/ansible#3193
@decathorpe @kevin @gordonmessmer
They do behave differently, and I think the curl man page explains why. In the section titled "OUTPUT":
> curl does not parse or otherwise "understand" the content it gets or writes as output. It does no encoding or decoding, unless explicitly asked to with dedicated command line options.
Something does seem wrong with the service, because adding the `-H "Accept-Encoding: identity"` option to curl sends that header as expected, but the service responds with a "content-encoding: gzip" header and a compressed response. Curl saves the compressed response to a file, just like the documentation says it will.
I think the reason this seems confusing is that in some cases we're directly modifying curl's headers, but doing that doesn't change curl's behavior beyond the header it sends. Adding a header to the request does not change how curl handles the response. The header we add on the command line is basically opaque to the application.
We actually get an uncompressed file in at least 3 situations:
We only get a compressed file if we use curl and mangle its request by inserting a header that results in a compressed reply.
It's possible that there's some bad interaction between some pair of systems in the back end path, but the simplest explanation for the behavior we're seeing is that the files were inflated by mistake during the upload process.
Makes sense. In that case, the backend is setting a content-encoding header indicating that the file is being compressed, the proxy determines that the client did not indicate it can accept gzip encoding, and decompresses the file on the fly to match the client's expectations.
A different change that would confirm that would be to update /etc/httpd/conf/magic, replacing:
with:
And then I can argue with the httpd developers that this is a bug, because https://httpwg.org/specs/rfc9110.html#field.content-encoding states:
"If the media type includes an inherent encoding, such as a data format that is always compressed, then that encoding would not be restated in Content-Encoding"
I merged the pr... can folks retest?
I can try and revert and do the magic change after that if still desired?
No change to any of the 4 commands I outlined earlier.
Well, technically one change. "content-type: application/x-tar" has changed to "content-type: application/octet-stream", but the file saved to disk is the same, regardless.
I checked out an older version of the rust-jod-thread package to get its crate and sha512 sum from the sources file:
For the older file, the sha512 sum of the file matches the URL.
For the new file, the sha512 sum does not match the URL.
It seems very very likely that something changed with the upload process, and the new crate was inflated before it was written to storage in the lookaside cache.
There's some indication that the proxies are doing weird things (e.g. returning compressed data when curl requests "identity" encoding), but I think the problem is elsewhere.
@kevin checked the files on disk for me and confirmed they match the expected sha512sums.
So, I tried a few more things, and I can confirm this behavior in the default httpd config:
vs:
And the problem can be resolved by turning off mod_mime_magic.
mod_mime_magic always does the wrong thing with gzip content. RFC 9110 specifically says that content-encoding: gzip should not be added to files just because they're compressed.
I'm not sure what the underlying cause is, but it's probably something like httpd doesn't have the correct magic for one of the tar file types. (mod_mime_magic won't add the content-encoding header if it can't identify the mime type of the uncompressed data)
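A simplified Python model of the sniffing behaviour described in this thread; it is not httpd's actual code. The helpers `sniff_inner_type` and `sniffed_headers` are hypothetical, and the "magic database" here recognises only the `ustar` marker at offset 257 of a tar header.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def sniff_inner_type(block: bytes):
    """Tiny stand-in for a magic database: recognise only ustar-style tar headers."""
    if block[257:262] == b"ustar":
        return "application/x-tar"
    return None


def sniffed_headers(data: bytes) -> dict:
    """Return the headers a content sniffer of this kind would add for `data`."""
    if data[:2] != GZIP_MAGIC:
        return {}
    try:
        # wbits=31 makes zlib expect a gzip wrapper; inflate only the first 512 bytes.
        inner = zlib.decompressobj(wbits=31).decompress(data, 512)
    except zlib.error:
        return {}
    inner_type = sniff_inner_type(inner)
    if inner_type is None:
        # Inner type unknown: no Content-Encoding header is added.
        return {}
    return {"Content-Type": inner_type, "Content-Encoding": "x-gzip"}


# 512-byte block shaped like a tar header (hypothetical sample, not a real archive).
fake_tar = b"\0" * 257 + b"ustar\0" + b"00" + b"\0" * 247
```

In this model a gzipped tarball is labelled as a tar file with a gzip content encoding, while gzipped data of an unrecognised type gets no extra headers, which matches the parenthetical above.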
@gordonmessmer wrote in #12812 (comment):
I did not expect that! It would appear that mod_mime_magic is broken by design.
Wow. ok... good detective work. ;)
So, what are our options then?
This has been a pretty confusing issue, but one thing I'll note: external access to these files shows the issue. Internal access (for builds, which uses another set of proxies that does not have anubis enabled) doesn't? At least official builds of things using these crates have worked, just not external access? So, to me that points to the pass via anubis being involved. that is: request -> apache -> proxy to anubis -> anubis proxies back to apache -> apache proxies to backend. But it shouldn't change anything, it's just weighing the connection and if it's ok passing it along...
I recommend this regardless of any other issue. If other things break, a static mime mapping can be added to un-break them. For any file that doesn't get a mime type based on its extension, mod_mime_magic will look for a couple of bytes that indicate that it might be compressed data, and if they are present, it will fork/exec gzip and pass a block of data to that process.
Passing untrusted data to gzip, which might be vulnerable to "zip bombs" or other vulnerabilities is, frankly, insane. I already have a patch to remove that specific functionality from mod_mime_magic, and I'm going to ask the httpd developers to drop it.
That should fix the problem, whether or not mod_mime_magic is enabled (since mod_mime_magic only activates for files that don't have a mime type from another source).
You can add a line to /etc/mime.types:
Or to an httpd.conf file:
And you can test the configuration by downloading a file that ends in ".0.crate" or ".9.crate", such as jod-thread-1.0.0.crate
Why "0.crate" specifically? Apparently because a file that ends in ".<1-8>.crate" will be "application/x-troff-man" because of this entry in /etc/mime.types:
... which explains why the problem only affects a small number of files. mod_mime_magic won't add the bad content-encoding header to files that end in ".[1-8].crate" or ".[1-8].*.crate"
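The file-name rule described here can be modelled in a few lines of Python. This is a sketch of the behaviour as reported in this comment, not Apache's implementation: it assumes every dot-separated component after the first counts as an extension and that the extensions "1" to "8" map to `application/x-troff-man`. The function names are hypothetical.

```python
# Assumption taken from this thread: extensions "1".."8" are man-page sections.
EXTENSION_TYPES = {str(n): "application/x-troff-man" for n in range(1, 9)}


def type_from_extensions(filename: str):
    """Return the content type derived from the file name, or None if there is none."""
    _, *extensions = filename.split(".")
    content_type = None
    for ext in extensions:
        if ext in EXTENSION_TYPES:
            # A later (rightmost) matching extension overrides an earlier one.
            content_type = EXTENSION_TYPES[ext]
    return content_type


def magic_sniffing_applies(filename: str) -> bool:
    """Content sniffing only runs when the name yields no type at all."""
    return type_from_extensions(filename) is None
```

Under these assumptions `jod-thread-1.0.0.crate` and `dialoguer-0.12.0.crate` get no type from their names and fall through to content sniffing, while a name such as `example-1.2.3.crate` is typed by its `.3` component and is left alone.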
ok. Anyone want to do a pull request disabling mod_mime_magic? we can test it in stg...
On it
Trivia: Apache will disable this module by default in a future release:
github.com/apache/httpd@10232a0180