dist-git lookaside cache files served with wrong HTTP headers #12812

Open
opened 2025-09-21 22:01:00 +00:00 by decathorpe · 70 comments

It looks like .tar.gz / .crate files that are served by the lookaside cache have the following http headers set:

content-type: application/x-tar
content-encoding: x-gzip

This causes some clients (those without workarounds) to uncompress .tar.gz files on-the-fly while downloading, which results in plain tarfiles being stored on disk (but with .tar.gz file extensions).

It looks like fedpkg sources already works despite that, similar to how spectool -g also handles this case (due to a workaround that's present in its download handling code that suppresses on-the-fly decompression of gzipped files).

An indicator for when this breaks is checksum mismatches, like the one that happened in https://koji.fedoraproject.org/koji/taskinfo?taskID=137346319

Also see https://discussion.fedoraproject.org/t/verifying-the-authenticity-of-files-uploaded-to-the-lookaside-cache/134196 for a discussion thread where the "clients uncompress stuff on-the-fly even if you really don't want that to happen" was also relevant.


In general, it looks like the lookaside cache should not guess the file's MIME type or encoding at all. Shouldn't it instead always set content-type: application/octet-stream (the default for unknown / binary data, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types/Common_types) and send responses with no content-encoding header present at all (since the data is supposed to be used as-is, without decoding first)?

The headers that are sent currently (content-type: application/x-tar, content-encoding: x-gzip) basically tell clients to "decompress this with gzip to get the .tar file you ordered", which is wrong and explains why clients that don't specifically override this behaviour get un-gzipped plain "tar" files (with ".tar.gz" file endings) or un-gzipped ".crate" files (which are just .tar.gz in disguise).
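The following Python sketch (stdlib only) makes the failure mode concrete. It shows what a client that honours `content-encoding: x-gzip` does to a `.crate` payload. It builds a throwaway tarball in memory as a stand-in for a real crate; the file name and contents are made up for illustration. It then gzips the tarball once, as the `.crate` format does, and applies the single gzip decode that the current headers ask for. The result is a plain tar, so its SHA-512 no longer matches the one recorded at upload time.

```python
import gzip
import hashlib
import io
import tarfile


def make_fake_crate() -> bytes:
    """Build a tiny .tar.gz in memory, standing in for a real .crate file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b'[package]\nname = "example"\n'  # hypothetical content
        info = tarfile.TarInfo("example-0.1.0/Cargo.toml")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    # A .crate is just a gzip-compressed tarball.
    return gzip.compress(buf.getvalue())


def client_honouring_x_gzip(body: bytes) -> bytes:
    """What a client does when told 'content-encoding: x-gzip': gunzip once."""
    return gzip.decompress(body)


crate = make_fake_crate()
expected = hashlib.sha512(crate).hexdigest()  # checksum recorded at upload time

saved = client_honouring_x_gzip(crate)  # what ends up on disk
print(saved[257:262])  # b'ustar': a plain tar archive, not a .tar.gz
print(hashlib.sha512(saved).hexdigest() == expected)  # False: checksum mismatch
```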


Additionally, for an as-yet-unknown reason, the anubis-enabled src.fedoraproject.org seems to have caused fedpkg sources from fedpkg-minimal to break with this (or a very similar) failure mode, causing checksum mismatches during koji builds. I don't see a difference in the HTTP headers that would explain the difference in behaviour, though. :(

Owner

To note: I have disabled anubis for src.fedoraproject.org for now, but we should figure out a workaround so we can re-enable it.

Author

It should be relatively easy to work around this issue in clients like fedpkg-minimal - spectool does two things for this case:

  • set Accept-Encoding: identity for GET requests
  • set decode_content=False on the stream

It appears that both are needed to cover all cases (i.e. don't break downloading plain-text files and don't decompress gzipped files sent by confused servers).
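spectool's approach, as described above, combines `Accept-Encoding: identity` with `decode_content=False` on the response stream. Below is a rough stdlib-only analogue in Python; it is not spectool's actual code. `urllib.request` never decodes `Content-Encoding` on its own, so the response bytes are written to disk exactly as received. The explicit header asks servers and proxies not to apply any content coding. Python's `http.client` already sends `Accept-Encoding: identity` by default, so setting it here mainly documents the intent. The function names and chunk size are illustrative assumptions.

```python
import shutil
import urllib.request


def build_lookaside_request(url: str) -> urllib.request.Request:
    """Create a GET request that asks for the bytes without any content coding."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


def download_as_is(url: str, dest: str, timeout: float = 60.0) -> None:
    """Save the response body verbatim.

    urllib does not transparently gunzip responses, so even if a confused
    server sends 'Content-Encoding: x-gzip', the stored file keeps the exact
    bytes that were sent (e.g. the original .tar.gz / .crate).
    """
    req = build_lookaside_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=64 * 1024)
```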

Owner

Metadata Update from @zlopez:

  • Issue priority set to: Waiting on Assignee (was: Needs Review)
  • Issue tagged with: high-gain, medium-trouble, ops
Owner

Yeah, although with that we don't fix it for external users? or do we... does 'fedpkg' (non minimal) do the right thing?

Author

It appears that non-minimal fedpkg sources downloads the files correctly, but I'm not sure what the difference is: fedpkg uses pycurl and fedpkg-minimal uses the curl CLI, and they appear to use the same settings (i.e. they don't seem to set any headers at all, other than Pragma: ). So, I'm quite confused. :)

Owner

so... can you duplicate this with src.stg.fedoraproject.org?

(we may need to upload/sync some lookaside files there)

if so, I can reenable anubis there and I can try some fixes on the server end...

ben@bean:~/fedora/rust-sig/rust-jod-thread$ fedpkg -d -v sources
Creating repo object from /home/ben/fedora/rust-sig/rust-jod-thread
Initiating a koji session to https://koji.fedoraproject.org/kojihub
Downloading jod-thread-1.0.0.crate from https://src.fedoraproject.org/repo/pkgs
Full url: https://src.fedoraproject.org/repo/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate
######################################################################## 100.0%
$ curl -L https://src.fedoraproject.org/repo/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate | tar -tz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6407  100  6407    0     0  18165      0 --:--:-- --:--:-- --:--:-- 18150
jod-thread-1.0.0/.cargo_vcs_info.json
jod-thread-1.0.0/.gitignore
jod-thread-1.0.0/Cargo.lock
jod-thread-1.0.0/Cargo.toml
jod-thread-1.0.0/Cargo.toml.orig
jod-thread-1.0.0/LICENSE-APACHE
jod-thread-1.0.0/LICENSE-MIT
jod-thread-1.0.0/README.md
jod-thread-1.0.0/src/lib.rs
$ curl -L https://src.stg.fedoraproject.org/repo/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>

Is that what you meant to try?

Author

I can't seem to find any packages that have uploaded sources to the stg lookaside cache. Do you have an example?

Owner

The intent was that stg would have a read only copy of the prod lookaside.

It did, but there was a messed up link to it. ;)

Can you try again now?

Author
$ curl -I https://src.stg.fedoraproject.org/repo/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate
(...)
content-type: application/x-tar
content-encoding: x-gzip
(...)

Which are the same headers as for the non-stg variant.

Running wget for both URLs (with and without .stg domain) gives me plain uncompressed .tar archives:

$ file jod-thread-1.0.0.crate 
jod-thread-1.0.0.crate: POSIX tar archive (GNU)

... which is exactly what the HTTP headers sent by the server tell wget to do.

By contrast, curling the same URLs gives me:

$ file jod-thread-1.0.0.crate
jod-thread-1.0.0.crate: gzip compressed data, was "jod-thread-1.0.0.crate", max compression, original size modulo 2^32 25088 

So behaviour of what's saved to disk seems to depend on the tool that's used. FWIW I don't think "curl" is doing the correct thing here. :)
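You can tell which of the two outcomes you got without relying on `file` by checking the magic bytes. Gzip streams start with `1f 8b`, and POSIX/GNU tar archives carry `ustar` at byte offset 257. The helper below is a hypothetical convenience, not part of fedpkg or spectool.

```python
import gzip
import io
import tarfile


def classify_download(data: bytes) -> str:
    """Classify a downloaded .tar.gz/.crate payload by its magic bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"  # still compressed: the bytes the uploader stored
    if data[257:262] == b"ustar":
        return "tar"  # already gunzipped by the client: checksum will not match
    return "unknown"


def classify_file(path: str) -> str:
    """Read just enough of the file to see both magic-byte locations."""
    with open(path, "rb") as f:
        return classify_download(f.read(512))


# Demonstration with an in-memory tarball (contents are arbitrary).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"hello\n"
    info = tarfile.TarInfo("demo/README.md")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
plain_tar = buf.getvalue()

print(classify_download(gzip.compress(plain_tar)))  # gzip
print(classify_download(plain_tar))  # tar
```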

Owner

So I am a bit confused here.

stg is behaving the same as prod? Or it's not?

The reason I wanted to check that is so that I could try and change things in staging to find a 'fix' on the server side and then we could deploy it on prod.

I just noticed that I had disabled anubis in staging src too. I am turning it back on so we can see if it has the behavior...

Author

> So I am a bit confused here.
>
> stg is behaving the same as prod? Or it's not?

As far as I can tell, they're giving HTTP responses with identical headers, yes - but if you disabled anubis on src.stg.fp.o too, that is kind of expected :)

I'll try again later.

Owner

anubis is now enabled on src.stg again.

Author

I tried again, looks like curl is still giving me (mostly) the same HTTP headers for both:

$ curl -I https://src.fedoraproject.org/repo/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate | sort 

accept-ranges: bytes
apptime: D=64700
content-encoding: x-gzip
content-length: 6407
content-type: application/x-tar
date: Fri, 10 Oct 2025 18:21:42 GMT
etag: "1907-63f20ec0c5b00"
HTTP/2 200 
last-modified: Fri, 19 Sep 2025 05:41:32 GMT
referrer-policy: same-origin
server: Apache
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
x-fedora-appserver: pkgs01.rdu3.fedoraproject.org
x-fedora-proxyserver: proxy10.rdu3.fedoraproject.org
x-fedora-requestid: aOlOthUO8dOW_ab7KBDV2QAI0xQ
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
$ curl -I https://src.stg.fedoraproject.org/repo/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate | sort 

accept-ranges: bytes
apptime: D=73227
content-encoding: x-gzip
content-length: 6407
content-type: application/x-tar
date: Fri, 10 Oct 2025 18:21:46 GMT
etag: "1907-63f20ec0c5b00"
HTTP/2 200 
last-modified: Fri, 19 Sep 2025 05:41:32 GMT
referrer-policy: same-origin
server: Apache
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
x-fedora-appserver: pkgs01.stg.rdu3.fedoraproject.org
x-fedora-proxyserver: proxy02.stg.rdu3.fedoraproject.org
x-fedora-requestid: aOlOug3vXHWevdWTx-OcoQAAENE
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block

@music, can you try reproducing the issue again? I can't seem to see any differences here.

(The headers are still wrong IMO - they should tell clients to use the data as-is, not that it needs to be gzip-decompressed on receipt - but they're not different between stg and prod as far as I can tell.)
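The sketch below gives a rough way to check a server-side fix, in staging or prod, against this expectation. It parses the header lines that `curl -I` prints and flags the two problems discussed in this issue. The first is a `content-encoding` header on a file that should be served verbatim. The second is a guessed `content-type` instead of `application/octet-stream`. The function names and rules are assumptions drawn from the issue description, not an existing tool.

```python
def parse_curl_headers(text: str) -> dict[str, str]:
    """Parse 'name: value' lines (as printed by curl -I) into a lowercase dict."""
    headers = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skips the status line ("HTTP/2 200") and blank lines
            headers[name.strip().lower()] = value.strip()
    return headers


def lookaside_header_problems(headers: dict[str, str]) -> list[str]:
    """Return human-readable complaints about headers for a verbatim download."""
    problems = []
    if "content-encoding" in headers:
        problems.append(f"unexpected content-encoding: {headers['content-encoding']}")
    ctype = headers.get("content-type", "")
    if ctype.split(";")[0].strip() != "application/octet-stream":
        problems.append(f"content-type is {ctype or 'missing'}, expected application/octet-stream")
    return problems


# Excerpt of the headers currently returned for jod-thread-1.0.0.crate.
sample = """\
HTTP/2 200
content-encoding: x-gzip
content-length: 6407
content-type: application/x-tar
"""
for problem in lookaside_header_problems(parse_curl_headers(sample)):
    print(problem)
# unexpected content-encoding: x-gzip
# content-type is application/x-tar, expected application/octet-stream
```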

Owner

Looks like someone else hit this in #12842

I have switched prod anubis off on src for now.

Owner

Rough and dirty PR... I applied it in staging already, and it changes the content-type... but it still ungzips it.

https://pagure.io/fedora-infra/ansible/pull-request/2904

Owner

ok. I had to put it back... things were just too bad without anubis. ;(

That said, the internal proxies that are used by builders don't have it enabled, so I think real builds should work fine.


> Rough and dirty PR... I applied it in staging already, and it changes the content-type... but it still ungzips it.
>
> https://pagure.io/fedora-infra/ansible/pull-request/2904

Hmm, I wouldn’t expect changing just the Content-Type to affect the Content-Encoding (unless it happened to interact with a pre-existing MIME-based rule for content encoding), so at least that result is explicable.

I tried fedpkg sources for rust-jod-thread, and it’s failing again:

$ fedpkg sources
Downloading jod-thread-1.0.0.crate from https://src.fedoraproject.org/repo/pkgs

Could not execute sources: jod-thread-1.0.0.crate failed checksum

I tried fedpkg scratch-build and that failed, too:

DEBUG util.py:461:  jod-thread-1.0.0.crate: FAILED
DEBUG util.py:459:  sha512sum: WARNING: 1 computed checksum did NOT match

https://koji.fedoraproject.org/koji/taskinfo?taskID=138160053
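Both failures come from the same check. The downloaded file's SHA-512 is compared with the value recorded in the package's `sources` file, which for current dist-git packages uses tagged lines of the form `SHA512 (name) = hexdigest`. Below is a minimal sketch of that comparison; it is an illustration, not rpkg's implementation, and the helper names are hypothetical.

```python
import hashlib
import re
from collections.abc import Iterable

# Tagged (BSD-style) checksum line, e.g. "SHA512 (foo-1.0.crate) = 14d0...".
SOURCES_LINE = re.compile(r"^(?P<algo>\w+) \((?P<name>.+)\) = (?P<digest>[0-9a-fA-F]+)$")


def parse_sources(text: str) -> dict[str, tuple[str, str]]:
    """Map file name -> (hash algorithm, expected hex digest)."""
    entries = {}
    for line in text.splitlines():
        m = SOURCES_LINE.match(line.strip())
        if m:
            entries[m["name"]] = (m["algo"].lower(), m["digest"].lower())
    return entries


def digest_matches(chunks: Iterable[bytes], algo: str, expected: str) -> bool:
    """Hash a stream of byte chunks and compare against the expected digest."""
    h = hashlib.new(algo)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest() == expected


def file_matches(path: str, algo: str, expected: str) -> bool:
    """Hash a file in 64 KiB chunks; a silently gunzipped download fails here."""
    with open(path, "rb") as f:
        return digest_matches(iter(lambda: f.read(1 << 16), b""), algo, expected)
```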

> That said, the internal proxies that are used by builders don't have it enabled, so I think real builds should work fine.

I think a (dist-git based, not --srpm) scratch build should be the same as a real build, right? If so, then I expect real builds to fail as well.


I haven’t taken the time to dig through the Ansible configurations and understand how they actually fit together, but is the lookaside cache governed by the mod_deflate configuration in https://pagure.io/fedora-infra/ansible/blob/main/f/roles/fedora-web/main/files/deflate.conf ? Might something like this work?

<Directory /srv/cache/lookaside>
    SetOutputFilter NONE
    […]
</Directory>

I’m not sure if you would need to mess around with the Vary header as well or not, or if something like

<Directory /srv/cache/lookaside>
    SetEnv no-gzip
    SetEnv dont-vary
    […]
</Directory>

would be processed in the right order with respect to the mod_deflate configuration to have the desired effect.

Member

I think the "content-encoding: x-gzip" part is coming from the proxies. But while it's not useful for lookaside data, clients should deal with that.


> I think the "content-encoding: x-gzip" part is coming from the proxies. But while it's not useful for lookaside data, clients should deal with that.

The clients do deal with it, by removing the declared content-encoding to obtain what the server is claiming should be the actual payload. The problem is that the content-encoding header is lying. For it to be accurate, the HTTP response would have to be gzipped (again, on top of the gzip encoding that happens to be the outermost layer in the .crate file format).
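The stdlib Python sketch below illustrates the point with arbitrary stand-in bytes. If the server really wanted to send `content-encoding: gzip` for a `.crate`, the body would have to be gzipped a second time, so that a client's single decode step restores the original file. Currently the header is added, but the extra gzip layer is not.

```python
import gzip

# Stand-in for a .crate: already a gzip stream (contents are arbitrary).
crate = gzip.compress(b"pretend this is a tar archive")

# Consistent: the header says gzip AND the body carries an extra gzip layer,
# so the client's single decode step yields the original .crate.
consistent_body = gzip.compress(crate)
print(gzip.decompress(consistent_body) == crate)  # True

# Current behaviour: the header says gzip, but the body is just the .crate,
# so the client strips the crate's own gzip layer and saves the inner data.
mislabelled_body = crate
print(gzip.decompress(mislabelled_body) == crate)  # False
```

The simplest consistent option is the one proposed in the issue: drop `content-encoding` entirely, so clients use the body verbatim.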

Author

OOF this is now hitting me too ...

Is there anything we can do, at least as a temporary workaround? Maybe patch fedpkg-minimal (or whatever downloads the sources in the buildSRPMfromSCM task in koji) to avoid decompressing stuff on-the-fly?

Author

It appears that adding -H "Accept-Encoding: identity" to the curl CLI call in fedpkg-minimal works around this issue. I'll file a PR with that ...

Author
https://src.fedoraproject.org/rpms/fedpkg-minimal/pull-request/2
Owner

I think that's fine, but it's going to take a while to get that to where it can help. I was hoping we could do something on the server side to get things working faster. ;(

Sigh, I found that DNS was wrong and internal hosts were using proxy01/10 instead of proxy101/110.
I have changed it over now; please retry your scratch build from SCM, and I hope it will work.

Author

Ha, it's always DNS 😆 I can confirm that a build that failed earlier this evening due to a checksum issue now passed. 🎈


sccache also started building for me

Member

Okay, I'm going to close this as it looks like it was all DNS :(

Member

Metadata Update from @james:

  • Issue close_status updated to: Fixed
  • Issue status updated to: Closed (was: Open)

> Okay, I'm going to close this as it looks like it was all DNS :(

I wouldn’t consider this fixed. This is no longer breaking koji builds from dist-git, because the DNS issue causing koji to use the external proxies was fixed. However, things are still utterly broken for anything that accesses the lookaside cache externally, including COPR builds from dist-git (e.g. https://copr.fedorainfracloud.org/coprs/music/tungstenite/build/9704757/) and things like:

rust-jod-thread$ fedpkg sources
Downloading jod-thread-1.0.0.crate from https://src.fedoraproject.org/repo/pkgs

Could not execute sources: jod-thread-1.0.0.crate failed checksum

Trying to find, patch, and redeploy every piece of code that accesses the lookaside cache to set Accept-Encoding: identity à la https://src.fedoraproject.org/rpms/fedpkg-minimal/pull-request/2 is a plausible workaround. But the root problem is that the server and/or proxy are confused and are lying about what they are serving: they tack on a Content-Encoding: gzip header but don't actually add another layer of gzip compression.

Owner

Yeah. We should still try and fix the server end if we can, or failing that at least fix fedpkg/fedpkg-minimal.

Owner

Metadata Update from @kevin:

  • Issue status updated to: Open (was: Closed)
Author

> > Okay, I'm going to close this as it looks like it was all DNS :(
>
> I wouldn’t consider this fixed. This is no longer breaking koji builds from dist-git, because the DNS issue causing koji to use the external proxies was fixed. However, things are still utterly broken for anything that accesses the lookaside cache externally, including COPR builds from dist-git (e.g. https://copr.fedorainfracloud.org/coprs/music/tungstenite/build/9704757/) and things like:
>
> rust-jod-thread$ fedpkg sources
> Downloading jod-thread-1.0.0.crate from https://src.fedoraproject.org/repo/pkgs
>
> Could not execute sources: jod-thread-1.0.0.crate failed checksum

Ah, so fedpkg sources really is affected too? That explains why I couldn't find a reason why it shouldn't be :(

Does anybody want to submit a PR to fedpkg equivalent to the one I sent for fedpkg-minimal, or should I do that?

Owner

ok, I updated anubis today... can you see if it changed anything? There's a bunch of changes, but it's hard to tell if any would affect this issue.

```
ben@bean:~/fedora/rust-sig/rust-jod-thread$ fedpkg sources
Downloading jod-thread-1.0.0.crate from https://src.fedoraproject.org/repo/pkgs

Could not execute sources: jod-thread-1.0.0.crate failed checksum
```

Looks the same to me.

Author

Filed a PR for rpkg too: https://pagure.io/rpkg/pull-request/758 , PTAL.


I just encountered this problem in a real Koji build again.

https://koji.fedoraproject.org/koji/taskinfo?taskID=139122958


> I just encountered this problem in a real Koji build again.

Just to clarify the impact, the fact that this is happening in koji builds from dist-git again means that a large number of Rust packages will fail to build. No external workaround exists, and much Rust packaging work will be impossible until some kind of fix is implemented.

Owner

ok, I dug into this and I think I see what the problem is.

I did make changes to varnish to cache src, and I think it's doing the same thing anubis did...

I'll look at a commit to fix this.

Owner

ok, please try now. I tested and it seems to be working in that testing.


> ok, please try now. I tested and it seems to be working in that testing.

Yes, it looks like it’s working again!

https://koji.fedoraproject.org/koji/taskinfo?taskID=139139537

Thank you for investigating.

Owner

Sorry for the trouble... ;( I knew varnish would break something there, but it's really helping the scraper load...


> Sorry for the trouble... ;( I knew varnish would break something there, but it's really helping the scraper load...

We certainly need all the help we can get! Thanks for looking at it. I’m glad it wasn’t too difficult to get things back to the status quo.


I think I'm seeing this in copr, where I'm trying to build a rust package:

https://download.copr.fedorainfracloud.org/results/gordonmessmer/nodejs-electron/fedora-rawhide-x86_64/10071219-rust-dialoguer/builder-live.log.gz

```
INFO: Calling: curl -H Pragma: -o dialoguer-0.12.0.crate --location --connect-timeout 60 --retry 3 --retry-delay 10 --remote-time --show-error --fail --retry-all-errors https://copr-dist-git.fedorainfracloud.org/repo/pkgs/gordonmessmer/nodejs-electron/rust-dialoguer/dialoguer-0.12.0.crate/md5/778577241284848d0611892e61701e5b/dialoguer-0.12.0.crate
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  202k    0  202k    0     0  1498k      0 --:--:-- --:--:-- --:--:-- 1507k
INFO: Reading stdout from command: md5sum dialoguer-0.12.0.crate
ERROR: Check-sum 99310bece51622968cf251bf047c0277 is wrong, expected: 778577241284848d0611892e61701e5b
```

99310bece51622968cf251bf047c0277 is the check-sum of the file after un-compressing it.

```
$ curl -I https://copr-dist-git.fedorainfracloud.org/repo/pkgs/gordonmessmer/nodejs-electron/rust-dialoguer/dialoguer-0.12.0.crate/md5/778577241284848d0611892e61701e5b/dialoguer-0.12.0.crate | grep Encoding
Content-Encoding: x-gzip
```
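One quick way to tell whether a client decoded a download on the fly is to look at the first two bytes of the saved file: intact gzip data (and therefore an intact `.crate` or `.tar.gz`) always starts with `1f 8b`, while a copy that was inflated in transit is a plain tar archive. A minimal POSIX shell sketch; the function name `is_gzip` is just for illustration:

```shell
#!/bin/sh
# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 0x1f 0x8b.
# An intact .crate/.tar.gz passes; a copy that a client inflated on the fly
# (a plain tar archive) fails.
is_gzip() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage:
#   is_gzip dialoguer-0.12.0.crate || echo "file was decompressed in transit"
```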

Hi @kevin, @decathorpe, and @music. I spent some time digging into this.

It looks like the root cause wasn't the backend configuration, but rather the frontend proxies configured via roles/fedora-web/main/files/deflate.conf. The mod_deflate rules were intercepting everything that wasn't an image, applying gzip compression, and tacking on the header.

I've submitted a PR (infra/ansible#3170) to add .crate, .tar.gz, .rpm, and other common archives to the SetEnvIfNoCase exclusion list so the proxies pass them through natively.

Let me know what you think and if further refinements are needed.


> It looks like the root cause wasn't the backend configuration, but rather the frontend proxies

Thanks for tracking it down!

> The mod_deflate rules were intercepting everything that wasn't an image, applying gzip compression, and tacking on the header.

The bug that @decathorpe originally described appears to be that the proxies were *not* applying gzip compression, but tacking on the header anyway. That is very probably a bug in httpd. It might be a bug that's specific to reverse proxying.

I found [a discussion](https://stackoverflow.com/questions/48117245/apache-httpd-2-4-reverse-proxy-does-not-compress) that probably describes the same bug, which mentions [an entry](https://httpd.apache.org/docs/current/upgrading.html) in the upgrading document: "mod_deflate will now skip compression if it knows that the size overhead added by the compression is larger than the data to be compressed." That could fit the behavior we see. Already-compressed data will probably be larger if compressed again, so mod_deflate doesn't compress the data stream, but it does add the header (which is bad).
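The "already-compressed data gets larger" part is easy to confirm locally. This shell sketch gzips 64 KiB of random bytes (standing in for a crate's incompressible payload) and then gzips the result a second time; the file names are arbitrary. It only demonstrates the size growth, not that mod_deflate's size check is what actually fires on the proxies.

```shell
#!/bin/sh
# Show that re-compressing gzip output makes it larger, not smaller.
workdir=$(mktemp -d)
# Random data is effectively incompressible, like already-gzipped content.
dd if=/dev/urandom of="$workdir/payload" bs=1024 count=64 2>/dev/null
gzip -c "$workdir/payload" > "$workdir/once.gz"
gzip -c "$workdir/once.gz" > "$workdir/twice.gz"

# Strip whitespace so the numbers compare cleanly on any wc implementation.
once=$(wc -c < "$workdir/once.gz" | tr -d ' ')
twice=$(wc -c < "$workdir/twice.gz" | tr -d ' ')
echo "gzip once:  $once bytes"
echo "gzip twice: $twice bytes"
rm -rf "$workdir"
```

The second pass adds gzip header, block and trailer overhead without saving anything, so `twice` always comes out larger than `once`.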

Personally, I would strongly recommend throwing out the entire legacy configuration and using [the sample configuration](https://httpd.apache.org/docs/current/mod/mod_deflate.html) in the mod_deflate docs:

```
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript
```

The legacy configuration is: "don't compress data for some specific *ancient* versions of Mozilla", "but do compress data if it is an ancient MSIE pretending to be an ancient Mozilla", "but for some specific file types, don't compress data AND don't tell the client that its user-agent affected that decision". None of that seems necessary, and keeping it seems like a good way to keep running into Heisenbugs.

Owner

Well, it's a nice theory, but as I noted in the pr... this deflate config is only for fedoraproject.org, not src.fedoraproject.org or any other subdomain sites. Those are all using default DEFLATE settings as far as I can tell.


Quick update: Kevin was right, `deflate.conf` wasn't being applied to the `src` subdomains. I've completely pivoted the linked PR (infra/ansible#3170). The fix now targets the actual `distgit` role (`lookaside.conf`) and cleanly disables the global DEFLATE filter for the cache directory. We should be good to test it in staging once the freeze lifts.

Owner

Well, staging isn't frozen, we can test in staging anytime. ;)

I'll comment on the pr. Thanks again for working on it.

Owner

ok, merged and live in staging. Can anyone test? A quick test here seems promising...

I can reproduce the problem using "https://copr-dist-git.fedorainfracloud.org/repo/pkgs/gordonmessmer/nodejs-electron/rust-dialoguer/dialoguer-0.12.0.crate/md5/778577241284848d0611892e61701e5b/dialoguer-0.12.0.crate" Is there a path to that file that runs through staging systems?
Owner

The copr thing is completely separate as far as I know.

The fix should be live for src.stg.fedoraproject.org

I think I was seeing this here before, but now it seems to be working:

```
curl https://src.stg.fedoraproject.org/lookaside/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate
```

I don't think it's working:

```
$ wget https://src.stg.fedoraproject.org/lookaside/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate

$ sha512sum jod-thread-1.0.0.crate
1d495ff29fdcfbda5e170e4fec1887cf06c18d4b39b220cf71af2f641587863439f7719c357a76d02b59d23205bddd24d69bf10fa5acd669fcc64173c3cd9371  jod-thread-1.0.0.crate

$ file jod-thread-1.0.0.crate
jod-thread-1.0.0.crate: POSIX tar archive (GNU)
```

When I get that file, the sha512sum doesn't match the one in the URL, and the file that's saved to disk isn't compressed.
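The lookaside URLs in this thread all follow the pattern `…/<name>/<algorithm>/<hash>/<name>`, so a download can be checked against its own URL. A minimal sketch, assuming that layout and that a matching `<algorithm>sum` tool (`sha512sum`, `md5sum`) is installed; `verify_lookaside` is a hypothetical helper name:

```shell
#!/bin/sh
# verify_lookaside URL FILE: compare FILE's digest with the one embedded in URL.
# Assumes the lookaside layout .../<name>/<algo>/<hash>/<name>, where <algo>
# (e.g. sha512, md5) has a matching "<algo>sum" command.
verify_lookaside() {
    url=$1
    file=$2
    algo=$(printf '%s\n' "$url" | awk -F/ '{ print $(NF-2) }')
    expected=$(printf '%s\n' "$url" | awk -F/ '{ print $(NF-1) }')
    actual=$("${algo}sum" "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches the $algo in the URL"
    else
        echo "MISMATCH: $file is $actual, URL says $expected" >&2
        return 1
    fi
}

# Usage:
#   verify_lookaside "$url" jod-thread-1.0.0.crate
```

Run against the `wget` download above, this would report a mismatch, since the saved file hashes to `1d495ff2…` rather than the `14d0851a…` in the URL.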

Owner

Ah yes, I somehow thought they were not supposed to be compressed, but re-reading they are.

So, yeah, not working. ;(


@decathorpe wrote that adding `-H "Accept-Encoding: identity"` to the curl CLI worked around the problem, but:

```
$ curl -D - -H "Accept-Encoding: identity" https://src.stg.fedoraproject.org/lookaside/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate -o jod-thread-1.0.0.crate
...
content-encoding: x-gzip
```

I get the correct file when I request it this way, but the server's response still suggests that it's doing on-the-fly compression, which isn't supposed to happen.

Whereas, if I do this instead:

```
wget -S --progress=none --no-compression https://src.stg.fedoraproject.org/lookaside/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate
```

... I get the wrong (uncompressed) file as a result, but the response headers no longer indicate on-the-fly compression.

So I'm going to ask a very stupid question.

Has anyone looked at these files on disk to make sure they weren't mistakenly uncompressed after upload?

Author

Note: As far as I can tell, curl and wget behave differently here, so that complicates things.


Ok, I dug into this quite some more.

So, it turns out the deflate config was a bit of a red herring. src doesn't even have gzip enabled, so it never applied the DEFLATE filter to anything. ;(

The real problem is the backend apache on pkgs01. It has `mod_mime_magic` enabled, which sniffs the `.crate` files, guesses they are tarballs, and tacks on `Content-Encoding: x-gzip` before sending it to the proxy.

This also explains why `wget` and `curl` were behaving differently. `wget` sees the false `x-gzip` header and helpfully decompresses it on the fly, saving a plain tar archive. `curl` ignores it unless you pass `--compressed`, but the pycurl bindings `fedpkg` uses try to be smart and decompress it too.

I have pushed a new fix that forces octet stream and unsets the encoding so it stops guessing. I tested locally and it seems to be working in that testing.

Once it's merged we can test in staging again...

Check the PR here and let me know what you think: infra/ansible#3193

@decathorpe @kevin @gordonmessmer


They do behave differently, and I think the curl man page explains why. In the section titled "OUTPUT":

> curl does not parse or otherwise "understand" the content it gets or writes as output. It does no encoding or decoding, unless explicitly asked to with dedicated command line options.

Something does seem wrong with the service, because adding the `-H "Accept-Encoding: identity"` option to curl sends that header as expected, but the service responds with a "content-encoding: x-gzip" header and a compressed response. Curl saves the compressed response to a file, just like the documentation says it will.

I think the reason this seems confusing is that in some cases we're directly modifying curl's headers, but doing that doesn't change curl's behavior beyond the header it sends. Adding a header to the request does not change how curl handles the response. The header we add on the command line is basically opaque to the application.

We actually get an uncompressed file in at least 3 situations:

  1. using wget
  2. using curl without additional options
  3. using curl with the --compressed option

We only get a compressed file if we use curl and mangle its request by inserting a header that results in a compressed reply.

It's possible that there's some bad interaction between some pair of systems in the back end path, but the simplest explanation for the behavior we're seeing is that the files were inflated by mistake during the upload process.

```
crate=https://src.stg.fedoraproject.org/lookaside/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate

### Retrieve the file with wget

\rm jod-thread-1.0.0.crate*

wget --debug -S --progress=none --no-compression $crate
# Note that there is no "content-encoding" header in the response

file jod-thread-1.0.0.crate
# jod-thread-1.0.0.crate: POSIX tar archive (GNU)

sha512sum jod-thread-1.0.0.crate
# Note that the sha512sum does not match the sha512sum in the URL, because this file is uncompressed

### Retrieve the file with curl, default request

\rm jod-thread-1.0.0.crate*

curl -v $crate -o jod-thread-1.0.0.crate
# Same as wget: no "content-encoding" header this time

file jod-thread-1.0.0.crate
# Same as wget: jod-thread-1.0.0.crate: POSIX tar archive (GNU)

sha512sum jod-thread-1.0.0.crate
# Same as wget, this is the uncompressed file

### Request the file with curl, add an encoding header

\rm jod-thread-1.0.0.crate*

curl -v -H "Accept-Encoding: identity" $crate -o jod-thread-1.0.0.crate
# Accept-Encoding: identity in the request headers
# content-encoding: x-gzip in the response headers! Why? We said we only accept identity!

file jod-thread-1.0.0.crate
# jod-thread-1.0.0.crate: gzip compressed data, was "jod-thread-1.0.0.crate", max compression, original size modulo 2^32 25088
# We now have a compressed file, but that's because curl wasn't told to make a request that accepted compression!

sha512sum jod-thread-1.0.0.crate
# This file has the right sha512sum, because it's compressed

### Request the file with curl, tell curl to add its own headers and manage compression

\rm jod-thread-1.0.0.crate*

curl -v --compressed $crate -o jod-thread-1.0.0.crate
# Accept-Encoding: deflate, gzip in the request headers
# content-encoding: gzip in the response headers

file jod-thread-1.0.0.crate
# jod-thread-1.0.0.crate: POSIX tar archive (GNU)
# curl was told to make a request that accepted compression, and it inflated the result

sha512sum jod-thread-1.0.0.crate
# checksum does not match URL
```

> The real problem is the backend apache on pkgs01. It has mod_mime_magic enabled, which sniffs the .crate files, guesses they are tarballs, and tacks on Content-Encoding: x-gzip before sending it to the proxy.

Makes sense. In that case, the backend is setting a content-encoding header indicating that the file is being compressed, the proxy determines that the client did not indicate it can accept gzip encoding, and decompresses the file on the fly to match the client's expectations.

A different change that would confirm that would be to update /etc/httpd/conf/magic, replacing:

```
# gzip (GNU zip, not to be confused with [Info-ZIP/PKWARE] zip archiver)
0       string          \037\213        application/octet-stream        x-gzip
```

with:

```
# gzip (GNU zip, not to be confused with [Info-ZIP/PKWARE] zip archiver)
0       string          \037\213        application/octet-stream
```
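That edit can be scripted. The sketch below drops only the trailing `x-gzip` encoding field from the gzip rule and keeps a `.bak` copy of the original. It assumes the rule is laid out as quoted above (offset, `string`, `\037\213`, MIME type, encoding, separated by whitespace); any other formatting is left untouched. `strip_gzip_encoding` is a hypothetical helper name, and httpd would still need a reload afterwards.

```shell
#!/bin/sh
# strip_gzip_encoding MAGICFILE: remove the "x-gzip" encoding field from the
# gzip rule so mod_mime_magic reports only a MIME type for gzip data.
# The original file is kept as MAGICFILE.bak.
strip_gzip_encoding() {
    sed -i.bak -E \
        's|^(0[[:space:]]+string[[:space:]]+\\037\\213[[:space:]]+application/octet-stream)[[:space:]]+x-gzip[[:space:]]*$|\1|' \
        "$1"
}

# Usage (on the backend):
#   strip_gzip_encoding /etc/httpd/conf/magic
```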

And then I can argue with the httpd developers that this is a bug, because https://httpwg.org/specs/rfc9110.html#field.content-encoding states:

> If the media type includes an inherent encoding, such as a data format that is always compressed, then that encoding would not be restated in Content-Encoding

Owner

I merged the pr... can folks retest?

I can try and revert and do the magic change after that if still desired?


No change to any of the 4 commands I outlined earlier.


Well, technically one change. "content-type: application/x-tar" has changed to "content-type: application/octet-stream", but the file saved to disk is the same, regardless.


I checked out an older version of the rust-jod-thread package to get its crate and sha512 sum from the sources file:

```
curl https://src.stg.fedoraproject.org/lookaside/pkgs/rpms/rust-jod-thread/jod-thread-0.1.2.crate/sha512/fe3a3feb983b273bf86ec26dcf4edbb1fc0c5f583c3115cedcc63279cb72f0b40bf4134f95d673d5f3e532bcbeafff09759509f55543c98850e750aea39711e2/jod-thread-0.1.2.crate | sha512sum
fe3a3feb983b273bf86ec26dcf4edbb1fc0c5f583c3115cedcc63279cb72f0b40bf4134f95d673d5f3e532bcbeafff09759509f55543c98850e750aea39711e2  -
```

For the older file, the sha512 sum of the file matches the URL.

```
curl https://src.stg.fedoraproject.org/lookaside/pkgs/rpms/rust-jod-thread/jod-thread-1.0.0.crate/sha512/14d0851a0a7d8d805a81313e6ec60a778267acd83f600d259dffbf63fe3f7ebd6a8d98d3ed49a1cb271ff024fac2c35acc1b287d5fb91f4bbe52bb3df3f2b4b1/jod-thread-1.0.0.crate | sha512sum
1d495ff29fdcfbda5e170e4fec1887cf06c18d4b39b220cf71af2f641587863439f7719c357a76d02b59d23205bddd24d69bf10fa5acd669fcc64173c3cd9371
```

For the new file, the sha512 sum does not match the URL.

It seems very very likely that something changed with the upload process, and the new crate was inflated before it was written to storage in the lookaside cache.

There's some indication that the proxies are doing weird things (e.g. returning compressed data when curl requests "identity" encoding), but I think the problem is elsewhere.


@kevin checked the files on disk for me and confirmed they match the expected sha512sums.

So, I tried a few more things, and I can confirm this behavior in the default httpd config:

curl -v --compressed http://localhost:8080/jod-thread-1.0.0.crate
...
< Content-Type: application/x-tar
< Content-Encoding: gzip
# client decompresses data, incorrectly

vs:

curl -v --compressed http://localhost:8080/jod-thread-0.1.2.crate
...
< Content-Type: application/x-troff-man
# client does not decompress this data
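
The sketch below shows why honouring `Content-Encoding: gzip` breaks the checksum. It simulates locally what a decoding client effectively does with the body before writing it to disk; no HTTP is involved. The name `save_like_client` is hypothetical, and real clients differ in which encodings they accept and decode.

```shell
#!/usr/bin/env bash
# Hypothetical simulation: save a response body the way a client that honours
# Content-Encoding would. $1 = body file, $2 = Content-Encoding value (empty if
# the header is absent), $3 = output file.
save_like_client() {
  local body=$1 encoding=$2 out=$3
  case $encoding in
    gzip|x-gzip) gzip -dc -- "$body" > "$out" ;;  # decoded before saving
    ''|identity) cp -- "$body" "$out" ;;          # saved byte for byte
    *) echo "unsupported encoding: $encoding" >&2; return 1 ;;
  esac
}
```

With a real crate, `save_like_client jod-thread-1.0.0.crate x-gzip out.crate` leaves a plain tar file named `out.crate`, whose sha512 no longer matches the sources file. Without a Content-Encoding header, the bytes are saved untouched.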

And the problem can be resolved by turning off mod_mime_magic.

mod_mime_magic always does the wrong thing with gzip content. RFC 9110 specifically says that content-encoding: gzip should not be added to files just because they're compressed.

I'm not sure what the underlying cause is, but it's probably something like httpd not having the correct magic for one of the tar file types. (mod_mime_magic won't add the content-encoding header if it can't identify the MIME type of the uncompressed data.)

$ gzip -dc jod-thread-1.0.0.crate | file -
/dev/stdin: POSIX tar archive (GNU)
$ gzip -dc jod-thread-0.1.2.crate | file -
/dev/stdin: POSIX tar archive
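
The two crates differ in the tar flavour inside the gzip stream, which is the kind of difference a magic database keys on. The sketch below reads the 8 bytes at offset 257 of the first tar header (the `magic` and `version` fields). GNU tar writes `ustar`, two spaces and a NUL there, while POSIX ustar writes `ustar`, a NUL and `00`. It only shows how the two formats differ and does not reproduce httpd's own magic file. The name `tar_flavor` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the tar format inside a gzip-compressed archive
# by the magic+version fields at byte offset 257 of the first 512-byte header.
tar_flavor() {
  local magic
  magic=$(gzip -dc -- "$1" 2>/dev/null | head -c 265 | tail -c 8 | od -An -tx1 | tr -d ' \n')
  case $magic in
    7573746172202000) echo "gnu" ;;    # "ustar" + "  " + NUL (GNU tar)
    7573746172003030) echo "posix" ;;  # "ustar" + NUL + "00" (POSIX ustar)
    *) echo "unknown" ;;
  esac
}
```

Against the two crates in this thread, this should report `gnu` for jod-thread-1.0.0.crate and `posix` for jod-thread-0.1.2.crate, matching the `file` output above.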

@gordonmessmer wrote in https://forge.fedoraproject.org/infra/tickets/issues/12812#issuecomment-579711:

> I can confirm this behavior in the default httpd config:

I did not expect that! It would appear that mod_mime_magic is broken by design.

Wow. ok... good detective work. ;)

So, what are our options then?

  1. Disable mod_mime_magic (but this could cause other things to break?)
  2. add .crate to the magic file to explicitly tell it what this is?
  3. Something else?

This has been a pretty confusing issue, but one thing I'll note: external access to these files shows the issue, while internal access (for builds, which use another set of proxies that do not have anubis enabled) doesn't? At least, official builds of things using these crates have worked; only external access has failed? So, to me that points to the path through anubis being involved, that is: request -> apache -> proxy to anubis -> anubis proxies back to apache -> apache proxies to backend. But that shouldn't change anything; anubis is just weighing the connection and, if it's OK, passing it along...

> Disable mod_mime_magic (but this could cause other things to break?)

I recommend this regardless of any other issue. If other things break, a static mime mapping can be added to un-break them. For any file that doesn't get a mime type based on its extension, mod_mime_magic will look for a couple of bytes that indicate that it might be compressed data, and if they are present, it will fork/exec gzip and pass a block of data to that process.

Passing untrusted data to gzip, which might be vulnerable to "zip bombs" or other attacks, is, frankly, insane. I already have a patch to remove that specific functionality from mod_mime_magic, and I'm going to ask the httpd developers to drop it.

> add .crate to the magic file to explicitly tell it what this is?

That should fix the problem, whether or not mod_mime_magic is enabled (since mod_mime_magic only activates for files that don't have a mime type from another source).

You can add a line to /etc/mime.types:

application/gzip                                crate

Or to an httpd.conf file:

    AddType application/x-gzip .crate

And you can test the configuration by downloading a file that ends in ".0.crate" or ".9.crate", such as jod-thread-1.0.0.crate.
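
That check can be scripted as follows. The sketch reads a response header dump on stdin, for example from `curl -sS -D - -o /dev/null URL`, where `-D -` writes the headers to stdout and `-o /dev/null` discards the body. It prints the Content-Type and fails if any `Content-Encoding` header is present, which is what you want for a file that should be saved as-is. The function name `check_headers` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin, print the
# Content-Type, and fail if a Content-Encoding header is present.
check_headers() {
  local headers ctype cenc
  headers=$(tr -d '\r')
  # Header names are case-insensitive (HTTP/2 sends them in lowercase).
  ctype=$(printf '%s\n' "$headers" | awk -F': *' 'tolower($1) == "content-type" { print $2 }')
  cenc=$(printf '%s\n' "$headers" | awk -F': *' 'tolower($1) == "content-encoding" { print $2 }')
  echo "content-type: ${ctype:-<none>}"
  if [ -n "$cenc" ]; then
    echo "content-encoding: $cenc (clients may decode this file on the fly)" >&2
    return 1
  fi
}

# Usage (needs network access):
# curl -sS -D - -o /dev/null "$URL_OF_A_DOT_0_CRATE" | check_headers
```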

Why "0.crate" specifically? Apparently because a file that ends in ".<1-8>.crate" gets the type "application/x-troff-man" from this entry in /etc/mime.types:

application/x-troff-man                         man 1 2 3 4 5 6 7 8

... which explains why the problem only affects a small number of files. mod_mime_magic won't add the bad content-encoding header to files that end in ".[1-8].crate" or ".[1-8].*.crate".
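
The extension logic described above can be sketched in shell. httpd's mod_mime looks at every dot-separated part after the first dot, not only the last one, and when several parts map to a content type, the rightmost match wins. The sketch below approximates that against a mime.types-style table. It is a simplified imitation: it ignores AddType, encodings and languages. The name `mime_type_for` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of mod_mime's type lookup for multi-extension names:
# every dot-separated part after the first dot is looked up in a mime.types-style
# table, and the rightmost match wins. Prints "none" if nothing matched, which is
# the case where mod_mime_magic (if loaded) would start sniffing the content.
mime_type_for() {
  local name=${1##*/} table=$2 found="" ext t
  local -a parts
  IFS=. read -r -a parts <<< "$name"
  for ext in "${parts[@]:1}"; do   # skip the part before the first dot
    t=$(awk -v e="$ext" '!/^#/ { for (i = 2; i <= NF; i++) if ($i == tolower(e)) { print $1; exit } }' "$table")
    [ -n "$t" ] && found=$t
  done
  echo "${found:-none}"
}
```

On a system whose /etc/mime.types has no `crate` entry, `mime_type_for jod-thread-1.0.0.crate /etc/mime.types` should print `none`. `jod-thread-0.1.2.crate` should match `application/x-troff-man` through its `1` and `2` parts.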

OK. Anyone want to do a pull request disabling mod_mime_magic? We can test it in stg...
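
To check the stg change, `httpd -M` (or `apachectl -M`) lists the modules that the current configuration loads. The sketch below turns that listing into a pass/fail check. The name `module_loaded` is hypothetical, and the test input only mimics the usual ` name_module (shared)` line format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `httpd -M` output on stdin and succeed if the
# named module (e.g. mime_magic_module) is listed as loaded.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on the proxy host:
# if httpd -M 2>/dev/null | module_loaded mime_magic_module; then
#   echo "mod_mime_magic is still loaded"
# fi
```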

On it

> Passing untrusted data to gzip, which might be vulnerable to "zip bombs" or other attacks, is, frankly, insane. I already have a patch to remove that specific functionality from mod_mime_magic, and I'm going to ask the httpd developers to drop it.

Trivia: Apache will disable this module by default in a future release: https://github.com/apache/httpd/commit/10232a0180a85f97d47b2d38436196397c3ccc5b
