Drop XML parsing, switch to using dnf excludepkgs (#5) #32
No reviewers
Labels
No labels
ai-review-please
Backlog Status
Needs Review
Backlog Status
Ready
chore
documentation
points
01
points
02
points
03
points
05
points
08
points
13
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Sprint Status
Blocked
Sprint Status
Done
Sprint Status
In Progress
Sprint Status
Review
Sprint Status
To Do
Technical Debt
Work Item
Bug
Work Item
Epic
Work Item
Spike
Work Item
Task
Work Item
User Story
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
quality/rmdepcheck!32
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "dnf-excludepkgs"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Instead of painfully retrieving, decompressing, parsing, editing
and reconstructing the repodata, let's use dnf's excludepkgs
option to achieve the same thing. This is much simpler, faster,
and more robust.
To 'modify' the modified base repos, we use dnf repoquery to
list the packages they contain and their source RPM names, then
construct a table with the NEVRs to be removed for each repo.
Then when we do the repoclosure command, we pass
--setopt .excludepkgs= for each modified repo.
This tells dnf to act as if the specified NEVRs simply do not
exist in the specified repo, which is exactly what we want to
achieve.
Signed-off-by: Adam Williamson awilliam@redhat.com
Wow, I should have tried this sooner. Would've saved me a week of messing about with XML parsers. OTOH, I know a lot more about writing a streaming XML token parser now.
The only thing that worries me here is potentially exceeding subprocess' or bash's or something's argument length limit for extremely big updates. I tried it with a 60 package update and it worked, but I'd be worried with those 400+ package KDE updates. I'll see if I can test this somehow before merging. If that's a problem we can probably switch to using config files.
AI Code Review
📋 MR Summary
Replaces manual XML repository metadata manipulation with DNF's native
excludepkgsfunctionality for faster and more robust dependency testing.lxmland compression-related dependencies.dnf repoqueryto identify package NEVRs based on source names.--setopt=<repo>.excludepkgs=<nevrlist>to dynamically hide packages duringrepoclosure.Detailed Code Review
The architectural shift from manual XML parsing to utilizing native DNF capabilities is an excellent move for both performance and maintainability. It significantly reduces the tool's complexity and fragility.
However, there are two primary edge cases to address to ensure robustness. First, subprocess output parsing assumes a strict comma-separated format from
dnf repoquery, which can fail if DNF outputs warnings or progress text tostdout. Second, as discussed, injecting potentially thousands of NEVRAs via command-line arguments risks exceeding OSARG_MAXlimits. Mitigating theARG_MAXissue by dynamically generating a temporary.repoconfig file instead of relying heavily on command-line flags would guarantee scalability.Lastly, there are a few residual artifacts from the previous implementation, such as unused directory parameters, that can be cleaned up.
📂 File Reviews
📄 `rmdepcheck.py` - Core execution script restructured to use DNF queries and options rather than XML manipulation.
dnf repoqueryoutput. If DNF outputs any warnings, blank lines, or unexpected metadata tostdout,line.split(",")will throw aValueErrorand crash the execution.maxsplit=1for safety.Major [Architecture]: Risk of exceeding
ARG_MAXlimits. Ifexcludesgrows very large, passing it via--setoptinsubprocess.runwill throwOSError: [Errno 7] Argument list too long.--repofrompathand--setopt, dynamically generate a temporary DNF configuration file containing the[repo]blocks andexcludepkgs=...lists, then pass it to DNF using-c.Minor [Style]: The
mreposdirargument is still passed toget_modified_and_new_repoclosurebut is no longer used anywhere inside the function body.mreposdirparameter from the function signature and the corresponding call inmain().✅ Summary
dnf repoqueryoutput parsing against unexpected output lines lacking commas.ARG_MAXvulnerability by switching from command-line--setoptto generated temporary configuration files if exclusion lists can grow arbitrarily large.mreposdirparameter fromget_modified_and_new_repoclosure.🤖 AI Code Review | Generated with ai-code-review | Model:
gemini-3.1-pro-preview⚠️ AI-generated suggestions may be incorrect. Verify before applying. Not a replacement for human review.
b8d1c41a083a96888ca93a96888ca9db1414151cSo I tested and this handles a 224-package update with panache and aplomb. ARG_MAX seems to be 2097152 in typical Fedora environments, which really ought to be enough for any foreseeable update (famous last words, hah) - we're only putting package NEVRs in here, they're typically like 10-50 characters long, so if we save 97k characters for contingencies, we can have like 40,000 packages in there. So I think I'm inclined to go with this at least until it blows up. Switching to using config files is a bit of a faff.
AI Code Review
📋 MR Summary
This MR drastically simplifies dependency checking by replacing complex manual XML parsing with DNF's native
excludepkgsoption viarepoqueryandrepoclosure.curl,zstd,gzip, andlxml.get_modified_and_new_repoclosurewhich relies ondnf repoqueryto build an exclusion list and applies it using--setopt <repo>.excludepkgs=<nevrlist>.Detailed Code Review
Overall, this is an excellent refactor that greatly reduces technical debt by leveraging the native capabilities of
dnfinstead of trying to replicate repository modifications through raw XML mutation. The test suite has been properly trimmed down to match the simplified logic.From a performance standpoint, there's a minor but standard algorithmic improvement possible. String concatenation in a loop (
excludes += ...) has an O(N^2) complexity in Python. While the number of packages is typically small (e.g., 200-300), switching to an array append and.join()is a safer best practice to handle larger scale inputs effectively. Other than that, the new approach perfectly handles the intended design.📂 File Reviews
📄 `rmdepcheck.py` - Replaced manual XML editing functionalities with sub-process calls leveraging DNF configuration parameters.
','.join()to generate theexcludesstring, which is O(N) and significantly more efficient if the exclusion list grows large.✅ Summary
excludesto list.append()and','.join()for optimal O(N) performance.🤖 AI Code Review | Generated with ai-code-review | Model:
gemini-3.1-pro-preview⚠️ AI-generated suggestions may be incorrect. Verify before applying. Not a replacement for human review.
typing didn't like that, unless I use two different variable names for it, so I went with this. ehh. Fine, I'll just use two variables, I guess.
actually, no I won't, because that makes pylint complain about too many locals. We'll stick with this.
db1414151c251fd11d3bWell, StackOverflow notes that "The maximum size for a single string argument is limited to 131072", and that does seem to be the case in my environment. Still, that seems like enough. Still assuming 50 character NVRs, that's enough room for 2621 of them.
This looks to be working fine in testing on staging so far - gives all results the same as prod, including some correct failures. So let's go with it.