RFE: Define a hibernate-safe rpm-ostree deployment contract for Atomic Desktops #112

Closed
opened 2026-05-03 13:29:42 +00:00 by prorokxp · 1 comment

Summary

Please define and implement a production-grade hibernate/resume safety contract for Fedora Atomic Desktops when an rpm-ostree deployment is pending.

This is not a request to enable hibernation by default on every laptop. It is a request to make opt-in hibernation safe, predictable, and architecturally correct on Fedora Atomic Desktops.

The main requirement is:

If a system is configured for hibernation, Fedora Atomic must guarantee that hibernate/resume never silently crosses from one rpm-ostree deployment into another incompatible deployment.

In other words, a system that hibernates from deployment A must either resume into deployment A, or the hibernate operation must be rejected before the system powers off. It must not resume or boot in an undefined way through deployment B.

Background

On a traditional mutable Fedora Workstation installation, hibernation can be configured manually with persistent swap, proper initramfs/resume configuration, and suitable hardware. I have used hibernation for years on Fedora Workstation and Ubuntu without issues.

On Fedora Atomic Desktops, the situation is different because rpm-ostree can stage a new deployment for the next boot while the currently running deployment remains active.

A typical problematic scenario is:

  1. The system is currently running deployment A.
  2. rpm-ostree stages deployment B for the next boot.
  3. The user hibernates instead of rebooting.
  4. The machine powers on later.
  5. The system must decide whether to resume the hibernation image from deployment A or boot into deployment B.

Hibernate/resume requires a strong compatibility guarantee between the saved memory image and the kernel/initramfs/deployment used to resume it. If the hibernation image was created by deployment A, resume must not silently continue through deployment B.

Why simple user notification is not enough

Hibernate is especially important on laptops and other personal systems exactly when the user is not actively operating the machine.

For example, the system may hibernate because the lid is closed, the battery is low, the machine is placed into a bag, or the user expects the OS to preserve state while consuming no power.

Because of that, a runtime message such as:

"The system cannot hibernate because an update is pending"

is only acceptable for systems where hibernation is not configured, not supported, or not intentionally enabled.

For a system where the administrator/user has explicitly configured hibernation, the expected contract should be stronger:

Once hibernation is enabled and supported, rpm-ostree/update state must not make hibernation unsafe.

In that case, the system should either:

  1. guarantee that resume will use the currently booted deployment, or
  2. automatically make the pending deployment non-active for the next resume/boot path before hibernating, or
  3. block hibernation only if it cannot safely enforce either of the above.
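
The three options above can be read as a small decision procedure. The following is only a sketch of that reading: the `Action` names and the three boolean inputs are hypothetical, and nothing here corresponds to an existing rpm-ostree or systemd interface.

```python
from enum import Enum


class Action(Enum):
    HIBERNATE = "hibernate"
    DEFER_PENDING_THEN_HIBERNATE = "defer-pending-then-hibernate"
    BLOCK = "block"


def choose_action(pending: bool,
                  resume_pinned_to_booted: bool,
                  can_defer_pending: bool) -> Action:
    """Pick the safest action for a hibernate request (hypothetical policy)."""
    # Option 1: nothing is pending, or resume is known to use the booted deployment.
    if not pending or resume_pinned_to_booted:
        return Action.HIBERNATE
    # Option 2: take the pending deployment out of the next boot path first.
    if can_defer_pending:
        return Action.DEFER_PENDING_THEN_HIBERNATE
    # Option 3: neither guarantee can be enforced, so refuse before poweroff.
    return Action.BLOCK
```

Blocking is deliberately the last branch, so it is reached only when neither guarantee is available.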

The important part is that this should be handled by the platform, not left as an undocumented local workaround.

Desired invariant

Fedora Atomic Desktops should guarantee the following invariant:

A hibernation image must only be resumed by the same compatible rpm-ostree deployment that created it.

A more operational form:

If hibernation is enabled, a pending rpm-ostree deployment must never cause the next power-on after hibernate to boot an incompatible deployment instead of safely resuming the hibernation image.

Requested behavior

Please define one official behavior for Atomic Desktops.

Preferred behavior: same-deployment resume guarantee

When hibernation is requested and a pending deployment exists:

  1. The system detects that deployment B is staged.
  2. The system guarantees that the next power-on used for hibernate resume will boot/resume through the currently running deployment A.
  3. Deployment B remains staged for a normal reboot after successful resume, or is safely unstaged and the user can apply the update later.
  4. The user is not left with an unsafe or undefined boot/resume path.

Possible implementation approaches could include:

  • making the bootloader explicitly boot the currently running deployment for the hibernate resume path;
  • temporarily disabling or deferring the pending deployment before hibernation;
  • removing/unstaging the pending deployment before hibernation if that is the only safe option;
  • adding an rpm-ostree/OSTree/systemd integration point that distinguishes "hibernate resume" from "normal reboot into pending deployment".

I am not asking for a specific implementation. I am asking for a defined and tested platform contract.

Fallback behavior: fail safely before poweroff

If Fedora Atomic cannot guarantee same-deployment resume, hibernation should fail before the machine powers off.

This failure should be explicit and visible through systemd/logind/journal diagnostics.

However, I do not think this should be the normal long-term behavior for a system where hibernation has been intentionally configured. If hibernation is configured, the better production-grade solution is to preserve hibernate safety by controlling the pending deployment state.

Suggested implementation areas

This likely needs coordination between several components:

  • rpm-ostree / OSTree deployment state
  • bootloader deployment selection
  • systemd-logind hibernate path
  • systemd-hibernate-resume
  • GNOME/KDE power actions
  • Fedora Atomic Desktop documentation
  • Fedora QA tests

Useful technical hooks may include:

  • detecting whether a pending deployment exists;
  • detecting the currently booted deployment;
  • exposing whether the current deployment is safe for hibernate;
  • inhibiting hibernate only when the system cannot guarantee safe resume;
  • clearing, deferring, or protecting the pending deployment before hibernate.
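
As an illustration of the first two hooks, here is a minimal sketch that reads the booted and staged deployments from `rpm-ostree status --json`. It assumes the JSON contains a `deployments` list whose entries carry `booted`, `staged`, and `checksum` fields, and it treats "staged" as equivalent to "pending"; both are assumptions to verify against the rpm-ostree version in use.

```python
import json
import subprocess


def summarize_deployments(status: dict) -> dict:
    """Return the booted and staged deployment checksums, if any."""
    booted = None
    staged = None
    for dep in status.get("deployments", []):
        if dep.get("booted"):
            booted = dep.get("checksum")
        if dep.get("staged"):
            staged = dep.get("checksum")
    # Assumption: a staged deployment is what this issue calls "pending".
    return {"booted": booted, "staged": staged, "pending": staged is not None}


def read_status() -> dict:
    """Run `rpm-ostree status --json` and parse its output."""
    out = subprocess.run(
        ["rpm-ostree", "status", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

On an Atomic Desktop, `summarize_deployments(read_status())` would give a pre-hibernate check the two facts it needs: which deployment is running, and whether another one is waiting.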

Suggested acceptance criteria

Case 1: no pending deployment

  1. System is running deployment A.
  2. No pending rpm-ostree deployment exists.
  3. User runs systemctl hibernate or triggers hibernate from the desktop.
  4. System resumes into deployment A.

Expected result:

  • resume succeeds;
  • journal clearly shows the hibernate/resume path;
  • no deployment mismatch occurs.

Case 2: pending deployment exists

  1. System is running deployment A.
  2. rpm-ostree stages deployment B.
  3. User triggers hibernate.

Expected result:

Either:

  • the system guarantees resume into deployment A;

or:

  • the system blocks hibernation before poweroff with a clear diagnostic.

The system must not silently attempt to resume the hibernation image through deployment B.
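
A QA test for Cases 1 and 2 could compare a snapshot taken just before hibernate with one taken after the machine is powered on again. The sketch below assumes the test harness records the kernel boot ID (for example from `/proc/sys/kernel/random/boot_id`, which should stay the same across a genuine resume) and the booted deployment checksum. `Snapshot` and `resume_violations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    boot_id: str           # kernel boot ID at the time of the snapshot
    booted_checksum: str   # checksum of the booted rpm-ostree deployment


def resume_violations(before: Snapshot, after: Snapshot) -> list[str]:
    """List the ways the post-power-on state breaks the resume contract."""
    problems = []
    if before.boot_id != after.boot_id:
        # A new boot ID means the machine cold-booted instead of resuming.
        problems.append("cold boot instead of resume (boot ID changed)")
    if before.booted_checksum != after.booted_checksum:
        problems.append("booted deployment changed across hibernate")
    return problems
```

An empty list means the system resumed the saved image through the deployment that created it.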

Case 3: after successful resume

  1. System hibernates from deployment A while deployment B was pending.
  2. System resumes into deployment A.
  3. User later performs a normal reboot.

Expected result:

  • either deployment B is still correctly staged and is used on the normal reboot;
  • or deployment B was explicitly unstaged before hibernate and the user/update tool can stage it again;
  • in both cases, no hidden inconsistent deployment state remains.

Case 4: rollback/layered packages/kmods

Please also consider systems with:

  • rpm-ostree rollback;
  • layered packages;
  • akmods/kmods;
  • proprietary GPU drivers where applicable;
  • different Atomic Desktop variants such as Silverblue and Kinoite.

The safety contract should not silently break in these cases.

Case 5: Secure Boot / kernel lockdown

If hibernation is not possible because of Secure Boot/kernel lockdown or other platform limitations, the system should expose that clearly.

This is separate from the rpm-ostree deployment safety issue.

Related hibernation storage concern

There is also a broader Linux/Fedora usability issue around hibernation storage.

Today, hibernation normally uses swap. In many modern Fedora setups, swap is either zram, which lives in RAM and therefore cannot hold a hibernation image, or general-purpose swap, which is not ideal for hibernation.

From an operational point of view, the most robust setup would be a dedicated hibernation swap partition or hibernation file that is used specifically for suspend-to-disk.

On my own systems, I have worked around this by using a dedicated swap partition for hibernation purposes and enabling it only around hibernate, then disabling it again after resume. Conceptually:

  1. before hibernate: swapon dedicated hibernation swap;
  2. hibernate;
  3. after resume: swapoff that dedicated hibernation swap.

This works as a local workaround, but it is not a clean platform-level solution.
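
For illustration, the workaround can be sketched as a sleep hook. This is not my exact setup. It assumes the hook is called with a phase (`pre` or `post`) and an action (such as `hibernate`), as described in systemd-sleep(8), and the device path is a placeholder. Note also that systemd may check for available swap before any hook runs, so a real setup may need more than this.

```python
import subprocess

# Hypothetical device path; replace with the dedicated hibernation swap.
HIBERNATE_SWAP = "/dev/disk/by-label/hibernate-swap"


def plan_commands(phase: str, action: str, device: str = HIBERNATE_SWAP) -> list:
    """Return the commands to run for one sleep-hook invocation."""
    if action != "hibernate":
        return []  # leave suspend and other sleep actions untouched
    if phase == "pre":
        return [["swapon", device]]
    if phase == "post":
        return [["swapoff", device]]
    return []


def main(argv: list) -> int:
    """Entry point for a hook invoked as: <hook> <phase> <action>."""
    if len(argv) < 3:
        return 0
    for cmd in plan_commands(argv[1], argv[2]):
        subprocess.run(cmd, check=True)
    return 0
```

A real hook script would end by calling `main(sys.argv)`. Keeping the decision in `plan_commands` makes it testable without touching swap.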

It would be useful if Fedora documented or supported a cleaner model for dedicated hibernation storage, especially for systems that normally use zram.

However, this is secondary. The main request in this issue is the Atomic Desktop rpm-ostree deployment/resume safety contract.

Non-goals

This request is not asking Fedora to:

  • enable hibernation by default on all systems;
  • guarantee hibernation on unsupported hardware;
  • bypass Secure Boot/kernel lockdown;
  • ignore kernel, driver, or firmware resume limitations;
  • support unsafe local hacks;
  • treat this as only a Silverblue issue.

This affects Fedora Atomic Desktops generally: Silverblue, Kinoite, Sway Atomic, Budgie Atomic, COSMIC Atomic, and other rpm-ostree-based desktop variants.

Why this matters

Hibernate is a valid and important power-management feature for many laptop users.

Atomic Desktops should be able to support opt-in hibernation safely by defining a clear contract between:

  • hibernate/resume;
  • rpm-ostree pending deployments;
  • bootloader deployment selection;
  • update tooling;
  • desktop power management.

The current situation appears to be unspecified. That makes hibernation on Atomic Desktops feel like a workaround rather than a production-grade feature.

I think the right goal is not "enable hibernate everywhere by default", but:

If a user/admin enables hibernation on Fedora Atomic, the OS must guarantee that pending rpm-ostree deployments cannot create an unsafe or undefined resume path.

Owner

> 5. The system must decide whether to resume the hibernation image from deployment A or boot into deployment B.

It does not; it will always resume from deployment A. Staged deployments are only made "effective" on shutdown, via ostree-finalize-staged.service. So if anything else happens (power cut, hibernation, etc.), the system will boot the current deployment again.

I don't think there is anything to do here. This is already the default.
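
The behaviour described above can be captured in a toy model. This is only a sketch of the rule "staged deployments are finalized on clean shutdown", not OSTree's actual implementation, and the event names are hypothetical.

```python
from typing import Optional


def next_boot(booted: str, staged: Optional[str], events: list) -> str:
    """Return the deployment the bootloader would start after the events."""
    default = booted
    for event in events:
        # Only a clean shutdown finalizes the staged deployment.
        if event == "shutdown" and staged is not None:
            default = staged
            staged = None
        # "hibernate" and "power_cut" leave the boot default untouched.
    return default
```

In this model, `next_boot("A", "B", ["hibernate"])` stays on A, and B only becomes the default once a `"shutdown"` event occurs.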

siosm closed this issue 2026-05-06 20:00:24 +00:00