
vm: improve error handling #749

Open
etiaro wants to merge 5 commits into master from etiaro/vm-error-handling


@etiaro (Contributor) commented Mar 13, 2026

Description

These changes aim to increase system reliability when the system runs out of memory.

Motivation and Context

This is part of a series of PRs that increase kernel stability when the system is out of memory. It is inspired by work on the reliability of the separation provided by partitioning mechanisms and is related to the fork bomb issue phoenix-rtos/phoenix-rtos-project#560.

Also related to phoenix-rtos/phoenix-rtos-project#348

JIRA: RTOS-1235

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (refactoring, style fixes, git/CI config, submodule management, no code logic changes)

How Has This Been Tested?

  • Already covered by automatic testing.
  • New test added: (add PR link here).
  • Tested by hand on: (list targets here).

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing linter checks and tests passed.
  • My changes generate no new compilation warnings for any of the targets.

Special treatment

  • This PR needs additional PRs to work (list the PRs, preferably in merge-order).
  • I will merge this PR by myself when appropriate.

The previous handling of ENOMEM during physical page allocation ignored that
_map_map can merge a new allocation with existing entries. This led to an
invalid map state, accidental reallocation of virtual addresses, and leaked
physical pages.

JIRA: RTOS-1235
@gemini-code-assist

Summary of Changes


This pull request significantly enhances the robustness and reliability of the virtual memory management subsystem, particularly in scenarios involving memory allocation failures. By transitioning key functions to return explicit error codes and implementing more thorough cleanup routines, the system is better equipped to handle out-of-memory conditions gracefully, preventing potential resource leaks and improving overall kernel stability. These changes are part of a broader effort to fortify the kernel against issues like fork bombs, as highlighted in the motivation.

Highlights

  • Improved Error Propagation: Functions like amap_page and vm_objectPage now return integer error codes instead of NULL pointers, providing more explicit error information.
  • Enhanced Memory Allocation Failure Handling: Mechanisms for handling ENOMEM (out of memory) conditions have been strengthened across various VM components.
  • Robust Process Forking: proc_vfork now includes a posix_waitpid call to properly clean up child processes if initial object creation fails.
  • Consistent Cleanup on mmap Failure: The _vm_mmap function now utilizes _vm_munmap for comprehensive cleanup when mapping operations encounter errors.
  • Safer Map Copying: vm_mapCopy now ensures that the destination map is destroyed if any page mapping fails during the copy process.
Changelog
  • proc/process.c
    • Added a posix_waitpid call in proc_vfork to reap the child process when the spawned child reports an error (negative spawn->state), ensuring proper cleanup.
  • vm/amap.c
    • Refactored amap_create to move the amap->refs-- decrement, ensuring correct reference counting.
    • Modified amap_page function signature to return int (error code) and accept page_t **res for the page result.
    • Updated error handling within amap_page to return specific error codes like EOK or -ENOMEM.
  • vm/amap.h
    • Updated the amap_page function prototype to reflect the new return type and parameter.
  • vm/map.c
    • Replaced specific cleanup calls in _vm_mmap with a single _vm_munmap call for more robust error handling.
    • Modified _map_force to handle the new int return type from amap_create, vm_objectPage, and amap_page.
    • Enhanced vm_mapCopy to check for errors during page forcing and destroy the destination map (vm_mapDestroy) if an error occurs.
  • vm/object.c
    • Changed vm_objectPage function signature to return int (error code) and accept page_t **res for the page result.
    • Updated error handling within vm_objectPage to return specific error codes and free allocated pages on failure.
  • vm/object.h
    • Updated the vm_objectPage function prototype to reflect the new return type and parameter.
Activity
  • No specific activity (comments, reviews, etc.) has been recorded for this pull request yet.

@etiaro etiaro mentioned this pull request Mar 13, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly improves error handling within the virtual memory subsystem, primarily by refactoring functions to return error codes instead of NULL on failure. This makes the system more robust, especially in out-of-memory situations. The changes are logical and well-implemented, with one exception in vm_mapCopy where the error handling logic is flawed and could lead to a kernel panic. I've provided a detailed comment and a suggested fix for this issue.

@github-actions

github-actions bot commented Mar 13, 2026

Unit Test Results

9 553 tests (+8 123), 8 961 passed ✅ (+8 104), duration 52m 24s ⏱️ (+45m 32s)
591 suites (+8), 592 skipped 💤 (+518)
1 file (±0), 0 failed ❌ (-499)

Results for commit 432ea7d. ± Comparison against base commit cbfa5d0.

♻️ This comment has been updated with latest results.

@etiaro etiaro force-pushed the etiaro/vm-error-handling branch 2 times, most recently from 84e969e to e8ef13d, March 13, 2026 11:53
@etiaro etiaro marked this pull request as ready for review March 13, 2026 12:15
@etiaro etiaro requested review from Darchiv and adamgreloch March 13, 2026 12:15
@Darchiv Darchiv requested a review from ziemleszcz March 17, 2026 11:29
adamgreloch previously approved these changes Mar 23, 2026
-vm_pageFree(p);
-p = NULL;
+vm_pageFree(*res);
+*res = NULL;
Contributor


The previous version returned -ENOMEM in this case.

Contributor Author


Nice catch, thanks

vm_mapDestroy(proc, dst);
}
for (offs = 0; offs < e->size; offs += SIZE_PAGE) {
LIB_ASSERT_ALWAYS(_map_force(src, e, (void *)((ptr_t)e->vaddr + offs), e->prot) == EOK, "Broken src map during mapCopy");
Contributor


Why kernel panic on ENOMEM?

Contributor Author

@etiaro etiaro Mar 24, 2026


ENOMEM should not happen for the source map when lazy == 0. This loop should effectively clear the NEEDSCOPY flags and remap the pages with their original attributes.

What better to do here than a kernel panic?
The source process memory is corrupted, so we shouldn't let it run. To me, rebooting with an error message seems more reasonable on an RTOS than killing the parent process that tried to fork(). Restarting the process could be a viable option, but since we are out of memory, it is likely to fail anyway.

Ideally, we should stop pretending that the lazy mechanism is available and handle mapCopy without ever modifying the src map, but IMO this is a larger rewrite that deserves a separate PR.

(void)vm_objectPut(spawn->object);
ret = spawn->state;
vm_kfree(spawn);
if ((ret < 0) && (posix_getppid(pid) >= 0)) {
Contributor


Please add a comment to this code.

 }
-return _page_get((addr_t)offs);
+*res = _page_get((addr_t)offs);
+/* *res can be NULL when an address outside the defined physical maps is used */
Contributor


Could you describe this in the header file?

@etiaro etiaro requested a review from ziemleszcz March 24, 2026 12:09
Previously, out-of-memory errors could leave the source map and its refcounts
in an invalid state, leading to wasted memory and errors in the source
process. Also, the dst map was never destroyed.

JIRA: RTOS-1235
etiaro added 3 commits March 24, 2026 19:21
The previous convention of returning NULL on page allocation error was
overloaded: a failure to copy a page from an amap backed by VM_OBJ_PHYSMEM
was indistinguishable from a VM_OBJ_PHYSMEM address outside of the defined
maps. Introduce error codes to improve maintainability, instead of stacking
additional if conditions.

JIRA: RTOS-1235
Hanging refs not only wasted resources but also invalidated the src map in
vm_mapCopy: because of them, any following map_force would try (and could
fail) to copy pages that shouldn't be copied.
The amap pointer is not updated on failure, to avoid hanging anon refs.

JIRA: RTOS-1235
@etiaro etiaro force-pushed the etiaro/vm-error-handling branch from a16b39b to 432ea7d, March 24, 2026 18:21

Labels

None yet

Projects

None yet


3 participants