Conversation
Previous handling of ENOMEM during physical page allocation ignored that _map_map can merge new allocation with existing entries, leading to invalid map state, accidental virtual address reallocation and leaking physical pages. JIRA: RTOS-1235
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness and reliability of the virtual memory management subsystem, particularly in scenarios involving memory allocation failures. By transitioning key functions to return explicit error codes and implementing more thorough cleanup routines, the system is better equipped to handle out-of-memory conditions gracefully, preventing potential resource leaks and improving overall kernel stability. These changes are part of a broader effort to fortify the kernel against issues like fork bombs, as highlighted in the motivation. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request significantly improves error handling within the virtual memory subsystem, primarily by refactoring functions to return error codes instead of NULL on failure. This makes the system more robust, especially in out-of-memory situations. The changes are logical and well-implemented, with one exception in vm_mapCopy where the error handling logic is flawed and could lead to a kernel panic. I've provided a detailed comment and a suggested fix for this issue.
84e969e to
e8ef13d
Compare
| vm_pageFree(p); | ||
| p = NULL; | ||
| vm_pageFree(*res); | ||
| *res = NULL; |
There was a problem hiding this comment.
Previous version returns -ENOMEM in this case.
| vm_mapDestroy(proc, dst); | ||
| } | ||
| for (offs = 0; offs < e->size; offs += SIZE_PAGE) { | ||
| LIB_ASSERT_ALWAYS(_map_force(src, e, (void *)((ptr_t)e->vaddr + offs), e->prot) == EOK, "Broken src map during mapCopy"); |
There was a problem hiding this comment.
Why kernel panic on ENOMEM?
There was a problem hiding this comment.
ENOMEM should not happen for source map when lazy == 0. This loop should effectively clean NEEDSCOPY flags and remap pages with original attributes.
What better to do here if not kernel panic?
source process memory is corrupted, so we shouldn't let it run. To me reboot with some error message seems more reasonable on RTOS system than killing a parent process that tried to fork(). Restarting the process could be a viable option, but we are running Out Of memory so it is likely to fail anyway.
Ideally, we should stop pretending that lazy mechanism is available and handle mapCopy without changing src map at any given moment, but IMO this is a larger rewrite that deserves separate PR.
| (void)vm_objectPut(spawn->object); | ||
| ret = spawn->state; | ||
| vm_kfree(spawn); | ||
| if ((ret < 0) && (posix_getppid(pid) >= 0)) { |
There was a problem hiding this comment.
Please add some comment to this code.
| } | ||
| return _page_get((addr_t)offs); | ||
| *res = _page_get((addr_t)offs); | ||
| /* res can be NULL, when address outside of defined physical maps is used */ |
There was a problem hiding this comment.
Could you describe this in the header file?
Previously due to out of memory errors source map and its refcounts could be left in invalid state, leading to memory waste and errors in source process. Also, the dst map was never destroyed. JIRA: RTOS-1235
e8ef13d to
a16b39b
Compare
The previous notation of returning NULL on error of page allocation was overloaded, as failure to copy page from amap backed by VM_OBJ_PHYSMEM was indistinguishable from VM_OBJ_PHYSMEM outside of defined maps. Introduced error codes to improve maintainability, instead of stacking additional if conditions. JIRA: RTOS-1235
Hanging refs not only led to wasted resources, but also to invalidating src map in vm_mapCopy due to fail to copy pages that shouldn't be copied during any following map_force. Amap pointer is not updated on failure to avoid hanging anon refs. JIRA: RTOS-1235
JIRA: RTOS-1235
a16b39b to
432ea7d
Compare
Description
Changes aiming at increasing system reliability when running out of memory.
Motivation and Context
This is part of series of PRs that increase kernel stability when system is out of memory, inspired by work on reliability of separation given by partitioning mechanisms and related to fork bomb issue phoenix-rtos/phoenis-rtos-project#560
Also related to phoenix-rtos/phoenix-rtos-project#348
JIRA: RTOS-1235
Types of changes
How Has This Been Tested?
Checklist:
Special treatment