Fix some bugs #77

Merged
tastelikefeet merged 3 commits into modelscope:main from tastelikefeet:fix/0226-1
Feb 27, 2026

@tastelikefeet
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

  1. Add install.sh to guide environment installation
  2. Increase the weight sync buffer from 2GB to 3GB
  3. Fix the Ray resource manager

Experiment results

Paste your experiment result here(if needed).

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces an automated environment setup script to simplify dependency installation, enhances checkpoint engine performance by increasing the weight synchronization buffer size, and resolves a bug in the Ray resource manager to ensure correct GPU resource allocation. These changes collectively aim to improve the project's stability, efficiency, and ease of use.
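
As a rough illustration of why a larger weight synchronization buffer can reduce the number of transfer rounds, the sketch below greedily packs tensor sizes into fixed-size buckets. The function name `pack_into_buckets`, the greedy strategy, the tensor sizes, and the reading of "GB" as 2**30 bytes are all assumptions made for illustration; the checkpoint engines in this repository may bucket weights differently.

```python
from typing import Iterable, List

GIB = 1024 ** 3  # assumption: "GB" in this PR is read as 2**30 bytes


def pack_into_buckets(sizes: Iterable[int], bucket_size: int) -> List[List[int]]:
    """Greedily group consecutive tensor sizes (in bytes) into buckets.

    A bucket is flushed when adding the next tensor would exceed bucket_size.
    A single tensor larger than bucket_size ends up in a bucket of its own.
    """
    buckets: List[List[int]] = []
    current: List[int] = []
    used = 0
    for nbytes in sizes:
        if current and used + nbytes > bucket_size:
            buckets.append(current)
            current, used = [], 0
        current.append(nbytes)
        used += nbytes
    if current:
        buckets.append(current)
    return buckets


# Hypothetical model: 24 weight tensors of 512 MiB each (12 GiB in total).
tensor_sizes = [512 * 1024 ** 2] * 24

rounds_2gb = len(pack_into_buckets(tensor_sizes, 2 * GIB))
rounds_3gb = len(pack_into_buckets(tensor_sizes, 3 * GIB))
print(rounds_2gb, rounds_3gb)  # prints "6 4"
```

With these made-up sizes, the 3GB bucket needs four rounds instead of six. Whether that translates into a real speedup depends on memory headroom and the communication backend, which this sketch does not model.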

Highlights

  • Environment Setup Automation: An install.sh script was added to automate the installation of deep learning dependencies. The script detects the GPU architecture and installs core packages such as peft, accelerate, transformers, vllm, transformer_engine, megatron_core, flash-attn, numpy, and DeepGEMM.
  • Checkpoint Engine Buffer Size Increase: The default bucket_size for HCCLCheckpointEngine and NCCLCheckpointEngine, as well as the _bucket_size in CheckpointEngineMixin, was increased from 2GB to 3GB. This change aims to improve the efficiency of weight synchronization during checkpointing.
  • Ray Resource Manager Fix: A bug in the Ray resource manager was addressed by explicitly initializing the node_ranks list. This ensures correct behavior and prevents potential errors when determining GPU allocations for workers.
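
The class of bug described in the last highlight, a list that is only bound on some code paths, can be sketched as follows. The two functions and the `placement` argument are hypothetical and are not taken from the resource manager; they only show why initializing the list up front avoids a runtime error.

```python
from typing import Dict, List, Optional


def node_ranks_buggy(placement: Optional[Dict[int, int]]) -> List[int]:
    """Hypothetical buggy shape: node_ranks is bound only inside the branch."""
    if placement is not None:
        node_ranks = sorted(placement)
    return node_ranks  # raises UnboundLocalError when placement is None


def node_ranks_fixed(placement: Optional[Dict[int, int]]) -> List[int]:
    """Same logic, but node_ranks is initialized before any branch."""
    node_ranks: List[int] = []
    if placement is not None:
        node_ranks = sorted(placement)
    return node_ranks


try:
    node_ranks_buggy(None)
except UnboundLocalError as exc:
    print("buggy version failed:", type(exc).__name__)

print(node_ranks_fixed(None))          # prints "[]"
print(node_ranks_fixed({1: 4, 0: 8}))  # prints "[0, 1]"
```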

Changelog
  • INSTALL.sh
    • Added a new shell script to automate the installation of deep learning dependencies.
    • Implemented GPU architecture detection using nvidia-smi to configure TORCH_CUDA_ARCH_LIST.
    • Included installation steps for various Python packages: peft, accelerate, transformers, modelscope, oss2, vllm, transformer_engine, megatron_core, flash-attn, numpy, and DeepGEMM.
  • src/twinkle/checkpoint_engine/hccl_checkpoint_engine.py
    • Increased the default bucket_size parameter in the HCCLCheckpointEngine constructor from 2GB to 3GB.
  • src/twinkle/checkpoint_engine/mixin.py
    • Updated the default _bucket_size class variable from 2GB to 3GB within the CheckpointEngineMixin.
  • src/twinkle/checkpoint_engine/nccl_checkpoint_engine.py
    • Modified the default bucket_size parameter in the NCCLCheckpointEngine constructor from 2GB to 3GB.
  • src/twinkle/infra/_ray/resource_manager.py
    • Initialized the node_ranks list to an empty list before its usage within the get_visible_devices function to prevent potential runtime errors.
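
The changelog entry for the install script mentions detecting the GPU architecture with nvidia-smi to configure TORCH_CUDA_ARCH_LIST. A minimal Python sketch of that idea is shown below. It assumes a driver recent enough to support the `compute_cap` query field, and the helper names are hypothetical; the actual shell script may query or format the value differently.

```python
import subprocess


def arch_list_from_compute_caps(smi_output: str) -> str:
    """Convert nvidia-smi compute capability lines into a ';'-separated list.

    Duplicate capabilities (several identical GPUs) are collapsed, and the
    result is sorted numerically by (major, minor).
    """
    caps = {line.strip() for line in smi_output.splitlines() if line.strip()}
    ordered = sorted(caps, key=lambda cap: tuple(int(part) for part in cap.split(".")))
    return ";".join(ordered)


def detect_arch_list() -> str:
    """Query the local GPUs; raises if nvidia-smi is missing or fails."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return arch_list_from_compute_caps(result.stdout)


# Parsing example with made-up output from a machine with three GPUs.
print(arch_list_from_compute_caps("8.0\n8.0\n9.0\n"))  # prints "8.0;9.0"
```

On a machine with GPUs, the value returned by `detect_arch_list()` could be exported as TORCH_CUDA_ARCH_LIST before building CUDA extensions. Setting it from Python only affects the current process and its children.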

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an installation script, increases the weight synchronization buffer size across checkpoint engines, and fixes a potential issue in the Ray resource manager. The changes generally improve the robustness and performance of the system. However, there are a few areas that could be improved for better maintainability and clarity, particularly in the new installation script and an outdated comment.

@tastelikefeet merged commit d3f39f4 into modelscope:main on Feb 27, 2026
1 of 3 checks passed