Npu grpo fix2 #54
New file (+7 lines) — docs build script:

```shell
cd docs
rm -rf build

# update api rst
#rm -rf source/api/
#sphinx-apidoc --module-first -o source/api/ ../modelscope/
make html
```
New file (+40 lines) — CI container test script (likely `.dev_scripts/ci_container_test.sh`, which the runner scripts below invoke):

```shell
if [ "$MODELSCOPE_SDK_DEBUG" == "True" ]; then
    # pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
    git config --global --add safe.directory /twinkle
    git config --global user.email tmp
    git config --global user.name tmp.com

    # linter test
    # use the internal project for pre-commit due to network problems
    if [ `git remote -v | grep alibaba | wc -l` -gt 1 ]; then
        pre-commit run -c .pre-commit-config_local.yaml --all-files
        if [ $? -ne 0 ]; then
            echo "Linter test failed. From the repository folder:"
            echo "run 'pre-commit install' to install the pre-commit hooks,"
            echo "then run the linter with 'pre-commit run --all-files' to check."
            echo "Ensure there are no failures!"
            exit 1
        fi
    fi

    pip install decord einops -U -i https://mirrors.aliyun.com/pypi/simple/
    pip uninstall autoawq -y
    pip uninstall lmdeploy -y
    pip uninstall tensorflow -y
    pip install optimum

    # test with install
    pip install .
else
    echo "Running case in release image, run case directly!"
fi
# remove the torch_extensions folder to avoid CI hangs.
rm -rf ~/.cache/torch_extensions
if [ $# -eq 0 ]; then
    ci_command="pytest tests"
else
    ci_command="$@"
fi
echo "Running case with command: $ci_command"
$ci_command
```
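The tail of the script above implements a default-command pattern: run `pytest tests` unless the caller passed an explicit command. A standalone sketch of that pattern (using `$*` instead of the script's `"$@"`, which behaves the same in an assignment):

```shell
# Simulate an invocation with no arguments.
set --
if [ $# -eq 0 ]; then
    ci_command="pytest tests"   # default when the caller passes nothing
else
    ci_command="$*"             # otherwise run exactly what was passed
fi
echo "$ci_command"
```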
New file (+96 lines) — GPU CI runner that launches the tests inside a Docker container:

```shell
#!/bin/bash
MODELSCOPE_CACHE_DIR_IN_CONTAINER=/modelscope_cache
CODE_DIR=$PWD
CODE_DIR_IN_CONTAINER=/twinkle
mkdir -p ~/.cache
MODELSCOPE_CACHE=~/.cache
IMAGE_NAME=modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope
IMAGE_VERSION=ci_image
MODELSCOPE_HOME_CACHE=~/.cache
CI_TEST=True
MODELSCOPE_SDK_DEBUG=True
CI_COMMAND='bash .dev_scripts/ci_container_test.sh pytest tests'
echo "$USER"
gpus='0,1 2,3'
cpu_sets='0-15 16-31'
cpu_sets_arr=($cpu_sets)
is_get_file_lock=false
echo "ci command: $CI_COMMAND"
PR_CHANGED_FILES="${PR_CHANGED_FILES:-}"
echo "PR modified files: $PR_CHANGED_FILES"
PR_CHANGED_FILES=${PR_CHANGED_FILES//[ ]/#}
echo "PR_CHANGED_FILES: $PR_CHANGED_FILES"
idx=0
for gpu in $gpus
do
    exec {lock_fd}>"/tmp/gpu$gpu" || exit 1
    flock -n "$lock_fd" || { echo "WARN: gpu $gpu is in use!" >&2; idx=$((idx+1)); continue; }
    echo "get gpu lock $gpu"

    CONTAINER_NAME="twinkle-ci-$idx"
    is_get_file_lock=true

    # pull the image if there are updates
    docker pull ${IMAGE_NAME}:${IMAGE_VERSION}
    if [ "$MODELSCOPE_SDK_DEBUG" == "True" ]; then
        echo 'debugging'
        docker run --rm --name $CONTAINER_NAME --shm-size=16gb \
            --cpuset-cpus=${cpu_sets_arr[$idx]} \
            --gpus='"'"device=$gpu"'"' \
            -v $CODE_DIR:$CODE_DIR_IN_CONTAINER \
            -v $MODELSCOPE_CACHE:$MODELSCOPE_CACHE_DIR_IN_CONTAINER \
            -v $MODELSCOPE_HOME_CACHE/$idx:/root \
            -v /home/admin/pre-commit:/home/admin/pre-commit \
            -e CI_TEST=True \
            -e TEST_LEVEL=$TEST_LEVEL \
            -e MODELSCOPE_CACHE=$MODELSCOPE_CACHE_DIR_IN_CONTAINER \
            -e MODELSCOPE_DOMAIN=$MODELSCOPE_DOMAIN \
            -e MODELSCOPE_SDK_DEBUG=True \
            -e HUB_DATASET_ENDPOINT=$HUB_DATASET_ENDPOINT \
            -e TEST_ACCESS_TOKEN_CITEST=$TEST_ACCESS_TOKEN_CITEST \
            -e TEST_ACCESS_TOKEN_SDKDEV=$TEST_ACCESS_TOKEN_SDKDEV \
            -e TEST_LEVEL=$TEST_LEVEL \
            -e MODELSCOPE_ENVIRONMENT='ci' \
            -e TEST_UPLOAD_MS_TOKEN=$TEST_UPLOAD_MS_TOKEN \
            -e MODEL_TAG_URL=$MODEL_TAG_URL \
            -e MODELSCOPE_API_TOKEN=$MODELSCOPE_API_TOKEN \
            -e PR_CHANGED_FILES=$PR_CHANGED_FILES \
            --workdir=$CODE_DIR_IN_CONTAINER \
            ${IMAGE_NAME}:${IMAGE_VERSION} \
            $CI_COMMAND
    else
        docker run --rm --name $CONTAINER_NAME --shm-size=16gb \
            --cpuset-cpus=${cpu_sets_arr[$idx]} \
            --gpus='"'"device=$gpu"'"' \
            -v $CODE_DIR:$CODE_DIR_IN_CONTAINER \
            -v $MODELSCOPE_CACHE:$MODELSCOPE_CACHE_DIR_IN_CONTAINER \
            -v $MODELSCOPE_HOME_CACHE/$idx:/root \
            -v /home/admin/pre-commit:/home/admin/pre-commit \
            -e CI_TEST=True \
            -e TEST_LEVEL=$TEST_LEVEL \
            -e MODELSCOPE_CACHE=$MODELSCOPE_CACHE_DIR_IN_CONTAINER \
            -e MODELSCOPE_DOMAIN=$MODELSCOPE_DOMAIN \
            -e HUB_DATASET_ENDPOINT=$HUB_DATASET_ENDPOINT \
            -e TEST_ACCESS_TOKEN_CITEST=$TEST_ACCESS_TOKEN_CITEST \
            -e TEST_ACCESS_TOKEN_SDKDEV=$TEST_ACCESS_TOKEN_SDKDEV \
            -e TEST_LEVEL=$TEST_LEVEL \
            -e MODELSCOPE_ENVIRONMENT='ci' \
            -e TEST_UPLOAD_MS_TOKEN=$TEST_UPLOAD_MS_TOKEN \
            -e MODEL_TAG_URL=$MODEL_TAG_URL \
            -e MODELSCOPE_API_TOKEN=$MODELSCOPE_API_TOKEN \
            -e PR_CHANGED_FILES=$PR_CHANGED_FILES \
            --workdir=$CODE_DIR_IN_CONTAINER \
            ${IMAGE_NAME}:${IMAGE_VERSION} \
            $CI_COMMAND
    fi
    if [ $? -ne 0 ]; then
        echo "Running test case failed, please check the log!"
        exit 1
    fi
    break
done
if [ "$is_get_file_lock" = false ] ; then
    echo 'No free GPU!'
    exit 1
fi
```
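The runner serializes GPU access by taking a non-blocking `flock` on a per-GPU lock file. A minimal bash sketch of that pattern, using hypothetical `/tmp/demo_gpu*` lock files:

```shell
gpus='0 1'
got_lock=false
for gpu in $gpus; do
    # Open a dedicated fd on this GPU's lock file; bash stores the fd number in lock_fd.
    exec {lock_fd}>"/tmp/demo_gpu$gpu" || exit 1
    # Try to lock without blocking; a GPU whose lock is held is simply skipped.
    flock -n "$lock_fd" || { echo "gpu $gpu busy"; continue; }
    echo "acquired gpu $gpu"
    got_lock=true
    break
done
[ "$got_lock" = true ] || { echo 'No free GPU!'; exit 1; }
```

Because the lock is tied to the open file descriptor, it is released automatically when the locking process exits, so a crashed CI run cannot leave a GPU permanently reserved.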
New file (+57 lines) — host-side CI runner that executes the tests directly, without a container:

```shell
#!/bin/bash
MODELSCOPE_CACHE_DIR=/modelscope_cache
CODE_DIR=$PWD
MODELSCOPE_SDK_DEBUG=True
echo "$USER"
gpus='0,1 2,3'
is_get_file_lock=false
CI_COMMAND=${CI_COMMAND:-bash .dev_scripts/ci_container_test.sh pytest tests}
echo "ci command: $CI_COMMAND"
PR_CHANGED_FILES="${PR_CHANGED_FILES:-}"
echo "PR modified files: $PR_CHANGED_FILES"
PR_CHANGED_FILES=${PR_CHANGED_FILES//[ ]/#}
echo "PR_CHANGED_FILES: $PR_CHANGED_FILES"
idx=0
for gpu in $gpus
do
    exec {lock_fd}>"/tmp/gpu$gpu" || exit 1
    flock -n "$lock_fd" || { echo "WARN: gpu $gpu is in use!" >&2; idx=$((idx+1)); continue; }
    echo "get gpu lock $gpu"

    is_get_file_lock=true

    # set environment variables
    export CI_TEST=True
    export TEST_LEVEL=$TEST_LEVEL
    export MODELSCOPE_CACHE=${MODELSCOPE_CACHE:-$MODELSCOPE_CACHE_DIR}
    export MODELSCOPE_DOMAIN=$MODELSCOPE_DOMAIN
    export HUB_DATASET_ENDPOINT=$HUB_DATASET_ENDPOINT
    export TEST_ACCESS_TOKEN_CITEST=$TEST_ACCESS_TOKEN_CITEST
    export TEST_ACCESS_TOKEN_SDKDEV=$TEST_ACCESS_TOKEN_SDKDEV
    export MODELSCOPE_ENVIRONMENT='ci'
    export TEST_UPLOAD_MS_TOKEN=$TEST_UPLOAD_MS_TOKEN
    export MODEL_TAG_URL=$MODEL_TAG_URL
    export MODELSCOPE_API_TOKEN=$MODELSCOPE_API_TOKEN
    export PR_CHANGED_FILES=$PR_CHANGED_FILES
    export CUDA_VISIBLE_DEVICES=$gpu

    if [ "$MODELSCOPE_SDK_DEBUG" == "True" ]; then
        export MODELSCOPE_SDK_DEBUG=True
        echo 'debugging'
    fi

    # change to the code directory and run the command
    cd $CODE_DIR
    eval $CI_COMMAND

    if [ $? -ne 0 ]; then
        echo "Running test case failed, please check the log!"
        exit 1
    fi
    break
done

if [ "$is_get_file_lock" = false ] ; then
    echo 'No free GPU!'
    exit 1
fi
```

A contributor left a review comment on the `eval $CI_COMMAND` line; the comment text is truncated in the source ("Using …").
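Both runner scripts pass the changed-file list through the environment after replacing spaces with `#` (`${PR_CHANGED_FILES//[ ]/#}`), so the whole list travels as a single token. A sketch of the round trip; the decode step is an assumption about what the consumer does with the encoded value:

```shell
files="a.py b.py docs/c.md"
encoded=${files//[ ]/#}    # encode: every space becomes '#'
decoded=${encoded//#/ }    # decode: '#' back to spaces on the consumer side
echo "$encoded"
echo "$decoded"
```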
New file (+49 lines) — GitHub issue form for bug reports:

```yaml
name: "🐛 Bug Report"
description: Create a bug report to help us improve twinkle
labels: ["bug"]

body:
  - type: markdown
    attributes:
      value: |
        Thank you for supporting twinkle and taking the time to submit this issue.
        感谢你对 twinkle 的支持和抽出时间提交相关 issue。

  - type: checkboxes
    id: checklist
    attributes:
      label: Checklist / 检查清单
      options:
        - label: I have searched existing issues, and this is a new bug report. / 我已经搜索过现有的 issues,确认这是一个新的 bug report。
          required: true

  - type: textarea
    id: bug-description
    validations:
      required: true
    attributes:
      label: Bug Description / Bug 描述
      description: |
        Please describe the issue you encountered. It's better to include error screenshots or stack trace information.
        请详细描述你遇到的问题,最好包含报错截图或报错栈信息。

  - type: textarea
    id: reproduction-steps
    validations:
      required: true
    attributes:
      label: How to Reproduce / 如何复现
      description: |
        Please provide steps to reproduce the issue, including the twinkle version, runtime environment, and detailed reproduction steps.
        请提供复现问题的步骤,包括 twinkle 的版本、运行环境、详细的复现步骤等。

  - type: textarea
    id: additional-information
    attributes:
      label: Additional Information / 补充信息
      description: |
        Please provide any additional information here.
        在这里补充其他相关信息。
```
New file (+37 lines) — GitHub issue form for feature requests:

```yaml
name: "🚀 Feature Request"
description: Submit a request for a new feature
labels: ["enhancement"]

body:
  - type: markdown
    attributes:
      value: |
        Thank you for supporting twinkle and taking the time to submit this issue.
        感谢你对 twinkle 的支持和抽出时间提交相关 issue。

  - type: checkboxes
    id: checklist
    attributes:
      label: Checklist / 检查清单
      options:
        - label: I have searched existing issues, and this is a new feature request. / 我已经搜索过现有的 issues,确认这是一个新的 Feature Request。
          required: true

  - type: textarea
    id: feature-request-description
    validations:
      required: true
    attributes:
      label: Feature Request Description / Feature Request 描述
      description: |
        Please provide a detailed description of the new feature you would like to see added.
        请详细描述您希望添加的新功能特性。

  - type: textarea
    id: pull-request
    attributes:
      label: Pull Request / Pull Request 信息
      description: |
        Have you already submitted or plan to submit a Pull Request? Please share your plans.
        你是否已经提交或即将提交 Pull Request?请说明你的计划。
```
New file (+28 lines) — GitHub issue form for questions and discussions:

```yaml
name: "🤔 Question & Discussion"
description: Create an issue for questions and discussions
labels: ["question"]

body:
  - type: markdown
    attributes:
      value: |
        Thank you for supporting twinkle and taking the time to submit this issue.
        感谢你对 twinkle 的支持和抽出时间提交相关 issue。

  - type: checkboxes
    id: checklist
    attributes:
      label: Checklist / 检查清单
      options:
        - label: I have searched existing issues, and this is a new question or discussion topic. / 我已经搜索过现有的 issues,确认这是一个新的问题与讨论。
          required: true

  - type: textarea
    id: question-description
    validations:
      required: true
    attributes:
      label: Question Description / 问题描述
      description: |
        Please describe the question or topic you would like to discuss.
        请描述你想要讨论的问题或话题。
```
New file (+1 line) — issue template configuration, which disables free-form (blank) issues:

```yaml
blank_issues_enabled: false
```
New file (+13 lines) — pull request template:

```markdown
# PR type
- [ ] Bug Fix
- [ ] New Feature
- [ ] Document Updates
- [ ] More Models or Datasets Support

# PR information

Describe the details of this PR.

## Experiment results

Paste your experiment results here (if needed).
```
New file (+3 lines) — SECURITY.md:

```markdown
# Reporting Security Issues

Security issues in a deep learning project usually come from non-standard third-party packages or continuously running services. If you encounter a security issue in our project, please consider reporting it to us. We appreciate your efforts to responsibly disclose your findings, and will make every effort to acknowledge your contributions.
```
Contributor review comment:

There is a large amount of duplicated code between the `if` and `else` blocks for running the Docker container. This makes the script hard to read and maintain: any change to the `docker run` command must be applied in two places, which is error-prone.

You can refactor this by storing the common Docker arguments in an array and conditionally adding the debug-specific arguments. This also lets you remove the duplicated `-e TEST_LEVEL=$TEST_LEVEL` flag.

```shell
docker_args=(
    --rm --name "$CONTAINER_NAME" --shm-size=16gb
    --cpuset-cpus="${cpu_sets_arr[$idx]}"
    --gpus='"'"device=$gpu"'"'
    -v "$CODE_DIR:$CODE_DIR_IN_CONTAINER"
    -v "$MODELSCOPE_CACHE:$MODELSCOPE_CACHE_DIR_IN_CONTAINER"
    -v "$MODELSCOPE_HOME_CACHE/$idx:/root"
    -v /home/admin/pre-commit:/home/admin/pre-commit
    -e CI_TEST=True
    -e TEST_LEVEL="$TEST_LEVEL"
    -e MODELSCOPE_CACHE="$MODELSCOPE_CACHE_DIR_IN_CONTAINER"
    -e MODELSCOPE_DOMAIN="$MODELSCOPE_DOMAIN"
    -e HUB_DATASET_ENDPOINT="$HUB_DATASET_ENDPOINT"
    -e TEST_ACCESS_TOKEN_CITEST="$TEST_ACCESS_TOKEN_CITEST"
    -e TEST_ACCESS_TOKEN_SDKDEV="$TEST_ACCESS_TOKEN_SDKDEV"
    -e MODELSCOPE_ENVIRONMENT='ci'
    -e TEST_UPLOAD_MS_TOKEN="$TEST_UPLOAD_MS_TOKEN"
    -e MODEL_TAG_URL="$MODEL_TAG_URL"
    -e MODELSCOPE_API_TOKEN="$MODELSCOPE_API_TOKEN"
    -e PR_CHANGED_FILES="$PR_CHANGED_FILES"
    --workdir="$CODE_DIR_IN_CONTAINER"
)
if [ "$MODELSCOPE_SDK_DEBUG" == "True" ]; then
    echo 'debugging'
    docker_args+=(-e MODELSCOPE_SDK_DEBUG=True)
fi
docker run "${docker_args[@]}" "${IMAGE_NAME}:${IMAGE_VERSION}" $CI_COMMAND
```