[Bug] Weights are not loaded correctly in a multi-GPU parallel environment #38
Open
Description
When running main.py with world_size set to 2 and CUDA_VISIBLE_DEVICES restricted to GPUs 2 and 3:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
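The following output was produced. The weight loader reports 112 model parameters as not loaded from the checkpoint, and every completion consists only of `!` characters: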
================================================================================
Weight Loading Summary:
================================================================================
Successfully loaded: 115 parameter groups
⚠️ WARNING: 112 model parameters NOT loaded from checkpoint:
================================================================================
Weight Loading Summary:
================================================================================
Successfully loaded: 115 parameter groups
⚠️ WARNING: 112 model parameters NOT loaded from checkpoint:
- model.layers.0.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.000000)
- model.layers.0.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 114983805902368965550594326528.000000)
- model.layers.0.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 511.984375)
- model.layers.0.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 149664223110721969728228687872.000000)
- model.layers.0.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 0.000000)
- model.layers.0.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 0.000000)
- model.layers.0.mlp.down_proj.weight (shape: torch.Size([1024, 1536]), mean: 988.845093)
- model.layers.0.mlp.down_proj.weight (shape: torch.Size([1024, 1536]), mean: 0.000000)
- model.layers.1.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.271820)
- model.layers.1.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.000000)
- model.layers.1.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 659.216675)
- model.layers.1.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 7653227994546176.000000)
- model.layers.1.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: -298157523041517568.000000)
- model.layers.1.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 2551103199641600.000000)
- model.layers.1.mlp.down_proj.weight (shape: torch.Size([1024, 1536]), mean: 0.111903)
- model.layers.1.mlp.down_proj.weight (shape: torch.Size([1024, 1536]), mean: 5102152175321088.000000)
- model.layers.2.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.000000)
- model.layers.2.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.000000)
- model.layers.2.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 659.216675)
- model.layers.2.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 3826613997273088.000000)
- model.layers.2.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: -298157523041517568.000000)
- model.layers.2.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 1275565290029056.000000)
- model.layers.2.mlp.down_proj.weight (shape: torch.Size([1024, 1536]), mean: 0.000000)
- model.layers.2.mlp.down_proj.weight (shape: torch.Size([1024, 1536]), mean: 2551076087660544.000000)
- model.layers.3.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.000000)
- model.layers.3.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.000000)
- model.layers.3.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 659.216675)
- model.layers.3.self_attn.o_proj.weight (shape: torch.Size([1024, 1024]), mean: 3826613997273088.000000)
- model.layers.3.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: -298157523041517568.000000)
... and 97 more
Skipped (merged into other weights): 84
================================================================================
- model.layers.3.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 1275565290029056.000000)
... and 97 more
Skipped (merged into other weights): 84
================================================================================
/home/lihao/study/LeeWant/MinivLLM/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4876: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
warnings.warn( # warn only once
51 number of processed tokens 7.132203981073677 tokens/sec during prefilling
3 number of processed tokens 2.2941281356988696 tokens/sec during decoding
3 number of processed tokens 39.67132749444522 tokens/sec during decoding
3 number of processed tokens 39.41434683323077 tokens/sec during decoding
3 number of processed tokens 40.17904589871887 tokens/sec during decoding
3 number of processed tokens 41.09578190559035 tokens/sec during decoding
3 number of processed tokens 40.31162834453686 tokens/sec during decoding
3 number of processed tokens 40.061485507965855 tokens/sec during decoding
3 number of processed tokens 40.26634916449404 tokens/sec during decoding
3 number of processed tokens 41.48244651091053 tokens/sec during decoding
3 number of processed tokens 41.53502752157621 tokens/sec during decoding
3 number of processed tokens 41.21409989555906 tokens/sec during decoding
3 number of processed tokens 39.962752355727226 tokens/sec during decoding
3 number of processed tokens 40.062378362396636 tokens/sec during decoding
3 number of processed tokens 40.110012378236725 tokens/sec during decoding
3 number of processed tokens 40.18263854898938 tokens/sec during decoding
3 number of processed tokens 40.31601977235447 tokens/sec during decoding
3 number of processed tokens 39.981672369876655 tokens/sec during decoding
3 number of processed tokens 40.09339785618025 tokens/sec during decoding
3 number of processed tokens 40.114487871761426 tokens/sec during decoding
3 number of processed tokens 41.43982236580089 tokens/sec during decoding
3 number of processed tokens 40.3053012046982 tokens/sec during decoding
3 number of processed tokens 41.26789714501092 tokens/sec during decoding
3 number of processed tokens 41.665271465312316 tokens/sec during decoding
3 number of processed tokens 41.49106395200742 tokens/sec during decoding
3 number of processed tokens 41.27195790673413 tokens/sec during decoding
3 number of processed tokens 40.25964978727437 tokens/sec during decoding
3 number of processed tokens 40.04809746559497 tokens/sec during decoding
3 number of processed tokens 40.24896116816002 tokens/sec during decoding
3 number of processed tokens 41.89751131246349 tokens/sec during decoding
3 number of processed tokens 41.802027767886194 tokens/sec during decoding
3 number of processed tokens 39.44189773639874 tokens/sec during decoding
3 number of processed tokens 40.34070705920787 tokens/sec during decoding
3 number of processed tokens 40.2294015363134 tokens/sec during decoding
3 number of processed tokens 40.19842816158571 tokens/sec during decoding
3 number of processed tokens 40.401065927290304 tokens/sec during decoding
3 number of processed tokens 41.69675113168589 tokens/sec during decoding
3 number of processed tokens 42.725195859126934 tokens/sec during decoding
3 number of processed tokens 41.39156630703303 tokens/sec during decoding
3 number of processed tokens 40.12497722580021 tokens/sec during decoding
3 number of processed tokens 41.52598067601965 tokens/sec during decoding
3 number of processed tokens 41.49489507517406 tokens/sec during decoding
3 number of processed tokens 41.089877127135765 tokens/sec during decoding
3 number of processed tokens 39.86058986128449 tokens/sec during decoding
3 number of processed tokens 41.1113571866365 tokens/sec during decoding
3 number of processed tokens 41.39483435251733 tokens/sec during decoding
3 number of processed tokens 40.28504190244577 tokens/sec during decoding
3 number of processed tokens 41.24071064013973 tokens/sec during decoding
3 number of processed tokens 40.117429454215284 tokens/sec during decoding
3 number of processed tokens 41.46248968656396 tokens/sec during decoding
3 number of processed tokens 41.586092646641426 tokens/sec during decoding
3 number of processed tokens 40.58166050550473 tokens/sec during decoding
3 number of processed tokens 41.73685985396904 tokens/sec during decoding
3 number of processed tokens 39.24029895417129 tokens/sec during decoding
3 number of processed tokens 41.73187664582673 tokens/sec during decoding
3 number of processed tokens 41.701173137500035 tokens/sec during decoding
3 number of processed tokens 41.540923799542014 tokens/sec during decoding
3 number of processed tokens 41.249768827566676 tokens/sec during decoding
3 number of processed tokens 39.79667272853459 tokens/sec during decoding
3 number of processed tokens 40.14123376431111 tokens/sec during decoding
3 number of processed tokens 40.3527385185661 tokens/sec during decoding
3 number of processed tokens 39.893068148945 tokens/sec during decoding
3 number of processed tokens 39.721420874930566 tokens/sec during decoding
3 number of processed tokens 40.384727909278695 tokens/sec during decoding
3 number of processed tokens 41.87617764448051 tokens/sec during decoding
3 number of processed tokens 41.54874237675524 tokens/sec during decoding
3 number of processed tokens 40.10873384921962 tokens/sec during decoding
3 number of processed tokens 40.312661535600675 tokens/sec during decoding
3 number of processed tokens 40.17891760166216 tokens/sec during decoding
3 number of processed tokens 39.88168840932473 tokens/sec during decoding
3 number of processed tokens 39.16189285437299 tokens/sec during decoding
3 number of processed tokens 38.623243429122915 tokens/sec during decoding
3 number of processed tokens 40.46121536600995 tokens/sec during decoding
3 number of processed tokens 40.14840618729632 tokens/sec during decoding
3 number of processed tokens 40.026822526932435 tokens/sec during decoding
3 number of processed tokens 40.19380552725869 tokens/sec during decoding
3 number of processed tokens 39.696483615109216 tokens/sec during decoding
3 number of processed tokens 40.13073591018903 tokens/sec during decoding
3 number of processed tokens 40.20690575335673 tokens/sec during decoding
3 number of processed tokens 40.36154025780226 tokens/sec during decoding
3 number of processed tokens 40.51802113986273 tokens/sec during decoding
3 number of processed tokens 41.67534530704473 tokens/sec during decoding
3 number of processed tokens 40.316148946480304 tokens/sec during decoding
3 number of processed tokens 40.39860141155492 tokens/sec during decoding
3 number of processed tokens 41.31260952095108 tokens/sec during decoding
3 number of processed tokens 40.54491608691691 tokens/sec during decoding
3 number of processed tokens 40.05383410218114 tokens/sec during decoding
3 number of processed tokens 40.251407459557456 tokens/sec during decoding
3 number of processed tokens 40.216029580169504 tokens/sec during decoding
3 number of processed tokens 40.4526302046048 tokens/sec during decoding
3 number of processed tokens 40.217186417317166 tokens/sec during decoding
3 number of processed tokens 39.9167334961495 tokens/sec during decoding
3 number of processed tokens 40.20485025126007 tokens/sec during decoding
3 number of processed tokens 41.57798523153687 tokens/sec during decoding
3 number of processed tokens 40.54556932072551 tokens/sec during decoding
3 number of processed tokens 41.681419564922656 tokens/sec during decoding
3 number of processed tokens 41.667892955262815 tokens/sec during decoding
3 number of processed tokens 41.538729640094395 tokens/sec during decoding
3 number of processed tokens 41.6370079334762 tokens/sec during decoding
3 number of processed tokens 41.72730975025219 tokens/sec during decoding
3 number of processed tokens 41.63411480990182 tokens/sec during decoding
3 number of processed tokens 40.348339088353576 tokens/sec during decoding
3 number of processed tokens 41.683766911532146 tokens/sec during decoding
3 number of processed tokens 41.44951439070407 tokens/sec during decoding
3 number of processed tokens 40.35493859346489 tokens/sec during decoding
3 number of processed tokens 40.15199336004951 tokens/sec during decoding
3 number of processed tokens 40.335405145844156 tokens/sec during decoding
3 number of processed tokens 41.63645683132006 tokens/sec during decoding
2 number of processed tokens 22.999185683725013 tokens/sec during decoding
2 number of processed tokens 28.6421829989026 tokens/sec during decoding
2 number of processed tokens 28.82673251263464 tokens/sec during decoding
1 number of processed tokens 0.21760974422065835 tokens/sec during decoding
1 number of processed tokens 15.371633781252912 tokens/sec during decoding
1 number of processed tokens 15.735347166980366 tokens/sec during decoding
1 number of processed tokens 16.257307285727208 tokens/sec during decoding
1 number of processed tokens 16.232454790013602 tokens/sec during decoding
1 number of processed tokens 16.373255024564187 tokens/sec during decoding
Prompt: <|im_start|>user
introduce yourself<|im_end|>
<|im_start|>assistant
Completion: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Prompt: <|im_start|>user
list all prime numbers within 100<|im_end|>
<|im_start|>assistant
Completion: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Prompt: <|im_start|>user
give me your opinion on the impact of artificial intelligence on society<|im_end|>
<|im_start|>assistant
Completion: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
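
A minimal sketch for triaging the weight loading summary above, assuming the two entries printed per parameter name come from the two ranks writing to the same terminal. It parses the "NOT loaded" lines in the format shown in this log and flags parameters whose reported mean is implausibly large. Means on the order of 1e29 are consistent with memory that was allocated but never written, although the log alone does not prove that. The helper names and the cutoff value are hypothetical and are not part of the project.

```python
import re
from collections import defaultdict

# Matches lines in the format printed by the summary in this report, e.g.
# "- model.layers.0.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 0.000000)"
LINE_RE = re.compile(
    r"^\s*-\s+(?P<name>\S+)\s+\(shape: torch\.Size\(\[(?P<shape>[^\]]*)\]\), "
    r"mean: (?P<mean>-?\d+(?:\.\d+)?)\)\s*$"
)

# Hypothetical cutoff (an assumption): means far from zero are treated as suspicious.
SUSPICIOUS_ABS_MEAN = 100.0


def parse_unloaded(log_text):
    """Group the reported means by parameter name (one entry per printed line)."""
    means = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            means[match.group("name")].append(float(match.group("mean")))
    return dict(means)


def suspicious(means_by_name, cutoff=SUSPICIOUS_ABS_MEAN):
    """Return the sorted names where any reported mean exceeds the cutoff in magnitude."""
    return sorted(
        name
        for name, values in means_by_name.items()
        if any(abs(value) > cutoff for value in values)
    )


# Four lines copied from the summary in this report.
sample = """\
- model.layers.0.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 0.000000)
- model.layers.0.self_attn.qkv_projection.weight (shape: torch.Size([2048, 1024]), mean: 114983805902368965550594326528.000000)
- model.layers.0.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 0.000000)
- model.layers.0.mlp.gate_up.weight (shape: torch.Size([3072, 1024]), mean: 0.000000)
"""

parsed = parse_unloaded(sample)
flagged = suspicious(parsed)
print(flagged)
```

A mean of exactly 0.000000 is not flagged by this cutoff, but it may also indicate a tensor that was never filled, so the grouped values in `parsed` are worth inspecting by hand as well.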